
ensuring valid latin-1

Hey all,

I'm trying to write something that will "fail fast" if one of my users
gives me non-latin-1 characters. So I tried this:
>>> testString = "\x80"
>>> foo = unicode(testString, "latin-1")
>>> foo
u'\x80'

I would have thought that that should have raised an error, because
\x80 is not a valid character in latin-1 (according to what I can
find). Is this the expected behavior, or am I missing something?

I'm on Windows, but I have explicitly set the character set to be
latin-1 in sitecustomize.py
>>> import sys
>>> sys.getdefaultencoding()
'latin-1'

Nov 29 '06 #1
Chris Curvey wrote:
> Hey all,
>
> I'm trying to write something that will "fail fast" if one of my users
> gives me non-latin-1 characters. So I tried this:
> >>> testString = "\x80"
> >>> foo = unicode(testString, "latin-1")
> >>> foo
> u'\x80'
>
> I would have thought that that should have raised an error, because
> \x80 is not a valid character in latin-1 (according to what I can
> find). Is this the expected behavior, or am I missing something?

Depends on what you call 'latin-1'. The ISO 8859-1 standard defines
only displayable characters. If you used that definition, even the
basic ASCII carriage return, line feed and tab would raise an error.
However, according to Wikipedia:

"""In 1992, the IANA registered the character map ISO_8859-1:1987, more
commonly known by its preferred MIME name of ISO-8859-1 (note the extra
hyphen over ISO 8859-1), a superset of ISO 8859-1, for use on the
Internet. This map assigns the C0 and C1 control characters to the code
values 00-1F, 7F, and 80-9F. It thus provides for 256 characters
via every possible 8-bit value."""
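The difference between the two definitions can be made concrete by counting code values. A minimal Python 3 sketch, assuming the ranges described above (graphic characters at 0x20-0x7E and 0xA0-0xFF in the strict standard; C0 controls, DEL and C1 controls added in the IANA map); the set names are hypothetical:

```python
# Strict ISO/IEC 8859-1 assigns graphic characters only:
# 0x20-0x7E (printable ASCII) and 0xA0-0xFF.
STRICT_ISO_8859_1 = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

# IANA's ISO-8859-1 also assigns the C0 controls (0x00-0x1F),
# DEL (0x7F) and the C1 controls (0x80-0x9F): every 8-bit value.
IANA_ISO_8859_1 = (
    STRICT_ISO_8859_1
    | set(range(0x00, 0x20))
    | {0x7F}
    | set(range(0x80, 0xA0))
)

print(len(STRICT_ISO_8859_1), len(IANA_ISO_8859_1))  # 191 256

# Tab, LF, CR and the 0x80 byte from the question exist only in the IANA map.
for value in (0x09, 0x0A, 0x0D, 0x80):
    print(hex(value), value in STRICT_ISO_8859_1, value in IANA_ISO_8859_1)
```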

'latin-1' and 'iso-8859-1' are the same encoding.
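Python's codec follows the IANA map, which is why the session in the question does not raise. A minimal sketch in Python 3 terms (where bytes and str take the roles of Python 2's str and unicode): both names resolve to the same codec, all 256 byte values decode, and 0x80 becomes a control character rather than an error.

```python
import codecs
import unicodedata

# Both spellings resolve to the same codec in Python's codec registry.
print(codecs.lookup("latin-1").name, codecs.lookup("iso-8859-1").name)

# Every one of the 256 byte values decodes, each to the code point with
# the same number, so decoding bytes as Latin-1 can never raise.
decoded = bytes(range(256)).decode("latin-1")
print(all(ord(ch) == value for value, ch in enumerate(decoded)))  # True

# The byte 0x80 from the question becomes U+0080, a C1 control character
# (Unicode general category "Cc"), rather than an error.
ch = b"\x80".decode("latin-1")
print(hex(ord(ch)), unicodedata.category(ch))  # 0x80 Cc
```

So "decode as latin-1" is not a useful validity check on its own: it accepts any byte string.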

If you articulate your definition of "valid latin-1", we should be able
to help you with some Python code to check it for you.
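As one example, suppose "valid latin-1" means: every character is representable in Latin-1 (U+0000 to U+00FF) and is not a control character, apart from tab, line feed and carriage return. That definition, the helper name `check_latin1_text` and the `ALLOWED_CONTROLS` set are all hypothetical. A minimal Python 3 sketch that fails fast on the first offending character of already-decoded text:

```python
import unicodedata

# Hypothetical policy: the only control characters allowed in input.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def check_latin1_text(text: str) -> None:
    """Raise ValueError at the first character that breaks the policy.

    Hypothetical definition of "valid latin-1": every character encodes
    to Latin-1 and is not a control character, except tab, LF and CR.
    """
    try:
        # Raises UnicodeEncodeError for any code point above U+00FF.
        text.encode("latin-1")
    except UnicodeEncodeError as exc:
        raise ValueError(
            f"non-Latin-1 character {text[exc.start]!a} at index {exc.start}"
        ) from exc

    for index, ch in enumerate(text):
        # C0 controls, DEL and C1 controls (U+0080-U+009F) are category "Cc".
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
            raise ValueError(f"control character {ch!a} at index {index}")


check_latin1_text("Grüße\tdone\r\n")  # passes: every character is allowed
for bad in ("\x80", "price: \u20ac5"):
    try:
        check_latin1_text(bad)
    except ValueError as err:
        print(err)
```

The encode step catches characters outside Latin-1; the category check then applies the stricter, standard-style rule about control characters. Changing `ALLOWED_CONTROLS` adjusts the policy without touching the logic.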

> I'm on Windows, but I have explicitly set the character set to be
> latin-1 in sitecustomize.py
Why??

Don't do that. That's a self-inflicted double whammy.
(1) You should *not* assume that all the legacy str data your machine
will ever process is in only one encoding.
(2) On a Windows machine, your legacy data is extremely likely to be
encoded in a Microsoft-developed encoding (like cp1252), not latin-1.
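The distinction matters for exactly the byte in the question. A minimal Python 3 sketch: under cp1252, byte 0x80 is the euro sign, while Latin-1 maps it to the control character U+0080. Unlike its Latin-1 codec, Python's cp1252 codec also rejects the five byte values cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so decoding Windows data as cp1252 can genuinely fail fast.

```python
data = b"\x80"

# Same byte, two different characters depending on the assumed encoding.
print(ascii(data.decode("cp1252")))   # '\u20ac' (the euro sign)
print(ascii(data.decode("latin-1")))  # '\x80' (a C1 control character)

# cp1252 leaves five byte values undefined; Python's codec raises on them.
for value in (0x81, 0x8D, 0x8F, 0x90, 0x9D):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        print(hex(value), "is undefined in cp1252")
```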

> >>> import sys
> >>> sys.getdefaultencoding()
> 'latin-1'

HTH,
John

Nov 29 '06 #2
