ensuring valid latin-1

Chris Curvey

Hey all,

I'm trying to write something that will "fail fast" if one of my users
gives me non-latin-1 characters. So I tried this:

>>> testString = "\x80"
>>> foo = unicode(testString, "latin-1")
>>> foo
u'\x80'

I would have thought that that should have raised an error, because
\x80 is not a valid character in latin-1 (according to what I can
find). Is this the expected behavior, or am I missing something?

I'm on Windows, but I have explicitly set the character set to be
latin-1 in sitecustomize.py:

>>> import sys
>>> sys.getdefaultencoding()
'latin-1'

Nov 29 '06 #1


John Machin

Chris Curvey wrote:

> Hey all,
>
> I'm trying to write something that will "fail fast" if one of my users
> gives me non-latin-1 characters. So I tried this:
>
> >>> testString = "\x80"
> >>> foo = unicode(testString, "latin-1")
> >>> foo
> u'\x80'
>
> I would have thought that that should have raised an error, because
> \x80 is not a valid character in latin-1 (according to what I can
> find). Is this the expected behavior, or am I missing something?

Depends on what you call 'latin-1'. The standard ISO 8859-1 defined
only displayable characters. If you used that definition, even the
basic ASCII carriage return, line feed and tab would raise an error.
However, according to Wikipedia:

"""In 1992, the IANA registered the character map ISO_8859-1:1987, more
commonly known by its preferred MIME name of ISO-8859-1 (note the extra
hyphen over ISO 8859-1), a superset of ISO 8859-1, for use on the
Internet. This map assigns the C0 and C1 control characters to the code
values 00-1F, 7F, and 80-9F. It thus provides for 256 characters
via every possible 8-bit value."""

'latin-1' and 'iso-8859-1' are the same encoding.
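
A quick way to see this for yourself, sketched in Python 3 syntax (where bytes.decode plays the role of the unicode(...) call in the session above): both names resolve to the same codec, and every one of the 256 byte values decodes to the code point with the same number, so decoding as latin-1 can never raise.

```python
import codecs

# Both spellings resolve to the same underlying codec.
same_codec = codecs.lookup("latin-1").name == codecs.lookup("iso-8859-1").name

# Every possible byte value decodes, and maps to the code point of the same number.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
identity_mapping = all(ord(ch) == i for i, ch in enumerate(decoded))

print(same_codec, identity_mapping)
```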

If you articulate your definition of "valid latin-1", we should be able
to help you with some Python code to check it for you.
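
As one possible starting point, here is a minimal sketch under an assumed definition: "valid latin-1" means the displayable ISO 8859-1 ranges (0x20-0x7E and 0xA0-0xFF) plus tab, line feed and carriage return. That definition and the helper name check_printable_latin1 are illustrative assumptions, not something you have specified. It is written in Python 3 syntax and takes raw bytes.

```python
# Assumed definition of "valid": printable ISO 8859-1 bytes plus tab, LF and CR.
ALLOWED = (
    frozenset(range(0x20, 0x7F))
    | frozenset(range(0xA0, 0x100))
    | frozenset(b"\t\n\r")
)


def check_printable_latin1(data):
    """Decode bytes as latin-1, raising ValueError on the first disallowed byte."""
    for offset, byte in enumerate(data):
        if byte not in ALLOWED:
            raise ValueError(
                "byte 0x%02x at offset %d is not allowed" % (byte, offset)
            )
    return data.decode("latin-1")
```

With this definition, check_printable_latin1(b"\x80") raises ValueError, which is the "fail fast" behaviour asked for. If your definition differs, only the ALLOWED set needs to change.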

> I'm on Windows, but I have explicitly set the character set to be
> latin-1 in sitecustomize.py

Why??

Don't do that. That's a self-inflicted double whammy.
(1) You should *not* assume that all the legacy str data your machine
will ever process is in only one encoding.
(2) On a Windows machine, your legacy data is extremely likely to be
encoded in a Microsoft-developed encoding (like cp1252), not latin-1.
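
The difference matters for exactly the byte in your test. A small illustration in Python 3 syntax: the same byte is the euro sign in cp1252 but a C1 control character in latin-1. Python's cp1252 codec also leaves a few byte values (0x81 among them) unmapped, so unlike latin-1 it can reject input.

```python
raw = b"\x80"

as_latin1 = raw.decode("latin-1")  # C1 control character U+0080
as_cp1252 = raw.decode("cp1252")   # euro sign U+20AC

# cp1252 has no mapping for byte 0x81, so decoding it raises.
try:
    b"\x81".decode("cp1252")
    cp1252_rejects_0x81 = False
except UnicodeDecodeError:
    cp1252_rejects_0x81 = True

print(repr(as_latin1), repr(as_cp1252), cp1252_rejects_0x81)
```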

> >>> import sys
> >>> sys.getdefaultencoding()
> 'latin-1'

HTH,
John

Nov 29 '06 #2
