trouble w/ unicode file

Hi there,

I have a Python source file encoded in Unicode (UTF-8) with some
iso8859-1 strings. I've encoded this file as UTF-8 in the hope that
Python will understand these strings as Unicode (<type 'unicode'>)
strings without the need to use unicode() or u"" on these strings. But
this didn't happen.

Am I expecting something that really shouldn't happen, or do we have a bug?

This is the test I've made:
$ cat bar.py
#-*- coding: utf-8 -*-
x = 'ééééáááááííí'
print x, type(x)

$ python
Python 2.3.3 (#2, Jan 4 2004, 12:24:16)
[...]
>>> import bar
ééééáááááííí <type 'str'>

Thanks in advance,
[]'s
Guilherme Salgado

--
This email has been inspected by Hans Blix, who has reported that no
weapons of mass destruction were used in its construction.
Read his report here:
<http://www.un.org/apps/news/infocusnewsiraq.asp?NewsID=414&sID=6>
Jul 18 '05 #1
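A minimal sketch of what is going on here, written for Python 3, where `bytes` plays the role of Python 2's `str` and `str` plays the role of `unicode`. It assumes the literal in bar.py is stored as UTF-8; the variable names are only illustrative. The coding line tells the parser how to read the file, but a plain literal still ends up as the encoded bytes rather than as characters.

```python
# Python 3 sketch: bytes stands in for Python 2's str, str for unicode.
text = 'ééééáááááííí'            # 12 characters

# Roughly what a plain (non-u"") literal held in Python 2 when the
# source file was UTF-8: the encoded bytes, not the characters.
raw = text.encode('utf-8')

print(type(raw), len(raw))       # <class 'bytes'> 24
print(type(text), len(text))     # <class 'str'> 12
```

Each accented letter takes two bytes in UTF-8, which is why the byte string is twice as long as the text.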

"Guilherme Salgado" <sa*****@freeshell.org> wrote in message news:ma**************************************@python.org...
Hi there,

I have a Python source file encoded in Unicode (UTF-8) with some
iso8859-1 strings. I've encoded this file as UTF-8 in the hope that
Python will understand these strings as Unicode (<type 'unicode'>)
strings without the need to use unicode() or u"" on these strings. But
this didn't happen.

You hoped, but you forgot to pray <wink> Why do you think Python
should behave this way? There is (an experimental?) option -U that
forces all string literals to be unicode. Obviously, if you use this option
your sources won't be easily distributable to other people.

C:\Python23>python -U
Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> type('a')
<type 'unicode'>
Am I expecting something that really shouldn't happen, or do we have a bug?


We have a bug here as well. But in your code. The coding must
be the same as the coding of your source file. bar.py must be:
#-*- coding: latin-1 -*-
x = 'ééééáááááííí'
print x, type(x)

-- Serge.
Jul 18 '05 #2
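To illustrate why the declared coding has to match the bytes actually on disk, here is a small Python 3 sketch (not the parser's real implementation). It decodes the same UTF-8 bytes once with the matching codec and once with latin-1. The variable names are made up for the example.

```python
# Bytes as an editor saving in UTF-8 would write them to disk.
on_disk = 'é'.encode('utf-8')          # b'\xc3\xa9'

right = on_disk.decode('utf-8')        # declared coding matches the file
wrong = on_disk.decode('latin-1')      # declared coding does not match

print(repr(right))   # 'é'
print(repr(wrong))   # 'Ã©' (each byte became its own character)
```

The latin-1 decode never fails, because every byte value is a valid latin-1 character, so a mismatch of this kind shows up as garbled text rather than as an error.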
On Sun, 2004-01-25 at 06:54, Serge Orlov wrote:
"Guilherme Salgado" <sa*****@freeshell.org> wrote in message news:ma**************************************@python.org...
Hi there,

I have a Python source file encoded in Unicode (UTF-8) with some
iso8859-1 strings. I've encoded this file as UTF-8 in the hope that
Python will understand these strings as Unicode (<type 'unicode'>)
strings without the need to use unicode() or u"" on these strings. But
this didn't happen.
You hoped, but you forgot to pray <wink> Why do you think Python
should behave this way? There is (an experimental?) option -U that


Ok, ok. I'll remember to pray next time. :-)
I need to store Unicode strings (declared in files) in ZODB, but I don't
want to put u"" around all my strings (because most of them are latin-1),
so I thought storing the file as Unicode would work. Is there a better way
of doing this?
forces all string literals to be unicode. Obviously if you use this option
your sources won't be easily distributable to other people
[...]
Am I expecting something that really shouldn't happen, or do we have a bug?


We have a bug here as well. But in your code. The coding must
be the same as the coding of your source file. bar.py must be:
#-*- coding: latin-1 -*-
x = 'ééééáááááííí'
print x, type(x)


I didn't understand this (even after some praying) :-)
My file is encoded in UTF-8, look:
$ file bar.py
bar.py: UTF-8 Unicode text

So why should I declare it as latin-1 encoded?

[]'s
Guilherme Salgado

Jul 18 '05 #3

"Guilherme Salgado" <sa*****@freeshell.org> wrote in message news:ma**************************************@python.org...
On Sun, 2004-01-25 at 06:54, Serge Orlov wrote:
"Guilherme Salgado" <sa*****@freeshell.org> wrote in message news:ma**************************************@python.org...
Hi there,

I have a Python source file encoded in Unicode (UTF-8) with some
iso8859-1 strings. I've encoded this file as UTF-8 in the hope that
Python will understand these strings as Unicode (<type 'unicode'>)
strings without the need to use unicode() or u"" on these strings. But
this didn't happen.


You hoped, but you forgot to pray <wink> Why do you think Python
should behave this way? There is (an experimental?) option -U that


Ok, ok. I'll remember to pray next time. :-)
I need to store Unicode strings (declared in files) in ZODB, but I don't
want to put u"" around all my strings (because most of them are latin-1),
so I thought storing the file as Unicode would work. Is there a better way
of doing this?


Not that I'm aware of.
We have a bug here as well. But in your code. The coding must
be the same as the coding of your source file. bar.py must be:
#-*- coding: latin-1 -*-
x = 'ééééáááááííí'
print x, type(x)


I didn't understand this (even after some praying) :-)
My file is encoded in UTF-8, look:
$ file bar.py
bar.py: UTF-8 Unicode text

So why should I declare it as latin-1 encoded?


Sorry, I was confused by your words "with some iso8859-1 strings".
I thought you were using a simple (encoding-unaware) editor and
had just added #-*- coding: utf-8 -*- in the hope that it would work. You're
right, the coding should stay utf-8. After that you have two options:
either use the -U option or put u before every string.

-- Serge.


Jul 18 '05 #4
Serge Orlov wrote:
Sorry, I was confused by your words "with some iso8859-1 strings".
I thought you were using a simple (encoding-unaware) editor and
had just added #-*- coding: utf-8 -*- in the hope that it would work. You're
right, the coding should stay utf-8. After that you have two options:
either use the -U option or put u before every string.


There is a third option: Programmatically convert the strings to
Unicode, e.g.

# -*- coding: utf-8 -*-
s = "ééééáááááííí"
s = unicode(s, 'utf-8')

This assumes that you know the source encoding at the point of
conversion.

Regards,
Martin

Jul 18 '05 #5
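Martin's suggestion can be wrapped in a small helper. The following is a hedged Python 3 sketch: `to_text` is a hypothetical name, and `str(value, encoding)` is the Python 3 counterpart of the `unicode(s, 'utf-8')` call above. It assumes the caller knows the encoding of the incoming bytes.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid in `encoding`.
        return str(value, encoding)
    return value


s = 'ééééáááááííí'.encode('utf-8')   # byte string, like a Python 2 literal
s = to_text(s)                        # now a text string
print(len(s))                         # 12
```

Passing already-decoded text through unchanged lets the same helper be applied to every string before it is stored, whichever form it arrived in.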
