A question about Unicode in Python

I have a Python source file, test.py:

# -*- coding: UTF-8 -*-

# s is a unicode string that includes Chinese characters
s = u''

Then I run:

$ python test.py
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
invalid data

But in the Python interactive interpreter, it works:
>>> s = u''
Why?

Jun 12 '07 #1
In <11*********************@i13g2000prf.googlegroups.com>, hzqij wrote:
i have a python source code test.py

# -*- coding: UTF-8 -*-

# s is a unicode string, include chinese
s = u'*三'

then i run

$ python test.py
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
invalid data

by in python interactive, it is right
>>>s = u'*三'

why?

Does the "coding comment" match the actual encoding of the source file?

Ciao,
Marc 'BlackJack' Rintsch
Jun 12 '07 #2
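
To illustrate the question above: if the coding comment says UTF-8 but the editor saved the file in some other encoding, the bytes of the Chinese characters will usually not be valid UTF-8, and decoding fails. Below is a minimal sketch, not a diagnosis of the original file. It assumes GBK as the legacy encoding (the thread never says which encoding was actually used), uses two placeholder characters rather than the ones from the post, and is written for current Python 3, whereas the thread is about Python 2. The helper name decodes_as_utf8 is made up for this example.

```python
# Placeholder text: two Chinese characters written as escapes so this
# example itself stays pure ASCII. They are NOT the characters from the post.
text = u"\u4e2d\u6587"

# Assumption: the editor saved the source file as GBK instead of UTF-8.
gbk_bytes = text.encode("gbk")
utf8_bytes = text.encode("utf-8")


def decodes_as_utf8(data):
    """Return True if the byte string is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(decodes_as_utf8(utf8_bytes))  # True
print(decodes_as_utf8(gbk_bytes))   # False
```

The second call fails for the same kind of reason as the error in the question: the interpreter is told to read the bytes as UTF-8, but they were written in a different encoding.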
hzqij schrieb:
i have a python source code test.py

# -*- coding: UTF-8 -*-

# s is a unicode string, include chinese
s = u''

then i run

$ python test.py
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
invalid data

by in python interactive, it is right
>>>s = u''

why?

Just an idea: does your text editor really support UTF-8? In the mail
the characters are displayed only as '??', which looks to me as if the
mail editor did not send the mail as UTF-8. Try attaching a correctly
encoded text file.

Jun 12 '07 #3
On 6/12/07, hzqij <hz*******@gmail.com> wrote:
i have a python source code test.py

# -*- coding: UTF-8 -*-

As Marc pointed out, you should test the actual file encoding of the
program to check that it is, in fact, UTF-8 encoded. If you're on a
Unix/Linux system you should be able to test for a UTF-8 encoded file
using the "file" command, e.g.

evan@dhcp-10-10-7-101 ~ $ file ~/uni.py
/home/evan/uni.py: UTF-8 Unicode text

--
Evan Klitzke <ev**@yelp.com>
Jun 12 '07 #4
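
Where the "file" command is not available, a similar check can be sketched in Python by reading the raw bytes and trying to decode them as UTF-8. This is only a rough equivalent under stated assumptions: the helper names is_utf8_file and write_temp are invented for this example, the GBK variant stands in for whatever encoding the editor really used, and the code targets current Python 3.

```python
import os
import tempfile


def is_utf8_file(path):
    """Return True if the whole file decodes as UTF-8."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def write_temp(data):
    """Write bytes to a temporary .py file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


# Placeholder source text; the characters are not the ones from the post.
source = u"# -*- coding: UTF-8 -*-\ns = u'\u4e2d\u6587'\n"

good_path = write_temp(source.encode("utf-8"))
bad_path = write_temp(source.encode("gbk"))  # assumed legacy encoding

results = (is_utf8_file(good_path), is_utf8_file(bad_path))
print(results)  # (True, False)

os.remove(good_path)
os.remove(bad_path)
```

Note that a pure ASCII file also passes this check, since ASCII is a subset of UTF-8; the check only becomes informative once the file contains non-ASCII characters.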
On Jun 12, 12:29 pm, "Evan Klitzke" <e...@yelp.com> wrote:
On 6/12/07, hzqij <hzqij1...@gmail.com> wrote:
i have a python source code test.py
# -*- coding: UTF-8 -*-

As Marc pointed out, you should test the actual file encoding of the
program to check that it is, in fact, UTF-8 encoded. If you're on a
Unix/Linux system you should be able to test for a UTF-8 encoded file
using the "file" command, e.g.

evan@dhcp-10-10-7-101 ~ $ file ~/uni.py
/home/evan/uni.py: UTF-8 Unicode text

--
Evan Klitzke <e...@yelp.com>

If you're using IDLE to edit the source, you can set it to save files
as UTF-8 by going to Options, Configure IDLE, General tab, and changing
the Default Source Encoding to UTF-8.

Mike

Jun 12 '07 #5
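
Changing the editor setting affects how files are saved from then on. An existing file that was already saved in another encoding can also be rewritten as UTF-8 from Python, provided the current encoding is known. The sketch below assumes GBK purely for illustration (the thread does not establish the real encoding), the function name reencode_to_utf8 is hypothetical, and the code targets current Python 3.

```python
import io
import os
import tempfile


def reencode_to_utf8(path, source_encoding):
    """Rewrite the file in place as UTF-8.

    source_encoding must be the encoding the file is currently in;
    guessing it wrong will raise an error or produce wrong characters.
    """
    # newline="" keeps the original line endings untouched.
    with io.open(path, "r", encoding=source_encoding, newline="") as handle:
        text = handle.read()
    with io.open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


# Demo with placeholder characters, saved as GBK (an assumption).
source = u"# -*- coding: UTF-8 -*-\ns = u'\u4e2d\u6587'\n"
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as handle:
    handle.write(source.encode("gbk"))

reencode_to_utf8(path, "gbk")

with open(path, "rb") as handle:
    converted = handle.read()
os.remove(path)

print(converted == source.encode("utf-8"))  # True
```

After such a conversion the bytes on disk agree with the "coding: UTF-8" comment at the top of the file.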
2007/6/12, WolfgangZ <wo****@gmx.net>:
hzqij schrieb:
i have a python source code test.py

# -*- coding: UTF-8 -*-

# s is a unicode string, include chinese
s = u'*三'

then i run

$ python test.py
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1:
invalid data

by in python interactive, it is right
>>s = u'*三'
why?

just an idea: is your text editor really supporting utf-8? In the mail
it is only displayed as '??' which looks for me as the mail editor did
not send the mail as utf. Try to attach a correct text file.

That must be your mail client, not his text editor or mail client. I
do see two Chinese characters in the message.

--
Andre Engels, an*********@gmail.com
ICQ: 6260644 -- Skype: a_engels
Jun 13 '07 #6
