Hi. Presumably this is an easy question, but anyone who understands the sax
docs thinks completely differently than I do :-)
Following the usual cookbook examples, my app parses an open file as
follows::

    import xml.sax
    import xml.sax.handler

    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_external_ges, 1)
    # Hopefully the content handler can figure out the encoding from the <?xml> element.
    handler = saxContentHandler(c, inputFileName, silent)
    parser.setContentHandler(handler)
    parser.parse(theFile)
Here 'theFile' is an open file. Usually this works just fine, but when the
filename contains u'\u8116' I get the following exception:
    Traceback (most recent call last):
      File "c:\prog\tigris-cvs\leo\src\leoFileCommands.py", line 2159, in parse_leo_file
        parser.parse(theFile)
      File "c:\python25\lib\xml\sax\expatreader.py", line 107, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "c:\python25\lib\xml\sax\xmlreader.py", line 119, in parse
        self.prepareParser(source)
      File "c:\python25\lib\xml\sax\expatreader.py", line 111, in prepareParser
        self._parser.SetBase(source.getSystemId())
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u8116' in position 44: ordinal not in range(128)
Presumably the documentation at http://docs.python.org/lib/module-xm...xmlreader.html
would be sufficient for a sax-head, but I have absolutely no idea how to
create an InputSource that can handle non-ASCII filenames.
Any help would be appreciated. Thanks!
Edward
--------------------------------------------------------------------
Edward K. Ream email: ed*******@charter.net
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------
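Here is one possible answer to the InputSource question. It is sketched for Python 3, not the Python 2.5 in the traceback. Modern xml.sax should accept non-ASCII paths anyway, so the sketch mainly illustrates the API. The approach is to open the file yourself and hand its byte stream to an explicit `xml.sax.xmlreader.InputSource`. You then set an ASCII-only, percent-encoded `file://` URI as the system ID, so nothing downstream has to coerce the raw filename. `TagCollector`, `safe_system_id` and `parse_file` are hypothetical names. The URI construction is an assumption, not something the thread prescribes.

```python
import os
import tempfile
import urllib.parse
import xml.sax
import xml.sax.handler
import xml.sax.xmlreader


class TagCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def safe_system_id(path):
    """Assumed helper: build an ASCII-only file:// URI (non-ASCII is UTF-8 percent-encoded)."""
    abs_path = os.path.abspath(path).replace(os.sep, "/")
    if not abs_path.startswith("/"):
        abs_path = "/" + abs_path  # Windows drive paths: C:/x -> /C:/x
    return "file://" + urllib.parse.quote(abs_path, safe="/:")


def parse_file(path, handler):
    """Parse path with SAX, supplying the byte stream and system ID explicitly."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    with open(path, "rb") as f:
        source = xml.sax.xmlreader.InputSource()
        source.setByteStream(f)  # the parser reads from this stream, not from the name
        source.setSystemId(safe_system_id(path))  # ASCII-only base URI for expat
        parser.parse(source)


if __name__ == "__main__":
    # Demo with a file name containing U+8116, as in the thread.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "\u8116.leo")
        with open(path, "wb") as out:
            out.write(b'<?xml version="1.0" encoding="utf-8"?><leo><v/></leo>')
        collector = TagCollector()
        parse_file(path, collector)
        print(collector.tags)  # ['leo', 'v']
```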
Edward K. Ream wrote:
> Following the usual cookbook examples, my app parses an open file as
> follows:: [...]
>     parser.parse(theFile)
> Here 'theFile' is an open file. Usually this works just fine, but when [...]
Filenames are expected to be bytestrings. So what happens is that the
unicode string you pass as filename gets implicitly converted using the
default encoding.
You have to encode the unicode string according to your filesystem
beforehand.
Diez
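In Python 3, Diez's advice looks like the sketch below. `sys.getfilesystemencoding()` and `sys.getfilesystemencodeerrors()` describe how the OS expects names to be encoded, and `os.fsencode` applies that same pair. This is a minimal sketch with a hypothetical `encode_filename` helper. As the later replies point out, it does not fix this particular crash, because Edward never hands the filename to sax.

```python
import os
import sys


def encode_filename(name):
    """Hypothetical helper: encode a str filename to bytes with the filesystem encoding."""
    # os.fsencode(name) performs this same conversion and is the usual shortcut.
    return name.encode(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())


if __name__ == "__main__":
    raw = encode_filename("\u8116.leo")
    print(type(raw).__name__, os.fsdecode(raw) == "\u8116.leo")
```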
Diez B. Roggisch wrote:
> Filenames are expected to be bytestrings. So what happens is that the
> unicode string you pass as filename gets implicitly converted using the
> default encoding.
it is?
    >>> f = open(u"\u8116", "w")
    >>> f.write("hello")
    >>> f.close()
    >>> f = open(u"\u8116", "r")
    >>> f.read()
    'hello'
</F>
> Filenames are expected to be bytestrings.
The exception happens in a method to which no fileName is passed as an
argument.
    parse_leo_file: 'C:\\prog\\tigris-cvs\\leo\\test\\unittest\\chinese?folder\\chinese?test.leo'
(trace of converted fileName)
    Unexpected exception parsing C:\prog\tigris-cvs\leo\test\unittest\chinese?folder\chinese?test.leo
    Traceback (most recent call last):
      File "c:\prog\tigris-cvs\leo\src\leoFileCommands.py", line 2162, in parse_leo_file
        parser.parse(theFile)
      File "c:\python25\lib\xml\sax\expatreader.py", line 107, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "c:\python25\lib\xml\sax\xmlreader.py", line 119, in parse
        self.prepareParser(source)
      File "c:\python25\lib\xml\sax\expatreader.py", line 111, in prepareParser
        self._parser.SetBase(source.getSystemId())
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u8116' in position 44: ordinal not in range(128)
To repeat, theFile is an open file. I believe the actual filename is passed
nowhere as an argument to sax in my code. Just to make sure, I converted
the filename to ASCII in my code, and got (no surprise) exactly the same
crash. I suppose a workaround would be to pass a file-like object to sax
instead of an open file, so that the SetBase(source.getSystemId()) call
won't crash. But this looks like a bug to me.
BTW:
Python 2.5.0, Tk 8.4.12, Pmw 1.2
Windows 5, 1, 2600, 2, Service Pack 2
Edward
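Edward's guess about the file-like object can be checked directly. When `parse()` receives an object with a `read` method, `xml.sax.saxutils.prepare_input_source` wraps it in an `InputSource`. If the object has a string `name` attribute, that name becomes the system ID. That system ID is what reaches the `SetBase(source.getSystemId())` call in the traceback above. A real file carries its path in `name`, while an in-memory buffer has no `name` at all. The sketch uses Python 3's version of the function; Python 2.5's version appears to copy `name` the same way. `system_id_for` is a hypothetical helper.

```python
import io
import os
import tempfile
from xml.sax.saxutils import prepare_input_source


def system_id_for(stream):
    """Hypothetical helper: the system ID xml.sax would derive for an open stream."""
    return prepare_input_source(stream).getSystemId()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "\u8116.leo")
        with open(path, "wb") as out:
            out.write(b"<leo/>")
        with open(path, "rb") as real_file:
            # A real file exposes its path as .name, which becomes the system ID.
            print(system_id_for(real_file) == path)  # True
    # An in-memory buffer has no .name, so no system ID is derived.
    print(system_id_for(io.BytesIO(b"<leo/>")))  # None
```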
Diez B. Roggisch wrote:
> Edward K. Ream wrote:
>> [...]
>> Here 'theFile' is an open file. Usually this works just fine, but when [...]
> Filenames are expected to be bytestrings. So what happens is that the
> unicode string you pass as filename gets implicitly converted using the
> default encoding.
> You have to encode the unicode string according to your filesystem
> beforehand.
Not if your filesystem supports Unicode names, as Windows does.
Edward's point is that something is (whether by accident or "design")
trying to coerce it to str, and failing.
Happily, the workaround is easy. Replace theFile with:

    import cStringIO

    # Use cStringIO to avoid a crash in sax when inputFileName has unicode characters.
    s = theFile.read()
    theFile = cStringIO.StringIO(s)
My first attempt at a workaround was to use:

    s = theFile.read()
    parser.parseString(s)

but the expat parser does not support parseString...
Edward
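On Python 3, where `cStringIO` no longer exists, Edward's workaround would presumably use `io.BytesIO`. The steps are to read the bytes, wrap them in a buffer that has no `name`, and parse that buffer. `parse_leo_bytes` and `TagCollector` are hypothetical stand-ins, not Leo's actual code.

```python
import io
import xml.sax
import xml.sax.handler


class TagCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler that records element names."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def parse_leo_bytes(the_file, handler):
    """Read an already-open binary file and parse its contents from memory."""
    data = the_file.read()
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_external_ges, True)  # as in Edward's code
    parser.setContentHandler(handler)
    # BytesIO has no .name, so no filename-derived system ID reaches expat.
    parser.parse(io.BytesIO(data))
```

This likely has a trade-off, in line with Martin's later remark about relative URIs. With no system ID, relative references in external entities are no longer resolved against the .leo file's own directory.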
Fredrik Lundh wrote:
> Diez B. Roggisch wrote:
>> Filenames are expected to be bytestrings. So what happens is that the
>> unicode string you pass as filename gets implicitly converted using the
>> default encoding.
> it is?
Yes. While you can pass Unicode strings as file names to many Python
functions, you can't pass them to Expat, as Expat requires the file name
as a byte string. Hence the error.
Regards,
Martin
P.S. And just to anticipate nit-picking: yes, you can pass a Unicode
string to Expat, too, as long as the Unicode string only contains
ASCII characters. And yes, it doesn't have to be ASCII if you change
the system default encoding.
Edward K. Ream wrote:
> Happily, the workaround is easy. Replace theFile with:
>     # Use cStringIO to avoid a crash in sax when inputFileName has unicode characters.
>     s = theFile.read()
>     theFile = cStringIO.StringIO(s)
> My first attempt at a workaround was to use:
>     s = theFile.read()
>     parser.parseString(s)
> but the expat parser does not support parseString...
Right - you would have to use xml.sax.parseString (which is a global
function, not a method).
Of course, parseString just does what you did: create a cStringIO
object and operate on that.
Regards,
Martin
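Here is Martin's alternative, the module-level `xml.sax.parseString`, sketched in Python 3. It builds its own parser and in-memory source from the bytes, so no filename is involved. It also gives no hook for `setFeature`, so Edward's `feature_external_ges` setting would not be applied. `TagCollector` and `parse_leo_string` are hypothetical names.

```python
import xml.sax
import xml.sax.handler


class TagCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler that records element names."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def parse_leo_string(data):
    """Parse XML bytes with the global xml.sax.parseString, not a parser method."""
    handler = TagCollector()
    xml.sax.parseString(data, handler)
    return handler.tags


if __name__ == "__main__":
    # In the thread, data would come from theFile.read().
    print(parse_leo_string(b'<?xml version="1.0" encoding="utf-8"?><leo><t/></leo>'))  # ['leo', 't']
```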
Martin v. Löwis wrote:
> Yes. While you can pass Unicode strings as file names to many Python
> functions, you can't pass them to Expat, as Expat requires the file name
> as a byte string. Hence the error.
sounds like a bug in the xml.sax layer, really (ET also uses Expat, and
doesn't seem to have any problems dealing with unicode filenames...)
</F>
Fredrik Lundh wrote:
> Martin v. Löwis wrote:
>> Yes. While you can pass Unicode strings as file names to many Python
>> functions, you can't pass them to Expat, as Expat requires the file name
>> as a byte string. Hence the error.
> sounds like a bug in the xml.sax layer, really (ET also uses Expat, and
> doesn't seem to have any problems dealing with unicode filenames...)
That's because ET never invokes XML_SetBase. Without testing, this
suggests that there might be a problem in ET with relative URIs
in parsed external entities. XML_SetBase expects a char* for the
base URI.
Regards,
Martin
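Fredrik's observation about ElementTree can be checked with a short Python 3 sketch. `xml.etree.ElementTree.parse` accepts a path containing U+8116 and opens the file itself. `root_tag` is a hypothetical helper. The example uses no external entities, so it says nothing either way about the relative-URI concern Martin raises.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def root_tag(path):
    """Hypothetical helper: parse the file at path with ElementTree, return the root tag."""
    return ET.parse(path).getroot().tag


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "\u8116.leo")
        with open(path, "wb") as out:
            out.write(b'<?xml version="1.0" encoding="utf-8"?><leo/>')
        print(root_tag(path))  # leo
```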