I've been using the xml.sax.handler module to do event-driven parsing
of XML files in a Python application I'm working on. However, I keep
getting really pesky "invalid token" exceptions. Initially, I was only
getting them on control characters, and a little "sed -e 's/[^[:print:]]/ /g' $1"
took care of that just fine. But recently, I've been getting these invalid
token exceptions on n-tildes (like the n in España), smart/fancy/curly
quotes and other seemingly harmless characters. Specifying
encoding="utf-8" in the XML header hasn't helped matters.
Any ideas? As a last resort, I'd be willing to scrub invalid
characters... it just seems strange that curly quotes and n-tildes
wouldn't be valid XML! Is that really the case?
TIA!
Jason
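For reference, XML 1.0 forbids only a small set of characters: most C0 control characters (everything below U+0020 except tab, LF and CR), the surrogate range, and U+FFFE/U+FFFF. The n-tilde (U+00F1) and the curly quotes (U+201C, U+201D) are perfectly legal, so an "invalid token" on them points at something other than the characters themselves. If scrubbing is still wanted as a last resort, here is a minimal sketch, assuming Python 3 and already-decoded text, that replaces only characters outside the XML 1.0 Char production; the name `scrub_xml_text` is hypothetical.

```python
import re

# XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# This pattern matches any single character OUTSIDE that set.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def scrub_xml_text(text: str, replacement: str = " ") -> str:
    """Hypothetical helper: replace characters XML 1.0 forbids, keep the rest."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


# n-tilde and curly quotes survive; only the BEL control character (\x07) is replaced.
sample = "Espa\u00f1a \u201cquoted\u201d\x07bell"
print(ascii(scrub_xml_text(sample)))
```

Note that this works on text, i.e. after the bytes have been decoded; it cannot repair bytes that are in a different encoding from the one the file declares.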
On Mar 27, 9:59 am, jvictor...@yahoo.fr wrote:
> Specifying encoding="utf-8" in the xml header hasn't
> helped matters.
Are you making sure to encode the strings you pass into the parser in
UTF-8 or UTF-16? This article was illuminating in that respect and may
be helpful in diagnosing your problem: http://www.xml.com/pub/a/2002/11/13/py-xml.html?page=2
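To illustrate the point about encoding: the SAX parser works on bytes, and those bytes have to actually be in the encoding the XML declaration names. A minimal sketch, assuming Python 3, where the text is encoded to UTF-8 explicitly before being handed to `xml.sax.parseString`; the `TextCollector` handler and `parse_text` helper are hypothetical names for this example.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler that accumulates all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may split one text node across several callbacks.
        self.chunks.append(content)


def parse_text(xml_text: str) -> str:
    # Encode explicitly, so the bytes really are the UTF-8 the declaration promises.
    data = xml_text.encode("utf-8")
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


doc = '<?xml version="1.0" encoding="utf-8"?><city>Espa\u00f1a \u201cok\u201d</city>'
result = parse_text(doc)
print(ascii(result))
```

Collecting the chunks and joining them at the end avoids losing text when a single element's content arrives in more than one `characters()` call.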
Mike
jv********@yahoo.fr wrote:
> Any ideas? As a last resort, I'd be willing to scrub invalid
> characters.... it just seems strange that curly quotes and n-tildes
> wouldn't be valid XML! Is that really the case?
It's not the case, unless you have the wrong encoding, and then the whole
XML document isn't an XML document at all.
Just putting in an encoding header that doesn't match the encoding actually
used won't fix that.
Read up on what encodings are, and make sure your XML generation respects them.
Then reading these files will cause no problems.
Diez
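A minimal sketch of the mismatch described above, assuming Python 3: the same document is parsed once as genuine UTF-8 and once as Latin-1 bytes (where the n-tilde is the single byte 0xF1) that still claim encoding="utf-8". Only the second one fails, and expat typically reports it as "not well-formed (invalid token)". The helper name `try_parse` is hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def try_parse(data: bytes) -> str:
    """Hypothetical helper: return "ok", or the parser's error location and message."""
    try:
        xml.sax.parseString(data, ContentHandler())
        return "ok"
    except xml.sax.SAXParseException as exc:
        return (f"line {exc.getLineNumber()}, column {exc.getColumnNumber()}: "
                f"{exc.getMessage()}")


text = '<?xml version="1.0" encoding="utf-8"?><city>Espa\u00f1a</city>'

# The bytes really are UTF-8, matching the declaration: parses fine.
print(try_parse(text.encode("utf-8")))
# Latin-1 bytes (n-tilde is the single byte 0xF1) behind the same
# encoding="utf-8" declaration: the parser rejects the document.
print(try_parse(text.encode("latin-1")))
```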
I checked the encoding of the file containing the n-tilde (ñ), and
it is indeed UTF-8! I'm baffled! Any ideas?
Thanks,
Jason
On Mar 27, 11:16 am, "Diez B. Roggisch" <d...@nospam.web.de> wrote:
> It's not the case, unless you have a wrong encoding. Then the whole
> XML-Document isn't a XML-document at all.
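A quick way to double-check what a file really contains is to decode its raw bytes as UTF-8 and see where, if anywhere, decoding fails. A minimal sketch, assuming Python 3; `first_non_utf8_offset` is a hypothetical helper, and the demo writes one genuine UTF-8 file and one Latin-1 file to a temporary directory.

```python
import os
import tempfile
from typing import Optional


def first_non_utf8_offset(path: str) -> Optional[int]:
    """Hypothetical helper: None if the whole file decodes as UTF-8, otherwise
    the byte offset of the first sequence that is not valid UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


text = '<?xml version="1.0" encoding="utf-8"?>\n<city>Espa\u00f1a</city>\n'
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for encoding in ("utf-8", "latin-1"):
        path = os.path.join(tmp, encoding + ".xml")
        with open(path, "wb") as f:
            f.write(text.encode(encoding))
        results[encoding] = first_non_utf8_offset(path)

print(results)
```

A file can pass a casual check (an editor that silently guesses the encoding, for example) and still contain a single stray Latin-1 or Windows-1252 byte; decoding every byte is stricter.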
jv********@yahoo.fr wrote:
> I checked the file format (of the file containing the n-tilde - ñ) and
> it is indeed UTF-8! I'm baffled! Any ideas?
Without seeing your actual code and data, no: it works for me and for a lot
of other people.
Diez
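Along the lines of "show us your code and data": a small, self-contained report is much easier to help with. A minimal sketch, assuming Python 3, that parses a file with `xml.sax.parse` and, on failure, reports the line and column from the `SAXParseException` together with the raw, undecoded bytes of the offending line, so the byte sequence the parser choked on is visible. The `diagnose` helper and the demo files are hypothetical.

```python
import os
import tempfile
import xml.sax
from typing import Optional
from xml.sax.handler import ContentHandler


def diagnose(path: str) -> Optional[str]:
    """Hypothetical helper: parse `path` with xml.sax. Return None on success,
    otherwise a short report with the error position and the raw bytes of
    the failing line."""
    try:
        with open(path, "rb") as f:
            xml.sax.parse(f, ContentHandler())
        return None
    except xml.sax.SAXParseException as exc:
        line_no = exc.getLineNumber()
        with open(path, "rb") as f:
            raw_lines = f.read().splitlines()
        # Show the undecoded bytes so encoding problems are visible.
        bad_line = raw_lines[line_no - 1] if 0 < line_no <= len(raw_lines) else b""
        return (f"{os.path.basename(path)}:{line_no}:{exc.getColumnNumber()}: "
                f"{exc.getMessage()}\n  raw bytes: {bad_line!r}")


# Demo: the same document written once as UTF-8 and once as Latin-1.
text = '<?xml version="1.0" encoding="utf-8"?>\n<city>Espa\u00f1a</city>\n'
reports = {}
with tempfile.TemporaryDirectory() as tmp:
    for encoding in ("utf-8", "latin-1"):
        path = os.path.join(tmp, encoding + ".xml")
        with open(path, "wb") as f:
            f.write(text.encode(encoding))
        reports[encoding] = diagnose(path)

for encoding, report in reports.items():
    print(encoding, "->", report or "parsed OK")
```

Posting the output of something like this, plus the few bytes around the error, usually makes an encoding mismatch obvious at a glance.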