Hi,
I am trying to parse an xml file using the minidom parser.
<code>
from xml.dom import minidom
xmlfilename = "sample.xml"
xmldoc = minidom.parse(xmlfilename)
</code>
The parser is failing on this line:
<mrcb245-c>Heinrich Kčufner, Norbert Nedopil, Heinz Schčoch (Hrsg.).</
mrcb245-c>
This is the error message I get:
Traceback (most recent call last):
  File "readXML.py", line 11, in <module>
    xmldoc = minidom.parse(xmlfilename)
  File "C:\Python25\lib\xml\dom\minidom.py", line 1913, in parse
    return expatbuilder.parse(file)
  File "C:\Python25\lib\xml\dom\expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "C:\Python25\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 2254, column 21
It seems to me that the parser is having an issue with the 'č' character. I
have even tried the following to make sure it recognises the file as
a UTF-8 file:
<code>
from xml.dom import minidom
import codecs
xmlfilename = "sample.xml"
xmlfile = codecs.open(xmlfilename, "r", "utf-8")
xmlstring = xmlfile.read()
xmldoc = minidom.parse(xmlfilename)
</code>
However, this doesn't work either and I get the following error
message:
Traceback (most recent call last):
  File "readXML.py", line 9, in <module>
    xmlstring = xmlfile.read()
  File "C:\Python25\lib\codecs.py", line 618, in read
    return self.reader.read(size)
  File "C:\Python25\lib\codecs.py", line 424, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 69343-69345: invalid data
I'm assuming here that it is failing at the same place...
Can someone please point me in the right direction?
Thanks,
Ashmir
> The parser is failing on this line:
> <mrcb245-c>Heinrich Kčufner, Norbert Nedopil, Heinz Schčoch (Hrsg.).</
> mrcb245-c>
If it is literally this line, it's no surprise: there must not be a line
break between the slash and the closing element name.
However, since you are getting the error in a different column, it's
indeed more likely that there is a problem with the encoding.
Given that the Python UTF-8 codec refuses the data, most likely, the
data is *not* encoded in UTF-8 (but perhaps in Latin-1). If so, you
need to prefix the XML document with a proper XML declaration, such
as
<?xml version="1.0" encoding="iso-8859-1"?>
Alternatively, make sure that the file is really encoded in UTF-8.
Regards,
Martin
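One way to confirm that the file is not valid UTF-8, and to see which bytes are at fault, is to decode the raw bytes and inspect the exception. The following is only a minimal sketch, written for Python 3 (the thread itself uses Python 2.5); the helper name `find_non_utf8` and the sample element are made up for illustration and do not come from the original file.

```python
def find_non_utf8(data):
    """Return (start, end, offending_bytes) for the first span of `data`
    that is not valid UTF-8, or None if everything decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end are byte offsets into `data`.
        return err.start, err.end, data[err.start:err.end]
    return None


# Hypothetical sample: an element whose text was written as Latin-1 bytes.
sample = "<name>K\u00e4ufner</name>".encode("iso-8859-1")
bad = find_non_utf8(sample)

# The same text stored as real UTF-8 decodes without complaint.
good = find_non_utf8("<name>K\u00e4ufner</name>".encode("utf-8"))
```

For a real file, the bytes would come from something like `open("sample.xml", "rb").read()`. The reported offset can then be compared with the position in the `UnicodeDecodeError` message.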
On Jul 4, 2:36 pm, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
>> The parser is failing on this line:
>> <mrcb245-c>Heinrich Kčufner, Norbert Nedopil, Heinz Schčoch (Hrsg.).</
>> mrcb245-c>
> If it is literally this line, it's no surprise: there must not be a line
> break between the slash and the closing element name.
> However, since you are getting the error in a different column, it's
> indeed more likely that there is a problem with the encoding.
> Given that the Python UTF-8 codec refuses the data, most likely, the
> data is *not* encoded in UTF-8 (but perhaps in Latin-1). If so, you
> need to prefix the XML document with a proper XML declaration, such
> as
> <?xml version="1.0" encoding="iso-8859-1"?>
> Alternatively, make sure that the file is really encoded in UTF-8.
> Regards,
> Martin
There is no line break in the xml file. It was just a formatting issue
on this forum.
However, you were right about the encoding not being
utf-8. The xml file is autogenerated by a different script so that's
probably where it is going wrong.
The parser works fine if I change the first line to
<?xml version="1.0" encoding="iso-8859-1"?>
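The effect of the declaration can be reproduced on a small scale. This is a rough sketch under assumptions, not the original script: it is written for Python 3, the one-element document is invented for illustration, and `try_parse` is a hypothetical helper. It parses the same Latin-1 bytes with and without the declaration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical one-element document, stored as Latin-1 bytes.
body = "<name>K\u00e4ufner</name>".encode("iso-8859-1")


def try_parse(data):
    """Return the root element's text, or None if expat rejects the bytes."""
    try:
        doc = minidom.parseString(data)
    except ExpatError:
        return None
    return doc.documentElement.firstChild.data


# Without a declaration the parser assumes UTF-8 and rejects the 0xE4 byte.
without_decl = try_parse(body)

# With the declaration the same bytes are decoded as Latin-1.
declaration = b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
with_decl = try_parse(declaration + body)
```

Here `without_decl` ends up as `None` and `with_decl` holds the decoded text. If the generating script can be changed, writing the file as real UTF-8 would be the other way out.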
Thank you very much.