XML file parsing with SAX

I decided to use SAX to parse my XML file.
But the parser crashes on:
  File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: NCBI_Entrezgene.dtd:8:0: error in processing external entity reference

This is caused by:
<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
"NCBI_Entrezgene.dtd">

If I remove it, it parses normally.
I've created my parser like this:
import sys
from xml.sax import make_parser
from handler import EntrezGeneHandler

fopen = open("mouse2.xml", "r")
ch = EntrezGeneHandler()
saxparser = make_parser()
saxparser.setContentHandler(ch)
saxparser.parse(fopen)

And the handler is:
from xml.sax import ContentHandler

class EntrezGeneHandler(ContentHandler):
    """
    A handler to deal with EntrezGene in XML
    """

    def startElement(self, name, attrs):
        print "Start element:", name

So it doesn't do much yet. And still it crashes...
How can I tell the parser not to look at the DOCTYPE declaration?
On a website:
http://www.devarticles.com/c/a/XML/P...-and-Python/1/
it states that the SAX parsers are not validating, so this error shouldn't
even occur?

Cheers,

Willem
Jul 19 '05 #1
On Sat, 2005-04-23 at 15:20 +0200, Willem Ligtenberg wrote:
> So it doesn't do much yet. And still it crashes...
> How can I tell the parser not to look at the DOCTYPE declaration?
> On a website:
> http://www.devarticles.com/c/a/XML/P...-and-Python/1/
> it states that the SAX parsers are not validating, so this error shouldn't
> even occur?


Just because it's not validating doesn't mean that the parser won't try
to read the external entity.

Maybe you're looking for

"""
feature_external_ges
Value: "http://xml.org/sax/features/external-general-entities"
true: Include all external general (text) entities.
false: Do not include external general entities.
access: (parsing) read-only; (not parsing) read/write
"""

Quote from:

http://docs.python.org/lib/module-xml.sax.handler.html

But you're on pretty shaky ground in any XML 1.x toolkit using a bogus
DTDecl in this way. Why go through the hassle? Why not use a catalog,
or remove the DTDecl?
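
A minimal sketch of switching that feature off, assuming a current Python 3 interpreter rather than the Python 2.3 install shown in the traceback. The feature constant is passed to the parser's setFeature method before parsing starts. ElementNameCollector, parse_ignoring_external_entities, and the inline SAMPLE document are illustrative names and data, not taken from the original post. Since Python 3.7.1 the standard-library parser no longer processes external general entities by default, so there the explicit call mostly documents intent.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ElementNameCollector(ContentHandler):
    """Illustrative handler: records each start tag instead of printing it."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_ignoring_external_entities(xml_bytes):
    """Parse xml_bytes without fetching the DTD named in its DOCTYPE."""
    parser = xml.sax.make_parser()
    # Features can only be changed before parsing starts.
    parser.setFeature(feature_external_ges, False)
    handler = ElementNameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names


# Made-up stand-in for mouse2.xml; the DTD file it names need not exist.
SAMPLE = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"\n'
    b' "NCBI_Entrezgene.dtd">\n'
    b"<Entrezgene-Set><Entrezgene/></Entrezgene-Set>\n"
)

if __name__ == "__main__":
    print(parse_ignoring_external_entities(SAMPLE))
```

The DOCTYPE line stays in the file untouched; the parser simply skips the external reference instead of trying to open it.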
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerwork...xmlcss2-i.html
XML Output with 4Suite & Amara - http://www.xml.com/pub/a/2005/04/20/py-xml.html
Use XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/
Schema standardization for top-down semantic transparency - http://www-128.ibm.com/developerwork...x-think31.html

Jul 19 '05 #2
I didn't make the XML file, and I don't like messing with other people's
data, so I just want my SAX parser to ignore it. I can't help it if other
people make it hard for me to read their XML file...

On Sat, 23 Apr 2005 13:48:49 -0600, Uche Ogbuji wrote:
> But you're on pretty shaky ground in any XML 1.x toolkit using a bogus
> DTDecl in this way. Why go through the hassle? Why not use a catalog,
> or remove the DTDecl?


Jul 19 '05 #3
On 4/23/05, Willem Ligtenberg <wl*********@gmail.com> wrote:
> so that will be sax.handler.feature_external_ges = "false"

Yes.

> And it will work?

Honestly, I'm not sure. It should, but I've found these edge cases a
bit hard to predict in the Python built-in libs :-(
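
For reference, a hedged sketch of how the standard-library API treats this feature, written for Python 3. The name xml.sax.handler.feature_external_ges is only a module-level string holding the feature's URI, so assigning to it does not configure anything. The parser itself is configured with setFeature and queried with getFeature. The helper names make_parser_without_external_entities and external_entities_enabled are illustrative.

```python
import xml.sax
from xml.sax.handler import feature_external_ges


def make_parser_without_external_entities():
    """Return a SAX parser that will not read external general entities."""
    parser = xml.sax.make_parser()
    # The boolean False is passed, not the string "false".
    parser.setFeature(feature_external_ges, False)
    return parser


def external_entities_enabled(parser):
    """Report the parser's current setting for the feature."""
    return bool(parser.getFeature(feature_external_ges))


if __name__ == "__main__":
    # The constant is the feature's URI; it carries no parser state.
    print(feature_external_ges)
    print(external_entities_enabled(make_parser_without_external_entities()))
```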

> But what about using a catalog? I am very new to Python and XML...


Catalogs allow you to rewrite the IDs for entities and such. So if
you had an XML file with an entity at a URL, but you were working
offline, you could use a catalog to "redirect" the entity to a copy on
your local filesystem.
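
To make the idea concrete, here is a small sketch that reads an OASIS XML Catalog and looks up where a public identifier should be redirected. It uses only the Python 3 standard library. The catalog content, the file:///path/to/local/ location, and the names CATALOG_XML and load_public_entries are made up for illustration; only the catalog namespace and the public element with its publicId and uri attributes come from the OASIS XML Catalogs format.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OASIS XML Catalogs format.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: maps the public ID from the DOCTYPE to a local copy.
CATALOG_XML = """<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//NCBI//NCBI Entrezgene/EN"
          uri="file:///path/to/local/NCBI_Entrezgene.dtd"/>
</catalog>
"""


def load_public_entries(catalog_text):
    """Return a dict mapping each publicId in the catalog to its uri."""
    root = ET.fromstring(catalog_text)
    entries = {}
    for element in root.iter("{%s}public" % CATALOG_NS):
        entries[element.get("publicId")] = element.get("uri")
    return entries


if __name__ == "__main__":
    entries = load_public_entries(CATALOG_XML)
    print(entries.get("-//NCBI//NCBI Entrezgene/EN"))
```

The lookup alone does not change what the parser opens; something still has to hand the redirected location to the parser.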

Problem, now that I think of it, is that I'm not sure you can specify
a catalog in PySAX. You might instead have to override the
resolveEntity method in your handler (and be sure to ). See the example in
listing 1 and discussion here:

http://www.xml.com/pub/a/2005/03/02/pyxml.html
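
A minimal sketch of the resolver approach, assuming Python 3 and the standard-library expat-based parser; it is not the listing from the article linked above. In the standard library the method to override is resolveEntity on an EntityResolver, and the resolver is registered with setEntityResolver. The resolver below answers every external entity request with an empty in-memory document, so the DTD named in the DOCTYPE is never opened. EmptyEntityResolver, ElementNameCollector, parse_with_empty_dtd, and SAMPLE are illustrative names and data.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class EmptyEntityResolver(EntityResolver):
    """Illustrative resolver: serves an empty document for every entity."""

    def __init__(self):
        self.requests = []  # (publicId, systemId) pairs the parser asked for

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b""))  # nothing is read from disk
        return source


class ElementNameCollector(ContentHandler):
    """Illustrative handler: records each start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_with_empty_dtd(xml_bytes):
    """Parse xml_bytes, replacing any external entity with empty content."""
    parser = xml.sax.make_parser()
    # The resolver is only consulted when external entities are enabled.
    parser.setFeature(feature_external_ges, True)
    resolver = EmptyEntityResolver()
    handler = ElementNameCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names, resolver.requests


# Made-up stand-in for mouse2.xml; the DTD file it names need not exist.
SAMPLE = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"\n'
    b' "NCBI_Entrezgene.dtd">\n'
    b"<Entrezgene-Set><Entrezgene/></Entrezgene-Set>\n"
)

if __name__ == "__main__":
    names, requests = parse_with_empty_dtd(SAMPLE)
    print(names)
    print(requests)
```

A catalog-style variant would return a local copy of the DTD from resolveEntity instead of an empty stream, keyed on the public ID the parser passes in.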

Good luck.

--
Uche Ogbuji Fourthought, Inc.
Jul 19 '05 #4
