PyXML, Sax, error in processing external entity reference

I'm attempting to read an XHTML 1.1 file[1], perform some DOM manipulation,
then write the results to a different file.

I've found myself rather stuck at the first hurdle.

I have the following:

from xml.dom.ext.reader import Sax2
reader = Sax2.Reader()
f = open('dorward.me.uk/sitemap.html', 'r')
doc = reader.fromStream(f)

(dorward.me.uk/sitemap.html being a local copy of
http://dorward.me.uk/sitemap.html)

... which outputs the following:

Traceback (most recent call last):
  File "x.py", line 4, in ?
    doc = reader.fromStream(f)
  File "/usr/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 372, in fromStream
    self.parser.parse(s)
  File "/usr/lib/python2.3/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.3/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.3/site-packages/_xmlplus/sax/expatreader.py", line 220, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 340, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: http://www.w3.org/TR/xhtml-modulariz...s-1.mod:115:0: error in processing external entity reference

I'm not sure where I should proceed from here. Is it a bug in my code? In
PyXML? In the DTD itself? What should I do next?

Thanks.

[1] Actually, lots of files, but one at a time.

--
David Dorward <http://dorward.me.uk/>
Jul 18 '05 #1
I think you need a parser:
import xml.sax
parser = xml.sax.make_parser()
filename = "dorward.me.uk/sitemap.html"
# No ContentHandler is set, so the parse runs but reports nothing back
parser.parse(filename)


How to go further I don't know; I'm stuck too!

Try the PyXML HOWTO at http://pyxml.sourceforge.net/topics/howto/xml-howto.html.
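To get anything back from xml.sax, the parser needs a ContentHandler whose callbacks fire as elements are read. The sketch below is for modern Python 3's standard library rather than PyXML, and ElementCounter and count_elements are hypothetical names. Since Python 3.7.1 the SAX parser does not process external general entities by default, and in that mode CPython's expat-based reader also skips the external DTD subset, so the missing W3C module should never be requested.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Namespace processing is off by default, so `name` is the raw tag name.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(source):
    """Parse a filename or binary file object and return the element counts."""
    handler = ElementCounter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.counts


# A small XHTML 1.1 document standing in for the real sitemap.html.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Site map</title></head>
<body><p>One</p><p>Two</p></body>
</html>
"""

counts = count_elements(io.BytesIO(SAMPLE))
print(counts)
```

With a real file, count_elements("dorward.me.uk/sitemap.html") would be called the same way.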

Bennie,
Jul 18 '05 #2
The bug is with the W3C. Through a chain of parameter entity references,
http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd references
http://www.w3.org/TR/xhtml-modulariz...11-model-1.mod,
which gives a 404 (and yes, XML heads, it is in an INCLUDE section, so the
URI must be traversed unless there's a resolution through the public ID).
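A resolution through the public ID can be sketched with modern Python 3's standard SAX API (not PyXML): an EntityResolver hands the parser local copies of external DTD pieces, keyed by public ID, instead of letting it fetch them. CatalogResolver, TextCollector, parse_with_catalog, the in-memory catalog and the example public/system IDs are all hypothetical; for XHTML 1.1 the catalog would instead map the W3C public IDs to locally saved DTD and module files. As an assumption worth checking, this resolver returns an empty entity for any ID it does not know, so unknown modules are silently skipped rather than fetched.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource


class CatalogResolver(EntityResolver):
    """Hypothetical resolver: serves external entities from an in-memory
    catalog keyed by public ID and never touches the network."""

    def __init__(self, catalog):
        super().__init__()
        self.catalog = catalog  # {public_id: bytes of the DTD/module text}
        self.requests = []      # (public_id, system_id) pairs seen, for inspection

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        source = InputSource(systemId)
        # Unknown IDs get an empty entity instead of a (possibly 404ing) fetch.
        source.setByteStream(io.BytesIO(self.catalog.get(publicId, b"")))
        return source


class TextCollector(ContentHandler):
    """Collects character data so expanded entity text can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_catalog(data, catalog):
    parser = xml.sax.make_parser()
    # Off by default since Python 3.7.1; with it off the resolver is never asked.
    parser.setFeature(feature_external_ges, True)
    resolver = CatalogResolver(catalog)
    handler = TextCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks), resolver.requests


# Demo with a made-up DTD whose only job is to define one entity.
DEMO_DOC = b"""<?xml version="1.0"?>
<!DOCTYPE root PUBLIC "-//EXAMPLE//DTD Demo//EN" "http://example.invalid/demo.dtd">
<root>&greeting;</root>
"""
DEMO_CATALOG = {"-//EXAMPLE//DTD Demo//EN": b'<!ENTITY greeting "hello">'}

text, requests = parse_with_catalog(DEMO_DOC, DEMO_CATALOG)
print(text)
print(requests)
```

The demo's "greeting" entity is only defined in the catalogued DTD, so seeing it expanded shows the local copy was used and nothing was fetched from example.invalid.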

I'm actually rather amazed at such carelessness by the W3C, but I
don't have time to dig further to see if I can figure out how things
got broken.

I can tell you that minidom is OK with this, because it
does not even read the external DTD subset:
>>> from xml.dom import minidom
>>> doc = minidom.parse('sitemap.html')
>>> doc
<xml.dom.minidom.Document instance at 0x400635ec>
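Because minidom skips the external DTD, it can also cover the rest of the original task (read, manipulate, write to a different file). This is a minimal sketch for modern Python 3; retitle, save, the sample document and the temporary output path are hypothetical stand-ins for the real sitemap files. Anything the DTD would have supplied, such as entity definitions or default attribute values, is not applied.

```python
import os
import tempfile
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"


def retitle(xhtml_bytes, new_title):
    """Hypothetical helper: parse XHTML and replace the text of every <title>."""
    # minidom does not read the external DTD subset, so the W3C modules
    # (including the one that 404s) are never requested.
    doc = minidom.parseString(xhtml_bytes)
    for title in doc.getElementsByTagNameNS(XHTML_NS, "title"):
        while title.firstChild is not None:
            title.removeChild(title.firstChild)
        title.appendChild(doc.createTextNode(new_title))
    return doc


def save(doc, path):
    """Write the modified document to a different file as UTF-8."""
    with open(path, "w", encoding="utf-8") as out:
        doc.writexml(out, encoding="utf-8")


SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Old title</title></head>
<body><p>Hello</p></body>
</html>
"""

doc = retitle(SAMPLE, "New title")
out_path = os.path.join(tempfile.mkdtemp(), "sitemap.out.html")  # stand-in output file
save(doc, out_path)
```

The DOCTYPE survives the round trip, although minidom rewrites it with single-quoted identifiers.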
Also, 4Suite's cDomlette makes it easy for you to avoid the DTD
problem:
>>> from Ft.Xml.Domlette import NoExtDtdReader
>>> doc = NoExtDtdReader.parseUri("file:sitemap.html")
>>> doc
<cDocument at 0x0x403ab42c>


http://4suite.org
http://uche.ogbuji.net/tech/akara/no...1-01/domlettes

Good luck.

--Uche
http://uche.ogbuji.net
Jul 18 '05 #3