Bytes | Software Development & Data Engineering Community

Help with xml.parsers.expat please?

There seems to be no XML parser that can do validation in
the Python standard library. And I am stuck with Python
2.1.1 until my webmaster upgrades (I use Python for
CGI). I know PyXML has validating parsers, but I cannot
compile things on the (Unix) web server. And even if I
could, the compiler I have access to would be different
from the one used to compile Python for CGI.

I need to write a CGI script that does XML validation (and
then later also does other things). It does not have to
be completely standards-compliant validation, but at least it
should check whether elements are declared and allowed in
particular places in the XML tree.

I tried to understand SAX and DOM but I gave up, and
effbot advises to avoid them anyway. So I am studying
xml.parsers.expat now, but I am stuck.

The program below *does* print information about DOCTYPE
declarations but nothing about the element definitions in
the DTD. I feed it an XML file with a DOCTYPE declaration
like <!DOCTYPE ROOTTAG SYSTEM "MYDTD.DTD" > and the DTD is
in the same directory. I also tried inputting the DTD
itself to this program but that doesn't work either
(ExpatError: syntax error at the first element definition).

Please help if you can.


# file: minimal_validate.py
#
import xml.parsers.expat

def element_decl_handler(name, model):
    print 'ELEMENT definition: ', name, ' model: ', model

def doctype_decl_handler(doctypeName, systemId, publicId, has_internal_subset):
    print 'DOCTYPE declaration: '
    print ' doctypeName: ', doctypeName
    print ' systemId: ', systemId
    print ' publicId:', publicId
    print ' internal subset:', has_internal_subset

p = xml.parsers.expat.ParserCreate()

p.ElementDeclHandler = element_decl_handler
p.StartDoctypeDeclHandler = doctype_decl_handler

import sys
input = file(sys.argv[1]).read()
p.Parse(input)
Jul 18 '05 #1
Will Stuyvesant wrote:
> There seems to be no XML parser that can do validation in
> the Python standard library. And I am stuck with Python
> 2.1.1 until my webmaster upgrades (I use Python for
> CGI). I know PyXML has validating parsers, but I cannot
> compile things on the (Unix) web server. And even if I
> could, the compiler I have access to would be different
> from the one used to compile Python for CGI.

So it didn't work out with xmlproc? Isn't xmlproc a pure python
parser that you should be able to drop in and run without
compiling anything?

> I need to write a CGI script that does XML validation (and
> then later also does other things). It does not have to
> be completely standards-compliant validation, but at least it
> should check whether elements are declared and allowed in
> particular places in the XML tree.

I think you would be much more likely to get constructive help
if you posted some examples of the tree structures and data
that you're processing.

> I tried to understand SAX and DOM but I gave up, and
> effbot advises to avoid them anyway. So I am studying
> xml.parsers.expat now, but I am stuck.

SAX and DOM aren't solutions, they're tools. They are simply
different ways of accessing the contents of an XML document.
They may or may not be suitable for your problem, depending
on a wide variety of considerations.
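
To make the difference concrete, here is a minimal sketch of the DOM style, where the whole document is loaded into a tree first and then queried. It uses xml.dom.minidom from the standard library and is written in current Python 3 syntax, not the 2.1.1 mentioned above. The order/item sample document is invented purely for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document, used only to illustrate the API.
SAMPLE = "<order id='7'><item>tea</item><item>milk</item></order>"

doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# With DOM the tree is already in memory, so you navigate it directly.
order_id = root.getAttribute("id")
items = [node.firstChild.data for node in root.getElementsByTagName("item")]

print(root.tagName, order_id, items)
```

The trade-off is that nothing is available until the whole document has been parsed, whereas SAX hands you pieces as they arrive.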

I think the problem needs to be clearly defined before an
appropriate solution can be reached.

> The program below *does* print information about DOCTYPE
> declarations but nothing about the element definitions in
> the DTD. I feed it an XML file with a DOCTYPE declaration
> like <!DOCTYPE ROOTTAG SYSTEM "MYDTD.DTD" > and the DTD is
> in the same directory. I also tried inputting the DTD
> itself to this program but that doesn't work either
> (ExpatError: syntax error at the first element definition).
>
> Please help if you can.
>
> # file: minimal_validate.py
> #
> import xml.parsers.expat
>
> def element_decl_handler(name, model):
>     print 'ELEMENT definition: ', name, ' model: ', model
>
> def doctype_decl_handler(doctypeName, systemId, publicId, has_internal_subset):
>     print 'DOCTYPE declaration: '
>     print ' doctypeName: ', doctypeName
>     print ' systemId: ', systemId
>     print ' publicId:', publicId
>     print ' internal subset:', has_internal_subset
>
> p = xml.parsers.expat.ParserCreate()
>
> p.ElementDeclHandler = element_decl_handler
> p.StartDoctypeDeclHandler = doctype_decl_handler
>
> import sys
> input = file(sys.argv[1]).read()
> p.Parse(input)
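
On the narrow question of why the program above prints no ELEMENT definitions: by default expat does not read the external DTD subset named by the SYSTEM identifier, so the declarations in MYDTD.DTD are never seen. Feeding the DTD file in directly fails because a bare DTD is not a well-formed XML document. The sketch below shows one way to get at the declarations, by turning on parameter entity parsing and supplying an ExternalEntityRefHandler that opens the DTD and parses it with a sub-parser. It is written in current Python 3 syntax; whether every call behaves the same on Python 2.1.1 is an assumption that would need checking. The function name collect_element_decls is made up for this example, and it assumes the DTD sits next to the XML file. Note that this only reports declarations; expat still does not validate the document against them.

```python
import os
import xml.parsers.expat as expat


def collect_element_decls(xml_path):
    """Return [(element_name, content_model), ...] declared for xml_path.

    Hypothetical helper: assumes any external DTD lives in the same
    directory as the XML file.
    """
    decls = []
    base_dir = os.path.dirname(os.path.abspath(xml_path))

    parser = expat.ParserCreate()
    # Ask expat to process the external DTD subset; it is skipped by default.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def element_decl(name, model):
        decls.append((name, model))

    def external_entity_ref(context, base, system_id, public_id):
        # expat never opens files itself: it hands us the SYSTEM identifier
        # and expects us to parse the entity with a sub-parser.
        if system_id is None:
            return 1
        sub = parser.ExternalEntityParserCreate(context)
        sub.ElementDeclHandler = element_decl
        with open(os.path.join(base_dir, system_id), "rb") as dtd:
            sub.ParseFile(dtd)
        return 1  # non-zero tells expat to carry on parsing

    parser.ElementDeclHandler = element_decl
    parser.ExternalEntityRefHandler = external_entity_ref
    with open(xml_path, "rb") as doc:
        parser.ParseFile(doc)
    return decls
```

Calling collect_element_decls on the XML file should give one (name, model) pair per ELEMENT declaration, which could then be used as the raw material for a simple "is this element declared and allowed here" check.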


I think you need to do some reading on what SAX does. In summary, it
gives you the pieces of an XML document, in a series of function
callbacks. You've got to do something with the pieces that you're
given.
SAX won't solve your problem any more than anything else unless you
know what pieces you are receiving, and are doing something with them.
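
As a rough illustration of what "pieces in a series of callbacks" means, here is a minimal sketch using xml.sax from the standard library, in current Python 3 syntax. The EventLogger class name and the sample document are invented for the example; all the handler does is record each callback it receives so you can see the order they arrive in.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that simply records every callback it gets."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # The parser may deliver whitespace-only chunks; skip those here.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))


handler = EventLogger()
xml.sax.parseString(b"<order id='7'><item>tea</item></order>", handler)
for event in handler.events:
    print(event)
```

Nothing is kept unless the handler keeps it, which is why you have to decide up front what to do with each piece.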

One memory-efficient way of building up a document in memory is to
create a Python object to represent every element, with each
"element object" being a (Python) attribute of its parent. It's a lot
easier than it sounds, and you can read about it here:

http://aspn.activestate.com/ASPN/Coo.../Recipe/149368
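
The following is a minimal sketch of that general idea, not the code from the recipe linked above. It assumes current Python 3 syntax, and the names Element, TreeBuilder and build_tree are invented for this example. Each element becomes an object, and child elements can be reached from the parent by tag name.

```python
import xml.sax


class Element:
    """One XML element; children are reachable as attributes by tag name."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.text = ""
        self.children = []

    def __getattr__(self, tag):
        # Only reached when normal lookup fails, so the real attributes
        # (name, attrs, text, children) always win over tag names.
        matches = [c for c in self.__dict__.get("children", []) if c.name == tag]
        if not matches:
            raise AttributeError(tag)
        return matches


class TreeBuilder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # currently open elements, innermost last

    def startElement(self, name, attrs):
        node = Element(name, attrs.items())
        if self._stack:
            self._stack[-1].children.append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        if self._stack:
            self._stack[-1].text += content

    def endElement(self, name):
        self._stack.pop()


def build_tree(xml_bytes):
    builder = TreeBuilder()
    xml.sax.parseString(xml_bytes, builder)
    return builder.root
```

With this in place, something like build_tree(data).item returns the list of item children of the root, and each one carries its own text and attrs.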

And you can read about SAX in general here:

http://www.devarticles.com/art/1/383/2
http://www-106.ibm.com/developerwork...ipsaxflex.html

The latter is a good example from Uche Ogbuji about extracting pieces
of a document from a SAX stream, which might be easily adaptable to
your problem.

But I still think you'd be better off describing the problem as simply
as you can here, rather than fumbling around.

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Jul 18 '05 #2
