XML Validation with Python

Can you give a commandline example how to do XML Validation (checking
against a DTD) with Python? Not with 4Suite or other 3rd party
libraries, just the Python standard distribution. I have Python 2.2
but can upgrade to 2.3 beta if needed.

I am looking for something like:

"
$ python validate.py myxmlfile.xml mydtd.dtd
"

where validate.py contains something like:

"
import somexmllib
import sys

# prints 1 if Okay :-)
print somexmllib.validate(sys.argv[1], sys.argv[2])
"

I am sorry if this is a FAQ or if it is in one of the xml libraries, I
just could not figure it out!
Jul 18 '05 #1
Will Stuyvesant wrote:
> Can you give a commandline example how to do XML Validation (checking
> against a DTD) with Python? Not with 4Suite or other 3rd party
> libraries, just the Python standard distribution.

You can't do it. The base distribution doesn't include a validating
XML parser.
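
To illustrate the distinction, here is a minimal sketch (in current Python 3 syntax, not the 2.2 of this thread) of what the standard library's expat binding does offer: a well-formedness check. The function name and the sample documents are made up for illustration. The first sample breaks its own internal DTD but is still accepted, because expat reads the DTD without validating against it.

```python
import xml.parsers.expat


def is_well_formed(xml_text):
    """Return True if xml_text parses. This does NOT validate against a DTD."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# The internal DTD says <note> must contain exactly one <to>,
# but the document contains <from> instead: invalid, yet well-formed.
INVALID_BUT_WELL_FORMED = (
    "<!DOCTYPE note [<!ELEMENT note (to)> <!ELEMENT to (#PCDATA)>]>"
    "<note><from>Will</from></note>"
)

# The <to> element is never closed: not even well-formed.
NOT_WELL_FORMED = "<note><to>Will</note>"

print(is_well_formed(INVALID_BUT_WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

The first call reports success even though the document violates its DTD, which is why a separate validating parser is needed.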

The only pure python validating parser is Lars Garshol's "xmlproc",
which is a part of pyxml (a "third-party" optional extension). You can
read the documentation for xmlproc here

http://www.garshol.priv.no/download/software/xmlproc/

and the bit about validating on the command line is here

http://www.garshol.priv.no/download/...c/cmdline.html

Is there any reason why it has to be in the base distribution?

Assuming that you have a good reason, maybe you can tell us what
platform you're running on? There might be a platform specific
parser/validator that you can call from python.

HTH,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Jul 18 '05 #2
I could not find a solution using the Python Standard
Libraries to write a simple commandline utility to do
XML validation. And I found the xml.sax documentation
unclear; there are no good examples to look at. Also
in the Python Cookbook and in the Python in a Nutshell
book the XML examples are BAD. Nowhere is there a
motivation for the class library design, for example
"why do you need a handler in an xml.sax.parse() and why
is there no default handler", nor simple examples of how
to use it. I like the approach taken by the Python
Standard Library book by Fredrik Lundh MUCH more: clear
examples and explanations. A damn shame they do not
want a new edition at O'Reilly, the poor guy is now
putting a free version on his website.
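
On the handler question, a small sketch in current Python 3 syntax may help. The parser only reports events, and the handler decides what to do with them. xml.sax.handler.ContentHandler can act as a do-nothing default, since all of its callback methods are empty. The ElementCounter class and count_elements function are hypothetical names for this example. Note that this parses without validating.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler: counts how often each element name occurs."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser for every opening tag.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes):
    handler = ElementCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


# The base class alone works as a "default" handler that ignores every event;
# parseString still raises an exception if the input is not well-formed.
xml.sax.parseString(b"<a><b/><b/></a>", ContentHandler())

print(count_elements(b"<a><b/><b/></a>"))
```

Overriding only the callbacks you care about is the intended usage; everything else falls through to the empty base-class methods.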

I have found a solution for XML validation using the
3rd party pyRXP library from http://www.reportlab.com/xml/pyrxp.html
Their "download and install" info is a mess. I first
downloaded a .ZIP with only .DLL and .PYD files, and it
turned out you had to plunk that into C:\Python22\DLL .
This made me turn away from pyRXP initially because bad
installation usually means bad software. But later on I
found a bigger .ZIP with more stuff so maybe I should've
used that one? At least it works now. I can do "import
pyRXP". Make sure you also download
pyRXP_Documentation.pdf. This is good documentation
with examples. I notice the docs in the other big .ZIP
are in .RML format...whatever that is!

I can not believe the amount of bad documentation and
bad install approaches I see with 3rd party software.
That is why I normally stick to Python Standard Library
only.

Anyway, I can now do XML validation, below is
"validate.py". But I am not solving my initial
problem: if it validates, then validate.py prints
nothing, if there is a mistake then it prints an error
message. What I really wanted; giving more confidence
that the validation is okay; is to print 1 or 0
depending on the result, but I have not figured out yet
how to do that and now I am too tired of it all...

# file: validate.py
import sys
if len(sys.argv)<2 or sys.argv[1] in ['-h','--help','/?']:
    print 'Usage: validate.py xmlfilename'
    sys.exit()
import pyRXP
p = pyRXP.Parser()
fn=open(sys.argv[1], 'r').read()   # fn holds the file contents, not the name
p.parse(fn)                        # silent on success, error message on failure
Jul 18 '05 #4
> [Alan Kennedy <al****@hotmail.com>]
> The only pure python validating parser is Lars Garshol's "xmlproc",
> which is a part of pyxml (a "third-party" optional extension). You can
> read the documentation for xmlproc here
>
> http://www.garshol.priv.no/download/software/xmlproc/
>
> and the bit about validating on the command line is here
>
> http://www.garshol.priv.no/download/...c/cmdline.html
>
> Is there any reason why it has to be in the base distribution?

Because I want to use it from a cgi script written in Python. And I
am not allowed to install 3rd party stuff on the webserver. Even if I
was it would not be a solution since it has to be easy to put it on
another webserver. But of course: if there is a validating parser
written completely in Python then I can use it too! If it runs under
Python 2.1.1, that is (that is what they have at the website). I will
investigate this www.garshol.priv.no link you gave me, thank you.
Jul 18 '05 #5
Will Stuyvesant wrote:
> Because I want to use it from a cgi script written in Python. And I
> am not allowed to install 3rd party stuff on the webserver. Even if I
> was it would not be a solution since it has to be easy to put it on
> another webserver. But of course: if there is a validating parser
> written completely in Python then I can use it too! If it runs under
> Python 2.1.1, that is (that is what they have at the website). I will
> investigate this www.garshol.priv.no link you gave me, thank you.

Glad to be of help.

There is a comment on Lars's site, which is vaguely worrying, which
says:

"Note that it is recommended to use xmlproc through the SAX API rather
than directly, since this provides much greater freedom in the choice
of parsers. (For example, you can switch to using Pyexpat which is
written in C without changing your code.)"

Which seems to indicate to me that the author is encouraging the user
not to rely on xmlproc too much. Perhaps performance might be an
issue?

One more thing: There are alternative validation methods, which may or
may not be suitable, based on your requirements.

For example, there is pytrex, a pure-Python implementation by James
Tauber of James Clark's Tree Regular EXpressions (TREX), which uses the
inbuilt C parser. I personally find trex and pytrex a very natural, and
thus easy to learn, way to check structures in a tree, including data
validation. Pytrex is not complete, and is no longer maintained, but
what's there is good code, with nice little features such as the
ability to define your own datatype validation functions, which are
called at match time.

http://pytrex.sourceforge.net/

Pytrex is unlikely to be ever completed, because James Clark has
abandoned TREX in favour of RELAX-NG, for which I haven't seen any
python implementation.

http://www.relaxng.org/

There is a python implementation of XML-Schema, xsv, written by Henry
Thompson, which I think was kept fairly up-to-date with the XML-Schema
spec as it evolved. However, given the complexity of XML-Schema, and
having never tried to use xsv, I have no idea of its stability.

http://www.ltg.ed.ac.uk/~ht/xsv-status.html

I note that the author also maintains a web service for validating
documents.

Are you sure that XML validation-parsing is the right solution for
your problem? There may be simpler ways.

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Jul 18 '05 #6
> [Alan Kennedy]
> ... interesting links and comments ...
> Are you sure that XML validation-parsing is the right solution for
> your problem? There may be simpler ways.

We have defined a new XML vocabulary with a DTD. I offered to make a
webservice so everybody can validate their XML files based on this
DTD. For this I use CGI with Python 2.1.1 and I have no web master
privileges.

The idea of web applications is nice in that you do not have to code
GUIs anymore: you can do pretty much everything with (X)HTML.
Sometimes you have to rethink your UI so it is possible to give every
user state a URI. A big plus is that everybody can now use your
application. And you can do more than I thought before, for example
users can send files from their computer with type=FILE fields in
forms. And for development you can just download Apache and install
it on your laptop and configure it such that everything is exactly the
same as on the target website (#!/usr/bin/python...means install their
python version in C:\usr\bin on your laptop :-)

The big problem with web applications is all the permissions you need
to install, compile, configure, etc. For Python CGI this means you
are stuck with some Python version and you realize how important the
Python Standard Library is.

--
Experience is what allows you to recognize a mistake the second time
you make it.
Jul 18 '05 #7
hw***@hotmail.com (Will Stuyvesant) wrote in message news:<cb**************************@posting.google.com>...
> Anyway, I can now do XML validation, below is
> "validate.py". But I am not solving my initial
> problem: if it validates, then validate.py prints
> nothing, if there is a mistake then it prints an error
> message. What I really wanted; giving more confidence
> that the validation is okay; is to print 1 or 0
> depending on the result, but I have not figured out yet
> how to do that and now I am too tired of it all...

This might do the trick:

# file: validate.py
import sys, pyRXP

if len(sys.argv)<2 or sys.argv[1] in ['-h','--help','/?']:
    print 'Usage: validate.py xmlfilename'
    sys.exit()

fn = open(sys.argv[1], 'r').read()
try :
    pyRXP.Parser().parse(fn)
    print True
except pyRXP.error :
    print False

Though personally, rather than printing False, I would simply raise in
the except clause, as the traceback provides the user with more
information as to what is wrong with their xml.
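
A middle ground between the two options could be sketched as below, in current Python 3 syntax: return the 1 or 0 that was asked for and keep the error message. The helper names check and expat_parse are made up for this sketch, and the stdlib expat parser is only a stand-in that checks well-formedness, not validity. With pyRXP installed one would presumably pass parse=pyRXP.Parser().parse and error_type=pyRXP.error, as in the script above, but that combination is untested here.

```python
import xml.parsers.expat


def expat_parse(text):
    """Stand-in parser: raises ExpatError if text is not well-formed."""
    xml.parsers.expat.ParserCreate().Parse(text, True)


def check(text, parse=expat_parse, error_type=xml.parsers.expat.ExpatError):
    """Return (1, '') if parse(text) succeeds, else (0, error message)."""
    try:
        parse(text)
    except error_type as exc:
        return 0, str(exc)
    return 1, ""


# Print the 1/0 result first, then any diagnostic from the parser.
for sample in ("<a><b/></a>", "<a><b></a>"):
    status, message = check(sample)
    print(status, message)
```

This way the caller gets an unambiguous 1 or 0, and the user still sees what the parser complained about.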
Jul 18 '05 #8
