Using XML w/ Python...

Jay
OK, I have this XML doc. I don't know much about XML, but what I want
to do is take certain parts of the doc, such as <title> blah
</title>, and put just that into a text file. Then the same thing
for the <body> part. That's about it. I checked out some of the XML
modules but don't understand how to use them, and I don't get parsing, so
could you please explain working with XML and Python to me? Email me at
jm******@gmail.com

Aim- jayjay08balla
MSN- Jm******@gmail.com
Yahoo- raeraefad72
Thx

Dec 11 '05 #1
XPath is the least painful way of doing it.

Here are some samples with various libraries for XPath
http://www.oreillynet.com/pub/wlg/6225

Read XPath basics here
http://www.w3schools.com/xpath/default.asp
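
To give a rough idea of what the XPath approach looks like, here is a minimal sketch. It assumes a current Python 3 and uses `xml.etree.ElementTree` from the standard library, which only understands a small XPath subset and is not one of the libraries in the linked samples. The `NOTE_XML` document and the `extract_text` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; a real file would be read with ET.parse(path).
NOTE_XML = """<note>
  <title>Reminder</title>
  <body>Don't forget the meeting.</body>
</note>"""


def extract_text(xml_text, path):
    """Return the stripped text of every element matching an ElementTree path."""
    root = ET.fromstring(xml_text)
    # findall() understands a limited XPath subset such as ".//title".
    return [(el.text or "").strip() for el in root.findall(path)]


titles = extract_text(NOTE_XML, ".//title")
bodies = extract_text(NOTE_XML, ".//body")
print(titles, bodies)
```

Full XPath (predicates on text, axes, functions) needs a third-party library; the subset above is usually enough for "give me every title" style questions.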

It is not practical, and perhaps not polite, to expect people to write
tutorials just for you and send them by email. There are a lot of tutorials
on the web about this. Just use Google.

Dec 11 '05 #2
Jay
Yes, I know. I did check out a couple, but I could never understand them.
They were confusing for me, and I wasn't hoping for a full typed
tutorial, just some help with exactly what I'm trying to do, not
the whole module... but whatever, thanks a lot for the feedback.

Dec 11 '05 #3
Look at the standard python library reference

http://docs.python.org/lib/dom-example.html

the handleSlide function almost does what you want, except that you should use
'parse' and not 'parseString'.
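
A minimal sketch of that DOM approach, assuming a current Python 3 and a document that really has `title` and `body` elements. The file names, the sample document, and the `extract_to_text` helper are hypothetical; the `get_text` helper is modelled loosely on the one in the library reference example.

```python
import os
import tempfile
from xml.dom import minidom


def get_text(node):
    """Concatenate the text nodes directly under a DOM element."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def extract_to_text(xml_path, txt_path, tags=("title", "body")):
    """Write the text of each listed tag to txt_path, one entry per line."""
    doc = minidom.parse(xml_path)  # parse() takes a filename or file object
    lines = []
    for tag in tags:
        for element in doc.getElementsByTagName(tag):
            lines.append("%s: %s" % (tag, get_text(element).strip()))
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines) + "\n")
    return lines


# Demo with a hypothetical document written to a temporary directory.
workdir = tempfile.mkdtemp()
xml_path = os.path.join(workdir, "doc.xml")
txt_path = os.path.join(workdir, "doc.txt")
with open(xml_path, "w", encoding="utf-8") as f:
    f.write("<doc><title>blah</title><body>Some body text</body></doc>")

result = extract_to_text(xml_path, txt_path)
print(result)
```

`parseString` does the same job when the XML is already in a string rather than in a file.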


Dec 11 '05 #4
Jay wrote:
Yes, I know. I did check out a couple, but I could never understand them.
They were confusing for me, and I wasn't hoping for a full typed
tutorial, just some help with exactly what I'm trying to do, not
the whole module... but whatever, thanks a lot for the feedback.

Well I don't want to hold this up as an example of best practice (it was
a quick hack to get some book graphics for my web site), but this
example shows you how you can extract stuff from XML, in this case
returned from Amazon's web services module.

Sorry about any wrapping that mangles the code.

regards
Steve

#!/usr/bin/python
#
# getbooks.py: download book details from Amazon.com
#
# hwBuild: database-driven web content management system
# Copyright (C) 2005 Steve Holden - st***@holdenweb.com
#
# This program is free software; you can redistribute it
# and/or modify it under the terms of the GNU General
# Public License as published by the Free Software
# Foundation; either version 2 of the License, or (at
# your option) any later version.
#
# This program is distributed in the hope that it will be
# useful, but WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
# PURPOSE. See the GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public
# License along with this program; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place, Suite 330,
# Boston, MA 02111-1307 USA
#
import urllib
import urlparse
import os
import re
from xml.parsers import expat
from config import Config
picindir = os.path.join(Config['datadir'], "pybooks")
for f in os.listdir(picindir):
    os.unlink(os.path.join(picindir, f))

filpat = re.compile(r"\d+")

class myParser:
    def __init__(self):
        self.parser = expat.ParserCreate()
        self.parser.StartElementHandler = self.start_element
        self.parser.EndElementHandler = self.end_element
        self.parser.CharacterDataHandler = self.character_data
        self.processing = 0
        self.count = 0

    def parse(self, f):
        self.parser.ParseFile(f)
        return self.count

    def start_element(self, name, attrs):
        if name == "MediumImage":
            self.processing = 1
            self.imgname = ""
        if self.processing == 1 and name == "URL":
            self.processing = 2

    def end_element(self, name):
        if self.processing == 2 and name == "URL":
            self.processing = 1
            print "Getting:", self.imgname
            scheme, loc, path, params, query, fragment = \
                urlparse.urlparse(self.imgname)
            itemno = filpat.match(os.path.basename(path))
            fnam = itemno.group()
            u = urllib.urlopen(self.imgname)
            img = u.read()
            outfile = file(os.path.join(picindir, "%s.jpg" % fnam), "wb")
            outfile.write(img)
            outfile.close()
            self.count += 1
        if self.processing == 1 and name == "MediumImage":
            self.processing = 0

    def character_data(self, data):
        if self.processing == 2:
            self.imgname += data

def main(search=None):
    print "Search:", search
    count = 0
    for pageNum in range(1,5):
        f = urllib.urlopen("http://webservices.amazon.com/onca/xml?Service=AWSECommerceService&AWSAccessKeyId=XXXXXXXXXXXXXXXXXXXX&t=steveholden-20&SearchIndex=Books&Operation=ItemSearch&Keywords=%s&ItemPage=%d&ResponseGroup=Images&type=lite&Version=2004-11-10&f=xml"
                           % (urllib.quote(search or Config['book-search']), pageNum))
        fnam = os.path.join(picindir, "bookdata.txt")
        file(fnam, "w").write(f.read())
        f = file(fnam, "r")
        p = myParser()
        n = p.parse(f)
        if n == 0:
            break
        count += n
    return count

if __name__ == "__main__":
    import sys
    search = None
    if len(sys.argv) > 1:
        search = sys.argv[1]
    n = main(search)
    print "Pictures found:", n
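
The script above is Python 2 and needs a network connection and a local config module. As a smaller illustration of the same expat callback pattern, here is a sketch for a current Python 3 that collects the text of every `title` element. The `TitleCollector` class and the sample document are hypothetical, not part of the original script.

```python
from xml.parsers import expat


class TitleCollector:
    """Collect the text of every <title> element using expat callbacks."""

    def __init__(self):
        self.parser = expat.ParserCreate()
        self.parser.StartElementHandler = self.start_element
        self.parser.EndElementHandler = self.end_element
        self.parser.CharacterDataHandler = self.character_data
        self.inside = False   # True while we are between <title> and </title>
        self.current = ""
        self.titles = []

    def start_element(self, name, attrs):
        if name == "title":
            self.inside = True
            self.current = ""

    def end_element(self, name):
        if name == "title":
            self.inside = False
            self.titles.append(self.current)

    def character_data(self, data):
        # expat may deliver one run of text in several chunks, so accumulate.
        if self.inside:
            self.current += data

    def parse(self, text):
        self.parser.Parse(text, True)  # True marks this as the final chunk
        return self.titles


collector = TitleCollector()
found = collector.parse("<doc><title>blah</title><body>text</body></doc>")
print(found)
```

The state flag plays the same role as `self.processing` in the script: the character handler only keeps data while the parser is inside the element of interest.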
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Dec 11 '05 #5
On Sat, 10 Dec 2005 21:12:04 -0800, Jay wrote:
OK, I have this XML doc. I don't know much about XML, but what I want
to do is take certain parts of the XML doc


The simplest module I've found for that is xmltramp, from
http://www.aaronsw.com/2002/xmltramp/

for example:

#!/usr/bin/env python
import xmltramp
note = xmltramp.load('http://www.w3schools.com/xml/note.xml')
print note.body
Dec 11 '05 #6
Jay:
"""
OK, I have this XML doc. I don't know much about XML, but what I want
to do is take certain parts of the doc, such as <title> blah
</title>, and put just that into a text file. Then the same thing
for the <body> part. That's about it. I checked out some of the XML
modules but don't understand how to use them, and I don't get parsing, so
could you please explain working with XML and Python to me?
"""

Someone already mentioned

http://www.oreillynet.com/pub/wlg/6225

I do want to update the Amara example shown there. As of recent
releases it's as simple as

import amara
doc = amara.parse("foo.opml")
for url in doc.xpath("//@xmlUrl"):
    print url.value

Besides the XPath option, Amara [1] provides Python API options for
unknown elements, such as

node.xml_child_elements
node.xml_attributes

This is all covered with plenty of examples in the manual [2]

[1] http://uche.ogbuji.net/tech/4suite/amara/
[2] http://uche.ogbuji.net/uche.ogbuji.n...ara/manual-dev
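
For readers without Amara installed, the idea of walking a document whose element names are not known in advance can be sketched with the standard library. This is an analogue using `xml.etree.ElementTree` on a current Python 3, not Amara's API; the OPML-like sample and the `describe` helper are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML-like sample.
OPML = """<opml version="1.0">
  <body>
    <outline text="Feed A" xmlUrl="http://example.com/a.xml"/>
    <outline text="Feed B" xmlUrl="http://example.com/b.xml"/>
  </body>
</opml>"""


def describe(element, depth=0):
    """Return (depth, tag, attributes) for an element and all descendants."""
    rows = [(depth, element.tag, dict(element.attrib))]
    for child in element:  # iterating an Element yields its child elements
        rows.extend(describe(child, depth + 1))
    return rows


rows = describe(ET.fromstring(OPML))
for depth, tag, attrs in rows:
    print("  " * depth + tag, attrs)

# Pull out one attribute wherever it happens to occur.
urls = [attrs["xmlUrl"] for _, _, attrs in rows if "xmlUrl" in attrs]
```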

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/

Dec 11 '05 #7
Jay
Some great suggestions.
OK, I am now understanding some of the parsing, how to use it, and
nodes, things like that. But say I wanted to take the title of
http://www.digg.com/rss/index.xml

and xmltramp seemed the simplest to understand.

Would the path be something like this?

import xmltramp
rssDigg = xmltramp.load("http://www.digg.com/rss/index.xml")
print note.rss.channel.item.title
I think that's what I'm having the most confusion about now: how to point
to the path that I want...
Suggestions?

Dec 11 '05 #8
"""
OK, I am now understanding some of the parsing, how to use it, and
nodes, things like that. But say I wanted to take the title of
http://www.digg.com/rss/index.xml

and xmltramp seemed the simplest to understand.

Would the path be something like this?

import xmltramp
rssDigg = xmltramp.load("http://www.digg.com/rss/index.xml")
print note.rss.channel.item.title

I think that's what I'm having the most confusion about now: how to point
to the path that I want...
"""

I suggest you read at least the front page information for the tools
you are using. It's quite clear from the xmltramp Web site (
http://www.aaronsw.com/2002/xmltramp/ ) that you want something like
(untested: the least homework you can do is to refine the example
yourself):

print rssDigg[rss.channel][item][title]

BTW, in Amara, the API is pretty much exactly what you guessed:
import amara
rssDigg = amara.parse("http://www.digg.com/rss/index.xml")
print rssDigg.rss.channel.item.title

Video: Conan O'Brien iPod Ad Parody
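
The same rss/channel/item/title navigation can be sketched without either library. This assumes a current Python 3 and the standard `xml.etree.ElementTree` module rather than xmltramp or Amara, and uses a small hypothetical feed in place of the live URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed with the same rss/channel/item/title shape.
RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First story</title></item>
    <item><title>Second story</title></item>
  </channel>
</rss>"""

root = ET.fromstring(RSS)  # root is the <rss> element itself

# Paths are relative to the root, so they start at "channel", not "rss".
first_title = root.findtext("channel/item/title")        # first match only
all_titles = [t.text for t in root.findall("channel/item/title")]

print(first_title)
print(all_titles)
```

With this module the thing to watch is where the path starts: the parsed object already is the top-level element, so naming it again in the path finds nothing.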
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/

Dec 12 '05 #9
Jay
OK, I'm convinced that I need to get Amara. I just installed 4Suite
and have now installed Amara. It still doesn't work because, like I said
before, I use ActivePython from

http://www.activestate.com/Products/ActivePython/

And the requirement for Amara is Python 2.4, so... that's where we have
a problem: I need Amara for ActivePython, and I would like to keep
working in ActivePython without downloading Python 2.4.

Dec 12 '05 #10
Jay
Ummm, my error conditions.....
PythonWin 2.3.5 (#62, Feb 9 2005, 16:17:08) [MSC v.1200 32 bit
(Intel)] on win32.
Portions Copyright 1994-2004 Mark Hammond (mh******@skippinet.com.au) -
see 'Help/About PythonWin' for further copyright information.
>>> import amara

Traceback (most recent call last):
File "<interactive input>", line 1, in ?
ImportError: No module named amara

Pretty straightforward....
As far as it should work since they're both transparent, umm, well, it's
not.
But what would help is if you knew the install dir for
ActivePython, so maybe I can install Amara standalone into the
ActivePython installation dir?? Maybe.

Dec 12 '05 #11
ActivePython is the same as the standard Python distribution, but with a
few extras.

"As far as it should work since they're both transparent, umm, well, it's
not."

Why do you think it is not transparent? Did you try installing it on
both?
I have ActivePython 2.4 here and it loads amara fine.

"
Traceback (most recent call last):
File "<interactive input>", line 1, in ?
ImportError: No module named amara
"

That means you did not manage to install it properly. Are you new to
installing Python modules from the command line? If you need more hand
holding, try the Python IRC channel on freenode. The responses will be
closer to real time. You probably need that, since you seem to have more
than one thing to learn about.

Dec 12 '05 #12
Jay
No, when I said
"As far as it should work since they're both transparent, umm, well, it's
not."

I meant that only mine isn't; maybe yours is, but for some reason mine isn't.
And you said Amara works fine for you, OK, then could you tell me what
package to install...

I have installed Amara 1.1.6 for Python 2.4 and it works on Python 2.4
only.
Now, which package should I download for it to work on any Python
prompt:
Allinone
Standalone
Or something else

Dec 12 '05 #13
"""
No, when I said
"As far as it should work since they're both transparent, umm, well, it's
not."

I meant that only mine isn't; maybe yours is, but for some reason mine isn't.
And you said Amara works fine for you, OK, then could you tell me what
package to install...

I have installed Amara 1.1.6 for Python 2.4 and it works on Python 2.4
only.
Now, which package should I download for it to work on any Python
prompt:
Allinone
Standalone
Or something else
"""

I've never used ActivePython. I don't know of any special gotchas for
it. But Amara works in Python 2.3 or 2.4. The only difference
between the Allinone and standalone packages is that Allinone includes
4Suite. Do get at least version 1.1.6.

If you're still having trouble with the ActivePython setup, the first
thing I'd ask is how you installed Amara. Did you run a Windows
installer? Next I'd check the library path for ActivePython. What is
the output of

python -c "import sys; print sys.path"

where you replace "python" above with whatever way you invoke
ActivePython?
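
The one-liner above is Python 2 syntax. On a current Python 3 the same check, plus a lookup of where a given module would be imported from, could be sketched as follows. The `locate` helper and the module name `no_such_module_xyz` are made up for illustration; `json` is just a standard-library module that is certain to exist.

```python
import importlib.util
import sys


def locate(module_name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


print(sys.executable)       # which interpreter is actually running
for entry in sys.path:      # directories searched on import, in order
    print("  ", entry)

print(locate("json"))                 # a module that is installed
print(locate("no_such_module_xyz"))   # a module that is not
```

If two Python installations are present, running this under each one shows quickly whether a package was installed into the `site-packages` of the interpreter you are actually using.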
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/

Dec 12 '05 #14
Jay
Hmmmm, I just tried the same thing earlier today and it didn't work, but
now it does. I downloaded the standalone package and now it works in
ActivePython, when it didn't before and I tried the same thing.

And yes, last time I did type python setup.py install.

Thx anyway.

Dec 12 '05 #15
Jay
Spoke too soon. I get this error when running Amara in ActivePython:
>>> import amara
>>> amara.parse("http://www.digg.com/rss/index.xml")

Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python23\Lib\site-packages\amara\__init__.py", line 50, in parse
    if IsXml(source):
NameError: global name 'IsXml' is not defined

So I'm guessing there's an error with one of the files...

Dec 12 '05 #16
"""
Spoke too soon, i get this error when running amara in ActivePython
import amara
amara.parse("http://www.digg.com/rss/index.xml")


Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "C:\Python23\Lib\site-packages\amara\__init__.py", line 50, in
parse
if IsXml(source):
NameError: global name 'IsXml' is not defined

So im guessing theres an error with one of the files...
"""

IsXml is imported conditionally, so this is an indicator that something
about your module setup is still not agreeing with ActivePython. What
do you see as the output of:

python -c "import amara; print dir(amara)"

? I get:

['InputSource', 'IsXml', 'Uri', 'Uuid', '__builtins__', '__doc__',
'__file__', '__name__', '__path__', '__version__', 'bindery',
'binderytools', 'binderyxpath', 'create_document', 'dateutil_standins',
'domtools', 'os', 'parse', 'pushbind', 'pushdom', 'pyxml_standins',
'saxtools']
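
To show why a conditional import surfaces as a NameError at call time rather than an ImportError at import time, here is a small sketch for a current Python 3. It is a guess at the general pattern only, not Amara's actual source; the module name `no_such_dependency_xyz` is hypothetical and stands in for whatever provides `IsXml`.

```python
try:
    # Hypothetical dependency standing in for whatever provides IsXml.
    from no_such_dependency_xyz import IsXml
except ImportError:
    pass  # the import failed quietly, so the name IsXml is never bound


def parse(source):
    # This only fails when parse() is called, not when the module loads.
    if IsXml(source):
        return "xml"
    return "other"


error = None
try:
    parse("<a/>")
except NameError as exc:
    error = str(exc)
print("NameError:", error)
```

This also explains the shorter `dir()` listing: names that depend on the failed import never appear in the module namespace.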

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/

Dec 12 '05 #17
Jay
When putting in exactly what you had, I got:
>>> python -c "import amara; print dir(amara)"
Traceback ( File "<interactive input>", line 1
python -c "import amara; print dir(amara)"
^
SyntaxError: invalid syntax

When doing it separately, I got:
>>> import amara
>>> print dir(amara)
['__builtins__', '__doc__', '__file__', '__name__', '__path__',
'__version__', 'binderytools', 'os', 'parse']


Dec 12 '05 #18
"""
import amara
print dir(amara)


['__builtins__', '__doc__', '__file__', '__name__', '__path__',
'__version__', 'binderytools', 'os', 'parse']
"""

So it's not able to load domtools. What do you get trying

from amara import domtools
print domtools.py

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/

Dec 12 '05 #19
uc*********@gmail.com wrote in
news:11*********************@g14g2000cwa.googlegroups.com:
"""
Spoke too soon, i get this error when running amara in
ActivePython
import amara
amara.parse("http://www.digg.com/rss/index.xml")


Traceback (most recent call last):
File "<interactive input>", line 1, in ?
File "C:\Python23\Lib\site-packages\amara\__init__.py", line
50, in
parse
if IsXml(source):
NameError: global name 'IsXml' is not defined

So im guessing theres an error with one of the files...
"""

IsXml is imported conditionally, so this is an indicator that
something about your module setup is still not agreeing with
ActivePython. What do you see as the output of:

python -c "import amara; print dir(amara)"

? I get:

['InputSource', 'IsXml', 'Uri', 'Uuid', '__builtins__',
'__doc__', '__file__', '__name__', '__path__', '__version__',
'bindery', 'binderytools', 'binderyxpath', 'create_document',
'dateutil_standins', 'domtools', 'os', 'parse', 'pushbind',
'pushdom', 'pyxml_standins', 'saxtools']


Not wanting to hijack this thread, but it got me interested in
installing amara. I downloaded Amara-allinone-1.0.win32-py2.4.exe
and ran it. It professed that the installation directory was to be
D:\Python24\Lib\site-packages\ ... but it placed FT and amara in
D:\Python24\Python24\Lib\site-packages . Possibly the installer is
part of the problem here?
--
rzed
Dec 12 '05 #20
"""
Not wanting to hijack this thread, but it got me interested in
installing amara. I downloaded Amara-allinone-1.0.win32-py2.4.exe
and ran it. It professed that the installation directory was to be
D:\Python24\Lib\site-packages\ ... but it placed FT and amara in
D:\Python24\Python24\Lib\site-packages . Possibly the installer is
part of the problem here?
"""

That's really good to know. Someone else builds the Windows installer
package for Amara (I'm a near Windows illiterate), but I definitely
want to help be sure the installer works properly. In fact, your
message rings a bell that this specifically came up before:

http://lists.fourthought.com/piperma...er/007610.html

I'll have to ask some of the Windows gurus on the 4Suite list whether
they know why this might be. Do you mind if I cc you on those
messages, so that you can perhaps try out any solutions we come up
with?

Thanks.

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/

Dec 12 '05 #21
uc*********@gmail.com wrote in
news:11**********************@g49g2000cwa.googlegroups.com:
"""
Not wanting to hijack this thread, but it got me interested in
installing amara. I downloaded
Amara-allinone-1.0.win32-py2.4.exe and ran it. It professed that
the installation directory was to be
D:\Python24\Lib\site-packages\ ... but it placed FT and amara in
D:\Python24\Python24\Lib\site-packages . Possibly the installer
is part of the problem here?
"""

That's really good to know. Someone else builds the Windows
installer package for Amara (I'm a near Windows illiterate), but
I definitely want to help be sure the installer works properly.
In fact, your message rings a bell that this specifically came
up before:

http://lists.fourthought.com/piperma...November/007610.html

I'll have to ask some of the Windows gurus on the 4Suite list
whether they know why this might be. Do you mind if I cc you on
those messages, so that you can perhaps try out any solutions we
come up with?

Thanks.


I'd be delighted to run them. Bring 'em on!

If this is useful information: the opening screen of the installer
correctly shows D:\Python24\ as my Python directory, and correctly
shows (on my computer):
D:\Python24\Lib\site-packages\ as the Installation Directory. The
file names as it installs are of the form
"Python24\Lib\site-packages\...", which to me hints that it takes
that generated name and appends it to the Python directory to
produce the actual file path it then uses.
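
That hypothesis is easy to reproduce in isolation. The sketch below, for a current Python 3, uses `ntpath` (the Windows flavour of `os.path`, importable on any platform) to show how joining a base directory with a member name that already starts with `Python24\` gives the doubled path. It only illustrates the suspected mechanism and is not the installer's actual code; the member name ending in `amara` is a hypothetical stand-in for the truncated names in the report.

```python
import ntpath  # Windows path rules, usable on any platform

python_dir = "D:\\Python24\\"
# Hypothetical archive member; the thread only shows the truncated prefix.
member = "Python24\\Lib\\site-packages\\amara"

# Joining as-is repeats the "Python24" component.
doubled = ntpath.join(python_dir, member)
print(doubled)

# Dropping the leading component before joining gives the intended path.
head, _, rest = member.partition("\\")
intended = ntpath.join(python_dir, rest) if head == "Python24" else doubled
print(intended)
```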

--
rzed
Dec 12 '05 #22
Jay
Umm, yeah, you definitely hijacked my thread. If you didn't mean to, then
don't....

But anyway, I get this...
>>> import amara
>>> from amara import domtools
>>> print domtools.py
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
NameError: name 'domtools' is not defined

Suggestions?

Dec 12 '05 #23
"""
But anyway, i get this...
import amara
from amara import domtools
print domtools.py


Traceback (most recent call last):
File "<interactive input>", line 1, in ?
NameError: name 'domtools' is not defined
"""

Sheesh! I wrote that right after waking up, and it shows :-)

Should have been "print domtools.__file__"

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/

Dec 12 '05 #24
Jay
You might be on to something....
>>> from amara import domtools
>>> print domtools.__file__
C:\Python23\lib\site-packages\amara\domtools.pyc


Dec 12 '05 #25
Jay
Suggestions maybe?????

Dec 13 '05 #26
Jay
ok, thx

Dec 13 '05 #27
Jay
Come on guys, the post isn't dead yet....

Dec 13 '05 #28
Rick, thanks. Based on your clue I checked, and it seems those Amara
packages are not being built correctly. I'll look to get those packages
fixed and updated tomorrow.

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/

Dec 13 '05 #29
