Parse XML using Python

Hi,

I wanted to write a script that will read the below file:

<abcd label="ABC">
..
<efg label="EFGA">
.....
<decg label="ABDG">
...
</decg>
...

</efg>

...
<mon1 label="MON">
...
</mon1>
...
</abcd>
..
..
<xyz label="A1">
..
<eg1 label="FGA">
.....
<dg label="BG">
...

</dg>

...

</eg1>

</xyz>

...
and so on

The output of the script should be

ABC
...EFGA
.....ABDG
...MON

A1
...FGA
.....BG

Please help me in writing a Python script for the above task.
Regards,
Anil.

Jul 18 '05 #1
On Wed, 08 Dec 2004 23:25:49 -0800, anilby wrote:
Hi,

I wanted to write a script that will read the below file:


Hi,

Here is an example of how to use SAX:

http://pyxml.sourceforge.net/topics/howto/node12.html

Thomas

--
Thomas Güttler, http://www.thomas-guettler.de/
Jul 18 '05 #2
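
For readers who want something concrete to start from, here is a minimal sketch of the SAX approach. It is not taken from the linked howto. It assumes Python 3 and the standard library's xml.sax module, and it assumes the input is wrapped in a synthetic <root> element because the sample file has several top-level elements. The names LabelCollector and sax_labels are made up for this illustration.

```python
import xml.sax


class LabelCollector(xml.sax.ContentHandler):
    """Collect each element's "label" attribute, indented by nesting depth."""

    def __init__(self, indent=".."):
        super().__init__()
        self.indent = indent
        self.depth = 0      # number of currently open elements
        self.lines = []

    def startElement(self, name, attrs):
        label = attrs.get("label")
        # The synthetic <root> wrapper sits at depth 0 and carries no label,
        # so real top-level elements (depth 1) get no indentation.
        if label is not None:
            self.lines.append(self.indent * (self.depth - 1) + label)
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1


def sax_labels(fragment):
    """Wrap a multi-root fragment in <root> and return the indented labels."""
    handler = LabelCollector()
    data = ("<root>" + fragment + "</root>").encode("utf-8")
    xml.sax.parseString(data, handler)
    return handler.lines


if __name__ == "__main__":
    demo = '<abcd label="ABC"><efg label="EFGA"/></abcd><xyz label="A1"/>'
    print("\n".join(sax_labels(demo)))
```

The handler only tracks a depth counter, so memory use stays flat however large the file is. This sketch does not print the blank line between top-level groups that the original request shows.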

Thomas Guettler wrote:
On Wed, 08 Dec 2004 23:25:49 -0800, anilby wrote:
Hi,

I wanted to write a script that will read the below file:


Hi,

Here is an example of how to use SAX:

http://pyxml.sourceforge.net/topics/howto/node12.html

Thomas

--
Thomas Güttler, http://www.thomas-guettler.de/

Could you please tell me how to achieve the below.
I am interested in getting the output like:

ABC
EFGA --> child of ABC
ABDG --> child of EFGA
MON --> child of ABC
A1
FGA --> child of A1
BG --> child of FGA

Jul 18 '05 #3
Anil wrote:
Could you please tell me how to achieve the below.
I am interested in getting the output like:

ABC
EFGA --> child of ABC
ABDG --> child of EFGA
MON --> child of ABC
A1
FGA --> child of A1
BG --> child of FGA


print """
ABC
EFGA --> child of ABC
ABDG --> child of EFGA
MON --> child of ABC
A1
FGA --> child of A1
BG --> child of FGA
"""

Unless you tell us what _input_ is to be processed to yield that
output, I doubt anybody can be of more help.

--
Regards,

Diez B. Roggisch
Jul 18 '05 #4
On Thu, 09 Dec 2004 06:00:27 -0800, Anil wrote:

Thomas Guettler wrote:
Hi,

Here is an example of how to use SAX:

http://pyxml.sourceforge.net/topics/howto/node12.html

Thomas

--
Thomas Güttler, http://www.thomas-guettler.de/

Could you please tell me how to achieve the below.
I am interested in getting the output like:


Anil, "use sax" is all you are likely to get as a starting point. If
someone just hands you a solution, what have you learned?

Start using SAX (or some other parser; personally, I'd recommend
ElementTree for this use, google it), and if you have trouble, post
the exact code you have, the exact input you are using, what happens, and
what you expected to happen.

It is unlikely that repeated appeals, in the absence of evidence that you
tried to solve it yourself, will get you anywhere.

Jul 18 '05 #5
<an****@gmail.com> wrote:
<abcd label="ABC"> </abcd>
.
.
<xyz label="A1"> </xyz>

..
and so on


an XML document can only have a single root element, but your example
has at least two top-level elements (abcd and xyz).

here is some elementtree code that handles this by wrapping your data in
a "root" element.

from elementtree import ElementTree

p = ElementTree.XMLTreeBuilder()

p.feed("<root>")
p.feed(open("mydocument.xml").read())
p.feed("</root>")

root = p.close()

def printelem(elem, prefix=""):
    label = elem.get("label")
    if label:
        if not prefix:
            print
        print prefix + label
    for elem in elem:
        printelem(elem, prefix + "..")

for elem in root:
    printelem(elem)

# end

the elementtree library can be found here:

http://effbot.org/zone/element-index.htm

</F>

Jul 18 '05 #6
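
For anyone reading this on a current interpreter: the code above targets Python 2 and the standalone elementtree package. The following is a sketch of the same wrap-in-a-root idea for Python 3, assuming the standard library's xml.etree.ElementTree and its XMLParser class in place of XMLTreeBuilder. The function names parse_fragments, label_lines and format_labels are invented for this example, and the wrapping trick assumes the input carries no XML declaration or DOCTYPE of its own.

```python
import xml.etree.ElementTree as ET


def parse_fragments(text):
    """Parse XML text that may contain several top-level elements."""
    parser = ET.XMLParser()
    parser.feed("<root>")    # synthetic wrapper so the parser sees one root
    parser.feed(text)
    parser.feed("</root>")
    return parser.close()    # returns the synthetic <root> element


def label_lines(elem, prefix=""):
    """Yield the label of elem and of its descendants, indented by depth."""
    label = elem.get("label")
    if label:
        yield prefix + label
    for child in elem:
        yield from label_lines(child, prefix + "..")


def format_labels(text):
    """Return the listing with one blank line between top-level trees."""
    groups = ["\n".join(label_lines(top)) for top in parse_fragments(text)]
    return "\n\n".join(groups)


if __name__ == "__main__":
    demo = '<abcd label="ABC"><efg label="EFGA"/></abcd><xyz label="A1"/>'
    print(format_labels(demo))
```

To run it against a file, something like format_labels(open("mydocument.xml").read()) should do, with the file name borrowed from the post above.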
an****@gmail.com wrote:
Hi,

I wanted to write a script that will read the below file:

<abcd label="ABC">
.
<efg label="EFGA">
....
<decg label="ABDG">
..
</decg>
..

</efg>

..
<mon1 label="MON">
..
</mon1>
..
</abcd>
.
.
<xyz label="A1">
.
<eg1 label="FGA">
....
<dg label="BG">
..

</dg>

..

</eg1>

</xyz>

..
and so on

The output of the script should be

ABC
..EFGA
....ABDG
..MON

A1
..FGA
....BG

Please help me in writing a Python script for the above task.


Take a look at
http://home.eol.ca/~parkw/park-january.html
in the "Expat XML" section towards the end. Translating it to Python is
left for homework.

In essence,
    indent=..
    start () {
        local "${@:2}"
        echo "${indent|*XML_ELEMENT_DEPTH-1}$label"
    }
    xml -s start "`< file.xml`"
which prints
..ABC
....EFGA
......ABDG
....MON
..A1
....FGA
......BG
with modified input, i.e. with the XML pieces wrapped in a single root element.

--
William Park <op**********@yahoo.ca>
Open Geometry Consulting, Toronto, Canada
Linux solution for data processing.
Jul 18 '05 #7
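
As one possible answer to the homework, here is a rough Python translation of the shell snippet above. It is a sketch only, assuming Python 3 and the standard library's xml.parsers.expat module; the function name expat_labels is invented, and a local depth counter stands in for the shell's XML_ELEMENT_DEPTH variable. Like the shell version, it expects the pieces to be wrapped in a single root element, which it does itself here.

```python
import xml.parsers.expat


def expat_labels(fragment, indent=".."):
    """Return each element's label, indented by nesting depth."""
    lines = []
    depth = 0  # 1 for the synthetic <root>, 2 for top-level elements, ...

    def start(name, attrs):
        nonlocal depth
        depth += 1
        # attrs is a plain dict of attribute name -> value
        if "label" in attrs:
            lines.append(indent * (depth - 1) + attrs["label"])

    def end(name):
        nonlocal depth
        depth -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    # Second argument True marks this as the final chunk of input.
    parser.Parse("<root>" + fragment + "</root>", True)
    return lines


if __name__ == "__main__":
    demo = '<abcd label="ABC"><efg label="EFGA"/></abcd><xyz label="A1"/>'
    print("\n".join(expat_labels(demo)))
```

The indentation follows the shell output above, so top-level labels carry one indent unit. Using depth - 2 instead would give the layout from the original question.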

Thanks everyone for the responses. I will try the above solutions.

Jul 18 '05 #8
This is a neat solution. You can parse any well-formed general
entity (e.g. Anil's document with multiple root nodes) in 4Suite
1.0a4:

from Ft.Xml.Domlette import EntityReader
s = """
<spam1>eggs</spam1>
<spam2>more eggs</spam2>
"""
docfrag = EntityReader.parseString(s, 'http://foo/test/spam.xml')

docfrag is now ready for processing using DOM methods.

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Use CSS to display XML -
http://www.ibm.com/developerworks/ed...-xmlcss-i.html
Location, Location, Location -
http://www.xml.com/pub/a/2004/11/24/py-xml.html
The State of Python-XML in 2004 -
http://www.xml.com/pub/a/2004/10/13/py-xml.html
Be humble, not imperial (in design) -
http://www.adtmag.com/article.asp?id=10286
XMLOpen and more XML Hacks -
http://www.ibm.com/developerworks/xm...x-think27.html
A survey of XML standards -
http://www-106.ibm.com/developerwork...rary/x-stand4/

Jul 18 '05 #9
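
To illustrate what processing with DOM methods could look like for this task, here is a sketch that uses the standard library's xml.dom.minidom rather than 4Suite, so none of it reflects the Domlette API. minidom cannot parse a multi-root entity directly, so the sketch assumes the wrap-in-a-root workaround from earlier in the thread. The function name dom_labels is made up for this example.

```python
from xml.dom import minidom


def dom_labels(fragment, indent=".."):
    """Walk the DOM tree and return labels indented by nesting depth."""
    doc = minidom.parseString("<root>" + fragment + "</root>")
    lines = []

    def walk(node, depth):
        for child in node.childNodes:
            if child.nodeType != child.ELEMENT_NODE:
                continue  # skip whitespace text nodes, comments, etc.
            if child.hasAttribute("label"):
                lines.append(indent * depth + child.getAttribute("label"))
            walk(child, depth + 1)

    # Start below the synthetic <root>, so top-level elements are depth 0.
    walk(doc.documentElement, 0)
    return lines


if __name__ == "__main__":
    demo = '<abcd label="ABC"><efg label="EFGA"/></abcd><xyz label="A1"/>'
    print("\n".join(dom_labels(demo)))
```

A DOM tree holds the whole document in memory, which is fine for small files like the one in the question but worth keeping in mind for large inputs.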
