read xml file from compressed file using gzip

I was working on a simple program that would read the contents
of a playlist file (in this case "*.k3b") and write them out. The
compressed "*.k3b" file contains two files, and the one I was trying to read
was maindata.xml. However, I cannot seem to use the gzip module
correctly. I have tried the program two ways with no success; any ideas
would be appreciated.

Attempt 1

#!/usr/bin/python

import os
import gzip
playlist_file = open('/home/flebber/oddalt.k3b')
class GzipFile([playlist_file[decompress[9, 'rb']]]);

os.system(open("/home/flebber/tmp/maindata.xml"));

for line in maindata.xml:
    print line

playlist_file.close()

Attempt 2 - largely just trying to get gzip to work

#!/usr/bin/python

import gzip
fileObj = Gzipfile("/home/flebber/oddalt.k3b", 'rb');
fileContent = fileObj.read()
for line in filecontent:
    print line

fileObj.close()

Jun 8 '07 #1
flebber wrote:
> I was working on a simple program that would read the contents
> of a playlist file (in this case "*.k3b") and write them out. The
> compressed "*.k3b" file contains two files, and the one I was trying to read
> was maindata.xml.

Consider using lxml. It reads gzip-compressed XML files transparently and
provides loads of other nice XML goodies.

http://codespeak.net/lxml/dev/

Stefan
Jun 8 '07 #2
On Jun 8, 3:31 pm, Stefan Behnel <stefan.behnel-n05...@web.de> wrote:
> Consider using lxml. It reads gzip-compressed XML files transparently and
> provides loads of other nice XML goodies.

I will; baby steps for me at the moment, though, as I am
only learning and can't get gzip to work.

Jun 8 '07 #3
On Jun 8, 9:45 pm, flebber <flebber.c...@gmail.com> wrote:
> I will; baby steps for me at the moment, though, as I am
> only learning and can't get gzip to work.

This is my latest attempt:

#!/usr/bin/python

import os
import zlib

class gzip('/home/flebber/oddalt.k3b', 'rb')

main_data = os.system(open("/home/flebber/maindata.xml"));

for line in main_data:
    print line

main_data.close()

Jun 8 '07 #4
On Fri, 08 Jun 2007 10:00:58 -0300, flebber <fl**********@gmail.com> wrote:
> I will; baby steps for me at the moment, though, as I am
> only learning and can't get gzip to work.

Try reading a tutorial from http://wiki.python.org/moin/BeginnersGuide

--
Gabriel Genellina

Jun 9 '07 #5
flebber wrote:
> I was working on a simple program that would read the contents
> of a playlist file (in this case "*.k3b") and write them out. The
> compressed "*.k3b" file contains two files, and the one I was trying to read
> was maindata.xml.

The k3b format is a ZIP archive. Use the zipfile library:

file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html

Stefan
Jun 9 '07 #6
On Jun 10, 3:45 am, Stefan Behnel <stefan.behnel-n05...@web.de> wrote:
> The k3b format is a ZIP archive. Use the zipfile library:

Thanks for all the help. I have been using the docs at python.org and
the Magnus Lie Hetland book. Are there any docs that are a little more
practical or expressive? Most of the module documentation is very
confusing for a beginner and doesn't provide much in the way of
examples on how to use the modules.

Not criticizing the docs, as they are probably very good for
experienced programmers.

Jun 10 '07 #7
On 10/06/2007 3:06 PM, flebber wrote:
> Thanks for all the help. I have been using the docs at python.org and
> the Magnus Lie Hetland book. Are there any docs that are a little more
> practical or expressive? Most of the module documentation is very
> confusing for a beginner and doesn't provide much in the way of
> examples on how to use the modules.

Somebody else has already drawn your attention to the/a tutorial. You
need to read, understand, and work through a *good* introductory book or
tutorial before jumping into the deep end.

> class GzipFile([playlist_file[decompress[9, 'rb']]]);
Errr, no, the [] are a documentation device used in most computer
language documentation to denote optional elements -- you don't type
them into your program. See below.
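
As a small illustration of the bracket notation, here is a sketch written for Python 3 (the thread itself uses Python 2). It calls zipfile.ZipFile once with only the required argument and once with the optional mode given. The in-memory archive and the member name hello.txt are made up for the example.

```python
import io
import zipfile

# Build a tiny ZIP archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hi")  # hypothetical member name

# Documented as ZipFile(file[, mode[, compression[, allowZip64]]]):
# only "file" is required, and the brackets are never typed.
buffer.seek(0)
with zipfile.ZipFile(buffer) as zf_default:        # mode omitted
    names_default = zf_default.namelist()

buffer.seek(0)
with zipfile.ZipFile(buffer, "r") as zf_explicit:  # mode given explicitly
    names_explicit = zf_explicit.namelist()

print(names_default, names_explicit)
```

Both calls open the same archive for reading; the second just spells out what the first leaves to the default.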

Secondly as Stefan pointed out, your file is a ZIP file (not a gzipped
file), they're quite different animals, so you need the zipfile module,
not the gzip module.
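
One way to see the difference for yourself is to look at how the data starts. The following is a rough Python 3 sketch, not a complete format detector: describe_container is a hypothetical helper name, and the sample data is built in memory rather than taken from a real .k3b file.

```python
import gzip
import io
import zipfile


def describe_container(data):
    """Guess the container format of a bytes object (hypothetical helper)."""
    if data[:2] == b"\x1f\x8b":  # gzip streams start with the bytes 1f 8b
        return "gzip"
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    return "unknown"


# Invented sample data: one gzip stream and one ZIP archive.
gz_bytes = gzip.compress(b"<xml/>")

zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("maindata.xml", "<xml/>")
zip_bytes = zip_buffer.getvalue()

gz_kind = describe_container(gz_bytes)
zip_kind = describe_container(zip_bytes)
print(gz_kind, zip_kind)
```

A gzip stream wraps a single compressed file, while a ZIP archive holds a directory of named members, which is why the two need different modules.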

> os.system(open("/home/flebber/tmp/maindata.xml"));

The manuals say quite simply and clearly that:
    open() returns a file object
    os.system's arg is a string (a command, like "grep -i fubar *.pl")
So that's guaranteed not to work.
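
To make the two roles concrete, here is a minimal Python 3 sketch. The temporary file stands in for maindata.xml and its contents are invented; the echo command is only a placeholder for "some shell command".

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in file; the name and contents are invented for the example.
    xml_path = os.path.join(tmp_dir, "maindata.xml")
    with open(xml_path, "w") as handle:
        handle.write("<a>\n<b/>\n</a>\n")

    # open() returns a file object; iterating over it yields lines.
    with open(xml_path) as handle:
        lines = [line.rstrip("\n") for line in handle]

print(lines)

# os.system() takes a command string and returns an exit status,
# never the contents of a file.
status = os.system("echo hello")
print(status)
```

Reading a file is a job for open(); os.system() is only for running an external command.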

From the docs of the zipfile module:
"""
class ZipFile( file[, mode[, compression[, allowZip64]]])

Open a ZIP file, where file can be either a path to a file (a string) or
a file-like object. The mode parameter should be 'r' to read an existing
file, 'w' to truncate and write a new file,
or 'a' to append to an existing file.
"""
... and you don't care about the rest of the class docs in your simple
case of reading.

A class has to be called like a function to give you an object which is
an instance of that class. You need only the first argument; the second
has about a 99.999% chance of defaulting to 'r' if omitted, but we'll
play it safe and explicit:

import zipfile
zf = zipfile.ZipFile('/home/flebber/oddalt.k3b', 'r')

OK, some more useful docs:
"""
namelist( )
Return a list of archive members by name.
printdir( )
Print a table of contents for the archive to sys.stdout.
read( name)
Return the bytes of the file in the archive. The archive must be
open for read or append.
"""

So give the following a try:

print zf.namelist()
zf.printdir()
xml_string = zf.read('maindata.xml')
zf.close()

# xml_string will be a string which may or may not have line endings in
# it ...
print len(xml_string)

# If you can't imagine what the next two lines will do,
# you'll have to do it once, just to see what happens:
for line in xml_string:
    print line

# Wasn't that fun? How big was that file? Now do this:
lines = xml_string.splitlines()
print len(lines) # number of lines
print len(lines[0]) # length of first line

# Ummm, maybe if it's only one line you don't want to do this either,
# but what the heck:
for line in lines:
    print line
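
The steps above can be sketched end to end for Python 3, where print is a function and ZipFile.read() returns bytes rather than a str. This is only a sketch under assumptions: the archive is built in memory as a stand-in for oddalt.k3b, its contents are invented, and the XML is assumed to be UTF-8.

```python
import io
import zipfile

# Stand-in archive with the two member names mentioned in the thread.
# The member contents are invented; a real .k3b file will differ.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/x-k3b")
    archive.writestr("maindata.xml", '<?xml version="1.0"?>\n<playlist/>\n')

buffer.seek(0)
with zipfile.ZipFile(buffer, "r") as zf:  # closed automatically on exit
    names = zf.namelist()
    xml_bytes = zf.read("maindata.xml")   # bytes in Python 3

xml_text = xml_bytes.decode("utf-8")      # assumes UTF-8 encoded XML
lines = xml_text.splitlines()

print(names)
print(len(lines))
for line in lines:
    print(line)
```

With a real file, the path string '/home/flebber/oddalt.k3b' would take the place of the in-memory buffer.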

HTH,
John
Jun 10 '07 #8
On Jun 10, 7:43 pm, John Machin <sjmac...@lexicon.net> wrote:
> So give the following a try:
> [...]

Thanks, that was so helpful to see how to do it. I have read a lot but
it wasn't sinking in, and sometimes it's better to learn by doing. Some
of the books I have read just seem to go from theory to theory with
the occasional example (which is meant to show us how good the author
is rather than help us).

For the record

>>> ## working on region in file /usr/tmp/python-F_C5sr.py...
['mimetype', 'maindata.xml']
File Name                 Modified             Size
mimetype                  2007-05-27 20:36:20    17
maindata.xml              2007-05-27 20:36:20 10795
>>> print len(xml_string)
10795
>>> for line in xml_string:
...     print line
...
<
?
x
m
l

v
e
r
s
i.....(etc ...it went for a while)

and

>>> lines = xml_string.splitlines()
>>> print len(lines)
387
>>> print len(lines[0])
38
>>> for line in lines:
... print line
  File "<stdin>", line 2
    print line
        ^
IndentationError: expected an indented block
>>> for line in lines:
...     print line

Jun 10 '07 #9
On 10/06/2007 8:08 PM, flebber wrote:
> Thanks, that was so helpful to see how to do it. I have read a lot but
> it wasn't sinking in, and sometimes it's better to learn by doing.

IMHO it's always better to learn by: read some, try it out, read some, ...

> Some
> of the books I have read just seem to go from theory to theory with
> the occasional example (which is meant to show us how good the author
> is rather than help us).

Well, that's the wrong sort of book for learning a language. You need
one with little exercises on each page, plus a couple of bigger ones per
chapter. It helps to get used to looking things up in the manual.
Compare the description in the manual with what's in the book.

> For the record
> >>> ## working on region in file /usr/tmp/python-F_C5sr.py...
> ['mimetype', 'maindata.xml']
> File Name                 Modified             Size
> mimetype                  2007-05-27 20:36:20    17
> maindata.xml              2007-05-27 20:36:20 10795
> >>> print len(xml_string)
> 10795
> >>> for line in xml_string:
> ...     print line
> ...
> <
> ?
> x
> m
> l
>
> v
> e
> r
> s
> i.....(etc ...it went for a while)

Yup. At a rough guess, I'd say it printed 10795 lines.

So now you've learned by doing it what
for x in a_string:
does :-)
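
For anyone who would rather not scroll through 10795 lines of output, the same lesson fits in a short Python 3 sketch. The sample text is invented; only the behaviour of the loop matters.

```python
text = "<?xml\nversion"

# Iterating over a string yields one character at a time.
chars = [ch for ch in text]

# splitlines() is what yields one item per line.
lines = text.splitlines()

print(len(chars))  # 13: every character, including the newline
print(lines)       # ['<?xml', 'version']

# In Python 3, iterating over bytes yields integers instead.
first_byte = next(iter(b"<?xml"))
print(first_byte)  # 60, the code for "<"
```

The last part matters when the data comes straight from ZipFile.read(), which returns bytes in Python 3.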

I hope you've also learned that "xml_string" was a good name and "line"
wasn't quite so good.

> and
> >>> lines = xml_string.splitlines()

Have you looked up splitlines in the manual?
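
In case the manual entry is hard to picture, here is a brief Python 3 sketch of what splitlines does, compared with splitting on "\n" by hand. The sample string is made up.

```python
sample = "first\r\nsecond\nthird"

# splitlines() recognises several kinds of line ending and drops them.
plain = sample.splitlines()

# Passing True keeps the line endings on each item.
with_ends = sample.splitlines(True)

# split("\n") only knows about "\n", so a stray "\r" is left behind.
by_hand = sample.split("\n")

print(plain)      # ['first', 'second', 'third']
print(with_ends)  # ['first\r\n', 'second\n', 'third']
print(by_hand)    # ['first\r', 'second', 'third']
```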

> >>> print len(lines)
> 387
> >>> print len(lines[0])
> 38
> >>> for line in lines:
> ... print line
>   File "<stdin>", line 2
>     print line
>         ^
> IndentationError: expected an indented block
> >>> for line in lines:
> ...     print line

After you fixed your indentation error, did it look like what you
expected to find?

Cheers,
John
Jun 10 '07 #10
