Read XML file from compressed file using gzip

I am trying to write a simple program that reads the contents of a
playlist file (in this case "*.k3b") and writes them out. The
compressed "*.k3b" file contains two files, and the one I am trying to
read is maindata.xml. However, I cannot seem to use the gzip module
correctly. I have tried the program two ways without success; any ideas
would be appreciated.

Attempt 1

#!/usr/bin/python

import os
import gzip
playlist_file = open('/home/flebber/oddalt.k3b')
class GzipFile([playlist_file[decompress[9, 'rb']]]);

os.system(open("/home/flebber/tmp/maindata.xml"));

for line in maindata.xml:
    print line

playlist_file.close()

Attempt 2 - largely just trying to get gzip to work

#!/usr/bin/python

import gzip
fileObj = Gzipfile("/home/flebber/oddalt.k3b", 'rb');
fileContent = fileObj.read()
for line in filecontent:
    print line

fileObj.close()

Jun 8 '07 #1
flebber wrote:
> I was working at creating a simple program that would read the content
> of a playlist file( in this case *.k3b") and write it out . the
> compressed "*.k3b" file has two file and the one I was trying to read
> was maindata.xml.

Consider using lxml. It reads in gzip compressed XML files transparently and
provides loads of other nice XML goodies.

http://codespeak.net/lxml/dev/

Stefan
Jun 8 '07 #2
I will. It's baby steps for me at the moment, though, as I am only
learning and can't get gzip to work.

Jun 8 '07 #3
This is my latest attempt

#!/usr/bin/python

import os
import zlib

class gzip('/home/flebber/oddalt.k3b', 'rb')

main_data = os.system(open("/home/flebber/maindata.xml"));

for line in main_data:
    print line

main_data.close()

Jun 8 '07 #4
On Fri, 08 Jun 2007 10:00:58 -0300, flebber <fl**********@gmail.com>
wrote:
> I will, baby steps at the moment for me at the moment though as I am
> only learning and can't get gzip to work

Try reading one of the tutorials at http://wiki.python.org/moin/BeginnersGuide

--
Gabriel Genellina

Jun 9 '07 #5
flebber wrote:
> I was working at creating a simple program that would read the content
> of a playlist file( in this case *.k3b") and write it out . the
> compressed "*.k3b" file has two file and the one I was trying to read
> was maindata.xml

The k3b format is a ZIP archive. Use the zipfile library:

file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html

Stefan
Jun 9 '07 #6
Thanks for all the help. I have been using the docs at python.org and
the Magnus Lie Hetland book. Are there any docs that are a little more
practical or expressive? Most of the module documentation is very
confusing for a beginner and doesn't provide much in the way of
examples of how to use the modules.

I'm not criticizing the docs, as they are probably very good for
experienced programmers.

Jun 10 '07 #7
Somebody else has already drawn your attention to the/a tutorial. You
need to read, understand, and work through a *good* introductory book or
tutorial before jumping into the deep end.

> class GzipFile([playlist_file[decompress[9, 'rb']]]);

Errr, no, the [] are a documentation device used in most computer
language documentation to denote optional elements -- you don't type
them into your program. See below.

Secondly as Stefan pointed out, your file is a ZIP file (not a gzipped
file), they're quite different animals, so you need the zipfile module,
not the gzip module.
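
To make the ZIP versus gzip distinction concrete, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). It builds one tiny ZIP archive and one tiny gzip stream in memory and tells them apart with `zipfile.is_zipfile` and the two-byte gzip magic number. The helper name `describe_container` and the sample XML content are made up for illustration.

```python
import gzip
import io
import zipfile

# Build a tiny ZIP archive and a tiny gzip stream in memory, so the
# sketch runs without a real .k3b file on disk (contents are made up).
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("maindata.xml", "<k3b_data_project/>")
zip_bytes = zip_buffer.getvalue()
gzip_bytes = gzip.compress(b"<k3b_data_project/>")


def describe_container(data):
    """Hypothetical helper: return 'zip', 'gzip' or 'unknown' for bytes."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    if data[:2] == b"\x1f\x8b":  # every gzip stream starts with these bytes
        return "gzip"
    return "unknown"


print(describe_container(zip_bytes))
print(describe_container(gzip_bytes))
```

On a real file, passing the path straight to `zipfile.is_zipfile` should answer the same question before you pick a module.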

> os.system(open("/home/flebber/tmp/maindata.xml"));

The manuals say quite simply and clearly that:
open() returns a file object
os.system's arg is a string (a command, like "grep -i fubar *.pl")
So that's guaranteed not to work.
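
As a rough illustration of that point, in Python 3 syntax: `open()` on its own is enough to read a file, because the file object it returns can be iterated line by line. The temporary directory and the three-line stand-in file below are assumptions so that the sketch runs anywhere; they are not the real maindata.xml.

```python
import os
import tempfile

# Write a small stand-in file so the example does not depend on
# /home/flebber/tmp/maindata.xml existing.
with tempfile.TemporaryDirectory() as tmp_dir:
    xml_path = os.path.join(tmp_dir, "maindata.xml")
    with open(xml_path, "w") as out_file:
        out_file.write("<a>\n<b/>\n</a>\n")

    # open() returns a file object; iterating over it yields one line
    # at a time. No os.system() call is involved: that function runs a
    # shell command given as a string and does not read files.
    with open(xml_path) as xml_file:
        lines = [line.rstrip("\n") for line in xml_file]

print(lines)
```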

From the docs of the zipfile module:
"""
class ZipFile( file[, mode[, compression[, allowZip64]]])

Open a ZIP file, where file can be either a path to a file (a string) or
a file-like object. The mode parameter should be 'r' to read an existing
file, 'w' to truncate and write a new file,
or 'a' to append to an existing file.
"""
.... and you don't care about the rest of the class docs in your simple
case of reading.

A class has to be called like a function to give you an object which is
an instance of that class. You need only the first argument; the second
has about a 99.999% chance of defaulting to 'r' if omitted, but we'll
play it safe and explicit:

import zipfile
zf = zipfile.ZipFile('/home/flebber/oddalt.k3b', 'r')
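
A small sketch of the same idea in Python 3 syntax, assuming an in-memory stand-in for oddalt.k3b (the name `fake_k3b` and its contents are invented for the example). It calls the class to get an instance and checks which mode was used when the second argument is left out.

```python
import io
import zipfile

# In-memory stand-in for oddalt.k3b; the member contents are made up.
fake_k3b = io.BytesIO()
with zipfile.ZipFile(fake_k3b, "w") as archive:
    archive.writestr("maindata.xml", "<root/>")

# Calling the class like a function returns an instance of it. The
# with-block closes the archive automatically when it ends.
with zipfile.ZipFile(fake_k3b) as zf:
    is_instance = isinstance(zf, zipfile.ZipFile)
    default_mode = zf.mode  # the mode actually in effect

print(is_instance)
print(default_mode)
```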

OK, some more useful docs:
"""
namelist( )
Return a list of archive members by name.
printdir( )
Print a table of contents for the archive to sys.stdout.
read( name)
Return the bytes of the file in the archive. The archive must be
open for read or append.
"""

So give the following a try:

print zf.namelist()
zf.printdir()
xml_string = zf.read('maindata.xml')
zf.close()

# xml_string will be a string which may or may not have line endings
# in it ...
print len(xml_string)

# If you can't imagine what the next two lines will do,
# you'll have to do it once, just to see what happens:
for line in xml_string:
    print line

# Wasn't that fun? How big was that file? Now do this:
lines = xml_string.splitlines()
print len(lines) # number of lines
print len(lines[0]) # length of first line

# Ummm, maybe if it's only one line you don't want to do this either,
# but what the heck:
for line in lines:
    print line
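
For readers on Python 3, the steps above might look roughly like the sketch below. It is not the code from this post: the archive is an in-memory stand-in with invented contents, `print` is a function, and `read()` returns bytes, so the sketch decodes them (UTF-8 is an assumption) before splitting into lines.

```python
import io
import zipfile

# In-memory stand-in for oddalt.k3b with the two member names reported
# in the thread; the contents are made up for the example.
fake_k3b = io.BytesIO()
with zipfile.ZipFile(fake_k3b, "w") as archive:
    archive.writestr("mimetype", "application/x-k3b")
    archive.writestr("maindata.xml", "<?xml version='1.0'?>\n<root>\n</root>\n")

with zipfile.ZipFile(fake_k3b, "r") as zf:
    member_names = zf.namelist()
    # read() returns bytes in Python 3; decode to get a text string.
    xml_string = zf.read("maindata.xml").decode("utf-8")

# Iterating over a string yields single characters, not lines.
chars = [ch for ch in xml_string]

# splitlines() is what turns the text into a list of lines.
lines = xml_string.splitlines()

print(member_names)
print(len(chars), "characters")
print(len(lines), "lines")
for line in lines:
    print(line)
```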

HTH,
John
Jun 10 '07 #8
Thanks, that was so helpful to see how to do it. I have read a lot but
it wasn't sinking in, and sometimes it's better to learn by doing. Some
of the books I have read just seem to go from theory to theory with
the occasional example (which is meant to show us how good the author
is rather than help us).

For the record:

>>> ## working on region in file /usr/tmp/python-F_C5sr.py...
['mimetype', 'maindata.xml']
File Name         Modified              Size
mimetype          2007-05-27 20:36:20     17
maindata.xml      2007-05-27 20:36:20  10795
>>> print len(xml_string)
10795
>>> for line in xml_string:
        print line
... ...
<
?
x
m
l

v
e
r
s
i.....(etc ...it went for a while)

and

>>> lines = xml_string.splitlines()
>>> print len(lines)
387
>>> print len(lines[0])
38
>>> for line in lines:
... print line
  File "<stdin>", line 2
    print line
        ^
IndentationError: expected an indented block
>>> for line in lines:
        print line

Jun 10 '07 #9
On 10/06/2007 8:08 PM, flebber wrote:
> Thanks that was so helpful to see how to do it. I have read a lot but
> it wasn't sinking in, and sometimes it's better to learn by doing.

IMHO it's always better to learn by: read some, try it out, read some, ...

> Some
> of the books I have read just seem to go from theory to theory with
> the occasional example (which is meant to show us how good the author
> is rather than help us).

Well, that's the wrong sort of book for learning a language. You need
one with little exercises on each page, plus a couple of bigger ones per
chapter. It helps to get used to looking things up in the manual.
Compare the description in the manual with what's in the book.

> For the record
> >>> ## working on region in file /usr/tmp/python-F_C5sr.py...
> ['mimetype', 'maindata.xml']
> File Name         Modified              Size
> mimetype          2007-05-27 20:36:20     17
> maindata.xml      2007-05-27 20:36:20  10795
> >>> print len(xml_string)
> 10795
> >>> for line in xml_string:
>         print line
> ... ...
> <
> ?
> x
> m
> l
>
> v
> e
> r
> s
> i.....(etc ...it went for a while)

Yup. At a rough guess, I'd say it printed 10795 lines.

So now you've learned by doing it what
    for x in a_string:
does :-)

I hope you've also learned that "xml_string" was a good name and "line"
wasn't quite so good.

> and
> >>> lines = xml_string.splitlines()

Have you looked up splitlines in the manual?

> >>> print len(lines)
> 387
> >>> print len(lines[0])
> 38
> >>> for line in lines:
> ... print line
>   File "<stdin>", line 2
>     print line
>         ^
> IndentationError: expected an indented block
> >>> for line in lines:
>         print line

After you fixed your indentation error, did it look like what you
expected to find?

Cheers,
John
Jun 10 '07 #10
