read xml file from compressed file using gzip

I was working on a simple program that would read the contents
of a playlist file (in this case "*.k3b") and write it out. The
compressed "*.k3b" file has two files, and the one I was trying to read
was maindata.xml. However, I cannot seem to use the gzip module
correctly. I have tried the program two ways with no success; any ideas
would be appreciated.

Attempt 1

#!/usr/bin/python

import os
import gzip
playlist_file = open('/home/flebber/oddalt.k3b')
class GzipFile([playlist_file[decompress[9, 'rb']]]);

os.system(open("/home/flebber/tmp/maindata.xml"));

for line in maindata.xml:
    print line

playlist_file.close()

Attempt 2 - largely just trying to get gzip to work

#!/usr/bin/python

import gzip
fileObj = Gzipfile("/home/flebber/oddalt.k3b", 'rb');
fileContent = fileObj.read()
for line in filecontent:
    print line

fileObj.close()

Jun 8 '07 #1
flebber wrote:
> I was working on a simple program that would read the contents
> of a playlist file (in this case "*.k3b") and write it out. The
> compressed "*.k3b" file has two files, and the one I was trying to read
> was maindata.xml.

Consider using lxml. It reads in gzip compressed XML files transparently and
provides loads of other nice XML goodies.

http://codespeak.net/lxml/dev/

Stefan
Jun 8 '07 #2
On Jun 8, 3:31 pm, Stefan Behnel <stefan.behnel-n05...@web.de> wrote:
> flebber wrote:
> > I was working on a simple program that would read the contents
> > of a playlist file (in this case "*.k3b") and write it out. The
> > compressed "*.k3b" file has two files, and the one I was trying to read
> > was maindata.xml.
>
> Consider using lxml. It reads in gzip compressed XML files transparently and
> provides loads of other nice XML goodies.
>
> http://codespeak.net/lxml/dev/
>
> Stefan

I will, but it's baby steps for me at the moment, as I am
only learning and can't get gzip to work.

Jun 8 '07 #3
On Jun 8, 9:45 pm, flebber <flebber.c...@gmail.com> wrote:
> On Jun 8, 3:31 pm, Stefan Behnel <stefan.behnel-n05...@web.de> wrote:
> > flebber wrote:
> > > I was working on a simple program that would read the contents
> > > of a playlist file (in this case "*.k3b") and write it out. The
> > > compressed "*.k3b" file has two files, and the one I was trying to read
> > > was maindata.xml.
> >
> > Consider using lxml. It reads in gzip compressed XML files transparently and
> > provides loads of other nice XML goodies.
> > http://codespeak.net/lxml/dev/
> > Stefan
>
> I will, but it's baby steps for me at the moment, as I am
> only learning and can't get gzip to work.

This is my latest attempt:

#!/usr/bin/python

import os
import zlib

class gzip('/home/flebber/oddalt.k3b', 'rb')

main_data = os.system(open("/home/flebber/maindata.xml"));

for line in main_data:
    print line

main_data.close()

Jun 8 '07 #4
On Fri, 08 Jun 2007 10:00:58 -0300, flebber <fl**********@gmail.com> wrote:
> I will, but it's baby steps for me at the moment, as I am
> only learning and can't get gzip to work.

Try working through one of the tutorials listed at http://wiki.python.org/moin/BeginnersGuide

--
Gabriel Genellina

Jun 9 '07 #5
flebber wrote:
> I was working on a simple program that would read the contents
> of a playlist file (in this case "*.k3b") and write it out. The
> compressed "*.k3b" file has two files, and the one I was trying to read
> was maindata.xml

The k3b format is a ZIP archive. Use the zipfile library:

file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html

Stefan
Jun 9 '07 #6
On Jun 10, 3:45 am, Stefan Behnel <stefan.behnel-n05...@web.de> wrote:
> flebber wrote:
> > I was working on a simple program that would read the contents
> > of a playlist file (in this case "*.k3b") and write it out. The
> > compressed "*.k3b" file has two files, and the one I was trying to read
> > was maindata.xml
>
> The k3b format is a ZIP archive. Use the zipfile library:
>
> file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html
>
> Stefan

Thanks for all the help. I have been using the docs at python.org and
the Magnus Lie Hetland book. Are there any docs that are a little more
practical or expressive? Most of the module documentation is very
confusing for a beginner and doesn't provide much in the way of
examples of how to use the modules.

I'm not criticizing the docs, as they are probably very good for
experienced programmers.

Jun 10 '07 #7
On 10/06/2007 3:06 PM, flebber wrote:
> On Jun 10, 3:45 am, Stefan Behnel <stefan.behnel-n05...@web.de> wrote:
> > flebber wrote:
> > > I was working on a simple program that would read the contents
> > > of a playlist file (in this case "*.k3b") and write it out. The
> > > compressed "*.k3b" file has two files, and the one I was trying to read
> > > was maindata.xml
> >
> > The k3b format is a ZIP archive. Use the zipfile library:
> >
> > file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html
> >
> > Stefan
>
> Thanks for all the help. I have been using the docs at python.org and
> the Magnus Lie Hetland book. Are there any docs that are a little more
> practical or expressive? Most of the module documentation is very
> confusing for a beginner and doesn't provide much in the way of
> examples of how to use the modules.
>
> I'm not criticizing the docs, as they are probably very good for
> experienced programmers.

Somebody else has already drawn your attention to the/a tutorial. You
need to read, understand, and work through a *good* introductory book or
tutorial before jumping into the deep end.

> class GzipFile([playlist_file[decompress[9, 'rb']]]);

Errr, no, the [] are a documentation device used in most computer
language documentation to denote optional elements -- you don't type
them into your program. See below.

Secondly, as Stefan pointed out, your file is a ZIP file, not a gzipped
file. They're quite different animals, so you need the zipfile module,
not the gzip module.
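
If you want to see the difference for yourself, the two formats can be told apart programmatically. The following is only a rough sketch, written for Python 3 (the rest of this thread uses Python 2). It builds a tiny ZIP archive and a tiny gzip stream in memory, with made-up contents, and checks them with zipfile.is_zipfile and the two-byte gzip magic number. The helper name sniff_format is hypothetical, not part of any library.

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def sniff_format(data):
    """Return 'zip', 'gzip', or 'unknown' for a bytes object (hypothetical helper)."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    return "unknown"


# Stand-in for a .k3b file: a ZIP archive built in memory (contents are made up).
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as zf:
    zf.writestr("maindata.xml", "<doc/>")
zip_bytes = zip_buffer.getvalue()

# A gzip stream of the same payload, for comparison.
gzip_bytes = gzip.compress(b"<doc/>")

print(sniff_format(zip_bytes))
print(sniff_format(gzip_bytes))
print(sniff_format(b"plain text"))
```

On a real file you could pass the path straight to zipfile.is_zipfile instead of reading the bytes first.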

> os.system(open("/home/flebber/tmp/maindata.xml"));

The manuals say quite simply and clearly that:
    open() returns a file object
    os.system's arg is a string (a command, like "grep -i fubar *.pl")
So that's guaranteed not to work.
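
To make the distinction concrete, here is a small sketch in Python 3 syntax (an assumption, since the thread itself uses Python 2). It writes a throwaway file with made-up contents, reads it back through the file object that open() returns, and separately hands os.system a command string. The "echo hello" command is just an illustrative choice.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
    tmp.write("<a>\n<b/>\n</a>\n")
    path = tmp.name

# open() returns a file object; iterating over it yields one line at a time.
with open(path) as f:
    lines = [line.rstrip("\n") for line in f]
print(lines)

# os.system() takes a command string, runs it in a shell,
# and returns the command's exit status (not the file's contents).
exit_status = os.system("echo hello")
print(exit_status)

os.remove(path)
```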

From the docs of the zipfile module:
"""
class ZipFile( file[, mode[, compression[, allowZip64]]])

Open a ZIP file, where file can be either a path to a file (a string) or
a file-like object. The mode parameter should be 'r' to read an existing
file, 'w' to truncate and write a new file,
or 'a' to append to an existing file.
"""
... and you don't care about the rest of the class docs in your simple
case of reading.

A class has to be called like a function to give you an object which is
an instance of that class. You need only the first argument; the second
has about a 99.999% chance of defaulting to 'r' if omitted, but we'll
play it safe and explicit:

import zipfile
zf = zipfile.ZipFile('/home/flebber/oddalt.k3b', 'r')
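
For anyone reading this on a newer Python, the same call can be sketched with a with block, which closes the archive automatically. This is an illustration under assumptions: it uses Python 3 syntax, and the archive is built in memory with made-up member contents so that it runs without the real /home/flebber/oddalt.k3b file.

```python
import io
import zipfile

# Stand-in for the real .k3b file: a small archive built in memory.
# The member contents here are invented for the example.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/x-k3b")
    archive.writestr("maindata.xml", "<doc/>")

# Calling the class creates an instance; mode defaults to 'r' when omitted.
buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    print(zf.mode)
    names = zf.namelist()
    print(names)
# Leaving the with block closes the archive, so no explicit zf.close() is needed.
```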

OK, some more useful docs:
"""
namelist( )
Return a list of archive members by name.
printdir( )
Print a table of contents for the archive to sys.stdout.
read( name)
Return the bytes of the file in the archive. The archive must be
open for read or append.
"""

So give the following a try:

print zf.namelist()
zf.printdir()
xml_string = zf.read('maindata.xml')
zf.close()

# xml_string will be a string which may or may not have line endings in it ...
print len(xml_string)

# If you can't imagine what the next two lines will do,
# you'll have to do it once, just to see what happens:
for line in xml_string:
    print line

# Wasn't that fun? How big was that file? Now do this:
lines = xml_string.splitlines()
print len(lines) # number of lines
print len(lines[0]) # length of first line

# Ummm, maybe if it's only one line you don't want to do this either,
# but what the heck:
for line in lines:
    print line
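
The steps above can also be sketched end to end in Python 3, where zf.read() returns bytes that need decoding before splitlines() gives text lines. Everything in the sample archive is an assumption made for illustration: the element names playlist and track are invented and are not the real k3b schema. The last step, parsing with xml.etree.ElementTree from the standard library, goes one step beyond what the thread covers.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Made-up XML standing in for the real maindata.xml.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<playlist>\n"
    "  <track>one.ogg</track>\n"
    "  <track>two.ogg</track>\n"
    "</playlist>\n"
)

# Build a stand-in archive in memory instead of opening a real .k3b file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("maindata.xml", SAMPLE_XML)

# Read the member back; in Python 3 this is a bytes object.
with zipfile.ZipFile(buffer, "r") as zf:
    xml_bytes = zf.read("maindata.xml")

# Decode to text, then split into lines as in the Python 2 code above.
xml_string = xml_bytes.decode("utf-8")
lines = xml_string.splitlines()
print(len(lines))

# Parsing the XML gives structured access instead of raw lines.
root = ET.fromstring(xml_bytes)
tracks = [t.text for t in root.findall("track")]
print(tracks)
```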

HTH,
John
Jun 10 '07 #8
On Jun 10, 7:43 pm, John Machin <sjmac...@lexicon.net> wrote:
> [...]

Thanks, that was so helpful to see how to do it. I have read a lot but
it wasn't sinking in, and sometimes it's better to learn by doing. Some
of the books I have read just seem to go from theory to theory with
the occasional example (which is meant to show us how good the author
is rather than help us).

For the record:

>>> ## working on region in file /usr/tmp/python-F_C5sr.py...
['mimetype', 'maindata.xml']
File Name                    Modified                 Size
mimetype                     2007-05-27 20:36:20        17
maindata.xml                 2007-05-27 20:36:20     10795
>>> print len(xml_string)
10795
>>> for line in xml_string:
...     print line
...
<
?
x
m
l

v
e
r
s
i.....(etc ...it went for a while)

and

>>> lines = xml_string.splitlines()
>>> print len(lines)
387
>>> print len(lines[0])
38
>>> for line in lines:
... print line
  File "<stdin>", line 2
    print line
        ^
IndentationError: expected an indented block
>>> for line in lines:
...     print line

Jun 10 '07 #9
On 10/06/2007 8:08 PM, flebber wrote:
> Thanks, that was so helpful to see how to do it. I have read a lot but
> it wasn't sinking in, and sometimes it's better to learn by doing.

IMHO it's always better to learn by: read some, try it out, read some, ...

> Some
> of the books I have read just seem to go from theory to theory with
> the occasional example (which is meant to show us how good the author
> is rather than help us).

Well, that's the wrong sort of book for learning a language. You need
one with little exercises on each page, plus a couple of bigger ones per
chapter. It helps to get used to looking things up in the manual.
Compare the description in the manual with what's in the book.

> For the record
> >>> ## working on region in file /usr/tmp/python-F_C5sr.py...
> ['mimetype', 'maindata.xml']
> File Name                    Modified                 Size
> mimetype                     2007-05-27 20:36:20        17
> maindata.xml                 2007-05-27 20:36:20     10795
> >>> print len(xml_string)
> 10795
> >>> for line in xml_string:
> ...     print line
> ...
> <
> ?
> x
> m
> l
>
> v
> e
> r
> s
> i.....(etc ...it went for a while)

Yup. At a rough guess, I'd say it printed 10795 lines.

So now you've learned by doing it what
for x in a_string:
does :-)
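
For anyone who would rather see it on a small scale first, here is a minimal sketch with a made-up three-line string. It is written in Python 3 syntax (an assumption; the session quoted here is Python 2). Note that in Python 3 zf.read() returns bytes, and looping over bytes yields integers rather than one-character strings, so the sketch uses an ordinary str.

```python
# A short made-up string standing in for the contents of maindata.xml.
xml_string = '<?xml version="1.0"?>\n<doc>\n</doc>\n'

# Looping over a string yields one character per iteration.
chars = [ch for ch in xml_string]
print(chars[:5])

# splitlines() yields one item per line, with the line endings removed.
lines = xml_string.splitlines()
print(lines)

# Passing True keeps the line endings on each item.
kept = xml_string.splitlines(True)
print(repr(kept[0]))
```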

I hope you've also learned that "xml_string" was a good name and "line"
wasn't quite so good.
> and
> >>> lines = xml_string.splitlines()

Have you looked up splitlines in the manual?

> >>> print len(lines)
> 387
> >>> print len(lines[0])
> 38
> >>> for line in lines:
> ... print line
>   File "<stdin>", line 2
>     print line
>         ^
> IndentationError: expected an indented block
> >>> for line in lines:
> ...     print line

After you fixed your indentation error, did it look like what you
expected to find?

Cheers,
John
Jun 10 '07 #10
