Read a gzip file from inside a tar file

I have a tar file. The contents of the file are as follows.

rohits@sandman 12-08-04 $ tar tvf 20041208.tar
drwxr-xr-x root/root         0 2004-12-08 21:39:19 20041208/
-rw-r--r-- root/root      1576 2004-12-08 21:39:19 20041208/README
drwxr-xr-x root/root         0 2004-12-08 21:27:31 20041208/snapshot_01/
-rw-r--r-- was/was   103010606 2004-12-08 16:37:38 20041208/snapshot_01/tpv-20041208-1350.xml.gz
What is the best method to read the content of the
tpv-20041208-1350.xml.gz?

I want to do the following with minimum code :-)
1) read above tar file
2) find the gzip file
3) read the content of this file
4) perform operations on content
5) continue

I tried various combinations of the following code, but it does not work as
intended:

import sys
import tarfile
import gzip
import StringIO

fileName = sys.argv[1]
print "File Name is ", fileName
tar = tarfile.open(fileName, "r:")
for tarinfo in tar:
    if tarinfo.isreg():
        print tarinfo.name
        if tarinfo.name.find("tpv") != -1:
            # read the gzip file
            print "\thttp plugin file"
            fileLike = tar.extractfile(tarinfo)
            fileText = fileLike.read()
            stringio = StringIO.StringIO(fileText)
            fileRead = gzip.GzipFile(stringio)
            for aLine in fileRead:
                print aLine

Jul 18 '05 #1
If I change fileText = fileLike.read() to fileText =
fileLike.readlines(), it works for a while before it gets killed for
running out of memory.

These are huge files. My goal is to analyze the content of the gzip
file inside the tar file without having to ungzip it, if that is possible.

Jul 18 '05 #2
On Tue, 2004-12-14 at 02:39, Rohit wrote:
> If I change fileText = fileLike.read() to fileText =
> fileLike.readlines(), it works for a while before it gets killed for
> running out of memory.
>
> These are huge files. My goal is to analyze the content of the gzip
> file inside the tar file without having to ungzip it, if that is possible.


As far as I know, gzip is a stream compression algorithm that can't be
decompressed in small blocks. That is, I don't think you can seek 500k
into a 1MB file and decompress the next 100k.

I'd say you'll have to progressively read the file from the beginning,
processing and discarding as you go. It looks like a no-brainer to me -
see zlib.decompressobj.
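
A minimal sketch of that progressive read, written for Python 3 rather than the Python version used in the question. The function name iter_gzip_member_chunks, the name_fragment parameter, and the 64 KiB chunk size are illustrative assumptions, not anything from the posts above. It also assumes the member holds a single gzip stream.

```python
import tarfile
import zlib


def iter_gzip_member_chunks(tar_path, name_fragment, chunk_size=64 * 1024):
    """Yield decompressed byte chunks from the first matching tar member."""
    with tarfile.open(tar_path, "r:") as tar:
        for member in tar:
            if not (member.isreg() and name_fragment in member.name):
                continue
            compressed = tar.extractfile(member)
            # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
            decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
            while True:
                block = compressed.read(chunk_size)
                if not block:
                    break
                data = decompressor.decompress(block)
                if data:
                    yield data
            # Emit whatever zlib is still buffering for this stream.
            tail = decompressor.flush()
            if tail:
                yield tail
            return  # only the first matching member is processed
```

Only one compressed block and its decompressed output are held at a time, so memory use depends on chunk_size rather than on the size of the file.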

Note that you _do_ have to ungzip it, you just don't have to store the
whole decompressed thing in memory / on disk at once. If you need to do
anything to it that does require the entire thing to be loaded (or
anything that means you have to seek around the file), I'd say you're
SOL.
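
For line-by-line processing, the standard gzip module can do the same incremental work when it is handed the tar member as a file object. This is a sketch under assumptions: Python 3, UTF-8 text inside the gzip file, and a hypothetical helper name iter_lines_in_tar_gzip. The one detail that matters is the fileobj= keyword, because the first positional argument of gzip.GzipFile is a filename.

```python
import gzip
import io
import tarfile


def iter_lines_in_tar_gzip(tar_path, name_fragment, encoding="utf-8"):
    """Yield text lines from the first gzip member whose name matches."""
    with tarfile.open(tar_path, "r:") as tar:
        for member in tar:
            if member.isreg() and name_fragment in member.name:
                raw = tar.extractfile(member)
                # fileobj= makes GzipFile read from the tar member instead of
                # opening a path on disk.
                gz = gzip.GzipFile(fileobj=raw, mode="rb")
                with io.TextIOWrapper(gz, encoding=encoding) as text:
                    for line in text:
                        yield line
                return  # only the first matching member is processed
```

Looping over iter_lines_in_tar_gzip("20041208.tar", "tpv") then handles one line at a time, with nothing extracted to disk and no full copy of the decompressed data in memory.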

--
Craig Ringer

Jul 18 '05 #3
Craig Ringer wrote:
>> These are huge files. My goal is to analyze the content of the gzip
>> file in the tar file without having to ungzip it. If that is possible.
>
> As far as I know, gzip is a stream compression algorithm that can't be
> decompressed in small blocks. That is, I don't think you can seek 500k
> into a 1MB file and decompress the next 100k.

correct.

> I'd say you'll have to progressively read the file from the beginning,
> processing and discarding as you go. It looks like a no-brainer to me -
> see zlib.decompressobj.


it can be a bit tricky to set things up properly, though. here's a piece
of code that uses Python's good old consumer interface to decode things
incrementally:

http://effbot.org/zone/consumer-gzip.htm

you can either use this as is; just create a "target consumer", wrap it in the
gzip consumer, and feed data to the gzip consumer in suitable pieces.

alternatively, hack it until it does what you want.
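
to illustrate the shape of that interface, here is a rough Python 3 sketch. it is not the code from the linked page: the class names GzipConsumer and LineCounter are made up for this example, and header parsing is left to zlib. a consumer here is just an object with feed(data) and close() methods.

```python
import zlib


class LineCounter:
    """Hypothetical target consumer: counts newline bytes it is fed."""

    def __init__(self):
        self.lines = 0
        self.closed = False

    def feed(self, data):
        self.lines += data.count(b"\n")

    def close(self):
        self.closed = True


class GzipConsumer:
    """Decompress gzip data fed in pieces and pass it on to a target."""

    def __init__(self, target):
        self.target = target
        # 16 + MAX_WBITS makes zlib handle the gzip header and trailer.
        self._decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def feed(self, data):
        plain = self._decompressor.decompress(data)
        if plain:
            self.target.feed(plain)

    def close(self):
        # Hand over anything still buffered, then close the target.
        tail = self._decompressor.flush()
        if tail:
            self.target.feed(tail)
        self.target.close()
```

to use it, read the tar member in blocks, call feed() on the gzip consumer for each block, and call close() at the end; the target only ever sees decompressed pieces.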

</F>

Jul 18 '05 #4
