Reading the contents of zip files

424 256MB
I have a large number of sequentially-named zip files, each containing a single csv data file which I need to read into my C++ program.

1) Does anybody know of any open source libraries to handle zip files? (I've seen some unportable, expensive commercial ones only).

2) I could use a free program like gzip which can decompress zip files. I don't want to decompress each archive to disk and read in the resultant csv file as this would be excruciatingly slow. Instead, I've seen you can pipe the output of gzip to another program. Is it possible, using system() calls to gzip, to capture this piped output as an fstream or some other stream which I can then getline() to read the csv data rows?
Jan 28 '08 #1
RRick
463 Expert 256MB
On Linux, gzip creates a separate compressed file for each file passed to it. This doesn't sound like what you want.

If you want to combine multiple files into a single compressed file, the tar command is the way to go. On Linux, it is the archive workhorse. It will store and extract single or multiple files, supports directories, can extract files into their own directories, or can send them to stdout.

What more could you ask for? :-)
Jan 29 '08 #2
gpraghuram
1,275 Expert 1GB
Your idea 2 is good.
I remember that there is a command which lists all the files inside a zipped file without opening it (I don't remember the exact name).
With your second idea, even if you unzip through a pipe, the files still have to be decompressed.
I don't think that will boost the speed.

Raghuram
Jan 29 '08 #3
arnaudk
424 256MB
Thanks for your replies,

RRick: I don't need to create any archives. I just have a collection of zip files, each containing one single file which I need to read into objects in my program, so gzip would work fine here, with the added advantage that the command syntax is the same on most platforms where it is installed, making my program more portable.

gpraghuram: Actually, I already know that the name of the file inside the zip file will be the same as the name of the archive (except the extension, of course), so I just have to unzip them. It's true that decompressing the files will cost some time, but this is unavoidable. What is avoidable, however, is disk reads and writes, which are notoriously slow compared to keeping everything in (solid-state) memory. Thus, I'd like to avoid creating temporary files on the hard disk. Do you know how I can capture redirected/piped output into a stream?
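
One possible way to capture a command's piped output without a temporary file is the POSIX `popen` call, which is not mentioned elsewhere in this thread. The sketch below is only an illustration under that assumption: `readCommandLines` is a hypothetical helper name, and `popen`/`pclose` are POSIX rather than standard C++, so this would not build as-is with every Windows toolchain.

```cpp
#include <stdio.h>   // popen, pclose, fgets (popen/pclose are POSIX)
#include <string>
#include <vector>

// Hypothetical helper: runs `command` through the shell and returns its
// standard output split into lines. Returns an empty vector if the
// command could not be started.
std::vector<std::string> readCommandLines(const std::string& command) {
    std::vector<std::string> lines;
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == NULL) return lines;

    std::string current;
    char chunk[4096];
    while (fgets(chunk, sizeof(chunk), pipe) != NULL) {
        current += chunk;
        // A line is complete once the accumulated text ends in '\n'.
        if (!current.empty() && current[current.size() - 1] == '\n') {
            current.erase(current.size() - 1);  // drop the newline
            lines.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) lines.push_back(current);  // last line without '\n'
    pclose(pipe);
    return lines;
}
```

A call might look like `readCommandLines("gunzip -c somefile.zip")`, where the file name is a placeholder. According to the gzip documentation, gzip can only decompress a zip archive that has a single member compressed with the deflate method. The command string is passed to the shell, so it should not be built from untrusted input.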
Jan 29 '08 #4
mac11
256 100+
Do you know how I can capture redirected/piped output into a stream?
I have an idea if you're running Linux. Maybe try using a FIFO (named pipe): have gzip write its output to it and your program read from it. It should work, but I've never done it.

Google "linux fifo" to learn about FIFOs.
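
A rough sketch of the FIFO idea, assuming a POSIX system: the helper name `readThroughFifo`, the FIFO path, and the way the writer is launched are all illustrative choices, not something tested against gzip in this thread.

```cpp
#include <sys/stat.h>   // mkfifo (POSIX)
#include <unistd.h>     // unlink (POSIX)
#include <cstdlib>      // std::system
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: creates a FIFO at fifoPath, starts `producer` in the
// background with its output redirected into the FIFO, and returns the
// lines that arrive. Returns an empty vector if the FIFO cannot be created.
std::vector<std::string> readThroughFifo(const std::string& fifoPath,
                                         const std::string& producer) {
    std::vector<std::string> lines;
    if (mkfifo(fifoPath.c_str(), 0600) != 0) return lines;

    // The trailing '&' backgrounds the writer so that this process is
    // free to open the read end of the FIFO.
    std::string cmd = producer + " > '" + fifoPath + "' &";
    if (std::system(cmd.c_str()) == 0) {
        std::ifstream in(fifoPath.c_str());  // blocks until the writer opens the FIFO
        std::string line;
        while (std::getline(in, line)) lines.push_back(line);
    }
    unlink(fifoPath.c_str());  // remove the FIFO from the file system
    return lines;
}
```

The data never touches the disk; only the FIFO's name lives in the file system. The FIFO path must not already exist, and the producer command goes through the shell, so it should come from trusted input only.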
Jan 29 '08 #5
RRick
463 Expert 256MB
On Unix, the trick used to redirect output from one process to the input of another process is called "piping". From the command line it looks something like:
  gunzip -c xxx.gz | myProg
gunzip outputs to stdout which the pipe (|) redirects to the input of myProg.

All myProg has to do is read stdin to get the info.
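
As a minimal sketch of what the reading side of myProg could look like, the functions below pull CSV rows from any input stream. The names `splitRow` and `readRows` are made up for illustration, and the comma splitting is deliberately naive: it assumes no quoted fields.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV row on commas. Quoted fields are not handled, and a
// trailing empty field is dropped.
std::vector<std::string> splitRow(const std::string& row) {
    std::vector<std::string> fields;
    std::stringstream ss(row);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

// Reads every non-empty row from `in`. Pass std::cin when the program
// runs at the receiving end of a pipe.
std::vector<std::vector<std::string> > readRows(std::istream& in) {
    std::vector<std::vector<std::string> > rows;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) rows.push_back(splitRow(line));
    }
    return rows;
}
```

Inside `main`, calling `readRows(std::cin)` would consume whatever gunzip writes to the pipe. Taking a `std::istream&` rather than hard-coding `std::cin` means the same function also works on a file or string stream.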
Jan 30 '08 #6
arnaudk
424 256MB
I found some useful links after some exhaustive searching. For the benefit of others, I post them here.

Zlib, C library for reading/writing .gz files. The following are all based on this great library.

Gzstream, a wrapper for zlib which defines C++ streams for zlib which work just like ifstream, etc.

However, I need to read .zip archives which are more complex than .gz files because they can contain several files and directory structure. To this end, there is:

Minizip, an addon for zlib to handle .zip files. It is also distributed with Zlib1.2.3.

A C++ wrapper class for Minizip, written by David Godson

Note that to get them going in VC++ 2005 Express, I had to install the Microsoft SDK, since I got errors that windows.h wasn't found when I tried to compile.

With zlib and gzstream, I managed to read the contents of .gz files. But I'm still working on reading .zip files...
Jan 30 '08 #7
arnaudk
424 256MB
OK, after a lot of dredging through oodles of lines of C code, I finally managed to dump the buffer filled by the function unzReadCurrentFile in Minizip (which reads a file in a .zip archive) into a string stream which I can use in the rest of my C++ program. So, problem solved.
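
The buffer-to-string-stream step can be sketched roughly as follows. This is a generic illustration rather than the exact code used here: `appendChunk` and `linesOf` are hypothetical names, and the chunks stand in for whatever a read call such as unzReadCurrentFile leaves in its buffer.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Appends `count` bytes of a raw read buffer to `text`. A count of zero
// or less (end of data or an error code) appends nothing.
void appendChunk(std::string& text, const void* buf, int count) {
    if (count > 0) {
        text.append(static_cast<const char*>(buf),
                    static_cast<std::size_t>(count));
    }
}

// Wraps the accumulated text in a string stream and returns its lines,
// stripping a trailing '\r' so Windows line endings are handled too.
std::vector<std::string> linesOf(const std::string& text) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line[line.size() - 1] == '\r') {
            line.erase(line.size() - 1);
        }
        lines.push_back(line);
    }
    return lines;
}
```

Collecting all chunks before splitting matters because a chunk boundary can fall in the middle of a row; splitting each chunk separately would cut that row in two.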
Jan 30 '08 #8
arnaudk
424 256MB
... and here is the code that does it (didn't use string streams in the end)
/*
   Unzips testfile.txt from C:\temp\test.zip
   and puts it in a string.
*/
#include <iostream>
#include <string>
#include <vector>
#include "unzip.h" // MiniZip library

#define WRITEBUFFERSIZE (5242880) // 5 MB read buffer

using namespace std;

string readZipFile(const string& zipFile, const string& fileInZip) {
    int err = UNZ_OK;                 // error status
    string sout;                      // output string (left empty on failure)
    char filename_inzip[256];         // for unzGetCurrentFileInfo
    unz_file_info file_info;          // for unzGetCurrentFileInfo

    unzFile uf = unzOpen(zipFile.c_str()); // open the zip archive
    if (uf == NULL) {
        cerr << "Cannot open " << zipFile << endl;
        return sout;
    } // archive is open

    // Third argument of unzLocateFile: 1 = case-sensitive,
    // 2 = case-insensitive, 0 = platform default
    err = unzLocateFile(uf, fileInZip.c_str(), 1); // try to locate file inside zip
    if (err != UNZ_OK) {
        cerr << "File " << fileInZip << " not found in " << zipFile << endl;
        unzClose(uf); // close the archive on every early exit
        return sout;
    } // file inside zip found

    err = unzGetCurrentFileInfo(uf, &file_info, filename_inzip, sizeof(filename_inzip), NULL, 0, NULL, 0);
    if (err != UNZ_OK) {
        cerr << "Error " << err << " with zipfile " << zipFile << " in unzGetCurrentFileInfo." << endl;
        unzClose(uf);
        return sout;
    } // obtained the necessary details about file inside zip

    err = unzOpenCurrentFilePassword(uf, NULL); // open the file inside the zip (password = NULL)
    if (err != UNZ_OK) {
        cerr << "Error " << err << " with zipfile " << zipFile << " in unzOpenCurrentFilePassword." << endl;
        unzClose(uf);
        return sout;
    } // file inside the zip is open

    // The vector releases its memory automatically on every return path
    vector<char> buf(WRITEBUFFERSIZE);

    // Copy contents of the file inside the zip to the output string
    cout << "Extracting: " << filename_inzip << " from " << zipFile << endl;
    do {
        err = unzReadCurrentFile(uf, &buf[0], (unsigned)buf.size());
        if (err < 0) {
            cerr << "Error " << err << " with zipfile " << zipFile << " in unzReadCurrentFile" << endl;
            sout.clear(); // empty output string
            break;
        }
        // append the bytes just read (err holds the byte count)
        if (err > 0) sout.append(&buf[0], err);
    } while (err > 0);

    err = unzCloseCurrentFile(uf); // close the file inside the zip
    if (err != UNZ_OK) {
        cerr << "Error " << err << " with zipfile " << zipFile << " in unzCloseCurrentFile" << endl;
        sout.clear(); // empty output string
    }

    unzClose(uf); // close the archive itself
    return sout;
}

int main() {
    string string_buffer = readZipFile("C:/temp/test.zip", "testfile.txt");
    cout << string_buffer << endl;
    return 0;
}
Jan 31 '08 #9
RRick
463 Expert 256MB
Very nice. I like how simple it is to find and extract the file.

Is this based on the zlib library? I believe the latest version is 1.2.3
Jan 31 '08 #10
gpraghuram
1,275 Expert 1GB
Very good effort. I appreciate your work.

Raghuram
Feb 1 '08 #11
mrviit
1
Thank you very much.
But if the path to the zip file, or the zip file's name, contains Unicode characters, unzOpen (or unzOpen64) cannot open the zip file. (I have changed the type of the first parameter, zipFile, to wstring.)

Can you help me, please?
Feb 20 '12 #12
