Reading the contents of zip files

424 256MB
I have a large number of sequentially-named zip files, each containing a single csv data file which I need to read into my C++ program.

1) Does anybody know of any open source libraries to handle zip files? (I've seen some unportable, expensive commercial ones only).

2) I could use a free program like gzip which can decompress zip files. I don't want to decompress each archive to disk and read in the resultant csv file as this would be excruciatingly slow. Instead, I've seen you can pipe the output of gzip to another program. Is it possible, using system() calls to gzip, to capture this piped output as an fstream or some other stream which I can then getline() to read the csv data rows?
Jan 28 '08 #1
RRick
463 Expert 256MB
On Linux, gzip wants to create a single compressed file for each file passed. This doesn't sound like what you want.

If you want to combine multiple files into a single compressed file, the tar command is the way to go. On Linux, it is the archive workhorse. It will store and extract single or multiple files, supports directories, extracts files into their own directories, or sends them to stdout.

What more could you ask for? :-)
Jan 29 '08 #2
gpraghuram
1,275 Expert 1GB
Your idea 2 is good.
I remember that there is a command which lists all the files inside a zip file without extracting it (I don't remember the exact name).
With your second idea, even if you unzip through a pipe, the files will still have to be decompressed.
I don't think that can boost the speed.

Raghuram
Jan 29 '08 #3
arnaudk
424 256MB
Thanks for your replies,

RRick: I don't need to create any archives. I just have a collection of zip files, each containing one single file which I need to read into objects in my program, so gzip would work fine here, with the added advantage that the command syntax is the same on most platforms where it is installed, making my program more portable.

gpraghuram: Actually, I already know that the name of the file inside the zip file will be the same as the name of the archive (except the extension, of course), so I just have to unzip them. It's true that decompressing the files will cost some time, but this is unavoidable. What is avoidable, however, is any disk reads/writes, which are notoriously slow compared to keeping everything in memory. Thus, I'd like to avoid creating temporary files on the hard disk. Do you know how I can capture redirected/piped output into a stream?
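
One possible way to capture a command's output without a temporary file, sketched under the assumption of a POSIX system: popen() runs a shell command and returns a FILE* connected to its standard output. The helper name readCommandOutput is made up for illustration, and on Windows the analogous call would be _popen.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: run a shell command and return everything it
// writes to stdout. Assumes POSIX popen()/pclose() are available.
std::string readCommandOutput(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("popen failed for: " + command);
    }
    std::string output;
    char chunk[4096];
    std::size_t n;
    // Read the pipe in fixed-size chunks until the command closes stdout.
    while ((n = std::fread(chunk, 1, sizeof chunk, pipe)) > 0) {
        output.append(chunk, n);
    }
    pclose(pipe);
    return output;
}
```

A call such as readCommandOutput("gunzip -c xxx.gz") would return the decompressed text, which can then be wrapped in a std::istringstream and read with getline().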
Jan 29 '08 #4
mac11
256 100+
Do you know how I can capture redirected/piped output into a stream?
I have an idea if you're running Linux. Maybe try using a FIFO (named pipe): have gzip write to it and your program read from it. It should work, but I've never done it.

Google for "linux fifo" to learn about FIFOs.
Jan 29 '08 #5
RRick
463 Expert 256MB
On Unix, the trick used to redirect output from one process to the input of another process is called "piping". From the command line it looks something like:
    gunzip -c xxx.gz | myProg
gunzip -c writes to stdout, which the pipe (|) redirects to the stdin of myProg.

All myProg has to do is read stdin to get the info.
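
A minimal sketch of the reading side, assuming a simple CSV with no quoted fields or embedded commas. The names splitCsvRow and readCsvRows are hypothetical; passing std::cin to readCsvRows reads whatever gunzip pipes in.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split one CSV line on commas. Assumes no quoting or escaped commas;
// a trailing empty field (as in "a,b,") is dropped.
std::vector<std::string> splitCsvRow(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream row(line);
    while (std::getline(row, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}

// Read every non-empty line from any input stream
// (std::cin, a file stream, or a string stream).
std::vector<std::vector<std::string>> readCsvRows(std::istream& in) {
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate CRLF
        if (!line.empty()) rows.push_back(splitCsvRow(line));
    }
    return rows;
}
```

Because the function takes a std::istream&, the same code works whether the data arrives on stdin or from a string already held in memory.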
Jan 30 '08 #6
arnaudk
424 256MB
I found some useful links after some exhaustive searching. For the benefit of others, I post them here.

Zlib, C library for reading/writing .gz files. The following are all based on this great library.

Gzstream, a wrapper for zlib which defines C++ streams for zlib which work just like ifstream, etc.

However, I need to read .zip archives which are more complex than .gz files because they can contain several files and directory structure. To this end, there is:

Minizip, an addon for zlib to handle .zip files. It is also distributed with Zlib1.2.3.

A C++ wrapper class for Minizip, written by David Godson

Note that to get them going in VC++ 2005 Express, I had to install the Microsoft SDK, since I got errors that windows.h wasn't found when I tried to compile.

With zlib and gzstream, I managed to read the contents of .gz files. But I'm still working on reading .zip files...
Jan 30 '08 #7
arnaudk
424 256MB
OK, after a lot of dredging through oodles of lines of C code, I finally managed to dump the buffer filled by the function unzReadCurrentFile in Minizip (which reads a file in a .zip archive) into a string stream which I can use in the rest of my c++ program. So, problem solved.
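
The idea of collecting decompressed chunks and then reading them back line by line can be sketched without Minizip at all. In this sketch a plain byte buffer stands in for the one that unzReadCurrentFile fills; appendChunk and countLines are hypothetical names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Append the first `count` bytes of a raw read buffer to the accumulated text.
void appendChunk(std::string& text, const void* buffer, std::size_t count) {
    text.append(static_cast<const char*>(buffer), count);
}

// Wrap the accumulated text in a string stream and count its lines with getline.
std::size_t countLines(const std::string& text) {
    std::istringstream stream(text);
    std::string line;
    std::size_t lines = 0;
    while (std::getline(stream, line)) {
        ++lines;
    }
    return lines;
}
```

Lines can span chunk boundaries, which is why the chunks are joined into one string before any line-based parsing starts.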
Jan 30 '08 #8
arnaudk
424 256MB
... and here is the code that does it (didn't use string streams in the end)
/*
   Unzips testfile.txt from C:\temp\test.zip
   and puts it in a string.
*/
#include <iostream>
#include <string>
#include <vector>
#include "unzip.h" // MiniZip library

#define WRITEBUFFERSIZE (5242880) // 5 MB buffer

using namespace std;

string readZipFile(const string& zipFile, const string& fileInZip) {
    int err = UNZ_OK;                 // error status
    string sout;                      // output string
    char filename_inzip[256];         // for unzGetCurrentFileInfo
    unz_file_info file_info;          // for unzGetCurrentFileInfo

    unzFile uf = unzOpen(zipFile.c_str()); // open zipfile stream
    if (uf == NULL) {
        cerr << "Cannot open " << zipFile << endl;
        return sout;
    } // file is open

    // third argument of unzLocateFile: 1 = case-sensitive, 2 = case-insensitive, 0 = OS default
    if (unzLocateFile(uf, fileInZip.c_str(), 1) != UNZ_OK) { // try to locate file inside zip
        cerr << "File " << fileInZip << " not found in " << zipFile << endl;
        unzClose(uf);
        return sout;
    } // file inside zip found

    err = unzGetCurrentFileInfo(uf, &file_info, filename_inzip, sizeof(filename_inzip), NULL, 0, NULL, 0);
    if (err != UNZ_OK) {
        cerr << "Error " << err << " with zipfile " << zipFile << " in unzGetCurrentFileInfo." << endl;
        unzClose(uf);
        return sout;
    } // obtained the necessary details about file inside zip

    err = unzOpenCurrentFilePassword(uf, NULL); // open the file inside the zip (password = NULL)
    if (err != UNZ_OK) {
        cerr << "Error " << err << " with zipfile " << zipFile << " in unzOpenCurrentFilePassword." << endl;
        unzClose(uf);
        return sout;
    } // file inside the zip is open

    vector<char> buf(WRITEBUFFERSIZE);         // read buffer, released automatically on return
    sout.reserve(file_info.uncompressed_size); // avoid repeated reallocation while appending

    // Copy contents of the file inside the zip to the string, one buffer at a time
    cout << "Extracting: " << filename_inzip << " from " << zipFile << endl;
    do {
        err = unzReadCurrentFile(uf, &buf[0], (unsigned)buf.size());
        if (err < 0) {
            cerr << "Error " << err << " with zipfile " << zipFile << " in unzReadCurrentFile" << endl;
            sout.clear(); // empty output string
            break;
        }
        if (err > 0) sout.append(&buf[0], err); // append the bytes just read
    } while (err > 0);

    err = unzCloseCurrentFile(uf); // close the file inside the zip
    if (err != UNZ_OK) {
        cerr << "Error " << err << " with zipfile " << zipFile << " in unzCloseCurrentFile" << endl;
        sout.clear(); // empty output string
    }

    unzClose(uf); // close the zip archive itself
    return sout;
}

int main() {
    string string_buffer = readZipFile("C:/temp/test.zip", "testfile.txt");
    cout << string_buffer << endl;
    return 0;
}
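
Since each archive in this thread holds a single file whose name matches the archive's, the second argument to readZipFile can be derived from the first. The helper below is only a sketch: innerFileName and its default ".csv" extension are assumptions for illustration, not part of Minizip.

```cpp
#include <string>

// Hypothetical helper: "C:/temp/data_001.zip" -> "data_001.csv".
std::string innerFileName(const std::string& zipPath,
                          const std::string& extension = ".csv") {
    // Drop any directory part; accept both '/' and '\\' separators.
    std::size_t slash = zipPath.find_last_of("/\\");
    std::string name = (slash == std::string::npos) ? zipPath
                                                    : zipPath.substr(slash + 1);
    // Remove the final extension, if there is one, then add the new one.
    std::size_t dot = name.find_last_of('.');
    if (dot != std::string::npos) {
        name.erase(dot);
    }
    return name + extension;
}
```

With it, a call could look like readZipFile(path, innerFileName(path)), which keeps a loop over sequentially named archives down to one variable.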
Jan 31 '08 #9
RRick
463 Expert 256MB
Very nice. I like how simple it is to find and extract the file.

Is this based on the zlib library? I believe the latest version is 1.2.3.
Jan 31 '08 #10
gpraghuram
1,275 Expert 1GB
Very good effort, and I appreciate your work.

Raghuram
Feb 1 '08 #11
mrviit
1
Thank you very much.
But if the path or the name of the zip file contains Unicode characters, unzOpen (or unzOpen64) cannot open the zip file. (I have changed the type of the first parameter, zipFile, to wstring.)

Can you help me, please?
Feb 20 '12 #12
