Bytes | Software Development & Data Engineering Community

ZipOutputStream Writes Corrupt Zip Files

I am writing out archive files using ZipOutputStream with the following
code:

aEntry is a global Array of ZipEntries
llData is a LinkedList of the data corresponding to the ZipEntry of the
same index in aEntry

public void save() {
    byte[] bBuffer = null;
    int i = 0;
    String sName = "";
    ZipEntry entry;
    try {
        ZipOutputStream zon = new ZipOutputStream(new FileOutputStream(outFile));
        for (i = 0; i < aEntry.length; i++) {
            entry = new ZipEntry(aEntry[i]);
            zon.putNextEntry(entry);
            bBuffer = (byte[]) llData.get(i);
            zon.write(bBuffer, 0, bBuffer.length);
        }
        zon.flush();
        zon.close();
    } catch (Exception e) {
        // My logging routine to report errors
        sysConfig.log("error", "Could not process archive file: " + inFile + ", Error: " + e);
    }
}
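For comparison, here is a minimal self-contained sketch of the same save loop (the names/data arrays are hypothetical stand-ins for the aEntry and llData fields, which aren't shown). It also calls closeEntry() after each entry and finish() before close(), and checks the result by reopening it with ZipFile, which parses the central directory and so fails if that directory is missing:

```java
import java.io.*;
import java.util.zip.*;

public class ZipSave {
    // names/data stand in for the poster's aEntry/llData fields.
    public static void save(File outFile, String[] names, byte[][] data) throws IOException {
        ZipOutputStream zon = new ZipOutputStream(new FileOutputStream(outFile));
        try {
            for (int i = 0; i < names.length; i++) {
                zon.putNextEntry(new ZipEntry(names[i]));
                zon.write(data[i], 0, data[i].length);
                zon.closeEntry();   // explicitly finish each entry
            }
            zon.finish();           // writes the central directory
        } finally {
            zon.close();            // runs even if a write throws
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sketch", ".zip");
        save(f, new String[] { "content.xml" },
                new byte[][] { "<doc/>".getBytes("UTF-8") });
        // ZipFile parses the central directory, so this throws if it is broken.
        ZipFile zf = new ZipFile(f);
        System.out.println("entries=" + zf.size());
        zf.close();
        f.delete();
    }
}
```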

Command line instances of zip (on both Linux and Windows) have reported some
of the files created with this method as corrupt. It seems all the entries
are intact, but an inspection with a hex editor shows that some files don't
have the Central Directory Header at the end of the file, and in some
files, the Central Directory Header carries the wrong information -- it
doesn't point to the actual Central Directory and may be too long.

The archive files I'm writing are for OpenOffice, which stores its files as
multiple files in an archive format. OpenOffice can open the files, but it
sees them as corrupt and has to fix them. It recovers the data, and can
save them to a new file. This is what convinces me that it's only the
Central Directory Header (and possibly the Central Directory) that are
messed up.

I wrote a class to detect the faulty files. Some are easily detected
because they cause errors when read in by ZipInputStream (wrapped around
FileInputStream), but some don't cause those errors.

Any help would be appreciated. I cannot see any problem in my code, which I
have compared to a number of tutorials and they all seem to be quite close
to what I have.

Thanks!

Hal
Jul 17 '05 #1

"Hal Vaughan" <ha*@thresholddigital.com> wrote in message
news:re********************@comcast.com...
> I am writing out archive files using ZipOutputStream with the following
> code:
<snip>


Show us the code that creates the entries. You have byte[] containing the
data. Were these by any chance created by writing to a
ByteArrayOutputStream which you did not close/flush before taking out the
bytes?

Sounds like a flushing problem to me.

Regards,

Silvio Bierman
Jul 17 '05 #2
Silvio Bierman wrote:

> "Hal Vaughan" <ha*@thresholddigital.com> wrote in message
> news:re********************@comcast.com...
> > I am writing out archive files using ZipOutputStream with the following
> > code:
<snip>
> Show us the code that creates the entries. You have byte[] containing the
> data. Were these by any chance created by writing to a
> ByteArrayOutputStream which you did not close/flush before taking out the
> bytes?
>
> Sounds like a flushing problem to me.


I think it's some kind of flushing problem, too. But the entries are all
intact -- OpenOffice has no problem recreating the entire file, but it
seems to be the Central Directory and the Central Directory Header
(especially the latter) that are messed up. The entries are created when an
original OpenOffice document is loaded:

public void loadZip(String s) {
    int i = 0;
    byte[] bIn = new byte[4096], bData;
    LinkedList llEntry = new LinkedList();
    inFile = s;
    outFile = s;
    try {
        ByteArrayOutputStream bBuffer;
        ZipInputStream zin =
            new ZipInputStream(new FileInputStream(inFile));
        ZipEntry entry;
        while ((entry = zin.getNextEntry()) != null) {
            bBuffer = new ByteArrayOutputStream();
            llEntry.add(entry.getName());
            bData = new byte[0];
            while ((i = zin.read(bIn, 0, 4096)) != -1) {
                bBuffer.write(bIn, 0, i);
            }
            bData = bBuffer.toByteArray();
            llData.add(bData);
        }
    } catch (Exception e) {
        // My own error logging program here
    }
    aEntry = TDUtil.linkedListToArray(llEntry);
    // This takes the names of all the entries and stores them in aEntry, which
    // is used in the output routine to create entries.
}

When I have all the entries loaded, I edit one, the one that contains all
the text content of the document, then replace the original content with
the new content. To do that, I convert this entry from byte[] to String:

//iEntry is the entry number and index for the linked list
String sData = new String((byte[]) llData.get(iEntry));

And when I'm done editing the string, I put it back into the llData
LinkedList of byte[]'s like this:

llData.set(iEntry, sData.getBytes());

This is the only thing I change in any entries. It is part of a loop. I
load in the original file one time, then I go through a loop, editing the
content entry, replacing it in the LinkedList, then saving a file with the
new entry, and repeating from the editing step on. Each time I save, I'm
calling the output method again, so the output streams should be new and
"clean." (If there's any reason why they aren't, I'd like to know, and
also know what I can do to prevent it.)

The strange thing is that this is working perfectly on 3 boxen, all on
different operating systems, and is only messing up on 1 box, which is on
the same OS as one of the functioning boxen -- and it doesn't do it all
the time; it only messes up some files.

I'm re-writing the 2 classes this affects to do the following:

1) The Zip input will load it into a byte[] first, to check the contents
first and make sure it's correct, THEN I'll read in the Zip file using a
ByteArrayInputStream with the ZipInputStream wrapped around it. I'm
improving error checking, so if the Zip isn't loaded properly, I'll re-try
and, after several failures, move on to the next one (I don't think this
will make a difference -- I'm getting a number of good files generated from
each one I load and only a few bad ones).

2) Just out of paranoia, and to make sure everything is "fresh", once I load
in the first archive, I will not change it, but I'll copy all the entries
to a new object, THEN change the content and write out the NEW, CLONED zip
instead. This way I won't be taking one Zip file and changing an entry a
number of times and writing each new file to disk.

3) When I create the output Zip file, I'm going to write it to a byte[]
first (wrapping a ZipOutputStream around a ByteArrayOutputstream), and
verify that the Central Directory and Central Directory Header are intact
and that all entries can be read BEFORE I write the verified byte[] to
disk. (If I still have problems and the byte[] is testing as good, I'll
even load the new file back in and double check it -- but I'll set flags so
that will only happen on systems that have had problems with this.)
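Step 3 might look roughly like this -- a sketch only, with the names/data arrays as hypothetical stand-ins for aEntry and llData: build the archive into a ByteArrayOutputStream, read every entry back with ZipInputStream, and write the bytes to disk only if they all verify.

```java
import java.io.*;
import java.util.zip.*;

public class VerifiedZipWriter {
    public static void writeVerified(File outFile, String[] names, byte[][] data)
            throws IOException {
        // Build the archive entirely in memory.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ZipOutputStream zout = new ZipOutputStream(buf);
        for (int i = 0; i < names.length; i++) {
            zout.putNextEntry(new ZipEntry(names[i]));
            zout.write(data[i], 0, data[i].length);
            zout.closeEntry();
        }
        zout.close();
        byte[] zipBytes = buf.toByteArray();

        // Verification pass: every entry must read back cleanly.
        ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes));
        byte[] tmp = new byte[4096];
        int readable = 0;
        while (zin.getNextEntry() != null) {
            while (zin.read(tmp, 0, tmp.length) != -1) { /* drain the entry */ }
            readable++;
        }
        zin.close();
        if (readable != names.length) {
            throw new IOException("only " + readable + " of " + names.length
                    + " entries verified");
        }

        // Only verified bytes reach the disk.
        FileOutputStream fos = new FileOutputStream(outFile);
        fos.write(zipBytes);
        fos.close();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("verified", ".zip");
        writeVerified(f, new String[] { "a.txt", "b.txt" },
                new byte[][] { "alpha".getBytes("UTF-8"), "beta".getBytes("UTF-8") });
        System.out.println("ok=" + (f.length() > 0));
        f.delete();
    }
}
```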

I don't think I've ever been so stumped as I am on this problem, so any help
is greatly appreciated and I'll be glad to provide more information.

Thanks!

Hal
Jul 17 '05 #3

"Hal Vaughan" <ha*@thresholddigital.com> wrote in message
news:re********************@comcast.com...
Silvio Bierman wrote:

"Hal Vaughan" <ha*@thresholddigital.com> wrote in message
news:re********************@comcast.com...
I am writing out archive files using ZipOutputStream with the following
code:

On second thought I realized that a BAOS#flush is a noop and looking at your
code I doubt that it has anything to do with flushing.

When I have all the entries loaded, I edit one, the one that contains all
the text content of the document, then replace the original content with
the new content. To do that, I convert this entry from byte[] to String:

//iEntry is the entry number and index for the linked list
String sData = new String((byte[]) llData.get(iEntry));

And when I'm done editing the string, I put it back into the llData
LinkedList of byte[]'s like this:

llData.set(iEntry, sData.getBytes());

<snip>

Have you tried leaving out this step, effectively creating a duplicate of
the original ZIP? My guess is that that will work...
> The strange thing is that this is working perfectly on 3 boxen, all on
> different operating systems, and is only messing up on 1 box, which is on
> the same OS as one of the other function boxen -- and it doesn't do it all
> the time, it only messes up some files.


Could you look at the language/locale settings of the machines? You use
byte[]->String and String->byte[] conversions without an explicit encoding,
which will default to a system-dependent encoding. Try leaving out the
modification pass and if that works, figure out what the default encoding is
for the machines where it does work and use that encoding explicitly in your
code. It might just work on all machines then...
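The fix being described here is simply passing an explicit charset name to both conversions. A small sketch, assuming UTF-8 (OpenOffice content is XML and typically UTF-8, but confirm for your own files):

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class EncodingRoundTrip {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Bytes containing a non-ASCII character ("caf" + e-acute, UTF-8 encoded).
        byte[] original = { 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9 };

        // Explicit charset: identical result on every machine, whatever the locale.
        String sData = new String(original, "UTF-8");
        byte[] back = sData.getBytes("UTF-8");

        System.out.println("round-trip intact: " + Arrays.equals(original, back));
        // The no-argument String(byte[]) and getBytes() use the platform default
        // encoding, so the same round trip can corrupt these bytes on, say, an
        // ISO-8859-1 or windows-1252 machine.
    }
}
```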

Good luck,

Silvio Bierman
Jul 17 '05 #4
Silvio Bierman wrote:

> "Hal Vaughan" <ha*@thresholddigital.com> wrote in message
> news:re********************@comcast.com...
<snip>
> Have you tried leaving out this step, effectively creating a duplicate of
> the original ZIP? My guess is that that will work...


I'm working on something close to it. The new version will take the
original Zip and clone it, so I'll only be modifying duplicates.
> The strange thing is that this is working perfectly on 3 boxen, all on
> different operating systems, and is only messing up on 1 box, which is on
> the same OS as one of the other function boxen -- and it doesn't do it
> all the time, it only messes up some files.


> Could you look at the language/locale settings of the machines? You use
> byte[]->String and String->byte[] conversions without an explicit
> encoding, which will default to a system-dependent encoding. Try leaving
> out the modification pass and if that works, figure out what the default
> encoding is for the machines where it does work and use that encoding
> explicitly in your code. It might just work on all machines then...


That sound you hear is me smacking my forehead and saying, "D'oh!" I
remember, when I was first learning Java and finding cases of specifying
the encoding types, thinking, "This system will only be used within this
state -- that should never be a problem. Bag it." The system with the
problem is near Washington, D.C., and it is very possible it's been used by
people who have changed the localization settings, and possibly when they
were reset, they weren't reset to what I expected.

I'll check on that -- as well as all the other things I mentioned.

I created a couple short term fixes. I wrote a class that would go through
the directory, find all the files that Java couldn't read in as Zips, and
deleted them. That got rid of most of the problems. Then I upgraded the
fix so it not only tried to read them in, but found the start of the
Central Directory Header, got the address of the Central Directory, made
sure the CDH was the correct length, and that the CD was where it should
be. Less than 1% of the files were corrupted in a manner I could detect,
but that was enough to clog up the system. Now they're automatically
removed. That makes it work, but it means a loss of data, so I'm still
trying to make sure the Zip files are written out correctly.
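A structural check like the one described can be done by scanning backward for the End of Central Directory signature, the four bytes PK\5\6 (0x06054b50 little-endian), which every intact archive carries near its end. A simplified sketch (it ignores the corner case where those bytes happen to appear inside an archive comment):

```java
import java.io.*;
import java.util.zip.*;

public class EocdCheck {
    // True if the byte array contains the End of Central Directory
    // signature (PK\5\6). The EOCD record itself is at least 22 bytes.
    public static boolean hasEocd(byte[] zip) {
        for (int i = zip.length - 22; i >= 0; i--) {
            if ((zip[i] & 0xFF) == 0x50 && (zip[i + 1] & 0xFF) == 0x4B
                    && (zip[i + 2] & 0xFF) == 0x05 && (zip[i + 3] & 0xFF) == 0x06) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny valid zip in memory.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ZipOutputStream zout = new ZipOutputStream(buf);
        zout.putNextEntry(new ZipEntry("x.txt"));
        zout.write("hi".getBytes("UTF-8"));
        zout.closeEntry();
        zout.close();
        byte[] good = buf.toByteArray();

        // Truncate the last 22 bytes to simulate a file cut off before
        // the EOCD record, as seen in the corrupted files.
        byte[] truncated = new byte[good.length - 22];
        System.arraycopy(good, 0, truncated, 0, truncated.length);

        System.out.println(hasEocd(good) + " " + hasEocd(truncated));
    }
}
```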

Thanks for the suggestions and ideas. I'll post what happens next week,
when I get results.

Hal
Jul 17 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

