Bytes | Developer Community

ZipOutputStream Writes Corrupt Zip Files

I am writing out archive files using ZipOutputStream with the following
code:

aEntry is a global array of ZipEntries
llData is a LinkedList of the data corresponding to the ZipEntry at the
same index in aEntry

public void save() {
    byte[] bBuffer = null;
    int i = 0;
    String sName = "";
    ZipEntry entry;
    try {
        ZipOutputStream zon = new ZipOutputStream(new FileOutputStream(outFile));
        for (i = 0; i < aEntry.length; i++) {
            entry = new ZipEntry(aEntry[i]);
            zon.putNextEntry(entry);
            bBuffer = (byte[]) llData.get(i);
            zon.write(bBuffer, 0, bBuffer.length);
        }
        zon.flush();
        zon.close();
    } catch (Exception e) {
        // My logging routine to report errors
        sysConfig.log("error", "Could not process archive file: " + inFile + ", Error: " + e);
    }
}
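For reference, here is a defensively written sketch of the same loop (the names mirror the code above, but the method signature, the closeEntry() calls, and the finally block are my additions, not a claim about where the bug is). java.util.zip writes the Central Directory when the stream is finished or closed, so a ZipOutputStream leaked on an exception produces exactly this kind of archive: intact entries, missing or truncated directory at the end.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipSaver {
    // Sketch: names and payloads stand in for the aEntry/llData fields above.
    public static void save(String outFile, String[] names, List<byte[]> data)
            throws IOException {
        ZipOutputStream zon = new ZipOutputStream(new FileOutputStream(outFile));
        try {
            for (int i = 0; i < names.length; i++) {
                zon.putNextEntry(new ZipEntry(names[i]));
                byte[] b = data.get(i);
                zon.write(b, 0, b.length);
                zon.closeEntry(); // explicit; close() would also finish the entry
            }
        } finally {
            // close() calls finish(), which writes the Central Directory,
            // so the stream must be closed even on the error path.
            zon.close();
        }
    }
}
```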

Command line instances of zip (on both Linux and Windows) have reported some
of the files created with this method as corrupt. It seems all the entries
are intact, but an inspection with a hex editor shows that some files don't
have the Central Directory Header at the end of the file, and in some
files, the Central Directory Header carries the wrong information -- it
doesn't point to the actual Central Directory and may be too long.

The archive files I'm writing are for OpenOffice, which stores its files as
multiple files in an archive format. OpenOffice can open the files, but it
sees them as corrupt and has to fix them. It recovers the data, and can
save them to a new file. This is what convinces me that it's only the
Central Directory Header (and possibly the Central Directory) that are
messed up.

I wrote a class to detect the faulty files. Some are easily detected
because they cause errors when read in by ZipInputStream (wrapped around
FileInputStream), but some don't cause those errors.
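A sketch of such a detector, assuming java.util.zip.ZipFile is acceptable here (the class name is mine): unlike ZipInputStream, which parses only the local file headers as it streams and can therefore read every entry even when the directory at the end is damaged, ZipFile opens the archive through the Central Directory, so it is the stricter check.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipValidator {
    // Stricter check than ZipInputStream: ZipFile opens the archive via the
    // Central Directory at the end of the file, so damage there is detected.
    public static boolean isReadable(String path) {
        try (ZipFile zf = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> en = zf.entries();
            while (en.hasMoreElements()) {
                en.nextElement(); // walk every entry the directory lists
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```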

Any help would be appreciated. I cannot see any problem in my code, which I
have compared to a number of tutorials and they all seem to be quite close
to what I have.

Thanks!

Hal
Jul 17 '05 #1
4 replies, 18407 views

"Hal Vaughan" <ha*@thresholddigital.com> wrote in message
news:re********************@comcast.com...

[original post snipped]


Show us the code that creates the entries. You have byte[] containing the
data. Were these by any chance created by writing to a
ByteArrayOutputStream which you did not close/flush before taking out the
bytes?

Sounds like a flushing problem to me.

Regards,

Silvio Bierman
Jul 17 '05 #2
Silvio Bierman wrote:

"Hal Vaughan" <ha*@thresholddigital.com> wrote in message
news:re********************@comcast.com...

[original post snipped]

Show us the code that creates the entries. You have byte[] containing the
data. Were these by any chance created by writing to a
ByteArrayOutputStream which you did not close/flush before taking out the
bytes?

Sounds like a flushing problem to me.

I think it's some kind of flushing problem, too. But the entries are all
intact -- OpenOffice has no problem recreating the entire file -- so it
seems to be the Central Directory and the Central Directory Header
(especially the latter) that are messed up. The entries are created when an
original OpenOffice document is loaded:

public void loadZip(String s) {
    int i = 0;
    byte[] bIn = new byte[4096], bData;
    LinkedList llEntry = new LinkedList();
    inFile = s;
    outFile = s;
    try {
        ByteArrayOutputStream bBuffer;
        ZipInputStream zin =
            new ZipInputStream(new FileInputStream(inFile));
        ZipEntry entry;
        while ((entry = zin.getNextEntry()) != null) {
            bBuffer = new ByteArrayOutputStream();
            llEntry.add(entry.getName());
            bData = new byte[0];
            while ((i = zin.read(bIn, 0, 4096)) != -1) {
                bBuffer.write(bIn, 0, i);
            }
            bData = bBuffer.toByteArray();
            llData.add(bData);
        }
    } catch (Exception e) {
        // My own error logging routine here
    }
    aEntry = TDUtil.linkedListToArray(llEntry);
    // This takes the names of all the entries and stores them in aEntry,
    // which is used in the output routine to create entries.
}

When I have all the entries loaded, I edit one, the one that contains all
the text content of the document, then replace the original content with
the new content. To do that, I convert this entry from byte[] to String:

//iEntry is the entry number and index for the linked list
String sData = new String((byte[]) llData.get(iEntry));

And when I'm done editing the string, I put it back into the llData
LinkedList of byte[]'s like this:

llData.set(iEntry, sData.getBytes());

This is the only thing I change in any entries. It is part of a loop. I
load in the original file one time, then I go through a loop, editing the
content entry, replacing it in the LinkedList, then saving a file with the
new entry, and repeating from the editing step on. Each time I save, I'm
calling the output method again, so the output streams should be new and
"clean." (If there's any reason why they aren't, I'd like to know, and
also know what I can do to prevent it.)

The strange thing is that this is working perfectly on 3 boxen, all on
different operating systems, and is only messing up on 1 box, which is on
the same OS as one of the other functioning boxen -- and it doesn't do it
all the time, it only messes up some files.

I'm re-writing the 2 classes this affects to do the following:

1) The Zip input will load it into a byte[] first, to check the contents
and make sure they're correct, THEN I'll read in the Zip file using a
ByteArrayInputStream with the ZipInputStream wrapped around it. I'm
improving error checking, so if the Zip isn't loaded properly, I'll retry
and, after several failures, move on to the next one (I don't think this
will make a difference -- I'm getting a number of good files generated from
each one I load and only a few bad ones).

2) Just out of paranoia, and to make sure everything is "fresh", once I load
in the first archive, I will not change it, but I'll copy all the entries
to a new object, THEN change the content and write out the NEW, CLONED zip
instead. This way I won't be taking one Zip file and changing an entry a
number of times and writing each new file to disk.

3) When I create the output Zip file, I'm going to write it to a byte[]
first (wrapping a ZipOutputStream around a ByteArrayOutputStream), and
verify that the Central Directory and Central Directory Header are intact
and that all entries can be read BEFORE I write the verified byte[] to
disk. (If I still have problems and the byte[] is testing as good, I'll
even load the new file back in and double check it -- but I'll set flags so
that will only happen on systems that have had problems with this.)
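Step 3 above can be sketched roughly like this (class and method names are mine; the verification pass re-reads the in-memory archive with ZipInputStream before anything touches disk):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class VerifiedZipWriter {
    // Build the archive in memory, check it can be read back, then write it out.
    public static void saveVerified(String outFile, String[] names, List<byte[]> data)
            throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zon = new ZipOutputStream(bos)) {
            for (int i = 0; i < names.length; i++) {
                zon.putNextEntry(new ZipEntry(names[i]));
                zon.write(data.get(i));
                zon.closeEntry();
            }
        }
        byte[] archive = bos.toByteArray();

        // Verify: every entry must be readable from the in-memory copy.
        int seen = 0;
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
            while (zin.getNextEntry() != null) {
                seen++;
            }
        }
        if (seen != names.length) {
            throw new IOException("archive failed verification: " + seen + " entries");
        }
        try (FileOutputStream fos = new FileOutputStream(outFile)) {
            fos.write(archive);
        }
    }
}
```

One caveat on this sketch: ZipInputStream only exercises the local headers, so for a check that also reads the Central Directory, write the bytes to a temporary file and open it with java.util.zip.ZipFile before renaming it into place.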

I don't think I've ever been so stumped as I am on this problem, so any help
is greatly appreciated and I'll be glad to provide more information.

Thanks!

Hal
Jul 17 '05 #3

"Hal Vaughan" <ha*@thresholddigital.com> wrote in message
news:re********************@comcast.com...

[quoted text snipped]
On second thought I realized that a BAOS#flush is a noop, and looking at your
code I doubt that it has anything to do with flushing.

> This is the only thing I change in any entries. It is part of a loop. I
> load in the original file one time, then I go through a loop, editing the
> content entry, replacing it in the LinkedList, then saving a file with the
> new entry, and repeating from the editing step on.

[rest of quoted text snipped]

Have you tried leaving out this step, effectively creating a duplicate of
the original ZIP? My guess is that that will work...
The strange thing is that this is working perfectly on 3 boxen, all on
different operating systems, and is only messing up on 1 box, which is on
the same OS as one of the other function boxen -- and it doesn't do it all
the time, it only messes up some files.


Could you look at the language/locale settings of the machines? You use
byte[]->String and String->byte[] conversions without an explicit encoding,
which will default to a system-dependent encoding. Try leaving out the
modification pass and if that works, figure out what the default encoding is
for the machines where it does work and use that encoding explicitly in your
code. It might just work on all machines then...
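That suggestion in code form (a sketch only: the class name is hypothetical, and UTF-8 is my assumption, chosen because OpenOffice's XML content is UTF-8 by default; use whatever encoding the entry actually contains):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class FixedEncoding {
    // Mirrors the llData edit step from the posts above, but with the charset
    // pinned so the bytes written back no longer depend on the platform
    // default encoding used by new String(byte[]) and String.getBytes().
    public static void replaceEntry(List<byte[]> llData, int iEntry, String newContent) {
        String sData = new String(llData.get(iEntry), StandardCharsets.UTF_8);
        // ... in the real program, sData would be edited to produce newContent ...
        llData.set(iEntry, newContent.getBytes(StandardCharsets.UTF_8));
    }
}
```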

Good luck,

Silvio Bierman
Jul 17 '05 #4
Silvio Bierman wrote:

"Hal Vaughan" <ha*@thresholddigital.com> wrote in message
news:re********************@comcast.com...

[quoted text snipped]

Have you tried leaving out this step, effectively creating a duplicate of
the original ZIP? My guess is that that will work...


I'm working on something close to it. The new version will take the
original Zip and clone it, so I'll only be modifying duplicates.
[quoted text snipped]

Could you look at the language/locale settings of the machines? You use
byte[]->String and String->byte[] conversions without an explicit
encoding, which will default to a system-dependent encoding. Try leaving
out the modification pass and if that works, figure out what the default
encoding is for the machines where it does work and use that encoding
explicitly in your code. It might just work on all machines then...


That sound you hear is me smacking my forehead and saying, "D'oh". I
remember, when I was first learning Java and finding cases of specifying
the encoding types, thinking, "This system will only be used within this
state -- that should never be a problem. Bag it." The system with the
problem is near Washington, D.C., and it's very possible it's been used by
people who have changed the localization settings, and that when they were
reset, they weren't reset to what I expected.

I'll check on that -- as well as all the other things I mentioned.

I created a couple of short-term fixes. I wrote a class that would go
through the directory, find all the files that Java couldn't read in as
Zips, and delete them. That got rid of most of the problems. Then I
upgraded the fix so it not only tried to read them in, but found the start
of the Central Directory Header, got the address of the Central Directory,
made sure the CDH was the correct length, and that the CD was where it
should be. Less than 1% of the files were corrupted in a manner I could
detect, but that was enough to clog up the system. Now they're
automatically removed. That makes it work, but it means a loss of data, so
I'm still trying to make sure the Zip files are written out correctly.
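The record this check looks for is what the ZIP format calls the End of Central Directory record: it begins with the signature PK\x05\x06 (0x06054b50), is 22 bytes long when the archive has no trailing comment, and stores the offset of the Central Directory in the 4 little-endian bytes at position 16 within the record. A sketch of locating it (class name is mine):

```java
public class EocdScanner {
    // Scan backwards for the End of Central Directory signature PK\x05\x06
    // and return the Central Directory offset it records, or -1 if absent.
    public static long centralDirectoryOffset(byte[] zip) {
        for (int i = zip.length - 22; i >= 0; i--) {
            if (zip[i] == 0x50 && zip[i + 1] == 0x4b
                    && zip[i + 2] == 0x05 && zip[i + 3] == 0x06) {
                // the offset field is 4 little-endian bytes at record offset 16
                return (zip[i + 16] & 0xFFL)
                        | (zip[i + 17] & 0xFFL) << 8
                        | (zip[i + 18] & 0xFFL) << 16
                        | (zip[i + 19] & 0xFFL) << 24;
            }
        }
        return -1;
    }
}
```

A consistency check like the one described above would then confirm that the bytes at the returned offset start with the Central Directory file header signature, PK\x01\x02.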

Thanks for the suggestions and ideas. I'll post what happens next week,
when I get results.

Hal
Jul 17 '05 #5

This discussion thread is closed; replies have been disabled.