Bytes IT Community

Adding compression

Hello,
I want to add compression to a memory stream and save it in an Oracle
database. This is the code I have so far:

//save the Word document to a binary field,
MemoryStream dataStream = new MemoryStream();
doc.Save(dataStream, SaveFormat.Doc);

//now compress it
GZipStream compressedZipStream = new GZipStream(dataStream,
CompressionMode.Compress);

//now store to document attachment
row["DOCUMENT"] = compressedZipStream. <--------- How can I dump all the bytes here?

I need help with the fourth line. I was using stream.ToArray() to make
the assignment but that is not available for the compressedZipStream.

How can I store the compressed zip stream in the row?

tia,
chance.

Apr 4 '07 #1
20 Replies


On Apr 4, 2:01 pm, "chance" <cha...@crwmail.com> wrote:
Hello,
I want to add compression to a memory stream and save it in an Oracle
database. This is the code I have so far:

//save the Word document to a binary field,
MemoryStream dataStream = new MemoryStream();
doc.Save(dataStream, SaveFormat.Doc);

//now compress it
GZipStream compressedZipStream = new GZipStream(dataStream,
CompressionMode.Compress);

//now store to document attachment
row["DOCUMENT"] = compressedZipStream. <--------- How can I dump all the bytes here?

I need help with the fourth line. I was using stream.ToArray() to make
the assignment but that is not available for the compressedZipStream.

How can I store the compressed zip stream in the row?
You're not ordering things correctly. You want to set the GZipStream
up to write into the MemoryStream, then tell your document to save
into the GZipStream, then close both streams, *then* call ToArray.
Currently you're not compressing anything.
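
A minimal sketch of that ordering, using a byte[] stand-in for the document bytes (doc.Save comes from the poster's document library, so it isn't reproduced here):

```csharp
using System;
using System.IO;
using System.IO.Compression;

class CompressOrdering
{
    static void Main()
    {
        // Stand-in for the saved Word document; in the original code this
        // would be produced by doc.Save(...).
        byte[] docBytes = new byte[8192];

        MemoryStream dataStream = new MemoryStream();
        GZipStream gzip = new GZipStream(dataStream, CompressionMode.Compress);

        // Write INTO the GZipStream; it compresses into dataStream behind the scenes.
        gzip.Write(docBytes, 0, docBytes.Length);

        // Closing flushes the final compressed block (and closes dataStream too).
        gzip.Close();

        // ToArray is still legal on a closed MemoryStream.
        byte[] compressed = dataStream.ToArray();
        Console.WriteLine(compressed.Length > 0 && compressed.Length < docBytes.Length);
    }
}
```

The key point is the direction of the wrapping: the GZipStream sits in front of the MemoryStream, not behind it.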

Jon

Apr 4 '07 #2

What call actually does the compression?

On Apr 4, 8:23 am, "Jon Skeet [C# MVP]" <s...@pobox.com> wrote:
On Apr 4, 2:01 pm, "chance" <cha...@crwmail.com> wrote:


Hello,
I want to add compression to a memory stream and save it in an Oracle
database. This is the code I have so far:
//save the Word document to a binary field,
MemoryStream dataStream = new MemoryStream();
doc.Save(dataStream, SaveFormat.Doc);
//now compress it
GZipStream compressedZipStream = new GZipStream(dataStream,
CompressionMode.Compress);
//now store to document attachment
row["DOCUMENT"] = compressedZipStream. <--------- How can I dump all the bytes here?
I need help with the fourth line. I was using stream.ToArray() to make
the assignment but that is not available for the compressedZipStream.
How can I store the compressed zip stream in the row?

You're not ordering things correctly. You want to set the GZipStream
up to write into the MemoryStream, then tell your document to save
into the GZipStream, then close both streams, *then* call ToArray.
Currently you're not compressing anything.

Jon

Apr 4 '07 #3

On Apr 4, 4:42 pm, "chance" <cha...@crwmail.com> wrote:
What call actually does the compression?
When you write to a GZipStream, it writes the compressed data (after
buffering etc) to the stream you give it in the constructor. The
compression effectively happens behind the scenes, without you ever
having to say "compress now". You do, however, have to close the
stream so it can write the final buffered data out.
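
A small illustration of that last point (the names here are just for the example): the buffered remainder and the gzip trailer only reach the underlying stream when the GZipStream is closed:

```csharp
using System;
using System.IO;
using System.IO.Compression;

class CloseFlushes
{
    static void Main()
    {
        MemoryStream target = new MemoryStream();
        GZipStream gzip = new GZipStream(target, CompressionMode.Compress);

        gzip.Write(new byte[4096], 0, 4096);
        long beforeClose = target.Length;  // final block and trailer not yet written

        gzip.Close();
        long afterClose = target.Length;   // Close wrote out the buffered remainder

        Console.WriteLine(afterClose > beforeClose);
    }
}
```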

Jon

Apr 4 '07 #4

Can you show an example where we try and compress a file called c:\report.doc and then store it in a row on a table?

thanks.

On Apr 4, 10:52 am, "Jon Skeet [C# MVP]" <s...@pobox.com> wrote:
On Apr 4, 4:42 pm, "chance" <cha...@crwmail.com> wrote:
What call actually does the compression?

When you write to a GZipStream, it writes the compressed data (after
buffering etc) to the stream you give it in the constructor. The
compression effectively happens behind the scenes, without you ever
having to say "compress now". You do, however, have to close the
stream so it can write the final buffered data out.

Jon

Apr 4 '07 #5

chance <ch****@crwmail.com> wrote:
Can you show an example where we try and compress a file called c:\report.doc and then store it in a row on a table?
I'm afraid I haven't got the time, but your original code was very
close - just make the changes I suggested and it should be fine.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 4 '07 #6

Chance,

There is an example here:
http://msdn2.microsoft.com/en-us/lib...tream.aspx

Dave

Apr 5 '07 #7

Chance,

The only other thing that I would add is that you should not stuff bytes
into the GZipStream one byte at a time. In my experience this has resulted
in almost NO compression. Try shoving 1K or 2K worth of data at a time into
the GZipStream till you get to the end of your file stream, and then a
partial buffer before closing the GZipStream. Just keep this fact
in mind.

Dave
Apr 5 '07 #8

D. Yates <fo****@hotmail.com> wrote:
The only other thing that I would add is that you should not stuff bytes
into the GZipStream one byte at a time. In my experience this has resulted
in almost NO compression. Try shoving 1K or 2K worth of data at a time into
the GZipStream till you get to the end of your file stream, and then a
partial buffer before closing the GZipStream. Just keep this fact
in mind.
It shouldn't make any difference - I would expect GZipStream to buffer
things up appropriately. One of the points of a stream is that it
shouldn't normally make a difference (other than performance) how you
put the data in - you should get the same data out.

I can write a test program for this if you're really sure you've seen
it make a difference, but as I say it shouldn't.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 5 '07 #10

Jon,

I zipped a 3,760Kb firewall log text file and it only compresses to 3750Kb
using code like this:

private void Compress_Click(object sender, EventArgs e)
{
using(FileStream oldFile = File.OpenRead("Test.log"))
using(FileStream newFile = File.Create("Test.gz"))
using(GZipStream compression = new GZipStream(newFile,
CompressionMode.Compress))
{
int data = oldFile.ReadByte();
while(data != -1)
{
compression.WriteByte((byte) data);
data = oldFile.ReadByte();
}

compression.Close();
}
}
However, I can zip the 3,760Kb firewall log text file and it will compress
to 233KB using code like this:

private void Compress_Click(object sender, EventArgs e)
{
using (FileStream oldFile = File.OpenRead("Test.log"))
using (FileStream newFile = File.Create("Test.gz"))
using (GZipStream compression = new GZipStream(newFile,
CompressionMode.Compress))
{
byte[] buffer = new byte[1024];
int numberOfBytesRead = oldFile.Read(buffer, 0, buffer.Length);
while (numberOfBytesRead > 0)
{
compression.Write(buffer, 0, numberOfBytesRead);
numberOfBytesRead = oldFile.Read(buffer, 0, buffer.Length);
}

compression.Close();
}
}
Decompress works if I do it one byte at a time like this:
private void Decompress_Click(object sender, EventArgs e)
{
using(FileStream compressFile = File.Open("Test.gz", FileMode.Open))
using (FileStream uncompressedFile = File.Create("Test-gz.log"))
using (GZipStream compression = new GZipStream(compressFile,
CompressionMode.Decompress))
{
int data = compression.ReadByte();
while(data != -1)
{
uncompressedFile.WriteByte((byte) data);
data = compression.ReadByte();
}

compression.Close();
}
}
Dave
Apr 5 '07 #11

D. Yates <fo****@hotmail.com> wrote:
I zipped a 3,760Kb firewall log text file and it only compresses to 3750Kb
using code like this:
<snip>

Good grief. I view that as a significant flaw in the GZipStream class.
Fortunately it can also be fixed by wrapping a BufferedStream round it,
but I'm astonished that it doesn't perform appropriate buffering
itself.

I do apologise for doubting you - thanks for the simple sample code :)
(In my case I only took a 40K file, but it went down to 35K without
buffering and 5K when a BufferedStream was wrapped around the
GZipStream.)
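
A sketch of that fix (buffer size and test data are arbitrary here): a BufferedStream in front of the GZipStream turns byte-at-a-time writes into block writes, so the compressor sees real runs of data:

```csharp
using System;
using System.IO;
using System.IO.Compression;

class BufferedWrapper
{
    static void Main()
    {
        MemoryStream target = new MemoryStream();
        GZipStream gzip = new GZipStream(target, CompressionMode.Compress);
        // The BufferedStream accumulates single-byte writes into 4K blocks
        // before they reach the GZipStream.
        BufferedStream buffered = new BufferedStream(gzip, 4096);

        for (int i = 0; i < 100000; i++)
        {
            buffered.WriteByte((byte)(i % 10));
        }

        // Closing the BufferedStream flushes it and closes the GZipStream
        // and MemoryStream underneath.
        buffered.Close();

        Console.WriteLine(target.ToArray().Length < 100000);
    }
}
```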

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 5 '07 #12

Chance,

You might also want to read this:
http://www.madskristensen.dk/blog/Pe...64e924fa0.aspx

On Mads Kristensen's blog, he states that he tested GZipStream against
DeflateStream and that DeflateStream is 41% faster than GZipStream.

You might want to do your own tests as well.....

Dave
Apr 5 '07 #13

I can't even get a non-corrupt zip file. This is my code. What gives?
//compress it
MemoryStream uncompressedStream = new MemoryStream();
doc.Save(uncompressedStream, SaveFormat.Doc);

MemoryStream compressedStream = new MemoryStream();
GZipStream compressor = new GZipStream(compressedStream,
CompressionMode.Compress);

uncompressedStream.Position = 0;
uncompressedStream.WriteTo(compressor);

row["DOCUMENT"] = compressedStream.ToArray();
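
For reference, the snippet above never closes the compressor, so the trailing gzip block never reaches compressedStream. A sketch of a version that closes first (MemoryStream.ToArray can still be called after the stream is closed; random bytes stand in for the document here):

```csharp
using System;
using System.IO;
using System.IO.Compression;

class InMemoryRoundTrip
{
    static void Main()
    {
        byte[] original = new byte[2048];          // stand-in for the saved document
        new Random(42).NextBytes(original);

        MemoryStream compressedStream = new MemoryStream();
        GZipStream compressor = new GZipStream(compressedStream, CompressionMode.Compress);
        compressor.Write(original, 0, original.Length);
        compressor.Close();                        // flushes the final block and trailer

        byte[] payload = compressedStream.ToArray(); // legal after Close; this is what
                                                     // would go into row["DOCUMENT"]

        // Prove the payload is a valid gzip stream by decompressing it again.
        GZipStream decompressor = new GZipStream(
            new MemoryStream(payload), CompressionMode.Decompress);
        MemoryStream restored = new MemoryStream();
        byte[] buffer = new byte[1024];
        int read;
        while ((read = decompressor.Read(buffer, 0, buffer.Length)) > 0)
        {
            restored.Write(buffer, 0, read);
        }

        Console.WriteLine(restored.Length == original.Length);
    }
}
```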


On Apr 5, 3:44 pm, "D. Yates" <foe...@hotmail.com> wrote:
Chance,

You might also want to read this: http://www.madskristensen.dk/blog/Pe...0e-9ab8-422e-a...

On Mads Kristensen's blog, he states that he tested GZipStream against
DeflateStream and that DeflateStream is 41% faster than GZipStream.

You might want to do your own tests as well.....

Dave

Apr 6 '07 #14

Chance,

You are going to have to create a compressed version of the file on disk,
load the compressed version and then stream it to the database. If you try
to compress the file directly to a memory stream it will not work because the
compression stream will CLOSE the memory stream when it is disposed/closed.

Sooo... use the example given earlier (maybe with DeflateStream instead of
GZipStream) to compress the document on disk, then load up the
compressed document and send it to the database. Afterwards, you can delete
the compressed disk file and you are good to go.

Dave
PS - You should use the GZipStream to do the writing since it holds a
reference to the destination stream and it compresses the data as it writes.
Look to the examples posted earlier for more information.

Apr 6 '07 #15

Jon,

I'm interested in why you would use a BufferedStream for reading data in and
then writing data back to a file? I can see its benefits if you don't know
how much data is coming down the pipe (the MSDN example uses a NetworkStream
with sockets...I get that...) and you want to gradually feed data into the
BufferedStream till it hits its preset size limit and then flushes data, but
in a case like this are there any advantages?

Dave

Apr 6 '07 #16

D. Yates <fo****@hotmail.com> wrote:
I'm interested in why you would use a BufferedStream for reading data in and
then writing data back to a file? I can see it benefits if you don't know
how much data is coming down the pipe (the MSDN example uses a NetworkStream
with sockets...I get that...) and you want to gradually feed data into the
BufferedStream till it hits its preset size limit and then flushes data, but
in a case like this are there any advantages?
By wrapping a BufferedStream round the GZipStream, if you *do* write in
small blocks the effect is mitigated by the buffering.

Nicer to read and write whole blocks at a time, of course. Indeed, I've
got code in my MiscUtil library to do exactly that, copying the
contents of one stream into another...
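
The copy idiom Jon describes can be sketched like this (CopyStream is a hypothetical name here, not necessarily the MiscUtil one):

```csharp
using System;
using System.IO;

class CopySketch
{
    // Read whole blocks from one stream and write them to another
    // until the source is exhausted.
    static void CopyStream(Stream input, Stream output)
    {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
    }

    static void Main()
    {
        MemoryStream source = new MemoryStream(new byte[] { 1, 2, 3, 4, 5 });
        MemoryStream dest = new MemoryStream();
        CopyStream(source, dest);
        Console.WriteLine(dest.Length); // prints 5
    }
}
```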

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 7 '07 #17

Jon,

Hey, I couldn't find any code that uses BufferedStream in the Miscellaneous
Utilities here: http://www.yoda.arachsys.com/csharp/miscutil/

The MiscUtil.IO.StreamUtil is mentioned in the contents section, but the
source download does not contain this class.

Dave

Apr 7 '07 #18

D. Yates <fo****@hotmail.com> wrote:
Hey, I couldn't find any code that uses BufferedStream in the Miscellaneous
Utilities here: http://www.yoda.arachsys.com/csharp/miscutil/
No, that doesn't use BufferedStream - but it provides a way of copying
the contents of one stream to another easily using a single buffer.
The MiscUtil.IO.StreamUtil is mentioned in the contents section, but the
source download does not contain this class.
Eek, does it not? It certainly should do!

<sfx: tappety tappety>

Hmm. Not sure where that all went wrong. Okay, I've got a unit test to
fix (from a bug reported by someone else) and then I'll upload a new
version. Thanks for pointing out the inconsistency!

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 7 '07 #19

Jon Skeet [C# MVP] <sk***@pobox.com> wrote:

<snip>
Hmm. Not sure where that all went wrong. Okay, I've got a unit test to
fix (from a bug reported by someone else) and then I'll upload a new
version. Thanks for pointing out the inconsistency!
Okay, found out what was wrong. I'd got all the code, but not committed
it to svn. My build process for the code that ends up on the website
fetches a clean copy from svn...

It's up there now.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 7 '07 #20

D. Yates wrote:
I'm interested in why you would use a BufferedStream for reading data in
and then writing data back to a file? I can see its benefits if you
don't know how much data is coming down the pipe (the MSDN example uses
a NetworkStream with sockets...I get that...) and you want to gradually
feed data into the BufferedStream till it hits its preset size limit and
then flushes data, but in a case like this are there any advantages?
A few comments.

I have not seen anyone explain why the buffering affects
the compression.

GZip uses DEFLATE, which is a combination of LZ77 and Huffman coding.

If you compress 1 byte at a time then that algorithm will
degenerate into a pure Huffman with 1/8 overhead added.

GZipStream does not override WriteByte, so we are actually
calling Stream.WriteByte, which just does:

byte[] buffer = new byte[] { value };
this.Write(buffer, 0, 1);

That is why this is happening.

Could it be fixed? Yes!

GZipStream could override WriteByte and use an internal
buffer.

Should it do that? I am not sure!

The actual bytes written would depend on the buffer size. I do
not think it is nice to have functionality depend on an internal
constant. OK - then we could make it a property in the class and
have a constructor with an argument.

But then I think it is just as simple to have the programmer
wrap with a BufferedStream.

Nobody should be using ReadByte and WriteByte on big files
anyway.

A note in the docs would definitely be nice though!

Arne
Apr 8 '07 #21
