Bytes IT Community

Adding compression

Hello,
I want to add compression to a memory stream and save it in an Oracle
database. This is the code I have so far:

//save the Word document to a binary field,
MemoryStream dataStream = new MemoryStream();
doc.Save(dataStream, SaveFormat.Doc);

//now compress it
GZipStream compressedZipStream = new GZipStream(dataStream,
CompressionMode.Compress);

//now store to document attachment
row["DOCUMENT"] = compressedZipStream. <--------- How can I dump all the bytes here?

I need help with the fourth line. I was using stream.ToArray() to make
the assignment but that is not available for the compressedZipStream.

How can I store the compressed zip stream in the row?

tia,
chance.

Apr 4 '07 #1
20 Replies


On Apr 4, 2:01 pm, "chance" <cha...@crwmail.com> wrote:
Hello,
I want to add compression to a memory stream and save it in an Oracle
database. This is the code I have so far:

//save the Word document to a binary field,
MemoryStream dataStream = new MemoryStream();
doc.Save(dataStream, SaveFormat.Doc);

//now compress it
GZipStream compressedZipStream = new GZipStream(dataStream,
CompressionMode.Compress);

//now store to document attachment
row["DOCUMENT"] = compressedZipStream. <--------- How can I dump all the bytes here?

I need help with the fourth line. I was using stream.ToArray() to make
the assignment but that is not available for the compressedZipStream.

How can I store the compressed zip stream in the row?
You're not ordering things correctly. You want to set the GZipStream
up to write into the MemoryStream, then tell your document to save
into the GZipStream, then close both streams, *then* call ToArray.
Currently you're not compressing anything.
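
A minimal sketch of that ordering, using a byte[] stand-in for the document bytes (doc.Save comes from the poster's document library, so it isn't reproduced here):

```csharp
using System;
using System.IO;
using System.IO.Compression;

class CompressOrdering
{
    static void Main()
    {
        // Stand-in for the saved Word document; in the original code this
        // would be produced by doc.Save(...).
        byte[] docBytes = new byte[8192];

        MemoryStream dataStream = new MemoryStream();
        GZipStream gzip = new GZipStream(dataStream, CompressionMode.Compress);

        // Write INTO the GZipStream; it compresses into dataStream behind the scenes.
        gzip.Write(docBytes, 0, docBytes.Length);

        // Closing flushes the final compressed block (and closes dataStream too).
        gzip.Close();

        // ToArray is still legal on a closed MemoryStream.
        byte[] compressed = dataStream.ToArray();
        Console.WriteLine(compressed.Length > 0 && compressed.Length < docBytes.Length);
    }
}
```

The key point is the direction of the wrapping: the GZipStream sits in front of the MemoryStream, not behind it.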

Jon

Apr 4 '07 #2

What call actually does the compression?

On Apr 4, 8:23 am, "Jon Skeet [C# MVP]" <s...@pobox.com> wrote:
On Apr 4, 2:01 pm, "chance" <cha...@crwmail.com> wrote:


Hello,
I want to add compression to a memory stream and save it in an Oracle
database. This is the code I have so far:
//save the Word document to a binary field,
MemoryStream dataStream = new MemoryStream();
doc.Save(dataStream, SaveFormat.Doc);
//now compress it
GZipStream compressedZipStream = new GZipStream(dataStream,
CompressionMode.Compress);
//now store to document attachment
row["DOCUMENT"] = compressedZipStream. <--------- How can I dump all the bytes here?
I need help with the fourth line. I was using stream.ToArray() to make
the assignment but that is not available for the compressedZipStream.
How can I store the compressed zip stream in the row?

You're not ordering things correctly. You want to set the GZipStream
up to write into the MemoryStream, then tell your document to save
into the GZipStream, then close both streams, *then* call ToArray.
Currently you're not compressing anything.

Jon

Apr 4 '07 #3

On Apr 4, 4:42 pm, "chance" <cha...@crwmail.com> wrote:
What call actually does the compression?
When you write to a GZipStream, it writes the compressed data (after
buffering etc) to the stream you give it in the constructor. The
compression effectively happens behind the scenes, without you ever
having to say "compress now". You do, however, have to close the
stream so it can write the final buffered data out.
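
A small illustration of that last point (the names here are just for the example): the buffered remainder and the gzip trailer only reach the underlying stream when the GZipStream is closed:

```csharp
using System;
using System.IO;
using System.IO.Compression;

class CloseFlushes
{
    static void Main()
    {
        MemoryStream target = new MemoryStream();
        GZipStream gzip = new GZipStream(target, CompressionMode.Compress);

        gzip.Write(new byte[4096], 0, 4096);
        long beforeClose = target.Length;  // final block and trailer not yet written

        gzip.Close();
        long afterClose = target.Length;   // Close wrote out the buffered remainder

        Console.WriteLine(afterClose > beforeClose);
    }
}
```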

Jon

Apr 4 '07 #4

Can you show an example where we try and compress a file called c:\report.doc and then store it in a row on a table?

thanks.

On Apr 4, 10:52 am, "Jon Skeet [C# MVP]" <s...@pobox.com> wrote:
On Apr 4, 4:42 pm, "chance" <cha...@crwmail.com> wrote:
What call actually does the compression?

When you write to a GZipStream, it writes the compressed data (after
buffering etc) to the stream you give it in the constructor. The
compression effectively happens behind the scenes, without you ever
having to say "compress now". You do, however, have to close the
stream so it can write the final buffered data out.

Jon

Apr 4 '07 #5

chance <ch****@crwmail.com> wrote:
Can you show an example where we try and compress a file called c:\report.doc and then store it in a row on a table?
I'm afraid I haven't got the time, but your original code was very
close - just make the changes I suggested and it should be fine.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 4 '07 #6

Chance,

There is an example here:
http://msdn2.microsoft.com/en-us/lib...tream.aspx

Dave

Apr 5 '07 #7

Chance,

The only other thing that I would add is that you should not stuff bytes
into the GZipStream one byte at a time. In my experience this has resulted
in almost NO compression. Try shoving 1K or 2K worth of data at a time into
the GZipStream till you get to the end of your file stream, and then a
partial buffer before closing the GZipStream. Just keep this fact
in mind.

Dave
Apr 5 '07 #8

D. Yates <fo****@hotmail.com> wrote:
The only other thing that I would add is that you should not stuff bytes
into the GZipStream one byte at a time. In my experience this has resulted
in almost NO compression. Try shoving 1K or 2K worth of data at a time into
the GZipStream till you get to the end of your file stream, and then a
partial buffer before closing the GZipStream. Just keep this fact
in mind.
It shouldn't make any difference - I would expect GZipStream to buffer
things up appropriately. One of the points of a stream is that it
shouldn't normally make a difference (other than performance) how you
put the data in - you should get the same data out.

I can write a test program for this if you're really sure you've seen
it make a difference, but as I say it shouldn't.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 5 '07 #10

Jon,

I zipped a 3,760Kb firewall log text file and it only compresses to 3750Kb
using code like this:

private void Compress_Click(object sender, EventArgs e)
{
using(FileStream oldFile = File.OpenRead("Test.log"))
using(FileStream newFile = File.Create("Test.gz"))
using(GZipStream compression = new GZipStream(newFile,
CompressionMode.Compress))
{
int data = oldFile.ReadByte();
while(data != -1)
{
compression.WriteByte((byte) data);
data = oldFile.ReadByte();
}

compression.Close();
}
}
However, I can zip the 3,760Kb firewall log text file and it will compress
to 233KB using code like this:

private void Compress_Click(object sender, EventArgs e)
{
using (FileStream oldFile = File.OpenRead("Test.log"))
using (FileStream newFile = File.Create("Test.gz"))
using (GZipStream compression = new GZipStream(newFile,
CompressionMode.Compress))
{
byte[] buffer = new byte[1024];
int numberOfBytesRead = oldFile.Read(buffer, 0, buffer.Length);
while (numberOfBytesRead > 0)
{
compression.Write(buffer, 0, numberOfBytesRead);
numberOfBytesRead = oldFile.Read(buffer, 0, buffer.Length);
}

compression.Close();
}
}
Decompress works if I do it one byte at a time like this:
private void Decompress_Click(object sender, EventArgs e)
{
using(FileStream compressFile = File.Open("Test.gz", FileMode.Open))
using (FileStream uncompressedFile = File.Create("Test-gz.log"))
using (GZipStream compression = new GZipStream(compressFile,
CompressionMode.Decompress))
{
int data = compression.ReadByte();
while(data != -1)
{
uncompressedFile.WriteByte((byte) data);
data = compression.ReadByte();
}

compression.Close();
}
}
Dave
Apr 5 '07 #11

D. Yates <fo****@hotmail.com> wrote:
I zipped a 3,760Kb firewall log text file and it only compresses to 3750Kb
using code like this:
<snip>

Good grief. I view that as a significant flaw in the GZipStream class.
Fortunately it can also be fixed by wrapping a BufferedStream round it,
but I'm astonished that it doesn't perform appropriate buffering
itself.

I do apologise for doubting you - thanks for the simple sample code :)
(In my case I only took a 40K file, but it went down to 35K without
buffering and 5K when a BufferedStream was wrapped around the
GZipStream.)
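
A sketch of that fix (buffer size and test data are arbitrary here): a BufferedStream in front of the GZipStream turns byte-at-a-time writes into block writes, so the compressor sees real runs of data:

```csharp
using System;
using System.IO;
using System.IO.Compression;

class BufferedWrapper
{
    static void Main()
    {
        MemoryStream target = new MemoryStream();
        GZipStream gzip = new GZipStream(target, CompressionMode.Compress);
        // The BufferedStream accumulates single-byte writes into 4K blocks
        // before they reach the GZipStream.
        BufferedStream buffered = new BufferedStream(gzip, 4096);

        for (int i = 0; i < 100000; i++)
        {
            buffered.WriteByte((byte)(i % 10));
        }

        // Closing the BufferedStream flushes it and closes the GZipStream
        // and MemoryStream underneath.
        buffered.Close();

        Console.WriteLine(target.ToArray().Length < 100000);
    }
}
```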

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 5 '07 #12

Chance,

You might also want to read this:
http://www.madskristensen.dk/blog/Pe...64e924fa0.aspx

On Mads Kristensen's blog, he states that he tested GZipStream against
DeflateStream and that DeflateStream is 41% faster than GZipStream.

You might want to do your own tests as well.....

Dave
Apr 5 '07 #13

I can't even get a non-corrupt zip file. This is my code. What gives?
//compress it
MemoryStream uncompressedStream = new MemoryStream();
doc.Save(uncompressedStream, SaveFormat.Doc);

MemoryStream compressedStream = new MemoryStream();
GZipStream compressor = new GZipStream(compressedStream,
CompressionMode.Compress);

uncompressedStream.Position = 0;
uncompressedStream.WriteTo(compressor);

row["DOCUMENT"] = compressedStream.ToArray();
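
For reference, the snippet above never closes the compressor, so the trailing gzip block never reaches compressedStream. A sketch of a version that closes first (MemoryStream.ToArray can still be called after the stream is closed; random bytes stand in for the document here):

```csharp
using System;
using System.IO;
using System.IO.Compression;

class InMemoryRoundTrip
{
    static void Main()
    {
        byte[] original = new byte[2048];          // stand-in for the saved document
        new Random(42).NextBytes(original);

        MemoryStream compressedStream = new MemoryStream();
        GZipStream compressor = new GZipStream(compressedStream, CompressionMode.Compress);
        compressor.Write(original, 0, original.Length);
        compressor.Close();                        // flushes the final block and trailer

        byte[] payload = compressedStream.ToArray(); // legal after Close; this is what
                                                     // would go into row["DOCUMENT"]

        // Prove the payload is a valid gzip stream by decompressing it again.
        GZipStream decompressor = new GZipStream(
            new MemoryStream(payload), CompressionMode.Decompress);
        MemoryStream restored = new MemoryStream();
        byte[] buffer = new byte[1024];
        int read;
        while ((read = decompressor.Read(buffer, 0, buffer.Length)) > 0)
        {
            restored.Write(buffer, 0, read);
        }

        Console.WriteLine(restored.Length == original.Length);
    }
}
```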


On Apr 5, 3:44 pm, "D. Yates" <foe...@hotmail.com> wrote:
Chance,

You might also want to read this: http://www.madskristensen.dk/blog/Pe...0e-9ab8-422e-a...

On Mads Kristensen's blog, he states that he tested GZipStream against
DeflateStream and that DeflateStream is 41% faster than GZipStream.

You might want to do your own tests as well.....

Dave

Apr 6 '07 #14

Chance,

You are going to have to create a compressed version of the file on disk,
load the compressed version and then stream it to the database. If you try
to compress the file directly to a memory stream it will not work because the
compression stream will CLOSE the memory stream when it is disposed/closed.

Sooo... use the example given earlier (maybe with DeflateStream instead of
GZipStream) to compress the document on disk, then load up the
compressed document and send it to the database. Afterwards, you can delete
the compressed disk file and you are good to go.

Dave
PS - You should use the GZipStream to do the writing since it holds a
reference to the destination stream and it compresses the data as it writes.
Look to the examples posted earlier for more information.

Apr 6 '07 #15

Jon,

I'm interested in why you would use a BufferedStream for reading data in and
then writing data back to a file? I can see its benefits if you don't know
how much data is coming down the pipe (the MSDN example uses a NetworkStream
with sockets...I get that...) and you want to gradually feed data into the
BufferedStream till it hits its preset size limit and then flushes data, but
in a case like this are there any advantages?

Dave

Apr 6 '07 #16

D. Yates <fo****@hotmail.com> wrote:
I'm interested in why you would use a BufferedStream for reading data in and
then writing data back to a file? I can see it benefits if you don't know
how much data is coming down the pipe (the MSDN example uses a NetworkStream
with sockets...I get that...) and you want to gradually feed data into the
BufferedStream till it hits its preset size limit and then flushes data, but
in a case like this are there any advantages?
By wrapping a BufferedStream round the GZipStream, if you *do* write in
small blocks the effect is mitigated by the buffering.

Nicer to read and write whole blocks at a time, of course. Indeed, I've
got code in my MiscUtil library to do exactly that, copying the
contents of one stream into another...
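
The copy idiom Jon describes can be sketched like this (CopyStream is a hypothetical name here, not necessarily the MiscUtil one):

```csharp
using System;
using System.IO;

class CopySketch
{
    // Read whole blocks from one stream and write them to another
    // until the source is exhausted.
    static void CopyStream(Stream input, Stream output)
    {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
    }

    static void Main()
    {
        MemoryStream source = new MemoryStream(new byte[] { 1, 2, 3, 4, 5 });
        MemoryStream dest = new MemoryStream();
        CopyStream(source, dest);
        Console.WriteLine(dest.Length); // prints 5
    }
}
```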

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 7 '07 #17

Jon,

Hey, I couldn't find any code that uses BufferedStream in the Miscellaneous
Utilities here: http://www.yoda.arachsys.com/csharp/miscutil/

The MiscUtil.IO.StreamUtil is mentioned in the contents section, but the
source download does not contain this class.

Dave

Apr 7 '07 #18

D. Yates <fo****@hotmail.com> wrote:
Hey, I couldn't find any code that uses BufferedStream in the Miscellaneous
Utilities here: http://www.yoda.arachsys.com/csharp/miscutil/
No, that doesn't use BufferedStream - but it provides a way of copying
the contents of one stream to another easily using a single buffer.
The MiscUtil.IO.StreamUtil is mentioned in the contents section, but the
source download does not contain this class.
Eek, does it not? It certainly should do!

<sfx: tappety tappety>

Hmm. Not sure where that all went wrong. Okay, I've got a unit test to
fix (from a bug reported by someone else) and then I'll upload a new
version. Thanks for pointing out the inconsistency!

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 7 '07 #19

Jon Skeet [C# MVP] <sk***@pobox.com> wrote:

<snip>
Hmm. Not sure where that all went wrong. Okay, I've got a unit test to
fix (from a bug reported by someone else) and then I'll upload a new
version. Thanks for pointing out the inconsistency!
Okay, found out what was wrong. I'd got all the code, but not committed
it to svn. My build process for the code that ends up on the website
fetches a clean copy from svn...

It's up there now.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 7 '07 #20

D. Yates wrote:
I'm interested in why you would use a BufferedStream for reading data in
and then writing data back to a file? I can see its benefits if you
don't know how much data is coming down the pipe (the MSDN example uses
a NetworkStream with sockets...I get that...) and you want to gradually
feed data into the BufferedStream till it hits its preset size limit and
then flushes data, but in a case like this are there any advantages?
A few comments.

I have not seen anyone explain why the buffering affects
the compression.

GZip uses DEFLATE, which is a combination of LZ77 and Huffman coding.

If you compress 1 byte at a time then that algorithm will
degenerate into a pure Huffman with 1/8 overhead added.

GZipStream does not override WriteByte, so we are actually
calling Stream.WriteByte, which just does:

byte[] buffer = new byte[] { value };
this.Write(buffer, 0, 1);

That is why this is happening.

Could it be fixed? Yes!

GZipStream could override WriteByte and use an internal
buffer.

Should it do that? I am not sure!

The actual bytes written would depend on the buffer size. I do
not think it is nice to have functionality depend on an internal
constant. OK - then we could make it a property in the class and
have a constructor with an argument.

But then I think it is just as simple to have the programmer
wrap with a BufferedStream.

Nobody should be using ReadByte and WriteByte on big files
anyway.

A note in the docs would definitely be nice though!

Arne
Apr 8 '07 #21
