Bytes IT Community

Compression size

I compressed a file with the GZipStream class and the result is larger
than the original file... how can this be? The original file is 737 KB
and the "compressed" file is 1.1 MB. Did I miss something, or is this
normal for that compression class?

--
VBA
Jun 23 '07 #1
21 Replies


What type of file are you compressing? Some highly compressed files, such
as images, may grow in size when compressed a second time. Also, the
Microsoft algorithms are not ideal but rather make an attempt to steer
clear of patent issues.
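A quick way to see this is with Python's gzip module (a sketch for illustration; it writes the same gzip/DEFLATE format that GZipStream does), using random bytes to stand in for already-compressed data:

```python
import gzip
import os

# Incompressible input stands in for an already-compressed file (MP3, JPEG, ...).
data = os.urandom(1_000_000)

once = gzip.compress(data)    # first pass: header/trailer overhead, no savings
twice = gzip.compress(once)   # second pass: grows again, slightly

print(len(data), len(once), len(twice))
```

With zlib-based implementations the expansion is tiny (a header, a trailer, and some block overhead). The .NET 2.0-era GZipStream was known to expand incompressible input by considerably more than that, which would fit the sizes reported above.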

"VBA" <VB*@discussions.microsoft.comwrote in message
news:BB**********************************@microsof t.com...
>I compressed a file with GZipStream class and is larger than the original
file.... how can this be?, the original file is 737 KB and the
"compressed"
file is 1.1 MB. Did i miss something or is normal with that compression
class?

--
VBA

Jun 23 '07 #2

First I compressed a txt file; I read that if a file is very small,
compression can make it larger, so then I tried with an mp3 file (not
sure if the file type matters) of 3.4 MB, but it turned into 5.3
MB... so... what's wrong??

--
VBA
"Andrew Robinson" wrote:
What type of file are you compressing? Some highly compressed files such as
images may grow in size when compressed a second time. Also, the Microsoft
algorithms are not idea but rather make an attempt to steer clear of patent
issues.

"VBA" <VB*@discussions.microsoft.comwrote in message
news:BB**********************************@microsof t.com...
I compressed a file with GZipStream class and is larger than the original
file.... how can this be?, the original file is 737 KB and the
"compressed"
file is 1.1 MB. Did i miss something or is normal with that compression
class?

--
VBA


Jun 23 '07 #3

VBA wrote:
> First I compressed a txt file; I read that if a file is very small,
> compression can make it larger, so then I tried with an mp3 file (not
> sure if the file type matters) of 3.4 MB, but it turned into 5.3
> MB... so... what's wrong??
MP3 is a compressed file... I bet you'd get better behavior with a 3.5
MB text file.

Scott
Jun 23 '07 #4

But can I only compress text files?? Because I tried a while ago with a
pdf... and the result was the same: a bigger size. But I don't know if a
pdf file is somehow compressed already.

By the way, when compressing a file, should the resulting compressed file
have the same file extension? Or must I use something like *.Z ????

--
VBA
"Scott C" wrote:
VBA wrote:
First I compressed a txt file, i read that if a file is very small , the
compression can turn it larger in size, so then i tried with a mp3 file (not
sure if the file type matters) of 3.4 Mb, but turned it to 5.3
MB....so.....what's wrong??

MP3 is a compressed file... I bet you'd get better behavior with a 3.5
MB text file.

Scott
Jun 23 '07 #5

PDF can contain compressed graphics (and, IIRC, sometimes text), and
if it is encrypted the data can appear relatively random. Both of
these make it a poor choice for compression.

Put simply: some files compress very well indeed, and some don't. In
particular, those that are already compressed (or highly random) don't
tend to compress (and can get bigger).
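The contrast is easy to reproduce with Python's gzip module (a sketch for illustration; it uses the same DEFLATE family of compression as GZipStream):

```python
import gzip
import os

# Highly repetitive "text-like" data vs. random "already-compressed-like" data.
text_like = b"the quick brown fox jumps over the lazy dog\n" * 20_000
random_like = os.urandom(len(text_like))

print(len(text_like), len(gzip.compress(text_like)))      # shrinks dramatically
print(len(random_like), len(gzip.compress(random_like)))  # slightly bigger
```

The repetitive input collapses to a small fraction of its size, while the random input comes out marginally larger than it went in.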

Marc

Jun 23 '07 #6

VBA wrote:
> I compressed a file with the GZipStream class and the result is larger
> than the original file... how can this be? The original file is 737 KB
> and the "compressed" file is 1.1 MB. Did I miss something, or is this
> normal for that compression class?
Hi VBA,

Random data is hard to compress, as compression techniques often work on
probabilities (e.g. Huffman encoding). So, encrypted files and already
compressed files, such as MP3s, JPEGs, GIFs, etc., will not compress at all.

Text documents written in English, or files containing sparse data (such as
BMPs and certain executables) will compress fairly well. It all depends on
the compression algorithm.

You should choose an algorithm that's appropriate to the type of data you're
trying to compress... a bad algorithm will almost certainly result in
larger files.

But like I said at the start, random data is hard, if not damn near
impossible, to compress.
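To make the probability idea concrete, here is a minimal Huffman-coding sketch (illustrative only; this is not how GZipStream is implemented internally). Frequent symbols receive short codes and rare symbols long ones, which is exactly the structure that uniformly random data lacks:

```python
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    """Build a prefix-free bit-string code for each byte value in data."""
    freq = Counter(data)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent subtree
        f2, _, right = heapq.heappop(heap)  # second least frequent subtree
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes(b"aaaabbc")
# The most frequent byte ('a') gets the shortest code.
```

On random data every byte value is about equally frequent, so all codes come out roughly the same length and nothing is saved.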

--
Tom Spink
University of Edinburgh
Jun 23 '07 #7

On Sat, 23 Jun 2007 00:26:00 -0700, VBA <VB*@discussions.microsoft.com>
wrote:
> [...]
> By the way, when compressing a file, should the resulting compressed file
> have the same file extension? Or must I use something like *.Z ????
You can name the compressed file whatever you like. Of course, using the
Gzip class, it's common to use the ".gz" extension for the output. But
there's no requirement that you do so.
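For example (a Python sketch; gzip is the same file format GZipStream produces), the format is identified by its leading magic bytes, not by the file name:

```python
import gzip
import os
import tempfile

# The extension is purely a convention; the gzip header identifies the format.
path = os.path.join(tempfile.mkdtemp(), "report.whatever")

with gzip.open(path, "wb") as f:
    f.write(b"hello, world")

with open(path, "rb") as f:
    magic = f.read(2)        # gzip files start with bytes 0x1f 0x8b

with gzip.open(path, "rb") as f:
    restored = f.read()
```

Tools that care (file managers, `gunzip`, etc.) mostly go by the ".gz" convention, but decompression works regardless of the name.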

Pete
Jun 23 '07 #8

On Fri, 22 Jun 2007 22:41:25 -0700, Andrew Robinson <ne****@nospam.nospam>
wrote:
> [...] Also, the Microsoft
> algorithms are not ideal but rather make an attempt to steer clear of
> patent issues.
GzipStream may not implement an ideal algorithm, but since Gzip itself is
an open format, I doubt that patent issues are part of the question.
Jun 23 '07 #9

All that you're telling me sounds very interesting :)
I just thought of a new question related to it... how does Winzip work?? I
mean, you can put any file into a Winzip archive and compress it, and I
read in a book that it uses a similar compression algorithm. Is that
another type of compression, or could you write similar software in .NET
using GZipStream????

--
VBA
"Tom Spink" wrote:
VBA wrote:
I compressed a file with GZipStream class and is larger than the original
file.... how can this be?, the original file is 737 KB and the
"compressed" file is 1.1 MB. Did i miss something or is normal with that
compression class?

Hi VBA,

Random data is hard to compress, as compression techniques often work on
probabilities (e.g. Huffman encoding). So, encrypted files and already
compressed files, such as MP3s, JPEGs, GIFs, etc will not compress at all.

Text documents written in English, or files containing sparse data (such as
BMPs and certain executables) will compress fairly well. It all depends on
the compression algorithm.

You should choose an algorithm that's appropriate to the type of data you're
trying to compress... a bad algorithm will almost certainly result in
larger files.

But like I said at the start random data is hard if not damn near impossible
to compress.

--
Tom Spink
University of Edinburgh
Jun 23 '07 #10

On Sat, 23 Jun 2007 09:16:01 -0700, VBA <VB*@discussions.microsoft.com>
wrote:
> All that you're telling me sounds very interesting :)
> I just thought of a new question related to it... how does Winzip
> work??
Two standard compression algorithms on which much (nearly all, actually,
as far as I know) of our lossless compression tools are built are
Huffman encoding and the Lempel-Ziv-Welch algorithm. I don't have
specifics on the exact implementation of WinZip, but I gather that like
all "zip" variations, it uses some forms of these algorithms.

If you want to have a better idea of how various compression schemes work,
the place to start is reading about these basic algorithms.
> I mean, you can put any file into a Winzip archive and compress it, and I
> read in a book that it uses a similar compression algorithm. Is that
> another type of compression, or could you write similar software in .NET
> using GZipStream????
You can't "put any file in a Winzip file and compress it". Typically,
something like WinZip will try a variety of specific compression
algorithms to see which performs best (each variation of a given algorithm
may perform differently, depending on the content and structure of the
data). In some cases, no compression algorithm will reduce the size, or
at least not significantly, and the original data will be used. But
inclusion of file headers and other information will increase the file
size at least a little.

Note that the GzipStream class does not have the entire data before it
must make decisions about how to compress the data. As far as I know, it
just uses a single "best general case" version of the "deflate" algorithm
(based on Huffman and LZW). In any case, it's guaranteed that GzipStream
doesn't have the ability to pick from a variety of algorithms to use the
best-performing one, as something like WinZip can.

Again, I don't know specifically how WinZip works, but all compression
tools have this basic behavior. There is not a single compression tool
that is guaranteed to reduce the size of the data.
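For anyone curious about the dictionary idea behind the LZ family, here is a toy LZ77-style coder (purely illustrative; real deflate adds Huffman coding of the tokens plus lazy matching and hash-chain searches for speed). Repeated substrings become (offset, length) back-references into a sliding window:

```python
def lz77_tokens(data: bytes, window: int = 4096, min_len: int = 3):
    """Greedy LZ77: emit literal bytes or (offset, length) back-references."""
    i, tokens = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Matches may run past position i (overlapping copies are allowed).
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_len:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens

def lz77_decode(tokens) -> bytes:
    out = bytearray()
    for t in tokens:
        if isinstance(t, tuple):          # back-reference: copy from history
            off, length = t
            for _ in range(length):
                out.append(out[-off])
        else:                             # literal byte
            out.append(t)
    return bytes(out)

original = b"abcabcabcabcabc hello hello hello"
assert lz77_decode(lz77_tokens(original)) == original
```

Random data produces almost no matches of min_len or more, so nearly every byte is emitted as a literal and nothing shrinks, which is the same reason the tools discussed above can't compress such input.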

Pete
Jun 23 '07 #11

Tom Spink wrote:
> But like I said at the start, random data is hard, if not damn near
> impossible, to compress.

Some define random data as being data that is incompressible ...

:-)

Arne
Jul 2 '07 #12

Peter Duniho wrote:
> On Sat, 23 Jun 2007 09:16:01 -0700, VBA <VB*@discussions.microsoft.com>
> wrote:
>> All that you're telling me sounds very interesting :)
>> I just thought of a new question related to it... how does Winzip
>> work??
>
> Two standard compression algorithms on which much (nearly all, actually,
> as far as I know) of our lossless compression tools are built are
> Huffman encoding and the Lempel-Ziv-Welch algorithm. I don't have
> specifics on the exact implementation of WinZip, but I gather that like
> all "zip" variations, it uses some forms of these algorithms.

Absolutely untrue.

LZ78 (LZW) is used in traditional Unix compress.

But ZIP and GZip use LZ77.

Both are often combined with either Huffman or arithmetic encoding.

BZip uses Burrows-Wheeler.

> [...]
>
> Note that the GzipStream class does not have the entire data before it
> must make decisions about how to compress the data. As far as I know,
> it just uses a single "best general case" version of the "deflate"
> algorithm (based on Huffman and LZW). In any case, it's guaranteed that
> GzipStream doesn't have the ability to pick from a variety of algorithms
> to use the best-performing one, as something like WinZip can.

I would assume that WinZip only uses the possibilities within the
Zip format and not some custom format.

And deflate is still LZ77, not LZ78 (LZW).

Arne
Jul 2 '07 #13

On Sun, 01 Jul 2007 19:56:15 -0700, Arne Vajhøj <ar**@vajhoej.dk> wrote:
>> Two standard compression algorithms on which much (nearly all,
>> actually, as far as I know) of our lossless compression tools are
>> built are Huffman encoding and the Lempel-Ziv-Welch algorithm. I don't
>> have specifics on the exact implementation of WinZip, but I gather that
>> like all "zip" variations, it uses some forms of these algorithms.
>
> Absolutely untrue.

Okay.

> LZ78 (LZW) is used in traditional Unix compress.
>
> But ZIP and GZip use LZ77.
>
> Both are often combined with either Huffman or arithmetic encoding.

That's what I said. I thought you said what I said was "absolutely
untrue".

Maybe the word "absolutely" means something different in your native
language? Here, it's used to emphasize, rather than to negate.

Pete
Jul 2 '07 #14

Peter Duniho wrote:
> On Sun, 01 Jul 2007 19:56:15 -0700, Arne Vajhøj <ar**@vajhoej.dk> wrote:
>>> Two standard compression algorithms on which much (nearly all,
>>> actually, as far as I know) of our lossless compression tools are
>>> built are Huffman encoding and the Lempel-Ziv-Welch algorithm. I
>>> don't have specifics on the exact implementation of WinZip, but I
>>> gather that like all "zip" variations, it uses some forms of these
>>> algorithms.
>>
>> Absolutely untrue.
>
> Okay.
>
>> LZ78 (LZW) is used in traditional Unix compress.
>>
>> But ZIP and GZip use LZ77.
>>
>> Both are often combined with either Huffman or arithmetic encoding.
>
> That's what I said. I thought you said what I said was "absolutely
> untrue".
>
> Maybe the word "absolutely" means something different in your native
> language? Here, it's used to emphasize, rather than to negate.
????

You said that nearly all lossless compression tools are built on LZW.

That is absolute untrue or complete bullshit or whatever you want
to call it.

I even explained why: ZIP and GZip do not use LZW. And they
are a lot more used than good old Unix compress.

Arne
Jul 3 '07 #15

On Mon, 02 Jul 2007 19:27:55 -0700, Arne Vajhøj <ar**@vajhoej.dk> wrote:
> You said that nearly all lossless compression tools are built on LZW.
I wrote (and you quoted) "WinZip...uses some forms of these algorithms".

In what way is LZ77 (the algorithm you wrote is used with the ZIP format)
_not_ "some form" of the LZW algorithm?
> That is absolute untrue or complete bullshit or whatever you want
> to call it.
My statement was just fine, and your own claims even confirm that. You
can continue to write asinine things like "absolute untrue" and "complete
bullshit" as much as you like; there was nothing wrong with my post.
Furthermore, your posts continue to insult without educating.

If you have an actual point, try making it without being such an ass.

Thanks,
Pete
Jul 3 '07 #16

Peter Duniho wrote:
> On Mon, 02 Jul 2007 19:27:55 -0700, Arne Vajhøj <ar**@vajhoej.dk> wrote:
>> You said that nearly all lossless compression tools are built on LZW.
>
> I wrote (and you quoted) "WinZip...uses some forms of these algorithms".
>
> In what way is LZ77 (the algorithm you wrote is used with the ZIP
> format) _not_ "some form" of the LZW algorithm?
No.

Not code wise. Not patent wise. Not in any way.
>> That is absolute untrue or complete bullshit or whatever you want
>> to call it.
>
> My statement was just fine, and your own claims even confirm that.
Bullshit.
> Furthermore, your posts continue to insult without educating.
I have tried multiple times to explain to you that the most
widely used compression algorithms do not use LZW; they use
LZ77.

That is educational.

That you refuse to understand it does not make it less educational.
> If you have an actual point, try making it without being such an ass.
It seems as if you just have difficulties understanding the point.

Arne
Jul 4 '07 #17

On Tue, 03 Jul 2007 17:02:23 -0700, Arne Vajhøj <ar**@vajhoej.dk> wrote:
> It seems as if you just have difficulties understanding the point.
When you make a point that is comprehensible, then I will start worrying
about whether I understand it.
Jul 4 '07 #18

Peter Duniho wrote:
> On Tue, 03 Jul 2007 17:02:23 -0700, Arne Vajhøj <ar**@vajhoej.dk> wrote:
>> It seems as if you just have difficulties understanding the point.
>
> When you make a point that is comprehensible, then I will start worrying
> about whether I understand it.
So you did not understand the following:

#In what way is LZ77 (the algorithm you wrote is used with the ZIP
#format) _not_ "some form" of the LZW algorithm?
#
#No.
#
#Not code wise. Not patent wise. Not in any way.

LZW is a completely different algorithm than LZ77. An implementation
will be different code. The infamous LZW patent does not apply to LZ77.

Is that difficult to understand?

Arne
Jul 4 '07 #19

On Wed, 04 Jul 2007 16:37:34 -0700, Arne Vajhøj <ar**@vajhoej.dk> wrote:
> [...]
> LZW is a completely different algorithm than LZ77. An implementation
> will be different code. The infamous LZW patent does not apply to LZ77.
You have a very strange concept of these absolute terms you're using:
"absolutely untrue", "complete bullshit", "completely different
algorithm", etc.

LZW is _not_ a COMPLETELY different algorithm. A COMPLETELY different
algorithm would share absolutely zero similarities.

All of the algorithms spawned by Lempel and Ziv, including the LZW
algorithm, share various similarities. Some have more similarities in
common than others, but they are ALL "some form" of each other. They all
share the same heritage, and in many ways address similar problems with
similar approaches. All of the LZ-based algorithms, being
dictionary-based, are much more similar to each other than they are to,
for example, Huffman encoding.

The question of a patent is completely irrelevant, by the way. Even
assuming that software patents make sense in the first place, it doesn't
take much for a patent to be inapplicable to closely related code. Most
software patents are written narrowly, for the very reason that it's too
easy to invalidate a broadly-written patent. As such, relatively minor
variations can results in two otherwise closely related algorithms not
sharing patent protection (see MP3 versus other similar
psychoacoustics-based audio compression algorithms, for example).

You seem to have this pathological need to find fault in whatever has been
written, at least with respect to my own posts, no matter how contrived
and narrow an interpretation you have to place on what was actually
written, even to the point of completely ignoring whatever intent actually
existed in it.

Frankly, I find _that_ to be "complete bullshit", and I'm sick and tired
of it. I go to a lot of trouble to make what I write as correct as I can,
and to make it clear where my first-hand knowledge of something is vague
or incomplete. When someone posts a _valid_ correction to something I've
written, I have no problem acknowledging my mistake, and I've posted my
share of "mea culpas" here in this newsgroup and others.

I find your insistence on finding fault with my posts where no fault
exists to be idiotic. I wish you would cut it out.

Pete
Jul 4 '07 #20

Peter Duniho wrote:
> On Wed, 04 Jul 2007 16:37:34 -0700, Arne Vajhøj <ar**@vajhoej.dk> wrote:
> [...]
>
> LZW is _not_ a COMPLETELY different algorithm. A COMPLETELY different
> algorithm would share absolutely zero similarities.
>
> All of the algorithms spawned by Lempel and Ziv, including the LZW
> algorithm, share various similarities. Some have more similarities in
> common than others, but they are ALL "some form" of each other. They
> all share the same heritage, and in many ways address similar problems
> with similar approaches. All of the LZ-based algorithms, being
> dictionary-based, are much more similar to each other than they are to,
> for example, Huffman encoding.
LZ77 and LZW are both dictionary based, but that does not make LZ77
a form of LZW.
> You seem to have this pathological need to find fault in whatever has
> been written, at least with respect to my own posts, no matter how
> contrived and narrow an interpretation you have to place on what was
> actually written, even to the point of completely ignoring whatever
> intent actually existed in it.
Let us take a step back.

You started by writing:

#Two standard compression algorithms on which much (nearly all,
#actually, as far as I know) of our lossless compression tools are built
#are Huffman encoding and the Lempel-Ziv-Welch algorithm.

I replied:

#Absolutely untrue.
#
#LZ78 (LZW) is used in traditional Unix compress.
#
#But ZIP and GZip use LZ77.

That is not an interpretation. What you wrote was plain wrong.

The most common compression tools do not use LZW.
> Frankly, I find _that_ to be "complete bullshit", and I'm sick and tired
> of it. I go to a lot of trouble to make what I write as correct as I
> can, and to make it clear where my first-hand knowledge of something is
> vague or incomplete. When someone posts a _valid_ correction to
> something I've written, I have no problem acknowledging my mistake, and
> I've posted my share of "mea culpas" here in this newsgroup and others.
Well in this case you have tried to cover your mistake with various
lame excuses:

#In what way is LZ77 (the algorithm you wrote is used with the ZIP
#format) _not_ "some form" of the LZW algorithm?

instead of just admitting that you remembered wrong regarding LZW.

Arne

Jul 5 '07 #21

On Wed, 04 Jul 2007 17:06:27 -0700, Arne Vajhøj <ar**@vajhoej.dk> wrote:
> LZ77 and LZW are both dictionary based, but that does not make LZ77
> a form of LZW.
Why not? Who are you that you get to define what "a form" is? Why is
your definition any more important or correct than mine? Where is the
"official" definition of "a form" on which you base your claim?

I have explained my basis for my usage of the phrase "some form" or "a
form". You have not bothered to explain your basis, but even if you
should happen to, why would your explanation take priority over mine with
respect to interpreting what *I* wrote?

You have a pretty arrogant view of your own importance in how language
should be used, especially when it comes to the intent of someone _else's_
use of language.
> [...]
> Well in this case you have tried to cover your mistake with various
> lame excuses:
Baloney. I made no mistake, and I stand by my original post. I am not
trying to "cover" anything. It is only your pathological need to find
fault that has resulted in this inane sub-thread.

And inane it is. Frankly, I'm a bit embarrassed to have even bothered
feeding your troll-like behavior, and I'm done.

To anyone else who has rightly identified this as a useless sub-thread, I
apologize for it and promise that my involvement with it, as well as more
generally with Arne's continued insistence on finding fault where none
exists, is over with. Life's too short to waste time on idiotic stuff
like this.

Pete
Jul 5 '07 #22

This discussion thread is closed
