Connecting Tech Pros Worldwide Help | Site Map

how far can data compression go

  #1  
Old February 24th, 2009, 04:29 PM
Member
 
Join Date: Apr 2007
Posts: 53
I'm just wondering,what factor(s) makes recursive compression algorithms stop.I mean if you can compress compressed data,why cant you keep compressing till the data reaches it's smallest possible size.
  #2  
Old February 24th, 2009, 04:59 PM
JosAH's Avatar
Expert
 
Join Date: Mar 2007
Posts: 10,634
Provided Answers: 2

re: how far can data compression go


Quote:
Originally Posted by dynamo View Post
I'm just wondering,what factor(s) makes recursive compression algorithms stop.I mean if you can compress compressed data,why cant you keep compressing till the data reaches it's smallest possible size.
You can't keep on compressing. A perfect compression algorithm leaves no redundancy in the data it has to compress so it ends up with its minimal size after one compression round. The minimal size of some data d (its 'information') is defined as -2log(P(d)) where P(d) is the probability that an event d occurs.

A sequence of random bits can not be compressed (e.g. white noise). Another problem is what we consider 'information'. Normally we use eight bits as information units but for a certain bit stream the information unit could be nine bits long or a prime number of bits long.

Most compression algorithms are near optimal so they can't be applied ad infinitum. For a nice introduction read this and this.

kind regards,

Jos
  #3  
Old February 24th, 2009, 07:43 PM
Member
 
Join Date: Apr 2007
Posts: 53

re: how far can data compression go


Thanks for replying.When you say one can't compress random data.What exactly do you mean by random data and does this mean you cant compress data if your looking at it at binary level?
  #4  
Old February 25th, 2009, 03:56 PM
JosAH's Avatar
Expert
 
Join Date: Mar 2007
Posts: 10,634
Provided Answers: 2

re: how far can data compression go


Quote:
Originally Posted by dynamo View Post
Thanks for replying.When you say one can't compress random data.What exactly do you mean by random data and does this mean you cant compress data if your looking at it at binary level?
For white noise random data every bit has a probability P(0) == P(1) == 0.5. If all those bits are truely random you can't compress that bit stream. In jargon: the bit stream has maximal entropy.

Compressing data increases the entropy of the data. If the entropy is maximal already you can't compress that data.

kind regards,

Jos
  #5  
Old March 5th, 2009, 11:10 PM
bilibytes's Avatar
Familiar Sight
 
Join Date: Jun 2008
Location: Europe
Posts: 128

re: how far can data compression go


Quote:
Originally Posted by JosAH View Post
For white noise random data every bit has a probability P(0) == P(1) == 0.5. If all those bits are truely random you can't compress that bit stream. In jargon: the bit stream has maximal entropy.

Compressing data increases the entropy of the data. If the entropy is maximal already you can't compress that data.

kind regards,

Jos
Hey you look very smart!
Did you get your knowledge from a University or by googleing?

congrats for your math knowledge!
  #6  
Old March 6th, 2009, 07:17 AM
JosAH's Avatar
Expert
 
Join Date: Mar 2007
Posts: 10,634
Provided Answers: 2

re: how far can data compression go


Quote:
Originally Posted by bilibytes View Post
Did you get your knowledge from a University or by googleing?
University; when I studied there (somewhere in the dark ages) there was no Google yet, nor was there any internet available to the public ...

kind regards,

Jos
  #7  
Old March 16th, 2009, 06:12 PM
RedSon's Avatar
Site Moderator
 
Join Date: Jan 2007
Location: America
Posts: 3,362

re: how far can data compression go


Quote:
Originally Posted by bilibytes View Post
Hey you look very smart!
Did you get your knowledge from a University or by googleing?

congrats for your math knowledge!
Actually if you saw him he would look very ugly! No just kidding Jos actually looks like a fluffy dog. But that is not to take away from the fact that he is very smart or at least very good at thumbing through text books.
  #8  
Old March 23rd, 2009, 01:50 AM
Newbie
 
Join Date: Mar 2009
Posts: 2

re: how far can data compression go


Quote:
Originally Posted by dynamo View Post
I'm just wondering,what factor(s) makes recursive compression algorithms stop.I mean if you can compress compressed data,why cant you keep compressing till the data reaches it's smallest possible size.
All data compression schemes involve the detection of patterns of some kind, and reducing their stored size. Since real data typically contains substantial numbers of patterns, programs like ZIP can usually squeeze them at least a little.

However, there is a limit to this ability. Think about it this way: 8 bits of data can store any of 256 possible configurations. No compression scheme could transform all of these 256 configurations to, say, 4 bits, since there are only 16 possible 4-bit configurations. In practice, compression works well on the usual case where there are some sort of patterns to detect. High-entropy random data has no such patterns can cannot, generally, be reduced in size.


-Will Dwinnell
Data Mining in MATLAB
  #9  
Old March 27th, 2009, 11:00 PM
Banfa's Avatar
AdministratorVoR
 
Join Date: Feb 2006
Location: South West UK
Posts: 6,117
Provided Answers: 6

re: how far can data compression go


It should be pointed out that so far everyone has been talking about lossless data compression. That is compression that when reversed produces exactly what was originally compressed.

With lossy compression it gets a bit more abstract because, as the name implies, you are actually discarding some of the data either purposefully or through the algorithm. In this way you get the situation you have with jpegs where you get to choose your compression level but the more you compress the worst the picture looks. Lossy compression is used in a fair number of places, jpegs mp3 MPEG2 and MPEG4.

It is my belief that the most efficient lossy compression is fractal compression which basically replaces a picture with a set of rules about how the different bits of the picture relate to each other. However this is not in very common use because I believe it takes a very long time to perform the compression although the decompression is very quick.

MPEG2 based standards (like MPEG2 and mp3) discard some data based on what most human beings can perceive, that is humans do not perceive blue-green (I think) colour differences very well so MPEG2 only uses a small amount of data to encode that information since small changes are imperceptible to the human eye. Similarly with audio it tends to drop very high and very low frequencies that a large number of humans can't hear.
  #10  
Old March 28th, 2009, 11:27 AM
JosAH's Avatar
Expert
 
Join Date: Mar 2007
Posts: 10,634
Provided Answers: 2

re: how far can data compression go


I once found the 'ultimate loss-less compression method': suppose I want to compress a sequence of characters (eg a book). Define the vertices v_1, v_2, ... v_m where each vertex represents a character present in the book. Next draw a directed arc from v_s to v_n where v_s is the first character from the book and v_n is the next character; do this for all following characters in the book.

This way I created a multigraph; let v_s be the first character in the book and v_e be the last character in the book. The sequence of characters in the book is repesented by an Euler path in the multi-graph (all edges used exactly once, starting at v_s and ending at v_e). All Euler paths can be lexicographically sorted and a number assigned, so all I need is a matrix with m rows and colums and each cell has a number v_ij representing the number of edges starting at v_i and ending at v_j; the number of the Euler path plus this matrix is enough to represent the entire book, no matter the size of the book.

Here's the catch: that 'Euler number' takes as many bits proportional to the size of the original book; the other parts are just the offspring of my hyper-intelligent brains ;-)

kind regards,

Jos
Reply


Similar Threads
Thread Thread Starter Forum Replies Last Post
simple string compression? Chris LaJoie answers 20 November 15th, 2005 12:39 PM
File bloating/compression problem in Jet Riley DeWiley answers 12 November 13th, 2005 01:20 PM
sorting dates mike answers 22 August 4th, 2005 08:15 PM
Trouble with system() function Penn Markham answers 9 July 17th, 2005 05:43 AM