How to create non-compressible string? | | |
I have the following code, and I use this method to create a 1 MB file by
running it 1024 times. After writing the file, I tried compressing it with
ZIP, and the file size went down to 50%!! How can I create plain text
data that ZIP cannot compress any further? Is there an algorithm?
String str =
    "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
StringBuffer buf = new StringBuffer();
for (int i = 0; i < 1024; i++) {
    // Pick one character uniformly at random from the alphabet.
    buf.append(str.charAt((int) (Math.random() * str.length())));
}
String newstr = buf.toString(); | | | | re: How to create non-compressible string?
On Sun, 09 May 2004 23:35:41 GMT, "Roomates Computer"
<roomatesScomputer@googlemail.com> wrote or quoted :
> How to create some plain text
> data that ZIP cannot compress anymore? any algorithm?
See http://mindprod.com/jgloss/randomnumbers.html
Get it to generate random numbers 0..65535 as char.
--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary. | | | | re: How to create non-compressible string?
Roomates Computer wrote:
> How to create some plain text data that ZIP cannot compress anymore? any
> algorithm?
Plain text that zip cannot compress???
I do not think such a thing is possible :)
There are too few possible combinations for a 1 MB file to avoid
repetitions....
Try to think about it even in the simplest terms....
Can you create a 1 MB text file in which no pair of characters is repeated
more than once...
I'm not into calculating possible combinations, but I'd say it is
impossible...
If it is possible by chance, make that not pairs but triples, then four
letters in a row, then five...
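For what it's worth, the counting can be sketched in a few lines of Java. This is only a rough illustration, assuming an alphabet of 62 letters and digits and one byte per character; the class and method names are made up for the example.

```java
class RepeatBoundSketch {
    // Number of distinct sequences of length k over an alphabet of the given size.
    static long distinctSequences(int alphabetSize, int k) {
        long count = 1;
        for (int i = 0; i < k; i++) {
            count *= alphabetSize;
        }
        return count;
    }

    // Smallest k for which a text of textLength characters could, by counting
    // alone, avoid repeating any k-character sequence.
    static int shortestRepeatFreeLength(int alphabetSize, long textLength) {
        int k = 1;
        // A text of length n contains n - k + 1 sequences of length k.
        while (distinctSequences(alphabetSize, k) < textLength - k + 1) {
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        int alphabetSize = 62;          // assumed: 10 digits + 52 letters
        long textLength = 1024L * 1024; // 1 MB of single-byte characters
        for (int k = 2; k <= 4; k++) {
            System.out.println(k + "-character sequences: "
                + distinctSequences(alphabetSize, k));
        }
        System.out.println("repeats are unavoidable below length "
            + shortestRepeatFreeLength(alphabetSize, textLength));
    }
}
```

With only 62 * 62 = 3844 possible pairs, a file with about a million pairs in it has to repeat most of them many times over.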
Tomy. | | | | re: How to create non-compressible string?
"Roomates Computer" <roomatesScomputer@googlemail.com> wrote in message news:<hfznc.8189$om.5110@news01.bloor.is.net.cable.rogers.com>...
> How to create some plain text data that ZIP cannot compress anymore? any
> algorithm?
This is one of those questions where one feels compelled to ask *why*
something is being attempted. (Nine times out of ten it's a weak
solution to a problem which can be solved far more easily. :-)
Random numbers on their own may not necessarily help you here. For
a start, even if they are truly random and free from any 'known'
pattern (which they usually are not!) there is always the probability
of accidental patterns occurring, which might afford some small
compression.
The best data to use is data which has already been compressed.
Let's say we wish to generate 1 MB of data which cannot be compressed.
Start by generating 1 MB of 'random' data, then compress it. Let's say
we get an 800k result, so we top that up with 200k more random data,
and compress again. Now we get a 950k result, so we top that up with
50k random data, and keep going. (The figures are entirely theoretical,
and will depend upon the strength of the random number algorithm.)
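A minimal sketch of that compress-and-top-up loop in Java might look like the following. It is only an illustration under some assumptions: java.util.zip.Deflater (zlib format) stands in for ZIP, the 'random' data is drawn from the 62-character alphabet in the original question, and the class name, method names and the maxRounds safety limit are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.Deflater;

class TopUpSketch {
    // Assumed alphabet: the digits and letters from the original question.
    static final String ALPHABET =
        "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Compress a byte array with Deflate (zlib format) and return the result.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Random characters from ALPHABET, one byte each.
    static byte[] randomText(int length, Random rnd) {
        byte[] text = new byte[length];
        for (int i = 0; i < length; i++) {
            text[i] = (byte) ALPHABET.charAt(rnd.nextInt(ALPHABET.length()));
        }
        return text;
    }

    // Compress, top up with fresh random text, and compress again, until the
    // result reaches the target size or maxRounds top-ups have been done.
    static byte[] build(int target, Random rnd, int maxRounds) {
        byte[] data = deflate(randomText(target, rnd));
        for (int round = 0; round < maxRounds && data.length < target; round++) {
            byte[] topUp = randomText(target - data.length, rnd);
            byte[] joined = new byte[target];
            System.arraycopy(data, 0, joined, 0, data.length);
            System.arraycopy(topUp, 0, joined, data.length, topUp.length);
            data = deflate(joined);
        }
        return data;
    }

    public static void main(String[] args) {
        int target = 1024 * 1024;
        byte[] data = build(target, new Random(), 20);
        System.out.println("built " + data.length + " bytes, which deflate to "
            + deflate(data).length + " bytes");
    }
}
```

Note that the result is binary data rather than plain text, so it only helps if the file does not actually have to be readable.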
Be warned: you are unlikely to get *dead on* 1 MB, but must keep going
until you get over 1 MB of data. You may not be able to truncate the
data to 1 MB without corrupting the compression (although this may not
-FISH- ><> | | | | re: How to create non-compressible string?
On Sun, 09 May 2004 23:35:41 GMT, Roomates Computer wrote:
> How to create some plain text data that ZIP cannot compress anymore? any
> algorithm?
If your alphabet consists only of ASCII-encoded letters and numbers
(as your example implies), then you are only using about 6 bits per
byte, so even a *trivial* compression algorithm will always manage to
compress the text to about 6/8 of the original size, regardless of how
randomly you choose the characters.
Before your data can even approach non-compressibility, it needs to be
truly random and it needs to use the entire set of possible byte values,
0-255.
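The 6-bits-per-byte point can be checked with a short Java experiment. This is a sketch, not a benchmark: it assumes java.util.zip.Deflater as a stand-in for ZIP and a 62-character alphabet of letters and digits, and the class and method names are hypothetical.

```java
import java.util.Random;
import java.util.zip.Deflater;

class AlphabetEntropySketch {
    // Assumed alphabet: the digits and letters from the original question.
    static final String ALPHABET =
        "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Information content of one uniformly chosen symbol, in bits.
    static double bitsPerSymbol(int alphabetSize) {
        return Math.log(alphabetSize) / Math.log(2);
    }

    // Size in bytes of the Deflate-compressed (zlib format) form of the input.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();
        return total;
    }

    // Random characters from ALPHABET, one byte each.
    static byte[] randomText(int length, Random rnd) {
        byte[] text = new byte[length];
        for (int i = 0; i < length; i++) {
            text[i] = (byte) ALPHABET.charAt(rnd.nextInt(ALPHABET.length()));
        }
        return text;
    }

    // Random bytes covering all 256 possible values.
    static byte[] randomBytes(int length, Random rnd) {
        byte[] bytes = new byte[length];
        rnd.nextBytes(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        int size = 1024 * 1024;
        System.out.printf("theoretical ratio, 62 symbols: %.3f%n",
            bitsPerSymbol(62) / 8);
        System.out.printf("letters and digits:            %.3f%n",
            (double) deflatedSize(randomText(size, rnd)) / size);
        System.out.printf("all byte values:               %.3f%n",
            (double) deflatedSize(randomBytes(size, rnd)) / size);
    }
}
```

The letters-and-digits case should land near the theoretical ratio of roughly three quarters, while the full 0-255 case should not shrink at all and may even grow by a few bytes of overhead.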
/gordon
--
[ do not email me copies of your followups ]
g o r d o n + n e w s @ b a l d e r 1 3 . s e | | | | re: How to create non-compressible string?
Roomates Computer wrote:
> How to create some plain text data that ZIP cannot compress anymore? any
> algorithm?
Zip is a tokenizing compressor: it looks for collections of repeating
bytes and replaces each one with a token. For example, take the
following:
The fat cat sat on the mat of leaves.
In this case 'at ' can be tokenized. Let's represent our token with @,
so the sentence becomes:
The f@c@s@on the m@of leaves.
We compressed 37 bytes down to 32 bytes (29 bytes of text, plus the
3-byte token table entry that has to be stored with it). That is not a
big saving, and it is a contrived example of how it works at the core.
It's actually more complex than this (Zip's Deflate method also
Huffman-codes the result), but essentially this is how dictionary-based
lossless compression schemes work. To keep the tokenizing step from
compressing at all, you need to make sure that the whole string becomes
a single token, which means no repetition at all.
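The fat-cat example can be reproduced with a toy Java sketch. This is only the single-token substitution described above, not how Zip is actually implemented; the class and method names are invented, and it assumes the token character never appears in the input text.

```java
class TokenSketch {
    // Replace every occurrence of phrase with a single token character.
    static String tokenize(String text, String phrase, char token) {
        return text.replace(phrase, String.valueOf(token));
    }

    // Undo the substitution; only valid if token never occurs in the original.
    static String expand(String tokenized, String phrase, char token) {
        return tokenized.replace(String.valueOf(token), phrase);
    }

    // Bytes needed to store the tokenized text plus the one-entry token table.
    static int storedSize(String text, String phrase, char token) {
        return tokenize(text, phrase, token).length() + phrase.length();
    }

    public static void main(String[] args) {
        String text = "The fat cat sat on the mat of leaves.";
        String packed = tokenize(text, "at ", '@');
        System.out.println(packed);
        System.out.println(text.length() + " bytes -> "
            + storedSize(text, "at ", '@') + " bytes");
        System.out.println(expand(packed, "at ", '@').equals(text));
    }
}
```

The tokenized sentence is 29 characters long, and storing the 3-character phrase 'at ' as the table entry brings the total to the 32 bytes quoted above.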
This is why tokenizers have a tough time compressing images and sound
files: the number of repetitions is quite small. JPEG and MP3
are not tokenizers, but they are lossy, so you lose some of the data as well.
Paul