Bytes IT Community

How to create non-compressible string?

I have the following code, and I run it 1024 times to create a 1 MB file.
After writing the file, I tried compressing it with ZIP, and the file size
went down to 50%!! How can I create plain-text data that ZIP cannot
compress any further? Is there an algorithm for this?

String str =
    "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
StringBuffer buf = new StringBuffer();

// Append 1024 characters, each picked at random from str
for (int i = 0; i < 1024; i++) {
    buf.append(str.charAt((int) (Math.random() * str.length())));
}
String newstr = buf.toString();
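For anyone wanting to reproduce the experiment, here is a minimal sketch that runs the loop above 1024 times to build 1 MB and measures how well it compresses. It assumes `java.util.zip.Deflater` as a stand-in for a ZIP tool (Deflate is the usual ZIP method, but `Deflater` adds a small zlib header and checksum instead of ZIP's own headers), and the class name `AlphanumericRatio` is only for illustration. The exact percentage will vary from run to run.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class AlphanumericRatio {
    static final String ALPHABET =
        "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // The loop from the question: 1024 characters picked at random from ALPHABET
    static String randomChunk() {
        StringBuilder buf = new StringBuilder(1024);
        for (int i = 0; i < 1024; i++) {
            buf.append(ALPHABET.charAt((int) (Math.random() * ALPHABET.length())));
        }
        return buf.toString();
    }

    // Run the loop 1024 times to get 1 MB; every char is ASCII, so one byte each
    static byte[] oneMegabyte() {
        StringBuilder all = new StringBuilder(1 << 20);
        for (int run = 0; run < 1024; run++) {
            all.append(randomChunk());
        }
        return all.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Compress with zlib-wrapped Deflate (a stand-in for ZIP) and return the bytes
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = oneMegabyte();
        byte[] packed = deflate(data);
        System.out.printf("%d -> %d bytes (%.1f%%)%n",
                data.length, packed.length, 100.0 * packed.length / data.length);
    }
}
```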


Jul 17 '05 #1


On Sun, 09 May 2004 23:35:41 GMT, "Roomates Computer"
<ro***************@googlemail.com> wrote or quoted :
> How to create some plain text
> data that ZIP cannot compress anymore? any algorithm?


See http://mindprod.com/jgloss/randomnumbers.html

Get it to generate random numbers 0..65535 as char.
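A minimal sketch of this suggestion, assuming the chars are written out as raw 16-bit values (here with `ByteBuffer.putChar`) rather than through a charset. That assumption matters: values 0xD800 to 0xDFFF are UTF-16 surrogate code units, so the result is not valid Unicode text and would not survive encoding as, say, UTF-8. `SecureRandom` and the class name `RandomCharData` are illustrative choices, and `java.util.zip.Deflater` stands in for ZIP.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.zip.Deflater;

public class RandomCharData {
    // Draw `count` chars uniformly from the full 16-bit range 0..65535
    static char[] randomChars(int count, SecureRandom rnd) {
        char[] chars = new char[count];
        for (int i = 0; i < count; i++) {
            chars[i] = (char) rnd.nextInt(65536);
        }
        return chars;
    }

    // Write each char as two raw big-endian bytes, with no charset encoding step
    static byte[] toRawBytes(char[] chars) {
        ByteBuffer bb = ByteBuffer.allocate(chars.length * 2);
        for (char c : chars) {
            bb.putChar(c);
        }
        return bb.array();
    }

    // Size of the zlib-wrapped Deflate output for the input
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // 512 K chars at 2 bytes each = 1 MB
        byte[] data = toRawBytes(randomChars(512 * 1024, new SecureRandom()));
        System.out.println(data.length + " -> " + deflatedSize(data) + " bytes");
    }
}
```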

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Jul 17 '05 #2

Plain text that zip cannot compress???
I do not think such a thing is possible :)
There are too few possible combinations for a 1 MB file to contain
no repetitions...

Try to think about it in the simplest terms...
Can you create a 1 MB text file in which no pair of characters appears
more than once?
I'm not into calculating possible combinations, but I'd say it is
impossible (with a 62-character alphabet there are only 62 x 62 = 3,844
different pairs, and a 1 MB file contains about a million of them).
If by chance it is possible, make that not pairs but triples, then four
letters in a row, then five...
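The pair argument can be checked by counting. This sketch (the class name `PairCount` is hypothetical) generates about 1 MB of text from the question's 62-character alphabet, using a seeded `java.util.Random` for reproducibility, and counts how many different adjacent pairs actually occur. By the pigeonhole principle there can never be more than 62 x 62 = 3,844 of them, against roughly a million adjacent pairs in the file.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class PairCount {
    static final String ALPHABET =
        "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Random text over ALPHABET, as in the question's loop
    static String randomText(int length, Random rnd) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Number of different adjacent character pairs that occur in text,
    // each pair packed into one int key (first char in the high 16 bits)
    static int distinctPairs(String text) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            seen.add((text.charAt(i) << 16) | text.charAt(i + 1));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String text = randomText(1 << 20, new Random(42));
        int possible = ALPHABET.length() * ALPHABET.length(); // 62 * 62 = 3844
        System.out.println("possible pairs:         " + possible);
        System.out.println("adjacent pairs in file: " + (text.length() - 1));
        System.out.println("distinct pairs seen:    " + distinctPairs(text));
    }
}
```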

Tomy.
Jul 17 '05 #3

This is one of those questions where one feels compelled to ask *why*
something is being attempted. (Nine times out of ten it's a weak
solution to a problem which can be solved far more easily. :-)

Random numbers on their own may not necessarily help you here. For
a start, even if they are truly random and free from any 'known'
pattern (which they usually are not!), there is always the probability
of accidental patterns occurring, which might afford some small
compression.

The best data to use is data which has already been compressed.
Let's say we wish to generate 1 MB of data which cannot be compressed.
Start by generating 1 MB of 'random' data, then compress it. Let's say
we get an 800 KB result, so we top that up with 200 KB more random data
and compress again. Now we get a 950 KB result, so we top that up with
50 KB of random data, and keep going. (The figures are entirely
theoretical, and will depend upon the strength of the random number
algorithm.)
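A rough sketch of this top-up loop, under several assumptions not in the post: `java.util.zip.Deflater` stands in for ZIP, the 'random' source is the question's alphanumeric characters drawn with `java.util.Random`, and the 100-round cap is an arbitrary safety limit. `TopUpCompress` and `buildIncompressible` are hypothetical names. The result is compressed binary data rather than plain text, and it usually ends up slightly longer than the target.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.Deflater;

public class TopUpCompress {
    // Assumed "random" source: the question's alphanumeric characters, as ASCII bytes
    static final String ALPHABET =
        "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    static byte[] randomText(int length, Random rnd) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = (byte) ALPHABET.charAt(rnd.nextInt(ALPHABET.length()));
        }
        return out;
    }

    // zlib-wrapped Deflate, standing in for ZIP
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Compress, top up with fresh random data back to `target` bytes, and repeat
    // until a compression pass no longer brings the data below `target` bytes.
    static byte[] buildIncompressible(int target, Random rnd) {
        byte[] buffer = randomText(target, rnd);
        for (int round = 0; round < 100; round++) { // arbitrary safety cap
            byte[] compressed = deflate(buffer);
            if (compressed.length >= target) {
                return compressed; // at least `target` bytes, usually a little over
            }
            // Next buffer: the compressed bytes followed by new random data
            byte[] next = new byte[target];
            System.arraycopy(compressed, 0, next, 0, compressed.length);
            byte[] topUp = randomText(target - compressed.length, rnd);
            System.arraycopy(topUp, 0, next, compressed.length, topUp.length);
            buffer = next;
        }
        throw new IllegalStateException("no convergence within 100 rounds");
    }

    public static void main(String[] args) {
        byte[] result = buildIncompressible(1 << 20, new Random());
        System.out.println(result.length + " bytes; deflated again: "
                + deflate(result).length + " bytes");
    }
}
```

Each pass shrinks the gap to the target, because the already-compressed part stays roughly the same size while only the fresh top-up data still compresses.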

Be warned: you are unlikely to get *dead on* 1 MB, but must keep going
until you have more than 1 MB of data. You may not be able to truncate
the data to 1 MB without corrupting the compression (although this may not
-FISH- ><>
Jul 17 '05 #4

If your alphabet consists only of ASCII-encoded letters and digits
(as your example implies), then you are using only about 6 bits per
byte (62 symbols need log2(62), roughly 5.95 bits), so even a *trivial*
compression algorithm will always manage to compress the text to about
6/8 of its original size, regardless of how randomly you choose the
characters.
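A minimal sketch of such a trivial scheme: give each of the 62 characters a fixed 6-bit code (62 fits in 2^6 = 64) and pack four characters into every three bytes. The alphabet is the one from the question; `SixBitPacker`, `pack` and `unpack` are hypothetical names, and this is plain fixed-width coding, not what ZIP itself does.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;

public class SixBitPacker {
    static final String ALPHABET =
        "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Encode each character as its 6-bit index in ALPHABET, packed MSB-first
    static byte[] pack(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int bitBuffer = 0;
        int bitCount = 0; // number of not-yet-written bits in the low end of bitBuffer
        for (int i = 0; i < text.length(); i++) {
            int code = ALPHABET.indexOf(text.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("not in alphabet: " + text.charAt(i));
            }
            bitBuffer = (bitBuffer << 6) | code;
            bitCount += 6;
            while (bitCount >= 8) {
                bitCount -= 8;
                out.write((bitBuffer >>> bitCount) & 0xFF);
            }
        }
        if (bitCount > 0) {
            out.write((bitBuffer << (8 - bitCount)) & 0xFF); // pad last byte with zero bits
        }
        return out.toByteArray();
    }

    // Decode `length` characters from the packed bytes
    static String unpack(byte[] packed, int length) {
        StringBuilder sb = new StringBuilder(length);
        int bitBuffer = 0;
        int bitCount = 0;
        int index = 0;
        while (sb.length() < length) {
            if (bitCount < 6) {
                bitBuffer = (bitBuffer << 8) | (packed[index++] & 0xFF);
                bitCount += 8;
            }
            bitCount -= 6;
            sb.append(ALPHABET.charAt((bitBuffer >>> bitCount) & 0x3F));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1024; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        String text = sb.toString();
        byte[] packed = pack(text);
        System.out.println(text.length() + " chars -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + unpack(packed, text.length()).equals(text));
    }
}
```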

Before your data can even approach non-compressibility, it needs to be
truly random and it needs to use the entire set of possible values
from 0-255.
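To see the difference, this sketch compresses 1 MB of random bytes restricted to the 62 alphanumeric codes and 1 MB of random bytes covering all 256 values. It assumes `SecureRandom` as the source and `java.util.zip.Deflater` as a stand-in for ZIP; `FullByteRange` is an illustrative name. One would expect the first to shrink noticeably and the second not at all (it typically grows by a few bytes of overhead).

```java
import java.security.SecureRandom;
import java.util.zip.Deflater;

public class FullByteRange {
    static final String ALPHABET =
        "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Size of the zlib-wrapped Deflate output for the input
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();
        return total;
    }

    // Random bytes, each one of the 62 alphanumeric ASCII codes
    static byte[] restricted(SecureRandom rnd, int length) {
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++) {
            data[i] = (byte) ALPHABET.charAt(rnd.nextInt(ALPHABET.length()));
        }
        return data;
    }

    // Random bytes where every value 0..255 is equally likely
    static byte[] fullRange(SecureRandom rnd, int length) {
        byte[] data = new byte[length];
        rnd.nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        int n = 1 << 20;
        System.out.println("62 values:  " + n + " -> " + deflatedSize(restricted(rnd, n)));
        System.out.println("256 values: " + n + " -> " + deflatedSize(fullRange(rnd, n)));
    }
}
```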

/gordon

--
[ do not email me copies of your followups ]
g o r d o n + n e w s @ b a l d e r 1 3 . s e
Jul 17 '05 #5

Zip is a tokenizing compressor: it looks for repeated sequences of bytes
and replaces each repeated sequence with a token. (ZIP's usual method,
Deflate, does this with back-references to earlier occurrences, followed
by Huffman coding.) For example, take the following:

The fat cat sat on the mat of leaves. In this case 'at ' can be
tokenized. Let's represent our token with @, so the sentence becomes:

The f@c@s@on the m@of leaves.

The sentence shrinks from 37 bytes to 29; adding the 3-byte token table
entry 'at ' to the storage brings it to 32 bytes. That is not a big
saving, and it is a contrived example of how it works at the core; it's
actually more complex than this, but essentially this is how lossless
compression schemes work. To keep the data from compressing at all, the
whole string would in effect have to become a single token, which means
no repetition at all.
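The toy example can be replayed in a few lines. This sketch (hypothetical `TokenExample` class) substitutes '@' for 'at ' and counts the stored phrase as the token table, assuming the table costs just the 3 bytes of the phrase. It illustrates the single-token idea only; real compressors such as ZIP's Deflate use back-references and Huffman codes rather than one hand-picked token.

```java
public class TokenExample {
    // Replace every occurrence of `phrase` with the single character `token`.
    // Assumes the token character does not already appear in the text.
    static String tokenize(String text, String phrase, char token) {
        if (text.indexOf(token) >= 0) {
            throw new IllegalArgumentException("token already appears in text");
        }
        return text.replace(phrase, String.valueOf(token));
    }

    // Reverse the substitution using the stored phrase (the "token table")
    static String detokenize(String packed, String phrase, char token) {
        return packed.replace(String.valueOf(token), phrase);
    }

    public static void main(String[] args) {
        String original = "The fat cat sat on the mat of leaves.";
        String phrase = "at ";
        String packed = tokenize(original, phrase, '@');
        int total = packed.length() + phrase.length(); // text plus one stored table entry
        System.out.println(packed); // The f@c@s@on the m@of leaves.
        System.out.println(original.length() + " -> " + packed.length()
                + " + " + phrase.length() + " = " + total); // 37 -> 29 + 3 = 32
        System.out.println(detokenize(packed, phrase, '@').equals(original)); // true
    }
}
```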

This is why tokenizers have a tough time compressing images and sound
files: the number of exact repetitions is quite small. JPEG and MP3 are
not tokenizers, but then you lose some of the data as well (they are lossy).

Paul

Jul 17 '05 #6
