xor: how come so slow?

Hi,
I'm trying to encode byte data. Let's not focus on the process of
encoding; in fact, I want to emphasize that the method
create_random_block takes 0.5s to execute (even Java is faster) on
a Dual-Core 3.0GHz machine:

took 46.746999979s, avg: 0.46746999979s

Thus I suppose that the xor operation between bytes raises the execution
time to 0.5s. Why do I suppose that?
Because in Python there's no support for bytes, let alone for xoring
bytes, so I used a workaround:
I cycle over the two strings to be xored;
    for every char in the strings:
        convert each char to an integer and then xor them (ord);
        insert the char in the result, transforming the xored integer
back into a char (chr).

I suppose that ord() and chr() are the main problems with my
implementation, but I don't know any other way to xor two bytes of data
(bytes are represented as strings).
For more information, see the code attached.

How should I decrease the execution time?

Thank you
from __future__ import division
import random
import time
import sha
import os

class Encoder(object):
    def create_random_block(self, data, seed, blocksize):
        number_of_blocks = int(len(data)/blocksize)
        random.seed(seed)
        random_block = ['0'] * blocksize
        for index in range(number_of_blocks):
            if int(random.getrandbits(1)) == 1:
                block = data[blocksize*index:blocksize*index+blocksize]
                for bit in range(len(block)):
                    # workaround to xor str bit by bit; xor is only supported for ints (ord)
                    random_block[bit] = chr(ord(random_block[bit])^ord(block[bit]))
        return ''.join(random_block)

x = Encoder()
piece = os.urandom(1024*1024)
blocksize = 16384
t1 = time.time()
for l in range(100):
    seed = random.getrandbits(32)
    block = x.create_random_block(piece, seed, blocksize)
t2 = time.time()
print 'took ' + str(t2-t1) + 's, avg: ' + str((t2-t1)/100.0) + 's'
Oct 15 '08 #1
My answer is: never do things like this in Python.
You will find this module useful: www.pycrypto.org
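If you do go the PyCrypto route, here's a rough sketch of the kind of
helper it provides (this assumes a PyCrypto version that ships
Crypto.Util.strxor; untested):

from Crypto.Util.strxor import strxor

a = "\x00\x01\x02\x03"
b = "\xff\x01\x00\x03"
# strxor xors two equal-length strings in C, so there is no per-char
# ord()/chr() round trip at the Python level
print repr(strxor(a, b))   # '\xff\x00\x02\x00'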

Oct 15 '08 #2
A few suggestions for your code:
- Use xrange instead of range.
- Loop over lists directly where you can, instead of over their indexes.
- array.array("B", somestring) may help you, because it gives a byte
"view" of a string (see the sketch after this list).
- Using psyco helps a lot for this kind of code.
- I think numpy arrays can contain text/chars too, so it may offer you
ways to speed up your code a lot.
- Generally Python is fit to download pages from the net, to act as
glue between different subsystems, to do bulk string processing,
etc, but for grunt low-level work like this it's often too slow,
and you may want to use other lower-level languages.
- You can use an already-written lib, or use an extension; for example
you can try ShedSkin or Pyd.
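
As a concrete sketch of the array.array idea above (xor_strings is a
hypothetical helper name, untested):

import array

def xor_strings(a, b):
    xa = array.array("B", a)      # unsigned-byte view of the string
    xb = array.array("B", b)
    for i in xrange(len(xa)):     # xor ints directly, no ord()/chr()
        xa[i] ^= xb[i]
    return xa.tostring()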

Bye,
bearophile
Oct 15 '08 #3
On Oct 15, 10:19 pm, Michele <mich...@nectarine.it> wrote:
class Encoder(object):
    def create_random_block(self, data, seed, blocksize):
        number_of_blocks = int(len(data)/blocksize)
        random.seed(seed)
        random_block = ['0'] * blocksize
You possibly mean '\0' i.e. the byte all of whose bits are zero.
        for index in range(number_of_blocks):
            if int(random.getrandbits(1)) == 1:
getrandbits(1) produces a *long* with one random bit. Any good reason
for preferring this to randrange(2) or randint(0, 1)?

So there's a 50% chance that this block will be XORed into the result;
is that what you intend?
                block = data[blocksize*index:blocksize*index+blocksize]
You don't need to slice out block, certainly not so awkwardly.

                for bit in range(len(block)):

Perhaps you mean "byte_index", not "bit".

On my assumption that range(len(block)) is invariant: calculate it
once. If that assumption is incorrect, then so is your code for
calculating the number of blocks; it ignores a possible short block at
the end.

                    random_block[bit] = chr(ord(random_block[bit])^ord(block[bit]))
The chr() and one ord() are utterly wasteful; leave random_block as a
list of ints and do the chr() thing in the return statement.
        return ''.join(random_block)
This will become

    return ''.join(map(chr, random_block))

or

    return ''.join(chr(i) for i in random_block)

as taste or speed dictates :-)

So the whole thing becomes [not tested]:

def create_random_block(self, data, seed, blocksize):
    datalen = len(data)
    assert datalen % blocksize == 0
    random.seed(seed)
    random_block = [0] * blocksize
    block_range = range(blocksize)
    for start in xrange(0, datalen, blocksize):
        if random.randrange(2):
            for x in block_range:
                random_block[x] ^= ord(data[start + x])
    return ''.join(map(chr, random_block))

Looks slightly more athletic than before :-)

BTW, +1 misleading subject of the week; it's not XOR that's slow!!

Cheers,
John
Oct 15 '08 #4
Michele wrote:
I'm trying to encode byte data. Let's not focus on the process of
encoding; in fact, I want to emphasize that the method
create_random_block takes 0.5s to execute (even Java is faster) on
a Dual-Core 3.0GHz machine:

took 46.746999979s, avg: 0.46746999979s
How should I decrease the execution time?
Use numpy. You should be able to swap in the following in your script:

import numpy
from numpy.core import multiarray as ma

class Encoder(object):
    def create_random_block(self, data, seed, blocksize):
        number_of_blocks = len(data)//blocksize
        random.seed(seed)
        random_block = ma.fromstring("0"*blocksize, numpy.uint8)

        for index in range(number_of_blocks):
            if random.getrandbits(1):
                block = ma.fromstring(data[blocksize*index:blocksize*index+blocksize], numpy.uint8)
                random_block ^= block
        return random_block.tostring()

There are absolutely no warranties as I'm just a random tinkerer when it
comes to numpy. But if it works you should get a nice speedup...

Peter
Oct 15 '08 #5
In message <48***********************@reader5.news.tin.it>, Michele wrote:
class Encoder(object):
    def create_random_block(self, data, seed, blocksize):
        number_of_blocks = int(len(data)/blocksize)
        random.seed(seed)
        random_block = ['0'] * blocksize
        for index in range(number_of_blocks):
            if int(random.getrandbits(1)) == 1:
                block = data[blocksize*index:blocksize*index+blocksize]
                for bit in range(len(block)):
                    # workaround to xor str bit by bit; xor is only supported for ints (ord)
                    random_block[bit] = chr(ord(random_block[bit])^ord(block[bit]))
        return ''.join(random_block)
OK, this function is randomly choosing blocks from data (of length
number_of_blocks * blocksize), and xoring them together to produce a single
block of length blocksize.
piece = os.urandom(1024*1024)
Is piece really meant to be random? If so, your create_random_block function
isn't achieving much--xoring random data together isn't going to produce
anything more exciting than less random data than you started with.
Oct 17 '08 #6
On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
Is piece really meant to be random? If so, your create_random_block
function isn't achieving much--xoring random data together isn't going
to produce anything more exciting than less random data than you started
with.

Hmmm... why do you say that xoring random data with other random data
produces less randomness than you started with?

I'm not saying that you're wrong, and certainly it is pointless since
you're not going to improve on the randomness of /dev/urandom without a
lot of work. But less random?

--
Steven
Oct 17 '08 #7
Michele <mi*****@nectarine.it> writes:
I suppose that ord() and chr() are the main problems
yes
How should I decrease the execution time?
See http://nightsong.com/phr/crypto/p3.py which deals with
the same problem by using the array module to do the xors
32 bits at a time.
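
The idea, roughly: reinterpret both strings as arrays of machine words,
so each xor covers several bytes at once. A simplified sketch of the
technique (not the actual p3.py code; assumes equal lengths divisible by
the item size, which is itself platform-dependent):

import array

def xor_strings_wordwise(a, b):
    xa = array.array("I", a)    # view as unsigned ints, typically 4 bytes
    xb = array.array("I", b)
    for i in xrange(len(xa)):
        xa[i] ^= xb[i]          # one xor handles itemsize bytes at once
    return xa.tostring()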
Oct 17 '08 #8
In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
>... why do you say that xoring random data with other random data
produces less randomness than you started with?
blocksize <= number_of_blocks * blocksize
Oct 17 '08 #9
On Fri, 17 Oct 2008 22:45:19 +1300, Lawrence D'Oliveiro wrote:
In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
>On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
>>... why do you say that xoring random data with other random data
produces less randomness than you started with?

blocksize <= number_of_blocks * blocksize
I must be thick, because that looks like a non sequitur to me. I don't
see the relevance.

Of course, that's just another way of saying that:

1 <= number_of_blocks

and I don't see how this relates to whether xoring random data with other
random data decreases the randomness available.
>>> c1 = os.urandom(1)
>>> c2 = os.urandom(1)
>>> c3 = chr( ord(c1)^ord(c2) )
Is it your contention that c3 is more predictable than c1 or c2? If so,
why?

--
Steven
Oct 17 '08 #10
[I think these attributions are right]
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> wrote:
>On Fri, 17 Oct 2008 22:45:19 +1300, Lawrence D'Oliveiro wrote:
>In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
>>... why do you say that xoring random data with other random data
produces less randomness than you started with?
blocksize <= number_of_blocks * blocksize
I must be thick, because that looks like a non sequitur to me. I don't
see the relevance.
Lawrence originally said something along the lines of this just
being a way of taking some random data and producing "less random
data". You're reading it as "(less random) data". The intent (I
assume) is for it to be read as "less (random data)".

Maybe it should be "fewer random data". After all, each byte in
the block is discrete.

--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
"Frankly I have no feelings towards penguins one way or the other"
-- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
Oct 17 '08 #11
In message <Jl*******@news.chiark.greenend.org.uk>, Sion Arrowsmith wrote:
Maybe it should be "fewer random data".
Except these days we tend to think of "data" as being, say, more like "flour"
than "bees", so it's "less data", like "less flour", rather than
like "fewer bees". :)
After all, each byte in the block is discrete.
Data can come in fractional bits. That's how compression works.
Oct 17 '08 #12
On Fri, 17 Oct 2008 13:59:27 +0100, Sion Arrowsmith wrote:
[I think these attributions are right] Steven D'Aprano
<st***@REMOVE-THIS-cybersource.com.au> wrote:
>>On Fri, 17 Oct 2008 22:45:19 +1300, Lawrence D'Oliveiro wrote:
>>In message <01**********************@news.astraweb.com>, Steven
D'Aprano wrote:
... why do you say that xoring random data with other random data
produces less randomness than you started with?
blocksize <= number_of_blocks * blocksize
I must be thick, because that looks like a non sequitur to me. I don't
see the relevance.

Lawrence originally said something along the lines of this just being a
way of taking some random data and producing "less random data". You're
reading it as "(less random) data". The intent (I assume) is for it to
be read as "less (random data)".

Ah. That was how I read it.

Maybe it should be "fewer random data". After all, each byte in the
block is discrete.

Or "a smaller amount of random data".

--
Steven
Oct 19 '08 #13
On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
In message <Jl*******@news.chiark.greenend.org.uk>, Sion Arrowsmith
wrote:
>Maybe it should be "fewer random data".

Except these days we tend to think of "data" being, say, more like
"flour" than "bees", so it's "less data", like "less flour", rather than
like "fewer bees". :)
>After all, each byte in the block is discrete.

Data can come in fractional bits. That's how compression works.
Compression works by removing redundant data so the same amount of useful
information requires fewer bits. If you don't believe me, try compressing
a single bit and see if you get a "fractional bit". I leave it to you to
choose whether the bit should be on or off.

That's why compressing data twice doesn't gain much, if anything. Try
compressing a typical JPEG image, chances are it won't get any smaller.
That's because JPEGs already remove the redundancy in the data,
compressing it. With no redundancy, there can be no compression.

Due to the pigeon-hole principle, any reversible compression scheme must
cause some data to be expanded rather than compressed. (If not, then
there must be two different pieces of data that compress to the same
thing, which means the scheme is not reversible.) If there were
fractional bits, that wouldn't apply, because you can always divide a
pigeon-hole (a bit) into two smaller pigeon-holes.
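
A quick way to see both points at once: random data doesn't compress,
and a second pass only adds container overhead (a rough demonstration
using zlib, nothing more):

import os, zlib

data = os.urandom(100000)      # incompressible input
once = zlib.compress(data)
twice = zlib.compress(once)
# both compressed forms come out slightly *larger* than the input,
# as the pigeon-hole argument predicts for incompressible data
print len(data), len(once), len(twice)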

--
Steven
Oct 19 '08 #14
In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
>Data can come in fractional bits. That's how compression works.

If you don't believe me, try compressing a single bit and see if you get
a "fractional bit".
If both states of the bit are not equally likely, then you do indeed have a
fractional bit, since

nrbits = (- logbase2(P[bit = 0]) - logbase2(P[bit = 1])) / 2
Oct 19 '08 #15
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> wrote:
>
On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
>Is piece really meant to be random? If so, your create_random_block
function isn't achieving much--xoring random data together isn't going
to produce anything more exciting than less random data than you started
with.

Hmmm... why do you say that xoring random data with other random data
produces less randomness than you started with?

I'm not saying that you're wrong, and certainly it is pointless since
you're not going to improve on the randomness of /dev/urandom without a
lot of work. But less random?
For those who got a bit lost here, I'd point out that Knuth[1] has an
excellent chapter on random numbers that includes a detailed discussion of
this effect. His net takeaway is that most of the things people do to
increase randomness actually have exactly the opposite effect.
-----
[1] Knuth, Donald. The Art of Computer Programming, Volume 2,
Seminumerical Algorithms.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Oct 19 '08 #16
On Sun, 19 Oct 2008 16:38:37 +1300, Lawrence D'Oliveiro wrote:
In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
>On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
>>Data can come in fractional bits. That's how compression works.

If you don't believe me, try compressing a single bit and see if you
get a "fractional bit".

If both states of the bit are not equally likely, then you do indeed
have a fractional bit, since

nrbits = (- logbase2(P[bit = 0]) - logbase2(P[bit = 1])) / 2

That's an arithmetic mean of the logarithms. It doesn't imply that there
are fractional bits any more than an average family having 2.3 children
implies that there are 0.3 of a child wandering around the streets.

Using the Shannon measure of information, you can have messages which
contain fractional information (technically, "surprisal"), when measured
in bits. But that doesn't imply the existence of fractional bits. Look at
it this way: consider a barter economy where I agree to swap 5 chickens
for 2 axes. So each axe is equivalent to 2.5 chickens. But that doesn't
imply that there is such a thing as 0.5 of a chicken -- at least not a
*live* chicken. While I can blithely talk about bartering fractional
chickens, in practice when I actually go to make good on my promise, it
must be an integer number of chickens.

Similarly, we can talk about messages containing fractional bits of
information, but when we actually store or transmit that message in
practice, we can only use integer numbers of bits.

As Wikipedia puts it:

It is important to differentiate between the use of "bit" in referring to
a discrete storage unit and the use of "bit" in referring to a
statistical unit of information. The bit, as a discrete storage unit, can
by definition store only 0 or 1. A statistical bit is the amount of
information that, on average[citation needed], can be stored in a
discrete bit. ... If these two ideas need to be distinguished, sometimes
the name bit is used when discussing data storage while shannon is used
for the statistical bit.

http://en.wikipedia.org/wiki/Bit

--
Steven
Oct 19 '08 #17
On Sun, 19 Oct 2008 04:38:04 +0000, Tim Roberts wrote:
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> wrote:
>>
On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
>>Is piece really meant to be random? If so, your create_random_block
function isn't achieving much--xoring random data together isn't going
to produce anything more exciting than less random data than you
started with.

Hmmm... why do you say that xoring random data with other random data
produces less randomness than you started with?

I'm not saying that you're wrong, and certainly it is pointless since
you're not going to improve on the randomness of /dev/urandom without a
lot of work. But less random?

For those who got a bit lost here, I'd point out that Knuth[1] has
an excellent chapter on random numbers that includes a detailed
discussion of this effect. His net takeaway is that most of the things
people do to increase randomness actually have exactly the opposite
effect.
I don't doubt it at all. But xoring random data with more random data?
I'm guessing that if the two sources of data are independent and from the
same distribution, then xoring them is pointless but not harmful. Here's
a rough-and-ready test which suggests there's little harm in it:

>>> import os, math
>>> def rand_data(size):
...     return [ord(c) for c in os.urandom(size)]
...
>>> def mean(data):
...     return sum(data)/len(data)
...
>>> def stdev(data):
...     return math.sqrt( mean([x**2 for x in data]) - mean(data)**2 )
...
>>> A = rand_data(1000) # good random data
>>> B = rand_data(1000) # more good random data
>>> AB = [a^b for (a,b) in zip(A, B)] # is this still good random data?
>>> assert len(AB) == len(A) == len(B)
>>> mean(A), stdev(A)
(126, 73.91887445030531)
>>> mean(B), stdev(B)
(128, 74.242844773082339)
>>> mean(AB), stdev(AB)
(129, 74.39085965358916)
Note: I wouldn't take the above terribly seriously. Mean and standard
deviation alone are terrible measures of the randomness of data. But this
does suggest that any deviation from uniform randomness will be quite
subtle.

--
Steven
Oct 19 '08 #18
On Oct 19, 7:13 am, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
On Sun, 19 Oct 2008 04:38:04 GMT, Tim Roberts <t...@probo.com> declaimed
the following in comp.lang.python:
For those who got a bit lost here, I'd point out that Knuth[1] has an
excellent chapter on random numbers that includes a detailed discussion of
this effect. His net takeaway is that most of the things people do to
increase randomness actually have exactly the opposite effect.

Some decade I'll have to obtain his volumes... But they've never
shown up in a $60 special offer from a book club (unlike the compact
editions of the OED) <G>.

And while XOR may seem significant, just consider die rolls...

If each "byte" were one die roll, you'd expect a nearly even
distribution... (for a 6-sided die, 1/6 would have each value). But
using the sum of two die, you begin to get a bell curve: 2 and 12
appear 1/36 of the time (each), but 7 occurs 6/36 of the time. Use three
die, and it gets worse: 3 and 18 occur 1/216, "10.5" occurs much more
often...
That should be one die, two dice, etc. :-)
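
For what it's worth, a brute-force enumeration confirms the bell curve
Dennis describes for two dice:

counts = {}
for a in range(1, 7):
    for b in range(1, 7):
        counts[a + b] = counts.get(a + b, 0) + 1
# 2 and 12 each occur once out of 36 outcomes, 7 occurs six times
print counts[2], counts[7], counts[12]   # 1 6 1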
Oct 19 '08 #19
Lawrence D'Oliveiro wrote:
In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
>On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
>>Data can come in fractional bits. That's how compression works.
If you don't believe me, try compressing a single bit and see if you get
a "fractional bit".

If both states of the bit are not equally likely, then you do indeed have a
fractional bit, since

nrbits = (- logbase2(P[bit = 0]) - logbase2(P[bit = 1])) / 2
What's happening here is that the two different meanings of "bit" are
being confused. A bit is both a binary digit and a measure of information.

Obviously you can't create a bit stream with half a bit in it.

In a coding system where all messages of N binary digits are equally
likely then each message contains N bits of information content. This is
the theoretical upper bound on the information content.

In most practical systems, however, the messages have differing
probabilities; then an N-binary-digit message conveys less than N bits
of information, as Lawrence indicated above. Fractional bits are
perfectly valid as a measure of information content.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Oct 19 '08 #20
Steven D'Aprano wrote:
I don't doubt it at all. But xoring random data with more random data?
I'm guessing that if the two sources of data are independent and from the
same distribution, then xoring them is pointless but not harmful. [...]
Note: I wouldn't take the above terribly seriously. Mean and standard
deviation alone are terrible measures of the randomness of data. But this
does suggest that any deviation from uniform randomness will be quite
subtle.
Operations like 'and' and 'or' will tend to destroy randomness. 'and'
tends to the 0-string and 'or' tends to the 1-string. I feel like 'xor'
should be safe (like Steven), but is the proof merely the half-and-half
split of the truth table?
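
I'd say so: for either fixed value of one input, xor is a bijection on
the other input, so a uniform independent bit stays uniform. A tiny
enumeration (just a sketch of the argument):

for b in (0, 1):
    # for each fixed b, the two values of a map one-to-one onto the
    # two outputs, so P(a^b = 0) = P(a^b = 1) when a is uniform
    print b, sorted(a ^ b for a in (0, 1))   # always prints [0, 1]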

Oct 19 '08 #21
In message <gd**********@lust.ihug.co.nz>, Lawrence D'Oliveiro wrote:
If both states of the bit are not equally likely, then you do indeed have
a fractional bit, since

nrbits = (- logbase2(P[bit = 0]) - logbase2(P[bit = 1])) / 2
Oops, sorry, the formula should of course be

nrbits = - P[bit = 0] * logbase2(P[bit = 0])
- P[bit = 1] * logbase2(P[bit = 1])
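
Plugging numbers into the corrected formula makes it concrete (my worked
example, not from the original post): a bit that is 0 ninety percent of
the time carries well under one full bit of information:

from math import log

p0, p1 = 0.9, 0.1
nrbits = -p0 * log(p0, 2) - p1 * log(p1, 2)
print nrbits   # ~0.469 bits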
Oct 19 '08 #22
