xor: how come so slow?

Hi,
I'm trying to encode byte data. Let's not focus on the process of
encoding; in fact, I want to emphasize that the method
create_random_block takes 0.5s to execute (even Java is faster) on
a Dual-Core 3.0GHz machine:

took 46.746999979s, avg: 0.46746999979s

Thus I suppose that the xor operation between bytes raises the execution
time to 0.5s. Why do I suppose that?
Because in Python there's no support for bytes, let alone for xoring
bytes, so I used a workaround:
I cycle over the two strings to be xored;
    for every char in the strings:
        convert each char to an integer and then xor them (ord);
        insert the char in the result, transforming the xored integer
back into a char (chr).

I suppose that ord() and chr() are the main problems with my
implementation, but I don't know any other way to xor two bytes of data
(bytes are represented as strings).
For more information, see the code attached.

How should I decrease the execution time?

Thank you
from __future__ import division
import random
import time
import sha
import os

class Encoder(object):
    def create_random_block(self, data, seed, blocksize):
        number_of_blocks = int(len(data)/blocksize)
        random.seed(seed)
        random_block = ['0'] * blocksize
        for index in range(number_of_blocks):
            if int(random.getrandbits(1)) == 1:
                block = data[blocksize*index:blocksize*index+blocksize]
                for bit in range(len(block)):
                    # workaround to xor str bit by bit; xor is only supported for ints (ord)
                    random_block[bit] = chr(ord(random_block[bit])^ord(block[bit]))
        return ''.join(random_block)

x = Encoder()
piece = os.urandom(1024*1024)
blocksize = 16384
t1 = time.time()
for l in range(100):
    seed = random.getrandbits(32)
    block = x.create_random_block(piece, seed, blocksize)
t2 = time.time()
print 'took ' + str(t2-t1) + 's, avg: ' + str((t2-t1)/100.0) + 's'
Oct 15 '08 #1
My answer is: never do things like this in Python.
You will find this module useful: www.pycrypto.org
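If you do go the PyCrypto route, here's a rough sketch of the kind of
helper it provides (this assumes a PyCrypto version that ships
Crypto.Util.strxor; untested):

from Crypto.Util.strxor import strxor

a = "\x00\x01\x02\x03"
b = "\xff\x01\x00\x03"
# strxor xors two equal-length strings in C, so there is no per-char
# ord()/chr() round trip at the Python level
print repr(strxor(a, b))   # '\xff\x00\x02\x00'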

Oct 15 '08 #2
A few suggestions for your code:
- Use xrange instead of range.
- Loop over lists directly where you can, instead of over their indexes.
- array.array("B", somestring) may help you, because it gives a byte
"view" of a string (see the sketch after this list).
- Using psyco helps a lot for this kind of code.
- I think numpy arrays can contain text/chars too, so it may offer you
ways to speed up your code a lot.
- Generally Python is fit to download pages from the net, to act as
glue between different subsystems, to do bulk string processing,
etc, but for grunt low-level work like this it's often too slow,
and you may want to use other lower-level languages.
- You can use an already-written lib, or use an extension; for example
you can try ShedSkin or Pyd.
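
As a concrete sketch of the array.array idea above (xor_strings is a
hypothetical helper name, untested):

import array

def xor_strings(a, b):
    xa = array.array("B", a)      # unsigned-byte view of the string
    xb = array.array("B", b)
    for i in xrange(len(xa)):     # xor ints directly, no ord()/chr()
        xa[i] ^= xb[i]
    return xa.tostring()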

Bye,
bearophile
Oct 15 '08 #3
On Oct 15, 10:19 pm, Michele <mich...@nectarine.it> wrote:
class Encoder(object):
    def create_random_block(self, data, seed, blocksize):
        number_of_blocks = int(len(data)/blocksize)
        random.seed(seed)
        random_block = ['0'] * blocksize
You possibly mean '\0' i.e. the byte all of whose bits are zero.
        for index in range(number_of_blocks):
            if int(random.getrandbits(1)) == 1:
getrandbits(1) produces a *long* with one random bit. Any good reason
for preferring this to randrange(2) or randint(0, 1)?

So there's a 50% chance that this block will be XORed into the result;
is that what you intend?
                block = data[blocksize*index:blocksize*index+blocksize]
You don't need to slice out block, certainly not so awkwardly.

                for bit in range(len(block)):

Perhaps you mean "byte_index", not "bit".

On my assumption that range(len(block)) is invariant: calculate it
once. If that assumption is incorrect, then so is your code for
calculating the number of blocks; it ignores a possible short block at
the end.

                    random_block[bit] = chr(ord(random_block[bit])^ord(block[bit]))
The chr() and one ord() are utterly wasteful; leave random_block as a
list of ints and do the chr() thing in the return statement.
        return ''.join(random_block)
This will become

    return ''.join(map(chr, random_block))

or

    return ''.join(chr(i) for i in random_block)

as taste or speed dictates :-)

So the whole thing becomes [not tested]:

def create_random_block(self, data, seed, blocksize):
    datalen = len(data)
    assert datalen % blocksize == 0
    random.seed(seed)
    random_block = [0] * blocksize
    block_range = range(blocksize)
    for start in xrange(0, datalen, blocksize):
        if random.randrange(2):
            for x in block_range:
                random_block[x] ^= ord(data[start + x])
    return ''.join(map(chr, random_block))

Looks slightly more athletic than before :-)

BTW, +1 misleading subject of the week; it's not XOR that's slow!!

Cheers,
John
Oct 15 '08 #4
Michele wrote:
I'm trying to encode byte data. Let's not focus on the process of
encoding; in fact, I want to emphasize that the method
create_random_block takes 0.5s to execute (even Java is faster) on
a Dual-Core 3.0GHz machine:

took 46.746999979s, avg: 0.46746999979s
How should I decrease the execution time?
Use numpy. You should be able to swap in the following in your script:

import numpy
from numpy.core import multiarray as ma

class Encoder(object):
    def create_random_block(self, data, seed, blocksize):
        number_of_blocks = len(data)//blocksize
        random.seed(seed)
        random_block = ma.fromstring("0"*blocksize, numpy.uint8)

        for index in range(number_of_blocks):
            if random.getrandbits(1):
                block = ma.fromstring(data[blocksize*index:blocksize*index+blocksize], numpy.uint8)
                random_block ^= block
        return random_block.tostring()

There are absolutely no warranties as I'm just a random tinkerer when it
comes to numpy. But if it works you should get a nice speedup...

Peter
Oct 15 '08 #5
In message <48***********************@reader5.news.tin.it>, Michele wrote:
class Encoder(object):
    def create_random_block(self, data, seed, blocksize):
        number_of_blocks = int(len(data)/blocksize)
        random.seed(seed)
        random_block = ['0'] * blocksize
        for index in range(number_of_blocks):
            if int(random.getrandbits(1)) == 1:
                block = data[blocksize*index:blocksize*index+blocksize]
                for bit in range(len(block)):
                    # workaround to xor str bit by bit; xor is only supported for ints (ord)
                    random_block[bit] = chr(ord(random_block[bit])^ord(block[bit]))
        return ''.join(random_block)
OK, this function is randomly choosing blocks from data (of length
number_of_blocks * blocksize), and xoring them together to produce a single
block of length blocksize.
piece = os.urandom(1024*1024)
Is piece really meant to be random? If so, your create_random_block function
isn't achieving much--xoring random data together isn't going to produce
anything more exciting than less random data than you started with.
Oct 17 '08 #6
On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
Is piece really meant to be random? If so, your create_random_block
function isn't achieving much--xoring random data together isn't going
to produce anything more exciting than less random data than you started
with.

Hmmm... why do you say that xoring random data with other random data
produces less randomness than you started with?

I'm not saying that you're wrong, and certainly it is pointless since
you're not going to improve on the randomness of /dev/urandom without a
lot of work. But less random?

--
Steven
Oct 17 '08 #7
Michele <mi*****@nectarine.it> writes:
I suppose that ord() and chr() are the main problems
yes
How should I decrease the execution time?
See http://nightsong.com/phr/crypto/p3.py which deals with
the same problem by using the array module to do the xors
32 bits at a time.
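
The idea, roughly: reinterpret both strings as arrays of machine words,
so each xor covers several bytes at once. A simplified sketch of the
technique (not the actual p3.py code; assumes equal lengths divisible by
the item size, which is itself platform-dependent):

import array

def xor_strings_wordwise(a, b):
    xa = array.array("I", a)    # view as unsigned ints, typically 4 bytes
    xb = array.array("I", b)
    for i in xrange(len(xa)):
        xa[i] ^= xb[i]          # one xor handles itemsize bytes at once
    return xa.tostring()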
Oct 17 '08 #8
In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
>... why do you say that xoring random data with other random data
produces less randomness than you started with?
blocksize <= number_of_blocks * blocksize
Oct 17 '08 #9
On Fri, 17 Oct 2008 22:45:19 +1300, Lawrence D'Oliveiro wrote:
In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
>On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
>>... why do you say that xoring random data with other random data
produces less randomness than you started with?

blocksize <= number_of_blocks * blocksize
I must be thick, because that looks like a non sequitur to me. I don't
see the relevance.

Of course, that's just another way of saying that:

1 <= number_of_blocks

and I don't see how this relates to whether xoring random data with other
random data decreases the randomness available.
>>> c1 = os.urandom(1)
>>> c2 = os.urandom(1)
>>> c3 = chr( ord(c1)^ord(c2) )
Is it your contention that c3 is more predictable than c1 or c2? If so,
why?

--
Steven
Oct 17 '08 #10
[I think these attributions are right]
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> wrote:
>On Fri, 17 Oct 2008 22:45:19 +1300, Lawrence D'Oliveiro wrote:
>In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
>>... why do you say that xoring random data with other random data
produces less randomness than you started with?
blocksize <= number_of_blocks * blocksize
I must be thick, because that looks like a non sequitur to me. I don't
see the relevance.
Lawrence originally said something along the lines of this just
being a way of taking some random data and producing "less random
data". You're reading it as "(less random) data". The intent (I
assume) is for it to be read as "less (random data)".

Maybe it should be "fewer random data". After all, each byte in
the block is discrete.

--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
"Frankly I have no feelings towards penguins one way or the other"
-- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
Oct 17 '08 #11
In message <Jl*******@news.chiark.greenend.org.uk>, Sion Arrowsmith wrote:
Maybe it should be "fewer random data".
Except these days we tend to think of "data" as being, say, more like "flour"
than "bees", so it's "less data", like "less flour", rather than
like "fewer bees". :)
After all, each byte in the block is discrete.
Data can come in fractional bits. That's how compression works.
Oct 17 '08 #12
On Fri, 17 Oct 2008 13:59:27 +0100, Sion Arrowsmith wrote:
[I think these attributions are right] Steven D'Aprano
<st***@REMOVE-THIS-cybersource.com.au> wrote:
>>On Fri, 17 Oct 2008 22:45:19 +1300, Lawrence D'Oliveiro wrote:
>>In message <01**********************@news.astraweb.com>, Steven
D'Aprano wrote:
... why do you say that xoring random data with other random data
produces less randomness than you started with?
blocksize <= number_of_blocks * blocksize
I must be thick, because that looks like a non sequitur to me. I don't
see the relevance.

Lawrence originally said something along the lines of this just being a
way of taking some random data and producing "less random data". You're
reading it as "(less random) data". The intent (I assume) is for it to
be read as "less (random data)".

Ah. That was how I read it.

Maybe it should be "fewer random data". After all, each byte in the
block is discrete.

Or "a smaller amount of random data".

--
Steven
Oct 19 '08 #13
On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
In message <Jl*******@news.chiark.greenend.org.uk>, Sion Arrowsmith
wrote:
>Maybe it should be "fewer random data".

Except these days we tend to think of "data" being, say, more like
"flour" than "bees", so it's "less data", like "less flour", rather than
like "fewer bees". :)
>After all, each byte in the block is discrete.

Data can come in fractional bits. That's how compression works.
Compression works by removing redundant data so the same amount of useful
information requires fewer bits. If you don't believe me, try compressing
a single bit and see if you get a "fractional bit". I leave it to you to
choose whether the bit should be on or off.

That's why compressing data twice doesn't gain much, if anything. Try
compressing a typical JPEG image, chances are it won't get any smaller.
That's because JPEGs already remove the redundancy in the data,
compressing it. With no redundancy, there can be no compression.

Due to the pigeon-hole principle, any reversible compression scheme must
cause some data to be expanded rather than compressed. (If not, then
there must be two different pieces of data that compress to the same
thing, which means the scheme is not reversible.) If there were
fractional bits, that wouldn't apply, because you can always divide a
pigeon-hole (a bit) into two smaller pigeon-holes.
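
A quick way to see both points at once: random data doesn't compress,
and a second pass only adds container overhead (a rough demonstration
using zlib, nothing more):

import os, zlib

data = os.urandom(100000)      # incompressible input
once = zlib.compress(data)
twice = zlib.compress(once)
# both compressed forms come out slightly *larger* than the input,
# as the pigeon-hole argument predicts for incompressible data
print len(data), len(once), len(twice)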

--
Steven
Oct 19 '08 #14
In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
>Data can come in fractional bits. That's how compression works.

If you don't believe me, try compressing a single bit and see if you get
a "fractional bit".
If both states of the bit are not equally likely, then you do indeed have a
fractional bit, since

nrbits = (- logbase2(P[bit = 0]) - logbase2(P[bit = 1])) / 2
Oct 19 '08 #15
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> wrote:
>
On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
>Is piece really meant to be random? If so, your create_random_block
function isn't achieving much--xoring random data together isn't going
to produce anything more exciting than less random data than you started
with.

Hmmm... why do you say that xoring random data with other random data
produces less randomness than you started with?

I'm not saying that you're wrong, and certainly it is pointless since
you're not going to improve on the randomness of /dev/urandom without a
lot of work. But less random?
For those who got a bit lost here, I'd point out that Knuth[1] has an
excellent chapter on random numbers that includes a detailed discussion of
this effect. His net takeaway is that most of the things people do to
increase randomness actually have exactly the opposite effect.
-----
[1] Knuth, Donald. The Art of Computer Programming, Volume 2,
Seminumerical Algorithms.
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Oct 19 '08 #16
On Sun, 19 Oct 2008 16:38:37 +1300, Lawrence D'Oliveiro wrote:
In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
>On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
>>Data can come in fractional bits. That's how compression works.

If you don't believe me, try compressing a single bit and see if you
get a "fractional bit".

If both states of the bit are not equally likely, then you do indeed
have a fractional bit, since

nrbits = (- logbase2(P[bit = 0]) - logbase2(P[bit = 1])) / 2

That's an arithmetic mean of the logarithms. It doesn't imply that there
are fractional bits any more than an average family having 2.3 children
implies that there are 0.3 of a child wandering around the streets.

Using the Shannon measure of information, you can have messages which
contain fractional information (technically, "surprisal"), when measured
in bits. But that doesn't imply the existence of fractional bits. Look at
it this way: consider a barter economy where I agree to swap 5 chickens
for 2 axes. So each axe is equivalent to 2.5 chickens. But that doesn't
imply that there is such a thing as 0.5 of a chicken -- at least not a
*live* chicken. While I can blithely talk about bartering fractional
chickens, in practice when I actually go to make good on my promise, it
must be an integer number of chickens.

Similarly, we can talk about messages containing fractional bits of
information, but when we actually store or transmit that message in
practice, we can only use integer numbers of bits.

As Wikipedia puts it:

It is important to differentiate between the use of "bit" in referring to
a discrete storage unit and the use of "bit" in referring to a
statistical unit of information. The bit, as a discrete storage unit, can
by definition store only 0 or 1. A statistical bit is the amount of
information that, on average[citation needed], can be stored in a
discrete bit. ... If these two ideas need to be distinguished, sometimes
the name bit is used when discussing data storage while shannon is used
for the statistical bit.

http://en.wikipedia.org/wiki/Bit

--
Steven
Oct 19 '08 #17
On Sun, 19 Oct 2008 04:38:04 +0000, Tim Roberts wrote:
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> wrote:
>>
On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
>>Is piece really meant to be random? If so, your create_random_block
function isn't achieving much--xoring random data together isn't going
to produce anything more exciting than less random data than you
started with.

Hmmm... why do you say that xoring random data with other random data
produces less randomness than you started with?

I'm not saying that you're wrong, and certainly it is pointless since
you're not going to improve on the randomness of /dev/urandom without a
lot of work. But less random?

For those who got a bit lost here, I'd point out that Knuth[1] has
an excellent chapter on random numbers that includes a detailed
discussion of this effect. His net takeaway is that most of the things
people do to increase randomness actually have exactly the opposite
effect.
I don't doubt it at all. But xoring random data with more random data?
I'm guessing that if the two sources of data are independent and from the
same distribution, then xoring them is pointless but not harmful. Here's
a rough-and-ready test which suggests there's little harm in it:

>>> import os, math
>>> def rand_data(size):
...     return [ord(c) for c in os.urandom(size)]
...
>>> def mean(data):
...     return sum(data)/len(data)
...
>>> def stdev(data):
...     return math.sqrt( mean([x**2 for x in data]) - mean(data)**2 )
...
>>> A = rand_data(1000) # good random data
>>> B = rand_data(1000) # more good random data
>>> AB = [a^b for (a,b) in zip(A, B)] # is this still good random data?
>>> assert len(AB) == len(A) == len(B)
>>> mean(A), stdev(A)
(126, 73.91887445030531)
>>> mean(B), stdev(B)
(128, 74.242844773082339)
>>> mean(AB), stdev(AB)
(129, 74.39085965358916)
Note: I wouldn't take the above terribly seriously. Mean and standard
deviation alone are terrible measures of the randomness of data. But this
does suggest that any deviation from uniform randomness will be quite
subtle.

--
Steven
Oct 19 '08 #18
On Oct 19, 7:13 am, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
On Sun, 19 Oct 2008 04:38:04 GMT, Tim Roberts <t...@probo.com> declaimed
the following in comp.lang.python:
For those who got a bit lost here, I'd point out that Knuth[1] has an
excellent chapter on random numbers that includes a detailed discussion of
this effect. His net takeaway is that most of the things people do to
increase randomness actually have exactly the opposite effect.

Some decade I'll have to obtain his volumes... But they've never
shown up in a $60 special offer from a book club (unlike the compact
editions of the OED) <G>.

And while XOR may seem significant, just consider die rolls...

If each "byte" were one die roll, you'd expect a nearly even
distribution... (for a 6-sided die, 1/6 would have each value). But
using the sum of two die, you begin to get a bell curve: 2 and 12
appear 1/36 of the time (each), but 7 occurs 6/36 of the time. Use three
die, and it gets worse: 3 and 18 occur 1/216, "10.5" occurs much more
often...
That should be one die, two dice, etc. :-)
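
For what it's worth, a brute-force enumeration confirms the bell curve
Dennis describes for two dice:

counts = {}
for a in range(1, 7):
    for b in range(1, 7):
        counts[a + b] = counts.get(a + b, 0) + 1
# 2 and 12 each occur once out of 36 outcomes, 7 occurs six times
print counts[2], counts[7], counts[12]   # 1 6 1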
Oct 19 '08 #19
Lawrence D'Oliveiro wrote:
In message <01**********************@news.astraweb.com>, Steven D'Aprano
wrote:
>On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
>>Data can come in fractional bits. That's how compression works.
If you don't believe me, try compressing a single bit and see if you get
a "fractional bit".

If both states of the bit are not equally likely, then you do indeed have a
fractional bit, since

nrbits = (- logbase2(P[bit = 0]) - logbase2(P[bit = 1])) / 2
What's happening here is that the two different meanings of "bit" are
being confused. A bit is both a binary digit and a measure of information.

Obviously you can't create a bit stream with half a bit in it.

In a coding system where all messages of N binary digits are equally
likely then each message contains N bits of information content. This is
the theoretical upper bound on the information content.

In most practical systems, however, the messages have differing
probabilities; then an N-binary-digit message conveys less than N bits
of information, as Lawrence indicated above. Fractional bits are
perfectly valid as a measure of information content.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Oct 19 '08 #20
Steven D'Aprano wrote:
I don't doubt it at all. But xoring random data with more random data?
I'm guessing that if the two sources of data are independent and from the
same distribution, then xoring them is pointless but not harmful. [...]
Note: I wouldn't take the above terribly seriously. Mean and standard
deviation alone are terrible measures of the randomness of data. But this
does suggest that any deviation from uniform randomness will be quite
subtle.
Operations like 'and' and 'or' will tend to destroy randomness. 'and'
tends to the 0-string and 'or' tends to the 1-string. I feel like 'xor'
should be safe (like Steven), but is the proof merely the half-and-half
split of the truth table?
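
I'd say so: for either fixed value of one input, xor is a bijection on
the other input, so a uniform independent bit stays uniform. A tiny
enumeration (just a sketch of the argument):

for b in (0, 1):
    # for each fixed b, the two values of a map one-to-one onto the
    # two outputs, so P(a^b = 0) = P(a^b = 1) when a is uniform
    print b, sorted(a ^ b for a in (0, 1))   # always prints [0, 1]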

Oct 19 '08 #21
In message <gd**********@lust.ihug.co.nz>, Lawrence D'Oliveiro wrote:
If both states of the bit are not equally likely, then you do indeed have
a fractional bit, since

nrbits = (- logbase2(P[bit = 0]) - logbase2(P[bit = 1])) / 2
Oops, sorry, the formula should of course be

nrbits = - P[bit = 0] * logbase2(P[bit = 0])
- P[bit = 1] * logbase2(P[bit = 1])
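
Plugging numbers into the corrected formula makes it concrete (my worked
example, not from the original post): a bit that is 0 ninety percent of
the time carries well under one full bit of information:

from math import log

p0, p1 = 0.9, 0.1
nrbits = -p0 * log(p0, 2) - p1 * log(p1, 2)
print nrbits   # ~0.469 bits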
Oct 19 '08 #22
