
xor incongruences

Hi,
I wrote a simple encoder in Python and in Java; both do the same
computations with the same inputs, yet they don't produce the same
output.
Let me explain with code.

First of all, you need a test file for input:
$ dd if=/dev/urandom of=test.img bs=1048576 count=1
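
On a system without dd, the same kind of test file could be produced from Python. This is only a sketch in Python 3 (the posted code is Python 2); make_test_file is a hypothetical helper name, and the 1 MiB default mirrors the bs=1048576 count=1 arguments above.

```python
import os

def make_test_file(path, size=1048576):
    """Write `size` random bytes to `path` (hypothetical helper, not in the original post)."""
    with open(path, 'wb') as f:
        f.write(os.urandom(size))
```

Calling make_test_file('test.img') would create an input of the same size as the dd command does.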

I have attached the code.
As you can see, both programs XOR the same inputs (the bitlists are the
same, so the blocks selected for XOR are the same; you can easily check,
because whenever a block is XORed both programs print its hash).
But here is the strange part: the random_block that the
create_random_block function returns is not the same. In Java this block
has a hash that is different from the Python one.
Why?

Thank you

import os
import sha
import sys

class EncoderDecoder(object):
    def create_random_block(self, piece, blocksize=16384):
        if len(piece) % blocksize != 0:
            raise Exception('size error')
        self.N = len(piece)/blocksize
        random_block = ['0']*blocksize
        bitlist = [1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1]
        for i in range(len(bitlist)):
            if bitlist[i] == 1:
                # select block i and print its hash
                block = piece[i*blocksize:i*blocksize+blocksize]
                print '%d-%d %s' % (i*blocksize, i*blocksize+blocksize, sha.new(block).hexdigest())
                # XOR the block into the accumulator
                for j in range(blocksize):
                    random_block[j] = chr(ord(random_block[j]) ^ ord(block[j]))
        print sha.new(''.join(random_block)).hexdigest()
        return ''.join(random_block)

if __name__ == '__main__':
    # read the test file created with dd above (this line was missing from the posted code)
    data = open('test.img', 'rb').read()
    x = EncoderDecoder()
    x.create_random_block(data)
    sys.exit(0)

import java.io.FileInputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Test {
    int N;

    public byte[] create_random_block(byte[] piece)
            throws NoSuchAlgorithmException, UnsupportedEncodingException {
        N = piece.length / 16384;
        byte[] random_block = new byte[16384];
        int[] bitlist = { 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0,
                0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0,
                1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1,
                0, 1, 0, 1 };
        for (int i = 0; i < N; i++) {
            if (bitlist[i] == 1) {
                byte[] block = new byte[16384];
                for (int j = 0; j < block.length; j++) {
                    block[j] = piece[i * 16384 + j];
                }
                System.out.println(i * 16384 + "-" + (i * 16384 + 16384)
                        + " " + AeSimpleSHA1.SHA1(block));
                for (int j = 0; j < random_block.length; j++) {
                    random_block[j] = (byte) (random_block[j] ^ block[j]);
                }
            }
        }
        System.out.println(AeSimpleSHA1.SHA1(random_block));
        return random_block;
    }

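    public static void main(String[] args) throws IOException,
            NoSuchAlgorithmException {
        byte data[] = new byte[1024 * 1024];
        FileInputStream fi = new FileInputStream("test.img");
        // Fill data from the file; the posted code opened the stream but never
        // read it. read() may return fewer bytes than requested, hence the loop.
        int off = 0;
        while (off < data.length) {
            int n = fi.read(data, off, data.length - off);
            if (n < 0) {
                break;
            }
            off += n;
        }
        fi.close();
        Test x = new Test();
        x.create_random_block(data);
        System.exit(0);
    }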

    public static class AeSimpleSHA1 {
        private static String convertToHex(byte[] data) {
            StringBuffer buf = new StringBuffer();
            for (int i = 0; i < data.length; i++) {
                int halfbyte = (data[i] >> 4) & 0x0F;
                int two_halfs = 0;
                do {
                    if ((0 <= halfbyte) && (halfbyte <= 9))
                        buf.append((char) ('0' + halfbyte));
                    else
                        buf.append((char) ('a' + (halfbyte - 10)));
                    halfbyte = data[i] & 0x0F;
                } while (two_halfs++ < 1);
            }
            return buf.toString();
        }

        public static String SHA1(byte[] text) throws NoSuchAlgorithmException,
                UnsupportedEncodingException {
            MessageDigest md;
            md = MessageDigest.getInstance("SHA-1");
            byte[] sha1hash = new byte[40];
            md.update(text);
            sha1hash = md.digest();
            return convertToHex(sha1hash);
        }
    }

}

Oct 16 '08 #1
Michele:
> in Java this block has a hash which is different from the Python one.
Note that integers in Python are multiprecision by default;
this may cause differences.
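
For the XOR in this particular code, integer width seems unlikely to matter, because XORing two 8-bit values never needs more than 8 bits. What does differ is that a Java byte is signed. The sketch below (Python 3; to_java_byte is a hypothetical helper, not something from the posted code) checks the first point exhaustively and shows the mapping for the second.

```python
def to_java_byte(value):
    """Map an unsigned byte value (0..255) to Java's signed byte range (-128..127)."""
    return value - 256 if value > 127 else value

# XOR of two values in 0..255 always stays in 0..255.
assert all(0 <= (a ^ b) <= 255 for a in range(256) for b in range(256))

# 0xF0 ^ 0x0F is 0xFF, which Java would show as -1 when stored in a byte.
print(to_java_byte(0xF0 ^ 0x0F))  # -1
```

The bit pattern is the same in both languages, so a hash computed over the bytes should not be affected by the sign.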

You can put some prints in various stages of the data flow (or
breakpoints for your debuggers, etc) to spot where the variable
contents start to differ.
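
One way to apply this is to record a hash of the accumulator after every XORed block and compare the two lists. The following is a minimal Python 3 sketch under stated assumptions: xor_checkpoints and first_divergence are hypothetical helper names, the initial parameter is the fill value of the accumulator, and the second run stands in for whatever the other implementation prints.

```python
import hashlib

def xor_checkpoints(piece, bitlist, blocksize, initial=0):
    """Return (block index, SHA-1 of the accumulator) after each XORed block."""
    acc = bytearray([initial]) * blocksize
    checkpoints = []
    for i, bit in enumerate(bitlist):
        if bit != 1:
            continue
        block = piece[i * blocksize:(i + 1) * blocksize]
        for j in range(blocksize):
            acc[j] ^= block[j]
        checkpoints.append((i, hashlib.sha1(bytes(acc)).hexdigest()))
    return checkpoints

def first_divergence(a, b):
    """Index of the first checkpoint where the two runs differ, or None."""
    for idx, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return idx
    return None

piece = bytes(range(8))
run_a = xor_checkpoints(piece, [1, 1], blocksize=4)
run_b = xor_checkpoints(piece, [1, 1], blocksize=4, initial=48)
print(first_divergence(run_a, run_b))  # 0
```

A divergence at the very first checkpoint points at the initial state of the accumulator rather than at the XOR loop.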

Bye,
bearophile
Oct 16 '08 #2
Michele wrote:
> random_block = ['0']*blocksize
should be

random_block = ['\0']*blocksize

As John Machin already told you in another thread -- the character "0" is
not the same as the 0-byte "\0".
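
The difference can be checked directly in an interpreter. A small sketch (Python 3 syntax; the variable names and the sample value 0xA5 are chosen only for illustration):

```python
zero_char = '0'    # the digit character
nul_char = '\0'    # the NUL byte

print(ord(zero_char))  # 48
print(ord(nul_char))   # 0

value = 0xA5
print(value ^ ord(nul_char) == value)   # True: XOR with 0 changes nothing
print(value ^ ord(zero_char) == value)  # False: XOR with 48 flips two bits
```

So an accumulator filled with '0' does not start from a neutral value for XOR.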

Peter
Oct 16 '08 #3
On Oct 17, 7:02 am, Michele <mich...@nectarine.it> wrote:
> Hi,
> I wrote a simple encoder in Python and in Java; both do the same
> computations with the same inputs
No they don't.
> yet they don't produce the same
> output.
> Let me explain with code.
You have a strange understanding of the word "explain".
[test.py 1K]
>         random_block = ['0']*blocksize
This initialises each element to '0'. ord('0') is 48, not 0. Either
I'm hallucinating, or I pointed this out to you in response to your
previous posting (within the last few days).
>                     random_block[j] = chr(ord(random_block[j]) ^ ord(block[j]))

[Test.java 2K]
>                 byte[] random_block = new byte[16384];
Presumably Java initialises each element to 0 (or maybe random
gibberish); it is highly unlikely to be 48!
>                     random_block[j] = (byte) (random_block[j] ^ block[j]);
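
Java does zero-initialise the elements of a new byte[] array, so the Java accumulator starts at 0 while the Python one starts at 48. The effect can be reproduced in Python alone. This is a sketch in Python 3 rather than the Python 2 of the posted code; the fill parameter, the short bitlist and the smaller block size are assumptions made to keep the example small.

```python
import os

def create_random_block(piece, bitlist, blocksize=16384, fill=0):
    """XOR together the blocks of `piece` selected by `bitlist`."""
    if len(piece) % blocksize != 0:
        raise ValueError('size error')
    random_block = bytearray([fill]) * blocksize  # accumulator
    for i, bit in enumerate(bitlist):
        if bit == 1:
            block = piece[i * blocksize:(i + 1) * blocksize]
            for j in range(blocksize):
                random_block[j] ^= block[j]
    return bytes(random_block)

piece = os.urandom(4 * 1024)
bitlist = [1, 0, 1, 1]
good = create_random_block(piece, bitlist, blocksize=1024)                 # starts from 0, like Java
bad = create_random_block(piece, bitlist, blocksize=1024, fill=ord('0'))   # starts from 48

print(good != bad)                          # True
print(bytes(b ^ 48 for b in bad) == good)   # True
```

Every byte of the '0'-filled result is off by exactly XOR 48, which is enough to change the SHA-1 of the block.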
Oct 16 '08 #4