473,322 Members | 1,431 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,322 software developers and data experts.

Fast and safe method for XOR folding

Hi,

I'm looking for a computationally fast means of XOR folding
the individual bytes of an unsigned 32-bit int, and an unsigned
16-bit int. Can I point a char * at the address of each of
the int's (in succession), and safely access each byte?
Danger, danger, Will Robinsin!?? Any better ideas?? I'm
assuming this is faster than a series of 8-bit right shifts
followed by ORs with 255 as input to the XOR. E.g.

int i;
char *base;
uint32_t y = <blah>;
uint8_t x = 0;

base = (char *) &y;

for (i=0; i<4; i++) {
x ^= *(base+i);
}

(yes, differences in endianess will result in different byte ordering,
which might be important if only certain bytes were to be factored
into the result.)

versus

int i;
uint32_t y = <blah>;
uint8_t x = 0;

for (i=0; i<4; i++) {
x ^= ((t >> (8*i)) | 255); /* or some destructive variant for t */
/* ...to avoid the multiplication
*/
}

Also, should this result in too many collisions (as a hash key),
widening the space will have to result in shifts, e.g. 9 or 10-bits
plus a small number of bits left over... perhaps just better to do
it that way to begin with?

TIA,
Robert

Apr 4 '06 #1
5 6593
ro***********@gmail.com wrote:

Hi,

I'm looking for a computationally fast means of XOR folding
the individual bytes of an unsigned 32-bit int, and an unsigned
16-bit int. Can I point a char * at the address of each of
the int's (in succession), and safely access each byte?
Danger, danger, Will Robinsin!?? Any better ideas?? I'm
assuming this is faster than a series of 8-bit right shifts
followed by ORs with 255 as input to the XOR. E.g.

int i;
char *base;
Make that

unsigned char *base;

unsigned char is the most appropriate type
for bitwise operations on bytes.
Having a sign bit introduces complications.
uint32_t y = <blah>;
uint8_t x = 0;

base = (char *) &y;

for (i=0; i<4; i++) {
x ^= *(base+i);
}

(yes, differences in endianess will result in different byte ordering,
which might be important if only certain bytes were to be factored
into the result.)

versus

int i;
uint32_t y = <blah>;
uint8_t x = 0;

for (i=0; i<4; i++) {
x ^= ((t >> (8*i)) | 255); /* or some destructive variant for t */
/* ...to avoid the multiplication
*/
}

Also, should this result in too many collisions (as a hash key),
widening the space will have to result in shifts, e.g. 9 or 10-bits
plus a small number of bits left over... perhaps just better to do
it that way to begin with?


--
pete
Apr 4 '06 #2
ro***********@gmail.com wrote:
Hi,

I'm looking for a computationally fast means of XOR folding
the individual bytes of an unsigned 32-bit int, and an unsigned
16-bit int.
I can't find the 16-bit value in your example.
Can I point a char * at the address of each of
the int's (in succession), and safely access each byte?
Yes. As "pete" has already noted, though, it would be
better to use an `unsigned char*'.
Danger, danger, Will Robinsin!?? Any better ideas?? I'm
assuming this is faster than a series of 8-bit right shifts
followed by ORs with 255 as input to the XOR.
Such assumptions are (1) not justifiable in terms of the
C language, but only in terms of particular implementations,
and (2) stupid. Measure, measure, measure! (Better yet, ask
yourself how much difference it could possibly make. If you
shave three milliwiggles off the computation of a byte you
will then transmit via carrier pigeon, what have you saved?)
[...]
(yes, differences in endianess will result in different byte ordering,
which might be important if only certain bytes were to be factored
into the result.)
Thought exercise: Can you find four 8-bit values such that
the computed XOR of all four depends on the order in which they
are presented to the computation? (Incentive: If you find them,
you will have earned immortality as the person who exposed the
fundamental fallacies of Boolean algebra and overthrew the
foundations of computing. Everlasting fame -- go for it!)
Also, should this result in too many collisions (as a hash key),
widening the space will have to result in shifts, e.g. 9 or 10-bits
plus a small number of bits left over... perhaps just better to do
it that way to begin with?


I find myself unable to make sense of this paragraph. You
evidently have some sort of purpose in mind for this computation
on the bytes of an integer, and there's a hint that it might
have something to do with hashing -- but there's certainly not
enough information here for anyone to say whether X or Y is
"better."

--
Eric Sosman
es*****@acm-dot-org.invalid
Apr 5 '06 #3
Eric Sosman wrote:
Thought exercise: Can you find four 8-bit values such that
the computed XOR of all four depends on the order in which they
are presented to the computation?


If you have a four byte unsigned type and a two byte
unsigned data type, both with no padding,
then it can be done with only 2 XOR operations,
by folding it in half, twice.

0x12u ^ 0x34 ^ 0x56 ^ 0x78 is 0x8
0x12345678 folded in half twice is 0x8
0x12345678 folded sequentialy is 0x8

/* BEGIN new.c */

#include <stdio.h>
#include <assert.h>
#include <limits.h>

int main(void)
{
long unsigned data = 0x12345678;
unsigned short temp;
unsigned char folded;

assert(sizeof(long) == 4 && sizeof(short) == 2
&& ULONG_MAX == 0xffffffff && USHRT_MAX == 0xffff);

printf("0x12u ^ 0x34 ^ 0x56 ^ 0x78 is 0x%x\n",
0x12u ^ 0x34 ^ 0x56 ^ 0x78);

temp = (unsigned short)
(*(unsigned short *)&data ^ *((unsigned short *)&data + 1));
folded = (unsigned char)
(*(unsigned char *)&temp ^ *((unsigned char *)&temp + 1));
printf("0x12345678 folded in half twice is 0x%x\n",
(unsigned)folded);

folded = (unsigned char)
(* (unsigned char *)&data
^ *((unsigned char *)&data + 1)
^ *((unsigned char *)&data + 2)
^ *((unsigned char *)&data + 3));
printf("0x12345678 folded sequentialy is 0x%x\n",
(unsigned)folded);
return 0;
}

/* END new.c */

--
pete
Apr 5 '06 #4
ro***********@gmail.com wrote:
Hi,

I'm looking for a computationally fast means of XOR folding
the individual bytes of an unsigned 32-bit int, and an unsigned
16-bit int. ...

int i;
char *base;
uint32_t y = <blah>;
uint8_t x = 0;

base = (char *) &y;

for (i=0; i<4; i++) {
x ^= *(base+i);
}

(yes, differences in endianess will result in different byte ordering,
which might be important if only certain bytes were to be factored
into the result.)


You seem to be ignoring the golden rule that the person who wrote the
compiler is often a better programmer than you are (or I am.)

uint32_t y = <blah>;
uint32_t z = y ^ (y >> 16);
uint8_t x = z ^ (z >> 8);

As Eric says, measure, measure, measure.

--
Peter

Apr 5 '06 #5
Eric Sosman wrote:
I'm looking for a computationally fast means of XOR folding
the individual bytes of an unsigned 32-bit int, and an unsigned
16-bit int.


I can't find the 16-bit value in your example.


It wasn't present, but the principle was.
Can I point a char * at the address of each of
the int's (in succession), and safely access each byte?


Yes. As "pete" has already noted, though, it would be
better to use an `unsigned char*'.
Danger, danger, Will Robinsin!?? Any better ideas?? I'm
assuming this is faster than a series of 8-bit right shifts
followed by ORs with 255 as input to the XOR.


Such assumptions are (1) not justifiable in terms of the
C language, but only in terms of particular implementations,
and (2) stupid. Measure, measure, measure! (Better yet, ask
yourself how much difference it could possibly make. If you
shave three milliwiggles off the computation of a byte you
will then transmit via carrier pigeon, what have you saved?)


Yes; the advice is well taken. Amdahls' law, work to make the
common case fast, etc. In this case, the carrier pigeons have
already come home to roost.
[...]
(yes, differences in endianess will result in different byte ordering,
which might be important if only certain bytes were to be factored
into the result.)


Thought exercise: Can you find four 8-bit values such that
the computed XOR of all four depends on the order in which they
are presented to the computation? (Incentive: If you find them,
you will have earned immortality as the person who exposed the
fundamental fallacies of Boolean algebra and overthrew the
foundations of computing. Everlasting fame -- go for it!)


No, but it is quite possible that a subset of the total number of
8-bit bytes contains more information and therefore yields a
better result. Therefore the computation might justifiably be
limited to those bytes, and they are on one end or the other
because the endianess of the underlying implementation, then it
makes a difference. Sorry -- I was vague. I am attempting to
learn through experimentation. If you are like all the people I
know, then you learned a lot the same way; if not maybe you
will be the immortal one. :-)
Also, should this result in too many collisions (as a hash key),
widening the space will have to result in shifts, e.g. 9 or 10-bits
plus a small number of bits left over... perhaps just better to do
it that way to begin with?


I find myself unable to make sense of this paragraph. You
evidently have some sort of purpose in mind for this computation
on the bytes of an integer, and there's a hint that it might
have something to do with hashing -- but there's certainly not
enough information here for anyone to say whether X or Y is
"better."


I should not have asked for the value judgement -- I'll measure! :-)
Thanks for the response, Eric!!!

-r

Apr 5 '06 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
by: jf | last post by:
Hi, I've a big file, and I want to count the number of lines. I tried this: $i=0; $fp =fopen('file','r'); while ( ! feof ($fp) ) { $ligne = fgets($fp,8096); $i++;
11
by: Jeremy Pridmore | last post by:
Hi All, Here's a challenge. If there is a 'one word' answer to this, i.e. the name of a built in SQL Server function I can use, that would be great! However... I have a standard 1:m...
7
by: Anshul Sawant | last post by:
Have a look at the following code: #include<iostream> using namespace std; int main() { const int a = 1; const int* const aptr1 = &a;
13
by: William Stacey | last post by:
FYI. /// <summary> /// Author: William Stacey /// Fast and simple way to implement a singleton pattern without resorting /// to nested classes or other static vodo. Can also be easily converted...
14
by: John Salerno | last post by:
Specifically, I'm using UltraEdit and perhaps there's no way perfect way to implement code folding with it, given how it uses its syntax highlighting file to do so (i.e., you have to specify an...
95
by: hstagni | last post by:
Where can I find a library to created text-based windows applications? Im looking for a library that can make windows and buttons inside console.. Many old apps were make like this, i guess ...
2
by: email.ttindustries | last post by:
Hello, I am a C/C++ newbie and I have a simple question concerning fast data writing to binary files. Are there any other faster methods than the standard write() method to write to binary data...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.