473,383 Members | 1,795 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,383 software developers and data experts.

Fast Bit Operations Contest

Hello,

The objective of this contest is to set and clear a bit at an arbitrary
memory address as fast as possible.

Implement one or multiple prototypes to take part in the contest:

// 8 bit versions
// 1 in 256 bits
procedure SetBit_8bit( BaseAddress : pointer; BitIndex : byte );
function GetBit_8bit( BaseAddress : pointer; BitIndex : byte ) : boolean;
procedure ClearBit_8bit( BaseAddress : pointer; BitIndex : byte );
procedure XorBit_8bit( BaseAddress : pointer; BitIndex : int64 );

// 16 bit versions
// 1 in 2^16 bits
procedure SetBit_16bit( BaseAddress : pointer; BitIndex : word );
function GetBit_16bit( BaseAddress : pointer; BitIndex : word ) : boolean;
procedure ClearBit_16bit( BaseAddress : pointer; BitIndex : word );
procedure XorBit_16bit( BaseAddress : pointer; BitIndex : word );

// 32 bit versions
// 1 in 2^32 bits
procedure SetBit_32bit( BaseAddress : pointer; BitIndex : longword );
function GetBit_32bit( BaseAddress : pointer; BitIndex : longword ) :
boolean;
procedure ClearBit_32bit( BaseAddress : pointer; BitIndex : longword );
procedure XorBit_32bit( BaseAddress : pointer; BitIndex : longword );

// 64 bit versions
// 1 in 2^32 * 8 bits
procedure SetBit_64bit( BaseAddress : pointer; BitIndex : int64 );
function GetBit_64bit( BaseAddress : pointer; BitIndex : int64 ) : boolean;
procedure ClearBit_64bit( BaseAddress : pointer; BitIndex : int64 );
procedure XorBit_64bit( BaseAddress : pointer; BitIndex : int64 );

All implementations will be bug tested, examined, and performance tested.

Code can be in C, Delphi or Assembler.

Code should work on 286 and above.

( Hint: 286 code will run on 386,486,pentium, etc <- backwards compatible)

( Hint: Single 386 BTS instruction is slow )

( Hint: I already have a pretty fast 8 bit implementation )

Processor specific code is to be mentioned and still interesting.

Processor detection code is also interesting.

Results will be posted along the way.

Multiple entries/versions are allowed.

Final results will be posted as well.

Bye,
Skybuck.
Jan 6 '07 #1
7 2686
In case MMX instructions can also speed it up here is a possible MMX
prototype:

procedure SetFourBits_8bit(
BaseAddress1 : pointer; BitIndex1 : byte;
BaseAddress2 : pointer; BitIndex2 : byte;
BaseAddress3 : pointer; BitIndex3 : byte;
BaseAddress4 : pointer; BitIndex4 : byte );

etc...

Bye,
Skybuck.
Jan 6 '07 #2
"Skybuck Flying" <sp**@hotmail.comwrote in message
news:en**********@news1.zwoll1.ov.home.nl...
In case MMX instructions can also speed it up here is a possible MMX
prototype:

procedure SetFourBits_8bit(
BaseAddress1 : pointer; BitIndex1 : byte;
BaseAddress2 : pointer; BitIndex2 : byte;
BaseAddress3 : pointer; BitIndex3 : byte;
BaseAddress4 : pointer; BitIndex4 : byte );
Euhm MMX register 64 bit... means 8 bytes so it could be:

procedure SetFourBits_8bit(
BaseAddress1 : pointer; BitIndex1 : byte;
BaseAddress2 : pointer; BitIndex2 : byte;
BaseAddress3 : pointer; BitIndex3 : byte;
BaseAddress4 : pointer; BitIndex4 : byte);
BaseAddress5 : pointer; BitIndex5 : byte;
BaseAddress6 : pointer; BitIndex6 : byte;
BaseAddress7 : pointer; BitIndex7 : byte;
BaseAddress8 : pointer; BitIndex8 : byte );

8 random access bits in memory ;)

Bye,
Skybuck.
Jan 7 '07 #3
Or just:

procedure SetFourBits_8bit(
BaseAddress : pointer;
BitIndex0 : byte;
BitIndex1 : byte;
BitIndex2 : byte;
BitIndex3 : byte;
BitIndex4 : byte;
BitIndex5 : byte;
BitIndex6 : byte;
BitIndex7 : byte );

If all bit indexes part of the same memory/data structure ;)

Bye,
Skybuck.
Jan 7 '07 #4

"Skybuck Flying" <sp**@hotmail.comwrote in message
news:en**********@news5.zwoll1.ov.home.nl...
Or just:

procedure SetFourBits_8bit(
BaseAddress : pointer;
BitIndex0 : byte;
BitIndex1 : byte;
BitIndex2 : byte;
BitIndex3 : byte;
BitIndex4 : byte;
BitIndex5 : byte;
BitIndex6 : byte;
BitIndex7 : byte );

Hmmm probably not possible with the current setbit algorithm.

MMX/SSE1/2/3 has no divide or shr instruction which works on bytes !?

Only words supported... there isn't even a divide instruction... hmm.

One could use floating points but that sounds slow.

Or one could still try to do 8 at the same time by using multiple mmx
registers and shifting them...

For now I ll try to implement 4 bit indexes

Bye,
Skybuck.
Jan 7 '07 #5
Hmmm,

Maybe it's still possible to do a 64 bit shift on all bytes at the same
time.

And later and them with a special mask to clear the upper bits of each byte
to zero ;)

So 8 bit indexes might still be possible ;)

Bye,
Skybuck.
Jan 7 '07 #6
Holyshit the pand instruction is 128 bit...

and mmx registers/technology is extended in xmm registers and technology...
hmmm

mmx <-xmm could be confusion but ok...

So maybe with SSE2 is even possible to set 16 bit indexes at the same time.

Holyshit ?!

Except maybe for the final writing to memory on independant memory locations
but still the calculation could be performed at the same time ! :)

Gonna try it :)

Bye,
Skybuck.
Jan 7 '07 #7
Hello,

Today I made a new xmm version which can set 8 indexes at the same time.

Before I actually coded it I estimated the speed based on the latency in the
optimization manual and I estimated it would be slower than the non-xmm
version.

I still decided to test it to be sure... and indeed the benchmark does show
the xmm version to be slower...

I actually wanted to use a loop to make the code 8 times shorther but
unfortunately the extract word instruction does not except a parameter for
the word index/offset... so all code
had to be repeated 8 times !

The overhead is probably in pushing the xmm data into the stack, loading it,
extracting it etc <- extra instructions needed.

Still interesting to see though, maybe somebody knows some handy to tricks
to speed it up a little further.

I shall also compare it to the other benchmark method from Herbert...
haven't done that yet gonna do it right away ;) :) to see the results =D

type
Txmm8Indexes = packed record
mIndex : packed array[0..7] of word;
end;

type
Txmm8IndexesAndMask = packed record
mMask : packed array[0..7] of word;
end;

procedure SetBit_16bit_8indexes_xmm( BaseAddress : pointer; Xmm8Indexes :
Txmm8Indexes );
const
vXmm8IndexesAndMask : Txmm8IndexesAndMask = ( mMask : (7,7,7,7, 7,7,7,7) );
begin
asm
// step 1. Copy BitIndexes into BytePositions (xmm0) and BitPositions
(xmm1)
movdqu xmm0, Xmm8Indexes
movdqu xmm1, xmm0

// step 2. Divide the BitIndexes in BytePositions(xmm0) by 8 (shr 3) to
get the true BytePositions (xmm0)
psrlw xmm0, $3

// step 3. Mod the BitIndexes in BitPositions(xmm1) by 8 (and 7) to get
the true BitPositions(xmm1)
pand xmm1, vXmm8IndexesAndMask

// optimize step: store BaseAddress in edx
mov edx, BaseAddress

// Repeat 8 times:

// BitIndex 0

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 0

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 0

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 1

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 1

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 1

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 2

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 2

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 2

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 3

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 3

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 3

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 4

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 4

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 4

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 5

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 5

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 5

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 6

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 6

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 6

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 7

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 7

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 7

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch
end;
end;

Bye,
Skybuck.
Jan 8 '07 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

18
by: Michele Simionato | last post by:
I posted this few weeks ago (remember the C Sharp thread?) but it went unnoticed on the large mass of posts, so let me retry. Here I get Python+ Psyco twice as fast as optimized C, so I would like...
0
by: Sridhar | last post by:
Hi, We, the students of CEG, Anna University are organizing an online programming contest as part of aBaCus 2005. The contest itself will start on 6th March 2005 at 1:00 pm IST and will end...
0
by: Sridhar | last post by:
Hi, We, the students of CEG, Anna University are organizing an online programming contest as part of aBaCus 2005. The contest itself will start on 6th March 2005 at 1:00 pm IST and will end...
0
by: anujb | last post by:
----------------------------------------------------------------------- We are very Pleased to announce IOPC-05 - The International Online Programming Contest. INTRODUCTION - IOPC is...
24
by: Alex Vinokur | last post by:
Consider the following statement: n+i, where i = 1 or 0. Is there more fast method for computing n+i than direct computing that sum? -- Alex Vinokur email: alex DOT vinokur AT gmail DOT...
20
by: GS | last post by:
The stdint.h header definition mentions five integer categories, 1) exact width, eg., int32_t 2) at least as wide as, eg., int_least32_t 3) as fast as possible but at least as wide as, eg.,...
6
by: thecodemachine | last post by:
Hi, I'm looking for a fast and simple one to one hash function, suitable for longer strings (up to 2048 in length). I'd like keys to be relatively short, I doubt I'd be creating more than 256...
0
by: Tom 7 | last post by:
Language lovers: Registration is now open for the 9th Annual ICFP Programming Contest! http://icfpcontest.org/ The contest, associated with the International Conference on Functional...
22
by: SETT Programming Contest | last post by:
The SETT Programming Contest: The fastest set<Timplementation Write the fastest set<Timplementation using only standard C++/C. Ideally it should have the same interface like std::set. At least...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.