473,624 Members | 2,025 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Fast Bit Operations Contest

Hello,

The objective of this contest is to set and clear a bit at an arbitrary
memory address as fast as possible.

Implement one or multiple prototypes to take part in the contest:

// 8 bit versions
// 1 in 256 bits
procedure SetBit_8bit( BaseAddress : pointer; BitIndex : byte );
function GetBit_8bit( BaseAddress : pointer; BitIndex : byte ) : boolean;
procedure ClearBit_8bit( BaseAddress : pointer; BitIndex : byte );
procedure XorBit_8bit( BaseAddress : pointer; BitIndex : int64 );

// 16 bit versions
// 1 in 2^16 bits
procedure SetBit_16bit( BaseAddress : pointer; BitIndex : word );
function GetBit_16bit( BaseAddress : pointer; BitIndex : word ) : boolean;
procedure ClearBit_16bit( BaseAddress : pointer; BitIndex : word );
procedure XorBit_16bit( BaseAddress : pointer; BitIndex : word );

// 32 bit versions
// 1 in 2^32 bits
procedure SetBit_32bit( BaseAddress : pointer; BitIndex : longword );
function GetBit_32bit( BaseAddress : pointer; BitIndex : longword ) :
boolean;
procedure ClearBit_32bit( BaseAddress : pointer; BitIndex : longword );
procedure XorBit_32bit( BaseAddress : pointer; BitIndex : longword );

// 64 bit versions
// 1 in 2^32 * 8 bits
procedure SetBit_64bit( BaseAddress : pointer; BitIndex : int64 );
function GetBit_64bit( BaseAddress : pointer; BitIndex : int64 ) : boolean;
procedure ClearBit_64bit( BaseAddress : pointer; BitIndex : int64 );
procedure XorBit_64bit( BaseAddress : pointer; BitIndex : int64 );

All implementations will be bug tested, examined, and performance tested.

Code can be in C, Delphi or Assembler.

Code should work on 286 and above.

( Hint: 286 code will run on 386,486,pentium , etc <- backwards compatible)

( Hint: Single 386 BTS instruction is slow )

( Hint: I already have a pretty fast 8 bit implementation )

Processor specific code is to be mentioned and still interesting.

Processor detection code is also interesting.

Results will be posted along the way.

Multiple entries/versions are allowed.

Final results will be posted as well.

Bye,
Skybuck.
Jan 6 '07 #1
7 2697
In case MMX instructions can also speed it up here is a possible MMX
prototype:

procedure SetFourBits_8bi t(
BaseAddress1 : pointer; BitIndex1 : byte;
BaseAddress2 : pointer; BitIndex2 : byte;
BaseAddress3 : pointer; BitIndex3 : byte;
BaseAddress4 : pointer; BitIndex4 : byte );

etc...

Bye,
Skybuck.
Jan 6 '07 #2
"Skybuck Flying" <sp**@hotmail.c omwrote in message
news:en******** **@news1.zwoll1 .ov.home.nl...
In case MMX instructions can also speed it up here is a possible MMX
prototype:

procedure SetFourBits_8bi t(
BaseAddress1 : pointer; BitIndex1 : byte;
BaseAddress2 : pointer; BitIndex2 : byte;
BaseAddress3 : pointer; BitIndex3 : byte;
BaseAddress4 : pointer; BitIndex4 : byte );
Euhm MMX register 64 bit... means 8 bytes so it could be:

procedure SetFourBits_8bi t(
BaseAddress1 : pointer; BitIndex1 : byte;
BaseAddress2 : pointer; BitIndex2 : byte;
BaseAddress3 : pointer; BitIndex3 : byte;
BaseAddress4 : pointer; BitIndex4 : byte);
BaseAddress5 : pointer; BitIndex5 : byte;
BaseAddress6 : pointer; BitIndex6 : byte;
BaseAddress7 : pointer; BitIndex7 : byte;
BaseAddress8 : pointer; BitIndex8 : byte );

8 random access bits in memory ;)

Bye,
Skybuck.
Jan 7 '07 #3
Or just:

procedure SetFourBits_8bi t(
BaseAddress : pointer;
BitIndex0 : byte;
BitIndex1 : byte;
BitIndex2 : byte;
BitIndex3 : byte;
BitIndex4 : byte;
BitIndex5 : byte;
BitIndex6 : byte;
BitIndex7 : byte );

If all bit indexes part of the same memory/data structure ;)

Bye,
Skybuck.
Jan 7 '07 #4

"Skybuck Flying" <sp**@hotmail.c omwrote in message
news:en******** **@news5.zwoll1 .ov.home.nl...
Or just:

procedure SetFourBits_8bi t(
BaseAddress : pointer;
BitIndex0 : byte;
BitIndex1 : byte;
BitIndex2 : byte;
BitIndex3 : byte;
BitIndex4 : byte;
BitIndex5 : byte;
BitIndex6 : byte;
BitIndex7 : byte );

Hmmm probably not possible with the current setbit algorithm.

MMX/SSE1/2/3 has no divide or shr instruction which works on bytes !?

Only words supported... there isn't even a divide instruction... hmm.

One could use floating points but that sounds slow.

Or one could still try to do 8 at the same time by using multiple mmx
registers and shifting them...

For now I ll try to implement 4 bit indexes

Bye,
Skybuck.
Jan 7 '07 #5
Hmmm,

Maybe it's still possible to do a 64 bit shift on all bytes at the same
time.

And later and them with a special mask to clear the upper bits of each byte
to zero ;)

So 8 bit indexes might still be possible ;)

Bye,
Skybuck.
Jan 7 '07 #6
Holyshit the pand instruction is 128 bit...

and mmx registers/technology is extended in xmm registers and technology...
hmmm

mmx <-xmm could be confusion but ok...

So maybe with SSE2 is even possible to set 16 bit indexes at the same time.

Holyshit ?!

Except maybe for the final writing to memory on independant memory locations
but still the calculation could be performed at the same time ! :)

Gonna try it :)

Bye,
Skybuck.
Jan 7 '07 #7
Hello,

Today I made a new xmm version which can set 8 indexes at the same time.

Before I actually coded it I estimated the speed based on the latency in the
optimization manual and I estimated it would be slower than the non-xmm
version.

I still decided to test it to be sure... and indeed the benchmark does show
the xmm version to be slower...

I actually wanted to use a loop to make the code 8 times shorther but
unfortunately the extract word instruction does not except a parameter for
the word index/offset... so all code
had to be repeated 8 times !

The overhead is probably in pushing the xmm data into the stack, loading it,
extracting it etc <- extra instructions needed.

Still interesting to see though, maybe somebody knows some handy to tricks
to speed it up a little further.

I shall also compare it to the other benchmark method from Herbert...
haven't done that yet gonna do it right away ;) :) to see the results =D

type
Txmm8Indexes = packed record
mIndex : packed array[0..7] of word;
end;

type
Txmm8IndexesAnd Mask = packed record
mMask : packed array[0..7] of word;
end;

procedure SetBit_16bit_8i ndexes_xmm( BaseAddress : pointer; Xmm8Indexes :
Txmm8Indexes );
const
vXmm8IndexesAnd Mask : Txmm8IndexesAnd Mask = ( mMask : (7,7,7,7, 7,7,7,7) );
begin
asm
// step 1. Copy BitIndexes into BytePositions (xmm0) and BitPositions
(xmm1)
movdqu xmm0, Xmm8Indexes
movdqu xmm1, xmm0

// step 2. Divide the BitIndexes in BytePositions(x mm0) by 8 (shr 3) to
get the true BytePositions (xmm0)
psrlw xmm0, $3

// step 3. Mod the BitIndexes in BitPositions(xm m1) by 8 (and 7) to get
the true BitPositions(xm m1)
pand xmm1, vXmm8IndexesAnd Mask

// optimize step: store BaseAddress in edx
mov edx, BaseAddress

// Repeat 8 times:

// BitIndex 0

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 0

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 0

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 1

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 1

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 1

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 2

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 2

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 2

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 3

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 3

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 3

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 4

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 4

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 4

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 5

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 5

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 5

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 6

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 6

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 6

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch

// BitIndex 7

// step 4. Load the BaseAddress
mov eax, edx

// step 5. Load extract the BytePosition
pextrw ecx, xmm0, 7

// step 6. Add the BytePosition to the BaseAddress
add eax, ecx

// step 7. Extract BitPosition
pextrw ecx, xmm1, 7

// step 8. Set initial mask to 1
mov ch, $1

// step 9. Shift the mask left with the BitPosition
shl ch, cl

// step 10. Or the byte content with the mask
or byte ptr [eax], ch
end;
end;

Bye,
Skybuck.
Jan 8 '07 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

18
4864
by: Michele Simionato | last post by:
I posted this few weeks ago (remember the C Sharp thread?) but it went unnoticed on the large mass of posts, so let me retry. Here I get Python+ Psyco twice as fast as optimized C, so I would like to now if something is wrong on my old laptop and if anybody can reproduce my results. Here are I my numbers for calling the error function a million times (Python 2.3, Psyco 1.0, Red Hat Linux 7.3, Pentium II 366 MHz): $ time p23 erf.py real ...
0
1327
by: Sridhar | last post by:
Hi, We, the students of CEG, Anna University are organizing an online programming contest as part of aBaCus 2005. The contest itself will start on 6th March 2005 at 1:00 pm IST and will end after 5 hours. You have to solve the problems posted at the start of the contest. Teams ranking high will be awarded the prizes. As a special note, inspite of C,C++ and Java we also allow Python this time. So we hope a lot of Pythonistas...
0
1615
by: Sridhar | last post by:
Hi, We, the students of CEG, Anna University are organizing an online programming contest as part of aBaCus 2005. The contest itself will start on 6th March 2005 at 1:00 pm IST and will end after 5 hours. You have to solve the problems posted at the start of the contest. Teams ranking high will be awared the prizes. As a special note, inspite of C,C++ and Java we also allow Python this time. So we hope a lot of Pythonistas...
0
1728
by: anujb | last post by:
----------------------------------------------------------------------- We are very Pleased to announce IOPC-05 - The International Online Programming Contest. INTRODUCTION - IOPC is organized every year as a part of TECHKRITI - the annual Science and Technology festival of Indian Institute of Technology, Kanpur , India.
24
2702
by: Alex Vinokur | last post by:
Consider the following statement: n+i, where i = 1 or 0. Is there more fast method for computing n+i than direct computing that sum? -- Alex Vinokur email: alex DOT vinokur AT gmail DOT com http://mathforum.org/library/view/10978.html
20
9138
by: GS | last post by:
The stdint.h header definition mentions five integer categories, 1) exact width, eg., int32_t 2) at least as wide as, eg., int_least32_t 3) as fast as possible but at least as wide as, eg., int_fast32_t 4) integer capable of holding a pointer, intptr_t 5) widest integer in the implementation, intmax_t Is there a valid motivation for having both int_least and int_fast?
6
3402
by: thecodemachine | last post by:
Hi, I'm looking for a fast and simple one to one hash function, suitable for longer strings (up to 2048 in length). I'd like keys to be relatively short, I doubt I'd be creating more than 256 keys.. I could use md5 but even that is more than I need, and I assume not the fastest algorithm (?), so I'm reluctant to use it. I've been looking around a lot, not finding much anything come to mind?
0
1267
by: Tom 7 | last post by:
Language lovers: Registration is now open for the 9th Annual ICFP Programming Contest! http://icfpcontest.org/ The contest, associated with the International Conference on Functional Programming, will be held on the weekend of July 21-24. The contest task will be released at noon EDT on Friday, and entries will
22
2685
by: SETT Programming Contest | last post by:
The SETT Programming Contest: The fastest set<Timplementation Write the fastest set<Timplementation using only standard C++/C. Ideally it should have the same interface like std::set. At least the following methods must be implemented: insert(), find(), begin(), end(), erase(), size(), operator<(), and at least the forward iterator. Here, speed and correctness are the 2 most important factors. Functionally it should behave similar to...
0
8236
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8173
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
8621
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
7159
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5563
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4079
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4174
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
1785
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
2
1482
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.