454,985 Members | 1,161 Online
Need help? Post your question and get tips & solutions from a community of 454,985 IT Pros & Developers. It's quick & easy.

# need superfast binarysearch, with superfast icomparer

 P: n/a my motivation is as follows: i have to binarysearch several million times an long[] that is several million items big. this costs time, although i use Array.Binarysearch. so i try to keep the Compare-method as lean as possible, as it is summoned several 10 million times. in case of an integer the fastest way would be this i think: public int Compare(int a, int b) { return a-b; } the prob is the returnval needs to be an integer, because the binarysearch-method works with an int as far as i know. if i would write... public int Compare(long a, long b) { return a-b; } its an error, or i write... public int Compare(long a, long b) { return (int)(a-b); } i loose information, likely leading to incorrect results. any idea for a superfast way for long-types or anybody who knows a binarysearch-method that takes a long-type as input? thanks Mar 30 '06 #1
11 Replies

 P: n/a Why do you need a compare method? Your implementation (return a-b) works the same as the default comparison for long. If there's somehting I'm missing though, logical comparison might be faster than subtracting the numbers: int Compare(long a, long b) { if (a > b) return 1; else if (a < b) return -1; else return 0; } "solandre" wrote: my motivation is as follows: i have to binarysearch several million times an long[] that is several million items big. this costs time, although i use Array.Binarysearch. so i try to keep the Compare-method as lean as possible, as it is summoned several 10 million times. in case of an integer the fastest way would be this i think: public int Compare(int a, int b) { return a-b; } the prob is the returnval needs to be an integer, because the binarysearch-method works with an int as far as i know. if i would write... public int Compare(long a, long b) { return a-b; } its an error, or i write... public int Compare(long a, long b) { return (int)(a-b); } i loose information, likely leading to incorrect results. any idea for a superfast way for long-types or anybody who knows a binarysearch-method that takes a long-type as input? thanks Mar 30 '06 #2

 P: n/a I expect the best performance would be achieved by implementing the search in unsafe code (perhaps with assembler). The IComparable interface and the like make implementing searching and sorting _complex_ structures very easy to code (perhaps at the expense of performance). But you don't seem to be using a complex structure, so just use straightforward code and call it good. Mar 30 '06 #3

 P: n/a solandre wrote: my motivation is as follows: i have to binarysearch several million times an long[] that is several million items big. this costs time, although i use Array.Binarysearch. so i try to keep the Compare-method as lean as possible, as it is summoned several 10 million times. 1. Are the items you are looking for coming in as one big sorted collection? if they are you can do your lookup by linear travesing both collections at the same time. 2. If your input is coming in as an unsorted collection it may be faster to sort it and goto 1. 3. Perhaps you could use a dictionary for the items instead? if you only need to decide membership. Hashsets with longs can be extremely fast. If push comes to shove you can implement your own hastset for the longs and get *extremely* fast performance. public int Compare(long a, long b) { return (int)(a-b); } i loose information, likely leading to incorrect results. Yes, that would certainly be very bad. any idea for a superfast way for long-types or anybody who knows a binarysearch-method that takes a long-type as input? 1. try benchmarking using: public int Compare(long a, long b) { if ( a == b ) return 0; else if ( a < b ) return -1; else return 1; } is it fast enough? 2. you can write your own binary-search. 3. if the code is *still* not fast enough, you could step into the realm of unsafe code and/or write your own hash-set. -- Helge Jensen mailto:he**********@slog.dk sip:he**********@slog.dk -=> Sebastian cover-music: http://ungdomshus.nu <=- Mar 30 '06 #4

 P: n/a well, somthing i thought i dont need to mention is, that i use the long-type which actually is an ulong as a container for 3bit+35bit+26bit informations. for binarysearching the array i need just the 35bit-information which is allready longer than int. but this already means i cannot easily subtract the values. and i have to write a comparer which does the masking. the latest code looks like this: public int Compare(ulong PDA, ulong PDB) //compares ulong-pds by 35-bit-pID //<0...PDA is lower than PDB //0...equl //>0...PDA higher than PDB { //pID-35bit-masking PDA&=0x1FFFFFFFFC000000; PDB&=0x1FFFFFFFFC000000; if (PDA> PDB) return 1; else if (PDA< PDB) return -1; else return 0; } faster would be return ((PDA & 0x1FFFFFFFFC000000) - (PDB & 0x1FFFFFFFFC000000)), which is the border i am not able to cross for reason of the returntype. do you have any further hints on that? Mar 30 '06 #5

 P: n/a thanks for your replies. i want to return some information i already collected: 1.linear traversing: actually this binary-search question is part (as you guessed right) of a matching two arrays question which i posted later some minutes, as it is the wrapping topic. here i/we enhanced linear traversing with binary-search, which is an advance when the matches are far appart (see there). 2. on hash-tables: looking up a dictionary (in safe code) with 10million items 10 million times, unluckily takes longer than binary-searching two arrays of the same size. the dictionary is resized automatically by .net(i guess doubled if the load exeedes 70%). so there should be not too much collisions. but i did not research this further. 3. i wrote a binary-search (in safe code) but it performed poorly, so i didnt devlope it further, but might pick this up again later ... 4. unsafe code.... whoao .... :-) something i pushed far appart, but that keeps coming closer as more it is like "every tick counts". .... we might soon start to do some b-search in c++. Mar 30 '06 #6

 P: n/a solandre wrote: my motivation is as follows: i have to binarysearch several million times an long[] that is several million items big. Binary searching through "several million items" should still only take 20 iterations or so, shouldn't it? Are you sure your array is sorted to start with? If it's not, binary searching won't work and you may see odd results. You shouldn't need to specify an IComparer, however, as Int64 implements IComparable. Having said all this, here's a pretty fast comparison implementation: public int Compare (long a, long b) { if (a==b) { return 0; } return a < b ? -1 : 1; } If none of that helps, could you post a short but complete program which demonstrates the problem? See http://www.pobox.com/~skeet/csharp/complete.html for details of what I mean by that. -- Jon Skeet - http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too Mar 30 '06 #7

 P: n/a > public int Compare(ulong PDA, ulong PDB) //compares ulong-pds by 35-bit-pID //<0...PDA is lower than PDB //0...equl //>0...PDA higher than PDB { //pID-35bit-masking PDA&=0x1FFFFFFFFC000000; PDB&=0x1FFFFFFFFC000000; if (PDA> PDB) return 1; else if (PDA< PDB) return -1; else return 0; } Ah, hmmm, looks good. Mar 30 '06 #8