473,769 Members | 1,640 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

fastest searchable datastructure?

Hi,

I need some type of array/list/... In which I can store objects together
with a unique key. The most important thing is performance: I will need to
do the whole time searches in the list of given keys. Which datastructure
will be best suited for this? A Hashtable? The list may contain upto 10^12
items, bit more proably most of the time +- 10^9 of even 10^6...

Thanks a lot in advance,
Pieter
Jan 16 '08
21 1591
A Hashtable would be faster in terms of searching, as long as one was
searching by a key value.

--
HTH,

Kevin Spencer
Chicken Salad Surgeon
Microsoft MVP

"Peter Bromberg [C# MVP]" <pb*******@yaho o.NoSpamMaam.co mwrote in message
news:E9******** *************** ***********@mic rosoft.com...
I'd think that a Generic List or Generic Dictionary would be faster as
there
would be no boxing / unboxing involved in adding or retrieving typed
elements.
-- Peter
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
MetaFinder: http://www.blogmetafinder.com
"Pieter" wrote:
>Hi,

I need some type of array/list/... In which I can store objects together
with a unique key. The most important thing is performance: I will need
to
do the whole time searches in the list of given keys. Which datastructure
will be best suited for this? A Hashtable? The list may contain upto
10^12
items, bit more proably most of the time +- 10^9 of even 10^6...

Thanks a lot in advance,
Pieter

Jan 16 '08 #11
(of course, given the numbers in question, *any* in-memory approach is
doomed... doomed...)

[Cor Ligthert]
System.Collecti ons.SortedList
&
[Kevin Spencer]
A Hashtable would be faster in terms of searching, as long as one was
searching by a key value.
Yes, a hash-table implementation is better than a flat list, but
Hashtable is the non-typed version; for the reasons already given by
Peter the generic versions would generally be preferable -
specifically one of Dictionary<TKey , TValue>, SortedDictionar y<TKey,
TValueand SortedList<TKey , TValue(dependin g on what operations are
important). The OP indicates that *read* performance is the key here
(not insertion/deletion); these report O(1), O(log n) and O(log n) for
retrieval - hence Dictionary<TKey , TValuewould be my first choice.
But the real answer is to test it with typical data: if you code the
app to work against the IDictionary<TKe y, TValueinterface (rather
than any concrete implementation) you should be able to try different
implementations with likely data.

Marc
Jan 16 '08 #12
Hello Pieter

Balena did some tests and posted the results on his website , however with
the transition form VB2themax to dotnet2themax the article seems to be
disapeared
However the result was that the hashtable was and still is the fastest key
value pair object , the higher the amount of data was the more efiicient the
hashtable was in comparisation to other methods for sowell insertion as
retrieving of the key value pairs ( Balena timed both the time it took to
built up the data tree as the time it took to retrieve n elements ) ,
however with the hughe amounts you are talking about a database would be a
much bether idea
HTH

Michel



"Pieter" <pi************ ****@hotmail.co mschreef in bericht
news:OH******** ********@TK2MSF TNGP04.phx.gbl. ..
Hi,

I need some type of array/list/... In which I can store objects together
with a unique key. The most important thing is performance: I will need to
do the whole time searches in the list of given keys. Which datastructure
will be best suited for this? A Hashtable? The list may contain upto 10^12
items, bit more proably most of the time +- 10^9 of even 10^6...

Thanks a lot in advance,
Pieter

Jan 16 '08 #13

Found some numbers ( needed to do some translation )

The test was with 100.000 items

compared was ArrayList, HashTable en SortedList

Arraylist was 4 times faster as the hashtable , hashtable was 8 - 100 times
faster as the sortedlist

please if you write some comparisation routines yourself then don`t forget
to set compile options in VB.Net remove overflow checks and enable
optimizations
otherwise we get again trolling messages here that C# is so much faster as
VB.Net with these options the same as they are in C# they should evaluate at
the same speed


"Pieter" <pi************ ****@hotmail.co mschreef in bericht
news:OH******** ********@TK2MSF TNGP04.phx.gbl. ..
Hi,

I need some type of array/list/... In which I can store objects together
with a unique key. The most important thing is performance: I will need to
do the whole time searches in the list of given keys. Which datastructure
will be best suited for this? A Hashtable? The list may contain upto 10^12
items, bit more proably most of the time +- 10^9 of even 10^6...

Thanks a lot in advance,
Pieter

Jan 16 '08 #14
On Jan 16, 9:59 pm, "Michel Posseth [MCP]" <M...@posseth.c omwrote:
Found some numbers ( needed to do some translation )

The test was with 100.000 items

compared was ArrayList, HashTable en SortedList

Arraylist was 4 times faster as the hashtable , hashtable was 8 - 100 times
faster as the sortedlist
It really surprises me that you found an ArrayList faster than a
Hashtable with that many items. I could understand it for a really
small dataset, but at 100,000 a hashtable should be much faster than a
list. Could you post a link to the code?

Jon
Jan 17 '08 #15
Arraylist was 4 times faster as the hashtable
The other question would be: "at what, exactly?"

If the timings include the insertion time, then yes: a hash-table
(Hashtable or Dictionary<TKey ,TValue>) will take longer: it needs to
obtain hashes, build buckets, etc (an ArrayList or List<Tjust has to
copy an array a few times). But the *read* performance should then be
/significantly/ better, assuming that the items are being fetched by
key (and not by looping).

Without inspectable, reproducable code I'm going to take this with a
pinch of salt...

Marc
Jan 17 '08 #16

Before anyone takes this out of perspective

1 . these tests were done by Balena not by me , however the results and code
are not annymore on the website , but i have read on a dutch website that
you can find this in his 2003 book ( i have the 2002 and 2005 version )

2 . The conclusion was that the hashtable was the fastest when values were
retrieved with the key , however a arraylist was faster when looping
through the values

"Although the statistics might tell you that the Amazone rivers average
depth is 50 cm this does not mean that you can walk to the other side"

Michel


"Marc Gravell" <ma**********@g mail.comschreef in bericht
news:%2******** ********@TK2MSF TNGP06.phx.gbl. ..
>Arraylist was 4 times faster as the hashtable
The other question would be: "at what, exactly?"

If the timings include the insertion time, then yes: a hash-table
(Hashtable or Dictionary<TKey ,TValue>) will take longer: it needs to
obtain hashes, build buckets, etc (an ArrayList or List<Tjust has to
copy an array a few times). But the *read* performance should then be
/significantly/ better, assuming that the items are being fetched by key
(and not by looping).

Without inspectable, reproducable code I'm going to take this with a pinch
of salt...

Marc

Jan 17 '08 #17
Michel Posseth [MCP] <MS**@posseth.c omwrote:
Before anyone takes this out of perspective

1 . these tests were done by Balena not by me , however the results and code
are not annymore on the website , but i have read on a dutch website that
you can find this in his 2003 book ( i have the 2002 and 2005 version )

2 . The conclusion was that the hashtable was the fastest when values were
retrieved with the key , however a arraylist was faster when looping
through the values
Right. That makes a lot more sense. As far as I'm aware, this thread is
about looking up by key, at which point the hashtable should win
easily.

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Jan 17 '08 #18
IMHO

The Hashtable will not work for the TS as the object limit in .Net = 2 GB
the hashtable way will blow up with a out of memory exception wit the
amounts the TS wants to store

so the only way that will probably work is the also in this thread proposed
database aproach

Michel


"Jon Skeet [C# MVP]" <sk***@pobox.co mschreef in bericht
news:MP******** *************@m snews.microsoft .com...
Michel Posseth [MCP] <MS**@posseth.c omwrote:
>Before anyone takes this out of perspective

1 . these tests were done by Balena not by me , however the results and
code
are not annymore on the website , but i have read on a dutch website that
you can find this in his 2003 book ( i have the 2002 and 2005
ersion )

2 . The conclusion was that the hashtable was the fastest when values
were
retrieved with the key , however a arraylist was faster when looping
through the values

Right. That makes a lot more sense. As far as I'm aware, this thread is
about looking up by key, at which point the hashtable should win
easily.

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk

Jan 18 '08 #19

"Jon Skeet [C# MVP]" <sk***@pobox.co mwrote in message
news:1e******** *************** ***********@e23 g2000prf.google groups.com...
>
I've got a datastructure of 100 million records in memory. I probably
shouldn't say what it's for, but it basically ends up being a 10,000
squared lookup, with a single byte result.

Doing this in memory (for the low, low cost of 100M of memory) gave me
a ~100,000% performance improvement over a database lookup :)
That of course would not be a searchable datastructure, more like an indexed
array where the location is calculated by offset ;) If it was a key per
item, then each item would require at least a 4 byte key, making the in
memory size at least 500M

Jan 19 '08 #20

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

18
1793
by: Pieter | last post by:
Hi, I need some type of array/list/... In which I can store objects together with a unique key. The most important thing is performance: I will need to do the whole time searches in the list of given keys. Which datastructure will be best suited for this? A Hashtable? The list may contain upto 10^12 items, bit more proably most of the time +- 10^9 of even 10^6... Thanks a lot in advance,
0
9589
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9423
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10045
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9994
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9863
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8870
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5447
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3561
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2815
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.