Bytes | Software Development & Data Engineering Community

Fastest searchable data structure?

Hi,

I need some type of array/list/... in which I can store objects together
with a unique key. The most important thing is performance: I will be
searching the list for given keys all the time. Which data structure
is best suited for this? A Hashtable? The list may contain up to 10^12
items, but more probably, most of the time, around 10^9 or even 10^6...

Thanks a lot in advance,
Pieter
Jan 16 '08 #1
> I need some type of array/list/... in which I can store objects together
> with a unique key.

Sounds like Dictionary<TKey, TValue>...

> The list may contain up to 10^12

Seriously? You do realise that even at one byte per item, with no
index overhead, padding, etc., that's a TB?

I'm going to assume that is a typo - but even so, for large numbers
you may be better off using a database approach, with a non-clustered
unique index on this field (only; don't span - let it use a bookmark
lookup to get the value) and ideally with the index on a different
file-group to the main data.

Marc
Jan 16 '08 #2
Thanks to both of you for your answers!

Actually, it's not a typo: in theory it could go up to 6*10^12 :-)
But as I said, I expect a practical implementation to hold around 10^6...

I will indeed use SQL Server when the amount gets too big... But in case
it stays below 10^6: will a Dictionary be better than a Hashtable? Or
something else?
Jan 16 '08 #3
Do you know how much memory 10^12 of these "items" is going to require?
Because if you run out of physical RAM, your whole process will either blow
up or slow down dramatically due to paging.
So the best alternative may be a database to hold your items. A search on a
table with a clustered index on a primary key in this case will be the
fastest.
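
To make the database suggestion concrete, here is a minimal Python sketch. It uses an in-memory SQLite database as a stand-in for a real server such as SQL Server, so it only approximates the setup described above. The table and column names (items, item_key, payload) and the find helper are made up for illustration. In SQLite, an INTEGER PRIMARY KEY column is the key the rows are physically ordered by, which is roughly comparable to a clustered primary key.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real database server.
conn = sqlite3.connect(":memory:")

# Declaring the key as INTEGER PRIMARY KEY makes SQLite store the rows
# ordered by that key (roughly comparable to a clustered index).
conn.execute(
    "CREATE TABLE items (item_key INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
)

# Hypothetical sample data: 100,000 keyed items.
conn.executemany(
    "INSERT INTO items (item_key, payload) VALUES (?, ?)",
    ((k, f"state-{k}") for k in range(100_000)),
)
conn.commit()


def find(item_key):
    """Return the payload stored under item_key, or None if it is absent."""
    row = conn.execute(
        "SELECT payload FROM items WHERE item_key = ?", (item_key,)
    ).fetchone()
    return row[0] if row else None


# Ask SQLite how it would run the lookup; the last column of each row
# is a human-readable description of the chosen access path.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT payload FROM items WHERE item_key = ?", (42,)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan should describe a search on the primary key rather than a scan of the whole table, which is the property that keeps lookups fast as the table grows.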
-- Peter
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
MetaFinder: http://www.blogmetafinder.com
Jan 16 '08 #4
Hehe, I do know :-)
The problem is: it will be for an experiment, and not every possibility will
occur as often as the others. So I definitely want to put the most popular
ones in some kind of local cache... See it as putting the first block of a
B-tree in RAM...
Jan 16 '08 #5
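
One way to picture the "local cache for the most popular items" idea is a small least-recently-used (LRU) cache sitting in front of the slower store. The Python sketch below is only an illustration under assumptions: CachedLookup, slow_store, and the capacity of 2 are made-up names and values, and LRU is just one possible eviction policy, not something the thread prescribes.

```python
from collections import OrderedDict


class CachedLookup:
    """Keeps the most recently used items in memory, in front of a slower store."""

    def __init__(self, load_from_store, capacity):
        self._load = load_from_store   # fallback lookup, e.g. a database query
        self._capacity = capacity      # maximum number of cached items
        self._cache = OrderedDict()    # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            self._cache.move_to_end(key)     # mark as most recently used
            return self._cache[key]
        self.misses += 1
        value = self._load(key)              # slow path
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used item
        return value


def slow_store(key):
    """Hypothetical stand-in for the large, slow backing store."""
    return f"state-{key}"


lookup = CachedLookup(slow_store, capacity=2)
for key in (1, 2, 1, 3, 1, 2):
    lookup.get(key)
print(lookup.hits, lookup.misses)
```

The cache itself is a hash-based dictionary, so a hit costs one hash lookup plus a reordering step. How well it works depends on how skewed the access pattern really is.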
I would carefully avoid solutions that require you to write the same
type of code multiple times, one type for < 10^6, one for <10^9 and one
for <10^12. If you need as many as 10^12, use a database.

If you absolutely want to draw the best performance out of everything,
abstract out the storage of this list to a new class, so that you can
inherit from it for an in-memory data structure, and inherit from it for
a database structure.
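
A minimal Python sketch of that abstraction, under assumptions: ItemStore, MemoryStore, DatabaseStore, and make_store are hypothetical names, SQLite stands in for the database, and the 10^6 switch-over point is only the figure mentioned earlier in the thread, not a measured threshold.

```python
import sqlite3
from abc import ABC, abstractmethod


class ItemStore(ABC):
    """Hypothetical storage interface; calling code depends only on this class."""

    @abstractmethod
    def put(self, key, value):
        """Store value under key, replacing any existing value."""

    @abstractmethod
    def get(self, key):
        """Return the value stored under key, or None if it is absent."""


class MemoryStore(ItemStore):
    """In-memory variant backed by a hash-based dict."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


class DatabaseStore(ItemStore):
    """Database variant; SQLite is used here purely as a stand-in."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS items "
            "(item_key INTEGER PRIMARY KEY, payload TEXT)"
        )

    def put(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO items (item_key, payload) VALUES (?, ?)",
            (key, value),
        )
        self._conn.commit()

    def get(self, key):
        row = self._conn.execute(
            "SELECT payload FROM items WHERE item_key = ?", (key,)
        ).fetchone()
        return row[0] if row else None


def make_store(expected_items, memory_limit=10**6):
    """Pick an implementation from the expected item count (assumed cut-off)."""
    if expected_items <= memory_limit:
        return MemoryStore()
    return DatabaseStore()
```

Code that only calls put and get on an ItemStore does not need to change when the item count outgrows memory; only the choice made in make_store does.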

--
Lasse Vågsæther Karlsen
mailto:la***@vk arlsen.no
http://presentationmode.blogspot.com/
PGP KeyID: 0xBCDEA2E3
Jan 16 '08 #6
Thanks Lasse. What it will actually do is this: every object will be another
possible state. But some states will be needed much more often than others. So
the 'popular' ones will be put in a local cache. It's the structure of
that cache that I'm worried about: as maybe 80% of the searching will happen
in there, it should be as fast as possible. So: which structure to use?
Jan 16 '08 #7
"Pieter" <pi****************@hotmail.com> wrote in message
news:OH****************@TK2MSFTNGP04.phx.gbl...
> I need some type of array/list/... in which I can store objects together
> with a unique key. The most important thing is performance: I will be
> searching the list for given keys all the time. Which data structure
> is best suited for this? A Hashtable? The list may contain up to 10^12
> items, but more probably, most of the time, around 10^9 or even 10^6...

You will get the fastest performance with a hash table. Assuming that
you choose a good algorithm to assign the hash values, the hash table has
the advantage that the average number of accesses to the table to find a key
does not depend on the size of the table, but only on the percentage of the
table that is full. With a table that is not terribly full (say 80 or 90%, I
don't remember the figures), the average number of accesses is one point
something. This beats a search on a B-tree, which requires a number of
accesses that grows with the logarithm of the size of the table. Note that
I'm speaking about the *average* number of accesses. The *worst-case*
scenario is horribly bad, since it would require a full scan of the hash
table. Fortunately, this is extremely improbable, provided that the
hashing algorithm and the collision-resolution algorithm are properly chosen.
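
As a rough check of the "one point something" figure, here is a small Python simulation. It is a sketch under assumptions: separate chaining for collisions, uniformly random integer keys, and key % slot_count as the hash function. The names average_probes and SLOTS are made up. Other collision schemes, such as linear probing, degrade faster as the table fills up.

```python
import math
import random


def average_probes(slot_count, load_factor, seed=0):
    """Average key comparisons per successful lookup in a chained hash table."""
    rng = random.Random(seed)
    key_count = int(slot_count * load_factor)
    keys = rng.sample(range(10**12), key_count)  # distinct random keys

    # Separate chaining: each slot holds a list of the keys that hash to it.
    slots = [[] for _ in range(slot_count)]
    for key in keys:
        slots[key % slot_count].append(key)

    # A successful lookup walks the chain until it reaches the key.
    total = 0
    for key in keys:
        total += slots[key % slot_count].index(key) + 1
    return total / key_count


SLOTS = 2**16
for load in (0.5, 0.8, 0.9):
    hash_cost = average_probes(SLOTS, load)
    # For comparison: comparisons needed by a binary search over as many keys.
    tree_cost = math.ceil(math.log2(SLOTS * load))
    print(f"load {load:.0%}: hash ~{hash_cost:.2f} probes, binary search ~{tree_cost}")
```

The hash figure depends only on how full the table is, while the binary-search figure keeps growing with the number of keys, which is the contrast the paragraph above describes.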

If you are going to deal with in-memory elements using .NET, you can use
a Dictionary<TKey, TValue>, which is implemented as a hash table. (The
collection that switches to a hash table only once the number of stored
elements passes an internal threshold is HybridDictionary.)
You will need to use 64-bit code (and run it on a huge machine) if you want
to address 10^12 elements. If you need to store your data on disk, a
properly programmed hashing algorithm against a flat file can outperform a
database server, which uses trees to store its indices.
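
To show what "hashing against a flat file" can look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: fixed-size records of a used flag plus a 64-bit key and a 64-bit value, a table size fixed up front, key % SLOT_COUNT as the hash, and linear probing for collisions. It does not support deletion or resizing, and it says nothing about whether such a file would actually outperform a database server.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<BQQ")  # used flag, 64-bit key, 64-bit value
SLOT_COUNT = 1024               # hypothetical table size, fixed up front


def create_table(path):
    """Pre-allocate a file of empty, fixed-size slots."""
    with open(path, "wb") as f:
        f.write(b"\x00" * (RECORD.size * SLOT_COUNT))


def _probe(f, key):
    """Yield (slot_index, used, stored_key, value) along the probe sequence."""
    start = key % SLOT_COUNT
    for step in range(SLOT_COUNT):
        index = (start + step) % SLOT_COUNT   # linear probing, wrapping around
        f.seek(index * RECORD.size)           # slot offset is computed, not searched
        used, stored_key, value = RECORD.unpack(f.read(RECORD.size))
        yield index, used, stored_key, value


def put(path, key, value):
    """Store value under key in the first free or matching slot."""
    with open(path, "r+b") as f:
        for index, used, stored_key, _ in _probe(f, key):
            if not used or stored_key == key:
                f.seek(index * RECORD.size)
                f.write(RECORD.pack(1, key, value))
                return
    raise RuntimeError("hash file is full")


def get(path, key):
    """Return the value stored under key, or None if it is absent."""
    with open(path, "rb") as f:
        for _, used, stored_key, value in _probe(f, key):
            if not used:
                return None   # reached an empty slot: the key was never stored
            if stored_key == key:
                return value
    return None


path = os.path.join(tempfile.mkdtemp(), "items.dat")
create_table(path)
put(path, 42, 4200)
put(path, 42 + SLOT_COUNT, 9999)  # same home slot as 42, so it collides
```

With few collisions, a lookup costs about one seek and one small read, regardless of how many records the file holds.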

Jan 16 '08 #8
I'd think that a Generic List or Generic Dictionary would be faster as there
would be no boxing / unboxing involved in adding or retrieving typed elements.
-- Peter
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
MetaFinder: http://www.blogmetafinder.com
Jan 16 '08 #9
Pieter,

In the past there was somebody from Belgium active in this newsgroup who
always answered this question with:

Sorted List

http://msdn2.microsoft.com/en-us/lib...ortedlist.aspx

I have no experience with that.
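
For comparison with the hash-based suggestions, here is a small Python sketch of the sorted-list idea: keys are kept in order and found by binary search. SortedKeyList is a made-up class for illustration, not the .NET type; for instance, it overwrites the value when a key is added twice, which is simply a choice made in this sketch.

```python
import bisect


class SortedKeyList:
    """Keeps keys in sorted order and finds them by binary search."""

    def __init__(self):
        self._keys = []
        self._values = []

    def add(self, key, value):
        index = bisect.bisect_left(self._keys, key)
        if index < len(self._keys) and self._keys[index] == key:
            self._values[index] = value        # key already present: overwrite
        else:
            self._keys.insert(index, key)      # O(n) shift to keep the order
            self._values.insert(index, value)

    def get(self, key):
        index = bisect.bisect_left(self._keys, key)  # O(log n) comparisons
        if index < len(self._keys) and self._keys[index] == key:
            return self._values[index]
        return None

    def keys(self):
        return list(self._keys)


states = SortedKeyList()
for key, value in ((30, "c"), (10, "a"), (20, "b")):
    states.add(key, value)
```

Lookups take a number of comparisons that grows with the logarithm of the item count, and inserts have to shift elements, so for pure key lookups this is generally slower than a hash table. Its advantage is that the keys can be walked in order.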

Cor

Jan 16 '08 #10
