optimal load factor for small Hashtable

John A Grandy

For a Hashtable that is expected to contain no more than 100 DictionaryEntry
elements , what is the best load factor ?

( This Hashtable is a encapsulted in a class , an instance of which is
cached and methods on which are called by n threads ( n is approx = 25 ) )

Mar 28 '06 #1

Subscribe Reply

5072

Bruce Wood

I'm going to hazard a guess that locking and synchronization are going
to use up more horsepower than the hashtable search itself. However,
given the tiny size of the table and the enormous amount of memory on
most machines, I would go for a lower load factor: trade memory for a
chance at better speed. Back when I used to write real-time systems we
used load factors of 0.5 as a general rule. However, with a maximum
somewhere near 100 entries, you can afford to go lower than that
without paying too much of a penalty in memory. Even if the cost of
lookup is dwarfed by the cost of synchronization , you haven't paid a
memory penalty worth worrying about, so who cares?

However, remember that the load factor is only as good as the hashing
algorithm. Make sure that the GetHashCode for your objects is
statistically solid, because all of the twiddling of load factor in the
world won't help you if several of your data items return the same hash
code.

Mar 28 '06 #2

John A Grandy

Hi Bruce, and thanks for the response.

My objects are not large. I am wondering if load factor even matters at
all.

Is it the case the load factor most prominently comes into play when very
large numbers of items are consistently added/removed from the hashtable ?

"Bruce Wood" <br*******@cana da.com> wrote in message
news:11******** **************@ v46g2000cwv.goo glegroups.com.. .

I'm going to hazard a guess that locking and synchronization are going
to use up more horsepower than the hashtable search itself. However,
given the tiny size of the table and the enormous amount of memory on
most machines, I would go for a lower load factor: trade memory for a
chance at better speed. Back when I used to write real-time systems we
used load factors of 0.5 as a general rule. However, with a maximum
somewhere near 100 entries, you can afford to go lower than that
without paying too much of a penalty in memory. Even if the cost of
lookup is dwarfed by the cost of synchronization , you haven't paid a
memory penalty worth worrying about, so who cares?

However, remember that the load factor is only as good as the hashing
algorithm. Make sure that the GetHashCode for your objects is
statistically solid, because all of the twiddling of load factor in the
world won't help you if several of your data items return the same hash
code.

Mar 29 '06 #3

Bruce Wood

Load factor represents your statistical chances of having more than one
item in a hash bucket. Note that by "item" here I mean a reference, or
pointer to an item, so the size of your items doesn't matter: the same
thing is always stored in the Hashtable: a pointer.

I'm guessing that you know how a Hashtable is implemented, but in case
you don't, here's a quick rundown. A hash table is nothing more than an
array of lists. Each entry in the array is called a "bucket", and a
bucket can hold zero, one, or more of your items. The way it works is
that your class's GetHashCode method returns a hash code value, which
then goes through some simple manipulation and ends up determining
which bucket in the table the item will be filed under. If more than
one item ends up in the same bucket, they are chained into a list.

From the documentation, it appears that a load factor of 1 means one

Hashtable bucket for each expected item in the table. So, if you
expected 100 items, and you created a Hashtable with a load factor of
1, you would get 100 buckets in the table.

So, that sounds fine: one bucket per item. However, remember that which
item ends up in which bucket depends upon statistical mathematics
tricks, so you could very well have two items in one bucket and one
bucket empty, depending upon how your hashes work out. Two items in one
bucket means that you have to search down that chain looking for which
one is the one you really want. Two items isn't really a big deal, but
if that number gets too large, you start doing a lot of searching,
which takes time.

So, one solution is to lower the load factor. If you have two buckets
for every expected item, then your chances of getting two items in the
same bucket, and therefore having to search, just went down. Three
buckets for every expected item and the chances drop even more. If you
lower the load factor enough, your chances of getting two or more items
in one bucket become very low indeed.

(That is, assuming that you have a good hash function. If you have a
stupid hash function with a bad distribution, you can have 100 buckets
for each expected item and still get multiple items in the same
bucket.)

So, the size of your items doesn't matter, and how often you add and
remove doesn't matter. All the load factor does is change your chances
of having to search down a chain for the item you want.

Mar 29 '06 #4

Helge Jensen

John A Grandy wrote:

For a Hashtable that is expected to contain no more than 100 DictionaryEntry
elements , what is the best load factor ?

Why are you worrying about this? are you experiencing performance or
memory-problems?

If you are worried about hash-collisions, just allocate a large
dictionary to begin with:

new Hashtable(100*3 );

--
Helge Jensen
mailto:he****** ****@slog.dk
sip:he********* *@slog.dk
-=> Sebastian cover-music: http://ungdomshus.nu <=-

Mar 29 '06 #5

John A Grandy

Great explanation !

It my case I am app start populaing the Hashtable from an XML file stored in
RAM. The XML file rarely changes.

The Hashtable is read only. It lives in an instance of a class cached on a
multi-processor server and this instance is shared by many threads ( they
grab the cached class instance and then use an IDictionaryEnum erator to
iterate through the hashtable items, performing comparisons).

So, given that my data store is read-only, 100 items max, I am now wondering
if hashtable is the most appropriate data class .......
"Bruce Wood" <br*******@cana da.com> wrote in message
news:11******** *************@e 56g2000cwe.goog legroups.com...

Load factor represents your statistical chances of having more than one
item in a hash bucket. Note that by "item" here I mean a reference, or
pointer to an item, so the size of your items doesn't matter: the same
thing is always stored in the Hashtable: a pointer.

I'm guessing that you know how a Hashtable is implemented, but in case
you don't, here's a quick rundown. A hash table is nothing more than an
array of lists. Each entry in the array is called a "bucket", and a
bucket can hold zero, one, or more of your items. The way it works is
that your class's GetHashCode method returns a hash code value, which
then goes through some simple manipulation and ends up determining
which bucket in the table the item will be filed under. If more than
one item ends up in the same bucket, they are chained into a list.
From the documentation, it appears that a load factor of 1 means one

Hashtable bucket for each expected item in the table. So, if you
expected 100 items, and you created a Hashtable with a load factor of
1, you would get 100 buckets in the table.

So, that sounds fine: one bucket per item. However, remember that which
item ends up in which bucket depends upon statistical mathematics
tricks, so you could very well have two items in one bucket and one
bucket empty, depending upon how your hashes work out. Two items in one
bucket means that you have to search down that chain looking for which
one is the one you really want. Two items isn't really a big deal, but
if that number gets too large, you start doing a lot of searching,
which takes time.

So, one solution is to lower the load factor. If you have two buckets
for every expected item, then your chances of getting two items in the
same bucket, and therefore having to search, just went down. Three
buckets for every expected item and the chances drop even more. If you
lower the load factor enough, your chances of getting two or more items
in one bucket become very low indeed.

(That is, assuming that you have a good hash function. If you have a
stupid hash function with a bad distribution, you can have 100 buckets
for each expected item and still get multiple items in the same
bucket.)

So, the size of your items doesn't matter, and how often you add and
remove doesn't matter. All the load factor does is change your chances
of having to search down a chain for the item you want.

Mar 29 '06 #6

Bruce Wood

Are the clients always iterating over the table? If so, then you
definitely don't want a Hashtable.

What kinds of comparisons are they doing? Are they filtering the
entries, looking for subsets, or doing something else?

You say that the data in the hash table is "rarely updated". Under what
circumstances is it updated, and in what way (entries altered / added /
removed)?

If you give me more details maybe I can suggest a more appropriate data
structure.

Mar 29 '06 #7

Bruce Wood

Oh, sorry. Ignore the question about how the hash table changes. You've
already stated that it's read-only.

I'm interested in how it's used, though. I can probably suggest a
better structure.

Oh, and locking shouldn't be required if the structure is read-only.

Mar 29 '06 #8

John A Grandy

In terms of the business process, the Hashtable's keys and values are
strings.

Product name strings are examined to determine if they contain one or more
of the string keys in the Hashtable (this is where the Hashtable is
iterated). For each key so found , the corresponding value is added to an
ArrayList containing objects of a custom type (ProductAttribu te).

The Hashtable is loaded from a read-only xml file into a cached instance of
an "umbrella" cache class. The xml file might change approx once a month.
It's unlikely to ever contain more than 100 elements. Each time it's
changed the Windows Service that houses this entire process must be
restarted to re-instantiate/re-load the cache instance.

So ... it appears that the only performance-relevant aspect is that many
threads (at least 25) possibly are running simultaneously, each grabbing
instances of the Hashtable from cache whenever they process a product item.
"Bruce Wood" <br*******@cana da.com> wrote in message
news:11******** **************@ i39g2000cwa.goo glegroups.com.. .

Oh, sorry. Ignore the question about how the hash table changes. You've
already stated that it's read-only.

I'm interested in how it's used, though. I can probably suggest a
better structure.

Oh, and locking shouldn't be required if the structure is read-only.

Mar 30 '06 #9

Bruce Wood

> Product name strings are examined to determine if they contain one or more of the string keys in the Hashtable...

How is this examination done? How is the product name formed? What I'm
getting at is whether the product name has some delimiter that you
could use to break it up before checking, or whether the table keys can
appear in any subset of any word of the product name (where "words" may
be delimited by spaces, dashes, or whatever)?

Is there any way that you could break up the product name and search
for the bits in the Hashtable, rather than iterating over the table
checking for each key in the product name?

As well, is your ProductAttribut e class immutable? Could you
instantiate all of the ProductAttribut e objects once when the service
starts up and then never make another one? Or do they contain
information that changes for each product and so you need many
instances of the same ProductAttribut e?

As for changes to the XML file, you could use FileSystemWatch er to
respond to a change in the file and reload it while the system is
running. That would remove the need to stop and restart the service
when the file changes. (Hey, one less thing that IT staff have to do in
order to make the system work properly is a money saver. This is
particularly true since the XML file changes rarely, and therefore
they're more likely to forget that they have to stop and restart the
service and spend time ( = $ ) tracking down the "bug" that results
from the XML file not being current in memory.)

It should be possible to rebuild whatever data structure you're using
from the XML file, then swap the (static) pointer to the list of
product attributes in one clean, atomic operation, so that searches
that are in progress on the old file contents will proceed on the old
structure, but searches that are about to start will grab the reference
to the new structure.

This, in turn, should mean (in theory) that you should need no locking
at all on the structure. I'm no expert in multithreading. .. I'd rather
have someone like Jon Skeet comment on this, but if you're not altering
the data structure at all, only reading it, then you shouldn't need to
lock on it. There would be only two operations that would change the
structure:

1) .NET's ability to move things about in memory as your code is
running. However, this should be transparent, and so shouldn't be
affected by locking.
2) Wholesale replacement of the structure when the XML file changes.
However, if it's done right, it should not require locking.

That means that you can have multiple threads reading the structure
simultaneously and they shouldn't conflict in any way.

Once I figure out better exactly what you're doing, I can tell you what
"the structure" should be. :-)

Mar 30 '06 #10

Similar topics

2231

How to load and sort Values from Hashtable into a DataColumn of a DataTable

by: bpdace | last post by:

I have a datasource that is being passed in as a Hashtable. I want to load in all the Values (strings inside the Value, not the Keys) into a collumn inside a DataTable. The values need to be sorted alphabetically. What is the best way to do this? Any help is appreciated!

.NET Framework

5267

Intermittent Hashtable() error: Load factor too high

by: Gary Davis | last post by:

Once a day or so, I receive an error on a fairly active website that calls this StrMixed.cs method constructor. 99% of the time there is no exception: System.Web.HttpUnhandledException: Exception of type System.Web.HttpUnhandledException was thrown. ---> System.InvalidOperationException: Hashtable insert failed. Load factor too high. at System.Collections.Hashtable.Insert(Object key, Object nvalue, Boolean add)

C# / C Sharp

1511

DataSource initial load time issue

by: Daniel | last post by:

Hi there, I'm working on a app that has a number of combo boxs, my issue is that on loading the app, when the code gets to the datasource property of the first combo box it takes about 4 seconds to load. I have tried binding it to a table or to a table view, i have also tried adding the elements into an Array List and then binding to that but this did not have any effect. The table that i'm trying to bind to is a small table with about...

C# / C Sharp

1423

Load Dll at Singleton

by: Amar | last post by:

Hi, I have created a singleton object that has a hashtable which holds all the functions loaded from a dll in order to have access to these functions every time i want without having to open each time the dll. The problem is that when i do this i cannot debug my project at all, but the project runs perfect with no error. The hashtable is filled correct! Notice i don't want to debug the function of the hashtable which cannot be done. I...

ASP.NET

1876

How to get frmMain to load before dependent forms?

by: Dean Slindee | last post by:

My project has a main form (frmMain, the startup object for the project) and several other "child" forms that are painted within a large panel on frmMain. In each form's Form_Load event, a Weak Reference for that form is loaded into a global Hash table, like this: Dim wr As WeakReference = New WeakReference(Me, False) If Not hashTable.Contains(cfrmMain) Then

Visual Basic .NET

4282

How to get frmMain to load before other dependent forms?

by: Dean Slindee | last post by:

Visual Basic .NET

2396

Load factor

by: Maya | last post by:

Is there any reason why my hashtable load factor doesn't accept the value i assign to? it always resets it self to the default which is 0.72f although i initialized the table using: HashTable HT = new HasTable(100,1.0f) Thank you, Maya.

C# / C Sharp

2398

Optimal buffer growth

by: Angel Tsankov | last post by:

Hi, I remember reading in a book (or in an article) that the optmial buffer growth factor is about 1.6. Now I need to find this book but I can't remember its title. Can someone help me with this?

C / C++

8440

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

8352

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

8863

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

5661

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

4192

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

4358

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

2765

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

2005

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

1763

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General