473,378 Members | 1,354 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,378 software developers and data experts.

The Right Collection for the Job

Hi There

I have a huge array of objects being displayed on a UI. These are
actually 'jobs' that the users need to deal with before close of
business.

As the operator deals with a particular job, I want to add the unique
identifier for that job to a collection. The collection simply keeps
track of jobs that have been dealt with.

The issue is speed of retrieval. Given a job, I know its unique
identifier and want to be able to quickly search the collection to see
if the job has been dealt with.

At the moment I am using a HashTable, but my use of the HashTable is
not intuitive - I add items with myHashTable.Add(jobID,jobID) ... ie I
use the unique identifier for both key and value, because really, I
don't care for the value, I just want a quick way to see if the JobID
is in the table.

Is there a better way to go about things?

Thanks
Bill
Nov 17 '05 #1
8 1095
On 2005-04-30, orekin <or*******@yahoo.com.au> wrote:
Hi There

I have a huge array of objects being displayed on a UI. These are
actually 'jobs' that the users need to deal with before close of
business.

As the operator deals with a particular job, I want to add the unique
identifier for that job to a collection. The collection simply keeps
track of jobs that have been dealt with.

The issue is speed of retrieval. Given a job, I know its unique
identifier and want to be able to quickly search the collection to see
if the job has been dealt with.

At the moment I am using a HashTable, but my use of the HashTable is
not intuitive - I add items with myHashTable.Add(jobID,jobID) ... ie I
use the unique identifier for both key and value, because really, I
don't care for the value, I just want a quick way to see if the JobID
is in the table.

Is there a better way to go about things?

Thanks
Bill


Bill... I think using that hastable is still pretty good. You could do
something like this:

using System;
using System.Collections;

public sealed class App
{
public static void Main ()
{
// create a hashtable
Hashtable ht = new Hashtable ();

// add some keys
ht.Add (1, null);
ht.Add (2, null);

Console.WriteLine (ht.ContainsKey (1));
Console.WriteLine (ht.ContainsKey (10));
}
}

Output:

True
False

--
Tom Shelton [MVP]
Nov 17 '05 #2
You should check the specialized collection namespace
(System.Collections.Specialized) it has a string dictionary that
implements a hash table with the key and the value strongly typed to be
strings rather than objects.

How many entries are you going to have in the collection?

And do you only ever want to check for Job once?
--
HTH

Ollie Riches
http://www.phoneanalyser.net
Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a programmer
helping programmers.

"Tom Shelton" <ts******@YOUKNOWTHEDRILLcomcast.net> wrote in message
news:%2****************@TK2MSFTNGP10.phx.gbl...
On 2005-04-30, orekin <or*******@yahoo.com.au> wrote:
Hi There

I have a huge array of objects being displayed on a UI. These are
actually 'jobs' that the users need to deal with before close of
business.

As the operator deals with a particular job, I want to add the unique
identifier for that job to a collection. The collection simply keeps
track of jobs that have been dealt with.

The issue is speed of retrieval. Given a job, I know its unique
identifier and want to be able to quickly search the collection to see
if the job has been dealt with.

At the moment I am using a HashTable, but my use of the HashTable is
not intuitive - I add items with myHashTable.Add(jobID,jobID) ... ie I
use the unique identifier for both key and value, because really, I
don't care for the value, I just want a quick way to see if the JobID
is in the table.

Is there a better way to go about things?

Thanks
Bill


Bill... I think using that hastable is still pretty good. You could do
something like this:

using System;
using System.Collections;

public sealed class App
{
public static void Main ()
{
// create a hashtable
Hashtable ht = new Hashtable ();

// add some keys
ht.Add (1, null);
ht.Add (2, null);

Console.WriteLine (ht.ContainsKey (1));
Console.WriteLine (ht.ContainsKey (10));
}
}

Output:

True
False

--
Tom Shelton [MVP]

Nov 17 '05 #3
If you don't care about the "Value" in the Hashtable, then I recommend using an ArrayList and the Sort and BinarySearch methods.

I believe that sorting the list, and "binary searching" will be the fastest implementation in your case, but I believe that this
method performs the best with 10 or more entries (if i remember correctly). Either way, 10 entries should be extremely fast so that
should work just fine.

--
Dave Sexton
dave@www..jwaonline..com
-----------------------------------------------------------------------
"Ollie Riches" <ol**********@phoneanalyser.net> wrote in message news:e8*************@tk2msftngp13.phx.gbl...
You should check the specialized collection namespace (System.Collections.Specialized) it has a string dictionary that implements
a hash table with the key and the value strongly typed to be strings rather than objects.

How many entries are you going to have in the collection?

And do you only ever want to check for Job once?
--
HTH

Ollie Riches
http://www.phoneanalyser.net
Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a programmer
helping programmers.

"Tom Shelton" <ts******@YOUKNOWTHEDRILLcomcast.net> wrote in message news:%2****************@TK2MSFTNGP10.phx.gbl...
On 2005-04-30, orekin <or*******@yahoo.com.au> wrote:
Hi There

I have a huge array of objects being displayed on a UI. These are
actually 'jobs' that the users need to deal with before close of
business.

As the operator deals with a particular job, I want to add the unique
identifier for that job to a collection. The collection simply keeps
track of jobs that have been dealt with.

The issue is speed of retrieval. Given a job, I know its unique
identifier and want to be able to quickly search the collection to see
if the job has been dealt with.

At the moment I am using a HashTable, but my use of the HashTable is
not intuitive - I add items with myHashTable.Add(jobID,jobID) ... ie I
use the unique identifier for both key and value, because really, I
don't care for the value, I just want a quick way to see if the JobID
is in the table.

Is there a better way to go about things?

Thanks
Bill


Bill... I think using that hastable is still pretty good. You could do
something like this:

using System;
using System.Collections;

public sealed class App
{
public static void Main ()
{
// create a hashtable
Hashtable ht = new Hashtable ();

// add some keys
ht.Add (1, null);
ht.Add (2, null);

Console.WriteLine (ht.ContainsKey (1));
Console.WriteLine (ht.ContainsKey (10));
}
}

Output:

True
False

--
Tom Shelton [MVP]


Nov 17 '05 #4
if the list of jobs is constrained to "the number of jobs that a single
human can perform in a single day," then the word "huge" is misleading.

To create a parallel, let's say that I wanted to count, as a task, each
breath I take while at work. I take appoximately 3 seconds per breath. In
other words, I inhale about 20 times a minute while at rest. In an eight
hour day, I will inhale 9,600 times. This is tiny.

Assuming I wanted to place 9,600 rows in any collection object that the
framework makes available, it would be difficult (at best) for a human being
to percieve any difference in the performance for looking up a single entry,
since the number of entries is so small.

You have already spent more time thinking about it, just by reading these
responses, than all of your users, combined, will ever percieve in variation
based on the choice you make.

In other words, don't optimize until you know where your time is being
spent.

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
"orekin" <or*******@yahoo.com.au> wrote in message
news:84*************************@posting.google.co m...
Hi There

I have a huge array of objects being displayed on a UI. These are
actually 'jobs' that the users need to deal with before close of
business.

As the operator deals with a particular job, I want to add the unique
identifier for that job to a collection. The collection simply keeps
track of jobs that have been dealt with.

The issue is speed of retrieval. Given a job, I know its unique
identifier and want to be able to quickly search the collection to see
if the job has been dealt with.

At the moment I am using a HashTable, but my use of the HashTable is
not intuitive - I add items with myHashTable.Add(jobID,jobID) ... ie I
use the unique identifier for both key and value, because really, I
don't care for the value, I just want a quick way to see if the JobID
is in the table.

Is there a better way to go about things?

Thanks
Bill

Nov 17 '05 #5
Thanks everyone for your replies.

A particular jobID would be checked against the collection several
thousand times, not just once. Also, the jobID is an integer.

In terms of the size of the array that holds the number of jobs that
are dealt with - this ranges from 0 to approx 250,000 items [There are
thousands of employees available to deal with jobs and each job takes
on average 10 minutes]

So on the absolute worst case (please correct if these calcs are
wrong):
- Linear search requires 250,000 hits
- Binary search requires 18 hits [Log 250,000 base 2]
- HashTable with default load factor requires 3.6 hits [1/(1-0.72)]

If my calculations are correct that means that we have roughly 15 ms to
gain every time this collection is searched. The search routine will
be put away in its own function and that function could be called up to
10,000 times (1 refresh every 3 seconds over a standard business day).
This equates to a time saving of 150 seconds throughout the day.

I am really interested to read further feedback. For now I am leaning
toward the approach suggested by Tom as it is quick and sticks to
HashTables, which the other programmers on my team are familiar with..

Kind Regards
Bill

Nov 17 '05 #6
Hi Bill,

I was wondering if you could please explain this calculation:
- HashTable with default load factor requires 3.6 hits [1/(1-0.72)]
Default load is 1, I believe. How did you determine the other values?

HashTable is obviously the way to go (aside from algorithms not built into the framework that may perform better).
--
Dave Sexton
dave@www..jwaonline..com
-----------------------------------------------------------------------
<or*******@yahoo.com.au> wrote in message news:11*********************@o13g2000cwo.googlegro ups.com... Thanks everyone for your replies.

A particular jobID would be checked against the collection several
thousand times, not just once. Also, the jobID is an integer.

In terms of the size of the array that holds the number of jobs that
are dealt with - this ranges from 0 to approx 250,000 items [There are
thousands of employees available to deal with jobs and each job takes
on average 10 minutes]

So on the absolute worst case (please correct if these calcs are
wrong):
- Linear search requires 250,000 hits
- Binary search requires 18 hits [Log 250,000 base 2]
- HashTable with default load factor requires 3.6 hits [1/(1-0.72)]

If my calculations are correct that means that we have roughly 15 ms to
gain every time this collection is searched. The search routine will
be put away in its own function and that function could be called up to
10,000 times (1 refresh every 3 seconds over a standard business day).
This equates to a time saving of 150 seconds throughout the day.

I am really interested to read further feedback. For now I am leaning
toward the approach suggested by Tom as it is quick and sticks to
HashTables, which the other programmers on my team are familiar with..

Kind Regards
Bill

Nov 17 '05 #7
Hi Dave!

I got this from:

http://msdn.microsoft.com/library/de...res_guide2.asp

Search the document for "1 / (1 - lf), where lf is the load factor"

I could probably be incorrect, but I am thinking worse case scenario is
a collision, and using the above formula with default load factor I get
3.6 (note that Microsoft does a transformation on the load factor,
0.72 is really 1 ... search for paragraph containing "Realize, however,
that whatever value you provide, it is scaled down 72 percent, so even
if you pass in a value of 1.0, the Hashtable class's actual loadFactor
will be 0.72").

Cheers
Bill

Nov 17 '05 #8
doing this in memory is not scalable. It is fast, true, but not scalable.

In other words, you can get good numbers from 100 folks hitting the
collection, but when you have 10,000, you are going to need another machine
or two, because you will bottleneck on the networking or communication
aspects. Therefore, all the speed in the world in the collection won't
help.

If you are going to scale, you will need a nice fast database under the
covers. Put in SQL (on a seperate machine from your app server). Then, at
the app server level, Cache the hits you have looked up, but no more, and
even then keep them in memory only for a short time, since their data will
age pretty fast.
Your speed issue still has nothing to do with the collection. Use the hash
table and go. Save this "thinking" time to consider actual scalability.
--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
<or*******@yahoo.com.au> wrote in message
news:11*********************@o13g2000cwo.googlegro ups.com...
Thanks everyone for your replies.

A particular jobID would be checked against the collection several
thousand times, not just once. Also, the jobID is an integer.

In terms of the size of the array that holds the number of jobs that
are dealt with - this ranges from 0 to approx 250,000 items [There are
thousands of employees available to deal with jobs and each job takes
on average 10 minutes]

So on the absolute worst case (please correct if these calcs are
wrong):
- Linear search requires 250,000 hits
- Binary search requires 18 hits [Log 250,000 base 2]
- HashTable with default load factor requires 3.6 hits [1/(1-0.72)]

If my calculations are correct that means that we have roughly 15 ms to
gain every time this collection is searched. The search routine will
be put away in its own function and that function could be called up to
10,000 times (1 refresh every 3 seconds over a standard business day).
This equates to a time saving of 150 seconds throughout the day.

I am really interested to read further feedback. For now I am leaning
toward the approach suggested by Tom as it is quick and sticks to
HashTables, which the other programmers on my team are familiar with..

Kind Regards
Bill

Nov 17 '05 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: Tiësto | last post by:
Hi guys, I've trying to find a collection for a very common task without any luck. I only need a collection of Key/Value objects, cause I need to check whether a key as been added and if not, get...
0
by: Tiësto | last post by:
Hi guys, I've trying to find a collection for a very common task without any luck. I only need a collection of Key/Value objects, cause I need to check whether a key as been added and if not, get...
2
by: Sam | last post by:
I have a custom control (MyTextBox - taken from Microsoft website) that implements the IPostBackDataHandler interface. It is added to the controls collection of a placeholder control during the...
3
by: J-T | last post by:
Is there a problem of defining a category object like below. I am just trying to create a proof-of-concept sort of application (and I don't want to use database) , so I have to sort of hard code...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.