Looking for fast string hash searching

Hi!

First, let me apologize for asking this question when there are so many answers
to it on Google, but most of them contradict each other, and making what I
want to do fast is crucial to my project. So, here's what I have:

My C program connects to a database and gets ca. 50-100K domain name/file path
pairs. Those pairs have to be cached by my application. Building the cache may
take a second or two, but retrieving from it must be very fast. Since I get
the data from a database, I'd be able to order by domain name (which will be
my key, and is guaranteed to be unique), so I thought something like a B-tree
search for strings might be a good idea. I only have to look up by domain name
in the cache; searching by path is not required.
Since I'm far from being an expert on the subject of hashing and search
algorithms, your opinion on how to make this fast is humbly requested :-)

TIA,

Thomas
Nov 14 '05 #1
Thomas Christmann wrote:

> Hi!
>
> First, let me apologize for asking this question when there are so many answers
> to it on Google, but most of them contradict each other, and making what I
> want to do fast is crucial to my project. So, here's what I have:
>
> My C program connects to a database and gets ca. 50-100K domain name/file path
> pairs. Those pairs have to be cached by my application. Building the cache may
> take a second or two, but retrieving from it must be very fast. Since I get
> the data from a database, I'd be able to order by domain name (which will be
> my key, and is guaranteed to be unique), so I thought something like a B-tree
> search for strings might be a good idea. I only have to look up by domain name
> in the cache; searching by path is not required.
> Since I'm far from being an expert on the subject of hashing and search
> algorithms, your opinion on how to make this fast is humbly requested :-)
>
> TIA,
>
> Thomas

This isn't _really_ a `C' question...

If the distribution of "domain" names
is pretty even across the alphabet,
then you could use the 1st letter of
the name as an index to an array of
"pointers" to name/path pairs that
you can `bsearch()'. 100,000 entries
isn't that much nowadays, and
dividing by 26 (about 4,000 entries per letter)
should provide a very fast lookup.
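
A minimal sketch of that idea, under stated assumptions: the names `struct entry`, `struct index`, `index_build` and `index_lookup` are made up for illustration, and the table has 256 groups (one per possible first byte) rather than 26, so that names starting with a digit still land somewhere. The array is sorted with `qsort()' in `strcmp()' order instead of relying on the database's ordering, since the database collation may not match `strcmp()'. How well this works depends entirely on how evenly the first characters are spread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one domain name / file path pair. */
struct entry {
    const char *domain;
    const char *path;
};

#define NGROUPS 256   /* one group per possible first byte (assumption) */

struct index {
    const struct entry *start[NGROUPS]; /* first entry of each group */
    size_t count[NGROUPS];              /* number of entries in each group */
};

/* Comparison callback shared by qsort() and bsearch(). */
int cmp_entry(const void *a, const void *b)
{
    const struct entry *ea = a;
    const struct entry *eb = b;
    return strcmp(ea->domain, eb->domain);
}

/* Sorts the entries in place, then records where each first-byte group
   starts. After sorting, entries sharing a first byte are contiguous. */
void index_build(struct index *ix, struct entry *entries, size_t n)
{
    size_t i;

    assert(entries != NULL || n == 0);
    for (i = 0; i < NGROUPS; i++) {
        ix->start[i] = NULL;
        ix->count[i] = 0;
    }
    if (n == 0)
        return;
    qsort(entries, n, sizeof *entries, cmp_entry);
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)entries[i].domain[0];
        if (ix->count[c] == 0)
            ix->start[c] = &entries[i];
        ix->count[c]++;
    }
}

/* Returns the path stored for the domain, or NULL if it is not cached. */
const char *index_lookup(const struct index *ix, const char *domain)
{
    unsigned char c = (unsigned char)domain[0];
    struct entry key;
    const struct entry *hit;

    if (ix->count[c] == 0)
        return NULL;
    key.domain = domain;
    key.path = NULL;
    hit = bsearch(&key, ix->start[c], ix->count[c], sizeof key, cmp_entry);
    return hit != NULL ? hit->path : NULL;
}
```

Call `index_build()' once after loading the pairs, then `index_lookup()' for each request. The entry array must stay alive and unmodified for as long as the index is in use, because the index only stores pointers into it.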
Stephen
Nov 14 '05 #2
> This isn't _really_ a `C' question...

I know, I know, and I'm sorry to post here, but you guys usually
help me very much (not knowingly, I suppose) with your posts. Also,
there isn't really an alt.hash.maps :-)

> If the distribution of "domain" names
> is pretty even across the alphabet,
> then you could use the 1st letter of
> the name as an index to an array of
> "pointers" to name/path pairs that
> you can `bsearch()'. 100,000 entries
> isn't that much nowadays, and
> dividing by 26 (about 4,000 entries per letter)
> should provide a very fast lookup.

Sounds good, I'll give that a try.

Thanks,

Thomas
Nov 14 '05 #3
On Thu, 13 May 2004 07:47:36 -0700, Thomas Christmann wrote:
>> This isn't _really_ a `C' question...
>
> I know, I know, and I'm sorry to post here, but you guys usually
> help me very much (not knowingly, I suppose) with your posts. Also,
> there isn't really an alt.hash.maps :-)

comp.programming or something like that (I've forgotten the exact name)
handles language-agnostic algorithm questions. You should get the
algorithm ironed out first before trying a specific implementation anyway.

--
yvoregnevna gjragl-guerr gjb-gubhfnaq guerr ng lnubb qbg pbz
To email me, rot13 and convert spelled-out numbers to numeric form.
"Makes hackers smile" makes hackers smile.

Nov 14 '05 #4
"Stephen L." <sd*********@cast-com.net> writes:

|> Thomas Christmann wrote:

|> > First, let me apologize for asking this question when there are
|> > so many answers to it on Google, but most of them contradict
|> > each other, and making what I want to do fast is crucial to my
|> > project. So, here's what I have:

|> > My C program connects to a database and gets ca. 50-100K domain
|> > name/file path pairs. Those pairs have to be cached by my
|> > application. Building the cache may take a second or two, but
|> > retrieving from it must be very fast. Since I get the data from a
|> > database, I'd be able to order by domain name (which will be my
|> > key, and is guaranteed to be unique), so I thought something like
|> > a B-tree search for strings might be a good idea. I only have to
|> > look up by domain name in the cache; searching by path is not
|> > required. Since I'm far from being an expert on the subject of
|> > hashing and search algorithms, your opinion on how to make this
|> > fast is humbly requested :-)

|> If the distribution of "domain" names
|> is pretty even across the alphabet,
|> then you could use the 1st letter of
|> the name as an index to an array of
|> "pointers" to name/path pairs that
|> you can `bsearch()'.

They aren't. I'll bet that well over half of all domains start with
"www.". Also, the alphabet for domain names isn't limited to letters.

I think that for this application, nothing will beat a good hash code.
The trick is, of course, to avoid a bad one :-); for some reason, URLs
seem to be very sensitive to bad hash codes. A Google search for FNV
hashing should turn up what you need. If performance of the hash
itself turns out to be an issue, and your hardware doesn't handle
arbitrary multiplies very rapidly, I've also used Mersenne-prime-based
hash codes in the past with good results. (The basic algorithm is the
same as for FNV hashing, but the multiplier is a Mersenne prime, so the
multiplication reduces to a shift and a subtraction.)
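
To make that concrete, here is a rough sketch rather than a tuned implementation. `hash_fnv1a` is the 32-bit FNV-1a variant with the published offset basis and prime. `hash_mersenne` is one possible reading of the Mersenne prime idea: the multiplier 127 (2^7 - 1), the zero start value and the use of addition to mix in each byte are assumptions for illustration. The chained table around them (`struct node`, `struct table`, `table_init`, `table_insert`, `table_lookup`) is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime. */
uint32_t hash_fnv1a(const char *s)
{
    uint32_t h = 2166136261u;              /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                    /* FNV prime */
    }
    return h;
}

/* Multiplier 127 = 2^7 - 1, so h * 127 is (h << 7) - h.
   Multiplier, start value and '+' mixing are assumptions. */
uint32_t hash_mersenne(const char *s)
{
    uint32_t h = 0;
    for (; *s != '\0'; s++)
        h = ((h << 7) - h) + (unsigned char)*s;
    return h;
}

/* Hypothetical chained hash table keyed by domain name. */
struct node {
    const char *domain;
    const char *path;
    struct node *next;
};

struct table {
    struct node **slots;
    size_t nslots;      /* power of two, so a mask selects the slot */
};

/* Returns 0 on success, -1 if memory could not be allocated. */
int table_init(struct table *t, size_t nslots)
{
    size_t i;

    assert(nslots != 0 && (nslots & (nslots - 1)) == 0);
    t->nslots = nslots;
    t->slots = malloc(nslots * sizeof *t->slots);
    if (t->slots == NULL)
        return -1;
    for (i = 0; i < nslots; i++)
        t->slots[i] = NULL;
    return 0;
}

/* Keys are unique per the original question, so no duplicate check.
   The strings are not copied; they must outlive the table. */
int table_insert(struct table *t, const char *domain, const char *path)
{
    size_t i = hash_fnv1a(domain) & (t->nslots - 1);
    struct node *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;
    n->domain = domain;
    n->path = path;
    n->next = t->slots[i];
    t->slots[i] = n;
    return 0;
}

/* Returns the cached path, or NULL if the domain is unknown. */
const char *table_lookup(const struct table *t, const char *domain)
{
    const struct node *n = t->slots[hash_fnv1a(domain) & (t->nslots - 1)];

    for (; n != NULL; n = n->next)
        if (strcmp(n->domain, domain) == 0)
            return n->path;
    return NULL;
}
```

With 50-100K keys, a slot count such as 131072 or 262144 should keep the chains short, so a lookup costs one hash computation plus a handful of `strcmp()` calls. Swapping in `hash_mersenne` only requires changing the two calls to `hash_fnv1a`; measure the chain lengths on the real data before settling on either.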

--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34
Nov 14 '05 #5
