Working with Hashtables

Hi all,

I am new to Hashtables, so I am not yet fully familiar with them.

I have been experimenting with a search engine spider written in C#. It uses
hashtables to hold the catalog.

If I index a large site, or scan many websites, the hashtables will get
very large. I am looking at writing them to disk and reading them back, but
I am not sure how this would work.

I found this on the net (it is VB.NET code, but I can convert it, so no need
to worry about that):

' Needs at the top of the file:
'   Imports System.IO
'   Imports System.Runtime.Serialization.Formatters.Binary

Private Sub cmdSave_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdSave.Click
    ' Save the hashtable, replacing any previous file
    Dim dataFile As String = Path.Combine(Application.StartupPath, "data.dat")
    If File.Exists(dataFile) Then File.Delete(dataFile)
    Using fs As New FileStream(dataFile, FileMode.CreateNew)
        Dim bf As New BinaryFormatter()
        bf.Serialize(fs, HashTest) ' writes the entire table in one go
    End Using
End Sub

Private Sub cmdLoad_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdLoad.Click
    ' Load the hashtable back (HashTest is assumed to be declared As Hashtable)
    Dim dataFile As String = Path.Combine(Application.StartupPath, "data.dat")
    If Not File.Exists(dataFile) Then Exit Sub
    Using fs As New FileStream(dataFile, FileMode.Open)
        Dim bf As New BinaryFormatter()
        HashTest = CType(bf.Deserialize(fs), Hashtable) ' reads the entire table back
    End Using
    cmdIterate_Click(Nothing, Nothing)
End Sub

It looks like it might be suitable, but my questions are:
1. Can I add to the hashtable file incrementally, so that I don't run out of
memory while scanning for files to catalog?
2. When reading it back off disk, do I have to read the whole lot into
memory in order to search through it?

If I can't do either of these, then what would you suggest?

The way I want to use it is somewhat like a SQL database, where I can
quickly select the records I need.

--
Best regards,
Dave Colliver.
http://www.AshfieldFOCUS.com
~~
http://www.FOCUSPortals.com - Local franchises available
Jan 29 '06 #1
On 29-01-2006 at 17:20:07, David
<da*****************@revilloc.REMOVETHIS.com> wrote:
1. Can I incrementally add to the hashtable file so that I don't run out of
memory when scanning for files to catalog
2. When reading it back off disk, do I have to read the whole lot into
memory in order to search through it?
[PD] With the solution above, I'm afraid the answer to both questions is no.
You would have to serialize and deserialize the whole Hashtable every time.

If I can't do either of these, then what would you suggest?

The way I want to use it is somewhat like a sql database, where I can
quickly select the records I need.

[PD] If you want database functionality, why not use a database? Some SQL
servers (e.g. Firebird) can be embedded within your application, so you
don't need to install them separately.
--
Piotr Dobrowolski
Piotr.Dobrowolski@_usun_gmail.com
Jan 29 '06 #2
A hashtable is basically like a database table with a single, primary
key. (Actually, they're more analogous to in-memory indexed files, but
not everyone these days remembers what an indexed file is.)

You can store an object in the hashtable by specifying the key under
which to store it. You can retrieve the object again by giving its key.
I use them all the time: if you read a table full of business
information from a database, a Hashtable is often a natural fit.
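
A minimal C# sketch of storing and retrieving by key with the non-generic System.Collections.Hashtable. The URL keys and page titles are invented sample data standing in for a spider's catalog, and the class and method names are hypothetical:

```csharp
using System;
using System.Collections;

public static class HashtableBasics
{
    // Builds a tiny catalog keyed by URL (made-up sample data).
    public static Hashtable BuildCatalog()
    {
        Hashtable catalog = new Hashtable();
        catalog.Add("http://example.com/", "Example home page"); // Add throws if the key already exists
        catalog["http://example.com/about"] = "About page";     // the indexer adds or overwrites
        return catalog;
    }

    public static void Main()
    {
        Hashtable catalog = BuildCatalog();

        // Values come back typed as object, so cast to the stored type.
        string title = (string)catalog["http://example.com/about"];
        Console.WriteLine(title);                                        // About page

        // A key that was never stored gives null rather than an exception.
        Console.WriteLine(catalog["http://example.com/nowhere"] == null); // True

        Console.WriteLine(catalog.ContainsKey("http://example.com/"));   // True
    }
}
```

In .NET 2.0 and later, Dictionary<TKey, TValue> in System.Collections.Generic does the same job with typed keys and values, but its indexer throws KeyNotFoundException for a missing key, so TryGetValue is the usual way to probe it.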

Take a table full of invoices, for example. Each invoice usually has a
unique invoice number. Put the invoices in a Hashtable, keyed by
invoice number. Later, when you get a foreign key in another table that
refers to an invoice number, just use that number to look up the
invoice object in your in-memory hash table.
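
The invoice example can be sketched in C# as follows. This assumes a hypothetical Invoice class with Number, Customer and Amount fields; none of these names come from the thread, and the invoice data is made up:

```csharp
using System;
using System.Collections;

// Hypothetical business object; field names are assumptions for illustration.
public class Invoice
{
    public int Number;
    public string Customer;
    public decimal Amount;

    public Invoice(int number, string customer, decimal amount)
    {
        Number = number;
        Customer = customer;
        Amount = amount;
    }
}

public static class InvoiceLookup
{
    // Index invoices by their unique invoice number (the "primary key").
    public static Hashtable IndexByNumber(Invoice[] invoices)
    {
        Hashtable byNumber = new Hashtable();
        foreach (Invoice inv in invoices)
        {
            byNumber.Add(inv.Number, inv); // Add throws on a duplicate invoice number
        }
        return byNumber;
    }

    // Resolve a foreign key (an invoice number stored in another table) to the invoice object.
    public static Invoice Find(Hashtable byNumber, int invoiceNumber)
    {
        return (Invoice)byNumber[invoiceNumber]; // null if there is no such invoice
    }

    public static void Main()
    {
        Invoice[] invoices =
        {
            new Invoice(1001, "Acme", 250.00m),
            new Invoice(1002, "Globex", 99.50m),
        };
        Hashtable byNumber = IndexByNumber(invoices);

        // e.g. a payment row whose foreign key refers to invoice 1002
        Invoice inv = Find(byNumber, 1002);
        Console.WriteLine(inv.Customer); // Globex
    }
}
```

Each lookup is a single hash probe, but the whole index lives in memory, which is the limit the original question runs into once the catalog grows.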

In your case, why not store the information in a database? I believe
that Microsoft's new SQL Express is out (others will correct me if I'm
wrong about that). It's a free mini-database that will run on any
laptop / desktop / notepad machine. That way your app can scale: the
catalog lives on disk, and you query only the records you need instead
of holding everything in memory.

I wouldn't bother using a flat-file serialization like the one you
posted here. It may be fine for a quick test hack, but not for a real
application.

Jan 30 '06 #3
Thank you both, Bruce and Piotr.

I suspected that what I was looking at would not be the best solution. I
will look at writing the catalog to a database.

It is for a search engine and spider that uses Hashtables to store the
catalog. I don't like the idea of storing it this way, but I need quick
access.

I am looking at alternative search engine code as well, so I will start
another thread regarding that.

Best regards,
Dave Colliver.
http://www.AshfieldFOCUS.com
~~
http://www.FOCUSPortals.com - Local franchises available
"Bruce Wood" <br*******@canada.com> wrote in message
news:11**********************@o13g2000cwo.googlegroups.com...

Jan 30 '06 #4
