473,242 Members | 1,862 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,242 software developers and data experts.

Caching Queries in DataTable

Hello:

I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.

First of all, is there something that does this already?

Second, right now I use a struct called Result that looks like this:

private struct Result
{
public object[] ParameterValues;
public DataRow[] Rows;
}

I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Resultvalue is a collection of all the parameter
values and the generated DataRows that have been queried thus far.

Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop though all the Results in the
List<Resultand compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.

Essentially, this class is worthless if it takes longer to find the
cached results than it would be to just rehit the database.

Thanks,
Travis
Aug 14 '08 #1
9 3602
On Aug 14, 7:32*am, "jehugalea...@gmail.com" <jehugalea...@gmail.com>
wrote:
Hello:

I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.

First of all, is there something that does this already?
If you're doing it solely for performance reasons, then yes. The
database itself is likely to cache query results.
Second, right now I use a struct called Result that looks like this:

private struct Result
{
* * public object[] ParameterValues;
* * public DataRow[] Rows;

}

I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Resultvalue is a collection of all the parameter
values and the generated DataRows that have been queried thus far.

Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop though all the Results in the
List<Resultand compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.
Since you're effectively looking up by CommandText and Parameters, why
not aggregate them together into a single class, and use that as a key
of your Dictionary? Then your search will be just a single hashtable
lookup, without the following linear scan (though GetHashCode() would
be somewhat slower).
Aug 14 '08 #2
On Aug 13, 11:32*pm, "jehugalea...@gmail.com" <jehugalea...@gmail.com>
wrote:
Hello:

I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.

First of all, is there something that does this already?

Second, right now I use a struct called Result that looks like this:

private struct Result
{
* * public object[] ParameterValues;
* * public DataRow[] Rows;

}

I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Resultvalue is a collection of all the parameter
values and the generated DataRows that have been queried thus far.

Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop though all the Results in the
List<Resultand compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.

Essentially, this class is worthless if it takes longer to find the
cached results than it would be to just rehit the database.

Thanks,
Travis
honestly, unless that you have a serious performance hit by accessing
the DB you should go to it.

if your approach is faster depend of several things like how many
combinations of command/parameters you will have, how fast is your DB
connection, etc
Aug 14 '08 #3
On Aug 14, 2:15 am, Pavel Minaev <int...@gmail.comwrote:
On Aug 14, 7:32 am, "jehugalea...@gmail.com" <jehugalea...@gmail.com>
wrote:
Hello:
I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.
First of all, is there something that does this already?

If you're doing it solely for performance reasons, then yes. The
database itself is likely to cache query results.


Second, right now I use a struct called Result that looks like this:
private struct Result
{
public object[] ParameterValues;
public DataRow[] Rows;
}
I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Resultvalue is a collection of all the parameter
values and the generated DataRows that have been queried thus far.
Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop though all the Results in the
List<Resultand compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.

Since you're effectively looking up by CommandText and Parameters, why
not aggregate them together into a single class, and use that as a key
of your Dictionary? Then your search will be just a single hashtable
lookup, without the following linear scan (though GetHashCode() would
be somewhat slower).- Hide quoted text -

- Show quoted text -
How do I implement such a class? How can I be sure that my hash codes
don't overlap? I mean, I don't know how many parameter values there
may be. Won't this be an issue?

Or is it safe to iterate over each parameter and the command text and
just xor the hash codes?
Aug 14 '08 #4
You have a design issue here that you aren't addressing. Caching the queries
brings no conceivable advantage - your rdbms already does that for you by
preserving and optimizing the query. Also, query construction is simply a
matter of string concatenation which should, all things considered, rival a
hash table look up and load.

You should cache the results of the query since that is where the expense
lies. Opening a connection and transferring data is what is expensive.

--

Regards,
Alvin Bruney [MVP ASP.NET]

[Shameless Author plug]
Download OWC Black Book, 2nd Edition
Exclusively on www.lulu.com/owc $15.00
Need a free copy of VSTS 2008 w/ MSDN Premium?
http://msmvps.com/blogs/alvin/Default.aspx
-------------------------------------------------------
<je**********@gmail.comwrote in message
news:fd**********************************@d1g2000h sg.googlegroups.com...
On Aug 14, 2:15 am, Pavel Minaev <int...@gmail.comwrote:
>On Aug 14, 7:32 am, "jehugalea...@gmail.com" <jehugalea...@gmail.com>
wrote:
Hello:
I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.
First of all, is there something that does this already?

If you're doing it solely for performance reasons, then yes. The
database itself is likely to cache query results.


Second, right now I use a struct called Result that looks like this:
private struct Result
{
public object[] ParameterValues;
public DataRow[] Rows;
}
I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Resultvalue is a collection of all the parameter
values and the generated DataRows that have been queried thus far.
Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop though all the Results in the
List<Resultand compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.

Since you're effectively looking up by CommandText and Parameters, why
not aggregate them together into a single class, and use that as a key
of your Dictionary? Then your search will be just a single hashtable
lookup, without the following linear scan (though GetHashCode() would
be somewhat slower).- Hide quoted text -

- Show quoted text -

How do I implement such a class? How can I be sure that my hash codes
don't overlap? I mean, I don't know how many parameter values there
may be. Won't this be an issue?

Or is it safe to iterate over each parameter and the command text and
just xor the hash codes?
Aug 14 '08 #5
On Aug 14, 4:57*pm, "Alvin Bruney [ASP.NET MVP]" <vapor dan using hot
male spam filterwrote:
You have a design issue here that you aren't addressing. Caching the queries
brings no conceivable advantage - your rdbms already does that for you by
preserving and optimizing the query. Also, query construction is simply a
matter of string concatenation which should, all things considered, rivala
hash table look up and load.

You should cache the results of the query since that is where the expense
lies. Opening a connection and transferring data is what is expensive.

--

Regards,
Alvin Bruney [MVP ASP.NET]
I am caching the result of the query in a DataTable. But, how do you
know which results belong to which query? I could dump down everything
and then work entirely in my DataTable, but that is not a generic
solution. The table could be huge or I don't want to spend a
considerable time building the table at start up.

If I can reduce the time it takes to find older results, then I should
be able use a little more memory and save a hit or two to the database.
Aug 15 '08 #6
An interesting thing to note: I went ahead and took Pavel Minaev's
suggestion to make a Hashable class. It was a great idea, once I
realized clashing didn't matter (just implement Equals). When running
a few examples, I was able to get some statistics. The cacher really
only helps if there is a moderate number of parameters (which is
usually the case) and the parameter values fall within a moderately-
sized range (which is also usual). The memory usage was actually very
low. I might enact a expiration policy.

So really, the value of the cacher hinges on the rate at which
parameter values repeat. It makes sense since the cacher can't cache
something unless it has already been queried. Duh.
Aug 15 '08 #7
On Aug 14, 6:32*am, "jehugalea...@gmail.com" <jehugalea...@gmail.com>
wrote:
Hello:

I am writing a cute little class that will cache queries against a
database. Currently, I am implementing this by storing the command
text, parameter values and generated DataRows.

First of all, is there something that does this already?

Second, right now I use a struct called Result that looks like this:

private struct Result
{
* * public object[] ParameterValues;
* * public DataRow[] Rows;

}

I am looking up past queries by storing the queries in a
Dictionary<string, List<Result>>, where the string key is the command
text and the List<Resultvalue is a collection of all the parameter
values and the generated DataRows that have been queried thus far.

Is there a way to improve the performance of this search? I am looking
for ideas. Currently, I have to loop though all the Results in the
List<Resultand compare against each of the parameter values. Is
there a way to optimize this algorithm while keeping things generic? I
can't even go against a primary key, since this class is meant to work
with anything.

Essentially, this class is worthless if it takes longer to find the
cached results than it would be to just rehit the database.

Thanks,
Travis
Hello

If i understand right you need Linq to something
the select of the linq is very fast
I need more details
Aug 15 '08 #8
On Aug 15, 6:11*am, "jehugalea...@gmail.com" <jehugalea...@gmail.com>
wrote:
An interesting thing to note: I went ahead and took Pavel Minaev's
suggestion to make a Hashable class. It was a great idea, once I
realized clashing didn't matter (just implement Equals).
Erm... you _do_ need to implement GetHashCode() if you provide your
own Equals(). Yes, your GetHashCode() can return equal values for non-
equal objects without breaking anything - but it should never return
non-equal values for equal objects, otherwise any hash-based container
will break! And if you override Equals() to use data equality, but
leave the default GetHashCode(), then that is precisely what will
happen...

Aug 15 '08 #9
On Aug 15, 12:58*pm, Pavel Minaev <int...@gmail.comwrote:
On Aug 15, 6:11*am, "jehugalea...@gmail.com" <jehugalea...@gmail.com>
wrote:
An interesting thing to note: I went ahead and took Pavel Minaev's
suggestion to make a Hashable class. It was a great idea, once I
realized clashing didn't matter (just implement Equals).

Erm... you _do_ need to implement GetHashCode() if you provide your
own Equals(). Yes, your GetHashCode() can return equal values for non-
equal objects without breaking anything - but it should never return
non-equal values for equal objects, otherwise any hash-based container
will break! And if you override Equals() to use data equality, but
leave the default GetHashCode(), then that is precisely what will
happen...
I went ahead and did some additional analysis. The class itself is
really only useful in a very small number of cases. The class comes
with the requirement that data doesn't change a lot, since keeping a
local copy around will eventually lead outdated data. The class also
have to have an extremely limited number of inputs. The values of
parameters must cluster. As the values approach being truly random,
the range of values must be fractions smaller than the number of
reads. Also, the number of parameters also have to be small because as
each value is added, the probability of hitting a cached query grows
substantially. Say parameter one usually tends to fall between 0-99
and the next parameter falls between 0-9. The likelihood of a value
being the cache in a random environment approaches 99 * 9. Take into
account that missed caches still require a hash table hit. From my
analysis, half the processing time is performed by the caching
algorithm performing searches (which is amortized constant), rather
than simply hitting the database and allowing it to perform the
majority of the work, that is in a random environment. Finally,
deleted records need to be handled appropriately. It doesn't make
sense to return a deleted row, so the cacher probably should remove
them before returning the results. This requires a linear search every
time results are returned, since the cache keeps track of the DataRow
directly and doesn't have another means of keeping their values in
tact. The only other work around is to create an expression for
searching within the DataTable that would yield the same results.
However, creating such an expression would probably have even more of
a negative affect on runtime performance (as well as only be partially
implementable).

I think what I am getting at is this: it probably makes more sense to
use an Identity Map (a class for caching results by primary key) when
possible and to just hit the database directly otherwise. With the
primary key, seaches can be performed against the DataTable, which
means it will probably be fast and deal with deleted DataRows
automatically. Of course, I would like to do some more research on a
cacher that works with foreign keys, since that seems slightly more
useful.

Thanks for everyone's input.
Aug 18 '08 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: JZ | last post by:
Is there any equivalent of caching sql queries for Python like ADOdb for PHP? (For those who did not hear about ADODb and its feature : http://phplens.com/adodb/caching.of.recordsets.html) -- JZ
0
by: Luca Calderano | last post by:
The Only way is using Com S.G.A S.p.A. Nucleo Sistemi Informativi Luca Calderano -----Messaggio originale----- Da: python-list-admin@python.org Per conto di JZ
0
by: Luca Calderano | last post by:
I MEAN YOU CAN ACCESS ADO TROUGHT COM AS EXPLAINED HERE: import win32com.client import locale from os import sys adCmdStoredProc = 4 # Evaluates CommandText as a stored procedure...
5
by: Craig G | last post by:
im having my first play about with caching a dataset but seem to be having some problems in doing so!! basically i have a dataset which i add to the cache as follows Cache.Insert("myDS",...
1
by: Ben Fidge | last post by:
What are best practices for page output caching on pages that are dynamically generated from database tables. Our site has left-hand navigation that is comprised of dynamically generated menus...
1
by: sam | last post by:
Hi, I know fragment caching where you can use a cached user control in a non-cached page (or the opposite) But without using a user control, how can I cache just a part of my page (two...
4
by: Hermann | last post by:
My site is a bit slow showing the main page so I thought caching query result in PHP will improve performace. Then I read MySQL documentation and saw that MySQL does have a caching feature. So......
0
by: daokfella | last post by:
Hi all, I added a dataset to my project and dragged over a table from server explorer. I modified the dataadapter to use existing stored procedures for the Get, Insert, Update and Delete. I've...
3
by: Gary W. Smith | last post by:
I had a couple questions about data caching. We have a site that gets a huge amount of traffic to a few specific pages that have a lot of data on them (300k hits/hour during peak, about 6-10 data...
0
by: abbasky | last post by:
### Vandf component communication method one: data sharing ​ Vandf components can achieve data exchange through data sharing, state sharing, events, and other methods. Vandf's data exchange method...
2
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 7 Feb 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:30 (7.30PM). In this month's session, the creator of the excellent VBE...
0
Git
by: egorbl4 | last post by:
Скачал я git, хотел начать настройку, а там вылезло вот это Что это? Что мне с этим делать? ...
1
by: davi5007 | last post by:
Hi, Basically, I am trying to automate a field named TraceabilityNo into a web page from an access form. I've got the serial held in the variable strSearchString. How can I get this into the...
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: Aftab Ahmad | last post by:
So, I have written a code for a cmd called "Send WhatsApp Message" to open and send WhatsApp messaage. The code is given below. Dim IE As Object Set IE =...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.