473,405 Members | 2,344 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,405 software developers and data experts.

CHECKSUM_AGG and BINARY_CHECKSUM performance problems


Gentlemen,

I am using the following query to get a list of grouped checksum data.

SELECT CAST(Field0_datetime AS INT),
CHECKSUM_AGG(BINARY_CHECKSUM(Field1_bigint, Field2_datetime,
Field3_datetime, Field4_bigint, Field5_bigint, CAST(Field6_float
Decimal(38,6)), Field7_datetime))
FROM Table1
WHERE Field0_datetime BETWEEN '2003-01-01' AND '2003-01-20'
GROUP BY CAST(Field0_datetime AS INT)

Please notice the used filter: from January 1 to January 20.
That query takes about 6 minutes do return the data. The result is 18
records.

However, when I execute the same query filtering BETWEEN '2003-01-01' and
'2003-01-10', this time it takes only 1 second to return data.
When I execute the query filtering BETWEEN '2003-01-10' and '2003-01-20' the
query takes another 1 second to return data.

So why 6 minutes to process them together??

The table have an index by Field0_datetime.

It contains about 1.5 millions records total, using around 1.7Gb of
diskspace, indexes included.

From 2003-01-01 and 2003-01-20, there are 11401 records selected. Don't look
like that much.

The situation is repeatable, I mean, if I execute the queries back and
again, they takes the about the same ammount of time to execute, so I don't
think this problem is related to cache or something like that.

I would appreciate any advice about what might be wrong with my situation.

Thanks a lot and kind regards,

Orly Junior
IT Professional

Jul 23 '05 #1
2 5390
By using the profiler, I found that while executing the first query (20 days
span), the system don't use the index. How it possible?

A simpler version of the query that causes the same problem is:

select checksum_agg(binary_checksum([dc])) from [table1] where [dc] between
'2003-01-01' and '2003-01-20'

The profiler reports it will be using a clustered index scan wich is
unacceptable since the table have a lot of records.

Why the hell it is not using the [dc] index ?? If a tight the criteria to
between a 10-day span it uses the index correctly.

Do you have any idea why is that happening?

Thanks in advance and best regards,

Orly Junior
IT Professional
"Orly Junior" <no****@nomail.com> wrote in message
news:42***********************@news.telepac.pt...

Gentlemen,

I am using the following query to get a list of grouped checksum data.

SELECT CAST(Field0_datetime AS INT),
CHECKSUM_AGG(BINARY_CHECKSUM(Field1_bigint, Field2_datetime,
Field3_datetime, Field4_bigint, Field5_bigint, CAST(Field6_float
Decimal(38,6)), Field7_datetime))
FROM Table1
WHERE Field0_datetime BETWEEN '2003-01-01' AND '2003-01-20'
GROUP BY CAST(Field0_datetime AS INT)

Please notice the used filter: from January 1 to January 20.
That query takes about 6 minutes do return the data. The result is 18
records.

However, when I execute the same query filtering BETWEEN '2003-01-01' and
'2003-01-10', this time it takes only 1 second to return data.
When I execute the query filtering BETWEEN '2003-01-10' and '2003-01-20'
the query takes another 1 second to return data.

So why 6 minutes to process them together??

The table have an index by Field0_datetime.

It contains about 1.5 millions records total, using around 1.7Gb of
diskspace, indexes included.

From 2003-01-01 and 2003-01-20, there are 11401 records selected. Don't
look like that much.

The situation is repeatable, I mean, if I execute the queries back and
again, they takes the about the same ammount of time to execute, so I
don't think this problem is related to cache or something like that.

I would appreciate any advice about what might be wrong with my situation.

Thanks a lot and kind regards,

Orly Junior
IT Professional

Jul 23 '05 #2
Orly Junior (no****@nomail.com) writes:
By using the profiler, I found that while executing the first query (20
days span), the system don't use the index. How it possible?


When you have a non-clustered index that can be used to compute a query,
SQL Server cannot always use this index blindly. If the selection is
small, the index is find. If the selection is large, the index spells
disaster. This is because every hit in the pages, requires an access to
the data pages. This can up with more pages reads, than use scanning the
table once.

Now, in your case, there are 11041 rows that matches the WHERE clause.
The table is 1.7 GB, which is 207 000 pages. Even if some of those
1.7 GB are indexes, the table scan is obviously more expensive.

But SQL Server does not build query plans from full knowledge, but from
statistics it has saved about the table. If this statistics is inaccurate
for some reason, the estimate may be incorrect. By default, SQL Server
does only sample data for its statistics.

You can try "UPDATE STATISTICS tbl WITH FULLSCAN" and see if this
has any effect. SQL Server will now look at all rows. However, it
saves data in a histogramme, so you may still lose accuracy. DBCC
SHOW_STATISTICS may give some information.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: Eric S. Johansson | last post by:
more questions about performance measurements in python. I solve the problems I had with shelf by not using it. I'm doing a direct pickle to and from file of an entire dictionary. That knocked off...
5
by: John Bailo | last post by:
I wrote a webservice to output a report file. The fields of the report are formatted based on information in an in-memory XmlDocument. As each row of a SqlDataReader are looped through, a...
1
by: Liam Caffrey | last post by:
Hi, I can see that by using the object ID rather that the object name, the following SQL query works. Has anybody got any idea what is causing the error? -- Works OK select o.id...
5
by: Liam Caffrey | last post by:
Hi, It appears that binary_checksum can give the same checksum for different strings, which is a bit worrying. (I guess the algorithm is the problem in the context of a repeating pattern.) ...
3
by: Orly Junior | last post by:
Hello, Do you know if the algorithm for the BINARY_CHECKSUM function in documented somewhere? I would like to use it to avoid returning some string fields from the server. By returning only...
5
by: Scott | last post by:
I have a customer that had developed an Access97 application to track their business information. The application grew significantly and they used the Upsizing Wizard to move the tables to SQL...
24
by: Bob Alston | last post by:
Most of my Access database implementations have been fairly small in terms of data volume and number of concurrent users. So far I haven't had performance issues to worry about. <knock on wood> ...
4
by: Martin | last post by:
I am using graphics as backgrounds for forms,buttons,labels etc. The question is: is it faster to load all graphics from files on app start or to use it embeded (places in editor during design)....
5
by: Markus Ernst | last post by:
Hello A class that composes the output of shop-related data gets some info from the main shop class. Now I wonder whether it is faster to store the info in the output class or get it from the...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.