Bytes | Software Development & Data Engineering Community

How do you improve SQL performance over large amounts of data?

Hi,

I am using SQL Server 2000 and have a table that contains more than 2 million
rows of data (and growing). Right now, I have encountered 2 problems:

1) Sometimes when I query against this table, the SQL command times
out. I did more testing with Query Analyzer and found that the same
query does not always take the same amount of time to execute. Could
anyone please tell me which factors affect query speed, and which one
matters most? (I can think of open connections, the server's
CPU/memory...)

2) I am not sure whether 2 million rows is considered a lot, but it has
started to take 5-10 seconds to finish some simple queries. I am
wondering what the best practices are for handling this amount of data
while keeping decent performance?
Thank you,

Charlie Chang
[Ch*********@hotmail.com]

Jul 23 '05 #1

Have you researched indexes?
Generally, if you create an index on the columns most commonly used in
your WHERE clauses, you can increase performance considerably.
Keep in mind that creating too many indexes can hinder performance for
INSERT and DELETE queries, since the indexes have to be maintained
after each of those operations.
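For example, a minimal sketch (the table and column names here are made up for illustration, not taken from your post):

```sql
-- Index the column most often filtered on in WHERE clauses.
-- 'Sales' and 'Sale_Date_DT' are hypothetical names.
CREATE INDEX IX_Sales_SaleDate
    ON Sales (Sale_Date_DT)
```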
Any other suggestions would require us to see how you built the table.
Hope that helps.

Philip
Jul 23 '05 #2
(ch*********@hotmail.com) writes:
I am using SQL Server 2000 and have a table that contains more than 2 million
rows of data (and growing). Right now, I have encountered 2 problems:
I'll take the second question first, as it is more general.
2) I am not sure whether 2 million rows is considered a lot, but it has
started to take 5-10 seconds to finish some simple queries. I am
wondering what the best practices are for handling this amount of data
while keeping decent performance?
Two million rows is a respectable number for a table, although the world
has seen many larger tables than this. What matters most, though, is
the total size: a two-million-row table with a single integer column
and a two-million-row table with a single char(8000) column are very
different. But say that you have some 30 columns, with an average size
of 300 bytes. That's a 600 MB table, which is certainly not small.

For a table of that size, it's essential that you have good indexes
for the common queries. It is also essential that you rebuild indexes
on a regular basis with DBCC DBREINDEX; how often depends on how
quickly they get fragmented.

When you say that queries are taking a long time, it could be because
you need to add some more indexes. One way to find suitable indexes is
to run the Index Tuning Wizard on a workload.

If you believe that you have the right indexes, a possible cause could
be fragmentation. The command DBCC SHOWCONTIG can give you information
about this.
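A minimal sketch of checking fragmentation this way (the table name is a placeholder):

```sql
-- SQL Server 2000: report fragmentation statistics for one table.
-- Watch Scan Density and Logical Scan Fragmentation in the output.
DBCC SHOWCONTIG ('Sales')
```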
1) Sometimes when I query against this table, the SQL command times
out. I did more testing with Query Analyzer and found that the same
query does not always take the same amount of time to execute. Could
anyone please tell me which factors affect query speed, and which one
matters most? (I can think of open connections, the server's
CPU/memory...)


There are a bit too many unknowns here to give an exact answer. Does
the same query take a different amount of time from execution to
execution? There are at least two possible causes for this: blocking
and caching. If another process performs an update operation, your
query may be blocked for a while. You can examine blocking with the
sp_who command: if you see a non-zero value in the Blk column, the spid
on that row is blocked by the spid shown in Blk. The status bar in
Query Analyzer shows the spid of the current window.
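For example (both procedures ship with SQL Server 2000):

```sql
-- List current sessions; a non-zero value in the Blk column means
-- that spid is waiting on the spid shown in Blk.
EXEC sp_who

-- sp_who2 (undocumented, but present in SQL Server 2000) adds CPU,
-- disk I/O, and a BlkBy column with the same meaning.
EXEC sp_who2
```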

SQL Server tries to keep as much data as it can in cache. If data is
in cache, the response time for a query can be significantly better
than if data has to be read from disk. But the cache cannot be bigger
than a certain amount of the available memory in the machine. (I don't
know the exact number, but say 60-70%). If there are a lot of scans in
many tables, data will go in and out of the cache, and response time
will vary accordingly.

When testing different queries or indexes, one way to factor out the
effect of the cache is to use the command DBCC DROPCLEANBUFFERS, which
flushes the cache entirely. Obviously, it is not a good idea to do
this on a production box.
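A sketch of a repeatable timing run on a test server (not production):

```sql
-- Write dirty pages to disk first, so that DROPCLEANBUFFERS can
-- discard everything from the buffer cache.
CHECKPOINT
DBCC DROPCLEANBUFFERS
-- Optionally also throw away cached query plans, to include
-- compilation time in the measurement.
DBCC FREEPROCCACHE
```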

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #3
Thanks for the reply; I will read about indexes tonight.

As for my table structure, it consists of 12 columns in the following
order:

Sale_Date_DT (datetime, first column)
Employee_ID (int)
Machine_ID (int)
Receipt_Number_NV (nvarchar)
UPC_NV (nvarchar)
Quantity_Sold_IN (int)
Sale_Price_MN (money)
Tax_MN (money)
Payment_Type_IN (int)
Payment_Amount_MN (money)
Rebate_Category_ID (int)
Sales_ID (int, key, identity)

I get somewhere between 1.5 and 2 million rows of data every year. I
have been thinking about archiving and reindexing every 6 months.

I guess I will read about indexing and full-text indexing (maybe on
the receipt number). Any other suggestions would be appreciated :)
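For instance, the indexes I have in mind would look something like this (guesses only; which ones pay off depends on the queries actually run, which are not shown here):

```sql
-- Assumed table name 'Sales', matching the columns listed above.
CREATE INDEX IX_Sales_SaleDate
    ON Sales (Sale_Date_DT)        -- date-range reports

CREATE INDEX IX_Sales_Receipt
    ON Sales (Receipt_Number_NV)   -- exact lookups by receipt number
```

Note that for exact receipt-number lookups a plain index like this should be enough; full-text indexing is aimed at word searches inside longer text columns.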

Thank you,

Charlie Chang
[ch*********@hotmail.com]

Jul 23 '05 #4
Adding indexes works great. Thank you.

I do have a few more questions:

when I do dbcc showcontig (table_name)
I get the following information:

TABLE level scan performed.
- Pages Scanned................................: 38882
- Extents Scanned..............................: 4879
- Extent Switches..............................: 4878
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 99.63% [4861:4879]
- Logical Scan Fragmentation ..................: 0.08%
- Extent Scan Fragmentation ...................: 1.46%
- Avg. Bytes Free per Page.....................: 27.4
- Avg. Page Density (full).....................: 99.66%

I guess the number to look at is Scan Density (the table I had problems
with was down to 34%). Now, what I really want to know is: in general,
when should I reindex the table?

Another question: while performing all the database maintenance, with
all the failed runs (operations timing out due to fragmentation), my
transaction log got so big that my HD ran out of space. I detached the
database, truncated the log, and reattached the database to fix this.
I am wondering, is there a way to make the transaction log erase old
entries when the log file reaches a certain size?
Thank you again for your reply, it really helped.

Charlie Chang

Jul 23 '05 #5
(ch*********@hotmail.com) writes:
I do have a few more questions:

when I do dbcc showcontig (table_name)
I get the following information:

[DBCC SHOWCONTIG output snipped]

I guess the number to look at is Scan Density (the table I had problems
with was down to 34%). Now, what I really want to know is: in general,
when should I reindex the table?
Depends a little, and there are actually a number of strategies you
can use, depending on how the table is used. But as a simple rule of
thumb, don't defragment if scan density is better than 70%. If nothing
else, that avoids unnecessary bloat of the transaction log.
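As a sketch, the two defragmentation options in SQL Server 2000 (the database, table, and index names are placeholders):

```sql
-- Online defragmentation: runs in many small transactions, holds only
-- short locks, but only compacts and reorders existing pages.
DBCC INDEXDEFRAG ('MyDb', 'Sales', 'IX_Sales_SaleDate')

-- Full rebuild of all indexes on the table: more thorough, but locks
-- the table and generates a lot of transaction log.
DBCC DBREINDEX ('Sales')
```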
Another question: while performing all the database maintenance, with
all the failed runs (operations timing out due to fragmentation), my
transaction log got so big that my HD ran out of space. I detached the
database, truncated the log, and reattached the database to fix this.
I am wondering, is there a way to make the transaction log erase old
entries when the log file reaches a certain size?


Well, it depends on what you want the transaction log for. If you are
perfectly content with restoring the latest full backup (a backup every
night is good) in case of a crash, just switch to simple recovery mode.
You can still see the transaction log explode during reindexing, since
the log can never be truncated past any currently running transaction.
But at least when the reindexing is done, the log will be truncated
automatically.

If you need point-in-time recovery, you must run with full or
bulk-logged recovery, but in that case you don't want the transaction
log to be erased; instead, you need to back it up every now and then.
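Both options can be sketched like this (the database name and backup path are placeholders):

```sql
-- Option 1: simple recovery. Log space is reused automatically, but
-- you can only restore to your last full (or differential) backup.
ALTER DATABASE MyDb SET RECOVERY SIMPLE

-- Option 2: stay in full recovery and back up the log regularly;
-- each log backup truncates the inactive part of the log.
BACKUP LOG MyDb TO DISK = 'D:\Backups\MyDb_log.bak'
```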

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #6
