473,435 Members | 1,484 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,435 software developers and data experts.

How can I maintain Index-heavy application.

145 Expert 100+
Hello


I've been tasked with supporting a newly implemented piece of off-the-shelf software that requires SQL 2000. A recent upgrade to the software radically changed the distributed design of the underlying database to a very, very index-heavy and centralized model (they dropped 80% of the PK's and added 1,000's of indexes). I've been casually keeping an eye on the performance stats coming from the server and wasn't concerned until I learned of plans to perform several large data imports. I'm concerned about index performance, file growth, and data file fragmentation. I have plans to expand the data and log files, but I'm wondering if there isn't something I should do with the indexes pre/post-import. What can I do to improve or prepare for this batch import?


SQL Server 8.0.760
DATA ~3 GB
LOG ~1 GB
Collation SQL_Latin1_General_CP1_CI_AS

Thanks in advance!
Aug 24 '10 #1
10 1594
Oralloy
988 Expert 512MB
Jerry,

As you're aware, a primary key and a unique index are really moral equivalents.

I'm not certain how Microsoft treats them differently, though.

You might start by asking yourself what the incoming data bulk is to your existing data bulk. This will probably drive your decisions.

One trick we've occasionally used in the Oracle world is to turn off the inidices during the import (assuming you can do so), and re-build them afterwards.

Outside of that, how sanitary is the incoming data? Will it collide with your existing data and constraints, or will it simply fit in with little-to-no difficulty?

Good Luck!
Aug 24 '10 #2
Jerry Winston
145 Expert 100+
Thanks for the response.

yes, I know they're equivalent but I'm a little concerned because all of the indexes are Non-Unique,Non-Clustered--I checked sysindexes.

According to my rough estimate the number of incoming rows should triple the size of the database on a 1:3 ratio where one row of incoming data produces one row of data in the database(destTable,propertyTable, LogTable).

I considered disabling indexes for the bulk import but I couldn't be sure what kind of down-time we'd be facing rebuilding the indexes.

The bulk data should come directly into the database without any trouble colliding with the existing data.

I might be making much ado about nothing, and if I am let me know, but I have a sneaking suspicion the DB is going to really balloon. I might be able to calculate the number of grow operations by knowing the growth rate is 10 percent.
Aug 24 '10 #3
Oralloy
988 Expert 512MB
Well, how long does a backup take? :))

Worst case is it all goes south at the end of the import. Then you have lost time and recovery downtime.

Is it possible for you to back up the DB and run an update against the backup first, just to see what happens?
Aug 24 '10 #4
ck9663
2,878 Expert 2GB
First on PK and unique index. They are the same, but different :) The most visible difference between the two is the treatment of NULLable columns. All columns in PK should not allow NULL. Unique index, on the other hand, accepts NULLable columns.

Depending on your definition of "volume", you may start observing the performance of your database. I typically re-create index if my data grew by 10-15%.

Non-clustered index does not update the physical table, just the leaf arrangements of the index.

Good Luck!!!

~~ CK
Aug 24 '10 #5
Oralloy
988 Expert 512MB
Yep, there are differences. Also, the implementation mechanism can be radically different depending on the DB's implementation.

Regardless, the cost of building (and maintaining) either index is expensive.

From what I see, Jerry has about 10M records in his database (assuming ~33% utilization and 100 byte/record).

My personal experience stress testing MySQL shows that it takes between 2 and 4 hours to insert 1M records into an otherwise quiet database on an x86 based server running linux. That's 139 records/second.

Assuming O(log(n)) insert performance, I'm guestimating that 10M records should take a minimum of 40-80 hours to insert while maintaining indices. (As we all know, O(n*log(n)) performance really is decent, but it's still painful for large n.) That's just to build up the original database from raw record data.

And then we get to tripple that for his new data. Ugh! Now we're talking another 120 to 250 hours or more.

So, how can we make this viable?

If we turn off indices, he can (nominally) insert the records in linear [O(n)] time. Three giga-bytes of data is very little, so I would expect a small number of hours.

So the question is - how quickly can the indices be rebuilt? We know the answer is O(n*log(n)), however the time multiplier may be much better, depending on the implementation.

That's why it could be very interesting (and useful) for him to dry-run this operation on a separate server.

I want to see how this works out.
Aug 24 '10 #6
Jerry Winston
145 Expert 100+
Thanks for all the help everyone.


I always like to maintain a test environment for these very situations, when the math says you could be down for 1024 years. :P I can look like a genius on production implementations if I have a test implementation first!

Unfortunately, this project has "legs" and I only found out about it through a "BTW Jerry..." conversation.

This is a very important application so, of course, we purchased the custom consulting component and left the majority of our IT staff in the dark.

The implementation is very, very buggy, IMO, and would not support bringing up a test environment quickly (in our deployment schedule).

Luckily, the import will run in pieces so hopefully by starting with the smallest I can get some good numbers to figure out how long the next run will take.

In any event what do you all suggest I do during the import?
  1. Get a cup of coffee. The contractors will take any heat.
  2. Stand in a door way
  3. Put fingers in ears and crouch under a table
  4. Head between legs to kiss butt goodbye

Thanks again for all the advice
Aug 25 '10 #7
Oralloy
988 Expert 512MB
C-wrap Jerry, but you got handed a ball of snakes.

Good luck, it's good that you've go the contractors helping you out; but still, you have to make sure that all the infrastructure and logistics are kept working during the upgrade. I live the contracting world, and I've dropped a ball or two over the years. You would not believe just how valuable a right-honest backup can be at that moment.

Let us know how it works out. I'd like to know if my time estimates are anything better than the thumb at the end of my arm.

Good Luck!
Aug 25 '10 #8
NeoPa
32,556 Expert Mod 16PB
I figured I wouldn't be in much of a position to help if you were asking the question Jerry, but I'm pleased to see some of the other experts in here helping out.

Best of luck anyway :)
Aug 25 '10 #9
ck9663
2,878 Expert 2GB
There are a number of factors that you might also want to consider. How often the BULK IMPORT will happen? How fast do you want your production database be updated?...Is this a simple BULK IMPORT or a full ETL (Extract-Transform-Load). I deal with ETL process on a monthly basis. Our files could range from a couple of thousand rows...to average of 7-10 million rows...to the extreme of 18 million. I managed to transform the 18 million rows in ~3hrs. I managed to load a 5M rows in ~2hrs. All index active and no re-index performed.

Do you have the necessary rights to bring back those PK instead of index? In theory, PK are faster than unique index due to the fact that PK is a constraint and not an index.

Keep us posted.

Good Luck!!!

~~ CK
Aug 25 '10 #10
Jerry Winston
145 Expert 100+
@ck

Those are some impressive number for BULK IMPORTs, I'll definitely have to talk with you about ETL later.

However, I think I've been a little too cavalier with my language. The "bulk imports" are processed through a third-party application that's completely black-boxed.

The bulk upload will happen once, after all records are loaded new entries will be added as needed.

I'd love to bring back the PK's but there's no way I'd get the green light to perform any DDL-level tuning short of the server physically catching afire.
Aug 25 '10 #11

Sign in to post your reply or Sign up for a free account.

Similar topics

1
by: xo55ox | last post by:
Hi, I want to find out what is a good way to identify indexes that need rebuilding. I tried to run DBCC showcontig and identify them based on the scan density %. And according to a Microsoft...
14
by: Sean C. | last post by:
Helpful folks, Most of my previous experience with DB2 was on s390 mainframe systems and the optimizer on this platform always seemed very predictable and consistent. Since moving to a WinNT/UDB...
2
by: Fred | last post by:
Let's say I suddenly discover that an unclustered table would benefit if it was clustered on column(s) already indexed. Currently, I'd need to drop that perfectly good index just to re-create it as...
5
by: Jerry | last post by:
Is that true that one table can only have one clustered index? The column with clustered index will be sorted physically, is that true? The column with non-clustere index will not be sorted...
22
by: Lonni Friedman | last post by:
Greetings, I've got an annoying problem. I'm currently running PostgreSQL-7.3.4 on Linux (x86). This problem started with 7.3.3. I've got a database that is on the larger side (about 3GB dump)....
10
by: Phuff | last post by:
Thanks in advance! I'm trying to maintain a dl list's selected index on postback. What I'm doing is when a person selects an item from the drop down list I select a date in a calendar control and...
10
by: Ian | last post by:
Henry J. wrote: MDC *guarantees* clustering, whereas a table with a clustering index will eventually require maintenance (a.k.a. reorg) to maintain the cluster ratio. That's not to say that...
3
by: Tomkat53 | last post by:
Hello, I'm not sure if this has been asked here before (wouldn't know what to search for). Here's my situation... I have records in Table1 that I need to insert into Table2. However, there is...
2
by: Ananthu | last post by:
Hi, I have a gridview with paging option. I want to maintain the state of the checkbox column in gridview for each page. I used vb.net coding. I used the following link for this purpose, ...
5
pradeepjain
by: pradeepjain | last post by:
How do I maintain an array over number of different accesses? When ever the page gets refreshed the doctor index maintained by incrementing will be lost right? Should i make it a global variable?...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
1
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.