Bytes IT Community

What's a fast way of doing millions of SQL queries

Hi,
I have a job that will involve checking large volumes of addresses
against a master address table.

The master address table contains approx 1.7 million addresses.
The files (normally CSV) I must check against will contain anything
between 100,000 records and 40 million records.

Due to spelling mistakes, different address formats and other errors I
can't just load up the data into SQL and do simple queries to get
matches. So I am writing some vb.net code to pre-process the data in
order to match to the SQL master address table.

At the moment I've played around with an open SQL connection and a
SqlDataReader that I'm feeding new SqlCommands to. It works, but as I'm a
relative beginner to vb.net it's probably not the most efficient method.
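
For reference, a minimal sketch of that kind of lookup loop, kept as cheap as possible: one open connection and one parameterized SqlCommand prepared once, with only the parameter value changing per input row (ExecuteScalar stands in for the SqlDataReader since only a single value comes back). The connection string, table and column names here are illustrative only, not from any real system.

Imports System.Collections
Imports System.Data
Imports System.Data.SqlClient

Module AddressLookup
    ' Reuses one open connection and one parameterized command for every lookup,
    ' so only the @key value changes per row.
    Sub LookupAll(ByVal inputAddresses As ArrayList)
        Dim cn As New SqlConnection("Server=myServer;Database=AddressDb;Integrated Security=SSPI")
        Dim cmd As New SqlCommand("SELECT AddressId FROM MasterAddress WHERE AddressKey = @key", cn)
        cmd.Parameters.Add("@key", SqlDbType.VarChar, 200)
        cn.Open()
        Try
            cmd.Prepare()   ' compile the plan once; later calls only bind @key
            For Each addr As String In inputAddresses
                cmd.Parameters("@key").Value = addr
                Dim result As Object = cmd.ExecuteScalar()
                If result Is Nothing Then
                    ' no match - route this row to further cleanup
                Else
                    ' matched - result holds the master AddressId
                End If
            Next
        Finally
            cn.Close()
        End Try
    End Sub
End Module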

Would loading the master address table into a DataSet or DataTable be
effective? Bear in mind that the master address table will be approx.
600 MB in size.
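
If an in-memory cache is the goal, streaming the master table once into a Hashtable of normalized keys is likely to be far lighter than a full DataSet holding all 600 MB. A rough sketch, again with hypothetical table and column names, assuming the connection is already open and that a pre-normalized AddressKey column exists:

Imports System.Collections
Imports System.Data.SqlClient

Module MasterCache
    ' Streams the master table once and keeps only a normalized key and the row id
    ' in memory, rather than caching every column of every row.
    Function LoadMasterKeys(ByVal cn As SqlConnection) As Hashtable
        Dim keys As New Hashtable(2000000)   ' roughly sized for ~1.7 million rows
        Dim cmd As New SqlCommand("SELECT AddressId, AddressKey FROM MasterAddress", cn)
        Dim rdr As SqlDataReader = cmd.ExecuteReader()
        Try
            While rdr.Read()
                ' AddressKey is assumed to be a pre-normalized form (upper case, no punctuation)
                keys(rdr.GetString(1)) = rdr.GetInt32(0)
            End While
        Finally
            rdr.Close()
        End Try
        Return keys
    End Function
End Module
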
Any help/advice/suggestions greatly appreciated.
thanks
Dave
Nov 21 '05 #1
4 Replies


uh...actually you *can* do it all in sql and i can almost guarantee that it
will process the records faster than any pre-processor you can build.

i know because we had a team of developers do and test just that. we even
went as far as expanding every form of address (blvd., w 34 n ave., st.,
ste. etc.) to its full standardized text. if you want accuracy, you'll need
to consult the usps guidelines on addressing. anyway, once your address is
standardized you can accurately query it to do whatever you'd like.

imho, this should all be done as a dts package and any other option, unless
for better reasons than given here, should be summarily dismissed.
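
A rough sketch of the kind of set-based pass this suggests, issued from vb.net against a hypothetical AddressStaging table. It only expands a few suffix abbreviations and is deliberately simplified; real USPS standardization works token by token and covers far more cases:

Imports System.Data.SqlClient

Module Standardize
    ' Runs a set-based pass over a hypothetical staging table, expanding a few
    ' common suffix abbreviations to their full words.
    Sub ExpandAbbreviations(ByVal cn As SqlConnection)
        Dim sql As String = _
            "UPDATE AddressStaging SET AddressLine = REPLACE(AddressLine, ' BLVD.', ' BOULEVARD'); " & _
            "UPDATE AddressStaging SET AddressLine = REPLACE(AddressLine, ' AVE.', ' AVENUE'); " & _
            "UPDATE AddressStaging SET AddressLine = REPLACE(AddressLine, ' STE.', ' SUITE');"
        Dim cmd As New SqlCommand(sql, cn)   ' assumes cn is already open
        cmd.CommandTimeout = 0               ' large tables can take a while
        cmd.ExecuteNonQuery()
    End Sub
End Module
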
"dave" <da**@invalid.domain.com> wrote in message
news:zv****************@fe1.news.blueyonder.co.uk. ..
| Hi,
| I have a job where it will involve checking large volumes of addresses
| against a master address table.
|
| The master address table contains approx 1.7 million addresses.
| The files (normally CSV) I must check against will contain anything
| between 100,000 records and 40 million records.
|
| Due to spelling mistakes, different address formats and other errors I
| can't just load up the data into SQL and do simple queries to get
| matches. So I am writting some vb.net code to pre-process the data in
| order to match to the SQL master address table.
|
| At the moment I've played around with an open SQL connection and
| SQLdatareader that I'm feeding new SQLcommands to. It works but as I'm a
| relative beginner to vb.net it's probably not the most efficient method.
|
| Would loading the master address table into a dataset or datatable be
| effective. Bear in mind that the master address table will be approx
| 600mb in size.
|
|
| Any help/advice/suggestions greatly appreciated.
|
|
| thanks
| Dave
Nov 21 '05 #2

steve wrote:
uh...actually you *can* do it all in sql and i can almost guarantee that it
will process the records faster than any pre-processor you can build.

i know because we had a team of developers do and test just that. we even
went as far as expanding every form of address (blvd., w 34 n ave., st.,
ste. etc.) to its full standardized text. if you want accuracy, you'll need
to consult the usps guidelines on addressing. anyway, once your address is
standardized you can accurately query it to do whatever you'd like.

My problem is that the addresses are not standardized. They are a mess
and in many cases they are invalid and unusable in their current form.
Hence the need to reference a master address table to clean them up.
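
One way to get both sides onto a common footing before matching is to build the same crude key from the master table and from the incoming file. A sketch of such a helper follows; the rules are illustrative only and nowhere near full USPS standardization:

Module AddressClean
    ' Builds a crude comparison key from a raw address line so messy input and
    ' master rows can be matched on the same normalized form.
    Function MakeKey(ByVal raw As String) As String
        Dim s As String = raw.ToUpper().Trim()
        s = s.Replace(",", " ").Replace(".", " ")       ' drop punctuation
        While s.IndexOf("  ") >= 0                      ' collapse repeated spaces
            s = s.Replace("  ", " ")
        End While
        s = " " & s & " "                               ' pad so whole-word swaps are safe
        s = s.Replace(" RD ", " ROAD ").Replace(" ST ", " STREET ")
        s = s.Replace(" AVE ", " AVENUE ").Replace(" BLVD ", " BOULEVARD ")
        Return s.Trim()
    End Function
End Module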

Dave




Nov 21 '05 #3

Steve's correct: you need to get the data into the SQL table and then
process it from there. If the CSV file requires some basic preprocessing to
get it ready to load into SQL, you can code that in VB, but get it into
SQL and then do the majority of your cleanup work there.
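
A sketch of that load step, using BULK INSERT so the raw CSV lands in a staging table in one shot. The path, table name and CSV layout are hypothetical, and the file has to be readable by the SQL Server service account:

Imports System.Data.SqlClient

Module CsvLoad
    ' Loads the raw CSV straight into a staging table with BULK INSERT,
    ' so the rest of the cleanup can happen in SQL.
    Sub LoadStaging(ByVal cn As SqlConnection)
        Dim sql As String = _
            "BULK INSERT AddressStaging " & _
            "FROM 'C:\data\incoming.csv' " & _
            "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)"
        Dim cmd As New SqlCommand(sql, cn)   ' assumes cn is already open
        cmd.CommandTimeout = 0
        cmd.ExecuteNonQuery()
    End Sub
End Module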

--
JT Lovell
"dave" <da**@invalid.domain.com> wrote in message
news:kG****************@fe1.news.blueyonder.co.uk. ..
steve wrote:
uh...actually you *can* do it all in sql and i can almost guarantee that
it will process the records faster than any pre-processor you can build.

i know because we had a team of developers do and test just that. we even
went as far as expanding every form of address (blvd., w 34 n ave., st.,
ste. etc.) to its full standardized text. if you want accuracy, you'll
need to consult the usps guidelines on addressing. anyway, once your
address is standardized you can accurately query it to do whatever you'd
like.


My problem is that the addresses are not standardized. They are a mess and
in many cases they are invalid and un-useable in thier current form. Hence
the need to reference a maser address table to clean them up.

Dave



imho, this should all be done as a dts package and any other option,
unless for better reasons than given here, should be summarily dismissed.
"dave" <da**@invalid.domain.com> wrote in message
news:zv****************@fe1.news.blueyonder.co.uk. ..
| Hi,
| I have a job where it will involve checking large volumes of addresses
| against a master address table.
|
| The master address table contains approx 1.7 million addresses.
| The files (normally CSV) I must check against will contain anything
| between 100,000 records and 40 million records.
|
| Due to spelling mistakes, different address formats and other errors I
| can't just load up the data into SQL and do simple queries to get
| matches. So I am writting some vb.net code to pre-process the data in
| order to match to the SQL master address table.
|
| At the moment I've played around with an open SQL connection and
| SQLdatareader that I'm feeding new SQLcommands to. It works but as I'm
a
| relative beginner to vb.net it's probably not the most efficient
method.
|
| Would loading the master address table into a dataset or datatable be
| effective. Bear in mind that the master address table will be approx
| 600mb in size.
|
|
| Any help/advice/suggestions greatly appreciated.
|
|
| thanks
| Dave

Nov 21 '05 #4


I agree with Steve: try to use SQL to do the import process (DTS
packages). In the past, I used a third-party product called AccuZIP (I
don't have any association with AccuZIP, I've just used their product so I
am familiar with it) to do the address standardization. AccuZIP was good
because I could call an executable through the command line, passing in
parameters naming known scripts to use. These scripts did the work of
standardizing, long/lat calculations, and even CASS-certifying addresses.
The product would use either the Address/City/State or the ZIP code to
indicate whether the address was valid (note: it can tell you that the
address is theoretically there, not that the address is REALLY there). I
believe they even have a service where you place files to process out on
the internet (user ID and password protected, of course) and within X
amount of time the file would be cleaned up.
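
The command-line call can be a simple shell-out that waits for the tool to finish. The executable path and argument layout below are placeholders, not AccuZIP's actual command line:

Imports System.Diagnostics

Module ExternalClean
    ' Shells out to a command-line cleanup tool and waits for it to finish.
    ' The executable path and arguments are placeholders for illustration.
    Sub RunCleanup(ByVal inputFile As String, ByVal outputFile As String)
        Dim args As String = """" & inputFile & """ """ & outputFile & """"
        Dim psi As New ProcessStartInfo("C:\tools\addrclean.exe", args)
        psi.UseShellExecute = False
        psi.CreateNoWindow = True
        Dim p As Process = Process.Start(psi)
        p.WaitForExit()
        If p.ExitCode <> 0 Then
            Throw New ApplicationException("Address cleanup failed with exit code " & p.ExitCode.ToString())
        End If
    End Sub
End Module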

Anyway, once AccuZIP was done, I would import the AccuZIP output into a
temp table, then use a LEFT OUTER JOIN to insert the new validated
addresses. The non-validatable addresses would then be QC'ed manually to
see what the problem was; if it was a simple mistake, the record went into
a translation table (bad address to good address) and the corrected
address was imported on the next pass. The whole run finished within 15-20
minutes, processing hundreds of thousands of records every day. After a
while the import process was pretty much a lights-out environment: 15-30
minutes every few weeks to troubleshoot simple problems.
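
A sketch of those set-based steps: apply the bad-to-good translation table to the staging rows, then insert the validated addresses the master table doesn't already have via the LEFT OUTER JOIN. All table and column names are made up for the example:

Imports System.Data.SqlClient

Module MergeValidated
    ' Applies the bad-to-good translation table to the staging rows, then inserts
    ' validated addresses that are not yet in the master table.
    Sub MergeIntoMaster(ByVal cn As SqlConnection)
        Dim sql As String = _
            "UPDATE s SET AddressKey = t.GoodAddressKey " & _
            "FROM ValidatedStaging s " & _
            "JOIN AddressTranslation t ON t.BadAddressKey = s.AddressKey; " & _
            "INSERT INTO MasterAddress (AddressKey, City, State, Zip) " & _
            "SELECT s.AddressKey, s.City, s.State, s.Zip " & _
            "FROM ValidatedStaging s " & _
            "LEFT OUTER JOIN MasterAddress m ON m.AddressKey = s.AddressKey " & _
            "WHERE m.AddressKey IS NULL;"
        Dim cmd As New SqlCommand(sql, cn)   ' assumes cn is already open
        cmd.CommandTimeout = 0
        cmd.ExecuteNonQuery()
    End Sub
End Module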

Anyway, this may not have helped, but I would tend to go this route: it's
easier to maintain in the long run than recompiling code to accommodate
new situations.
--
rviray
------------------------------------------------------------------------
rviray's Profile: http://www.msusenet.com/member.php?userid=4211
View this thread: http://www.msusenet.com/t-1871083191

Nov 21 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.