
What's a fast way of doing millions of SQL queries

Hi,
I have a job that will involve checking large volumes of addresses
against a master address table.

The master address table contains approx 1.7 million addresses.
The files (normally CSV) I must check against will contain anything
between 100,000 records and 40 million records.

Due to spelling mistakes, different address formats and other errors, I
can't just load the data into SQL and do simple queries to get
matches. So I am writing some VB.NET code to pre-process the data in
order to match it to the SQL master address table.

At the moment I've played around with an open SQL connection and a
SqlDataReader that I'm feeding new SqlCommands to. It works, but as I'm a
relative beginner to VB.NET it's probably not the most efficient method.

Would loading the master address table into a DataSet or DataTable be
effective? Bear in mind that the master address table will be approx
600 MB in size.
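
(For illustration only, a minimal sketch of the lookup loop described above, with one open connection and a single parameterized SqlCommand reused for every record instead of a new command each time. The table and column names - MasterAddress, Postcode, AddressLine - are assumptions, not the actual schema.)

' Minimal sketch, not the poster's actual code: one open connection and one
' parameterized SqlCommand reused for every lookup. Table/column names
' (MasterAddress, Postcode, AddressLine) are assumptions for illustration.
Imports System
Imports System.Data
Imports System.Data.SqlClient

Module LookupSketch
    Sub Main()
        ' Two hard-coded sample rows stand in for records read from the CSV file.
        Dim sample As String(,) = New String(,) {{"AB1 2CD", "1 HIGH STREET"}, {"EF3 4GH", "22 STATION ROAD"}}

        Using conn As New SqlConnection("Server=.;Database=Addresses;Integrated Security=SSPI")
            conn.Open()
            Using cmd As New SqlCommand( _
                    "SELECT TOP 1 AddressId FROM MasterAddress " & _
                    "WHERE Postcode = @pc AND AddressLine = @line", conn)
                cmd.Parameters.Add("@pc", SqlDbType.VarChar, 10)
                cmd.Parameters.Add("@line", SqlDbType.VarChar, 200)

                For i As Integer = 0 To sample.GetUpperBound(0)
                    cmd.Parameters("@pc").Value = sample(i, 0)
                    cmd.Parameters("@line").Value = sample(i, 1)
                    Dim id As Object = cmd.ExecuteScalar()   ' Nothing when no row matches
                    If id Is Nothing Then
                        Console.WriteLine("No match: " & sample(i, 1))
                    Else
                        Console.WriteLine("Matched " & id.ToString() & ": " & sample(i, 1))
                    End If
                Next
            End Using
        End Using
    End Sub
End Module

Even reused like this, 40 million single-row round trips will be slow, which is why the replies below argue for doing the matching set-based inside SQL Server.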
Any help/advice/suggestions greatly appreciated.
thanks
Dave
Nov 21 '05 #1
uh...actually you *can* do it all in sql and i can almost guarantee that it
will process the records faster than any pre-processor you can build.

i know because we had a team of developers do and test just that. we even
went as far as expanding every form of address (blvd., w 34 n ave., st.,
ste. etc.) to its full standardized text. if you want accuracy, you'll need
to consult the usps guidelines on addressing. anyway, once your address is
standardized you can accurately query it to do whatever you'd like.

imho, this should all be done as a dts package and any other option, unless
for better reasons than given here, should be summarily dismissed.
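
(As a rough sketch of the expansion step described above. Only a handful of abbreviations are shown, and a real job should follow the USPS addressing guidelines mentioned; the helper names here are invented for illustration.)

' Rough sketch only: expands a few common abbreviations to their full words
' before matching. The mapping is a tiny illustrative subset - a real job
' should follow the USPS addressing guidelines, and note that tokens like
' "ST" are ambiguous (STREET vs. SAINT) without more context.
Imports System
Imports System.Collections.Generic

Module StandardizeSketch
    Private ReadOnly Expansions As Dictionary(Of String, String) = BuildExpansions()

    Private Function BuildExpansions() As Dictionary(Of String, String)
        Dim d As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        d.Add("BLVD", "BOULEVARD")
        d.Add("AVE", "AVENUE")
        d.Add("ST", "STREET")
        d.Add("STE", "SUITE")
        d.Add("W", "WEST")
        d.Add("N", "NORTH")
        Return d
    End Function

    Function Standardize(ByVal address As String) As String
        Dim words As New List(Of String)
        For Each token As String In address.Split(" "c)
            Dim key As String = token.TrimEnd("."c, ","c)   ' "blvd." -> "blvd"
            If Expansions.ContainsKey(key) Then
                words.Add(Expansions(key))
            Else
                words.Add(token.ToUpperInvariant())
            End If
        Next
        Return String.Join(" ", words.ToArray())
    End Function

    Sub Main()
        Console.WriteLine(Standardize("123 w 34 n ave., ste. 200"))
        ' -> 123 WEST 34 NORTH AVENUE SUITE 200
    End Sub
End Module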
"dave" <da**@invalid.domain.com> wrote in message
news:zv****************@fe1.news.blueyonder.co.uk. ..
| Hi,
| I have a job where it will involve checking large volumes of addresses
| against a master address table.
|
| The master address table contains approx 1.7 million addresses.
| The files (normally CSV) I must check against will contain anything
| between 100,000 records and 40 million records.
|
| Due to spelling mistakes, different address formats and other errors I
| can't just load up the data into SQL and do simple queries to get
| matches. So I am writting some vb.net code to pre-process the data in
| order to match to the SQL master address table.
|
| At the moment I've played around with an open SQL connection and
| SQLdatareader that I'm feeding new SQLcommands to. It works but as I'm a
| relative beginner to vb.net it's probably not the most efficient method.
|
| Would loading the master address table into a dataset or datatable be
| effective. Bear in mind that the master address table will be approx
| 600mb in size.
|
|
| Any help/advice/suggestions greatly appreciated.
|
|
| thanks
| Dave
Nov 21 '05 #2
steve wrote:
uh...actually you *can* do it all in sql and i can almost guarantee that it
will process the records faster than any pre-processor you can build.

i know because we had a team of developers do and test just that. we even
went as far as expanding every form of address (blvd., w 34 n ave., st.,
ste. etc.) to its full standardized text. if you want accuracy, you'll need
to consult the usps guidelines on addressing. anyway, once your address is
standardized you can accurately query it to do whatever you'd like.
My problem is that the addresses are not standardized. They are a mess
and in many cases they are invalid and unusable in their current form.
Hence the need to reference a master address table to clean them up.

Dave



Nov 21 '05 #3
Steve is correct: you need to get the data into the SQL table and then
process it from there. If the CSV file requires some basic preprocessing to
get it ready to load into SQL, you can code that in VB, but get it into
SQL and then do the majority of your cleanup work there.
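
(A hedged sketch of that split, with invented file, table and column names: VB.NET does only the mechanical CSV parsing and bulk-loads a staging table with SqlBulkCopy, and the cleanup then runs as set-based SQL on the server.)

' Sketch only: VB.NET handles the mechanical CSV load into a staging table
' via SqlBulkCopy; the actual matching/cleanup then runs as set-based SQL on
' the server. "input.csv", StagingAddress and the column names are invented.
Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.IO

Module StagingLoadSketch
    Sub Main()
        Dim table As New DataTable()
        table.Columns.Add("Postcode", GetType(String))
        table.Columns.Add("AddressLine", GetType(String))

        ' Very naive CSV parsing - assumes no quoted commas, and loads everything
        ' into memory. A real 40-million-row job would stream and load in batches.
        For Each line As String In File.ReadAllLines("input.csv")
            Dim parts() As String = line.Split(","c)
            table.Rows.Add(parts(0).Trim(), parts(1).Trim())
        Next

        Using conn As New SqlConnection("Server=.;Database=Addresses;Integrated Security=SSPI")
            conn.Open()
            Using bulk As New SqlBulkCopy(conn)
                bulk.DestinationTableName = "StagingAddress"
                bulk.BatchSize = 10000
                bulk.WriteToServer(table)
            End Using
        End Using
    End Sub
End Module

From there the matching can be ordinary JOIN/UPDATE statements against the master table, which is essentially what the DTS suggestion above would automate.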

--
JT Lovell
"dave" <da**@invalid.domain.com> wrote in message
news:kG****************@fe1.news.blueyonder.co.uk. ..
steve wrote:
uh...actually you *can* do it all in sql and i can almost guarantee that
it will process the records faster than any pre-processor you can build.

i know because we had a team of developers do and test just that. we even
went as far as expanding every form of address (blvd., w 34 n ave., st.,
ste. etc.) to its full standardized text. if you want accuracy, you'll
need to consult the usps guidelines on addressing. anyway, once your
address is standardized you can accurately query it to do whatever you'd
like.


My problem is that the addresses are not standardized. They are a mess and
in many cases they are invalid and un-useable in thier current form. Hence
the need to reference a maser address table to clean them up.

Dave



imho, this should all be done as a dts package and any other option,
unless for better reasons than given here, should be summarily dismissed.
"dave" <da**@invalid.domain.com> wrote in message
news:zv****************@fe1.news.blueyonder.co.uk. ..
| Hi,
| I have a job where it will involve checking large volumes of addresses
| against a master address table.
|
| The master address table contains approx 1.7 million addresses.
| The files (normally CSV) I must check against will contain anything
| between 100,000 records and 40 million records.
|
| Due to spelling mistakes, different address formats and other errors I
| can't just load up the data into SQL and do simple queries to get
| matches. So I am writting some vb.net code to pre-process the data in
| order to match to the SQL master address table.
|
| At the moment I've played around with an open SQL connection and
| SQLdatareader that I'm feeding new SQLcommands to. It works but as I'm
a
| relative beginner to vb.net it's probably not the most efficient
method.
|
| Would loading the master address table into a dataset or datatable be
| effective. Bear in mind that the master address table will be approx
| 600mb in size.
|
|
| Any help/advice/suggestions greatly appreciated.
|
|
| thanks
| Dave

Nov 21 '05 #4

I agree with Steve: try to use SQL to do the import process (DTS
packages). In the past, I used a third-party product called AccuZIP (I
have no association with AccuZIP, I just used their product so I am
familiar with it) to do the address standardization. AccuZIP was good
because I could call an executable through the command line, passing in
parameters naming known scripts to use. These scripts did the work of
standardizing, lat/long calculations, even CASS-certifying addresses. The
product would use either the Address/City/State or the ZIP code to
indicate whether the address was valid (note: it can tell you that the
address theoretically exists, not that it REALLY exists). I believe they
even have a service where you place files to process out on the internet
(user ID and password protected, of course) and within X amount of time
the file would be cleaned up.

Anyway, once AccuZIP was done, I would import the AccuZIP output into a
temp table, then use a LEFT OUTER JOIN to insert the newly validated
addresses. The non-validatable addresses would then be QC'ed manually to
see what the problem was; if it was a simple mistake, the record went
into a translation table (bad address to good address) and the corrected
address was imported, and the job went on. I would process hundreds of
thousands of records every day within 15-20 minutes. After a while the
import process was pretty much a lights-out type environment: 15-30
minutes every few weeks to troubleshoot simple problems.

Anyway, this may not have helped, but I would tend to go this route.
It's easier to maintain in the long run, versus recompiling code to
accommodate new situations.
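
(A rough sketch of those two set-based steps - the translation-table fix-up and the LEFT OUTER JOIN insert - run from VB.NET. All table and column names here are assumed, not taken from the workflow described above.)

' Sketch of the two set-based steps described above; StagingAddress,
' AddressTranslation and ValidatedAddress are assumed names.
Imports System
Imports System.Data.SqlClient

Module MergeSketch
    Sub Main()
        ' Apply known bad-address -> good-address corrections to the staging rows.
        Dim applyTranslations As String = _
            "UPDATE s SET s.AddressLine = t.GoodAddress " & _
            "FROM StagingAddress s " & _
            "JOIN AddressTranslation t ON t.BadAddress = s.AddressLine;"

        ' LEFT OUTER JOIN keeps only staging rows with no match in the
        ' validated table, so re-running the job never inserts duplicates.
        Dim insertNew As String = _
            "INSERT INTO ValidatedAddress (Postcode, AddressLine) " & _
            "SELECT s.Postcode, s.AddressLine " & _
            "FROM StagingAddress s " & _
            "LEFT OUTER JOIN ValidatedAddress v " & _
            "  ON v.Postcode = s.Postcode AND v.AddressLine = s.AddressLine " & _
            "WHERE v.Postcode IS NULL;"

        Using conn As New SqlConnection("Server=.;Database=Addresses;Integrated Security=SSPI")
            conn.Open()
            Using cmd As New SqlCommand(applyTranslations, conn)
                cmd.CommandTimeout = 0          ' large batches can take a while
                cmd.ExecuteNonQuery()
                cmd.CommandText = insertNew
                cmd.ExecuteNonQuery()
            End Using
        End Using
    End Sub
End Module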
--
rviray

Nov 21 '05 #5
