473,569 Members | 2,691 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

What's a fast way of doing millions of SQL queries

Hi,
I have a job where it will involve checking large volumes of addresses
against a master address table.

The master address table contains approx 1.7 million addresses.
The files (normally CSV) I must check against will contain anything
between 100,000 records and 40 million records.

Due to spelling mistakes, different address formats and other errors I
can't just load up the data into SQL and do simple queries to get
matches. So I am writting some vb.net code to pre-process the data in
order to match to the SQL master address table.

At the moment I've played around with an open SQL connection and
SQLdatareader that I'm feeding new SQLcommands to. It works but as I'm a
relative beginner to vb.net it's probably not the most efficient method.

Would loading the master address table into a dataset or datatable be
effective. Bear in mind that the master address table will be approx
600mb in size.
Any help/advice/suggestions greatly appreciated.
thanks
Dave
Nov 21 '05 #1
4 2663
uh...actually you *can* do it all in sql and i can almost guarantee that it
will process the records faster than any pre-processor you can build.

i know because we had a team of developers do and test just that. we even
went as far as expanding every form of address (blvd., w 34 n ave., st.,
ste. etc.) to its full standardized text. if you want accuracy, you'll need
to consult the usps guidelines on addressing. anyway, once your address is
standardized you can accurately query it to do whatever you'd like.

imho, this should all be done as a dts package and any other option, unless
for better reasons than given here, should be summarily dismissed.
"dave" <da**@invalid.d omain.com> wrote in message
news:zv******** ********@fe1.ne ws.blueyonder.c o.uk...
| Hi,
| I have a job where it will involve checking large volumes of addresses
| against a master address table.
|
| The master address table contains approx 1.7 million addresses.
| The files (normally CSV) I must check against will contain anything
| between 100,000 records and 40 million records.
|
| Due to spelling mistakes, different address formats and other errors I
| can't just load up the data into SQL and do simple queries to get
| matches. So I am writting some vb.net code to pre-process the data in
| order to match to the SQL master address table.
|
| At the moment I've played around with an open SQL connection and
| SQLdatareader that I'm feeding new SQLcommands to. It works but as I'm a
| relative beginner to vb.net it's probably not the most efficient method.
|
| Would loading the master address table into a dataset or datatable be
| effective. Bear in mind that the master address table will be approx
| 600mb in size.
|
|
| Any help/advice/suggestions greatly appreciated.
|
|
| thanks
| Dave
Nov 21 '05 #2
steve wrote:
uh...actually you *can* do it all in sql and i can almost guarantee that it
will process the records faster than any pre-processor you can build.

i know because we had a team of developers do and test just that. we even
went as far as expanding every form of address (blvd., w 34 n ave., st.,
ste. etc.) to its full standardized text. if you want accuracy, you'll need
to consult the usps guidelines on addressing. anyway, once your address is
standardized you can accurately query it to do whatever you'd like.
My problem is that the addresses are not standardized. They are a mess
and in many cases they are invalid and un-useable in thier current form.
Hence the need to reference a maser address table to clean them up.

Dave



imho, this should all be done as a dts package and any other option, unless
for better reasons than given here, should be summarily dismissed.
"dave" <da**@invalid.d omain.com> wrote in message
news:zv******** ********@fe1.ne ws.blueyonder.c o.uk...
| Hi,
| I have a job where it will involve checking large volumes of addresses
| against a master address table.
|
| The master address table contains approx 1.7 million addresses.
| The files (normally CSV) I must check against will contain anything
| between 100,000 records and 40 million records.
|
| Due to spelling mistakes, different address formats and other errors I
| can't just load up the data into SQL and do simple queries to get
| matches. So I am writting some vb.net code to pre-process the data in
| order to match to the SQL master address table.
|
| At the moment I've played around with an open SQL connection and
| SQLdatareader that I'm feeding new SQLcommands to. It works but as I'm a
| relative beginner to vb.net it's probably not the most efficient method.
|
| Would loading the master address table into a dataset or datatable be
| effective. Bear in mind that the master address table will be approx
| 600mb in size.
|
|
| Any help/advice/suggestions greatly appreciated.
|
|
| thanks
| Dave

Nov 21 '05 #3
Steve's correct, you need to get the data into the SQL table and then
process it from there. If the CSV file requires some basic preprocessing to
get it ready to load into SQL then you can code that in VB, but get it into
SQL and then do the majority of your cleanup work there.

--
JT Lovell
"dave" <da**@invalid.d omain.com> wrote in message
news:kG******** ********@fe1.ne ws.blueyonder.c o.uk...
steve wrote:
uh...actually you *can* do it all in sql and i can almost guarantee that
it will process the records faster than any pre-processor you can build.

i know because we had a team of developers do and test just that. we even
went as far as expanding every form of address (blvd., w 34 n ave., st.,
ste. etc.) to its full standardized text. if you want accuracy, you'll
need to consult the usps guidelines on addressing. anyway, once your
address is standardized you can accurately query it to do whatever you'd
like.


My problem is that the addresses are not standardized. They are a mess and
in many cases they are invalid and un-useable in thier current form. Hence
the need to reference a maser address table to clean them up.

Dave



imho, this should all be done as a dts package and any other option,
unless for better reasons than given here, should be summarily dismissed.
"dave" <da**@invalid.d omain.com> wrote in message
news:zv******** ********@fe1.ne ws.blueyonder.c o.uk...
| Hi,
| I have a job where it will involve checking large volumes of addresses
| against a master address table.
|
| The master address table contains approx 1.7 million addresses.
| The files (normally CSV) I must check against will contain anything
| between 100,000 records and 40 million records.
|
| Due to spelling mistakes, different address formats and other errors I
| can't just load up the data into SQL and do simple queries to get
| matches. So I am writting some vb.net code to pre-process the data in
| order to match to the SQL master address table.
|
| At the moment I've played around with an open SQL connection and
| SQLdatareader that I'm feeding new SQLcommands to. It works but as I'm
a
| relative beginner to vb.net it's probably not the most efficient
method.
|
| Would loading the master address table into a dataset or datatable be
| effective. Bear in mind that the master address table will be approx
| 600mb in size.
|
|
| Any help/advice/suggestions greatly appreciated.
|
|
| thanks
| Dave

Nov 21 '05 #4

I agree with Steve, try to use SQL to do the Import Process (DTS
Packages). In the past, I used a third party product called AccuZIP (I
don't have any association with AccuZIP, I just used their product so I
am familiar with them) to do the Address standardization . Accuzip was
good because I could call a execuatable through the command line
passing in parms of known scripts to use. These scripts did the work of
(standardizing, long/lat calcs, even CASS Certify addresses). The
product would use either the Address/City/State or the ZIP code to
indicate if the address was a valid address (note: it can tell you if
the Address is theoretically there, not that the address is REALLY
there). I believe that they even have a service where you place files
to process out on the internet (User ID & Password protected..of
course) and within X amount of time, the file would be cleaned up.

Anyway, once Accuzip was done, I would import the Accuzip output into a
temp table then Left Outter Join to insert the new validated Addresses,
(the non-validatable addressess would then be QC'ed manually to see
what the problem was on them (and if it was a simple mistake...inclu de
the record into a Translation table - bad address to good address -
then do import of corrected address)...then go on....I would process
within 15-20 minutes (100's of thousands of records every day)...after
a while the import process was pretty much a lights out type
environment.... 15-30 minutes every few weeks to trouble-shoot simple
problems.

Anyway, this may not have helped, but I would tend to go this
route...it's easier to maintain in the long run...versus re-compiling
code to accomodate for new situations....
--
rviray
------------------------------------------------------------------------
rviray's Profile: http://www.msusenet.com/member.php?userid=4211
View this thread: http://www.msusenet.com/t-1871083191

Nov 21 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

92
6375
by: Reed L. O'Brien | last post by:
I see rotor was removed for 2.4 and the docs say use an AES module provided separately... Is there a standard module that works alike or an AES module that works alike but with better encryption? cheers, reed
125
14597
by: Sarah Tanembaum | last post by:
Beside its an opensource and supported by community, what's the fundamental differences between PostgreSQL and those high-price commercial database (and some are bloated such as Oracle) from software giant such as Microsoft SQL Server, Oracle, and Sybase? Is PostgreSQL reliable enough to be used for high-end commercial application? Thanks
44
4146
by: lester | last post by:
a pre-beginner's question: what is the pros and cons of .net, compared to ++ I am wondering what can I get if I continue to learn C# after I have learned C --> C++ --> C# ?? I think there must be many know the answer here. thanks
7
4081
by: Danny | last post by:
I am trying to process a database and my code does so much that it takes a whle to go through the database. most of it is sql queries, updates and such. For about 6000 records, it takes over a minute to process,maybe 2.5 to 3 minutes. How can I speed it up? I did create an index on some of the search fields, but didnt seem to help. I...
14
2449
by: google | last post by:
I am creating a new database for use within our company, that I'd like to make reasonably secure (short of a true server based solution). The back-end of a non-server based database seems to be the biggest security hole, due to the ease of opening, linking, and/or importing raw data directly from the back-end database. I've read information...
4
3480
by: Got2Go | last post by:
Hello Group, I have a table that has millions of records in it. About 100 records are added every 5 minutes (one per OIDID) (the sample provided below has data for 2 OIDIDs (99 and 100) And I have a webpage that executes 9 queries one after the other, and then displays the results on the webpage. When the database was empty, this process...
4
4887
by: Federico | last post by:
Hi everybody, I'm evaluating the possibility of using MySQL 5.0 as a database backend for an application we are doing and I'd like to have hardware requirements rough estimates and/or real world experience from people that are already running similar dimensioned database/application. As a rought estimate the database has around 80 tables...
3
2600
by: rfuscjr via AccessMonster.com | last post by:
This is truly bizzare. I have a query that runs for hours in one Access db. When I import it into another Access db, it runs in minutes. I compacted and repaired the original, relinked tables etc. Nothing makes it run faster. I have imported the query in several dbs. I did find another where it appeared to run very long. Still in others...
11
1806
by: Nezhate | last post by:
Hi There ! After learning C++, I must learn gtk mm, libxml, eclipse, TCP/IP . From what should I start? Thanks !!
0
7924
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. ...
1
7673
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...
0
7970
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
0
6284
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
0
5219
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3653
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
1
2113
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
1
1213
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
937
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.