Bytes IT Community

Speed issue in Access

Hi all,

I have an Access db with two large tables - 3,100,000 (tblA) and 7,000
(tblB) records. I created a select query using Inner Join by partial
matching two fields (X from tblA and Y from tblB). The size of the db
is about 200MBs.

Now my issue is, the query has been running for over 3 hours already -
I have no idea when it will end. I am using Access 2003. Are there
ways to improve the performance? (Also, would the query run
faster if I indexed the two fields?)

Any helpful suggestion will be appreciated.
- Grasshopper -

Jan 26 '07 #1
27 Replies


On 25 Jan 2007 20:05:39 -0800, "SQL Learner" <ex***********@gmail.com> wrote:

No. Assuming you mean indexing the two fields used in the
relationship: they are already indexed behind the scenes.

Don't keep us guessing. Give us some SQL.

-Tom.
Jan 26 '07 #2

Thank you, Tom. Here is the SQL:

SELECT *
FROM tblA AS M INNER JOIN tblB AS S ON M.X Like "*" & S.Y & "*";

Jan 26 '07 #3

SQL Learner wrote:
Thank you, Tom. Here is the SQL:

SELECT *
FROM tblA AS M INNER JOIN tblB AS S ON M.X Like "*" & S.Y & "*";
Holy crap, what the blazes prompted you to write a hideous join like
this? No wonder it's going forever. If someone gave me a join like
this in MS Jet/Access or Oracle I'd kick his ass out into the snow
drifts outside my office.

Use the query builder that Access provides to construct acceptable SQL.
However, it appears you may not have a properly designed schema. If
you're having to use the like operator to make joins, then you do NOT
have a proper linking field between the two tables.

You should have X being a pointer to Y or visa versa. In other words,
they should be the same value where you want records from the two tables
to be joined. Fix this first.
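Tim's point about a proper linking field can be illustrated outside Access. The sketch below uses Python's built-in sqlite3 with made-up table and column names (tblA, tblB, and a shared `key` column are assumptions, not the poster's actual schema); the idea is that an exact-match join on indexed columns lets the engine seek rather than scan every row pair:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tblA (X TEXT, key TEXT);
    CREATE TABLE tblB (Y TEXT, key TEXT);
    -- Indexes on the join columns let the optimizer seek instead of scan.
    CREATE INDEX idxA ON tblA(key);
    CREATE INDEX idxB ON tblB(key);
    INSERT INTO tblA VALUES ('123 ABC Street', 'ABC'), ('1 XYZ Ave', 'XYZ');
    INSERT INTO tblB VALUES ('ABC', 'ABC'), ('CBS', 'CBS');
""")

# Exact-match join on the shared key, in place of LIKE "*" & Y & "*".
rows = con.execute(
    "SELECT M.X, S.Y FROM tblA AS M INNER JOIN tblB AS S ON M.key = S.key"
).fetchall()
print(rows)  # [('123 ABC Street', 'ABC')]
```

The same principle applies in Jet: a join predicate of the form `M.key = S.key` on indexed fields is optimizable, while a `Like` predicate built from concatenation is not.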

--
Tim http://www.ucs.mun.ca/~tmarshal/
^o<
/#) "Burp-beep, burp-beep, burp-beep?" - Quaker Jake
/^^ "Whatcha doin?" - Ditto "TIM-MAY!!" - Me
Jan 26 '07 #4

On Fri, 26 Jan 2007 02:08:43 -0330, Tim Marshall <TI****@PurplePandaChasers.Moertherium> wrote:

Me too. Straight into the spines of a Cholla cactus :-)
-Tom.

Jan 26 '07 #5

Thanks, double T.

The query is for partial matching. I am not quite sure that can be done
using the query builder. Let me give you the original spec of my issue
below:

==================================================
How can I UPDATE table using partial word match?

TABLE: tblStreet_Value (Table to be updated)

Street Value2
--------------- ------
123 ABC Street
124 ABC Street
125 ABC Street
1 XYZ AVE
2 XYZ AVE
3 XYZ AVE
10 CBS Street
11 CBS Street
12 CBS Street
100 Apple Road
101 Apple Road
102 Apple Road

TABLE: TblWord_Number

Word Number Value
---- ------ -----
ABC 123 NY
ABC 125 CA
CBS 11 MA
CBS 12 TX

Expected Result

Street Value2
--------------- ------
123 ABC Street NY
124 ABC Street
125 ABC Street CA
1 XYZ AVE
2 XYZ AVE
3 XYZ AVE
10 CBS Street
11 CBS Street MA
12 CBS Street TX
100 Apple Road
101 Apple Road
102 Apple Road

====================

Jan 26 '07 #6

Tom van Stiphout wrote:
On Fri, 26 Jan 2007 02:08:43 -0330, Tim Marshall <TI****@PurplePandaChasers.Moertherium> wrote:

Me too. Straight into the spines of a Cholla cactus :-)
-Tom.
<Sigh> Wish it was cactus here... we've got a "weather bomb" bearing
down on us for Saturday - which would be great if it were a weekday and
meant a day off work, but on Saturday? What a waste... 8)

--
Tim http://www.ucs.mun.ca/~tmarshal/
^o<
/#) "Burp-beep, burp-beep, burp-beep?" - Quaker Jake
/^^ "Be Careful, Big Bird!" - Ditto "TIM-MAY!!" - Me
Jan 26 '07 #7

SQL Learner wrote:
SELECT *
FROM tblA AS M INNER JOIN tblB AS S ON M.X Like "*" & S.Y & "*";
PS

My response may have come across as aggressive. Please don't let it
scare you away from this group, it's a wonderful resource. It looks as
if you have some major issues with your design and we can help you here
if you're not sure how to correct things.

--
Tim http://www.ucs.mun.ca/~tmarshal/
^o<
/#) "Burp-beep, burp-beep, burp-beep?" - Quaker Jake
/^^ "Be Careful, Big Bird!" - Ditto "TIM-MAY!!" - Me
Jan 26 '07 #8

Just to follow up on the information provided above: the Street value
may not always follow the pattern shown above (Street Number +
Street Name). It was only for simplicity that I presented the data
that way. Some addresses can be "East 123 ABC Street" or "S.
100 Apple Road, Apt. 4D". So in such cases, string concatenation will
not work.

The SQL works (and it was from an Access MVP's help) - I have tried it
on a sample database with fewer records. I just need to find a
way to speed up the query.

- Grasshopper -

Jan 26 '07 #9


==================================================
My response may have come across as aggressive. Please don't let it
scare you away from this group, it's a wonderful resource. It looks as
if you have some major issues with your design and we can help you here
if you're not sure how to correct things.
==================================================

Please do not worry about it, Tim. I am very easy going and can take a
lot of "hits". : )

- Grasshopper -

Jan 26 '07 #10

On Fri, 26 Jan 2007 14:31:04 -0330, Tim Marshall <TI****@PurplePandaChasers.Moertherium> wrote:

Same here. Last week it didn't get above 65. Brrrr.
-Tom.

Jan 27 '07 #11

Are you guys going to help or just chat about the weather?

Jan 27 '07 #12

"SQL Learner" <ex***********@gmail.com> wrote in
news:11*********************@l53g2000cwa.googlegroups.com:
Are you guys going to help or just chat about the weather?
I would just chat about the weather, at least until I had some more
information.

1. I want to see the REAL problem. A regular here can spend thirty minutes
or so coming up with a suggested solution for a problem. Often, the
original poster responds that the solution didn't work. Why? Because the
problem presented was just an example, or he/she didn't mention that the
problem occurred in Access 2.0, or that one of the tables is a text file.
Your situation of designating a State through identification of street name
and number seems to be a candidate to be the origin of one of these
frustrating experiences. Surely every State has several 120 Main Streets so
how could 120 Main Street identify the State? Of course, maybe I'm
misunderstanding what you wrote. That is not my problem; it's your problem.
Perhaps you have a very small amount of data, so that, indeed, there is
only one 120 Main Street and it is in NY. But a record count of 2,100,000
does not seem to suggest this.
Regardless, the problem you present is quite challenging, and is the result
of a very poorly thought out design, or some kind of legacy situation.

2. I want to know about the REAL problem. Is this a one-time need, or will
the data be used time and time again to answer question after question? If
it's the latter, we need to parse the address data, probably into fields
recognized and used by the US Post Office. Then the State Designation may
become easier and faster. And the answer to other questions down the line
will be as well.

My guess is that Tim and/or Tom would charge you $100 an hour if
you were paying them. As you aren't, they may chat about whatever they
please.
If you describe your problem again, very precisely and openly, then one of
them, or someone else, may take a run at it.

Jan 27 '07 #13

"SQL Learner" <ex***********@gmail.com> wrote:
Thank you, Tom. Here is the SQL:

SELECT *
FROM tblA AS M INNER JOIN tblB AS S ON M.X Like "*" & S.Y & "*";
My guess is that using the LIKE comparison requires a manipulation and
comparison of every record, vastly increasing the time and disk accesses
required.

If you can redesign your tables so that you are comparing indexed fields,
you are likely to see a significant increase in speed, as the initial
comparison will only use the indexes.

If you can come up with another way to solve the business purpose, you may
be able to use a different approach that is faster, as Lyle pointed out.
Because, if you stick with the approach you came up with to deal with the
problem at hand, there may not BE a way to improve the speed.

And, also, as he pointed out, oversimplifying can be even worse than burying
us in irrelevant information. And the FAQ should, but doesn't, state:
"realize that each additional request for clarification that the potential
responders have to make reduces your probability of getting a good answer to
your question". If it is like "pulling one's own teeth" to determine what
you are actually trying to accomplish so they can help you, some will just
lose interest and go answer some other questions.

Larry Linson
Microsoft Access MVP
Jan 27 '07 #14

==================================================
....... LIKE comparison requires a manipulation and
comparison of every record, vastly increasing the time and disk
accesses required.
==================================================
..........we need to parse the address data, probably into fields
recognized and used by the US Post Office. .........
==================================================

Hi Larry and Lyle,

Thank you. That is the kind of answer (above) I need.

The actual problem is very complex and might take 2,000 words to
describe. People may lose interest reading such a complex spec
(or really ask me to pay $100 per hour - which I definitely could not
afford). That is why I would rather present the question in a
simplified way. Once I get some hints, I might be able to solve the
problem myself.

I already got a solution with the help of another MVP:

UPDATE tblStreet_Value AS S INNER JOIN tblWord_Number AS W
ON S.Street Like "*" & W.Number & "*" & W.Word & "*"
SET S.Value2 = W.Value;

The query works very fast on the sample tables, but when it is
applied to the real data (with 3,200,000 records in tblStreet_Value
and 7,000 in tblWord_Number), it takes forever for the result to come
up.

My objective is, as stated in my initial post, to find a way to speed
up the query. While waiting for answers, I experimented with different
types of queries and tried the same methods that you suggested: parse
the street field into several different fields, use an "exact match"
instead of Like for partial matching, AND index the fields being linked.

For experimental purposes, I used a SELECT query instead of an UPDATE
query. These changes improved the speed, but still not enough
(3 hours on my dual-core laptop).

Holding all other things constant (same machine spec and same
query), would the query be faster if it were run on SQL Server? Is
Access 2003 still using the same Jet engine as SQL Server?
Would indexing really slow down the update query (as I read
somewhere)?
In general, does using Group By or Distinct slow down the
query? Which is faster - Group By or Distinct?

(I understand that the slow down may be caused by the poor design of
the database and inadequate SQL statement.)

Sorry guys, next time I will be more thoughtful.

- Your Humble Grasshopper -

Jan 27 '07 #15

To speed this up you need exact matches on fields that are indexed. Parsing
the data in a SELECT query does not allow you to index the individual fields,
so it accomplishes nothing.

You need to create a NEW table where the existing data is split into fields you
can index and then apply exact matching to. Anything less than that and you are
stuck with how long it takes to scan the entire table a row at a time which is
99% driven by the speed of your disk drive access on the PC.

SQL server would not help unless it was on a machine that had faster disk i/o
than the one you have now and you absolutely do NOT want this to run over a
network. That would slow i/o to a point that would be brutal.
--
Rick Brandt, Microsoft Access MVP
Email (as appropriate) to...
RBrandt at Hunter dot com
Jan 27 '07 #16

Thanks, Rick.

I created a new table for the parsed data. For now, I will continue
to use Access.

- Grasshopper -

Jan 27 '07 #17

If you've decided to parse the address field, why not tell us? Then
give us a few examples of what the text in an address field really
looks like. Then ask, "What's your best solution on how to parse the
address field and populate the (more) atomic fields?" I'm guessing I
would use a UDF with Regular Expressions, but others might have better,
faster solutions.

IMO, the notion that indexing will drastically slow down an update
query is greatly exaggerated. Of course, in your case, as you will not
have indexed the new fields (you may want to later) it will not slow
things at all. (The reason an index might slow down an update is that
a new index key and pointer must be written for each new or updated
value; if there's no index on the field, there's no new key and hence
no slowdown.)

SQL Server is likely to do an update based on an existing field at the
server level. As I recall, SQL Server 2005 lets us use Regular
Expressions with a T-SQL UDF. If that's the case we should be able to
parse out the data in ten minutes. But I've never done this.
In any case SQL Server might be faster. As it would do the work without
stopping your local machine at all, you are unlikely to notice how
long it takes, unless you are waiting breathlessly for results.

Jan 27 '07 #18

Thanks, Lyle.

I think the best solution is to parse the string into different fields
and index them. (It is good that none of these fields will be the
updated fields.)

It is quite a challenge to parse the following formats

"East 123 ABC Street" or "S. 100 Apple Road"

into

Field1 Field2 Field3
-------- --------- ----------------
East 123 ABC Street
S. 100 Apple Road

since there is no common "delimiter" character to use.

Maybe one can use the following logic - split the string at the
first space before the number (Field1), extract the number between
the first and second spaces (Field2), and extract the last piece
of data after the last number digit + space (Field3). How can I
write T-SQL UDFs for these three fields in Access?
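That splitting logic can be prototyped outside Access before committing to a UDF. A minimal Python sketch (the function name, the regular expression, and the fallback for strings with no number are my assumptions, not anything from the thread):

```python
import re

def parse_street(addr):
    # Pivot on the first run of digits: (prefix, number, street name).
    m = re.match(r"^(.*?)(\d+)\s+(.*)$", addr)
    if not m:
        return ("", "", addr)  # no number found: keep the whole string
    return tuple(s.strip() for s in m.groups())

print(parse_street("East 123 ABC Street"))  # ('East', '123', 'ABC Street')
print(parse_street("S. 100 Apple Road"))    # ('S.', '100', 'Apple Road')
print(parse_street("100 Apple Road"))       # ('', '100', 'Apple Road')
```

The same pattern translates directly to VBScript.RegExp for use in an Access UDF.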

- Grasshopper -

Jan 27 '07 #19

On Jan 27, 10:29 am, "Rick Brandt" <rickbran...@hotmail.com> wrote:
SQL server would not help unless it was on a machine that had faster disk i/o
than the one you have now and you absolutely do NOT want this to run over a
network. That would slow i/o to a point that would be brutal.
Please, expand on this. It seems you are relating SQL Server to
running over a network and suggesting that running the query in SQL-
Server "over a network", whatever that would mean, would be slow. I
doubt very much if that is what you mean.

Jan 27 '07 #20

"SQL Learner" <ex***********@gmail.com> wrote:
It is quite a challenge to parse the following formats

"East 123 ABC Street" or "S. 100 Apple Road"

into

Field1 Field2 Field3
-------- --------- ----------------
East 123 ABC Street
S. 100 Apple Road

since there is no common "delimiter" character to use.
Those, of course, are well-structured addresses and, thus, rather easy to
parse. The real problem will be when the addresses aren't well-structured,
and, in real life, they can just be free-form text, such as "Twelve
Intergalactic Circle" or "Worthington House" or "One Microsoft Way" or "9876
125th Street SW" or any of many variations.
Maybe one can use the following logic - split the string at the
first space before the number (Field1), extract the number between
the first and second spaces (Field2), and extract the last piece
of data after the last number digit + space (Field3).
Implemented correctly, this would work nicely on the well-structured
examples. It wouldn't work so well on free-form addresses, which you are
entirely likely to encounter.
How can I write T-SQL UDFs for these
three fields in Access?
T-SQL (full name Transact-SQL) is a feature of Microsoft SQL Server: a
language for writing stored procedures and triggers. Neither is supported
in the Jet database engine that is, by default, used with Access. So the
answer is, "You can't."

On the other hand, you can write user-defined-functions (often abbreviated
as UDF) in Access, and call them from Jet SQL. Just bear in mind that,
using Jet, all the work is done on the user's machine.

Using Microsoft SQL Server, a substantial part of the work may be done on
the server, if you have studied, learned well, and properly designed and
implemented stored procedures and triggers on the SQL Server -- which is a
non-trivial skill that will have to be learned, practiced, and perfected in
addition to Access. It is only if you do that study, learning, design, and
implementation that using SQL Server is likely to give you a performance
increase. It is quite common for users to convert their application data
storage to linked server tables, and discover that the application runs
slower rather than faster.

A well-designed, well-implemented single-user standalone database, just
split for multi-user access does not necessarily make a well-designed,
well-implemented multi-user database. A well-designed, well-implemented
multi-user database, with its back-end tables just moved to a server DB does
not necessarily make a well-designed, well-implemented client-server
database. You have to understand how Access works in each environment in
order to take advantage of the benefits of that environment, and to avoid
the stumbling blocks it may put in your path.

Larry Linson
Microsoft Access MVP
Jan 27 '07 #21

"Lyle Fairfield" <ly***********@aim.com> wrote in message
news:11**********************@a75g2000cwd.googlegroups.com...
On Jan 27, 10:29 am, "Rick Brandt" <rickbran...@hotmail.com> wrote:
>SQL server would not help unless it was on a machine that had faster disk i/o
than the one you have now and you absolutely do NOT want this to run over a
network. That would slow i/o to a point that would be brutal.

Please, expand on this. It seems you are relating SQL Server to
running over a network and suggesting that running the query in SQL-
Server "over a network", whatever that would mean, would be slow. I
doubt very much if that is what you mean.
I meant that if run in Access/Jet you would want the file to be local. A
network wouldn't matter to SQL Server and would be expected.

--
Rick Brandt, Microsoft Access MVP
Email (as appropriate) to...
RBrandt at Hunter dot com
Jan 27 '07 #22

Hi Larry,

I am not using SQL Server as the backend and no network is involved.
I have the tables all in Access. The reason why the database is not
well designed is because I just took the data from csv files and
dumped them into Access. I did not design the database.

You raised a good point regarding the free-form addresses. We would try
to convert all the "not well structured" data into better structured
data before we run our update query. For example, we would
standardize all the words used in the addresses by converting, for
example, "E." or "E" to "East", or "Five" to "5".
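A standardization pass like that can start as a simple token-substitution table; a minimal sketch, with the substitution entries limited to the examples mentioned above (any real pass would need a much larger table):

```python
# Standard forms for address tokens; extend SUBS as new variants appear.
SUBS = {"E.": "East", "E": "East", "Five": "5"}

def normalize(addr):
    # Replace each whitespace-separated token that has a standard form.
    return " ".join(SUBS.get(tok, tok) for tok in addr.split())

print(normalize("E. 123 ABC Street"))  # East 123 ABC Street
print(normalize("Five XYZ Ave"))       # 5 XYZ Ave
```

Running this once over the table before the exact-match update collapses the spelling variants that would otherwise defeat indexed joins.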

The objective is to get as many updates as we can. The "left over"
will need to be handled manually.
So given these facts, and assuming that there will be only two types of
structures, "East 123 ABC Street" and "100 Apple Road", how can I
write UDFs to parse them into the following:

Field1 Field2 Field3
-------- --------- ----------------
East 123 ABC Street
100 Apple Road
- Grasshopper -
Jan 27 '07 #23

On Jan 27, 4:33 pm, "SQL Learner" <excelmodel...@gmail.com> wrote:
I am not using SQL Server as the backend and no network is involved.
I have the tables all in Access. The reason why the database is not
well designed is because I just took the data from csv files and
dumped them into Access. I did not design the database.
It's quite likely that you will have to make several passes to
"stroke" your data.

Here's a start in the way I would try. It might not do what is needed.
It might need revision.
It's intended to find the largest (in characters) numeral in an
address field and to put it into a (type Long Integer) StreetNumber
field. It's intended to put a Null in the field if there is no number
in the address field, in case some previous effort has made it zero,
or something else that's inappropriate.

For Northwind's 830 Customer records this procedure takes 0.01 (one
one-hundredth) seconds with my Intel Pentium M 1.5 processor and 512 MB
of RAM. Part of that is taken initializing the Regular Expression, so we
might expect it to reduce its average time per record as the number of
records increases. But there's no guarantee of anything like that.
Access and VBA and Jet are mysterious beasts, and there's no guarantee
that at record 831 there will not be some problem with a cache, or a
disk write or read, or whatever, and the whole thing will slow to a
crawl.

This is not old proven code. It's new and has been created right now
while I am packing (for that vacation I mentioned). It's very likely
that it can be improved.

If you decide to run it, though, please run the whole thing, not part of
it. It's unlikely that part of it will behave well. If you are a Reg
Exp/SQL fiend, go right ahead and improve it. I'll be happy. As for
the other parts (they're what re(1) to re(3) are for), you'll need
someone else to step up to the plate. I'll be surf boarding.

Private Declare Function GetTickCount& Lib "kernel32" ()

Dim re(0 To 3) As Object
Dim matches As Object
Dim match As Object

Public Sub ParseAddress()
    Dim t As Long
    t = GetTickCount
    On Error GoTo ParseStreetNumberErr
    InitializeRegularExpressions
    DBEngine(0)(0).Execute _
        "UPDATE Customers SET StreetNumber = GetStreetNumber(Address)"
    Debug.Print GetTickCount - t
ParseStreetNumberExit:
    TerminateRegularExpressions
    Exit Sub
ParseStreetNumberErr:
    MsgBox "Error Number: " & Err.Number & vbNewLine & _
        Err.Description, vbCritical
    Resume ParseStreetNumberExit
End Sub

Private Sub InitializeRegularExpressions()
    Dim z As Long
    For z = 0 To 3
        Set re(z) = CreateObject("VBScript.RegExp")
    Next z
    ' Only re(0) is configured here; re(1) to re(3) are for later passes.
    re(0).Pattern = "\d+"
    re(0).Global = True
End Sub

Private Sub TerminateRegularExpressions()
    Erase re
End Sub

Public Function GetStreetNumber(ByVal vAddress$)
    Dim temp$
    Dim streetNumber$
    Set matches = re(0).Execute(vAddress)
    For Each match In matches
        temp = match.Value
        ' Keep the longest run of digits found in the address.
        If Len(temp) > Len(streetNumber) Then _
            streetNumber = temp
    Next match
    If Len(streetNumber) > 0 Then
        GetStreetNumber = CLng(streetNumber)
    Else
        GetStreetNumber = Null
    End If
End Function
Jan 27 '07 #24

P: n/a
Looks to me as if Lyle went out of his way to code and test something for
you. I'm not anywhere near as familiar with Regular Expressions as he is.
Thus, I think I will just say that Lyle is a highly experienced, very
capable developer in Access and other development environments. I suggest
you carefully review, and try, what he provided.

Larry Linson
Microsoft Access MVP


Jan 27 '07 #25

Not surf boarding, if there is such a thing ... I'm afraid of the
surf ... sail boarding, or wind surfing.

Jan 27 '07 #26

SQL Learner said:
>So given these facts, and assuming that there will be only two types of
structures, "East 123 ABC Street" and "100 Apple Road", how can I
write UDFs to parse them into the following:

Field1 Field2 Field3
-------- ------ ----------------
East 123 ABC Street
100 Apple Road
and then Lyle Fairfield said:
>Here's a start in a way I would try.

[suggested a solution using Regular Expressions]

As for the other parts, (they're what re(1) to re(3) are for) you'll
need someone else to step up to the plate.
I'll push it ahead a bit further by splitting up the three parts.
(Note that it does not find the number with the most digits as Lyle's
code does, it just finds the first number in the string.)
Sub foo()
' requires reference to:
' Microsoft VBScript Regular Expressions 5.5

' output:
'
' "East","123","ABC Street"
' "","100","Apple Road"
' "","1410","33rd Ave NW"

Dim re As RegExp, matches As MatchCollection, match As match
Dim TestValues As Variant, AddrLine As Variant

TestValues = Array( _
"East 123 ABC Street", _
"100 Apple Road", _
"1410 33rd Ave NW")

For Each AddrLine In TestValues
Set re = New RegExp
' late-binding syntax is...
'Set re = CreateObject("VBScript.RegExp")

re.Pattern = "^(.*?)(\d+)\s+(.*)$"
Set matches = re.Execute(AddrLine)
If matches.Count = 0 Then
Debug.Print "[no match]"
Else
For Each match In matches
Debug.Print """" & Trim(match.SubMatches(0)) & _
""",""" & Trim(match.SubMatches(1)) & _
""",""" & Trim(match.SubMatches(2)) & _
""""
Next
End If

Set match = Nothing
Set matches = Nothing
Set re = Nothing
Next
End Sub
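For anyone outside Access who wants to sanity-check the pattern used in
the Sub above, the same regex behaves identically in JavaScript, since
VBScript.RegExp uses the JScript regex flavor. This is only a cross-check
of the pattern, not part of the Access solution.

```javascript
// The same pattern used in the VBA Sub above. The lazy (.*?) prefix
// means the FIRST run of digits is captured, not the longest one.
const re = /^(.*?)(\d+)\s+(.*)$/;

function splitAddress(line) {
  const m = line.match(re);
  if (!m) return null;                 // no number followed by text
  // m[1] = leading text, m[2] = first number, m[3] = remainder
  return [m[1].trim(), m[2], m[3].trim()];
}
```

splitAddress("East 123 ABC Street") gives ["East", "123", "ABC Street"],
and an address with no number gives null, matching the "[no match]"
branch in the Sub.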

On Jan 27, 5:58 pm, "Lyle Fairfield" <lylefairfi...@aim.com> wrote:
On Jan 27, 4:33 pm, "SQL Learner" <excelmodel...@gmail.com> wrote:
I am not using SQL Server as the backend and no network is involved.
I have the tables all in Access. The reason why the database is not
well designed is because I just took the data from csv files and
dumped them into Access. I did not design the database.

It's quite likely that you will have to make several passes to
"stroke" your data.

Here's a start in a way I would try. It might not do what is needed.
It might need revision.
It's intended to extract the largest (in characters) numeral in an
address field and to put it into a (type Long Integer) StreetNumber
field. It's intended to put a Null in the field if there is no number
in the address field, in case some previous effort has made it zero,
or something else that's inappropriate.

For Northwind's 830 Customer records this procedure takes 0.01 (one
one-hundredth) of a second with my Intel Pentium M 1.5 processor and
512 MB of RAM. Part of that is taken initializing the regular
expression, so we might expect it to reduce its average time per
record as the number of records increases. But there's no guarantee
of anything like that. Access, VBA and Jet are mysterious beasts, and
there's no guarantee that at record 831 there will not be some
problem with a cache, or a disk write or read or whatever, and the
whole thing will slow to a crawl.

This is not old, proven code. It's new and has been created right now
while I am packing (for that vacation I mentioned). It's very likely
that it can be improved.

If you decide to run it, though, please run the whole thing, not part
of it. It's unlikely that part of it will behave well. If you are a
RegExp/SQL fiend, go right ahead and improve it. I'll be happy. As
for the other parts (they're what re(1) to re(3) are for), you'll
need someone else to step up to the plate. I'll be surf boarding.

Private Declare Function GetTickCount& Lib "kernel32" ()

Dim re(0 To 3) As Object
Dim matches As Object
Dim match As Object

Public Sub ParseAddress()
    Dim t As Long
    t = GetTickCount
    On Error GoTo ParseStreetNumberErr
    InitializeRegularExpressions
    DBEngine(0)(0).Execute _
        "UPDATE Customers SET StreetNumber = GetStreetNumber(Address)"
    Debug.Print GetTickCount - t
ParseStreetNumberExit:
    TerminateRegularExpressions
    Exit Sub
ParseStreetNumberErr:
    MsgBox "Error Number: " & Err.Number & vbNewLine & _
        Err.Description, vbCritical
    Resume ParseStreetNumberExit
End Sub

Private Sub InitializeRegularExpressions()
    Dim z As Long
    For z = 0 To 3
        Set re(z) = CreateObject("VBScript.RegExp")
    Next z
    re(0).Pattern = "\d+"
    re(0).Global = True
End Sub

Private Sub TerminateRegularExpressions()
    Erase re
End Sub

Public Function GetStreetNumber(ByVal vAddress$)
    Dim temp$
    Dim streetNumber$
    Set matches = re(0).Execute(vAddress)
    For Each match In matches
        temp = match.Value
        ' Keep the numeral with the most digits
        If Len(temp) > Len(streetNumber) Then _
            streetNumber = temp
    Next match
    If Len(streetNumber) > 0 Then
        GetStreetNumber = CLng(streetNumber)
    Else
        GetStreetNumber = Null
    End If
End Function
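The selection logic inside GetStreetNumber (every run of digits is
collected, and the longest run wins) can be cross-checked outside Access.
This JavaScript sketch mirrors that logic only; it is not part of Lyle's
Access code, but VBScript.RegExp and JavaScript share the same regex
flavor, so /\d+/g here corresponds to Pattern = "\d+" with Global = True.

```javascript
// Mirrors GetStreetNumber above: find every run of digits, keep the
// longest one (ties go to the first), return null when there is none.
function getStreetNumber(address) {
  const matches = address.match(/\d+/g);   // Global = True in the VBA
  if (!matches) return null;
  let best = "";
  for (const m of matches) {
    if (m.length > best.length) best = m;
  }
  return parseInt(best, 10);
}
```

So for "1410 33rd Ave NW" the candidates are "1410" and "33", and "1410"
is kept because it has more digits, which is the behavior Lyle describes.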
Jan 29 '07 #27

Thank you for your code, Lyle!

I tried to use it, but the GetStreetNumber function kept returning a
null value, and hence the update SQL

DBEngine(0)(0).Execute _
    "UPDATE Customers SET StreetNumber = GetStreetNumber(Address)"

cannot be executed properly.

Also, assuming the function does return a value other than Null (or
empty), the SQL statement will only update the records based on that
particular value, but what about the others? In addition, do we need
a loop to loop through all the records? Maybe I am just not getting
your code logic right. Please advise.

- Grasshopper -

Jan 29 '07 #28
