473,320 Members | 1,699 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

near duplicates

I'm new to MS Access (97) and I need for a query to find *near*
matches (not exact duplicates) mainly to weed out duplicates that may
contain similar erronious data.

EG Table1

Name Company Town
Alan ABC Alphaville
Brian B inc. Beta City
Calan ABC Alpaville

where the result would return

Alan
Calan

when searching by "name" or

Alphaville
Alpaville

when doing a near match query by "town"

Or if there is a way that that the whole table could be searched to
find near matches that would help.
Thank you.

Nov 13 '05 #1
3 2710
duck, duck, goose! "Quack-boy"!!
your skills must not be all they're 'quacked-up' to be!!!

hint, hint: 'basic pattern matching 101' for query criteria.

(...seriously though, see the help for query criteria examples...)
-PT

Quack Boy <qu***@nottelling.co.uk> wrote in message news:<nr********************************@4ax.com>. ..
I'm new to MS Access (97) and I need for a query to find *near*
matches (not exact duplicates) mainly to weed out duplicates that may
contain similar erronious data.

EG Table1

Name Company Town
Alan ABC Alphaville
Brian B inc. Beta City
Calan ABC Alpaville

where the result would return

Alan
Calan

when searching by "name" or

Alphaville
Alpaville

when doing a near match query by "town"

Or if there is a way that that the whole table could be searched to
find near matches that would help.
Thank you.

Nov 13 '05 #2
Quack Boy <qu***@nottelling.co.uk> wrote in message news:<nr********************************@4ax.com>. ..
I'm new to MS Access (97) and I need for a query to find *near*
matches (not exact duplicates) mainly to weed out duplicates that may
contain similar erronious data.

EG Table1

Name Company Town
Alan ABC Alphaville
Brian B inc. Beta City
Calan ABC Alpaville

where the result would return

Alan
Calan

when searching by "name" or

Alphaville
Alpaville

when doing a near match query by "town"

Or if there is a way that that the whole table could be searched to
find near matches that would help.
Thank you.

You can use either SimilCmp or LD both included, SimilCmp is faster.
have fun.

tom

Public Function LD(ByVal s As String, ByVal t As String) As Integer
Dim d() As Integer ' matrix
Dim ks As Integer ' length of Known word
Dim ms As Integer ' length of Match word
Dim ki As Integer ' Known word index
Dim mi As Integer ' Match word index
Dim kc As String ' Known character
Dim mc As String ' Match character
Dim cost As Integer ' cost
Dim a As Integer '
Dim b As Integer '
Dim c As Integer '

' Step 1
ks = Len(s) ' set Known word length
ms = Len(t) ' set Match word length
If ks = 0 Then ' if no Known word
LD = ms ' return length of Match
word
Exit Function ' bail out
End If ' if no Known word
If ms = 0 Then ' if no Match word
LD = ks ' return length of Known
word
Exit Function ' bail out
End If ' if no Match word
ReDim d(0 To ks, 0 To ms) As Integer ' build matrix based on
length of Known and Match words

' Step 2
For ki = 0 To ks ' Known word loop
d(ki, 0) = ki ' load Known word index into
matrix, vertically
Next ki ' next index of known word

For mi = 0 To ms ' Known Match loop
d(0, mi) = mi ' load Match word index into
matrix, horizontally
Next mi ' next index of Match word

For ki = 1 To ks ' Known word loop
kc = Mid$(s, ki, 1) ' get character of Known
word
For mi = 1 To ms ' Match word loop
mc = Mid$(t, mi, 1) ' get character of Match
word
If kc = mc Then ' if character from Known
and Match words NOT the same
cost = 0 ' set cost to 0
Else ' if character from Known
and Match words the same
cost = 1 ' set cost to 0
End If ' if character from Known
and Match words the same
a = d(ki - 1, mi) + 1
b = d(ki, mi - 1) + 1
c = d(ki - 1, mi - 1) + cost
If b < a Then a = b
If c < a Then
d(ki, mi) = c
Else
d(ki, mi) = a
End If

' d(ki, mi) = Minimum(d(ki - 1, mi) + 1, d(ki, mi - 1) + 1, d(ki -
1, mi - 1) + cost)
Next mi ' next Match word character
Next ki ' next Known word character

LD = d(ks, ms) ' return distance
Erase d ' free array

End Function
Public Function SimilCmp(Known As String, Match As String) As Single
Dim k() As Byte ' Known word array
Dim m() As Byte ' Match word array
Dim ks As Integer ' Known string length
Dim ms As Integer ' Match string length

ks = Len(Known): ms = Len(Match) ' initialize Known and
Match string lengths
If ks = 0 Or ms = 0 Then Exit Function ' if Known or Match are
empty, return 0%
If Known = Match Then ' if strings the same
SimilCmp = 100 ' return 100%
Else ' if strings are NOT the
same
k = StrConv(UCase(Known), vbFromUnicode) ' load Known array
m = StrConv(UCase(Match), vbFromUnicode) ' load Match array
SimilCmp = GcSubStr(k(), 0, ks, m(), 0, ms) ' get number of matching
characters
SimilCmp = (200 * SimilCmp / (ks + ms)) ' calculate precentage
Erase k ' free array resources
Erase m ' free array resources
End If ' if strings match

End Function

Private Function GcSubStr(k() As Byte, ko As Integer, ks As Integer,
m() As Byte, mo As Integer, ms As Integer) As Long
Dim ki As Integer ' Known string index
Dim mi As Integer ' Match string index
Dim kn As Integer ' Known next index
Dim mn As Integer ' Match next index
Dim kl As Integer ' Known next length
Dim ml As Integer ' Match next length
Dim cc As Integer ' current concurrent
character count
Dim ci As Integer ' concurrent character
index

' If no more substing to compare, return 0
If (ks <= ko Or ms <= mo) Then Exit Function ' if Known or Match
index is past end of string
If (ks = ko + 1 And ms = mo + 1) Then Exit Function ' if last
character of Known and Match strings
cc = 0: kl = ks: ml = ms: ' initialize concurrent
character count and string lengths

ki = ko ' initialize Known index
While (ki < kl) ' Known string loop
mi = mo ' initialize Match index
While (mi < ml) ' Match string loop
If (k(ki) = m(mi)) Then ' if a character the
same in both strings
ci = 1 ' initialize concurrent
character index
Do Until ((ki + ci >= kl) Or (mi + ci >= ml)) ' while things
match, keep trying...
If (k(ki + ci) <> m(mi + ci)) Then Exit Do ' if things NOT
matching, break loop
ci = ci + 1 ' bump concurrent
character count
Loop ' next character match
If ci > cc Then ' if concurrent count >
greatest count
cc = ci: kn = ki: mn = mi ' save indexes and
greatest count
kl = ks - cc: ml = ms - cc ' save indexes and
greatest count
End If ' if concurrent count >
greatest count
End If ' if a character the
same in both strings
mi = mi + 1 ' bump Match index
Wend ' next Match character
ki = ki + 1 ' bump Known index
Wend ' next Known character
If cc = 0 Then Exit Function ' return 0
cc = cc + GcSubStr(k(), kn + cc, ks, m(), mn + cc, ms) ' check right
hand side
cc = cc + GcSubStr(k(), ko, kn, m(), mo, mn) ' check left hand side
GcSubStr = cc ' return matched
character count of substrings
End Function
Nov 13 '05 #3
On 19 Jul 2004 23:48:22 -0700, tw*@gate.net (Tom Warren) wrote:


You can use either SimilCmp or LD both included, SimilCmp is faster.
have fun.


SimilCmp works just fine. Thank you.
Nov 13 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: Marlene | last post by:
Hi All I have the following scenario, where I have found all the duplicates in a table, based on an order number and a part number (item).I might have something like this: Order PODate Rec...
16
by: tyrfboard | last post by:
I've been searching for awhile now on how to remove duplicates from a table within an Access db and have found plenty of articles on finding or deleting duplicates. All I want to do is remove them...
6
by: Matej Cepl | last post by:
Hi, I am somehow experienced working with databases (mainly PostgreSQL on Linux) and with scripting (bash, Python, Basic), but total newbie when it comes to Access. However, in this temp job I...
3
by: AK | last post by:
Hi Our product uses MS-SQL Server 2000. One of our customer has 10 installations with each installation stroring data in its own database. Now the customer wants to consolidate these databases...
7
by: vsgdp | last post by:
I have a container of pointers. It is possible for two pointers to point to the same element. I want to remove duplicates. I am open to which container is best for this. I thought of using...
14
by: ak | last post by:
Is it possible to find repeated(duplicate) element in an array in single loop ? AK
3
by: ryan.paquette | last post by:
In the table there are 2 fields in which I wish to limit (i.e. No Duplicates) Although I do not want to limit them to "No Duplicates" separately. I need them to be limited to "No Duplicates" as...
1
by: tskmjk55 | last post by:
Recently, I have a requirement to develop a vb.net application wherein the input excel sheet data which has an average of 5000 records should be checked for Internal duplicates (duplicates within the...
3
Thekid
by: Thekid | last post by:
I'm trying to figure out a way to find if there are duplicates in an array. My idea was to take the array as 'a' and make a second array as 'b' and remove the duplicates from 'b' using 'set' and then...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.