473,378 Members | 1,482 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,378 software developers and data experts.

query Double Id's on partial match

Good afternoon all,

I have a database with a lot of fields.
One of the fields is a UserId. That should be unique and is also the only key in the table.
Unfortunately multiple Id's are created for the same person.
Example Data:
Expand|Select|Wrap|Line Numbers
  1. B092840  Bert Kruis
  2. 4444         P.Plank
  3. 92840        B. Kruis
  4. 12345_EP C.Baker
  5. EP92840  Kruis B.
  6. 4444_FC  Plank P.
A want to have a list that shows only the double UserIds
Expand|Select|Wrap|Line Numbers
  1. UserId   UserName1   UserId2  UserName2
  2. 92840    B. Kruis    B092840  Bert Kruis
  3. 92840    B. Kruis    EP92840  Kruis B.
  4. 4444     P.Plank     4444_FC  Plank P.
I've created a query/ SQL that didnot work.
Or it showed every record (approx 90.000 users) or it showed nothing. So i duplicated the table and tried to write a query again. That didnot work either.

Here is what I've created.
Expand|Select|Wrap|Line Numbers
  1. SELECT TblUSR02.BNAME, USR02.BNAME
  2. FROM USR02 INNER JOIN TblUSR02 ON USR02.BNAME = TblUSR02.BNAME
  3. WHERE (((USR02.BNAME) Like "*” & [TblUSR02]![BNAME] &”*" And (USR02.BNAME) In (SELECT [BNAME] FROM [USR02] As Tmp GROUP BY [BNAME] HAVING Count(*)>1 )))
  4. ORDER BY USR02.BNAME;
Can anyone advise how to solve this. (hopefully without having duplicating the table).

Many Thanks
May 20 '16 #1
8 1152
Seth Schrock
2,965 Expert 2GB
The problem is that, while we humans know that Bert Kruis is probably the same person as Kruis B. and B. Kruis, there isn't a way for the computer to know that. You would need some other field like SSN so that the computer can perform the match.
May 20 '16 #2
NeoPa
32,556 Expert Mod 16PB
As Seth has said, the way you've stored the data means there is no reliable way to identify which items match.

In order to program a computer to do any work, you must first be able to explain how the work can be done - even as a human. I see no way from the information and data supplied that I could even do it reliably as a human, so telling the computer how to do it would be impossible.

I notice that the IDs include a numeric value that would match, but that is stored in a way that is not predictable or able to be described precisely. if you had this information stored in a practical way then the task would be easy. Do you have that number stored elsewhere? If so, then that would have been helpful and important to include in your question. If not then you've probably just learned the importance of proper table design the hard way.

You will surely benefit from checking out Database Normalisation and Table Structures.
May 20 '16 #3
PhilOfWalton
1,430 Expert 1GB
How many users?
What is the significance of the UserID with a version of a name and a number with or without letters? What is each bit for?

My instinct says that a user should be picked from a combo box and that the rest of your UserID should be in a separate field.

Phil
May 20 '16 #4
NeoPa
32,556 Expert Mod 16PB
As a further point, your use of ” in your SQL, instead of ", indicates that you may have been working on it in a word processor - rather than a text editor. That is a very bad idea as these characters, and various others that may be substituted, may look similar but are not recognised by SQL - of pretty well any flavour - as meaning the same thing.

Also, the & concatenation character needs to have spaces around it to be recognised for what it is. Many spaces are optional in SQL. These aren't.
May 20 '16 #5
Thanks Phil and NeoPa.
I just extracted the list (about 90.000 users) from a application to identify the access the users have. By analyzing the duplicate Id's(numbers)I'm able to analyze any SoD. So i didnot build a access database but I'm using a database for querying and cleansing Users (Id's).
May 23 '16 #6
PhilOfWalton
1,430 Expert 1GB
Sorry to say I agree with Seth & NeoPa. I don't think it can be done without some form of structure.

Phil
May 23 '16 #7
jforbes
1,107 Expert 1GB
This is a situation where a computer program can only help with solving a problem. When you are attempting to scrub information that is this dissimilar, the only reliable way I know to perform this is to have a routine find possible matches and then display them to a user who makes the decision on what to do with the information (Merge, Delete, Add, or Ignore).

Also, it's much easier to start with a list of reliable known values and then compare against it with the data to be scrubbed. In your case, there is most likely a reliable users table available, that you can compare the 90,000 records against. If not, then I would create one, from your data, like by extracting all the rows with a UserID that is strictly numeric.

To determine what records to show the user, you would need a select statement to pull out anything that could be a match.
You could try some basic SQL like statements or some variations based on patterns you can predetermine. You could also look at these two articles as they might find results in a manner that a basic pattern wouldn't:
Levenshtein Approximate String Matching
NGram Approximate String Matching
May 23 '16 #8
PhilOfWalton
1,430 Expert 1GB
Having re-read the posts and mulled things over, I think there is some confusion.
I had a feeling that the original post had no spacing (or possibly 1 space) between the fields. I have a feeling this has been edited by placing it between the code tags and adding space. If so can we have the unedited version.
Secondly at one point you say there are about 90,000 records and elswhere you say there are about 90,000 users.
In the example given of 6 records there appear to be only 3 users - Kruis, Plank & Baker.
I note also on the example given that a person's initial is always followed by a full stop (sometimes with a space as well) Is this always the situation.
Can you tell us a bit more about the non-name part of the field i.e.
4444, EP92840, 12345_EP. They can't just be random numbers and letters: there must be some rules and logic in how they are derived.

IMO there is no hope of making progress until we have a reasonable chance of detecting the user's name.

Phil
May 23 '16 #9

Sign in to post your reply or Sign up for a free account.

Similar topics

2
by: Christian Staffe | last post by:
Hi, I would like to check for a partial match between an input string and a regular expression using the Regex class in .NET. By partial match, I mean that the input string could not yet be...
2
by: Andrew Chalk | last post by:
I have a customer who wants to SELECT records based on a partial match in a text field. For example, in a list of telephone numbers they want to search for all records that contain the digits '777'...
4
by: wecka | last post by:
Hello All, Does any one know how to use the form filter in Access to provide partial match? When you choose to "form filter," all text boxes will convert to combo boxes where you can choose from...
3
by: danielleissy | last post by:
Firstly i would like to know the difference between Keyword searching using SQL and partial matching of a search. I have to search for a book by key word in the book title. But allow TWO key words...
4
natalie99
by: natalie99 | last post by:
Hi All This may be a very basic question, please bare with me I am learning! I have two sets of data, one is the baseline of a Billing Inventory, the other is a seperate data set from the Live...
16
by: vorlonfear | last post by:
I have been working on this for a while now and I wanted to see if someone could assist me. I have 2 tables each with 5 fields. 4 of the fields are 2 character strings, then the final field is the...
4
by: Pat Rowe | last post by:
Relative Newbie w/ a stumper! Any help is appreciated. Thanks! I have two tables that I have imported to access. I'd like to query between these two tables for records that have a shared last...
4
Fspinelli
by: Fspinelli | last post by:
I don't know any other way to explain this but: I have a query by form (queries my customer table) that by entering a few letters of a company's name in the search field will bring up results in...
1
by: TJB1 | last post by:
I have a Master Table (tbl1) with part numbers (PNs)and a table of PNs where the single character suffix was dropped (tbl2). I need to create a query where the PN characters in tbl2 match the PN...
2
by: TNewGuyDB | last post by:
Sorry if someone has answered this question but I was not able to find it by searching the forums or google and trust me I spent hours trying to find an answer. I am really a newbie at access and...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.