How do I remove text that was duplicated within a field. For example:
Series
---------
AC Delco AC Delco Taper Spoiler
Bendix Bendix High Performance Altinators
B&I B&I Headlamps
Bridgestone Firestone Birdgestone Firestone AllWeather Tires
21 12919
You could split the string by the space character and then loop through the array looking for distinct words.
How do I go about doing that in sql. I have a process in java script but I can't get it to work in Access. - Function removeDuplicates (field) {
-
var temp = field.value;
-
var array = temp.split(" ");
-
array.sort();
-
temp = array.join(" ");
-
-
do {
-
var newTemp = temp;
-
var temp = newTemp.replace(/\s(\w+\s)\1/, " $1");
-
} while (temp.length != newTemp.length);
-
-
temp = temp.replace(/^(\w+\s)\1/, "$1");
-
temp = temp.replace(/(\s\w+)\1$/, "$1");
-
-
var orig = field.value.split(" ");
-
var finalStr = "";
-
for (var i=0; i<orig.length; i++) {
-
if (temp.indexOf(" " + orig[i] + " ") != -1) {
-
finalStr += orig[i] + " ";
-
temp = temp.split(" " + orig[i] + " ").join(" ");
-
} else if ((temp.indexOf(orig[i]) != -1) && (temp.indexOf(" "
-
+ orig[i]) == (temp.length-orig[i].length-1))) {
-
finalStr += orig[i] + " ";
-
temp = temp.substring(0, (temp.length-orig[i].length-1));
-
} else if (temp.indexOf(orig[i] + " ") == 0) {
-
finalStr += orig[i] + " ";
-
temp = temp.substring(orig[i].length+1, temp.length);
-
} else if (temp == orig[i]) {
-
finalStr += orig[i];
-
temp = "";
-
}
-
}
-
-
if (finalStr.substring(finalStr.length-1, finalStr.length) == " ") {
-
finalStr = finalStr.substring(0, finalStr.length-1);
-
}
-
field.value = finalStr;
-
}
This would be a starter. i.e. remove successive words from string: - Function removeSuccessingWordInString(val As String) As String
-
Dim sVal, s As Variant
-
Dim p As Integer
-
Static cleanString As String
-
sVal = Split(val, " ")
-
For p = 0 To UBound(sVal) - 1
-
If sVal(p) = sVal(p + 1) Then
-
cleanString = Trim(Replace(val, sVal(p), "", 1, 1))
-
removeSuccessingWordInString cleanString
-
End If
-
Next p
-
removeSuccessingWordInString = cleanString
-
End Function
Just as a comment:
This appears to be the concatenation of 3 seperate fields from 3 separate tables. If you have control over the concatentation, then it would be easier to rewrite that then try an remove duplication.
If you're getting this from an external source, you might try asking them to modify the concatenation.
NeoPa 32,499
Expert Mod 16PB
Bobbie, why not explain in words (preferably as part of the original question, but failing that now) what it is you want, rather than dumping some code (in a language other than VBA) for us to try to determine what you should have asked.
I see no easy Method to accomplish this. The best results that I can come up with is the following, using a very unorthodox approach. Notice the last 2 Records: -
Series No_Dups
-
AC Delco AC Delco Taper Spoiler AC Delco Taper Spoiler
-
Bendix Bendix High Performance Altinators Bendix High Performance tinators
-
B&I B&I Headlamps B&I Headlamps
-
Bridgestone Firestone Bridgestone Firestone Bridgestone Firestone
-
AllWeather Tires AllWeather Tires
-
BIG KMART BIG KMART BIG KMART BIG KMART BIG KMART
-
TJ Max TJ Thompsom TJ Max Thompsom
-
Bill Clinton Hillary Clinton George Bush Bill Clinton George Bush Bill Clinton Hillary George Bush
-
@NeoPa
I have an Acces table of fields and one of the fields is concatenated and unfortunately one of the fields was concatenated twice. I don't have the data in from the separate fields otherwise I will simply fix the concatenated field. So I'm stuck with this field to fix. I can't use spaces or drop the first or second word because this problem of duplicate words isn't in all the fields.
So I was looking for a way using an access query to remove the duplicae words. The code I attached earlier I found on the internet and looks like it would work but it's in Java.
NeoPa 32,499
Expert Mod 16PB
I recognise that at least as an attempt to specify the problem Bobbie (which is a good step forward :-)). Specifications are often a problem to get right, so all we expect is an honest attempt.
Now let's see if we can tidy it up a little. Correct me if I go astray. - You have a table within which is a single field which was previously put together by concatenating various separate data strings.
- Unfortunately, the first of these values was included twice in the resultant string.
- The string values were originally joined together using spaces to separate them.
- Any or all of the values may, but don't necessarily have to, contain embedded spaces within themselves.
- So, we have a string field in a table which has a duplication at the start which may consist of one or more words separated in all cases by spaces.
- The only indication we have that an original value may have ended and a new one begun (within the resultant data we're working with) is the occurrence of a space character.
- You need to determine some logic first, and later some code that implements that logic, to handle de-duplication of the first value added to the string.
Thanks to NeoPa's clarification, the following Function, when used in a Query (Calculated Field), will handle 'Start of Field Duplication' for a Maximum of 6 Words. It can easily be expanded if so desired. I ran it against Test Data, some of it yours, and I'll post the Code and results below: - Public Function fCheckForDupes(strSeries As String)
-
Dim varDups As Variant
-
Dim intNumOfWords As Integer
-
Dim strRetVal As String
-
-
varDups = Split(strSeries, " ")
-
-
intNumOfWords = UBound(varDups) + 1
-
-
Select Case intNumOfWords
-
Case 1 '1 Word, return Word
-
strRetVal = varDups(0)
-
Case 2 '2 Words, check for equality
-
If varDups(0) = varDups(1) Then 'Equal, drop the 2nd Word
-
strRetVal = varDups(0)
-
Else '2 Words, not equal, return original String
-
strRetVal = varDups(0) + " " & varDups(1)
-
End If
-
Case 3 'Is the 1st equal to the 2nd Word?
-
If varDups(0) = varDups(1) Then '1st & 2nd equal, Drop 2nd
-
strRetVal = varDups(0) & " " & varDups(2)
-
Else 'No equality, return entire String
-
strRetVal = varDups(0) + " " & varDups(1) & " " & varDups(2)
-
End If
-
Case 4 'Is the 1st = the 2nd OR are the 1st and 2nd = 3rd and 4th?
-
If varDups(0) = varDups(1) Then 'Drop the 2nd
-
strRetVal = varDups(0) & " " & varDups(2) & " " & varDups(3)
-
ElseIf (varDups(0) = varDups(2)) And (varDups(1) = varDups(3)) Then
-
strRetVal = varDups(0) & " " & varDups(1) 'Drop 3rd and 4th
-
Else
-
strRetVal = strSeries
-
End If
-
Case 5
-
If varDups(0) = varDups(1) Then 'Drop the 2nd
-
strRetVal = varDups(0) & " " & varDups(2) & " " & varDups(3) & _
-
" " & varDups(4)
-
ElseIf (varDups(0) = varDups(2)) And (varDups(1) = varDups(3)) Then
-
strRetVal = varDups(0) & " " & varDups(1) & " " & varDups(4) '
-
Else
-
strRetVal = strSeries
-
End If
-
Case 6
-
If varDups(0) = varDups(1) Then
-
strRetVal = varDups(0) & " " & varDups(2) & " " & varDups(3) & _
-
" " & varDups(4) & " " & varDups(5)
-
ElseIf (varDups(0) = varDups(2)) And (varDups(1) = varDups(3)) Then
-
strRetVal = varDups(0) & " " & varDups(1) & " " & varDups(4) & _
-
" " & varDups(5)
-
ElseIf (varDups(0) = varDups(3)) And (varDups(1) = varDups(4)) And _
-
(varDups(2) = varDups(5)) Then
-
strRetVal = varDups(0) & " " & varDups(1) & " " & varDups(2)
-
Else
-
strRetVal = strSeries
-
End If
-
Case Else
-
'If all else fails, return the Original String
-
strRetVal = strSeries
-
End Select
-
-
fCheckForDupes = strRetVal
-
End Function
- Series New_Series
-
AC Delco AC Delco Taper Spoiler AC Delco Taper Spoiler
-
Bendix Bendix High Performance Altinators Bendix High Performance Altinators
-
B&I B&I Headlamps B&I Headlamps
-
Bridgestone Firestone Bridgestone Firestone Bridgestone Firestone
-
AllWeather Tires AllWeather Tires
-
Encyclopedia Encyclopedia
-
One Two Three One Two Three
-
YaDa YaDa YaDa
-
One One Three Four One Three Four
-
One Two One Two Five One Two Five
-
One Two Three Four Five One Two Three Four Five
-
One Two Three Four One Two Three Four
-
One One Three Four Five Six One Three Four Five Six
-
One Two Three One Two Three One Two Three
-
1 2 3 4 5 6 1 2 3 4 5 6
NeoPa 32,499
Expert Mod 16PB
My understanding is that strings with no duplications at the start are not possible in this scenario, but let's wait for confirmation of the spec before taking the next step (at least that's how I prefer to play, personally).
@NeoPa:
My understanding is that strings with no duplications at the start are not possible in this scenario
That was my understanding also after reading your Post (#9), and my Code is based entirely on this Assumption.
NeoPa 32,499
Expert Mod 16PB
Excellent. I was thrown by your posted example data, but as long as that's all clear all's good :-)
NeoPa 32,499
Expert Mod 16PB
This is an alternative procedure to allow you to strip the duplicates from your field : - Public Function StripDup(ByVal varWith As Variant) As String
-
Dim intX As Integer, intY As Integer
-
Dim strWith As String
-
-
If IsNull(varWith) Then
-
StripDup = ""
-
Exit Function
-
End If
-
StripDup = Trim(varWith)
-
varWith = Split(StripDup, " ")
-
For intX = (UBound(varWith) \ 2) + 1 To 1 Step -1
-
For intY = 0 To intX - 1
-
If varWith(intY) <> varWith(intY + intX) Then Exit For
-
Next intY
-
If intY > (intX - 1) Then Exit For
-
Next intX
-
If intX > 0 Then
-
StripDup = ""
-
For intY = intX To UBound(varWith)
-
StripDup = StripDup & " " & varWith(intY)
-
Next intY
-
StripDup = LTrim(StripDup)
-
End If
-
End Function
Nice job, NeoPa! Let me be the first to say that your StripDup is better then my StripDup.
NeoPa 32,499
Expert Mod 16PB
Thank you ADezii. A generous comment :-)
I appreciate all your help and please forgive my ignorance because I clearly know very little about access and what it can do. Can I take the code you have just provided below and open access, go to design query in SQL view and paste your code. How do I get it to run and updated that field.
Table is called Distinct_Configs_w_Attirbutes and the field is called potential Group Description?
NeoPa 32,499
Expert Mod 16PB
Not exactly. Let's see if we can describe the process clearly for you. - Find a module in your project that you can use to hold the procedure code (that I posted). This can be an existing procedure or a new one created specially.
To do this you first need to find your way to where the modules are visible to you :- Ensure you're in your database in Access.
- Switch to the VBA IDE using Alt-F11.
- Switch to the Project Explorer pane using Ctrl-R.
- Expand the project and you will see (if there is anything there with code) up to three separate trees :
- Microsoft Office Access Class Objects.
- Modules.
- Class Objects.
- Standard modules are found within the Modules branch.
- Select an existing Standard module, or create a new one :
- Right click in the Project Explorer pane.
- Select Insert.
- Select Module (and name it etc).
- If the module isn't opened then double-click on it to open it up.
- See Require Variable Declaration before going any further and follow instructions found there.
- Copy and Paste the code from post #14 (Click on Line Numbers and Select at the top first to allow it to work smoothly).
- In your query, simply refer to the procedure by name and pass, as the parameter, the field that contains the unfixed data :
- SELECT StripDup([UnfixedField]) AS [FixedField]
-
FROM [SomeTable]
-
etc
You should now see the data as it should have been.
Code posted in #14 simply doesn't work. I test with input "Hello World Hello Hello World World" and get back the same string unchanged. I test with input "Hello Hello Hello" and get a runtime error 9, subscript out of range.
May I offer the following function instead. This is basically an O(n^2) algorithm and does not take advantage of VBA Collections. However for small strings it's probably just fine. Don't call it on a many-megabyte input :) -
Public Function DropDupeWords(ByVal words As Variant) As String
-
'Removes duplicate words in the input, ignores case, preserves original order.
-
'Result has the first unique words found in the original casing.
-
'A word is any sequence of non-space characters.
-
-
'Don't do anything on null or empty
-
DropDupeWords = ""
-
If IsNull(words) Then
-
Exit Function
-
End If
-
Dim trimmed As String
-
trimmed = Trim(words)
-
If trimmed = "" Then
-
Exit Function
-
End If
-
'If no embedded space, there's nothing to do
-
If InStr(trimmed, " ") = 0 Then
-
DropDupeWords = trimmed
-
Exit Function
-
End If
-
-
'Split into words
-
Dim wordarr As Variant
-
wordarr = Split(trimmed, " ")
-
Dim limit As Long
-
limit = UBound(wordarr)
-
-
'Build new array of unique words, initialized with first word.
-
Dim uniquewords() As String
-
ReDim uniquewords(limit)
-
Dim i As Integer, unique As Integer, u As Integer, found As Boolean
-
uniquewords(0) = wordarr(0)
-
unique = 0
-
For i = 1 To limit
-
'Skip empty elements resulting from multiple spaces
-
If wordarr(i) <> "" Then
-
found = False
-
For u = 0 To unique
-
'Ignore case when comparing words
-
If StrComp(wordarr(i), uniquewords(u), vbTextCompare) = 0 Then
-
found = True
-
End If
-
Next u
-
If Not found Then
-
unique = unique + 1
-
uniquewords(unique) = wordarr(i)
-
End If
-
End If
-
Next i
-
-
'Build result from the array of unique words
-
Dim result As String
-
result = uniquewords(0)
-
For i = 1 To unique
-
result = result + " " + uniquewords(i)
-
Next i
-
DropDupeWords = result
-
-
End Function
-
NeoPa 32,499
Expert Mod 16PB ChrisLott:
Code posted in #14 simply doesn't work. I test with input "Hello World Hello Hello World World" and get back the same string unchanged. I test with input "Hello Hello Hello" and get a runtime error 9, subscript out of range.
I was surprised to find you're correct in your statement. Although the second example is the only problem here.
Bear in mind that "Hello World Hello Hello World World" is supposed to return "Hello World Hello Hello World World", as there is no duplicate that matches the requirements specified in the question (or more particularly expressed in post #9 as the OP was unable to express themself clearly enough when they posted the question originally).
Your suggested code is not a match for the stated requirement and it's also far from efficient. Remember, this is not simply about deleting duplicates. It is about deleting a specific duplicate introduced by doubling the first value added to the result. Thus "A B C A B C B C A A B C" should return only "A B C B C A A B C". No other duplicates need, or should, be removed.
With this in mind I've revisited my earlier code and this new version should do the trick as expected : - Public Function StripDup(ByVal varWith As Variant) As String
-
Dim intX As Integer, intY As Integer, intZ As Integer
-
Dim strWith As String
-
-
If Not varWith > "" Then
-
StripDup = ""
-
Exit Function
-
End If
-
StripDup = Trim(varWith)
-
varWith = Split(StripDup, " ")
-
intX = 0
-
For intY = 1 To (UBound(varWith) + 1) \ 2
-
If varWith(intY) = varWith(intX) Then
-
For intX = 1 To intY - 1
-
If varWith(intX) <> varWith(intX + intY) Then Exit For
-
Next intX
-
If intX = intY Then
-
For intX = 0 To intY - 1
-
varWith(intX) = ""
-
Next intX
-
Exit For
-
End If
-
End If
-
Next intY
-
If varWith(0) = "" Then
-
StripDup = Trim(Join(varWith, " "))
-
End If
-
End Function
Thanks for clarifying the spec. A person I'm supporting was struggling with a related problem and landed here via google but didn't read the intro carefully enough. My little function does not solve the problem of removing a proper prefix of duplicate words; it removes all duplicate words. I'll leave it up on the off chance it will help the next person who lands here by keyword search :)
NeoPa 32,499
Expert Mod 16PB ChrisLott:
I'll leave it up on the off chance it will help the next person who lands here by keyword search :)
That's good thinking and I agree. Let's leave it there. It may well help people looking for answers.
Post your reply Sign in to post your reply or Sign up for a free account.
Similar topics
2 posts
views
Thread by Damien Cobbs |
last post: by
|
7 posts
views
Thread by Voetleuce en fênsievry |
last post: by
|
1 post
views
Thread by guoqi zheng |
last post: by
|
6 posts
views
Thread by Paul |
last post: by
|
3 posts
views
Thread by dazzle |
last post: by
| | |
reply
views
Thread by TAB |
last post: by
| | | | | | | | | | | | |