By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
434,728 Members | 2,490 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 434,728 IT Pros & Developers. It's quick & easy.

Importing from delimited text file & removing dulpicates problem

P: n/a
Hi
I have a problem which I hope someone can help me with because I
really don't even know where to start with it.
I am using Access 2003. I have a delimited text file which contains
about 200,000 lines. There are only 2 fields, Role and ID. The role
field contains many duplicates, out of the 200,000 records I think
there are only 16,000 unique roles.

Role ID
AB_F2S_COST_PLAN_CLK EE1
AB_F2S_COST_PLAN_CLK LT10
AB_F2S_COST_PLAN_CLK LT18
AB_F2S_COST_PLAN_CLK LT3
AB_F2S_COST_PLAN_CLK LT7
AB_F2S_COST_PLAN_CLK LT8
AB_F2S_COST_PLAN_DSP TR8
AB_F2S_COST_PLAN_DSP TH2
AB_F2S_COST_PLAN_DSP BM23

I need to produce a single unique list of roles with the ID's related
to that role all concatenated together with comma's as separaters.

e.g
Role
ID
AB_F2S_COST_PLAN_CLK EE1,LT10,LT18,LT3,LT7,LT8
AB_F2S_COST_PLAN_DSP TR8,TH2,BM23

Is this possible and if so, how ?
Thanks in advance

Jun 14 '07 #1
Share this Question
Share on Google+
5 Replies


P: n/a
Hi

The following should help you...

Step 1: Import your delimited file into a table named "Roles" with
"Role" as 1st named Field and "ID" as 2nd named field

e.g.

Role ID
AB_F2S_COST_PLAN_CLK EE1
AB_F2S_COST_PLAN_CLK LT10
AB_F2S_COST_PLAN_CLK LT18
AB_F2S_COST_PLAN_CLK LT3
AB_F2S_COST_PLAN_CLK LT7
AB_F2S_COST_PLAN_CLK LT8
AB_F2S_COST_PLAN_DSP TR8
AB_F2S_COST_PLAN_DSP TH2
AB_F2S_COST_PLAN_DSP BM23
Step 2: Create a new module and add the following function to it

i.e.

Function ConcatDetail1(Num As String) As Variant
Dim rs As DAO.Recordset
Dim strOut As String
Dim strSQL As String
Dim lngLen As Long
Const strcSep = ","

strSQL = "SELECT ID FROM Test WHERE Role = '" & Num & "';"
Set rs = DBEngine(0)(0).OpenRecordset(strSQL)
With rs
Do While Not .EOF
strOut = strOut & !ID & strcSep
.MoveNext
Loop
End With
rs.Close

lngLen = Len(strOut) - Len(strcSep)
If lngLen 0 Then
ConcatDetail1 = Left(strOut, lngLen)
Else
ConcatDetail1 = Null
End If
Set rs = Nothing
End Function
Step 3: Create a new query, add the "Roles" Table to it. Use the SQL
VIEW form of the Query and put the following in it...

i.e.
SELECT DISTINCT Test.Role, ConcatDetail1([Role]) AS ID
FROM Test;

Save the query.

Step 4. Run the query and you should get your required result.

i.e.

AB_F2S_COST_PLAN_CLK EE1,LT10,LT18,LT3,LT7,LT8
AB_F2S_COST_PLAN_DSP TR8,TH2,BM23
Step 5. You can now export the query.

If you need further clarification please ask...

Regards,

osmethod

(N.B. The "ConcatDetail1" function was provided to me by Allen Browne
when I needed a similar resolution to yours..)

See
http://groups.google.ie/group/comp.d...d&rnum=1&hl=en

Jun 14 '07 #2

P: n/a
I have followed your instructions but when I ran the query an error
occured saying that the table 'Test' could not be found. I figured
that my table should be re-named as 'Test' and not 'Roles'. I then ran
the query again and left it for about 40 minutes. It still wasn't
finished so I aborted it. Should it really be taking this long to sort
200,000 records ?!

Jun 15 '07 #3

P: n/a
On Jun 15, 1:14 pm, tom.hepwo...@jt-int.com wrote:
I have followed your instructions but when I ran the query an error
occured saying that the table 'Test' could not be found. I figured
that my table should be re-named as 'Test' and not 'Roles'. I then ran
the query again and left it for about 40 minutes. It still wasn't
finished so I aborted it. Should it really be taking this long to sort
200,000 records ?!
You are correct - table name is test.

Hard to judge the length of time. Some things that may help...

1. Put an index(duplicates ok) on each of the fields in the test
table.

2. Create a smaller file sample of 100 records from your live file and
import it first.Then run the query and see the effect first.

Regards,

osmethod

Jun 15 '07 #4

P: n/a
On Jun 14, 12:18 pm, tom.hepwo...@jt-int.com wrote:
Hi
I have a problem which I hope someone can help me with because I
really don't even know where to start with it.
I am using Access 2003. I have a delimited text file which contains
about 200,000 lines. There are only 2 fields, Role and ID. The role
field contains many duplicates, out of the 200,000 records I think
there are only 16,000 unique roles.

Role ID
AB_F2S_COST_PLAN_CLK EE1
AB_F2S_COST_PLAN_CLK LT10
AB_F2S_COST_PLAN_CLK LT18
AB_F2S_COST_PLAN_CLK LT3
AB_F2S_COST_PLAN_CLK LT7
AB_F2S_COST_PLAN_CLK LT8
AB_F2S_COST_PLAN_DSP TR8
AB_F2S_COST_PLAN_DSP TH2
AB_F2S_COST_PLAN_DSP BM23

I need to produce a single unique list of roles with the ID's related
to that role all concatenated together with comma's as separaters.

e.g
Role
ID
AB_F2S_COST_PLAN_CLK EE1,LT10,LT18,LT3,LT7,LT8
AB_F2S_COST_PLAN_DSP TR8,TH2,BM23

Is this possible and if so, how ?
Thanks in advance
See if this code-behind-form runs quickly enough:

Private Sub cmdProcess_Click()
Const ROLEMAX = 30000
Dim UniqueRole(ROLEMAX) As String
Dim IDList(ROLEMAX) As String
Dim lngI As Long
Dim lngJ As Long
Dim lngMax As Long
Dim lngRecords As Long
Dim MyDB As Database
Dim MyRS As Recordset
Dim OutRS As Recordset
Dim strSQL As String
Dim strID As String
Dim strRole As String
Dim boolRoleFound As Boolean
Dim boolIDFound As Boolean
Dim strTemp As String
Dim intComma As Integer
Dim strTest As String
Dim tdf As TableDef
Dim fld As Field
Dim boolOutputTable As Boolean

Set MyDB = CurrentDb
boolOutputTable = False
For Each tdf In MyDB.TableDefs
If tdf.Name = "tblOutput" Then
boolOutputTable = True
Exit For
End If
Next tdf
If Not boolOutputTable Then
Set tdf = MyDB.CreateTableDef("tblOutput")
Set fld = tdf.CreateField("Role", dbText, 255)
tdf.Fields.Append fld
Set fld = tdf.CreateField("IDList", dbMemo)
tdf.Fields.Append fld
tdf.Fields.Refresh
MyDB.TableDefs.Append tdf
MyDB.TableDefs.Refresh
Set tdf = Nothing
Set fld = Nothing
Else
strSQL = "DELETE tblOutput FROM tblOutput;"
MyDB.Execute strSQL, dbFailOnError
End If
For lngI = 1 To ROLEMAX
UniqueRole(lngI) = ""
IDList(lngI) = ""
Next lngI
strSQL = "SELECT Role, ID FROM Roles;"
Set MyDB = CurrentDb
Set MyRS = MyDB.OpenRecordset(strSQL, dbOpenSnapshot)
MyRS.MoveLast
lngRecords = MyRS.RecordCount
MyRS.MoveFirst
lngMax = 0
For lngI = 1 To lngRecords
strID = MyRS("ID")
strRole = MyRS("Role")
'Search for Role in arrays
If lngMax 0 Then
boolRoleFound = False
For lngJ = 1 To lngMax
If UniqueRole(lngJ) = strRole Then
boolRoleFound = True
'Update the ID list, empty string, single ID, multiple ID no
substring, multipleID substring
If IDList(lngJ) = "" Then
IDList(lngJ) = strID
ElseIf InStr(1, ",", IDList(lngI), vbTextCompare) = 0 And
IDList(lngJ) <strID Then
IDList(lngJ) = IDList(lngJ) & "," & strID
ElseIf InStr(1, strID, IDList(lngI), vbTextCompare) = 0 Then
IDList(lngJ) = IDList(lngJ) & "," & strID
Else
'Check each piece
boolIDFound = False
strTemp = IDList(lngJ)
Do While Len(strTemp) 0
intComma = InStr(1, ",", strTemp, vbTextCompare)
If intComma 0 Then
strTest = Left(strTemp, intComma - 1)
If strTest = strID Then
boolIDFound = True
Exit Do
Else
strTemp = Right(strTemp, Len(strTemp) - Len(strTest))
End If
Else
'only one piece
If strTemp = strID Then boolIDFound = True
strTemp = ""
End If
Loop
If boolIDFound = False Then
IDList(lngJ) = IDList(lngJ) & "," & strID
End If
End If
Exit For
End If
Next lngJ
If boolRoleFound = False Then
'Add a new Role and ID
lngMax = lngMax + 1
UniqueRole(lngMax) = strRole
IDList(lngMax) = strID
End If
Else
UniqueRole(1) = strRole
IDList(1) = strID
lngMax = 1
End If
If lngI <lngRecords Then MyRS.MoveNext
Next lngI
MyRS.Close
Set MyRS = Nothing
'Append array values to table
strSQL = "SELECT * FROM tblOutput;"
Set OutRS = MyDB.OpenRecordset(strSQL, dbOpenDynaset)
For lngI = 1 To lngMax
OutRS.AddNew
OutRS("Role") = UniqueRole(lngI)
If IDList(lngI) <"" Then OutRS("IDList") = IDList(lngI)
OutRS.Update
Next lngI
MsgBox ("Done.")
End Sub

Note that by ordering the query (from "SELECT Role, ID FROM Roles;" to
"SELECT Role, ID FROM Roles ORDER BY Role, ID;") it is possible to
simplify greatly the checks for existing Roles and ID's. The main
idea is that a single pass is made through the table. Note that I
didn't put checks in for when lngMax exceeds ROLEMAX - 1. That should
be done also. This was coded in A97 so you may need 'DAO.' in a few
Dim statements (Recordset and possibly Database). This code can also
be done more elegantly with ADO or by using newer commands such as
Split and Join. Anyway, it should give you a good place to start.

James A. Fortune
CD********@FortuneJames.com

Jun 15 '07 #5

P: n/a
On Jun 15, 5:15 pm, CDMAPos...@FortuneJames.com wrote:
On Jun 14, 12:18 pm, tom.hepwo...@jt-int.com wrote:


Hi
I have a problem which I hope someone can help me with because I
really don't even know where to start with it.
I am using Access 2003. I have a delimited text file which contains
about 200,000 lines. There are only 2 fields, Role and ID. The role
field contains many duplicates, out of the 200,000 records I think
there are only 16,000 unique roles.
Role ID
AB_F2S_COST_PLAN_CLK EE1
AB_F2S_COST_PLAN_CLK LT10
AB_F2S_COST_PLAN_CLK LT18
AB_F2S_COST_PLAN_CLK LT3
AB_F2S_COST_PLAN_CLK LT7
AB_F2S_COST_PLAN_CLK LT8
AB_F2S_COST_PLAN_DSP TR8
AB_F2S_COST_PLAN_DSP TH2
AB_F2S_COST_PLAN_DSP BM23
I need to produce a single unique list of roles with the ID's related
to that role all concatenated together with comma's as separaters.
e.g
Role
ID
AB_F2S_COST_PLAN_CLK EE1,LT10,LT18,LT3,LT7,LT8
AB_F2S_COST_PLAN_DSP TR8,TH2,BM23
Is this possible and if so, how ?
Thanks in advance

See if this code-behind-form runs quickly enough:
...
Note that by ordering the query (from "SELECT Role, ID FROM Roles;" to
"SELECT Role, ID FROM Roles ORDER BY Role, ID;") it is possible to
simplify greatly the checks for existing Roles and ID's. The main
idea is that a single pass is made through the table. Note that I
didn't put checks in for when lngMax exceeds ROLEMAX - 1. That should
be done also. This was coded in A97 so you may need 'DAO.' in a few
Dim statements (Recordset and possibly Database). This code can also
be done more elegantly with ADO or by using newer commands such as
Split and Join. Anyway, it should give you a good place to start.

James A. Fortune
CDMAPos...@FortuneJames.com
The good news:

I created a Roles table with 200,000 records and a lot of duplicates.
The code ran in about a minute on a slow machine. A version with an
ordered recordset and simplified logic should be run quite quickly.

The bad news:

It didn't prevent some duplicate ID's from ending up in
tblOutput.IDList. The simplified logic using an ordered recordset
should handle that problem nicely.

Post back if you need help with this for either the unordered or
ordered recordset cases.

James A. Fortune
CD********@FortuneJames.com

Jun 17 '07 #6

This discussion thread is closed

Replies have been disabled for this discussion.