473,405 Members | 2,344 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,405 software developers and data experts.

Importing from delimited text file & removing dulpicates problem

Hi
I have a problem which I hope someone can help me with because I
really don't even know where to start with it.
I am using Access 2003. I have a delimited text file which contains
about 200,000 lines. There are only 2 fields, Role and ID. The role
field contains many duplicates, out of the 200,000 records I think
there are only 16,000 unique roles.

Role ID
AB_F2S_COST_PLAN_CLK EE1
AB_F2S_COST_PLAN_CLK LT10
AB_F2S_COST_PLAN_CLK LT18
AB_F2S_COST_PLAN_CLK LT3
AB_F2S_COST_PLAN_CLK LT7
AB_F2S_COST_PLAN_CLK LT8
AB_F2S_COST_PLAN_DSP TR8
AB_F2S_COST_PLAN_DSP TH2
AB_F2S_COST_PLAN_DSP BM23

I need to produce a single unique list of roles with the ID's related
to that role all concatenated together with comma's as separaters.

e.g
Role
ID
AB_F2S_COST_PLAN_CLK EE1,LT10,LT18,LT3,LT7,LT8
AB_F2S_COST_PLAN_DSP TR8,TH2,BM23

Is this possible and if so, how ?
Thanks in advance

Jun 14 '07 #1
5 1979
Hi

The following should help you...

Step 1: Import your delimited file into a table named "Roles" with
"Role" as 1st named Field and "ID" as 2nd named field

e.g.

Role ID
AB_F2S_COST_PLAN_CLK EE1
AB_F2S_COST_PLAN_CLK LT10
AB_F2S_COST_PLAN_CLK LT18
AB_F2S_COST_PLAN_CLK LT3
AB_F2S_COST_PLAN_CLK LT7
AB_F2S_COST_PLAN_CLK LT8
AB_F2S_COST_PLAN_DSP TR8
AB_F2S_COST_PLAN_DSP TH2
AB_F2S_COST_PLAN_DSP BM23
Step 2: Create a new module and add the following function to it

i.e.

Function ConcatDetail1(Num As String) As Variant
Dim rs As DAO.Recordset
Dim strOut As String
Dim strSQL As String
Dim lngLen As Long
Const strcSep = ","

strSQL = "SELECT ID FROM Test WHERE Role = '" & Num & "';"
Set rs = DBEngine(0)(0).OpenRecordset(strSQL)
With rs
Do While Not .EOF
strOut = strOut & !ID & strcSep
.MoveNext
Loop
End With
rs.Close

lngLen = Len(strOut) - Len(strcSep)
If lngLen 0 Then
ConcatDetail1 = Left(strOut, lngLen)
Else
ConcatDetail1 = Null
End If
Set rs = Nothing
End Function
Step 3: Create a new query, add the "Roles" Table to it. Use the SQL
VIEW form of the Query and put the following in it...

i.e.
SELECT DISTINCT Test.Role, ConcatDetail1([Role]) AS ID
FROM Test;

Save the query.

Step 4. Run the query and you should get your required result.

i.e.

AB_F2S_COST_PLAN_CLK EE1,LT10,LT18,LT3,LT7,LT8
AB_F2S_COST_PLAN_DSP TR8,TH2,BM23
Step 5. You can now export the query.

If you need further clarification please ask...

Regards,

osmethod

(N.B. The "ConcatDetail1" function was provided to me by Allen Browne
when I needed a similar resolution to yours..)

See
http://groups.google.ie/group/comp.d...d&rnum=1&hl=en

Jun 14 '07 #2
I have followed your instructions but when I ran the query an error
occured saying that the table 'Test' could not be found. I figured
that my table should be re-named as 'Test' and not 'Roles'. I then ran
the query again and left it for about 40 minutes. It still wasn't
finished so I aborted it. Should it really be taking this long to sort
200,000 records ?!

Jun 15 '07 #3
On Jun 15, 1:14 pm, tom.hepwo...@jt-int.com wrote:
I have followed your instructions but when I ran the query an error
occured saying that the table 'Test' could not be found. I figured
that my table should be re-named as 'Test' and not 'Roles'. I then ran
the query again and left it for about 40 minutes. It still wasn't
finished so I aborted it. Should it really be taking this long to sort
200,000 records ?!
You are correct - table name is test.

Hard to judge the length of time. Some things that may help...

1. Put an index(duplicates ok) on each of the fields in the test
table.

2. Create a smaller file sample of 100 records from your live file and
import it first.Then run the query and see the effect first.

Regards,

osmethod

Jun 15 '07 #4
On Jun 14, 12:18 pm, tom.hepwo...@jt-int.com wrote:
Hi
I have a problem which I hope someone can help me with because I
really don't even know where to start with it.
I am using Access 2003. I have a delimited text file which contains
about 200,000 lines. There are only 2 fields, Role and ID. The role
field contains many duplicates, out of the 200,000 records I think
there are only 16,000 unique roles.

Role ID
AB_F2S_COST_PLAN_CLK EE1
AB_F2S_COST_PLAN_CLK LT10
AB_F2S_COST_PLAN_CLK LT18
AB_F2S_COST_PLAN_CLK LT3
AB_F2S_COST_PLAN_CLK LT7
AB_F2S_COST_PLAN_CLK LT8
AB_F2S_COST_PLAN_DSP TR8
AB_F2S_COST_PLAN_DSP TH2
AB_F2S_COST_PLAN_DSP BM23

I need to produce a single unique list of roles with the ID's related
to that role all concatenated together with comma's as separaters.

e.g
Role
ID
AB_F2S_COST_PLAN_CLK EE1,LT10,LT18,LT3,LT7,LT8
AB_F2S_COST_PLAN_DSP TR8,TH2,BM23

Is this possible and if so, how ?
Thanks in advance
See if this code-behind-form runs quickly enough:

Private Sub cmdProcess_Click()
Const ROLEMAX = 30000
Dim UniqueRole(ROLEMAX) As String
Dim IDList(ROLEMAX) As String
Dim lngI As Long
Dim lngJ As Long
Dim lngMax As Long
Dim lngRecords As Long
Dim MyDB As Database
Dim MyRS As Recordset
Dim OutRS As Recordset
Dim strSQL As String
Dim strID As String
Dim strRole As String
Dim boolRoleFound As Boolean
Dim boolIDFound As Boolean
Dim strTemp As String
Dim intComma As Integer
Dim strTest As String
Dim tdf As TableDef
Dim fld As Field
Dim boolOutputTable As Boolean

Set MyDB = CurrentDb
boolOutputTable = False
For Each tdf In MyDB.TableDefs
If tdf.Name = "tblOutput" Then
boolOutputTable = True
Exit For
End If
Next tdf
If Not boolOutputTable Then
Set tdf = MyDB.CreateTableDef("tblOutput")
Set fld = tdf.CreateField("Role", dbText, 255)
tdf.Fields.Append fld
Set fld = tdf.CreateField("IDList", dbMemo)
tdf.Fields.Append fld
tdf.Fields.Refresh
MyDB.TableDefs.Append tdf
MyDB.TableDefs.Refresh
Set tdf = Nothing
Set fld = Nothing
Else
strSQL = "DELETE tblOutput FROM tblOutput;"
MyDB.Execute strSQL, dbFailOnError
End If
For lngI = 1 To ROLEMAX
UniqueRole(lngI) = ""
IDList(lngI) = ""
Next lngI
strSQL = "SELECT Role, ID FROM Roles;"
Set MyDB = CurrentDb
Set MyRS = MyDB.OpenRecordset(strSQL, dbOpenSnapshot)
MyRS.MoveLast
lngRecords = MyRS.RecordCount
MyRS.MoveFirst
lngMax = 0
For lngI = 1 To lngRecords
strID = MyRS("ID")
strRole = MyRS("Role")
'Search for Role in arrays
If lngMax 0 Then
boolRoleFound = False
For lngJ = 1 To lngMax
If UniqueRole(lngJ) = strRole Then
boolRoleFound = True
'Update the ID list, empty string, single ID, multiple ID no
substring, multipleID substring
If IDList(lngJ) = "" Then
IDList(lngJ) = strID
ElseIf InStr(1, ",", IDList(lngI), vbTextCompare) = 0 And
IDList(lngJ) <strID Then
IDList(lngJ) = IDList(lngJ) & "," & strID
ElseIf InStr(1, strID, IDList(lngI), vbTextCompare) = 0 Then
IDList(lngJ) = IDList(lngJ) & "," & strID
Else
'Check each piece
boolIDFound = False
strTemp = IDList(lngJ)
Do While Len(strTemp) 0
intComma = InStr(1, ",", strTemp, vbTextCompare)
If intComma 0 Then
strTest = Left(strTemp, intComma - 1)
If strTest = strID Then
boolIDFound = True
Exit Do
Else
strTemp = Right(strTemp, Len(strTemp) - Len(strTest))
End If
Else
'only one piece
If strTemp = strID Then boolIDFound = True
strTemp = ""
End If
Loop
If boolIDFound = False Then
IDList(lngJ) = IDList(lngJ) & "," & strID
End If
End If
Exit For
End If
Next lngJ
If boolRoleFound = False Then
'Add a new Role and ID
lngMax = lngMax + 1
UniqueRole(lngMax) = strRole
IDList(lngMax) = strID
End If
Else
UniqueRole(1) = strRole
IDList(1) = strID
lngMax = 1
End If
If lngI <lngRecords Then MyRS.MoveNext
Next lngI
MyRS.Close
Set MyRS = Nothing
'Append array values to table
strSQL = "SELECT * FROM tblOutput;"
Set OutRS = MyDB.OpenRecordset(strSQL, dbOpenDynaset)
For lngI = 1 To lngMax
OutRS.AddNew
OutRS("Role") = UniqueRole(lngI)
If IDList(lngI) <"" Then OutRS("IDList") = IDList(lngI)
OutRS.Update
Next lngI
MsgBox ("Done.")
End Sub

Note that by ordering the query (from "SELECT Role, ID FROM Roles;" to
"SELECT Role, ID FROM Roles ORDER BY Role, ID;") it is possible to
simplify greatly the checks for existing Roles and ID's. The main
idea is that a single pass is made through the table. Note that I
didn't put checks in for when lngMax exceeds ROLEMAX - 1. That should
be done also. This was coded in A97 so you may need 'DAO.' in a few
Dim statements (Recordset and possibly Database). This code can also
be done more elegantly with ADO or by using newer commands such as
Split and Join. Anyway, it should give you a good place to start.

James A. Fortune
CD********@FortuneJames.com

Jun 15 '07 #5
On Jun 15, 5:15 pm, CDMAPos...@FortuneJames.com wrote:
On Jun 14, 12:18 pm, tom.hepwo...@jt-int.com wrote:


Hi
I have a problem which I hope someone can help me with because I
really don't even know where to start with it.
I am using Access 2003. I have a delimited text file which contains
about 200,000 lines. There are only 2 fields, Role and ID. The role
field contains many duplicates, out of the 200,000 records I think
there are only 16,000 unique roles.
Role ID
AB_F2S_COST_PLAN_CLK EE1
AB_F2S_COST_PLAN_CLK LT10
AB_F2S_COST_PLAN_CLK LT18
AB_F2S_COST_PLAN_CLK LT3
AB_F2S_COST_PLAN_CLK LT7
AB_F2S_COST_PLAN_CLK LT8
AB_F2S_COST_PLAN_DSP TR8
AB_F2S_COST_PLAN_DSP TH2
AB_F2S_COST_PLAN_DSP BM23
I need to produce a single unique list of roles with the ID's related
to that role all concatenated together with comma's as separaters.
e.g
Role
ID
AB_F2S_COST_PLAN_CLK EE1,LT10,LT18,LT3,LT7,LT8
AB_F2S_COST_PLAN_DSP TR8,TH2,BM23
Is this possible and if so, how ?
Thanks in advance

See if this code-behind-form runs quickly enough:
...
Note that by ordering the query (from "SELECT Role, ID FROM Roles;" to
"SELECT Role, ID FROM Roles ORDER BY Role, ID;") it is possible to
simplify greatly the checks for existing Roles and ID's. The main
idea is that a single pass is made through the table. Note that I
didn't put checks in for when lngMax exceeds ROLEMAX - 1. That should
be done also. This was coded in A97 so you may need 'DAO.' in a few
Dim statements (Recordset and possibly Database). This code can also
be done more elegantly with ADO or by using newer commands such as
Split and Join. Anyway, it should give you a good place to start.

James A. Fortune
CDMAPos...@FortuneJames.com
The good news:

I created a Roles table with 200,000 records and a lot of duplicates.
The code ran in about a minute on a slow machine. A version with an
ordered recordset and simplified logic should be run quite quickly.

The bad news:

It didn't prevent some duplicate ID's from ending up in
tblOutput.IDList. The simplified logic using an ordered recordset
should handle that problem nicely.

Post back if you need help with this for either the unordered or
ordered recordset cases.

James A. Fortune
CD********@FortuneJames.com

Jun 17 '07 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Elmo Watson | last post by:
I've been asked to develop a semi-automated type situation where we have a database table (sql server) and periodically, there will be a comma delimited file from which we need to import the data,...
1
by: sparks | last post by:
I have never done this and wanted to ask people who have what is the best way. One person said import it to excel, then import it into access table. but since this will be done a lot, I am...
8
by: harry | last post by:
Hi Folks: I'm trying to do something that looks simple, but I can't make it work right. It's Access 2000 on a Win2000 computer. I create a database with 5 columbs. The data I need to import...
1
by: don | last post by:
I'm trying to import a comma delimited text file into MS Access 2002 version - the first column is set for an auto increment primary key and everything works right the first time I import a text...
10
by: hendafe | last post by:
I have a problem of importing a text file into Access. The text file will be imported using a delimiter, say ( , ) for example. This should be imported into the database. An example is: John,...
4
by: Kathie via AccessMonster.com | last post by:
Hello, I have to import monthly, files that were once *.csv but due to commas in addresses, the interface program was changed to dump tab delimited. Now my code is not finding the files in the...
17
by: OdAwG | last post by:
Just some questions regarding tables. I am new Access Database and need a little help. I have the following data listed below 01. I have a table called tbl_Customer with the following...
5
by: Andy | last post by:
Hello All: I am importing an Excel / CSV file. The problem I am having is: the columns are being defined for me as int32 (able to determine by using the ..GetFieldType method). As a result: ...
2
by: denisel | last post by:
Hi, We will be conducting surveys through SurveyMonkey online and will be importing the answers by tab delimited or comma delimited file into access. I was wondering if there is specific way to...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.