Avoiding dupes when merging files

Hi all.

I currently have 2 text files which contain lists of file names. These
text files are updated by my code. What I want to do is be able to
merge these text files discarding the duplicates.

And to make it harder (or not???!!), my criterion for a duplicate is
the leftmost 15 (or so) characters of the file path.
Help, as always, is greatly appreciated!

Thanks

Nov 21 '05 #1
Take a look at the Microsoft Text Driver - you can run SQL queries on the
text file. Perhaps you can just query each file checking for dupes?

Or you could load the data into a DataTable (or a hash table type object),
with the primary key set to the file name. If a duplicate shows up, the
DataTable should throw a ConstraintException, which you would catch and ignore.

Or lastly... perhaps you should think of a different method of storing the
data? Maybe a database is a better idea than text files?

--
Lucas Tam (RE********@rogers.com)
Please delete "REMOVE" from the e-mail address when replying.
http://members.ebay.com/aboutme/coolspot18/
Nov 21 '05 #2
I like the idea of the PK exception, as it will give me an error that I can
trap. I am being forced to use text files, though, for simplicity. Do you
have any sample code for implementing a DataTable with a PK exception? This
is new to me!

Bob
Nov 21 '05 #3
Here's the example from MSDN:

http://msdn.microsoft.com/library/de...l=/library/en-us/cpref/html/frlrfsystemdatadatatableclassprimarykeytopic.asp

I've used it a couple of times and it works fine.

Here is what you do in short:

1. Add your columns to a DataTable.
2. Add the key column from step 1 to a primary key array (an array of DataColumn).
3. Assign the primary key array to the DataTable.PrimaryKey property.
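
In code, those three steps might look something like the sketch below. It is not the MSDN sample: the module name, the "Key" and "Path" column names, and the KeyOf and TryAddLine helpers are all made up for illustration, and it assumes the duplicate test is the left keyLength characters of each line.

```vbnet
Imports System
Imports System.Data

Module PrimaryKeyMergeSketch

    ' Hypothetical helper: the left keyLength characters decide what is a duplicate.
    ' Math.Min guards against lines shorter than keyLength.
    Function KeyOf(ByVal line As String, ByVal keyLength As Integer) As String
        Return line.Substring(0, Math.Min(keyLength, line.Length))
    End Function

    Function BuildTable() As DataTable
        Dim table As New DataTable("Files")
        ' Step 1: add the columns.
        Dim keyColumn As DataColumn = table.Columns.Add("Key", GetType(String))
        table.Columns.Add("Path", GetType(String))
        ' Steps 2 and 3: put the key column in an array and assign it to PrimaryKey.
        table.PrimaryKey = New DataColumn() {keyColumn}
        Return table
    End Function

    ' Returns True if the line was added, False if its key was already in the table.
    Function TryAddLine(ByVal table As DataTable, ByVal line As String, ByVal keyLength As Integer) As Boolean
        Try
            table.Rows.Add(New Object() {KeyOf(line, keyLength), line})
            Return True
        Catch ex As ConstraintException
            ' Duplicate primary key: ignore the line.
            Return False
        End Try
    End Function

End Module
```

Calling TryAddLine for every line of both files and then writing out the Path column of each row in table.Rows should give the merged list, with the first line seen for each key being the one that is kept.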

--
Lucas Tam (RE********@rogers.com)
Please delete "REMOVE" from the e-mail address when replying.
http://members.ebay.com/aboutme/coolspot18/
Nov 21 '05 #4
Thanks for the fast reply. I have to use text files, so a database really is
not an option. Any pointers or some sample code on how to use the DataTable? I
like the idea of being able to trap a duplicate PK error.

Bob
Nov 21 '05 #5
I replied to your message a bit earlier in the day, but I'm not sure if
you received it:

Here's the example from MSDN (particularly the SetPrimaryKeys Sub):

http://msdn.microsoft.com/library/de...l=/library/en-us/cpref/html/frlrfsystemdatadatatableclassprimarykeytopic.asp

I've used it a couple of times and it works fine.

Here is what you do in short:

1. Add your columns to a DataTable.
2. Add the key column from step 1 to a primary key array (an array of DataColumn).
3. Assign the primary key array to the DataTable.PrimaryKey property.
--
Lucas Tam (RE********@rog ers.com)
Please delete "REMOVE" from the e-mail address when replying.
http://members.ebay.com/aboutme/coolspot18/
Nov 21 '05 #6
Thanks for this. But I guess I need something a little more basic, and
something that works in memory or straight to disk. I guess I'll keep playing
with the loops.

--
Bob Hollness

-------------------------------------
I'll have a B please Bob
Nov 21 '05 #7
OK. This is the solution I came up with. It is not as elegant as one would
have hoped, but then again, only I get to see how it functions under the
bonnet (hood for the Americans)! And of course, it is still to be tidied up
and made pretty. Feel free to pull it apart and embarrass me.......
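' Needs "Imports System.IO" at the top of the file.
Sub FindDupes(ByVal File2Compare As String, ByVal OriginalFile As String, ByVal OutputFile As String)

    ' Writes every line of File2Compare that is not found in OriginalFile
    ' to OutputFile. Whole lines are compared, not just the left 15 characters.
    Dim File1Reader As New StreamReader(File2Compare)
    Dim File2Reader As StreamReader
    Dim File3Writer As New StreamWriter(OutputFile)
    Dim Line1 As String
    Dim Line2 As String
    Dim Found As Boolean

    Do
        Line1 = File1Reader.ReadLine()
        Found = False

        If Not Line1 Is Nothing Then

            ' Re-read the original file from the top for each candidate line.
            File2Reader = New StreamReader(OriginalFile)

            Do
                Line2 = File2Reader.ReadLine()
                ' Check for Nothing first: VB treats Nothing as "" when
                ' comparing strings, so a blank line would match end of file.
                If Not Line2 Is Nothing AndAlso Line1 = Line2 Then
                    Found = True
                    Exit Do
                End If
            Loop Until Line2 Is Nothing

            If Found = False Then
                File3Writer.WriteLine(Line1)
            End If

            File2Reader.Close()

        End If
    Loop Until Line1 Is Nothing

    ' File2Reader is already closed inside the loop.
    File1Reader.Close()
    File3Writer.Close()

End Sub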

--
Bob Hollness

-------------------------------------
I'll have a B please Bob
Nov 21 '05 #8
"Bob Hollness" <bo*@blockbuster.com> wrote in
news:uUD3YV00EHA.1392@TK2MSFTNGP14.phx.gbl:
> Feel free to pull it apart and embarrass me.......

Very inefficient when compared to Cor's elegant example of a hash table!
The nested loops reopen and re-read the original file for every line of the
other file.

Nov 21 '05 #9
As Cor suggested, use a Hashtable (or you might call it a Dictionary). It will
be much more efficient, and easier to code....

Paste the following into a routine to see it in action:

HTH
LFS
    Dim item As String
    Dim hash As New System.Collections.Hashtable

    ' Stand-ins for the lines read from the two files.
    Dim file1 As String() = New String() { _
        "Pretend this is text from a file.", _
        "It is contained in an array only for", _
        "demo purposes."}
    Dim file2 As String() = New String() { _
        "This is the text from a second file.", _
        "The next line is a duplicate line and", _
        "will overwrite the original entry:", _
        "It is contained (DUPLICATE)", _
        "Only the first 10 characters", _
        "were used toward duplicate testing."}

    ' Key each line on its first 10 characters. Assigning to an existing
    ' key replaces the earlier line. (Substring throws if a line is
    ' shorter than 10 characters.)
    For Each item In file1
        hash.Item(item.Substring(0, 10)) = item
    Next

    For Each item In file2
        hash.Item(item.Substring(0, 10)) = item
    Next

    ' Print whatever survived the merge.
    Dim entry As System.Collections.DictionaryEntry
    For Each entry In hash
        Debug.WriteLine(entry.Value)
    Next

    Debug.WriteLine("")
    Debug.WriteLine("Note that the order is not maintained, and")
    Debug.WriteLine("the duplicate line's original value was")
    Debug.WriteLine("overwritten by the later (duplicate) entry.")
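
If the original order matters and the first occurrence should win instead of the last, one possible variation is to use the Hashtable only to remember which keys have been seen, and to keep the lines themselves in an ArrayList. This is a rough sketch, assuming the key length is passed in by the caller. MergeFiles, AddUnique, ReadAllLinesFrom and KeyOf are hypothetical names, not part of the code above:

```vbnet
Imports System
Imports System.Collections
Imports System.IO

Module OrderedMergeSketch

    ' Hypothetical helper: the left keyLength characters decide what is a duplicate.
    ' Math.Min guards against lines shorter than keyLength.
    Function KeyOf(ByVal line As String, ByVal keyLength As Integer) As String
        Return line.Substring(0, Math.Min(keyLength, line.Length))
    End Function

    ' Appends to output every line whose key has not been seen before.
    Sub AddUnique(ByVal lines As IEnumerable, ByVal keyLength As Integer, _
                  ByVal seen As Hashtable, ByVal output As ArrayList)
        For Each line As String In lines
            Dim key As String = KeyOf(line, keyLength)
            If Not seen.ContainsKey(key) Then
                seen.Add(key, line)
                output.Add(line)
            End If
        Next
    End Sub

    ' Reads a text file into an ArrayList, one entry per line.
    Function ReadAllLinesFrom(ByVal filePath As String) As ArrayList
        Dim lines As New ArrayList
        Dim reader As New StreamReader(filePath)
        Try
            Dim line As String = reader.ReadLine()
            While Not line Is Nothing
                lines.Add(line)
                line = reader.ReadLine()
            End While
        Finally
            reader.Close()
        End Try
        Return lines
    End Function

    ' Merges two files into outputFile, keeping the first line seen for each key.
    Sub MergeFiles(ByVal firstFile As String, ByVal secondFile As String, _
                   ByVal outputFile As String, ByVal keyLength As Integer)
        Dim seen As New Hashtable
        Dim merged As New ArrayList
        AddUnique(ReadAllLinesFrom(firstFile), keyLength, seen, merged)
        AddUnique(ReadAllLinesFrom(secondFile), keyLength, seen, merged)

        Dim writer As New StreamWriter(outputFile)
        Try
            For Each line As String In merged
                writer.WriteLine(line)
            Next
        Finally
            writer.Close()
        End Try
    End Sub

End Module
```

Each file is read only once, and each line costs one Hashtable lookup, so the work grows with the total number of lines instead of with the product of the two file sizes. Note that Hashtable keys are case-sensitive here, so paths that differ only in case would both be kept.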

Nov 21 '05 #10
