Compare 2 files and discard common lines

I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?

Jun 27 '08 #1
On May 29, 6:36 pm, loial <jldunn2...@googlemail.com> wrote:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?
You can use the cmp(x, y) function to tell if a string is similar to
another.

Calling cmp('spam', 'eggs') returns 1, because 'spam' sorts after 'eggs'
when strings are compared character by character.
Swapping the two gives -1,
and comparing 'eggs' with 'eggs' gives 0.

Is that what you were looking for?
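
For readers on Python 3, where the built-in cmp() no longer exists, the same three-way result can be sketched with ordinary comparison operators. The helper name three_way_compare is made up for this illustration and does not come from the thread.

```python
def three_way_compare(x, y):
    """Return 1, -1 or 0, mirroring what Python 2's cmp(x, y) returned."""
    # Booleans are integers in Python, so this is (1 or 0) minus (1 or 0).
    return (x > y) - (x < y)


print(three_way_compare('spam', 'eggs'))  # 1: 's' sorts after 'e'
print(three_way_compare('eggs', 'spam'))  # -1
print(three_way_compare('eggs', 'eggs'))  # 0
```

Note that this only orders two strings; it says nothing about which lines two files have in common.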
Jun 27 '08 #2
Kalibr wrote:
On May 29, 6:36 pm, loial <jldunn2...@googlemail.com> wrote:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?

You can use the cmp(x, y) function to tell if a string is similar to
another.
Or "==".

Stefan
Jun 27 '08 #3
loial wrote:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.
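import sys

# Lines of the 1st file, kept in a set for fast membership tests
lines_in_file1 = set(open("file1"))
for line in open("file2"):
    # Keep only lines from the 2nd file that are missing from the 1st
    if line not in lines_in_file1:
        sys.stdout.write(line)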

Stefan
Jun 27 '08 #4
On May 29, 10:36 am, loial <jldunn2...@googlemail.com> wrote:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?
How large are the files? You could load up the smaller file into
memory then while iterating over the other one just do 'if line in
other_files_lines:' and do your processing from there. By your
description it doesn't sound like you want to iterate over both files
simultaneously and do a line for line comparison because that would
mean if someone plonks an extra newline somewhere it wouldn't gel.
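
A minimal sketch of that idea, assuming the 1st file is the one that fits in memory (it has to be the one loaded, since it is the file being tested against). The function name write_new_lines and its parameters are hypothetical, not from the thread.

```python
def write_new_lines(first_path, second_path, out_path):
    """Write lines of second_path that never occur in first_path."""
    # Only the first file is held in memory, as a set for fast lookups.
    with open(first_path) as first:
        seen = set(first)
    count = 0
    # The second file is streamed line by line, so its order is preserved.
    with open(second_path) as second, open(out_path, 'w') as out:
        for line in second:
            if line not in seen:
                out.write(line)
                count += 1
    return count
```

Called as write_new_lines('file1', 'file2', 'file3'), it returns the number of lines written. Lines are compared including their newline, so a final line without a trailing newline will not match the same text followed by one.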
Jun 27 '08 #5
Another way of doing this might be to use the difflib module to
calculate the differences. It provides a SequenceMatcher class, which
has the method get_matching_blocks().

difflib is included with Python.
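
As a rough illustration of the difflib route, the sketch below uses SequenceMatcher.get_opcodes() rather than get_matching_blocks(), because the opcodes name the unmatched spans directly. The helper lines_added and the sample data are invented for this example.

```python
import difflib


def lines_added(first_lines, second_lines):
    """Collect lines difflib reports as inserted or replaced in the second sequence."""
    matcher = difflib.SequenceMatcher(None, first_lines, second_lines)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'insert' and 'replace' spans cover second_lines[j1:j2],
        # which had no match at that position in first_lines.
        if tag in ('insert', 'replace'):
            added.extend(second_lines[j1:j2])
    return added


old = ['x\n', 'y\n', 'z\n']
new = ['x\n', 'new\n', 'z\n', 'extra\n']
print(lines_added(old, new))  # ['new\n', 'extra\n']
```

difflib compares by position, so a line that merely moved within the 2nd file can be reported here even though it also exists in the 1st file. For a pure "in 2nd but not in 1st" test, a set lookup is likely the closer fit.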
On May 29, 2:02 pm, Chris <cwi...@gmail.com> wrote:
On May 29, 10:36 am, loial <jldunn2...@googlemail.com> wrote:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.
Rather than re-invent the wheel I am wondering if anyone has written
anything already?

How large are the files ? You could load up the smallest file into
memory then while iterating over the other one just do 'if line in
other_files_lines:' and do your processing from there. By your
description it doesn't sound like you want to iterate over both files
simultaneously and do a line for line comparison because that would
mean if someone plonks an extra newline somewhere it wouldn't gel.
Jun 27 '08 #6
On May 29, 6:36 pm, loial <jldunn2...@googlemail.com> wrote:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?
file1 = set(open('file1'))
file2 = set(open('file2'))
# Lines present in file2 but not in file1 (order and duplicates are lost)
file3 = file2.difference(file1)
open('file3', 'w').writelines(file3)
Jun 27 '08 #7
On May 29, 1:36 am, loial <jldunn2...@googlemail.com> wrote:
only those lines that appear in the 2nd file but not in the 1st file.
set(file_2_recs).difference(set(file_1_recs)) will give the recs in
file_2 that are not in file_1, if you can store both files in memory.
Sets are hash-based, so membership tests are faster than with lists.
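
The speed difference can be checked with a small, unscientific timing sketch. The record values and repeat count below are arbitrary choices for illustration, and the printed timings will vary from machine to machine.

```python
import timeit

records = [str(n) for n in range(5000)]
as_list = records
as_set = set(records)

# Both containers answer the membership question identically...
assert ('4999' in as_list) == ('4999' in as_set)

# ...but the list scans its elements while the set does a hash lookup.
list_time = timeit.timeit(lambda: '4999' in as_list, number=1000)
set_time = timeit.timeit(lambda: '4999' in as_set, number=1000)
print('list: %.6f s, set: %.6f s' % (list_time, set_time))
```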
Jun 27 '08 #8
open('3rd', 'w').writelines(set(open('2nd').readlines())-set(open('1st')))

2008/5/29, loial <jl********@googlemail.com>:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?

--
mvh Björn
Jun 27 '08 #9
2008/5/29, loial <jl********@googlemail.com>:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.
En Thu, 29 May 2008 18:08:28 -0300, BJörn Lindqvist <bj*****@gmail.com>
escribió:
open('3rd','w').writelines(set(open('2nd').readlines())-set(open('1st')))
Is the asymmetry 1st/2nd intentional? I think one could omit .readlines()
in 2nd file too.
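
A quick way to convince yourself that the two spellings are interchangeable is the sketch below, which builds the set both ways from a throwaway temporary file. The file contents are made up for the demonstration.

```python
import os
import tempfile

# Throwaway file for the demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as handle:
    handle.write('alpha\nbeta\nalpha\n')
    path = handle.name

with open(path) as f:
    via_readlines = set(f.readlines())
with open(path) as f:
    via_iteration = set(f)  # a file object yields its lines when iterated

os.remove(path)
print(via_readlines == via_iteration)  # True
```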

--
Gabriel Genellina

Jun 27 '08 #10
On May 29, 3:36 am, loial <jldunn2...@googlemail.com> wrote:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?
Take the time to learn difflib - it is a standard module, and good for
general comparison of files, sequences, etc.

-- Paul
Jun 27 '08 #11
On Thu, 29 May 2008 01:36:44 -0700, loial wrote:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?
Of course, you can do this at any Linux or Unix command line, provided
both files are sorted:

comm -13 file1 file2 >file3
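
For comparison, here is a sketch of the same merge-style walk in Python, assuming both inputs are already sorted lists of lines. The function name only_in_second is hypothetical, and Python compares strings by code point, which may not match the collation order comm uses in your locale.

```python
def only_in_second(sorted_a, sorted_b):
    """Return lines of sorted_b with no partner in sorted_a."""
    result = []
    i = 0
    for line in sorted_b:
        # Skip lines of the first input that sort before the current line.
        while i < len(sorted_a) and sorted_a[i] < line:
            i += 1
        if i < len(sorted_a) and sorted_a[i] == line:
            i += 1  # pair this line with its match and consume it
        else:
            result.append(line)
    return result


print(only_in_second(['a', 'b', 'd'], ['b', 'c', 'd', 'e']))  # ['c', 'e']
```

Unlike the set-based answers, this pairs duplicate lines one-to-one, so a line that appears twice in the 2nd input but once in the 1st is reported once.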
Jun 27 '08 #12
Lie
On May 29, 3:36 pm, loial <jldunn2...@googlemail.com> wrote:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?
It's so easy to do that it won't count as reinventing the wheel:

a = open('a.txt', 'r').read().split('\n')
b = open('b.txt', 'r').read().split('\n')
c = open('c.txt', 'w')
c.write('\n'.join([comm for comm in b if not (comm in a)]))
c.close()

It's not the fastest approach, since every lookup scans the whole list a, but it works.
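
One detail worth knowing about the split('\n') calls above: when a file ends with a newline, the resulting list carries a trailing empty string, whereas splitlines() does not. A small illustration with made-up text:

```python
text = 'one\ntwo\n'

print(text.split('\n'))   # ['one', 'two', '']
print(text.splitlines())  # ['one', 'two']
```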
Jun 27 '08 #13
