
Compare 2 files and discard common lines

I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?

Jun 27 '08 #1
12 Replies


On May 29, 6:36 pm, loial <jldunn2...@googlemail.com> wrote:
> I have a requirement to compare 2 text files and write to a 3rd file
> only those lines that appear in the 2nd file but not in the 1st file.
>
> Rather than re-invent the wheel I am wondering if anyone has written
> anything already?

You can use the cmp(x, y) function to compare two strings. Note that it
reports how they sort relative to each other, not how similar they are.

cmp('spam', 'eggs') returns 1, because 'spam' sorts after 'eggs'
('s' comes after 'e'),
swapping the two gives -1,
and cmp('eggs', 'eggs') gives 0.

Is that what you were looking for?
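
For anyone reading this on Python 3, where cmp() no longer exists, here is a minimal sketch of the same three-way comparison. The helper name three_way is made up for illustration. It only shows that the result reflects sort order, so on its own it does not answer the "which lines are missing" question.

```python
def three_way(x, y):
    """Return -1, 0 or 1 like Python 2's cmp(); the name is hypothetical."""
    # Booleans behave as 0/1, so the difference is -1, 0 or 1.
    return (x > y) - (x < y)


print(three_way('spam', 'eggs'))  # 1, because 's' sorts after 'e'
print(three_way('eggs', 'spam'))  # -1
print(three_way('eggs', 'eggs'))  # 0
```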
Jun 27 '08 #2

Kalibr wrote:
> On May 29, 6:36 pm, loial <jldunn2...@googlemail.com> wrote:
>> I have a requirement to compare 2 text files and write to a 3rd file
>> only those lines that appear in the 2nd file but not in the 1st file.
>>
>> Rather than re-invent the wheel I am wondering if anyone has written
>> anything already?
>
> You can use the cmp(x, y) function to tell if a string is similar to
> another.

Or "==".

Stefan
Jun 27 '08 #3

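loial wrote:
> I have a requirement to compare 2 text files and write to a 3rd file
> only those lines that appear in the 2nd file but not in the 1st file.

# Lines of file1 go into a set for fast membership tests.
lines_in_file1 = set(open("file1"))
with open("file3", "w") as out:
    for line in open("file2"):
        # Keep only the lines of file2 that file1 does not contain.
        if line not in lines_in_file1:
            out.write(line)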

Stefan
Jun 27 '08 #4

On May 29, 10:36 am, loial <jldunn2...@googlemail.com> wrote:
> I have a requirement to compare 2 text files and write to a 3rd file
> only those lines that appear in the 2nd file but not in the 1st file.
>
> Rather than re-invent the wheel I am wondering if anyone has written
> anything already?

How large are the files? You could load the smaller file into
memory, then while iterating over the other one just do 'if line in
other_files_lines:' and do your processing from there. From your
description it doesn't sound like you want to iterate over both files
simultaneously and do a line-for-line comparison, because then a single
extra newline somewhere would throw every later comparison out of step.
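
A rough sketch of that idea, assuming the 1st file is the one that fits in memory. The function name and the sample data are invented for illustration. Lists stand in for the open files here so the snippet runs by itself.

```python
def lines_only_in_second(first_lines, second_lines):
    """Yield lines of second_lines that never occur in first_lines, in order."""
    seen = set(first_lines)       # the first file is held in memory
    for line in second_lines:     # the second file is streamed line by line
        if line not in seen:
            yield line


# In-memory stand-ins for the two files (hypothetical data).
file1 = ["alpha\n", "beta\n", "gamma\n"]
file2 = ["beta\n", "delta\n", "alpha\n", "epsilon\n"]
result = list(lines_only_in_second(file1, file2))
print(result)  # ['delta\n', 'epsilon\n']

# With real files, something along these lines should work:
# with open("file1") as f1, open("file2") as f2, open("file3", "w") as out:
#     out.writelines(lines_only_in_second(f1, f2))
```

Unlike a plain set difference, this keeps the order of the 2nd file and any repeated lines in it.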
Jun 27 '08 #5

Another way of doing this might be to use the difflib module to
calculate the differences. Its SequenceMatcher class has a
get_matching_blocks() method.

difflib is included with Python.
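
A small sketch of how that could look. It uses SequenceMatcher.get_opcodes() rather than get_matching_blocks(), because the opcodes name the inserted ranges directly. The function name and sample data are made up. One caveat: difflib compares by position, so a line that merely moved within the file may be reported as added, which is not quite what the original question asks for.

```python
import difflib


def added_lines(first_lines, second_lines):
    """Collect the lines SequenceMatcher reports as inserted or replaced."""
    matcher = difflib.SequenceMatcher(None, first_lines, second_lines,
                                      autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'insert' and 'replace' ranges hold content found only on the
        # second side at that position.
        if tag in ("insert", "replace"):
            out.extend(second_lines[j1:j2])
    return out


file1 = ["a\n", "b\n", "c\n"]
file2 = ["a\n", "x\n", "b\n", "c\n", "y\n"]
print(added_lines(file1, file2))  # ['x\n', 'y\n']
```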

On May 29, 2:02 pm, Chris <cwi...@gmail.com> wrote:
> On May 29, 10:36 am, loial <jldunn2...@googlemail.com> wrote:
>> I have a requirement to compare 2 text files and write to a 3rd file
>> only those lines that appear in the 2nd file but not in the 1st file.
>> Rather than re-invent the wheel I am wondering if anyone has written
>> anything already?
>
> How large are the files ? You could load up the smallest file into
> memory then while iterating over the other one just do 'if line in
> other_files_lines:' and do your processing from there. By your
> description it doesn't sound like you want to iterate over both files
> simultaneously and do a line for line comparison because that would
> mean if someone plonks an extra newline somewhere it wouldn't gel.
Jun 27 '08 #6

On May 29, 6:36 pm, loial <jldunn2...@googlemail.com> wrote:
> I have a requirement to compare 2 text files and write to a 3rd file
> only those lines that appear in the 2nd file but not in the 1st file.
>
> Rather than re-invent the wheel I am wondering if anyone has written
> anything already?

file1 = set((x for x in open('file1')))
file2 = set((x for x in open('file2')))
file3 = file2.difference(file1)
open('file3','w').writelines(file3)
Jun 27 '08 #7

On May 29, 1:36 am, loial <jldunn2...@googlemail.com> wrote:
> only those lines that appear in the 2nd file but not in the 1st file.

set(file_2_recs).difference(set(file_1_recs)) will give the recs in
file_2 that are not in file_1, if you can store both files in memory.
Sets are hash-based, so membership tests are much faster than scanning a list.
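
If you want to see the difference for yourself, here is a quick sketch using timeit. The data is synthetic and the sizes are arbitrary. Actual timings depend on the machine, so none are quoted here.

```python
import timeit

# Synthetic stand-ins for the two files (hypothetical data).
first = ["line %d\n" % i for i in range(2000)]
second = ["line %d\n" % i for i in range(1000, 3000)]


def with_list():
    # Each 'not in' test scans the list from the start.
    return [line for line in second if line not in first]


def with_set():
    # Build the set once; each lookup is then a hash probe.
    first_set = set(first)
    return [line for line in second if line not in first_set]


assert with_list() == with_set()   # same answer either way
print("list:", timeit.timeit(with_list, number=1))
print("set: ", timeit.timeit(with_set, number=1))
```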
Jun 27 '08 #8

open('3rd', 'w').writelines(set(open('2nd').readlines())-set(open('1st')))

2008/5/29, loial <jl********@googlemail.com>:
> I have a requirement to compare 2 text files and write to a 3rd file
> only those lines that appear in the 2nd file but not in the 1st file.
>
> Rather than re-invent the wheel I am wondering if anyone has written
> anything already?

--
mvh Björn
Jun 27 '08 #9

2008/5/29, loial <jl********@googlemail.com>:
>> I have a requirement to compare 2 text files and write to a 3rd file
>> only those lines that appear in the 2nd file but not in the 1st file.

On Thu, 29 May 2008 18:08:28 -0300, BJörn Lindqvist <bj*****@gmail.com>
wrote:
> open('3rd','w').writelines(set(open('2nd').readlines())-set(open('1st')))

Is the asymmetry between 1st and 2nd intentional? I think one could omit
.readlines() for the 2nd file too.
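
A tiny check of that point, with io.StringIO standing in for real files so the snippet runs anywhere. The sample text is made up.

```python
import io

text = "one\ntwo\nthree\n"

via_readlines = set(io.StringIO(text).readlines())
via_iteration = set(io.StringIO(text))   # a file object iterates line by line

print(via_readlines == via_iteration)    # True
```

So set(open('2nd')) and set(open('2nd').readlines()) should build the same set.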

--
Gabriel Genellina

Jun 27 '08 #10

On May 29, 3:36 am, loial <jldunn2...@googlemail.com> wrote:
> I have a requirement to compare 2 text files and write to a 3rd file
> only those lines that appear in the 2nd file but not in the 1st file.
>
> Rather than re-invent the wheel I am wondering if anyone has written
> anything already?

Take the time to learn difflib - it is a standard module, and good for
general comparison of files, sequences, etc.
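
As a starting point, a short sketch with difflib.ndiff, which prefixes lines found only in the second sequence with "+ ". The sample data is invented. Keep in mind this is a positional diff, so a line that exists in both files but in a different place can still show up as added.

```python
import difflib

# Hypothetical contents of the two files.
file1 = ["a\n", "b\n", "c\n"]
file2 = ["a\n", "c\n", "d\n"]

# ndiff marks each output line with a two-character code:
# "- " only in the first, "+ " only in the second, "  " in both.
added = [d[2:] for d in difflib.ndiff(file1, file2) if d.startswith("+ ")]
print(added)  # ['d\n']
```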

-- Paul
Jun 27 '08 #11

On Thu, 29 May 2008 01:36:44 -0700, loial wrote:
> I have a requirement to compare 2 text files and write to a 3rd file
> only those lines that appear in the 2nd file but not in the 1st file.
>
> Rather than re-invent the wheel I am wondering if anyone has written
> anything already?

Of course you can do this at any Linux or Unix command line, provided both
files are already sorted (comm expects sorted input), simply by:

comm -13 file1 file2 >file3
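
For comparison, a rough Python sketch of what comm -13 does on sorted input: walk both sequences in step and keep what only the second one has. The function name and data are made up, and Python's plain string ordering stands in for the locale collation that comm uses.

```python
def comm_13(first_sorted, second_sorted):
    """Yield lines unique to second_sorted; both inputs must be sorted."""
    first_iter = iter(first_sorted)
    current = next(first_iter, None)
    for line in second_sorted:
        # Advance the first input until it catches up with this line.
        while current is not None and current < line:
            current = next(first_iter, None)
        if current == line:
            # Matched pair: consume the line from the first input too.
            current = next(first_iter, None)
        else:
            yield line


first = sorted(["pear", "apple", "fig"])
second = sorted(["fig", "kiwi", "apple", "plum"])
print(list(comm_13(first, second)))  # ['kiwi', 'plum']
```

Because both inputs are read in a single pass, neither file has to fit in memory.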
Jun 27 '08 #12

Lie
On May 29, 3:36 pm, loial <jldunn2...@googlemail.com> wrote:
> I have a requirement to compare 2 text files and write to a 3rd file
> only those lines that appear in the 2nd file but not in the 1st file.
>
> Rather than re-invent the wheel I am wondering if anyone has written
> anything already?

It's so easy to do that it won't count as reinventing the wheel:

a = open('a.txt', 'r').read().split('\n')
b = open('b.txt', 'r').read().split('\n')
c = open('c.txt', 'w')
c.write('\n'.join([comm for comm in b if not (comm in a)]))
c.close()

It's not the fastest approach (every 'comm in a' test scans the whole list), but it works.
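
One thing splitting on newlines gets right is a last line with no trailing newline. A small sketch of that case, using str.splitlines() and made-up sample text:

```python
first_text = "alpha\nbeta\ngamma"      # last line has no trailing newline
second_text = "gamma\ndelta\n"

# Keeping the line endings makes 'gamma' and 'gamma\n' look different.
with_endings = set(first_text.splitlines(True))
naive = [l for l in second_text.splitlines(True) if l not in with_endings]
print(naive)           # ['gamma\n', 'delta\n']

# Dropping the line endings compares the text only.
first = set(first_text.splitlines())
only_in_second = [l for l in second_text.splitlines() if l not in first]
print(only_in_second)  # ['delta']
```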
Jun 27 '08 #13
