Jan 1 02:32:40 hello welcome to python world
Jan 1 02:32:40 hello welcome to python world
Mar 31 23:31:55 learn python
Mar 31 23:31:55 learn python be smart
Mar 31 23:31:56 python is good scripting language
Jan 1 00:00:01 hello welcome to python world
Jan 1 00:00:02 hello welcome to python world
Mar 31 23:31:55 learn python
Mar 31 23:31:56 python is good scripting language
The expected output file (say, outputfile.txt) should contain the following records:
Jan 1 02:32:40 hello welcome to python world
Jan 1 02:32:40 hello welcome to python world
Mar 31 23:31:55 learn python
Mar 31 23:31:55 learn python be smart
Mar 31 23:31:56 python is good scripting language
Jan 1 00:00:01 hello welcome to python world
Jan 1 00:00:02 hello welcome to python world
Note: I need to keep all records that start with "Jan 1" (including duplicates), but I want to drop duplicate records that do not start with "Jan 1".
I have tried the following program, but it deletes all duplicate records, including the ones starting with "Jan 1".
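from collections import OrderedDict

def remove_Duplicate_Lines(inputfile, outputfile):
    with open(inputfile) as fin, open(outputfile, 'w') as out:
        # Strip trailing whitespace and skip blank lines
        lines = (line.rstrip() for line in fin)
        # OrderedDict keys keep only the first occurrence of each line, in order
        unique_lines = OrderedDict.fromkeys(line for line in lines if line)
        # Iterating the dict yields its keys (works on Python 2 and 3)
        out.write("\n".join(unique_lines))
    return 0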
This is the output I am getting:
Jan 1 02:32:40 hello welcome to python world
Mar 31 23:31:55 learn python
Mar 31 23:31:55 learn python be smart
Mar 31 23:31:56 python is good scripting language
Jan 1 00:00:01 hello welcome to python world
Your help would be appreciated!!!
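One possible approach, as a minimal sketch rather than a definitive solution: read the file line by line, always write lines that start with the protected prefix, and write any other line only the first time it is seen. The function name `remove_duplicate_lines` and the `keep_prefix` parameter are hypothetical names chosen for illustration. The sketch assumes the prefix is followed by a single space (as in the sample records above), so the default is "Jan 1 " with a trailing space to avoid also matching "Jan 10" through "Jan 19". If the real log pads the day with two spaces, the prefix would need adjusting.

```python
def remove_duplicate_lines(inputfile, outputfile, keep_prefix="Jan 1 "):
    """Copy inputfile to outputfile, dropping repeated lines
    unless they start with keep_prefix (hypothetical parameter)."""
    seen = set()  # non-protected lines already written
    with open(inputfile) as fin, open(outputfile, "w") as out:
        for raw in fin:
            line = raw.rstrip()
            if not line:
                continue  # skip blank lines, as the original attempt does
            if line.startswith(keep_prefix):
                # Protected records are always kept, duplicates included
                out.write(line + "\n")
            elif line not in seen:
                # Other records are kept only on first occurrence
                seen.add(line)
                out.write(line + "\n")


# Example call, assuming the input is saved as inputfile.txt:
# remove_duplicate_lines("inputfile.txt", "outputfile.txt")
```

Using a set for the already-seen lines keeps the original order without needing OrderedDict, since each line is written as soon as it is accepted.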