Hi,
I wrote some code to compare two files. One is the base file; the other
is a file I got from somewhere. I need to compare this file against the
base, e.g.
base file:
    abc
    def
    ghi
another file:
    abc
    def
    ghi
    jkl
After the compare, the base file will be overwritten with "jkl". Also, both
files tend to grow past 20MB.
Here is my code, using difflib:
    import difflib
    import re

    ## i want to get rid of the '+' from the difflib output...
    pat = re.compile(r'^\+ ')

    def difference(filename, basename):
        with open(basename) as base:
            a = base.readlines()
        with open(filename) as newfile:  # was 'input', which shadows the builtin
            b = newfile.readlines()
        d = difflib.Differ()
        diff = list(d.compare(a, b))
        if diff:
            # "w" truncates the base file, replacing os.remove() plus the
            # invalid "aU" mode of the original.
            with open(basename, "w") as o:
                for line in diff:
                    if pat.match(line):
                        # Cut exactly the 2-character "+ " prefix; lstrip()
                        # would also eat leading spaces in the line itself.
                        o.write(line[2:])  ## write a new base file...
        with open(basename) as g:
            return g.readlines()
Whenever the two files get very large, I find that the comparison is very
slow. Any good advice to speed things up? I thought of dropping the
readlines() calls and comparing line by line instead. Is that a better way?
thanks
It seems like you want a new base that contains only those lines
in 'filename' that are not in 'basename', where 'basename' is an
ordered subset of 'filename'. In other words, the 'filename' file
has all of the lines of 'basename', in order, somewhere, but
'filename' has some additional lines. Is that correct? difflib looks
to be overkill for this. Here is a suggestion:
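    basefile = open(basename)
    newfile = open(filename)
    # xreadlines() is gone in Python 3; file objects already iterate lazily.
    baseiter = iter(basefile)
    newiter = iter(newfile)
    newbase = open('tmp.txt', 'w')
    for baseline in baseiter:
        # newiter resumes where the previous pass stopped.
        for newline in newiter:
            if baseline != newline:
                newbase.write(newline)
            else:
                break
    # Lines left over after the last base line has matched are new too
    # (e.g. "jkl" in the example); without this they would be dropped.
    newbase.writelines(newiter)
    for afile in (basefile, newfile, newbase):
        afile.close()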
If 'basename' is not an ordered subset of 'filename', then difflib seems
to be your best bet, because you have a computationally intensive problem.
James
--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095 http://www.jamesstroud.com/
Thanks for the reply.
I have used another method to solve my problem, i.e.
1) get the total line count of the first file
2) save this total count, e.g. as basecnt
3) get another file and get the total line count of this file, e.g. filecnt
4) if filecnt > basecnt, read in the values from file[basecnt:filecnt]
5) if filecnt < basecnt, overwrite the original basecnt and start over
again.
Basically, the problem domain is that I want to get the most recent records
from a log file to review every 3 hours, so this log file will keep
growing.
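The steps above could be sketched roughly like this. It is only an illustration, not the poster's actual code: the function name read_new_lines is hypothetical, "count" is assumed to mean the number of lines, and step 5 is read here as "the file shrank, so treat all of it as new".

```python
from itertools import islice

def read_new_lines(path, basecnt):
    """Return (new_lines, filecnt) for a file that had basecnt lines last time.

    Hypothetical helper: the caller is assumed to store the returned count
    and pass it back in as basecnt on the next run.
    """
    with open(path) as f:
        filecnt = sum(1 for _ in f)      # step 3: current total line count
        if filecnt < basecnt:            # step 5 (assumed meaning): start over
            basecnt = 0
        f.seek(0)
        # step 4: the equivalent of file[basecnt:filecnt], read lazily
        new_lines = list(islice(f, basecnt, filecnt))
    return new_lines, filecnt
```

This still reads the whole file once to count lines, but it never holds more than the new records in memory and does no comparison at all.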
I did this:
    import os
    fp = os.popen('/usr/sbin/logtail /var/log/syslog')
    loglines = fp.readlines()
    .... pyparsing ... stuff .... from loglines
;-)
Python is maybe overkill too - have "cron" call "logtail" and pipe the
output wherever?
PS:
"logtail" is very simple: it works by maintaining a "bookmark" from
the last read, which is updated each time the file is read (i.e. on each
call). It is probably a very easy thing to implement in Python. On
Linux/UNIX, syslog+logutils can do a lot of work just by configuration (but
you did not say you are on unix).
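As a rough sketch of that bookmark idea in Python (this is not logtail's actual implementation; the function name, the separate bookmark file and its plain-text offset format are all assumptions made for illustration):

```python
import os

def read_since_bookmark(log_path, bookmark_path):
    """Return the lines appended to log_path since the previous call.

    Hypothetical helper: the bookmark is a byte offset stored as text in
    bookmark_path, which is an assumed format, not logtail's.
    """
    # Load the offset saved by the previous call (0 on the first run).
    try:
        with open(bookmark_path) as bm:
            offset = int(bm.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0

    # A log shorter than the bookmark was probably rotated or truncated,
    # so start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    # Binary mode keeps the offset in plain bytes; decode afterwards.
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
        new_offset = log.tell()

    # Update the bookmark for the next call.
    with open(bookmark_path, "w") as bm:
        bm.write(str(new_offset))

    return data.decode("utf-8", errors="replace").splitlines(keepends=True)
```

Because it seeks straight to the saved offset, each call only touches the bytes added since the last one, however large the log has grown.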