Compare multiple files for common entries

1 New Member

I apologize in advance; I'm teaching myself Python in my spare time since I was assigned this task. I am working on a way to examine a directory of thousands of files, looking for common entries. We have multiple cases in which I have extracted telephone numbers from thousands of pieces of code and stored them in an individual folder for each case. However, the files all have the same name, "telephone_histogram.txt". What I'm trying to accomplish is to compare the entire directory and produce a file that tells me whether a number appears in more than one file and, if so, how many files it appears in. The other problem is that each of the txt files has two columns, with the number appearing in the second column. Here's what we have so far, but I've only been able to compare two files, not a whole directory:


 
# Read every line from both files into one list called searchlines,
# then sort it so identical lines end up next to each other.
with open("folder1/telephone_histogram.txt", "r") as f:
    searchlines = f.readlines()
with open("folder2/telephone_histogram.txt", "r") as f:
    searchlines = searchlines + f.readlines()
searchlines.sort()

# dupe holds the previous line, to compare against the current line.
# dupe_count is the number of times that previous line has been seen so far.
# dupe starts as None (matches no line) and dupe_count starts at 0.
dupe = None
dupe_count = 0
for line in searchlines:
    if line == dupe:
        dupe_count += 1
    else:
        if dupe_count == 1:
            # Previous item is unique
            print(dupe.rstrip("\n"))
        # elif dupe_count > 1:
        #     Previous item is duplicated: print it preceded by the number of times it was found
        #     print(dupe_count, dupe.rstrip("\n"))
        dupe = line
        dupe_count = 1

# The loop never compares the last item against a successor, so report it here
if dupe_count == 1:
    print(dupe.rstrip("\n"))

If you can help, thank you so much in advance

Apr 2 '13 #1


bvdet

2,851

Recognized Expert Moderator Specialist

I would approach it like this:

1. Generate a list of files to read. os.walk() is ideal for this.
2. Initialize a dictionary. The phone numbers will be the keys and the counts will be the values.
3. Iterate over the files, updating the dictionary with each entry.
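
The three steps above might be sketched roughly as follows. This is only a minimal sketch under some assumptions: the function names count_files_per_number and write_report are made up for illustration, the two columns are assumed to be separated by whitespace (everything after the first column is treated as the number), and the "count, tab, number" report layout is just one possible choice.

```python
import os


def count_files_per_number(root, filename="telephone_histogram.txt"):
    """Map each phone number to the number of files under root that contain it."""
    counts = {}
    # Step 1: os.walk() visits root and every folder beneath it.
    for dirpath, dirnames, filenames in os.walk(root):
        if filename not in filenames:
            continue
        # Collect this file's numbers in a set, so a number repeated
        # inside one file still counts as a single file.
        numbers = set()
        with open(os.path.join(dirpath, filename), "r") as f:
            for line in f:
                # Assumption: whitespace separates the first column from
                # the number; the rest of the line is the number.
                fields = line.strip().split(None, 1)
                if len(fields) == 2:
                    numbers.add(fields[1])
        # Steps 2 and 3: update the dictionary of counts.
        for number in numbers:
            counts[number] = counts.get(number, 0) + 1
    return counts


def write_report(counts, out_path):
    """Write 'count<TAB>number' for every number found in more than one file."""
    with open(out_path, "w") as out:
        for number, n in sorted(counts.items()):
            if n > 1:
                out.write("%d\t%s\n" % (n, number))
```

With a hypothetical top-level folder named "cases", a call such as write_report(count_files_per_number("cases"), "common_numbers.txt") would produce the report. Lines that do not have two columns are skipped.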

Dictionary method get() or setdefault() can be used to increment the counts. Example:


>>> dd = {}
>>> key = '555-555-5555'
>>> v = dd.get(key, 0)
>>> dd[key] = v+1
>>> key = '555-555-5556'
>>> v = dd.setdefault(key, 0)
>>> dd[key] += 1
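
To see the two idioms side by side in a loop, here is a small sketch. The helper names count_with_get and count_with_setdefault and the sample numbers are made up for illustration; both helpers should end up with the same counts.

```python
def count_with_get(numbers):
    counts = {}
    for number in numbers:
        # get() returns 0 for an unseen key without storing anything
        counts[number] = counts.get(number, 0) + 1
    return counts


def count_with_setdefault(numbers):
    counts = {}
    for number in numbers:
        # setdefault() stores 0 for an unseen key, so += can follow safely
        counts.setdefault(number, 0)
        counts[number] += 1
    return counts


sample = ["555-555-5555", "555-555-5556", "555-555-5555"]  # made-up numbers
by_get = count_with_get(sample)
by_setdefault = count_with_setdefault(sample)

# Keep only the numbers that were seen more than once
shared = sorted(n for n, c in by_get.items() if c > 1)
```

Either idiom works; get() leaves the dictionary untouched until the assignment, while setdefault() inserts the default as a side effect.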

Apr 2 '13 #2
