append docname & linenumber to dic-value

Hey guys,
I am comparing two documents. If a word is in both documents, it gets added as a new key to a dictionary.
As the dictionary value, I would like to store the document's name and the line number the word was found on.
Here is what I have so far, with comments:


dic = {}

def matchtermer():
    f3 = open('korpus/avis.txt')
    f4 = open("ordliste_output_kort.txt")
    text3 = f3.read()
    text4 = f4.read()
    ordliste2 = text3.split()
    ordliste3 = text4.split()
    wordlist2 = []

    for word1 in ordliste2:  # this part removes end characters that aren't part of the word and makes all lowercase
        # last character of each word
        lastchar = word1[-1:]
        # use a list of punctuation marks
        if lastchar in [",", ".", "!", "?", ";"]:
            word2 = word1.rstrip(lastchar)
        else:
            word2 = word1
        # build a wordList of lower case modified words
        wordlist2.append(word2.lower())

    for word in wordlist2:  # and finally this compares the two documents
        if word in ordliste3:
            if word not in dic.keys():
                dic[word] = []  # if word not in dic, create it
            #dic[word].append(docname, linenumber) - this is what I want to do - obviously this does not work
    return dic

Dec 17 '07 #1


bvdet


I think this will do it:


import re
import string

def wordList(words):
    patt = re.compile(r'\d+')
    # eliminate words with digits, strip punctuation and whitespace, lowercase
    word_list = [word.strip().strip(string.punctuation).lower()
                 for word in words.split() if not patt.search(word)]
    # eliminate blank words
    return [word for word in word_list if word != '']

def matchtermer(fn1, fn2):
    dd = {}
    # file to compare against
    with open(fn1) as f:
        f1 = f.read()
    # file to compare
    with open(fn2) as f:
        f2 = f.readlines()
    # a set makes each membership test fast
    word_set = set(wordList(f1))
    for i, line in enumerate(f2):
        for word in line.split():
            word = word.strip().strip(string.punctuation).lower()
            if word in word_set:
                # store (document name, 1-based line number) for each hit
                dd.setdefault(word, []).append((fn2, i+1))
    return dd

Usage:


wordDict = matchtermer('words1.txt', 'words2.txt')
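If you also want the locations from the first document, here is a rough sketch of one way to do it. It is only an illustration, not the code from the thread: the names index_words and match_terms and the in-memory sample documents are made up for the example, and it indexes lists of lines instead of opening files so it can be run as-is.

```python
import string
from collections import defaultdict

def index_words(docname, lines):
    """Map each cleaned word to a list of (docname, line_number) tuples."""
    index = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        for raw in line.split():
            word = raw.strip(string.punctuation).lower()
            if word:  # skip tokens that were only punctuation
                index[word].append((docname, lineno))
    return index

def match_terms(name_a, lines_a, name_b, lines_b):
    """Keep only words found in both documents, with locations from both."""
    index_a = index_words(name_a, lines_a)
    index_b = index_words(name_b, lines_b)
    common = index_a.keys() & index_b.keys()
    return {word: index_a[word] + index_b[word] for word in common}

# Hypothetical in-memory documents standing in for the two files.
doc_a = ["The cat sat.", "A dog barked!"]
doc_b = ["dog", "cat", "bird"]
result = match_terms("a.txt", doc_a, "b.txt", doc_b)
print(sorted(result.items()))
```

Each value is a list of (document name, line number) tuples, so a word that occurs several times simply gets several entries. To use real files, pass in the result of readlines() for each one.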

Dec 17 '07 #2
