How can I find top words frequencies of combined files?

1 New Member
I need to take two files and print the most frequent words they have in common, along with their combined (summed) frequencies.
    def mostFrequent(word, frequency, n):
        # pair each word with its frequency and sort by frequency, highest first
        my_list = sorted(zip(word, frequency), key=lambda x: x[1], reverse=True)
        words, freqs = zip(*my_list[:n])  # take the top n entries and split back into separate lists
        return words, freqs  # return our most frequent words in order

    from wordFrequencies import *  # gives both the word and its frequency in a file
    L1 = wordFrequencies('file1.txt')
    words1 = L1[0]
    freqs1 = L1[1]
    L2 = wordFrequencies('file2.txt')
    words2 = L2[0]
    freqs2 = L2[1]
    print(mostFrequent(words1, freqs1, 20))  # top 20 for file1 only; the two files still need combining
I've tried
    L1 = WordFrequencies('file1.txt')
    words1 = set(L1[0])
    freqs1 = set(L1[1])
    L2 = WordFrequencies('file2.txt')
    words2 = set(L2[0])
    freqs2 = set(L2[1])
    words3 = words1.intersection(words2)
    freqs3 = freqs1.intersection(freqs2)
    print(mostFrequent(words3, freqs3, 20))
but it didn't work. It output the wrong words.
Mar 8 '13 #1
dwblas
626 Recognized Expert Contributor
We don't have the code for the function WordFrequencies(). It looks like it returns some kind of container holding each word and the number of times it is found. The answer depends on whether "L1" and "L2" (non-descriptive variable names don't tell us anything) are lists, sets, or dictionaries. Whichever it is, combine the two and sort on the count, converted to an integer if it is not one already, so the container ends up in order of frequency. Then print however many words you want. For the code you posted, I would suggest that you print words1 and freqs1, as I don't think they contain what you want.
    L1 = WordFrequencies('file1.txt')
    words1 = set(L1[0])
    freqs1 = set(L1[1])
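As a rough sketch of the combine-and-sort idea above: this assumes WordFrequencies() returns two parallel lists (words, then counts), which is only a guess based on how L1[0] and L1[1] are used in the question. The helper name common_top_words and the sample data are made up for illustration. Note that calling set() on the counts throws away the link between each word and its count, so the sketch pairs them in a dictionary instead.

```python
def common_top_words(words1, freqs1, words2, freqs2, n):
    """Return the n most frequent words present in both inputs,
    paired with their summed frequencies (hypothetical helper)."""
    # Pair each word with its count so the link between them is preserved.
    # int() covers the case where the counts were read in as strings.
    counts1 = dict(zip(words1, (int(f) for f in freqs1)))
    counts2 = dict(zip(words2, (int(f) for f in freqs2)))

    # Keep only words found in both files and add their counts together.
    combined = {w: counts1[w] + counts2[w] for w in counts1 if w in counts2}

    # Sort by summed frequency, highest first; ties are broken alphabetically.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Made-up sample data standing in for the output of WordFrequencies().
words1, freqs1 = ["the", "cat", "sat", "mat"], [10, 3, 2, 1]
words2, freqs2 = ["the", "dog", "sat", "cat"], [7, 4, 5, 1]

top = common_top_words(words1, freqs1, words2, freqs2, 2)
print(top)  # [('the', 17), ('sat', 7)]
```

If WordFrequencies() turns out to return a dictionary already, the two dict(zip(...)) lines can be dropped and the rest should work unchanged.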
Mar 8 '13 #2
