Bytes | Software Development & Data Engineering Community
How can I find top words frequencies of combined files?

I need to take two files and print the most frequent words they have in common, along with their combined (summed) frequencies.

    def mostFrequent(word, frequency, n):
        my_list = zip(word, frequency)  # combine the two lists
        my_list.sort(key=lambda x: x[1], reverse=True)  # sort by freq
        words, freqs = zip(*my_list[:n])  # take the top n entries and split back into separate lists
        return words, freqs  # return our most frequent words in order

    from wordFrequencies import *  # gives both the word and its frequency in a file

    L1 = wordFrequencies('file1.txt')
    words1 = L1[0]
    freqs1 = L1[1]
    L2 = wordFrequencies('file2.txt')
    words2 = L2[0]
    freqs2 = L2[1]
    # words and freqs are not defined yet: they should hold the combined result
    print mostFrequent(words, freqs, 20)

I've tried:

    L1 = WordFrequencies('file1.txt')
    words1 = set(L1[0])
    freqs1 = set(L1[1])
    L2 = WordFrequencies('file2.txt')
    words2 = set(L2[0])
    freqs2 = set(L2[1])
    words3 = words1.intersection(words2)
    freqs3 = freqs1.intersection(freqs2)
    print mostFrequent(words3, freqs3, 20)

but it didn't work. It output the wrong words.
Mar 8 '13 #1
dwblas
We don't have the code for the function WordFrequencies(). It appears to return some kind of container holding each word and the number of times it occurs. The answer depends on whether "L1" and "L2" (non-descriptive variable names tell us nothing) are lists, sets, or dictionaries. Whichever it is, combine the two and sort on the count, converting it to an integer if it is not one already, so that the container is in order of frequency. Then print however many words you want. As for the code you posted, I suggest printing words1 and freqs1, as I don't think they contain what you want:

    L1 = WordFrequencies('file1.txt')
    words1 = set(L1[0])
    freqs1 = set(L1[1])

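
To illustrate the combine-and-sort idea, here is a minimal sketch rather than a definitive fix. It assumes the function returns two parallel lists (words, then frequencies), as the indexing in the question suggests, and that each word appears only once per list. The name combined_top_words and the sample data are hypothetical, and the code is written for Python 3, whereas the question uses Python 2. Building a dictionary keeps each word paired with its count, which taking separate set intersections of the words and of the frequencies does not.

```python
from collections import Counter


def combined_top_words(words1, freqs1, words2, freqs2, n):
    """Return the n most frequent words found in both inputs, with summed counts."""
    # Pair each word with its count; int() covers counts stored as strings.
    counts1 = dict(zip(words1, (int(f) for f in freqs1)))
    counts2 = dict(zip(words2, (int(f) for f in freqs2)))

    # Keep only the words that occur in both files.
    common = counts1.keys() & counts2.keys()

    # Sum the two counts for every shared word.
    combined = Counter({word: counts1[word] + counts2[word] for word in common})

    # most_common() sorts by count, highest first.
    return combined.most_common(n)


# Stand-in data in place of the real wordFrequencies() results (assumed shape).
words1, freqs1 = ["the", "cat", "sat"], [5, 2, 1]
words2, freqs2 = ["the", "dog", "sat"], [3, 4, 2]

print(combined_top_words(words1, freqs1, words2, freqs2, 2))
# prints [('the', 8), ('sat', 3)]
```

If the real function returns a dictionary instead, the two dict(zip(...)) lines can be dropped and the dictionaries used directly.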
Mar 8 '13 #2
