
How do I integrate my code to read txt files in a directory (folder) and its subdirectories? Please help

P: 17
How can I adapt my code to read text files in a parent folder that contains subfolders and files? For example, the parent folder is named cars and its subfolders are Toyota, Honda and BMW. Toyota contains files named Camry and Corolla, Honda contains a folder named Accord, and BMW contains a file named X5.

Is there a way to enter the name of the parent folder (cars) and search all of its subfolders (Toyota, Honda and BMW) and files?

Please help ASAP.

The code below finds the most frequent words in one text file and prints them in decreasing order of frequency.
I want it to find the most frequent words across all text files (together) under a specific folder.

    # count words in a text and show the first ten items
    # by decreasing frequency

    # sample text for testing

    import sys
    import string
    import re
    file = open ("arb.txt", "r")
    text = file.read ( )
    file.close ( )

    word_freq = {}

    word_list = text.split()

    for word in word_list:
        # word all lower case
        word = word.lower()
        # strip any trailing period or comma
        word = word.rstrip('.,/"-_;\[]()')
        # build the dictionary
        count = word_freq.get(word, 0)
        word_freq[word] = count + 1

    # create a list of (freq, word) tuples
    freq_list = [(freq, word) for word, freq in word_freq.items()]

    # sort the list by the first element in each tuple (default)
    freq_list.sort(reverse=True)

    for n, tup in enumerate(freq_list):
        # print the first ten items
        if n < 10:
            freq, word = tup
            print freq, word
Mar 18 '08 #1


jlm699
100+
P: 314
Is there a way to enter the name of the parent folder (cars) and search all of its subfolders (Toyota, Honda and BMW) and files?
I'm sorry, but it's very difficult to understand what it is that you are asking. I can provide you with some direction however...

Perhaps something you're looking for is os.walk. Here is a sample:
    >>> import os
    >>> for root, dirs, files in os.walk(os.getcwd()):
    ...     print 'Looking into %s' % root.split('\\')[-1]
    ...     print 'Found %d dirs and %d files' % (len(dirs), len(files))
    ...     for idx, dir in enumerate(dirs):
    ...         print 'Directory #%d: %s' % (idx + 1, dir)
    ...     for idx, file in enumerate(files):
    ...         print 'File #%d: %s' % (idx + 1, file)
    ...
    Looking into pythtests
    Found 2 dirs and 16 files
    Directory #1: graphics
    Directory #2: Question
    File #1: bckmch.py
    File #2: cmdtest.py
    File #3: cobyla.py
    File #4: elseerr.py
    File #5: fileio.py
    File #6: ldict.py
    File #7: lid
    File #8: mainbody
    File #9: matrixprint.py
    File #10: matrx_print.py
    File #11: test.py
    File #12: test2.py
    File #13: topload
    File #14: totalbottle
    File #15: trivgame.py
    File #16: wxtemplate.py
    Looking into graphics
    Found 0 dirs and 8 files
    File #1: Buttons.py
    File #2: dice_class.py
    File #3: ghostchars.py
    File #4: graphics.py
    File #5: graphics.pyc
    File #6: graphics22.py
    File #7: graphics22.pyc
    File #8: hw6-template.py
    Looking into Question
    Found 0 dirs and 0 files
    >>>
Hope that helps a little bit
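
To limit the walk to text files, as the question asks, the file names can be filtered while walking. This is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). `find_text_files` is a hypothetical helper, and treating only names ending in .txt as text files is an assumption.

```python
import os
import tempfile


def find_text_files(parent):
    """Return sorted paths of every .txt file under parent, at any depth."""
    paths = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            # Assumption: a "text file" is any file whose name ends in .txt
            if name.lower().endswith(".txt"):
                # os.walk yields bare names; join with root to get a usable path
                paths.append(os.path.join(root, name))
    return sorted(paths)


if __name__ == "__main__":
    # Throwaway tree shaped like the cars example from the question.
    with tempfile.TemporaryDirectory() as cars:
        os.makedirs(os.path.join(cars, "Honda", "Accord"))
        os.makedirs(os.path.join(cars, "Toyota"))
        for parts in (("Toyota", "camry.txt"), ("Honda", "Accord", "notes.txt")):
            with open(os.path.join(cars, *parts), "w") as handle:
                handle.write("sample text")
        for path in find_text_files(cars):
            print(os.path.relpath(path, cars))
```

Joining `root` and `name` matters because the names in `files` are relative to the directory currently being visited, not to the current working directory.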
Mar 18 '08 #2

P: 17
I mean, for example, the parent directory (folder) is cars and its subdirectories (folders) are BMW, Honda and Toyota. I want to traverse the directory and all of its subdirectories
to find the most frequent words in all text files (together) under a specific folder.


And I did not understand what your code means.
Mar 19 '08 #3

P: 17
Thanks, Mr. jlm699, your reply was helpful,

but it does not do exactly what I want.

The modified code is:

    # count words in a text and show the first ten items
    # by decreasing frequency

    # sample text for testing

    import sys
    import string
    import re
    import os.path
    for root, dirs, files in os.walk(os.getcwd()):
      print 'Looking into %s' % root.split('\\')[-1]
      print 'Found %d dirs and %d files' % (len(dirs), len(files))
      for idx, dir in enumerate(dirs):
        print 'Directory #%d: %s' % (idx + 1, dir)
        for idx, file in enumerate(files):
          print 'File #%d: %s' % (idx + 1, file)
          ff = open (file, "r")
          text = ff.read ( )
          ff.close ( )

          word_freq = {}

          word_list = text.split()

          for word in word_list:
            # word all lower case
            word = word.lower()
            # strip any trailing period or comma
            word = word.rstrip('.,/"-_;\[]()')
            # build the dictionary
            count = word_freq.get(word, 0)
            word_freq[word] = count + 1

          # create a list of (freq, word) tuples
          freq_list = [(freq, word) for word, freq in word_freq.items()]

          # sort the list by the first element in each tuple (default)
          freq_list.sort(reverse=True)

          for n, tup in enumerate(freq_list):
            # print the first ten items
            if n < 10:
              freq, word = tup
              print freq, word

The output looks like this:

File #12: listtoDict.py
14 with
6 python
6 for
File #13: parseAddresses
3 python
1 with
1 will

But I need the frequency of each word across all the text files, not separately per file. For example, the previous output should look like this:

15 with
9 python
6 for
1 will

That is, add the word frequencies from (File #12: listtoDict.py) to those from (File #13: parseAddresses) and print them in one list.
Mar 19 '08 #4
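
The addition requested in the post above can be checked on its own numbers. This is a small sketch in Python 3 syntax; `merge_counts` is a hypothetical helper, and the two dictionaries simply restate the per-file counts quoted in that post.

```python
def merge_counts(*per_file_counts):
    """Add up word counts from several dictionaries into one dictionary."""
    total = {}
    for counts in per_file_counts:
        for word, freq in counts.items():
            total[word] = total.get(word, 0) + freq
    return total


# Per-file counts copied from the sample output in the post.
list_to_dict = {"with": 14, "python": 6, "for": 6}
parse_addresses = {"python": 3, "with": 1, "will": 1}

combined = merge_counts(list_to_dict, parse_addresses)
for word, freq in sorted(combined.items(), key=lambda item: item[1], reverse=True):
    print(freq, word)
# prints:
# 15 with
# 9 python
# 6 for
# 1 will
```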

jlm699
100+
P: 314
and I need the frequency of each word across all the text files
Just move your word_freq dictionary declaration to before the for loop, and move the sorting and printing of that structure to after the for loop, and you'll achieve this.
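
A sketch of that restructuring, written in Python 3 syntax rather than the thread's Python 2: one dictionary is created before the walk, every file adds to it, and sorting happens once at the end. The function names `count_words_in_tree` and `top_words` are hypothetical, and `errors="ignore"` is an assumption added so that non-text files do not stop the walk.

```python
import os
import tempfile


def count_words_in_tree(top):
    """Count words across every file under top using one shared dictionary."""
    word_freq = {}  # created once, before any loop, so counts accumulate
    for root, dirs, files in os.walk(top):
        for name in files:
            # Assumption: undecodable bytes are skipped instead of raising
            with open(os.path.join(root, name), "r", errors="ignore") as handle:
                text = handle.read()
            # Counting happens inside the per-file loop, so no file is skipped
            for word in text.split():
                word = word.lower().rstrip('.,/"-_;\\[]()')
                if word.isalpha():
                    word_freq[word] = word_freq.get(word, 0) + 1
    return word_freq


def top_words(word_freq, limit=10):
    """Sort once, after all files are read, and return (freq, word) tuples."""
    freq_list = [(freq, word) for word, freq in word_freq.items()]
    freq_list.sort(reverse=True)
    return freq_list[:limit]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "sub"))
        with open(os.path.join(top, "a.txt"), "w") as handle:
            handle.write("Python with with.")
        with open(os.path.join(top, "sub", "b.txt"), "w") as handle:
            handle.write("python will")
        for freq, word in top_words(count_words_in_tree(top)):
            print(freq, word)
```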
Mar 19 '08 #5

jlm699
100+
P: 314
Here are the modifications that I suggested above, and the resulting output.
    import sys, os

    word_freq = {}

    for root, dirs, files in os.walk(os.getcwd()):
        print 'Looking into %s' % root.split('\\')[-1]
        print 'Found %d dirs and %d files' % (len(dirs), len(files))

        for idx, file in enumerate(files):
            ff = open (os.path.join(root, file), "r")
            text = ff.read ( )
            ff.close ( )

        word_list = text.strip().split()

        for word in word_list:
            word = word.lower().rstrip('.,/"-_;\\[]()')
            if word.isalpha():
                # build the dictionary
                count = word_freq.get(word, 0)
                word_freq[word] = count + 1

        # create a list of (freq, word) tuples
        freq_list = [(freq, word) for word, freq in word_freq.items()]

        # sort the list by the first element in each tuple (default)
        freq_list.sort(reverse=True)

    for n, tup in enumerate(freq_list):
        # print the first ten items
        if n < 10:
            freq, word = tup
            print freq, word
Output:
    Microsoft Windows XP [Version 5.1.2600]
    (C) Copyright 1985-2001 Microsoft Corp.

    C:\Documents and Settings\Administrator>cd Desktop\pythtests

    C:\Documents and Settings\Administrator\Desktop\pythtests>python walkncount.py
    Looking into pythtests
    Found 2 dirs and 17 files
    Looking into graphics
    Found 0 dirs and 8 files
    Looking into Question
    Found 0 dirs and 0 files
    46 the
    17 and
    14 of
    14 a
    12 is
    10 to
    10 in
    8 you
    8 this
    8 that

    C:\Documents and Settings\Administrator\Desktop\pythtests>
Mar 19 '08 #6

P: 17
Thanks a lot,
but it actually reads all the files yet prints the word frequencies for only one of them.
It does not print the word frequencies across all the files, which is what I want.
Mar 19 '08 #7

jlm699
100+
P: 314
reads all the files yet prints the word frequencies for only one of them
OK... I'm not sure exactly what you mean by that, but I think you're trying to say that you only want to display the word frequencies for the file with the highest frequencies?

    import sys, os

    highest_freq = [(0,'Blank')]
    high_file_name = ''

    for root, dirs, files in os.walk(os.getcwd()):
        # print 'Looking into %s' % root.split('\\')[-1]
        # print 'Found %d dirs and %d files' % (len(dirs), len(files))

        for idx, file in enumerate(files):
            # print 'File #%d: %s' % (idx + 1, file)
            ff = open (os.path.join(root, file), "r")
            text = ff.read ( )
            ff.close ( )

            word_freq = {}
            word_list = text.strip().split()

            for word in word_list:
                word = word.lower().rstrip('.,/"-_;\\[]()')
                if word.isalpha():
                    # build the dictionary
                    word_freq[word] = word_freq.get(word, 0) + 1

            # create a list of (freq, word) tuples
            freq_list = [(freq, word) for word, freq in word_freq.items()]

            # sort the list by the first element in each tuple (default)
            freq_list.sort(reverse=True)
            if freq_list:
                if freq_list[0][0] > highest_freq[0][0]:
                    highest_freq = freq_list
                    high_file_name = file

    print 'Highest frequency file: %s' % high_file_name
    for n, tup in enumerate(highest_freq):
        if n < 10:
            freq, word = tup
            print freq, word
    raw_input('\nHit enter to exit')

Output:
    Highest frequency file: graphics.py
    93 def
    44 return
    36 the
    31 in
    29 of
    26 for
    25 if
    23 to
    19 class
    19 a

    Hit enter to exit

This is a crude example so I apologize; however I don't understand what you're trying to do or why. So working with what you've given this is the most I can make of your question.
Mar 19 '08 #8

P: 17
I really appreciate you trying to help,
but unfortunately that is not what I meant.

What I meant is: read all the files in the directory, combine all the words from all the files and put them in a new file, then find the frequency of each word in that new file.
Mar 20 '08 #9

jlm699
100+
P: 314
I really appreciate you trying to help,
but unfortunately that is not what I meant.

What I meant is: read all the files in the directory, combine all the words from all the files and put them in a new file, then find the frequency of each word in that new file.
So basically, you're saying you want to combine the contents of all the files into a new file, and then find the frequency of the words in that file?

Well, doing that without creating a new file would be a very slight change from a previous post:
    import sys, os

    word_freq = {}

    for root, dirs, files in os.walk(os.getcwd()):
        print 'Looking into %s' % root.split('\\')[-1]
        print 'Found %d dirs and %d files' % (len(dirs), len(files))

        for idx, file in enumerate(files):
            ff = open (os.path.join(root, file), "r")
            text = ff.read ( )
            ff.close ( )

            word_list = text.strip().split()

            for word in word_list:
                word = word.lower().rstrip('.,/"-_;\\[]()')
                if word.isalpha():
                    # build the dictionary
                    count = word_freq.get(word, 0)
                    word_freq[word] = count + 1

        # create a list of (freq, word) tuples
        freq_list = [(freq, word) for word, freq in word_freq.items()]

        # sort the list by the first element in each tuple (default)
        freq_list.sort(reverse=True)

    for n, tup in enumerate(freq_list):
        # print the first ten items
        if n < 10:
            print "%s times: %s" % tup
    raw_input('\nHit enter to exit')

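
One side effect of sorting (freq, word) tuples with reverse=True is that words with equal counts also come out in reverse alphabetical order, which is why the earlier sample output lists "you" before "this" before "that". If alphabetical order among ties is preferred, a sort key can handle the two fields separately. This is a sketch in Python 3 syntax, with `rank_words` as a hypothetical helper name.

```python
def rank_words(word_freq, limit=10):
    """Return (freq, word) pairs: highest count first, ties in A-Z order."""
    pairs = [(freq, word) for word, freq in word_freq.items()]
    # Negating the count sorts it descending while the word stays ascending.
    pairs.sort(key=lambda pair: (-pair[0], pair[1]))
    return pairs[:limit]


# Counts taken from the tail of the sample output earlier in the thread.
sample = {"you": 8, "this": 8, "that": 8, "to": 10, "in": 10}
for freq, word in rank_words(sample):
    print("%s times: %s" % (freq, word))
```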
Mar 20 '08 #10

P: 17
Thank you very much, Mr. jlm699, it works fine now.

I turned the program into a GUI using Tkinter, and instead of searching the current directory I tried to let the user pick the folder with a browse dialog,
as you can see:

    # a look at the Tkinter Text widget

    # use ctrl+c to copy, ctrl+x to cut selected text,

    # ctrl+v to paste, and ctrl+/ to select all
      # count words in a text and show the first ten items
     # by decreasing frequency

    import Tkinter as tk
    import os, glob
    import sys
    import string
    import re
    import tkFileDialog
    def most_frequant_word():
     a= tkFileDialog.askdirectory()
     browser= os.listdir(a)


     for root, dirs, files in os.walk(browser):
        print 'Looking into %s' % root.split('\\')[-1]
        print 'Found %d dirs and %d files' % (len(dirs), len(files))

        for idx, file in enumerate(files):
         ff = open (os.path.join(root, file), "r")
         text = ff.read ( )
         ff.close ( )

         word_list = text.strip().split()

         for word in word_list:
          word = word.lower().rstrip('.,/"-_;\\[]()')

          if word.isalpha():
                    # build the dictionary
           count = word_freq.get(word, 0)
           word_freq[word] = count + 1

           # create a list of (freq, word) tuples
           freq_list = [(freq, word) for word, freq in word_freq.items()]

           # sort the list by the first element in each tuple (default)
           freq_list.sort(reverse=True)

         for n, tup in enumerate(freq_list):
        # print the first ten items
          if n < 15:
            print "%s times: %s" % tup
            text1.insert(tk.INSERT, freq)
            text1.insert(tk.INSERT, word)
            text1.insert(tk.INSERT, "\n")

     raw_input('\nHit enter to exit')

    root = tk.Tk(className = " most_frequant_word")
    # text entry field, width=width chars, height=lines text
    v1 = tk.StringVar()
    text1 = tk.Text(root, width=50, height=20, bg='green')
    text1.pack()
    # function listed in command will be executed on button click
    button1 = tk.Button(root, text='result', command=most_frequant_word)
    button1.pack(pady=5)
    text1.focus()
    root.mainloop()
but it gives me this error:

    Exception in Tkinter callback
    Traceback (most recent call last):
      File "C:\Python25\lib\lib-tk\Tkinter.py", line 1403, in __call__
        return self.func(*args)
      File "C:\Documents and Settings\Administrator\Desktop\ICS482\hw3\programA li.py", line 21, in most_frequant_word
        for root, dirs, files in os.walk(browser):
      File "C:\Python25\lib\os.py", line 285, in walk
        names = listdir(top)
    TypeError: coercing to Unicode: need string or buffer, list found

Could you please help me solve this problem?
Mar 20 '08 #11
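
The traceback in the post above points at os.walk receiving a list: os.listdir returns a list of names, while os.walk expects the folder path itself. The sketch below uses Python 3 syntax, and `walk_selected_folder` is a hypothetical helper. In the GUI, `selected` would be the string returned by the directory dialog, passed to os.walk unchanged.

```python
import os
import tempfile


def walk_selected_folder(selected):
    """Walk the folder path itself; os.walk wants one path, not a listing."""
    if not isinstance(selected, (str, bytes, os.PathLike)):
        raise TypeError("expected a single folder path, got %s"
                        % type(selected).__name__)
    found = []
    for root, dirs, files in os.walk(selected):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, "a.txt"), "w") as handle:
            handle.write("sample")
        # Passing the path string works.
        print(len(walk_selected_folder(folder)), "file(s) found")
        # Passing the result of os.listdir reproduces the kind of error shown.
        try:
            walk_selected_folder(os.listdir(folder))
        except TypeError as exc:
            print("TypeError:", exc)
```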

P: 17
I have fixed the error now,
but
it does not insert anything into the text box.
It just prints and then hangs.

    # a look at the Tkinter Text widget

    # use ctrl+c to copy, ctrl+x to cut selected text,

    # ctrl+v to paste, and ctrl+/ to select all
      # count words in a text and show the first ten items
     # by decreasing frequency

    import Tkinter as tk
    import os, glob
    import sys
    import string
    import re
    import tkFileDialog
    def most_frequant_word():
     browser= tkFileDialog.askdirectory()
     #browser= os.listdir(a)


     for root, dirs, files in os.walk(browser):
        print 'Looking into %s' % root.split('\\')[-1]
        print 'Found %d dirs and %d files' % (len(dirs), len(files))
        #text1.insert(tk.INSERT,'Looking into %s' % root.split('\\')[-1])
        #text1.insert(tk.INSERT, 'Found %d dirs and %d files' % (len(dirs), len(files)))
        for idx, file in enumerate(files):
         print 'File #%d: %s' % (idx + 1, file)
          #text1.insert(tk.INSERT, 'File #%d: %s' % (idx + 1, file))
         ff = open (os.path.join(root, file), "r")
         text = ff.read ( )
         ff.close ( )
         word_freq = {}

         word_list = text.strip().split()

         for word in word_list:
          word = word.lower().rstrip('.,/"-_;\\[]()')

          if word.isalpha():
                    # build the dictionary
           count = word_freq.get(word, 0)
           word_freq[word] = count + 1

           # create a list of (freq, word) tuples
           freq_list = [(freq, word) for word, freq in word_freq.items()]

           # sort the list by the first element in each tuple (default)
           freq_list.sort(reverse=True)

         for n, tup in enumerate(freq_list):
        # print the first ten items
          if n < 50:
            print "%s times: %s" % tup
            text1.insert(tk.INSERT, freq)
            text1.insert(tk.INSERT, word)
            text1.insert(tk.INSERT, "\n")

     raw_input('\nHit enter to exit')

    root = tk.Tk(className = " most_frequant_word")
    # text entry field, width=width chars, height=lines text
    v1 = tk.StringVar()
    text1 = tk.Text(root, width=50, height=20, bg='green')
    text1.pack()
    # function listed in command will be executed on button click
    button1 = tk.Button(root, text='Brows', command=most_frequant_word)
    button1.pack(pady=5)
    text1.focus()
    root.mainloop()
This is the code that tries to insert into the text box:

    print "%s times: %s" % tup
    text1.insert(tk.INSERT, freq)
    text1.insert(tk.INSERT, word)
    text1.insert(tk.INSERT, "\n")

When I try to insert the file name and directory into the text box it also hangs.
That code is commented out:

    print 'Looking into %s' % root.split('\\')[-1]
    print 'Found %d dirs and %d files' % (len(dirs), len(files))
    #text1.insert(tk.INSERT,'Looking into %s' % root.split('\\')[-1])
    #text1.insert(tk.INSERT, 'Found %d dirs and %d files' % (len(dirs), len(files)))
    for idx, file in enumerate(files):
     print 'File #%d: %s' % (idx + 1, file)
      #text1.insert(tk.INSERT, 'File #%d: %s' % (idx + 1, file))
Mar 21 '08 #12
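
The thread ends without an answer to the last post. One likely cause of the hang, offered as an assumption rather than a confirmed diagnosis, is the raw_input call inside the button callback: it waits for console input, and Tkinter cannot redraw the text box until the callback returns. The insert calls also use `freq` and `word`, which are never unpacked from the current tuple. The sketch below is in Python 3 syntax, where tkinter and tkinter.filedialog replace Tkinter and tkFileDialog. `format_rows` and `build_gui` are hypothetical names, and `count_words` stands in for whichever counting function is used.

```python
def format_rows(freq_list, limit=50):
    """Turn (freq, word) tuples into the text lines shown in the widget."""
    return ["%s times: %s\n" % (freq, word) for freq, word in freq_list[:limit]]


def build_gui(count_words):
    """Build the window; count_words(folder) must return (freq, word) tuples."""
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    text1 = tk.Text(root, width=50, height=20)
    text1.pack()

    def on_browse():
        folder = filedialog.askdirectory()
        if not folder:  # the dialog returns an empty value when cancelled
            return
        text1.delete("1.0", tk.END)
        for row in format_rows(count_words(folder)):
            text1.insert(tk.END, row)
        # No raw_input()/input() here: returning lets the event loop redraw.

    tk.Button(root, text="Browse", command=on_browse).pack(pady=5)
    return root


if __name__ == "__main__":
    # Console-only check of the formatting step; no window is opened here.
    print("".join(format_rows([(15, "with"), (9, "python")])), end="")
```

To open the window, call `build_gui(some_counting_function).mainloop()`, where the counting function takes a folder path and returns the sorted (freq, word) list.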
