473,328 Members | 1,460 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,328 software developers and data experts.

how to ingrate my code to read txt file in dirctory(folder)and subdirectory? PLZ help

17
how to ingrate my code to read text in in parent folder contain sub folders and files for example folder name is cars and sub file is Toyota,Honda and BMW and Toyota contain file name Camry and file name corolla, file name Honda contain folder accord and BMW contain file name X5

Is there way to enter name of parent folder(cars) and search in all sub folder(Toyota,Honda and BMW) and files ?

please help ASAP

code is find most frequent word in one text file and print them in decrease order
and I wont it to find most frequant word in all text files (together) under specific folder

Expand|Select|Wrap|Line Numbers
  1. # count words in a text and show the first ten items
  2. # by decreasing frequency
  3.  
  4. # sample text for testing
  5.  
  6. import sys
  7. import string
  8. import re
  9. file = open ("arb.txt", "r")
  10. text = file.read ( )
  11. file.close ( )
  12.  
  13. word_freq = {}
  14.  
  15. word_list = text.split()
  16.  
  17. for word in word_list:
  18.     # word all lower case
  19.     word = word.lower()
  20.     # strip any trailing period or comma
  21.     word = word.rstrip('.,/"-_;\[]()')
  22.     # build the dictionary
  23.     count = word_freq.get(word, 0)
  24.     word_freq[word] = count + 1
  25.  
  26. # create a list of (freq, word) tuples
  27. freq_list = [(freq, word) for word, freq in word_freq.items()]
  28.  
  29. # sort the list by the first element in each tuple (default)
  30. freq_list.sort(reverse=True)
  31.  
  32. for n, tup in enumerate(freq_list):
  33.     # print the first ten items
  34.     if n < 10:
  35.         freq, word = tup
  36.         print freq, word
Mar 18 '08 #1
11 2099
jlm699
314 100+
Is there way to enter name of parent folder(cars) and search in all sub folder(Toyota,Honda and BMW) and files ?
I'm sorry, but it's very difficult to understand what it is that you are asking. I can provide you with some direction however...

Perhaps something you're looking for is os.walk. Here is a sample:
Expand|Select|Wrap|Line Numbers
  1. >>> for root, dirs, files in os.walk(os.getcwd()):
  2. ...     print 'Looking into %s' % root.split('\\')[-1]
  3. ...     print 'Found %d dirs and %d files' % (len(dirs), len(files))
  4. ...     for idx, dir in enumerate(dirs):
  5. ...         print 'Directory #%d: %s' % (idx + 1, dir)
  6. ...     for idx, file in enumerate(files):
  7. ...         print 'File #%d: %s' % (idx + 1, file)
  8. ...     
  9. Looking into pythtests
  10. Found 2 dirs and 16 files
  11. Directory #1: graphics
  12. Directory #2: Question
  13. File #1: bckmch.py
  14. File #2: cmdtest.py
  15. File #3: cobyla.py
  16. File #4: elseerr.py
  17. File #5: fileio.py
  18. File #6: ldict.py
  19. File #7: lid
  20. File #8: mainbody
  21. File #9: matrixprint.py
  22. File #10: matrx_print.py
  23. File #11: test.py
  24. File #12: test2.py
  25. File #13: topload
  26. File #14: totalbottle
  27. File #15: trivgame.py
  28. File #16: wxtemplate.py
  29. Looking into graphics
  30. Found 0 dirs and 8 files
  31. File #1: Buttons.py
  32. File #2: dice_class.py
  33. File #3: ghostchars.py
  34. File #4: graphics.py
  35. File #5: graphics.pyc
  36. File #6: graphics22.py
  37. File #7: graphics22.pyc
  38. File #8: hw6-template.py
  39. Looking into Question
  40. Found 0 dirs and 0 files
  41. >>> 
Hope that helps a little bit
Mar 18 '08 #2
alivip
17
I mean example of parent dirctory (folder) is cars and example of subdirectory (folder) is (BMW,Honda,Toyota) so I wont to trace directory and all subdirctory
to find most frequant word in all text files (together) under specific folder


and I did not understand what your code mean
Mar 19 '08 #3
alivip
17
thanx M.r jlm699 your reply was helpfull

but it does not match what I wont exactly

modifyig code is

Expand|Select|Wrap|Line Numbers
  1. # count words in a text and show the first ten items
  2. # by decreasing frequency
  3.  
  4. # sample text for testing
  5.  
  6. import sys
  7. import string
  8. import re
  9. import os.path
  10. for root, dirs, files in os.walk(os.getcwd()):
  11.   print 'Looking into %s' % root.split('\\')[-1]
  12.   print 'Found %d dirs and %d files' % (len(dirs), len(files))
  13.   for idx, dir in enumerate(dirs):
  14.     print 'Directory #%d: %s' % (idx + 1, dir)
  15.     for idx, file in enumerate(files):
  16.       print 'File #%d: %s' % (idx + 1, file)
  17.       ff = open (file, "r")
  18.       text = ff.read ( )
  19.       ff.close ( )
  20.  
  21.       word_freq = {}
  22.  
  23.       word_list = text.split()
  24.  
  25.       for word in word_list:
  26.         # word all lower case
  27.          word = word.lower()
  28.         # strip any trailing period or comma
  29.          word = word.rstrip('.,/"-_;\[]()')
  30.         # build the dictionary
  31.          count = word_freq.get(word, 0)
  32.          word_freq[word] = count + 1
  33.  
  34.     # create a list of (freq, word) tuples
  35.       freq_list = [(freq, word) for word, freq in word_freq.items()]
  36.  
  37.     # sort the list by the first element in each tuple (default)
  38.       freq_list.sort(reverse=True)
  39.  
  40.       for n, tup in enumerate(freq_list):
  41.         # print the first ten items
  42.          if n < 10:
  43.             freq, word = tup
  44.             print freq, word
  45.  
the output like

File #12: listtoDict.py
14 with
6 python
6 for
File #13: parseAddresses
3 python
1 with
1 will
and I need to find frequacy of word in all text file not seperat for examle the previos output shud be like

15 with
9 python
6 for
1 will

so add frequancy of word in (File #12: listtoDict.py) with (File #13: parseAddresses) and print thim in one list
Mar 19 '08 #4
jlm699
314 100+
and I need to find frequacy of word in all text file
Just move your word_freq dictionary declaration to before you begin the for loop, and then move the sorting/printing of that structure to after the for loop. And you'll achieve this.
Mar 19 '08 #5
jlm699
314 100+
Here's the modifications that I suggest above and the resulting output.
Expand|Select|Wrap|Line Numbers
  1. import sys, os
  2.  
  3. word_freq = {}
  4.  
  5. for root, dirs, files in os.walk(os.getcwd()):
  6.     print 'Looking into %s' % root.split('\\')[-1]
  7.     print 'Found %d dirs and %d files' % (len(dirs), len(files))
  8.  
  9.     for idx, file in enumerate(files):
  10.         ff = open (os.path.join(root, file), "r")
  11.         text = ff.read ( )
  12.         ff.close ( )
  13.  
  14.     word_list = text.strip().split()
  15.  
  16.     for word in word_list:
  17.         word = word.lower().rstrip('.,/"-_;\\[]()')
  18.         if word.isalpha():
  19.             # build the dictionary
  20.             count = word_freq.get(word, 0)
  21.             word_freq[word] = count + 1
  22.  
  23.     # create a list of (freq, word) tuples
  24.     freq_list = [(freq, word) for word, freq in word_freq.items()]
  25.  
  26.     # sort the list by the first element in each tuple (default)
  27.     freq_list.sort(reverse=True)
  28.  
  29. for n, tup in enumerate(freq_list):
  30.     # print the first ten items
  31.     if n < 10:
  32.         freq, word = tup
  33.         print freq, word
Output:
Expand|Select|Wrap|Line Numbers
  1. Microsoft Windows XP [Version 5.1.2600]
  2. (C) Copyright 1985-2001 Microsoft Corp.
  3.  
  4. C:\Documents and Settings\Administrator>cd Desktop\pythtests
  5.  
  6. C:\Documents and Settings\Administrator\Desktop\pythtests>python walkncount.py
  7. Looking into pythtests
  8. Found 2 dirs and 17 files
  9. Looking into graphics
  10. Found 0 dirs and 8 files
  11. Looking into Question
  12. Found 0 dirs and 0 files
  13. 46 the
  14. 17 and
  15. 14 of
  16. 14 a
  17. 12 is
  18. 10 to
  19. 10 in
  20. 8 you
  21. 8 this
  22. 8 that
  23.  
  24. C:\Documents and Settings\Administrator\Desktop\pythtests>
Mar 19 '08 #6
alivip
17
thanx alot
but it is actualy read all file but print frequancy of only one of them
not print frequancy of word in all file which I wont
Mar 19 '08 #7
jlm699
314 100+
read all file but print frequancy of only one of them
Ok... I'm not sure exactly what you mean by that but I think that you're trying to say you only want to display the frequency of words in the file with the highest frequencies?

Expand|Select|Wrap|Line Numbers
  1. import sys, os
  2.  
  3. highest_freq = [(0,'Blank')]
  4. high_file_name = ''
  5.  
  6. for root, dirs, files in os.walk(os.getcwd()):
  7.     # print 'Looking into %s' % root.split('\\')[-1]
  8.     # print 'Found %d dirs and %d files' % (len(dirs), len(files))
  9.  
  10.     for idx, file in enumerate(files):
  11.         # print 'File #%d: %s' % (idx + 1, file)
  12.         ff = open (os.path.join(root, file), "r")
  13.         text = ff.read ( )
  14.         ff.close ( )
  15.  
  16.         word_freq = {}
  17.         word_list = text.strip().split()
  18.  
  19.         for word in word_list:
  20.             word = word.lower().rstrip('.,/"-_;\\[]()')
  21.             if word.isalpha():
  22.                 # build the dictionary
  23.                 word_freq[word] = word_freq.get(word, 0) + 1
  24.  
  25.         # create a list of (freq, word) tuples
  26.         freq_list = [(freq, word) for word, freq in word_freq.items()]
  27.  
  28.         # sort the list by the first element in each tuple (default)
  29.         freq_list.sort(reverse=True)
  30.         if freq_list:
  31.             if freq_list[0][0] > highest_freq[0][0]:
  32.                 highest_freq = freq_list
  33.                 high_file_name = file
  34.  
  35. print 'Highest frequency file: %s' % high_file_name
  36. for n, tup in enumerate(highest_freq):
  37.     if n < 10:
  38.         freq, word = tup
  39.         print freq, word
  40. raw_input('\nHit enter to exit')
  41.  
Output:
Expand|Select|Wrap|Line Numbers
  1. Highest frequency file: graphics.py
  2. 93 def
  3. 44 return
  4. 36 the
  5. 31 in
  6. 29 of
  7. 26 for
  8. 25 if
  9. 23 to
  10. 19 class
  11. 19 a
  12.  
  13. Hit enter to exit
  14.  
This is a crude example so I apologize; however I don't understand what you're trying to do or why. So working with what you've given this is the most I can make of your question.
Mar 19 '08 #8
alivip
17
realy I aprechat your trying to help
but unfortionatly that is no wat I ment

I meat is read all files in directory compin all words in all files and put them in new file then find freqancy of each word in taht new file
Mar 20 '08 #9
jlm699
314 100+
realy I aprechat your trying to help
but unfortionatly that is no wat I ment

I meat is read all files in directory compin all words in all files and put them in new file then find freqancy of each word in taht new file
So basically, you're saying you want to combine the contents of all the files into a new file, and then find the frequency of the words in that file?

Well to do that w/o creating a new file would be a very slight change from a previous post:
Expand|Select|Wrap|Line Numbers
  1. import sys, os
  2.  
  3. word_freq = {}
  4.  
  5. for root, dirs, files in os.walk(os.getcwd()):
  6.     print 'Looking into %s' % root.split('\\')[-1]
  7.     print 'Found %d dirs and %d files' % (len(dirs), len(files))
  8.  
  9.     for idx, file in enumerate(files):
  10.         ff = open (os.path.join(root, file), "r")
  11.         text = ff.read ( )
  12.         ff.close ( )
  13.  
  14.         word_list = text.strip().split()
  15.  
  16.         for word in word_list:
  17.             word = word.lower().rstrip('.,/"-_;\\[]()')
  18.             if word.isalpha():
  19.                 # build the dictionary
  20.                 count = word_freq.get(word, 0)
  21.                 word_freq[word] = count + 1
  22.  
  23.     # create a list of (freq, word) tuples
  24.     freq_list = [(freq, word) for word, freq in word_freq.items()]
  25.  
  26.     # sort the list by the first element in each tuple (default)
  27.     freq_list.sort(reverse=True)
  28.  
  29. for n, tup in enumerate(freq_list):
  30.     # print the first ten items
  31.     if n < 10:
  32.         print "%s times: %s" % tup
  33. raw_input('\nHit enter to exit')
  34.  
Mar 20 '08 #10
alivip
17
thank you very much M.r jlm699 it is work fine now

I integrat program to be GUI using Tkinter and insted search in curent direction I try to be from browser
as you can see

Expand|Select|Wrap|Line Numbers
  1. # a look at the Tkinter Text widget
  2.  
  3. # use ctrl+c to copy, ctrl+x to cut selected text,
  4.  
  5. # ctrl+v to paste, and ctrl+/ to select all
  6.   # count words in a text and show the first ten items
  7.  # by decreasing frequency
  8.  
  9. import Tkinter as tk
  10. import os, glob
  11. import sys
  12. import string
  13. import re
  14. import tkFileDialog      
  15. def most_frequant_word():    
  16.  a= tkFileDialog.askdirectory()
  17.  browser= os.listdir(a)
  18.  
  19.  
  20.  for root, dirs, files in os.walk(browser):
  21.     print 'Looking into %s' % root.split('\\')[-1]
  22.     print 'Found %d dirs and %d files' % (len(dirs), len(files))
  23.  
  24.     for idx, file in enumerate(files):
  25.      ff = open (os.path.join(root, file), "r")
  26.      text = ff.read ( )
  27.      ff.close ( )
  28.  
  29.      word_list = text.strip().split()
  30.  
  31.      for word in word_list:
  32.       word = word.lower().rstrip('.,/"-_;\\[]()')
  33.  
  34.       if word.isalpha():
  35.                 # build the dictionary
  36.        count = word_freq.get(word, 0)
  37.        word_freq[word] = count + 1
  38.  
  39.        # create a list of (freq, word) tuples
  40.        freq_list = [(freq, word) for word, freq in word_freq.items()]
  41.  
  42.        # sort the list by the first element in each tuple (default)
  43.        freq_list.sort(reverse=True)
  44.  
  45.      for n, tup in enumerate(freq_list):
  46.     # print the first ten items
  47.       if n < 15:
  48.         print "%s times: %s" % tup
  49.         text1.insert(tk.INSERT, freq)
  50.         text1.insert(tk.INSERT, word)
  51.         text1.insert(tk.INSERT, "\n")
  52.  
  53.  raw_input('\nHit enter to exit')
  54.  
  55. root = tk.Tk(className = " most_frequant_word")
  56. # text entry field, width=width chars, height=lines text
  57. v1 = tk.StringVar()
  58. text1 = tk.Text(root, width=50, height=20, bg='green')
  59. text1.pack()
  60. # function listed in command will be executed on button click
  61. button1 = tk.Button(root, text='result', command=most_frequant_word)
  62. button1.pack(pady=5)
  63. text1.focus()
  64. root.mainloop()
but give me this error

Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Python25\lib\lib-tk\Tkinter.py", line 1403, in __call__
return self.func(*args)
File "C:\Documents and Settings\Administrator\Desktop\ICS482\hw3\programA li.py", line 21, in most_frequant_word
for root, dirs, files in os.walk(browser):
File "C:\Python25\lib\os.py", line 285, in walk
names = listdir(top)
TypeError: coercing to Unicode: need string or buffer, list found
could you please help me to solve this problem
Mar 20 '08 #11
alivip
17
I fix the error now
but
it will not insert to the textbox
it just print then hanging

Expand|Select|Wrap|Line Numbers
  1. # a look at the Tkinter Text widget
  2.  
  3. # use ctrl+c to copy, ctrl+x to cut selected text,
  4.  
  5. # ctrl+v to paste, and ctrl+/ to select all
  6.   # count words in a text and show the first ten items
  7.  # by decreasing frequency
  8.  
  9. import Tkinter as tk
  10. import os, glob
  11. import sys
  12. import string
  13. import re
  14. import tkFileDialog      
  15. def most_frequant_word():    
  16.  browser= tkFileDialog.askdirectory()
  17.  #browser= os.listdir(a)
  18.  
  19.  
  20.  for root, dirs, files in os.walk(browser):
  21.     print 'Looking into %s' % root.split('\\')[-1]
  22.     print 'Found %d dirs and %d files' % (len(dirs), len(files))
  23.     #text1.insert(tk.INSERT,'Looking into %s' % root.split('\\')[-1])
  24.     #text1.insert(tk.INSERT, 'Found %d dirs and %d files' % (len(dirs), len(files)))
  25.     for idx, file in enumerate(files):
  26.      print 'File #%d: %s' % (idx + 1, file)
  27.       #text1.insert(tk.INSERT, 'File #%d: %s' % (idx + 1, file))
  28.      ff = open (os.path.join(root, file), "r")
  29.      text = ff.read ( )
  30.      ff.close ( )
  31.      word_freq = {}
  32.  
  33.      word_list = text.strip().split()
  34.  
  35.      for word in word_list:
  36.       word = word.lower().rstrip('.,/"-_;\\[]()')
  37.  
  38.       if word.isalpha():
  39.                 # build the dictionary
  40.        count = word_freq.get(word, 0)
  41.        word_freq[word] = count + 1
  42.  
  43.        # create a list of (freq, word) tuples
  44.        freq_list = [(freq, word) for word, freq in word_freq.items()]
  45.  
  46.        # sort the list by the first element in each tuple (default)
  47.        freq_list.sort(reverse=True)
  48.  
  49.      for n, tup in enumerate(freq_list):
  50.     # print the first ten items
  51.       if n < 50:
  52.         print "%s times: %s" % tup
  53.         text1.insert(tk.INSERT, freq)
  54.         text1.insert(tk.INSERT, word)
  55.         text1.insert(tk.INSERT, "\n")
  56.  
  57.  raw_input('\nHit enter to exit')
  58.  
  59. root = tk.Tk(className = " most_frequant_word")
  60. # text entry field, width=width chars, height=lines text
  61. v1 = tk.StringVar()
  62. text1 = tk.Text(root, width=50, height=20, bg='green')
  63. text1.pack()
  64. # function listed in command will be executed on button click
  65. button1 = tk.Button(root, text='Brows', command=most_frequant_word)
  66. button1.pack(pady=5)
  67. text1.focus()
  68. root.mainloop()
code try to insert
Expand|Select|Wrap|Line Numbers
  1.  print "%s times: %s" % tup
  2.         text1.insert(tk.INSERT, freq)
  3.         text1.insert(tk.INSERT, word)
  4.         text1.insert(tk.INSERT, "\n")
when I wont to insert fil name and directory to the textbox it will hang also
code is comment

Expand|Select|Wrap|Line Numbers
  1. print 'Looking into %s' % root.split('\\')[-1]
  2.     print 'Found %d dirs and %d files' % (len(dirs), len(files))
  3.     #text1.insert(tk.INSERT,'Looking into %s' % root.split('\\')[-1])
  4.     #text1.insert(tk.INSERT, 'Found %d dirs and %d files' % (len(dirs), len(files)))
  5.     for idx, file in enumerate(files):
  6.      print 'File #%d: %s' % (idx + 1, file)
  7.       #text1.insert(tk.INSERT, 'File #%d: %s' % (idx + 1, file))
Mar 21 '08 #12

Sign in to post your reply or Sign up for a free account.

Similar topics

2
by: Paul | last post by:
I am creating a Program for college, in which the Program will read a Folder and create a HTML page from the pictures that are storrd in that folder. . What would be the best way to do...
0
by: Paul | last post by:
I am creating a Program for college, in which the Program will read a Folder and create a HTML page from the pictures that are storrd in that folder. .. What would be the best way to do it in VB...
3
by: Davie | last post by:
I am new to .Net, so apologies if this is a simple question. I need a way of display folders and files to my users. However, it must show whether they have NTFS access to the file/folder. For...
7
by: tonelab | last post by:
I have an aspx page that does not have a separate source for the VB - it is on top in between the <script> tags. I use the following statement Dim oComLib as New ComLib To reference a class...
3
by: eholz1 | last post by:
Hello PHP Group, I am having trouble setting permissions correctly so that the magickwand api (php 5.2) can read and write images. I usually read a file from one directory, create a magickwand...
2
by: =?Utf-8?B?SmVmZnJleQ==?= | last post by:
I made a typo on a Project name (e.g. Wong, instead of Wang). Later on I renamed the Soution, Project, WebForm., etc, except the file folder name, back to Wang. Then after I closed the VS.net and...
10
by: kai | last post by:
Hi, All I am trying to create a file folder for any login user, and create sub folders for the user on a web page. After the user login again, he can only sees his own folder on the Web page. I am...
1
by: cbz9633 | last post by:
hi i want to read a folder name, after that read folder names within that folder and at last the file name and access it, that i know but i dont know how to read a folder name. please help
0
parshupooja
by: parshupooja | last post by:
Contact Reply 1 point Member propoo Joined on 08-31-2007, 10:32 PM Posts 3 Hey all ,
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.