Bytes IT Community

Extract data from a file and write it to another file

P: 2
I want to open a word file,
check it against my list of words or phrases to extract
(such as Monday_Tuesday, Happy_birthday, etc.),
write the words or phrases that are found to another file,
and also state which words or phrases in my list were not found.
Jul 27 '12 #1
6 Replies

P: 2
I know nothing about Python. My knowledge at this time is only that I downloaded it and wrote hello world.
Jul 27 '12 #2

Jory R Ferrell
P: 62
First, you should realize that your question is unlikely to receive very many responses besides mine. Your question asks for a lot of higher-level (higher than noob level :P) concepts to be addressed, but you skipped a very important step: you should have made an attempt to write your own code, showing what level of knowledge you have about lists, dictionaries, indexing, function calls, etc. This shows you made an effort and are not trying to have someone else whip up a highly efficient piece of code so that you avoid having to put in the effort yourself.

You asked for many things in your code but provided none of your own progress to build off of. But I do understand why you did. I myself have inadvertently done it in the past. Just try to keep this in mind for the future.
Jul 28 '12 #3

Jory R Ferrell
P: 62
#------------------------------------------------------------------------------#
#                            Preparation of Data                               #
#------------------------------------------------------------------------------#
# "user_input" holds all words and/or phrases that you would like to search for.

user_input = raw_input('Please input words and phrases to search for, separating each standalone term with a comma. ')

# split(',') separates the search terms at the commas you are asked to use at input.
# strip() then removes the stray spaces left around each term.
user_input = [term.strip() for term in user_input.split(',') if term.strip()]

print user_input

# Single words are fairly simple to search for; phrases take more work.
# "phrases" holds all...well...phrases, and "words" holds the single words. :P
# Two new lists are built here because removing items from user_input while
# looping over it makes the loop skip elements.
phrases = [term for term in user_input if len(term.split()) > 1]  # split() with no arguments separates on whitespace.
words = [term for term in user_input if len(term.split()) == 1]


#------------------------------------------------------------------------------#
#                            Search Data For Matches                           #
#------------------------------------------------------------------------------#

path_to_file = 'C:/Users/JRFerrell/Desktop/sample_parse.txt'  # My example path; replace it with your own.

search_text = open(path_to_file, 'r')  # You can find several good tutorials on youtube for dealing with file IO.

matches = []

# We'll need to combine two lines at a time in order to search for phrases.
# Phrases could be split over multiple lines, so you'll need conventions for dealing with that.
# "prev_line" stores the previous line and is combined with the current line to form a new line to search.
prev_line = ''

for line in search_text:  # Each line is counted as a separate, complete object in itself.
    new_line = prev_line + line  # Create a new line from the current and previous line.
    prev_line = line  # Re-assign the previous line variable in preparation for the next search.

    # Splitting on whitespace leaves punctuation attached, so "Hello there!" gives "there!", not "there".
    # Stripping common punctuation from both ends of every word fixes that before comparing.
    new_line = [word.strip('!?.,;:') for word in new_line.split()]

    # SEARCH FOR SINGLE WORDS #
    for var in words:
        if var in new_line and var not in matches:
            matches.append(var)  # You could also write the match to another file here, for example: out_file.write(var + '\n')

    # SEARCH FOR PHRASES #
    for var in phrases:  # If "phrases" is empty, this loop is simply skipped.
        var_split = var.split()
        length = len(var_split)  # Number of words in the phrase.

        # Go through each position in the line and compare the word at that position,
        # together with the words after it, against the words of the phrase.
        # For a phrase of 3 words this checks new_line[index], new_line[index+1] and new_line[index+2].
        for index in range(len(new_line) - length + 1):
            if new_line[index:index + length] == var_split and var not in matches:
                matches.append(var)

search_text.close()

for match in matches:
    print 'Match:', match
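
The phrase search described in the comments above (compare each word, together with the words that follow it, against the phrase) can be sketched on its own as a sliding window over the word list. This is only a minimal illustration, written for Python 3 rather than the Python 2 used in the post; the helper names `tokenize` and `contains_phrase` are made up for this example and do not come from the original code.

```python
import string


def tokenize(text):
    """Split text on whitespace and strip punctuation from both ends of each word."""
    words = (word.strip(string.punctuation) for word in text.split())
    return [word for word in words if word]


def contains_phrase(text, phrase):
    """Return True if the words of `phrase` appear consecutively in `text`."""
    words = tokenize(text)
    target = tokenize(phrase)
    size = len(target)
    if size == 0:
        return False
    # Slide a window of `size` words across the text and compare each slice.
    return any(words[i:i + size] == target for i in range(len(words) - size + 1))


print(contains_phrase("Hello there! Happy\nbirthday, Sam.", "Happy birthday"))
print(contains_phrase("Hello there!", "Hello Sam"))
```

Because the text is split on any whitespace before comparing, a phrase that is broken across a line ending is still found, and a single word is just a phrase of length one.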
Jul 29 '12 #4

Jory R Ferrell
P: 62
I am going to try to read up on regular expressions and see if I can write something a little more streamlined.
Jul 29 '12 #5

Jory R Ferrell
P: 62
So...I stopped being lazy last night(....sorta.... :P), and
I figured out some issues that were holding me back.
The code below works as far as I have tested it (a few lines of a short test txt doc). It may not be very efficient for extremely large searches, but it'll do for short, quick work, like the last code. Anyways, it turns out that Python ships with a standard-library module called "re". This stands for Regular Expressions. This module is purpose-built for searching strings for a match of a user-defined pattern.
This is usually faster and more reliable than a custom, self-built franken-parser (unless you know what is efficient, memory-wise...I do not. :P), because the matching engine has been optimized by serious programmers. :) All you have to do with the module is set up the text to be searched, create a way to iterate through the text, and condition the user input so it can be used as a search pattern.
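
As a rough sketch of the "condition the user input" step, the comma-separated input can be turned into one compiled pattern per term. This is written for Python 3, and the function name `build_patterns` and the sample text are assumptions made for the example. `re.escape` and `re.compile` are real functions of the `re` module, and `\b` in a pattern marks a word boundary.

```python
import re


def build_patterns(raw_terms):
    """Map each comma-separated term to a compiled whole-word pattern."""
    patterns = {}
    for term in raw_terms.split(','):
        term = term.strip()  # drop the spaces left around each term
        if term:
            # re.escape makes characters such as '.' or '?' match literally.
            patterns[term] = re.compile(r'\b' + re.escape(term) + r'\b')
    return patterns


patterns = build_patterns('Monday_Tuesday, Happy birthday, cat')
text = 'Happy birthday! The catalog lists Monday_Tuesday.'
found = [term for term, pattern in patterns.items() if pattern.search(text)]
print(found)
```

Here "cat" is not reported, because the word boundary after it fails inside "catalog".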

When you go to use re.search, keep in mind that regular expression patterns use backslashes in a way that you won't commonly run into as a beginner (like me). Patterns are usually written as "raw string literals", for example: 'Hello' becomes r'Hello', with an 'r' in front. re.search does not actually require this; a raw string is just a way of typing a string in your source code so that Python leaves the backslashes alone. When you want to match the exact string (the word or phrase) as is, and not as part of a longer word, you wrap it in the '\b' word-boundary marker (so r'\bHello\b'), which is part of regular expression syntax. At first I thought I had to convert my strings into raw strings, but there is nothing to convert: the 'r' prefix only affects literals typed in the code, and text that comes from raw_input or from a file is already stored exactly as typed. The catch is in ordinary (non-raw) literals, where Python uses the backslash ('\') as an escape character: '\b' is read as a single backspace character, not as a backslash followed by 'b'. So without the 'r' prefix you have to escape the backslash itself: '\b' becomes '\\b'. This is the exact thing a raw string is meant to save you from typing, so r'\bHello\b' and '\\bHello\\b' are the same string, and you can simply concatenate '\\b' (or r'\b') onto the search term wherever it's needed: '\\b' + var + '\\b'. Forgetting the second backslash will lead to confusion, because the pattern then silently looks for backspace characters. You have been warned. ;P
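
The backslash behaviour can be checked directly. The short Python 3 sketch below only uses the `re` module; the variable names and sample sentences are made up for the illustration.

```python
import re

# A raw literal and a doubled-backslash literal are the same two-character sequence.
same_string = (r'\bHello\b' == '\\bHello\\b')

# In an ordinary literal, '\b' is one backspace character; in a raw literal it is two characters.
plain_length = len('\b')
raw_length = len(r'\b')

term = 'Hello'  # could just as well come from input() or a file
pattern = r'\b' + re.escape(term) + r'\b'

whole_word = re.search(pattern, 'Oh, Hello there')       # finds the standalone word
inside_word = re.search(pattern, 'Hellos there')         # None: no boundary after "Hello"
backspaces = re.search('\bHello\b', 'Oh, Hello there')   # None: looks for backspace characters

print(same_string, plain_length, raw_length)
print(whole_word is not None, inside_word is None, backspaces is None)
```

The last search is the silent failure mode: the pattern is valid, it just asks for characters that are not in the text.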
import re

#------------------------------------------------------------------------------#
#                            Preparation of Data                               #
#------------------------------------------------------------------------------#
# "user_input" holds all words and/or phrases that you would like to search for.

user_input = raw_input('Please input words and phrases to search for, separating each standalone term with a comma. ')

# split(',') separates the search terms at the commas you are asked to use at input.
# strip() then removes the stray spaces left around each term.
user_input = [term.strip() for term in user_input.split(',') if term.strip()]

print user_input


#------------------------------------------------------------------------------#
#                            Search Data For Matches                           #
#------------------------------------------------------------------------------#

path_to_file = 'C:/Users/JRFerrell/Desktop/sample_parse.txt'  # My example path; replace it with your own.

search_text = open(path_to_file, 'r')
# You can find several good tutorials on youtube for dealing with file IO.

matches = []

prev_line = ''

for line in search_text:  # Each line is counted as a separate, complete object in itself.
    new_line = prev_line + line  # Create a new line from the current and previous line.
    prev_line = line  # Re-assign the previous line variable in preparation for the next search.

    # SEARCH FOR WORDS AND PHRASES #
    for var in user_input:
        if var not in matches:  # Terms that were already found are skipped instead of being removed from the list mid-loop.
            # re.escape() makes characters such as '.' or '?' in the term match literally.
            # Joining with '\\s+' lets the words of a phrase be separated by any whitespace, including a line break.
            pattern = '\\b' + '\\s+'.join(re.escape(word) for word in var.split()) + '\\b'
            match = re.search(pattern, new_line)
            if match:  # re.search returns None when nothing matches, so check before using the result.
                matches.append(var)

search_text.close()

for match in matches:
    print 'Match:', match

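
The original question also asked for the found terms to be written to another file and for the missing terms to be reported, which the code above does not do. A minimal sketch of those two steps follows, assuming Python 3 and a source file small enough to read in one go. The function name `extract_terms`, the file names, and the sample text are all hypothetical; the demonstration uses a temporary directory so that no real files are touched.

```python
import os
import re
import tempfile


def extract_terms(source_path, output_path, terms):
    """Write the terms found in source_path to output_path; return (found, missing)."""
    with open(source_path, 'r') as source:
        text = source.read()

    found, missing = [], []
    for term in (t.strip() for t in terms):
        if not term:
            continue
        # Words of a phrase may be separated by any whitespace, including a line break.
        body = r'\s+'.join(re.escape(word) for word in term.split())
        if re.search(r'\b' + body + r'\b', text):
            found.append(term)
        else:
            missing.append(term)

    with open(output_path, 'w') as output:
        for term in found:
            output.write(term + '\n')
    return found, missing


# Demonstration with made-up data.
with tempfile.TemporaryDirectory() as folder:
    source_path = os.path.join(folder, 'source.txt')
    output_path = os.path.join(folder, 'found.txt')
    with open(source_path, 'w') as handle:
        handle.write('Today is Monday_Tuesday. Happy\nbirthday to you!\n')

    found, missing = extract_terms(
        source_path, output_path,
        ['Monday_Tuesday', 'Happy birthday', 'Merry Christmas'])
    with open(output_path) as handle:
        written = handle.read().splitlines()

print('Found:', found)
print('Not found:', missing)
```

Reading the whole file at once removes the need to stitch the previous and current line together; for very large files, a line-by-line loop like the one in the post would use less memory.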
If you have any questions, or you find a mistake in my code, please let me know. Have fun.
Aug 4 '12 #6

numberwhun
Expert Mod 2.5K+
P: 3,503
It is recommended that you find one of the plethora of Python tutorials on the internet and go through it. Python is fun and easier than one would think, especially for beginners.

Regards,

Jeff
Aug 7 '12 #7
