Bytes | Software Development & Data Engineering Community

Extract data from a file and write it to another file

2 New Member
I want to open a word file,
check it against my list of words or phrases to extract
(such as Monday_Tuesday, Happy_birthday, etc.),
write the matching words or phrases to another file,
and also report which words or phrases in my list were not found.
Jul 27 '12 #1
charley Situ
2 New Member
I know nothing about Python. So far, all I have done is download it and write a Hello World program.
Jul 27 '12 #2
Jory R Ferrell
62 New Member
First, you should realize that your question is unlikely to receive many responses besides mine. It asks for a lot of higher-level concepts (higher than noob level :P) to be addressed, but you skipped a very important step: you should have made an attempt to write your own code. Show what you already know about lists, dictionaries, indexing, function calls, etc. That shows you made an effort and are not just asking someone else to whip up a highly efficient piece of code so you can avoid putting in the effort yourself.

You asked for many things in your question but provided none of your own progress to build on. I do understand why; I have inadvertently done the same in the past. Just try to keep this in mind for the future.
Jul 28 '12 #3
Jory R Ferrell
62 New Member
import string

#------------------------------------------------------------------------------#
#                            Preparation of Data                               #
#------------------------------------------------------------------------------#
# "user_input" holds all words and/or phrases that you would like to search for.

user_input = input('Please input words and phrases to search for, separating each standalone term with a comma: ')

# split(',') separates the search terms at the commas you are asked to use at input;
# strip() removes the spaces around each term, and empty terms are dropped.
user_input = [term.strip() for term in user_input.split(',') if term.strip()]

print(user_input)

# I am not entirely sure how to search for phrases efficiently, but single words are fairly simple.
# Build two new lists instead of removing items from user_input while looping over it:
# removing from a list during iteration skips the element after each removal.
# ' '.join(term.split()) collapses any repeated spaces inside a phrase.
words = [term for term in user_input if len(term.split()) == 1]
phrases = [' '.join(term.split()) for term in user_input if len(term.split()) > 1]

#------------------------------------------------------------------------------#
#                            Search Data For Matches                           #
#------------------------------------------------------------------------------#

path_to_file = 'C:/Users/JRFerrell/Desktop/sample_parse.txt'  # Replace with the path to your own file.
# You can find several good tutorials on YouTube for dealing with file IO.

matches = []

# Phrases could be split over multiple lines, so we combine two lines at a time before searching.
# "prev_line" stores the previous line; it is combined with the current line to form a new line to search.
# Because every line is searched twice this way, each term is recorded only once.
prev_line = ''

with open(path_to_file, 'r') as search_text:  # "with" closes the file when the loop ends.
    for line in search_text:  # Each line is read as a separate string.
        new_line = prev_line + line  # Create a new line from the previous and current line.
        prev_line = line  # Keep the current line for the next pass.

        # split() leaves punctuation attached, so splitting "Hello there!" gives "there!", not "there".
        # Strip punctuation from each word of the text (not from the search term) before comparing.
        line_words = [w.strip(string.punctuation) for w in new_line.split()]
        line_words = [w for w in line_words if w]  # Drop "words" that were only punctuation, such as "-".

        # SEARCH FOR SINGLE WORDS #
        for var in words:
            if var in line_words and var not in matches:
                matches.append(var)  # You could also write the match to another file here, e.g. out_file.write(var + '\n').

        # SEARCH FOR PHRASES #
        # If "phrases" is empty, this loop does nothing, so the long and laborious search is skipped.
        for var in phrases:
            length = len(var.split())  # Number of words in the phrase.

            # For each position in the line, take the word there plus the (length - 1) words after it.
            # So if a phrase is 3 words long, compare it with line_words[index:index + 3] for every index:
            # each word in the line combined with the two words after it.
            for index in range(len(line_words) - length + 1):
                search_term = ' '.join(line_words[index:index + length])
                if var == search_term and var not in matches:
                    matches.append(var)

for match in matches:
    print('Match:', match)
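The sliding-window idea described in the comments above (compare a phrase of n words with every run of n consecutive words in the line, after stripping punctuation) can be pulled out into a small function. This is a minimal sketch; the name `contains_phrase` is hypothetical, and treating only leading and trailing `string.punctuation` characters as punctuation is an assumption.

```python
import string


def contains_phrase(text, phrase):
    """Return True if the words of phrase appear consecutively in text.

    Punctuation at the start or end of each word in text is ignored, so
    "Happy birthday!" still matches the phrase "Happy birthday".
    """
    words = [w.strip(string.punctuation) for w in text.split()]
    target = phrase.split()
    n = len(target)
    # Slide a window of n words across the text, one position at a time.
    for start in range(len(words) - n + 1):
        if words[start:start + n] == target:
            return True
    return False
```

Slicing by position also keeps repeated words apart, whereas `list.index()` always returns the position of a word's first occurrence.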
Jul 29 '12 #4
Jory R Ferrell
62 New Member
I am going to try to read up on regular expressions and see if I can write something a little more streamlined.
Jul 29 '12 #5
Jory R Ferrell
62 New Member
So... I stopped being lazy last night (sort of :P) and
figured out some issues that were holding me back.
The code below works as far as I have tested it (a few lines of a short test .txt document). It may not be very efficient for extremely large searches, but, as I said about my last code, it will do for short, quick jobs. Anyway, it turns out that Python has a built-in module called "re", short for regular expressions. This module is purpose-built for searching strings for matches of a user-defined pattern.
This is more efficient than a custom, self-built franken-parser (unless you know what is efficient memory-wise... I do not. :P), because it has been optimized by serious programmers. :) All you have to do is set up the text to be searched, create a way to iterate through the text, and condition the user input so it can be used as a search pattern.
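A minimal sketch of the `re.search` behaviour this relies on: it scans the string for the first place the pattern matches and returns a match object, or `None` when nothing matches, so the result should be checked before calling `.group()`. The helper name `first_match` is hypothetical.

```python
import re


def first_match(pattern, text):
    """Return the first substring of text matching pattern, or None."""
    match = re.search(pattern, text)  # A match object, or None when nothing matches.
    if match is None:
        return None
    return match.group(0)  # group(0) is the whole matched substring.
```

For example, `first_match(r'\bthere\b', 'Hello there!')` gives `'there'`, while `first_match(r'\bthe\b', 'Hello there!')` gives `None`, because the word boundary `\b` stops "the" from matching inside "there".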

When you go to use re.search, keep in mind that regular expressions use backslashes in a way you won't commonly run into as a beginner (like me). Patterns are usually written as raw string literals, for example r'Hello' instead of 'Hello', with an 'r' in front, so that Python passes every backslash through to the 're' module untouched. To match a word or phrase only as a whole word (so that 'the' does not match inside 'there'), you wrap it in the word-boundary marker '\b', as in r'\bHello\b'; '\b' is part of the regular-expression syntax that 're' understands. But the 'r' prefix only applies to literals typed into your code, so I couldn't turn the strings read from user input into raw strings. Luckily, they don't need to be: only the '\b' pieces concatenated around the input contain backslashes. In an ordinary string, Python treats the backslash as an escape character, so '\b' is turned into a single backspace character before 're' ever sees it, and the pattern quietly fails to match. The fix is to escape the backslash itself: '\b' becomes '\\b', which is exactly the job a raw string does for you. So r'\bHello\b' and '\\bHello\\b' are the same string, and '\\b' + var + '\\b' (or r'\b' + var + r'\b') builds a whole-word pattern from user input. Not doing this will lead to confusion. You have been warned. ;P
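To see the backslash problem concretely: in an ordinary string literal '\b' is one character (backspace), while '\\b' and r'\b' are both the two characters backslash and 'b'. The sketch below builds a whole-word pattern from a runtime string. Wrapping the term in `re.escape()` is an addition not in the post above; it makes characters such as '.' or '?' in user input match literally instead of acting as regex syntax. The function name `whole_word_pattern` is hypothetical.

```python
import re

# In an ordinary string literal, '\b' is one backspace character (code 8),
# not the two characters the regex engine needs for a word boundary.
print(len('\b'), ord('\b'))  # 1 8
print('\\b' == r'\b')        # True: both are a backslash followed by 'b'


def whole_word_pattern(term):
    """Hypothetical helper: build a pattern matching term only as a whole word.

    re.escape() makes characters such as '.' or '?' in the term match literally.
    """
    return '\\b' + re.escape(term) + '\\b'


# The backspace version never matches; the escaped-backslash version does.
print(bool(re.search('\b' + 'Hello' + '\b', 'Hello there')))        # False
print(bool(re.search(whole_word_pattern('Hello'), 'Hello there')))  # True

# Without re.escape(), '.' matches any character, so '3.5' would match '345'.
print(bool(re.search('\\b3.5\\b', 'version 345')))                  # True
print(bool(re.search(whole_word_pattern('3.5'), 'version 345')))    # False
```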
import re

#------------------------------------------------------------------------------#
#                            Preparation of Data                               #
#------------------------------------------------------------------------------#
# "user_input" holds all words and/or phrases that you would like to search for.

user_input = input('Please input words and phrases to search for, separating each standalone term with a comma: ')

# split(',') separates the search terms at the commas you are asked to use at input;
# strip() removes the spaces around each term, and empty terms are dropped.
user_input = [term.strip() for term in user_input.split(',') if term.strip()]

print(user_input)

#------------------------------------------------------------------------------#
#                            Search Data For Matches                           #
#------------------------------------------------------------------------------#

path_to_file = 'C:/Users/JRFerrell/Desktop/sample_parse.txt'
# Replace my example path with the path to your own file.
# You can find several good tutorials on YouTube for dealing with file IO.

matches = []

prev_line = ''

with open(path_to_file, 'r') as search_text:  # "with" closes the file when the loop ends.
    for line in search_text:  # Each line is read as a separate string.
        new_line = prev_line + line  # Create a new line from the previous and current line.
        prev_line = line  # Keep the current line for the next pass.

        # SEARCH FOR WORDS AND PHRASES #
        for var in user_input:
            if var in matches:
                continue  # Already found, so skip it instead of removing it from the list mid-loop.
            # re.escape() makes characters such as '.' or '?' match literally, and \s+ between
            # the words lets a phrase match even when a line break separates them.
            pattern = '\\b' + '\\s+'.join(re.escape(word) for word in var.split()) + '\\b'
            match = re.search(pattern, new_line)
            if match:  # re.search() returns None when there is no match.
                matches.append(var)

for match in matches:
    print('Match:', match)
If you have any questions, or you find a mistake in my code, please let me know. Have fun.
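The original request also asked to write the found terms to another file and to report which ones were not found. Below is a minimal sketch of both steps, built on the same whole-word `re.search` idea, under assumptions not stated in the thread: the input is a plain-text file in UTF-8 (a Word document would first need saving as .txt), matching is case-sensitive, and the names `extract_terms`, `save_matches`, `input.txt` and `found.txt` are hypothetical.

```python
import re


def extract_terms(text, terms):
    """Split terms into (found, not_found) lists for the given text.

    Each term is matched case-sensitively as a whole word or phrase;
    the words of a phrase may be separated by any whitespace,
    including a line break.
    """
    found, not_found = [], []
    for term in terms:
        # Escape each word so characters such as '.' or '?' match literally.
        words = [re.escape(word) for word in term.split()]
        pattern = r'\b' + r'\s+'.join(words) + r'\b'
        if re.search(pattern, text):
            found.append(term)
        else:
            not_found.append(term)
    return found, not_found


def save_matches(input_path, output_path, terms):
    """Write each found term to output_path, one per line; return the terms not found."""
    with open(input_path, encoding='utf-8') as src:
        text = src.read()
    found, not_found = extract_terms(text, terms)
    with open(output_path, 'w', encoding='utf-8') as out:
        out.writelines(term + '\n' for term in found)
    return not_found
```

Usage, with hypothetical file names: `missing = save_matches('input.txt', 'found.txt', ['Monday_Tuesday', 'Happy_birthday'])`, then `print('Not found:', missing)`. Because the underscore counts as a word character for `\b`, searching for 'Monday' will not match inside 'Monday_Tuesday'.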
Aug 4 '12 #6
numberwhun
3,509 Recognized Expert Moderator Specialist
It is recommended that you find one of the plethora of Python tutorials on the internet and work through it. Python is fun and easier than one would think, especially for beginners.

Regards,

Jeff
Aug 7 '12 #7


