Bytes | Software Development & Data Engineering Community

need help extracting data from a text file

Hey there,
I have a text file with a bunch of values scattered throughout it.
I need to pull out a value that is in parentheses right after a
certain word.
For example, the first time the word 'foo' is found, retrieve the value in the
next set of parentheses, (bar), and return 'bar'.

I think I can use re to do this, but is there some easier way?
Thanks

Nov 7 '05 #1

well, you can use string.find with offsets, but an re is probably a
cleaner way to go. I'm not sure which way is faster - it'll depend on
how many times you're searching compared to the overhead of setting up
an re.

start = textfile.find("foo(") + 4  # 4 being how long 'foo(' is
end = textfile.find(")", start)
value = textfile[start:end]

Iain
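
The speed question above can be checked rather than guessed. Below is a rough sketch using the standard timeit module. The sample text and the helper names with_find and with_re are made up for illustration, and the numbers will vary by machine and input, so treat it as a way to measure, not as a result.

```python
import re
import timeit

# Hypothetical sample text; the real file contents are not shown in the thread.
text = "filler " * 200 + "foo(bar) " + "filler " * 200

# Compiled once, so the regex setup cost is paid a single time.
FOO_RE = re.compile(r"foo\((.*?)\)")


def with_find(s):
    """Return the text inside the first foo(...), or None when it is absent."""
    pos = s.find("foo(")
    if pos == -1:
        return None
    start = pos + len("foo(")
    end = s.find(")", start)
    return s[start:end] if end != -1 else None


def with_re(s):
    """Same lookup using the precompiled regular expression."""
    match = FOO_RE.search(s)
    return match.group(1) if match else None


find_seconds = timeit.timeit(lambda: with_find(text), number=1000)
re_seconds = timeit.timeit(lambda: with_re(text), number=1000)
print(f"str.find: {find_seconds:.4f}s  re: {re_seconds:.4f}s")
```

For a script that runs a handful of times a day, either approach is likely fast enough that readability matters more than the timing.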

Nov 7 '05 #2
This is cool, it is only going to run about 10 times a day.

The text is not written out like foo(bar), it's more like
foo blah blah blah (bar)

The thing is, every few days the structure of the text file may change,
one of the reasons I wanted to avoid the re.

Thanks for the tip.

Nov 7 '05 #3

then I guess you worked this out, but just for completeness:

keywordPos = textfile.find("foo")
start = textfile.find("(", keywordPos) + 1  # + 1 to skip past the '(' itself
end = textfile.find(")", start)
value = textfile[start:end]

Iain
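
One thing to watch with the find approach: str.find returns -1 when nothing is found, and the slice then quietly returns the wrong text instead of failing. Here is a minimal sketch of the same steps with that case handled. The function name value_after is invented for illustration and is not from the thread.

```python
def value_after(text, keyword):
    """Return the text in the first (...) after the first `keyword`, else None."""
    keyword_pos = text.find(keyword)
    if keyword_pos == -1:
        return None  # keyword never appears
    # Start looking for '(' only after the keyword itself.
    open_pos = text.find("(", keyword_pos + len(keyword))
    if open_pos == -1:
        return None  # no opening parenthesis after the keyword
    close_pos = text.find(")", open_pos + 1)
    if close_pos == -1:
        return None  # parenthesis is never closed
    return text[open_pos + 1:close_pos]


print(value_after("foo blah blah blah (bar)", "foo"))
```

Returning None for every "not found" case lets the caller decide what to do when the file layout changes, which the original poster says happens every few days.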

Nov 7 '05 #4
um, wait. what you are doing here is easier than what i was doing after
your first post.
thanks a lot. this is going to work out ok.

thanks again.
sk

Nov 7 '05 #5
Using string methods to locate the 'foo' instances is by far the fastest way
to go.

If your requirements get more complicated, look into using pyparsing
(http://pyparsing.sourceforge.net). Here is a pyparsing rendition of this
problem. This does three scans through some sample data - the first lists
all matches, the second ignores matches if they are found inside a quoted
string, and the third reports only the third match. This kind of
context-sensitive matching gets trickier with basic string and re tools.

-- Paul

data = """
i have a text file with a bunch of foo(bar1) values scattered throughout it.
i am needing to pull out a value that foo(bar2) is in parenthesis right
after a
certain word,
like the foo(bar3) first time the word 'foo' is found, retrieve the values
in the
next set of parenthesis foo(bar4) and it would return 'bar'
do we want to skip things in quotes, such as 'foo(barInQuotes)'?
"""

from pyparsing import Literal, SkipTo, quotedString, ParseException

pattern = Literal("foo") + "(" + SkipTo(")").setResultsName("payload") + ")"

# report all occurrences of xxx found in "foo(xxx)"
for tokens, start, end in pattern.scanString(data):
    print(tokens.payload, "at location", start)
print()

# ignore quoted strings
pattern.ignore(quotedString)
for tokens, start, end in pattern.scanString(data):
    print(tokens.payload, "at location", start)
print()

# only report 3rd occurrence
tokenMatch = {'foo': 0}
def thirdTimeOnly(strg, loc, tokens):
    word = tokens[0]
    if word in tokenMatch:
        tokenMatch[word] += 1
        # rejecting the match makes scanString skip it and keep looking
        if tokenMatch[word] != 3:
            raise ParseException(strg, loc, "wrong occurrence of token")

pattern.setParseAction(thirdTimeOnly)
for tokens, start, end in pattern.scanString(data):
    print(tokens.payload, "at location", start)
print()

Prints:
bar1 at location 36
bar2 at location 116
bar3 at location 181
bar4 at location 278
barInQuotes at location 360

bar1 at location 36
bar2 at location 116
bar3 at location 181
bar4 at location 278

bar3 at location 181
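
For comparison, the "third match only" case can also be sketched with just the standard library, using re.finditer and itertools.islice. The sample string and the name nth_payload are made up here, and this version makes no attempt to skip matches inside quoted strings, which is the part pyparsing handles above.

```python
import re
from itertools import islice

# Hypothetical sample data, much shorter than the text used above.
data = "foo(bar1) x foo(bar2) y foo(bar3) z foo(bar4)"

FOO_RE = re.compile(r"foo\(([^)]*)\)")


def nth_payload(text, n):
    """Return (payload, start) for the n-th foo(...) match (1-based), or None."""
    matches = FOO_RE.finditer(text)
    # islice skips the first n - 1 matches lazily; next() takes the n-th if any.
    match = next(islice(matches, n - 1, None), None)
    if match is None:
        return None
    return match.group(1), match.start()


print(nth_payload(data, 3))
```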
Nov 7 '05 #6
It's pretty easy with an re:
>>> import re
>>> fooRe = re.compile(r'foo.*?\((.*?)\)')
>>> fooRe.search('foo(bar)').group(1)
'bar'
>>> fooRe.search('This is a foo bar baz blah blah (bar)').group(1)
'bar'

Kent
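
A sketch of how the same pattern might be wrapped for use on a file. The helper name first_value_after and the sample file are assumptions for illustration. Two things are added to the session above: re.escape, so a keyword containing regex metacharacters is matched literally, and a None return instead of an AttributeError when search finds nothing.

```python
import re
import tempfile
from pathlib import Path


def first_value_after(text, keyword):
    """Return the first parenthesised value following `keyword`, or None."""
    # re.escape keeps keywords containing regex metacharacters literal.
    pattern = re.compile(re.escape(keyword) + r".*?\((.*?)\)")
    match = pattern.search(text)
    return match.group(1) if match else None


# Demo file created on the fly; a real script would point at its own path.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("header line\nThis is a foo bar baz blah blah (bar)\n")
    print(first_value_after(sample.read_text(), "foo"))
```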
Nov 7 '05 #7
On Mon, 7 Nov 2005, Kent Johnson wrote:
ne*****@xit.net wrote:
i have a text file with a bunch of values scattered throughout it. i am
needing to pull out a value that is in parenthesis right after a
certain word, like the first time the word 'foo' is found, retrieve the
values in the next set of parenthesis (bar) and it would return 'bar'


It's pretty easy with an re:
>>> import re
>>> fooRe = re.compile(r'foo.*?\((.*?)\)')

Just out of interest, i've never really got into using non-greedy
quantifiers (i use them from time to time, but hardly ever feel the need
for them), so my instinct would have been to write this as:

>>> fooRe = re.compile(r"foo[^(]*\(([^)]*)\)")

Is there any reason to use one over the other?

>>> fooRe.search('foo(bar)').group(1)
'bar'
>>> fooRe.search('This is a foo bar baz blah blah (bar)').group(1)
'bar'


Ditto.
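
One observable difference between the two patterns, sketched below with made-up sample strings: by default `.` does not match a newline, while a negated character class such as `[^(]` does. So the two only agree when the keyword and the parentheses sit on the same line, unless the non-greedy version is compiled with re.DOTALL. The helper name grab is hypothetical.

```python
import re

lazy = re.compile(r"foo.*?\((.*?)\)")
negated = re.compile(r"foo[^(]*\(([^)]*)\)")
lazy_dotall = re.compile(r"foo.*?\((.*?)\)", re.DOTALL)


def grab(pattern, text):
    """Return group 1 of the first match, or None when there is no match."""
    match = pattern.search(text)
    return match.group(1) if match else None


same_line = "This is a foo bar baz blah blah (bar)"
split_lines = "foo blah blah\nblah (bar)"

for label, text in [("same line", same_line), ("split lines", split_lines)]:
    print(label, grab(lazy, text), grab(negated, text), grab(lazy_dotall, text))
```

On the same-line sample all three return 'bar'. On the split-line sample the plain non-greedy pattern finds nothing, while the negated-class and DOTALL versions still return 'bar'.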

tom

--
[of Muholland Drive] Cancer is pretty ingenious too, but its best to
avoid. -- Tex
Nov 9 '05 #8
