Extracting/finding strings from a list

Hi,

I have a very long string, something like:

DISPLAY=localhost:0.0,FORT_BUFFERED=true,

F_ERROPT1=271\,271\,2\,1\,2\,2\,2\,2,G03BASIS=/opt/g03b05/g03/basis,
GAMESS=/opt/gamess,GAUSS_ARCHDIR=/opt/g03b05/g03/arch,

GAUSS_EXEDIR=/opt/g03b05/g03/bsd:/opt/g03b05/g03/private:/opt/g03b05/g
03,GAUSS_SCR_ROOT=/home/561/345561/scratch,
GDVBASIS=/opt/g03b05/g03/basis,
GMAIN=/opt/g03b05/g03/bsd:/opt/g03b05/g03/private:/opt/g03b05/g03,
GROUP=e12,GV_DIR=/opt/g03b05/gv,HOST=sc1,
HOSTTYPE=alpha,INFOPATH=/opt/info,KMP_DUPLICATE_LIB_OK=TRUE,
KMP_STACKSIZE=10485760,
LINDA_FORTRAN=f90 -i8 -r8 -omp -reentrancy threaded,
LINDA_FORTRAN_LINK=f90 -i8 -r8 -omp -reentrancy threaded,
LOGNAME=345561,MACHTYPE=alpha,MAIL=/var/spool/mail/345561,

MANPATH=/opt/g03b05/g03/bsd:/usr/share/man:/usr/dt/share/man:/usr/loca

l/man:/opt/man:/opt/pbs/man:/opt/rash/man:/usr/opt/mpi/man:/usr/opt/mpi
/man:/usr/opt/mpi/man,MP_STACK_OVERFLOW=OFF,
NLSPATH=/usr/lib/nls/msg/%L/%N,OMP_NUM_THREADS=4,ONEEXE=-DONEEXE,
OSTYPE=osf1,

PERLLIB=/opt/g03b05/g03/bsd,PGI=/usr/pgi,PGIDIR=/usr/pgi/linux86/5.0,
POSTFL_FORTRAN=f90 -i8 -r8 -omp -reentrancy threaded,PROJECT=e12,
QCAUX=/opt/qchem-2.02/aux,QCPLATFORM=DEC_ALPHA,
RMS_PROJECT=e12,RUNCPP=/lib/cpp,SHELL=/opt/rash/bin/tcsh,SHLVL=1,
QUEUE=normal
and I need to extract the value of the variable "GAUSS_EXEDIR". Although
these are environment variables, I don't have access to them directly.
These variables are stored in a special file and I need to parse it to
be able to extract the variable. I wrote the following code for this:

...
import os

data = {'gauss_var': ''}
allLines = os.popen('cat ./somefile').readlines()
j = 0
# Extract GAUSS_EXEDIR from the string
# gaussCont: 0 = not found yet, 1 = value may continue on the next line, -1 = done
gaussCont = 0
vars = ['']  # placeholder so the loop test can run on the first pass
while len(allLines) > j and vars[0] != "QUEUE":
    var_line = allLines[j]
    var_toks = var_line.split()
    if not var_toks:
        # blank line, nothing to parse
        j += 1
        continue
    value = var_toks[0]
    variable_list = value.split(",")
    for k in range(len(variable_list)):
        if variable_list[k].find("=") == -1:
            if gaussCont == 1:
                # this piece is the wrapped tail of the GAUSS_EXEDIR value
                data['gauss_var'] = "%s%s" % (data['gauss_var'], variable_list[k])
                gaussCont = -1
                break
            # end if
        # end if

        vars = variable_list[k].split("=")
        for m in range(len(vars)):
            if vars[m] == "GAUSS_EXEDIR":
                for p in range(1, len(vars)):
                    data['gauss_var'] = "%s%s" % (data['gauss_var'], vars[p])
                # end for
                gaussCont = 1
            # end if
        # end for
    # end for
    j += 1
    if gaussCont == -1:
        break
    # end if
# end while
The reason I look for the word "QUEUE" is that "QUEUE" is the
last variable expected in a list. After this line, the file continues,
but the variables belong to another user. So basically it's a big file
full of env variables that belong to different users; each list begins
with "DISPLAY" and ends with "QUEUE".
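
The layout described above (one list per user, running from DISPLAY to QUEUE) can be sketched as follows. This is only an illustration: the function name split_user_blocks is hypothetical, and it assumes that DISPLAY= and QUEUE= never occur as the tail of a longer variable name and that the QUEUE value contains no commas or whitespace.

```python
import re

# Assumption: each user's list starts at a DISPLAY= entry and ends with the
# value of the next QUEUE= entry, as in the sample data.
_BLOCK = re.compile(r"\bDISPLAY=.*?\bQUEUE=[^,\s]*", re.DOTALL)


def split_user_blocks(text):
    """Return one string per user, each running from DISPLAY=... to QUEUE=..."""
    return _BLOCK.findall(text)
```

Each returned block can then be searched on its own, so a match never spills over into the next user's variables.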

Is there a much simpler way of doing this? That is, extracting/finding
specific variable/value pairs from a list/string? These loops take up a
lot of my time, and I'm trying to learn better ways of doing the same.
Thanks!
Steve

Jul 18 '05 #1
Steve wrote:
Hi,

I have a very long string, something like:

DISPLAY=localhost:0.0,FORT_BUFFERED=true,

F_ERROPT1=271\,271\,2\,1\,2\,2\,2\,2,G03BASIS=/opt/g03b05/g03/basis,
GAMESS=/opt/gamess,GAUSS_ARCHDIR=/opt/g03b05/g03/arch,

GAUSS_EXEDIR=/opt/g03b05/g03/bsd:/opt/g03b05/g03/private:/opt/g03b05/g
03,GAUSS_SCR_ROOT=/home/561/345561/scratch,
GDVBASIS=/opt/g03b05/g03/basis, <snipped>
and I need to extract the value of the variable "GAUSS_EXEDIR". Although
these are environment variables, I don't have access to them directly.
These variables are stored in a special file and I need to parse it to
be able to extract the variable. I wrote the following code for this:

...


How about a little RE:

import re

search_string = open("./somefile").read()
# re.VERBOSE ignores the line breaks inside the pattern;
# re.DOTALL lets the value run across a wrapped line
patt = """(?P<key>GAUSS_EXEDIR)
(?P<equals>=)
(?P<value>.*?)
(?P<comma>,)"""

compile_obj = re.compile(patt, re.IGNORECASE | re.DOTALL | re.VERBOSE)
match_obj = compile_obj.search(search_string)

# to search for the first match:
if match_obj:
    value = match_obj.group('value')
    print(value)

# to find all matches:
# findall returns (key, equals, value, comma) tuples, so index 2 is the value
match_obj = compile_obj.findall(search_string)
for match in match_obj:
    value = match[2]
    print(value)
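
A possible variation on the same idea, as a minimal sketch rather than a drop-in replacement: the pattern above stops at the first comma, so a value with escaped commas (such as F_ERROPT1) would be cut short, and a wrapped value keeps its embedded newline. The helper below, whose name get_var is hypothetical, assumes that "\," is the only escape that matters and that newlines inside a value are purely wrapping artifacts. It is meant to be run on a single user's list.

```python
import re


def get_var(text, name):
    """Return the value of `name` in a KEY=value,KEY=value,... string, or None."""
    # A value is a run of escaped characters (backslash + any character)
    # or of characters that are neither a comma nor a backslash.
    pattern = re.compile(r"\b" + re.escape(name) + r"=((?:\\.|[^,\\])*)", re.DOTALL)
    match = pattern.search(text)
    if match is None:
        return None
    raw = match.group(1)
    # Assumption: a newline inside a value is only a line wrap, so drop it
    # together with any whitespace around it.
    raw = re.sub(r"\s*\n\s*", "", raw)
    # Turn escaped characters such as "\," back into the plain character.
    return re.sub(r"\\(.)", r"\1", raw)
```

For example, get_var(search_string, "GAUSS_EXEDIR") would return the path list joined back into one line.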

--
Vincent Wehren
Jul 18 '05 #2
"Steve" <nospam@nopes> wrote in message
news:40********@clarion.carno.net.au...

Here is a pyparsing implementation. -- Paul
(download pyparsing at http://pyparsing.sourceforge.net )

# raw string, so the "\," escapes in the data reach the parser unchanged
search_string = r"""
DISPLAY=localhost:0.0,FORT_BUFFERED=true,

F_ERROPT1=271\,271\,2\,1\,2\,2\,2\,2,G03BASIS=/opt/g03b05/g03/basis,
GAMESS=/opt/gamess,GAUSS_ARCHDIR=/opt/g03b05/g03/arch,

GAUSS_EXEDIR=/opt/g03b05/g03/bsd:/opt/g03b05/g03/private:/opt/g03b05/g
03,GAUSS_SCR_ROOT=/home/561/345561/scratch,
GDVBASIS=/opt/g03b05/g03/basis,
GMAIN=/opt/g03b05/g03/bsd:/opt/g03b05/g03/private:/opt/g03b05/g03,
GROUP=e12,GV_DIR=/opt/g03b05/gv,HOST=sc1,
HOSTTYPE=alpha,INFOPATH=/opt/info,KMP_DUPLICATE_LIB_OK=TRUE,
KMP_STACKSIZE=10485760,
LINDA_FORTRAN=f90 -i8 -r8 -omp -reentrancy threaded,
LINDA_FORTRAN_LINK=f90 -i8 -r8 -omp -reentrancy threaded,
LOGNAME=345561,MACHTYPE=alpha,MAIL=/var/spool/mail/345561,

MANPATH=/opt/g03b05/g03/bsd:/usr/share/man:/usr/dt/share/man:/usr/loca

l/man:/opt/man:/opt/pbs/man:/opt/rash/man:/usr/opt/mpi/man:/usr/opt/mpi
/man:/usr/opt/mpi/man,MP_STACK_OVERFLOW=OFF,
NLSPATH=/usr/lib/nls/msg/%L/%N,OMP_NUM_THREADS=4,ONEEXE=-DONEEXE,
OSTYPE=osf1,

PERLLIB=/opt/g03b05/g03/bsd,PGI=/usr/pgi,PGIDIR=/usr/pgi/linux86/5.0,
POSTFL_FORTRAN=f90 -i8 -r8 -omp -reentrancy threaded,PROJECT=e12,
QCAUX=/opt/qchem-2.02/aux,QCPLATFORM=DEC_ALPHA,
RMS_PROJECT=e12,RUNCPP=/lib/cpp,SHELL=/opt/rash/bin/tcsh,SHLVL=1,
QUEUE=normal
"""

from pyparsing import printables, Word, Optional, Literal, Group, Dict, \
    delimitedList, Combine, OneOrMore, alphanums

# definition of key
key = Word(alphanums+"_")

# definition of value
_noncommachars = "".join( [ c for c in printables if c not in r"\," ] )
_escChar = Word("\\",printables,exact=2)
# add this parse action to "unescape" commas
_escChar.setParseAction( lambda s,l,t: [t[0][1]] )
value = Combine( OneOrMore( _escChar | Word(_noncommachars) ),
                 adjacent=False )

# add parse action to remove whitespace
collapseWhitespace = lambda s,l,t: [ "".join(t[0].split()) ]
value.setParseAction( collapseWhitespace )

# create overall definition, using Dict element to create a dictionary
# result structure
envVarDef = Dict(delimitedList(
    Group(key + Literal("=").suppress() + value)))

# parse input, and access returned results as a dictionary, or as attributes
# on an object if the key name is valid as an attribute name
envVars = envVarDef.parseString( search_string )

print("GAUSS_EXEDIR:", envVars["GAUSS_EXEDIR"])
print("GAUSS_EXEDIR:", envVars.GAUSS_EXEDIR)
for k in envVars.keys():
    print(k + ":", envVars[k])
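
For comparison, roughly the same dictionary can be built with the standard library alone. This is a minimal sketch under stated assumptions, not an equivalent of the pyparsing grammar: the name parse_env_list is hypothetical, "\," is assumed to be the only escape sequence, and newlines are treated as line-wrap artifacts. Unlike the whitespace-collapsing parse action above, it keeps the spaces inside values such as LINDA_FORTRAN.

```python
import re


def parse_env_list(text):
    """Parse a KEY=value,KEY=value,... string into a dict."""
    env = {}
    # Split on commas that are not preceded by a backslash.
    for item in re.split(r"(?<!\\),", text):
        key, sep, value = item.partition("=")
        if not sep:
            continue  # empty piece, e.g. after a trailing comma
        # Assumption: a newline inside a value is only a line wrap.
        value = re.sub(r"\s*\n\s*", "", value).strip()
        env[key.strip()] = value.replace("\\,", ",")
    return env
```

With this, parse_env_list(search_string)["GAUSS_EXEDIR"] gives the value directly, and any other variable is available from the same dict.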
Jul 18 '05 #3
