471,350 Members | 2,049 Online
Bytes | Software Development & Data Engineering Community
Post +

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 471,350 software developers and data experts.

extract text from log file using re

Hi,

I would like to delete a region on a log file which has this
kind of structure:
#------flutest------------------------------------------------------------
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.
Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04
8.3913e-04 3.8545e-03 1.3315e-01 11:14:10 500

reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499

#------------------------------------------------------------------

I have a small script, which removes lines starting with
'(re)versed', '(i)teration' and '(t)urbulent' and put the
rest into an array:

# -- plot residuals ----------------------------------------
import re
filename="flutest"
reversed_flow=re.compile('^\ re')
turbulent_viscosity_ratio=re.compile('^\ tu')
iteration=re.compile('^\ \ i')

begin_of_res=re.compile('>\ \ \ i')
end_of_res=re.compile('^\ ad')

begin_of_writing=re.compile('^\Writing')
end_of_writing=re.compile('^\Done')

end_number=0
begin_number=0
n = 0
for line in open(filename).readlines():
n = n + 1
if begin_of_res.match(line):
begin_number=n+1
print "Line Number (begin): " + str(n)

if end_of_res.match(line):
end_number=n
print "Line Number (end): " + str(n)

if begin_of_writing.match(line):
begin_w=n+1
print "BeginWriting: " + str(n)
print "HALLO"

if end_of_writing.match(line):
end_w=n+1
print "EndWriting: " +str(n)

if n end_number:
end_number=n
print "Line Number (end): " + str(end_number)

n = 0
array = []
array_dummy = []
array_mapped = []

mapped = []
mappe = []

n = 0
for line in open(filename).readlines():
n = n + 1
if (begin_number <= n) and (end_number n):
# if (begin_w <= n) and (end_w n):
if not reversed_flow.match(line) and not
iteration.match(line) and not
turbulent_viscosity_ratio.match(line):
m=(line.strip().split())
print m
if len(m) 0:
# print len(m)
laenge_liste=len(m)
# print len(m)
mappe.append(m)
#--end plot
residuals-------------------------------------------------

This works fine ; except for the region with the writing
information:

#-----writing information
-----------------------------------------
Writing "/home/fb/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.
# -------end writing information -------------------------------

Does anyone know, how I can this 'writing' stuff too? The
matchingIt occurs a lot :-(

Regards!
Fabian

Sep 13 '07 #1
2 2980
Fabian Braennstroem wrote:
I would like to delete a region on a log file which has this
kind of structure:
#------flutest------------------------------------------------------------
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.
Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04
8.3913e-04 3.8545e-03 1.3315e-01 11:14:10 500

reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499

#------------------------------------------------------------------

I have a small script, which removes lines starting with
'(re)versed', '(i)teration' and '(t)urbulent' and put the
rest into an array:

# -- plot residuals ----------------------------------------
import re
filename="flutest"
reversed_flow=re.compile('^\ re')
turbulent_viscosity_ratio=re.compile('^\ tu')
iteration=re.compile('^\ \ i')

begin_of_res=re.compile('>\ \ \ i')
end_of_res=re.compile('^\ ad')
The following regular expressions have some extra backslashes
which change their meaning:
begin_of_writing=re.compile('^\Writing')
end_of_writing=re.compile('^\Done')
But I don't think you need regular expressions at all.
Also, it's better to iterate over the file just once because
you don't need to remember the position of regions to be skipped.
Here's a simplified demo:

def skip_region(items, start, end):
items = iter(items)
while 1:
for line in items:
if start(line):
break
yield line
else:
break
for line in items:
if end(line):
break
else:
break

def begin(line):
return line.strip() == "Writing"

def end(line):
return line.strip() == "Done."

# --- begin demo setup (remove to test with real data) ---
def open(filename):
from StringIO import StringIO
return StringIO("""\
iteration # to be ignored
alpha
beta
reversed # to be ignored
Writing
to
be
ignored
Done.
gamma
delta

""")
# --- end demo setup ---

if __name__ == "__main__":
filename = "fluetest"
for line in skip_region(open(filename), begin, end):
line = line.strip()
if line and not line.startswith(("reversed", "iteration")):
print line

skip_region() takes a file (or any iterable) and two functions
that test for the begin/end of the region to be skipped.
You can nest skip_region() calls if you have regions with different
start/end conditions.

Peter
Sep 14 '07 #2
On Sep 13, 4:09 pm, Fabian Braennstroem <f.braennstr...@gmx.dewrote:
Hi,

I would like to delete a region on a log file which has this
kind of structure:
How about just searching for what you want. Here are two approaches,
one using pyparsing, one using the batteries-included re module.

-- Paul
# -*- coding: iso-8859-15 -*-
data = """\
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/
fluent-050*0.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/
fluent-050*0.dat"...
Done.
500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04 8.3913e-04
3.8545e-03 1.3315e-01 11:14:10 500
reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
"""

print "search using pyparsing"
from pyparsing import *

integer = Word(nums).setParseAction(lambda t:int(t[0]))
scireal = Regex(r"\d*\.\d*e\-\d\d").setParseAction(lambda
t:float(t[0]))
time = Regex(r"\d\d:\d\d:\d\d")

logline = (integer("testNum") +
And([scireal]*7)("data") +
time("testTime") +
integer("result"))

for tRes in logline.searchString(data):
print "Test#:",tRes.testNum
print "Data:", tRes.data
print "Time:", tRes.testTime
print "Output:", tRes.result
print

print
print "search using re's"
import re
integer = r"\d*"
scireal = r"\d*\.\d*e\-\d\d"
time = r"\d\d:\d\d:\d\d"
ws = r"\s*"

namedField = lambda reStr,n: "(?P<%s>%s)" % (n,reStr)
logline = re.compile(
namedField(integer,"testNum") + ws +
namedField( (scireal+ws)*7,"data" ) +
namedField(time,"testTime") + ws +
namedField(integer,"result") )
for m in logline.finditer(data):
print "Test#:",int(m.group("testNum"))
print "Data:", map(float,m.group("data").split())
print "Time:", m.group("testTime")
print "Output:", int(m.group("result"))
print

Prints:

search using pyparsing
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
search using re's
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
Sep 14 '07 #3

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

9 posts views Thread by Sharon | last post: by
8 posts views Thread by nick | last post: by
9 posts views Thread by trihanhcie | last post: by
8 posts views Thread by Fabian Braennstroem | last post: by
reply views Thread by napolpie | last post: by
3 posts views Thread by SteveB | last post: by
5 posts views Thread by Steve | last post: by
reply views Thread by XIAOLAOHU | last post: by

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.