extract text from log file using re

Hi,

I would like to delete a region from a log file that has this
kind of structure:
#------flutest------------------------------------------------------------
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.
Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04
8.3913e-04 3.8545e-03 1.3315e-01 11:14:10 500

reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499

#------------------------------------------------------------------

I have a small script which removes lines starting with
'(re)versed', '(i)teration' and '(t)urbulent' and puts the
rest into an array:

# -- plot residuals ----------------------------------------
import re
filename="flutest"
reversed_flow=re.compile('^\ re')
turbulent_viscosity_ratio=re.compile('^\ tu')
iteration=re.compile('^\ \ i')

begin_of_res=re.compile('>\ \ \ i')
end_of_res=re.compile('^\ ad')

begin_of_writing=re.compile('^\Writing')
end_of_writing=re.compile('^\Done')

# first pass: find the line numbers of the residual block
end_number=0
begin_number=0
n = 0
for line in open(filename).readlines():
    n = n + 1
    if begin_of_res.match(line):
        begin_number=n+1
        print "Line Number (begin): " + str(n)

    if end_of_res.match(line):
        end_number=n
        print "Line Number (end): " + str(n)

    if begin_of_writing.match(line):
        begin_w=n+1
        print "BeginWriting: " + str(n)
        print "HALLO"

    if end_of_writing.match(line):
        end_w=n+1
        print "EndWriting: " +str(n)

# no end marker found: read up to the last line
if n > end_number:
    end_number=n
    print "Line Number (end): " + str(end_number)

n = 0
array = []
array_dummy = []
array_mapped = []

mapped = []
mappe = []

# second pass: collect the split residual lines
n = 0
for line in open(filename).readlines():
    n = n + 1
    if (begin_number <= n) and (end_number > n):
        # if (begin_w <= n) and (end_w > n):
        if (not reversed_flow.match(line) and
                not iteration.match(line) and
                not turbulent_viscosity_ratio.match(line)):
            m=(line.strip().split())
            print m
            if len(m) > 0:
                # print len(m)
                laenge_liste=len(m)
                # print len(m)
                mappe.append(m)
#--end plot residuals-------------------------------------------------

This works fine, except for the region with the writing
information:

#-----writing information-----------------------------------------
Writing "/home/fb/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.
# -------end writing information -------------------------------

Does anyone know how I can remove this 'writing' stuff too?
It occurs a lot :-(

Regards!
Fabian

Sep 13 '07 #1
Fabian Braennstroem wrote:
I would like to delete a region on a log file which has this
kind of structure:
[...]
The following regular expressions have some extra backslashes
which change their meaning:
begin_of_writing=re.compile('^\Writing')
end_of_writing=re.compile('^\Done')
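A small sketch of why those two patterns misbehave, written for Python 3 with raw strings. The `fixed_*` names are illustrative and not part of the original script. In a regular expression, `\W` means "any non-word character" and `\D` means "any non-digit", so the patterns no longer look for a literal `W` or `D`:

```python
import re

# The patterns from the script, written as raw strings so the backslashes
# reach the regex engine unchanged.
broken_begin = re.compile(r'^\Writing')   # \W = one non-word character, then "riting"
broken_end = re.compile(r'^\Done')        # \D = one non-digit character, then "one"

# Hypothetical corrected patterns without the stray backslashes.
fixed_begin = re.compile(r'^Writing')
fixed_end = re.compile(r'^Done')

writing_line = 'Writing "/home/fb/fluent-0500.cas"...'

print(bool(broken_begin.match(writing_line)))  # False: "W" is a word character
print(bool(broken_begin.match('-riting')))     # True: "-" is a non-word character
print(bool(broken_end.match('Done.')))         # True, but only because "D" is a non-digit
print(bool(broken_end.match('gone')))          # True as well, which is not intended
print(bool(fixed_begin.match(writing_line)))   # True
```

So the begin pattern never fires on a "Writing" line, while the end pattern matches by accident.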
But I don't think you need regular expressions at all.
Also, it's better to iterate over the file just once because
you don't need to remember the position of regions to be skipped.
Here's a simplified demo:

def skip_region(items, start, end):
    items = iter(items)
    while 1:
        # pass lines through until a region starts
        for line in items:
            if start(line):
                break
            yield line
        else:
            break
        # swallow lines until the region ends
        for line in items:
            if end(line):
                break
        else:
            break

def begin(line):
    return line.strip() == "Writing"

def end(line):
    return line.strip() == "Done."

# --- begin demo setup (remove to test with real data) ---
def open(filename):
    from StringIO import StringIO
    return StringIO("""\
iteration # to be ignored
alpha
beta
reversed # to be ignored
Writing
to
be
ignored
Done.
gamma
delta

""")
# --- end demo setup ---

if __name__ == "__main__":
    filename = "flutest"
    for line in skip_region(open(filename), begin, end):
        line = line.strip()
        if line and not line.startswith(("reversed", "iteration")):
            print line

skip_region() takes a file (or any iterable) and two functions
that test for the begin/end of the region to be skipped.
You can nest skip_region() calls if you have regions with different
start/end conditions.
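As a rough illustration of the nesting idea, here is a Python 3 sketch. It restates skip_region() with a simple flag, which is intended to behave the same as the generator in this post but is not the original code. The "Begin dump"/"End dump" markers are purely hypothetical, and the start test uses startswith("Writing") on the assumption that the path follows "Writing" on the same line, as in the second log excerpt of the question:

```python
import io

def skip_region(items, start, end):
    """Yield items, dropping everything from a start(item) hit
    through the next end(item) hit (both marker lines included)."""
    skipping = False
    for item in items:
        if skipping:
            if end(item):
                skipping = False
        elif start(item):
            skipping = True
        else:
            yield item

# Small in-memory stand-in for a log file (assumed layout).
log = io.StringIO(
    "alpha\n"
    'Writing "/home/fb/fluent-0500.cas"...\n'
    "5429199 mixed cells, zone 29, binary.\n"
    "Done.\n"
    "beta\n"
    "Begin dump\n"      # hypothetical second kind of region
    "noise\n"
    "End dump\n"
    "gamma\n"
)

# Inner call removes the Writing ... Done. blocks,
# outer call removes the hypothetical dump blocks.
without_writing = skip_region(
    log,
    lambda line: line.startswith("Writing"),
    lambda line: line.strip() == "Done.",
)
cleaned = skip_region(
    without_writing,
    lambda line: line.strip() == "Begin dump",
    lambda line: line.strip() == "End dump",
)

kept = [line.strip() for line in cleaned]
print(kept)  # ['alpha', 'beta', 'gamma']
```

Because each call returns a generator, the file is still read only once, line by line.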

Peter
Sep 14 '07 #2
On Sep 13, 4:09 pm, Fabian Braennstroem <f.braennstr...@gmx.de> wrote:
Hi,

I would like to delete a region on a log file which has this
kind of structure:
How about just searching for what you want? Here are two approaches,
one using pyparsing, one using the batteries-included re module.

-- Paul
# -*- coding: iso-8859-15 -*-
data = """\
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
Done.
500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04 8.3913e-04
3.8545e-03 1.3315e-01 11:14:10 500
reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
"""

print "search using pyparsing"
from pyparsing import *

integer = Word(nums).setParseAction(lambda t:int(t[0]))
scireal = Regex(r"\d*\.\d*e\-\d\d").setParseAction(lambda t:float(t[0]))
time = Regex(r"\d\d:\d\d:\d\d")

logline = (integer("testNum") +
And([scireal]*7)("data") +
time("testTime") +
integer("result"))

for tRes in logline.searchString(data):
    print "Test#:",tRes.testNum
    print "Data:", tRes.data
    print "Time:", tRes.testTime
    print "Output:", tRes.result
    print

print
print "search using re's"
import re
integer = r"\d*"
scireal = r"\d*\.\d*e\-\d\d"
time = r"\d\d:\d\d:\d\d"
ws = r"\s*"

namedField = lambda reStr,n: "(?P<%s>%s)" % (n,reStr)
logline = re.compile(
namedField(integer,"testNum") + ws +
namedField( (scireal+ws)*7,"data" ) +
namedField(time,"testTime") + ws +
namedField(integer,"result") )
for m in logline.finditer(data):
    print "Test#:",int(m.group("testNum"))
    print "Data:", map(float,m.group("data").split())
    print "Time:", m.group("testTime")
    print "Output:", int(m.group("result"))
    print

Prints:

search using pyparsing
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
search using re's
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
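
For readers on Python 3, the re-based search might be adapted roughly as follows. This is a sketch under assumptions, not the code from this post: the upper-case names are mine, the patterns use `\d+` and `[-+]` instead of `\d*` and `\-` (a tightening I chose so that empty fields cannot match), and the sample is a shortened excerpt of the log from this thread.

```python
import re

# Field patterns, slightly stricter than in the post (assumption).
INTEGER = r"\d+"
SCIREAL = r"\d+\.\d+e[-+]\d\d"
TIME = r"\d\d:\d\d:\d\d"

# One record: iteration number, seven residuals, a time, a final integer.
# \s+ also matches newlines, so records wrapped over two lines still match.
LOGLINE = re.compile(
    rf"(?P<testNum>{INTEGER})\s+"
    rf"(?P<data>(?:{SCIREAL}\s+){{7}})"
    rf"(?P<testTime>{TIME})\s+"
    rf"(?P<result>{INTEGER})"
)

sample = """\
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing "/home/fb/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04
8.3913e-04 3.8545e-03 1.3315e-01 11:14:10 500
"""

# Collect (iteration, residuals, time, result) tuples.
records = [
    (
        int(m["testNum"]),
        [float(x) for x in m["data"].split()],
        m["testTime"],
        int(m["result"]),
    )
    for m in LOGLINE.finditer(sample)
]

for testnum, data, testtime, result in records:
    print("Test#:", testnum)
    print("Data:", data)
    print("Time:", testtime)
    print("Output:", result)
    print()
```

The "reversed flow" line and the Writing ... Done. block are simply never matched, so nothing has to be deleted first.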
Sep 14 '07 #3
