extract text from log file using re

Fabian Braennstroem

Hi,

I would like to delete a region on a log file which has this
kind of structure:
#------flutest------------------------------------------------------------
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.
Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04
8.3913e-04 3.8545e-03 1.3315e-01 11:14:10 500

reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499

#------------------------------------------------------------------

I have a small script, which removes lines starting with
'(re)versed', '(i)teration' and '(t)urbulent' and put the
rest into an array:

# -- plot residuals ----------------------------------------
import re
filename="flutest"
reversed_flow=re.compile('^\ re')
turbulent_viscosity_ratio=re.compile('^\ tu')
iteration=re.compile('^\ \ i')

begin_of_res=re.compile('>\ \ \ i')
end_of_res=re.compile('^\ ad')

begin_of_writing=re.compile('^\Writing')
end_of_writing=re.compile('^\Done')

end_number=0
begin_number=0
n = 0
for line in open(filename).readlines():
n = n + 1
if begin_of_res.match(line):
begin_number=n+1
print "Line Number (begin): " + str(n)

if end_of_res.match(line):
end_number=n
print "Line Number (end): " + str(n)

if begin_of_writing.match(line):
begin_w=n+1
print "BeginWriting: " + str(n)
print "HALLO"

if end_of_writing.match(line):
end_w=n+1
print "EndWriting: " +str(n)

if n end_number:
end_number=n
print "Line Number (end): " + str(end_number)

n = 0
array = []
array_dummy = []
array_mapped = []

mapped = []
mappe = []

n = 0
for line in open(filename).readlines():
n = n + 1
if (begin_number <= n) and (end_number n):
# if (begin_w <= n) and (end_w n):
if not reversed_flow.match(line) and not
iteration.match(line) and not
turbulent_viscosity_ratio.match(line):
m=(line.strip().split())
print m
if len(m) 0:
# print len(m)
laenge_liste=len(m)
# print len(m)
mappe.append(m)
#--end plot
residuals-------------------------------------------------

This works fine ; except for the region with the writing
information:

#-----writing information
-----------------------------------------
Writing "/home/fb/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.
# -------end writing information -------------------------------

Does anyone know, how I can this 'writing' stuff too? The
matchingIt occurs a lot :-(

Regards!
Fabian

Sep 13 '07 #1

Subscribe Reply

3062

Peter Otten

Fabian Braennstroem wrote:

I would like to delete a region on a log file which has this
kind of structure:
#------flutest------------------------------------------------------------
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.
Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
Done.

500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04
8.3913e-04 3.8545e-03 1.3315e-01 11:14:10 500

reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
8.3956e-04 3.8560e-03 4.8384e-02 11:40:01 499

#------------------------------------------------------------------

I have a small script, which removes lines starting with
'(re)versed', '(i)teration' and '(t)urbulent' and put the
rest into an array:

# -- plot residuals ----------------------------------------
import re
filename="flutest"
reversed_flow=re.compile('^\ re')
turbulent_viscosity_ratio=re.compile('^\ tu')
iteration=re.compile('^\ \ i')

begin_of_res=re.compile('>\ \ \ i')
end_of_res=re.compile('^\ ad')

The following regular expressions have some extra backslashes
which change their meaning:

begin_of_writing=re.compile('^\Writing')
end_of_writing=re.compile('^\Done')

But I don't think you need regular expressions at all.
Also, it's better to iterate over the file just once because
you don't need to remember the position of regions to be skipped.
Here's a simplified demo:

def skip_region(items, start, end):
items = iter(items)
while 1:
for line in items:
if start(line):
break
yield line
else:
break
for line in items:
if end(line):
break
else:
break

def begin(line):
return line.strip() == "Writing"

def end(line):
return line.strip() == "Done."

# --- begin demo setup (remove to test with real data) ---
def open(filename):
from StringIO import StringIO
return StringIO("""\
iteration # to be ignored
alpha
beta
reversed # to be ignored
Writing
to
be
ignored
Done.
gamma
delta

""")
# --- end demo setup ---

if __name__ == "__main__":
filename = "fluetest"
for line in skip_region(open(filename), begin, end):
line = line.strip()
if line and not line.startswith(("reversed", "iteration")):
print line

skip_region() takes a file (or any iterable) and two functions
that test for the begin/end of the region to be skipped.
You can nest skip_region() calls if you have regions with different
start/end conditions.

Peter

Sep 14 '07 #2

Paul McGuire

On Sep 13, 4:09 pm, Fabian Braennstroem <f.braennstr...@gmx.dewrote:

Hi,

I would like to delete a region on a log file which has this
kind of structure:

How about just searching for what you want. Here are two approaches,
one using pyparsing, one using the batteries-included re module.

-- Paul
# -*- coding: iso-8859-15 -*-
data = """\
498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
reversed flow in 1 faces on pressure-outlet 35.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/
fluent-050*0.cas"...
5429199 mixed cells, zone 29, binary.
11187656 mixed interior faces, zone 30, binary.
20004 triangular wall faces, zone 31, binary.
1104 mixed velocity-inlet faces, zone 32, binary.
133638 triangular wall faces, zone 33, binary.
14529 triangular wall faces, zone 34, binary.
1350 mixed pressure-outlet faces, zone 35, binary.
11714 mixed wall faces, zone 36, binary.
1232141 nodes, binary.
1232141 node flags, binary.
Done.

Writing
"/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/
fluent-050*0.dat"...
Done.
500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04 8.3913e-04
3.8545e-03 1.3315e-01 11:14:10 500
reversed flow in 2 faces on pressure-outlet 35.
501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04 8.3956e-04
3.8560e-03 4.8384e-02 11:40:01 499
"""

print "search using pyparsing"
from pyparsing import *

integer = Word(nums).setParseAction(lambda t:int(t[0]))
scireal = Regex(r"\d*\.\d*e\-\d\d").setParseAction(lambda
t:float(t[0]))
time = Regex(r"\d\d:\d\d:\d\d")

logline = (integer("testNum") +
And([scireal]*7)("data") +
time("testTime") +
integer("result"))

for tRes in logline.searchString(data):
print "Test#:",tRes.testNum
print "Data:", tRes.data
print "Time:", tRes.testTime
print "Output:", tRes.result
print

print
print "search using re's"
import re
integer = r"\d*"
scireal = r"\d*\.\d*e\-\d\d"
time = r"\d\d:\d\d:\d\d"
ws = r"\s*"

namedField = lambda reStr,n: "(?P<%s>%s)" % (n,reStr)
logline = re.compile(
namedField(integer,"testNum") + ws +
namedField( (scireal+ws)*7,"data" ) +
namedField(time,"testTime") + ws +
namedField(integer,"result") )
for m in logline.finditer(data):
print "Test#:",int(m.group("testNum"))
print "Data:", map(float,m.group("data").split())
print "Time:", m.group("testTime")
print "Output:", int(m.group("result"))
print

Prints:

search using pyparsing
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499
search using re's
Test#: 498
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 499
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Test#: 500
Data: [0.0010049, 0.00024630000000000002, 9.8394999999999996e-005,
0.00014865000000000001, 0.00083913, 0.0038544999999999999,
0.13314999999999999]
Time: 11:14:10
Output: 500

Test#: 501
Data: [0.0010085999999999999, 0.00024607999999999997,
9.8589000000000001e-005, 0.00014908, 0.00083956000000000005,
0.0038560000000000001, 0.048384000000000003]
Time: 11:40:01
Output: 499

Sep 14 '07 #3

Similar topics

C: extract string from file

by: Sharon | last post by:

hi, I want to extract a string from a file, if the file is like this: 1 This is the string 2 3 4 how could I extract the string, starting from the 10th position (i.e. "T") and...

C / C++

extract data from file

by: nick | last post by:

Hi all can any one please tell me what is wrong in this code?? I'm new to deal with text files and extract data. i'm trying to look for data in a text file (3~4 pages) some lines start with a...

C / C++

Extract image dimensions (height, width) from Base64 String?

by: Neo Geshel | last post by:

Greetings. I have managed to stitch together an awesome method of posting text along with an image to a database, in a way that allows an unlimited number of previews to ensure that text and...

Visual Basic .NET

PHP4 : Extract text from HTML file

by: trihanhcie | last post by:

Hi, I would like to extract the text in an HTML file For the moment, I'm trying to get all text between <tdand </td>. I used a regular expression because i don't know the "format between...

PHP

extract certain values from file with re

by: Fabian Braennstroem | last post by:

Hi, I would like to remove certain lines from a log files. I had some sed/awk scripts for this, but now, I want to use python with its re module for this task. Actually, I have two different...

Python

VB Extract data from HTML - use DOM, MSHTML, XML?

by: steveyjg | last post by:

I want to extract the following data from a retrieved html file and store the information as strings. 'get the text of "title" <h1 id="test_title">title</h1> 'get the contents of the value...

.NET Framework

using pyparsing to extract METEO DATAS

by: napolpie | last post by:

DISCUSSION IN USER nappie writes: Hello, I'm Peter and I'm new in python codying and I'm using parsying to extract data from one meteo Arpege file. This file is long file and it's composed by...

Python

how to read data in text file, extract and display in tabular manner in excel doc?

by: maylee21 | last post by:

hi, anyone can help me figure out how to read data from a text file like this: 10980012907200228082002 and extract the data according to this kind of format: Record type 1 TY-RECORD ...

C / C++

Code to Extract Text from PDF

by: SteveB | last post by:

I have posted this question in the Visual Basic 2005 and Visual Basic .Net 2005 discussion groups, also. Hi. I am developing an application/web page with VB.Net that will populate a SQL...

Visual Basic .NET

Extract Image From PDF

by: Steve | last post by:

Hi all Does anybody please know a way to extract an Image from a pdf file and save it as a TIFF? I have used a scanner to scan documents which are then placed on a server, but I need to...

Visual Basic .NET

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

C# / C Sharp

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...

C# / C Sharp