How to use Python to extract certain text from a file?

New Member

Thanks, will look into it again.

Jan 6 '12 #3

391

Recognized Expert Contributor

An easy way to do this more safely would be something like:

import re

lines = open("input.txt", 'r').readlines()

for line in lines:
    a = re.findall(r'\w+', line)
    print a
    #print a[0],a[1],a[2],a[3],a[4],a[5],a[6]
    for b in a:
        print b,
    print
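For anyone on Python 3, where print is a function, here is a minimal sketch of what the findall call above does to a single line. The sample line is an assumption based on the format quoted later in this thread, not the real input.txt.

```python
import re

# Hypothetical sample line, assuming the input.txt layout quoted later in the thread
line = '   "7 (BC_1, Q(1), output3, X, 16, 1, Z)," &'

# \w+ matches runs of letters, digits and underscores; everything else is skipped
tokens = re.findall(r"\w+", line)
print(tokens)

# Note that "Q(1)" comes out as two separate tokens, "Q" and "1"
print(" ".join(tokens))
```

To run it over the whole file, open it with "with open("input.txt") as f:" and apply the same call to each line in "for line in f:".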

Jan 6 '12 #4

New Member

Hi, thanks for your feedback.
I would like to check whether the "print" on the last line is a typo. Also, in the for loop there is a "print b," with a trailing comma; what does that do?
I was thinking of doing:

for line in lines:
    a = line.split('-')[0]
    print a
    for b in a:
        print b,
    print

Jan 6 '12 #5

391

Recognized Expert Contributor

The "print b," prints b without going to a new line (in Python 2, a trailing comma after print suppresses the newline). It does the equivalent of printing a[0], a[1], a[2], ... for as many items as there are.

The bare "print" at the end starts a new line.
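In Python 3, print is a function and the trailing-comma trick no longer exists; the end parameter does the same job. A small sketch with a made-up token list:

```python
tokens = ["17", "BC_1", "CLK", "input", "X"]

# Python 3 version of the Python 2 loop "for b in a: print b," followed by "print"
for b in tokens:
    print(b, end=" ")  # end=" " prints a space instead of the default newline
print()  # a bare print() finishes the line

# Often simpler: build the whole line first, then print it once
line_out = " ".join(tokens)
print(line_out)
```

One small difference: the loop version leaves a trailing space before the newline, while the join version does not.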

I was assuming your regular expression was working, but perhaps it isn't. I can't imagine that your split expression would work either.

Perhaps you can explain the logic of what you're trying to achieve. There are many ways you could get that output from that input, but what are the more general rules? E.g., is the first line always in that format? Are all lines in one of two formats? If you can tell me more about what you're trying to achieve, it will be easier to help.

Jan 6 '12 #6

New Member

Apologies for the confusion.
General rules:
1) The first line in input.txt (as attached in the first post) is:
-- num cell port function safe [ccell disval rslt]
I just need: num cell port function safe ccell
However, my script below couldn't get this, so I skip the line. It would be great if you could show me how.

2) I'm trying to convert this line in input.txt:
"17 (BC_1, CLK, input, X)," &
into: 17 BC_1 CLK input X
Basically, I'm only extracting the columns num, cell, port, function, safe and ccell. The rest are not needed.

Similarly, convert " 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &
into: 7 BC_1 Q1 output3 X 16 1
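For rule 1, a minimal Python 3 sketch, assuming the header line is exactly as quoted above: pull out the bare words and keep the first six.

```python
import re

header = "-- num cell port function safe [ccell disval rslt]"

# \w+ picks out the column names, skipping "--" and the square brackets
names = re.findall(r"\w+", header)

# Keep the first six columns: num, cell, port, function, safe, ccell
wanted = names[:6]
print(" ".join(wanted))
```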

So far, my script is the following, which results in the output.txt attached in the first post:

import re

fileIn = open("input.txt", "rb")
fileOut = open("output.txt", "w")

for strData in fileIn:
    strData = strData.split('-')[0]  # this is to remove the first line

    if "input" in strData:
        a = re.split(r"\W+", strData)
        #print a
        #fileOut.write(' '.join(a[1:7]))
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+'\n')

    if "output" in strData:
        a = re.split(r"\W+", strData)
        #print a
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+' '+a[7]+'\n')

fileIn.close()
fileOut.close()

Jan 6 '12 #7

391

Recognized Expert Contributor

Regular expressions are what you need. They take a bit of getting used to, but they work brilliantly once you get the hang of them.

The lines in your input.txt have different numbers of fields, so the code below works for the first ones, which have the format you posted originally, but not for the later ones:

import re

lines = open("input.txt", "r")

p = re.compile(r'   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')

for line in lines:
    m = p.match(line)
    if not m: continue
    for i in range(1, 6):
        print m.group(i),
    print

This gives:

17 BC_1 CLK input X
15 BC_1 D(1) input X
14 BC_1 D(2) input X
13 BC_1 D(3) input X
12 BC_1 D(4) input X
11 BC_1 D(5) input X
10 BC_1 D(6) input X
9 BC_1 D(7) input X
8 BC_1 D(8) input X
7 BC_1, Q(1), output3, X 16 1 Z
6 BC_1, Q(2), output3, X 16 1 Z
5 BC_1, Q(3), output3, X 16 1 Z
4 BC_1, Q(4), output3, X 16 1 Z
3 BC_1, Q(5), output3, X 16 1 Z
2 BC_1, Q(6), output3, X 16 1 Z
1 BC_1, Q(7), output3, X 16 1 Z
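One way to handle both line formats (a different technique from the five-group pattern above, not code from this thread) is to let the regular expression capture only the number and everything inside the outer brackets, then split that part on ", ". A Python 3 sketch, assuming data lines end in )," & as in the samples quoted in this thread; lines that end differently are simply skipped by this pattern:

```python
import re

# Number before " (", then everything inside the outer brackets up to the final )," &
# Here the greedy (.*) is what we want: it runs to the LAST ")," &
p = re.compile(r'\s*"\s*(\d+) \((.*)\)," &')

# Hypothetical sample lines in the format quoted in this thread
sample = [
    '-- num cell port function safe [ccell disval rslt]',
    '   "17 (BC_1, CLK, input, X)," &',
    '   "7 (BC_1, Q(1), output3, X, 16, 1, Z)," &',
]

rows = []
for line in sample:
    m = p.match(line)
    if not m:
        continue  # the header (and any other line that doesn't fit) is skipped
    num, inner = m.groups()
    fields = [num] + inner.split(", ")
    rows.append(fields[:6])  # num, cell, port, function, safe, ccell

for row in rows:
    print(" ".join(row))
```

Because the bracketed part is split rather than matched field by field, lines with five fields and lines with eight fields go through the same code.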

Jan 6 '12 #8

New Member

Wow, only a few lines of code. I don't get regular expressions; they're hard.
I don't understand:
1) p=re.compile('   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
2) m=p.match(line): what does match(line) mean?
3) m.group(i): what does it group?
I tried to print them, but it only prints an address.

Thanks

Jan 6 '12 #9

391

Recognized Expert Contributor

You can look here to get more details on how regular expressions work.

I don't understand your statement "I tried to print but it only print address". What does this mean? The code should work to create the output that I gave.

re.compile builds a pattern that you can match text against. In this case, the pattern looks for the following:
'   "' (three spaces and a double quote) is the start of the line

' *' is any number of spaces (zero or more)

'(.*)' is a string of any characters (.) of any length (*). The brackets make it one of the groups you want to capture, so m.group(1) will return the string matched there.

' \(' finds the string " (". You need the escape character ('\') because ( is a special character in regular expressions (it starts a group, see above).

'(.*), ' finds the next string of characters (for group(2)) followed by a comma (,) and a space ( ).

etc. You probably get the idea by now.

Then m is a match object from matching the pattern (p) against line (a line from input.txt). If the line doesn't fit the pattern, p.match returns None instead.

Then m.group(i) returns the text captured by the i-th group, i.e. the parts you marked by putting brackets () around them.
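A short Python 3 sketch of p.match and m.group, using the same pattern (written as a raw string) and two sample lines that are assumptions about the file's layout:

```python
import re

p = re.compile(r'   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')

# The header line does not fit the pattern, so match() returns None
header_match = p.match("-- num cell port function safe [ccell disval rslt]")
print(header_match)  # None

m = p.match('   "17 (BC_1, CLK, input, X)," &')
if m:
    print(m.group(1))  # 17, the text captured by the first pair of brackets
    print(m.groups())  # all five captured groups as a tuple
```

This is also why printing m for every line shows None for lines that don't match and a match object for lines that do.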

Note that regular expressions are "greedy": .* matches the longest string it can while the rest of the pattern still fits (working from the left). Thus for the last 7 lines of input.txt, group(2) is the string "BC_1, Q(1), output3, X", which I assume is not what you want.
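A tiny sketch of the difference between the greedy .* and the non-greedy (lazy) .*? form, on a made-up fragment of one of the output lines:

```python
import re

text = "BC_1, Q(1), output3, X"

# Greedy: .* grabs as much as possible, so the group ends at the LAST ", "
greedy = re.match(r"(.*), ", text).group(1)

# Lazy: .*? grabs as little as possible, so the group ends at the FIRST ", "
lazy = re.match(r"(.*?), ", text).group(1)

print(greedy)  # BC_1, Q(1), output3
print(lazy)    # BC_1
```

Making every group in the five-group pattern lazy fixes the early groups, but the leftover fields then end up together in the last group, so lines with extra fields still need separate handling.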

Jan 6 '12 #10

New Member

Thanks for your guidance. RE is one of the hardest topics to understand in Python.

Sorry for the confusion. I tried to understand your code by printing things, for example:

p=re.compile('   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
print p

which printed:
<_sre.SRE_Match object at 0x02B5C800>

and

    m=p.match(line)
    print m

which printed:
None
<_sre.SRE_Match object at 0x02B5C620>
17 BC_1 CLK input X
None
None
<_sre.SRE_Match object at 0x02B5C620>
15 BC_1 D(1) input X
<_sre.SRE_Match object at 0x02B5CBC0>
14 BC_1 D(2) input X
<_sre.SRE_Match object at 0x02B5C620>
13 BC_1 D(3) input X
<_sre.SRE_Match object at 0x02B5CBC0>
12 BC_1 D(4) input X
<_sre.SRE_Match object at 0x02B5C620>
11 BC_1 D(5) input X
<_sre.SRE_Match object at 0x02B5CBC0>
10 BC_1 D(6) input X
<_sre.SRE_Match object at 0x02B5C620>
9 BC_1 D(7) input X
<_sre.SRE_Match object at 0x02B5CBC0>
8 BC_1 D(8) input X
<_sre.SRE_Match object at 0x02B5C620>
7 BC_1, Q(1), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5CBC0>
6 BC_1, Q(2), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5C620>
5 BC_1, Q(3), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5CBC0>
4 BC_1, Q(4), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5C620>
3 BC_1, Q(5), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5CBC0>
2 BC_1, Q(6), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5C620>
1 BC_1, Q(7), output3, X 16 1 Z
None

Jan 6 '12 #11

391

Recognized Expert Contributor

Yes, re objects don't print well, I'm afraid.

You need to use their attributes or methods instead (for example m.group(1) or m.groups()). It can be a bit frustrating to debug and to understand. Try reading the docs link from my previous post. Good luck!
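A short Python 3 sketch of the attributes and methods that are more useful to print than the objects themselves. The pattern here is a simplified, made-up one, not the pattern from this thread:

```python
import re

p = re.compile(r"(\w+) \((.*)\)")
print(p.pattern)   # the pattern string itself

s = "17 (BC_1, CLK, input, X)"
m = p.match(s)
print(m)           # on recent Python 3 versions the repr includes the span and matched text
print(m.groups())  # the captured groups as a tuple
print(m.span())    # start and end positions of the whole match
```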

Jan 6 '12 #12


391

Recognized Expert Contributor

Alternatively, you can do it without re:

lines = open("input.txt", "r")

for line in lines:
    l1 = line.replace(" ", "").replace('"', '').split(",")  # remove spaces and quotes from the line, then split on ,
    if len(l1) < 2: continue   # skip lines that don't fit the general pattern
    l2 = l1[0].split("(")           # "17(BC_1" -> ['17', 'BC_1']
    l3 = [l1[-2].replace(")", "")]  # last field, without the closing bracket
    l4 = l2 + l1[1:-2] + l3
    print l4

gives this:

['17', 'BC_1', 'CLK', 'input', 'X']
['16', 'BC_1', 'OC_NEG', 'input', 'X']
['16', 'BC_1', '*', 'control', '1']
['15', 'BC_1', 'D(1)', 'input', 'X']
['14', 'BC_1', 'D(2)', 'input', 'X']
['13', 'BC_1', 'D(3)', 'input', 'X']
['12', 'BC_1', 'D(4)', 'input', 'X']
['11', 'BC_1', 'D(5)', 'input', 'X']
['10', 'BC_1', 'D(6)', 'input', 'X']
['9', 'BC_1', 'D(7)', 'input', 'X']
['8', 'BC_1', 'D(8)', 'input', 'X']
['7', 'BC_1', 'Q(1)', 'output3', 'X', '16', '1', 'Z']
['6', 'BC_1', 'Q(2)', 'output3', 'X', '16', '1', 'Z']
['5', 'BC_1', 'Q(3)', 'output3', 'X', '16', '1', 'Z']
['4', 'BC_1', 'Q(4)', 'output3', 'X', '16', '1', 'Z']
['3', 'BC_1', 'Q(5)', 'output3', 'X', '16', '1', 'Z']
['2', 'BC_1', 'Q(6)', 'output3', 'X', '16', '1', 'Z']
['1', 'BC_1', 'Q(7)', 'output3', 'X', '16', '1', 'Z']
['0', 'BC_1', 'Q(8)', 'output3', 'X', '16', '1']

Obviously, you can then use the contents of the list however you wish.
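A Python 3 sketch of the same split-based idea, wrapped in functions. The second function, wanted_columns, is hypothetical post-processing: it keeps the six columns asked for earlier (num, cell, port, function, safe, ccell) and drops the brackets from the port name, as in the "Q1" example. Append fields[6] as well if disval is wanted, as that example also suggests. The sample lines are assumptions about the file's layout.

```python
def split_fields(line):
    # Same steps as the Python 2 code above: drop spaces and quotes, split on commas
    l1 = line.replace(" ", "").replace('"', "").split(",")
    if len(l1) < 2:
        return None                    # e.g. the header line
    l2 = l1[0].split("(")              # "17(BC_1" -> ["17", "BC_1"]
    l3 = [l1[-2].replace(")", "")]     # last field, closing bracket removed
    return l2 + l1[1:-2] + l3


def wanted_columns(fields):
    # Hypothetical: first six columns, with brackets removed from the port name
    out = fields[:6]
    out[2] = out[2].replace("(", "").replace(")", "")
    return out


sample = [
    '-- num cell port function safe [ccell disval rslt]',
    '   "17 (BC_1, CLK, input, X)," &',
    '   "7 (BC_1, Q(1), output3, X, 16, 1, Z)," &',
]

for line in sample:
    fields = split_fields(line)
    if fields is None:
        continue
    print(" ".join(wanted_columns(fields)))
```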

Jan 6 '12 #13
