how to use python to extract certain text in the file?

30 New Member
I want to extract certain sections of a text file. My input file:

-- num cell port function safe [ccell disval rslt]
"17 (BC_1, CLK, input, X)," &
"16 (BC_1, OC_NEG, input, X), " &-- Merged input/
" 8 (BC_1, D(8), input, X)," & -- cell 16 @ 1 -> Hi-Z
" 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &
" 0 (BC_1, Q(8), output3, X, 16, 1, Z)";
and I need the output to look like this:

num cell port function safe ccell
17 BC_1 CLK input X
16 BC_1 OC_NEG input X
16 BC_1 * control 1
8 BC_1 D8 input X
7 BC_1 Q1 output3 X 16 1
0 BC_1 Q8 output3 X 16 1
So far I have tried the code below, but it gives an index error. Please advise.

import re
lines=open("input.txt",'r').readlines()

for line in lines:
    a=re.findall(r'\w+',line)
    print re.findall(r'\w+',line)
    print a[0],a[1],a[2],a[3],a[4],a[5],a[6]

I'm using Python 2.6.6 on Windows 7, and the error is as below:

['num', 'cell', 'port', 'function', 'safe', 'ccell', 'disval', 'rslt']
num cell port function safe ccell disval
['17', 'BC_1', 'CLK', 'input', 'X']
17 BC_1 CLK input X
Traceback (most recent call last):
  File "C:\Users\ctee1\Desktop\pyparsing\outputparser.py", line 39, in <module>
    print a[0],a[1],a[2],a[3],a[4],a[5],a[6]
IndexError: list index out of range

thanks maximus
Attached Files
File Type: txt input.txt (863 Bytes, 713 views)
File Type: txt output.txt (528 Bytes, 623 views)
Jan 5 '12 #1
Mariostg
332 Contributor
I believe it is because there are only 5 elements in "17 BC_1 CLK input X" and you are trying to print 7 (a[0] to a[6]).
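To make that concrete, here is a small sketch using one line copied from the question. It is written so it also runs on Python 3 (single-argument print calls), which is an assumption on my part since the thread uses 2.6.6. Slicing, unlike indexing, does not raise when it runs past the end of the list.

```python
import re

# One data line copied from the question's input file.
line = '"17 (BC_1, CLK, input, X)," &'
a = re.findall(r'\w+', line)

print(len(a))   # only 5 tokens, so a[5] and a[6] do not exist
print(a[:7])    # a slice simply stops at the end of the list
```

So `a[:7]` gives "up to seven tokens" without the IndexError that `a[6]` raises on the short lines.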
Jan 5 '12 #2
maximus tee
30 New Member
thanks will relook into it.
Jan 6 '12 #3
Glenton
391 Recognized Expert Contributor
The easy way to do this more safely would be something like
import re
lines=open("input.txt",'r').readlines()

for line in lines:
    a=re.findall(r'\w+',line)
    print re.findall(r'\w+',line)
    #print a[0],a[1],a[2],a[3],a[4],a[5],a[6]
    for b in a:
        print b,
    print
Jan 6 '12 #4
maximus tee
30 New Member
Hi, thanks for your feedback.
I would like to check whether the print on the last line is a typo, and also why there is a "print b," with a trailing comma inside the for loop?
I was thinking of doing:
for line in lines:
    a=line.split('-')[0]
    print a
    for b in a:
        print b,
    print

Jan 6 '12 #5
Glenton
391 Recognized Expert Contributor
The "print b," is to print b without going to a new line. It's to do the equivalent of printing a[0], a[1], a[2],... for as many as are needed.

The "print" at the end is to make a new line.
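If the trailing-comma form looks odd, the same output line can be built in one step with a string join. This is only a sketch on a sample line from the first post; the Python 2 loop is kept as a comment for comparison, and the join form should behave the same on Python 2 and Python 3.

```python
import re

tokens = re.findall(r'\w+', '"17 (BC_1, CLK, input, X)," &')

# Python 2 form from the post above: the trailing comma suppresses the newline.
#     for b in tokens:
#         print b,
#     print

# Same line built in one step, however many tokens there are.
joined = ' '.join(tokens)
print(joined)
```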

I was assuming your regular expression was working, but perhaps it isn't. I can't imagine that your split expression would work either.

Perhaps you can explain the logic of what you're trying to achieve. There are many ways that you could get that output given that input. But what are the more general rules? Eg is the first line always in that format? Do you find that all lines are one of two formats? If you can provide more about what you're trying to achieve, then it will be easier to help.
Jan 6 '12 #6
maximus tee
30 New Member
apologies for confusion.
general rules:
1) the first line inside input.txt (as attached in the first post):
-- num cell port function safe [ccell disval rslt]
I just need num cell port function safe ccell.
However, my script below couldn't get this, so I skip the line. It would be great if you could show me how.

2) i'm trying to convert the line inside the input.txt
"17 (BC_1, CLK, input, X)," &
into 17 BC_1 CLK input X
basically i'm only extracting column for num, cell, port, function, safe and ccell. the rest are not needed.

and " 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &
into 7 BC_1 Q1 output3 X 16 1

So far my script, which produces the output.txt attached to the first post, is:

import re

fileIn = open("input.txt", "rb")
fileOut = open("output.txt", "w")

for strData in fileIn:
    strData = strData.split('-')[0] #this is to remove the first line

    if("input" in strData):
        a=re.split("\W+", strData)
        #print a
        #fileOut.write (' '.join(a[1:7]) )
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+'\n')

    if("output" in strData):
        a=re.split("\W+", strData)
        #print a
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+' '+a[7]+'\n')

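On rule 1: in the script above, split('-')[0] returns an empty string for the header line, because that line begins with a dash, which is why the header never reaches the output. A minimal sketch of one way to pull out the six wanted column names instead; `header_columns` and its `wanted` parameter are names invented here for illustration, not something from the thread.

```python
import re

# Header line copied from the first post.
header = '-- num cell port function safe [ccell disval rslt]'

# Everything before the first '-' is empty, so the line is effectively dropped.
print(repr(header.split('-')[0]))

def header_columns(line, wanted=6):
    """Return the first `wanted` column names found in the header line."""
    return re.findall(r'\w+', line)[:wanted]

print(' '.join(header_columns(header)))
```

The second print gives "num cell port function safe ccell", which is the header line asked for in the first post.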
Jan 6 '12 #7
Glenton
391 Recognized Expert Contributor
Regular expressions are what you need. They take a bit of getting used to, but work brilliantly once you have the hang of it.

The lines in your input.txt do not all have the same number of fields, so the code below works for the first lines (the format you posted originally) but not for the later ones:

import re

lines=open("input.txt","r")

# Raw string, so the backslashes reach the regex engine unchanged.
# Groups: num, then four comma-separated fields inside the brackets.
p=re.compile(r'   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')

for line in lines:
    m=p.match(line)
    if not m: continue   # skip lines that do not fit the pattern
    for i in range(1,6):
        print m.group(i),
    print
Gives
17 BC_1 CLK input X
15 BC_1 D(1) input X
14 BC_1 D(2) input X
13 BC_1 D(3) input X
12 BC_1 D(4) input X
11 BC_1 D(5) input X
10 BC_1 D(6) input X
9 BC_1 D(7) input X
8 BC_1 D(8) input X
7 BC_1, Q(1), output3, X 16 1 Z
6 BC_1, Q(2), output3, X 16 1 Z
5 BC_1, Q(3), output3, X 16 1 Z
4 BC_1, Q(4), output3, X 16 1 Z
3 BC_1, Q(5), output3, X 16 1 Z
2 BC_1, Q(6), output3, X 16 1 Z
1 BC_1, Q(7), output3, X 16 1 Z
Jan 6 '12 #8
maximus tee
30 New Member
Wow, only a few lines of code. I don't get regular expressions, they are hard.
I don't understand:
1) p=re.compile('   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
2) m=p.match(line): what does "match line" mean?
3) m.group(i): what does it group?
I tried to print them but it only printed an address.

thanks
Jan 6 '12 #9
Glenton
391 Recognized Expert Contributor
You can look here to get more details on how regular expressions work.

I don't understand your statement "I tried to print but it only print address". What does this mean? The code should work to create the output that I gave.

re.compile is to make a pattern that you can match some text against. In this case it looks for the following:
'   "' is the start of the string: three spaces followed by a double quote.

' *' is some number of spaces (zero or more).

'(.*)' is a string of any characters (.) of any length (*). The brackets say that this is one of the groups you want to find (so m.group(1) will return the string that's in there).

' \(' finds the string " (". You need the escape character ('\') because ( is one of the special characters (see above).

'(.*), ' finds the next string of characters (for group(2)) followed by a comma (,) and a space ( ).

etc. You probably get the idea by now.

Then m is a match object from matching the pattern (p) to line (which is a line from input.txt).

Then m.group(i) refers to the groups that you said should be selected by putting the brackets () around them.

Note that regular expressions are "greedy" in the sense that each .* grabs the biggest string that still lets the whole pattern match (starting from the left). Thus for the last 7 lines of input.txt, group(2) is a string like "BC_1, Q(1), output3, X", which I assume is not what you want.
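To see the greedy behaviour in isolation, and one possible way around it, here is a hedged sketch. `GREEDY` is the pattern from the earlier post. `ROW` and `parse_row` are names made up for this example. The sample line assumes three leading spaces, as that pattern does. The idea is to capture everything between the outer brackets once and then split on commas, so the number of fields no longer matters.

```python
import re

line = '   " 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &'

# Pattern from the earlier post: the second group swallows the extra fields.
GREEDY = re.compile(r'   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
greedy = GREEDY.match(line)
print(greedy.groups())

# Alternative: the number, then everything inside the outer brackets.
ROW = re.compile(r'\s*"\s*(\d+)\s*\((.*)\)\s*,?\s*"')

def parse_row(text):
    """Return [num, field, field, ...] or None if the line is not a data row."""
    m = ROW.match(text)
    if m is None:
        return None
    fields = [f.strip() for f in m.group(2).split(',')]
    return [m.group(1)] + fields

print(parse_row(line))
print(parse_row('"17 (BC_1, CLK, input, X)," &'))
```

The greedy match puts 'BC_1, Q(1), output3, X' into one group, while parse_row returns each field separately for both the 5-field and the 8-field lines.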
Jan 6 '12 #10
maximus tee
30 New Member
Thanks for your guidance. RE is one of the hardest topics to understand in Python.

Sorry for the confusion. I tried to understand your code by adding a print, e.g.:
p=re.compile('   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
print p

which printed:
<_sre.SRE_Match object at 0x02B5C800>

and
    m=p.match(line)
    print m

which printed:
None
<_sre.SRE_Match object at 0x02B5C620>
17 BC_1 CLK input X
None
None
<_sre.SRE_Match object at 0x02B5C620>
15 BC_1 D(1) input X
<_sre.SRE_Match object at 0x02B5CBC0>
14 BC_1 D(2) input X
<_sre.SRE_Match object at 0x02B5C620>
13 BC_1 D(3) input X
<_sre.SRE_Match object at 0x02B5CBC0>
12 BC_1 D(4) input X
<_sre.SRE_Match object at 0x02B5C620>
11 BC_1 D(5) input X
<_sre.SRE_Match object at 0x02B5CBC0>
10 BC_1 D(6) input X
<_sre.SRE_Match object at 0x02B5C620>
9 BC_1 D(7) input X
<_sre.SRE_Match object at 0x02B5CBC0>
8 BC_1 D(8) input X
<_sre.SRE_Match object at 0x02B5C620>
7 BC_1, Q(1), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5CBC0>
6 BC_1, Q(2), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5C620>
5 BC_1, Q(3), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5CBC0>
4 BC_1, Q(4), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5C620>
3 BC_1, Q(5), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5CBC0>
2 BC_1, Q(6), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5C620>
1 BC_1, Q(7), output3, X 16 1 Z
None
Jan 6 '12 #11
Glenton
391 Recognized Expert Contributor
Yes, re objects don't print well, I'm afraid.

You need to use their attributes or methods. It can be a bit frustrating to debug and to understand. Try reading the docs link from my previous post. Good luck!
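For example, these are a few of the attributes and methods that give readable output. The pattern below is a simplified one written just for this illustration (it only handles the 5-field lines), and the prints use the single-argument form so the sketch should run on Python 2 and 3.

```python
import re

# Simplified illustrative pattern: num, then four word-like fields.
p = re.compile(r'\s*"\s*(\d+)\s*\((\w+), (\w+), (\w+), (\w+)\)')
m = p.match('"17 (BC_1, CLK, input, X)," &')

print(p.pattern)    # the pattern text the object was compiled from
print(m.group(0))   # the whole matched text
print(m.group(1))   # the first bracketed group
print(m.groups())   # all groups as a tuple

# A line that does not fit gives None instead of a match object,
# which is where the "None" lines in the printout come from.
no_match = p.match('-- num cell port function safe [ccell disval rslt]')
print(no_match is None)
```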
Jan 6 '12 #12
Glenton
391 Recognized Expert Contributor
Alternatively you can do it without re
lines=open("input.txt","r")

for line in lines:
    l1=line.replace(" ","").replace('"','').split(",")  #Remove the spaces and quotes from the line and separate on ,
    if len(l1)<2: continue   #to avoid lines that don't fit the general pattern
    l2=l1[0].split("(")   #split e.g. 17(BC_1 into the num and the cell
    l3=[l1[-2].replace(")","")]   #the second-to-last piece, without the closing bracket
    l4=l2+l1[1:-2]+l3
    print l4
gives this:
>>> 
['17', 'BC_1', 'CLK', 'input', 'X']
['16', 'BC_1', 'OC_NEG', 'input', 'X']
['16', 'BC_1', '*', 'control', '1']
['15', 'BC_1', 'D(1)', 'input', 'X']
['14', 'BC_1', 'D(2)', 'input', 'X']
['13', 'BC_1', 'D(3)', 'input', 'X']
['12', 'BC_1', 'D(4)', 'input', 'X']
['11', 'BC_1', 'D(5)', 'input', 'X']
['10', 'BC_1', 'D(6)', 'input', 'X']
['9', 'BC_1', 'D(7)', 'input', 'X']
['8', 'BC_1', 'D(8)', 'input', 'X']
['7', 'BC_1', 'Q(1)', 'output3', 'X', '16', '1', 'Z']
['6', 'BC_1', 'Q(2)', 'output3', 'X', '16', '1', 'Z']
['5', 'BC_1', 'Q(3)', 'output3', 'X', '16', '1', 'Z']
['4', 'BC_1', 'Q(4)', 'output3', 'X', '16', '1', 'Z']
['3', 'BC_1', 'Q(5)', 'output3', 'X', '16', '1', 'Z']
['2', 'BC_1', 'Q(6)', 'output3', 'X', '16', '1', 'Z']
['1', 'BC_1', 'Q(7)', 'output3', 'X', '16', '1', 'Z']
['0', 'BC_1', 'Q(8)', 'output3', 'X', '16', '1']
Obviously you can use the contents of the list as you wish.
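As one possible way to get from those lists to the layout asked for in the first post, here is a sketch. `format_row` and its `keep` parameter are hypothetical names. Keeping at most 7 fields is my reading of the desired output (it drops the trailing Z), and the brackets are removed so that D(8) becomes D8.

```python
def format_row(fields, keep=7):
    """Join the first `keep` fields into one line, dropping any brackets."""
    cleaned = [f.replace('(', '').replace(')', '') for f in fields[:keep]]
    return ' '.join(cleaned)

# Three of the lists printed above.
rows = [
    ['17', 'BC_1', 'CLK', 'input', 'X'],
    ['8', 'BC_1', 'D(8)', 'input', 'X'],
    ['7', 'BC_1', 'Q(1)', 'output3', 'X', '16', '1', 'Z'],
]

for row in rows:
    print(format_row(row))
```

Each returned string could be written to output.txt with a newline appended instead of being printed.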
Jan 6 '12 #13
