I want to extract certain sections of a text file. My input file:
-- num cell port function safe [ccell disval rslt]
"17 (BC_1, CLK, input, X)," &
"16 (BC_1, OC_NEG, input, X), " &-- Merged input/
" 8 (BC_1, D(8), input, X)," & -- cell 16 @ 1 -> Hi-Z
" 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &
" 0 (BC_1, Q(8), output3, X, 16, 1, Z)";
and I need the output to look like this:
num cell port function safe ccell
17 BC_1 CLK input X
16 BC_1 OC_NEG input X
16 BC_1 * control 1
8 BC_1 D8 input X
7 BC_1 Q1 output3 X 16 1
0 BC_1 Q8 output3 X 16 1
So far I have tried the code below, but it gives an index error. Please advise.
import re

lines=open("input.txt",'r').readlines()

for line in lines:
    a=re.findall(r'\w+',line)
    print re.findall(r'\w+',line)
    print a[0],a[1],a[2],a[3],a[4],a[5],a[6]
I'm using Python 2.6.6 on Windows 7, and the error is as below:
['num', 'cell', 'port', 'function', 'safe', 'ccell', 'disval', 'rslt']
num cell port function safe ccell disval
['17', 'BC_1', 'CLK', 'input', 'X']
17 BC_1 CLK input X
Traceback (most recent call last):
  File "C:\Users\ctee1\Desktop\pyparsing\outputparser.py", line 39, in <module>
    print a[0],a[1],a[2],a[3],a[4],a[5],a[6]
IndexError: list index out of range
thanks maximus
12 14430
I believe it is because there are only 5 elements in "17 BC_1 CLK input X" and you are trying to print 7 (a[0] to a[6]).
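To illustrate the point above: `re.findall(r'\w+', line)` returns a list whose length depends on the line, so fixed indices such as `a[6]` only work when at least seven tokens are present. A minimal sketch that joins whatever tokens exist instead (the helper name `tokens_joined` is made up for this example, and the sample lines are copied from the first post):

```python
import re

def tokens_joined(line):
    """Return all word tokens of a line, separated by single spaces."""
    tokens = re.findall(r'\w+', line)
    return ' '.join(tokens)

short_line = '"17 (BC_1, CLK, input, X)," &'
long_line = '" 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &'

# Only 5 tokens here, so a[5] and a[6] do not exist.
print(len(re.findall(r'\w+', short_line)))
print(tokens_joined(short_line))
# Note that Q(1) is split into two tokens, Q and 1.
print(tokens_joined(long_line))
```

Joining the list avoids the IndexError, but it does not by itself select the wanted columns; that still needs a rule for how many fields to keep.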
thanks will relook into it.
Glenton 391
Recognized Expert Contributor
The easy way to do this more safely would be something like:
import re

lines=open("input.txt",'r').readlines()

for line in lines:
    a=re.findall(r'\w+',line)
    print re.findall(r'\w+',line)
    #print a[0],a[1],a[2],a[3],a[4],a[5],a[6]
    for b in a:
        print b,
    print
Hi, thanks for your feedback.
I would like to check whether the print on the last line is a typo? And also, in the for loop, there is a print with a trailing comma?
I was thinking of doing:
for line in lines:
    a=line.split('-')[0]
    print a
    for b in a:
        print b,
    print
Glenton 391
Recognized Expert Contributor
The "print b," is to print b without going to a new line. It's to do the equivalent of printing a[0], a[1], a[2],... for as many as are needed.
The "print" at the end is to make a new line.
I was assuming your regular expression was working, but perhaps it isn't. I can't imagine that your split expression would work either.
Perhaps you can explain the logic of what you're trying to achieve. There are many ways that you could get that output given that input. But what are the more general rules? Eg is the first line always in that format? Do you find that all lines are one of two formats? If you can provide more about what you're trying to achieve, then it will be easier to help.
Apologies for the confusion.
General rules:
1) The first line inside input.txt (as attached in the first post) is:
-- num cell port function safe [ccell disval rslt]
I just need num cell port function safe ccell
However, my script below couldn't get this, so I skip the line. It would be great if you could show me how.
2) I'm trying to convert this line inside input.txt:
"17 (BC_1, CLK, input, X)," &
into 17 BC_1 CLK input X
Basically I'm only extracting the columns for num, cell, port, function, safe and ccell. The rest are not needed.
and " 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &
into 7 BC_1 Q1 output3 X 16 1
So far my script is below; it results in the output.txt attached in the first post:
import re

fileIn = open("input.txt", "rb")
fileOut = open("output.txt", "w")

for strData in fileIn:
    strData = strData.split('-')[0]  #this is to remove the first line

    if("input" in strData):
        a=re.split("\W+", strData)
        #print a
        #fileOut.write (' '.join(a[1:7]) )
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+'\n')

    if("output" in strData):
        a=re.split("\W+", strData)
        #print a
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+' '+a[7]+'\n')
Glenton 391
Recognized Expert Contributor
Regular expressions are what you need. They take a bit of getting used to, but work brilliantly once you have the hang of it.
You have a different number of variables in your input.txt file, so the code below works with the first lines, which have the format you posted originally, but not with the later ones:
import re

lines=open("input.txt","r")

p=re.compile(' " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')

for line in lines:
    m=p.match(line)
    if not m: continue
    for i in range(1,6):
        print m.group(i),
    print
Gives:
17 BC_1 CLK input X
15 BC_1 D(1) input X
14 BC_1 D(2) input X
13 BC_1 D(3) input X
12 BC_1 D(4) input X
11 BC_1 D(5) input X
10 BC_1 D(6) input X
9 BC_1 D(7) input X
8 BC_1 D(8) input X
7 BC_1, Q(1), output3, X 16 1 Z
6 BC_1, Q(2), output3, X 16 1 Z
5 BC_1, Q(3), output3, X 16 1 Z
4 BC_1, Q(4), output3, X 16 1 Z
3 BC_1, Q(5), output3, X 16 1 Z
2 BC_1, Q(6), output3, X 16 1 Z
1 BC_1, Q(7), output3, X 16 1 Z
Wow, only a few lines of code. I don't get regular expressions; they are hard.
I don't understand:
1) p=re.compile(' " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
2) m=p.match(line): what does matching a line mean?
3) m.group(i): what does it group?
I tried to print them, but it only prints an address.
Thanks
Glenton 391
Recognized Expert Contributor
You can look here to get more details on how regular expressions work.
I don't understand your statement "I tried to print but it only print address". What does this mean? The code should work to create the output that I gave.
re.compile is to make a pattern that you can match some text against. In this case it looks for the following:
' "' is the start of the string
' *' is some number of spaces (bigger than or equal to 0)
'(.*)' is a string of any characters (.) and any length (*). The brackets say that this is one of the groups you want to find (so m.group(1) will return the string that's in there).
' \(' finds the string " (". You need the escape character ('\') because ( is one of the special characters (see above).
'(.*), ' finds the next string of characters (for group(2)) followed by a comma (,) and a space ( ).
etc. You probably get the idea by now.
Then m is a match object from matching the pattern (p) to line (which is a line from input.txt).
Then m.group(i) refers to the groups that you said should be selected by putting the brackets () around them.
Note that regular expressions are "greedy" in the sense that they find the biggest string that fits the pattern (starting from the left). Thus for the last 7 lines of the output above, group(2) is the string "BC_1, Q(1), output3, X" (group(1) is still just the number), which I assume is not what you want.
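The greedy behaviour described above can be seen directly, along with one possible way around it. This is only a sketch, not the pattern from the post: `loose` follows the same idea with `.*`, while `tight` swaps in `[^,]*` ("anything except a comma") so that a group cannot run past a separator. Both names are made up for this illustration, the leading space of the original pattern is dropped because the sample line here has none, and `tight` assumes that no field ever contains a comma.

```python
import re

line = '" 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &'

# Greedy: each (.*) grabs as much as it can while the rest still matches,
# so one group ends up swallowing several comma-separated fields.
loose = re.compile(r'" *(.*) \((.*), (.*), (.*), (.*)\)," &')

# Assumption for this sketch: fields never contain a comma,
# so [^,]* stops each group at the next separator.
tight = re.compile(r'" *(\d+) \(([^,]*), ([^,]*), ([^,]*), ([^,)]*)')

print(loose.match(line).groups())
print(tight.match(line).groups())
```

With `tight`, the first five columns come out one per group; the remaining fields (16, 1, Z) are simply not captured, so a longer pattern or a split on commas would be needed to keep them.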
Thanks for your guidance. RE is one of the hardest topics to understand in Python.
Sorry for the confusion. I tried to understand your code by adding a print, e.g.:
p=re.compile(' " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
print p
which printed:
<_sre.SRE_Match object at 0x02B5C800>
and
which printed:
None
<_sre.SRE_Match object at 0x02B5C620>
17 BC_1 CLK input X
None
None
<_sre.SRE_Match object at 0x02B5C620>
15 BC_1 D(1) input X
<_sre.SRE_Match object at 0x02B5CBC0>
14 BC_1 D(2) input X
<_sre.SRE_Match object at 0x02B5C620>
13 BC_1 D(3) input X
<_sre.SRE_Match object at 0x02B5CBC0>
12 BC_1 D(4) input X
<_sre.SRE_Match object at 0x02B5C620>
11 BC_1 D(5) input X
<_sre.SRE_Match object at 0x02B5CBC0>
10 BC_1 D(6) input X
<_sre.SRE_Match object at 0x02B5C620>
9 BC_1 D(7) input X
<_sre.SRE_Match object at 0x02B5CBC0>
8 BC_1 D(8) input X
<_sre.SRE_Match object at 0x02B5C620>
7 BC_1, Q(1), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5CBC0>
6 BC_1, Q(2), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5C620>
5 BC_1, Q(3), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5CBC0>
4 BC_1, Q(4), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5C620>
3 BC_1, Q(5), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5CBC0>
2 BC_1, Q(6), output3, X 16 1 Z
<_sre.SRE_Match object at 0x02B5C620>
1 BC_1, Q(7), output3, X 16 1 Z
None
Glenton 391
Recognized Expert Contributor
Yes, re objects don't print well, I'm afraid.
You need to use their attributes or methods. It can be a bit frustrating to debug and to understand. Try reading the docs link from my previous post. Good luck!
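As a small illustration of the point above: printing a compiled pattern or a match object shows only its type and memory address, while its attributes and methods expose the useful parts. A sketch using standard `re` features (`pattern`, `group()`, `groups()`, `span()`); the sample line is taken from the first post, and the leading space of the earlier pattern is dropped here because that sample line has none.

```python
import re

p = re.compile(r'" *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
m = p.match('"17 (BC_1, CLK, input, X)," &')

print(p.pattern)    # the regular expression text itself
print(m.group(0))   # the whole matched text
print(m.group(3))   # the third bracketed group only
print(m.groups())   # all bracketed groups as a tuple
print(m.span(1))    # (start, end) position of group 1 in the line

# match() returns None when a line does not fit, so test before using it.
print(p.match('-- num cell port function safe') is None)
```

This is why the loop in the earlier post checks `if not m: continue` before calling `m.group(i)`.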
Glenton 391
Recognized Expert Contributor
Alternatively you can do it without re:
lines=open("input.txt","r")

for line in lines:
    l1=line.replace(" ","").replace('"','').split(",")  #Remove the spaces from the line and separate on ,
    if len(l1)<2: continue  #to avoid lines that don't fit the general pattern
    l2=l1[0].split("(")
    l3=[l1[-2].replace(")","")]
    l4=l2+l1[1:-2]+l3
    print l4
gives this:
>>>
['17', 'BC_1', 'CLK', 'input', 'X']
['16', 'BC_1', 'OC_NEG', 'input', 'X']
['16', 'BC_1', '*', 'control', '1']
['15', 'BC_1', 'D(1)', 'input', 'X']
['14', 'BC_1', 'D(2)', 'input', 'X']
['13', 'BC_1', 'D(3)', 'input', 'X']
['12', 'BC_1', 'D(4)', 'input', 'X']
['11', 'BC_1', 'D(5)', 'input', 'X']
['10', 'BC_1', 'D(6)', 'input', 'X']
['9', 'BC_1', 'D(7)', 'input', 'X']
['8', 'BC_1', 'D(8)', 'input', 'X']
['7', 'BC_1', 'Q(1)', 'output3', 'X', '16', '1', 'Z']
['6', 'BC_1', 'Q(2)', 'output3', 'X', '16', '1', 'Z']
['5', 'BC_1', 'Q(3)', 'output3', 'X', '16', '1', 'Z']
['4', 'BC_1', 'Q(4)', 'output3', 'X', '16', '1', 'Z']
['3', 'BC_1', 'Q(5)', 'output3', 'X', '16', '1', 'Z']
['2', 'BC_1', 'Q(6)', 'output3', 'X', '16', '1', 'Z']
['1', 'BC_1', 'Q(7)', 'output3', 'X', '16', '1', 'Z']
['0', 'BC_1', 'Q(8)', 'output3', 'X', '16', '1']
obviously you can use the contents of the list as you wish
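Putting the pieces of this thread together, one possible way to get the output requested in the first post is sketched below. It is not code from any of the posters. The names `DATA_LINE` and `convert` are made up, and the sketch assumes that the only line starting with `--` is the column header, that every data line looks like `"NUM (field, field, ...)`, and that at most six fields after the number are wanted (the requested output keeps disval but drops rslt).

```python
import re

# Assumed line shape: "NUM (f1, f2, ...)" followed by trailing text.
DATA_LINE = re.compile(r'"\s*(\d+)\s*\((.*)\)')

def convert(line):
    """Return the space-separated fields of one line, or None to skip it."""
    if line.lstrip().startswith('--'):
        # Header comment: keep the words, drop the brackets, keep six columns.
        words = re.findall(r'\w+', line)
        return ' '.join(words[:6])
    # Cut off any trailing "-- comment" before looking for the fields.
    m = DATA_LINE.search(line.split('--')[0])
    if not m:
        return None
    fields = [f.strip() for f in m.group(2).split(',')]
    # Turn D(8) into D8, as in the requested output.
    fields = [f.replace('(', '').replace(')', '') for f in fields]
    # Keep cell, port, function, safe, ccell, disval; drop rslt if present.
    return ' '.join([m.group(1)] + fields[:6])

sample = [
    '-- num cell port function safe [ccell disval rslt]',
    '"17 (BC_1, CLK, input, X)," &',
    '" 8 (BC_1, D(8), input, X)," & -- cell 16 @ 1 -> Hi-Z',
    '" 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &',
]
for raw in sample:
    out = convert(raw)
    if out is not None:
        print(out)
```

To run it on the real file, the `sample` list would be replaced by iterating over `open("input.txt")` and writing each non-None result to output.txt. Lines that do not fit the assumed shape are skipped rather than raising an error.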