
Simple text parsing gets difficult when line continues to next line

Hello,

I have a simple script to parse a text file (a Visual Basic program)
and convert key parts to Tcl. Since I am only working on specific
sections and I need it quickly, I decided not to learn/try a full-blown
parsing module. My simple script works well until it runs into
function calls that straddle multiple lines. For example:

Call mass_write(&H0, &HF, &H4, &H0, &H5, &H0, &H6, &H0, &H7, &H0, &H8, &H0, _
&H9, &H0, &HA, &H0, &HB, &H0, &HC, &H0, &HD, &H0, &HE, &H0, &HF, &H0, -1)

I read in each line with:

for line in open(fileName).readlines():

I would like to identify whether a line continues (if line.endswith('_'))
and concatenate it with the next line:

line = line + nextLine

How can I get the next line when I am in a for loop using readlines?

jr

Nov 28 '06 #1
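Because readlines() returns an ordinary list, the question can also be answered literally: step through the list with an explicit index instead of a for loop, and advance the index yourself when a line is continued. A minimal sketch in Python 3 (the code in this thread is Python 2); the function name join_by_index is hypothetical, and it assumes a line is continued when it ends in '_' after trailing whitespace is stripped.

```python
def join_by_index(lines):
    """Glue VB-style continued lines together using an explicit index.

    Assumes a line is continued when it ends in '_' after trailing
    whitespace is removed; the '_' itself is dropped from the result.
    """
    out = []
    i = 0
    while i < len(lines):
        line = lines[i].rstrip()
        # Keep pulling the next list element while the marker is present.
        while line.endswith('_'):
            i += 1
            if i >= len(lines):
                raise ValueError("last line is continued: %r" % line)
            line = line[:-1] + lines[i].rstrip()
        out.append(line)
        i += 1
    return out
```

Each item of the returned list is one logical line, so the existing parsing loop can stay as it is, e.g. `for line in join_by_index(open(fileName).readlines()):`.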
Jacob Rael wrote:
[...]
How can I get the next line when I am in a for loop using readlines?

Something like (not tested):

fp=open(filename, 'r')
for line in fp:
    while line.rstrip().endswith('_'):
        line+=fp.next()
    # ... process the joined line here ...
fp.close()

-Larry

Nov 28 '06 #2
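For reference, the same for-loop/next pattern in Python 3, where fp.next() is spelled next(fp). This is a sketch, not Larry's exact code: the generator name joined_lines is hypothetical, it drops the '_' and the newline from the glued text, and it raises ValueError when the last line is continued instead of letting StopIteration escape from the loop body. It assumes the continuation marker is a trailing '_' after whitespace is stripped.

```python
def joined_lines(fp):
    """Yield logical lines from fp, gluing '_'-continued lines together.

    fp can be an open text file or any iterable of strings. The for loop
    and next() share one iterator, so each continuation line is consumed
    once and is not seen again by the loop.
    """
    it = iter(fp)
    for line in it:
        line = line.rstrip()
        while line.endswith('_'):
            nxt = next(it, None)  # None signals end of input
            if nxt is None:
                raise ValueError("last line is continued: %r" % line)
            line = line[:-1] + nxt.rstrip()
        yield line
```

Usage with a file: `with open(filename) as fp:` followed by `for line in joined_lines(fp):`.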
Jacob Rael wrote:
[...]
I would like to identify whether a line continues (if line.endswith('_'))
and concatenate it with the next line:

line = line + nextLine

How can I get the next line when I am in a for loop using readlines?
Don't use readlines.

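# NOT TESTED
program = open(fileName)
for line in program:
    while line.rstrip("\n").endswith("_"):
        # program.next(), not program.readline(): in Python 2, mixing
        # file iteration with readline() does not work reliably
        line = line.rstrip("_ \n") + program.next()
    do_the_magic(line)  # placeholder for the actual parsing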

Cheers,
--
Roberto Bonvallet
Nov 28 '06 #3
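One detail in Roberto's version: rstrip("_ \n") removes every trailing underscore, space and newline, not just the continuation marker, so a line such as `x = my_var_ _` also loses the underscore that belongs to the identifier. Likewise, rstrip("\n") misses a '_' followed by trailing spaces or a '\r'. A sketch of a narrower helper, assuming VB-style continuation (a trailing '_' after whitespace is stripped); strip_continuation is a hypothetical name.

```python
def strip_continuation(line):
    """Return (text, continued) for one physical line.

    Removes all trailing whitespace (including '\\r' and '\\n'), then
    exactly one trailing '_' if present, so underscores that belong to
    the code itself survive.
    """
    text = line.rstrip()
    if text.endswith('_'):
        return text[:-1], True
    return text, False
```

For comparison, `"x = my_var_ _\n".rstrip("_ \n")` gives `"x = my_var"`, while `strip_continuation` keeps `"x = my_var_ "`.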
Jacob Rael wrote:
[...]
How can I get the next line when I am in a for loop using readlines?
Don't do that. I'm rather dubious about approaches that try to grab the
next line on the fly e.g. fp.next(). Here's a function that takes a
list of lines and returns another with all trailing whitespace removed
and the continued lines glued together. It uses a simple state machine
approach.

def continue_join(linesin):
    linesout = []
    buff = ""
    NORMAL = 0
    PENDING = 1
    state = NORMAL
    for line in linesin:
        line = line.rstrip()
        if state == NORMAL:
            if line.endswith('_'):
                buff = line[:-1]
                state = PENDING
            else:
                linesout.append(line)
        else:
            if line.endswith('_'):
                buff += line[:-1]
            else:
                buff += line
                linesout.append(buff)
                buff = ""
                state = NORMAL
    if state == PENDING:
        raise ValueError("last line is continued: %r" % line)
    return linesout

import sys
fp = open(sys.argv[1])
rawlines = fp.readlines()
cleanlines = continue_join(rawlines)
for line in cleanlines:
    print repr(line)
===
Tested with the following files:
C:\junk>type contlinet1.txt
only one line

C:\junk>type contlinet2.txt
line 1
line 2

C:\junk>type contlinet3.txt
line 1
line 2a _
line 2b _
line 2c
line 3

C:\junk>type contlinet4.txt
line 1
_
_
line 2c
line 3

C:\junk>type contlinet5.txt
line 1
_
_
line 2c
line 3 _

C:\junk>

HTH,
John

Nov 28 '06 #4
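A different technique, not one proposed in the thread: when the whole file fits in memory, the joining can be done on the text in a single pass with a regular expression and split into lines afterwards. This sketch assumes the VB rule that a continuation is a space followed by an underscore at the end of a line; join_text and CONTINUATION are hypothetical names. Unlike continue_join it does not report a continued last line, and a line consisting of a lone '_' (as in contlinet4.txt) is left unchanged because no space precedes the underscore.

```python
import re

# VB marks a continuation with a space followed by '_' at the end of a line.
# The pattern also allows trailing spaces/tabs and Windows '\r\n' endings.
CONTINUATION = re.compile(r' _[ \t]*\r?\n')


def join_text(text):
    """Join continued lines in the full text of a file, then split it.

    Each ' _' + newline is replaced by a single space, which gives the
    same result as continue_join for lines like 'line 2a _'.
    """
    return CONTINUATION.sub(' ', text).splitlines()
```

Usage: `with open(fileName) as f:` followed by `lines = join_text(f.read())`.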
John Machin wrote:
Jacob Rael wrote:
[...]

Don't do that. I'm rather dubious about approaches that try to grab the
next line on the fly e.g. fp.next(). Here's a function that takes a
list of lines and returns another with all trailing whitespace removed
and the continued lines glued together. It uses a simple state machine
approach.
I agree that mixing the line assembly and parsing is probably a mistake,
although using next explicitly is fine as long as you're careful with it.
For instance, I would be wary of the mixed for-loop/next strategy
that some of the previous posts suggested. Here's a different,
generator-based implementation of the same idea that, for better or for
worse, is considerably less verbose:

def continue_join_2(linesin):
    getline = iter(linesin).next
    while True:
        buffer = getline().rstrip()
        try:
            while buffer.endswith('_'):
                buffer = buffer[:-1] + getline().rstrip()
        except StopIteration:
            raise ValueError("last line is continued: %r" % buffer)
        yield buffer

-tim

[SNIP]

Nov 28 '06 #5
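Tim's generator relies on a Python 2 behaviour: the StopIteration raised by getline() at the normal end of input simply ends the generator. Since PEP 479 (the default from Python 3.7), a StopIteration that escapes a generator body is turned into RuntimeError, so a direct port has to catch it. A sketch of such a port, with .next spelled __next__; everything else follows Tim's structure.

```python
def continue_join_2(linesin):
    """Python 3.7+ port of the generator above (sketch).

    PEP 479 turns a StopIteration escaping a generator into RuntimeError,
    so both places that call getline() catch it explicitly.
    """
    getline = iter(linesin).__next__   # Python 3 spelling of .next
    while True:
        try:
            buffer = getline().rstrip()
        except StopIteration:
            return                     # normal end of input
        try:
            while buffer.endswith('_'):
                buffer = buffer[:-1] + getline().rstrip()
        except StopIteration:
            raise ValueError("last line is continued: %r" % buffer) from None
        yield buffer
```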

Tim Hochberg wrote:
[snip]
I agree that mixing the line assembly and parsing is probably a mistake,
although using next explicitly is fine as long as you're careful with it.
For instance, I would be wary of the mixed for-loop/next strategy
that some of the previous posts suggested. Here's a different,
generator-based implementation of the same idea that, for better or for
worse, is considerably less verbose:
[snip]

Here's a somewhat less verbose version of the state machine gadget.

def continue_join_3(linesin):
    linesout = []
    buff = ""
    pending = 0
    for line in linesin:
        # remove *all* trailing whitespace
        line = line.rstrip()
        if line.endswith('_'):
            buff += line[:-1]
            pending = 1
        else:
            linesout.append(buff + line)
            buff = ""
            pending = 0
    if pending:
        raise ValueError("last line is continued: %r" % line)
    return linesout

FWIW, it works all the way back to Python 2.1

Cheers,
John

Nov 28 '06 #6
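Once line assembly is separated from parsing, as recommended above, the parser only ever sees complete statements. As an illustration of that second stage, a hypothetical parse_call that pulls the procedure name and integer arguments out of a joined Call line such as the mass_write example, converting VB &H hex literals with int(..., 16). It assumes every argument is a hex or decimal integer literal; the conversion to Tcl would start from its result.

```python
import re

# Hypothetical pattern for a joined statement of the form "Call name(args)".
CALL = re.compile(r'^\s*Call\s+(\w+)\s*\((.*)\)\s*$', re.IGNORECASE)


def parse_call(line):
    """Return (name, [int, ...]) for a joined VB Call line, else None.

    Assumes each argument is a VB hex literal (&H...) or a decimal integer.
    """
    m = CALL.match(line)
    if m is None:
        return None
    name, raw = m.group(1), m.group(2).strip()
    args = []
    for arg in (raw.split(',') if raw else []):
        arg = arg.strip()
        if arg.upper().startswith('&H'):
            args.append(int(arg[2:], 16))  # &HF -> 15
        else:
            args.append(int(arg))          # e.g. -1
    return name, args
```

Applied to the joined mass_write line, this returns the name "mass_write" and 27 integers ending in -1.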
Thanks all. I think I'll follow the "don't do that" advice.

jr

Nov 28 '06 #7
