Simple text parsing gets difficult when line continues to next line

Hello,

I have a simple script to parse a text file (a visual basic program)
and convert key parts to tcl. Since I am only working on specific
sections and I need it quick, I decided not to learn/try a full blown
parsing module. My simple script works well until it runs into
functions that straddle multiple lines. For example:

Call mass_write(&H0, &HF, &H4, &H0, &H5, &H0, &H6, &H0, &H7, &H0, &H8, &H0, _
                &H9, &H0, &HA, &H0, &HB, &H0, &HC, &H0, &HD, &H0, &HE, &H0, &HF, &H0, -1)

I read in each line with:

for line in open(fileName).readlines():

I would like to identify whether a line continues (if line.endswith('_'))
and concatenate it with the next line:

line = line + nextLine

How can I get the next line when I am in a for loop using readlines?

jr

Nov 28 '06 #1
Jacob Rael wrote:
> [...]
> How can I get the next line when I am in a for loop using readlines?

Something like (not tested):

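fp = open(filename, 'r')
for line in fp:
    # keep pulling lines while the current one ends with a continuation marker
    while line.rstrip().endswith('_'):
        line += next(fp)  # spelled fp.next() before Python 2.6
fp.close()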

-Larry

Nov 28 '06 #2
Jacob Rael wrote:
> [...]
> I would like to identify whether a line continues (if line.endswith('_'))
> and concatenate it with the next line:
>
> line = line + nextLine
>
> How can I get the next line when I am in a for loop using readlines?

Don't use readlines.

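# NOT TESTED
program = open(fileName)
for line in program:
    while line.rstrip("\n").endswith("_"):
        # next() rather than readline(): Python 2 refuses to mix readline()
        # with iteration over the same file object
        line = line.rstrip("_ \n") + next(program, "")
    do_the_magic()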
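
For anyone who wants to try the pull-the-next-line idea without a file on disk, here is a minimal Python 3 sketch. The name `logical_lines` and the sample text are made up for illustration, and `io.StringIO` stands in for the open file. Unlike the untested snippets above, it raises an error when the last line is still continued.

```python
import io

def logical_lines(stream):
    """Yield logical lines, gluing the next physical line onto any line ending in '_'."""
    stream = iter(stream)  # one shared iterator, so next() and the for loop stay in step
    for line in stream:
        line = line.rstrip()
        while line.endswith("_"):
            following = next(stream, None)  # None means the input ran out
            if following is None:
                raise ValueError("last line is continued: %r" % line)
            line = line[:-1] + following.rstrip()
        yield line

# Hypothetical sample input; a real script would pass the open file instead.
sample = io.StringIO("Call f(1, _\n2, _\n3)\nx = 1\n")
result = list(logical_lines(sample))
print(result)  # ['Call f(1, 2, 3)', 'x = 1']
```

Because the `for` loop and `next()` draw from the same iterator, a line consumed as a continuation is never seen again by the loop.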

Cheers,
--
Roberto Bonvallet
Nov 28 '06 #3
Jacob Rael wrote:
> [...]
> How can I get the next line when I am in a for loop using readlines?

Don't do that. I'm rather dubious about approaches that try to grab the
next line on the fly e.g. fp.next(). Here's a function that takes a
list of lines and returns another with all trailing whitespace removed
and the continued lines glued together. It uses a simple state machine
approach.

def continue_join(linesin):
    linesout = []
    buff = ""
    NORMAL = 0
    PENDING = 1
    state = NORMAL
    for line in linesin:
        line = line.rstrip()
        if state == NORMAL:
            if line.endswith('_'):
                buff = line[:-1]
                state = PENDING
            else:
                linesout.append(line)
        else:
            if line.endswith('_'):
                buff += line[:-1]
            else:
                buff += line
                linesout.append(buff)
                buff = ""
                state = NORMAL
    if state == PENDING:
        raise ValueError("last line is continued: %r" % line)
    return linesout

import sys
fp = open(sys.argv[1])
rawlines = fp.readlines()
cleanlines = continue_join(rawlines)
for line in cleanlines:
    print(repr(line))
===
Tested with following files:
C:\junk>type contlinet1.txt
only one line

C:\junk>type contlinet2.txt
line 1
line 2

C:\junk>type contlinet3.txt
line 1
line 2a _
line 2b _
line 2c
line 3

C:\junk>type contlinet4.txt
line 1
_
_
line 2c
line 3

C:\junk>type contlinet5.txt
line 1
_
_
line 2c
line 3 _

C:\junk>
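
The same checks can be run without creating the files. The sketch below is only an illustration: `join_continued` is a condensed stand-in for the state machine (repeated so the snippet runs on its own), and the two lists are assumed to match contlinet3.txt and contlinet5.txt as printed above.

```python
def join_continued(lines):
    """Glue each line ending in '_' onto the line that follows it."""
    out, buff, pending = [], "", False
    for line in lines:
        line = line.rstrip()
        pending = line.endswith("_")
        if pending:
            buff += line[:-1]        # drop the underscore, keep what precedes it
        else:
            out.append(buff + line)  # buff is empty unless a continuation is open
            buff = ""
    if pending:
        raise ValueError("last line is continued: %r" % line)
    return out

# Assumed contents of the sample files, one list item per physical line.
contlinet3 = ["line 1\n", "line 2a _\n", "line 2b _\n", "line 2c\n", "line 3\n"]
contlinet5 = ["line 1\n", "_\n", "_\n", "line 2c\n", "line 3 _\n"]

joined3 = join_continued(contlinet3)
print(joined3)  # ['line 1', 'line 2a line 2b line 2c', 'line 3']

try:
    join_continued(contlinet5)
    rejected5 = False
except ValueError as exc:
    rejected5 = True
    print("rejected:", exc)
```

The fifth file ends on a continued line, so it is expected to be rejected rather than silently truncated.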

HTH,
John

Nov 28 '06 #4
John Machin wrote:
> Jacob Rael wrote:
>> [...]
>> How can I get the next line when I am in a for loop using readlines?
>
> Don't do that. I'm rather dubious about approaches that try to grab the
> next line on the fly e.g. fp.next(). Here's a function that takes a
> list of lines and returns another with all trailing whitespace removed
> and the continued lines glued together. It uses a simple state machine
> approach.

I agree that mixing the line assembly and parsing is probably a mistake,
although using next explicitly is fine as long as you're careful with it.
For instance, I would be wary of using the mixed for-loop/next strategy
that some of the previous posts suggested. Here's a different,
generator-based implementation of the same idea that, for better or for
worse, is considerably less verbose:

def continue_join_2(linesin):
    lines = iter(linesin)
    while True:
        try:
            buffer = next(lines).rstrip()
        except StopIteration:
            return  # input exhausted between logical lines: normal end
        try:
            while buffer.endswith('_'):
                buffer = buffer[:-1] + next(lines).rstrip()
        except StopIteration:
            # input ran out in the middle of a continued line
            raise ValueError("last line is continued: %r" % buffer)
        yield buffer
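
To illustrate keeping line assembly separate from parsing, here is a rough Python 3 sketch of a two-stage pipeline. Everything in it is an assumption made for the example: the names `join_continued`, `parse_call` and `CALL_RE`, the shortened `mass_write` call, and the idea that arguments are either `&H` hex literals or plain decimal integers.

```python
import re

def join_continued(lines):
    """Stage 1: yield logical lines, merging any line ending in '_' with the next."""
    buff, pending = "", False
    for line in lines:
        line = line.rstrip()
        pending = line.endswith("_")
        if pending:
            buff += line[:-1]
        else:
            yield buff + line
            buff = ""
    if pending:
        raise ValueError("last line is continued: %r" % line)

# Hypothetical pattern for lines of the form: Call name(arg, arg, ...)
CALL_RE = re.compile(r"^\s*Call\s+(\w+)\((.*)\)\s*$")

def parse_call(logical_line):
    """Stage 2: return (name, [int, ...]) for a Call line, or None for anything else."""
    match = CALL_RE.match(logical_line)
    if match is None:
        return None
    name, arg_text = match.groups()
    tokens = [t.strip() for t in arg_text.split(",") if t.strip()]
    # &HF-style tokens are hex; everything else is assumed to be a decimal integer.
    args = [int(t[2:], 16) if t.upper().startswith("&H") else int(t) for t in tokens]
    return name, args

source = [
    "Call mass_write(&H0, &HF, &H4, &H0, _\n",
    "                &H9, &H0, -1)\n",
]
calls = [parse_call(line) for line in join_continued(source)]
print(calls)  # [('mass_write', [0, 15, 4, 0, 9, 0, -1])]
```

The parser only ever sees complete statements, so it needs no knowledge of the continuation marker.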

-tim

[SNIP]

Nov 28 '06 #5

Tim Hochberg wrote:
> [snip]
> I agree that mixing the line assembly and parsing is probably a mistake,
> although using next explicitly is fine as long as you're careful with it.
> For instance, I would be wary of using the mixed for-loop/next strategy
> that some of the previous posts suggested. Here's a different,
> generator-based implementation of the same idea that, for better or for
> worse, is considerably less verbose:
> [snip]

Here's a somewhat less verbose version of the state machine gadget.

def continue_join_3(linesin):
    linesout = []
    buff = ""
    pending = 0
    for line in linesin:
        # remove *all* trailing whitespace
        line = line.rstrip()
        if line.endswith('_'):
            buff += line[:-1]
            pending = 1
        else:
            linesout.append(buff + line)
            buff = ""
            pending = 0
    if pending:
        raise ValueError("last line is continued: %r" % line)
    return linesout

FWIW, it works all the way back to Python 2.1

Cheers,
John,

Nov 28 '06 #6
Thanks all. I think I'll follow the "don't do that" advice.

jr

Nov 28 '06 #7
