Simple text parsing gets difficult when line continues to next line

Hello,

I have a simple script to parse a text file (a Visual Basic program)
and convert key parts to Tcl. Since I am only working on specific
sections and I need it quickly, I decided not to learn/try a full-blown
parsing module. My simple script works well until it runs into
function calls that straddle multiple lines. For example:

Call mass_write(&H0, &HF, &H4, &H0, &H5, &H0, &H6, &H0, &H7, &H0, &H8, &H0, _
&H9, &H0, &HA, &H0, &HB, &H0, &HC, &H0, &HD, &H0, &HE, &H0, &HF, &H0, -1)

I read in each line with:

for line in open(fileName).readlines():

I would like to identify if a line continues (if line.endswith('_'))
and concatenate it with the next line:

line = line + nextLine

How can I get the next line when I am in a for loop using readlines?

jr
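Taken literally, the question (how to reach the next element while looping over the list that readlines() returns) can be answered by walking the list with an explicit index instead of a for loop. This is a minimal sketch, assuming the only continuation marker is a trailing `_` once trailing whitespace is stripped; the function name join_continued is hypothetical.

```python
def join_continued(lines):
    """Join continued lines in a list such as readlines() returns.

    Hypothetical helper: a line whose text (after rstrip) ends in '_' is
    glued to the following list element(s), with the '_' removed.
    """
    joined = []
    i = 0
    while i < len(lines):
        line = lines[i].rstrip()
        # Pull in following elements while the text ends with the marker.
        while line.endswith('_') and i + 1 < len(lines):
            i += 1
            # Drop the '_' (keeping the space before it), add the next line.
            line = line[:-1] + lines[i].rstrip()
        # A continued last line has nothing to join, so it is kept as-is.
        joined.append(line)
        i += 1
    return joined


print(join_continued(["Call f(1, _\n", "2)\n", "x = 3\n"]))
# ['Call f(1, 2)', 'x = 3']
```

Because the index is explicit, looking ahead does not interfere with any hidden iterator; the trade-off is that the whole file is read into memory first, just as with readlines().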

Nov 28 '06 #1
Jacob Rael wrote:
> [...]
> How can I get the next line when I am in a for loop using readlines?

Something like (not tested):

fp = open(filename, 'r')
for line in fp:
    while line.rstrip().endswith('_'):
        line += next(fp)
fp.close()

-Larry
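The for/next pattern above works because the for loop and next() advance the same file iterator. The sketch below (Python 3; the generator name iter_logical_lines is hypothetical) wraps that pattern, drops the '_' before joining, and exercises it on an in-memory file via io.StringIO. Passing a default to next() is deliberate: since Python 3.7, a StopIteration that escapes a generator body is turned into a RuntimeError (PEP 479).

```python
import io


def iter_logical_lines(fp):
    """Yield lines from a file-like object with '_' continuations joined.

    Hypothetical generator built on the for-loop/next() pattern.
    """
    for line in fp:
        while line.rstrip().endswith('_'):
            # next(fp, '') advances the same iterator the for loop uses and
            # returns '' at end of file instead of raising StopIteration.
            line = line.rstrip()[:-1] + next(fp, '')
        yield line.rstrip()


sample = io.StringIO("Call f(1, _\n2, _\n3)\nx = 4\n")
print(list(iter_logical_lines(sample)))
# ['Call f(1, 2, 3)', 'x = 4']
```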

Nov 28 '06 #2
Jacob Rael wrote:
> [...]
> I would like to identify if a line continues (if line.endswith('_'))
> and concatenate it with the next line:
>
> line = line + nextLine
>
> How can I get the next line when I am in a for loop using readlines?
Don't use readlines.

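# NOT TESTED
program = open(fileName)
for line in program:
    while line.rstrip("\n").endswith("_"):
        # next() advances the same iterator as the for loop ("" at EOF);
        # Python 2 file objects reject readline() mixed with iteration.
        line = line.rstrip("_ \n") + next(program, "")
    do_the_magic(line)  # placeholder for the real parsing step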

Cheers,
--
Roberto Bonvallet
Nov 28 '06 #3
Jacob Rael wrote:
> [...]
> How can I get the next line when I am in a for loop using readlines?
Don't do that. I'm rather dubious about approaches that try to grab the
next line on the fly, e.g. fp.next(). Here's a function that takes a
list of lines and returns another with all trailing whitespace removed
and the continued lines glued together. It uses a simple state machine
approach.

def continue_join(linesin):
    linesout = []
    buff = ""
    NORMAL = 0
    PENDING = 1
    state = NORMAL
    for line in linesin:
        line = line.rstrip()
        if state == NORMAL:
            if line.endswith('_'):
                buff = line[:-1]
                state = PENDING
            else:
                linesout.append(line)
        else:
            if line.endswith('_'):
                buff += line[:-1]
            else:
                buff += line
                linesout.append(buff)
                buff = ""
                state = NORMAL
    if state == PENDING:
        raise ValueError("last line is continued: %r" % line)
    return linesout

import sys
fp = open(sys.argv[1])
rawlines = fp.readlines()
cleanlines = continue_join(rawlines)
for line in cleanlines:
    print(repr(line))
===
Tested with the following files:
C:\junk>type contlinet1.txt
only one line

C:\junk>type contlinet2.txt
line 1
line 2

C:\junk>type contlinet3.txt
line 1
line 2a _
line 2b _
line 2c
line 3

C:\junk>type contlinet4.txt
line 1
_
_
line 2c
line 3

C:\junk>type contlinet5.txt
line 1
_
_
line 2c
line 3 _

C:\junk>

HTH,
John
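The five test files above can also be checked in memory. This is a minimal sketch using a compact restatement of the same joining rule (strip trailing whitespace, drop a final '_', glue to the next line, reject a continued last line); the name join_lines is hypothetical, and the expected lines in the comments are worked out from the file contents shown, not copied from the original run.

```python
def join_lines(linesin):
    """Hypothetical restatement of the state-machine join in continue_join."""
    linesout = []
    buff = ""
    pending = False
    line = ""
    for line in linesin:
        line = line.rstrip()
        if line.endswith('_'):
            buff += line[:-1]   # keep the text before '_' and wait
            pending = True
        else:
            linesout.append(buff + line)
            buff = ""
            pending = False
    if pending:
        raise ValueError("last line is continued: %r" % line)
    return linesout


# Contents of contlinet1.txt .. contlinet5.txt as listed above.
test_files = {
    1: "only one line\n",
    2: "line 1\nline 2\n",
    3: "line 1\nline 2a _\nline 2b _\nline 2c\nline 3\n",
    4: "line 1\n_\n_\nline 2c\nline 3\n",
    5: "line 1\n_\n_\nline 2c\nline 3 _\n",
}
for n, text in sorted(test_files.items()):
    try:
        print(n, join_lines(text.splitlines(True)))
    except ValueError as exc:
        print(n, "error:", exc)
# 1 ['only one line']
# 2 ['line 1', 'line 2']
# 3 ['line 1', 'line 2a line 2b line 2c', 'line 3']
# 4 ['line 1', 'line 2c', 'line 3']
# 5 error: last line is continued: 'line 3 _'
```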

Nov 28 '06 #4
John Machin wrote:
> Jacob Rael wrote:
> > [...]
> > How can I get the next line when I am in a for loop using readlines?
>
> Don't do that. I'm rather dubious about approaches that try to grab the
> next line on the fly, e.g. fp.next(). Here's a function that takes a
> list of lines and returns another with all trailing whitespace removed
> and the continued lines glued together. It uses a simple state machine
> approach.

I agree that mixing the line assembly and parsing is probably a mistake,
although using next explicitly is fine as long as you're careful with it.
For instance, I would be wary of using the mixed for-loop/next strategy
that some of the previous posts suggested. Here's a different,
generator-based implementation of the same idea that, for better or for
worse, is considerably less verbose:

def continue_join_2(linesin):
    lines = iter(linesin)
    while True:
        try:
            buffer = next(lines).rstrip()
        except StopIteration:
            return  # input exhausted: finish the generator cleanly
        try:
            while buffer.endswith('_'):
                buffer = buffer[:-1] + next(lines).rstrip()
        except StopIteration:
            raise ValueError("last line is continued: %r" % buffer)
        yield buffer

-tim

[SNIP]
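The point about keeping line assembly apart from parsing can be sketched as two stages: a generator that only joins continued lines, feeding a separate function that only parses one complete logical line. This is a minimal sketch under stated assumptions: the interesting statements look like `Call name(arg, arg, ...)`, the arguments are plain VB integer literals such as `&HF` or `-1` (VB type suffixes, for example a trailing `&`, are not handled), and the names logical_lines, vb_int, parse_call and the Tcl-style output `name arg arg ...` are hypothetical, not taken from the thread.

```python
import re

# Assumed statement shape: Call name(arg, arg, ...)
CALL_RE = re.compile(r'^\s*Call\s+(\w+)\s*\((.*)\)\s*$', re.IGNORECASE)


def logical_lines(lines):
    """Stage 1: only glue physical lines that end in '_'."""
    buff, pending = "", False
    for line in lines:
        line = line.rstrip()
        if line.endswith('_'):
            buff, pending = buff + line[:-1], True
        else:
            yield buff + line
            buff, pending = "", False
    if pending:
        raise ValueError("last line is continued: %r" % buff)


def vb_int(token):
    """Convert a VB integer literal such as '&HF' or '-1' to an int."""
    token = token.strip()
    if token.upper().startswith('&H'):
        return int(token[2:], 16)
    return int(token)


def parse_call(text):
    """Stage 2: parse one complete 'Call name(args)' line, else None."""
    match = CALL_RE.match(text)
    if match is None:
        return None
    name, args = match.groups()
    return name, [vb_int(a) for a in args.split(',') if a.strip()]


source = [
    "Call mass_write(&H0, &HF, &H4, _\n",
    "    &H9, &HA, -1)\n",
    "x = 1\n",
]
for text in logical_lines(source):
    parsed = parse_call(text)
    if parsed is not None:
        name, values = parsed
        # Hypothetical Tcl-style output: command name, then decimal args.
        print(name, " ".join(str(v) for v in values))
# mass_write 0 15 4 9 10 -1
```

Neither stage knows about the other, so the parser can be tested on single strings and the joiner on lists of lines.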

Nov 28 '06 #5

Tim Hochberg wrote:
> [snip]
> I agree that mixing the line assembly and parsing is probably a mistake,
> although using next explicitly is fine as long as you're careful with it.
> For instance, I would be wary of using the mixed for-loop/next strategy
> that some of the previous posts suggested. Here's a different,
> generator-based implementation of the same idea that, for better or for
> worse, is considerably less verbose:
> [snip]

Here's a somewhat less verbose version of the state machine gadget.

def continue_join_3(linesin):
    linesout = []
    buff = ""
    pending = 0
    for line in linesin:
        # remove *all* trailing whitespace
        line = line.rstrip()
        if line.endswith('_'):
            buff += line[:-1]
            pending = 1
        else:
            linesout.append(buff + line)
            buff = ""
            pending = 0
    if pending:
        raise ValueError("last line is continued: %r" % line)
    return linesout

FWIW, it works all the way back to Python 2.1

Cheers,
John,

Nov 28 '06 #6
Thanks all. I think I'll follow the "don't do that" advice.

jr

Nov 28 '06 #7
