Bytes | Software Development & Data Engineering Community

Refactoring a generator function

Here is a simple function that scans through an input file and groups the lines of the file into
sections. Sections start with 'Name:' and end with a blank line. The function yields sections as
they are found.

def makeSections(f):
    currSection = []

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            if currSection:
                yield currSection
                currSection = []
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            if currSection:
                yield currSection
                currSection = []

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    if currSection:
        yield currSection

There is some obvious code duplication in the function - this bit is repeated 2.67 times ;-):
if currSection:
    yield currSection
    currSection = []

As a firm believer in Once and Only Once, I would like to factor this out into a separate function,
either as a nested function of makeSections() or as a separate method of a class implementation.
Something like this:

def makeSections(f):    ### DOESN'T WORK ###
    currSection = []

    def yieldSection():
        if currSection:
            yield currSection
            del currSection[:]

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            yieldSection()
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            yieldSection()

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    yieldSection()

The problem is that yieldSection() is now the generator and makeSections() is not. Each call to
yieldSection() just creates a new generator object, which is discarded without its body ever
running, so the section is never yielded.
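A minimal Python 3 sketch of that behaviour (the names `yield_section` and `trace` are made up for illustration): calling a generator function only builds a generator object, and none of its body runs until something iterates over it, which the bare yieldSection() calls never do.

```python
trace = []

def yield_section(section):
    # Record that the body started; this only happens once iteration begins.
    trace.append('body ran')
    if section:
        yield section

gen = yield_section(['Name:', 'x'])  # just creates a generator object
print(trace)       # []  (the body has not run yet)
print(list(gen))   # [['Name:', 'x']]
print(trace)       # ['body ran']
```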

Is there a way to do this or do I have to live with the duplication?

Thanks,
Kent
Here is a complete program:

data = '''
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
.....................
xxxxxxxxxxxxxxxxxxxx
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx

'''

import cStringIO # just for test

def makeSections(f):
    ''' This is a generator function. It will return successive sections
    of f until EOF.

    Sections are every line from a 'Name:' line to the first blank line.
    Sections are returned as a list of lines with line endings stripped.
    '''

    currSection = []

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            if currSection:
                yield currSection
                currSection = []
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            if currSection:
                yield currSection
                currSection = []

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    if currSection:
        yield currSection

f = cStringIO.StringIO(data)

for section in makeSections(f):
    print 'Section'
    for line in section:
        print ' ', line
    print
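For the class-implementation route, one option is to make the helpers ordinary methods that return the finished section rather than yield it, so that only one generator method contains `yield`. A Python 3 sketch; the class `SectionSplitter` and its methods `take()`, `feed()` and `split()` are hypothetical names, and the final `take()` after the loop still needs its own `if`/`yield`.

```python
class SectionSplitter:
    """Hypothetical class-based splitter: helpers return sections, only split() yields."""

    def __init__(self):
        self.curr_section = []

    def take(self):
        # Hand back the current section and start a new, empty one.
        section, self.curr_section = self.curr_section, []
        return section

    def feed(self, line):
        # Process one stripped line; return a finished section, or [] if none.
        if line == 'Name:':
            finished = self.take()          # start of a new section
            self.curr_section.append(line)
            return finished
        if not line:
            return self.take()              # blank line ends a section
        self.curr_section.append(line)      # accumulate into a section
        return []

    def split(self, lines):
        for line in lines:
            finished = self.feed(line.strip())
            if finished:
                yield finished
        finished = self.take()              # the last section
        if finished:
            yield finished


print(list(SectionSplitter().split(['Name:', 'City:', 'x', 'Name:', 'y', ''])))
# [['Name:', 'City:', 'x'], ['Name:', 'y']]
```

Because take() rebinds the buffer instead of clearing it, each yielded list stays intact even if the caller keeps it.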
Jul 18 '05 #1
Kent Johnson wrote:
> There is some obvious code duplication in the function - this bit is
> repeated 2.67 times ;-):
>     if currSection:
>         yield currSection
>         currSection = []


You can write:

for section in yieldSection():
    yield section

in both places, but I assume you still don't like the code duplication
this would create.
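In Python 3.3 and later, PEP 380's `yield from` does the same delegation in one line. A Python 3 sketch of the nested-helper approach written that way (names are illustrative). One change from the nested version in the first post: the helper rebinds the list via `nonlocal` instead of clearing it with `del currSection[:]`, because clearing in place would also empty a list the caller has already received and kept, for example when collecting the results with `list()`.

```python
def make_sections(lines):
    curr_section = []

    def flush():
        # Generator helper: yield the finished section (if any), then
        # rebind curr_section to a fresh list instead of clearing it.
        nonlocal curr_section
        if curr_section:
            yield curr_section
        curr_section = []

    for line in lines:
        line = line.strip()
        if line == 'Name:':
            yield from flush()          # start of a new section
            curr_section.append(line)
        elif not line:
            yield from flush()          # blank line ends a section
        else:
            curr_section.append(line)   # accumulate into a section

    yield from flush()                  # yield the last section


sample = ['Name:', 'City:', 'x', '', 'Name:', 'y']
print(list(make_sections(sample)))
# [['Name:', 'City:', 'x'], ['Name:', 'y']]
```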

How about something like (completely untested):

if line == 'Name:' or not line:
    if currSection:
        yield currSection
        currSection = []
    if line == 'Name:':
        currSection.append(line)
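Filled out into a complete function, that suggestion might look like the Python 3 sketch below. One addition that is not in the post: chaining a trailing blank line onto the input with `itertools.chain` makes the same branch flush the last section too, so the flushing code is written only once.

```python
import itertools

def make_sections(lines):
    curr_section = []
    # Assumption (not in the post): a trailing '' acts as a final blank line,
    # so the last section is flushed by the same branch as every other one.
    for line in itertools.chain(lines, ['']):
        line = line.strip()
        if line == 'Name:' or not line:
            # A 'Name:' line or a blank line ends the current section.
            if curr_section:
                yield curr_section
                curr_section = []
        if line:
            # 'Name:' starts the next section; other non-blank lines accumulate.
            curr_section.append(line)


sample = ['Name:', 'City:', 'x', 'Name:', 'y', '', 'Name:', 'z']
print(list(make_sections(sample)))
# [['Name:', 'City:', 'x'], ['Name:', 'y'], ['Name:', 'z']]
```

Like the original, it accepts any iterable of lines, including an open file.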

Another consideration: in Python 2.4, itertools has a groupby function
that you could probably get some benefit from:
>>> import itertools
>>> class Sections(object):
...     def __init__(self):
...         self.is_section = False
...     def __call__(self, line):
...         if line == 'Name:\n':
...             self.is_section = True
...         elif line == '\n':
...             self.is_section = False
...         return self.is_section
...
>>> def make_sections(f):
...     for _, section in itertools.groupby(f, Sections()):
...         result = ''.join(section)
...         if result != '\n':
...             yield result
...
>>> f = 'Name:\nA\nx\ny\nz\n\nName:\nB\na\nb\nc\n'.splitlines(True)
>>> list(make_sections(f))
['Name:\nA\nx\ny\nz\n', 'Name:\nB\na\nb\nc\n']
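One caveat with the `Sections` key: in the sample data from the first post, the second 'Name:' line directly follows the previous section with no blank line in between, so both sections get the key `True` and `groupby` returns them as a single group. A Python 3 sketch of a variant, using a hypothetical `SectionKey` class whose key is a counter bumped at every 'Name:' line (blank lines get the key `None` and are dropped); unlike the session above, it yields lists of stripped lines, as makeSections does.

```python
import itertools

class SectionKey:
    """Hypothetical groupby key: a counter bumped at every 'Name:' line.

    Blank lines get the key None so they can be skipped.
    """
    def __init__(self):
        self.count = 0

    def __call__(self, line):
        if line == 'Name:':
            self.count += 1
        return self.count if line else None


def make_sections(lines):
    # Strip each line first, as makeSections does.
    stripped = (line.strip() for line in lines)
    for key, group in itertools.groupby(stripped, SectionKey()):
        if key is not None:       # drop runs of blank lines
            yield list(group)


data = '\nName:\nCity:\nx1\nx2\nName:\nCity:\nx3\n\n'
print(list(make_sections(data.splitlines(True))))
# [['Name:', 'City:', 'x1', 'x2'], ['Name:', 'City:', 'x3']]
```

This relies, like the `Sections` class, on `groupby` calling the key function once per line, in order.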
Jul 18 '05 #2
max
Kent Johnson <ke******@yahoo.com> wrote in
news:41**********@newspeer2.tds.net:
> Is there a way to do this or do I have to live with the
> duplication?

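This gets rid of some duplication by ignoring blank lines altogether,
which might be a bug: a blank line no longer ends a section, so any lines
between it and the next 'Name:' line are added to the previous section.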

def makeSections2(f):
    currSection = []
    for line in f:
        line = line.strip()
        if line:
            if line == 'Name:':
                if currSection:
                    yield currSection
                    currSection = []
            currSection.append(line)
    if currSection:
        yield currSection

but

def makeSections2(f):
    currSection = []
    for line in f:
        line = line.strip()

        if line:
            if line == 'Name:':
                if currSection:
                    yield currSection
                    currSection = []
            currSection.append(line)

        elif currSection:
            yield currSection
            currSection = []  # reset, or this section is yielded again later

    if currSection:
        yield currSection

should be equivalent.
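The equivalence claim is easy to check by running the versions side by side. A Python 3 sketch with made-up sample lines (the names `make_sections`, `make_sections2a` and `make_sections2b` are illustrative). The second variant includes a `curr_section = []` reset after the blank-line yield; without it, the same list would be yielded again at the next blank line or at end of input. The sample also shows the caveat with the first variant: a line after a blank line is folded into the previous section.

```python
def make_sections(lines):
    """The original makeSections, ported to Python 3."""
    curr_section = []
    for line in lines:
        line = line.strip()
        if line == 'Name:':
            if curr_section:
                yield curr_section
                curr_section = []
            curr_section.append(line)
        elif not line:
            if curr_section:
                yield curr_section
                curr_section = []
        else:
            curr_section.append(line)
    if curr_section:
        yield curr_section


def make_sections2a(lines):
    """First variant: blank lines are ignored altogether."""
    curr_section = []
    for line in lines:
        line = line.strip()
        if line:
            if line == 'Name:' and curr_section:
                yield curr_section
                curr_section = []
            curr_section.append(line)
    if curr_section:
        yield curr_section


def make_sections2b(lines):
    """Second variant: a blank line still ends the current section."""
    curr_section = []
    for line in lines:
        line = line.strip()
        if line:
            if line == 'Name:' and curr_section:
                yield curr_section
                curr_section = []
            curr_section.append(line)
        elif curr_section:
            yield curr_section
            curr_section = []
    if curr_section:
        yield curr_section


sample = ['Name:', 'City:', 'x', 'Name:', 'City:', 'y', '', 'stray', '']
print(list(make_sections(sample)) == list(make_sections2b(sample)))  # True
print(list(make_sections2a(sample)))
# [['Name:', 'City:', 'x'], ['Name:', 'City:', 'y', 'stray']]
```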
Jul 18 '05 #3
