Refactoring a generator function

Here is a simple function that scans through an input file and groups the lines of the file into
sections. Sections start with 'Name:' and end with a blank line. The function yields sections as
they are found.

def makeSections(f):
    currSection = []

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            if currSection:
                yield currSection
                currSection = []
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            if currSection:
                yield currSection
                currSection = []

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    if currSection:
        yield currSection

There is some obvious code duplication in the function; this bit is repeated 2.67 times ;-):
    if currSection:
        yield currSection
        currSection = []

As a firm believer in Once and Only Once, I would like to factor this out into a separate function,
either a nested function of makeSections(), or as a separate method of a class implementation.
Something like this:

def makeSections(f): ### DOESN'T WORK ###
    currSection = []

    def yieldSection():
        if currSection:
            yield currSection
            del currSection[:]

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            yieldSection()
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            yieldSection()

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    yieldSection()

The problem is that yieldSection() is now the generator and makeSections() is not: makeSections() no longer contains a yield statement, and calling yieldSection() merely creates a new iterator, not the section...
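A minimal illustration of why the refactored version produces nothing: calling a generator function only builds a generator object and runs none of its body, so a bare `yieldSection()` statement creates that object and immediately discards it. The `section` parameter and the `calls` log list below are made up for the demonstration.

```python
calls = []

def yieldSection(section):
    # Calling a generator function only builds a generator object;
    # none of this body runs until the object is iterated.
    calls.append('body started')
    if section:
        yield section

gen = yieldSection(['Name:', 'A'])  # calls is still [] at this point
first = next(gen)                   # runs the body up to the first yield
print(calls)  # ['body started']
print(first)  # ['Name:', 'A']
```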

Is there a way to do this or do I have to live with the duplication?

Thanks,
Kent
Here is a complete program:

data = '''
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
.....................
xxxxxxxxxxxxxxxxxxxx
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx

'''

import cStringIO # just for test

def makeSections(f):
    ''' This is a generator function. It will return successive sections
    of f until EOF.

    Sections are every line from a 'Name:' line to the first blank line.
    Sections are returned as a list of lines with line endings stripped.
    '''

    currSection = []

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            if currSection:
                yield currSection
                currSection = []
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            if currSection:
                yield currSection
                currSection = []

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    if currSection:
        yield currSection

f = cStringIO.StringIO(data)

for section in makeSections(f):
    print 'Section'
    for line in section:
        print ' ', line
    print
Jul 18 '05 #1
Kent Johnson wrote:
> Here is a simple function that scans through an input file and groups
> the lines of the file into sections. Sections start with 'Name:' and end
> with a blank line. The function yields sections as they are found.

> There is some obvious code duplication in the function - this bit is
> repeated 2.67 times ;-):
>     if currSection:
>         yield currSection
>         currSection = []


You can write:

    for section in yieldSection():
        yield section

in both places, but I assume you still don't like the code duplication
this would create.
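In Python 3.3 and later (which postdates this thread), `yield from` is built-in shorthand for exactly this `for ... yield` loop, and the `nonlocal` statement lets the nested helper rebind `currSection`. A minimal sketch under those assumptions, reusing Kent's names. It resets with `currSection = []` rather than `del currSection[:]`, because clearing the list in place would also empty the section the caller already holds (for example when the results are collected with `list(...)`).

```python
def makeSections(f):
    """Group stripped lines into sections (Python 3.3+ sketch)."""
    currSection = []

    def yieldSection():
        # Helper generator: emit the pending section, if any, then start a new one.
        nonlocal currSection
        if currSection:
            yield currSection
            # Rebind instead of clearing in place, so the list handed
            # to the caller is left intact.
            currSection = []

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            yield from yieldSection()
            currSection.append(line)
        elif not line:
            # Blank line ends a section
            yield from yieldSection()
        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    yield from yieldSection()
```

The section-closing logic now lives in one place; each call site is a single `yield from` line.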

How about something like (completely untested):

    if line == 'Name:' or not line:
        if currSection:
            yield currSection
            currSection = []
        if line == 'Name:':
            currSection.append(line)
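Filled out into a complete function, the suggestion looks roughly like this. It is a sketch: the `else` branch and the final check are carried over from Kent's original. The section-closing code now appears once inside the loop, plus the final check after it.

```python
def makeSections(f):
    """Sketch: merge the 'Name:' and blank-line cases into one branch."""
    currSection = []
    for line in f:
        line = line.strip()
        if line == 'Name:' or not line:
            # Both a new 'Name:' line and a blank line close the current section.
            if currSection:
                yield currSection
                currSection = []
            if line == 'Name:':
                currSection.append(line)
        else:
            # Accumulate into a section
            currSection.append(line)
    # Yield the last section; this check is the remaining repetition.
    if currSection:
        yield currSection
```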

Another consideration: in Python 2.4, itertools has a groupby function
that you could probably get some benefit from:
>>> import itertools
>>> class Sections(object):
...     def __init__(self):
...         self.is_section = False
...     def __call__(self, line):
...         if line == 'Name:\n':
...             self.is_section = True
...         elif line == '\n':
...             self.is_section = False
...         return self.is_section
...
>>> def make_sections(f):
...     for _, section in itertools.groupby(f, Sections()):
...         result = ''.join(section)
...         if result != '\n':
...             yield result
...
>>> f = 'Name:\nA\nx\ny\nz\n\nName:\nB\na\nb\nc\n'.splitlines(True)
>>> list(make_sections(f))
['Name:\nA\nx\ny\nz\n', 'Name:\nB\na\nb\nc\n']
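One caveat with the True/False key above: a 'Name:' line that is not preceded by a blank line leaves `is_section` unchanged, so `groupby` merges the two sections into one. Kent's own sample data has exactly that layout, since its second 'Name:' line follows a data line directly. A sketch of a variation, assuming lists of stripped lines are wanted (as `makeSections` returns): the key function (`SectionKey` is a hypothetical name) numbers each section and returns `None` for blank lines, which are then dropped.

```python
import itertools

class SectionKey(object):
    """Hypothetical key function for itertools.groupby.

    Expects stripped lines. Every 'Name:' line starts a new section number;
    blank lines get the key None so they can be discarded.
    """
    def __init__(self):
        self.count = 0

    def __call__(self, line):
        if line == 'Name:':
            self.count += 1
        elif not line:
            return None
        return self.count

def make_sections(f):
    stripped = (line.strip() for line in f)
    for key, group in itertools.groupby(stripped, SectionKey()):
        if key is not None:
            # Groups share groupby's underlying iterator, so materialize
            # each one before moving on to the next.
            yield list(group)

sample = 'Name:\nA\nx\nName:\nB\ny\n\n'.splitlines(True)
print(list(make_sections(sample)))
# [['Name:', 'A', 'x'], ['Name:', 'B', 'y']]
```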
Jul 18 '05 #2
max
Kent Johnson <ke******@yahoo.com> wrote in
news:41**********@newspeer2.tds.net:
> Here is a simple function that scans through an input file and
> groups the lines of the file into sections. Sections start with
> 'Name:' and end with a blank line. The function yields sections
> as they are found.

> There is some obvious code duplication in the function - this bit
> is repeated 2.67 times ;-):
>     if currSection:
>         yield currSection
>         currSection = []
>
> As a firm believer in Once and Only Once, I would like to factor
> this out into a separate function, either a nested function of
> makeSections(), or as a separate method of a class
> implementation. Something like this:
>
> The problem is that yieldSection() now is the generator, and
> makeSections() is not, and the result of calling yieldSection()
> is a new iterator, not the section...
>
> Is there a way to do this or do I have to live with the
> duplication?
>
> Thanks,
> Kent


This gets rid of some duplication by ignoring blank lines altogether,
which might be a bug...

def makeSections2(f):
    currSection = []
    for line in f:
        line = line.strip()
        if line:
            if line == 'Name:':
                if currSection:
                    yield currSection
                    currSection = []
            currSection.append(line)
    if currSection:
        yield currSection

but

def makeSections2(f):
    currSection = []
    for line in f:
        line = line.strip()

        if line:
            if line == 'Name:':
                if currSection:
                    yield currSection
                    currSection = []
            currSection.append(line)

        elif currSection:
            # Blank line ends a section; reset so it is not yielded twice
            yield currSection
            currSection = []

    if currSection:
        yield currSection

should be equivalent to the original makeSections().
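To see concretely how the two versions differ on blank lines, here is a sketch that runs both on an input where a line follows a blank line outside any 'Name:' header. The functions are renamed (hypothetical names) so both can coexist, with the `cs` typo and the missing reset after the blank-line `yield` fixed.

```python
def sections_ignoring_blanks(f):
    """First makeSections2: blank lines are skipped entirely."""
    currSection = []
    for line in f:
        line = line.strip()
        if line:
            if line == 'Name:':
                if currSection:
                    yield currSection
                    currSection = []
            currSection.append(line)
    if currSection:
        yield currSection

def sections_closing_on_blank(f):
    """Second makeSections2: a blank line ends the current section."""
    currSection = []
    for line in f:
        line = line.strip()
        if line:
            if line == 'Name:':
                if currSection:
                    yield currSection
                    currSection = []
            currSection.append(line)
        elif currSection:
            yield currSection
            currSection = []
    if currSection:
        yield currSection

# Hypothetical input: a stray line after a blank line, with no 'Name:' header.
sample = ['Name:', 'a', '', 'stray', 'Name:', 'b']
print(list(sections_ignoring_blanks(sample)))
# [['Name:', 'a', 'stray'], ['Name:', 'b']]
print(list(sections_closing_on_blank(sample)))
# [['Name:', 'a'], ['stray'], ['Name:', 'b']]
```

Only the second result matches what Kent's original makeSections() produces for this input.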
Jul 18 '05 #3
