Refactoring a generator function

Kent Johnson

Here is a simple function that scans through an input file and groups the lines of the file into
sections. Sections start with 'Name:' and end with a blank line. The function yields sections as
they are found.

def makeSections(f):
    currSection = []

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            if currSection:
                yield currSection
                currSection = []
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            if currSection:
                yield currSection
                currSection = []

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    if currSection:
        yield currSection

There is some obvious code duplication in the function - this bit is repeated 2.67 times ;-):
    if currSection:
        yield currSection
        currSection = []

As a firm believer in Once and Only Once, I would like to factor this out into a separate function,
either a nested function of makeSections(), or as a separate method of a class implementation.
Something like this:

def makeSections(f): ### DOESN'T WORK ###
    currSection = []

    def yieldSection():
        if currSection:
            yield currSection
            del currSection[:]

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            yieldSection()
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            yieldSection()

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    yieldSection()

The problem is that yieldSection() now is the generator, and makeSections() is not, and the result
of calling yieldSection() is a new iterator, not the section...
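
The behaviour described above can be reproduced in a few lines. This is only an illustrative sketch: the names pending, caller and log are made up for the demonstration and do not appear in the original code.

```python
import inspect

log = []

def pending():
    # Contains a yield, so calling it only builds a generator object.
    log.append('body ran')
    yield 'section'

def caller():
    # No yield here, so this is an ordinary function. The generator
    # object created by pending() is discarded without ever running.
    pending()
    return 'done'

result = caller()
log_after_call = list(log)      # snapshot: the body has not run yet

values = list(pending())        # iterating is what actually runs the body

is_gen_pending = inspect.isgeneratorfunction(pending)
is_gen_caller = inspect.isgeneratorfunction(caller)
```

Here caller() returns normally and nothing is ever yielded to its own caller, which is the situation makeSections() ends up in once the yield moves into the nested function.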

Is there a way to do this or do I have to live with the duplication?

Thanks,
Kent

Here is a complete program:

data = '''
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
.....................
xxxxxxxxxxxxxxxxxxxx
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx

'''

import cStringIO # just for test

def makeSections(f):
    ''' This is a generator function. It will return successive sections
    of f until EOF.

    Sections are every line from a 'Name:' line to the first blank line.
    Sections are returned as a list of lines with line endings stripped.
    '''

    currSection = []

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            if currSection:
                yield currSection
                currSection = []
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            if currSection:
                yield currSection
                currSection = []

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    if currSection:
        yield currSection

f = cStringIO.StringIO(data)

for section in makeSections(f):
    print 'Section'
    for line in section:
        print ' ', line
    print

Jul 18 '05 #1

Steven Bethard

Kent Johnson wrote:

There is some obvious code duplication in the function - this bit is
repeated 2.67 times ;-):
    if currSection:
        yield currSection
        currSection = []

You can write:

    for section in yieldSection():
        yield section

in both places, but I assume you still don't like the code duplication
this would create.
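
In Python 3.3 and later, that delegating loop can be spelled `yield from`. The following is a minimal sketch under that assumption, not code from this thread: the names make_sections, flush and current are illustrative. Note that flush rebinds the list through `nonlocal` instead of clearing it in place with `del currSection[:]`, because clearing in place would also empty any section the caller has already stored.

```python
def make_sections(lines):
    """Group stripped lines into sections, delegating to a nested generator."""
    current = []

    def flush():
        # Nested generator: yields the pending section, if any, then
        # starts a fresh list so the yielded one is left untouched.
        nonlocal current
        if current:
            yield current
            current = []

    for line in lines:
        line = line.strip()
        if line == 'Name:':
            yield from flush()   # a new section starts: emit the previous one
            current.append(line)
        elif not line:
            yield from flush()   # a blank line ends the section
        else:
            current.append(line)

    yield from flush()           # emit the last section


sections = list(make_sections(['Name:', 'a', '', 'Name:', 'b', 'Name:', 'c']))
```

The check-yield-reset logic now lives in one place, at the cost of three `yield from flush()` call sites.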

How about something like (completely untested):

    if line == 'Name:' or not line:
        if currSection:
            yield currSection
            currSection = []
        if line == 'Name:':
            currSection.append(line)
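
Filled out into a whole function, the merged condition might look like the sketch below. Two things here are assumptions rather than part of the suggestion above: the sentinel blank line appended with itertools.chain, which lets the loop flush the final section so that no trailing check is needed, and the names make_sections, current and boundary.

```python
import itertools

def make_sections(lines):
    """Yield sections using a single check-yield-reset block."""
    current = []
    # One extra blank line acts as a sentinel: it ends the last section
    # inside the loop, so no separate flush is needed afterwards.
    for line in itertools.chain(lines, ['']):
        line = line.strip()
        boundary = (line == 'Name:' or not line)
        if boundary and current:
            yield current
            current = []
        if line:
            # 'Name:' lines and ordinary lines both belong to a section.
            current.append(line)


sections = list(make_sections(['Name:', 'a', 'Name:', 'b', '', 'c']))
```

With the sentinel in place, the yield-and-reset code appears exactly once.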

Another consideration: in Python 2.4, itertools has a groupby function
that you could probably get some benefit from:

>>> import itertools
>>> class Sections(object):
...     def __init__(self):
...         self.is_section = False
...     def __call__(self, line):
...         if line == 'Name:\n':
...             self.is_section = True
...         elif line == '\n':
...             self.is_section = False
...         return self.is_section
...
>>> def make_sections(f):
...     for _, section in itertools.groupby(f, Sections()):
...         result = ''.join(section)
...         if result != '\n':
...             yield result
...
>>> f = 'Name:\nA\nx\ny\nz\n\nName:\nB\na\nb\nc\n'.splitlines(True)
>>> list(make_sections(f))
['Name:\nA\nx\ny\nz\n', 'Name:\nB\na\nb\nc\n']
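
One caveat with a True/False key: two sections that follow each other without a blank line get the same key, so groupby merges them into one group. A possible variation, sketched here as an assumption and not taken from the thread, is to return a running section number instead. The names SectionKey and make_sections_numbered are hypothetical, and unlike the original function this version drops any lines that fall outside a 'Name:' section.

```python
import itertools

class SectionKey(object):
    """Stateful key: a new number per 'Name:' line, None outside sections."""
    def __init__(self):
        self.index = 0
        self.inside = False

    def __call__(self, line):
        stripped = line.strip()
        if stripped == 'Name:':
            self.index += 1        # each 'Name:' line starts a new group
            self.inside = True
        elif not stripped:
            self.inside = False    # a blank line closes the section
        return self.index if self.inside else None

def make_sections_numbered(lines):
    for key, group in itertools.groupby(lines, SectionKey()):
        if key is not None:        # skip blank lines and stray lines
            yield ''.join(group)


lines = 'Name:\nA\nName:\nB\n\nstray\nName:\nC\n'.splitlines(True)
result = list(make_sections_numbered(lines))
```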

Jul 18 '05 #2

max

Kent Johnson <ke******@yahoo.com> wrote in
news:41**********@newspeer2.tds.net:


Is there a way to do this or do I have to live with the
duplication?

Thanks,
Kent

This gets rid of some duplication by ignoring blanklines altogether,
which might be a bug...

def makeSections2(f):
    currSection = []
    for line in f:
        line = line.strip()
        if line:
            if line == 'Name:':
                if currSection:
                    yield currSection
                    currSection = []
            currSection.append(line)
    if currSection:
        yield currSection

but

def makeSections2(f):
    currSection = []
    for line in f:
        line = line.strip()

        if line:
            if line == 'Name:':
                if currSection:
                    yield currSection
                    currSection = []
            currSection.append(line)

        elif currSection:
            yield currSection
            # Reset so the same section is not extended or yielded again
            currSection = []

    if currSection:
        yield currSection

should be equivalent.
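
A quick way to test the equivalence claim is to run both versions over the same input and compare the results. The sketch below restates the original function and the compact one under the hypothetical names original_sections and compact_sections. It assumes the compact version resets the list after yielding on a blank line, which appears to be needed for the two to agree.

```python
def original_sections(lines):
    current = []
    for line in lines:
        line = line.strip()
        if line == 'Name:':
            if current:
                yield current
                current = []
            current.append(line)
        elif not line:
            if current:
                yield current
                current = []
        else:
            current.append(line)
    if current:
        yield current

def compact_sections(lines):
    current = []
    for line in lines:
        line = line.strip()
        if line:
            if line == 'Name:':
                if current:
                    yield current
                    current = []
            current.append(line)
        elif current:
            yield current
            current = []   # assumed reset; without it sections repeat
    if current:
        yield current


sample = ['', 'Name:', 'City:', 'x', 'Name:', 'City:', 'y',
          '', '', 'stray', 'Name:', 'z']
expected = list(original_sections(sample))
actual = list(compact_sections(sample))
```

For this sample both functions produce the same four sections. A single sample is not a proof, so inputs with other blank-line patterns are worth trying as well.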

Jul 18 '05 #3
