iterblocks cookbook example

George Sakkis produced the following cookbook recipe,
which addresses a common problem that comes up on this
mailing list:

http://aspn.activestate.com/ASPN/Coo.../Recipe/521877
I would propose adding something like this to the
cookbook example above.

def iterblocks2(lst, start_delim):
    # This variation on iterblocks shows a more typical
    # implementation that behaves like iterblocks for
    # the Hello World example. The problem with this naive
    # implementation is that you cannot pass arbitrary
    # iterators.
    blocks = []
    new_block = []
    for item in lst:
        if start_delim(item):
            if new_block: blocks.append(new_block)
            new_block = []
        else:
            new_block.append(item)
    if new_block: blocks.append(new_block)
    return blocks

Comments welcome. This has been tested on George's
slow-version-of-string-split example. It treats the
delimiter as not being part of the block, and it punts
on the issue of what to do when you have empty blocks
(i.e. consecutive delimiters).
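
One way to lift the restriction mentioned in the code comment is to turn the function into a generator. What follows is only a minimal sketch, not the cookbook recipe itself: the name iterblocks_lazy is made up for illustration, and it keeps the same choices as iterblocks2 (the delimiter is dropped, empty blocks are skipped). It accepts any iterable, including one-shot iterators, and hands back each block as soon as the next delimiter is seen.

```python
def iterblocks_lazy(iterable, start_delim):
    # Hypothetical generator variant of iterblocks2 (name is an assumption).
    # Works on any iterable and yields one block at a time.
    block = []
    for item in iterable:
        if start_delim(item):
            # A delimiter closes the current block; empty blocks are skipped.
            if block:
                yield block
            block = []
        else:
            block.append(item)
    # Emit whatever is left after the last delimiter.
    if block:
        yield block


if __name__ == "__main__":
    chars = iter("a,b,,cd")  # a one-shot iterator, not a list
    print(list(iterblocks_lazy(chars, lambda c: c == ",")))
```

Each block is still built as a list, so only the sequence of blocks is lazy, not the contents of a single block.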



Jun 2 '07 #1
3 Replies


On Jun 2, 10:19 am, Steve Howell <showel...@yahoo.com> wrote:
> George Sakkis produced the following cookbook recipe,
> which addresses a common problem that comes up on this
> mailing list:

ISTM, this is a common mailing list problem because it is fun
to solve, not because people actually need it on a day-to-day basis.

In that spirit, it would be fun to compare several different
approaches to the same problem using re.finditer, itertools.groupby,
or the tokenize module. To get the ball rolling, here is one variant:

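from itertools import groupby

def blocks(s, start, end):
    # Key function: the start delimiter maps to 2, the end delimiter to 3,
    # and any other character gets the current ingroup flag
    # (True == 1 inside a block, False == 0 outside).
    def classify(c, ingroup=[0], delim={start:2, end:3}):
        result = delim.get(c, ingroup[0])
        # We are inside a block right after a start delimiter (2)
        # or while still reading block characters (1).
        ingroup[0] = result in (1, 2)
        return result
    return [tuple(g) for k, g in groupby(s, classify) if k == 1]

print(blocks('the <quick> brown <fox> jumped', start='<', end='>'))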

One observation is that groupby() is an enormously flexible tool.
Given a well crafted key= function, it makes short work of almost
any data partitioning problem.
Raymond
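
For comparison, here is roughly what the re.finditer approach mentioned above could look like. This is a sketch under assumptions, not code from the thread: the name blocks_re is invented, and it returns strings where the groupby variant returns tuples of characters. It also differs at the edges: an empty pair such as '<>' gives an empty string here, and a start delimiter with no matching end is ignored.

```python
import re


def blocks_re(s, start, end):
    # Hypothetical re.finditer counterpart to blocks() (name is an assumption).
    # re.escape lets the delimiters be arbitrary literal strings.
    # The non-greedy group stops at the first end delimiter after each start.
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    return [m.group(1) for m in pattern.finditer(s)]


if __name__ == "__main__":
    print(blocks_re("the <quick> brown <fox> jumped", start="<", end=">"))
```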

Jun 2 '07 #2

On Jun 2, 10:47 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
> One observation is that groupby() is an enormously flexible tool.
> Given a well crafted key= function, it makes short work of almost
> any data partitioning problem.

Can anyone suggest a function that will split text by paragraphs, but
NOT if the paragraphs are contained within a start-tag ... end-tag
construct. In other words, the following text should yield 3 blocks
not 6:

TEXT = '''
Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
Pellentesque dolor quam, dignissim ornare, porta et,
auctor eu, leo. Phasellus malesuada metus id magna.

Only when flight shall soar
not for its own sake only
up into heaven's lonely
silence, and be no more

merely the lightly profiling,
proudly successful tool,
playmate of winds, beguiling
time there, careless and cool:

only when some pure Whither
outweighs boyish insistence
on the achieved machine

will who has journeyed thither
be, in that fading distance,
all that his flight has been.
Integer urna nulla, tempus sit amet, ultrices interdum,
rhoncus eget, ipsum. Cum sociis natoque penatibus et
magnis dis parturient montes, nascetur ridiculus mus.
'''

Other info:

* don't worry about nesting
* the start and end tags mustn't be stripped.

Gerard

Jun 4 '07 #3

On Jun 4, 1:52 pm, Gerard Flanagan <grflana...@yahoo.co.uk> wrote:
> Can anyone suggest a function that will split text by paragraphs, but
> NOT if the paragraphs are contained within a
> ...
> construct.

(Sorry if I ruined the parent thread.) FWIW, I didn't get a groupby
solution but with some help from the Python Cookbook (O'Reilly), I
came up with the following:

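import re

# Raw strings keep the backslashes intact for the regex engine.
# Note that '|' inside a character class is a literal pipe, not alternation.
RE_START_BLOCK = re.compile(r'^\[[\w|\s]*\]$')
RE_END_BLOCK = re.compile(r'^\[/[\w|\s]*\]$')

def iter_blocks(lines):
    block = []
    inblock = False
    for line in lines:
        if line.isspace():
            # Blank line: kept inside a tagged block, a separator otherwise.
            if inblock:
                block.append(line)
            elif block:
                yield block
                block = []
        else:
            if RE_START_BLOCK.match(line):
                inblock = True
            elif RE_END_BLOCK.match(line):
                inblock = False
            block.append(line.lstrip())
    if block:
        yield block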
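
For what it's worth, a groupby version of the same idea seems possible too. The following is a minimal sketch and not something posted in the thread: the function name iter_blocks_groupby and the [poem] tag in the sample are assumptions made for illustration, and the tag syntax is taken from the regular expressions above. The key function marks a line as content when it is non-blank or when it sits between a start tag and an end tag, and groupby then collects each run of content lines into one block.

```python
import re
from itertools import groupby

# Assumed tag syntax, mirroring the patterns above: [name] and [/name].
START = re.compile(r"^\[[\w\s]*\]$")
END = re.compile(r"^\[/[\w\s]*\]$")


def iter_blocks_groupby(lines):
    # Hypothetical groupby variant (name is an assumption).
    inside = False

    def is_content(line):
        # groupby calls this once per line, in order, so it can carry state.
        nonlocal inside
        text = line.strip()
        if START.match(text):
            inside = True
            return True
        if END.match(text):
            inside = False
            return True
        # Blank lines count as content only inside a tagged block.
        return bool(text) or inside

    for keep, group in groupby(lines, is_content):
        if keep:
            yield list(group)


if __name__ == "__main__":
    sample = "para one\nmore\n\n[poem]\nline a\n\nline b\n[/poem]\n\npara three\n"
    for block in iter_blocks_groupby(sample.splitlines()):
        print(block)
```

Unlike iter_blocks, this sketch leaves leading whitespace on each line alone and does not handle nesting.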

Jun 4 '07 #4
