getting n items at a time from a generator

I am relatively new to the Python language and I am afraid I am missing
some clever construct or built-in way equivalent to my 'chunk'
generator below.

def chunk(size, items):
    """generate N items from a generator."""
    chunk = []
    count = 0
    while True:
        try:
            item = items.next()
            count += 1
        except StopIteration:
            yield chunk
            break
        chunk.append(item)
        if not (count % size):
            yield chunk
            chunk = []
            count = 0

>>> t = (i for i in range(30))
>>> c = chunk(7, t)
>>> for i in c:
...     print i
...
[0, 1, 2, 3, 4, 5, 6]
[7, 8, 9, 10, 11, 12, 13]
[14, 15, 16, 17, 18, 19, 20]
[21, 22, 23, 24, 25, 26, 27]
[28, 29]

In my real world project, I have over 250 million items that are too
big to fit in memory and that are processed and later used to update
records in a database. To minimize disk IO, I found it was more
efficient to process them in batches or "chunks" of 50,000 or so. Hence

Is this the proper way to do this?

Dec 27 '07 #1
On Dec 27, 11:34 am, Kugutsumen <kugutsu...@gmail.com> wrote:
> I am relatively new to the Python language and I am afraid I am missing
> some clever construct or built-in way equivalent to my 'chunk'
> generator below.

The itertools module is always a good place to look when you've got a
complicated generator.

import itertools
import operator

def chunk(N, items):
    "Group items in chunks of N"
    def clump((n, _)):
        return n // N
    for _, group in itertools.groupby(enumerate(items), clump):
        yield itertools.imap(operator.itemgetter(1), group)

for ch in chunk(7, range(30)):
    print list(ch)

I've changed chunk to return a generator rather than building a list
which is probably only going to be iterated over. But if you prefer
the list version, replace 'itertools.imap' with 'map'.

--
Paul Hankin
Dec 27 '07 #2
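
A note for readers on Python 3: the version above depends on two Python 2 features that no longer exist, tuple parameters in a function signature ('def clump((n, _))') and 'itertools.imap'. A minimal sketch of the same groupby idea under that assumption follows. The name 'chunk_groupby' is invented for this example, and it builds lists rather than lazy iterators, so each chunk is fully consumed before the next one starts.

```python
import itertools

def chunk_groupby(n, items):
    """Yield lists of up to n consecutive items (hypothetical Python 3 port)."""
    # enumerate() pairs each item with its index; index // n is constant
    # within one chunk, so groupby() starts a new group every n items.
    keyed = itertools.groupby(enumerate(items), key=lambda pair: pair[0] // n)
    for _, group in keyed:
        # Drop the index again and materialise the chunk as a list.
        yield [item for _, item in group]

for ch in chunk_groupby(7, range(30)):
    print(ch)
```

The last chunk printed is the short one, [28, 29], and an empty input produces no chunks at all.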
On Dec 27, 7:07 pm, Paul Hankin <paul.han...@gmail.com> wrote:
> I've changed chunk to return a generator rather than building a list
> which is probably only going to be iterated over. But if you prefer
> the list version, replace 'itertools.imap' with 'map'.

Thanks, I am going to take a look at itertools.
I prefer the list version since I need to buffer that chunk in memory
at this point.

Dec 27 '07 #3
>>>>> "Kugutsumen" == Kugutsumen <ku********@gmail.com> writes:
Kugutsumen> Thanks, I am going to take a look at itertools. I prefer the
Kugutsumen> list version since I need to buffer that chunk in memory at
Kugutsumen> this point.

Also consider this solution from O'Reilly's Python Cookbook (2nd Ed.) p705

from itertools import izip

def chop(iterable, length=2):
    return izip(*(iter(iterable),) * length)

Terry
Dec 27 '07 #4
On Dec 27, 7:24 pm, Terry Jones <te...@jon.es> wrote:

> Also consider this solution from O'Reilly's Python Cookbook (2nd Ed.) p705
>
>     def chop(iterable, length=2):
>         return izip(*(iter(iterable),) * length)

Thanks Terry,

However, chop ignores the remainder of the data in the example.

>>> t = (i for i in range(30))
>>> c = chop(t, 7)
>>> for ch in c:
...     print ch
...
(0, 1, 2, 3, 4, 5, 6)
(7, 8, 9, 10, 11, 12, 13)
(14, 15, 16, 17, 18, 19, 20)
(21, 22, 23, 24, 25, 26, 27)

k
Dec 27 '07 #5
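
For the stated requirements (lists, and no lost remainder), one common approach is to pull items with 'itertools.islice'. This is a sketch assuming Python 3, and 'chunk_lists' is a hypothetical name, not something from the thread or the standard library. Unlike the original 'chunk', it does not yield a trailing empty list when the number of items is an exact multiple of the chunk size.

```python
from itertools import islice

def chunk_lists(size, items):
    """Yield lists of at most `size` items; the last list may be shorter."""
    iterator = iter(items)  # accept any iterable, not only generators
    while True:
        batch = list(islice(iterator, size))  # pull up to `size` items
        if not batch:
            # Source exhausted: stop without emitting an empty chunk.
            return
        yield batch

for batch in chunk_lists(7, (i for i in range(30))):
    print(batch)
```

Only one batch is held in memory at a time, which suits the case of a very large input processed in batches of 50,000 or so.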
Kugutsumen <ku********@gmail.com> wrote:
>
> I am relatively new to the Python language and I am afraid I am missing
> some clever construct or built-in way equivalent to my 'chunk'
> generator below.

I have to say that I have found this to be a surprisingly common need as
well. Would this be an appropriate construct to add to itertools?
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Dec 29 '07 #6
>     def chop(iterable, length=2):
>         return izip(*(iter(iterable),) * length)
>
> Is this *always* guaranteed by the language to work?

Yes!

Users requested this guarantee, and I agreed. The docs now explicitly
guarantee this behavior.
Raymond
Dec 29 '07 #7
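
To see why the idiom works, and why it drops a short tail, it helps to spell out the steps. The guarantee in question is the left-to-right evaluation order of the arguments. The sketch below assumes Python 3, where the built-in 'zip' is lazy and plays the role of 'izip'; the name 'chop3' is hypothetical.

```python
def chop3(iterable, length=2):
    """Python 3 spelling of the Cookbook's chop (assumed equivalent)."""
    it = iter(iterable)    # one iterator object...
    same = (it,) * length  # ...referenced `length` times, not copied
    # zip pulls from its arguments left to right, so each output tuple
    # takes `length` consecutive items from the single shared iterator.
    return zip(*same)

print(list(chop3(range(10), 3)))
# prints [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
# zip stops as soon as one argument is exhausted, so the incomplete
# final group (here the lone 9) is discarded.
```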
>> Also consider this solution from O'Reilly's Python Cookbook (2nd Ed.) p705
>>
>>     def chop(iterable, length=2):
>>         return izip(*(iter(iterable),) * length)
>
> However, chop ignores the remainder of the data in the example.

There is a recipe in the itertools docs which handles the odd-length
data at the end:

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

Raymond
Dec 29 '07 #8
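
On Python 3 the same effect is usually obtained with 'itertools.zip_longest', which pads the final group directly. A minimal sketch under that assumption; 'grouper_padded' is a name made up for this example so it is not confused with the recipe quoted above.

```python
from itertools import zip_longest

def grouper_padded(n, iterable, padvalue=None):
    """grouper_padded(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"""
    args = [iter(iterable)] * n  # n references to one shared iterator
    # zip_longest keeps going until every argument is exhausted and
    # fills the missing positions of the last group with padvalue.
    return zip_longest(*args, fillvalue=padvalue)

print(list(grouper_padded(3, 'abcdefg', 'x')))
```

One caveat with any padded approach: if the pad value can also occur in the real data, the consumer cannot tell padding from content, so a unique sentinel object may be a safer choice than None.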
Tim Roberts <ti**@probo.com> writes:
> I have to say that I have found this to be a surprisingly common need as
> well. Would this be an appropriate construct to add to itertools?

I'm in favor.
Jan 11 '08 #9
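
For later readers: a function along these lines was eventually added to the standard library as 'itertools.batched' in Python 3.12. It yields tuples, and the last one may be shorter. The sketch below uses it when available and otherwise falls back to a small stand-in; the fallback is an approximation written for this note, not the library's own source.

```python
from itertools import islice

try:
    from itertools import batched  # available from Python 3.12
except ImportError:
    def batched(iterable, n):
        """Stand-in for older Pythons: yield tuples of up to n items."""
        if n < 1:
            raise ValueError("n must be at least one")
        iterator = iter(iterable)
        while True:
            batch = tuple(islice(iterator, n))
            if not batch:
                return
            yield batch

for batch in batched(range(30), 7):
    print(batch)
```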

Paul Rubin wrote:
> Tim Roberts <ti**@probo.com> writes:
>> I have to say that I have found this to be a surprisingly common need as
>> well. Would this be an appropriate construct to add to itertools?
>
> I'm in favor.

I am ecstatic about the idea of getting n items at a time from a
generator! This would eliminate the need for less elegant functions to do
this sort of thing, which I would do even more frequently if it were
easier.

Is it possible that this syntax for generator expressions could be adopted?

>>> sentence = 'this is a senTence WiTH'
>>> generator = (word.capitalize() for word in sentence.split())
>>> print generator.next(3,'PadValue')
('This','Is','A')
>>> print generator.next(3,'PadValue')
('Sentence','With','PadValue')
>>> generator.next(3,'PadValue')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

While on the topic of generators:

Something else I have longed for is assignment within a while loop. (I
realize this might be more controversial and might have been avoided on
purpose, but I wasn't around for that discussion.)

>>> sentence = 'this is a senTence WiTH'
>>> generator = (word.capitalize() for word in sentence.split())
>>> while a,b,c = generator.next(3,'PadValue'):
...     print a,b,c
...
This Is A
Sentence With PadValue
>>>

--
Shane Geiger
IT Director
National Council on Economic Education
sg*****@ncee.net | 402-438-8958 | http://www.ncee.net

Leading the Campaign for Economic and Financial Literacy

Jan 11 '08 #10
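
Neither proposal exists in that exact form, but both effects can be approximated. The sketch assumes Python 3.8 or later for the ':=' assignment expression. 'take_padded' is a hypothetical helper standing in for the proposed generator.next(3, 'PadValue'); it returns an empty tuple when the source is exhausted instead of raising StopIteration, so that it can drive a while loop.

```python
from itertools import islice

def take_padded(iterator, n, padvalue=None):
    """Return the next n items as a tuple, padded at the end; () when exhausted."""
    items = tuple(islice(iterator, n))
    if not items:
        return ()  # falsy, so a while loop can stop on it
    return items + (padvalue,) * (n - len(items))  # pad a short final group

sentence = 'this is a senTence WiTH'
generator = (word.capitalize() for word in sentence.split())

# ':=' binds only a single name, so the tuple is unpacked in the loop body.
while (group := take_padded(generator, 3, 'PadValue')):
    a, b, c = group
    print(a, b, c)
```

This prints "This Is A" and then "Sentence With PadValue", matching the output sketched in the post.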
