getting n items at a time from a generator

I am relatively new to the Python language and I am afraid I may be missing
some clever construct or built-in way equivalent to my 'chunk'
generator below.

def chunk(size, items):
    """generate N items from a generator."""
    chunk = []
    count = 0
    while True:
        try:
            item = items.next()
            count += 1
        except StopIteration:
            yield chunk
            break
        chunk.append(item)
        if not (count % size):
            yield chunk
            chunk = []
            count = 0

>>> t = (i for i in range(30))
>>> c = chunk(7, t)
>>> for i in c:
...     print i
...
[0, 1, 2, 3, 4, 5, 6]
[7, 8, 9, 10, 11, 12, 13]
[14, 15, 16, 17, 18, 19, 20]
[21, 22, 23, 24, 25, 26, 27]
[28, 29]

In my real-world project, I have over 250 million items that are too
big to fit in memory; they are processed and later used to update
records in a database. To minimize disk IO, I found it was more
efficient to process them in batches, or "chunks", of 50,000 or so. Hence
the 'chunk' generator above.

Is this the proper way to do this?

Dec 27 '07 #1
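
For readers on Python 3, where `items.next()` is spelled `next(items)`, the same batching can be sketched with `itertools.islice`. This is one possible rewrite, not something posted in the thread; the name `chunked` and the `ValueError` check are assumptions added for illustration. Unlike the generator in post #1, it does not yield an empty trailing list when the input length is an exact multiple of the chunk size.

```python
from itertools import islice

def chunked(size, items):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)  # accept lists as well as generators
    while True:
        batch = list(islice(iterator, size))
        if not batch:  # input exhausted: stop without yielding []
            return
        yield batch

for batch in chunked(7, (i for i in range(30))):
    print(batch)
```

Only one batch is held in memory at a time, which is the property that matters for a 250-million-item input.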
On Dec 27, 11:34 am, Kugutsumen <kugutsu...@gmail.com> wrote:
> I am relatively new to the Python language and I am afraid I may be missing
> some clever construct or built-in way equivalent to my 'chunk'
> generator below.

The itertools module is always a good place to look when you've got a
complicated generator.

import itertools
import operator

def chunk(N, items):
    "Group items in chunks of N"
    def clump((n, _)):
        return n // N
    for _, group in itertools.groupby(enumerate(items), clump):
        yield itertools.imap(operator.itemgetter(1), group)

for ch in chunk(7, range(30)):
    print list(ch)

I've changed chunk to return a generator rather than building a list
which is probably only going to be iterated over. But if you prefer
the list version, replace 'itertools.imap' with 'map'.

--
Paul Hankin
Dec 27 '07 #2
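
The groupby version in post #2 relies on Python 2 features: tuple parameters such as `clump((n, _))` and `itertools.imap` were both removed in Python 3. A rough Python 3 equivalent, offered as a sketch and not as Paul's code, might look like this; the name `chunk_groupby` is made up here.

```python
import itertools
import operator

def chunk_groupby(n, items):
    """Group items into lists of n consecutive items; the last may be shorter."""
    # enumerate() pairs each item with its index; index // n is the chunk number.
    def clump(pair):
        return pair[0] // n
    for _, group in itertools.groupby(enumerate(items), clump):
        # Drop the index again and materialise the chunk as a list.
        yield list(map(operator.itemgetter(1), group))

for ch in chunk_groupby(7, range(30)):
    print(ch)
```

Building a list per chunk is deliberate in this sketch: each group produced by `groupby` shares the underlying iterator, so a group can no longer be read once the loop has advanced to the next one.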
On Dec 27, 7:07 pm, Paul Hankin <paul.han...@gmail.com> wrote:
> The itertools module is always a good place to look when you've got a
> complicated generator.
>
> I've changed chunk to return a generator rather than building a list
> which is probably only going to be iterated over. But if you prefer
> the list version, replace 'itertools.imap' with 'map'.

Thanks, I am going to take a look at itertools.
I prefer the list version since I need to buffer that chunk in memory
at this point.

Dec 27 '07 #3
>>>>> "Kugutsumen" == Kugutsumen <ku********@gmail.com> writes:

Kugutsumen> Thanks, I am going to take a look at itertools. I prefer the
Kugutsumen> list version since I need to buffer that chunk in memory at
Kugutsumen> this point.

Also consider this solution from O'Reilly's Python Cookbook (2nd Ed.) p705

    from itertools import izip

    def chop(iterable, length=2):
        return izip(*(iter(iterable),) * length)

Terry
Dec 27 '07 #4
On Dec 27, 7:24 pm, Terry Jones <te...@jon.es> wrote:
> Also consider this solution from O'Reilly's Python Cookbook (2nd Ed.) p705
>
>     def chop(iterable, length=2):
>         return izip(*(iter(iterable),) * length)

Thanks Terry,

However, chop ignores the remainder of the data in the example.
>>> t = (i for i in range(30))
>>> c = chop(t, 7)
>>> for ch in c:
...     print ch
...
(0, 1, 2, 3, 4, 5, 6)
(7, 8, 9, 10, 11, 12, 13)
(14, 15, 16, 17, 18, 19, 20)
(21, 22, 23, 24, 25, 26, 27)

k
Dec 27 '07 #5
Kugutsumen <ku********@gmail.com> wrote:
> I am relatively new to the Python language and I am afraid I may be missing
> some clever construct or built-in way equivalent to my 'chunk'
> generator below.

I have to say that I have found this to be a surprisingly common need as
well. Would this be an appropriate construct to add to itertools?
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Dec 29 '07 #6
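
On the question of adding this to itertools: Python 3.12 and later ship `itertools.batched`, which yields tuples of up to n items. The sketch below uses it when available and otherwise falls back to a small stand-in written for this note; the fallback is an assumption about equivalent behaviour, not the standard library's implementation.

```python
from itertools import islice

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable, n):
        """Stand-in for itertools.batched on older Pythons (illustrative only)."""
        if n < 1:
            raise ValueError("n must be at least one")
        iterator = iter(iterable)
        while True:
            batch = tuple(islice(iterator, n))
            if not batch:  # nothing left to hand out
                return
            yield batch

for batch in batched(range(30), 7):
    print(batch)
```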
>     def chop(iterable, length=2):
>         return izip(*(iter(iterable),) * length)
>
> Is this *always* guaranteed by the language to work?

Yes!

Users requested this guarantee, and I agreed. The docs now explicitly
guarantee this behavior.
Raymond
Dec 29 '07 #7
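
The idiom works because the tuple `(iter(iterable),) * length` holds `length` references to one iterator, not `length` separate iterators, so each tuple that `izip` builds consumes `length` consecutive items. A small Python 3 illustration follows, with the built-in `zip` standing in for `izip`; the variable names are invented for this sketch.

```python
it = iter(range(10))
refs = (it,) * 3  # three references to the same iterator object

# Every slot is the very same object, so advancing one advances all.
same_object = all(r is it for r in refs)

# zip() pulls one item per slot, left to right, before emitting a tuple.
groups = list(zip(*refs))

print(same_object)
print(groups)
```

The leftover item 9 is read by the first slot and then dropped when the second slot finds the iterator exhausted, which is the remainder problem raised earlier in the thread.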
>> Also consider this solution from O'Reilly's Python Cookbook (2nd Ed.) p705
>>
>>     def chop(iterable, length=2):
>>         return izip(*(iter(iterable),) * length)
>
> However, chop ignores the remainder of the data in the example.

There is a recipe in the itertools docs which handles the odd-length
data at the end:

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

Raymond
Dec 29 '07 #8
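
In Python 3, `izip` is gone and `itertools.zip_longest` can do the padding directly. The following is a sketch along the lines of the recipe in post #8, not the recipe itself; `chunks_without_padding` and the `_MISSING` sentinel are hypothetical names showing one way to get a short final chunk instead of a padded one.

```python
from itertools import zip_longest

def grouper(n, iterable, padvalue=None):
    """grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"""
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)

_MISSING = object()  # unique sentinel that cannot collide with real data

def chunks_without_padding(n, iterable):
    """Like grouper, but the final list is left short instead of padded."""
    for group in grouper(n, iterable, _MISSING):
        yield [item for item in group if item is not _MISSING]

print(list(grouper(3, 'abcdefg', 'x')))
print(list(chunks_without_padding(7, range(30)))[-1])
```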
Tim Roberts <ti**@probo.com> writes:
> I have to say that I have found this to be a surprisingly common need as
> well. Would this be an appropriate construct to add to itertools?

I'm in favor.
Jan 11 '08 #9

Paul Rubin wrote:
> Tim Roberts <ti**@probo.com> writes:
>> I have to say that I have found this to be a surprisingly common need as
>> well. Would this be an appropriate construct to add to itertools?
>
> I'm in favor.

I am ecstatic about the idea of getting n items at a time from a
generator! This would eliminate the use of less elegant functions to do
this sort of thing which I would do even more frequently if it were
easier.

Is it possible that this syntax for generator expressions could be adopted?
>>> sentence = 'this is a senTence WiTH'
>>> generator = (word.capitalize() for word in sentence.split())
>>> print generator.next(3,'PadValue')
('This','Is','A')
>>> print generator.next(3,'PadValue')
('Sentence','With','PadValue')
>>> generator.next(3,'PadValue')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

While on the topic of generators:

Something else I have longed for is assignment within a while loop. (I
realize this might be more controversial and might have been avoided on
purpose, but I wasn't around for that discussion.)

>>> sentence = 'this is a senTence WiTH'
>>> generator = (word.capitalize() for word in sentence.split())
>>> while a,b,c = generator.next(3,'PadValue'):
...     print a,b,c
...
This Is A
Sentence With PadValue
>>>

--
Shane Geiger
IT Director
National Council on Economic Education
sg*****@ncee.net | 402-438-8958 | http://www.ncee.net

Leading the Campaign for Economic and Financial Literacy

Jan 11 '08 #10
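
Neither `generator.next(3, 'PadValue')` nor that exact `while` syntax exists in Python. A similar effect can be sketched in Python 3.8 or later with a helper function and an assignment expression (`:=`). `next_n` is a hypothetical name, and returning an empty tuple at exhaustion instead of raising `StopIteration` is a choice made here so the loop can end cleanly.

```python
from itertools import islice

def next_n(iterator, n, padvalue=None):
    """Return the next n items as a tuple, padded with padvalue; () when exhausted."""
    items = tuple(islice(iterator, n))
    if not items:
        return ()
    return items + (padvalue,) * (n - len(items))

sentence = 'this is a senTence WiTH'
words = (word.capitalize() for word in sentence.split())

rows = []
# The assignment expression binds `group` and tests it in one step.
while (group := next_n(words, 3, 'PadValue')):
    a, b, c = group
    rows.append((a, b, c))
    print(a, b, c)
```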
