I am relatively new to the Python language, and I am afraid I am missing
some clever construct or built-in equivalent to my 'chunk'
generator below.
def chunk(size, items):
    """generate N items from a generator."""
    chunk = []
    count = 0
    while True:
        try:
            item = items.next()
            count += 1
        except StopIteration:
            yield chunk
            break
        chunk.append(item)
        if not (count % size):
            yield chunk
            chunk = []
            count = 0
>>> t = (i for i in range(30))
>>> c = chunk(7, t)
>>> for i in c:
...     print i
...
[0, 1, 2, 3, 4, 5, 6]
[7, 8, 9, 10, 11, 12, 13]
[14, 15, 16, 17, 18, 19, 20]
[21, 22, 23, 24, 25, 26, 27]
[28, 29]
In my real-world project, I have over 250 million items that are too
big to fit in memory; they are processed and later used to update
records in a database. To minimize disk I/O, I found it was more
efficient to process them in batches, or "chunks", of 50,000 or so,
hence the generator above.
Is this the proper way to do this?
The itertools module is always a good place to look when you've got a
complicated generator.
import itertools
import operator

def chunk(N, items):
    "Group items in chunks of N"
    def clump((n, _)):
        return n // N
    for _, group in itertools.groupby(enumerate(items), clump):
        yield itertools.imap(operator.itemgetter(1), group)

for ch in chunk(7, range(30)):
    print list(ch)
I've changed chunk to return a generator rather than building a list
which is probably only going to be iterated over. But if you prefer
the list version, replace 'itertools.imap' with 'map'.
--
Paul Hankin
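
The code in this thread is Python 2. For readers on Python 3, where tuple parameters such as `def clump((n, _))` and `itertools.imap` no longer exist, a minimal sketch of the same groupby idea might look like the following. The lambda key and the generator expression are substitutions made for this sketch, not part of the original post.

```python
import itertools


def chunk(n, items):
    """Yield successive groups of up to n items as lazy iterators."""
    # Pair every item with its index; items whose index has the same
    # quotient by n fall into the same group.
    indexed = enumerate(items)
    for _, group in itertools.groupby(indexed, key=lambda pair: pair[0] // n):
        # Drop the index again and hand back only the items.
        yield (item for _, item in group)


for ch in chunk(7, range(30)):
    print(list(ch))
```

As with the original, each group has to be consumed before moving on to the next one, because all groups share the underlying iterator.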
Thanks, I am going to take a look at itertools.
I prefer the list version since I need to buffer that chunk in memory
at this point.
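
For the list-per-chunk case, one possible Python 3 sketch uses itertools.islice to pull each batch straight into a list. The name chunk_lists is hypothetical. Unlike the generator in the first post, this version never yields an empty trailing list when the input length is an exact multiple of the chunk size.

```python
from itertools import islice


def chunk_lists(size, items):
    """Yield lists of up to `size` items; the last list may be shorter."""
    iterator = iter(items)  # accept any iterable, not only generators
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # input exhausted: stop without yielding an empty list
        yield batch


for batch in chunk_lists(7, (i for i in range(30))):
    print(batch)
```

Only one batch is held in memory at a time, which is the property that matters when buffering 50,000 items before a database update.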
Also consider this solution from O'Reilly's Python Cookbook (2nd Ed.) p705
from itertools import izip

def chop(iterable, length=2):
    return izip(*(iter(iterable),) * length)
Terry
Thanks Terry,
However, chop ignores the remainder of the data in the example.
>>> t = (i for i in range(30))
>>> c = chop(t, 7)
>>> for ch in c:
...     print ch
...
(0, 1, 2, 3, 4, 5, 6)
(7, 8, 9, 10, 11, 12, 13)
(14, 15, 16, 17, 18, 19, 20)
(21, 22, 23, 24, 25, 26, 27)
k
I have to say that I have found this to be a surprisingly common need as
well. Would this be an appropriate construct to add to itertools?
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
>     def chop(iterable, length=2):
>         return izip(*(iter(iterable),) * length)
>
> Is this *always* guaranteed by the language to work?
Yes!
Users requested this guarantee, and I agreed. The docs now explicitly
guarantee this behavior.
Raymond
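
To see why the idiom works, here is a small sketch written for Python 3, where the built-in zip is lazy in the way izip was. It assumes nothing beyond the standard library. The tuple holds the same iterator object several times, and zip draws from its arguments left to right, so each output tuple takes consecutive items from that one iterator.

```python
def chop(iterable, length=2):
    """Group items into tuples of `length`, dropping any incomplete tail."""
    it = iter(iterable)
    # (it,) * length repeats the SAME iterator object, so zip pulls
    # `length` consecutive items from it to build each tuple.
    return zip(*(it,) * length)


print(list(chop(range(7), 3)))  # → [(0, 1, 2), (3, 4, 5)]
```

The leftover item 6 is consumed and discarded: zip stops as soon as the iterator runs out while it is filling a tuple.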
> However, chop ignores the remainder of the data in the example.
There is a recipe in the itertools docs which handles the odd-length
data at the end:
from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
Raymond
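
A Python 3 sketch of the same padding behaviour can be built on itertools.zip_longest instead of chain and repeat. That substitution is made for this sketch and is not part of the recipe quoted above. The _MISSING sentinel and the chunks_without_padding helper are hypothetical names, added to show one way of getting a shorter final chunk rather than a padded one.

```python
from itertools import zip_longest

_MISSING = object()  # hypothetical sentinel: cannot collide with real data


def grouper(n, iterable, padvalue=None):
    """grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"""
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)


def chunks_without_padding(n, iterable):
    """Like grouper, but yield a shorter final list instead of padding it."""
    for group in grouper(n, iterable, padvalue=_MISSING):
        yield [item for item in group if item is not _MISSING]


print(list(grouper(3, 'abcdefg', 'x')))
print(list(chunks_without_padding(7, range(30)))[-1])  # → [28, 29]
```

Using a private object() as the pad value avoids confusing padding with genuine None values in the data.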
Tim Roberts <ti**@probo.com> writes:
> I have to say that I have found this to be a surprisingly common need as
> well. Would this be an appropriate construct to add to itertools?
I'm in favor.
Paul Rubin wrote:
> I'm in favor.
I am ecstatic about the idea of getting n items at a time from a
generator! This would eliminate the use of less elegant functions to do
this sort of thing which I would do even more frequently if it were
easier.
Is it possible that this syntax for generator expressions could be adopted?
>>> sentence = 'this is a senTence WiTH'
>>> generator = (word.capitalize() for word in sentence.split())
>>> print generator.next(3,'PadValue')
('This','Is','A')
>>> print generator.next(3,'PadValue')
('Sentence','With','PadValue')
>>> generator.next(3,'PadValue')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
While on the topic of generators:
Something else I have longed for is assignment within a while loop. (I
realize this might be more controversial and might have been avoided on
purpose, but I wasn't around for that discussion.)
>>> sentence = 'this is a senTence WiTH'
>>> generator = (word.capitalize() for word in sentence.split())
>>> while a,b,c = generator.next(3,'PadValue'):
...     print a,b,c
...
This Is A
Sentence With PadValue
>>>
--
Shane Geiger
IT Director
National Council on Economic Education
sg*****@ncee.net | 402-438-8958 | http://www.ncee.net
Leading the Campaign for Economic and Financial Literacy
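
Neither proposed syntax exists as written, but both effects can be approximated in Python 3.8 or later. The sketch below assumes a hypothetical helper, take_padded, standing in for generator.next(3, 'PadValue'), and uses an assignment expression for the loop. It differs from the proposal in one respect: it returns an empty tuple when the input is exhausted rather than raising StopIteration, so the while loop can end cleanly.

```python
from itertools import islice


def take_padded(iterator, n, padvalue):
    """Return the next n items as a tuple, padded; () once exhausted."""
    items = tuple(islice(iterator, n))
    if not items:
        return ()
    return items + (padvalue,) * (n - len(items))


sentence = 'this is a senTence WiTH'
words = (word.capitalize() for word in sentence.split())

# Assignment expression (Python 3.8+): bind and test in the loop condition.
while group := take_padded(words, 3, 'PadValue'):
    a, b, c = group
    print(a, b, c)
```

With the sentence above, the loop prints "This Is A" and then "Sentence With PadValue".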