itertools.izip brokenness

The code below should be pretty self-explanatory.
I want to read two files in parallel, so that I
can print corresponding lines from each, side by
side. itertools.izip() seems the obvious way
to do this.

izip() will stop iterating when it reaches the
end of the shortest file. I don't know how to
tell which file was exhausted, so I just try printing
them both. The exhausted one will generate a
StopIteration, the other will continue to be
iterable.

The problem is that sometimes, depending on which
file is the shorter, a line ends up missing,
appearing neither in the izip() output nor in
the subsequent direct file iteration. I would
guess that it was in izip's buffer when izip
terminated due to the exception on the other file.
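
A minimal sketch of that guess, assuming Python 3, where the built-in zip() is lazy in the same way as Python 2's itertools.izip(). Each step pulls an item from the first iterator before asking the second, so when the second turns out to be exhausted, the item already taken from the first is discarded. The list data is illustrative only.

```python
# Python 3 sketch: zip() consumes its arguments left to right, one item per step.
first = iter(["abc", "de", "fgh", "ijk"])
second = iter(["xyz", "wv"])

# zip() stops as soon as `second` raises StopIteration, but by then it has
# already pulled "fgh" out of `first`.
pairs = list(zip(first, second))

# Whatever is still left in `first` after zip() gave up.
leftover = list(first)

print(pairs)
print(leftover)
```

With the arguments swapped, zip(second, first) asks the exhausted iterator first, so nothing is pulled from the longer one and no line goes missing. That matches the order dependence described above.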

This behavior seems plainly broken, especially
because it depends on the order of izip's
arguments and is not documented anywhere I saw.
It makes using izip() for iterating files in
parallel essentially useless (unless you are
lucky enough to have files of the same length).

Also, it seems to me that this is likely a problem
with any iterables with different lengths.
I am hoping I am missing something...
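
One way to tell which input ran out, without losing a line, is to drive both iterators by hand. This is only a sketch, assuming Python 3. The name pair_until_exhausted and its return shape are hypothetical, chosen for illustration, and are not part of itertools.

```python
def pair_until_exhausted(first, second):
    """Pair items until either input ends; return the pairs and both remainders.

    Hypothetical helper: nothing is dropped, because an item that was fetched
    but could not be paired is put back at the front of its remainder.
    """
    end = object()  # private marker meaning "this iterator is exhausted"
    first, second = iter(first), iter(second)
    pairs = []
    while True:
        a = next(first, end)
        b = next(second, end)
        if a is end or b is end:
            break
        pairs.append((a, b))
    # Keep the unpaired item (if any) together with the rest of its input.
    rest_first = ([] if a is end else [a]) + list(first)
    rest_second = ([] if b is end else [b]) + list(second)
    return pairs, rest_first, rest_second


pairs, rest_first, rest_second = pair_until_exhausted(
    ["abc", "de", "fgh", "ijk"], ["xyz", "wv"]
)
print(pairs, rest_first, rest_second)
```

An empty remainder identifies the input that was exhausted; if both are empty, the inputs had the same length.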

#---------------------------------------------------------
# Task: print contents of file1 in column 1, and
# contents of file2 in column two. iterators and
# izip() are the "obvious" way to do it.

from itertools import izip
import cStringIO, pdb

def prt_files (file1, file2):

    for line1, line2 in izip (file1, file2):
        print line1.rstrip(), "\t", line2.rstrip()

    try:
        for line1 in file1:
            print line1,
    except StopIteration: pass

    try:
        for line2 in file2:
            print "\t",line2,
    except StopIteration: pass

if __name__ == "__main__":
    # Use StringIO to simulate files. Real files
    # show the same behavior.
    f = cStringIO.StringIO

    print "Two files with same number of lines work ok."
    prt_files (f("abc\nde\nfgh\n"), f("xyz\nwv\nstu\n"))

    print "\nFirst file shorter is also ok."
    prt_files (f("abc\nde\n"), f("xyz\nwv\nstu\n"))

    print "\nSecond file shorter is a problem."
    prt_files (f("abc\nde\nfgh\n"), f("xyz\nwv\n"))
    print "What happened to \"fgh\" line that should be in column 1?"

    print "\nBut only a problem for one line."
    prt_files (f("abc\nde\nfgh\nijk\nlm\n"), f("xyz\nwv\n"))
    print "The line \"fgh\" is still missing, but following\n" \
          "line(s) are ok! Looks like izip() ate a line."

Jan 3 '06
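
For the two-column task itself, a hedged sketch assuming Python 3, whose itertools module provides zip_longest(). It pads the shorter input instead of stopping at it. Here side_by_side is a hypothetical name, io.StringIO stands in for cStringIO.StringIO, and padding with an empty string is an assumption about the desired output.

```python
import io
from itertools import zip_longest


def side_by_side(file1, file2):
    """Return tab-separated rows pairing corresponding lines of two files.

    The shorter file is padded with empty strings, so no line is dropped.
    """
    rows = []
    for line1, line2 in zip_longest(file1, file2, fillvalue=""):
        rows.append(line1.rstrip() + "\t" + line2.rstrip())
    return rows


# The "second file shorter" case from the post above.
rows = side_by_side(io.StringIO("abc\nde\nfgh\n"), io.StringIO("xyz\nwv\n"))
print("\n".join(rows))
```

Because the padding happens inside zip_longest(), the result no longer depends on which file is passed first.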
On 9 Jan 2006 08:19:21 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:
Op 2006-01-05, Bengt Richter schreef <bo**@oz.net>:
On 5 Jan 2006 15:48:26 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote: [...]
But you can fix that (only test is what you see ;-) :
Maybe, but not with this version.
>>> from itertools import repeat, chain, izip
>>> it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")), chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
>>> for t in it: print t
...
(3, 11)
(5, 22)
(8, 'Bye')

(Feel free to generalize ;-)


The problem with this version is that it will stop if for some reason
each iterable contains a 'Bye' at the same place. Now this may seem
far fetched at first. But consider that if data is collected from

ISTM the job of choosing an appropriate sentinel involves making
that not only far fetched but well-nigh impossible ;-)
experiments certain values may be missing. This can be indicated
by a special "Missing Data" value in an iterable. But this "Missing
Data" value would also be the prime candidate for a fill parameter
when an iterable is exhausted.

ISTM that confuses "missing data" with "end of data stream."
I assumed your choice of terminating sentinel ("Bye") would not have
that problem ;-)

Regards,
Bengt Richter
Jan 10 '06 #41
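
The sentinel idea discussed above can be generalized so that the end marker cannot collide with any data value. The following is a sketch only, assuming Python 3 (built-in zip() in place of izip()). The name zip_padded and its fill parameter are hypothetical. It uses a fresh object() as the private end marker and compares by identity, so a stream that really contains "Bye" does not end the loop early.

```python
from itertools import chain, repeat


def zip_padded(*iterables, fill=None):
    """Zip until every iterable is exhausted, padding short ones with `fill`.

    Hypothetical helper. `end` is a brand-new object, so no value coming from
    the data can be identical to it, whatever `fill` is.
    """
    end = object()
    padded = [chain(it, repeat(end)) for it in iterables]
    for row in zip(*padded):
        if all(item is end for item in row):
            return  # every stream has ended
        yield tuple(fill if item is end else item for item in row)


result = list(zip_padded([3, 5, 8], [11, 22], fill="Bye"))
print(result)
```

The fill value and the end-of-stream marker are kept separate here, so the fill value is free to coincide with real data.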
Op 2006-01-10, Bengt Richter schreef <bo**@oz.net>:
On 9 Jan 2006 08:19:21 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:
Op 2006-01-05, Bengt Richter schreef <bo**@oz.net>:
On 5 Jan 2006 15:48:26 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote: [...] But you can fix that (only test is what you see ;-) :
Maybe, but not with this version.
>>> from itertools import repeat, chain, izip
>>> it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")), chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
>>> for t in it: print t
...
(3, 11)
(5, 22)
(8, 'Bye')

(Feel free to generalize ;-)


The problem with this version is that it will stop if for some reason
each iterable contains a 'Bye' at the same place. Now this may seem
far fetched at first. But consider that if data is collected from

ISTM the job of choosing an appropriate sentinel involves making
that not only far fetched but well-nigh impossible ;-)

experiments certain values may be missing. This can be indicated
by a special "Missing Data" value in an iterable. But this "Missing
Data" value would also be the prime candidate for a fill parameter
when an iterable is exhausted.

ISTM that confuses "missing data" with "end of data stream."


"end of data stream" implies "missing data". If I'm doing experiments
with a number of materials under a number of temperatures and I want
to compare how copper, iron and lead behaved, then when I compare
the results for 400 K and there is no data for lead, I don't care
whether that is because the measurement for 400 K was somehow
lost or unusable or because they stopped the lead measurements at 350 K.

It all boils down to no data for lead at 400 K; there is no need for
the processing unit to differentiate between the different reasons for
the missing data. That difference is only useful for the loop control.
I assumed your choice of terminating sentinel ("Bye") would not have
that problem ;-)


That is true, but what is adequate in one situation doesn't need to
be adequate in general.

--
Antoon Pardon
Jan 10 '06 #42
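
Antoon's point can be sketched as follows, assuming Python 3 and its itertools.zip_longest(), with the same value used both as the "Missing Data" marker inside a series and as the fill value for a series that ended early. The name MISSING and all readings are made up purely for illustration.

```python
from itertools import zip_longest

MISSING = None  # hypothetical "Missing Data" marker, also used as the fill value

# Made-up readings at 300 K, 350 K and 400 K.
copper = [1.0, MISSING, 1.2]  # the 350 K reading was lost
iron = [2.0, 2.1, 2.2]
lead = [3.0, 3.1]             # lead was only measured up to 350 K

rows = list(zip_longest(copper, iron, lead, fillvalue=MISSING))
for row in rows:
    print(row)
```

The code that processes a row sees the same marker for a lost reading and for a series that stopped early. Only the loop control needs to know that the inputs had different lengths.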
