
zipfile stupidly broken


To quote from zipfile.py (2.4 library):

# Search the last END_BLOCK bytes of the file for the record signature.
# The comment is appended to the ZIP file and has a 16 bit length.
# So the comment may be up to 64K long. We limit the search for the
# signature to a few Kbytes at the end of the file for efficiency.
# also, the signature must not appear in the comment.
END_BLOCK = min(filesize, 1024 * 4)

So the author knows that there's a hard limit of 64K on the comment
size, but feels it's more important to fail a little more quickly when
fed something that's not a zipfile - or a perfectly legitimate zipfile
that doesn't observe his ad-hoc 4K limitation. I don't have time to
find a gentler way to say it because I have to find a work around for
this arbitrary limit (1): this is stupid.

(1) the leading candidate is to copy and paste the whole frigging
zipfile module so I can patch it, but that's even uglier than it is
stupid. "This battery is pining for the fjords!"

Normally I despise being CC'd on a reply to list or group traffic, but
in this case it's probably necessary, as I haven't had time to keep up
with this place for several years. :-/

--
To be alive, is that not to be
again and again surprised? -- Nicholas van Rijn
May 16 '07 #1
15 Replies


On Wed, 16 May 2007 12:18:35 -0300, Martin Maney <ma***@two14.net> wrote:
> So the author knows that there's a hard limit of 64K on the comment
> size, but feels it's more important to fail a little more quickly when
> fed something that's not a zipfile - or a perfectly legitimate zipfile
> that doesn't observe his ad-hoc 4K limitation. I don't have time to
> find a gentler way to say it because I have to find a work around for
> this arbitrary limit (1): this is stupid.

This is not a good place for reporting bugs - use
http://sourceforge.net/bugs/?group_id=5470

--
Gabriel Genellina

May 16 '07 #2

Martin Maney wrote:
> To quote from zipfile.py (2.4 library):
>
> # Search the last END_BLOCK bytes of the file for the record signature.
> # The comment is appended to the ZIP file and has a 16 bit length.
> # So the comment may be up to 64K long. We limit the search for the
> # signature to a few Kbytes at the end of the file for efficiency.
> # also, the signature must not appear in the comment.
> END_BLOCK = min(filesize, 1024 * 4)
>
> So the author knows that there's a hard limit of 64K on the comment
> size, but feels it's more important to fail a little more quickly when
> fed something that's not a zipfile - or a perfectly legitimate zipfile
> that doesn't observe his ad-hoc 4K limitation. I don't have time to
> find a gentler way to say it because I have to find a work around for
> this arbitrary limit (1): this is stupid.
>
> (1) the leading candidate is to copy and paste the whole frigging
> zipfile module so I can patch it, but that's even uglier than it is
> stupid. "This battery is pining for the fjords!"
>
> Normally I despise being CC'd on a reply to list or group traffic, but
> in this case it's probably necessary, as I haven't had time to keep up
> with this place for several years. :-/

Are you serious? A zipfile with a comment > 4Kbytes. I've never encountered
such a beast.

As with any open source product it is much better to roll up your sleeves
and pitch in to fix a problem than to rail about "how it is stupidly
broken". You are welcome to submit a patch or at the very least a good
description of the problem and possible solutions. If you have gotten a
lot of value out of Python, you might consider this "giving back". You
haven't paid anything for the value it has provided.

-Larry
May 16 '07 #3

On May 17, 5:38 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> This is not a good place for reporting bugs - use http://sourceforge.net/bugs/?group_id=5470

I disagree. Given that most suspected bugs aren't, new users
especially would be wise to post their "bugs" here before filing a bug
report.

May 17 '07 #4

On Wed, 16 May 2007 23:14:38 -0300, Asun Friere <af*****@yahoo.co.uk> wrote:
> On May 17, 5:38 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
> wrote:
>> This is not a good place for reporting bugs - use
>> http://sourceforge.net/bugs/?group_id=5470
>
> I disagree. Given that most suspected bugs aren't, new users
> especially would be wise to post their "bugs" here before filing a bug
> report.

My first replies were self-censored. This was the most neutral answer I
could think of.
The original post was not a typical bug report.

--
Gabriel Genellina

May 17 '07 #5

Martin Maney <ma***@two14.net> wrote:
> To quote from zipfile.py (2.4 library):
>
> # Search the last END_BLOCK bytes of the file for the record signature.
> # The comment is appended to the ZIP file and has a 16 bit length.
> # So the comment may be up to 64K long. We limit the search for the
> # signature to a few Kbytes at the end of the file for efficiency.
> # also, the signature must not appear in the comment.
> END_BLOCK = min(filesize, 1024 * 4)
>
> So the author knows that there's a hard limit of 64K on the comment
> size, but feels it's more important to fail a little more quickly when
> fed something that's not a zipfile - or a perfectly legitimate zipfile
> that doesn't observe his ad-hoc 4K limitation. I don't have time to
> find a gentler way to say it because I have to find a work around for
> this arbitrary limit (1): this is stupid.

To search 64k for all zip files would slow down the opening of all zip
files whereas most zipfiles don't have comments.

The code in _EndRecData should probably read 1k first, and then retry
with 64k.
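
A minimal sketch of the "small read first, then retry with 64K" idea above. It is not the code from zipfile.py: the names `find_eocd` and `first_try` are made up for illustration, and the consistency check on the comment length is an extra assumption, added because a bare search can also match signature bytes that merely happen to sit in the tail.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # "End of Central Directory" signature
EOCD_SIZE = 22             # fixed part of the record, without the comment
MAX_COMMENT = 0xFFFF       # the comment length field is 16 bits wide


def find_eocd(fp, first_try=1024):
    """Return the file offset of the End of Central Directory record, or None."""
    fp.seek(0, 2)
    filesize = fp.tell()
    # Cheap attempt first; fall back to the largest tail the format allows.
    for window in (first_try, EOCD_SIZE + MAX_COMMENT):
        window = min(filesize, window)
        fp.seek(filesize - window)
        tail = fp.read(window)
        pos = tail.rfind(EOCD_SIG)
        while pos >= 0:
            if pos + EOCD_SIZE <= len(tail):
                # The comment length is stored 20 bytes into the record.
                (comment_len,) = struct.unpack_from("<H", tail, pos + 20)
                # Accept only a record whose comment ends exactly at EOF.
                if pos + EOCD_SIZE + comment_len == len(tail):
                    return filesize - window + pos
            pos = tail.rfind(EOCD_SIG, 0, pos)   # try an earlier candidate
    return None


# Demo: a 5000-byte comment pushes the record out of the first window.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
    zf.comment = b"x" * 5000
offset = find_eocd(buf)
```

Archives without a comment are found on the first, small read, so only the unusual ones pay for the second pass. The length check rejects accidental matches, but a comment deliberately crafted to look like a record would still pass it.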

> (1) the leading candidate is to copy and paste the whole frigging
> zipfile module so I can patch it, but that's even uglier than it is
> stupid. "This battery is pining for the fjords!"

You don't need to do that, you can just "monkey patch" the _EndRecData
function.

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
May 18 '07 #6

Nick Craig-Wood <ni**@craig-wood.com> wrote:
> To search 64k for all zip files would slow down the opening of all zip
> files whereas most zipfiles don't have comments.

No, actually it would only slow down for files which do have comments,
assuming I understand the code correctly. IME most zipfiles don't have
any comments at all, and would be unaffected. To be honest, if I had
even known that zipfiles could have comments before I ran into this,
I'd long since forgotten it.

> You don't need to do that, you can just "monkey patch" the _EndRecData
> function.

For a quick & dirty test, sure. If I were certain I'd only ever use
this on one machine for a limited time (viz, no system upgrades that
replace zipfile.py) it might suffice. But that doesn't generalize
worth a damn.

--
Education makes people easy to lead, but difficult to drive;
easy to govern, but impossible to enslave. -- Henry Peter Brougham
May 19 '07 #7

Larry Bates <la*********@websafe.com> bristled:
> Are you serious? A zipfile with a comment > 4Kbytes. I've never encountered
> such a beast.

If I hadn't run into one I would never have had a clue that Python's
zipfile module had this silly bug.

> As with any open source product it is much better to roll up your sleeves
> and pitch in to fix a problem than to rail about "how it is stupidly
> broken". You are welcome to submit a patch or at the very least a good
> description of the problem and possible solutions. If you have gotten a
> lot of value out of Python, you might consider this "giving back". You
> haven't paid anything for the value it has provided.

Ah yes, the old "well, if you found it you should fix it" meme -
another reason I found it pretty easy to stop reading this group. It's
as stupid a position as it ever was (and FWIW I don't believe I've ever
seen any of the real Python developers mouth this crap).

Now, I have learned somewhat more than I knew (or ever wanted to know)
about zipfiles since I smacked headfirst into this bug, and I've
changed the subject line to reflect my current understanding. :-/ Back
then it had already occurred to me that *just* changing the size of the
step back seemed an incomplete fix: after all, that leaves you scanning
through random binary glop looking for the signature. With the
signature being four bytes, okay, it will *nearly* always work (just as
the existing 4K scan does), but... well, from what I've read in the
format specs that's about as good as it gets. The alternative, some
sort of backwards scan, would avoid the binary glop but has much the
same problem, in principle, with finding the signature embedded in the
archive comment. Even worse, arguably, since that comment is
apparently entirely up to the archive creator, so if there's a way to
use a fake central directory for nefarious purposes, that would make it
trivial to do. Which is the point where I decided that the file format
itself is broken... (oh, and then I came across something from the
info-zip crew that said much the same thing, though they didn't mention
this particular design, uhm, shortcoming.)

So I guess that perhaps the stupidly obvious fix:

- END_BLOCK = min(filesize, 1024 * 4)
+ END_BLOCK = min(filesize, 1024 * 64 + 22)

is after all about the best that can be done. (the lack of the
size-of-End-Of-Central-Directory-record in the existing code isn't a
separate bug, but if we're going to pretend we accommodate all valid
zipfiles it wouldn't do to overlook it)
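
The arithmetic behind that window can be checked with a short sketch. The struct layout below is the standard End of Central Directory record as I understand it, and `make_zip` is a hypothetical helper, not part of zipfile. A 16-bit length field tops out at 65535, so the strict maximum is 65535 + 22 = 65557 bytes; the patch's `1024 * 64 + 22` is one byte more than needed, which is harmless.

```python
import io
import struct
import zipfile

# Fixed part of the End of Central Directory record: signature, four
# 16-bit fields, two 32-bit fields and the 16-bit comment length.
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)   # 22
MAX_COMMENT = 2**16 - 1                    # largest value of a 16-bit field
MAX_TAIL = EOCD_SIZE + MAX_COMMENT         # 65557


def make_zip(comment):
    """Build a small in-memory archive carrying the given archive comment."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
        zf.comment = comment
    return buf.getvalue()


data = make_zip(b"x" * 5000)
signature = b"PK\x05\x06"
in_4k = signature in data[-4096:]        # what the 4K window can see
in_max = signature in data[-MAX_TAIL:]   # what the full window can see
```

With a 5000-byte comment the last 4096 bytes of the file are all comment text, so the 4K window never reaches the record while the full-size window does.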

So now you may imagine that your rudeness has had the result you
intended after all, and I guess it has, though at a cost - well, you
probably never cared what I thought about you anyway.

BTW, thanks for the pointer someone else gave to the proper place for
posting bugs. I'd had the silly idea that I would be able to find that
easily at www.python.org, but if I had then I'd not have posted here
and had so much fun.

--
The most effective way to get information from usenet is not to ask
a question; it is to post incorrect information. -- Aahz's Law

Apparently denigrating the bug reporter can sometimes result in a
patch, too, but I don't think that's in the same spirit.
May 19 '07 #8

On Sat, 19 May 2007 14:00:01 -0300, Martin Maney <ma***@two14.net> wrote:
> BTW, thanks for the pointer someone else gave to the proper place for
> posting bugs. I'd had the silly idea that I would be able to find that
> easily at www.python.org, but if I had then I'd not have posted here
> and had so much fun.

My microwave oven doesn't work very well, it's rather new and I want it
fixed. I take the manual, go to the last pages, and find how to contact
the factory.
A module in the Python Standard Library has a bug. I take the Python
Library Reference manual, go to the last pages (Appendix B), and find how
to properly report a bug.

--
Gabriel Genellina

May 19 '07 #9

Gabriel Genellina <ga*******@yahoo.com.ar> wrote:
> A module in the Python Standard Library has a bug. I take the Python
> Library Reference manual, go to the last pages (Appendix B), and find how
> to properly report a bug.

Sure, the information is *somewhere*. Silly me, I expected it to be
readily findable from the project's home page, as it usually is for open
source projects (and I thought I remembered it being so in the past -
y'know, before the web site got prettified and dumbed down). Is there
any good reason not to have it in lots of likely places, aside from the
opportunity to jeer at those who didn't look in the right one while
spending their time trying to report a problem?

Never mind, rhetorical question.

--
There is overwhelming evidence that the higher the level of self-esteem,
the more likely one will be to treat others with respect, kindness, and
generosity. -- Nathaniel Branden
May 20 '07 #10

Martin Maney <ma***@two14.net> wrote:
> Nick Craig-Wood <ni**@craig-wood.com> wrote:
>> You don't need to do that, you can just "monkey patch" the _EndRecData
>> function.
>
> For a quick & dirty test, sure. If I were certain I'd only ever use
> this on one machine for a limited time (viz, no system upgrades that
> replace zipfile.py) it might suffice. But that doesn't generalize
> worth a damn.

From the above I don't think you've understood the concept of monkey
patching - it is run time patching. You patch the zipfile module from
your code - no messing with the installed python needed. Eg something
like :-

------------------------------------------------------------
import zipfile

OriginalEndRecData = zipfile._EndRecData

def MyEndRecData(fpin):
    """
    Return data from the "End of Central Directory" record, or
    None.
    """
    # Try the builtin one first
    endrec = OriginalEndRecData(fpin)
    if endrec is None:
        # didn't work so do something extra
        # you fill this bit in!
        pass
    return endrec

zipfile._EndRecData = MyEndRecData

# Now use your run time patched zipfile module as normal
------------------------------------------------------------

It isn't ideal, but it certainly does generalise.
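
One way to keep such a run time patch contained is to apply it only for the duration of a block and restore the original afterwards. This is a sketch under assumptions: `patched_end_rec_data`, `make_replacement` and `calls` are hypothetical names, and `zipfile._EndRecData` is a private function that may change between Python versions.

```python
import contextlib
import io
import zipfile


@contextlib.contextmanager
def patched_end_rec_data(factory):
    """Temporarily replace zipfile._EndRecData, restoring it on exit."""
    original = zipfile._EndRecData
    zipfile._EndRecData = factory(original)
    try:
        yield
    finally:
        zipfile._EndRecData = original


calls = []   # records whether each lookup found a record


def make_replacement(original):
    def my_end_rec_data(fpin):
        endrec = original(fpin)            # try the builtin one first
        calls.append(endrec is not None)
        # A fallback search for a missed record would go here.
        return endrec
    return my_end_rec_data


# Demo: read an in-memory archive while the patch is active.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")

before = zipfile._EndRecData
with patched_end_rec_data(make_replacement):
    names = zipfile.ZipFile(buf).namelist()
restored = zipfile._EndRecData is before
```

The patch lives in the calling program, so replacing zipfile.py during a system upgrade does not remove it, although a change to the private function's name or signature would still break it.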

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
May 20 '07 #11

* Gabriel Genellina (Wed, 16 May 2007 16:38:39 -0300)
> On Wed, 16 May 2007 12:18:35 -0300, Martin Maney <ma***@two14.net> wrote:
>> So the author knows that there's a hard limit of 64K on the comment
>> size, but feels it's more important to fail a little more quickly when
>> fed something that's not a zipfile - or a perfectly legitimate zipfile
>> that doesn't observe his ad-hoc 4K limitation. I don't have time to
>> find a gentler way to say it because I have to find a work around for
>> this arbitrary limit (1): this is stupid.
>
> This is not a good place for reporting bugs - use
> http://sourceforge.net/bugs/?group_id=5470

Actually it is:

,---
| Giving the shortage of reviewer time, invalid bug reports on tracker
| are a nuisance and a diversion from attending to valid reports and
| reviewing patches. That is why I encourage people to post here for
| community review.
`---

http://groups.google.com/group/comp....7b1906d3ef68a5
May 20 '07 #12

* Gabriel Genellina (Sat, 19 May 2007 18:09:06 -0300)
> On Sat, 19 May 2007 14:00:01 -0300, Martin Maney <ma***@two14.net> wrote:
>> BTW, thanks for the pointer someone else gave to the proper place for
>> posting bugs. I'd had the silly idea that I would be able to find that
>> easily at www.python.org, but if I had then I'd not have posted here
>> and had so much fun.
>
> My microwave oven doesn't work very well, it's rather new and I want it
> fixed. I take the manual, go to the last pages, and find how to contact
> the factory.
> A module in the Python Standard Library has a bug. I take the Python
> Library Reference manual, go to the last pages (Appendix B), and find how
> to properly report a bug.

Don't be silly. Where would you look for the URL to report bugs? On
the website of the project, of course. It's not that easy to find on
python.org (although not as hard as Martin says):

Core Development > Links for Developers > Bug Manager or

About > Help > Got a Python problem or question? > Python Bug Tracker

Both ways are kind of misleading (or non-intuitive) as you do not want
to engage in Core Development to report a bug. Lots of good projects
have a prominent link on their website (start page) how to report
bugs. Python hasn't.

Thorsten
May 20 '07 #13

Thorsten Kampe wrote:
>
> Don't be silly. Where would you look for the URL to report bugs? On
> the website of the project, of course. It's not that easy to find on
> python.org (although not as hard as Martin says):
>
> Core Development > Links for Developers > Bug Manager or

This is the "in crowd" route.

> About > Help > Got a Python problem or question? > Python Bug Tracker

And this is the "it's not my fault, it's yours" route.

> Both ways are kind of misleading (or non-intuitive) as you do not want
> to engage in Core Development to report a bug. Lots of good projects
> have a prominent link on their website (start page) how to report
> bugs. Python hasn't.

Indeed. The big problem with python.org in its current form is the
navigation, as I have complained about already. Unfortunately, I never
did get round to tooling up with the python.org toolchain because it
involved installing large numbers of packages, including some directly
from a Subversion repository, along with a few which actually
conflicted with others on my system, and I wasn't about to start
either uninstalling lots of things or messing around with environment
settings just to throw it all together and make the tentative edits
necessary to reduce the above "beware of the leopard" syndrome. The
"last straw" was picking through Twisted 2 installation details for
the benefit of a solution which apparently doesn't even use Twisted in
any reasonable sense.

Meanwhile, the Wiki (that's Documentation Wiki) just keeps getting
better. A "best of" edition of that particular resource (with simple
approval mechanisms) might prove more accessible and more likely to
get improved by the community.

Paul

P.S. I still respect the work done on the python.org visuals - I think
they have mostly stood the test of time. And I don't envy anyone who
had the task of going through python.org and reorganising all the
pieces of content to still link to each other properly and look the
same as everything else.

May 20 '07 #14

* Paul Boddie (20 May 2007 08:36:18 -0700)
> Thorsten Kampe wrote:
>> Don't be silly. Where would you look for the URL to report bugs? On
>> the website of the project, of course. It's not that easy to find on
>> python.org (although not as hard as Martin says):
>>
>> Core Development > Links for Developers > Bug Manager or
>
> This is the "in crowd" route.

Hehe

>> About > Help > Got a Python problem or question? > Python Bug Tracker
>
> And this is the "it's not my fault, it's yours" route.

Hehe, it's /never/ my fault, actually ;)

>> Both ways are kind of misleading (or non-intuitive) as you do not want
>> to engage in Core Development to report a bug. Lots of good projects
>> have a prominent link on their website (start page) how to report
>> bugs. Python hasn't.
>
> Indeed. The big problem with python.org in its current form is the
> navigation, as I have complained about already.

Yeah, probably. But for me there is no doubt that the website's new
appearance looks much better, cleaner, more professional and visually
pleasing than the previous one.

Thorsten
May 20 '07 #15

On Sat, 19 May 2007 17:00:01 +0000 (UTC), Martin Maney <ma***@two14.net> wrote:
> ....
> posted here and had so much fun.

Apparently I don't speak for most readers here, but I had fun too.

Smart, reasonable people write braindead code all the time. I think
it's fine when people whine about that once in a while, as long as
it's done in an entertaining manner. Less constructive than writing,
testing and submitting a patch, but more constructive than slapping
your monitor, cursing and doing nothing at all about it.

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.dyndns.org R'lyeh wgah'nagl fhtagn!
May 21 '07 #16
