zipfile stupidly broken


To quote from zipfile.py (2.4 library):

# Search the last END_BLOCK bytes of the file for the record signature.
# The comment is appended to the ZIP file and has a 16 bit length.
# So the comment may be up to 64K long. We limit the search for the
# signature to a few Kbytes at the end of the file for efficiency.
# also, the signature must not appear in the comment.
END_BLOCK = min(filesize, 1024 * 4)

So the author knows that there's a hard limit of 64K on the comment
size, but feels it's more important to fail a little more quickly when
fed something that's not a zipfile - or a perfectly legitimate zipfile
that doesn't observe his ad-hoc 4K limitation. I don't have time to
find a gentler way to say it because I have to find a work around for
this arbitrary limit (1): this is stupid.
(1) the leading candidate is to copy and paste the whole frigging
zipfile module so I can patch it, but that's even uglier than it is
stupid. "This battery is pining for the fjords!"
Normally I despise being CC'd on a reply to list or group traffic, but
in this case it's probably necessary, as I haven't had time to keep up
with this place for several years. :-/

--
To be alive, is that not to be
again and again surprised? -- Nicholas van Rijn
May 16 '07 #1
On Wed, 16 May 2007 12:18:35 -0300, Martin Maney <ma***@two14.net>
wrote:
> So the author knows that there's a hard limit of 64K on the comment
> size, but feels it's more important to fail a little more quickly when
> fed something that's not a zipfile - or a perfectly legitimate zipfile
> that doesn't observe his ad-hoc 4K limitation. I don't have time to
> find a gentler way to say it because I have to find a work around for
> this arbitrary limit (1): this is stupid.

This is not a good place for reporting bugs - use
http://sourceforge.net/bugs/?group_id=5470

--
Gabriel Genellina

May 16 '07 #2
Martin Maney wrote:
> To quote from zipfile.py (2.4 library):
>
> # Search the last END_BLOCK bytes of the file for the record signature.
> # The comment is appended to the ZIP file and has a 16 bit length.
> # So the comment may be up to 64K long. We limit the search for the
> # signature to a few Kbytes at the end of the file for efficiency.
> # also, the signature must not appear in the comment.
> END_BLOCK = min(filesize, 1024 * 4)
>
> So the author knows that there's a hard limit of 64K on the comment
> size, but feels it's more important to fail a little more quickly when
> fed something that's not a zipfile - or a perfectly legitimate zipfile
> that doesn't observe his ad-hoc 4K limitation. I don't have time to
> find a gentler way to say it because I have to find a work around for
> this arbitrary limit (1): this is stupid.
> (1) the leading candidate is to copy and paste the whole frigging
> zipfile module so I can patch it, but that's even uglier than it is
> stupid. "This battery is pining for the fjords!"
> Normally I despise being CC'd on a reply to list or group traffic, but
> in this case it's probably necessary, as I haven't had time to keep up
> with this place for several years. :-/

Are you serious? A zipfile with a comment larger than 4 Kbytes? I've never
encountered such a beast.

As with any open source product it is much better to roll up your sleeves
and pitch in to fix a problem than to rail about "how it is stupidly
broken". You are welcome to submit a patch or at the very least a good
description of the problem and possible solutions. If you have gotten a
lot of value out of Python, you might consider this "giving back". You
haven't paid anything for the value it has provided.

-Larry
May 16 '07 #3
On May 17, 5:38 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> This is not a good place for reporting bugs - use http://sourceforge.net/bugs/?group_id=5470

I disagree. Given that most suspected bugs aren't, new users
especially would be wise to post their "bugs" here before filing a bug
report.

May 17 '07 #4
On Wed, 16 May 2007 23:14:38 -0300, Asun Friere <af*****@yahoo.co.uk>
wrote:
> On May 17, 5:38 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
> wrote:
>> This is not a good place for reporting bugs - use
>> http://sourceforge.net/bugs/?group_id=5470
>
> I disagree. Given that most suspected bugs aren't, new users
> especially would be wise to post their "bugs" here before filing a bug
> report.

My first replies were self-censored. This was the most neutral answer I
could think of.
The original post was not a typical bug report.

--
Gabriel Genellina

May 17 '07 #5
Martin Maney <ma***@two14.net> wrote:
> To quote from zipfile.py (2.4 library):
>
> # Search the last END_BLOCK bytes of the file for the record signature.
> # The comment is appended to the ZIP file and has a 16 bit length.
> # So the comment may be up to 64K long. We limit the search for the
> # signature to a few Kbytes at the end of the file for efficiency.
> # also, the signature must not appear in the comment.
> END_BLOCK = min(filesize, 1024 * 4)
>
> So the author knows that there's a hard limit of 64K on the comment
> size, but feels it's more important to fail a little more quickly when
> fed something that's not a zipfile - or a perfectly legitimate zipfile
> that doesn't observe his ad-hoc 4K limitation. I don't have time to
> find a gentler way to say it because I have to find a work around for
> this arbitrary limit (1): this is stupid.

To search 64k for all zip files would slow down the opening of all zip
files whereas most zipfiles don't have comments.

The code in _EndRecData should probably read 1k first, and then retry
with 64k.

> (1) the leading candidate is to copy and paste the whole frigging
> zipfile module so I can patch it, but that's even uglier than it is
> stupid. "This battery is pining for the fjords!"

You don't need to do that, you can just "monkey patch" the _EndRecData
function.

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
May 18 '07 #6
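
The "read 1k first, and then retry with 64k" idea from the post above can be sketched roughly as follows. This is only an illustration, not the code in zipfile.py: the function name `find_eocd` and the `first_try` parameter are made up for the example, and it simply takes the last occurrence of the signature without validating the record.

```python
import io

EOCD_SIGNATURE = b"PK\x05\x06"   # "End of Central Directory" record signature
EOCD_SIZE = 22                   # fixed part of the record, without the comment
MAX_COMMENT = 0xFFFF             # the comment length is a 16-bit field


def find_eocd(fp, first_try=1024):
    """Return the offset of the last EOCD signature in fp, or None.

    Hypothetical helper: search a small tail first, then retry with the
    largest tail the format allows (record size plus maximum comment).
    """
    fp.seek(0, io.SEEK_END)
    filesize = fp.tell()
    for window in (first_try, EOCD_SIZE + MAX_COMMENT):
        window = min(filesize, window)
        fp.seek(filesize - window)
        pos = fp.read(window).rfind(EOCD_SIGNATURE)
        if pos >= 0:
            # Convert the position inside the tail to a file offset.
            return filesize - window + pos
    return None
```

Archives without a comment are found in the first, cheap pass; only files that fail it pay for the larger read.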
Nick Craig-Wood <ni**@craig-wood.com> wrote:
> To search 64k for all zip files would slow down the opening of all zip
> files whereas most zipfiles don't have comments.

No, actually it would only slow down for files which do have comments,
assuming I understand the code correctly. IME most zipfiles don't have
any comments at all, and would be unaffected. To be honest, if I had
even known that zipfiles could have comments before I ran into this,
I'd long since forgotten it.

> You don't need to do that, you can just "monkey patch" the _EndRecData
> function.

For a quick & dirty test, sure. If I were certain I'd only ever use
this on one machine for a limited time (viz, no system upgrades that
replace zipfile.py) it might suffice. But that doesn't generalize
worth a damn.

--
Education makes people easy to lead, but difficult to drive;
easy to govern, but impossible to enslave. -- Henry Peter Brougham
May 19 '07 #7
Larry Bates <la*********@websafe.com> bristled:
> Are you serious? A zipfile with a comment larger than 4 Kbytes? I've never
> encountered such a beast.

If I hadn't run into one I would never have had a clue that Python's
zipfile module had this silly bug.

> As with any open source product it is much better to roll up your sleeves
> and pitch in to fix a problem than to rail about "how it is stupidly
> broken". You are welcome to submit a patch or at the very least a good
> description of the problem and possible solutions. If you have gotten a
> lot of value out of Python, you might consider this "giving back". You
> haven't paid anything for the value it has provided.

Ah yes, the old "well, if you found it you should fix it" meme -
another reason I found it pretty easy to stop reading this group. It's
as stupid a position as it ever was (and FWIW I don't believe I've ever
seen any of the real Python developers mouth this crap).

Now, I have learned somewhat more than I knew (or ever wanted to know)
about zipfiles since I smacked headfirst into this bug, and I've
changed the subject line to reflect my current understanding. :-/ Back
then it had already occurred to me that *just* changing the size of the
step back seemed an incomplete fix: after all, that leaves you scanning
through random binary glop looking for the signature. With the
signature being four bytes, okay, it will *nearly* always work (just as
the existing 4K scan does), but... well, from what I've read in the
format specs that's about as good as it gets. The alternative, some
sort of backwards scan, would avoid the binary glop but has much the
same problem, in principle, with finding the signature embedded in the
archive comment. Even worse, arguably, since that comment is
apparently entirely up to the archive creator, so if there's a way to
use a fake central directory for nefarious purposes, that would make it
trivial to do. Which is the point where I decided that the file format
itself is broken... (oh, and then I came across something from the
info-zip crew that said much the same thing, though they didn't mention
this particular design, uhm, shortcoming.)

So I guess that perhaps the stupidly obvious fix:

- END_BLOCK = min(filesize, 1024 * 4)
+ END_BLOCK = min(filesize, 1024 * 64 + 22)

is after all about the best that can be done. (the lack of the
size-of-End-Of-Central-Directory-record in the existing code isn't a
separate bug, but if we're going to pretend we accommodate all valid
zipfiles it wouldn't do to overlook it)

So now you may imagine that your rudeness has had the result you
intended after all, and I guess it has, though at a cost - well, you
probably never cared what I thought about you anyway.

BTW, thanks for the pointer someone else gave to the proper place for
posting bugs. I'd had the silly idea that I would be able to find that
easily at www.python.org, but if I had then I'd not have posted here
and had so much fun.

--
The most effective way to get information from usenet is not to ask
a question; it is to post incorrect information. -- Aahz's Law

Apparently denigrating the bug reporter can sometimes result in a
patch, too, but I don't think that's in the same spirit.
May 19 '07 #8
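
The concern raised above, that a signature embedded in the archive comment can pass for the real record, can be illustrated with a small sketch. It is not taken from zipfile.py: `make_eocd` and `eocd_candidates` are hypothetical helpers, and the only validation applied is that the comment length stored in a candidate record must end exactly at the end of the file.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
# signature, four 16-bit fields, two 32-bit fields, 16-bit comment length
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)   # 22 bytes


def make_eocd(comment, cd_offset=0):
    """Build an EOCD record for an empty archive, followed by its comment."""
    return struct.pack(EOCD_FORMAT, EOCD_SIGNATURE, 0, 0, 0, 0, 0,
                       cd_offset, len(comment)) + comment


def eocd_candidates(tail):
    """Offsets in tail that look like an EOCD record ending exactly at EOF."""
    found = []
    pos = tail.find(EOCD_SIGNATURE)
    while pos >= 0:
        if pos + EOCD_SIZE <= len(tail):
            comment_len = struct.unpack_from(EOCD_FORMAT, tail, pos)[-1]
            if pos + EOCD_SIZE + comment_len == len(tail):
                found.append(pos)
        pos = tail.find(EOCD_SIGNATURE, pos + 1)
    return found


# A comment that itself ends with a well-formed (fake) EOCD record.
fake = make_eocd(b"", cd_offset=1234)
archive = make_eocd(b"innocent text " + fake)
print(eocd_candidates(archive))
```

Both the genuine record and the one hidden in the comment satisfy the length check, so a reader has to pick one by convention; the check narrows the search but does not remove the ambiguity.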
On Sat, 19 May 2007 14:00:01 -0300, Martin Maney <ma***@two14.net>
wrote:
> BTW, thanks for the pointer someone else gave to the proper place for
> posting bugs. I'd had the silly idea that I would be able to find that
> easily at www.python.org, but if I had then I'd not have posted here
> and had so much fun.

My microwave oven doesn't work very well, it's rather new and I want it
fixed. I take the manual, go to the last pages, and find how to contact
the factory.
A module in the Python Standard Library has a bug. I take the Python
Library Reference manual, go to the last pages (Appendix B), and find how
to properly report a bug.

--
Gabriel Genellina

May 19 '07 #9
Gabriel Genellina <ga*******@yahoo.com.ar> wrote:
> A module in the Python Standard Library has a bug. I take the Python
> Library Reference manual, go to the last pages (Appendix B), and find how
> to properly report a bug.

Sure, the information is *somewhere*. Silly me, I expected it to be
readily findable from the project's home page, as it usually is for open
source projects (and I thought I remembered it being so in the past -
y'know, before the web site got prettified and dumbed down). Is there
any good reason not to have it in lots of likely places, aside from the
opportunity to jeer at those who didn't look in the right one while
spending their time trying to report a problem?

Never mind, rhetorical question.

--
There is overwhelming evidence that the higher the level of self-esteem,
the more likely one will be to treat others with respect, kindness, and
generosity. -- Nathaniel Branden
May 20 '07 #10
Martin Maney <ma***@two14.net> wrote:
> Nick Craig-Wood <ni**@craig-wood.com> wrote:
>> You don't need to do that, you can just "monkey patch" the _EndRecData
>> function.
>
> For a quick & dirty test, sure. If I were certain I'd only ever use
> this on one machine for a limited time (viz, no system upgrades that
> replace zipfile.py) it might suffice. But that doesn't generalize
> worth a damn.

From the above I don't think you've understood the concept of monkey
patching - it is run time patching. You patch the zipfile module from
your code - no messing with the installed python needed. Eg something
like :-

------------------------------------------------------------
import zipfile

OriginalEndRecData = zipfile._EndRecData

def MyEndRecData(fpin):
    """
    Return data from the "End of Central Directory" record, or
    None.
    """
    # Try the builtin one first
    endrec = OriginalEndRecData(fpin)
    if endrec is None:
        # didn't work so do something extra
        # you fill this bit in!
        pass
    return endrec

zipfile._EndRecData = MyEndRecData

# Now use your run time patched zipfile module as normal
------------------------------------------------------------

It isn't ideal, but it certainly does generalise.

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
May 20 '07 #11
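
One possible variation on the run-time patch above is to scope it so that the library function is restored afterwards. The sketch below assumes only that `zipfile._EndRecData` exists as a module-level function, as in the thread; it is a private name and may change between Python versions. `patched_end_rec_data` and `with_fallback` are names invented for this example, and the fallback itself is still left unimplemented.

```python
import contextlib
import zipfile


@contextlib.contextmanager
def patched_end_rec_data(make_replacement):
    """Temporarily replace zipfile._EndRecData (a private, unsupported name)."""
    original = zipfile._EndRecData
    zipfile._EndRecData = make_replacement(original)
    try:
        yield
    finally:
        # Always restore the library function, even if the body raises.
        zipfile._EndRecData = original


def with_fallback(original):
    """Wrap the original lookup; the wider search is left as a placeholder."""
    def end_rec_data(fpin):
        endrec = original(fpin)
        if endrec is None:
            # A search over a larger tail of the file would go here.
            pass
        return endrec
    return end_rec_data
```

Code that opens archives would then run inside `with patched_end_rec_data(with_fallback):`, so the patch lives in the application and is undone when the block exits.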
* Gabriel Genellina (Wed, 16 May 2007 16:38:39 -0300)
> On Wed, 16 May 2007 12:18:35 -0300, Martin Maney <ma***@two14.net>
> wrote:
>> So the author knows that there's a hard limit of 64K on the comment
>> size, but feels it's more important to fail a little more quickly when
>> fed something that's not a zipfile - or a perfectly legitimate zipfile
>> that doesn't observe his ad-hoc 4K limitation. I don't have time to
>> find a gentler way to say it because I have to find a work around for
>> this arbitrary limit (1): this is stupid.
>
> This is not a good place for reporting bugs - use
> http://sourceforge.net/bugs/?group_id=5470

Actually it is:

,---
| Giving the shortage of reviewer time, invalid bug reports on tracker
| are a nuisance and a diversion from attending to valid reports and
| reviewing patches. That is why I encourage people to post here for
| community review.
`---

http://groups.google.com/group/comp....7b1906d3ef68a5
May 20 '07 #12
* Gabriel Genellina (Sat, 19 May 2007 18:09:06 -0300)
> On Sat, 19 May 2007 14:00:01 -0300, Martin Maney <ma***@two14.net>
> wrote:
>> BTW, thanks for the pointer someone else gave to the proper place for
>> posting bugs. I'd had the silly idea that I would be able to find that
>> easily at www.python.org, but if I had then I'd not have posted here
>> and had so much fun.
>
> My microwave oven doesn't work very well, it's rather new and I want it
> fixed. I take the manual, go to the last pages, and find how to contact
> the factory.
> A module in the Python Standard Library has a bug. I take the Python
> Library Reference manual, go to the last pages (Appendix B), and find how
> to properly report a bug.

Don't be silly. Where would you look for the URL to report bugs? On
the website of the project, of course. It's not that easy to find on
python.org (although not as hard as Martin says):

Core Development -> Links for Developers -> Bug Manager or

About -> Help -> Got a Python problem or question? -> Python Bug Tracker

Both ways are kind of misleading (or non-intuitive) as you do not want
to engage in Core Development to report a bug. Lots of good projects
have a prominent link on their website (start page) how to report
bugs. Python hasn't.

Thorsten
May 20 '07 #13
Thorsten Kampe wrote:
>
> Don't be silly. Where would you look for the URL to report bugs? On
> the website of the project, of course. It's not that easy to find on
> python.org (although not as hard as Martin says):
>
> Core Development -> Links for Developers -> Bug Manager or

This is the "in crowd" route.

> About -> Help -> Got a Python problem or question? -> Python Bug Tracker

And this is the "it's not my fault, it's yours" route.

> Both ways are kind of misleading (or non-intuitive) as you do not want
> to engage in Core Development to report a bug. Lots of good projects
> have a prominent link on their website (start page) how to report
> bugs. Python hasn't.

Indeed. The big problem with python.org in its current form is the
navigation, as I have complained about already. Unfortunately, I never
did get round to tooling up with the python.org toolchain because it
involved installing large numbers of packages, including some directly
from a Subversion repository, along with a few which actually
conflicted with others on my system, and I wasn't about to start
either uninstalling lots of things or messing around with environment
settings just to throw it all together and make the tentative edits
necessary to reduce the above "beware of the leopard" syndrome. The
"last straw" was picking through Twisted 2 installation details for
the benefit of a solution which apparently doesn't even use Twisted in
any reasonable sense.

Meanwhile, the Wiki (that's Documentation Wiki) just keeps getting
better. A "best of" edition of that particular resource (with simple
approval mechanisms) might prove more accessible and more likely to
get improved by the community.

Paul

P.S. I still respect the work done on the python.org visuals - I think
they have mostly stood the test of time. And I don't envy anyone who
had the task of going through python.org and reorganising all the
pieces of content to still link to each other properly and look the
same as everything else.

May 20 '07 #14
* Paul Boddie (20 May 2007 08:36:18 -0700)
> Thorsten Kampe wrote:
>> Don't be silly. Where would you look for the URL to report bugs? On
>> the website of the project, of course. It's not that easy to find on
>> python.org (although not as hard as Martin says):
>>
>> Core Development -> Links for Developers -> Bug Manager or
>
> This is the "in crowd" route.

Hehe

>> About -> Help -> Got a Python problem or question? -> Python Bug Tracker
>
> And this is the "it's not my fault, it's yours" route.

Hehe, it's /never/ my fault, actually ;)

>> Both ways are kind of misleading (or non-intuitive) as you do not want
>> to engage in Core Development to report a bug. Lots of good projects
>> have a prominent link on their website (start page) how to report
>> bugs. Python hasn't.
>
> Indeed. The big problem with python.org in its current form is the
> navigation, as I have complained about already.

Yeah, probably. But for me there is no doubt that the website's new
appearance looks much better, cleaner, more professional and visually
pleasing than the previous one.

Thorsten
May 20 '07 #15
On Sat, 19 May 2007 17:00:01 +0000 (UTC), Martin Maney <ma***@two14.net> wrote:
> ....
> posted here and had so much fun.

Apparently I don't speak for most readers here, but I had fun too.

Smart, reasonable people write braindead code all the time. I think
it's fine when people whine about that once in a while, as long as
it's done in an entertaining manner. Less constructive than writing,
testing and submitting a patch, but more constructive than slapping
your monitor, cursing and doing nothing at all about it.

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.dyndns.org R'lyeh wgah'nagl fhtagn!
May 21 '07 #16
