zipfile stupidly broken


To quote from zipfile.py (2.4 library):

# Search the last END_BLOCK bytes of the file for the record signature.
# The comment is appended to the ZIP file and has a 16 bit length.
# So the comment may be up to 64K long. We limit the search for the
# signature to a few Kbytes at the end of the file for efficiency.
# also, the signature must not appear in the comment.
END_BLOCK = min(filesize, 1024 * 4)

So the author knows that there's a hard limit of 64K on the comment
size, but feels it's more important to fail a little more quickly when
fed something that's not a zipfile - or a perfectly legitimate zipfile
that doesn't observe his ad-hoc 4K limitation. I don't have time to
find a gentler way to say it because I have to find a work around for
this arbitrary limit (1): this is stupid.
(1) the leading candidate is to copy and paste the whole frigging
zipfile module so I can patch it, but that's even uglier than it is
stupid. "This battery is pining for the fjords!"
Normally I despise being CC'd on a reply to list or group traffic, but
in this case it's probably necessary, as I haven't had time to keep up
with this place for several years. :-/
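
To reproduce the condition described above, here is a minimal sketch in current Python 3 (not the 2.4 module quoted in this post). It builds a small archive in memory with a comment longer than 4 KB and measures how far from the end of the file the end-of-central-directory signature lands. The helper name `eocd_distance_from_end` is made up for this illustration.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"   # end-of-central-directory record signature
OLD_WINDOW = 1024 * 4            # the END_BLOCK limit quoted above


def eocd_distance_from_end(comment):
    """Build a one-member archive carrying `comment` and return how many
    bytes before end-of-file the last EOCD signature starts."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
        zf.comment = comment          # must be bytes in Python 3
    data = buf.getvalue()
    return len(data) - data.rfind(EOCD_SIGNATURE)


for comment in (b"", b"x" * 5000):
    distance = eocd_distance_from_end(comment)
    print(len(comment), distance, distance <= OLD_WINDOW)
```

The record itself is 22 bytes, so the signature sits 22 bytes plus the comment length from the end of the file. Any comment longer than 4096 - 22 bytes pushes it outside a 4 KB search window.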

--
To be alive, is that not to be
again and again surprised? -- Nicholas van Rijn
May 16 '07 #1
On Wed, 16 May 2007 12:18:35 -0300, Martin Maney <ma***@two14.net> wrote:
> So the author knows that there's a hard limit of 64K on the comment
> size, but feels it's more important to fail a little more quickly when
> fed something that's not a zipfile - or a perfectly legitimate zipfile
> that doesn't observe his ad-hoc 4K limitation. I don't have time to
> find a gentler way to say it because I have to find a work around for
> this arbitrary limit (1): this is stupid.
This is not a good place for reporting bugs - use
http://sourceforge.net/bugs/?group_id=5470

--
Gabriel Genellina

May 16 '07 #2
Martin Maney wrote:
> To quote from zipfile.py (2.4 library):
>
> # Search the last END_BLOCK bytes of the file for the record signature.
> # The comment is appended to the ZIP file and has a 16 bit length.
> # So the comment may be up to 64K long. We limit the search for the
> # signature to a few Kbytes at the end of the file for efficiency.
> # also, the signature must not appear in the comment.
> END_BLOCK = min(filesize, 1024 * 4)
>
> So the author knows that there's a hard limit of 64K on the comment
> size, but feels it's more important to fail a little more quickly when
> fed something that's not a zipfile - or a perfectly legitimate zipfile
> that doesn't observe his ad-hoc 4K limitation. I don't have time to
> find a gentler way to say it because I have to find a work around for
> this arbitrary limit (1): this is stupid.
> (1) the leading candidate is to copy and paste the whole frigging
> zipfile module so I can patch it, but that's even uglier than it is
> stupid. "This battery is pining for the fjords!"
> Normally I despise being CC'd on a reply to list or group traffic, but
> in this case it's probably necessary, as I haven't had time to keep up
> with this place for several years. :-/
Are you serious? A zipfile with a comment larger than 4 Kbytes. I've never
encountered such a beast.

As with any open source product it is much better to roll up your sleeves
and pitch in to fix a problem than to rail about "how it is stupidly
broken". You are welcome to submit a patch or at the very least a good
description of the problem and possible solutions. If you have gotten a
lot of value out of Python, you might consider this "giving back". You
haven't paid anything for the value it has provided.

-Larry
May 16 '07 #3
On May 17, 5:38 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> This is not a good place for reporting bugs - use http://sourceforge.net/bugs/?group_id=5470
I disagree. Given that most suspected bugs aren't, new users
especially would be wise to post their "bugs" here before filing a bug
report.

May 17 '07 #4
On Wed, 16 May 2007 23:14:38 -0300, Asun Friere <af*****@yahoo.co.uk> wrote:
> On May 17, 5:38 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
>> This is not a good place for reporting bugs - use
>> http://sourceforge.net/bugs/?group_id=5470
>
> I disagree. Given that most suspected bugs aren't, new users
> especially would be wise to post their "bugs" here before filing a bug
> report.
I self-censored my first replies. This was the most neutral answer I
could think of.
The original post was not a typical bug report.

--
Gabriel Genellina

May 17 '07 #5
Martin Maney <ma***@two14.net> wrote:
> To quote from zipfile.py (2.4 library):
>
> # Search the last END_BLOCK bytes of the file for the record signature.
> # The comment is appended to the ZIP file and has a 16 bit length.
> # So the comment may be up to 64K long. We limit the search for the
> # signature to a few Kbytes at the end of the file for efficiency.
> # also, the signature must not appear in the comment.
> END_BLOCK = min(filesize, 1024 * 4)
>
> So the author knows that there's a hard limit of 64K on the comment
> size, but feels it's more important to fail a little more quickly when
> fed something that's not a zipfile - or a perfectly legitimate zipfile
> that doesn't observe his ad-hoc 4K limitation. I don't have time to
> find a gentler way to say it because I have to find a work around for
> this arbitrary limit (1): this is stupid.
To search 64k for all zip files would slow down the opening of all zip
files whereas most zipfiles don't have comments.

The code in _EndRecData should probably read 1k first, and then retry
with 64k.
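
A rough sketch of that "small read first, then retry" idea, written for current Python 3 and not taken from zipfile.py. The function name `find_eocd` and the `first_try` parameter are hypothetical, and the sketch only locates the signature without parsing the record.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_SIZE = 22                  # fixed part of the record, without comment
MAX_COMMENT = (1 << 16) - 1     # largest value of the 16-bit length field


def find_eocd(fp, first_try=1024):
    """Return the file offset of the end-of-central-directory signature,
    or None if it is not found in the last EOCD_SIZE + MAX_COMMENT bytes.

    A small tail is read first; the largest possible tail is read only
    when the signature was not in the small one.
    """
    fp.seek(0, 2)
    filesize = fp.tell()
    for window in (first_try, EOCD_SIZE + MAX_COMMENT):
        window = min(filesize, window)
        fp.seek(filesize - window)
        tail = fp.read(window)
        pos = tail.rfind(EOCD_SIGNATURE)
        if pos >= 0:
            return filesize - window + pos
    return None


# Usage: an in-memory archive whose comment defeats the small first read.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
    zf.comment = b"x" * 5000
print(find_eocd(buf) == len(buf.getvalue()) - (EOCD_SIZE + 5000))
```

Archives without a comment are resolved by the first, small read; only archives with a long comment (and files that are not archives at all) pay for the second one. The sketch does not guard against the signature bytes appearing inside the comment itself.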
> (1) the leading candidate is to copy and paste the whole frigging
> zipfile module so I can patch it, but that's even uglier than it is
> stupid. "This battery is pining for the fjords!"
You don't need to do that, you can just "monkey patch" the _EndRecData
function.

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
May 18 '07 #6
Nick Craig-Wood <ni**@craig-wood.com> wrote:
> To search 64k for all zip files would slow down the opening of all zip
> files whereas most zipfiles don't have comments.
No, actually it would only slow down for files which do have comments,
assuming I understand the code correctly. IME most zipfiles don't have
any comments at all, and would be unaffected. To be honest, if I had
even known that zipfiles could have comments before I ran into this,
I'd long since forgotten it.
> You don't need to do that, you can just "monkey patch" the _EndRecData
> function.
For a quick & dirty test, sure. If I were certain I'd only ever use
this on one machine for a limited time (viz, no system upgrades that
replace zipfile.py) it might suffice. But that doesn't generalize
worth a damn.

--
Education makes people easy to lead, but difficult to drive;
easy to govern, but impossible to enslave. -- Henry Peter Brougham
May 19 '07 #7
Larry Bates <la*********@websafe.com> bristled:
> Are you serious? A zipfile with a comment larger than 4 Kbytes. I've never
> encountered such a beast.
If I hadn't run into one I would never have had a clue that Python's
zipfile module had this silly bug.
> As with any open source product it is much better to roll up your sleeves
> and pitch in to fix a problem than to rail about "how it is stupidly
> broken". You are welcome to submit a patch or at the very least a good
> description of the problem and possible solutions. If you have gotten a
> lot of value out of Python, you might consider this "giving back". You
> haven't paid anything for the value it has provided.
Ah yes, the old "well, if you found it you should fix it" meme -
another reason I found it pretty easy to stop reading this group. It's
as stupid a position as it ever was (and FWIW I don't believe I've ever
seen any of the real Python developers mouth this crap).

Now, I have learned somewhat more than I knew (or ever wanted to know)
about zipfiles since I smacked headfirst into this bug, and I've
changed the subject line to reflect my current understanding. :-/ Back
then it had already occurred to me that *just* changing the size of the
step back seemed an incomplete fix: after all, that leaves you scanning
through random binary glop looking for the signature. With the
signature being four bytes, okay, it will *nearly* always work (just as
the existing 4K scan does), but... well, from what I've read in the
format specs that's about as good as it gets. The alternative, some
sort of backwards scan, would avoid the binary glop but has much the
same problem, in principle, with finding the signature embedded in the
archive comment. Even worse, arguably, since that comment is
apparently entirely up to the archive creator, so if there's a way to
use a fake central directory for nefarious purposes, that would make it
trivial to do. Which is the point where I decided that the file format
itself is broken... (oh, and then I came across something from the
info-zip crew that said much the same thing, though they didn't mention
this particular design, uhm, shortcoming.)

So I guess that perhaps the stupidly obvious fix:

- END_BLOCK = min(filesize, 1024 * 4)
+ END_BLOCK = min(filesize, 1024 * 64 + 22)

is after all about the best that can be done. (the lack of the
size-of-End-Of-Central-Directory-record in the existing code isn't a
separate bug, but if we're going to pretend we accommodate all valid
zipfiles it wouldn't do to overlook it)
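
For reference, the arithmetic behind that window: the comment length is a 16-bit field, so the longest comment is 65535 bytes, and with the 22-byte record the signature can start at most 65557 bytes before the end of the file (`1024 * 64 + 22` is 65558, one byte more than strictly needed, which is harmless). The sketch below, in current Python 3 and with the hypothetical name `locate_eocd`, searches that window backwards and accepts a candidate only when its comment-length field accounts for every remaining byte. This is an illustration of the idea, not the zipfile module's code.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_SIZE = 22                       # fixed part of the record
MAX_COMMENT = 0xFFFF                 # largest 16-bit comment length
MAX_TAIL = EOCD_SIZE + MAX_COMMENT   # 65557 bytes


def locate_eocd(data):
    """Return the offset of the EOCD record in `data` (bytes), or None.

    Candidates are tried from the end of the data backwards. One is
    accepted only if the record fits and its comment-length field says
    the comment runs exactly to the end of the data.
    """
    lowest = max(0, len(data) - MAX_TAIL)
    pos = data.rfind(EOCD_SIGNATURE, lowest)
    while pos != -1:
        if pos + EOCD_SIZE <= len(data):
            # The 2-byte little-endian comment length is the last field.
            (comment_len,) = struct.unpack_from("<H", data, pos + 20)
            if pos + EOCD_SIZE + comment_len == len(data):
                return pos
        # Look for an earlier occurrence, strictly before this one.
        pos = data.rfind(EOCD_SIGNATURE, lowest, pos + len(EOCD_SIGNATURE) - 1)
    return None
```

The length check rejects a signature that merely happens to occur in the comment text, but it does not close the hole discussed above: a comment crafted to contain a complete fake record with a consistent length field would still be found first.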

So now you may imagine that your rudeness has had the result you
intended after all, and I guess it has, though at a cost - well, you
probably never cared what I thought about you anyway.

BTW, thanks for the pointer someone else gave to the proper place for
posting bugs. I'd had the silly idea that I would be able to find that
easily at www.python.org, but if I had then I'd not have posted here
and had so much fun.

--
The most effective way to get information from usenet is not to ask
a question; it is to post incorrect information. -- Aahz's Law

Apparently denigrating the bug reporter can sometimes result in a
patch, too, but I don't think that's in the same spirit.
May 19 '07 #8
On Sat, 19 May 2007 14:00:01 -0300, Martin Maney <ma***@two14.net> wrote:
> BTW, thanks for the pointer someone else gave to the proper place for
> posting bugs. I'd had the silly idea that I would be able to find that
> easily at www.python.org, but if I had then I'd not have posted here
> and had so much fun.
My microwave oven doesn't work very well, it's rather new and I want it
fixed. I take the manual, go to the last pages, and find how to contact
the factory.
A module in the Python Standard Library has a bug. I take the Python
Library Reference manual, go to the last pages (Appendix B), and find how
to properly report a bug.

--
Gabriel Genellina

May 19 '07 #9
Gabriel Genellina <ga*******@yahoo.com.ar> wrote:
> A module in the Python Standard Library has a bug. I take the Python
> Library Reference manual, go to the last pages (Appendix B), and find how
> to properly report a bug.
Sure, the information is *somewhere*. Silly me, I expected it to be
readily findable from the project's home page, as it usually is for open
source projects (and I thought I remembered it being so in the past -
y'know, before the web site got prettified and dumbed down). Is there
any good reason not to have it in lots of likely places, aside from the
opportunity to jeer at those who didn't look in the right one while
spending their time trying to report a problem?

Never mind, rhetorical question.

--
There is overwhelming evidence that the higher the level of self-esteem,
the more likely one will be to treat others with respect, kindness, and
generosity. -- Nathaniel Branden
May 20 '07 #10
