I'm using the zipfile library to read a zip file in Windows, and it
seems to be adding too many newlines to extracted files. I've found
that for extracted text-encoded files, removing all instances of '\r'
in the extracted file seems to fix the problem, but I can't find an
easy solution for binary files.
The code I'm using is something like:
from zipfile import ZipFile
z = ZipFile(open('zippedfile.zip'))
extractedfile = z.read('filename_in_zippedfile')
I'm using Python version 2.5. Has anyone else had this problem
before, or know how to fix it?
Thanks,
Neil
On Mar 10, 8:31 pm, "Neil Crighton" <neilcrigh...@gmail.com> wrote:
> I'm using the zipfile library to read a zip file in Windows, and it
> seems to be adding too many newlines to extracted files. I've found
> that for extracted text-encoded files, removing all instances of '\r'
> in the extracted file seems to fix the problem, but I can't find an
> easy solution for binary files.
> The code I'm using is something like:
> from zipfile import ZipFile
> z = ZipFile(open('zippedfile.zip'))
> extractedfile = z.read('filename_in_zippedfile')
"Too many newlines" is fixed by removing all instances of '\r'. What
are you calling a newline? '\r'??
How do you know there are too many thingies? What operating system
were the original files created on?
When you do:
# using a more meaningful name :-)
extractedfilecontents = z.read('filename_in_zippedfile')
then:
print repr(extractedfilecontents)
what do you see at the end of what you regard as each line:
(1) \n
(2) \r\n
(3) \r
(4) something else
?
Do you fiddle with extractedfilecontents (other than trying to fix it)
before writing it to the file?
When you write out a text file,
do you do:
open('foo.txt', 'w').write(extractedfilecontents)
or
open('foo.txt', 'wb').write(extractedfilecontents)
?
When you write out a binary file,
do you do:
open('foo.txt', 'w').write(extractedfilecontents)
or
open('foo.txt', 'wb').write(extractedfilecontents)
?
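The line-ending check asked for above can be sketched as follows. This is only an illustration, not code from the thread: it assumes Python 3 (the thread itself used 2.5), and the helper name count_line_endings and the sample bytes are made up for the example.

```python
def count_line_endings(data):
    """Count CRLF pairs, bare LFs and bare CRs in a bytes object."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


# Hypothetical sample; in practice pass in the result of z.read(...).
sample = b"one\r\ntwo\nthree\rfour\r\r\n"
print(repr(sample))
print(count_line_endings(sample))
```

A '\r\r\n' sequence shows up as one bare CR plus one CRLF, which is the signature of Windows line endings that were translated a second time.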
"Neil Crighton" <ne**********@gmail.comwrote:
> I'm using the zipfile library to read a zip file in Windows, and it
> seems to be adding too many newlines to extracted files. I've found
> that for extracted text-encoded files, removing all instances of '\r'
> in the extracted file seems to fix the problem, but I can't find an
> easy solution for binary files.
> The code I'm using is something like:
> from zipfile import ZipFile
> z = ZipFile(open('zippedfile.zip'))
> extractedfile = z.read('filename_in_zippedfile')
> I'm using Python version 2.5. Has anyone else had this problem
> before, or know how to fix it?
> Thanks,
Zip files aren't text. Try opening the zipfile file in binary mode:
open('zippedfile.zip', 'rb')
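A minimal sketch of the two safe ways to open an archive, assuming Python 3. The archive and member names (example.zip, hello.txt) are made up, and the archive is built in a temporary directory only so the example runs on its own.

```python
import os
import tempfile
import zipfile

# Build a small throwaway archive to read back (names are hypothetical).
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "example.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("hello.txt", b"line one\r\nline two\r\n")

# Option 1: pass ZipFile a file object that was opened in binary mode.
with open(zip_path, "rb") as fh:
    with zipfile.ZipFile(fh) as zf:
        via_file_object = zf.read("hello.txt")

# Option 2: pass the path and let ZipFile open the file itself.
with zipfile.ZipFile(zip_path) as zf:
    via_path = zf.read("hello.txt")

print(repr(via_file_object))
print(via_file_object == via_path)
```

Passing the path is the simpler choice when the archive lives on disk, since there is no mode argument to get wrong.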
On Mar 10, 11:14 pm, Duncan Booth <duncan.bo...@invalid.invalid>
wrote:
> Zip files aren't text. Try opening the zipfile file in binary mode:
> open('zippedfile.zip', 'rb')
Good pickup, but that indicates that the OP may have *TWO* problems,
the first of which is not posting the code that was actually executed.
If the OP actually executed the code that he posted, it is highly
likely to have died in a hole long before it got to the z.read()
stage, e.g.
>>> import zipfile
>>> z = zipfile.ZipFile(open('foo.zip'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python25\lib\zipfile.py", line 346, in __init__
    self._GetContents()
  File "C:\python25\lib\zipfile.py", line 366, in _GetContents
    self._RealGetContents()
  File "C:\python25\lib\zipfile.py", line 404, in _RealGetContents
    centdir = struct.unpack(structCentralDir, centdir)
  File "C:\python25\lib\struct.py", line 87, in unpack
    return o.unpack(s)
struct.error: unpack requires a string argument of length 46
>>> z = zipfile.ZipFile(open('foo.zip', 'rb')) # OK
>>> z = zipfile.ZipFile('foo.zip', 'r') # OK
If it somehow made it through the open stage, it surely would have
blown up at the read stage, when trying to decompress a contained
file.
Cheers,
John
Sorry my initial post was muddled. Let me try again.
I've got a zipped archive that I can extract files from with my
standard archive unzipping program, 7-zip. I'd like to extract the
files in python via the zipfile module. However, when I extract the
file from the archive with ZipFile.read(), it isn't the same as the
7-zip-extracted file. For text files, the zipfile-extracted version has
'\r\n' everywhere the 7-zip-extracted file only has '\n'. I haven't
tried comparing binary files via the two extraction methods yet.
Regarding the code I posted: I was writing it from memory and made a
mistake. I didn't use:
z = zipfile.ZipFile(open('foo.zip', 'r'))
I used this:
z = zipfile.ZipFile('foo.zip')
But Duncan's comment was useful, as I generally only ever work with
text files, and I didn't realise you have to use 'rb' or 'wb' options
when reading and writing binary files.
To answer John's questions - I was calling '\r' a newline. I should
have said carriage return. I'm not sure what operating system the
original zip file was created on. I didn't fiddle with the extracted
file contents, other than replacing '\r' with ''. I wrote out all the
files with open('outputfile','w'). It seems that I should have been
using 'wb' when writing out the binary files.
Thanks for the quick responses - any ideas why the zipfile-extracted
files and 7-zip-extracted files are different?
I think I've worked it out after reading the 'Binary mode for files'
section of http://zephyrfalcon.org/labs/python_pitfalls.html
zipfile extracts a file as a binary series of characters, and I was
writing out this binary data as a text file with open('foo','w').
In text mode Python converts each '\n' it writes to the platform's
line ending ('\n' on Unix, '\r\n' on Windows, '\r' on classic Mac OS).
So on Windows a '\r\n' in the binary data comes out as '\r\r\n' in the
text file.
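The translation described above can be reproduced on any platform with a short sketch, assuming Python 3. Passing newline='\r\n' to open() is used here to imitate Windows text mode; the sample bytes and the file name text_mode.txt are made up.

```python
import os
import tempfile

# Bytes as zipfile's read() might return them for a Windows-style text file.
data = b"first\r\nsecond\r\n"

path = os.path.join(tempfile.mkdtemp(), "text_mode.txt")

# newline="\r\n" makes Python 3 turn every "\n" it writes into "\r\n",
# which is what text mode does on Windows. The "\r" already in the data
# is left alone, so each line ending gains an extra carriage return.
with open(path, "w", newline="\r\n", encoding="latin-1") as fh:
    fh.write(data.decode("latin-1"))

# Read the file back in binary mode to see exactly what landed on disk.
with open(path, "rb") as fh:
    written = fh.read()

print(repr(data))
print(repr(written))
```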
The upshot of this is that writing out the zipfile-extracted files
with open('foo','wb') instead of open('foo','w') solves my problem.
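A minimal sketch of the fix, assuming Python 3. The archive, the member name member.dat and its contents are made up so the example is self-contained; the point is only that writing with 'wb' leaves the extracted bytes untouched.

```python
import os
import tempfile
import zipfile

# Build a throwaway archive holding mixed text and binary bytes.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "example.zip")
original = b"first\r\nsecond\r\n\x00\x01binary tail"
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("member.dat", original)

# Extract the member; read() returns the stored bytes unchanged.
with zipfile.ZipFile(zip_path) as zf:
    contents = zf.read("member.dat")

# Write in binary mode so no newline translation is applied.
out_path = os.path.join(workdir, "member.dat")
with open(out_path, "wb") as out:
    out.write(contents)

# Confirm the file on disk matches what went into the archive.
with open(out_path, "rb") as fh:
    round_tripped = fh.read()

print(round_tripped == original)
```

Using 'wb' for every extracted member, text or not, avoids having to guess which files are binary.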