zipfile module: problems with filename having non ascii characters


I have a simple Python script that reads a directory and puts the files into a
zip file.

I'm using os.walk to get the directory contents,
creating ZipInfo objects, and setting "filename", ... to what os.walk gives
me.
....
And it works!!!!
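
For readers following along, here is a minimal sketch of the kind of script described above. It is not Vincent's actual code (which was not posted); the function name `zip_directory` is hypothetical, it is written for Python 3, and it uses `ZipFile.write` rather than hand-built ZipInfo objects.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Write every file under src_dir into a new zip archive at zip_path.

    zip_path should live outside src_dir, otherwise the archive itself
    could be picked up by the walk while it is being written.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for fname in files:
                full_path = os.path.join(root, fname)
                # Store a path relative to src_dir, using "/" as separator.
                rel_path = os.path.relpath(full_path, src_dir)
                arcname = rel_path.replace(os.sep, "/")
                zf.write(full_path, arcname)
```

Calling `zip_directory("photos", "/tmp/photos.zip")` (both paths are placeholders) would produce entries such as `2004/été.jpg`, which is where the encoding question below comes in.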

BUT!!

When I open the resulting zip file with WinZip (or any other zip tool),
the filenames are not always what they should be.
Filenames containing characters such as "é", "è", "ç" are not stored correctly
in the zip file.

Does anyone know what must be done?
Is this a Unicode problem?
Is this a known bug in the zipfile module?
????

Thanks

Vincent
Jul 18 '05 #1
Zip files don't have a way to define the encoding of filenames---names
are just byte strings, and different utilities may interpret them in
different ways. The only thing that seems to be defined is that '/' is
the directory separator, and possibly that the filename can't contain
'\0'.
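
A note for later readers: revisions of the ZIP specification published after this thread added a general purpose flag (bit 11, value 0x800) that marks an entry's name as UTF-8, and Python 3's zipfile sets it when a name is not pure ASCII. The sketch below, which assumes Python 3 and uses a hypothetical helper name `utf8_flags`, writes a small in-memory archive and reports that flag for each entry. Whether a given zip tool honours the flag is a separate question.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: entry name is stored as UTF-8


def utf8_flags(names):
    """Write empty entries with the given names, then report the UTF-8 flag."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    # Reopen the archive so the flags are read back from the stored headers.
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}
```

With this helper, `utf8_flags(["a.txt", "é.txt"])` reports the flag as unset for the ASCII name and set for the accented one.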

You can probably find the encoding that winzip uses with a little
trial-and-error, and convert your filenames in your encoding to
filenames in that encoding. This may depend on the language or region
of the installed Windows, though.
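
The trial-and-error can be shortened with a small experiment: encode a known name with the code page you suspect was used for writing, decode the bytes with the one the zip tool might be using, and compare the result with what the tool displays. This is only a sketch; `misread` and `show_candidates` are hypothetical helpers and the list of code pages is a guess at likely candidates, not an exhaustive set.

```python
def misread(name, written_as, read_as):
    """Encode name with one code page and decode the bytes with another."""
    return name.encode(written_as).decode(read_as, errors="replace")


def show_candidates(name, written_as,
                    readers=("cp437", "cp850", "cp1252", "latin-1", "utf-8")):
    """Print how name would look if a tool decoded it with each code page."""
    for read_as in readers:
        print(f"{written_as} -> {read_as}: {misread(name, written_as, read_as)}")
```

For instance, a name written as Latin-1 and read back as CP437 turns "é" into "Θ", the sort of substitution Vincent describes seeing in WinZip.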

Jeff


Jul 18 '05 #2
Jeff Epler wrote:
> Zip files don't have a way to define the encoding of filenames---names
> are just byte strings, and different utilities may interpret them in
> different ways. The only thing that seems to be defined is that '/' is
> the directory separator, and possibly that the filename can't contain
> '\0'.

Thanks, I ran into that problem too and replaced every "\" with "/".

> You can probably find the encoding that winzip uses with a little
> trial-and-error, and convert your filenames in your encoding to
> filenames in that encoding. This may depend on the language or region
> of the installed Windows, though.


Thanks for the explanation.

Does that limitation apply only to zip files?
Is there another "compression tool" that doesn't have this limitation
(tgz?, bz2?, ???)
Jul 18 '05 #3
vi***********@yahoo.com wrote:
> Does that limitation apply only to zip files?
It appears that WinZip and other tools interpret the file names in a
zipfile in CP437. So to properly put non-ASCII file names into a
zipfile, you need to convert them into CP437. If the file name
contains a character which is not available in CP437, you cannot
save the file in a zipfile (without renaming it).
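
A quick way to find out whether a given name survives the conversion is to try the encoding and catch the failure. The helper name `cp437_name` below is hypothetical, and the sketch only checks representability: in the Python 2 of this thread you would pass the encoded byte string on as the ZipInfo filename, whereas Python 3's zipfile chooses the stored encoding itself.

```python
def cp437_name(name):
    """Return name encoded as CP437 bytes, or None if it cannot be encoded."""
    try:
        return name.encode("cp437")
    except UnicodeEncodeError:
        # At least one character has no CP437 equivalent; the file would
        # have to be renamed before it can be stored under this scheme.
        return None
```

The characters from the original question ("é", "è", "ç") all exist in CP437, so `cp437_name("éèç.txt")` returns bytes, while a name containing "€" returns None.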

Not really a Unicode problem, but rather a problem that Unicode
tries to solve.
> Is there another "compression tool" that doesn't have this limitation
> (tgz?, bz2?, ???)


tar, traditionally, is also unaware of character sets. The Single UNIX
Specification, version 3 (and I believe earlier versions too) ended the tar
wars with the introduction of the pax utility, which does allow a character
set to be specified in a pax file; the supported character sets include
ISO-8859-n and UTF-8.
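
As a sketch of what this looks like from Python, the standard tarfile module in Python 3 can write the pax format via `tarfile.PAX_FORMAT`. The helper name `roundtrip_pax` is hypothetical; it writes one member with a non-ASCII name to an in-memory archive and reads the name back.

```python
import io
import tarfile


def roundtrip_pax(name, data=b"hello"):
    """Store data under name in a pax-format tar and return the member names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)  # size must match the bytes passed to addfile
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()
```

A name such as "été/çà.txt" comes back unchanged from `roundtrip_pax`. How other tar implementations display it still depends on their own pax support.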

Jörg Schilling's star(1) also uses UTF-8 for file names.

On the non-tar side of the world, WinRAR supports Unicode in archives.
For compatibility, they also put a non-Unicode name into the archive,
but the Unicode name, if present, is meant to take precedence.

Regards,
Martin
Jul 18 '05 #4
"Martin v. Löwis" wrote:
> vi***********@yahoo.com wrote:
>> Does that limitation apply only to zip files?
> It appears that WinZip and other tools interpret the file names in a
> zipfile in CP437. So to properly put non-ASCII file names into a
> zipfile, you need to convert them into CP437. If the file name
> contains a character which is not available in CP437, you cannot
> save the file in a zipfile (without renaming it).


Thanks, with cp437 it rocks!!!!

> Not really a Unicode problem, but rather a problem that Unicode
> tries to solve.
>> Is there another "compression tool" that doesn't have this limitation
>> (tgz?, bz2?, ???)
> tar, traditionally, is also unaware of character sets. Single Unix 3
> (and I believe also earlier) ended the tar wars with the introduction
> of the pax utility, which does allow for specification of a character
> set in a pax file; among the supported character sets are ISO-8859-n,
> and UTF-8.


Thanks for the info.

> Jörg Schilling's star(1) also uses UTF-8 for file names.
>
> On the non-tar side of the world, WinRAR supports Unicode in archives.
> For compatibility, they also put a non-Unicode name into the archive,
> but the Unicode name, if present, is meant to take precedence.


So that would be the most "portable" compression tool.

Thanks for those valuable remarks.

Vincent
Jul 18 '05 #5
