zipfile module: problems with filenames containing non-ASCII characters


I have a simple Python script that reads a directory and puts the files into a
zip file.

I'm using os.walk to get the directory contents, and I'm creating
ZipInfo objects and setting "filename", ... to what os.walk gives
me.
...
And it works!
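For readers on Python 3, here is a minimal sketch of the approach described above (os.walk plus ZipInfo objects). The helper name zip_directory is hypothetical, and the code assumes Python 3.6+ for ZipInfo.from_file. Archive names are made relative to the source directory and always use '/' as the separator.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Hypothetical helper: store every file found under src_dir in zip_path."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for root, _dirs, files in os.walk(src_dir):
            for fname in files:
                full_path = os.path.join(root, fname)
                # Archive member names are relative and use '/' even on Windows.
                arcname = os.path.relpath(full_path, src_dir).replace(os.sep, "/")
                # from_file copies the timestamp and permissions from disk.
                info = zipfile.ZipInfo.from_file(full_path, arcname)
                with open(full_path, "rb") as fh:
                    zf.writestr(info, fh.read(), compress_type=zipfile.ZIP_DEFLATED)
```

Note that in Python 3 the zipfile module, not the caller, decides how the name string is turned into bytes in the archive.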

BUT!!

When I open the created zip file with WinZip (or any other zip tool),
the filenames are not always what they should be.
In fact, filenames with characters like "é", "è", "ç" are not displayed correctly
from the zip file.

Does anyone know what must be done?
Is this a Unicode problem?
Is this a known bug in the zipfile module?

Thanks

Vincent
Jul 18 '05 #1
Zip files don't have a way to define the encoding of filenames: names
are just byte strings, and different utilities may interpret them in
different ways. The only thing that seems to be defined is that '/' is
the directory separator, and possibly that the filename can't contain
'\0'.
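A later development worth knowing about: later revisions of PKWARE's APPNOTE zip specification define general purpose bit 11 (value 0x800), the "language encoding flag", which marks a name as UTF-8. Tools that ignore the flag still see plain bytes, exactly as described above. The sketch below, assuming Python 3's zipfile, writes two empty members to an in-memory archive and reports which ones received the flag; name_flags is a hypothetical helper.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11, the "language encoding flag"


def name_flags(names):
    """Hypothetical helper: report which member names got the UTF-8 flag."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")  # empty member, only the name matters here
    # ZipFile does not close a file object it was given, so buf can be reread.
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
                for info in zf.infolist()}


print(name_flags(["plain.txt", "café.txt"]))
# {'plain.txt': False, 'café.txt': True}
```

Pure-ASCII names are stored without the flag, so archives containing only such names look the same to old and new tools.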

You can probably find the encoding that winzip uses with a little
trial-and-error, and convert your filenames in your encoding to
filenames in that encoding. This may depend on the language or region
of the installed Windows, though.
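In Python 3 terms, converting a name means decoding the bytes from the encoding they are in and re-encoding them in the one the zip tool expects. The sketch assumes the script's names arrived as Latin-1 bytes (a guess; the thread does not say which encoding was used). It shows what a tool that reads names as CP437 would display for the untranslated bytes, and the re-encoded bytes for a few candidate Windows code pages. transcode is a hypothetical helper.

```python
def transcode(name_bytes, src_encoding, dst_encoding):
    """Hypothetical helper: re-encode filename bytes from one code page to another."""
    return name_bytes.decode(src_encoding).encode(dst_encoding)


# Assumption: the script produced Latin-1 bytes for these names.
latin1_name = "é è ç.txt".encode("latin-1")

# What a tool that reads names as CP437 shows for the untranslated bytes:
print(latin1_name.decode("cp437"))  # Θ Φ τ.txt

# Trial and error: the same name re-encoded for a few candidate code pages.
for candidate in ("cp437", "cp850", "cp1252"):
    print(candidate, transcode(latin1_name, "latin-1", candidate))
```

The garbled Greek letters are a typical symptom of Latin-1 bytes being read as CP437.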

Jeff

Jul 18 '05 #2
Jeff Epler wrote:
> Zip files don't have a way to define the encoding of filenames: names
> are just byte strings, and different utilities may interpret them in
> different ways. The only thing that seems to be defined is that '/' is
> the directory separator, and possibly that the filename can't contain
> '\0'.

Thanks, I found the problem and replaced every "\" with "/".

> You can probably find the encoding that winzip uses with a little
> trial-and-error, and convert your filenames in your encoding to
> filenames in that encoding. This may depend on the language or region
> of the installed Windows, though.


Thanks for the explanation.

Does that limitation apply only to zip files?
Is there another "compression tool" that doesn't have such a limitation
(tgz? bz2? ???)
Jul 18 '05 #3
vi***********@yahoo.com wrote:
> Does that limitation apply only to zip files?
It appears that WinZip and other tools interpret the file names in a
zipfile in CP437. So to properly put non-ASCII file names into a
zipfile, you need to convert them into CP437. If the file name
contains a character which is not available in CP437, you cannot
save the file in a zipfile (without renaming it).
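The rule above can be checked before a file is added. A minimal sketch, assuming Python 3: fits_cp437 tests whether a name has a CP437 encoding, and cp437_safe_name is one hypothetical renaming policy (replace each missing character with a placeholder). Bear in mind that current Python 3 zipfile stores non-ASCII names as UTF-8 with flag bit 0x800 set rather than as CP437, so this check only tells you whether a CP437-only tool could show the name.

```python
def fits_cp437(name):
    """Return True if every character of name has a CP437 encoding."""
    try:
        name.encode("cp437")
    except UnicodeEncodeError:
        return False
    return True


def cp437_safe_name(name, placeholder="_"):
    """Hypothetical renaming policy: swap characters CP437 lacks for a placeholder."""
    return "".join(ch if fits_cp437(ch) else placeholder for ch in name)


print(fits_cp437("café.txt"))            # True
print(fits_cp437("prix en €.txt"))       # False: CP437 has no euro sign
print(cp437_safe_name("prix en €.txt"))  # prix en _.txt
```

The placeholder "_" is an arbitrary choice; any character that is valid in filenames on the target system would do.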

Not really a Unicode problem, but rather a problem that Unicode
tries to solve.
> Is there another "compression tool" that doesn't have such a limitation
> (tgz? bz2? ???)


tar, traditionally, is also unaware of character sets. Single Unix 3
(and I believe also earlier) ended the tar wars with the introduction
of the pax utility, which does allow for specification of a character
set in a pax file; among the supported character sets are ISO-8859-n
and UTF-8.
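As an illustration of the pax approach, Python's tarfile module can write the pax format (tarfile.PAX_FORMAT, the default since Python 3.8). In a pax archive, a name that cannot be represented in ASCII is stored as a UTF-8 "path" record in an extended header. A minimal in-memory sketch, assuming Python 3; roundtrip_pax is a hypothetical helper.

```python
import io
import tarfile


def roundtrip_pax(name, data=b"hello"):
    """Hypothetical helper: store one member in a pax tar held in memory, reread it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)  # tarfile leaves a caller-supplied fileobj open
    with tarfile.open(fileobj=buf, mode="r") as tar:
        member = tar.getmembers()[0]
        # For a non-ASCII name the UTF-8 "path" record shows up in pax_headers.
        return member.name, member.pax_headers.get("path")


print(roundtrip_pax("café.txt"))
print(roundtrip_pax("plain.txt"))
```

Short ASCII names fit in the classic header, so no extended "path" record is written for them.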

Jörg Schilling's star(1) also uses UTF-8 for file names.

On the non-tar side of the world, WinRAR supports Unicode in archives.
For compatibility, they also put a non-Unicode name into the archive,
but the Unicode name, if present, is meant to take precedence.

Regards,
Martin
Jul 18 '05 #4
"Martin v. Löwis" wrote:
> vi***********@yahoo.com wrote:
>> Does that limitation apply only to zip files?
> It appears that WinZip and other tools interpret the file names in a
> zipfile in CP437. So to properly put non-ASCII file names into a
> zipfile, you need to convert them into CP437. If the file name
> contains a character which is not available in CP437, you cannot
> save the file in a zipfile (without renaming it).


Thanks, with cp437 it rocks!

> Not really a Unicode problem, but rather a problem that Unicode
> tries to solve.
>> Is there another "compression tool" that doesn't have such a limitation
>> (tgz? bz2? ???)
> tar, traditionally, is also unaware of character sets. Single Unix 3
> (and I believe also earlier) ended the tar wars with the introduction
> of the pax utility, which does allow for specification of a character
> set in a pax file; among the supported character sets are ISO-8859-n
> and UTF-8.


Thanks for the info.

> Jörg Schilling's star(1) also uses UTF-8 for file names.
>
> On the non-tar side of the world, WinRAR supports Unicode in archives.
> For compatibility, they also put a non-Unicode name into the archive,
> but the Unicode name, if present, is meant to take precedence.


So that would be the most "portable" compression tool.

Thanks for those valuable remarks.

Vincent
Jul 18 '05 #5
