Special characters (æøå) and zipfiles

I've been searching Google about this for days but can't find anything,
so I'm hoping someone here can help me out.

I'm trying to create zip-files in PHP without needing the zip extension,
mainly because I need the ability to both create and extract
zip-files. I've tried a couple of classes found here and there, and they
all seem to have the same problem. I'm currently using PclZip
(http://phpconcept.net/pclzip/) but even the simplest one I've tried
(zip.lib.php from phpMyAdmin) gives the same result.

This is the problem:

When I create a zip-file containing any files with special characters in
their filenames, the characters get translated into different special
characters. The three characters I myself am having problems with are the
Norwegian æ, ø and å (uppercase Æ, Ø and Å), all of which are very
common in my language. The name of the zip-file itself can contain these
characters without any problems; the only files affected are the ones put
into the zip-file. The same happens with directories, obviously. The funny
thing is, if I extract a zip-file using the same class, the conversion gets
reversed, so the files do end up with the correct names after
extraction. This of course means that if I upload a zip-file created
using WinZip or any other zip-application, any files with special
characters will get translated into completely different characters again.

I've made a table showing the converted characters which can be found
here: http://akkar.sourceforge.net/zipchars.html

Also very strange - I tried making a zip-file containing a zero-length
file with the special characters in the filename, and when opening that
zip-file in a hex-editor I wasn't able to find the hex values for the
converted characters anywhere in the file, but the original filename
characters were found at the places where I expected them to be.

If someone can help me figure out what's going on I would really
appreciate it. I've submitted it as a bug for PclZip but it hasn't
gotten any response yet, and since I've seen the same thing happen with
other classes I sort of doubt it's only related to PclZip. I've tried it
on different servers as well, and with the same result. I've got the
impression that PclZip is a popular class for managing zip-files, so I'm
hoping someone with some experience with it can help me out.

Thanks in advance :)
Roy W. Andersen
--
ra at broadpark dot no / http://roy.netgoth.org/

"Hey! What kind of party is this? There's no booze
and only one hooker!" - Bender, Futurama
Jul 17 '05 #1
7 Replies


rl
Hello Roy,

The obvious problem is that the data of the files to be zipped is not
treated as binary data, as it should be. This may be rooted in your own
file/variable handling. PHP is loosely typed, which means that the type of a
variable is selected by the PHP engine automatically, so things can go
wrong.
More information on exactly what you do would therefore be needed to locate
the problem within your code or the library used.

The other thing is the treatment of the file names. These names are
character data and are stored in a zip-file as such, with no information
on the encoding, as far as I could find at a first glance at
http://www.pkware.com/company/standards/appnote/
http://www.geocities.com/marcoschmid...le-format.html
The latter also states "No support for extended character sets in file
names" as a limitation of this file format.
So the only thing you can rely on is 7-bit ASCII. But as the
ISO-Latin-1 code table is in widespread use (and contains all
Scandinavian special characters), the problems you face are probably
caused by automatic conversions too, as a filename typed on your own
computer shouldn't look any different when displayed there again.
Have a look at 'setlocale' at php.net.

Cheers,

Robert
Jul 17 '05 #2

rl wrote:
Hello Roy,

[snip]

Thanks for the help, it's appreciated even though it unfortunately
doesn't help me much. Guess I'll just have to wait for the developer of
my class to look into it - I don't really know where to begin looking
for the cause, although I have tried. I suspect the error lies in the
use of pack() and unpack(), functions whose workings I don't understand
(the PHP manual doesn't help me there - it's my knowledge of
working with binary files that's limiting). I'm no expert on character
encodings either (I know ISO-8859-1 and UTF-8, but that's about it), so
I've been more or less stumbling around blindly in the code, and when
I've tried some changes I've ended up with corrupted zip-files ;)

I've tried several different settings using setlocale() and it doesn't
make any difference at all, so I've concluded that the zip-class doesn't
use any functions affected by PHP's own locale-setting.

What's strange though is that I can't find any reference to this problem
anywhere on the web or the Google Usenet archive, and the problem
doesn't only affect 'æ', 'ø' and 'å' but all special
language-characters. That led me to believe it was the settings on my
server that were causing it, but when testing it on my project's
sourceforge webspace I got the same thing happening there as well, which
again tells me the problem is with the class itself.
Roy W. Andersen
Jul 17 '05 #3

Roy W. Andersen wrote:
That led me to believe ... the problem is with the class itself.


Can you try the CLI version of PKZip or WinZip?
I believe they have demo versions available.
Jul 17 '05 #4

"Roy W. Andersen" <ro******@netgoth.org> wrote in message
news:34*************@individual.net...
[ ... ]
This is the problem:

When I create a zip-file containing any files with special characters in
their filenames, the characters get translated into different special
characters. The three characters I myself am having problems with are the
Norwegian æ, ø and å (uppercase Æ, Ø and Å), all of which are very
common in my language. The name of the zip-file itself can contain these
characters without any problems; the only files affected are the ones put
into the zip-file. The same happens with directories, obviously. The funny
thing is, if I extract a zip-file using the same class, the conversion gets
reversed, so the files do end up with the correct names after
extraction. This of course means that if I upload a zip-file created
using WinZip or any other zip-application, any files with special
characters will get translated into completely different characters again.

I've made a table showing the converted characters which can be found
here: http://akkar.sourceforge.net/zipchars.html


WinZip stores filenames in the CP437 (MS-DOS) charset, where Æ = 0x92 and
æ = 0x91 (see http://www.microsoft.com/globaldev/r...e/oem/437.htm). It's
compatible with neither Unicode nor ISO-8859-1. Why they're showing up as
the characters shown in your chart I'm not sure.
Jul 17 '05 #5
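
One plausible reading of the chart, assuming the PHP classes write the ISO-8859-1 filename bytes into the archive unchanged (which would match the hex-editor observation in the first post) and the zip tool then displays them as CP437: the bytes never change, only the code page used to interpret them does. A minimal sketch with PHP's iconv(); the helper names are made up for illustration, and it assumes an iconv build that knows CP437.

```php
<?php
// Hypothetical helper: how one ISO-8859-1 byte looks to a reader that
// assumes CP437. The byte is unchanged; only the interpretation differs.
function misreadAsCp437($latin1Byte)
{
    return iconv('CP437', 'UTF-8', $latin1Byte);
}

// Hypothetical helper: the byte a CP437-aware writer would store instead.
function toCp437($latin1Byte)
{
    return iconv('ISO-8859-1', 'CP437', $latin1Byte);
}

$ae = "\xE6"; // "æ" as a single ISO-8859-1 byte

// Written raw, the byte stays 0xE6, which CP437 displays as a different symbol.
printf("stored raw: 0x%02X, shown by a CP437 reader as %s\n", ord($ae), misreadAsCp437($ae));

// Converted first, the stored byte becomes the CP437 code for "æ".
printf("converted:  0x%02X\n", ord(toCp437($ae)));
```

If the assumption holds, the second line prints 0x91, the value quoted above for æ, while the first shows the micro sign that CP437 keeps at 0xE6.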

Chung Leong wrote:
"Roy W. Andersen" <ro******@netgoth.org> wrote in message

WinZip stores filenames in the CP437 (MS-DOS) charset, where Æ = 0x92 and
æ = 0x91 (see http://www.microsoft.com/globaldev/r...e/oem/437.htm). It's
compatible with neither Unicode nor ISO-8859-1. Why they're showing up as
the characters shown in your chart I'm not sure.


Apparently that was exactly the right piece of information I needed :D

The PclZip class has an option for passing the files through a callback
function before adding them to or extracting them from the archive, and by
using that option I actually managed to get it working.

Before adding the file:
iconv("ISO-8859-1", "CP437", $p_header['filename'])

And, of course, before extracting:
iconv("CP437", "ISO-8859-1", $p_header['filename'])

And now it works! Thank you very much! I'd just added this problem to
the "Known Issues" list for the upcoming release of my project, and now
I can safely remove it again! Thank you thank you thank you! And thanks to
everyone else who offered help as well, of course, but this little piece
of info unlocked the riddle :)
Roy W. Andersen
Jul 17 '05 #6
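
The two iconv() calls above can be wrapped into a pair of callbacks roughly as follows. This is only a sketch: it assumes the callbacks receive an event code plus a header array by reference and return 1 to continue, which is how the post describes using them, and it runs against a stand-in array instead of PclZip itself. The helper names, the fallback on failure and the code page parameter are illustrative choices, not part of the library.

```php
<?php
// Assumption: filenames on the server are ISO-8859-1. The archive code page
// is a parameter because the right one depends on the tool reading the zip.
function encodeArchiveName($name, $archiveCharset = 'CP437')
{
    $converted = @iconv('ISO-8859-1', $archiveCharset, $name);
    // iconv() returns false if a character cannot be converted; keep the
    // original name in that case instead of storing an empty one.
    return $converted === false ? $name : $converted;
}

function decodeArchiveName($name, $archiveCharset = 'CP437')
{
    $converted = @iconv($archiveCharset, 'ISO-8859-1', $name);
    return $converted === false ? $name : $converted;
}

// Callback in the shape described in the post: rewrite the filename held in
// the header, then return 1 so processing continues.
function preAddCallback($p_event, &$p_header)
{
    $p_header['filename'] = encodeArchiveName($p_header['filename']);
    return 1;
}

function preExtractCallback($p_event, &$p_header)
{
    $p_header['filename'] = decodeArchiveName($p_header['filename']);
    return 1;
}

// Stand-in for the header array the library would pass in.
$header = array('filename' => "bl\xE5b\xE6r.txt"); // "blåbær.txt" in ISO-8859-1

preAddCallback(0, $header);
echo bin2hex($header['filename']), "\n"; // bytes as they would go into the archive

preExtractCallback(0, $header);
echo bin2hex($header['filename']), "\n"; // back to the ISO-8859-1 bytes
```

Which header key the library actually honours in each callback is worth checking against the PclZip documentation; the key used here simply follows the post.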

Roy W. Andersen wrote:
Before adding the file:
iconv("ISO-8859-1", "CP437", $p_header['filename'])


I jumped the gun a bit here, but it was the right track at least :)

Using CP850 worked, but CP437 didn't handle all my characters that well
(I remember codepage 850 and/or 865 is what I used back in the days of
good old MS-DOS).

Still though, it works now :) Hopefully others with the same problem in
the future will have an easier time finding the answer thanks to this thread
- I sure wish I had ;)
Roy W. Andersen
Jul 17 '05 #7
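
A likely reason CP850 worked where CP437 did not: both code pages place æ and å at the same positions, but the slots that CP850 uses for ø and Ø hold currency signs in CP437. The sketch below decodes those two bytes under each code page. The helper name is hypothetical, and it assumes an iconv build that supports both code pages.

```php
<?php
// Hypothetical helper: decode a single archive byte under a given DOS code page.
function decodeByte($byte, $codePage)
{
    return iconv($codePage, 'UTF-8', $byte);
}

// 0x9B and 0x9D are where CP850 keeps "ø" and "Ø".
$bytes = array("\x9B", "\x9D");

foreach (array('CP437', 'CP850') as $codePage) {
    foreach ($bytes as $byte) {
        printf("%s 0x%02X => %s\n", $codePage, ord($byte), decodeByte($byte, $codePage));
    }
}
```

Under CP437 the two bytes come out as the cent and yen signs, so a filename containing ø has no exact CP437 representation.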

rl
Roy,
rl wrote:
Hello Roy,
[snip]
Thanks for the help, it's appreciated even though it unfortunately
doesn't help me much.

Doesn't look like you've read it.

Guess I'll just have to wait for the developer of
my class to look into it - I don't really know where to begin looking
for the cause, although I have tried. I suspect the error lies in the
use of pack() and unpack(), functions whose workings I don't understand
(the PHP manual doesn't help me there - it's my knowledge of
working with binary files that's limiting). I'm no expert on character
encodings either (I know ISO-8859-1 and UTF-8, but that's about it), so
I've been more or less stumbling around blindly in the code, and when
I've tried some changes I've ended up with corrupted zip-files ;)

pack() and unpack() leave the character encoding completely untouched if you
specify the conversion accordingly. That's why I asked for the relevant
code snippets (if you're allowed to share them).
Also, all files to be put into the zip must be opened in binary mode, if you
need to open them yourself at all.
I had no problems putting complete binary files into a database and fetching
them out again via pack() and unpack().

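
As a small illustration of the point about pack() and unpack(): with a string format code such as "a", the bytes of a filename pass through unchanged, whatever encoding they happen to be in. The record layout below (a 16-bit length followed by the raw name) and the function names are invented for this sketch and are not taken from any zip class.

```php
<?php
// Hypothetical record: 16-bit little-endian length, then the raw name bytes.
function packName($name)
{
    return pack('va*', strlen($name), $name);
}

function unpackName($binary)
{
    $fields = unpack('vlength/a*name', $binary);
    // Trim to the recorded length in case anything follows the name.
    return substr($fields['name'], 0, $fields['length']);
}

$original = "bl\xE5b\xE6r.txt"; // "blåbær.txt" as ISO-8859-1 bytes
$restored = unpackName(packName($original));

// Compare byte for byte: no character conversion has taken place.
var_dump($restored === $original);
echo bin2hex($restored), "\n";
```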
I've tried several different settings using setlocale() and it doesn't
make any difference at all, so I've concluded that the zip-class doesn't
use any functions affected by PHP's own locale-setting.

What's strange though is that I can't find any reference to this problem
anywhere on the web or the Google Usenet archive, and the problem
doesn't only affect 'æ', 'ø' and 'å' but all special
language-characters. That led me to believe it was the settings on my
server that were causing it, but when testing it on my project's
sourceforge webspace I got the same thing happening there as well, which
again tells me the problem is with the class itself.

Or with your own code ...
Jul 17 '05 #8
