Special characters (æøå) and zipfiles


I've been searching Google about this for days but can't find anything,
so I'm hoping someone here can help me out.

I'm trying to create zip-files in PHP without needing the zip extension,
mainly because I need the ability to both create and extract
zip-files. I've tried a couple of classes found here and there, and they
all seem to have the same problem. I'm currently using PclZip
(http://phpconcept.net/pclzip/) but even the simplest one I've tried
(zip.lib.php from phpMyAdmin) gives the same result.

This is the problem:

When I create a zip-file containing any file with special characters in
its filename, the characters get translated into different special
characters. The three characters I myself am having problems with are the
Norwegian æ, ø and å (uppercase Æ, Ø and Å), all of which are very
common in my language. The zip-file's own name can contain these
characters without any problems; the only names affected are those of
the files put into the zip-file. The same happens with directories,
obviously. The funny thing is,
if I extract a zip-file using the same class, the conversion gets
reversed, so the files do end up with the correct names after
extraction. This of course means that if I upload a zip-file created
using WinZip or any other zip-application, any files with special
characters will get translated into completely different characters again.

I've made a table showing the converted characters which can be found
here: http://akkar.sourceforge.net/zipchars.html

Also very strange - I tried making a zip-file containing a zero-length
file with the special characters in the filename, and when opening that
zip-file in a hex-editor I wasn't able to find the hex values for the
converted characters anywhere in the file, but the original filename
characters were found at the places where I expected them to be.

If someone can help me figure out what's going on I would really
appreciate it. I've submitted it as a bug for PclZip but it hasn't
gotten any response yet, and since I've seen the same thing happen with
other classes I sort of doubt it's only related to PclZip. I've tried it
on different servers as well, and with the same result. I've got the
impression that PclZip is a popular class for managing zip-files, so I'm
hoping someone with some experience with it can help me out.

Thanks in advance :)
Roy W. Andersen
--
ra at broadpark dot no / http://roy.netgoth.org/

"Hey! What kind of party is this? There's no booze
and only one hooker!" - Bender, Futurama
Jul 17 '05 #1
rl
Hello Roy,

the obvious problem is that the data of the files to be zipped is not
treated as binary data, as it should be. This may be rooted in your own
file/variable handling. PHP is loosely typed, which means that the type
of a variable is selected by the PHP engine automatically, and that can
go wrong.
So more info on what exactly you do would be needed to locate the
problem within your code or the lib used.

The other thing is the treatment of the file names. These names are
character data and are stored in a zip-file as such, with no information
on the encoding, as far as I found at a first glance at
http://www.pkware.com/company/standards/appnote/

http://www.geocities.com/marcoschmid...le-format.html
also states "No support for extended character sets in file names" as
a limitation of this file format.
So the only thing you can use for sure is 7-bit ASCII. But as the
ISO-Latin-1 code table is widespread (and contains all
Scandinavian special characters), the problems you face tend to be
caused by automatic conversions, too, as a filename typed on your own
computer shouldn't lead to any difference when displayed there again.
Have a look at 'setlocale' at php.net.

Cheers,

Robert
Jul 17 '05 #2
rl wrote:
Hello Roy,

[snip]

Thanks for the help, it's appreciated even though it unfortunately
doesn't help me much. Guess I'll just have to wait for the developer of
my class to look into it - I don't really know where to begin looking
for the cause, although I have tried. I suspect the error lies in the
use of pack() and unpack(), which are functions I don't really
understand (the PHP manual doesn't help me there - it's just my knowledge
of working with binary files that's limiting). I'm no expert on character
encodings either (I know ISO-8859-1 and UTF-8, but that's about it), so
I've been more or less stumbling around blindly in the code, and when
I've tried some changes I've ended up with corrupted zip-files ;)

I've tried several different settings using setlocale() and it doesn't
make any difference at all, so I've concluded that the zip-class doesn't
use any functions affected by PHP's own locale-setting.

What's strange though is that I can't find any reference to this problem
anywhere on the web or in the Google Usenet archive, and the problem
doesn't only affect 'æ', 'ø' and 'å' but all special
language-characters. That led me to believe it was the settings on my
server that were causing it, but when testing it on my project's
SourceForge webspace I got the same thing happening there as well, which
again tells me the problem is with the class itself.
Roy W. Andersen
Jul 17 '05 #3
Roy W. Andersen wrote:
That led me to believe ... the problem is with the class itself.


Can you try the CLI version of PKZip or WinZip?
I believe they have demo versions available.
--
Mail to my "From:" address is readable by all at http://www.dodgeit.com/
== ** ## !! ------------------------------------------------ !! ## ** ==
TEXT-ONLY mail to the whole "Reply-To:" address ("My Name" <my@address>)
may bypass my spam filter. If it does, I may reply from another address!
Jul 17 '05 #4
"Roy W. Andersen" <ro******@netgoth.org> wrote in message
news:34*************@individual.net...
[ ... ]
This is the problem:

When I create a zip-file containing any file with special characters in
its filename, the characters get translated into different special
characters. The three characters I myself am having problems with are the
Norwegian æ, ø and å (uppercase Æ, Ø and Å), all of which are very
common in my language. The zip-file's own name can contain these
characters without any problems; the only names affected are those of
the files put into the zip-file. The same happens with directories,
obviously. The funny thing is,
if I extract a zip-file using the same class, the conversion gets
reversed, so the files do end up with the correct names after
extraction. This of course means that if I upload a zip-file created
using WinZip or any other zip-application, any files with special
characters will get translated into completely different characters again.

I've made a table showing the converted characters which can be found
here: http://akkar.sourceforge.net/zipchars.html


WinZip stores filenames in the CP437 (MS-DOS) charset, where Æ = 0x92, and æ
= 0x91 (see http://www.microsoft.com/globaldev/r...e/oem/437.htm). It's
neither Unicode nor ISO-8859-1 compatible. Why they're showing up as
the characters shown in your chart, I'm not sure.
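
One possible reading of the mismatch, sketched under the assumption that the PHP script hands the zip class ISO-8859-1 bytes while the zip reader decodes names as CP437 (whether this matches the chart linked above is not confirmed). The bytes are stored untouched, which would fit the hex-editor observation in the first post, but they land on different characters in the DOS code page. The conversion to UTF-8 is only there to make the result printable.

```php
<?php
// Assumption: the script works in ISO-8859-1, so "æøå" is the bytes E6 F8 E5.
$latin1Name = "\xE6\xF8\xE5";

// The zip class writes those bytes unchanged, so a hex editor still shows them.
$storedHex = bin2hex($latin1Name);

// A reader that treats names as CP437 decodes the same bytes differently.
// Converting to UTF-8 just makes the misread characters printable.
$misread = iconv("CP437", "UTF-8", $latin1Name);

echo $storedHex, "\n"; // e6f8e5
echo $misread, "\n";   // the CP437 characters that sit at E6, F8 and E5
```

Under these assumptions the name would be displayed as µ°σ instead of æøå, since CP437 has the micro sign, the degree sign and a lowercase sigma at those three positions.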
Jul 17 '05 #5
Chung Leong wrote:
"Roy W. Andersen" <ro******@netgoth.org> wrote in message

WinZip stores filenames in the CP437 (MS-DOS) charset, where Æ = 0x92, and æ
= 0x91 (see http://www.microsoft.com/globaldev/r...e/oem/437.htm). It's
neither Unicode nor ISO-8859-1 compatible. Why they're showing up as
the characters shown in your chart, I'm not sure.


Apparently that was exactly the right piece of information I needed :D

The PclZip class has an option of passing the files through a callback
function before adding to or extracting from the archive, and by using
that option I actually managed to get it working.

Before adding the file:
$p_header['filename'] = iconv("ISO-8859-1", "CP437", $p_header['filename']);

And, of course, before extracting:
$p_header['filename'] = iconv("CP437", "ISO-8859-1", $p_header['filename']);

And now it works! Thank you very much! I'd just written this problem on
the "Known Issues" list for the upcoming release of my project, and now
I can safely remove it again! Thank you thank you thank you! And
everyone else who offered help as well, of course, but this little piece
of info unlocked the riddle :)
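
A minimal sketch of how the two conversions might be wired into PclZip callbacks. The function names are made up for illustration, and CP850 is used because a later post in this thread reports it handled the characters better than CP437. PclZip's user guide describes stored_filename as the field a pre-add callback may change and filename as the one a pre-extract callback may change, so the sketch uses those; check this against your PclZip version. Filenames on the server are assumed to be ISO-8859-1.

```php
<?php
// Hypothetical callback names; assumes server-side filenames are ISO-8859-1.

// Convert one field of a PclZip header array, leaving it alone if iconv fails.
function convertHeaderField(array &$header, $key, $from, $to)
{
    $converted = @iconv($from, $to, $header[$key]);
    if ($converted !== false) {
        $header[$key] = $converted;
    }
}

// For PCLZIP_CB_PRE_ADD: rename the entry as it will be stored in the archive.
function preAddToDosName($p_event, &$p_header)
{
    convertHeaderField($p_header, 'stored_filename', "ISO-8859-1", "CP850");
    return 1; // 1 = carry on with this file, 0 = skip it
}

// For PCLZIP_CB_PRE_EXTRACT: rename the file about to be written to disk.
function preExtractToLatin1($p_event, &$p_header)
{
    convertHeaderField($p_header, 'filename', "CP850", "ISO-8859-1");
    return 1;
}

// Assumed usage with PclZip loaded (not run here):
//   $archive = new PclZip('backup.zip');
//   $archive->add($files, PCLZIP_CB_PRE_ADD, 'preAddToDosName');
//   $archive->extract(PCLZIP_CB_PRE_EXTRACT, 'preExtractToLatin1');
```

Keeping the conversion in one helper means the two callbacks differ only in direction, so an archive written by the first can be read back by the second.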
Roy W. Andersen
Jul 17 '05 #6
Roy W. Andersen wrote:
Before adding the file:
$p_header['filename'] = iconv("ISO-8859-1", "CP437", $p_header['filename']);


I jumped the gun a bit here, but it was the right track at least :)

Using CP850 worked, but CP437 didn't handle all my characters that well
(I remember codepage 850 and/or 865 is what I used back in the days of
good old MS-DOS).

Still though, it works now :) Hopefully others with the same problem in
the future have an easier time finding the answer thanks to this thread
- I sure wish I had ;)
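
The difference between the two code pages can be checked directly: CP437 has æ, Æ, å and Å but no ø or Ø (those positions hold ¢ and ¥), while CP850 and CP865 have all six. A small sketch, assuming a current PHP and an iconv build that knows both code pages (glibc's does); the helper name is made up.

```php
<?php
// Hypothetical helper: hex bytes of an ISO-8859-1 string in the target
// code page, or null when a character has no equivalent there.
function toCodePageHex($latin1, $codePage)
{
    // @ silences the notice iconv raises for unconvertible characters.
    $out = @iconv("ISO-8859-1", $codePage, $latin1);
    return $out === false ? null : bin2hex($out);
}

$ae = "\xE6"; // æ in ISO-8859-1
$oe = "\xF8"; // ø in ISO-8859-1

var_dump(toCodePageHex($ae, "CP437")); // "91": æ exists in CP437
var_dump(toCodePageHex($oe, "CP437")); // NULL: ø does not exist in CP437
var_dump(toCodePageHex($oe, "CP850")); // "9b": ø exists in CP850
```

That would explain why the CP437 conversion handled æ and å but not ø.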
Roy W. Andersen
Jul 17 '05 #7
rl
Roy,

Roy W. Andersen wrote:
rl wrote:
Hello Roy,
[snip]
Thanks for the help, it's appreciated even though it unfortunately
doesn't help me much.

Doesn't look like you've read it.

Roy W. Andersen wrote:
Guess I'll just have to wait for the developer of
my class to look into it - I don't really know where to begin looking
for the cause, although I have tried. I suspect the error lies in the
use of pack() and unpack(), which are functions I don't really
understand (the PHP manual doesn't help me there - it's just my knowledge
of working with binary files that's limiting). I'm no expert on character
encodings either (I know ISO-8859-1 and UTF-8, but that's about it), so
I've been more or less stumbling around blindly in the code, and when
I've tried some changes I've ended up with corrupted zip-files ;)

pack and unpack leave character encoding completely untouched, if you
specify the format accordingly. That's why I asked for the respective
code snippets (if you're allowed to post them).
And: all files to be put into the zip must be opened in binary mode, if
you need to open them yourself at all.
I had no problems putting complete binary files into a database and
fetching them out again via unpack and pack.
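
To illustrate the point about pack() and unpack(): they only encode the numeric header fields, and string bytes pass through as given. A minimal sketch, assuming a made-up ISO-8859-1 filename and using the 16-bit little-endian length field that zip headers put in front of a name.

```php
<?php
// Made-up ISO-8859-1 filename: "æøå.txt" as raw bytes.
$name = "\xE6\xF8\xE5.txt";

// 'v' is an unsigned 16-bit little-endian integer, the format zip uses
// for the filename-length field; the name bytes are appended untouched.
$field = pack("v", strlen($name)) . $name;

// Read the length back and slice the name out again.
$parsed = unpack("vlength", $field);
$roundTrip = substr($field, 2, $parsed['length']);

echo bin2hex($field), "\n";     // 0700e6f8e52e747874
var_dump($roundTrip === $name); // bool(true)
```

Any renaming therefore has to come from an explicit character-set conversion, or from the reader's choice of code page, not from pack() itself.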

Roy W. Andersen wrote:
I've tried several different settings using setlocale() and it doesn't
make any difference at all, so I've concluded that the zip-class doesn't
use any functions affected by PHP's own locale-setting.

What's strange though is that I can't find any reference to this problem
anywhere on the web or in the Google Usenet archive, and the problem
doesn't only affect 'æ', 'ø' and 'å' but all special
language-characters. That led me to believe it was the settings on my
server that were causing it, but when testing it on my project's
SourceForge webspace I got the same thing happening there as well, which
again tells me the problem is with the class itself.

Or with your own code ...
Jul 17 '05 #8
