Bytes | Software Development & Data Engineering Community

Special characters (æøå) and zipfiles


I've been searching Google about this for days but can't find anything,
so I'm hoping someone here can help me out.

I'm trying to create zip-files in PHP without needing PHP's zip
extension, mainly because I need the ability to both create and extract
zip-files. I've tried a couple of classes found here and there, and they
all seem to have the same problem. I'm currently using PclZip
(http://phpconcept.net/pclzip/), but even the simplest one I've tried
(zip.lib.php from phpMyAdmin) gives the same result.

This is the problem:

When I create a zip-file containing files with special characters in
their filenames, those characters get translated into different special
characters. The three characters I'm having problems with are the
Norwegian æ, ø and å (uppercase Æ, Ø and Å), all of which are very
common in my language. The zip-file's own name can contain these
characters without any problems; only the files put into the zip-file
are affected. The same happens with directory names. The funny thing is,
if I extract a zip-file using the same class, the conversion gets
reversed, so the files do end up with the correct names after
extraction. This of course means that if I upload a zip-file created
with WinZip or any other zip application, any files with special
characters will get translated into completely different characters again.

I've made a table showing the converted characters which can be found
here: http://akkar.sourceforge.net/zipchars.html

Also very strange: I tried making a zip-file containing a zero-length
file with the special characters in its filename, and when I opened that
zip-file in a hex editor I couldn't find the hex values for the
converted characters anywhere in the file, but the original filename
characters were right where I expected them to be.
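The same hex-editor check can be done from PHP. This is a minimal sketch that assumes the archive begins with an ordinary local file header (PKWARE's fixed 30-byte layout, with the name length at offset 26). It returns the filename bytes exactly as stored, without any decoding. `firstStoredName()` and `hexBytes()` are hypothetical helpers, not part of PclZip.

```php
<?php
// Sketch: read back the raw filename bytes of the first entry in a zip
// archive, the same thing the hex-editor check looks at.
// Assumption: the archive starts with a local file header ("PK\x03\x04").

function firstStoredName(string $zip): ?string
{
    // The fixed part of a local file header is 30 bytes long.
    if (strlen($zip) < 30 || substr($zip, 0, 4) !== "PK\x03\x04") {
        return null;
    }
    // File name length (offset 26) and extra field length (offset 28)
    // are little-endian 16-bit values, hence the "v" format code.
    $lengths = unpack('vnameLen/vextraLen', substr($zip, 26, 4));
    // The name follows the fixed header; no decoding is applied.
    return substr($zip, 30, $lengths['nameLen']);
}

// Format bytes as space-separated uppercase hex, e.g. "E6 F8 E5".
function hexBytes(string $bytes): string
{
    return implode(' ', str_split(strtoupper(bin2hex($bytes)), 2));
}
```

For a name like `æøå.txt` stored as ISO-8859-1, `hexBytes(firstStoredName(file_get_contents('test.zip')))` would give `E6 F8 E5 2E 74 78 74`. A CP850-encoded name would instead start with `91 9B 86`.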

If someone can help me figure out what's going on, I would really
appreciate it. I've submitted it as a PclZip bug report, but it hasn't
gotten any response yet, and since I've seen the same thing happen with
other classes I rather doubt it's only related to PclZip. I've tried it
on different servers as well, with the same result. I've got the
impression that PclZip is a popular class for managing zip-files, so I'm
hoping someone with some experience with it can help me out.

Thanks in advance :)
Roy W. Andersen
--
ra at broadpark dot no / http://roy.netgoth.org/

"Hey! What kind of party is this? There's no booze
and only one hooker!" - Bender, Futurama
Jul 17 '05 #1
rl
Hello Roy,

the obvious problem is that the data of the files to be zipped is not
treated as binary data, as it should be. This may be rooted in your own
file/variable handling: PHP is loosely typed, which means the PHP engine
selects the type of a variable automatically, and things can go wrong
there. So more information on exactly what you're doing would be needed
to locate the problem within your code or the library you use.

The other thing is how the file names are treated. These names are
character data and are stored in the zip-file as such, with no
information about the encoding, at least as far as I could tell from a
first glance at http://www.pkware.com/company/standards/appnote/ .

http://www.geocities.com/marcoschmid...le-format.html
also lists "No support for extended character sets in file names" as
a limitation of this file format.
So the only thing you can rely on is 7-bit ASCII. But since the
ISO-Latin-1 code table is widespread (and contains all the
Scandinavian special characters), the problems you face are likely
caused by automatic conversions, too: a filename typed on your own
computer shouldn't look any different when displayed there again.
Have a look at 'setlocale' at php.net.
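Following on from the 7-bit ASCII point, here is a minimal sketch that flags names containing any byte above 0x7F. Those are the names whose appearance depends on which code page the zip tool assumes. `isSevenBitAscii()` and `namesNeedingConversion()` are hypothetical helpers.

```php
<?php
// Sketch: does a file name stay within 7-bit ASCII?
// Names that fail this check are the ones whose appearance depends on the
// code page the zip tool assumes. The helper names are hypothetical.

function isSevenBitAscii(string $name): bool
{
    // Any byte from 0x80 to 0xFF falls outside 7-bit ASCII.
    return preg_match('/[\x80-\xFF]/', $name) === 0;
}

// Pick out the names that will need a code-page conversion.
function namesNeedingConversion(array $names): array
{
    return array_values(array_filter($names, function (string $n) {
        return !isSevenBitAscii($n);
    }));
}
```

The regex deliberately has no `u` modifier, so it works on raw bytes whatever encoding the names happen to be in.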

Cheers,

Robert
Jul 17 '05 #2
rl wrote:
Hello Roy,

[snip]

Thanks for the help; it's appreciated, even though it unfortunately
doesn't help me much. I guess I'll just have to wait for the developer of
my class to look into it. I don't really know where to begin looking
for the cause, although I have tried. I suspect the error lies in the
use of pack() and unpack(), and I don't understand how those functions
work (the PHP manual doesn't help me there; it's my limited knowledge of
working with binary files that holds me back). I'm no expert on character
encodings either (I know ISO-8859-1 and UTF-8, but that's about it), so
I've been more or less stumbling around blindly in the code, and when
I've tried some changes I've ended up with corrupted zip-files ;)

I've tried several different settings using setlocale() and it doesn't
make any difference at all, so I've concluded that the zip-class doesn't
use any functions affected by PHP's own locale-setting.

What's strange, though, is that I can't find any reference to this problem
anywhere on the web or in the Google Usenet archive, and the problem
doesn't only affect 'æ', 'ø' and 'å' but all language-specific special
characters. That led me to believe it was the settings on my
server that were causing it, but when I tested it on my project's
SourceForge webspace the same thing happened there as well, which
again tells me the problem is with the class itself.
Roy W. Andersen
Jul 17 '05 #3
Roy W. Andersen wrote:
That led me to believe ... the problem is with the class itself.


Can you try the CLI version of PKZip or WinZip?
I believe they have demo versions available.
--
Mail to my "From:" address is readable by all at http://www.dodgeit.com/
== ** ## !! ------------------------------------------------ !! ## ** ==
TEXT-ONLY mail to the whole "Reply-To:" address ("My Name" <my@address>)
may bypass my spam filter. If it does, I may reply from another address!
Jul 17 '05 #4
"Roy W. Andersen" <ro******@netgoth.org> wrote in message
news:34*************@individual.net...
[ ... ]
I've made a table showing the converted characters which can be found
here: http://akkar.sourceforge.net/zipchars.html


WinZip stores filenames in the CP437 (MS-DOS) charset, where Æ = 0x92 and
æ = 0x91 (see http://www.microsoft.com/globaldev/r...e/oem/437.htm). It's
neither Unicode- nor ISO-8859-1-compatible. Why they're showing up as the
characters in your chart, I'm not sure.
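One possible explanation for the chart is offered here as an assumption, not a diagnosis. Suppose the PHP class writes the server's ISO-8859-1 bytes unchanged, which fits the hex-editor observation in the first post, and WinZip reads them as CP437. In that case each Norwegian letter is displayed as whatever glyph CP437 has at that byte. The sketch hard-codes those six CP437 glyphs rather than calling iconv(), so it needs no code-page support on the server. The constant and function names are hypothetical, and the output has not been compared against the linked chart.

```php
<?php
// Sketch (assumption): the archive holds ISO-8859-1 name bytes, but the
// zip tool reads them as CP437, so each byte is shown as the CP437 glyph
// at that position. Only the six Norwegian letters are mapped, and the
// glyphs are hard-coded (as UTF-8) instead of being looked up with iconv().

const LATIN1_SHOWN_AS_CP437 = [
    "\xE6" => "\u{00B5}", // æ (0xE6) -> µ
    "\xF8" => "\u{00B0}", // ø (0xF8) -> °
    "\xE5" => "\u{03C3}", // å (0xE5) -> σ
    "\xC6" => "\u{255E}", // Æ (0xC6) -> ╞
    "\xD8" => "\u{256A}", // Ø (0xD8) -> ╪
    "\xC5" => "\u{253C}", // Å (0xC5) -> ┼
];

// Return the name as a CP437-based tool would display it (UTF-8 output).
function asShownByCp437Tool(string $latin1Name): string
{
    return strtr($latin1Name, LATIN1_SHOWN_AS_CP437);
}
```

The reverse direction breaks in the same way. A WinZip archive stores æ as 0x91, which is a control character in ISO-8859-1, so the names come out wrong again after extraction.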
Jul 17 '05 #5
Chung Leong wrote:
"Roy W. Andersen" <ro******@netgoth.org> wrote in message

WinZip stores filenames in the CP437 (MS-DOS) charset, where Æ = 0x92, and æ
= 0x91 (see http://www.microsoft.com/globaldev/r...e/oem/437.htm). It's
neither Unicode or ISO-8859-1 compatible. Why they're showing up as
characters shown in your chart I'm not sure.


Apparently that was exactly the right piece of information I needed :D

The PclZip class has an option of passing the files through a callback
function before adding them to or extracting them from the archive, and
by using that option I actually managed to get it working.

Before adding the file:
iconv("ISO-8859-1", "CP437", $p_header['filename'])

And, of course, before extracting:
iconv("CP437", "ISO-8859-1", $p_header['filename'])
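Here is the callback approach above, written out as a minimal sketch. It rests on several assumptions. PclZip is assumed to invoke the callbacks as `callback($p_event, &$p_header)` and to carry on when they return 1. The `'filename'` key is taken from the snippets above, but PclZip headers also have a `'stored_filename'` field, so check which one your PclZip version reads for the name written to the archive. Server-side names are assumed to be ISO-8859-1. CP850 is used rather than CP437 because of the follow-up further down the thread. All function and constant names are hypothetical.

```php
<?php
// Sketch of PclZip pre-add / pre-extract callbacks (names hypothetical).
// Assumptions: PclZip calls them as callback($p_event, &$p_header) and
// continues when they return 1; names on the server are ISO-8859-1;
// the archive should use the DOS code page CP850.

const SERVER_CHARSET = 'ISO-8859-1';
const ZIP_CHARSET = 'CP850';

// Convert a header's name in place; leave it untouched if iconv() fails.
function convertHeaderName(array &$p_header, string $from, string $to): void
{
    $converted = iconv($from, $to, $p_header['filename']);
    if ($converted !== false) {
        $p_header['filename'] = $converted;
    }
}

function zipPreAddCallBack($p_event, &$p_header)
{
    convertHeaderName($p_header, SERVER_CHARSET, ZIP_CHARSET);
    return 1; // 1 = go ahead and add the file
}

function zipPreExtractCallBack($p_event, &$p_header)
{
    convertHeaderName($p_header, ZIP_CHARSET, SERVER_CHARSET);
    return 1; // 1 = go ahead and extract the file
}
```

Wiring it up would probably look something like `$archive->create($files, PCLZIP_CB_PRE_ADD, 'zipPreAddCallBack')` and `$archive->extract(PCLZIP_OPT_PATH, 'out', PCLZIP_CB_PRE_EXTRACT, 'zipPreExtractCallBack')`. Confirm those option names against the PclZip manual for your version.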

And now it works! Thank you very much! I'd just added this problem to
the "Known Issues" list for the upcoming release of my project, and now
I can safely remove it again! Thank you thank you thank you! And
everyone else who offered help as well, of course, but this little piece
of info unlocked the riddle :)
Roy W. Andersen
Jul 17 '05 #6
Roy W. Andersen wrote:
Before adding the file:
iconv("ISO-8859-1", "CP437", $p_header['filename'])


I jumped the gun a bit there, but it was at least the right track :)

Using CP850 worked, but CP437 didn't handle all my characters that well
(I remember codepage 850 and/or 865 is what I used back in the days of
good old MS-DOS).
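This sketch shows why CP437 falls short for Norwegian. CP437 has æ, Æ, å and Å, but it has no ø or Ø. CP850 (and CP865) put ø at 0x9B and Ø at 0x9D. The byte values in the table come from the DOS code page tables, and the constant and function names are hypothetical.

```php
<?php
// Sketch: where the Norwegian letters sit in two DOS code pages.
// null means the code page has no such character.

const NORWEGIAN_BYTES = [
    'æ' => ['CP437' => 0x91, 'CP850' => 0x91],
    'Æ' => ['CP437' => 0x92, 'CP850' => 0x92],
    'å' => ['CP437' => 0x86, 'CP850' => 0x86],
    'Å' => ['CP437' => 0x8F, 'CP850' => 0x8F],
    'ø' => ['CP437' => null, 'CP850' => 0x9B], // CP437 has ¢ at 0x9B
    'Ø' => ['CP437' => null, 'CP850' => 0x9D], // CP437 has ¥ at 0x9D
];

// Letters that cannot be represented in the given code page.
function missingLetters(string $codepage): array
{
    $missing = array_filter(NORWEGIAN_BYTES, function (array $bytes) use ($codepage) {
        return $bytes[$codepage] === null;
    });
    return array_keys($missing);
}
```

`missingLetters('CP437')` returns `['ø', 'Ø']`, and `missingLetters('CP850')` returns an empty array.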

Still though, it works now :) Hopefully others with the same problem in
the future have an easier time finding the answer thanks to this thread
- I sure wish I had ;)
Roy W. Andersen
Jul 17 '05 #7
rl
Roy,
rl wrote:
Hello Roy,
[snip]
Thanks for the help, it's appreciated even though it unfortunately
doesn't help me much.

It doesn't look like you've read my post.
I guess I'll just have to wait for the developer of
my class to look into it. I don't really know where to begin looking
for the cause, although I have tried. I suspect the error lies in the
use of pack() and unpack(), and I don't understand how those functions
work (the PHP manual doesn't help me there; it's my limited knowledge of
working with binary files that holds me back). I'm no expert on character
encodings either (I know ISO-8859-1 and UTF-8, but that's about it), so
I've been more or less stumbling around blindly in the code, and when
I've tried some changes I've ended up with corrupted zip-files ;)

pack() and unpack() leave the character encoding completely untouched,
as long as you specify the format codes accordingly. That's why I asked
for the relevant code snippets (if you're allowed to post them).
And: all files to be put into the zip must be opened in binary mode, if
you need to open them yourself at all.
I had no problems putting complete binary files into a database and
fetching them out again via unpack and pack.
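To illustrate the point about pack(), here is a minimal sketch of the fixed part of a zip local file header, with the field order as in PKWARE's APPNOTE. The filename is appended as raw bytes, so pack() itself never re-encodes it: whatever encoding the name has when it reaches this point is what ends up in the archive. CRC and sizes are left to the caller, the time and date fields are simply zeroed, and `localFileHeader()` and `readBinary()` are hypothetical helpers.

```php
<?php
// Sketch: build the 30-byte fixed part of a zip local file header with
// pack() and append the name bytes unchanged. Helper names are hypothetical.

function localFileHeader(string $name, int $crc, int $compSize, int $size): string
{
    return pack(
        'VvvvvvVVVvv',
        0x04034b50,     // local file header signature ("PK\x03\x04")
        20,             // version needed to extract (2.0)
        0,              // general purpose bit flag
        0,              // compression method (0 = stored)
        0,              // last mod file time (DOS format), zeroed here
        0,              // last mod file date (DOS format), zeroed here
        $crc,           // CRC-32 of the uncompressed data
        $compSize,      // compressed size
        $size,          // uncompressed size
        strlen($name),  // file name length in bytes, not characters
        0               // extra field length
    ) . $name;          // name bytes appended untouched
}

// Read a file's contents in binary mode ('rb') so nothing is altered.
function readBinary(string $path): string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $data = stream_get_contents($fh);
    fclose($fh);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $data;
}
```

Because the name length is `strlen()`, it counts bytes. A name converted with iconv() before this point keeps a correct length field, because the conversion happens first.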
I've tried several different settings using setlocale() and it doesn't
make any difference at all, so I've concluded that the zip-class doesn't
use any functions affected by PHP's own locale-setting.

What's strange though is that I can't find any reference to this problem
anywhere on the web or the google Usenet archive, and the problem
doesn't only affect 'æ', 'ø' and 'å' but all special
language-characters. That led me to believe it was the settings on my
server that were causing it, but when testing it on my project's
sourceforge webspace I got the same thing happening there as well, which
again tells me the problem is with the class itself.

Or with your own code ...
Jul 17 '05 #8
