Hi all,
I'm currently facing a problem with htmlentities() and German umlauts.
After moving my scripts to a new box (from Linux to FreeBSD), I found
that htmlentities() no longer works.
The FreeBSD server (FreeBSD 5.1.2) runs PHP 4.3.9 and Apache 2, just as
the Linux server does (or did).
I also tried passing ISO-8859-1 as the charset (third parameter) to
htmlentities(), but it made no difference.
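For reference, here is a minimal sketch of the call described above, with the charset passed explicitly. It assumes the input really is ISO-8859-1. The umlauts are written as escaped bytes (\xE4 for ä, \xF6 for ö) so the result does not depend on how the script file itself is saved. If a script stores its umlauts as some other byte sequence, htmlentities() with "ISO-8859-1" has nothing it recognizes to convert.

```php
<?php
// htmlentities() with an explicit charset as the third parameter.
// "\xE4" and "\xF6" are the ISO-8859-1 bytes for "ä" and "ö".
$latin1 = "Kr\xE4mer & S\xF6hne";
$encoded = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');
echo $encoded, "\n"; // Kr&auml;mer &amp; S&ouml;hne
```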
Any suggestions on how to solve this mysterious misery?
Thx
Rob
.oO(Robert Zierhofer) I currently face a problem with htmlentities and german "umlaute".
Another question: Why do you want to translate them?
Micha
Michael Fesser wrote:
Another question: Why do you want to translate them?
Well, I would say that the HTML entities are a bit more reliable in
terms of browser display than raw ä, ö and ü characters.
Wouldn't you agree? :)
.oO(Robert Zierhofer)
Well, I would say that the HTML equivalents are a bit more reliable in terms of browser display than ä, ö and ü's. Wouldn't you agree :)
No. I deliver my documents as ISO-8859-1 (Latin-1), which contains
umlauts and other "special" chars. All browsers I have available on my
machines are able to handle that. And if you deliver your documents as
UTF-8 you don't really have to care anymore.
I don't think entities are really necessary, except for <, > and &
(and sometimes "). That's why I've never used htmlentities(), only
htmlspecialchars().
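A minimal sketch of this approach, assuming the page is meant to be served as ISO-8859-1: declare the charset in the Content-Type header, then escape only the characters that are special in HTML and leave the umlauts as plain bytes.

```php
<?php
// Declare the document encoding before any output is sent.
header('Content-Type: text/html; charset=ISO-8859-1');

// "\xE4" is the ISO-8859-1 byte for "ä"; it passes through unchanged.
$latin1 = "<b>Kr\xE4mer & Co</b>";
$escaped = htmlspecialchars($latin1, ENT_NOQUOTES, 'ISO-8859-1');
echo $escaped, "\n";
```

With that header in place, the browser decodes byte 0xE4 as ä without needing &auml;.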
Micha
Robert Zierhofer <ro*@starbugg.de> wrote: Well, I would say that the HTML equivalents are a bit more reliable in terms of browser display than ä, ö and ü's. Wouldn't you agree :)
No, the document encoding of HTML is Unicode. ISO-8859-1 characters
are part of that character set. To be more precise, the first 256
characters of Unicode are identical to US-ASCII plus ISO-8859-1.
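This point has a practical consequence, shown in the minimal sketch below. Because the first 256 Unicode code points coincide with ISO-8859-1, the value of a Latin-1 byte is its code point. An umlaut can therefore be written as a numeric character reference such as &#228;, which means ä regardless of the charset the page is served in. latin1_to_ncr() is a hypothetical helper. It assumes ISO-8859-1 input and does not touch <, > or &, so in practice it would be combined with htmlspecialchars().

```php
<?php
// Hypothetical helper: replace each printable non-ASCII ISO-8859-1 byte
// with a numeric character reference. The byte value is used directly as
// the code point, which is valid because U+0000..U+00FF match ISO-8859-1.
function latin1_to_ncr($latin1)
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $code = ord($latin1[$i]);
        // 0x80..0x9F are C1 control codes in ISO-8859-1 and are left as-is.
        if ($code >= 0xA0) {
            $out .= '&#' . $code . ';';
        } else {
            $out .= $latin1[$i];
        }
    }
    return $out;
}

echo latin1_to_ncr("Kr\xE4mer"), "\n"; // Kr&#228;mer
```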
Micha,
I also deliver my documents in ISO-8859-1. Do you use Windows?
I ask because I don't, and umlauts are not displayed properly in any of my browsers.
Greetings
Rob
.oO(Robert Zierhofer) I also deliver my documents in ISO-8859-1. Do you use Windows?
Yep, Win2k most of the time, but I'm also using Linux from time to time.
As I do not and on all of my Browsers umlauts are not properly displayed.
OK, which browsers on which OS? Does it happen in general or only on
particular websites? Can you give an example URL that uses no entities
and does not display correctly on your system?
Micha
Michael Fesser wrote:
OK, which browsers on which OS? Does it happen in general or only on particular websites? Can you give an example URL that uses no entities and does not display correctly on your system?
Ok,
OS -> Mac OS X
Browsers -> Safari, Firefox, Mozilla
Yep, it happens in general... at least I think so :)
Nope, let me correct myself: it does not happen in general.
But I can't name the exceptions. And as you know from maths
class, one exception is enough to prove that a theory is wrong.
My site, the one with the htmlentities problem, is not reachable yet
without editing your hosts file.
If you want to do so, add this entry:
213.203.227.121 kingstoncorner.de
In my browsers, the example phrase looks like this:
Hallo & <Frau> & KrŠmer, hŠtten Sie šffentliches GetŸmmel vermeiden kšnnen?
This is what I used:
$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel vermeiden können?";
$encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1");
print $encoded;
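One explanation fits the garbled output above, though the thread does not establish it and it is offered here only as an assumption. In the Mac OS Roman encoding, ä, ö and ü are the bytes 0x8A, 0x9A and 0x9F. A browser that reads a page labelled ISO-8859-1 as Windows-1252 shows exactly those bytes as Š, š and Ÿ. If the script was saved as Mac Roman on the Mac OS X machine, htmlentities() with "ISO-8859-1" finds no Latin-1 umlauts to convert. The sketch below inspects the raw bytes to tell the common cases apart. guess_auml_encoding() is a hypothetical helper, and it only looks for "ä".

```php
<?php
// Heuristic: guess which encoding an "ä" in a byte string was saved in.
// Only three candidates are checked (an assumption for this sketch):
//   UTF-8      -> bytes C3 A4
//   ISO-8859-1 -> byte  E4
//   Mac Roman  -> byte  8A (shown as "Š" by a Windows-1252 reader)
function guess_auml_encoding($bytes)
{
    if (strpos($bytes, "\xC3\xA4") !== false) {
        return 'UTF-8';
    }
    if (strpos($bytes, "\xE4") !== false) {
        return 'ISO-8859-1';
    }
    if (strpos($bytes, "\x8A") !== false) {
        return 'MacRoman';
    }
    return 'unknown';
}

// Hex dump of a word so the raw bytes can be inspected directly.
echo bin2hex("Kr\x8Amer"), ' => ', guess_auml_encoding("Kr\x8Amer"), "\n"; // 4b728a6d6572 => MacRoman
```

To check an actual script, pass the raw file contents to the helper, for example guess_auml_encoding(file_get_contents('umlaut.php')). If it reports MacRoman, the first thing to try would be re-saving the file as ISO-8859-1 (Latin-1) in the editor.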
Regs
Rob
Robert Zierhofer wrote: This is what I used:
$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel vermeiden können?"; $encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1"); print $encoded;
Running php 4.3.9 on a Debian GNU/Linux system.
php$ cat umlaut.php
<?php
$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
. " vermeiden können?";
echo '1: ', $str, "\n\n";
echo '2: ', htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1"), "\n\n";
echo '3: ', htmlentities($str), "\n\n";
?>
php$ php umlaut.php
1: Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel vermeiden können?
2: Hallo &amp; &lt;Frau&gt; &amp; Krämer, hätten Sie öffentliches Getümmel vermeiden können?
3: Hallo &amp; &lt;Frau&gt; &amp; Kr&auml;mer, h&auml;tten Sie &ouml;ffentliches Get&uuml;mmel vermeiden k&ouml;nnen?
--
Mail to my "From:" address is readable by all at http://www.dodgeit.com/
== ** ## !! ------------------------------------------------ !! ## ** ==
TEXT-ONLY mail to the whole "Reply-To:" address ("My Name" <my@address>)
may bypass my spam filter. If it does, I may reply from another address!
Hi Pedro,
So it really looks as if it is a FreeBSD issue here :(
That's exactly how htmlspecialchars() and htmlentities() behaved
on my old Linux box.
Thx for trying, though.
Do you have any idea what the bug could be in this specific case?
Regs
Rob
Robert Zierhofer wrote: Do you have any idea what could be the bug in that specific case?
No... I didn't check the PHP bugs database :)
Try this:
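<?php
// Start from PHP's own entity table and (re)define the umlaut entries.
// The literal keys must be stored in the same encoding as $str, i.e. in
// whatever encoding this script file is saved in.
$tab = get_html_translation_table(HTML_ENTITIES);
$tab['¨'] = '&uml;';
$tab['Ä'] = '&Auml;';
$tab['Ë'] = '&Euml;';
$tab['Ï'] = '&Iuml;';
$tab['Ö'] = '&Ouml;';
$tab['Ü'] = '&Uuml;';
$tab['ä'] = '&auml;';
$tab['ë'] = '&euml;';
$tab['ï'] = '&iuml;';
$tab['ö'] = '&ouml;';
$tab['ü'] = '&uuml;';
$tab['ÿ'] = '&yuml;';
$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
     . " vermeiden können?";
// strtr() with an array never rescans replaced text, so the '&' => '&amp;'
// entry from the table does not double-encode the new entities.
echo strtr($str, $tab), "\n";
?>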
On 08 Dec 2004 17:40:23 GMT, Daniel Tryba <sp**@tryba.invalid> wrote: Robert Zierhofer <ro*@starbugg.de> wrote: Well, I would say that the HTML equivalents are a bit more reliable in terms of browser display than ä, ö and ü's. Wouldn't you agree :)
No, the document encoding of HTML is Unicode.
Don't you mean the document _character set_ of HTML is Unicode (or even more
precisely ISO10646)? The character encoding can then be any encoding that
represents a subset of Unicode.
Actually, according to HTML 4.01 sec. 5.2.2, it seems the encoding doesn't even
have to be a subset:
"Note. If, for a specific application, it becomes necessary to refer to
characters outside [ISO10646], characters should be assigned to a private zone
to avoid conflicts with present or future versions of the standard. This is
highly discouraged, however, for reasons of portability."
--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
Hi Pedro,
Nice workaround!
It works fine, though I'll keep looking for a general solution to
the problem.
Thank you very much for your help.
Regs
Rob
Andy Hassall <an**@andyh.co.uk> wrote: No, the document encoding of HTML is Unicode. Don't you mean the document _character set_ of HTML is Unicode (or even more precisely ISO10646)?
It's the same to me.
The character encoding can then be any encoding that represents a subset of Unicode.
Character encoding is called "charset" in the HTTP/1.1 RFC:
<q src='http://www.ietf.org/rfc/rfc2616.txt'>
Note: This use of the term "character set" is more commonly
referred to as a "character encoding." However, since HTTP and
MIME share the same registry, it is important that the
terminology also be shared.
</q>
That naming is at the root of all the confusion between the terms
document character set and character encoding.
Internally, HTML uses Unicode; documents get transferred (e.g. by HTTP)
in a specific encoding, mostly for efficiency (why send multiple bytes
per character when one byte suffices because you only use a specific
subset like ISO-8859-1?). And to make things even worse, HTTP gateways
may choose to re-encode the byte stream (e.g. to make it 7-bit clean).
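The efficiency argument can be checked with a small sketch. The same word needs one byte per character in ISO-8859-1, but two bytes for each non-ASCII character in UTF-8, even though the Unicode code points are identical. latin1_to_utf8() is a hypothetical helper written out by hand to show the bit layout. PHP's built-in utf8_encode() performs the same ISO-8859-1 to UTF-8 conversion, though it is deprecated as of PHP 8.2.

```php
<?php
// Re-encode an ISO-8859-1 string as UTF-8 by hand. Each byte is already
// the Unicode code point (U+0000..U+00FF), so only the byte layout changes:
// code points below 0x80 stay one byte, the rest become two bytes.
function latin1_to_utf8($latin1)
{
    $utf8 = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $cp = ord($latin1[$i]);
        if ($cp < 0x80) {
            $utf8 .= $latin1[$i];
        } else {
            // Two-byte form: 110xxxxx 10xxxxxx
            $utf8 .= chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $utf8;
}

$latin1 = "Kr\xE4mer";           // "Krämer" in ISO-8859-1
$utf8 = latin1_to_utf8($latin1); // same characters, different bytes
printf("ISO-8859-1: %d bytes, UTF-8: %d bytes, UTF-8 hex: %s\n",
    strlen($latin1), strlen($utf8), bin2hex($utf8));
```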
Actually, according to HTML 4.01 sec. 5.2.2, it seems the encoding doesn't even have to be a subset:
"Note. If, for a specific application, it becomes necessary to refer to characters outside [ISO10646], characters should be assigned to a private zone to avoid conflicts with present or future versions of the standard. This is highly discouraged, however, for reasons of portability."
So if you have a character that isn't in Unicode, you'll have to add it
to Unicode (the user-defined private zones) to make it work, so it's
Unicode again :)