htmlentities does not translate German "umlaute"

Hi all,

I currently face a problem with htmlentities and German "umlaute".
After moving my scripts to a new box (from Linux to FreeBSD), I found
that htmlentities no longer works.
The BSD server (FreeBSD 5.1.2) runs PHP 4.3.9 and Apache 2, just as
the Linux server did.

I also tried passing ISO-8859-1 as the third parameter (the charset) to
htmlentities, but it made no difference.

Any suggestions on how to solve this mysterious misery?

Thx
Rob
Jul 17 '05 #1
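For reference, a minimal sketch of the call described above, with the charset passed as the third argument. The sample word is an assumption for illustration; `\xE4` is the single ISO-8859-1 byte for ä, written as an escape so the result does not depend on how the script file itself was saved. In current PHP versions the default charset is UTF-8, so the argument matters for Latin-1 data.

```php
<?php
// "Krämer" as ISO-8859-1 bytes: \xE4 is the single byte for "ä".
$latin1 = "Kr\xE4mer";

// Second argument: quote handling. Third argument: the charset of the input.
$encoded = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

echo $encoded, "\n"; // Kr&auml;mer
```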
.oO(Robert Zierhofer)
> I currently face a problem with htmlentities and German "umlaute".

Another question: Why do you want to translate them?

Micha
Jul 17 '05 #2
Well, I would say that the HTML equivalents are a bit more reliable in
terms of browser display than ä, ö and ü's.
Wouldn't you agree :)
Jul 17 '05 #3
.oO(Robert Zierhofer)
> Michael Fesser wrote:
>> Another question: Why do you want to translate them?
>
> Well, I would say that the HTML equivalents are a bit more reliable in
> terms of browser display than ä, ö and ü's.
> Wouldn't you agree :)

No. I deliver my documents as ISO-8859-1 (Latin-1), which contains
umlauts and other "special" chars. All browsers I have available on my
machines are able to handle that. And if you deliver your documents as
UTF-8 you don't really have to care anymore.

I don't think using entities is really necessary, except sometimes for <, >,
" and &. That's why I've never used htmlentities(),
only htmlspecialchars().

Micha
Jul 17 '05 #4
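The distinction drawn above can be sketched as follows: htmlspecialchars() escapes only the characters that have syntactic meaning in HTML, while htmlentities() also replaces every character that has a named entity. The sample string is an assumption for illustration, and the non-ASCII bytes are written as escapes so the script's own encoding does not matter.

```php
<?php
// ISO-8859-1 input: \xE4 = ä, \xFC = ü.
$text = "<b>Kr\xE4mer & S\xFChne</b>";

// Escapes only &, <, > and (with ENT_COMPAT) the double quote.
$special = htmlspecialchars($text, ENT_COMPAT, 'ISO-8859-1');

// Additionally replaces characters that have a named entity.
$entities = htmlentities($text, ENT_COMPAT, 'ISO-8859-1');

// The raw umlaut byte is still present after htmlspecialchars().
var_dump(strpos($special, "\xE4") !== false); // bool(true)

echo $entities, "\n"; // &lt;b&gt;Kr&auml;mer &amp; S&uuml;hne&lt;/b&gt;
```

With htmlspecialchars() the page must be delivered with a charset declaration that matches the bytes, which is the approach the post describes.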
Robert Zierhofer <ro*@starbugg.de> wrote:
> Well, I would say that the HTML equivalents are a bit more reliable in
> terms of browser display than ä, ö and ü's.
> Wouldn't you agree :)

No, the document encoding of HTML is Unicode. ISO-8859-1 characters
are part of that character set; to be more precise, the first 256
characters of Unicode are the same as ISO-8859-1 (which itself
includes US-ASCII).

Jul 17 '05 #5
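The claim about the first 256 code points can be checked mechanically. The sketch below assumes the mbstring extension is available (`mb_convert_encoding()` and `mb_ord()`, the latter from PHP 7.2 on). It converts each possible ISO-8859-1 byte to UTF-8 and compares the resulting Unicode code point with the byte value.

```php
<?php
// Assumption: the mbstring extension is loaded.
$mismatches = [];

for ($byte = 0; $byte <= 255; $byte++) {
    // Interpret the single byte as ISO-8859-1 and re-encode it as UTF-8.
    $utf8 = mb_convert_encoding(chr($byte), 'UTF-8', 'ISO-8859-1');

    // mb_ord() returns the Unicode code point of the first character.
    if (mb_ord($utf8, 'UTF-8') !== $byte) {
        $mismatches[] = $byte;
    }
}

echo count($mismatches) === 0
    ? "All 256 ISO-8859-1 bytes map to the same Unicode code point\n"
    : "Mismatches: " . implode(', ', $mismatches) . "\n";
```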
Micha,

I also deliver my documents in ISO-8859-1. Do you use Windows?
I do not, and umlauts are not displayed properly in any of my browsers.

Greetings
Rob
Jul 17 '05 #6
.oO(Robert Zierhofer)
> I also deliver my documents in ISO-8859-1. Do you use Windows?

Yep, Win2k most of the time, but I'm also using Linux from time to time.

> I do not, and umlauts are not displayed properly in any of my browsers.

OK, which browsers on which OS? Does it happen in general or only on
particular websites? Can you give an example URL that uses no entities
and does not display correctly on your system?

Micha
Jul 17 '05 #7
Ok,
OS -> Mac OS X
Browsers -> Safari, Firefox, Mozilla
Yep, it happens in general... I think, at least :)
No, let me correct myself: it does not happen in general,
but I cannot name the exceptions. And as you know from maths
class, one exception is enough to prove a theory wrong.
My site, the one with the htmlentities problem, is not reachable yet
without editing your hosts file.
But if you want to do so, use

213.203.227.121 kingstoncorner.de

In my browsers the example phrase looks like this:

Hallo & <Frau> & KrŠmer, hŠtten Sie šffentliches GetŸmmel vermeiden kšnnen?

This is what I used:

$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel
vermeiden können?";
$encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1");
print $encoded;

Regs
Rob
Jul 17 '05 #8
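The thread never establishes why the umlauts come out as Š, š and Ÿ. One hypothesis, not confirmed anywhere in the discussion, is that the script file was saved on the Mac in Mac OS Roman rather than ISO-8859-1: Mac OS Roman stores ä, ö and ü as the bytes 0x8A, 0x9A and 0x9F, and those same bytes are Š, š and Ÿ in Windows-1252, which browsers commonly substitute for ISO-8859-1. The sketch below shows one way to check such a suspicion by dumping the bytes of a string. The helper name `describe_bytes` and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: show the raw bytes of a string as hex pairs.
function describe_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// "hätten" as it would be stored in three different encodings.
$latin1   = "h\xE4tten";      // ISO-8859-1: ä is E4
$utf8     = "h\xC3\xA4tten";  // UTF-8: ä is C3 A4
$macRoman = "h\x8Atten";      // Mac OS Roman: ä is 8A

echo describe_bytes($latin1), "\n";   // 68 e4 74 74 65 6e
echo describe_bytes($utf8), "\n";     // 68 c3 a4 74 74 65 6e
echo describe_bytes($macRoman), "\n"; // 68 8a 74 74 65 6e

// When the bytes match the declared charset, the umlaut becomes an entity.
echo htmlentities($latin1, ENT_NOQUOTES, 'ISO-8859-1'), "\n"; // h&auml;tten
```

Running `describe_bytes()` on the actual `$str` from the script would show which of these byte sequences the server really receives.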
Robert Zierhofer wrote:
> This is what I used:
>
> $str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel
> vermeiden können?";
> $encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1");
> print $encoded;

Running PHP 4.3.9 on a Debian GNU/Linux system.

php$ cat umlaut.php
<?php
$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
. " vermeiden können?";

echo '1: ', $str, "\n\n";
echo '2: ', htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1"), "\n\n";
echo '3: ', htmlentities($str), "\n\n";
?>
php$ php umlaut.php
1: Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel vermeiden
können?

2: Hallo &amp; &lt;Frau&gt; &amp; Krämer, hätten Sie öffentliches
Getümmel vermeiden können?

3: Hallo &amp; &lt;Frau&gt; &amp; Kr&auml;mer, h&auml;tten Sie
&ouml;ffentliches Get&uuml;mmel vermeiden k&ouml;nnen?
--
Mail to my "From:" address is readable by all at http://www.dodgeit.com/
== ** ## !! ------------------------------------------------ !! ## ** ==
TEXT-ONLY mail to the whole "Reply-To:" address ("My Name" <my@address>)
may bypass my spam filter. If it does, I may reply from another address!
Jul 17 '05 #9
Hi Pedro,

so it really looks like a FreeBSD issue here :(
That is exactly how htmlspecialchars() and htmlentities() behaved
on my old Linux box.

Thanks for trying, though.
Do you have any idea what could be the bug in that specific case?
Regs
Rob
Jul 17 '05 #10
Robert Zierhofer wrote:
> Do you have any idea what could be the bug in that specific case?

No ... I didn't check the PHP bug database :)

Try this:
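<?php
// Start from PHP's built-in entity table...
$tab = get_html_translation_table(HTML_ENTITIES);
// ...and set the diaeresis entries explicitly in case the table lacks them.
// The keys must be in the same encoding as $str.
$tab['¨'] = '&uml;';
$tab['Ä'] = '&Auml;';
$tab['Ë'] = '&Euml;';
$tab['Ï'] = '&Iuml;';
$tab['Ö'] = '&Ouml;';
$tab['Ü'] = '&Uuml;';
$tab['ä'] = '&auml;';
$tab['ë'] = '&euml;';
$tab['ï'] = '&iuml;';
$tab['ö'] = '&ouml;';
$tab['ü'] = '&uuml;';
$tab['ÿ'] = '&yuml;';

$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
. " vermeiden können?";
// strtr() replaces every key found in $str with its entity.
echo strtr($str, $tab), "\n";
?>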

Jul 17 '05 #11
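One fragility of the workaround is that the array keys are whatever bytes the editor wrote for each umlaut, so it works only if the script file and the data share an encoding. A variant that avoids this, sketched here under the assumption of a PHP version in which `get_html_translation_table()` accepts an encoding argument, asks PHP for the ISO-8859-1 table directly and writes the test input with byte escapes. The shortened sample string is for illustration only.

```php
<?php
// Ask for the full entity table with keys encoded as ISO-8859-1 bytes.
$tab = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'ISO-8859-1');

// ISO-8859-1 input: \xE4 = ä, \xF6 = ö, \xFC = ü.
$str = "Kr\xE4mer, \xF6ffentliches Get\xFCmmel & <Frau>";

echo strtr($str, $tab), "\n";
// Kr&auml;mer, &ouml;ffentliches Get&uuml;mmel &amp; &lt;Frau&gt;
```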
On 08 Dec 2004 17:40:23 GMT, Daniel Tryba <sp**@tryba.invalid> wrote:
> Robert Zierhofer <ro*@starbugg.de> wrote:
>> Well, I would say that the HTML equivalents are a bit more reliable in
>> terms of browser display than ä, ö and ü's.
>> Wouldn't you agree :)
>
> No, the document encoding of HTML is Unicode.

Don't you mean the document _character set_ of HTML is Unicode (or, even more
precisely, ISO 10646)? The character encoding can then be any encoding that
represents a subset of Unicode.

Actually, according to HTML 4.01 sec. 5.2.2, it seems the encoding doesn't even
have to be a subset:

"Note. If, for a specific application, it becomes necessary to refer to
characters outside [ISO10646], characters should be assigned to a private zone
to avoid conflicts with present or future versions of the standard. This is
highly discouraged, however, for reasons of portability."

--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
Jul 17 '05 #12
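The distinction between the document character set and the character encoding can be made concrete with a numeric character reference, which names a Unicode code point no matter how the document's bytes are encoded. A minimal sketch, with the sample reference chosen for illustration:

```php
<?php
// &#228; refers to code point U+00E4 (ä) in the document character set.
$reference = '&#228;';

// The same code point, serialized in two different character encodings.
$asLatin1 = html_entity_decode($reference, ENT_QUOTES, 'ISO-8859-1');
$asUtf8   = html_entity_decode($reference, ENT_QUOTES, 'UTF-8');

echo bin2hex($asLatin1), "\n"; // e4   (one byte)
echo bin2hex($asUtf8), "\n";   // c3a4 (two bytes)
```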
Hi Pedro,
nice workaround!
It works fine, though I will continue to look for a general solution to
the problem.
Thank you very much for your help.
Regs
Rob
Jul 17 '05 #13
Andy Hassall <an**@andyh.co.uk> wrote:
>> No, the document encoding of HTML is Unicode.
>
> Don't you mean the document _character set_ of HTML is Unicode (or, even more
> precisely, ISO 10646)?

It's the same to me.

> The character encoding can then be any encoding that represents a
> subset of Unicode.

Character encoding is called charset in the HTTP/1.1 RFC:

<q src='http://www.ietf.org/rfc/rfc2616.txt'>
Note: This use of the term "character set" is more commonly
referred to as a "character encoding." However, since HTTP and
MIME share the same registry, it is important that the
terminology also be shared.
</q>

That is at the root of all the confusion over the terms document/character
encoding.

HTML uses Unicode internally; the documents get transferred, e.g. by HTTP, in
a specific encoding, mostly for efficiency (why send multiple bytes
per character when 1 byte suffices if you only use a
specific subset like ISO-8859-1?). And to make things even worse, two HTTP
gateways may choose to re-encode the byte stream (e.g. to make it 7-bit clean).

> Actually, according to HTML 4.01 sec. 5.2.2, it seems the encoding doesn't even
> have to be a subset:
>
> "Note. If, for a specific application, it becomes necessary to refer to
> characters outside [ISO10646], characters should be assigned to a private zone
> to avoid conflicts with present or future versions of the standard. This is
> highly discouraged, however, for reasons of portability."

So if you have a character that isn't in Unicode, you'll have to add it
to Unicode (the user-defined private zones) to make it work, so it's
Unicode again :)
Jul 17 '05 #14
