Bytes | Software Development & Data Engineering Community

htmlentities does not translate German umlauts

Hi all,

I am currently facing a problem with htmlentities and German umlauts.
After moving my scripts to a new box (from Linux to FreeBSD), I found
that htmlentities is not working anymore.
The BSD server (FreeBSD 5.1.2) runs PHP 4.3.9 and Apache 2, just as
the Linux server does/did.

I also tried passing the charset ISO-8859-1 as the 3rd parameter to
htmlentities, but without result.

Any suggestions how to solve this mysterious misery?

Thx
Rob
Jul 17 '05 #1
.oO(Robert Zierhofer)
> I am currently facing a problem with htmlentities and German umlauts.

Another question: Why do you want to translate them?

Micha
Jul 17 '05 #2
Well, I would say that the HTML equivalents are a bit more reliable in
terms of browser display than ä, ö and ü's.
Wouldn't you agree :)
Jul 17 '05 #3
.oO(Robert Zierhofer)
> Michael Fesser wrote:
>> Another question: Why do you want to translate them?
>
> Well, I would say that the HTML equivalents are a bit more reliable in
> terms of browser display than ä, ö and ü's.
> Wouldn't you agree :)

No. I deliver my documents as ISO-8859-1 (Latin-1), which contains
umlauts and other "special" chars. All browsers I have available on my
machines are able to handle that. And if you deliver your documents as
UTF-8 you don't really have to care anymore.

I don't think using entities is really necessary, except for &lt;, &gt;,
&quot; and &amp; sometimes. That's why I've never used htmlentities(),
only htmlspecialchars().

Micha
Jul 17 '05 #4
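A minimal sketch of the approach Micha describes, assuming modern PHP and a page that really is served as ISO-8859-1: label the charset in the Content-Type header and escape only the markup characters with htmlspecialchars(). The ä is written as the byte escape "\xE4" so the result does not depend on the encoding the script file is saved in; the variable names are illustrative only.

```php
<?php
// Sketch: escape only the markup characters and label the page's charset.
// Assumes the text really is ISO-8859-1; ä is written as the byte escape
// "\xE4" so the result does not depend on how this file is saved.
$str = "Kr\xE4mer & <Frau>";

// htmlspecialchars(): only &, < and > become entities here (ENT_NOQUOTES
// leaves quotes alone); the Latin-1 byte for ä passes through untouched.
$special = htmlspecialchars($str, ENT_NOQUOTES, 'ISO-8859-1');

// htmlentities(): additionally converts every Latin-1 character that has a
// named entity, so ä becomes &auml;.
$entities = htmlentities($str, ENT_NOQUOTES, 'ISO-8859-1');

if (PHP_SAPI !== 'cli') {
    // The charset label tells the browser how to read the raw ä byte.
    header('Content-Type: text/html; charset=ISO-8859-1');
}

echo $special, "\n";
echo $entities, "\n"; // Kr&auml;mer &amp; &lt;Frau&gt;
```

With the header in place, both outputs display the same text; the htmlspecialchars() version simply sends ä as one raw byte instead of an entity.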
Robert Zierhofer <ro*@starbugg.de> wrote:
> Well, I would say that the HTML equivalents are a bit more reliable in
> terms of browser display than ä, ö and ü's.
> Wouldn't you agree :)

No, the document encoding of HTML is Unicode. ISO-8859-1 characters
are part of that character set; to be more precise, the first 256
characters of Unicode are equal to US-ASCII plus ISO-8859-1.

Jul 17 '05 #5
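Daniel's point about the first 256 Unicode characters can be checked directly. A sketch assuming modern PHP (the function name is hypothetical): the numeric character reference &#N; names Unicode code point N, and decoding it into ISO-8859-1 should yield exactly the single byte N. The C1 control range 0x80-0x9F is left out because it contains no printable characters.

```php
<?php
// Hypothetical helper: returns true if, for every printable code point in
// U+0020-U+007E and U+00A0-U+00FF, the reference &#N; decodes to byte N.
function latin1_matches_unicode(): bool
{
    // 0x80-0x9F (C1 control codes) is skipped: not printable text.
    $codePoints = array_merge(range(0x20, 0x7E), range(0xA0, 0xFF));
    foreach ($codePoints as $cp) {
        $decoded = html_entity_decode("&#$cp;", ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');
        if ($decoded !== chr($cp)) {
            return false;
        }
    }
    return true;
}

var_dump(latin1_matches_unicode());
// ä is U+00E4 (decimal 228) and also the byte 0xE4 in ISO-8859-1:
var_dump(html_entity_decode('&#228;', ENT_QUOTES | ENT_HTML401, 'ISO-8859-1') === "\xE4");
```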
Micha,

I also deliver my documents in ISO-8859-1. Do you use Windows?
I don't, and in all of my browsers umlauts are not displayed properly.

Greetings
Rob
Jul 17 '05 #6
.oO(Robert Zierhofer)
> I also deliver my documents in ISO-8859-1. Do you use Windows?

Yep, Win2k most of the time, but I'm also using Linux from time to time.

> I don't, and in all of my browsers umlauts are not displayed properly.

OK, what browsers on what OS? Does it happen in general or only on
particular websites? Can you give an example URL that uses no entities
and does not display correctly on your system?

Micha
Jul 17 '05 #7
Ok,
OS -> Mac OS X
Browsers -> Safari, Firefox, Mozilla
Yep, it happens in general... I think at least :)
Nope, let me correct myself: it does not happen in general.
But I cannot name the exceptions. And as you know from your maths
class, one exception is enough to prove that a theory is wrong.
My site, the one with the htmlentities problem, is not reachable yet
without editing your hosts file.
But if you want to do so, use

213.203.227.121 kingstoncorner.de

The example phrase looks like this in my browsers:

Hallo & <Frau> & KrŠmer, hŠtten Sie šffentliches GetŸmmel vermeiden kšnnen?

This is what I used:

$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
. " vermeiden können?";
$encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1");
print $encoded;

Regs
Rob
Jul 17 '05 #8
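One possible explanation for the garbled phrase above (an assumption, not confirmed in the thread): Š, š and Ÿ are exactly what ä, ö and ü turn into when text saved in the classic Mac Roman encoding is displayed as Windows-1252, which browsers commonly use for pages labelled ISO-8859-1. In Mac Roman, ä, ö and ü are the bytes 0x8A, 0x9A and 0x9F, which are not umlauts in ISO-8859-1, so htmlentities() has nothing to translate. A sketch assuming modern PHP; macroman_umlauts_to_latin1() is a hypothetical helper that only covers the German umlauts and ß.

```php
<?php
// Hypothesis sketch: Mac Roman stores ä, ö, ü as 0x8A, 0x9A, 0x9F,
// while ISO-8859-1 uses 0xE4, 0xF6, 0xFC.

// Hypothetical helper: remaps only the German umlauts and ß from Mac Roman
// to ISO-8859-1; every other byte is left unchanged.
function macroman_umlauts_to_latin1(string $s): string
{
    return strtr($s, [
        "\x80" => "\xC4", // Ä
        "\x85" => "\xD6", // Ö
        "\x86" => "\xDC", // Ü
        "\x8A" => "\xE4", // ä
        "\x9A" => "\xF6", // ö
        "\x9F" => "\xFC", // ü
        "\xA7" => "\xDF", // ß
    ]);
}

$macRoman = "Kr\x8Amer, \x9Affentliches Get\x9Fmmel"; // bytes as saved in Mac Roman
$latin1   = "Kr\xE4mer, \xF6ffentliches Get\xFCmmel"; // bytes expected for ISO-8859-1

// No ISO-8859-1 entity exists for 0x8A/0x9A/0x9F, so these bytes pass
// through unchanged; read as Windows-1252 they display as Š, š and Ÿ.
$before = htmlentities($macRoman, ENT_NOQUOTES, 'ISO-8859-1');

// After remapping, the umlauts are real Latin-1 characters and get entities.
$after = htmlentities(macroman_umlauts_to_latin1($macRoman), ENT_NOQUOTES, 'ISO-8859-1');

echo $after, "\n"; // Kr&auml;mer, &ouml;ffentliches Get&uuml;mmel
```

The remapping must only be applied to text that really is Mac Roman: in ISO-8859-1, 0xA7 is § and would be wrongly turned into ß.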
Robert Zierhofer wrote:
> This is what I used:
>
> $str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel
> vermeiden können?";
> $encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1");
> print $encoded;

Running php 4.3.9 on a Debian GNU/Linux system.

php$ cat umlaut.php
<?php
$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
. " vermeiden können?";

echo '1: ', $str, "\n\n";
echo '2: ', htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1"), "\n\n";
echo '3: ', htmlentities($str), "\n\n";
?>
php$ php umlaut.php
1: Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel vermeiden
können?

2: Hallo &amp; &lt;Frau&gt; &amp; Krämer, hätten Sie öffentliches
Getümmel vermeiden können?

3: Hallo &amp; &lt;Frau&gt; &amp; Kr&auml;mer, h&auml;tten Sie
&ouml;ffentliches Get&uuml;mmel vermeiden k&ouml;nnen?
--
Mail to my "From:" address is readable by all at http://www.dodgeit.com/
== ** ## !! ------------------------------------------------ !! ## ** ==
TEXT-ONLY mail to the whole "Reply-To:" address ("My Name" <my@address>)
may bypass my spam filter. If it does, I may reply from another address!
Jul 17 '05 #9
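When the same script behaves differently on two machines, it can help to look at the bytes a string literal actually contains. A sketch assuming PHP 7 or later (byte_dump() is a hypothetical helper): in ISO-8859-1, ä is the single byte e4; in UTF-8 it is c3 a4; in Mac Roman it is 8a.

```php
<?php
// Hypothetical helper: show a string's bytes as space-separated hex pairs.
function byte_dump(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

echo byte_dump("Kr\xE4mer"), "\n";   // ISO-8859-1: 4b 72 e4 6d 65 72
echo byte_dump("Kr\u{E4}mer"), "\n"; // UTF-8:      4b 72 c3 a4 6d 65 72
echo byte_dump("Kr\x8Amer"), "\n";   // Mac Roman:  4b 72 8a 6d 65 72
```

Calling byte_dump($str) on the $str from the test script would show which of these encodings the file was saved in.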
Hi Pedro,

So it really looks as if it is a FreeBSD issue here :(
Because this is exactly how htmlspecialchars() and htmlentities()
behaved on my old Linux box.

Thx for trying though.
Do you have any idea what could be the bug in that specific case?
Regs
Rob
Jul 17 '05 #10
Robert Zierhofer wrote:
> Do you have any idea what could be the bug in that specific case?

No... I didn't check the PHP bugs database :)

Try this:
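<?php
// Start from PHP's built-in entity table.
$tab = get_html_translation_table(HTML_ENTITIES);
// Add (or overwrite) the umlaut entries. The keys are literal characters,
// so they are in whatever encoding this script file is saved in.
$tab['¨'] = '&uml;';
$tab['Ä'] = '&Auml;';
$tab['Ë'] = '&Euml;';
$tab['Ï'] = '&Iuml;';
$tab['Ö'] = '&Ouml;';
$tab['Ü'] = '&Uuml;';
$tab['ä'] = '&auml;';
$tab['ë'] = '&euml;';
$tab['ï'] = '&iuml;';
$tab['ö'] = '&ouml;';
$tab['ü'] = '&uuml;';
$tab['ÿ'] = '&yuml;';

$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
. " vermeiden können?";
// strtr() replaces every key found in $str with its entity, in one pass.
echo strtr($str, $tab), "\n";
?>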

Jul 17 '05 #11
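In Pedro's workaround the override keys are literal characters, so they carry the same encoding as the literal $str in the same file; that may be why it succeeds where htmlentities() did not. For comparison, a sketch of the same strtr() idea with the charset stated explicitly, assuming modern PHP (the flags and charset arguments of get_html_translation_table() are not available in the PHP 4.3 used in this thread). The umlauts are written as byte escapes, and the result is compared against htmlentities().

```php
<?php
// Sketch: build the entity table for ISO-8859-1 explicitly. Its keys are
// then single Latin-1 bytes, e.g. "\xE4" => '&auml;'.
$tab = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'ISO-8859-1');

// Byte escapes make the input independent of the file's own encoding.
$str = "Hallo & <Frau> & Kr\xE4mer, h\xE4tten Sie \xF6ffentliches Get\xFCmmel"
     . " vermeiden k\xF6nnen?";

// strtr() with an array does a single pass, so "&amp;" is not re-escaped.
$viaTable = strtr($str, $tab);
$viaFunc  = htmlentities($str, ENT_NOQUOTES | ENT_HTML401, 'ISO-8859-1');

echo $viaTable, "\n";
var_dump($viaTable === $viaFunc);
```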
On 08 Dec 2004 17:40:23 GMT, Daniel Tryba <sp**@tryba.invalid> wrote:
> Robert Zierhofer <ro*@starbugg.de> wrote:
>> Well, I would say that the HTML equivalents are a bit more reliable in
>> terms of browser display than ä, ö and ü's.
>> Wouldn't you agree :)
>
> No, the document encoding of HTML is Unicode.

Don't you mean the document _character set_ of HTML is Unicode (or even more
precisely ISO10646)? The character encoding can then be any encoding that
represents a subset of Unicode.

Actually, according to HTML 4.01 sec. 5.2.2, it seems the encoding doesn't even
have to be a subset:

"Note. If, for a specific application, it becomes necessary to refer to
characters outside [ISO10646], characters should be assigned to a private zone
to avoid conflicts with present or future versions of the standard. This is
highly discouraged, however, for reasons of portability."

--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
Jul 17 '05 #12
Hi Pedro,
nice workaround!
It works fine, though I will continue to look for an overall solution
to the problem.
Thank you very much for your help.
Regs
Rob
Jul 17 '05 #13
Andy Hassall <an**@andyh.co.uk> wrote:
>> No, the document encoding of HTML is Unicode.
> Don't you mean the document _character set_ of HTML is Unicode (or even more
> precisely ISO10646)?

It's the same to me.

> The character encoding can then be any encoding that represents a
> subset of Unicode.
The character encoding, called "charset" in the HTTP/1.1 RFC, is at the
root of all the confusion between the terms "document character set" and
"character encoding":
<q src='http://www.ietf.org/rfc/rfc2616.txt'>
Note: This use of the term "character set" is more commonly
referred to as a "character encoding." However, since HTTP and
MIME share the same registry, it is important that the
terminology also be shared.
</q>

HTML uses Unicode internally; the documents get transferred (e.g. by HTTP)
in a specific encoding, mostly for efficiency (why send multiple bytes
per character when 1 byte suffices if you only use a specific subset
like ISO-8859-1?). And to make things even worse, HTTP gateways may
choose to encode the byte stream (e.g. to make it 7-bit clean).
> Actually, according to HTML 4.01 sec. 5.2.2, it seems the encoding doesn't even
> have to be a subset:
>
> "Note. If, for a specific application, it becomes necessary to refer to
> characters outside [ISO10646], characters should be assigned to a private zone
> to avoid conflicts with present or future versions of the standard. This is
> highly discouraged, however, for reasons of portability."


So if you have a character that isn't in Unicode, you'll have to add it
to Unicode (in the user-defined private zones) to make it work, so it's
Unicode again :)
Jul 17 '05 #14
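Daniel's efficiency argument can be made concrete. Because ISO-8859-1 byte values equal the Unicode code points U+0000 to U+00FF, Latin-1 text can be re-encoded as UTF-8 with simple arithmetic, and every non-ASCII character then costs two bytes instead of one. A sketch assuming PHP 7 or later; latin1_to_utf8() is a hypothetical helper (in practice the iconv or mbstring extensions would do this conversion).

```php
<?php
// Hypothetical helper: convert ISO-8859-1 to UTF-8 byte by byte, using the
// fact that each Latin-1 byte value is its Unicode code point.
function latin1_to_utf8(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $cp = ord($latin1[$i]);
        if ($cp < 0x80) {
            $out .= chr($cp);                      // ASCII: 1 byte
        } else {
            $out .= chr(0xC0 | ($cp >> 6))         // U+0080-U+00FF: 2 bytes
                  . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $out;
}

$latin1 = "Kr\xE4mer";              // 6 characters, 6 bytes
$utf8   = latin1_to_utf8($latin1);  // same 6 characters, 7 bytes
echo strlen($latin1), ' ', strlen($utf8), "\n"; // 6 7
var_dump($utf8 === "Kr\u{E4}mer");
```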
