htmlentities does not translate German "umlaute"

Robert Zierhofer

Hi all,

I am currently facing a problem with htmlentities and German umlauts.
After moving my scripts to a new box (from Linux to FreeBSD), I found
that htmlentities no longer works.
The BSD server (FreeBSD 5.1.2) runs PHP 4.3.9 and Apache 2, just as
the Linux server does/did.

I also tried passing ISO-8859-1 as the charset in the 3rd parameter of
htmlentities, but without result.

Any suggestions on how to solve this mysterious misery?

Thx
Rob

Jul 17 '05 #1

Michael Fesser

.oO(Robert Zierhofer)

I currently face a problem with htmlentities and german "umlaute".

Another question: Why do you want to translate them?

Micha

Jul 17 '05 #2

Robert Zierhofer

Michael Fesser wrote:

.oO(Robert Zierhofer)

I currently face a problem with htmlentities and german "umlaute".

Another question: Why do you want to translate them?

Micha

Well, I would say that the HTML equivalents are a bit more reliable in
terms of browser display than ä, ö and ü's.
Wouldn't you agree :)

Jul 17 '05 #3

Michael Fesser

.oO(Robert Zierhofer)

Michael Fesser wrote:
Another question: Why do you want to translate them?

Well, I would say that the HTML equivalents are a bit more reliable in
terms of browser display than ä, ö and ü's.
Wouldn't you agree :)

No. I deliver my documents as ISO-8859-1 (Latin-1), which contains
umlauts and other "special" chars. All browsers I have available on my
machines are able to handle that. And if you deliver your documents as
UTF-8 you don't really have to care anymore.

I don't think using entities is really necessary, except for <, >,
" and & sometimes. That's why I've never used htmlentities(),
but htmlspecialchars().

Micha

Jul 17 '05 #4
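
A minimal sketch of the approach Micha describes, assuming the strings hold ISO-8859-1 bytes and PHP 7 or later (for the type declarations). The helper name `escape_latin1` is made up for illustration; only `htmlspecialchars()` comes from the thread.

```php
<?php
// Hypothetical helper: escape only the markup-significant characters
// and leave umlauts untouched as raw ISO-8859-1 bytes.
function escape_latin1(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'ISO-8859-1');
}

// When serving a page, the charset would be declared before any output, e.g.
// header('Content-Type: text/html; charset=ISO-8859-1');

// "\xE4" is the single ISO-8859-1 byte for "ä"; it is written as an escape
// so the example does not depend on how this file itself is saved.
echo escape_latin1("Hallo & <Frau> Kr\xE4mer"), "\n";
```

The umlaut byte passes through unchanged, so the browser only renders it correctly if the declared charset matches the bytes actually sent.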

Daniel Tryba

Robert Zierhofer <ro*@starbugg.de> wrote:

Well, I would say that the HTML equivalents are a bit more reliable in
terms of browser display than ä, ö and ü's.
Wouldn't you agree :)

No, the document encoding of HTML is Unicode. iso-8859-1 characters
are part of that character set; to be more precise, the first 256
characters of Unicode are identical to iso-8859-1 (which itself
includes us-ascii).

Jul 17 '05 #5
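
The overlap between ISO-8859-1 and the first 256 Unicode code points can be checked with a short sketch like the following. It assumes a reasonably current PHP; the variable names are only illustrative.

```php
<?php
// For a single ISO-8859-1 byte, the Unicode code point equals the byte value.
$latin1    = "\xE4";          // "ä" in ISO-8859-1
$codePoint = ord($latin1);    // 228

// Decode the numeric character reference for that code point to UTF-8.
$fromReference = html_entity_decode('&#' . $codePoint . ';', ENT_QUOTES, 'UTF-8');

// The named entity decodes to the same character.
$fromNamed = html_entity_decode('&auml;', ENT_QUOTES, 'UTF-8');

echo $codePoint, "\n";               // 228
echo bin2hex($fromReference), "\n";  // c3a4
echo var_export($fromReference === $fromNamed, true), "\n"; // true
```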

Robert Zierhofer

Michael Fesser wrote:

.oO(Robert Zierhofer)

Michael Fesser wrote:

Another question: Why do you want to translate them?

Well, I would say that the HTML equivalents are a bit more reliable in
terms of browser display than ä, ö and ü's.
Wouldn't you agree :)

No. I deliver my documents as ISO-8859-1 (Latin-1), which contains
umlauts and other "special" chars. All browsers I have available on my
machines are able to handle that. And if you deliver your documents as
UTF-8 you don't really have to care anymore.

I don't think using entities is really necessary, except for <, >,
" and & sometimes. That's why I've never used htmlentities(),
but htmlspecialchars().

Micha

Micha,

I also deliver my documents in ISO-8859-1. Do you use Windows?
I do not, and in all of my browsers umlauts are not displayed properly.

Greetings
Rob

Jul 17 '05 #6

Michael Fesser

.oO(Robert Zierhofer)

I also deliver my documents in ISO-8859-1. Do you use Windows?

Yep, Win2k most of the time, but I'm also using Linux from time to time.

I do not, and in all of my browsers umlauts are not displayed properly.

OK, what browsers on what OS? Does it happen in general or only on
particular websites? Can you give an example URL that uses no entities
and does not look correct on your system?

Micha

Jul 17 '05 #7

Robert Zierhofer

Michael Fesser wrote:

.oO(Robert Zierhofer)

I also deliver my documents in ISO-8859-1. Do you use Windows?

Yep, Win2k most of the time, but I'm also using Linux from time to time.

I do not, and in all of my browsers umlauts are not displayed properly.

OK, what browsers on what OS? Does it happen in general or only on
particular websites? Can you give an example URL that uses no entities
and does not look correct on your system?

Micha

OK,
OS -> Mac OS X
Browsers -> Safari, Firefox, Mozilla
Yep, it happens in general... I think, at least :)
No, let me correct myself: it does not happen in general,
but I cannot name the exceptions. And as you know from your maths
class, one exception is enough to prove that a theory is wrong.
My site, the one with the htmlentities problem, is not yet reachable
without editing your hosts file.
But if you want to do so, use

213.203.227.121 kingstoncorner.de

The example phrase looks on my browsers like this:

Hallo & <Frau> & KrŠmer, hŠtten Sie šffentliches GetŸmmel vermeiden kšnnen?

This is what I used:

$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel
vermeiden können?";
$encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1");
print $encoded;

Regs
Rob

Jul 17 '05 #8
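
The Š, š and Ÿ in the sample above are what the MacRoman bytes for ä, ö and ü look like when read as Windows-1252, which hints at a charset mismatch somewhere between editor, server and browser. The thread never confirms where, so the following is only a diagnostic sketch; `describe_bytes` is a hypothetical helper, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical diagnostic: show the raw bytes of a string and whether
// they form valid UTF-8 (preg_match with the /u modifier fails otherwise).
function describe_bytes(string $s): string
{
    $isUtf8 = preg_match('//u', $s) === 1;
    return bin2hex($s) . ($isUtf8 ? ' (valid UTF-8)' : ' (not valid UTF-8)');
}

echo describe_bytes("\xE4"), "\n";      // "ä" in ISO-8859-1: e4 (not valid UTF-8)
echo describe_bytes("\xC3\xA4"), "\n";  // "ä" in UTF-8:      c3a4 (valid UTF-8)
echo describe_bytes("\x8A"), "\n";      // "ä" in MacRoman:   8a (not valid UTF-8)
```

Running something like this on the actual `$str` from the script would show which encoding the source file was saved in after the move.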

Pedro Graca

Robert Zierhofer wrote:

This is what I used:

$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel
vermeiden können?";
$encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1");
print $encoded;

Running php 4.3.9 on a Debian GNU/Linux system.

php$ cat umlaut.php
<?php
$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
. " vermeiden können?";

echo '1: ', $str, "\n\n";
echo '2: ', htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1"), "\n\n";
echo '3: ', htmlentities($str), "\n\n";
?>
php$ php umlaut.php
1: Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel vermeiden
können?

2: Hallo &amp; &lt;Frau&gt; &amp; Krämer, hätten Sie öffentliches
Getümmel vermeiden können?

3: Hallo &amp; &lt;Frau&gt; &amp; Kr&auml;mer, h&auml;tten Sie
&ouml;ffentliches Get&uuml;mmel vermeiden k&ouml;nnen?
--
Mail to my "From:" address is readable by all at http://www.dodgeit.com/
== ** ## !! ------------------------------------------------ !! ## ** ==
TEXT-ONLY mail to the whole "Reply-To:" address ("My Name" <my@address>)
may bypass my spam filter. If it does, I may reply from another address!

Jul 17 '05 #9
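
For comparison, a small sketch of how `htmlentities()` behaves when the charset argument does or does not match the bytes. This reflects current PHP versions rather than the 4.3.9 used in the thread, and the byte escapes are there so the result does not depend on the encoding of the script file.

```php
<?php
$latin1 = "Kr\xE4mer";      // ISO-8859-1 bytes
$utf8   = "Kr\xC3\xA4mer";  // UTF-8 bytes

// Matching charset: both inputs give the same entity.
echo htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1'), "\n"; // Kr&auml;mer
echo htmlentities($utf8, ENT_QUOTES, 'UTF-8'), "\n";        // Kr&auml;mer

// Mismatch: ISO-8859-1 bytes declared as UTF-8 are an invalid sequence,
// and without ENT_SUBSTITUTE or ENT_IGNORE the result is an empty string.
$mismatch = htmlentities($latin1, ENT_QUOTES, 'UTF-8');
var_dump($mismatch); // string(0) ""
```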

Robert Zierhofer

Pedro Graca wrote:

Robert Zierhofer wrote:
This is what I used:

$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel
vermeiden können?";
$encoded = htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1");
print $encoded;

Running php 4.3.9 on a Debian GNU/Linux system.

php$ cat umlaut.php
<?php
$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
. " vermeiden können?";

echo '1: ', $str, "\n\n";
echo '2: ', htmlspecialchars($str, ENT_NOQUOTES, "iso-8859-1"), "\n\n";
echo '3: ', htmlentities($str), "\n\n";
?>
php$ php umlaut.php
1: Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel vermeiden
können?

2: Hallo &amp; &lt;Frau&gt; &amp; Krämer, hätten Sie öffentliches
Getümmel vermeiden können?

3: Hallo &amp; &lt;Frau&gt; &amp; Kr&auml;mer, h&auml;tten Sie
&ouml;ffentliches Get&uuml;mmel vermeiden k&ouml;nnen?

Hi Pedro,

so it really looks as if this is a FreeBSD issue :(
Because this is exactly how htmlspecialchars() and
htmlentities() behaved on my old Linux box.

Thanks for trying, though.
Do you have any idea what could be the bug in that specific case?
Regs
Rob

Jul 17 '05 #10

Pedro Graca

Robert Zierhofer wrote:

Do you have any idea what could be the bug in that specific case?

No ... I didn't check the PHP bugs database :)

Try this
<?php
$tab = get_html_translation_table(HTML_ENTITIES);
$tab['¨'] = '&uml;';
$tab['Ä'] = '&Auml;';
$tab['Ë'] = '&Euml;';
$tab['Ï'] = '&Iuml;';
$tab['Ö'] = '&Ouml;';
$tab['Ü'] = '&Uuml;';
$tab['ä'] = '&auml;';
$tab['ë'] = '&euml;';
$tab['ï'] = '&iuml;';
$tab['ö'] = '&ouml;';
$tab['ü'] = '&uuml;';
$tab['ÿ'] = '&yuml;';

$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
. " vermeiden können?";
echo strtr($str, $tab), "\n";
?>

Jul 17 '05 #11
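
On newer PHP versions the manual table entries may not be needed, because `get_html_translation_table()` accepts an encoding as its third argument (PHP 5.3.4 and later, so not the 4.3.9 from the thread). A sketch under that assumption, with the input given as ISO-8859-1 byte escapes:

```php
<?php
// Build the entity table for ISO-8859-1 input; keys are single Latin-1 bytes.
$tab = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'ISO-8859-1');

// "\xF6" is "ö" and "\xFC" is "ü" in ISO-8859-1.
$str = "K\xF6nnen Sie Get\xFCmmel vermeiden?";

$translated = strtr($str, $tab);
echo $translated, "\n"; // K&ouml;nnen Sie Get&uuml;mmel vermeiden?
```

As with Pedro's version, this only works if the bytes in `$str` really are in the encoding the table was built for.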

Andy Hassall

On 08 Dec 2004 17:40:23 GMT, Daniel Tryba <sp**@tryba.invalid> wrote:

Robert Zierhofer <ro*@starbugg.de> wrote:
Well, I would say that the HTML equivalents are a bit more reliable in
terms of browser display than ä, ö and ü's.
Wouldn't you agree :)

No, the document encoding of HTML is Unicode.

Don't you mean the document _character set_ of HTML is Unicode (or even more
precisely ISO10646)? The character encoding can then be any encoding that
represents a subset of Unicode.

Actually, according to HTML 4.01 sec. 5.2.2, it seems the encoding doesn't even
have to be a subset:

"Note. If, for a specific application, it becomes necessary to refer to
characters outside [ISO10646], characters should be assigned to a private zone
to avoid conflicts with present or future versions of the standard. This is
highly discouraged, however, for reasons of portability."

--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool

Jul 17 '05 #12

Robert Zierhofer

Pedro Graca wrote:

Robert Zierhofer wrote:
Do you have any idea what could be the bug in that specific case?

No ... I didn't check the PHP bugs database :)

Try this
<?php
$tab = get_html_translation_table(HTML_ENTITIES);
$tab['¨'] = '&uml;';
$tab['Ä'] = '&Auml;';
$tab['Ë'] = '&Euml;';
$tab['Ï'] = '&Iuml;';
$tab['Ö'] = '&Ouml;';
$tab['Ü'] = '&Uuml;';
$tab['ä'] = '&auml;';
$tab['ë'] = '&euml;';
$tab['ï'] = '&iuml;';
$tab['ö'] = '&ouml;';
$tab['ü'] = '&uuml;';
$tab['ÿ'] = '&yuml;';

$str = "Hallo & <Frau> & Krämer, hätten Sie öffentliches Getümmel"
. " vermeiden können?";
echo strtr($str, $tab), "\n";
?>

Hi Pedro,
nice workaround!
It works fine, though I will continue to look for an overall solution to
the problem.
Thank you very much for your help.
Regs
Rob

Jul 17 '05 #13

Daniel Tryba

Andy Hassall <an**@andyh.co.uk> wrote:

No, the document encoding of HTML is Unicode.

Don't you mean the document _character set_ of HTML is Unicode (or even more
precisely ISO10646)?

It's the same to me.

The character encoding can then be any encoding that represents a
subset of Unicode.

The character encoding, called charset in the HTTP/1.1 RFC
<q src='http://www.ietf.org/rfc/rfc2616.txt'>
Note: This use of the term "character set" is more commonly
referred to as a "character encoding." However, since HTTP and
MIME share the same registry, it is important that the
terminology also be shared.
</q>

is at the root of all the confusion between the terms document encoding
and character encoding.

HTML uses Unicode internally; the documents get transferred, e.g. by HTTP, in
a specific encoding, mostly for efficiency (why send multiple bytes
per character when 1 byte suffices if you only use a
specific subset like iso-8859-1?). And to make things even worse, two HTTP
gateways may choose to encode the byte stream (e.g. to make it 7-bit clean).

Actually, according to HTML 4.01 sec. 5.2.2, it seems the encoding doesn't even
have to be a subset:

"Note. If, for a specific application, it becomes necessary to refer to
characters outside [ISO10646], characters should be assigned to a private zone
to avoid conflicts with present or future versions of the standard. This is
highly discouraged, however, for reasons of portability."

So if you have a character that isn't in Unicode, you'll have to add it
to Unicode (the user-defined private zones) to make it work, so it is
Unicode again :)

Jul 17 '05 #14
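
The efficiency point can be illustrated by counting bytes for the same text in both encodings. This is just a sketch and assumes the iconv extension is available; the sample word is taken from the test phrase earlier in the thread.

```php
<?php
// "Getümmel" as ISO-8859-1 bytes: every character is one byte.
$latin1 = "Get\xFCmmel";

// The same text in UTF-8: "ü" (U+00FC) becomes the two bytes C3 BC.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

echo strlen($latin1), "\n"; // 8
echo strlen($utf8), "\n";   // 9
echo bin2hex($utf8), "\n";  // 476574c3bc6d6d656c
```

For mostly-ASCII German text the difference is small, one extra byte per umlaut.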
