Diacritical marks in array don't translate

Dear group members,

I came across a glitch in JavaScript and I don't know how to solve it
elegantly.

I have an array with strings of German words:

profile[1] = "Fröhliches Fräulein";

Because HTML doesn't or didn't allow some of these characters, I wrote:

profile[1] = "Fr&ouml;hliches Fr&auml;ulein";

but when I use an alert(profile[1]); the dialog displays the escape
codes instead of the diacritical marks. I then figured the unescape()
function would solve the problem, but it did not. I don't want to write:

profile[1] = "Fr%190hliches Fr%191ulein";
alert(unescape(profile[1]));

The numbers in the above example only serve to illustrate the idea. I
don't know where to look the exact numbers up, unless they are the
ASCII codes. I haven't tried the last technique yet, but I'm pondering
the issue.

Any suggestions,
Jean Biver

________________________________________________
Check out my home page at http://homepage.internet.lu/aibiver
Please recommend my seti@home profile at
http://setiathome2.ssl.berkeley.edu/...dback&id=26539

Nov 11 '05 #1
On 11 Nov 2005 07:09:43 -0800, jiverbean wrote:
I came across a glitch in JavaScript and I don't know how to solve it
elegantly.

I have an array with strings of German words:

profile[1] = "Fröhliches Fräulein";

Because HTML doesn't or didn't allow some of these characters


Then you need to use a different character set for your document. Try
ISO-8859-1, which allows the standard European accented characters.

--
Safalra (Stephen Morley)
http://www.safalra.com/programming/javascript/
Nov 11 '05 #2
jiverbean wrote:
I have an array with strings of German words:

profile[1] = "Fröhliches Fräulein";

Because HTML doesn't or didn't allow some of these characters,
That's an urban legend that will probably never die. HTML allows these
characters, HTTP is and has been 8-bit-safe. You just need to declare
that with the Content-Type header and, for offline use,

<head>
<meta http-equiv="Content-Type" content="text/html; charset=...">
...
</head>

A good reason for escaping 8-bit characters _in HTML_ is editing on
different platforms without having the knowledge or facility (due to
keyboard layout) to type them there.

<http://www.htmlhelp.com/faq/html/design.html#entity-or-number>
I wrote:

profile[1] = "Fr&ouml;hliches Fr&auml;ulein";
JS (programming language) is not HTML (markup language). This source code
has to be interpreted by the JS engine, and it does not and is not supposed
to "know" how to handle SGML character entity references like "&ouml;".

There is no problem you have to work around.
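To illustrate the point, here is a minimal sketch. The JS engine keeps "&ouml;" as six ordinary characters, so if the strings really must arrive with entities in them, something has to translate them explicitly. The helper name decodeGermanEntities and its lookup table are hypothetical and cover only the German named entities; this is not a general HTML entity decoder.

```javascript
// Hypothetical lookup table: only the German named entities, nothing else.
var ENTITY_TO_CHAR = {
  "&auml;": "\u00E4", "&ouml;": "\u00F6", "&uuml;": "\u00FC",
  "&Auml;": "\u00C4", "&Ouml;": "\u00D6", "&Uuml;": "\u00DC",
  "&szlig;": "\u00DF"
};

// Replace every known entity; anything not in the table is left untouched.
function decodeGermanEntities(text) {
  return text.replace(/&[A-Za-z]+;/g, function (entity) {
    return Object.prototype.hasOwnProperty.call(ENTITY_TO_CHAR, entity)
      ? ENTITY_TO_CHAR[entity]
      : entity;
  });
}

// To the script engine the entity is just text: "&", "o", "u", "m", "l", ";".
var raw = "Fr&ouml;hliches Fr&auml;ulein";
var decoded = decodeGermanEntities(raw);
```

With this, alert(decoded) would show the umlauts, while alert(raw) shows the entity text exactly as the original poster described. Writing the characters directly in a correctly declared file is still the simpler route.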
but when I use an alert(profile[1]); the dialog displays the escape
codes instead of the diacritical marks. I then figured the unescape()
function would solve the problem, but it did not. I don't want to write:

profile[1] = "Fr%190hliches Fr%191ulein";
alert(unescape(profile[1]));
It is not supposed to work anyway. unescape(), which is proprietary,
accepts only 8-bit escape sequences (in contrast to standardized
decodeURI*()). The above results in

Fr<EM>0hliches Fr<EM>1ulein

where <EM> is the control character at code point 0x19 (25).
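A short sketch of what happens here, assuming an engine that still ships the legacy unescape() (ECMAScript lists it in Annex B; besides %XX it also understands a %uXXXX form). Each "%" consumes exactly two hex digits, so "%190" is "%19" followed by a literal "0". The hex values F6 and E4 are the ISO-8859-1 code points of ö and ä; the %C3%B6 and %C3%A4 sequences are their UTF-8 bytes, which is what decodeURIComponent() expects.

```javascript
// The original attempt: "%19" is taken as one escape, the third digit stays.
var attempt = unescape("Fr%190hliches Fr%191ulein");
var thirdCharCode = attempt.charCodeAt(2); // code of the control character
var fourthChar = attempt.charAt(3);        // the left-over digit

// Two-digit escapes with the Latin-1 code points of the umlauts.
var viaUnescape = unescape("Fr%F6hliches Fr%E4ulein");

// Percent-encoded UTF-8 bytes, decoded by the standardized function.
var viaDecodeURI = decodeURIComponent("Fr%C3%B6hliches Fr%C3%A4ulein");
```

Both viaUnescape and viaDecodeURI end up as the same string with real umlauts. None of this is needed if the characters are typed directly or written as \u escapes.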
________________________________________________
[...]


Signatures are to be delimited by a line containing only "--<SP><CR><LF>".
HTH

PointedEars (a German)
Nov 11 '05 #3
jiverbean wrote:
Dear group members,

I came across a glitch in JavaScript and I don't know how to solve it
elegantly.

I have an array with strings of German words:

profile[1] = "Fröhliches Fräulein";
The fact that these words are in an array doesn't matter.
The problem you are probably having is that your HTML and/or JavaScript
file is saved in a different encoding than the one you specified in your
HTML. Or maybe you forgot to specify the encoding and it is being
auto-detected wrongly.

The most useful encoding in your case is probably UTF-8.
So make sure you have this in your header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

Next make sure that your HTML/JavaScript file is in UTF-8 format.
I myself use EmEditor as a text editor, because it uses the correct
Unicode terminology for saving files.
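The mismatch described above can be simulated in a few lines. This is only a sketch: it assumes the legacy escape()/unescape() pair is available, and the function names misreadUtf8AsLatin1 and repairMojibake are made up for illustration. encodeURIComponent() produces the UTF-8 bytes as %XX escapes, and unescape() then turns each byte into one character, which is what a reader does when it treats a UTF-8 file as ISO-8859-1.

```javascript
// Simulate a UTF-8 file being read as ISO-8859-1: each UTF-8 byte
// becomes one character of its own.
function misreadUtf8AsLatin1(text) {
  return unescape(encodeURIComponent(text));
}

// Reverse the damage: turn each character back into a byte escape and
// decode the byte sequence as UTF-8. Throws URIError if the input is
// not a valid UTF-8 byte sequence.
function repairMojibake(garbled) {
  return decodeURIComponent(escape(garbled));
}

var original = "Fr\u00F6hliches Fr\u00E4ulein";
var garbled = misreadUtf8AsLatin1(original); // "Fr\u00C3\u00B6hliches Fr\u00C3\u00A4ulein"
var repaired = repairMojibake(garbled);
```

Each umlaut turns into two unrelated characters in the garbled string, which is the typical symptom of a declared encoding that does not match the saved one.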


Because HTML doesn't or didn't allow some of these characters, I wrote:

profile[1] = "Fr&ouml;hliches Fr&auml;ulein";

but when I use an alert(profile[1]); the dialog displays the escape
codes instead of the diacritical marks.


That's because JavaScript is not HTML. JavaScript has other mechanisms
for escaping characters, such as \u for any Unicode character.
So you can write
profile[1] = "Fr\u00F6hliches Fr\u00E4ulein";
if you want to save your file in a different encoding than your output.
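If there are many strings, the \u escapes need not be looked up by hand. A minimal sketch, with toUnicodeEscapes as a hypothetical helper name: it rewrites every character outside printable ASCII as \uXXXX, working on UTF-16 code units. It does not escape quotes or backslashes, so the result is only safe to paste between quotes when the text contains neither.

```javascript
// Rewrite every character outside printable ASCII (0x20-0x7E) as a
// \uXXXX escape, one escape per UTF-16 code unit.
function toUnicodeEscapes(text) {
  return text.replace(/[^\x20-\x7E]/g, function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return "\\u" + ("0000" + hex).slice(-4);
  });
}

// The input is written with escapes here so that this example does not
// itself depend on the encoding of the file it is saved in.
var escaped = toUnicodeEscapes("Fr\u00F6hliches Fr\u00E4ulein");
```

The value of escaped is the ASCII-only text Fr\u00F6hliches Fr\u00E4ulein, ready to be used inside a string literal in a source file of any ASCII-compatible encoding.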

Robert.
Nov 11 '05 #4
Robert wrote:
The problem you are probably having is that your HTML and/or JavaScript
file is saved in a different encoding than the one you specified in your
HTML. Or maybe you forgot to specify the encoding and it is being
auto-detected wrongly.

The most useful encoding in your case is probably UTF-8.
So make sure you have this in your header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
Utter nonsense.

First, the above (which cannot be part of the [HTTP] header, but of
the `head' element) will not suffice, the HTTP Content-Type header is
important. Second, UTF-8, especially the German umlauts in it, is
not compatible with ISO-8859-* (encoding is different), and you do not
know that he used a Unicode editor for this file.
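The difference in encoding can be made visible by listing the bytes. A sketch, assuming an environment that provides the TextEncoder API (current browsers and Node.js do; engines of 2005 did not). The names hexBytes, utf8Bytes and latin1Bytes are hypothetical. In ISO-8859-1 every character is a single byte equal to its code point, so that side needs no library at all.

```javascript
// Format a list of byte values as upper-case hex, e.g. "C3 B6".
function hexBytes(bytes) {
  return Array.from(bytes, function (b) {
    return ("0" + b.toString(16).toUpperCase()).slice(-2);
  }).join(" ");
}

// UTF-8 bytes of a string, via the standard TextEncoder API.
function utf8Bytes(text) {
  return hexBytes(new TextEncoder().encode(text));
}

// ISO-8859-1 bytes: one byte per character, equal to the code point.
// Characters above U+00FF do not exist in that character set.
function latin1Bytes(text) {
  var codes = [];
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    if (code > 0xFF) {
      throw new RangeError("not representable in ISO-8859-1");
    }
    codes.push(code);
  }
  return hexBytes(codes);
}

var umlautUtf8 = utf8Bytes("\u00F6");     // "C3 B6"
var umlautLatin1 = latin1Bytes("\u00F6"); // "F6"
```

Plain ASCII letters produce identical bytes in both encodings, which is why a wrongly declared file often looks fine until the first umlaut appears.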
Next make sure that your html/javascript file is in UTF-8 format.
He does not need to and should not want to if not necessary.
ISO-8859-1(5) will suffice and will be more widely supported.
So you can write
profile[1] = "Fr\u00F6hliches Fr\u00E4ulein";
if you want to save your file in a different encoding than your output.


Provided that the script engine in use supports Unicode escape sequences.
PointedEars
Nov 11 '05 #5
Robert wrote:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>


I forgot to mention that the above is not Valid HTML. It is subject to
error-correction if SGML NET is ignored; if not, it is equivalent to

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">&gt;

XHTML != HTML.
PointedEars
Nov 11 '05 #6
Thomas 'PointedEars' Lahn wrote:
Robert wrote:

The problem you are probably having is that your HTML and/or JavaScript
file is saved in a different encoding than the one you specified in your
HTML. Or maybe you forgot to specify the encoding and it is being
auto-detected wrongly.

The most useful encoding in your case is probably UTF-8.
So make sure you have this in your header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

Utter nonsense.


What part is utter nonsense?
First, the above (which cannot be part of the [HTTP] header, but of
the `head' element) will not suffice, the HTTP Content-Type header is
important.
As can clearly be seen from the syntax, I was talking about HTML and not the
header transmitted by a server.
To my knowledge the content-type of the HTML file overrides the one
given by the webserver. However I may be wrong about this and therefore
made no comment about it before. It does not change the fact that as a
good author you must provide the content-type in your webpage.

Second, UTF-8, especially the German umlauts in it, is
not compatible with ISO-8859-* (encoding is different),
Where exactly did you see me write that these are compatible?
and you do not
know that he used a Unicode editor for this file.
Where exactly did you see me write this?
I actually made a suggestion for a good editor for him if he needed it.
Next make sure that your html/javascript file is in UTF-8 format.

He does not need to and should not want to if not necessary.
ISO-8859-1(5) will suffice and will be more widely supported.


There is huge support for Unicode.

I cannot see his full needs in one word. Maybe he will need characters
that are not in ISO-8859-1 soon. In any case ISO-8859-1 may suffice, but
UTF-8 will suffice for sure and it's just as easy to use.
So you can write
profile[1] = "Fr\u00F6hliches Fr\u00E4ulein";
if you want to save your file in a different encoding than your output.

Provided that the script engine in use supports Unicode escape sequences.


Which it should in 2005.

Don't make yourself look ridiculous by saying something is utter nonsense.
Unicode is very important.
Nov 11 '05 #7
Thomas 'PointedEars' Lahn wrote:
Robert wrote:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

I forgot to mention that the above is not Valid HTML.


The original poster did not specify if he wanted HTML or XHTML.
It is subject to
error-correction if SGML NET is ignored; if not, it is equivalent to

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">&gt;

XHTML != HTML.


Most browsers do not use real SGML parsers and will not see the
difference between those two.
Nov 11 '05 #8
Robert wrote:
Thomas 'PointedEars' Lahn wrote:
Robert wrote:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>


I forgot to mention that the above is not Valid HTML.


The original poster did not specify if he wanted HTML or XHTML.


However, he specified that he is using HTML right now. Why
do you try to force XHTML (or HTML to be error-corrected,
for that matter) on him, with all its ramifications?
It is subject to
error-correction if SGML NET is ignored; if not, it is equivalent to

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">&gt;

XHTML != HTML.


Most browsers do not use real SGML parsers and will not see the
difference between those two.


Relying on error-correction is error-prone.
PointedEars
Nov 11 '05 #9
Robert wrote:
Thomas 'PointedEars' Lahn wrote:
Robert wrote:
The problem you are probably having is that your HTML and/or JavaScript
file is saved in a different encoding than the one you specified in your
HTML. Or maybe you forgot to specify the encoding and it is being
auto-detected wrongly.

The most useful encoding in your case is probably UTF-8.
So make sure you have this in your header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
Utter nonsense.


What part is utter nonsense?
First, the above (which cannot be part of the [HTTP] header, but of
the `head' element) will not suffice, the HTTP Content-Type header is
important.


As can clearly be seen from the syntax, I was talking about HTML and not the
header transmitted by a server.


There is no such thing as an HTML header. There is the HTML `head'
element, which is a completely different thing. To call the latter
a "header" is inappropriate.
To my knowledge the content-type of the HTML file overrides the one
given by the webserver. However I may be wrong about this and therefore
made no comment about it before.
It MAY override the default (before serving), there is no MUST.
HTML 4.01, section 7.4.4, clearly states that:

,-<http://www.w3.org/TR/html4/struct/global.html#h-7.4.4.2>
|
| The http-equiv attribute can be used in place of the name attribute and
| has a special significance when documents are retrieved via the Hypertext
| Transfer Protocol (HTTP). HTTP servers may use the property name specified
| by the http-equiv attribute to create an [RFC822]-style header in the HTTP
| response.

Most notably, the HTML 4.01 Specification does _not_ state that user agents
MUST or MAY allow the Content-Type header to be overridden by the `meta'
element _after_ the document was served with a different header value.
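For what it is worth, HTML 4.01 section 5.2.2 lists the priorities a user agent is expected to observe: the charset parameter of the HTTP Content-Type header first, then a `meta' declaration, then a charset attribute on the referring element. A minimal sketch of the first two steps follows. The function names charsetOf and effectiveCharset are hypothetical, and real browsers also look at byte order marks and apply heuristics that are not modelled here.

```javascript
// Extract the charset parameter from a Content-Type value such as
// "text/html; charset=ISO-8859-1". Returns null if there is none.
function charsetOf(contentTypeValue) {
  if (!contentTypeValue) {
    return null;
  }
  var match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentTypeValue);
  return match ? match[1].toLowerCase() : null;
}

// The HTTP header wins when it names a charset; the content attribute
// of the meta element is only consulted as a fallback.
function effectiveCharset(httpContentType, metaContent) {
  return charsetOf(httpContentType) || charsetOf(metaContent);
}

// The server says ISO-8859-1, the document says UTF-8: the header wins.
var conflicting = effectiveCharset(
  "text/html; charset=ISO-8859-1",
  "text/html; charset=utf-8"
);

// The server sends no charset, so the meta declaration is used.
var fallback = effectiveCharset("text/html", "text/html; charset=utf-8");
```

So a `meta' element only helps when the server does not already declare a different charset, which is why checking the actual HTTP response is worth the effort.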
It does not change the fact that as a good author you must provide the
content-type in your webpage.
For possible future non-HTTP use. Yes, indeed.
Second, UTF-8, especially the German umlauts in it, is
not compatible with ISO-8859-* (encoding is different),


Where exactly did you see me write that these are compatible?


Your statement is written in a way that makes it look as if the
OP does not have the choice. You have been proposing the more
complicated way when there is a simpler and still compliant one
which I consider a Bad Thing, especially when addressing a newbie.
Next make sure that your html/javascript file is in UTF-8 format.


He does not need to and should not want to if not necessary.
ISO-8859-1(5) will suffice and will be more widely supported.


There is huge support for Unicode.


Especially on the Web, one has to consider backwards compatibility.
There are UAs in use out there which do not support Unicode, so it is
unwise to use or recommend that if not needed. And it is certainly
not needed here.
Don't make yourself look ridiculous by saying something is utter
nonsense.
I may have been a bit harsh, but proposing to declare UTF-8 and use
a Unicode-compatible editor where ISO-8859-* and any text editor
would suffice seemed rather ridiculous to me.
Unicode is very important.


Unicode is very important, I did not and do not doubt that. However,
using and recommending it without thinking of the ramifications of its
use only makes matters worse.
Regards,
PointedEars
Nov 11 '05 #10
Thomas 'PointedEars' Lahn wrote:
Utter nonsense.
What part is utter nonsense?

There is no such thing as an HTML header. There is the HTML `head'
element, which is a completely different thing. To call the latter
a "header" is inappropriate.


Maybe not 100% accurate, but not utter nonsense.
It does not change the fact that as a good author you must provide the
content-type in your webpage.


For possible future non-HTTP use. Yes, indeed.


So not utter nonsense either.
Second, UTF-8, especially the German umlauts in it, is
not compatible with ISO-8859-* (encoding is different),


Where exactly did you see me write that these are compatible?

Your statement is written in a way that makes it look as if the
OP does not have the choice.


I do not see it in that way. I clearly stated his problem and said UTF-8
is probably best for him and how he could fix it.
You have been proposing the more
complicated way when there is a simpler and still compliant one
which I consider a Bad Thing, especially when addressing a newbie.
I do not think it is the more complicated way. Even for a newbie, Unicode
awareness cannot come soon enough. Actually I do not know if this person
is a newbie, because I have seen developers with years of experience
but no knowledge of character sets and encodings who have the same
problems that he is having.
Especially on the Web, one has to consider backwards compatibility.
There are UAs in use out there which do not support Unicode, so it is
unwise to use or recommend that if not needed. And it is certainly
not needed here.
The sooner everyone adopts the Unicode standard, the faster these
outdated user agents will be updated.
I may have been a bit harsh, but proposing to declare UTF-8 and use
a Unicode-compatible editor where ISO-8859-* and any text editor
would suffice seemed rather ridiculous to me.


Even notepad (windows xp) can save in UTF-8!

Look, it is obvious that we have different views on using Unicode,
and there is room for discussion. But to just dismiss it as utter
nonsense is insulting.
I just wish someone had made me aware of Unicode and encodings in my
newbie days. Now that the original poster is aware, he can inform himself
further and make a conscious decision. And if he decides to use
ISO-8859-1 instead, I am sure he is capable of changing my suggestion to
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Nov 23 '05 #11
Thomas 'PointedEars' Lahn wrote:
Robert wrote:

Thomas 'PointedEars' Lahn wrote:
Robert wrote:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

I forgot to mention that the above is not Valid HTML.


The original poster did not specify if he wanted HTML or XHTML.

However, he specified that he is using HTML right now.


OK, I did not see that.
I just copied it and did not think about removing the trailing slash.
Nov 23 '05 #12
Robert wrote:
Thomas 'PointedEars' Lahn wrote:
Especially on the Web, one has to consider backwards compatibility.
There are UAs in use out there which do not support Unicode, so it is
unwise to use or recommend that if not needed. And it is certainly
not needed here.


The sooner everyone adopts the Unicode standard, the faster these
outdated user agents will be updated.


Though I am all for standards compliance (you will find me advocating
Valid markup, W3C DOM compliant scripting and the like here for the n-th
time), people trying to make money on and from the Web, especially but
not exclusively, can seldom afford this extreme attitude, sometimes in
the literal sense.

It has always been my opinion that a Web developer should try to get as
much audience as possible if the odds for achieving this are acceptable.
Since Unicode is not needed here and the alternative is easy to implement
while having the advantage of broader support, I find them acceptable here.

BTW, talking about adhering to standards, your From header is a violation
of RFC2822, section 3.4.
I may have been a bit harsh, but proposing to declare UTF-8 and use
a Unicode-compatible editor where ISO-8859-* and any text editor
would suffice seemed rather ridiculous to me.


Even notepad (windows xp) can save in UTF-8!


Interesting, I did not know that. My work platforms are GNU/Linux
and (seldom) Win2k (where I even more seldom use Notepad).
PointedEars
Nov 23 '05 #13
"Thomas 'PointedEars' Lahn" <Po*********@web.de> wrote in message
news:12****************@PointedEars.de...
Robert wrote:
<snipped/> Even notepad (windows xp) can save in UTF-8!

Interesting, I did not know that. My work platforms are GNU/Linux
and (seldom) Win2k (where I even more seldom use Notepad).


Yup...
* ANSI
* Unicode
* Unicode Big Endian
* UTF-8

You can select it from a combobox in the save dialog.
(ANSI is default).

--
Dag.
Nov 23 '05 #14
Dag Sunde wrote:
"Thomas 'PointedEars' Lahn" <Po*********@web.de> wrote in message
news:12****************@PointedEars.de...
Robert wrote:


<snipped/>
Even notepad (windows xp) can save in UTF-8!

Yup...
* ANSI
* Unicode
* Unicode Big Endian
* UTF-8

You can select it from a combobox in the save dialog.
(ANSI is default).


Just wanted to comment that of course the "Unicode" selection is kinda
ridiculous. What they meant was UTF-16 (Little Endian)

Actually ANSI is kinda ridiculous too, because it has nothing to do with
the American National Standards Institute.
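To make the two "Unicode" entries concrete, here is a rough sketch of how the same character is laid out in UTF-16 with either byte order, each preceded by a byte order mark (BOM). The function names utf16WithBom and toHex are hypothetical, and the code only mimics the file layout; it does not write any file.

```javascript
// Serialize a string as UTF-16 with a leading byte order mark.
// littleEndian = true puts the low byte of each code unit first.
function utf16WithBom(text, littleEndian) {
  var bytes = littleEndian ? [0xFF, 0xFE] : [0xFE, 0xFF];
  for (var i = 0; i < text.length; i++) {
    var unit = text.charCodeAt(i);
    var high = unit >> 8;
    var low = unit & 0xFF;
    if (littleEndian) {
      bytes.push(low, high);
    } else {
      bytes.push(high, low);
    }
  }
  return bytes;
}

// Format byte values as upper-case hex pairs separated by spaces.
function toHex(bytes) {
  return bytes.map(function (b) {
    return ("0" + b.toString(16).toUpperCase()).slice(-2);
  }).join(" ");
}

var littleEndianBytes = toHex(utf16WithBom("\u00F6", true));  // "FF FE F6 00"
var bigEndianBytes = toHex(utf16WithBom("\u00F6", false));    // "FE FF 00 F6"
```

The first two bytes tell a reader which order was used, which is how an editor can tell the two variants apart when opening the file again.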
Nov 23 '05 #15
"Robert" <ro*@secret.xyz> wrote in message
news:43***********************@news.xs4all.nl...
Dag Sunde wrote:
"Thomas 'PointedEars' Lahn" <Po*********@web.de> wrote in message
news:12****************@PointedEars.de...
Robert wrote:


<snipped/>
Even notepad (windows xp) can save in UTF-8!

Yup...
* ANSI
* Unicode
* Unicode Big Endian
* UTF-8

You can select it from a combobox in the save dialog.
(ANSI is default).


Just wanted to comment that of course the "Unicode" selection is kinda
ridiculous. What they meant was UTF-16 (Little Endian)

Actually ANSI is kinda ridiculous too, because it has nothing to do with
the American National Standards Institute.


Of course it is ridiculous...
It's MS NotePad!

;-)

--
Dag.

Nov 23 '05 #16
