Latin & Arabic Characters on the Same Page

Hello CIWAH ...

I want to propose full internationalization of three related websites:
http://africadatabase.org/
http://people.africadatabase.org/
http://institutions.africadatabase.org/

My role is mainly advisory and server management. I have very little
to do with content or page generation, so it's not something I can
do myself -- I have to persuade other people to do it.

On all three home pages, we have mixed Latin and Arabic characters
(all other pages right now are Latin-only).

However there is a difference: on http://africadatabase.org/ and
http://institutions.africadatabase.org/ the character set is
windows-1256, which is commonly used by Arabic websites. The Arabic
characters are all 8-bit, and their correct display depends (presumably)
on visitors having the required codepage installed on their computers.
If they haven't got it, the first four Arabic words
("المعلومات و المعطيات الحديثة") will look like this:
"ÇáãÚáæãÇÊ æ ÇáãÚØíÇÊ ÇáÍÏíËÉ".

Now I don't think that is a major problem in Arabic-speaking countries,
because they nearly always have the windows-1256 codepage. I'm more
concerned about what all the rest of us see.
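
A minimal Python sketch of how a windows-1256 page can turn into Latin
gibberish, assuming the fallback is a browser that decodes the same
bytes as windows-1252 (Western European). The helper name
misread_as_western is made up for illustration, and the sample is just
the first word of the Arabic text:

```python
def misread_as_western(text: str) -> str:
    """Encode as windows-1256, then decode the same bytes as windows-1252."""
    raw = text.encode("cp1256")   # the bytes as stored in the page
    return raw.decode("cp1252")   # what a Western-European default shows


word = "المعلومات"
garbled = misread_as_western(word)
print(garbled)

# The bytes themselves are intact: reversing the wrong decoding and
# applying the right codec restores the original text.
restored = garbled.encode("cp1252").decode("cp1256")
print(restored == word)
```

The point of the sketch is that nothing is lost in transit; the page
only looks wrong because the bytes are interpreted with the wrong table.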

The other home page http://people.africadatabase.org/ uses UTF-8, and
the Arabic characters are converted to numeric entities. Although the
conversion is a bit of extra work, I think it is worthwhile.
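
For anyone wondering what that conversion involves, here is a small
Python sketch. It is only an illustration (the function name
to_numeric_entities is hypothetical, and this is not necessarily how the
people.africadatabase.org page was produced): every non-ASCII character
is replaced by a decimal numeric character reference, so the file itself
contains nothing but ASCII.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace each non-ASCII character with a reference such as &#1575;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


word = "المعلومات"
encoded = to_numeric_entities(word)
print(encoded)

# An HTML parser turns the references back into the same characters.
print(html.unescape(encoded) == word)
```

Because the result is pure ASCII, the bytes are the same whether the
page is labelled UTF-8, ISO-8859-6 or windows-1256; whether the
characters then display properly still depends on the browser and fonts.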

CIWAH has always been an exceptionally useful resource, so I would like
to ask anyone here who has a few minutes spare to look at these pages,
and report on what they see. I guess the UTF-8 page will look exactly
the same to everybody. But what about the windows-1256 pages? So far, I
have only looked at them on PC machines in countries where the Latin
character set is normal. They look OK. But what do Mac users and Linux
desktop users see?

I would be really grateful for any information people can offer -- it
will all help. And I always appreciate any other comments, criticism
or advice that might be useful. I know there are a number of character
set problems on some of those pages -- they are on a long to-do list,
which you are all welcome to make longer still.
tia etc ...

Jul 20 '05 #1
thinkfirst wrote:
[snip]

good work...they all come up fine here using Opera on Win98...it looks
like Arabic script to me, though I can't read the language myself...I'm
afraid I can't check on any other OS from here

for safety's sake I'd consider going over to utf-8 for all the
multilingual pages

--
eric
www.ericjarvis.co.uk
all these years I've waited for the revolution
and all we end up getting is spin
Jul 20 '05 #2
"thinkfirst" <th***********@yahoo.com> wrote:
On all three home pages, we have mixed Latin and Arabic characters
(all other pages right now are Latin-only).
However there is a difference: on http://africadatabase.org/ and
http://institutions.africadatabase.org/ the character set is
windows-1256, which is commonly used by Arabic websites.
It doesn't matter what "is commonly used". For example, tag soup
is commonly used. If you need an 8-bit encoding, use ISO-8859-6,
which is identical with Arabic Standard ASMO 708. My test page
http://www.unics.uni-hannover.de/nhtcapri/arabic.html6
displays perfectly in Mozilla 1.3 on Mac OS 9.1 and also with other
browsers/operating systems. On the Macintosh, you need the Arabic
Language Kit. http://www.unics.uni-hannover.de/nhtcapri/arabic.html
Now I don't think that is a major problem in Arabic-speaking countries,
because they nearly always have the windows-1256 codepage.
They don't "have the windows-1256 codepage" but they have fonts with
Arabic glyphs and operating systems suitable for the Arabic script.
The other home page http://people.africadatabase.org/ uses UTF-8, and
the Arabic characters are converted to numeric entities.
This is possible but a better idea would be to use UTF-8:
http://www.unics.uni-hannover.de/nht...l1.html#arabic
I guess the UTF-8 page will look exactly the same to everybody.
What makes you think so?
But what about the windows-1256 pages?


The main difference between UTF-8 and Windows-1256 is *not* what you
think, i.e. different encodings. The main difference is that current
browsers use different typefaces to display them.
http://ppewww.ph.gla.ac.uk/~flavell/...ers-fonts.html
If you have much Latin text on your pages, especially West European letters,
then use "charset=UTF-8" by all means.

You'll find further information on
http://ppewww.ph.gla.ac.uk/~flavell/...direction.html
Most important: Label *all* your text with DIR and LANG attributes:
e.g. <p dir="ltr" lang="fr"> <span dir="rtl" lang="ar">
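
If the pages are generated by a script, the labelling can be done in one
place. A rough Python sketch, assuming a hypothetical helper called
mark_up (not part of any library) that wraps a run of text in an element
carrying both attributes:

```python
from html import escape


def mark_up(text: str, lang: str, direction: str, tag: str = "span") -> str:
    """Wrap text in an element that declares its language and direction."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    return (
        f'<{tag} dir="{direction}" lang="{escape(lang)}">'
        f"{escape(text)}</{tag}>"
    )


arabic = mark_up("المعلومات", lang="ar", direction="rtl")
paragraph = f'<p dir="ltr" lang="fr">Texte avec {arabic} au milieu.</p>'
print(paragraph)
```

The French sentence is only filler; the idea is that every run of text
ends up inside an element that states both DIR and LANG.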
Jul 20 '05 #3
thinkfirst wrote:

[snip]
CIWAH has always been an exceptionally useful resource, so I would like
to ask anyone here who has a few minutes spare to look at these pages,
and report on what they see. I guess the UTF-8 page will look exactly
the same to everybody. But what about the windows-1256 pages? So far, I
have only looked at them on PC machines in countries where the Latin
character set is normal. They look OK. But what do Mac users and Linux
desktop users see?


On Linux:

http://africadatabase.org/ looks fine in Firebird 0.6.1, Konqueror 3.1.4 and
Opera 7.20 B9. Lynx 2.8.4rel.1 shows the Arabic text as a+l+m+e+l+w+m+a+t+
w+ a+l+m+e+tjy and so on, but the rest seems fine. Links 2.1pre9 gives the
same behaviour as Lynx in console mode, and shows almost all of the Arabic
text properly in graphical mode. W3M 0.4.1 and Netscape 4.79 show the
Arabic text as ÇáãÚáæãÇÇ and so on.

The same goes for http://people.africadatabase.org/ and
http://institutions.africadatabase.org/ except that W3M and Netscape show
the Arabic text as a series of question marks on the people site, Netscape
shows question marks on the institutions site, and W3M shows ÇáãÚáæãÇÇ...
on the institutions site.

Bear in mind that there's no single release of Linux, and systems can vary
wildly. I'm using a Gentoo system if you think it matters - exactly what
fonts are installed I couldn't say without rummaging around a fair bit, but
I'm pretty sure I have a few of the decent Microsoft fonts. As far as I
know, it's common for desktop distributions to include these fonts.

On Mac OS X 10.2.8:

http://africadatabase.org looks fine in Safari 1.0, Mozilla 1.4, Opera 6.03
and Omniweb 4.5. Internet Explorer 5.2.3 displays the Arabic text as
random Latin glyphs but is otherwise fine. The same applies to the other
two sites.
--
Jim Dabell

Jul 20 '05 #4
On Fri, 17 Oct 2003, Jim Dabell wrote:
Lynx 2.8.4rel.1 shows the Arabic text as a+l+m+e+l+w+m+a+t+
w+ a+l+m+e+tjy and so on,


Well, not that it's of any practical use to anyone, but take a look
at: http://ppewww.ph.gla.ac.uk/~flavell/tests/ARALYNX.GIF

This is Lynx in a putty terminal window, set for utf-8 coding, with
Courier New font selected. (RedHat 9). (It's Lynx 2.8.5dev.7 if
anyone wanted to know.)

It doesn't understand right-to-left (not even when specified
explicitly, it seems), -NOR- the need for initial, medial and final
forms; so it's not much use in this context.
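
As a side note, the direction information a renderer needs is recorded
per character in the Unicode database. A small Python sketch using the
standard unicodedata module (bidi_classes is just a name chosen here)
lists the bidirectional class of each character; it says nothing about
joining forms, which the module does not expose:

```python
import unicodedata


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# A Latin letter, a space and an Arabic letter (ALEF, U+0627).
for ch, cls in bidi_classes("a ا"):
    print(f"U+{ord(ch):04X} {cls}")
```

Latin letters report class L (left-to-right) and Arabic letters report
AL (Arabic letter, right-to-left); a display that ignores these classes
lays everything out left to right.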

To the original poster: despite the fact that many Arabic pages, for
some incomprehensible reason(?), use a proprietary 8-bit coding
provided by a USA corporation, I couldn't advise choosing it yourself.
You'd probably have to look quite hard to find a web browser which
understood Windows-125x codings that didn't support the corresponding
coding in the iso-8859-* series. So if you want an 8-bit coding then
I'd have to recommend the iso-series one.
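
A quick Python sketch of what the choice means at the byte level. The
codec names are the ones Python's standard library uses for
windows-1256, iso-8859-6 and utf-8; the helper can_encode is made up for
this example:

```python
def can_encode(text: str, codec: str) -> bool:
    """Report whether every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


LAM = "ل"  # ARABIC LETTER LAM, U+0644

for codec in ("cp1256", "iso8859_6", "utf-8"):
    raw = LAM.encode(codec)
    print(f"{codec:10} {raw.hex()}  round-trips: {raw.decode(codec) == LAM}")

# Accented Latin letters are where the 8-bit choices differ most.
for codec in ("cp1256", "iso8859_6", "utf-8"):
    print(f"{codec:10} can encode 'é': {can_encode('é', codec)}")
```

The same letter gets a different byte in each 8-bit coding, so the
charset label has to match what was actually saved. ISO-8859-6 has no
accented Latin letters at all, which is one more argument for utf-8 on
pages that mix Arabic with West European text.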

But use of utf-8 codings is catching up. I'm not familiar with the
current browser population in use in the area myself, so it's hard to
make any more-detailed practical recommendations, even if I take an
interest in the technologies of character representation at a more
basic level.

Jul 20 '05 #5
"thinkfirst" <th***********@yahoo.com> wrote:
http://africadatabase.org/
http://institutions.africadatabase.org/
http://people.africadatabase.org/


Further thoughts:

| font-family: Arial, Helvetica, Geneva, Swiss, sans-serif;

Never ever specify typefaces for non-Roman scripts! Your example is
especially bad since Helvetica, Geneva, Swiss do not contain Arabic
glyphs. Arial also does not necessarily include Arabic glyphs.
http://ppewww.ph.gla.ac.uk/~flavell/...onts.html#dont

| text-align: justify

Don't do this even with Latin/Cyrillic script as long as we don't have
reliable hyphenation. No justification without hyphenation!
"text-align: justify" with Arabic and Hebrew scripts is just silly.
Jul 20 '05 #6
Eric Jarvis <we*@ericjarvis.co.uk> wrote:
they all come up fine here using Opera on Win98...it looks
like Arabic script to me, though I can't read the language myself..


I'm afraid it's an illusion. I don't know Arabic either, but I know a
little about the writing system. And viewing http://africadatabase.org/
on Opera 7.11 (Win98), I see Arabic characters but
a) written left to right
b) presented using isolated form glyphs, making - I suppose - the
appearance really inappropriate.

IE 6.0 on Win98 seems to show it OK. But testing on another system,
with IE 5.5, I get a prompt that tells that the browser is trying to
download some plugin (or something) and hangs there. I have a vague
recollection of older versions of IE not being able to select glyphs
appropriately.

I'm afraid there's not much an author can do. Unicode contains
characters that correspond to the specific glyph forms, but using them
would be awkward and would not guarantee correct display (partly
because those characters may not appear in common fonts).

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #7
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
IE 6.0 on Win98 seems to show it OK. But testing on another system,
with IE 5.5, I get a prompt that tells that the browser is trying to
download some plugin (or something) and hangs there. I have a vague
recollection of older versions of IE not being able to select glyphs
appropriately.


You need the "Arabic Language Support". I don't remember whether this
comes with Windows 98 or is installed by Internet Explorer.
Mozilla 1.5 on MS Windows 98 with "Arabic Language Support" displays
Arabic and Persian OK. (For Urdu, you would need third-party extensions
or Windows 2000.)
Jul 20 '05 #8
On Fri, 17 Oct 2003, Jukka K. Korpela wrote:
I'm afraid there's not much an author can do.
I don't honestly think there's much that an author -needs- to do.

Surely those who can read Arabic script will have their browsers set
up to be capable of browsing the normal ways in which Arabic is
published on the web? Whereas for those who cannot read it, it hardly
matters what's displayed there. Missing-glyphs might look ugly, but
it's not going to disrupt the whole display: so they can still read
the parts which are in a script that they can read.

Andreas will be along any moment with one of his magic Google searches
to tell us how many Arabic pages were found with this or that
encoding...?
Unicode contains characters that correspond to the specific glyph
forms, but using them would be awkward and would not guarantee
correct display
Furthermore, such usage is deprecated, and will almost certainly ruin
the ability to find the text via a search engine. One is supposed to
use the characters from the 06xx Unicode range and let the rendering
engine choose the correct glyph forms.
(partly because those characters may not appear in common fonts).


I'm not sure about that. The glyphs have to be there anyway, so that
the rendering engine can use them for proper rendering of the normal
characters. Have you tried ListFont or a similar tool? I'm no
expert, but I can see the presentation forms up there at FB50 onwards,
and in the FExx block, in fonts which have 06xx populated.
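
To illustrate the relationship between the two ranges, here is a small
Python sketch with the standard unicodedata module. The function name
to_base_letters is hypothetical. Compatibility normalization (NFKC) maps
a presentation-form character back to the ordinary 06xx letter, which is
roughly what a search engine would have to do to find such text:

```python
import unicodedata


def to_base_letters(text: str) -> str:
    """Fold presentation forms (and other compatibility characters)."""
    return unicodedata.normalize("NFKC", text)


presentation = "\uFEDF"
print(unicodedata.name(presentation))
print(unicodedata.decomposition(presentation))

base = to_base_letters(presentation)
print(f"U+{ord(base):04X}", unicodedata.name(base))
```

Note that NFKC folds many other compatibility characters too, so this is
a demonstration of the mapping rather than a recommended clean-up step.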
Jul 20 '05 #9
Andreas Prilop wrote:
"Jukka K. Korpela" <jk******@cs.tut.fi> wrote:
IE 6.0 on Win98 seems to show it OK. But testing on another system,
with IE 5.5, I get a prompt that tells that the browser is trying to
download some plugin (or something) and hangs there. I have a vague
recollection of older versions of IE not being able to select glyphs
appropriately.


You need the "Arabic Language Support". I don't remember whether this
comes with Windows 98 or is installed by Internet Explorer.
Mozilla 1.5 on MS Windows 98 with "Arabic Language Support" displays
Arabic and Persian OK. (For Urdu, you would need third-party extensions
or Windows 2000.)


I installed it later, but so long ago I can't remember how since it was
part of a flurry of measures I took when I started having to work
multilingually

--
eric
www.ericjarvis.co.uk
all these years I've waited for the revolution
and all we end up getting is spin
Jul 20 '05 #10
In article <17*************************@rrzn-user.uni-hannover.de>,
Andreas Prilop <nh******@rrzn-user.uni-hannover.de> wrote:
On the Macintosh, you need the Arabic
Language Kit. http://www.unics.uni-hannover.de/nhtcapri/arabic.html


On Mac OS 9 and earlier, that is.

Mac OS X doesn't need Language Kits. However, the availability of some
input methods is (strangely) triggered by the availability of fonts in
particular encodings (even if the catch-all Lucida Grande has the
glyphs), so to be safe, it is a good idea to install the optional fonts
from the Mac OS X 10.2 CD.

--
Henri Sivonen
hs******@iki.fi
http://www.iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Jul 20 '05 #11

"Jukka K. Korpela" <jk******@cs.tut.fi> wrote in message news:Xn*****************************@193.229.0.31. ..
I'm afraid it's an illusion. I don't know Arabic either, but I know a
little about the writing system. And viewing http://africadatabase.org/
on Opera 7.11 (Win98), I see Arabic characters but
a) written left to right
b) presented using isolated form glyphs, making - I suppose - the
appearance really inappropriate.


Opera doesn't do Right-To-Left, nothing to be done about that. Netscape
4 is the same. But for that reason, Opera is not used in Arabic-speaking
countries. Anyone who needs to read Arabic text will not be using a browser
that cannot display it. Others can see that it is a mess -- but also that
it *is* Arabic, not a string of incomprehensible non-ASCII characters.

I don't think we expect to achieve compatibility with browsers that are
old or under development.
Jul 20 '05 #12
Hi everybody ...

Thanks for all the wonderful help ... far more than I ever expected.
For half a day, I really thought I was almost on top of this thing.
Now, with all such delusions removed, I am back to square one.

The first big mistake was to imagine that we could simply insert a
bit of Arabic text at the bottom of the page, and adjust the
<meta http-equiv=Content-Type content="text/html; ... > tag. It won't
work, and thanks to this group, I found out within a few hours rather
than weeks.

Clearly we also have to reconsider what tools we use, and even our
working methods.

I have taken on board *all* of the comments and suggestions made here,
and I will try to implement all of them -- but I can't do it all as
quickly as I would wish.

In all three home pages, I have made some changes to the CSS, and I
have completely changed http://africadatabase.org/ ...

* Removed the <meta http-equiv=Content-Type content="text/html; charset=windows-1256"> tag
* Saved as UTF-8 with the utf-8 charset emitted by the server in the header
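
In case it helps anyone making the same change, the two steps look
roughly like this in Python. This is only a sketch under assumptions:
the function names are made up, the source is taken to be windows-1256,
and how the header is actually sent depends on the server:

```python
def transcode_to_utf8(raw: bytes, source_codec: str = "cp1256") -> bytes:
    """Re-encode page bytes from the legacy codec to UTF-8."""
    return raw.decode(source_codec).encode("utf-8")


def content_type_header(charset: str = "utf-8") -> tuple[str, str]:
    """Build the HTTP header that announces the charset."""
    return ("Content-Type", f"text/html; charset={charset}")


legacy_bytes = "المعلومات".encode("cp1256")  # stand-in for the old file
utf8_bytes = transcode_to_utf8(legacy_bytes)

print(len(legacy_bytes), len(utf8_bytes))
print(content_type_header())
```

Each Arabic letter takes one byte in windows-1256 and two in UTF-8, so
the file grows a little, but the text itself is unchanged.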

This seems to be the best thing I can actually *do* right now.

Software tools are a limiting factor here. As we are still feeling our
way, we don't have a good working process. Right now, we have just one
member of the team whose first language is Arabic. She produces text in
the Arabic version of MS Word, sends me the document, then I save it as
a "web page", and tidy it up in Topstyle. That is not sustainable!
Jul 20 '05 #13
Hi Andreas ...

I can't tell you how pleased I am to get all this information, and
being brought down to Earth before I had wasted too much time doing
everything wrong.

I'm still looking at those links ... it's never too late to learn.
Jul 20 '05 #14
thinkfirst wrote:

Software tools are a limiting factor here. As we are still feeling our
way, we don't have a good working process. Right now, we have just one
member of the team whose first language is Arabic. She produces text in
the Arabic version of MS Word, sends me the document, then I save it as
a "web page", and tidy it up in Topstyle. That is not sustainable!


have you thought about giving her a template to use in Word, with styles
set to match the web site?

--
eric
www.ericjarvis.co.uk
"live fast, die only if strictly necessary"
Jul 20 '05 #15
thinkfirst wrote:
Hi Andreas ...

I can't tell you how pleased I am to get all this information, and
being brought down to Earth before I had wasted too much time doing
everything wrong.

I'm still looking at those links ... it's never too late to learn.


between them Andreas, Jukka, Alan and Paul gave me enough advice to get a
fifteen language web site off the ground, they are wonderfully
knowledgeable, and amazingly patient

--
eric
www.ericjarvis.co.uk
all these years I've waited for the revolution
and all we end up getting is spin
Jul 20 '05 #16
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
To the original poster: despite the fact that many Arabic pages, for
some incomprehensible reason(?), use a proprietary 8-bit coding
provided by a USA corporation, I couldn't advise choosing it yourself.


http://www.microsoft.com/middleeast/arabic/default.mspx
comes with "charset=iso-8859-6".
Jul 20 '05 #17
"thinkfirst" <th***********@yahoo.com> wrote in message news:<bm**********@newsg4.svr.pol.co.uk>...

Opera doesn't do Right-To-Left, nothing to be done about that. Netscape
4 is the same. But for that reason, Opera is not used in Arabic-speaking
countries. Anyone who needs to read Arabic text will not be using a browser
that cannot display it. Others can see that it is a mess -- but also that
it *is* Arabic, not a string of incomprehensible non-ASCII characters.


Opera added right-to-left support in Opera 7.2, so it can now display
Arabic and Hebrew in Windows and Linux (but not Mac yet).

--
Alan Wood
http://www.alanwood.net (Unicode, special characters, pesticide names)
Jul 20 '05 #18
