Using Hindi Language with Unicode

pratik.best@gmail.com
#1: Jan 5 '07
Hi,
I just saw the web site of the Unicode Consortium and was amazed to see
the site showing documents in Hindi without using any fonts such as
"Kruti Dev" or "Dev Lys". "Webdunia.com" also shows documents in
Hindi without the need to download any specific font. How is that done?
Also, can I build such a page?

Andreas Prilop
#2: Jan 5 '07

On 5 Jan 2007, pratik.best@gmail.com wrote:
> I just saw the web site of the Unicode Consortium and was amazed to see
> the site showing documents in Hindi without using any fonts such as
> "Kruti Dev" or "Dev Lys". "Webdunia.com" also shows documents in
> Hindi without the need to download any specific font. How is that done?
With UTF-8:
http://www.unics.uni-hannover.de/nht...l1.html#nagari
or with numeric character references:
http://www.unics.uni-hannover.de/nht...l2.html#nagari
http://www.unics.uni-hannover.de/nht...-alphabet.html
http://ppewww.physics.gla.ac.uk/~fla...unidata09.html
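A quick way to see what "With UTF-8" means in practice: Unicode gives Devanagari its own block (U+0900 to U+097F), and UTF-8 stores every code point in that block as three bytes, so no special font encoding is involved. The sketch below uses only Python's built-in `str.encode`; the helper name `utf8_breakdown` and the sample word are illustrative choices, not anything from the pages linked above.

```python
def utf8_breakdown(text: str) -> list:
    """Return (character, code point, UTF-8 bytes in hex) for each character."""
    rows = []
    for ch in text:
        encoded = ch.encode("utf-8")
        rows.append((ch, f"U+{ord(ch):04X}", encoded.hex(" ")))
    return rows


# Sample word "Hindi" in Devanagari, spelled with escapes so the source stays ASCII:
# HA, vowel sign I, NA, VIRAMA, DA, vowel sign II
SAMPLE = "\u0939\u093f\u0928\u094d\u0926\u0940"

if __name__ == "__main__":
    # Print code points and bytes only; combining vowel signs look odd on their own.
    for _ch, cp, hex_bytes in utf8_breakdown(SAMPLE):
        print(cp, hex_bytes)  # first line: U+0939 e0 a4 b9
    print(len(SAMPLE), "characters ->", len(SAMPLE.encode("utf-8")), "bytes")  # 6 characters -> 18 bytes
```

The browser only needs to know that the bytes are UTF-8; choosing a glyph for each code point is then the job of whatever Devanagari-capable font the system has installed.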
> Also, can I build such a page?
You need an editor that can save documents in Unicode UTF-8.
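If the pages are generated by a script instead of typed into an editor, the same rule applies: write the file as UTF-8 and declare UTF-8 in the page. A minimal Python sketch, assuming a placeholder file name `hindi.html` and a hypothetical helper `save_page`; it uses the HTML5 `<meta charset>` form, whereas older pages use `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">`.

```python
from pathlib import Path

# Placeholder content: the Hindi word for "Hindi", written with escapes
# so this source file itself can stay plain ASCII.
HINDI = "\u0939\u093f\u0928\u094d\u0926\u0940"

PAGE = f"""<!DOCTYPE html>
<html lang="hi">
<head>
<meta charset="utf-8">
<title>{HINDI}</title>
</head>
<body>
<p>{HINDI}</p>
</body>
</html>
"""


def save_page(path: Path, html: str) -> None:
    """Write the page as UTF-8, matching the charset declared in its meta tag."""
    path.write_text(html, encoding="utf-8")


if __name__ == "__main__":
    save_page(Path("hindi.html"), PAGE)
```

The important part is that the two declarations agree: the bytes on disk are UTF-8 and the page says so.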

Jukka K. Korpela
#3: Jan 5 '07

Andreas Prilop wrote:
>> I just saw the web site of the Unicode Consortium and was amazed to
>> see the site showing documents in Hindi without using any fonts
>> such as "Kruti Dev" or "Dev Lys". "Webdunia.com" also shows
>> documents in Hindi without the need to download any specific font.
>> How is that done?
>
> With UTF-8
You can see this if you access e.g.
http://www.unicode.org/standard/translations/hindi.html
and select View/Encoding in your browser; you'll see the encoding "UTF-8"
selected, because that's the encoding specified on the page and the browser
uses it. (Ideally, the encoding should be specified in the HTTP headers as
well, but nobody's perfect.)
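Declaring the encoding in the HTTP `Content-Type` header as well as inside the page can be sketched with Python's standard `wsgiref` module. This is only an illustration under assumptions (the app name `hindi_app`, the helper `serve`, and the page body are made up); it says nothing about how unicode.org actually serves its pages.

```python
from wsgiref.simple_server import make_server

# A tiny UTF-8 page that also declares its encoding in a <meta> element.
BODY = (
    "<!DOCTYPE html>\n"
    '<html lang="hi"><head><meta charset="utf-8"><title>Hindi</title></head>\n'
    "<body><p>\u0939\u093f\u0928\u094d\u0926\u0940</p></body></html>\n"
).encode("utf-8")


def hindi_app(environ, start_response):
    """WSGI app that states the page's encoding in the HTTP header too."""
    headers = [
        # Browsers generally give this header priority over the <meta> declaration.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(BODY))),
    ]
    start_response("200 OK", headers)
    return [BODY]


def serve(port: int = 8000) -> None:
    """Run a local test server; blocks until interrupted."""
    with make_server("localhost", port, hindi_app) as server:
        server.serve_forever()
```

Calling `serve()` and opening http://localhost:8000/ should show the page with UTF-8 selected, even if the `<meta>` element were missing.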

For Hindi on the web, UTF-8 is clearly the only feasible solution. Other
Unicode encodings such as UTF-16 are possible in principle, but less widely
supported. There is an 8-bit encoding "ISCII", for writing Indic languages,
that has been used to some extent but it is not officially registered, but
it isn't recognized by browsers. Finally, you _could_ use any encoding (even
US-ASCII) and represent Hindi (Devanagari) characters using character
references like &#x905; (for अ), but that would be too awkward and too inefficient
(except perhaps for short fragments of texts in documents that are mostly in
another language).
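The size argument can be checked with a quick byte count. The sketch below encodes one Hindi phrase three ways: UTF-8, UTF-16 (little-endian, without a byte-order mark), and US-ASCII with every non-ASCII character turned into a numeric character reference. Python's `xmlcharrefreplace` error handler produces decimal references such as `&#2309;`, which is equivalent to the hexadecimal `&#x905;`. The sample phrase and the helper name `sizes` are illustrative.

```python
# Sample phrase "Hindi language" in Devanagari, spelled with escapes.
SAMPLE = "\u0939\u093f\u0928\u094d\u0926\u0940 \u092d\u093e\u0937\u093e"


def sizes(text: str) -> dict:
    """Byte counts for UTF-8, UTF-16LE (no BOM) and ASCII plus character references."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16-le": len(text.encode("utf-16-le")),
        # Each Devanagari character becomes a decimal reference like &#2361;
        "ascii+refs": len(text.encode("ascii", errors="xmlcharrefreplace")),
    }


if __name__ == "__main__":
    print(SAMPLE.encode("ascii", errors="xmlcharrefreplace").decode("ascii"))
    print(sizes(SAMPLE))  # {'utf-8': 31, 'utf-16-le': 22, 'ascii+refs': 71}
```

For running Devanagari text UTF-16 is actually the most compact of the three, but HTML markup is ASCII, which UTF-8 keeps at one byte per character, and the argument above against UTF-16 is about support rather than size. The character-reference form is more than twice the size of UTF-8 and unreadable in a plain editor.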
>> Also, can I build such a page?
>
> You need an editor that can save documents in Unicode UTF-8.
This still leaves a wide range of options; see, e.g., the software listed at
http://www.alanwood.net/unicode/utilities.html
I also noticed a page specifically about authoring in Devanagari:
http://tlt.psu.edu/suggestions/inter...evanagari.html

By the way, if you look at the Unicode page I mentioned, you'll see meta
tags that indicate that it was created using Microsoft FrontPage 6.0. I
don't particularly recommend that software (note that the HTML markup on the
page is rather poor), but this shows that you _can_ use Unicode even if you
just play with common office programs. Moving to Unicode is often just a
matter of using the possibilities in the software you are using now, rather
than getting some fancy novelties.
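The meta tags mentioned above (the generator and the encoding declaration) can also be read programmatically with Python's standard `html.parser` module. In this sketch the class `MetaCollector` is hypothetical, and `SAMPLE_HEAD` is made-up markup in the style of a FrontPage page, not the actual source of the Unicode page.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs and the declared charset."""

    def __init__(self):
        super().__init__()
        self.named = {}
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = {k.lower(): (v or "") for k, v in attrs}
        if "charset" in a:  # HTML5 form: <meta charset="utf-8">
            self.charset = a["charset"].strip().lower()
        elif a.get("http-equiv", "").lower() == "content-type":
            # Older form: content="text/html; charset=utf-8"
            for part in a.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset":
                    self.charset = value.strip().lower()
        if "name" in a:
            self.named[a["name"].lower()] = a.get("content", "")


SAMPLE_HEAD = """<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
</head>"""

if __name__ == "__main__":
    parser = MetaCollector()
    parser.feed(SAMPLE_HEAD)
    print(parser.charset, "|", parser.named.get("generator"))  # utf-8 | Microsoft FrontPage 6.0
```

Feeding it the saved source of a real page shows the same two facts you would otherwise read off with View/Source.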

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Andreas Prilop
#4: Jan 9 '07

On Fri, 5 Jan 2007, Jukka K. Korpela wrote:
> There is an 8-bit encoding "ISCII", for writing Indic languages,
There are several 8-bit-coded character sets under the generic
name ISCII.

> but it is not officially registered,
They are not officially registered.

> but it isn't recognized by browsers.
They are not recognized by browsers.

For use with MIME, we would need designations such as

charset=iscii-devanagari
charset=iscii-gujarati
charset=iscii-gurmukhi

etc. Some browsers recognize

charset=x-mac-devanagari

etc.
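What an unrecognized label means for software can be illustrated outside the browser too. Python's standard codec registry has no ISCII codec, so a program that reads `charset=iscii-devanagari` from a `Content-Type` header cannot decode the body and has to fall back to something. The helper `pick_codec` and its fallback policy are illustrative assumptions, not a description of how any browser behaves.

```python
import codecs
from email.message import Message


def pick_codec(content_type: str, fallback: str = "utf-8") -> str:
    """Return a usable codec name for a Content-Type header value.

    Falls back to `fallback` when the charset label is missing or unknown
    to Python's codec registry (an illustrative policy, not a browser's).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    label = msg.get_content_charset()  # lower-cased label, or None if absent
    if label is None:
        return fallback
    try:
        return codecs.lookup(label).name
    except LookupError:  # e.g. "iscii-devanagari": no such codec
        return fallback


if __name__ == "__main__":
    for header in ("text/html; charset=utf-8",
                   "text/html; charset=iscii-devanagari"):
        print(header, "->", pick_codec(header))
```

Whatever the fallback is, the ISCII bytes will be misread, which is why a registered and widely supported label such as utf-8 matters more than the encoding's technical merits.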
