"Chung Leong" <chernyshevsky@hotmail.com> wrote in message news:<Q-2dnYW2xMUIc1XdRVn-ig@comcast.com>...[color=blue]
> "R. Rajesh Jeba Anbiah" <ng4rrjanbiah@rediffmail.com> wrote in message
> news:abc4d8b8.0406100019.61b48ff7@posting.google.com...[color=green]
> > Thanks for the info/logic. Though I'm a bit aware of Unicode, this is
> > the first time I'm putting my hands on it... It's kind of a pain, as
> > PHP's Unicode support is broken and strange...[/color]
>
> Yeah, Unicode support in PHP is practically non-existent. You can still get
> by though. More recent versions of PHP support Unicode code points in regular
> expression character classes (with the u modifier), so you can do things like
> /([\x{0900}-\x{09FF}]+)/u.
>
> UTF-8 is in general rather tricky to work with. For example, you can't limit
> the byte length of text entered by users using just the maxlength attribute
> in HTML, since it counts characters, not bytes. And when a database width
> constraint chops off some UTF-8 text in mid-character, all sorts of funky
> things happen in the browser.[/color]
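To make the two points above concrete, here is a minimal sketch, assuming PHP 7 or later. The function names extract_runs() and utf8_truncate() are made up for illustration. The first one applies the quoted range with the u modifier; the second cuts a string to a byte limit without splitting a multi-byte character, which is roughly what you would want to do before the database does it for you.

```php
<?php
// Sketch only: helper names are hypothetical, not part of PHP.

// Return every run of characters in U+0900..U+09FF found in $text.
// The u modifier makes PCRE treat pattern and subject as UTF-8, which
// \x{...} escapes above FF need.
function extract_runs(string $text): array
{
    preg_match_all('/[\x{0900}-\x{09FF}]+/u', $text, $m);
    return $m[0];
}

// Cut a UTF-8 string to at most $maxBytes bytes, never in mid-character.
function utf8_truncate(string $s, int $maxBytes): string
{
    $maxBytes = max(0, $maxBytes);
    if (strlen($s) <= $maxBytes) {
        return $s;
    }
    $cut = $maxBytes;
    // UTF-8 continuation bytes look like 10xxxxxx. If the first dropped
    // byte is one of those, the cut falls inside a character, so back up
    // to the start of that character and drop it whole.
    while ($cut > 0 && (ord($s[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($s, 0, $cut);
}
```

For a VARCHAR-style column limited to N bytes, calling utf8_truncate($input, N) before the insert keeps the stored text valid, at the cost of losing the last partial character.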
Thanks a lot for your comments and help. As you said, UTF-8 behaves
rather strangely; if we include UTF-8 text from other files, it works
differently than expected. Anyway, we can somehow get it to work.
[color=blue]
> My advice is not to use Unicode unless you have to. I am not familiar with
> the Tamil script, but I have done a lot of work with Hindi. Most Hindi
> websites do not use Unicode (e.g. www.webdunia.com), because Unicode Hindi
> text requires rendering support from the operating system, which essentially
> limits you to Windows/IE only.[/color]
Yeah, I understand. But for Tamil, staying away from Unicode may not
help much, as many people are moving towards it. The reason is probably
that many people here use Windows/IE alone. <OT>BTW, is
www.webdunia.com your work? :-)</OT>
[color=blue][color=green]
> > I'll be interested to try both. Are you hinting that at least one is
> > easier? Thanks.[/color]
>
> Choosing one encoding out of three is obviously easier than choosing one out
> of several hundred. As far as I know, the only foolproof way is to run a
> spell check on the text. Statistical analysis could also work. Just count
> how often each letter occurs and compare that to a known profile for
> that language.[/color]
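The statistical approach could be sketched roughly as below, assuming PHP 7 or later. It assumes the input has already been decoded to UTF-8 once per candidate encoding (that decoding step is not shown and would need its own mapping tables), and that a reference profile has been built from known-good text in the language. All function names are made up for illustration.

```php
<?php
// Sketch only: helper names are hypothetical.

// Relative frequency of each character in a UTF-8 string.
function char_profile(string $utf8Text): array
{
    // Splitting on the empty pattern with the u modifier yields one
    // element per character; it returns false on invalid UTF-8.
    $chars = preg_split('//u', $utf8Text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false || count($chars) === 0) {
        return [];
    }
    $total = count($chars);
    $profile = [];
    foreach (array_count_values($chars) as $ch => $n) {
        $profile[$ch] = $n / $total;
    }
    return $profile;
}

// Sum of absolute frequency differences; 0.0 means identical profiles.
function profile_distance(array $a, array $b): float
{
    $dist = 0.0;
    foreach (array_keys($a + $b) as $ch) {
        $dist += abs(($a[$ch] ?? 0.0) - ($b[$ch] ?? 0.0));
    }
    return $dist;
}

// $decoded maps an encoding name to the text as decoded under that
// encoding. Returns the name whose profile is closest to $reference.
function guess_encoding(array $decoded, array $reference): ?string
{
    $best = null;
    $bestDist = INF;
    foreach ($decoded as $name => $text) {
        $d = profile_distance(char_profile($text), $reference);
        if ($d < $bestDist) {
            $bestDist = $d;
            $best = (string) $name;
        }
    }
    return $best;
}
```

The distance measure here is a plain sum of differences, chosen for brevity; short inputs will give noisy profiles, so this is more of a tie-breaker than a guarantee.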
   In Tamil, some characters never start a word (unless someone made
a typo). I'd thought of using such grammar rules if there is no
direct way to detect the encoding. Thanks a lot for your help.
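
The word-start idea might look something like this, as a rough sketch assuming PHP 7 or later with Unicode property support in PCRE. The function name is made up, and the default rule only covers combining marks (dependent vowel signs and the pulli), which cannot begin a word; any further Tamil-specific letters would have to be added to the character class by someone who knows the grammar.

```php
<?php
// Sketch only: count_bad_word_starts() is a hypothetical helper.

// Count words in a decoded candidate that begin with a character which
// should never start a word. $forbiddenClass is a PCRE fragment; the
// default \p{M} matches any combining mark.
function count_bad_word_starts(string $utf8Text, string $forbiddenClass = '\p{M}'): int
{
    $words = preg_split('/\s+/u', $utf8Text, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        // Not valid UTF-8 at all, so this decoding is implausible.
        return PHP_INT_MAX;
    }
    $bad = preg_grep('/^' . $forbiddenClass . '/u', $words);
    return $bad === false ? PHP_INT_MAX : count($bad);
}
```

The candidate decoding with the lowest count would be the most plausible one; a few hits are still possible in correct text because of typos.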
--
| Just another PHP saint |
Email: rrjanbiah-at-Y!com