Hi, I have a website which contains both Chinese and English content,
which is stored in a database. Each record in the database has an
English and a Chinese field. If a user enters a search string, I have
to be able to detect which characters are Latin-based and which are
Chinese ideographs.
e.g. a user may enter "hello 新闻网 world"
this is because many Chinese search phrases (especially those involved
with technology may include English words or acronyms) eg) I think MP3
in Chinese is MPÈý as MP is an English acronym with the number 3 after
it, which in chinese is Èý (i may be wrong, my written Chinese is non-
existent :-) but that's just an example)
To make an effective search on the Chinese field I cannot just put
Latin characters through the same search process, as it would detract
from the effectiveness of the search.
What I need, from the search string (hello 新闻网 world), is a PHP
function that will give me an array telling me if each character in
the string is Chinese or not (I do not need to know if it is a
punctuation symbol or any other character, just yes Chinese or no
something else).
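
To illustrate the shape of the result described above, here is a rough
sketch, not a tested solution. The function name is_han_map is made up
for this example, and it assumes PHP 7 or later with a PCRE build that
supports the /u modifier and Unicode script properties such as
\p{Han}. It only answers "Han or not" per character and says nothing
about punctuation or other scripts.

```php
<?php
// Hypothetical helper (name invented for illustration): returns one
// boolean per character of a UTF-8 string, true where the character
// belongs to the Unicode Han script.
// Assumption: PCRE is compiled with UTF-8 and Unicode property support.
function is_han_map(string $text): array
{
    // Split into whole UTF-8 characters rather than single bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split returns false when the input is not valid UTF-8.
        return [];
    }

    $flags = [];
    foreach ($chars as $ch) {
        // \p{Han} matches a single character in the Han script.
        $flags[] = preg_match('/^\p{Han}$/u', $ch) === 1;
    }
    return $flags;
}
```

Called on a string such as "MP" followed by the Chinese character for
three, this should give false for each Latin letter and true for the
ideograph, so the Latin and Chinese parts of a search string could then
be routed to different search paths.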
All of my database fields are UTF-8. I looked at finding out the range
of Han characters in UTF-8 encoding, but it seems very complicated. If
anyone can help out I'd appreciate it.
Regards
Simon