detect language

Hello
I have a UTF string. How can I detect what language it is?
thanks
from Peter (cm****@hotmail.com)
Aug 30 '08 #1
On 30 Aug, 05:04, Peter <cmk...@hotmail.com> wrote:
Hello
I have a UTF string. How can I detect what language it is?
thanks
from Peter (cmk...@hotmail.com)

This is a difficult one, since many words are the same in more than one
language. You would also need a large pool of words with their
language listed so that you could compare each word in your string
with them and make a guess at the language.

The fact that the string is UTF has no real relevance to the language
in which the string is written, unless it is one of the non-Latin ones.
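
The word-pool comparison described above can be sketched roughly as
follows. This is only an illustration in Python: the word lists are tiny
and made up for the example, and the names WORDS and guess_language are
hypothetical. A usable detector would need far larger lists.

```python
# Tiny, made-up word lists; a real detector would need far larger ones.
WORDS = {
    "english": {"the", "and", "is", "how", "what"},
    "french": {"le", "la", "et", "est", "comment"},
    "german": {"der", "die", "und", "ist", "wie"},
}

def guess_language(text):
    """Guess the language whose word list matches the most words in text."""
    tokens = text.lower().split()
    # Count, per language, how many words of the text appear in its list.
    scores = {lang: sum(t in vocab for t in tokens) for lang, vocab in WORDS.items()}
    best = max(scores, key=scores.get)
    # No match at all means we cannot even guess.
    return best if scores[best] > 0 else None
```

Because many words are shared between languages, ties are likely on
short strings; this sketch simply returns the first language with the
top score.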
Aug 30 '08 #2
AqD
Peter wrote:
Hello
I have a UTF string. How can I detect what language it is?
thanks
from Peter (cm****@hotmail.com)

A UTF string can contain characters from more than one language. You
can find out which languages are used by scanning every character
(see the [mbstring] extension) and checking which code page (Unicode
block) it belongs to.
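
A minimal sketch of that scan, written in Python rather than with
mbstring, and assuming that the first word of each character's Unicode
name is a good enough label for its script. The function name scripts_in
is made up for the example. Note that this reports scripts, not
languages: Latin letters alone cannot tell English from French.

```python
import unicodedata

def scripts_in(text):
    """Return the set of rough script labels found among the letters of text."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces and punctuation
        # The first word of the Unicode character name is a rough script label,
        # e.g. "LATIN SMALL LETTER A" -> "LATIN",
        #      "CJK UNIFIED IDEOGRAPH-4E2D" -> "CJK".
        name = unicodedata.name(ch, "UNKNOWN")
        found.add(name.split()[0])
    return found
```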

Sep 2 '08 #3
On Aug 30, 5:04 am, Peter <cmk...@hotmail.com> wrote:
Hello
I have a UTF string. How can I detect what language it is?
thanks
from Peter (cmk...@hotmail.com)

You can't. Only a person reading it could tell for sure what language
it is. If, however, you want to know the language a browser visiting
your site is configured to use, you could look at the Accept-Language
header.
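
As a rough illustration of reading that header, the Python sketch below
orders the language tags by their q-values. The function name
parse_accept_language is hypothetical, and the parsing is deliberately
simplified: it only understands a single "q=" parameter per tag.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, most preferred first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a tag without an explicit q-value has quality 1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unreadable q-value as least preferred
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal q-values keep the header's order.
    return [tag for tag, q in sorted(prefs, key=lambda p: p[1], reverse=True)]
```

This tells you what the browser prefers, not what language a given
string is written in.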
Sep 2 '08 #4
On Sep 2, 9:32 am, Gordon <gordon.mc...@ntlworld.com> wrote:
You can't. Only a person reading it could tell for sure what language
it is. If, however, you want to know the language a browser visiting
your site is configured to use you could look at the Accept-Language
header.
Short version: How to detect if a UTF-8 character is Chinese (CJK) or
NOT.

Long version: I have a database which contains articles stored in
UTF-8 format. The website is multilingual, so I have two database
fields, one for Chinese and one for English. The problem is that the
client wants the search to be seamless: if the user enters Chinese
characters it should search the Chinese field, and if there are
English characters it should search the English field. Now I am aware
that in some cases the Chinese use English (Latin) characters for
things like acronyms, and that my Chinese records may contain Latin
characters. But what I NEED is to run the 'search' string through a
function that can look at each character and tell me whether that
character is Chinese or not (whether 'not' is Latin, punctuation or
anything else). I am simply looking to detect whether the character
is Chinese.
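
A per-character test of this kind can be sketched by comparing the code
point against the Unicode blocks that hold Han ideographs. The Python
below is a sketch under two assumptions: the four blocks listed are
enough for the site's content (later extension blocks are left out), and
"Chinese" can be approximated by "Han ideograph", which also covers
Japanese kanji and Korean hanja. The names HAN_RANGES, is_han and
contains_han are made up for the example.

```python
HAN_RANGES = (
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
)

def is_han(ch):
    """True if the single character ch lies in one of the Han blocks above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HAN_RANGES)

def contains_han(text):
    """True if any character of text is a Han ideograph."""
    return any(is_han(ch) for ch in text)
```

Latin letters, digits and punctuation (including full-width Chinese
punctuation) all come back False, which matches the "Chinese or NOT"
requirement.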

regards

Simon
Oct 15 '08 #5
On Oct 15, 8:59 pm, Wassy <si...@wass1.entadsl.com> wrote:
Short version: How to detect if a UTF-8 character is Chinese (CJK) or
NOT.

Why not just search the whole database? If the user enters Chinese
characters, then articles with matching characters will be returned;
if they don't, then they won't.
Oct 16 '08 #6
rf

"Wassy" <si***@wass1.entadsl.com> wrote in message
news:ee**********************************@8g2000hse.googlegroups.com...
Short version: How to detect if a UTF-8 character is Chinese (CJK) or
NOT.

Move the other way. Detect if it is not English.

If the character code is greater than 127 then it is not "English". It may
be Chinese or Korean or Arabic, but it is most likely not English. Of course
it may also be French or German or even a &nbsp;, but how fine do you want
this to be? It is a search string after all.
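
A minimal sketch of that check in Python, assuming the search string has
already been decoded from UTF-8 so that each character is one code
point. The function names are made up for the example.

```python
def is_ascii_only(text):
    """True if every character has a code of 127 or below (plain ASCII)."""
    return all(ord(ch) <= 127 for ch in text)

def non_ascii_chars(text):
    """Return the characters of text whose code is above 127, in order."""
    return [ch for ch in text if ord(ch) > 127]
```

As noted above, this is coarse: an accented letter or a non-breaking
space counts as "not English" just as a Chinese character does.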

Oct 16 '08 #7
Thanks for the replies.

Gordon: I understand what you are saying, but it's not that simple.
Because of the difference between Chinese characters and our Latin
letters, to make the search EFFECTIVE I have to split the search
string up into INDIVIDUAL characters and search for each character on
its own when the input is Chinese. When it is English, I need to keep
the LETTERS that make up an individual word or acronym together and
search the database for that sequence of letters, i.e. searching for
"cat" is very different from searching for "c", then "a", then "t".

A quick example: a user may enter the search string "how do i turn my
computer on". In Latin-script languages we can split the string into
individual words at the spaces. However, because of the way Chinese
is written, there are no spaces between words, so if the Chinese
equivalent is "howdoiturnmycomputeron", the DB will not return a
result if it tries to match ALL of the characters in sequence. That
is why I need to know which parts of the search entry are Chinese
characters, so that I can split them up, and which are Latin letters
or other punctuation, so that I can keep them together. Chinese
sometimes uses Latin letters for technical acronyms etc., so you
cannot guarantee that a sentence will consist entirely of Chinese
characters rather than a mixture of Latin letters and characters.

(I hope my ramblings make sense)

RF: thank you, I should have thought of that answer before :-) It is
probably the best way that I am going to be able to separate the two:
if I can separate out all the English (and the most common punctuation
marks), then what I am left with should be mostly Chinese. I don't
care if the user enters some uncommon accented characters, as in
German or French; the website is English/Chinese, so there should be
no need for them to enter these characters in the normal case. Do you
know how UTF-8 numbering works? I know it is not like (extended)
ASCII, where characters go from 0-255. Anyway, thanks for the help. I
should at least be able to run a regexp on the string to find things
like a-z, 0-9 and regular punctuation symbols.
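
On the question of how UTF-8 numbering works: each character has one
Unicode code point, and UTF-8 stores that code point in one to four
bytes. Code points up to 127 take a single byte identical to ASCII;
higher ones take more. A small Python sketch (the helper name utf8_info
is made up for the example) makes the difference between the code point
and the stored bytes visible.

```python
def utf8_info(ch):
    """Return (code point, number of UTF-8 bytes, bytes as hex) for one character."""
    encoded = ch.encode("utf-8")
    return ord(ch), len(encoded), encoded.hex()

# "\u00e9" is e-acute and "\u4e2d" is the Chinese character zhong (middle).
for ch in ("A", "\u00e9", "\u4e2d"):
    cp, size, raw = utf8_info(ch)
    print(f"U+{cp:04X} uses {size} byte(s): {raw}")
```

So a range test has to be done on code points after decoding, not on the
individual bytes of the UTF-8 string.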

Oct 16 '08 #8
