Bytes | Developer Community

javascript - regular expression - foreign characters

I have this function:

------------------------------------------------------------
function isAlfaNumeric(vnos,space) {
if (space==false) {
validRegExp = /^[a-zA-Z0-9]{0,}$/;
}
else {
validRegExp = /^[a-zA-Z0-9\s]{0,}$/;
}
return vnos.search(validRegExp)
}
-------------------------------------------------------------

The function checks whether the string "vnos" contains any non-alphanumeric
characters. It works fine, except that it returns true if the string contains
characters from my country's alphabet, like ž, š. I tried the following:

validRegExp = /^[a-zA-Z0-9žš]{0,}$/; and also

validRegExp = /^[a-zA-Z0-9\ž\š]{0,}$/; but the result was the same.

Does anyone know how to check for foreign characters in a string using a regular
expression?
Jul 20 '05 #1
Smash wrote on 20 jan 2004 in comp.lang.javascript:
function isAlfaNumeric(vnos,space) {
if (space==false) {
validRegExp = /^[a-zA-Z0-9]{0,}$/;
}
else {
validRegExp = /^[a-zA-Z0-9\s]{0,}$/;
}
return vnos.search(validRegExp)
}
-------------------------------------------------------------

the function is checking if the string "vnos" contains any
non-alfanumeric characters... it works fine except it returns true if
the string contains my country characters like ž, š.....i tried to do
the following

validRegExp = /^[a-zA-Z0-9žš]{0,}$/; and also

validRegExp = /^[a-zA-Z0-9\ž\š]{0,}$/; but result was the same


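{0,} is the same as *; use + if at least one character is required
for 0-9 you can use \d
\s matches all kinds of whitespace, such as tabs and newlines
use test() rather than search() when you want a true/false result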

try this:

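<SCRIPT>
// Returns true if s holds only ASCII letters and digits
// (plus whitespace when sp is true); rejects the empty string.
function isAlfaNumeric(s,sp) {
var r = /^[a-zA-Z\d]+$/;
var rs = /^[a-zA-Z\d\s]+$/;
return (sp)? rs.test(s) : r.test(s);
}

alert(isAlfaNumeric("12ast",true));   // true
alert(isAlfaNumeric("34ast",false));  // true
alert(isAlfaNumeric("56as t",true));  // true
alert(isAlfaNumeric("78as t",false)); // false
</SCRIPT>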

If you want to accept empty strings as true, use:

r = /^[a-zA-Z\d]*$/;
rs = /^[a-zA-Z\d\s]*$/;

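This one works the other way around: it looks for any character that is not
allowed, so it also accepts empty strings:

<SCRIPT>
function isAlfaNumeric(s,sp) {
var r = /[^a-zA-Z\d]/;     // any character that is not a letter or digit
var rs = /[^a-zA-Z\d\s]/;  // same, but whitespace is allowed too
return ! ((sp)? rs.test(s) : r.test(s));
}

alert(isAlfaNumeric("12ast",true));   // true
alert(isAlfaNumeric("34ast",false));  // true
alert(isAlfaNumeric("56as t",true));  // true
alert(isAlfaNumeric("78as t",false)); // false
</SCRIPT>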

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 20 '05 #2
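
The two points made in this reply can be checked directly. The following is a minimal sketch, not code from any poster in the thread: search() returns an index (0 for a match at the start, -1 for no match), so the original function hands back a truthy -1 exactly when the string is rejected, and {0,} behaves like *, not like +. The helper name isAsciiAlnum is made up for illustration.

```javascript
// search() returns the index of the match, or -1 when nothing matches.
var strict = /^[a-zA-Z0-9]{0,}$/;
var viaSearchOk  = "abc123".search(strict);   // 0  (match at index 0, which is falsy)
var viaSearchBad = "abc 123".search(strict);  // -1 (no match, but -1 is truthy)

// test() returns a real boolean, which is what a validator wants.
var viaTestOk  = strict.test("abc123");   // true
var viaTestBad = strict.test("abc 123");  // false

// {0,} is the same as *, so the empty string passes; + needs one character.
var emptyWithBraces = /^[a-zA-Z\d]{0,}$/.test("");  // true
var emptyWithStar   = /^[a-zA-Z\d]*$/.test("");     // true
var emptyWithPlus   = /^[a-zA-Z\d]+$/.test("");     // false

// Hypothetical helper (name assumed): ASCII letters and digits only,
// with whitespace allowed when allowSpace is true.
function isAsciiAlnum(s, allowSpace) {
  var re = allowSpace ? /^[a-zA-Z\d\s]*$/ : /^[a-zA-Z\d]*$/;
  return re.test(s);
}
```

Whether the empty string should pass is a design choice for the form being validated; swap * for + in the helper if it should not.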
sm*****@email.si (Smash) writes:
Does anyone know how to check for foreign characters in string using regular
expression??


I think the safest is to use the \w escape, which matches "word characters".
That includes letters, international included, digits and the underscore.
If you can live with that:

if (space==false) {
validRegExp = /^[\w]*$/;
}
else {
validRegExp = /^[\w\s]*$/;
}

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
Jul 20 '05 #3
JRS: In article <72*************************@posting.google.com> , seen
in news:comp.lang.javascript, Smash <sm*****@email.si> posted at Tue, 20
Jan 2004 00:37:31 :-
function isAlfaNumeric(vnos,space) {
if (space==false) {

if (!space) { // or use if (space) and swap the two branches

Does anyone know how to check for foreign characters in string using regular
expression??

"Foreign" does not mean "non-Anglo"; Americans & British are foreigners
too.

AIUI, a string can contain any Unicode character, and there are tens of
thousands of those, a large proportion of which are letters in some
language or other. Therefore, to test fully for letters outside A-Za-z,
one needs in some form or another either a list of *all* letters or a
list of *all* non-letters, or both.

I don't know Slovenian; but I guess that it has a relatively small
number of non-Anglo letters; those could be listed and tested for, but
that would not be entirely helpful to a Scandinavian visitor.

There *should* be a javascript function to test whether the current font
has a specific glyph for a given character, or for all those in a
string; but AFAIK there is not.

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4
<URL:http://jibbering.com/faq/> Jim Ley's FAQ for news:comp.lang.javascript
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
Jul 20 '05 #4
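
The "list the letters and test for them" idea from this reply could look like the sketch below. It assumes the extra Slovenian letters are č, š and ž (plus capitals); the names SLOVENIAN_EXTRA and isSlovenianAlnum are hypothetical. As the reply points out, a list like this does nothing for visitors whose alphabet has other letters.

```javascript
// Assumed extra letters for Slovenian: Č č Š š Ž ž.
// \u escapes keep the source ASCII-only, so the file encoding cannot
// silently damage the pattern.
var SLOVENIAN_EXTRA = "\u010C\u010D\u0160\u0161\u017D\u017E";

// Hypothetical helper: ASCII letters, digits and the listed letters,
// with whitespace allowed when allowSpace is true.
function isSlovenianAlnum(s, allowSpace) {
  var cls = "a-zA-Z0-9" + SLOVENIAN_EXTRA + (allowSpace ? "\\s" : "");
  return new RegExp("^[" + cls + "]*$").test(s);
}
```

For example, isSlovenianAlnum("\u017Eaba", false) accepts "žaba", while "b\u00E4r" ("bär") is still rejected because ä is not in the list.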
Dr John Stockton wrote on 20 jan 2004 in comp.lang.javascript:
There *should* be a javascript function to test whether the current font
has a specific glyph for a given character, or for all those in a
string; but AFAIK there is not.


If we had a Regex syntax for a character above-a/below-a/in-a-range-of
certain char number(s), even without the knowledge of the specific font,
that would be nice.

regex.definerange('%3','>#80')
regex.definerange('%5','>#0','<#20')

boolean = /aa\%5+bb[^\%3]?/.test(string)
--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 20 '05 #5
"Evertjan." <ex**************@interxnl.net> writes:
If we had a Regex syntax for a character above-a/below-a/in-a-range-of
certain char number(s), even without the knowledge of the specific font,
that would be nice.

regex.definerange('%3','>#80') regex.definerange('%5','>#0','<#20')

boolean = /aa\%5+bb[^\%3]?/.test(string)


Try:
var boolean = /aa[\x01-\x1f]+bb[^\x81-\uffff]?/.test(string);
It says true for
var string = "aa\n\rbb\u1268";
(which is 7 characters long).

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
Jul 20 '05 #6
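
The claims in this reply can be checked with a few lines. This is only a sketch; the variable names and the hasNonAscii helper are made up for illustration and are not from the thread.

```javascript
// The sample string from the reply: a, a, LF, CR, b, b, U+1268.
var sample = "aa\n\rbb\u1268";
var sampleLength = sample.length;                 // 7

// \xHH and \uHHHH escapes can be used as range endpoints in a class.
var re = /aa[\x01-\x1f]+bb[^\x81-\uffff]?/;
var matched = re.test(sample);                    // true

// The trailing class is optional, and U+1268 lies inside the excluded
// range, so the match itself stops after "bb".
var matchedText = re.exec(sample)[0];             // "aa\n\rbb"

// Hypothetical helper: true if any UTF-16 code unit is 0x80 or above.
function hasNonAscii(s) {
  return /[\x80-\uffff]/.test(s);
}
```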
Lasse Reichstein Nielsen wrote on 21 jan 2004 in comp.lang.javascript:
Try:
var boolean = /aa[\x01-\x1f]+bb[^\x81-\uffff]?/.test(string);
It says true for
var string = "aa\n\rbb\u1268";
(which is 7 characters long).


[\x01-\x1f] etc

Nice, never thought of that !

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 20 '05 #7
JRS: In article <8y**********@hotpop.com>, seen in
news:comp.lang.javascript, Lasse Reichstein Nielsen <lr*@hotpop.com>
posted at Tue, 20 Jan 2004 22:47:33 :-
sm*****@email.si (Smash) writes:
Does anyone know how to check for foreign characters in string using regular
expression??


I think the safest is to use the \w escape, which matches "word characters".
That includes letters, international included, digits and the underscore.


In MSIE4, it does not match É (E-acute), ä (a-umlaut), Å (A-ring); and,
I suppose, others.

A Netscape 1.3 reference page include(s|d) :
Matches any alphanumeric character including the underscore.
Equivalent to [A-Za-z0-9_].

It would be nice to be able to match *any* letter, including non-anglo;
but ISTM that \w is fundamentally matching the characters that normally
appear in identifiers, and there it would be very wrong for that to be
altered.

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4
Jul 20 '05 #8
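
The equivalence quoted from the Netscape reference can be tested by brute force. The sketch below, with made-up names, compares \w against [A-Za-z0-9_] for every UTF-16 code unit and then checks the three characters mentioned in this reply.

```javascript
// Compare \w with the explicit ASCII class for every UTF-16 code unit.
function wordClassIsAsciiOnly() {
  for (var code = 0; code <= 0xFFFF; code++) {
    var ch = String.fromCharCode(code);
    if (/^\w$/.test(ch) !== /^[A-Za-z0-9_]$/.test(ch)) {
      return false;
    }
  }
  return true;
}

var matchesEAcute     = /^\w$/.test("\u00C9");  // É
var matchesAUmlaut    = /^\w$/.test("\u00E4");  // ä
var matchesARing      = /^\w$/.test("\u00C5");  // Å
var matchesUnderscore = /^\w$/.test("_");       // the underscore does count
```

So \w both accepts too much for this job (the underscore) and too little (no letters outside A-Z and a-z).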
Dr John Stockton <sp**@merlyn.demon.co.uk> writes:
In MSIE4, it does not match É (E-acute), ä (a-umlaut), Å (A-ring); and,
I suppose, others.


Yes, that was me misremembering. Bummer. It would have been nice to have
an escape that matches alphanumeric Unicode characters, and not just
ASCII ones, and I thought ECMAScript had it. That was apparently
just wishful thinking.

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
Jul 20 '05 #9
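
The escape wished for here did arrive later: ECMAScript 2018 added Unicode property escapes, which were not available to any browser at the time of this thread. The sketch below assumes an engine that supports them; the helper name isUnicodeAlnum is hypothetical.

```javascript
// \p{L} matches any Unicode letter and \p{N} any Unicode number.
// Both require the "u" flag on the regular expression.
function isUnicodeAlnum(s, allowSpace) {
  var re = allowSpace ? /^[\p{L}\p{N}\s]*$/u : /^[\p{L}\p{N}]*$/u;
  return re.test(s);
}
```

One caveat: combining marks are in \p{M}, not \p{L}, so a decomposed string such as "e\u0301" would be rejected by this sketch unless it is normalized first or \p{M} is added to the class.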
JRS: In article <pt**********@hotpop.com>, seen in
news:comp.lang.javascript, Lasse Reichstein Nielsen <lr*@hotpop.com>
posted at Wed, 21 Jan 2004 18:24:12 :-

Try:
var boolean = /aa[\x01-\x1f]+bb[^\x81-\uffff]?/.test(string);
It says true for
var string = "aa\n\rbb\u1268";
(which is 7 characters long).


But for that approach to do the original job in full, one needs to read
the entire Unicode table and decide which squashed spiders are foreign
letters and which are foreign non-letters.

I've seen AJF's Unicode table in HTML; but I don't recall seeing one
written in ISO-7 and intended for simple machine-reading.

http://ppewww.ph.gla.ac.uk/~flavell/...e/unidata.html

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4
Jul 20 '05 #10
Dr John Stockton wrote on 21 jan 2004 in comp.lang.javascript:
But for that approach to do the original job in full, one needs to read
the entire Unicode table and decide which squashed spiders are foreign
letters and which are foreign non-letters.


A perfect solution is impossible, as long as Unicode is not redesigned
to have separate ranges for both types. And that probably will not happen.

An imperfect solution, say for most European languages, could be done
in a standard string along the lines of [\x01-\x1f].

Seems a perfect job for you, John, to collect suggestions from many of us
about their local lingo needs. ;-}

Would the same unicode number stand for different [letter vs nonletter]
types in different European languages ?
Or in different fonts ?

I hope not.

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 20 '05 #11
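
An "imperfect solution for most European languages" along these lines might be sketched as follows. It is an approximation, not a complete list: it covers the letters in the Basic Latin, Latin-1 Supplement and Latin Extended-A blocks only, so Greek, Cyrillic and later Latin extensions are rejected. The names EUROPEAN_LETTERS and isEuropeanAlnum are made up for illustration.

```javascript
// Letter ranges in Basic Latin, Latin-1 Supplement and Latin Extended-A.
// U+00D7 (multiplication sign) and U+00F7 (division sign) sit between
// the letters in Latin-1 and are skipped on purpose.
var EUROPEAN_LETTERS = "A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u017F";

// Hypothetical helper: the letters above plus ASCII digits,
// with whitespace allowed when allowSpace is true.
function isEuropeanAlnum(s, allowSpace) {
  var cls = EUROPEAN_LETTERS + "0-9" + (allowSpace ? "\\s" : "");
  return new RegExp("^[" + cls + "]*$").test(s);
}
```

On the question of fonts and languages: the ranges are defined purely by code point, so the result does not depend on which font is used to display the string.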
JRS: In article <Xn********************@194.109.133.29>, seen in
news:comp.lang.javascript, Evertjan. <ex**************@interxnl.net>
posted at Thu, 22 Jan 2004 08:47:39 :-
Dr John Stockton wrote on 21 jan 2004 in comp.lang.javascript:
But for that approach to do the original job in full, one needs to read
the entire Unicode table and decide which squashed spiders are foreign
letters and which are foreign non-letters.


A perfect solution is impossible, as long as Unicode is not redesigned
to have separate ranges for both types. And that probably will not happen.

An imperfect solution, say for most European languages, could be done
in a standard string along the lines of [\x01-\x1f].

Seems a perfect job for you, John, to collect suggestions from many of us
about their local lingo needs. ;-}

Would the same unicode number stand for different [letter vs nonletter]
types in different European languages ?
Or in different fonts ?


Read AJF's cited page, and others, on Unicode. Look and see what is
actually in Unicode.

AIUI, the idea of Unicode is that a given character has a given number,
independently of font, size, and language; \u0041 is 'A' and \u0061 is
'a' *everywhere*. If it's not \u0061, it's not our 'a', whatever it
looks like.

In practice, though, a letter only counts as a letter if it is a letter
of the current language. In English, Nijmegen has eight letters; I
suspect it of having only seven in Dutch, only six of which are English.

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4
Jul 20 '05 #12
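
The point that a character is identified by its number, not by its appearance, can be shown with a look-alike. This is a small sketch with made-up variable names; U+0430 is the Cyrillic small letter a, which most fonts draw just like the Latin one.

```javascript
var latinA    = "\u0061";  // LATIN SMALL LETTER A
var cyrillicA = "\u0430";  // CYRILLIC SMALL LETTER A, same shape in most fonts

var latinIsA      = latinA === "a";                  // true
var cyrillicIsA   = cyrillicA === "a";               // false
var codeOfUpperA  = "A".charCodeAt(0);               // 0x41
var cyrillicInAZ  = /^[a-z]$/.test(cyrillicA);       // false
```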
Dr John Stockton wrote on 22 jan 2004 in comp.lang.javascript:
In practice, though, a letter only counts as a letter if it is a
letter of the current language.

I do not think so. It depends on the definition, of course. I would say a
letter in computer strings can also be a letter in another language and
still be counted as a letter. The u-umlaut [ü] is definitely a letter in
English in the sense that it is definitely not a non-letter, like
!?#%&.,.

In English, Nijmegen has eight letters; I
suspect it of having only seven in Dutch, only six of which are
English.


This concept was abandoned long ago, in this time of computer-generated and
sorted telephone books. The "ij", though it counts as one letter in the
linguistic Dutch sense, has definitely become a two-letter "thing" like
the "ph".

The "ph", however, can also be pronounced in a two-letter fashion in
words like:

poephark
ophaalbrug
Generaal van Opheusden ;-)

If there were a word with the ij pronounced as separate letters, the j
should have two little dots [the trema] like an umlaut. This is not
available in current fonts, I presume, because the j is
usually thought of as a consonant.

[The above thoughts are not tested on recent or old versions of eastern
languages, nor on Netscape]

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 20 '05 #13
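
The seven-versus-eight letter count can be reproduced in a string. Unicode does define a single-character ij, U+0133 (LATIN SMALL LIGATURE IJ), although ordinary Dutch text is normally typed with a separate i and j as described above. The sketch below uses made-up variable names and String.prototype.normalize, which was added to the language well after this thread.

```javascript
var twoLetters = "Nijmegen";       // i and j as separate characters
var ligature   = "N\u0133megen";   // U+0133 LATIN SMALL LIGATURE IJ

var lengthTwoLetters = twoLetters.length;   // 8
var lengthLigature   = ligature.length;     // 7

// Compatibility normalization folds the ligature back into "i" + "j".
var folded = ligature.normalize("NFKC");
var sameAfterFolding = folded === twoLetters;   // true
```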

This discussion thread is closed
