Validating utf-8 character strings in javascript regular expression

los
Hi,

I've created a web application using Struts. In one of its forms I want
to allow the values entered to contain special characters from other
languages, but not symbols such as (, <, +, }, etc. A regular expression
that handles these values is proving quite hard to find. Right now I have
^([a-zA-Z0-9_\x81-\xFF])*$ and this works for some utf-8 characters
such as ã, é, ó, etc... But doesn't work for other characters such
as Æ, Ü, ß, etc...

I was wondering if someone has come across this issue and has found a
solution for the problem.

Thanks,

-Los

Sep 20 '05 #1
ASM
los wrote:
> Hi,
>
> I've created a web application using Struts. In one of its forms I want
> to allow the values entered to contain special characters from other
> languages, but not symbols such as (, <, +, }, etc. A regular expression
> that handles these values is proving quite hard to find. Right now I have
> ^([a-zA-Z0-9_\x81-\xFF])*$ and this works for some utf-8 characters
> such as ã, é, ó, etc... But doesn't work for other characters such
> as Æ, Ü, ß, etc...
>
> I was wondering if someone has come across this issue and has found a
> solution for the problem.


If you are using the ISO-8859-1 encoding:

? ^([a-zA-Z0-9_\xA0-\xFF])*$
? ^([a-zA-Z0-9_\x A0-\x FF])*$
? ^([a-zA-Z0-9_/A0/-/FF/])*$
? ^([\x61-\x7A\x41-\x5A\x30-\x39\x5F\xA0-\xFF])*$

? ^([/61/-/7A/41/-/5A/30/-/39//5F/A0-/FF/])*$

--
Stephane Moriaux et son [moins] vieux Mac
Sep 20 '05 #2
los
What if we don't want to restrict to just ISO-8859-1 characters? What
if we want to allow all of the UTF-8 characters?

I tried doing something like ^([a-zA-Z0-9_\x0080-\xFFFF])*$ and it
didn't work.

-Los

Sep 21 '05 #3
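
One likely reason the \x0080-\xFFFF attempt in post #3 misbehaves: in a JavaScript regular expression, \x consumes exactly two hex digits, so \x0080 reads as \x00 followed by the literal characters "8" and "0". The sketch below is only an illustration of that parsing (the variable names are made up here), and contrasts it with the four-digit \u escape.

```javascript
// "\x0080-\xFFFF" inside the class is parsed as:
//   \x00, "8", the range "0"-\xFF, "F", "F"
const attempted = /^([a-zA-Z0-9_\x0080-\xFFFF])*$/;

console.log(attempted.test("<"));      // true: "<" (U+003C) falls inside "0"-\xFF
console.log(attempted.test("\u00E9")); // true: "é" (U+00E9) is below \xFF
console.log(attempted.test("\u0142")); // false: "ł" (U+0142) is above \xFF

// \u takes four hex digits, so this range really runs from U+0080 to U+FFFF.
const fourDigit = /^([a-zA-Z0-9_\u0080-\uFFFF])*$/;

console.log(fourDigit.test("\u0142")); // true
console.log(fourDigit.test("<"));      // false
```

Note that the accidental range lets symbols such as "<" through, which is the opposite of what the form validation is meant to do.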
ASM
los wrote:
> What if we don't want to restrict to just ISO-8859-1 characters? What
> if we want to allow all of the UTF-8 characters?

Because \x?? is not UTF-8; it is a hexadecimal escape, and the hex code
for a given character is not the same in every charset.

Example:
no-break space = \xA0 (hex) = 00A0 (Unicode) = C2 A0 (UTF-8)
no-break space = hex A0 in the charsets ISO-8859-1 and CP1252
no-break space = hex FF in the charsets CP850 and CP437

http://www.miakinen.net/vrac/c10/charsets
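
The no-break space example can be checked from script. This is a minimal sketch, assuming a runtime that provides TextEncoder (current browsers and Node.js do). Inside a JavaScript string, \xA0 and \u00A0 name the same character, and the C2 A0 byte pair only appears once that string is encoded as UTF-8; a regular expression is matched against the string, not against the bytes.

```javascript
// Two spellings of the same character in JavaScript source.
const nbsp = "\xA0";
console.log(nbsp === "\u00A0");                // true
console.log(nbsp.charCodeAt(0).toString(16));  // "a0"

// Encoding the string as UTF-8 yields the two-byte form.
const utf8Bytes = Array.from(new TextEncoder().encode(nbsp));
console.log(utf8Bytes.map(b => b.toString(16)).join(" ")); // "c2 a0"

// The pattern sees the single character, so one \xA0 is enough to match it.
console.log(/^\xA0$/.test(nbsp)); // true
```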
> I tried doing something like ^([a-zA-Z0-9_\x0080-\xFFFF])*$ and it
> didn't work.


? ^([a-zA-Z0-9_/0080/-/FFFF/])

I think it could be:

0081 to 00FF (Unicode)
or
C2A0 to C3BF (UTF-8)
from:
http://www.macchiato.com/unicode/chart/
or the other URL above.

--
Stephane Moriaux et son [moins] vieux Mac
Sep 21 '05 #4
los
Thanks for the reply!

I tried your approach but for some reason the javascript parser doesn't
recognize the utf-8 characters still.

Could someone please verify that the correct regex should be
^([a-zA-Z0-9_\u00A1-\uFFFF])*$ ?

If I use the above regex in my xml, in the javascript that gets
generated on the web page I get the following rule;

this.mask=/^([a-zA-Z0-9_\\u00A1-\\uFFFF])*$/;

I apologize if this is a trivial question, but I'm new at this and am
learning as I go along.

Thanks,

-Los

Sep 21 '05 #5
los wrote:
> I tried your approach but for some reason the javascript parser doesn't
> recognize the utf-8 characters still.
>
> Could someone please verify that the correct regex should be
> ^([a-zA-Z0-9_\u00A1-\uFFFF])*$ ?

It should not. Firstly, Unicode escapes need to be supported, which is not
the case with every script engine. Test it like

/\u00A1/.toString().length < 4 ? supported : unsupported
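
A behaviour-based variant of the same check, as a hedged sketch: as far as I know, current engines return the pattern source exactly as written from toString(), so a length comparison may not tell the two cases apart there. The helper name supportsUnicodeEscapes is made up for this example. It builds the test character with String.fromCharCode so that the check does not itself depend on escape syntax in string literals.

```javascript
// Hypothetical helper: reports whether \uXXXX is honoured inside a RegExp.
function supportsUnicodeEscapes() {
  // U+00A1 (inverted exclamation mark), built without any escape sequence.
  var target = String.fromCharCode(0xA1);
  try {
    // If \u is not understood, the pattern would look for the literal
    // text "u00A1" instead and the test would fail.
    return new RegExp("^\\u00A1$").test(target);
  } catch (e) {
    return false;
  }
}

console.log(supportsUnicodeEscapes() ? "supported" : "unsupported");
```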

Secondly, using the asterisk (`*') quantifier means that the pattern also
matches the empty string; you should use the plus (`+') quantifier instead.
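
The difference between the two quantifiers is easy to see on the empty string. A small sketch using the character class from the question (the variable names are only for illustration):

```javascript
const withStar = /^([a-zA-Z0-9_\u00A1-\uFFFF])*$/;
const withPlus = /^([a-zA-Z0-9_\u00A1-\uFFFF])+$/;

console.log(withStar.test(""));       // true: zero repetitions are allowed
console.log(withPlus.test(""));       // false: at least one character is required
console.log(withPlus.test("\u00C6")); // true: "Æ" (U+00C6) is inside the range
```

If the form field is optional, the `*' version may be what is wanted after all; that is a decision for the application, not for the pattern.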

Thirdly, you have to specify which Unicode characters you consider to be
"symbols". For example, including the characters U+00A1 to U+FFFF as above
would also include the range U+2100 to U+214F (Letterlike Symbols).
See <http://unicode.org/> and <http://pointedears.de/scripts/test/charset>
for details.
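
To make the third point concrete, the sketch below shows the broad range accepting two characters most people would call symbols. It then shows one possible alternative that was not available when this thread was written: Unicode property escapes (\p{L} for letters, \p{N} for numbers), which need the `u' flag and an engine implementing ECMAScript 2018 or later. Treat it as an assumption about the target environment, not as the thread's answer.

```javascript
const broadRange = /^[a-zA-Z0-9_\u00A1-\uFFFF]+$/;

console.log(broadRange.test("\u2122")); // true: TRADE MARK SIGN, in Letterlike Symbols
console.log(broadRange.test("\u00D7")); // true: MULTIPLICATION SIGN

// Letters and numbers by Unicode general category, plus the underscore.
const lettersOnly = /^[\p{L}\p{N}_]+$/u;

console.log(lettersOnly.test("\u00C6\u00DC\u00DF")); // true: "ÆÜß" are all letters
console.log(lettersOnly.test("\u2122"));             // false: category "Symbol, other"
console.log(lettersOnly.test("\u00D7"));             // false: category "Symbol, math"
console.log(lettersOnly.test("a<b"));                // false
```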
> If I use the above regex in my xml, in the javascript that gets
> generated on the web page I get the following rule;
>
> this.mask=/^([a-zA-Z0-9_\\u00A1-\\uFFFF])*$/;


Apart from the fact that this also matches the empty string, it is quite
obviously a completely different RegExp from the one above. Escaping the
backslash puts it into the character class as a literal character, and
everything that followed it in the former escape sequence is then taken
literally as well (here: u, 0, A, 1, F).
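
The effect of the doubled backslashes can be observed directly. A small sketch with the generated literal (the variable name is made up here): besides the literal characters, the sequence 1-\\ in the middle of the class is read as a range from "1" (U+0031) to the backslash (U+005C), which happens to include "<".

```javascript
// The rule exactly as it appears in the generated page.
const generated = /^([a-zA-Z0-9_\\u00A1-\\uFFFF])*$/;

console.log(generated.test("\\"));     // true: the backslash itself is now allowed
console.log(generated.test("\u00E9")); // false: "é" is no longer covered by any range
console.log(generated.test("<"));      // true: "<" lies inside the accidental range "1"-"\"
```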

What you probably want is

this.mask = new RegExp("^([a-zA-Z0-9_\\u00A1-\\uFFFF])+$");

where the escaped backslashes collapse to single ones before the string is
passed to the RegExp constructor, so that the result is equivalent to the
first RegExp literal (apart from the quantifier). But you should rather
configure your server-side code generator not to escape escape sequences.
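
A quick way to convince yourself of that equivalence, sketched with illustrative variable names: build the pattern both ways and compare the resulting source and behaviour.

```javascript
// In the string literal each "\\" becomes one backslash, so the
// constructor receives \u00A1 and \uFFFF as escape sequences.
const fromString = new RegExp("^([a-zA-Z0-9_\\u00A1-\\uFFFF])+$");
const fromLiteral = /^([a-zA-Z0-9_\u00A1-\uFFFF])+$/;

console.log(fromString.source === fromLiteral.source); // true

console.log(fromString.test("Stra\u00DFe")); // true: "Straße"
console.log(fromString.test(""));            // false: "+" needs at least one character
console.log(fromString.test("a<b"));         // false: "<" is not in the class
```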
PointedEars
Oct 16 '05 #6
JRS: In article <11****************@PointedEars.de>, dated Sun, 16 Oct
2005 18:41:31, seen in news:comp.lang.javascript, Thomas 'PointedEars'
Lahn <Po*********@web.de> posted :
> los wrote:

ON 21 SEPTEMBER
AISB, your attribution line does not comply with the minimum current
Usenet thinking - this is not news:de.* here, as you should know.

>> I tried your approach but for some reason the javascript parser doesn't
>> recognize the utf-8 characters still.
>>
>> Could someone please verify that the correct regex should be
>> ^([a-zA-Z0-9_\u00A1-\uFFFF])*$ ?
>
> It should not. Firstly, Unicode escapes needs to be supported which is not
> the case with every script engine. Test it like

One had hoped that the turd who thinks it useful to disinter aged
threads had himself passed on to another place.

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME. ©
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
Oct 16 '05 #7
