Validating utf-8 character strings in javascript regular expression

los
Hi,

I've created a web application using Struts. In one of the forms I want
to allow values containing special characters from other languages, but
not symbols such as (, <, +, }, etc. Finding a regular expression that
handles these values is proving quite hard. Right now I have
^([a-zA-Z0-9_\x81-\xFF])*$ and this works for some UTF-8 characters
such as ã, é, ó, etc., but doesn't work for other characters such
as Æ, Ü, ß, etc.

I was wondering if someone has come across this issue and has found a
solution for the problem.

Thanks,

-Los

Sep 20 '05 #1
ASM
los wrote:
> Hi,
>
> I've created a web application using Struts. In one of the forms I want
> to allow values containing special characters from other languages, but
> not symbols such as (, <, +, }, etc. Finding a regular expression that
> handles these values is proving quite hard. Right now I have
> ^([a-zA-Z0-9_\x81-\xFF])*$ and this works for some UTF-8 characters
> such as ã, é, ó, etc., but doesn't work for other characters such
> as Æ, Ü, ß, etc.
>
> I was wondering if someone has come across this issue and has found a
> solution for the problem.


If you are using the ISO-8859-1 encoding:

? ^([a-zA-Z0-9_\xA0-\xFF])*$
? ^([a-zA-Z0-9_\x A0-\x FF])*$
? ^([a-zA-Z0-9_/A0/-/FF/])*$
? ^([\x61-\x7A\x41-\x5A\x30-\x39\x5F\xA0-\xFF])*$

? ^([/61/-/7A/41/-/5A/30/-/39//5F/A0-/FF/])*$
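
A quick check of the first candidate above, as a sketch only. It assumes the pattern is used as a JavaScript regex literal; the variable name and the sample strings are made up for illustration. Note that the range \xA0-\xFF also covers non-letters such as the copyright sign and the division sign, so it does not exclude every symbol.

```javascript
// Candidate from the list above: ASCII word characters plus U+00A0..U+00FF.
// `latin1Mask` is an illustrative name, not something from the thread.
const latin1Mask = /^([a-zA-Z0-9_\xA0-\xFF])*$/;

// Samples written with \u escapes so the result does not depend on
// how this source file is encoded.
const latin1Samples = {
  "accented letters (a-tilde, e-acute, o-acute)": "\u00E3\u00E9\u00F3",
  "AE ligature, U-umlaut, sharp s": "\u00C6\u00DC\u00DF",
  "ASCII symbol (": "a(b",
  "ASCII symbol <": "x<y",
  "copyright sign (a symbol inside the range)": "\u00A9",
  "division sign (a symbol inside the range)": "\u00F7",
};

for (const [label, text] of Object.entries(latin1Samples)) {
  console.log(label, "->", latin1Mask.test(text));
}
```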

--
Stephane Moriaux et son [moins] vieux Mac
Sep 20 '05 #2
los
What if we don't want to restrict to just ISO-8859-1 characters? What
if we want to allow all of the UTF-8 characters?

I tried doing something like ^([a-zA-Z0-9_\x0080-\xFFFF])*$ and it
didn't work.

-Los

Sep 21 '05 #3
ASM
los wrote:
> What if we don't want to restrict to just ISO-8859-1 characters? What
> if we want to allow all of the UTF-8 characters?

Because \x?? is not UTF-8; it is a hexadecimal escape,
and the hex code for a character is not the same in every charset.

Example:
no-break space = \xA0 (hex) = 00A0 (Unicode) = C2 A0 (UTF-8)
no-break space = hex A0 in charsets ISO-8859-1 & CP1252
no-break space = hex FF in charsets CP850 & CP437

http://www.miakinen.net/vrac/c10/charsets
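
The difference between the code point and its UTF-8 bytes can be seen in a small sketch like the one below. It assumes an environment that provides the standard TextEncoder (current browsers and Node.js); the variable names are illustrative only.

```javascript
// U+00A0 (no-break space) is a single code unit in a JavaScript string,
// but two bytes once the string is encoded as UTF-8.
const nbsp = "\u00A0";

// TextEncoder always encodes to UTF-8; map each byte to upper-case hex.
const nbspUtf8Hex = Array.from(new TextEncoder().encode(nbsp), (b) =>
  b.toString(16).toUpperCase()
);

console.log(nbsp.length);        // number of UTF-16 code units
console.log(nbspUtf8Hex);        // UTF-8 bytes in hex
console.log(/\xA0/.test(nbsp));  // \xA0 in a regex refers to the code point, not a byte
```

Regular expressions in JavaScript work on the characters of the string, so the UTF-8 byte values (C2 A0) never appear in a pattern.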
> I tried doing something like ^([a-zA-Z0-9_\x0080-\xFFFF])*$ and it
> didn't work.


? ^([a-zA-Z0-9_/0080/-/FFFF/])

I think it could be:

0081 to 00FF (Unicode)
or
C2A0 to C3BF (UTF-8)
from:
http://www.macchiato.com/unicode/chart/
or the other URL above.

--
Stephane Moriaux et son [moins] vieux Mac
Sep 21 '05 #4
los
Thanks for the reply!

I tried your approach, but for some reason the JavaScript parser still
doesn't recognize the UTF-8 characters.

Could someone please verify that the correct regex should be
^([a-zA-Z0-9_\u00A1-\uFFFF])*$ ?

If I use the above regex in my XML, the JavaScript that gets
generated on the web page contains the following rule:

this.mask=/^([a-zA-Z0-9_\\u00A1-\\uFFFF])*$/;

I apologize if this is a trivial question, but I'm new at this and am
learning as I go along.

Thanks,

-Los

Sep 21 '05 #5
los wrote:
> I tried your approach but for some reason the javascript parser doesn't
> recognize the utf-8 characters still.
>
> Could someone please verify that the correct regex should be
> ^([a-zA-Z0-9_\u00A1-\uFFFF])*$ ?

It should not. Firstly, Unicode escapes need to be supported, which is not
the case with every script engine. Test it like this:

/\u00A1/.toString().length < 4 ? supported : unsupported
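
In engines that follow ES5 or later, toString() returns the pattern source as written, so the length check above gives 8 even where the escape is supported. A behavioural check may be more reliable; the following is only a sketch, and `supportsUnicodeEscapes` is a hypothetical helper name, not something from the thread.

```javascript
// Hypothetical helper: returns true if the engine treats \u00A1 inside a
// regex literal as the single character U+00A1 (inverted exclamation mark).
// String.fromCharCode is used so the check does not also depend on \u
// escapes being supported in string literals.
function supportsUnicodeEscapes() {
  return /^\u00A1$/.test(String.fromCharCode(0xa1));
}

console.log(supportsUnicodeEscapes() ? "supported" : "unsupported");
```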

Secondly, the asterisk (`*') quantifier also matches the empty string;
you should use the plus (`+') quantifier instead.

Thirdly, you have to specify which Unicode characters you consider to be
"symbols". For example, including the range U+00A1 to U+FFFF as above
would also include the range U+2100 to U+214F (Letterlike Symbols).
See <http://unicode.org/> and <http://pointedears.de/scripts/test/charset>
for details.
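
One way to spell out "letters, not symbols" in current engines is Unicode property escapes (ES2018, requires the `u` flag). This is a different technique from the one discussed in this thread and was not available when it was written. The sketch assumes "allowed" means Unicode letters, numbers, and underscore; both variable names are illustrative.

```javascript
// Assumption: allowed input is Unicode letters (\p{L}), numbers (\p{N}) and "_".
const lettersOnlyMask = /^[\p{L}\p{N}_]+$/u;

// The broad range from the thread, for comparison.
const broadRangeMask = /^([a-zA-Z0-9_\u00A1-\uFFFF])+$/;

const degreeCelsius = "\u2103"; // DEGREE CELSIUS, in the Letterlike Symbols block

console.log(lettersOnlyMask.test("\u00C6\u00DC\u00DF")); // AE ligature, U-umlaut, sharp s
console.log(lettersOnlyMask.test(degreeCelsius));
console.log(broadRangeMask.test(degreeCelsius));
```

Decomposed input (a base letter followed by a combining mark) would additionally need \p{M} in the class, or normalization before testing.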
> If I use the above regex in my xml, in the javascript that gets
> generated on the web page I get the following rule;
>
> this.mask=/^([a-zA-Z0-9_\\u00A1-\\uFFFF])*$/;


Aside from the fact that this would match the empty string as well, that
is quite obviously a completely different RegExp from the one above.
Escaping the backslash makes it a literal character in the character
class, and so are all the following characters of the former escape
sequence (here: u, 0, A, 1, F).
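
The effect can be checked directly. In the generated literal the class no longer contains a Unicode range; besides the literal characters it contains the range from "1" to "\", which happens to include "<" and other ASCII punctuation. A small sketch (the variable name is illustrative):

```javascript
// The literal exactly as it appears in the generated page.
const doubledMask = /^([a-zA-Z0-9_\\u00A1-\\uFFFF])*$/;

console.log(doubledMask.test("\u00E9")); // e-acute: not in the class any more
console.log(doubledMask.test("<"));      // inside the accidental range "1" to "\"
console.log(doubledMask.test("\\"));     // a literal backslash is now allowed
console.log(doubledMask.test(""));       // "*" lets the empty string through
```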

What you probably want is

this.mask = new RegExp("^([a-zA-Z0-9_\\u00A1-\\uFFFF])+$");

where the escaped backslashes collapse to single ones before being
passed to the RegExp constructor, so the result is equivalent to the
first RegExp literal (apart from the quantifier). But you should rather
configure your server-side code generator not to escape escape sequences.
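
A short check of the constructor form, as a sketch; `constructedMask` is an illustrative name and the sample strings are made up.

```javascript
// In a string literal "\\u00A1" is the six characters \u00A1, which the
// RegExp constructor then interprets as a Unicode escape.
const constructedMask = new RegExp("^([a-zA-Z0-9_\\u00A1-\\uFFFF])+$");

console.log(constructedMask.source);                 // pattern with single backslashes
console.log(constructedMask.test("\u00E9t\u00E9"));  // "ete" with e-acute twice
console.log(constructedMask.test(""));               // "+" requires at least one character
console.log(constructedMask.test("a<b"));            // ASCII symbols stay excluded
```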
PointedEars
Oct 16 '05 #6
JRS: In article <11****************@PointedEars.de>, dated Sun, 16 Oct
2005 18:41:31, seen in news:comp.lang.javascript, Thomas 'PointedEars'
Lahn <Po*********@web.de> posted :
> los wrote:

ON 21 SEPTEMBER

AISB, your attribution line does not comply with the minimum current
Usenet thinking - this is not news:de.* here, as you should know.

>> I tried your approach but for some reason the javascript parser doesn't
>> recognize the utf-8 characters still.
>>
>> Could someone please verify that the correct regex should be
>> ^([a-zA-Z0-9_\u00A1-\uFFFF])*$ ?
>
> It should not. Firstly, Unicode escapes needs to be supported which is not
> the case with every script engine. Test it like

One had hoped that the turd who thinks it useful to disinter aged
threads had himself passed on to another place.

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME. ©
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
Oct 16 '05 #7
