How to get Unicode attributes of a character?

Hi,

Does there exist a portable (cross-browser) way to determine Unicode
attributes of a character in JavaScript? I couldn't even find functions
like isUpper or isDigit, but it would be more desirable to have a full
(or partial) set of Unicode attributes for a character.

Browsers that support Unicode must have this stuff compiled inside; is
this available to JavaScript?

Thanks.

Jan 5 '07 #1
go********@gmail.com wrote:
> Does there exist a portable (cross-browser) way to determine Unicode
> attributes of a character in JavaScript? I couldn't even find functions
> like isUpper or isDigit, but it would be more desirable to have a full
> (or partial) set of Unicode attributes for a character.
>
> Browsers that support Unicode must have this stuff compiled inside; is
> this available to JavaScript?

I think you're mixing a few things.

To get the Unicode code of a character (charCodeAt returns its UTF-16 code unit):

alert('L'.charCodeAt(0))

To find out if a string is a digit:

if (/^\d+$/.test('456')) { alert('is digit') }

To find out if a string is uppercase:

if (/^[A-Z]+$/.test('ADQ')) { alert('is upper') }

More info: http://www.merlyn.demon.co.uk/js-valid.htm

--
Bart

Jan 5 '07 #2
Bart Van der Donck wrote:
> go********@gmail.com wrote:
>> Does there exist a portable (cross-browser) way to determine Unicode
>> attributes of a character in JavaScript? I couldn't even find functions
>> like isUpper or isDigit, but it would be more desirable to have a full
>> (or partial) set of Unicode attributes for a character.
>>
>> Browsers that support Unicode must have this stuff compiled inside; is
>> this available to JavaScript?
>
> To find out if a string is uppercase:
>
> if (/^[A-Z]+$/.test('ADQ')) { alert('is upper') }

The original poster seems to be looking for something different. Unicode
defines character categories and blocks that contain quite a lot more
letters than the Latin A-Z.

Neither the regular expression language in ECMAScript edition 3 nor the
string functions have much support for that, apart from toUpperCase and
toLowerCase (and toLocaleUpperCase and toLocaleLowerCase), which do go
beyond a-z/A-Z.
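
As a rough illustration of how far those case-mapping functions can be pushed, here is a minimal sketch that treats a character as uppercase when it equals its own uppercase form and differs from its lowercase form. The function names isUpperByCaseMapping and isLowerByCaseMapping are made up for this example. It only sees characters that have a case mapping, so it is not a substitute for real Unicode category data.

```javascript
// Sketch only: infer case from the built-in case mappings.
// Both function names are hypothetical, not part of any standard API.

// True when ch is unchanged by toUpperCase but changed by toLowerCase.
function isUpperByCaseMapping(ch) {
  return ch === ch.toUpperCase() && ch !== ch.toLowerCase();
}

// True when ch is unchanged by toLowerCase but changed by toUpperCase.
function isLowerByCaseMapping(ch) {
  return ch === ch.toLowerCase() && ch !== ch.toUpperCase();
}
```

Digits, punctuation and letters from caseless scripts come out as neither upper nor lower, which is usually what you want, but titlecase letters and uppercase letters without a lowercase counterpart are missed as well.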

The regular expression languages in Java and .NET have more support for
such Unicode categories (e.g. \p{Lu} for all upper case letters); with
JavaScript you are currently forced to list the ranges you are
interested in yourself.
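
Listing the ranges yourself could look roughly like the sketch below. The table is a small hand-picked subset of the uppercase-letter (Lu) ranges, covering basic Latin, Latin-1, Greek and part of Cyrillic, and is nowhere near complete; the names UPPERCASE_RANGES and isUpperInTable are invented for this example.

```javascript
// Sketch only: a partial, hand-picked table of [first, last] ranges
// of uppercase letters (category Lu). A real table is much longer.
var UPPERCASE_RANGES = [
  [0x0041, 0x005A], // Latin A-Z
  [0x00C0, 0x00D6], // Latin-1 capitals before the multiplication sign
  [0x00D8, 0x00DE], // Latin-1 capitals after the multiplication sign
  [0x0391, 0x03A1], // Greek Alpha-Rho
  [0x03A3, 0x03AB], // Greek Sigma-Upsilon with dialytika
  [0x0400, 0x042F]  // Cyrillic capitals
];

// Binary search over the sorted, non-overlapping ranges.
// Looks only at the first UTF-16 code unit of the string.
function isUpperInTable(ch) {
  var code = ch.charCodeAt(0);
  var lo = 0;
  var hi = UPPERCASE_RANGES.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    var range = UPPERCASE_RANGES[mid];
    if (code < range[0]) {
      hi = mid - 1;
    } else if (code > range[1]) {
      lo = mid + 1;
    } else {
      return true;
    }
  }
  return false;
}
```

Because charCodeAt returns a 16-bit code unit, this approach as written cannot classify characters outside the Basic Multilingual Plane.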
--

Martin Honnen
http://JavaScript.FAQTs.com/
Jan 5 '07 #3
Hi,

Martin Honnen wrote:
> The original poster seems to be looking for something different. Unicode
> defines character categories and blocks that contain quite a lot more
> letters than the Latin A-Z.

Exactly. I mean the attributes (as well as the simple case mappings) that
are defined in the Unicode Character Database (a large semicolon-separated
text file distributed by Unicode.org).

> The regular expression languages in Java and .NET have more support for
> such Unicode categories (e.g. \p{Lu} for all upper case letters); with
> JavaScript you are currently forced to list the ranges you are
> interested in yourself.

Well, to get a category or case mapping for a character, using regexps
is a bit of overkill (and this type of regexp is not supported anyway).
Looks like I'll have to compile the character database myself (I did
that for C/Haskell, so there shouldn't be any trouble, just a size
increase).
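
For anyone following along, compiling the database could start from something like the sketch below. It assumes the UnicodeData.txt layout, where each line holds 15 semicolon-separated fields: the code point in hex comes first, the general category is the third field, and the simple uppercase and lowercase mappings are the 13th and 14th. The three sample lines stand in for the real file, and parseUnicodeData and categoryOf are names invented for this example.

```javascript
// Sketch only: three sample lines in UnicodeData.txt format,
// standing in for the full file.
var SAMPLE = [
  '0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;',
  '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
  '0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041'
].join('\n');

// Build a lookup object keyed by code point (hypothetical helper).
// Range entries such as "<CJK Ideograph, First>" are not expanded here.
function parseUnicodeData(text) {
  var table = {};
  var lines = text.split('\n');
  for (var i = 0; i < lines.length; i++) {
    var fields = lines[i].split(';');
    if (fields.length < 15) continue; // skip blank or malformed lines
    table[parseInt(fields[0], 16)] = {
      category: fields[2],
      upper: fields[12] ? parseInt(fields[12], 16) : null,
      lower: fields[13] ? parseInt(fields[13], 16) : null
    };
  }
  return table;
}

// General category of the first code unit of ch, or null if unknown.
function categoryOf(table, ch) {
  var entry = table[ch.charCodeAt(0)];
  return entry ? entry.category : null;
}
```

A table built this way from the whole file is large, so in practice the result would probably be compressed into ranges before being shipped to a browser.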

Thanks.

Jan 5 '07 #4
