Regex for languages other than English

Hi,

Could anybody show me a regex for capturing words (alphas, without
numerics) in languages other than English (languages with special
characters, e.g. French, German)? I've tried '[a-zA-Z]+' but the special
letters of some languages (e.g. French) are not captured. '\w+' works
fine, but it also includes numerics, which I don't want.

TIA
Jul 17 '05 #1
Ricky Romaya wrote:
> Could anybody show me a regex for capturing words (alphas, without
> numerics) in languages other than English (languages with special
> characters, e.g. French, German)? I've tried '[a-zA-Z]+' but the special
> letters of some languages (e.g. French) are not captured. '\w+' works
> fine, but it also includes numerics, which I don't want.


You'd need to specify what characters you want individually
or, carefully, with a character range, as in [a-z].

--
Jock
Jul 17 '05 #2
John Dunlop <us*********@john.dunlop.name> wrote in
news:MP************************@News.Individual.NET:

> You'd need to specify what characters you want individually
> or, carefully, with a character range, as in [a-z].

Well, that much I know. The problem is that in my native tongue, and in
English (my second language), there are no such special characters. My work
requires me to also support other languages (such as French, German,
etc.) which I can't speak, let alone write. I don't know the list of those
special characters or how to input them with an ordinary 101-key US keyboard.
Could you point me to an (internet) resource where the complete list of
those special characters is given, along with how to input them?

BTW, as I said, '\w+' works fine, except it also includes numerics. Is
there a way to simulate '\w' without including the numerics, and without
knowing the list of all special characters?

TIA
Jul 17 '05 #3
Ricky Romaya wrote:
> The problem is that in my native tongue, and in English (my second
> language), there are no such special characters. My work requires me to
> also support other languages (such as French, German, etc.) which I can't
> speak, let alone write. I don't know the list of those special characters
> or how to input them with an ordinary 101-key US keyboard. Could you point
> me to an (internet) resource where the complete list of those special
> characters is given
If it isn't English, then I'm afraid I'm not overly familiar
with it. I think, though you'd better check yourself, that
German is covered by the Latin-1 alphabet, lists of which
are abundant on the web; French I think, again I'm not sure,
uses a character or two, such as the oe ligature, which are
outside Latin-1.
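
The Latin-1 guess above can be checked mechanically. Here is a rough sketch in Python (not PHP, just because it is easy to run); the helper name `in_latin1` and the two sample letter lists are my own assumptions and are not complete alphabets for either language.

```python
def in_latin1(ch):
    """Return True if the single character ch can be encoded in Latin-1 (ISO-8859-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Sample letters only; these lists are illustrative, not exhaustive.
german = "äöüÄÖÜß"
french = "éèêëàâçîïôùûœ"

# Collect the letters that fall outside Latin-1.
print([c for c in german if not in_latin1(c)])  # → []
print([c for c in french if not in_latin1(c)])  # → ['œ']
```

For these samples, the German letters all fit in Latin-1 and the only French letter that does not is the oe ligature.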
> and how to input them?
How you enter those special characters depends on your
system. On Windows I would press and hold down the Alt key
and type the character's position in the native character
set, in decimal, with a leading zero, on the numeric keypad,
not on the numbers above the letters. So to type the
character 'é' (LATIN SMALL LETTER E WITH ACUTE), hold down
Alt and, using the numeric keypad, type 0233.

In PCREs, you can also enter characters indirectly, by way
of an escape notation: A backslash followed by the letter
'x' followed by the code position in hexadecimal (case
insensitive) of the character; e.g., \xE9 represents 'é'.
This works both inside and outside of character classes.

So the regular expressions `^[a-zA-Zé]+$` and
`^[a-zA-Z\xE9]+$` are equivalent (provided the pattern and the
subject are both Latin-1 encoded), and can be extended to match
other special characters.
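
A quick way to convince yourself of the equivalence, sketched in Python rather than PHP. Python's `re` module accepts the same `\xhh` escape inside a character class; here it works on Unicode strings, so the encoding caveat for byte-oriented PCRE does not come up. The sample words are made up for illustration.

```python
import re

# Same character class, written with a literal 'é' and with the \xE9 escape.
literal = re.compile("^[a-zA-Zé]+$")
escaped = re.compile(r"^[a-zA-Z\xE9]+$")

# Alt+0233 (decimal) and \xE9 (hexadecimal) name the same code position.
assert chr(233) == "é" and 0xE9 == 233

for word in ["café", "cafe", "caf3", "naïve"]:
    # Both patterns should always agree; 'naïve' fails because 'ï' is not listed.
    print(word, bool(literal.match(word)), bool(escaped.match(word)))
```

Only the first two words match, and the two patterns agree on every word. Adding more letters means adding more escapes (or ranges) to the class.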
> BTW, as I said, '\w+' works fine, except it also includes numerics. Is
> there a way to simulate '\w' without including the numerics, and without
> knowing the list of all special characters?


As far as I know there is no PCRE metacharacter for exactly that.
You can specify a character class that simulates it, either by
listing the characters you want to include or by negating the
ones you don't want, as in [^\W\d_].

Maybe there's another way. PHP keeps on surprising me.
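
One other way, sketched here in Python: a negated class `[^\W\d_]` reads as "anything `\w` matches, minus digits and underscore", so no list of special characters is needed. The function name `words` and the sample sentence are my own. The same class syntax is valid in PCRE, but whether `\w` covers accented letters there depends on the locale or UTF-8 settings, so treat that part as an assumption to test on your own setup.

```python
import re

# \w matches letters, digits and underscore; the negated class
# [^\W\d_] keeps only what \w matches and removes digits and underscore.
WORD = re.compile(r"[^\W\d_]+")


def words(text):
    """Return all runs of letters in text (Python 3 str patterns are Unicode-aware)."""
    return WORD.findall(text)


print(words("Größe 42, été_2005, cœur"))  # → ['Größe', 'été', 'cœur']
```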

--
Jock
Jul 17 '05 #4
Ricky Romaya wrote:
> Hi,
>
> Could anybody show me a regex for capturing words (alphas, without
> numerics) in languages other than English (languages with special
> characters, e.g. French, German)? I've tried '[a-zA-Z]+' but the special
> letters of some languages (e.g. French) are not captured. '\w+' works
> fine, but it also includes numerics, which I don't want.


Use something like [\xc8-\xcb]+
<http://in2.php.net/manual/en/reference.pcre.pattern.syntax.php>
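
To see what such a range covers, here is a small Python sketch. The range `\xc8-\xcb` spans only È, É, Ê and Ë. The wider class below is my own extension covering the accented letters of Latin-1 (positions C0-FF, skipping D7 '×' and F7 '÷'); the names `narrow` and `latin1_word` are hypothetical. It still misses anything outside Latin-1, such as 'œ'.

```python
import re

# The suggested range: code positions C8..CB, i.e. È É Ê Ë only.
narrow = re.compile(r"[\xc8-\xcb]+")

# ASCII letters plus the Latin-1 letters; \xD7 (×) and \xF7 (÷) are
# symbols sitting in the middle of the letter block, so they are skipped.
latin1_word = re.compile(r"[A-Za-z\xC0-\xD6\xD8-\xF6\xF8-\xFF]+")

print(narrow.findall("ÈÉÊË élève"))                   # → ['ÈÉÊË']
print(latin1_word.findall("élève naïve 3×4 Straße"))  # → ['élève', 'naïve', 'Straße']
```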

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jul 17 '05 #5
