Regexp: Case-insensitive matching | N factorial

In a setting where I can specify only a JS regular
expression, but not the JS code that will use it, I seek
a regexp component that matches a string of letters,
ignoring case. E.g., for "cat" I'd like the effect of

([Cc][Aa][Tt])

but without having to have many occurrences of [Xx].
Secondly, what is an efficient regexp that matches a
string exactly when ALL words in a certain list occur in
the string. I'd like the effect of

(cat.*nip|nip.*cat)

except that there are N words rather than just the two
words "cat" and "nip". (I can assume that no word in the
list is a prefix of any other.) Naturally, I'm looking for
a regexp-solution that does not involve listing all
N factorial
many orderings.

--Jonathan LF King, Mathematics dept, Univ. of Florida
Jun 27 '08 #1
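
Both requests in the question above can be approached by generating the pattern text ahead of time. What follows is only a minimal sketch, assuming the engine that consumes the regexp supports lookahead assertions `(?=...)`, which are part of ECMAScript 3 regular expressions but are not discussed in this thread. The helper names `caseFold` and `allWords` are hypothetical, and `caseFold` assumes the words consist of ASCII letters only.

```javascript
// Hypothetical helper: turn "cat" into "[Cc][Aa][Tt]" so that no "i" flag is needed.
// Assumes the word contains only ASCII letters.
function caseFold(word) {
  return word.replace(/[A-Za-z]/g, function (ch) {
    return "[" + ch.toUpperCase() + ch.toLowerCase() + "]";
  });
}

// Hypothetical helper: emit one lookahead per word, anchored at the start.
// N words need N lookaheads rather than N! alternatives.
function allWords(words) {
  return "^" + words.map(function (w) {
    return "(?=[\\s\\S]*" + caseFold(w) + ")";
  }).join("");
}

var source = allWords(["cat", "nip"]);
// source is the pattern text: ^(?=[\s\S]*[Cc][Aa][Tt])(?=[\s\S]*[Nn][Ii][Pp])

var allRe = new RegExp(source);
var both = allRe.test("Nip the CAT");   // true: both words occur, in either order
var onlyOne = allRe.test("catnap");     // false: "nip" is missing
```

Each lookahead scans from the start of the string without consuming anything, so the order of the words does not matter. `[\s\S]*` is used instead of `.*` so that line breaks in the string do not block the search. If the consuming code requires the pattern to span the whole string, `[\s\S]*` could be appended after the lookaheads.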
RobG wrote:
If you want to match the word cat exactly, then:

var reA = /\bcat\b/i;
That depends on how you define a word. If you define a word as a sequence
of word characters as specified in the ECMAScript Language Specification,
Ed. 3 Final, section 15.10.2.6 (i.e. those matching /[0-9A-Za-z_]/), you are
right.

However, for example "Menü" is a word in German, and

var reA = /\bmen\b/i;

will (only) match the "Men" in "Menü" there. Because `ü' is not considered
a word character per the Specification, and so the empty word ε between "n"
and "ü" constitutes a word boundary matched by /\b/ (as e.g.

"Menü".match(/\bmen\b/i)

shows).

So for matching Unicode words in strings, you have to use

var reA = /(^|\s)cat(\s|$)/i;

instead; that is, a character sequence (here: without whitespace in-between)
bounded by whitespace, or one or two input boundaries.
PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
Jun 27 '08 #2
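
The difference between the two patterns in the post above can be sketched as follows. The variable names are illustrative only, and the extra capture group in `withGroup` is an addition made here to show how the word can be read back without its delimiters.

```javascript
// \b is defined via \w, which covers only [0-9A-Za-z_],
// so "ü" counts as a non-word character.
var byBoundary = /\bmen\b/i;
var byWhitespace = /(^|\s)men(\s|$)/i;

var boundaryHit = "Menü".match(byBoundary);      // matches "Men" inside "Menü"
var whitespaceHit = "Menü".match(byWhitespace);  // null: "ü" is neither whitespace nor end of input

// The whitespace form consumes its delimiters, so the overall match
// includes them; an extra group isolates the word itself.
var withGroup = /(^|\s)(cat)(\s|$)/i;
var hit = "the Cat sat".match(withGroup);
// hit[0] is " Cat " (with both spaces), hit[2] is "Cat"
```

Because the delimiters are consumed, two adjacent occurrences separated by a single space cannot both be found by a global search with this pattern; the shared space is used up by the first match.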
On Jun 26, 4:17 pm, Thomas 'PointedEars' Lahn <PointedE...@web.de>
wrote:
RobG wrote:
If you want to match the word cat exactly, then:
var reA = /\bcat\b/i;

That depends on how you define a word. If you define a word as a sequence
of word characters as specified in the ECMAScript Language Specification,
Ed. 3 Final, section 15.10.2.6 (i.e. those matching /[0-9A-Za-z_]/), you are
right.

However, for example "Menü" is a word in German, and

var reA = /\bmen\b/i;

will (only) match the "Men" in "Menü" there. Because `ü' is not considered
a word character per the Specification,
Hence I included the sentence "Also, the regular expression's idea of
a word boundary might be different to what you expect."

and so the empty word ε between "n"
and "ü" constitutes a word boundary matched by /\b/ (as e.g.

"Menü".match(/\bmen\b/i)

shows).

So for matching Unicode words in strings, you have to use

var reA = /(^|\s)cat(\s|$)/i;
That expression is commonly used for matching values in the HTML class
attribute where the separator is specified as being whitespace. It is
not sufficient for matching words in general where they may be
followed by punctuation marks such as commas, semi-colons, colons,
dashes, periods and so on.
--
Rob
Jun 27 '08 #3
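
The limitation described in the post above can be illustrated with a short sketch. The sample strings are made up for the purpose.

```javascript
var spaceDelimited = /(^|\s)cat(\s|$)/i;

// Works for whitespace-separated tokens such as an HTML class attribute value.
var inClassList = spaceDelimited.test("animal cat small");        // true

// Fails as soon as punctuation follows the word directly.
var beforeComma = spaceDelimited.test("I saw a cat, then left");  // false
var beforePeriod = spaceDelimited.test("Feed the cat.");          // false
```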
RobG wrote:
Thomas 'PointedEars' Lahn wrote:
>RobG wrote:
>>If you want to match the word cat exactly, then:
var reA = /\bcat\b/i;
That depends on how you define a word. If you define a word as a sequence
of word characters as specified in the ECMAScript Language Specification,
Ed. 3 Final, section 15.10.2.6 (i.e. those matching /[0-9A-Za-z_]/), you are
right.

However, for example "Menü" is a word in German, and

var reA = /\bmen\b/i;

will (only) match the "Men" in "Menü" there. Because `ü' is not considered
a word character per the Specification,

Hence I included the sentence "Also, the regular expression's idea of
a word boundary might be different to what you expect."
It was easy to overlook and provides no explanation as to what should be
expected instead.
>and so the empty word ε between "n"
and "ü" constitutes a word boundary matched by /\b/ (as e.g.

"Menü".match(/\bmen\b/i)

shows).

So for matching Unicode words in strings, you have to use

var reA = /(^|\s)cat(\s|$)/i;

That expression is commonly used for matching values in the HTML class
attribute where the separator is specified as being whitespace. It is
not sufficient for matching words in general where they may be
followed by punctuation marks such as commas, semi-colons, colons,
dashes, periods and so on.
Good point. However, a character class can take care of that. For example,
in Unicode text that uses only ASCII and Latin-1 punctuation:

var reA = /(^|[\s,;:.-])cat([\s,;:.-]|$)/i;

But whether a punctuation mark really delimits a word is a matter of
language, interpretation, and personal taste. For example, the HYPHEN-MINUS
character ("-") may have been used as hyphen in compounds.

An alternative would be to use the \w escape sequence to build your own
character class:

var reA = /(^|[^\wäöü])cat([^\wäöü]|$)/i;
PointedEars
Jun 27 '08 #4
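
The two character-class variants from the post above behave differently on some inputs. The following is a rough comparison; the sample strings (including the artificial "ücat") are made up to exercise the edge cases.

```javascript
// Explicit list of delimiters: whitespace plus , ; : . -
var byPunct = /(^|[\s,;:.-])cat([\s,;:.-]|$)/i;
// Negated class: anything that is not \w and not one of ä, ö, ü
var byNonWord = /(^|[^\wäöü])cat([^\wäöü]|$)/i;

var period = byPunct.test("Feed the cat.");   // true: "." is in the list
var hyphen = byPunct.test("tom-cat");         // true: "-" is treated as a delimiter here
var bangListed = byPunct.test("cat!");        // false: "!" is not in the list
var bangNegated = byNonWord.test("cat!");     // true: "!" is not a word character

// Unlike \b, the negated class treats "ü" as part of a word.
var plainBoundary = /\bcat\b/i.test("ücat");  // true: \b sees "ü" as a non-word character
var umlautAware = byNonWord.test("ücat");     // false: "ü" is excluded from the delimiters
```

The explicit list only recognises the delimiters that are spelled out, while the negated class accepts every character outside the listed word characters. Which of the two is appropriate depends on the question raised in the post: whether a given punctuation mark is meant to delimit a word.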
In comp.lang.javascript message
<6aa0c1c4-b785-4da1-9107-b681df097261@c58g2000hsc.googlegroups.com>,
Wed, 25 Jun 2008 15:31:37, ge********@gmail.com posted:
>In a setting where I can specify only a JS regular
expression, but not the JS code that will use it, I seek
a regexp component that matches a string of letters,
ignoring case. E.g, for "cat" I'd like the effect of

([Cc][Aa][Tt])

but without having to have many occurrences of [Xx].
If all else fails, read the manual. There are links in
<URL:http://www.merlyn.demon.co.uk/js-valid.htm>.
Note that the average intellectual level of those who post with @gmail
addresses is so low that readers may kill-file it /in toto/.

Secondly, what is an efficient regexp that matches a
string exactly when ALL words in a certain list occur in
the string. I'd like the effect of

(cat.*nip|nip.*cat)

except that there are N words rather than just the two
words "cat" and "nip". (I can assume that no word in the
list is a prefix of any other.) Naturally, I'm looking for
a regexp-solution that does not involve listing all
N factorial
many orderings.
I doubt whether one exists to do a direct match, at least if it is to be
compatible with any user agent that knows RegExps.

But one could use S2 = S1.replace(/cat|nip/gi, "") and see whether the
difference of the lengths matches the total of the strings, provided
that no string can occur more than once and matchable strings cannot
overlap.
--Jonathan LF King, Mathematics dept, Univ. of Florida
DSS.

--
(c) John Stockton, nr London, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "" (SonOfRFC1036)
Jun 27 '08 #5
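
The length-difference idea from the post above could look roughly like this. The function name `containsAll` is hypothetical, and the sketch assumes the words contain no regexp metacharacters. Note that this approach needs script code, which the original question ruled out.

```javascript
// Hypothetical helper: remove every occurrence of the listed words and
// compare the number of removed characters with the combined word length.
function containsAll(s1, words) {
  var re = new RegExp(words.join("|"), "gi");
  var s2 = s1.replace(re, "");
  var total = words.join("").length;
  return s1.length - s2.length === total;
}

var ok = containsAll("Catnip for sale", ["cat", "nip"]);  // true: 6 characters removed
var missing = containsAll("only a cat", ["cat", "nip"]);  // false: only 3 characters removed

// The stated proviso matters: a repeated word can fake the expected total.
var falsePositive = containsAll("cat cat", ["cat", "nip"]); // true, although "nip" is absent
```

The last call shows why the post restricts the technique to inputs where no word occurs more than once and matchable strings cannot overlap.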
On Jun 26, 10:52 pm, Dr J R Stockton <j...@merlyn.demon.co.uk> wrote:
[...]
Note that the average intellectual level of those who post with @gmail
addresses is so low that readers may kill-file it /in toto/.
Bad day? My Google Groups profile has a non-gmail address that is
easily discovered by those who care to do so.

<URL: http://www.prejudicenoway.com.au/activities/2156.html >
--
Rob
Jun 27 '08 #6
