Regular expression guaranteed to fail

Des Small

I want to use sets and regular expressions to implement some
linguistic ideas. Representing sounds by symbols, and properties
(coronal or velar articulation; voicedness) by sets of symbols with
those properties, it is natural to then map these sets, and
intersections of them, to regular expressions to apply to strings.

The question is, what regular expression should correspond to the
empty set? I've provisionally gone with "(?!.*)", i.e., the negation
of a look-ahead which matches anything. Is there an established idiom
for this, and is that it? And if there isn't, does this seem
reasonable?

Example code:

"""
import sets

def str2set(s): return sets.Set(s.split())

cor = str2set("N D T") # Coronal articulation
vel = str2set("K G") # Velar articulation
voi = str2set("N D G") # Voiced

def set2re(s):
if s: return "|".join([e for e in s])
else: return "(?!.*)"
"""

So we can get a regexp (string) that matches symbols corresponding to
velar and voiced sounds:
"""

set2re(cor & voi) => 'D|N'
"""
But nothing can be (in this model at least) velar and coronal:
""" cor & vel => Set([])
"""
and this maps to the Regexp Which Matches Nothing:
""" set2re(cor & vel)

=> '(?!.*)'
"""

This seems quite elegant to me, but there is such a fine line between
elegance and utter weirdness and I'd be glad to know which side other
persons think I'm on here.

Des
--
"[T]he structural trend in linguistics which took root with the
International Congresses of the twenties and early thirties [...] had
close and effective connections with phenomenology in its Husserlian
and Hegelian versions." -- Roman Jakobson

Jul 18 '05 #1

Subscribe Post Reply

1759

Eric Brunel

Des Small wrote:

I want to use sets and regular expressions to implement some
linguistic ideas. Representing sounds by symbols, and properties
(coronal or velar articulation; voicedness) by sets of symbols with
those properties, it is natural to then map these sets, and
intersections of them, to regular expressions to apply to strings.

The question is, what regular expression should correspond to the
empty set? I've provisionally gone with "(?!.*)", i.e., the negation
of a look-ahead which matches anything. Is there an established idiom
for this, and is that it? And if there isn't, does this seem
reasonable?

I also looked for a never-matching re just a few days ago and ended up with
"^(?!$)$". It's certainly not more "standard" than yours, but I find it a wee
tad more readable (for a regular expression, I mean...): it's quite clear that
it requests a string start not followed by a string end and followed by a string
end, which is guaranteed to never happen. Yours is a bit harder to explain. Mine
may also be more efficient for very long strings, but I can be wrong here.

See what other people think...
--
- Eric Brunel <eric (underscore) brunel (at) despammed (dot) com> -
PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com

Jul 18 '05 #2

Jeremy Bowers

On Fri, 20 Aug 2004 10:35:18 +0000, Des Small wrote:

The question is, what regular expression should correspond to the
empty set?

I would return compiled RE objects instead of strings, and in the empty
case, return a class you write that matches the interface of a compiled RE
but returns what you like. Something like:

def NeverMatch(object):
def match(*args, **kwargs):
return None

def set2re(s):
if s: return re.compile("|".join([e for e in s]))
else: return NeverMatch()

Jul 18 '05 #3

Hallvard B Furuseth

Eric Brunel wrote:

I also looked for a never-matching re just a few days ago and ended up
with "^(?!$)$". It's certainly not more "standard" than yours, but I
find it a wee tad more readable (for a regular expression, I mean...):

I think e.g. r'\Zx' and r'x\A' are more readable. In particular the
latter, but perhaps that causes Python to locate every 'x' in the string
and then check if the string starts at the next character...

--
Hallvard

Jul 18 '05 #4

Greg Chapman

On 22 Aug 2004 20:07:51 +0200, Hallvard B Furuseth <h.**********@usit.uio.no>
wrote:

Eric Brunel wrote:
I also looked for a never-matching re just a few days ago and ended up
with "^(?!$)$". It's certainly not more "standard" than yours, but I
find it a wee tad more readable (for a regular expression, I mean...):

I think e.g. r'\Zx' and r'x\A' are more readable. In particular the
latter, but perhaps that causes Python to locate every 'x' in the string
and then check if the string starts at the next character...

Why not just "(?!)": this always fails immediately (since an empty pattern
matches any string, the negation of an empty pattern match always fails).

---
Greg Chapman

Jul 18 '05 #5

Des Small

Greg Chapman <gl*@well.com> writes:

On 22 Aug 2004 20:07:51 +0200, Hallvard B Furuseth <h.**********@usit.uio.no>
wrote:
Eric Brunel wrote:
I also looked for a never-matching re just a few days ago and
ended up with "^(?!$)$". It's certainly not more "standard" than
yours, but I find it a wee tad more readable (for a regular
expression, I mean...):

I think e.g. r'\Zx' and r'x\A' are more readable. In particular
the latter, but perhaps that causes Python to locate every 'x' in
the string and then check if the string starts at the next
character...

Why not just "(?!)": this always fails immediately (since an empty
pattern matches any string, the negation of an empty pattern match
always fails).

I think we have a winner!

Des
thanks all the persons who contributed, of course.
--
"[T]he structural trend in linguistics which took root with the
International Congresses of the twenties and early thirties [...] had
close and effective connections with phenomenology in its Husserlian
and Hegelian versions." -- Roman Jakobson

Jul 18 '05 #6

Hallvard B Furuseth

Greg Chapman wrote:

Why not just "(?!)": this always fails immediately (since an empty pattern
matches any string, the negation of an empty pattern match always fails).

It's fine for re.match.

'Why not?': Because I'd expect re.search to walk through the entire
string and check if each position in the string matches that regexp.
Unfortunately, a little timing shows that that happens with _every_
regexp suggested so far. Long strings take longer for each of them.
(Except Jeremy's solution, of course, which avoids the whole problem.)
r'\A(?!)' or r'\Ax\A' didn't work either.

Anyway, I note that r'x\A' beats all the other regexps suggested so far
with a factor of 20 when searching 's'*10000.

--
Hallvard

Jul 18 '05 #7

Eric Brunel

Hallvard B Furuseth wrote:

Greg Chapman wrote:
Why not just "(?!)": this always fails immediately (since an empty pattern
matches any string, the negation of an empty pattern match always fails).

It's fine for re.match.

'Why not?': Because I'd expect re.search to walk through the entire
string and check if each position in the string matches that regexp.
Unfortunately, a little timing shows that that happens with _every_
regexp suggested so far. Long strings take longer for each of them.
(Except Jeremy's solution, of course, which avoids the whole problem.)
r'\A(?!)' or r'\Ax\A' didn't work either.

Anyway, I note that r'x\A' beats all the other regexps suggested so far
with a factor of 20 when searching 's'*10000.

And when searching 'x'*10000? Since there is an 'x' in the re, it may change
things a lot...
--
- Eric Brunel <eric (underscore) brunel (at) despammed (dot) com> -
PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com

Jul 18 '05 #8

Hallvard B Furuseth

Eric Brunel wrote:

Hallvard B Furuseth wrote:
Anyway, I note that r'x\A' beats all the other regexps suggested so far
with a factor of 20 when searching 's'*10000.

And when searching 'x'*10000? Since there is an 'x' in the re, it may change
things a lot...

Heh. You are right: That's about almost as slow as the others. A bit
slower than \Zx and \Ax\A, but still faster than the other alternatives.

--
Hallvard

Jul 18 '05 #9

Similar topics

regular expression reverse match?

by: Ron Adam | last post by:

Is it possible to match a string to regular expression pattern instead of the other way around? For example, instead of finding a match within a string, I want to find out, (pass or fail), if...

Python

Need help understanding regular expression

by: Joe | last post by:

Hi, I have been using a regular expression that I donâ€™t uite understand to filter the valid email address. My regular expression is as follows: <asp:RegularExpressionValidator...

ASP / Active Server Pages

C# Regular Expression

by: hclugano | last post by:

Hello! I need some help... I have a Text (an SQL-Create-Table-Statement) and have to find the name of the table. There are two ways, the tablename is written: . (new SQL-standard) or...

C# / C Sharp

Regular Expression Help

by: tmeister | last post by:

I am in need of a regular expression that tests and fails if there are 14 or more of a character in the test string. There can be up to 13 of these characters in the string and any other...

ASP.NET

check object being a compiled regular expression

by: Helmut Jarausch | last post by:

Hi, sorry, this seems to be a FAQ but I couldn't find anything I need to check if an object is a compiled regular expression Say import re RX= re.compile('^something') how to test

Python

Ignoring spaces in regular expression matching

by: Mark Rae | last post by:

Hi, I'm trying to construct a RegEx pattern which will validate a string so that it can contain: only the numerical characters from 0 to 9 i.e. no decimal points, negative signs, exponentials...

C# / C Sharp

Regular Expression help

by: Zach | last post by:

Hello, Please forgive if this is not the most appropriate newsgroup for this question. Unfortunately I didn't find a newsgroup specific to regular expressions. I have the following regular...

C# / C Sharp

regular expression question

by: laredotornado | last post by:

Hi, I have a span that contains text of the form var spanHtml = "My Tab Content (7)"; The content is guaranteed to end with a string of the form "(#)" where "#" is a whole number. My...

Javascript

Help with Regular Expression

by: carlos | last post by:

I am working on a regular expression validation for my search page. What I have so far works for most cases, but I would like to fine tune it some. I am new to regular expressions, and I do not...

ASP.NET

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++