I want to use sets and regular expressions to implement some
linguistic ideas. Representing sounds by symbols, and properties
(coronal or velar articulation; voicedness) by sets of symbols with
those properties, it is natural to then map these sets, and
intersections of them, to regular expressions to apply to strings.
The question is, what regular expression should correspond to the
empty set? I've provisionally gone with "(?!.*)", i.e., the negation
of a look-ahead which matches anything. Is there an established idiom
for this, and is that it? And if there isn't, does this seem
reasonable?
Example code:
"""
import sets

def str2set(s):
    return sets.Set(s.split())

cor = str2set("N D T")  # Coronal articulation
vel = str2set("K G")    # Velar articulation
voi = str2set("N D G")  # Voiced

def set2re(s):
    # Alternation of the set's symbols, or a never-matching regexp if empty.
    if s: return "|".join([e for e in s])
    else: return "(?!.*)"
"""
So we can get a regexp (string) that matches symbols corresponding to
coronal and voiced sounds:
""" set2re(cor & voi)
=> 'D|N'
"""
But nothing can be (in this model at least) velar and coronal:
""" cor & vel
=> Set([])
"""
and this maps to the Regexp Which Matches Nothing:
""" set2re(cor & vel)
=> '(?!.*)'
"""
This seems quite elegant to me, but there is such a fine line between
elegance and utter weirdness and I'd be glad to know which side other
persons think I'm on here.
Des
--
"[T]he structural trend in linguistics which took root with the
International Congresses of the twenties and early thirties [...] had
close and effective connections with phenomenology in its Husserlian
and Hegelian versions." -- Roman Jakobson
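For readers on Python 3, where the sets module no longer exists, the example in the post above can be approximated with the built-in set type. This is only a sketch, not the poster's code: the calls to re.escape and sorted (to get a stable alternation order) are additions of mine, and the empty-set pattern is the provisional "(?!.*)" from the question.

```python
import re

def str2set(s):
    # Whitespace-separated symbols become a set of symbols.
    return set(s.split())

cor = str2set("N D T")  # Coronal articulation
vel = str2set("K G")    # Velar articulation
voi = str2set("N D G")  # Voiced

def set2re(symbols):
    # Non-empty set: alternation of its (escaped) symbols in sorted order.
    if symbols:
        return "|".join(re.escape(e) for e in sorted(symbols))
    # Empty set: the provisional never-matching pattern from the question.
    return "(?!.*)"

print(set2re(cor & voi))                          # D|N
print(set2re(cor & vel))                          # (?!.*)
print(re.search(set2re(cor & vel), "N D T K G"))  # None
```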
Des Small wrote: I want to use sets and regular expressions to implement some linguistic ideas. Representing sounds by symbols, and properties (coronal or velar articulation; voicedness) by sets of symbols with those properties, it is natural to then map these sets, and intersections of them, to regular expressions to apply to strings.
The question is, what regular expression should correspond to the empty set? I've provisionally gone with "(?!.*)", i.e., the negation of a look-ahead which matches anything. Is there an established idiom for this, and is that it? And if there isn't, does this seem reasonable?
I also looked for a never-matching re just a few days ago and ended up with
"^(?!$)$". It's certainly not more "standard" than yours, but I find it a wee
tad more readable (for a regular expression, I mean...): it's quite clear that
it requests a string start not followed by a string end and followed by a string
end, which is guaranteed to never happen. Yours is a bit harder to explain. Mine
may also be more efficient for very long strings, but I can be wrong here.
See what other people think...
--
- Eric Brunel <eric (underscore) brunel (at) despammed (dot) com> -
PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com
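The claim that "^(?!$)$" can never match is easy to spot-check. The sketch below tries it with match, search and fullmatch on a handful of sample strings (my own choice, including the empty string and strings with newlines), and again with re.MULTILINE, which changes where ^ and $ may match. It is a sanity check, not a proof.

```python
import re

never = re.compile(r"^(?!$)$")
samples = ["", "a", "\n", "a\nb", "x" * 50]

for s in samples:
    # A position cannot be both "not at the end" and "at the end".
    assert never.match(s) is None
    assert never.search(s) is None
    assert never.fullmatch(s) is None

# MULTILINE lets ^ and $ match around newlines, but the two
# requirements still contradict each other at any single position.
never_ml = re.compile(r"^(?!$)$", re.MULTILINE)
assert all(never_ml.search(s) is None for s in samples)

print("no sample matched")
```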
On Fri, 20 Aug 2004 10:35:18 +0000, Des Small wrote: The question is, what regular expression should correspond to the empty set?
I would return compiled RE objects instead of strings, and in the empty
case, return a class you write that matches the interface of a compiled RE
but returns what you like. Something like:
import re

class NeverMatch(object):
    # Mimics the part of the compiled-RE interface used here.
    def match(self, *args, **kwargs):
        return None

def set2re(s):
    if s: return re.compile("|".join([e for e in s]))
    else: return NeverMatch()
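A slightly fuller sketch of the stand-in object idea follows. Which methods to mimic is my assumption (match, search, fullmatch and findall here); a real program would add whatever else it calls on compiled patterns. The phoneme sets are the ones from the original question.

```python
import re

class NeverMatch:
    """Hypothetical stand-in for a compiled pattern that never matches."""

    def match(self, string, *args, **kwargs):
        return None

    # search and fullmatch give the same answer as match: no match.
    search = match
    fullmatch = match

    def findall(self, string, *args, **kwargs):
        return []

def set2re(symbols):
    # Non-empty set: a real compiled pattern; empty set: the stand-in.
    if symbols:
        return re.compile("|".join(re.escape(e) for e in sorted(symbols)))
    return NeverMatch()

cor = set("N D T".split())
vel = set("K G".split())
voi = set("N D G".split())

print(set2re(cor & voi).findall("T N D G"))  # ['N', 'D']
print(set2re(cor & vel).search("T N D G"))   # None
```

One trade-off of this design: the stand-in has no pattern string, so it cannot be spliced into a larger regular expression the way a never-matching pattern string can.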
Eric Brunel wrote: I also looked for a never-matching re just a few days ago and ended up with "^(?!$)$". It's certainly not more "standard" than yours, but I find it a wee tad more readable (for a regular expression, I mean...):
I think e.g. r'\Zx' and r'x\A' are more readable. In particular the
latter, but perhaps that causes Python to locate every 'x' in the string
and then check if the string starts at the next character...
--
Hallvard
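Both of these rely on an impossible ordering: a character after the end of the string, or the start of the string after a character. A quick check, with sample strings of my own that deliberately contain the literal 'x':

```python
import re

end_then_x = re.compile(r"\Zx")    # an 'x' after the end of the string
x_then_start = re.compile(r"x\A")  # the start of the string after an 'x'

for s in ["", "x", "xx", "axb", "x\n", "\nx"]:
    assert end_then_x.search(s) is None
    assert x_then_start.search(s) is None

print("neither pattern matched")
```

The letter 'x' is arbitrary; any pattern that must consume at least one character would do in its place.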
On 22 Aug 2004 20:07:51 +0200, Hallvard B Furuseth <h.**********@usit.uio.no> wrote:
Eric Brunel wrote:
I also looked for a never-matching re just a few days ago and ended up with "^(?!$)$". It's certainly not more "standard" than yours, but I find it a wee tad more readable (for a regular expression, I mean...):
I think e.g. r'\Zx' and r'x\A' are more readable. In particular the latter, but perhaps that causes Python to locate every 'x' in the string and then check if the string starts at the next character...
Why not just "(?!)": this always fails immediately (since an empty pattern
matches any string, the negation of an empty pattern match always fails).
---
Greg Chapman
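The reasoning can be checked directly: the empty pattern matches at position 0 of any string, so a negative look-ahead on it fails everywhere. The sketch below also shows why a pattern string is convenient for the set-based use case in the question: "(?!)" behaves like an identity under alternation and kills any sequence it appears in. The helper union_re is a hypothetical name of mine, not something from the thread.

```python
import re

NEVER = "(?!)"  # negative look-ahead on the empty pattern

def set2re(symbols):
    if symbols:
        return "|".join(re.escape(e) for e in sorted(symbols))
    return NEVER

def union_re(*parts):
    # Hypothetical helper: alternation of already-built sub-patterns.
    return "|".join("(?:%s)" % p for p in parts)

# The empty pattern matches (an empty span) at the start of any string...
assert re.search("", "anything").span() == (0, 0)
# ...so its negation fails at every position, even in the empty string.
assert re.search(NEVER, "anything") is None
assert re.search(NEVER, "") is None

empty = set2re(set())
voiced = set2re({"N", "D", "G"})

# Alternation with the empty set changes nothing.
assert re.findall(union_re(empty, voiced), "T N K") == ["N"]
# A sequence that requires a member of the empty set can never match.
assert re.search("T(?:%s)N" % empty, "TN") is None

print("all checks passed")
```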
Greg Chapman <gl*@well.com> writes: On 22 Aug 2004 20:07:51 +0200, Hallvard B Furuseth <h.**********@usit.uio.no> wrote:
Eric Brunel wrote:
I also looked for a never-matching re just a few days ago and ended up with "^(?!$)$". It's certainly not more "standard" than yours, but I find it a wee tad more readable (for a regular expression, I mean...):
I think e.g. r'\Zx' and r'x\A' are more readable. In particular the latter, but perhaps that causes Python to locate every 'x' in the string and then check if the string starts at the next character...
Why not just "(?!)": this always fails immediately (since an empty pattern matches any string, the negation of an empty pattern match always fails).
I think we have a winner!
Des
thanks all the persons who contributed, of course.
Greg Chapman wrote: Why not just "(?!)": this always fails immediately (since an empty pattern matches any string, the negation of an empty pattern match always fails).
It's fine for re.match.
'Why not?': Because I'd expect re.search to walk through the entire
string and check if each position in the string matches that regexp.
Unfortunately, a little timing shows that that happens with _every_
regexp suggested so far. Long strings take longer for each of them.
(Except Jeremy's solution, of course, which avoids the whole problem.)
r'\A(?!)' or r'\Ax\A' didn't work either.
Anyway, I note that r'x\A' beats all the other regexps suggested so far
with a factor of 20 when searching 's'*10000.
--
Hallvard
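A timing comparison along these lines can be reproduced with timeit. The sketch below is my own harness, not the one used in the post; the helper name time_search and the repeat count are arbitrary, and the repeat count is kept small because some candidates may rescan the rest of the string at every position. Absolute numbers will depend on the Python version and machine, so none are quoted here.

```python
import re
import timeit

# Every never-matching pattern suggested in the thread so far.
candidates = [
    r"(?!.*)",
    r"^(?!$)$",
    r"\Zx",
    r"x\A",
    r"(?!)",
    r"\A(?!)",
    r"\Ax\A",
]

def time_search(pattern, haystack, number=3):
    # Hypothetical helper: total seconds for `number` failed searches.
    compiled = re.compile(pattern)
    assert compiled.search(haystack) is None  # must really never match
    return timeit.timeit(lambda: compiled.search(haystack), number=number)

haystack = "s" * 10000
for pattern in candidates:
    print("%-10s %.6f s" % (pattern, time_search(pattern, haystack)))
```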
Hallvard B Furuseth wrote: Greg Chapman wrote:
Why not just "(?!)": this always fails immediately (since an empty pattern matches any string, the negation of an empty pattern match always fails).
It's fine for re.match.
'Why not?': Because I'd expect re.search to walk through the entire string and check if each position in the string matches that regexp. Unfortunately, a little timing shows that that happens with _every_ regexp suggested so far. Long strings take longer for each of them. (Except Jeremy's solution, of course, which avoids the whole problem.) r'\A(?!)' or r'\Ax\A' didn't work either.
Anyway, I note that r'x\A' beats all the other regexps suggested so far with a factor of 20 when searching 's'*10000.
And when searching 'x'*10000? Since there is an 'x' in the re, it may change
things a lot...
--
- Eric Brunel <eric (underscore) brunel (at) despammed (dot) com> -
PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com
Eric Brunel wrote: Hallvard B Furuseth wrote: Anyway, I note that r'x\A' beats all the other regexps suggested so far with a factor of 20 when searching 's'*10000.
And when searching 'x'*10000? Since there is an 'x' in the re, it may change things a lot...
Heh. You are right: that's almost as slow as the others. A bit
slower than \Zx and \Ax\A, but still faster than the other alternatives.
--
Hallvard
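The dependence on the haystack can be seen by timing r'x\A' against a string with no 'x' and a string made only of 'x'. This is a sketch with an arbitrary repeat count. A plausible explanation, which I have not verified against the engine's source, is that the search can skip quickly over positions that cannot start with the literal 'x', and loses that shortcut when every position can.

```python
import re
import timeit

pattern = re.compile(r"x\A")

for ch in "sx":
    haystack = ch * 10000
    # The pattern never matches, whatever the haystack contains.
    assert pattern.search(haystack) is None
    elapsed = timeit.timeit(lambda: pattern.search(haystack), number=100)
    print("haystack of %r: %.6f s" % (ch, elapsed))
```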