
Regular expression woes

 
  #1  
Old July 23rd, 2005, 05:34 PM
Mark (News)

I'm not really sure where to post this question as it covers so many
platforms, but as the platform isn't relevant, here goes...

I'm trying to (pulling my hair out more like) construct a regular
expression string that says the following: "match if the input string
does not start with the characters http". E.g.

"this string" - match
"this http string" - match
"http-and-a-bit-more-text" - no match
"ht" - match
"" - match

I've tried something like ^[^(^http)] but this gives no match on the
last 2. Any ideas? - I'd really appreciate it!
Cheers
Mark
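A note on why the attempted pattern misbehaves: inside square brackets, parentheses and the second caret are ordinary characters, so `^[^(^http)]` asks for one leading character that is none of `(`, `^`, `h`, `t`, `p`, `)`. The sketch below (JavaScript is assumed here only because the thread is cross-posted to comp.lang.javascript; the sample strings beyond Mark's are made up) shows that reading in action.

```javascript
// The pattern from the post, unchanged. Inside [...] the characters
// ( ^ h t p ) are plain class members, so the whole thing means:
// "the first character exists and is not one of ( ^ h t p )".
const attempt = /^[^(^http)]/;

// "some string" is an invented sample; the rest come from the post.
const samples = ["some string", "this string", "ht", "", "http-and-a-bit-more-text"];

const results = samples.map((s) => [s, attempt.test(s)]);
for (const [s, matched] of results) {
  console.log(JSON.stringify(s), "->", matched ? "match" : "no match");
}
```

Because the class consumes exactly one character, the empty string can never match, and any input beginning with `h`, `t` or `p` is rejected regardless of what follows.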


  #2  
Old July 23rd, 2005, 05:34 PM
Paul Lalli

"Mark (News)" <news@mail.adsl4less.com> wrote in message
news:1107530384.767403.82620@c13g2000cwb.googlegroups.com...
> I'm not really sure where to post this question as it covers so many
> platforms, but as the platform isn't relevant, here goes...

Incorrect. The platform is exceedingly relevant. Regular expressions
are not a constant across languages. Perl regular expressions are not
the same as JavaScript regular expressions, which are not the same as
PHP regular expressions.

Choose one or the other, tell us what you're *trying* to do, and in what
environment you're doing it, and then someone can help you.

Paul Lalli

  #3  
Old July 23rd, 2005, 05:34 PM
Leendert Bottelberghs

On Fri, 04 Feb 2005 07:19:44 -0800, Mark (News) wrote:
> I'm trying to (pulling my hair out more like) construct a regular
> expression string that says the following: "match if the input string
> does not start with the characters http". E.g.
>
> "this string" - match
> "this http string" - match
> "http-and-a-bit-more-text" - no match
> "ht" - match
> "" - match

So don't match if the string starts with "http":

$str !~ m/^http/


-leendert bottelberghs
  #4  
Old July 23rd, 2005, 05:34 PM
ioneabu@yahoo.com


Mark (News) wrote:
> I'm not really sure where to post this question as it covers so many
> platforms, but as the platform isn't relevant, here goes...
>
> I'm trying to (pulling my hair out more like) construct a regular
> expression string that says the following: "match if the input string
> does not start with the characters http". E.g.

wouldn't it be:

$match !~ m/^http/;

Is there an equivalent negation metacharacter for a word and not just a
character class? I was just wondering about that.

wana

  #5  
Old July 23rd, 2005, 05:34 PM
Chris Mattern

Mark (News) wrote:
> I'm not really sure where to post this question as it covers so many
> platforms, but as the platform isn't relevant, here goes...
>
> I'm trying to (pulling my hair out more like) construct a regular
> expression string that says the following: "match if the input string
> does not start with the characters http". E.g.
>
> "this string" - match
> "this http string" - match
> "http-and-a-bit-more-text" - no match
> "ht" - match
> "" - match
>
> I've tried something like ^[^(^http)] but this gives no match on the
> last 2. Any ideas? - I'd really appreciate it!
> Cheers
> Mark

Use the "does not match" operator, !~.

if ($my_string !~ /^http/) {
    do_something();
}

If you're not using Perl, well, I guess your platform *is* relevant...
--
Christopher Mattern

"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"
  #6  
Old July 23rd, 2005, 05:34 PM
Sherm Pendley

Paul Lalli wrote:
> Incorrect. The platform is exceedingly relevant. Regular expressions
> are not a constant across languages. Perl regular expressions are not
> the same as JavaScript regular expressions, which are not the same as
> PHP regular expressions.

Also, what you're trying to do - negate a match condition - is often easier
to do in the host language than in the regex itself. For example, in Perl
you could do what you asked with this:

if ($some_string !~ /^http/) { ... }
# or
unless (/^http/) { ... }

But that just reinforces Paul's point - the platform is very relevant.

sherm--

--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
  #7  
Old July 23rd, 2005, 05:34 PM
Mark (News)

I appreciate all the effort in providing a solution to the wider
problem, but perhaps I should have been more explicit - my fault.

I'm specifically trying to avoid using the host shell to do the
negation even though I can use this approach in just about any
language. What I'm really after is to contain the logic entirely within
the regular expression.

Why? Intellectual exercise. :-) (Kind of like why people climb
mountains, but without having to take my butt off the chair.)

Cheers
Mark

  #8  
Old July 23rd, 2005, 05:34 PM
Evertjan.

Mark (News) wrote on 04 feb 2005 in comp.lang.javascript:
> I'm not really sure where to post this question as it covers so many
> platforms, but as the platform isn't relevant, here goes...
>
> I'm trying to (pulling my hair out more like) construct a regular
> expression string that says the following: "match if the input string
> does not start with the characters http". E.g.
>
> "this string" - match
> "this http string" - match
> "http-and-a-bit-more-text" - no match
> "ht" - match
> "" - match

In JavaScript the method to use here is test rather than match:

var s = "this http string";

if (!/^http/.test(s))
    alert("Match!");
else
    alert("No match!");

--
Evertjan.
The Netherlands.
(Replace all crosses with dots in my emailaddress)

  #9  
Old July 23rd, 2005, 05:35 PM
Rasto Levrinc

Mark (News) wrote:
> I appreciate all the effort in providing a solution to the wider
> problem, but perhaps I should have been more explicit - my fault.
>
> I'm specifically trying to avoid using the host shell to do the
> negation even though I can use this approach in just about any
> language. What I'm really after is to contain the logic entirely within
> the regular expression.

You can do it with a zero-width negative look-ahead assertion in perl.

$string=~/^(?!http)/

--

Rasto Levrinc
http://sourceforge.net/projects/rlocate/
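The same look-ahead can be tried against the five examples from the opening post. This is a minimal sketch in JavaScript, assuming an engine that supports negative look-ahead; the variable names are illustrative only.

```javascript
// Negative look-ahead: succeed at the start of the string only if the
// next four characters are not "http". Nothing is consumed.
const notHttp = /^(?!http)/;

// Inputs and the outcomes requested in the opening post.
const cases = [
  ["this string", true],
  ["this http string", true],
  ["http-and-a-bit-more-text", false],
  ["ht", true],
  ["", true],
];

const mismatches = cases.filter(
  ([input, expected]) => notHttp.test(input) !== expected
);
console.log(mismatches.length === 0 ? "all five examples agree" : mismatches);
```

Since the assertion is zero-width, the empty string and inputs shorter than four characters match without any special casing.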
  #10  
Old July 23rd, 2005, 05:35 PM
Mark (News)

Wow - quite brilliant!

Clearly this was far too easy for you. :-)

Cheers
Mark

  #11  
Old July 23rd, 2005, 05:35 PM
Dietmar Meier

Rasto Levrinc wrote:
>> What I'm really after is to contain the logic entirely within
>> the regular expression.

> You can do it with a zero-width negative look-ahead assertion in perl.
>
> $string=~/^(?!http)/

Some JavaScript implementations implement regular expressions but
don't implement look-ahead assertions. Here you would need

/^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)

ciao, dhgm
  #12  
Old July 23rd, 2005, 05:35 PM
Evertjan.

Dietmar Meier wrote on 04 feb 2005 in comp.lang.javascript:
>>> What I'm really after is to contain the logic entirely within
>>> the regular expression.
>
>> You can do it with a zero-width negative look-ahead assertion in perl.
>>
>> $string=~/^(?!http)/
>
> Some JavaScript implementations implement regular expressions but
> don't implement look-ahead assertions. Here you would need
>
> /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)

[The $ cannot be right, I think.]

r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p]))/.test(s)



--
Evertjan.
The Netherlands.
(Replace all crosses with dots in my emailaddress)

  #13  
Old July 23rd, 2005, 05:35 PM
Dietmar Meier

Evertjan. wrote:
>> /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)

> [The $ cannot be right, I think.]

For what value of string do you think the "$" would lead to the
wrong result?

> r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p]))/.test(s)

This would not match strings with 3 or fewer characters.

ciao, dhgm
  #14  
Old July 23rd, 2005, 05:35 PM
Evertjan.

Dietmar Meier wrote on 04 feb 2005 in comp.lang.javascript:
> Evertjan. wrote:
>
>>> /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)
>
>> [The $ cannot be right, I think.]
>
> For what value of string do you think the "$" would lead to the
> wrong result?

"xttp://" should return true
"http://" should return false

Yes, you are right here.
>> r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p]))/.test(s)
>
> This would not match strings with 3 or fewer characters.

Yes, you are right again.

Let me try:

r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/.test(s)

[I could lose some () but I like them for clarity.]

--
Evertjan.
The Netherlands.
(Replace all crosses with dots in my emailaddress)

  #15  
Old July 23rd, 2005, 05:35 PM
Grant Wagner

"Dietmar Meier" <usereplytoinstead@innoline-systemtechnik.de> wrote in
message news:36hoasF52jhqlU1@individual.net...
> Rasto Levrinc wrote:
>
>>> What I'm really after is to contain the logic entirely within
>>> the regular expression.
>
>> You can do it with a zero-width negative look-ahead assertion in
>> perl.
>>
>> $string=~/^(?!http)/
>
> Some JavaScript implementations implement regular expressions but
> don't implement look-ahead assertions. Here you would need
>
> /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)

Why do people insist on doing things the hardest way possible? Test for
the condition you don't want, then negate it.

if (!/^http/i.test(some_string)) { ... }

By the way, this is pretty much the same solution already provided for
Perl:

if ($some_string !~ /^http/) { ... }

(although I chose to make it case-insensitive, since the protocol in a
URI isn't case-sensitive, it could be upper, lower or mixed case)

--
Grant Wagner <gwagner@agricoreunited.com>
comp.lang.javascript FAQ - http://jibbering.com/faq


  #16  
Old July 23rd, 2005, 05:35 PM
Mark (News)

"Why do people insist on doing things the hardest way possible?" Well,
as I said in an earlier post, I wanted to do the whole thing within a
regex rather than resorting to the shell. Mainly because, crazy as it
sounds, it's a fun intellectual exercise. :-) And anyway, if I always
take the path of least resistance, I'll never learn, right? (But I
guess that's OT.)

  #17  
Old July 23rd, 2005, 05:35 PM
Richards Noah (IFR LIT MET)

"Evertjan." <exjxw.hannivoort@interxnl.net> wrote in message
news:Xns95F3C5F131A46eejj99@194.109.133.29...
> Dietmar Meier wrote on 04 feb 2005 in comp.lang.javascript:
>
> > Evertjan. wrote:
> >
> >>> /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/.test(string)
> >
> >> [The $ cannot be right, I think.]
> >
> > For what value of string do you think the "$" would lead to the
> > wrong result?
>
> "xttp://" should return true
> "http://" should return false
>
> Yes, you are right here.
>
> >> r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p]))/.test(s)
> >
> > This would not match strings with 3 or fewer characters.
>
> Yes, you are right again.
>
> Let me try:
>
> r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/.test(s)
>
> [I could lose some () but I like them for clarity.]

None of those regular expressions will work. For example, your regexp will
not match against "this string", since it differs in 4 places in the first 4
characters.

You cannot negate a string by negating each character. If you really wanted
to do it in that way, you would have to negate all possible combinations of
letters in "http". So, just for fun, it would look something like this
(newlines added for clarity):

/^(
([^h][^t][^t][^p])|

(h[^t][^t][^p])|
([^h]t[^t][^p])|
([^h][^t]t[^p])|
([^h][^t][^t]p)|

(ht[^t][^p])|
(h[^t]t[^p])|
(h[^t][^t]p)|
([^h]tt[^p])|
([^h]t[^t]p)|
([^h][^t]tp)|

(htt[^p])|
(ht[^t]p)|
(h[^t]tp)|
([^h]ttp)

|(.{0,3}$)
)/

The moral of this story: "negating" a string in regular expressions is very,
very ugly (without negative look ahead). Your best bet, as many others have
mentioned, is to do something akin to perl's !~, i.e. match against ^http,
and consider matches to be, well, not matches.


  #18  
Old July 23rd, 2005, 05:35 PM
Evertjan.

Richards Noah (IFR LIT MET) wrote on 04 feb 2005 in
comp.lang.javascript:
>> r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/.test(s)
>>
>> [I could lose some () but I like them for clarity.]
>
> None of those regular expressions will work. For example, your regexp
> will not match against "this string", since it differs in 4 places in
> the first 4 characters.

s = "this string"
r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/.test(s)
alert(r)

shows: true as per OQ.

So what is the problem?

Please show a string that does not work.

--
Evertjan.
The Netherlands.
(Replace all crosses with dots in my emailaddress)
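One way to look for "a string that does not work" is to compare each candidate against the host-language negation over every short string from a small alphabet. The sketch below is only that: the alphabet `h`, `t`, `p`, `x`, the length limit of 5 and the helper names are assumptions chosen for illustration, and inputs containing newlines are deliberately not exercised.

```javascript
// Reference behaviour: the negation done outside the regex.
const expected = (s) => !s.startsWith("http");

// Candidate patterns copied from the posts above.
const candidates = {
  dietmar: /^([^h]ttp.*|h[^t]tp.*|ht[^t]p|htt[^p].*|.{0,3})$/,
  evertjanFirst: /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p]))/,
  evertjanRevised: /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/,
};

// Yield every string of length 0..maxLen over the given alphabet,
// shortest first.
function* allStrings(alphabet, maxLen) {
  let level = [""];
  for (let len = 0; len <= maxLen; len++) {
    yield* level;
    level = level.flatMap((prefix) => alphabet.map((ch) => prefix + ch));
  }
}

// Return the first string on which the pattern and the reference
// disagree, or null if they agree on the whole test set.
function firstCounterexample(re) {
  for (const s of allStrings(["h", "t", "p", "x"], 5)) {
    if (re.test(s) !== expected(s)) return s;
  }
  return null;
}

const report = {};
for (const [name, re] of Object.entries(candidates)) {
  report[name] = firstCounterexample(re);
}
console.log(report);
```

A `null` entry means no disagreement was found within this limited test set; it is evidence, not a proof, and a non-null entry is a concrete input to discuss.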

  #19  
Old July 23rd, 2005, 05:35 PM
Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Evertjan.
<exjxw.hannivoort@interxnl.net>], who wrote in article <Xns95F3E7D7DF08Ceejj99@194.109.133.29>:
> r = /^(([^h]...)|(.[^t]..)|(..[^t].)|(...[^p])|(.{0,3}$))/.test(s)

Too much work.

[^h]
| h[^t]
| ht[^t]
| htt[^p]
| .{0,3}$

Hope this helps,
Ilya
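Written out for JavaScript, the alternation above might look like the following sketch. The leading `^` and the non-capturing group are additions assumed here (the post shows only the alternatives, laid out with whitespace as Perl's /x flag would allow), so treat this as one reading of the post, not as its exact text.

```javascript
// Each alternative says "the string first departs from http at this
// position"; the last one covers inputs shorter than four characters.
const notHttp = new RegExp(
  "^(?:" +
    [
      "[^h]",    // first character is not h
      "h[^t]",   // h, then something other than t
      "ht[^t]",  // ht, then something other than t
      "htt[^p]", // htt, then something other than p
      ".{0,3}$", // at most three characters in total
    ].join("|") +
    ")"
);

// The five examples from the opening post.
const outcomes = [
  "this string",
  "this http string",
  "http-and-a-bit-more-text",
  "ht",
  "",
].map((s) => notHttp.test(s));
console.log(outcomes);
```

Compared with the four-dot version, each branch only spells out the prefix that has to be present before the first mismatch, which is why it is shorter.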
  #20  
Old July 23rd, 2005, 05:36 PM
Mark (News)

Is it true that if zero-width negative look-ahead is not available,
there is always an alternative regex to do the job?
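For the narrower case of "does not start with a given literal word", the alternation posted above generalizes mechanically, as the sketch below suggests. It does not settle the general question; the function name `notStartingWith` and both escaping helpers are hypothetical, and the word is assumed to be non-empty and free of surrogate pairs.

```javascript
// Escape a literal so it can be embedded in a pattern outside a class.
function escapeLiteral(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escape a single character for use inside a [...] class.
function escapeInClass(ch) {
  return ch.replace(/[\\\]^-]/g, "\\$&");
}

// Build a look-ahead-free pattern that matches exactly the strings
// that do NOT begin with `word`.
function notStartingWith(word) {
  if (word.length === 0) {
    throw new Error("every string starts with the empty word");
  }
  const alternatives = [];
  for (let i = 0; i < word.length; i++) {
    // The first i characters agree with the word, character i differs.
    alternatives.push(
      escapeLiteral(word.slice(0, i)) + "[^" + escapeInClass(word[i]) + "]"
    );
  }
  // The input is shorter than the word, so it cannot start with it.
  // [\s\S] is used instead of . so that newlines count as characters.
  alternatives.push("[\\s\\S]{0," + (word.length - 1) + "}$");
  return new RegExp("^(?:" + alternatives.join("|") + ")");
}

const re = notStartingWith("http");
console.log(re.source, re.test("ht"), re.test("http://example.com/"));
```

The pattern grows linearly with the length of the word, which is manageable for a fixed prefix but gives a feel for why look-ahead or host-language negation is usually preferred.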

  #21  
Old July 23rd, 2005, 05:36 PM
Joe Smith

Mark (News) wrote:
> "Why do people insist on doing things the hardest way possible?" Well,
> as I said in an earlier post, I wanted to do the whole thing within a
> regex rather than resorting to the shell.

OK, but that doesn't answer the question. The statement

if (!/^http/i.test(some_string)) { ... }

does not resort to using the shell, and therefore is acceptable,
is it not?

Doing it with a regex + some simple programming is not the same
as resorting to the shell. So, what are your actual requirements?
-Joe
  #22  
Old July 23rd, 2005, 05:36 PM
Dietmar Meier

Grant Wagner wrote:
> Why do people insist on doing things the hardest way possible?

I don't insist on anything. Mark announced this as a brainteaser;
nobody is actually expected to use this in a real script.

ciao, dhgm
  #23  
Old July 23rd, 2005, 05:39 PM
Mark (News)

The if (! some-test) { ... } has the negation as part of the if
statement (I may have erroneously called this negation "using the
shell", whereas it might have been more precise to say "part of the if
statement"). The challenge was to achieve the same result, but
keeping any negations inside the RE: if(some-re-test). Hope that makes
sense. My actual requirements are to enjoy solving this challenge -
nothing more. :-)

  #24  
Old July 23rd, 2005, 05:39 PM
Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Mark (News)
<news@mail.adsl4less.com>], who wrote in article <1107875179.397295.104110@c13g2000cwb.googlegroups.com>:
> The if (! some-test) { ... } has the negation as part of the if
> statement (I may have erroneously called this negation "using the
> shell", whereas it might have been more precise to say "part of the if
> statement"). The challenge was to achieve the same result, but
> keeping any negations inside the RE: if(some-re-test). Hope that makes
> sense. My actual requirements are to enjoy solving this challenge -
> nothing more. :-)

Keep in mind that there *is* a legitimate situation when such a
requirement is not bogus: some programs take a REx as a command-line
argument. Given this design, you may be forced to provide some REx
aerobatics if you want to squeeze more from such programs.

[If the program is not updated often, sometimes it is easier to change
the source code. ;-]

Hope this helps,
Ilya
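To make that situation concrete, here is a small sketch of a filter whose only input from the caller is a pattern string. The function `selectLines` and the sample lines are hypothetical stand-ins for such a program; the point is that the caller has no place to write `!`, so any negation has to live inside the pattern.

```javascript
// Stand-in for a tool that accepts nothing but a pattern: it keeps the
// lines that match and offers no option to invert the test.
function selectLines(pattern, lines) {
  const re = new RegExp(pattern);
  return lines.filter((line) => re.test(line));
}

// Made-up input lines for illustration.
const lines = ["http://example.com/", "ftp://example.com/", "plain text", ""];

// The negation is expressed entirely in the pattern argument.
const kept = selectLines("^(?!http)", lines);
console.log(kept);
```

Where look-ahead is unavailable, one of the alternation patterns from earlier in the thread could be passed as the argument instead.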
 
