
regular expression syntax basics

#1 deko (July 17th, 2005, 09:41 AM)

I'm trying to match top-level domains, where $sub is the top-level domain:

if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ | ^mil$", $sub) )
{
    // do stuff here
}

I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
that does not work either. How are quotes supposed to be used here? Should
I use a single quote? What's the difference?


#2 deko (July 17th, 2005, 09:41 AM)

> I'm trying to match top-level domains, where $sub = top level domain
> [...]

Another example (it works, but could be improved, I think) is this:

if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
(ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
(eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
(eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
{
    // do stuff here
}

Is there a way to avoid the "||" statements?


#3 Justin Koivisto (July 17th, 2005, 09:41 AM)

deko wrote:
> I'm trying to match top-level domains, where $sub = top level domain
>
> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
> ^mil$", $sub) )
>
> {
> do stuff here
> }
>
> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
> that does not work either. How are quotes supposed to be used here? Should
> I use a single quote? What's the difference?

Try more like this:
eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)
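Why the first attempt matched nothing: inside the pattern string, the spaces around each `|` are literal characters, so an alternative like `^com$ ` would need a space after the end of the string, and ` ^org$ ` a space before its start. Grouping the alternatives in one pair of parentheses lets a single `^` and `$` anchor all of them. The quotes only delimit one PHP string, so the `|` goes inside it; single quotes are the safer choice for patterns because PHP does not interpolate `$` or process most backslash escapes inside them. The sketch below assumes PHP 7 or later, where `ereg()`/`eregi()` no longer exist, and uses `preg_match()` with the `i` modifier instead; the function name `isGenericTld` is hypothetical.

```php
<?php
// Sketch: test whether a label is one of the generic TLDs listed in post #1.
// preg_match() stands in for the removed eregi(); the trailing "i" modifier
// makes the match case-insensitive, as eregi() was.
function isGenericTld(string $sub): bool
{
    // One group holds every alternative, so ^ and $ apply to all of them.
    // Single quotes keep PHP from trying to interpolate the "$" anchor.
    $pattern = '/^(com|org|net|biz|info|edu|gov|int|mil)$/i';
    return preg_match($pattern, $sub) === 1;
}

var_dump(isGenericTld('COM'));  // bool(true)
var_dump(isGenericTld('co'));   // bool(false)
var_dump(isGenericTld('comx')); // bool(false)
```

In PCRE, `$` also matches just before a final newline; adding the `D` modifier (`'/^(com|org|net|biz|info|edu|gov|int|mil)$/iD'`) restricts it to the true end of the string.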

--
Justin Koivisto - spam@koivi.com
http://www.koivi.com
#4 deko (July 17th, 2005, 09:41 AM)

> Try more like this:
> eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)

I'll give it a shot. Do you think I could apply the same syntax to this:

if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
(ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
(eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
(eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
{
    // do stuff here
}


#5 Justin Koivisto (July 17th, 2005, 09:41 AM)

deko wrote:
> [...]
> Another example (that works, but could be improved, I think) is this:
>
> if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
> (ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
> (eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
> (eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
> {
> do stuff here
> }
>
> Is there a way to avoid the "||" statements?

if (eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)", $agent)) {
    // do stuff here
}
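One point the combined pattern changes: with `eregi()` every alternative becomes case-insensitive, whereas the original chain used case-sensitive `ereg()` for `Google`, `Slurp` and `Scooter`. If that distinction matters, a PCRE pattern can limit case-insensitivity to part of the alternation with an inline `(?i:...)` group. A minimal sketch, assuming PHP 7+ and `preg_match()` in place of the removed `ereg` functions; `isKnownBot` is a hypothetical name.

```php
<?php
// Sketch: one preg_match() call standing in for the chain of || tests.
// Only the alternatives inside (?i:...) ignore case, mirroring the eregi()
// calls; Google, Slurp and Scooter must match exactly, as with ereg().
function isKnownBot(string $agent): bool
{
    $pattern = '/(?i:bot|spider|infoseek|w3c_validator|ia_archiver)|Google|Slurp|Scooter/';
    return preg_match($pattern, $agent) === 1;
}

var_dump(isKnownBot('Mozilla/5.0 (compatible; Googlebot/2.1)')); // bool(true)
var_dump(isKnownBot('Yahoo! Slurp'));                              // bool(true)
var_dump(isKnownBot('yahoo! slurp'));                              // bool(false)
```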

#6 deko (July 17th, 2005, 09:41 AM)

> if (
> (eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)",$agent))
> )

Cool!! Thanks! That's much better...

As an aside, are there many cases where a domain would appear as:

something.com.something

Is that only for international domains?


#7 Michael Fesser (July 17th, 2005, 09:41 AM)

.oO(deko)
> I'm trying to match top-level domains, where $sub = top level domain

You could also do it without a regex:

$gTLD = array('com', 'org', 'net', ...);
if (in_array(strtolower($sub), $gTLD)) {
// do something
}
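A complete version of the `in_array()` idea, filled in with the TLD list from the first post. Because it compares whole strings, no `^`/`$` anchors are needed, and passing `true` as the third argument makes the comparison strict. The helper `isListedTld` is a hypothetical name; treat this as a sketch rather than a drop-in replacement.

```php
<?php
// Sketch of the in_array() approach with the TLD list from the first post.
// strtolower() normalises case; the third argument (true) makes in_array()
// compare strictly, so type juggling cannot produce a false match.
$gTLD = ['com', 'org', 'net', 'biz', 'info', 'edu', 'gov', 'int', 'mil'];

function isListedTld(string $sub, array $list): bool
{
    return in_array(strtolower($sub), $list, true);
}

var_dump(isListedTld('ORG', $gTLD)); // bool(true)
var_dump(isListedTld('uk', $gTLD));  // bool(false)
```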

Micha
#8 Michael Fesser (July 17th, 2005, 09:41 AM)

.oO(deko)
> Another example (that works, but could be improved, I think) is this:
>
> if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
> [...]

Much too slow.
> Is there a way to avoid the "||" statements?

Another idea:

$agents = array(
'bot', 'Google', 'Slurp',
'Scooter', 'Spider', 'Infoseek',
'W3C_Validator', 'ia_archiver'
);
$pattern = sprintf('#%s#i', implode('|', $agents));

if (preg_match($pattern, $agent)) {
// do stuff here
}
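When the pattern is built from an array, it is worth escaping each entry with `preg_quote()` first, passing the same delimiter (`#` here) as its second argument; otherwise an entry containing a metacharacter such as `.` or `+` would change the meaning of the pattern. `implode()` expects the glue string as its first argument (the reversed order was deprecated in PHP 7.4 and removed in 8.0). A minimal sketch, assuming PHP 7 or later:

```php
<?php
// Sketch: build the alternation from an array, escaping each entry first.
// preg_quote() escapes regex metacharacters; passing '#' as the second
// argument also escapes the delimiter chosen below.
$agents = [
    'bot', 'Google', 'Slurp',
    'Scooter', 'Spider', 'Infoseek',
    'W3C_Validator', 'ia_archiver',
];

$escaped = array_map(
    function (string $a): string {
        return preg_quote($a, '#');
    },
    $agents
);
$pattern = '#' . implode('|', $escaped) . '#i';

if (preg_match($pattern, 'Mozilla/5.0 (compatible; Googlebot/2.1)') === 1) {
    echo "matched\n";
}
```

None of the current entries contain metacharacters, so here the escaping changes nothing; it only protects future additions to the list.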

Micha
#9 Justin Koivisto (July 17th, 2005, 09:41 AM)

deko wrote:
> As an aside, are there many cases where a domain would appear as:
>
> something.com.something
>
> Is that only for international domains?

You mean like example.com.uk?

If so, the TLD is "uk" (a country-code TLD).
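The same point in code form: the top-level domain is simply the last dot-separated label of the host name. A minimal sketch; the helper `topLevelDomain` is hypothetical, and it does not try to recognise second-level registry labels such as the `com` in `example.com.uk`.

```php
<?php
// Sketch: the top-level domain is the last dot-separated label of a host
// name, so for "example.com.uk" it is "uk", not "com".
function topLevelDomain(string $host): string
{
    $labels = explode('.', rtrim($host, '.')); // ignore a trailing root dot
    return strtolower(end($labels));
}

echo topLevelDomain('example.com.uk'), "\n"; // uk
echo topLevelDomain('www.example.com'), "\n"; // com
```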

#10 deko (July 17th, 2005, 09:41 AM)

> $agents = array(
> 'bot', 'Google', 'Slurp',
> 'Scooter', 'Spider', 'Infoseek',
> 'W3C_Validator', 'ia_archiver'
> );
> $pattern = sprintf('#%s#i', implode('|', $agents));
>
> if (preg_match($pattern, $agent)) {
> // do stuff here
> }

I think I understand... Is '#%s#i' removing the commas? What does sprintf do?

But doesn't creating and imploding the array add an extra step, as opposed
to something like:

eregi("(bot|Google|Slurp|Scooter)", $agent)
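To answer the `sprintf()` question: nothing removes commas. `implode('|', $agents)` joins the array elements with `|`, and `sprintf()` substitutes that string for the `%s` placeholder in `'#%s#i'`, where the two `#` characters are the pattern delimiters and the trailing `i` makes the match case-insensitive. The result is the same pattern you would otherwise type by hand; building it costs one extra step, likely negligible next to the match itself, in exchange for a list that is easier to edit. A minimal sketch with a shortened list:

```php
<?php
// Sketch: what the two calls in the reply produce, step by step.
$agents = ['bot', 'Google', 'Slurp', 'Scooter'];

$alternation = implode('|', $agents);      // "bot|Google|Slurp|Scooter"
$pattern = sprintf('#%s#i', $alternation); // "#bot|Google|Slurp|Scooter#i"

// The result is the same pattern you would otherwise type by hand:
var_dump($pattern === '#bot|Google|Slurp|Scooter#i'); // bool(true)
```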


#11 deko (July 17th, 2005, 09:41 AM)

When using a regex like this:

eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)

Can I put metacharacters within the parentheses, like this:

eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)

So it would match "robot" or "superbot", as well as just "bot".


#12 deko (July 17th, 2005, 09:45 AM)

"Michael Fesser" <netizen@gmx.net> wrote in message
news:sivtk0lv2u6ui0g4is9rpvtho4mt4qe87r@4ax.com...
> .oO(deko)
>
> >When using a regex like this:
> >
> >eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)
> >
> >Can I put metacharacters within the parentheses, like this:
> >
> >eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)
> >
> >So it would match "robot" or "superbot", as well as just "bot".
>
> Yes, but you have to do it in regex syntax:
>
> .*bot
>
> This matches the literal 'bot' which may be preceded by any char (.) in
> any number (*).
>
> Micha

But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
followed by an end-of-line?
But then again, I suppose it's unlikely that "bot" will be followed by an
end of line in an agent string...
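A quick comparison of the patterns discussed here, as a sketch using `preg_match()` (which returns 1 for a match and 0 otherwise). `bot$` only matches when `bot` ends the string, so it catches `robot` and `superbot` but not a typical agent string such as `Googlebot/2.1`; for an unanchored match, `.*bot` and `bot` accept exactly the same strings. The sample strings are illustrative assumptions.

```php
<?php
// Sketch comparing the patterns discussed in this part of the thread.
$samples = ['robot', 'superbot', 'Googlebot/2.1'];

foreach ($samples as $agent) {
    $atEnd    = preg_match('/bot$/i', $agent);  // "bot" only at the very end
    $anywhere = preg_match('/bot/i', $agent);   // "bot" anywhere in the string
    $dotStar  = preg_match('/.*bot/i', $agent); // same result as '/bot/i' when unanchored
    echo "$agent: bot\$=$atEnd bot=$anywhere .*bot=$dotStar\n";
}
```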


#13 Michael Fesser (July 17th, 2005, 09:45 AM)

.oO(deko)
> But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
> followed by an end-of-line?

Hmm, true. ;)

OK, I confused it with your other question about the TLDs, which used
explicit start and end anchors (^ and $).

> But then again, I suppose it's unlikely that "bot" will be followed by an
> end of line in an agent string...

Yep. I think just 'bot' should be fine.

Micha
#14 deko (July 17th, 2005, 09:45 AM)

> Yep. I think just 'bot' should be fine.

Agreed. In this case I think .*bot and bot should match the same strings:

eregi("(bot|google|infoseek|etc...|ia_archiver)", $agent)

What confused me was where to use quotes and metacharacters. I have a
bunch of other optimizations to do now that I've figured it out...

Thanks for the help!

