
regular expression syntax basics

 
#1
July 17th, 2005, 08:41 AM
deko

I'm trying to match top-level domains, where $sub = top level domain

if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
^mil$", $sub) )

{
do stuff here
}

I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
that does not work either. How are quotes supposed to be used here? Should
I use a single quote? What's the difference?
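On the quoting question: PHP passes the whole pattern to the regex function as one string, so every alternative belongs inside a single pair of quotes. Writing `"^com$" | "^net$"` applies PHP's bitwise `|` operator to two separate strings instead of building a regex alternation. The practical difference between the quote styles is that double-quoted strings expand `$variable` names (and most backslash escapes) while single-quoted strings keep their text as written. A small sketch of that difference; `$end` is a hypothetical variable added only for illustration.

```php
<?php
// Hypothetical variable, used only to show interpolation.
$end = 'XYZ';

// Single quotes: no interpolation, the pattern text is kept exactly.
$single = '^com$end';

// Double quotes: PHP replaces $end with its value before any regex sees it.
$double = "^com$end";

echo $single, PHP_EOL; // ^com$end
echo $double, PHP_EOL; // ^comXYZ

// A '$' that is not followed by a variable-name character stays literal,
// so "^com$" and '^com$' are the same string.
var_dump('^com$' === "^com$"); // bool(true)
```

Because single quotes never interpolate, they are usually the safer choice for regex patterns that contain `$`.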



#2
July 17th, 2005, 08:41 AM
deko

> I'm trying to match top-level domains, where $sub = top level domain
>
> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
> ^mil$", $sub) )
>
> {
> do stuff here
> }
>
> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
> that does not work either. How are quotes supposed to be used here? Should
> I use a single quote? What's the difference?

Another example (that works, but could be improved, I think) is this:

if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
(ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
(eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
(eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
{
do stuff here
}

Is there a way to avoid the "||" statements?


#3
July 17th, 2005, 08:41 AM
Justin Koivisto

deko wrote:
> I'm trying to match top-level domains, where $sub = top level domain
>
> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
> ^mil$", $sub) )
>
> {
> do stuff here
> }
>
> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
> that does not work either. How are quotes supposed to be used here? Should
> I use a single quote? What's the difference?
>

Try more like this:
eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)
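For comparison, a sketch of the same anchored alternation with `preg_match` (PCRE), assuming PHP 7 or later, where `ereg`/`eregi` no longer exist (they were deprecated in PHP 5.3 and removed in PHP 7). The `i` modifier plays the role of `eregi`, and the parentheses make `^` and `$` apply to the whole group. In the original pattern the spaces around each `|` are part of the regex, so every alternative demands a space before `^` or after `$`, which can never match. The helper name `is_listed_tld` is hypothetical.

```php
<?php
// Hypothetical helper: true if $sub is exactly one of the listed TLDs.
function is_listed_tld(string $sub): bool
{
    // '#' is the pattern delimiter; 'i' makes the match case-insensitive (like eregi).
    // ^(...)$ anchors the whole group, so "com" matches but "comx" does not.
    return preg_match('#^(com|org|net|biz|info|edu|gov|int|mil)$#i', $sub) === 1;
}

var_dump(is_listed_tld('COM'));  // bool(true)
var_dump(is_listed_tld('uk'));   // bool(false)
var_dump(is_listed_tld('comx')); // bool(false)
```

One caveat: in PCRE, `$` also matches just before a trailing newline, so `"com\n"` would pass; adding the `D` modifier tightens that if it matters.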

--
Justin Koivisto - spam@koivi.com
http://www.koivi.com
#4
July 17th, 2005, 08:41 AM
deko

> Try more like this:
> eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)

I'll give it a shot. Do you think I could apply the same syntax to this:

if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
(ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
(eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
(eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
{
do stuff here
}


#5
July 17th, 2005, 08:41 AM
Justin Koivisto

deko wrote:
>> I'm trying to match top-level domains, where $sub = top level domain
>>
>> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
>> ^mil$", $sub) )
>>
>> {
>> do stuff here
>> }
>>
>> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
>> that does not work either. How are quotes supposed to be used here? Should
>> I use a single quote? What's the difference?
>>
> Another example (that works, but could be improved, I think) is this:
>
> if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
> (ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
> (eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
> (eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
> {
> do stuff here
> }
>
> Is there a way to avoid the "||" statements?

if (
(eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)",$agent))
)
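The same single-alternation idea written with `preg_match`, as a sketch assuming PHP 7 or later (where `eregi` is gone). The helper name `looks_like_bot` and the sample user-agent strings are hypothetical. Note that the `i` modifier makes every alternative case-insensitive, which is slightly looser than the original mix of `ereg` and `eregi` calls.

```php
<?php
// Hypothetical helper: true if the user-agent string contains any listed token.
function looks_like_bot(string $agent): bool
{
    // One unanchored alternation replaces the chain of || tests,
    // so the string is searched with a single regex call.
    $pattern = '#(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)#i';
    return preg_match($pattern, $agent) === 1;
}

// Made-up sample strings for illustration.
var_dump(looks_like_bot('Mozilla/5.0 (compatible; Googlebot/2.1)'));     // bool(true)
var_dump(looks_like_bot('Mozilla/5.0 (Windows NT 10.0) Firefox/115.0')); // bool(false)
```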

#6
July 17th, 2005, 08:41 AM
deko

> if (
> (eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)",$agent))
> )

Cool!! Thanks! That's much better...

As an aside, are there many cases where a domain would appear as:

something.com.something

Is that only for international domains?


#7
July 17th, 2005, 08:41 AM
Michael Fesser

.oO(deko)
>I'm trying to match top-level domains, where $sub = top level domain

You could also do it without a regex:

$gTLD = array('com', 'org', 'net' /* , ... */);
if (in_array(strtolower($sub), $gTLD)) {
// do something
}

Micha
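A complete, runnable version of the `in_array()` approach, using the TLD list from the original question. It is a sketch assuming PHP 7 or later; the helper name `in_tld_list` is hypothetical. Since `in_array()` compares whole values, no anchors or escaping are needed.

```php
<?php
// The list from the original question.
$gTLD = array('com', 'org', 'net', 'biz', 'info', 'edu', 'gov', 'int', 'mil');

// Hypothetical helper wrapping the in_array() check.
function in_tld_list(string $sub, array $list): bool
{
    // strtolower() supplies the case-insensitivity that eregi provided;
    // the third argument makes the comparison strict (===).
    return in_array(strtolower($sub), $list, true);
}

var_dump(in_tld_list('NET', $gTLD)); // bool(true)
var_dump(in_tld_list('uk', $gTLD));  // bool(false)
```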
#8
July 17th, 2005, 08:41 AM
Michael Fesser

.oO(deko)
>Another example (that works, but could be improved, I think) is this:
>
>if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
>[...]

Much too slow.
>Is there a way to avoid the "||" statements?

Another idea:

$agents = array(
'bot', 'Google', 'Slurp',
'Scooter', 'Spider', 'Infoseek',
'W3C_Validator', 'ia_archiver'
);
$pattern = sprintf('#%s#i', implode('|', $agents));

if (preg_match($pattern, $agent)) {
// do stuff here
}

Micha
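A slightly more defensive sketch of the same list-based approach: each entry goes through `preg_quote()`, so any regex metacharacter in a token (a `.`, for example) is matched literally. With the tokens in this thread it yields the same pattern. It uses the documented `implode(separator, array)` argument order, since the reversed order was deprecated in PHP 7.4 and removed in PHP 8. The helper name `build_agent_pattern` and the sample agent string are hypothetical.

```php
<?php
// Hypothetical helper: build one case-insensitive alternation from a list.
function build_agent_pattern(array $tokens): string
{
    // Escape each token; the second argument also escapes the '#' delimiter.
    $escaped = array_map(function ($t) { return preg_quote($t, '#'); }, $tokens);
    return '#' . implode('|', $escaped) . '#i';
}

$agents = array(
    'bot', 'Google', 'Slurp',
    'Scooter', 'Spider', 'Infoseek',
    'W3C_Validator', 'ia_archiver'
);
$pattern = build_agent_pattern($agents);

echo $pattern, PHP_EOL;
// #bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver#i

var_dump(preg_match($pattern, 'Mozilla/5.0 (compatible; ia_archiver)') === 1); // bool(true)
```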
#9
July 17th, 2005, 08:41 AM
Justin Koivisto

deko wrote:
> As an aside, are there many cases where a domain would appear as:
>
> something.com.something
>
> Is that only for international domains?

You mean like example.com.uk ?

If so, the tld is "uk" (country codes)...
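Since the TLD is always the label after the last dot, it can be pulled out of a hostname before running any of the checks in this thread. A minimal sketch assuming PHP 7 or later; the helper name `tld_of` is hypothetical, and it does no validation of the hostname.

```php
<?php
// Hypothetical helper: return the label after the last dot, lowercased.
function tld_of(string $host): string
{
    $lastDot = strrpos($host, '.');
    // No dot at all: treat the whole string as the label.
    if ($lastDot === false) {
        return strtolower($host);
    }
    return strtolower(substr($host, $lastDot + 1));
}

echo tld_of('example.com.uk'), PHP_EOL; // uk
echo tld_of('www.koivi.com'), PHP_EOL;  // com
```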

#10
July 17th, 2005, 08:41 AM
deko

> $agents = array(
> 'bot', 'Google', 'Slurp',
> 'Scooter', 'Spider', 'Infoseek',
> 'W3C_Validator', 'ia_archiver'
> );
> $pattern = sprintf('#%s#i', implode('|', $agents));
>
> if (preg_match($pattern, $agent)) {
> // do stuff here
> }

I think I understand... Is '#%s#i' removing commas? What does sprintf do?

But doesn't creating and imploding the array add an extra step - as opposed
to something like:

eregi ( "(bot|Google|Slurp|Scooter)" , $agent )
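To illustrate what the two calls do: `implode()` does not remove anything (the commas only separate array elements in the PHP source); it joins the elements into one string with `|` between them. `sprintf()` then substitutes that string for `%s`, the `#` characters are the delimiters PCRE requires around a pattern, and the `i` after the closing delimiter makes the match case-insensitive. The extra step only builds a string, and the result still needs just one regex call. A short sketch with a shortened, illustrative token list:

```php
<?php
$agents = array('bot', 'Google', 'Slurp', 'Scooter');

// implode() joins the elements with '|' between them.
$alternation = implode('|', $agents);
echo $alternation, PHP_EOL; // bot|Google|Slurp|Scooter

// sprintf() substitutes the string for %s; '#' delimits the pattern
// and the trailing 'i' makes it case-insensitive.
$pattern = sprintf('#%s#i', $alternation);
echo $pattern, PHP_EOL; // #bot|Google|Slurp|Scooter#i

// The result is identical to writing the pattern out by hand.
var_dump($pattern === '#bot|Google|Slurp|Scooter#i'); // bool(true)
```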


#11
July 17th, 2005, 08:41 AM
deko

When using a regex like this:

eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)

Can I put meta characters within the parentheses, like this:

eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)

So it would match "robot" or "superbot", as well as just "bot".


#12
July 17th, 2005, 08:45 AM
deko


"Michael Fesser" <netizen@gmx.net> wrote in message
news:sivtk0lv2u6ui0g4is9rpvtho4mt4qe87r@4ax.com...
> .oO(deko)
>
> >When using a regex like this:
> >
> >eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)
> >
> >Can I put meta characters within the parentheses, like this:
> >
> >eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)
> >
> >So it would match "robot" or "superbot", as well as just "bot".
>
> Yes, but you have to do it in regex syntax:
>
> .*bot
>
> This matches the literal 'bot' which may be preceded by any char (.) in
> any number (*).
>
> Micha

But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
followed by an end-of-line?
But then again, I suppose it's unlikely that "bot" will be followed by an
end of line in an agent string...


#13
July 17th, 2005, 08:45 AM
Michael Fesser

.oO(deko)
>But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
>followed by an end-of-line?

Hmm, true. ;)

OK, I confused it with your other question about the TLDs, which used
explicit start and end anchors (^ and $).
>But then again, I suppose it's unlikely that "bot" will be followed by an
>end of line in an agent string...

Yep. I think just 'bot' should be fine.

Micha
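The exchange about `bot`, `.*bot` and `bot$` can be checked directly. A sketch using `preg_match` (PCRE) rather than `eregi`, with made-up sample strings: for a yes/no test an unanchored `bot` and `.*bot` give the same answer, while `bot$` only succeeds when `bot` ends the string, so it misses a typical agent string such as `Googlebot/2.1`.

```php
<?php
// Made-up sample strings for illustration.
$agents = array('bot', 'robot', 'superbot', 'Googlebot/2.1', 'bottle');

foreach ($agents as $a) {
    // Unanchored: 'bot' anywhere in the string.
    $anywhere = preg_match('#bot#i', $a) === 1;
    // A leading '.*' adds nothing to an unanchored yes/no test.
    $dotStar = preg_match('#.*bot#i', $a) === 1;
    // '$' requires 'bot' to sit at the end of the string.
    $atEnd = preg_match('#bot$#i', $a) === 1;
    printf('%-14s anywhere=%d  .*bot=%d  bot$=%d' . PHP_EOL, $a, $anywhere, $dotStar, $atEnd);
}
```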
#14
July 17th, 2005, 08:45 AM
deko

> Yep. I think just 'bot' should be fine.

Agreed - in this case I think .*bot and bot should return the same thing:

eregi("(bot|google|infoseek|etc...|ia_archiver)", $agent)

Where I was confused was where to use quotes and meta characters. I have a
bunch of other optimizations to do now that I've figured it out...

Thanks for the help!


 
