regular expression syntax basics 
July 17th, 2005, 09:41 AM
I'm trying to match top-level domains, where $sub = top level domain
if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
^mil$", $sub) )
{
do stuff here
}
I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
that does not work either. How are quotes supposed to be used here? Should
I use a single quote? What's the difference?
July 17th, 2005, 09:41 AM
> I'm trying to match top-level domains, where $sub = top level domain
>
> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
> ^mil$", $sub) )
>
> {
> do stuff here
> }
>
> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
> that does not work either. How are quotes supposed to be used here? Should
> I use a single quote? What's the difference?
>
Another example (that works, but could be improved, I think) is this:
if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
(ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
(eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
(eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
{
do stuff here
}
Is there a way to avoid the "||" statements?
July 17th, 2005, 09:41 AM
deko wrote:
> I'm trying to match top-level domains, where $sub = top level domain
>
> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
> ^mil$", $sub) )
>
> {
> do stuff here
> }
>
> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
> that does not work either. How are quotes supposed to be used here? Should
> I use a single quote? What's the difference?
>
Try more like this:
eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)
--
Justin Koivisto - spam@koivi.com http://www.koivi.com
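For anyone reading this on current PHP: `ereg()` and `eregi()` were deprecated in PHP 5.3 and removed in PHP 7.0, so Justin's anchored alternation would now be written with PCRE's `preg_match()`. A minimal sketch; the helper name `is_known_tld` is made up for illustration. The `^` and `$` sit outside the group, so they apply to every alternative at once, the `i` modifier stands in for `eregi()`'s case-insensitivity, and the single-quoted pattern means PHP never attempts variable interpolation on the `$`.

```php
<?php
// Hypothetical helper: true only if $sub is exactly one of the listed TLDs.
// The anchors wrap the whole group, so "com" matches but "comx" and "xcom" do not.
function is_known_tld(string $sub): bool
{
    return preg_match('/^(com|org|net|biz|info|edu|gov|int|mil)$/i', $sub) === 1;
}

var_dump(is_known_tld('COM'));  // bool(true)
var_dump(is_known_tld('comx')); // bool(false)
```

One PCRE detail: `$` also matches just before a trailing newline, so `"com\n"` passes; use `\z` in place of `$` if that matters.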
July 17th, 2005, 09:41 AM
> Try more like this:
> eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)
I'll give it a shot. Do you think I could apply the same syntax to this:
if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
(ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
(eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
(eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
{
do stuff here
}
July 17th, 2005, 09:41 AM
deko wrote:
>>I'm trying to match top-level domains, where $sub = top level domain
>>
>>if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
>>^mil$", $sub) )
>>
>> {
>> do stuff here
>> }
>>
>>I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
>>that does not work either. How are quotes supposed to be used here? Should
>>I use a single quote? What's the difference?
>>
> Another example (that works, but could be improved, I think) is this:
>
> if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
> (ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
> (eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
> (eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
> {
> do stuff here
> }
>
> Is there a way to avoid the "||" statements?
if (
(eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)",$agent))
)
--
Justin Koivisto - spam@koivi.com http://www.koivi.com
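The same single-alternation idea in `preg_match()` form, as a sketch for PHP 7 and later (the helper name `is_known_agent` is hypothetical). With the `i` modifier every alternative is case-insensitive, which is also what the single `eregi()` above does; the original mix of `ereg("Google", ...)`-style tests had been case-sensitive for some names.

```php
<?php
// Hypothetical helper wrapping one alternation of known crawler names.
// No anchors: each name may appear anywhere in the user-agent string.
function is_known_agent(string $agent): bool
{
    $pattern = '/bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver/i';
    return preg_match($pattern, $agent) === 1;
}

var_dump(is_known_agent('Googlebot/2.1 (+http://www.google.com/bot.html)')); // bool(true)
var_dump(is_known_agent('Mozilla/5.0 (Windows NT 10.0)'));                  // bool(false)
```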
July 17th, 2005, 09:41 AM
> if (
> (eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)",$agent))
> )
Cool!! Thanks! That's much better...
As an aside, are there many cases where a domain would appear as:
something.com.something
Is that only for international domains?
July 17th, 2005, 09:41 AM
.oO(deko)
>I'm trying to match top-level domains, where $sub = top level domain
You could also do it without a regex:
$gTLD = array('com', 'org', 'net', ...);
if (in_array(strtolower($sub), $gTLD)) {
// do something
}
Micha
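Micha's regex-free `in_array()` approach as a runnable sketch. The `...` in his list is filled here with the TLDs from Justin's regex earlier in the thread; `KNOWN_TLDS` and `is_known_tld_list` are illustrative names, not anything from the posts. Passing `true` as the third argument makes `in_array()` use strict comparison.

```php
<?php
// TLD list taken from the regex earlier in this thread; extend as needed.
const KNOWN_TLDS = ['com', 'org', 'net', 'biz', 'info', 'edu', 'gov', 'int', 'mil'];

// strtolower() supplies the case-insensitivity that eregi() provided.
function is_known_tld_list(string $sub): bool
{
    return in_array(strtolower($sub), KNOWN_TLDS, true);
}

var_dump(is_known_tld_list('ORG')); // bool(true)
var_dump(is_known_tld_list('uk'));  // bool(false)
```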
July 17th, 2005, 09:41 AM
.oO(deko)
>Another example (that works, but could be improved, I think) is this:
>
>if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
>[...]
Much too slow.
>Is there a way to avoid the "||" statements?
Another idea:
$agents = array(
'bot', 'Google', 'Slurp',
'Scooter', 'Spider', 'Infoseek',
'W3C_Validator', 'ia_archiver'
);
$pattern = sprintf('#%s#i', implode('|', $agents));
if (preg_match($pattern, $agent)) {
// do stuff here
}
Micha
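What Micha's two lines actually build, spelled out. `implode()` joins the names with `|` (regex alternation), and `sprintf()` drops that string into `%s`, giving a pattern delimited by `#` with the `i` (case-insensitive) modifier; no commas are involved, since the array never becomes a comma-separated string. The separator goes first: the reversed `implode($agents, '|')` order was deprecated in PHP 7.4 and removed in PHP 8.0. The `preg_quote()` pass is not in Micha's post; it is added here as a precaution so that a future entry containing `.`, `+`, `#` or similar is matched literally.

```php
<?php
$agents = [
    'bot', 'Google', 'Slurp',
    'Scooter', 'Spider', 'Infoseek',
    'W3C_Validator', 'ia_archiver',
];

// Escape regex metacharacters and the "#" delimiter in each name.
$escaped = array_map(function (string $a): string {
    return preg_quote($a, '#');
}, $agents);

// Join with "|" and wrap in "#...#i".
$pattern = sprintf('#%s#i', implode('|', $escaped));

echo $pattern, "\n";
// #bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver#i
```

The array costs one extra step at runtime, but it keeps the list easy to edit and the pattern is built once, after which a single `preg_match($pattern, $agent)` replaces the chain of `||` tests.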
July 17th, 2005, 09:41 AM
deko wrote:
> As an aside, are there many cases where a domain would appear as:
>
> something.com.something
>
> Is that only for international domains?
You mean like example.com.uk ?
If so, the tld is "uk" (country codes)...
--
Justin Koivisto - spam@koivi.com http://www.koivi.com
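If the goal is to pull the TLD out of a full host name, taking the last dot-separated label is enough for a host like the one above. A sketch with the hypothetical helper `tld_of()`; it cannot tell a registry suffix such as `com.au` or `co.uk` from an ordinary subdomain, which would need a list like the Public Suffix List that this sketch does not consult.

```php
<?php
// Hypothetical helper: return the last dot-separated label (the TLD),
// lower-cased, ignoring a trailing dot as in "example.com.".
function tld_of(string $host): string
{
    $labels = explode('.', rtrim(strtolower($host), '.'));
    return end($labels);
}

echo tld_of('example.com.uk'), "\n"; // uk
echo tld_of('www.koivi.com'), "\n";  // com
```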
July 17th, 2005, 09:41 AM
> $agents = array(
> 'bot', 'Google', 'Slurp',
> 'Scooter', 'Spider', 'Infoseek',
> 'W3C_Validator', 'ia_archiver'
> );
> $pattern = sprintf('#%s#i', implode('|', $agents));
>
> if (preg_match($pattern, $agent)) {
> // do stuff here
> }
I think I understand... Is '#%s#i' removing commas? What does sprintf do?
But doesn't creating and imploding the array add an extra step - as opposed
to something like:
eregi ( "(bot|Google|Slurp|Scooter)" , $agent )
July 17th, 2005, 09:41 AM
When using a regex like this:
eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)
Can I put meta characters within the parentheses, like this:
eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)
So it would match "robot" or "superbot", as well as just "bot".
July 17th, 2005, 09:45 AM
"Michael Fesser" <netizen@gmx.net> wrote in message
news:sivtk0lv2u6ui0g4is9rpvtho4mt4qe87r@4ax.com...
> .oO(deko)
>
> >When using a regex like this:
> >
> >eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)
> >
> >Can I put meta characters within the parentheses, like this:
> >
> >eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)
> >
> >So it would match "robot" or "superbot", as well as just "bot".
>
> Yes, but you have to do it in regex syntax:
>
> .*bot
>
> This matches the literal 'bot' which may be preceded by any char (.) in
> any number (*).
>
> Micha
But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
followed by an end-of-line?
But then again, I suppose it's unlikely that "bot" will be followed by an
end of line in an agent string...
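A quick check of the anchor question using PCRE's `preg_match()` (the user-agent string is a made-up example). `bot$` only matches when "bot" ends the string, while plain `bot` and `.*bot` both match it anywhere, so they accept the same strings; the leading `.*` just makes the matched text longer.

```php
<?php
$ua = 'Mozilla/5.0 (compatible; Googlebot/2.1)';

// "bot$" requires "bot" at the very end (or just before a final newline).
var_dump(preg_match('/bot$/i', $ua));        // int(0): the string ends in "2.1)"
// Unanchored "bot" matches anywhere, so "robot" and "superbot" also match.
var_dump(preg_match('/bot/i', $ua));         // int(1)
// ".*bot" accepts the same strings as "bot".
var_dump(preg_match('/.*bot/i', $ua));       // int(1)
// "bot$" is fine when the string really does end in "bot".
var_dump(preg_match('/bot$/i', 'superbot')); // int(1)
```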
July 17th, 2005, 09:45 AM
.oO(deko)
>But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
>followed by an end-of-line?
Hmm, true. ;)
OK, I confused it with your other question about the TLDs, which used
explicit start and end anchors (^ and $).
>But then again, I suppose it's unlikely that "bot" will be followed by an
>end of line in an agent string...
Yep. I think just 'bot' should be fine.
Micha
July 17th, 2005, 09:45 AM
> Yep. I think just 'bot' should be fine.
Agreed - in this case I think .*bot and bot should return the same thing:
eregi("(bot|google|infoseek|etc...|ia_archiver)", $agent)
Where I was confused was where to use quotes and meta characters. I have a
bunch of other optimizations to do now that I've figured it out...
Thanks for the help!