regular expression syntax basics

I'm trying to match top-level domains, where $sub = top level domain

if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
^mil$", $sub) )

{
do stuff here
}

I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
that does not work either. How are quotes supposed to be used here? Should
I use a single quote? What's the difference?
Jul 17 '05 #1
13 Replies


> I'm trying to match top-level domains, where $sub = top level domain
>
> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ | ^mil$", $sub) )
> {
> do stuff here
> }
>
> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
> that does not work either. How are quotes supposed to be used here? Should I use a single quote? What's the difference?

Another example (that works, but could be improved, I think) is this:

if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
(ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
(eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
(eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
{
do stuff here
}

Is there a way to avoid the "||" statements?
Jul 17 '05 #2

deko wrote:
> I'm trying to match top-level domains, where $sub = top level domain
>
> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
> ^mil$", $sub) )
> {
> do stuff here
> }
>
> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
> that does not work either. How are quotes supposed to be used here? Should
> I use a single quote? What's the difference?


Try more like this:
eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)
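
A note on why the original failed, in case it helps: whitespace inside a pattern is matched literally, so "^com$ " can only match if a space follows the end of the string, which never happens. Putting the anchors outside a single group avoids that. On the quoting question, PHP single-quoted strings do no variable interpolation, so they are the safer habit for patterns that contain $. Below is a minimal sketch of the same idea using preg_match rather than eregi (the ereg functions were removed in PHP 7.0). The function name is_known_tld is made up for illustration and is not from this thread.

```php
<?php
// Hypothetical helper name, used only for this illustration.
function is_known_tld(string $sub): bool
{
    // ^ and $ sit outside the group, so they apply to every alternative.
    // The trailing i flag makes the match case-insensitive, as eregi was.
    // Single quotes keep PHP from treating $ as the start of a variable.
    return preg_match('/^(com|org|net|biz|info|edu|gov|int|mil)$/i', $sub) === 1;
}
```

With this, is_known_tld('COM') is true and is_known_tld('com.uk') is false, while the spaced-out pattern '/^com$ | ^org$/i' does not match 'com' at all.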

--
Justin Koivisto - sp**@koivi.com
http://www.koivi.com
Jul 17 '05 #3

> Try more like this:
> eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)


I'll give it a shot. Do you think I could apply the same syntax to this:

if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
(ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
(eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
(eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
{
do stuff here
}
Jul 17 '05 #4

deko wrote:
> I'm trying to match top-level domains, where $sub = top level domain
>
> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ |
> ^mil$", $sub) )
> {
> do stuff here
> }
>
> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
> that does not work either. How are quotes supposed to be used here? Should
> I use a single quote? What's the difference?
>
> Another example (that works, but could be improved, I think) is this:
>
> if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
> (ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
> (eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
> (eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
> {
> do stuff here
> }
>
> Is there a way to avoid the "||" statements?

if ( (eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)",$agent)) )

--
Justin Koivisto - sp**@koivi.com
http://www.koivi.com
Jul 17 '05 #5

> if ( (eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)",$agent)) )


Cool!! Thanks! That's much better...

As an aside, are there many cases where a domain would appear as:

something.com.something

Is that only for international domains?
Jul 17 '05 #6

.oO(deko)

> I'm trying to match top-level domains, where $sub = top level domain


You could also do it without a regex:

$gTLD = array('com', 'org', 'net', ...);
if (in_array(strtolower($sub), $gTLD)) {
// do something
}
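
Spelled out a little further, this could look roughly like the sketch below. The list is the one from the original question, and the helper name in_tld_list is hypothetical. Passing true as the third argument makes in_array compare strictly, which avoids surprises from PHP's loose comparison.

```php
<?php
// TLD list taken from the question earlier in the thread.
$gTLD = array('com', 'org', 'net', 'biz', 'info', 'edu', 'gov', 'int', 'mil');

// Hypothetical helper, named here only for illustration.
function in_tld_list(string $sub, array $list): bool
{
    // strtolower gives the case-insensitive behaviour that eregi had;
    // the third argument (true) makes in_array compare with ===.
    return in_array(strtolower($sub), $list, true);
}
```

Usage would be something like: if (in_tld_list($sub, $gTLD)) { ... }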

Micha
Jul 17 '05 #7

.oO(deko)

> Another example (that works, but could be improved, I think) is this:
>
> if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
> [...]

Much too slow.

> Is there a way to avoid the "||" statements?

Another idea:

$agents = array(
'bot', 'Google', 'Slurp',
'Scooter', 'Spider', 'Infoseek',
'W3C_Validator', 'ia_archiver'
);
$pattern = sprintf('#%s#i', implode('|', $agents));

if (preg_match($pattern, $agent)) {
// do stuff here
}
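
To unpack what the pattern line does: implode joins the array entries into one string separated by "|", and sprintf substitutes that string for the %s placeholder. The two # characters are just the pattern delimiters and the trailing i makes the match case-insensitive. Here is a slightly more defensive sketch of the same idea. The helper name build_agent_pattern is hypothetical, the preg_quote step is an addition that escapes any regex metacharacters in the names, and W3C_Validator is spelled as in the original question.

```php
<?php
// Hypothetical helper: builds one case-insensitive alternation from a list of names.
function build_agent_pattern(array $agents): string
{
    // preg_quote escapes regex metacharacters in each name;
    // '#' is passed as well because it is the delimiter used below.
    $quoted = array_map(function ($name) {
        return preg_quote($name, '#');
    }, $agents);

    // implode joins the names with '|'; sprintf drops the result in place of %s.
    return sprintf('#%s#i', implode('|', $quoted));
}

$agents = array(
    'bot', 'Google', 'Slurp',
    'Scooter', 'Spider', 'Infoseek',
    'W3C_Validator', 'ia_archiver'
);
$pattern = build_agent_pattern($agents);
```

For this list the resulting pattern is '#bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver#i', which can then be passed to preg_match as above.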

Micha
Jul 17 '05 #8

deko wrote:
> As an aside, are there many cases where a domain would appear as:
>
> something.com.something
>
> Is that only for international domains?


You mean like example.com.uk ?

If so, the tld is "uk" (country codes)...
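
If you need to pull that last label out of a host name before checking it, a rough sketch might look like this. It assumes $host is a bare host name with no scheme, port, or path, and the function name last_label is made up for illustration.

```php
<?php
// Hypothetical helper: returns the text after the final dot of a host name.
function last_label(string $host): string
{
    // Lower-case the name and drop a trailing dot (as in "example.org.").
    $host = rtrim(strtolower($host), '.');

    // strrpos finds the last dot; with no dot, the whole name is the label.
    $pos = strrpos($host, '.');
    return $pos === false ? $host : substr($host, $pos + 1);
}
```

For example.com.uk this returns "uk", which is the part you would compare against a TLD list.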

--
Justin Koivisto - sp**@koivi.com
http://www.koivi.com
Jul 17 '05 #9

> $agents = array(
> 'bot', 'Google', 'Slurp',
> 'Scooter', 'Spider', 'Infoseek',
> 'W3C_Validator', 'ia_archiver'
> );
> $pattern = sprintf('#%s#i', implode('|', $agents));
>
> if (preg_match($pattern, $agent)) {
> // do stuff here
> }

I think I understand... Is '#%s#i' removing commas? What does sprintf do?

But doesn't creating and imploding the array add an extra step - as opposed
to something like:

eregi ( "(bot|Google|Slurp|Scooter)" , $agent )
Jul 17 '05 #10

When using a regex like this:

eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)

Can I put meta characters within the parentheses, like this:

eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)

So it would match "robot" or "superbot", as well as just "bot".
Jul 17 '05 #11

"Michael Fesser" <ne*****@gmx.net> wrote in message
news:si********************************@4ax.com...
> .oO(deko)
>
>> When using a regex like this:
>>
>> eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)
>>
>> Can I put meta characters within the parentheses, like this:
>>
>> eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)
>>
>> So it would match "robot" or "superbot", as well as just "bot".
>
> Yes, but you have to do it in regex syntax:
>
> .*bot
>
> This matches the literal 'bot', which may be preceded by any character (.)
> any number of times (*).
>
> Micha


But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
followed by an end-of-line?
But then again, I suppose it's unlikely that "bot" will be followed by an
end of line in an agent string...
Jul 17 '05 #12

.oO(deko)

> But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
> followed by an end-of-line?

Hmm, true. ;)

OK, I confused it with your other question about the TLDs, which used the
explicit start and end anchors (^ and $).

> But then again, I suppose it's unlikely that "bot" will be followed by an
> end of line in an agent string...


Yep. I think just 'bot' should be fine.

Micha
Jul 17 '05 #13

> Yep. I think just 'bot' should be fine.

Agreed - in this case I think .*bot and bot should return the same thing:

eregi("(bot|google|infoseek|etc...|ia_archiver)", $agent)
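
For anyone who wants to check that for themselves, here is a small sketch comparing the three variants with preg_match (the ereg functions are gone as of PHP 7.0). The helper name match_flags and the sample strings in the note below are made up for illustration.

```php
<?php
// Hypothetical helper: reports which of the three variants match a given string.
function match_flags(string $agent): array
{
    return array(
        // "bot" anywhere in the string; an unanchored search already allows
        // any characters before it.
        'bot'   => preg_match('/bot/i', $agent),
        // Same set of matches as plain "bot", just with extra work for the engine.
        '.*bot' => preg_match('/.*bot/i', $agent),
        // "bot" only at the end of the string (or just before a final newline).
        'bot$'  => preg_match('/bot$/i', $agent),
    );
}
```

With "robot" all three match. With "SuperBot/1.0" the first two match and bot$ does not, which is why the anchor is unhelpful for agent strings.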

Where I was confused was where to use quotes and meta characters. I have a
bunch of other optimizations to do now that I've figured it out...

Thanks for the help!
Jul 17 '05 #14
