regular expression syntax basics

I'm trying to match top-level domains, where $sub = top level domain

if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ | ^mil$", $sub) )

{
do stuff here
}

I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
that does not work either. How are quotes supposed to be used here? Should
I use a single quote? What's the difference?
Jul 17 '05 #1
> I'm trying to match top-level domains, where $sub = top level domain

> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ | ^mil$", $sub) )
> {
> do stuff here
> }
>
> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
> that does not work either. How are quotes supposed to be used here? Should
> I use a single quote? What's the difference?


Another example (that works, but could be improved, I think) is this:

if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
(ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
(eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
(eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
{
do stuff here
}

Is there a way to avoid the "||" statements?
Jul 17 '05 #2
deko wrote:
> I'm trying to match top-level domains, where $sub = top level domain
>
> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ | ^mil$", $sub) )
> {
> do stuff here
> }
>
> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
> that does not work either. How are quotes supposed to be used here? Should
> I use a single quote? What's the difference?


Try more like this:
eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)

--
Justin Koivisto - sp**@koivi.com
http://www.koivi.com
Jul 17 '05 #3
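
A note for anyone reading this on a current PHP version: the ereg family was deprecated in PHP 5.3 and removed in PHP 7, so the grouped pattern above would now be written with preg_match. The sketch below is only an illustration; the helper names is_generic_tld and spaced_pattern_matches are made up for this note. It also shows why the spaced pattern from the question fails (each space is a literal character the input would have to contain) and uses single quotes so PHP never tries to interpolate a "$" inside the pattern.

```php
<?php
// Hypothetical helper (not from the thread): true if $sub is one of the listed TLDs.
function is_generic_tld(string $sub): bool
{
    // The group limits what | alternates over, so ^ and $ apply to every option.
    // The trailing "i" flag makes the match case-insensitive, like eregi did.
    return preg_match('/^(com|org|net|biz|info|edu|gov|int|mil)$/i', $sub) === 1;
}

// A shortened copy of the spaced pattern from the question. The spaces are
// literal, so an alternative such as " ^org$ " can never match "org".
function spaced_pattern_matches(string $sub): bool
{
    return preg_match('/^com$ | ^org$ | ^net$/i', $sub) === 1;
}

var_dump(is_generic_tld('ORG'));          // bool(true)
var_dump(is_generic_tld('com.uk'));       // bool(false)
var_dump(spaced_pattern_matches('org'));  // bool(false)
```

Quoting each alternative separately, as in ( "^com$" | "^net$" ), does not work either: the whole pattern has to be one string, and the | belongs inside it.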
> Try more like this:
> eregi("^(com|org|net|biz|info|edu|gov|int|mil)$", $sub)


I'll give it a shot. Do you think I could apply the same syntax to this:

if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
(ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
(eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
(eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
{
do stuff here
}
Jul 17 '05 #4
deko wrote:
> I'm trying to match top-level domains, where $sub = top level domain
>
> if ( eregi("^com$ | ^org$ | ^net$ | ^biz$ | ^info$ | ^edu$ | ^gov$ | ^int$ | ^mil$", $sub) )
> {
> do stuff here
> }
>
> I've tried quoting each one individually ( "^com$" | "^net$" | ... ) but
> that does not work either. How are quotes supposed to be used here? Should
> I use a single quote? What's the difference?
>
> Another example (that works, but could be improved, I think) is this:
>
> if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
> (ereg("Slurp",$agent)) || (ereg("Scooter",$agent)) ||
> (eregi("Spider",$agent)) || (eregi("Infoseek",$agent)) ||
> (eregi("W3C_Validator",$agent)) || (eregi("ia_archiver",$agent)) )
> {
> do stuff here
> }
>
> Is there a way to avoid the "||" statements?


if (eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)", $agent))

--
Justin Koivisto - sp**@koivi.com
http://www.koivi.com
Jul 17 '05 #5
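
For current PHP, the same alternation can be sketched with preg_match. The function name looks_like_crawler and the sample agent strings are invented for this note. One behavioural difference is worth knowing: the original chain used case-sensitive ereg for "Google", "Slurp" and "Scooter", whereas a single case-insensitive pattern treats every word the same way.

```php
<?php
// Hypothetical helper: case-insensitive search for any of the listed substrings.
function looks_like_crawler(string $agent): bool
{
    // No ^ or $ anchors here: a match anywhere in the string is enough.
    $pattern = '/bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver/i';
    return preg_match($pattern, $agent) === 1;
}

// Sample strings made up for illustration.
var_dump(looks_like_crawler('Mozilla/5.0 (compatible; Googlebot/2.1)'));   // bool(true)
var_dump(looks_like_crawler('Mozilla/5.0 (Windows NT 10.0; Win64; x64)')); // bool(false)
```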
> if (
> (eregi("(bot|Google|Slurp|Scooter|Spider|Infoseek|W3C_Validator|ia_archiver)",$agent)) )


Cool!! Thanks! That's much better...

As an aside, are there many cases where a domain would appear as:

something.com.something

Is that only for international domains?
Jul 17 '05 #6
.oO(deko)
> I'm trying to match top-level domains, where $sub = top level domain


You could also do it without a regex:

$gTLD = array('com', 'org', 'net', ...);
if (in_array(strtolower($sub), $gTLD)) {
// do something
}

Micha
Jul 17 '05 #7
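
A runnable version of the in_array idea might look like the sketch below. The constant GENERIC_TLDS and the function is_listed_tld are hypothetical names, and the list is just the nine TLDs from the original question, not whatever the elided array above was meant to contain.

```php
<?php
// Assumed list for illustration: the nine TLDs named in the original question.
const GENERIC_TLDS = ['com', 'org', 'net', 'biz', 'info', 'edu', 'gov', 'int', 'mil'];

// Hypothetical helper: exact, case-insensitive membership test without a regex.
function is_listed_tld(string $sub): bool
{
    // Third argument true asks for strict comparison, avoiding type juggling.
    return in_array(strtolower($sub), GENERIC_TLDS, true);
}

var_dump(is_listed_tld('EDU'));   // bool(true)
var_dump(is_listed_tld('uk'));    // bool(false)
```

Because the whole string is compared, there is no need for the ^ and $ anchors that the regex version depends on.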
.oO(deko)
> Another example (that works, but could be improved, I think) is this:
>
> if ( (eregi("bot",$agent)) || (ereg("Google",$agent)) ||
> [...]

Much too slow.

> Is there a way to avoid the "||" statements?


Another idea:

$agents = array(
'bot', 'Google', 'Slurp',
'Scooter', 'Spider', 'Infoseek',
'W3C_Validator', 'ia_archiver'
);
$pattern = sprintf('#%s#i', implode('|', $agents));

if (preg_match($pattern, $agent)) {
// do stuff here
}

Micha
Jul 17 '05 #8
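
If the word list ever comes from configuration or user input, it is probably worth escaping each entry before joining. The sketch below assumes a hypothetical helper build_agent_pattern; preg_quote and array_map are standard PHP functions, and the arrow function syntax needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: build one case-insensitive pattern from a list of literal words.
function build_agent_pattern(array $words): string
{
    // preg_quote escapes regex metacharacters; '#' is passed because it is the delimiter.
    $escaped = array_map(fn(string $w): string => preg_quote($w, '#'), $words);
    // Separator first, then the array: the only argument order PHP 8 accepts.
    return '#' . implode('|', $escaped) . '#i';
}

$pattern = build_agent_pattern(['bot', 'Google', 'Slurp', 'ia_archiver']);
echo $pattern, "\n"; // #bot|Google|Slurp|ia_archiver#i
```

Without the escaping step, an entry such as "a.b" would also match "axb", because an unescaped dot matches any character.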
deko wrote:
> As an aside, are there many cases where a domain would appear as:
>
> something.com.something
>
> Is that only for international domains?


You mean like example.com.uk?

If so, the TLD is "uk" (a country code).

--
Justin Koivisto - sp**@koivi.com
http://www.koivi.com
Jul 17 '05 #9
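
In other words, the TLD is whatever follows the last dot. A minimal sketch of pulling it out of a host name, assuming a hypothetical helper top_level_label and ignoring edge cases such as a trailing dot:

```php
<?php
// Hypothetical helper: return the last dot-separated label of a host name, lowercased.
function top_level_label(string $host): string
{
    $pos = strrpos($host, '.');                              // position of the final dot, or false
    $label = $pos === false ? $host : substr($host, $pos + 1);
    return strtolower($label);
}

var_dump(top_level_label('example.com.uk')); // string(2) "uk"
var_dump(top_level_label('Example.COM'));    // string(3) "com"
```

The result could then be checked against a list or pattern like the ones earlier in the thread.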
> $agents = array(
> 'bot', 'Google', 'Slurp',
> 'Scooter', 'Spider', 'Infoseek',
> 'W3C_Validator', 'ia_archiver'
> );
> $pattern = sprintf('#%s#i', implode('|', $agents));
>
> if (preg_match($pattern, $agent)) {
> // do stuff here
> }


I think I understand... Is '#%s#i' removing commas? What does sprintf do?

But doesn't creating and imploding the array add an extra step - as opposed
to something like:

eregi ( "(bot|Google|Slurp|Scooter)" , $agent )
Jul 17 '05 #10
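
On the sprintf question, which the thread never answers directly: nothing removes commas. The commas in array(...) are only PHP syntax separating the elements. implode joins the elements with "|", and sprintf substitutes that joined string for the %s placeholder. The two "#" characters are the pattern delimiters that preg_match requires, and the trailing "i" is the case-insensitive flag. A small sketch with a shortened list and a made-up agent string:

```php
<?php
$agents  = ['bot', 'Google', 'Slurp', 'Scooter'];
$joined  = implode('|', $agents);      // elements joined with "|"
$pattern = sprintf('#%s#i', $joined);  // %s is replaced by $joined

echo $joined, "\n";   // bot|Google|Slurp|Scooter
echo $pattern, "\n";  // #bot|Google|Slurp|Scooter#i

// Hypothetical agent string, for illustration only.
var_dump(preg_match($pattern, 'Mozilla/5.0 (compatible; Googlebot/2.1)')); // int(1)
```

The array is an extra step compared with typing the pattern by hand, but it keeps the word list easy to read and extend; the resulting pattern is the same.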
When using a regex like this:

eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)

Can I put meta characters within the parentheses, like this:

eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)

So it would match "robot" or "superbot", as well as just "bot".
Jul 17 '05 #11

"Michael Fesser" <ne*****@gmx.net> wrote in message
news:si********************************@4ax.com...
> .oO(deko)
>> When using a regex like this:
>>
>> eregi("(bot|google|infoseek|w3c_validator|ia_archiver)", $agent)
>>
>> Can I put meta characters within the parentheses, like this:
>>
>> eregi("(bot$|google|infoseek|w3c_validator|ia_archiver)", $agent)
>>
>> So it would match "robot" or "superbot", as well as just "bot".
>
> Yes, but you have to do it in regex syntax:
>
> .*bot
>
> This matches the literal 'bot', which may be preceded by any char (.) in
> any number (*).
>
> Micha


But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
followed by an end-of-line?
But then again, I suppose it's unlikely that "bot" will be followed by an
end of line in an agent string...
Jul 17 '05 #12
.oO(deko)
> But isn't bot$ saying: match a "b" followed by an "o" followed by a "t"
> followed by an end-of-line?

Hmm, true. ;)

OK, I confused it with your other question about the TLDs, which used the
explicit start and end anchors (^ and $).

> But then again, I suppose it's unlikely that "bot" will be followed by an
> end of line in an agent string...


Yep. I think just 'bot' should be fine.

Micha
Jul 17 '05 #13
> Yep. I think just 'bot' should be fine.

Agreed - in this case I think .*bot and bot should return the same thing:

eregi("(bot|google|infoseek|etc...|ia_archiver)", $agent)

Where I was confused was where to use quotes and meta characters. I have a
bunch of other optimizations to do now that I've figured it out...

Thanks for the help!
Jul 17 '05 #14
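
The three variants discussed in the last few posts can be compared side by side. The sketch below uses preg_match, since eregi no longer exists in current PHP; the helper bot_checks and the sample strings are invented for this comparison.

```php
<?php
// Hypothetical helper: run the three patterns from the discussion against one string.
// Each value is preg_match's return value: 1 for a match, 0 for no match.
function bot_checks(string $agent): array
{
    return [
        'bot'   => preg_match('/bot/i', $agent),    // "bot" anywhere in the string
        '.*bot' => preg_match('/.*bot/i', $agent),  // same matches; the .* adds nothing to a yes/no test
        'bot$'  => preg_match('/bot$/i', $agent),   // only when the string ends in "bot"
    ];
}

print_r(bot_checks('superbot'));       // all three are 1
print_r(bot_checks('Googlebot/2.1'));  // bot and .*bot are 1, bot$ is 0
print_r(bot_checks('Mozilla/5.0'));    // all three are 0
```

For a yes/no test, an unanchored 'bot' already matches "robot" and "superbot", which is why the plain word is enough here.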