How to extract domain from string with regex?

I'm sure someone has passed this way before...

I want to check whether a domain name is contained in a string, and if one is,
I want to extract it. In these strings, domains are always preceded by
"http://" or "http : //www" (without the spaces).

In pseudocode, I thought it might look like this:

if (eregi("http: //", $mystring))
{
    $domain = explode("http: //", $mystring);
    $domain = array_reverse($domain);
}
$parts = $domain[0];
$parts = explode(".", $parts);
if ($parts[0] == "www")
{
    $extracted = $parts[1].".".$parts[2];
}
else
{
    $extracted = $parts[0].".".$parts[1];
}

Does this look about right?

Thanks in advance.

Aug 26 '06 #1
Here's a cleaner example:

if (eregi("http://", $mystring))
{
    $mystring = explode("http://", $mystring);
    $mystring = array_reverse($mystring);
    $domain = $mystring[0];
    $domain = explode(".", $domain);
    if ($domain[0] == "www")
    {
        $extracted = $domain[1].".".$domain[2];
    }
    else
    {
        $extracted = $domain[0].".".$domain[1];
    }
}

Can I eregi on "http://"? Or do I need to escape the "/"?
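
On the escaping question: eregi() takes a POSIX pattern with no delimiters, so "/" is an ordinary character there and needs no escaping. The ereg family was deprecated in PHP 5.3 and removed in PHP 7.0, though, so the rough sketch below shows the same test with functions that still exist. The sample string is made up for illustration.

```php
<?php
// Hypothetical sample input, not taken from the original post.
$mystring = "visit HTTP://www.example.com/page for details";

// Plain case-insensitive substring test: no regex, nothing to escape.
$found_plain = (stripos($mystring, "http://") !== false);

// PCRE with "/" as the delimiter: each literal "/" must be escaped.
$found_slash = (preg_match('/http:\/\//i', $mystring) === 1);

// PCRE with "@" as the delimiter: "/" is an ordinary character.
$found_at = (preg_match('@http://@i', $mystring) === 1);

var_dump($found_plain, $found_slash, $found_at);
```

Picking a delimiter that does not occur in the pattern, as the "@" variant does, avoids the backslashes entirely.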
Aug 26 '06 #2
*** deko wrote (Fri, 25 Aug 2006 23:09:28 -0700):
> In these strings, domains are always preceded by
> "http://" or "http : //www" (without the spaces).

Without the spaces? Then, why do you add the spaces?

Given that precondition, I wouldn't use regex:

parse_url: Parse a URL and return its components

Usage:
array parse_url ( string url )

Parameters
url
The URL to parse

Return Values
On seriously malformed URLs, parse_url() may return FALSE and emit an
E_WARNING. Otherwise an associative array is returned, whose components may
be (at least one):

scheme - e.g. http
host
port
user
pass
path
query - after the question mark ?
fragment - after the hashmark #
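
A minimal sketch of how this could be used to get the host, assuming the URL has already been isolated from any surrounding text (the example URL is hypothetical):

```php
<?php
// Hypothetical URL used only for illustration.
$url = "http://www.example.com:8080/index.html?x=1#top";

$parts = parse_url($url);

// parse_url() returns false on seriously malformed input,
// and the "host" key is absent when the URL has no host part.
$host = ($parts !== false && isset($parts['host'])) ? $parts['host'] : null;

echo "host: ", var_export($host, true), "\n";
```

Checking for both false and a missing "host" key keeps the code from reading an undefined index when the input is not a full URL.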

> In pseudocode, I thought it might look like this:
>
> if (eregi("http: //", $mystring))

Sorry, but I just can't understand that whole story about spaces/no spaces
:-?

--
-+ http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
++ My site about web programming: http://bits.demogracia.com
+- My humor site with UVA rays: http://www.demogracia.com
--
Aug 26 '06 #3
Thanks for the tip on parse_url.

But I still have to find the URL (which could be anywhere) in the string.
> Without the spaces? Then, why do you add the spaces?

Here's a pseudocode example without the spaces:

if (eregi("http://", $mystring))
{
    $mystring = explode("http://", $mystring);
    $mystring = array_reverse($mystring);
    $domain = $mystring[0];
    $domain = explode(".", $domain);
    if ($domain[0] == "www")
    {
        $extracted = $domain[1].".".$domain[2];
    }
    else
    {
        $extracted = $domain[0].".".$domain[1];
    }
}

Would it be better to use preg_match here?

preg_match('@^(?:http://)?([^/]+)@i',
"http://www.php.net/index.html", $matches);
$host = $matches[1];

// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";

But would this work if the URL is buried in a string?
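
As written, probably not: the first pattern is anchored with ^ and makes the http:// prefix optional, so on a string with leading text it captures everything up to the first "/" instead of the host. One possible approach, sketched below on the assumption stated at the top of the thread that URLs always start with http://, is to drop the anchor, require the prefix, and stop the host at the first character that cannot belong to it. The function name extract_domain and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: returns the last two labels of the first
// http:// host found anywhere in $text, or null if none is found.
function extract_domain($text)
{
    // No ^ anchor, so the URL may sit anywhere in the string.
    // The host stops at "/", ":", "?", "#", whitespace, quotes,
    // angle brackets, or parentheses.
    if (!preg_match('@http://([^/:?#\s"\'<>()]+)@i', $text, $m)) {
        return null;
    }

    // Normalise case and drop a trailing dot (e.g. end of a sentence).
    $host = rtrim(strtolower($m[1]), '.');

    // Keep the last two dot-separated segments, as in the post above.
    if (!preg_match('/[^.]+\.[^.]+$/', $host, $d)) {
        return null; // host has no dot, e.g. "localhost"
    }
    return $d[0];
}

echo extract_domain('agent (+http://www.php.net/index.html) v1'), "\n";
```

Taking the last two labels is the heuristic already used in this thread, and it has a known limit: for a host like www.domain.co.uk it yields co.uk rather than domain.co.uk.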

Aug 26 '06 #4
