How to Extract a website from a string / plain text?

clain
79
I need to extract website names from a plain-text string. I have searched most of the forums but could not find a good solution.
  1. Since it is plain text, it will not have any HTML links or anchor tags to find and extract.
  2. The website name may or may not contain "www."; for example, it can be "learnwell.com" instead of "www.learnwell.com".
  3. There are website names with several parts, like main.cool.edu.

Here is an example string: "visit our webiste gravesfab.com"
Nov 1 '10 #1

code green
1,726 Expert 1GB
It may be me, but your question doesn't seem to make sense.
What are you trying to do?
Nov 1 '10 #2
clain
79
It's simple: I have a large number of plain-text files, and I just need to extract all the websites from them.
Nov 1 '10 #3
code green
1,726 Expert 1GB
So you mean you want to find all the domain names in a text file?

There may be a regex that validates a domain name structure.
I know they exist for email addresses, so try googling for "regex" and "domain name" together with "validate" or "check".
Nov 1 '10 #4
clain
79
Hello Mr Code Green. It's not exactly the domain name; it may also contain subdomains, for example "support.domain.com".

As for googling, I did not find a regex that matches my criteria. Also, "try googling" is not the answer I expect from Bytes; if it were, I would not have posted this topic here. Ha ha.
Nov 1 '10 #5
code green
1,726 Expert 1GB
I am not good with regex; I always look for somebody else's solution with Mr Google, which is the only reason I suggested it.
Like I said, I did find numerous versions that validated an email structure.
I am surprised you did not find something similar for web addresses.

I will happily show my email regex.
Maybe it will give you something to build on, or hopefully prompt a regex guru to suggest something better:
    if (preg_match('/^[[:alnum:]][a-z0-9_\.\-]*@[a-z0-9\.\-]+\.[a-z]{2,4}$/i', $email))
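
For anyone who wants to try the check in isolation, here is a minimal runnable sketch. The function name `looks_like_email` and the sample inputs are assumptions made for illustration; only the pattern itself comes from the post.

```php
<?php
// Hypothetical helper wrapping the email regex shown above.
function looks_like_email(string $email): bool
{
    // ^...$ anchors mean the WHOLE string must be an address,
    // so this validates one value; it does not search inside text.
    $pattern = '/^[[:alnum:]][a-z0-9_\.\-]*@[a-z0-9\.\-]+\.[a-z]{2,4}$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(looks_like_email('info@gravesfab.com'));   // bool(true)
var_dump(looks_like_email('visit gravesfab.com'));  // bool(false), no "@"
```

Note that `[a-z]{2,4}` only accepts a final part of two to four letters, so longer endings such as ".museum" would be rejected by this pattern.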
Nov 1 '10 #6
clain
79
Thanks, buddy. I can start from here. Some more work on your regex should get me to the actual code.

To be frank, I found many regexes but could not find a perfect one; most of them failed in odd conditions.

Hopefully a regex that omits the "@" symbol can be derived from your code. I am on it. Thanks again.
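
One way that derivation might look, as a rough sketch and not a complete solution: drop the part before the "@" and the "@" itself, remove the `^` and `$` anchors so the pattern can match anywhere in the text, and collect every hit with `preg_match_all`. The function name `extract_hosts`, the exact pattern and the sample text are assumptions for illustration; the two-to-four-letter ending is carried over from the email regex above.

```php
<?php
// Hypothetical helper: returns every host-like token found in $text.
function extract_hosts(string $text): array
{
    // (?<![a-z0-9@.\-])  do not start in the middle of a word or right after
    //                    "@" (which skips the domain part of email addresses)
    // (?:label\.)+       one or more dot-terminated labels, e.g. "www.", "main.cool."
    // [a-z]{2,4}\b       final part of 2 to 4 letters, as in the email regex
    $pattern = '/(?<![a-z0-9@.\-])(?:[a-z0-9](?:[a-z0-9\-]*[a-z0-9])?\.)+[a-z]{2,4}\b/i';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = 'visit our webiste gravesfab.com or www.learnwell.com, '
      . 'see main.cool.edu and mail info@support.domain.com';
print_r(extract_hosts($text));
// Finds: gravesfab.com, www.learnwell.com, main.cool.edu
```

This only checks the shape of the text, so it will also pick up things that merely look like a host name (a file name such as "notes.txt", for instance), and it will miss endings longer than four letters. Checking candidates against a list of known top-level domains would tighten it up.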
Nov 1 '10 #7
