Bytes | Software Development & Data Engineering Community

Regex Issues - Qualified URLS

Hi,
I am using the following function to extract every URL from a string
containing the HTML of a web page:

public List<string> DumpHrefs(String inputString)
{
    Regex r;
    Match m;
    List<string> LstURLs = new List<string>();

    r = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);
    for (m = r.Match(inputString); m.Success; m = m.NextMatch())
    {
        LstURLs.Add(m.Groups[1].ToString());
    }
    return LstURLs;
}
However, the problem is that it returns every link on the page, and
I only want the fully qualified links such as
http://www.domain.com/page.html, not the relative links.
I was given the following information by Kevin Spencer:
/* Start */
(?i)href\s*=\s*"?(?<1>http://[^"]+\"?[^>]*)>
First, rather than using an alternation, I just gave a rule that it could
have 0 or 1 quotes at the beginning and end. The (?i) indicates that the
regex is not case-sensitive. The group 1 consists of the character sequence
"http://" followed by any character that is not a quote mark, followed by
zero or 1 quote marks, followed by any character that is not ">". The
expression ends with the ">" character.
/* End */

I am unsure how to incorporate the regex given by Kevin into my
function. Does anyone have any suggestions?

Regards
Dec 11 '07 #1
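
One possible way to fold the "must start with http://" rule into the original function is sketched below. It is not Kevin's pattern verbatim: as quoted, group 1 of that pattern would also capture the closing quote and any attributes before the ">", so this sketch keeps the original quoted/unquoted alternation and only adds the "http://" prefix inside each branch. The class name HrefExtractor and the field name AbsoluteHref are made up for the example, and only "http://" links are matched, as in the post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; the name is an assumption for this sketch.
public static class HrefExtractor
{
    // Same shape as the original pattern, but each branch of the
    // alternation now requires the value to begin with "http://".
    // Branch 1: quoted value. Branch 2: unquoted value, which stops at
    // whitespace, '>' or a quote so trailing markup is not captured.
    private static readonly Regex AbsoluteHref = new Regex(
        @"href\s*=\s*(?:""(?<1>http://[^""]*)""|(?<1>http://[^\s>""]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static List<string> DumpHrefs(string inputString)
    {
        List<string> lstUrls = new List<string>();

        // Walk every match; group 1 holds just the URL in either branch.
        for (Match m = AbsoluteHref.Match(inputString); m.Success; m = m.NextMatch())
        {
            lstUrls.Add(m.Groups[1].Value);
        }
        return lstUrls;
    }
}
```

Called on a string containing href="http://www.domain.com/page.html" and href="/relative.html", this should return only the first URL. Changing "http://" to "https?://" in both branches would presumably pick up secure links as well.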