finding href with regex.

I am trying to write a regular expression that locates href attributes in
some HTML content. I used the example in the .NET documentation as a
starting point
("background\\s*=\\s*(?:\"(?<webpath>[^\"]*)\"|(?<webpath>\\S+))"). This
works well for most cases, but I ran into a problem when the href is not
quoted and it is the last attribute in the tag (meaning that there is a
greater-than (>) rather than a space immediately following the URL).
How might I modify the expression so that it allows all of the following?

href="..."
href=... <--space right here
href=...>

Also, are there any other formats that I need to be aware of?
Nov 15 '05 #1
Hi,
[inline]
----- Original Message -----
From: "Peter Rilling" <pe***@nospam.rilling.net>
Newsgroups: microsoft.public.dotnet.languages.csharp
Sent: Friday, December 26, 2003 8:04 PM
Subject: finding href with regex.

> I am trying to write a regular expression that locates href attributes in
> some HTML content. I used the example in the .NET documentation as a
> starting point
> ("background\\s*=\\s*(?:\"(?<webpath>[^\"]*)\"|(?<webpath>\\S+))"). This

Try this instead:
"background\\s*=\\s*(?:\"(?<webpath>[^\"]*)\"|(?<webpath>[^\\s>]+)[\\s>])"

\\S+ matches one or more non-whitespace characters
[^\\s>]+ matches one or more characters that are neither whitespace nor >

> works well for most cases, but I ran into a problem when the href is not
> quoted and it is the last attribute in the tag (meaning that there is a
> greater-than (>) rather than a space immediately following the URL).
> How might I modify the expression so that it allows all of the following?
>
> href="..."
> href=... <--space right here
> href=...>
>
> Also, are there any other formats that I need to be aware of?

I don't think there are other "valid" formats.
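
For reference, here is a minimal C# sketch of the pattern discussed above, assuming the attribute name is changed from background to href as the question asks. The class and method names (HrefFinder, FindHrefs) are hypothetical. Unlike the expression suggested above, it does not require a trailing [\s>] after an unquoted value, since the greedy [^\s>]+ already stops at whitespace or >. It also accepts single-quoted values, which HTML allows but which were not covered above. Treat it as a rough heuristic, not a substitute for a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefFinder
{
    // Three alternatives share the group name "webpath" (.NET allows a group
    // name to be reused within one pattern):
    //   "..."  double-quoted value
    //   '...'  single-quoted value (an addition, not part of the original pattern)
    //   ...    unquoted value, ending at whitespace, '>' or end of input
    private static readonly Regex HrefPattern = new Regex(
        @"href\s*=\s*(?:""(?<webpath>[^""]*)""|'(?<webpath>[^']*)'|(?<webpath>[^\s>]+))",
        RegexOptions.IgnoreCase);

    // Returns the value of every href attribute found in the input, in order.
    public static List<string> FindHrefs(string html)
    {
        var urls = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            urls.Add(m.Groups["webpath"].Value);
        }
        return urls;
    }
}
```

Calling HrefFinder.FindHrefs("<a href=c.html>") should return a list with the single entry c.html, which is the case the original expression had trouble with.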

HTH,
Greetings

Nov 15 '05 #2
