Regular Expression help - Extract links from certain tag

Hi Guys,

I was wondering if someone could help me out with the following
requirement:
<mydocument>
<div id="other">
<a href="linkother">linkother</a>
</div>

<div id="hello">
<a href="link1url">link1</a>
<a href="link2url">link2</a>
</div>
</mydocument>

If I wanted to extract all links from the div tag with id="hello", how
would I go about it?
Desired result would be:
link1url
link2url

So far I'm extracting links like this: <a href="[^"]+">[^<]+</a> but
how do I make sure they only come from a particular tag
group?

Regards DotnetShadow

Jun 29 '06 #1

You could load it into a DOM tree (possibly an XML DOM if the
document is well formed). Then you just have to navigate the tree
looking for div tags with an id attribute of hello, and fetch all the A
tags under that node (this should be possible using XPath).
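
As a rough illustration of the DOM approach, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It assumes the document is well-formed XML, as in the sample from the question; real-world HTML usually is not and would need an HTML parser instead. The function name links_in_div and the DOCUMENT constant are made up for this example, and ElementTree only supports a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question (assumed to be well-formed XML).
DOCUMENT = """<mydocument>
<div id="other">
<a href="linkother">linkother</a>
</div>

<div id="hello">
<a href="link1url">link1</a>
<a href="link2url">link2</a>
</div>
</mydocument>"""


def links_in_div(xml_text, div_id):
    """Return the href of every <a> nested anywhere under <div id=div_id>.

    Hypothetical helper for illustration; div_id is interpolated into the
    path expression, so it must not contain quote characters.
    """
    root = ET.fromstring(xml_text)
    hrefs = []
    # The [@id='...'] predicate is part of ElementTree's XPath subset.
    for div in root.findall(f".//div[@id='{div_id}']"):
        # iter("a") walks all descendant <a> elements of this div.
        for anchor in div.iter("a"):
            href = anchor.get("href")
            if href is not None:
                hrefs.append(href)
    return hrefs


print(links_in_div(DOCUMENT, "hello"))  # prints ['link1url', 'link2url']
```

Scoping the search to the matching div first, and only then collecting the anchors, is what the plain <a href=...> regex cannot do on its own.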
Jun 29 '06 #2
