Bytes | Software Development & Data Engineering Community

what is wrong with my script.

I'm using the code below to extract the text between all the <br></br> tags.

But it does not print all of the text, and it also prints normal text that is not part of an HTML link tag.

For example, if you have <a href="test.html" ><b>The Testing Page is here</b></a>
<b> extrat text</b>
I want to extract only "The Testing Page is here".



Here the variable $myfile contains the whole HTML page:
    while ($myfile =~ /<br.+?>(.*)<\/br>/xg)
    {
        print("a");
        print $1;
    }

Can someone help me out? What am I doing wrong here?

More information: I am trying to extract all the text that is a link in the given HTML page.
Feb 6 '10 #1
There is no line-break tag (<br>) in your example, and a line-break tag does not have a closing tag; it is self-closing. I will assume you mean the bold tag (<b>).

The way you have written your regex, it is looking for a line-break tag, so right off the bat, that needs to be fixed.

Furthermore, the way you have it written, it will only pick up a pattern in which the link text sits between bold tags. Not very flexible.

The pattern you want to look for is an anchor tag, followed by zero or more tags, followed by alphanumeric text of any length that ends when you hit the opening bracket of the next tag.

But even with that, there is a problem if a tag is embedded in the middle of a sentence used as the link text. I'll leave that for you to figure out, though, if you care to.
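
The pattern described above could be sketched roughly as follows. This is only an illustration, not a tested solution for arbitrary HTML: the helper name link_texts and the sample string are made up for the example, and the regex assumes reasonably well-formed markup.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collects the text that follows
# each opening <a ...> tag, skipping any opening tags nested right after it.
sub link_texts {
    my ($html) = @_;
    my @texts;
    while ($html =~ /<a\b[^>]*>\s*(?:<(?!\/)[^>]+>\s*)*([^<\s][^<]*)/gi) {
        # $1 is the run of text up to the opening bracket of the next tag
        push @texts, $1;
    }
    return @texts;
}

# Sample input taken from the question
my $myfile = qq{<a href="test.html" ><b>The Testing Page is here</b></a>\n<b> extrat text</b>};

print "$_\n" for link_texts($myfile);   # prints: The Testing Page is here
```

As noted, this stops at the first tag inside the link text, so a link such as <a href="x">Click <b>here</b> now</a> would yield only "Click ". That is the embedded-tag problem mentioned above.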
Feb 7 '10 #2
numberwhun
You really need to examine what you are telling your code to extract and what you actually have in your data.

You are telling it to match everything between <br> and </br>, but those tags do not exist in your example. Instead, remove the 'r' and try matching the <b> </b> tag set.

Regards,

Jeff
Feb 8 '10 #3
nithinpes
If you use:

    $myfile =~ /<b>(.*)<\/b>/xg

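$1 would contain "The Testing Page is here</b></a>
<b> extrat text".
(Because the sample spans two lines, this needs the /s modifier so that . also matches the newline; without it, .* stops at the end of the first line.)
This is because of the greedy nature of the * quantifier. To limit this behaviour so that it matches the minimum number of characters before finding a </b>, use: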

    $myfile =~ /<b>(.*?)<\/b>/xg

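
To see the difference side by side, here is a small sketch. It assumes the sample HTML sits on a single line (so that . can span both elements without /s); the variable names $greedy and @lazy are just for illustration.

```perl
use strict;
use warnings;

# Sample input from the question, assumed here to be on ONE line
my $myfile = '<a href="test.html" ><b>The Testing Page is here</b></a><b> extrat text</b>';

# Greedy: .* runs as far as it can, so the match ends at the LAST </b>
my ($greedy) = $myfile =~ /<b>(.*)<\/b>/;

# Non-greedy: .*? stops at the FIRST </b> after each <b>;
# with /g in list context, every capture is returned
my @lazy = $myfile =~ /<b>(.*?)<\/b>/g;

print "greedy: $greedy\n";
print "lazy:   $_\n" for @lazy;
```

Note that the non-greedy version still returns " extrat text", since that text is also inside bold tags. To get only link text, the pattern has to start from the anchor tag rather than from <b>.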
Feb 8 '10 #4
