Regular Expression problem

(I don't know if this is the right place, so if I am wrong, please point
me in the right direction.
If this post is read by you masters, I'm honoured. If I get a
mere response, I'm blessed!)

Hi,

I'm a newbie regular expression user. I use regex in my Python
programs. I have a strange problem using regex (sometimes not so
strange, but please bear in mind that I'm a newbie ;). I want to
extract a particular attribute value from a tag in one of my HTML files.

I.e., I want only the value after 'href=' in the tag >>

'<link href="mystylesheet.css" rel="stylesheet" type="text/css">'

Here it would be 'mystylesheet.css'. I used the following regex to get
this value (I don't know if it is good):

"<link\s+href=["]?(.*?)["]?\s+rel=["]?stylesheet["]?\s+type=["]?text/css["]?>"

I thought I was doing fine until I got stuck on this tag >>

<link rel="stylesheet" href="mystylesheet.css" type="text/css">

It is the same tag, but with the 'href=' part at a different place. I
think you get the point!

So what should I do to get the exact value (here, the value after
'href=') in any case, even if the tags are like these? >>

<link rel="stylesheet" href="mystylesheet.css" type="text/css">
-OR-
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
-OR-
<link type="text/css" href="mystylesheet.css" rel="stylesheet">

Jul 13 '06 #1
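
A quick way to see why the pattern in the question gets stuck is to run it against all three attribute orderings. The sketch below assumes Python 3 and the standard `re` module. The names `ORIGINAL_RE`, `TAGS`, and `try_original` are made up for illustration, and the pattern is the one from the question with `href` written as one word.

```python
import re

# Pattern from the question: it expects href, then rel, then type, in that order.
ORIGINAL_RE = re.compile(
    r'<link\s+href=["]?(.*?)["]?\s+rel=["]?stylesheet["]?\s+type=["]?text/css["]?>'
)

# The three attribute orderings listed in the question.
TAGS = [
    '<link href="mystylesheet.css" rel="stylesheet" type="text/css">',
    '<link rel="stylesheet" href="mystylesheet.css" type="text/css">',
    '<link type="text/css" href="mystylesheet.css" rel="stylesheet">',
]


def try_original(tag):
    """Return the captured href value, or None when the pattern does not match."""
    match = ORIGINAL_RE.search(tag)
    return match.group(1) if match else None


results = [try_original(tag) for tag in TAGS]
for tag, value in zip(TAGS, results):
    print(value, "<-", tag)
```

Only the first ordering yields a value. The other two return None because the pattern hard-codes the sequence href, rel, type.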
Hey,

I'm new to regexes as well, but here is my idea. Since you don't know
which attribute will come first, why don't you structure your regex like
this?

(First off, I'll assume that \s == ' '. Actually, now that I think of
it, isn't \s any whitespace character? Anyway, \s == ' ' for now.)

'<link\s*((\s*attribute1\s*)|(\s*attribute2\s*)|(\s*attribute3\s*))+>'

I think that should just about do it.

Hope this helped,

Colin

Jul 13 '06 #2
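
The alternation idea in the reply above can be made concrete roughly as follows. This is only a sketch under assumptions the thread does not state: Python 3, double-quoted attribute values, and just the three attributes href, rel, and type. `LINK_RE` and `stylesheet_href` are hypothetical names chosen for the example.

```python
import re

# One repeated group that accepts the known attributes in any order.
# Only href is captured; rel and type are matched and skipped.
LINK_RE = re.compile(
    r'''<link
        (?:\s+(?:
            href="(?P<href>[^"]*)"   # the attribute whose value we want
          | rel="[^"]*"              # other attributes are only consumed
          | type="[^"]*"
        ))+
        \s*/?>''',
    re.VERBOSE | re.IGNORECASE,
)


def stylesheet_href(tag):
    """Return the href value of a <link> tag, or None if there is none."""
    match = LINK_RE.search(tag)
    return match.group("href") if match else None


for tag in (
    '<link rel="stylesheet" href="mystylesheet.css" type="text/css">',
    '<link href="mystylesheet.css" rel="stylesheet" type="text/css">',
    '<link type="text/css" href="mystylesheet.css" rel="stylesheet">',
):
    print(stylesheet_href(tag))
```

The design relies on Python's `re` keeping the value a group captured in an earlier pass through the repetition, so the href is still available when rel or type happens to be matched last. Any attribute outside the three listed makes the whole tag fail to match, which is one reason a real HTML parser tends to be the sturdier choice.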
John Blogger wrote:
> That I want a particular tag value of one of my HTML files.
>
> ie: I want only the value after 'href=' in the tag >>
>
> '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
>
> here it would be 'mystylesheet.css'. I used the following regex to get
> this value(I dont know if it is good).

No matter how good it is, you should still use something that
understands HTML:

>>> from BeautifulSoup import BeautifulSoup
>>> html = '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
>>> page = BeautifulSoup(html)
>>> page.link.get('href')
'mystylesheet.css'

--
- Justin

Jul 14 '06 #3
Justin Azoff wrote:
> >>> from BeautifulSoup import BeautifulSoup
> >>> html = '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
> >>> page = BeautifulSoup(html)
> >>> page.link.get('href')
> 'mystylesheet.css'

On second thought, you will probably want something like

>>> [link.get('href') for link in page.fetch('link', {'type': 'text/css'})]
['mystylesheet.css']

which will properly handle multiple link tags.

--
- Justin

Jul 14 '06 #4
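
If installing BeautifulSoup is not an option, the same approach (let an HTML parser find the attributes, then filter on type) can be sketched with the standard library's `html.parser` module. This is not the code from the posts above but a stand-in under the assumption of Python 3. `StylesheetLinkParser` and `stylesheet_hrefs` are hypothetical names.

```python
from html.parser import HTMLParser


class StylesheetLinkParser(HTMLParser):
    """Collect the href of every <link> tag whose type is text/css."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, so attribute order is irrelevant.
        if tag != "link":
            return
        attr_map = dict(attrs)
        if attr_map.get("type") == "text/css" and "href" in attr_map:
            self.hrefs.append(attr_map["href"])


def stylesheet_hrefs(html):
    """Return the stylesheet hrefs found in an HTML string, in document order."""
    parser = StylesheetLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs


page = """
<link rel="stylesheet" href="a.css" type="text/css">
<link href="b.css" rel="stylesheet" type="text/css">
<link type="text/css" href="c.css" rel="stylesheet">
<link rel="icon" href="favicon.png" type="image/png">
"""
print(stylesheet_hrefs(page))
```

Because the parser hands over attributes as name/value pairs, the three orderings from the question need no special handling, and the favicon link is skipped by the type check.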
Ant
> So What should I do to get the exact value(here the value after
> 'href=') in any case even if the tags are like these? >>
>
> <link rel="stylesheet" href="mystylesheet.css" type="text/css">
> -OR-
> <link href="mystylesheet.css" rel="stylesheet" type="text/css">
> -OR-
> <link type="text/css" href="mystylesheet.css" rel="stylesheet">

The following should do it:

expr = r'<link .*?href="(.*?)"'

or if single quotes might have been used:

expr = r'''<link .*?href=["'](.*?)['"]'''

But like the others have said, Beautiful Soup is very good for things
like this.

Jul 14 '06 #5
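
For reference, here is how the quote-tolerant expression from the post above behaves with `re.findall`, assuming Python 3. The second pattern, `strict_expr`, is not from the thread. It is a hypothetical tightened variant added to show one caveat: `.*?` is free to run past the end of a `<link>` tag that has no href of its own.

```python
import re

# The second expression from the post: either quote style around the value.
expr = re.compile(r'''<link .*?href=["'](.*?)['"]''')

# Hypothetical variant: [^>]*? cannot cross the closing '>' of the <link> tag.
strict_expr = re.compile(r'''<link\s[^>]*?href=["']([^"']*)["']''')

sample = """
<link rel="stylesheet" href="mystylesheet.css" type="text/css">
<link href='mystylesheet.css' rel="stylesheet" type="text/css">
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
"""

hrefs = expr.findall(sample)
strict_hrefs = strict_expr.findall(sample)

# A <link> without href, followed by an anchor on the same line.
tricky = '<link rel="icon"><a href="page.html">home</a>'
loose_hit = expr.findall(tricky)
strict_hit = strict_expr.findall(tricky)

print(hrefs, strict_hrefs, loose_hit, strict_hit)
```

Both patterns find the stylesheet in all three orderings. On the `tricky` line the original expression picks up the anchor's href, while the tightened one finds nothing, which may or may not matter depending on how regular the input HTML is.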
Pyparsing is also good for recognizing basic HTML tags and their
attributes, regardless of the order of the attributes.

-- Paul

testText = """sldkjflsa;faj

<link href="mystylesheet.css" rel="stylesheet" type="text/css">

here it would be 'mystylesheet.css'. I used the following regex to get
this value(I dont know if it

I thought I was doing fine until I got stuck by this tag >>

<link rel="stylesheet" href="mystylesheet.css" type="text/css"> : same

tag but with 'href=' part

tags are like these? >>

<link rel="stylesheet" href="mystylesheet.css" type="text/css">
-OR-
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
-OR-
<link type="text/css" href="mystylesheet.css" rel="stylesheet">

"""
from pyparsing import makeHTMLTags, line

# makeHTMLTags returns (opening tag, closing tag) expressions; only the opening one is needed.
linkTag = makeHTMLTags("link")[0]
# scanString yields (tokens, start, end) for every match found in the text.
for toks, s, e in linkTag.scanString(testText):
    print(toks.href)          # attribute values are available by name
    print(line(s, testText))  # the full source line containing the match
    print()

Prints out:

mystylesheet.css
<link href="mystylesheet.css" rel="stylesheet" type="text/css">

mystylesheet.css
<link rel="stylesheet" href="mystylesheet.css" type="text/css"> : same

mystylesheet.css
<link rel="stylesheet" href="mystylesheet.css" type="text/css">

mystylesheet.css
<link href="mystylesheet.css" rel="stylesheet" type="text/css">

mystylesheet.css
<link type="text/css" href="mystylesheet.css" rel="stylesheet">

Jul 14 '06 #6
