(I don't know if this is the right place to ask. If it isn't, please
point me in the right direction.
If this post is read by you masters, I'm honoured. If I get even a
mere response, I'm blessed!)
Hi,
I'm a newbie regular expression user. I use regexes in my Python
programs. I have a strange problem
(sometimes not so strange, but please bear in mind that I'm a newbie ;)
using regex: I want to extract
a particular attribute value from a tag in one of my HTML files.
i.e. I want only the value after 'href=' in the tag >>
'<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
Here it would be 'mystylesheet.css'. I used the following regex to get
this value (I don't know if it
is good):
"<link\s+href=["]?(.*?)["]?\s+rel=["]?stylesheet["]?\s+type=["]?text/css["]?>"
I thought I was doing fine until I got stuck on this tag >>
<link rel="stylesheet" href="mystylesheet.css" type="text/css" : the same
tag, but with the 'href=' part
in a different place. I think you get the point!
So what should I do to get the exact value (here, the value after
'href=') in every case, even if the
tags look like these? >>
<link rel="stylesheet" href="mystylesheet.css" type="text/css">
-OR-
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
-OR-
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
Hey,
I'm new to regexes as well, but here is my idea. Since you don't know
which attribute will come first, why not structure your regex like
this
(first off, I'll assume that \s == ' '; actually, now that I think of
it, isn't \s any whitespace character? Anyway, \s == ' ' for now)
'<link\s*((\s*attribute1\s*)|(\s*attribute2\s*)|(\s*attribute3\s*))+>'
I think that should just about do it.
Hope this helped,
Colin
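A minimal Python sketch of the alternation idea above, assuming all three attributes use double-quoted values. The names ATTRS, LINK_TAG, HREF_VALUE and stylesheet_hrefs are made up for this illustration, and pulling the href out in a second step is a choice made here for simplicity, not part of the original suggestion.

```python
import re

# One sub-pattern per attribute (assumption: double-quoted values only).
ATTRS = [
    r'href="[^"]*"',
    r'rel="stylesheet"',
    r'type="text/css"',
]

# The shape suggested above: "<link", then one or more of the
# attributes in any order, then ">".
LINK_TAG = re.compile(r'<link\s+(?:(?:%s)\s*)+>' % '|'.join(ATTRS))

# Second step: pull the href value out of a tag that already matched.
HREF_VALUE = re.compile(r'href="([^"]*)"')


def stylesheet_hrefs(html):
    """Return the href of every <link> tag built from the attributes above."""
    hrefs = []
    for tag in LINK_TAG.findall(html):
        match = HREF_VALUE.search(tag)
        if match:  # a matching tag may still lack an href attribute
            hrefs.append(match.group(1))
    return hrefs
```

Calling stylesheet_hrefs() on a string that contains the three tag variants from the question returns the href of each one, whatever the attribute order. A tag with any other attribute (media=..., for example) is not matched at all, which is one reason the replies below recommend a real parser.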
John Blogger wrote:
That I want a particular tag value of one of my HTML files.
ie: I want only the value after 'href=' in the tag >>
'<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
here it would be 'mystylesheet.css'. I used the following regex to get
this value (I don't know if it is good).
No matter how good it is, you should still use something that
understands HTML:
>>> from BeautifulSoup import BeautifulSoup
>>> html = '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
>>> page = BeautifulSoup(html)
>>> page.link.get('href')
'mystylesheet.css'
--
- Justin
Justin Azoff wrote:
>>> from BeautifulSoup import BeautifulSoup
>>> html = '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
>>> page = BeautifulSoup(html)
>>> page.link.get('href')
'mystylesheet.css'
On second thought, you will probably want something like
>>> [link.get('href') for link in page.fetch('link', {'type': 'text/css'})]
['mystylesheet.css']
which will properly handle multiple link tags.
--
- Justin
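For readers who don't have BeautifulSoup installed, the same "let a parser deal with attribute order" idea can be sketched with the standard library's html.parser module. This is a swapped-in technique, not what was posted above, and the names LinkHrefCollector and css_hrefs are made up for the illustration.

```python
from html.parser import HTMLParser


class LinkHrefCollector(HTMLParser):
    """Collect href values from <link type="text/css"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, so the order in which
        # the attributes appear in the tag does not matter.
        attributes = dict(attrs)
        if tag == "link" and attributes.get("type") == "text/css":
            href = attributes.get("href")
            if href is not None:
                self.hrefs.append(href)


def css_hrefs(html):
    """Return the href of every text/css <link> tag found in html."""
    parser = LinkHrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

css_hrefs() takes the page source as a string and returns a list, so several link tags are handled in the same way as the list comprehension above.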
John Blogger wrote:
So what should I do to get the exact value (here, the value after
'href=') in every case, even if the
tags look like these? >>
<link rel="stylesheet" href="mystylesheet.css" type="text/css">
-OR-
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
-OR-
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
The following should do it:
expr = r'<link .*?href="(.*?)"'
or, if single quotes might have been used:
expr = r'''<link .*?href=["'](.*?)['"]'''
But like the others have said, Beautiful Soup is very good for things
like this.
Pyparsing is also good for recognizing basic HTML tags and their
attributes, regardless of the order of the attributes.
-- Paul
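As a quick check, the second expression can be tried against all three attribute orders with re.findall. This is only a sketch; the sample string and file names are made up.

```python
import re

# The quote-tolerant expression from the post above.
expr = r'''<link .*?href=["'](.*?)['"]'''

# Made-up sample: three attribute orders, one tag with single quotes.
sample = """
<link rel="stylesheet" href="one.css" type="text/css">
<link href="two.css" rel="stylesheet" type="text/css">
<link type="text/css" href='three.css' rel="stylesheet">
"""

hrefs = re.findall(expr, sample)
print(hrefs)  # ['one.css', 'two.css', 'three.css']
```

One caveat: the lazy .*? is not confined to the link tag, so a <link> without an href that is followed on the same line by another tag with an href will pick up that other tag's value.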
testText = """sldkjflsa;fa j
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
here it would be 'mystylesheet.css'. I used the following regex to get
this value(I dont know if it
I thought I was doing fine until I got stuck by this tag >>
<link rel="stylesheet" href="mystylesheet.css" type="text/css" : same
tag but with 'href=' part
tags are like these? >>
<link rel="stylesheet" href="mystylesheet.css" type="text/css">
-OR-
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
-OR-
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
"""
from pyparsing import makeHTMLTags, line

# makeHTMLTags returns (openTag, closeTag); only the opening tag is needed.
linkTag = makeHTMLTags("link")[0]
# scanString yields (tokens, start, end) for every match in the text.
for toks, s, e in linkTag.scanString(testText):
    print(toks.href)          # attribute values are available by name
    print(line(s, testText))  # the source line containing the match
    print()
Prints out:
mystylesheet.css
<link href="mystylesheet.css" rel="stylesheet" type="text/css">

mystylesheet.css
<link rel="stylesheet" href="mystylesheet.css" type="text/css" : same

mystylesheet.css
<link rel="stylesheet" href="mystylesheet.css" type="text/css">

mystylesheet.css
<link href="mystylesheet.css" rel="stylesheet" type="text/css">

mystylesheet.css
<link type="text/css" href="mystylesheet.css" rel="stylesheet">