On Thursday 26 June 2008 15:53:06, oyster wrote:
> that is, there is no TABLE tag between a TABLE, for example
> <table >something without table tag</table>
> what is the RE pattern? thanks
> the following is not right
> <table.*?>[^table]*?</table>
The construct [abc] does not match a whole word but only one char, so
[^table] means "any char which is not t, a, b, l or e".
Anyway the inside table word won't match your pattern, as there are '<'
and '>' in it, and these chars have to be escaped when used as simple text.
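To see the character-class point concretely, here is a minimal sketch using Python's `re` module (the variable names are made up for illustration). It also checks one detail of the escaping remark: as far as Python's `re` is concerned, `<` and `>` are ordinary characters outside of group syntax, so they match literally without a backslash.

```python
import re

# [^table] is a negated character class: ONE character that is not t, a, b, l or e.
negated_class = re.compile(r'[^table]')
single_ok = negated_class.fullmatch('x') is not None    # 'x' is outside the set
single_rejected = negated_class.fullmatch('t') is None  # 't' is in the excluded set

# oyster's pattern, copied from the question.
oyster_pattern = re.compile(r'<table.*?>[^table]*?</table>')

# Content made only of characters outside {t, a, b, l, e} is accepted...
matches_digits = oyster_pattern.fullmatch('<table>123</table>') is not None

# ...but plain text containing an 'e' is rejected, although it holds no tag at all.
matches_text = oyster_pattern.fullmatch('<table>some text</table>') is not None

# '<' and '>' need no escaping to match themselves.
angle_literal = re.fullmatch(r'<b>', '<b>') is not None
```

So the class rejects far more than nested tables: any t, a, b, l or e in the cell text is enough to make the match fail.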
So this should work:
re.compile(r'<table(|[ ].*)>.*</table>')
The (|[ ].*) group right after "table" is there to avoid matching a tag name
that merely starts with table (like <table_ext>).
--
Cédric Lucantis
In article <ma*************************************@python.org>,
Cédric Lucantis <om**@no-log.org> wrote:
> So this should work:
> re.compile(r'<table(|[ ].*)>.*</table>')
> (the (|[ ].*) group is there to avoid matching a tag name starting with
> table, like <table_ext>)
Doesn't work - for example it matches '<table></table><table></table>'
(and in fact if the html contains any number of tables it's going
to match the string starting at the start of the first table and
ending at the end of the last one.)
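A quick way to check this objection, as a sketch with Python's `re` (only the pattern and the test string come from the thread; the variable names are illustrative):

```python
import re

# The pattern proposed above. Both '.*' parts are greedy.
pattern = re.compile(r'<table(|[ ].*)>.*</table>')
html = '<table></table><table></table>'

m = pattern.match(html)

# Collect every non-overlapping match as full text (group 0), not just group 1.
spans = [found.group(0) for found in pattern.finditer(html)]

# The greedy '.*' runs to the end and backtracks only as far as the LAST
# '</table>', so the two tables come back as a single match.
print(m.group(0) == html)  # True
print(len(spans))          # 1
```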
--
David C. Ullrich
In article
<62**********************************@w4g2000prd.googlegroups.com>,
Jonathan Gardner <jg******@jonathangardner.net> wrote:
> On Jun 26, 3:22 pm, MRAB <goo...@mrabarnett.plus.com> wrote:
> > Try something like:
> > re.compile(r'<table\b.*?>.*?</table>', re.DOTALL)
> So you would pick up strings like "<table><tr><td><table><tr><td>foo</td></tr></table>"?
> I doubt that is what oyster wants.
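The following sketch reproduces that behaviour and then tries one possible variant. The variant is an assumption of this note, not something proposed in the thread: it uses a negative lookahead so the body of the match can never step over another opening `<table`. The test string is the one quoted above, completed with the closing tags of the outer table.

```python
import re

nested = "<table><tr><td><table><tr><td>foo</td></tr></table></td></tr></table>"

# MRAB's non-greedy pattern: starts at the OUTER <table>, stops at the first </table>.
lazy = re.compile(r'<table\b.*?>.*?</table>', re.DOTALL)
lazy_match = lazy.search(nested).group(0)
# lazy_match == "<table><tr><td><table><tr><td>foo</td></tr></table>"

# Hypothetical variant: (?!<table\b) must hold before every consumed character,
# so a match can only be a table with no opening <table> tag inside it.
innermost = re.compile(
    r'<table\b[^>]*>(?:(?!<table\b).)*?</table>',
    re.DOTALL | re.IGNORECASE,
)
innermost_matches = innermost.findall(nested)
# innermost_matches == ["<table><tr><td>foo</td></tr></table>"]
```

This only finds tables that contain no other table. It does not pair up nested open and close tags, and it assumes no `>` appears inside an attribute value.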
I asked a question recently - nobody answered, I think
because they assumed it was just a rhetorical question:
(i) It's true, isn't it, that it's impossible for the
formal CS notion of "regular expression" to correctly
parse nested open/close delimiters?
(ii) The regexes in languages like Python and Perl include
features that are not part of the formal CS notion of
"regular expression". Do they include something that
does allow parsing nested delimiters properly?
--
David C. Ullrich
On Jun 27, 1:32 pm, "David C. Ullrich" <dullr...@sprynet.com> wrote:
> (i) It's true, isn't it, that it's impossible for the
> formal CS notion of "regular expression" to correctly
> parse nested open/close delimiters?
Yes. For the proof, you want to look at the pumping lemma found in
your favorite Theory of Computation textbook.
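For readers without a textbook at hand, the usual pumping-lemma argument can be sketched as follows. This is the standard textbook argument for balanced parentheses, written out here as an illustration rather than quoted from any post in the thread.

```latex
% Sketch: the language of balanced parentheses is not regular.
\begin{align*}
&\text{Let } L \text{ be the set of balanced strings over } \{(,\,)\},
  \text{ and suppose } L \text{ is regular with pumping length } p.\\
&\text{Take } w = (^{p}\,)^{p} \in L, \text{ so } |w| = 2p \ge p.\\
&\text{Any split } w = xyz \text{ with } |xy| \le p \text{ and } |y| \ge 1
  \text{ forces } y = (^{k} \text{ for some } k \ge 1.\\
&\text{Then } xy^{2}z = (^{p+k}\,)^{p} \notin L,
  \text{ contradicting the pumping lemma, so } L \text{ is not regular.}
\end{align*}
```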
> (ii) The regexes in languages like Python and Perl include
> features that are not part of the formal CS notion of
> "regular expression". Do they include something that
> does allow parsing nested delimiters properly?
So, I think most of the extensions fall into syntactic sugar
(certainly all the character classes \b \s \w, etc). The ability to
look at input without consuming it is more than syntactic sugar, but
my intuition is that it could be pretty easily modeled by a
nondeterministic finite state machine, which is of equivalent power to
REs. The only thing I can really think of that is completely non-regular
is the \1 \2, etc syntax to match previously matched strings
exactly. But since you can't do an arbitrary number of them, I don't
think it's actually context free. (I'm not prepared to give a proof
either way). Needless to say that even if you could, it would be
highly impractical to match parentheses using those.
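A small illustration of why backreferences go beyond regular languages, as a sketch with Python's `re` (the pattern and names are chosen for this note). The pattern accepts exactly the strings a^n b a^n with n >= 1, a standard example of a non-regular language.

```python
import re

# (a+) captures a run of a's; \1 then demands exactly the same run after the 'b'.
same_run_twice = re.compile(r'(a+)b\1')

balanced = same_run_twice.fullmatch('aaabaaa') is not None    # 3 a's, b, 3 a's
unbalanced = same_run_twice.fullmatch('aaabaa') is not None   # 3 a's, b, 2 a's

print(balanced)    # True
print(unbalanced)  # False
```

The pattern only compares one captured run with one repeat of it. It has no way to track an arbitrary nesting depth, which is what matching nested delimiters would need.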
So, yeah, to match arbitrary nested delimiters, you need a real
context free parser.
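For the original question, one parser-based route is the standard library's `html.parser`. The sketch below is only one possible approach: the class name `InnermostTableFinder` and the helper `innermost_table_texts` are hypothetical names invented here. It keeps a stack of open tables and reports the text of each table that never saw a nested `<table>`.

```python
from html.parser import HTMLParser


class InnermostTableFinder(HTMLParser):
    """Collect the text of every <table> that contains no nested <table>."""

    def __init__(self):
        super().__init__()
        # One entry per currently open table: [text_chunks, has_nested_table]
        self.stack = []
        self.innermost = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.stack:
                # The enclosing table now contains a table, so it is not innermost.
                self.stack[-1][1] = True
            self.stack.append([[], False])

    def handle_endtag(self, tag):
        if tag == "table" and self.stack:
            chunks, has_nested = self.stack.pop()
            if not has_nested:
                self.innermost.append("".join(chunks))

    def handle_data(self, data):
        # Text belongs to the innermost table that is currently open.
        if self.stack:
            self.stack[-1][0].append(data)


def innermost_table_texts(html):
    """Return the text content of all tables that have no table inside them."""
    finder = InnermostTableFinder()
    finder.feed(html)
    finder.close()
    return finder.innermost


sample = "<table><tr><td><table><tr><td>foo</td></tr></table></td></tr></table>"
print(innermost_table_texts(sample))  # ['foo']
```

The explicit stack is what a regular expression lacks: it records how deep the nesting is at any point, and `HTMLParser` lowercases tag names, so `<TABLE>` is handled as well.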
-Dan
In article
<50**********************************@56g2000hsm.googlegroups.com>,
Dan <th********@gmail.com> wrote:
> > (i) It's true, isn't it, that it's impossible for the
> > formal CS notion of "regular expression" to correctly
> > parse nested open/close delimiters?
> Yes. For the proof, you want to look at the pumping lemma found in
> your favorite Theory of Computation textbook.
Ah, thanks. Don't have a favorite text, not having any at all.
But wikipedia works - what I found at http://en.wikipedia.org/wiki/Pumping...ular_languages
was pretty clear. (Yes, it's exactly that \1, \2 stuff that
convinced me I really don't understand what one can do with
a Python regex.)
--
David C. Ullrich
On Jun 27, 10:32 am, "David C. Ullrich" <dullr...@sprynet.com> wrote:
> (ii) The regexes in languages like Python and Perl include
> features that are not part of the formal CS notion of
> "regular expression". Do they include something that
> does allow parsing nested delimiters properly?
In Perl, there are some pretty wild extensions to the regex syntax,
features that make it much more than a regular expression engine.
Yes, it is possible to match parentheses and other nested structures
(such as HTML), and the regex to do so isn't incredibly difficult.
Note that Python doesn't support this extension.
See http://www.perl.com/pub/a/2003/08/21/perlcookbook.html
In article
<87**********************************@p39g2000prm.googlegroups.com>,
Jonathan Gardner <jg******@jonathangardner.net> wrote:
> In Perl, there are some pretty wild extensions to the regex syntax,
> features that make it much more than a regular expression engine.
> Yes, it is possible to match parentheses and other nested structures
> (such as HTML), and the regex to do so isn't incredibly difficult.
> Note that Python doesn't support this extension.
Huh. My evidently misinformed impression was that the regexes
in P and P were essentially equivalent. (I hope nobody takes
that as a complaint...)
--
David C. Ullrich