About Regular Expressions

Hi,

I am learning regular expressions and am currently trying to capture
information from a web page.
I wrote the following code to capture the ID as well as the title:

Dim regex = New Regex( _
    "viewtopic.php\?t=(?<ID>\d+)""\sclass=""topictitle"">(?<Title>.*)\</a>", _
    RegexOptions.IgnoreCase _
    Or RegexOptions.Multiline _
    Or RegexOptions.Compiled _
)

It works fine.
Now I am trying to get the post date by using the following regex:
"s=""postdetails"">(?<Date>.*)\<br />"

It works fine too, but when I combine the two regexes (that is,
"viewtopic.php\?t=(?<ID>\d+)""\sclass=""topictitle"">(?<Title>.*)\</a>.*s=""postdetails"">(?<Date>.*)\<br />"),
it does not return the correct data. Does anyone know how I can fix this?
What is wrong with my regex?

Thanks a lot.
Regards,
Norton
(Here is the source text the regex runs against. The ID I want to capture
is 245090, the title is "Can someone help me import a database" and the date is
Wed Dec 08, 2004 6:31 pm.)

<tr>
<td class="row1" align="center" valign="middle" width="20"><img
src="templates/subSilver/images/folder.gif" width="19" height="18" alt="No
new posts" title="No new posts" /></td>
<td class="row1" width="100%"><span class="topictitle"><a
href="viewtopic.php?t=245090" class="topictitle">Can someone help me import
a database</a></span><span class="gensmall"><br />
</span></td>
<td class="row2" align="center" valign="middle"><span
class="postdetails">2</span></td>
<td class="row3" align="center" valign="middle"><span class="name"><a
href="profile.php?mode=viewprofile&amp;u=154279">savethesquirrels</a></span>
</td>
<td class="row2" align="center" valign="middle"><span
class="postdetails">126</span></td>
<td class="row3Right" align="center" valign="middle"
nowrap="nowrap"><span class="postdetails">Wed Dec 08, 2004 6:31 pm<br /><a
href="profile.php?mode=viewprofile&amp;u=154279">savethesquirrels</a> <a
href="viewtopic.php?p=1344942#1344942"><img
src="templates/subSilver/images/icon_latest_reply.gif" alt="View latest
post" title="View latest post" border="0" /></a></span></td>
</tr>
Nov 21 '05 #1
Hi Norton,

I'll look into the issue further. I don't yet have an answer as to what is
wrong, but I can reproduce your issue.
So far I can only advise building more specific regular expressions and
minimizing the use of ".*". I suspect this is the root of the problem. As soon
as I get more information, I'll get back to you.

--
Regards,
Victor Urnyshev

This posting is provided "AS IS" with no warranties, and confers no rights.
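
A minimal sketch of the "more specific, less .*" advice above, in VB.NET to match the thread. It is an illustration only, not Victor's fix, and the diagnosis behind it is an assumption the thread does not confirm: in .NET, "." does not match a newline unless RegexOptions.Singleline is set (RegexOptions.Multiline only changes ^ and $), so the ".*" joining the two halves of the combined pattern cannot reach a postdetails span on a later line. The sketch sets Singleline, makes the joining ".*" lazy, and replaces each capturing ".*" with "[^<]*" so a capture stops at the next tag. The module name TopicScraper and the trimmed html sample are made up for the example, and the title is assumed to sit on one line in the real page.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module TopicScraper
    Sub Main()
        ' Trimmed, hypothetical copy of the sample row from the question.
        ' The cells are kept on separate lines, as in the posted source.
        Dim html As String = String.Join(vbLf, New String() { _
            "<td class=""row1"" width=""100%""><span class=""topictitle""><a", _
            "href=""viewtopic.php?t=245090"" class=""topictitle"">Can someone help me import a database</a></span>", _
            "<span class=""gensmall""><br /></span></td>", _
            "<td class=""row2""><span class=""postdetails"">2</span></td>", _
            "<td class=""row2""><span class=""postdetails"">126</span></td>", _
            "<td class=""row3Right""><span class=""postdetails"">Wed Dec 08, 2004 6:31 pm<br /><a", _
            "href=""profile.php?mode=viewprofile&amp;u=154279"">savethesquirrels</a></span></td>"})

        ' [^<]* stops each capture at the next tag instead of running greedily.
        ' .*? is lazy, and Singleline lets it cross the line breaks between cells.
        Dim pattern As String = _
            "viewtopic\.php\?t=(?<ID>\d+)""\s+class=""topictitle"">(?<Title>[^<]*)</a>" & _
            ".*?class=""postdetails"">(?<Date>[^<]*)<br\s*/>"

        Dim rx As New Regex(pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim m As Match = rx.Match(html)

        If m.Success Then
            Console.WriteLine("ID:    " & m.Groups("ID").Value)
            Console.WriteLine("Title: " & m.Groups("Title").Value)
            Console.WriteLine("Date:  " & m.Groups("Date").Value)
        Else
            Console.WriteLine("No match")
        End If
    End Sub
End Module
```

On the trimmed sample this should print the ID 245090, the title "Can someone help me import a database" and the date "Wed Dec 08, 2004 6:31 pm". The postdetails cells holding 2 and 126 are skipped because "[^<]*" must be followed directly by "<br />". On a full page, a row without a dated cell could let the lazy ".*?" run on into the next row, so matching one <tr> block at a time is the safer design.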

Nov 21 '05 #2
Hi Norton,

This seems to be a bug in the Regex implementation. It has been passed to our
developers. I cannot promise anything at this point, but we will try to address
it in a future release.

--
Regards,
Victor Urnyshev

This posting is provided "AS IS" with no warranties, and confers no rights.

Nov 21 '05 #3
