About Regular Expressions

Hi,

I am learning regular expressions and am currently trying to capture
information from a web page.
I wrote the following code to capture the ID as well as the title:

Dim regex = New Regex( _
    "viewtopic.php\?t=(?<ID>\d+)""\sclass=""topictitle"">(?<Title>.*)\</a>", _
    RegexOptions.IgnoreCase _
    Or RegexOptions.Multiline _
    Or RegexOptions.Compiled _
)

It works fine.
Now I am trying to get the post date using the following regex:
"s=""postdetails"">(?<Date>.*)\<br />"

That works fine too, but when I combine the two regexes, that is,

"viewtopic.php\?t=(?<ID>\d+)""\sclass=""topictitle"">(?<Title>.*)\</a>.*s=""postdetails"">(?<Date>.*)\<br />"

it does not capture the correct data. Does anyone know how I can fix
this? What is wrong with my regex?

Thanks a lot,
Regards,
Norton
(Here is the source text the regex runs against. The ID I want to capture
is 245090, the title is "Can someone help me import a database", and the
date is Wed Dec 08, 2004 6:31 pm.)

<tr>
<td class="row1" align="center" valign="middle" width="20"><img
src="templates/subSilver/images/folder.gif" width="19" height="18" alt="No
new posts" title="No new posts" /></td>
<td class="row1" width="100%"><span class="topictitle"><a
href="viewtopic.php?t=245090" class="topictitle">Can someone help me import
a database</a></span><span class="gensmall"><br />
</span></td>
<td class="row2" align="center" valign="middle"><span
class="postdetails">2</span></td>
<td class="row3" align="center" valign="middle"><span class="name"><a
href="profile.php?mode=viewprofile&amp;u=154279">savethesquirrels</a></span>
</td>
<td class="row2" align="center" valign="middle"><span
class="postdetails">126</span></td>
<td class="row3Right" align="center" valign="middle"
nowrap="nowrap"><span class="postdetails">Wed Dec 08, 2004 6:31 pm<br /><a
href="profile.php?mode=viewprofile&amp;u=154279">savethesquirrels</a> <a
href="viewtopic.php?p=1344942#1344942"><img
src="templates/subSilver/images/icon_latest_reply.gif" alt="View latest
post" title="View latest post" border="0" /></a></span></td>
</tr>
Nov 21 '05 #1
Hi Norton,

I'll look into the issue further. I don't yet know what is wrong, but I
can reproduce your issue.
So far, I can only advise building more specific regular expressions that
minimize the use of ".*". I suspect this is the root of the problem. As
soon as I have more information, I'll get back to you.

--
Regards,
Victor Urnyshev

This posting is provided "AS IS" with no warranties, and confers no rights.
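
A minimal sketch of the advice above, not a confirmed fix for the behaviour reported in this thread. It replaces each ".*" with a narrower construct: "[^<]*" and "[^<]+" for the captured text, and a lazy ".*?" for the gap between the link and the date. It also assumes RegexOptions.Singleline is wanted, because in .NET "." does not match a newline unless that option is set (RegexOptions.Multiline only changes "^" and "$"), and the sample row spans several lines. The module name, the variable names, and the abbreviated sample string are illustrative assumptions, not part of the original post.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RowScrapeSketch
    Sub Main()
        ' Abbreviated copy of the sample row from the question (assumed input).
        ' vbLf marks the places where the posted HTML wraps onto a new line.
        Dim html As String = _
            "<span class=""topictitle""><a" & vbLf & _
            "href=""viewtopic.php?t=245090"" class=""topictitle"">Can someone help me import" & vbLf & _
            "a database</a></span><span class=""gensmall""><br />" & vbLf & _
            "<span class=""postdetails"">2</span>" & vbLf & _
            "<span class=""postdetails"">126</span>" & vbLf & _
            "<span class=""postdetails"">Wed Dec 08, 2004 6:31 pm<br /><a" & vbLf & _
            "href=""profile.php?mode=viewprofile&amp;u=154279"">savethesquirrels</a>"

        ' [^<]* and [^<]+ stop at the next tag, so a capture cannot run past it.
        ' .*? is lazy: it tries the nearest "postdetails" span first and only
        ' moves on when that span's text is not directly followed by <br />.
        Dim pattern As String = _
            "viewtopic\.php\?t=(?<ID>\d+)""\sclass=""topictitle"">(?<Title>[^<]*)</a>" & _
            ".*?class=""postdetails"">(?<Date>[^<]+)<br />"

        ' Singleline lets "." in the lazy gap cross line breaks.
        Dim rx As New Regex(pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim m As Match = rx.Match(html)

        If m.Success Then
            ' The title wraps across two lines in the sample; collapse the whitespace.
            Dim title As String = Regex.Replace(m.Groups("Title").Value, "\s+", " ")
            Console.WriteLine(m.Groups("ID").Value)    ' 245090
            Console.WriteLine(title)                   ' Can someone help me import a database
            Console.WriteLine(m.Groups("Date").Value)  ' Wed Dec 08, 2004 6:31 pm
        Else
            Console.WriteLine("No match")
        End If
    End Sub
End Module
```

On a page with many rows, rx.Matches(html) would return one match per row. A row with no date could let the lazy gap run on into the following row, so the gap may need tightening for real pages.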

--------------------
| From: "norton" <no********@hotmail.com>
| Subject: About Regular Expressions
| Date: Fri, 10 Dec 2004 01:15:23 +0800
Nov 21 '05 #2
Hi Norton,

This seems to be a bug in the Regex implementation. I have passed the bug
to our developers. I cannot promise anything at this point, but we will try
to address it in a future release.

--
Regards,
Victor Urnyshev

This posting is provided "AS IS" with no warranties, and confers no rights.

--------------------
| From: vi*****@online.microsoft.com (Victor Urnyshev [MSFT])
| Organization: Microsoft
| Date: Mon, 13 Dec 2004 14:08:46 GMT
| Subject: RE: About Regular Expressions
Nov 21 '05 #3
