Regex for HTML

Hi all.
I have this database table (inherited from a legacy application) that
contains some information that I want to extract.
Basically, in one of the tables, there's a column containing a description
that starts with a NUMBER, but can be preceded by some raw HTML elements.
Examples:
ex1:
<p>12 this is the first item ....
ex2:
<p>12. this is the first item ....
ex3:
<span id="my id" style="width:3" ><p>12. this is the first item ....
ex4:
12. this is the first item ....

I'm trying to extract the number ("12" in all of the examples above).

The closest I got was with the following regular expression pattern:
string pattern = @"(<\w*>)*(?<digit>(\d+)).+";

It didn't put the number in the right match group (= digit). I'm
still new to Regex.
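
For reference, here is a minimal sketch that runs the pattern above against the sample descriptions so the captured group can be inspected. The class name `OriginalPatternDemo` and the method `ExtractDigit` are made up for illustration; only the pattern itself comes from the post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // The pattern from the post, unchanged.
    private static readonly Regex Pattern =
        new Regex(@"(<\w*>)*(?<digit>(\d+)).+");

    // Returns the text captured by the "digit" group,
    // or an empty string when the pattern does not match.
    public static string ExtractDigit(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups["digit"].Value : "";
    }
}
```

Calling `ExtractDigit` on ex1, ex2 and ex4 gives "12", but on ex3 it gives "3". `<\w*>` only matches a tag made of word characters, so `<span id="my id" style="width:3" >` is not consumed as a tag, and because the pattern is not anchored the engine moves forward to the first digit it can find, which is the 3 in `width:3`.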

Has anybody come across a similar situation?

Thanks a bunch

TJ !
Nov 13 '05 #1

hmm I'd try the following .NET regular expression:

"(<[^>]+>)*(?<digit>\d+)[^\d]"
That is: zero or more tags, where a tag is a '<', followed by at least
one character that is not a '>', followed by a '>'.

Then comes the "digit" group, which captures all the digits (at least one)
up to but not including the first non-digit. The trailing [^\d] requires
one non-digit character after the number, so the pattern will not match if
the number is the last thing on the line. It will work as long as some
character always follows the number.
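
A minimal sketch of how this could be wired up, assuming each description is available as a plain string. The class `LeadingNumber` and its method names are hypothetical. The second pattern is not from this thread: it is one possible variant that anchors at the start of the string (the question says the description starts with the number), tolerates whitespace around tags, and drops the trailing [^\d], since a greedy \d+ already takes every digit it can.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingNumber
{
    // The pattern suggested above: tags, digits, then one required non-digit.
    private static readonly Regex Suggested =
        new Regex(@"(<[^>]+>)*(?<digit>\d+)[^\d]");

    // Assumed variant (not from the thread): anchored at the start of the
    // string, allows whitespace around tags, requires nothing after the digits.
    private static readonly Regex Anchored =
        new Regex(@"^\s*(<[^>]+>\s*)*(?<digit>\d+)");

    public static string WithSuggested(string input) => Capture(Suggested, input);

    public static string WithAnchored(string input) => Capture(Anchored, input);

    // Returns the "digit" group, or an empty string when there is no match.
    private static string Capture(Regex regex, string input)
    {
        Match m = regex.Match(input);
        return m.Success ? m.Groups["digit"].Value : "";
    }
}
```

Both methods return "12" for the four examples in the question. For an input that ends with the number, such as `<p>12`, `WithSuggested` returns an empty string (no match) while `WithAnchored` still returns "12". The anchored variant also refuses to pick up a number that appears later in the text, which may or may not be what the data needs.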

Mike
Nov 15 '05 #2
