Bytes IT Community

Regular expression for <p>, <ul> and <ol> tags

Hi,
I am parsing an HTML file that contains the following example code:
<div>
<p class="html_preformatted" awml:style="HTML Preformatted"
dir="ltr" style="text-align:left"><span
style="font-size:12pt;font-family:'Arial'" xml:lang="en-US"
lang="en-US">Normal Text Arial 12 Black before bullets.</span></p>
<ul>
<li class="html_preformatted" dir="ltr"
style="text-align:left">&nbsp;<span
style="font-size:12pt;font-family:'Arial'" xml:lang="en-US"
lang="en-US">Bullet1: If you want to convert bitmap images Single
Line.</span></li>

<li class="html_preformatted" dir="ltr"
style="text-align:left">&nbsp;<span
style="font-size:12pt;font-family:'Arial'" xml:lang="en-US"
lang="en-US">Bullet2: D you want to convert </span><span
style="font-weight:bold;font-size:13pt;font-family:'Times New Roman';color:#ff0000"
xml:lang="en-US" lang="en-US">Times New Roman Bold Red 13</span><span
style="font-size:12pt;font-family:'Arial'" xml:lang="en-US"
lang="en-US"> like BMP, JPG?</span></li>
<li class="html_preformatted" dir="ltr"
style="text-align:left">&nbsp;<span
style="font-weight:bold;font-size:12pt;font-family:'Arial'"
xml:lang="en-US" lang="en-US">Bullet3 bold:</span><span
style="font-size:12pt;font-family:'Arial'" xml:lang="en-US"
lang="en-US"> If you want to convert bitmap images like BMP, JPG</span></li>
<li class="html_preformatted" dir="ltr"
style="text-align:left">&nbsp;<span
style="font-weight:bold;font-size:14pt;font-family:'Arial'"
xml:lang="en-US" lang="en-US">Bullet4 bold 14: </span><span
style="font-size:14pt;font-family:'Arial'" xml:lang="en-US"
lang="en-US">If you want to convert bitmap images like BMP, JPG 2
lines.</span></li>
<li class="html_preformatted" dir="ltr"
style="text-align:left">&nbsp;<span
style="font-weight:bold;font-size:16pt;font-family:'Arial';color:#ff0000"
xml:lang="en-US" lang="en-US">Bullet4 bold 14 all Red: </span><span
style="font-size:16pt;font-family:'Arial';color:#ff0000"
xml:lang="en-US" lang="en-US">If you want to convert bitmap images
like BMP, JPG.</span></li>

<li class="html_preformatted" dir="ltr"
style="text-align:left">&nbsp;<span
style="font-weight:bold;font-size:14pt;font-family:'Arial'"
xml:lang="en-US" lang="en-US">Bullet4 bold 14 Black:
</span><span style="font-size:14pt;font-family:'Arial';color:#0000ff"
xml:lang="en-US" lang="en-US">Blue If you want to convert bitmap. </span><span
style="font-size:16pt;font-family:'Arial';color:#008000"
xml:lang="en-US" lang="en-US">Green 16 images like BMP, JPG.</span>
</li>
</ul>
<p class="html_preformatted" awml:style="HTML Preformatted"
dir="ltr" style="text-align:left"><span
style="font-size:14pt;font-family:'Arial';color:#ff0000"
xml:lang="en-US" lang="en-US">Normal Text Red Arial 14 after
bullets.</span></p>
<p class="html_preformatted" awml:style="HTML Preformatted"
dir="ltr" style="text-align:left;margin-left:0.2500in"><span
style="font-weight:bold;font-size:14pt;font-family:'Arial'"
xml:lang="en-US" lang="en-US">&nbsp;</span></p>
<p dir="ltr" style="text-align:left"></p>
<p></p>
</div>

I am trying to parse all the <p>, <ol> and <ul> tags but have not
succeeded yet.
I am trying the following regular expression (RE):
"(<[pP][^>]*>(.*)</[pP]>)|(<[oO][lL][^>]+>(.*)</[oO][lL]>)|(<[uU][lL][^>]+>(.*)</[uU][lL]>)"

I am using preg_match_all(). Remember, I am working in PHP.
If anyone can help me, I will be very grateful. I need a solution
urgently.
Aug 26 '08 #1
2 Replies


.oO(Shahid)
> I am parsing an HTML file that contains the following example code:
> <div>
> <p class="html_preformatted" awml:style="HTML Preformatted"
> dir="ltr" style="text-align:left"><span
> style="font-size:12pt;font-family:'Arial'" xml:lang="en-US"
> lang="en-US">Normal Text Arial 12 Black before bullets.</span></p>
> <ul>
> [...]
>
> I am trying to parse all the <p>, <ol> and <ul> tags but have not
> succeeded yet.
> I am trying the following regular expression (RE):
> "(<[pP][^>]*>(.*)</[pP]>)|(<[oO][lL][^>]+>(.*)</[oO][lL]>)|(<[uU][lL][^>]+>(.*)</[uU][lL]>)"
>
> I am using preg_match_all(). Remember, I am working in PHP.
> If anyone can help me, I will be very grateful. I need a solution
> urgently.

Why don't you use the DOM with an XPath expression?

Micha
Aug 26 '08 #2
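
A minimal sketch of the DOM plus XPath approach suggested in the reply above, assuming PHP's bundled DOM extension is available. The `$html` sample is a hypothetical, shortened stand-in for the markup in the question, and the `$found` array and its keys are illustrative names, not something from the thread.

```php
<?php
// Hypothetical sample input, shortened from the markup in the question.
$html = <<<'HTML'
<div>
<p class="html_preformatted" dir="ltr">Normal text before bullets.</p>
<ul>
<li>Bullet1</li>
<li>Bullet2</li>
</ul>
<p>Normal text after bullets.</p>
<p></p>
</div>
HTML;

$doc = new DOMDocument();

// Collect parser warnings internally instead of emitting them; real-world
// markup (for example the awml:style attribute) may trigger some.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select every <p>, <ul> and <ol> element anywhere in the document.
$nodes = $xpath->query('//p | //ul | //ol');

$found = [];
foreach ($nodes as $node) {
    $found[] = [
        'name'   => $node->nodeName,        // element name, e.g. "p" or "ul"
        'markup' => $doc->saveHTML($node),  // the element serialized with its tags
    ];
}

foreach ($found as $item) {
    echo $item['name'], ': ', $item['markup'], "\n";
}
```

Because the parser builds a tree, line breaks inside elements, attribute order and letter case of tag names do not need special handling, which is where a hand-written pattern tends to go wrong.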

Shahid wrote:
> Hi,
> I am parsing an HTML file that contains the following example code:
> [snip]
>
> I am trying to parse all the <p>, <ol> and <ul> tags but have not
> succeeded yet.
> I am trying the following regular expression (RE):
> "(<[pP][^>]*>(.*)</[pP]>)|(<[oO][lL][^>]+>(.*)</[oO][lL]>)|(<[uU][lL][^>]+>(.*)</[uU][lL]>)"
>
> I am using preg_match_all(). Remember, I am working in PHP.
> If anyone can help me, I will be very grateful. I need a solution
> urgently.

Have you bothered checking php.net's docs? Their page for
preg_match_all has an example regex doing what you want.

--
Curtis
Aug 26 '08 #3
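
For readers who still want a regular expression, here is a hedged sketch in the spirit of that suggestion; it is not a verbatim copy of the php.net example. The pattern in the question appears to fail for several reasons: as posted it has no explicit delimiters, `.` does not match line breaks by default, `.*` is greedy, and `[^>]+` requires at least one character after the tag name, so a bare `<ul>` never matches. The `$html` sample and the `$pattern` variable are hypothetical.

```php
<?php
// Hypothetical sample input with a multi-line <p>, a bare <ul>,
// an upper-case <OL> and an empty <p>.
$html = <<<'HTML'
<div>
<p class="a">First
paragraph.</p>
<ul>
<li>Bullet1</li>
</ul>
<OL><li>One</li></OL>
<p></p>
</div>
HTML;

// ~ ... ~   explicit delimiters, so "/" needs no escaping
// i         case-insensitive, replaces [pP], [oO][lL], [uU][lL]
// s         lets "." match newlines, so elements may span lines
// \b        stops "p" from also matching <pre> or <param>
// [^>]*     zero or more attribute characters (a bare <ul> is fine)
// .*?       lazy, stops at the first matching closing tag
// \1        backreference: the closing tag must repeat the opening name
$pattern = '~<(p|ul|ol)\b[^>]*>(.*?)</\1\s*>~is';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] is the whole element, $m[1] the tag name, $m[2] the inner markup.
    echo strtolower($m[1]), ' => ', trim($m[2]), "\n";
}
```

This sketch stops at the first closing tag of the same name, so a list nested inside another list of the same type would be cut short; the DOM approach does not have that limitation.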
