Bytes IT Community

Regular Expression HELP!

jn
I'm stripping out the attributes in <TD> tags...but I want to strip out
everything BUT the COLSPAN attribute.

The following strips out all attributes. What do I do if I want to keep a
certain one?

eregi_replace("<TD[^>]*>","<TD>", $string);
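A note for readers today: the POSIX regex functions, eregi_replace() included, were deprecated in PHP 5.3.0 and removed in PHP 7.0.0. Here is a minimal sketch of the same strip-everything replacement using the PCRE function preg_replace(). It assumes PHP 7 or later, and the wrapper name strip_td_attributes() is made up for illustration:

```php
<?php
// Sketch: a preg_replace() version of the eregi_replace() call above.
// The /i modifier supplies the case-insensitivity that eregi_replace() had.
// strip_td_attributes() is a hypothetical name used only for this example.
function strip_td_attributes(string $html): string
{
    // Every <td ...> opening tag, attributes and all, becomes a bare <TD>.
    // Closing </td> tags are not matched because of the "/" after "<".
    return preg_replace('/<td[^>]*>/i', '<TD>', $html);
}

echo strip_td_attributes('<td class="x" colspan="2">A</td>'), "\n"; // <TD>A</td>
```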

I suck at regular expressions. I need some help.

Thanks
Jul 17 '05 #1
9 Replies


jn wrote:
> I'm stripping out the attributes in <TD> tags...but I want to strip out
> everything BUT the COLSPAN attribute.
>
> The following strips out all attributes. What do I do if I want to keep a
> certain one?
>
> eregi_replace("<TD[^>]*>","<TD>", $string);


This is what I had used one time long ago:

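// Group 2: attribute text of each <td ...> tag ([^>]* avoids a nested quantifier).
preg_match_all('/<td(\s([^>]*))>/i',$subject,$attributes);
// Split the first tag's attributes into name=value pairs (quoted or unquoted values).
preg_match_all('/[a-z]+\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i',$attributes[2][0],$attributes);
$attributes=$attributes[0];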

I'm sure someone will have something better though...
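You can take the extraction idea one step further. Once each attribute sits in an array keyed by name, choosing what to keep is plain PHP rather than regex work. A minimal sketch, assuming PHP 7 or later. parse_attributes() is a hypothetical helper, values are kept with their original quotes, and the value pattern also accepts unquoted values such as colspan=3:

```php
<?php
// Hypothetical helper: parse one tag's attribute text into name => value.
function parse_attributes(string $attrText): array
{
    // A value may be "double-quoted", 'single-quoted' or unquoted.
    preg_match_all(
        '/([a-z][a-z0-9-]*)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i',
        $attrText,
        $pairs,
        PREG_SET_ORDER
    );
    $result = [];
    foreach ($pairs as $pair) {
        // Lowercase the name; the value keeps whatever quotes it had.
        $result[strtolower($pair[1])] = $pair[2];
    }
    return $result;
}

$attrs = parse_attributes('a="b" colspan=3 x=\'y\'');
// Rebuild the tag, keeping only colspan if it was present.
$tag = isset($attrs['colspan']) ? '<td colspan=' . $attrs['colspan'] . '>' : '<td>';
echo $tag, "\n"; // <td colspan=3>
```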

--
Justin Koivisto - sp**@koivi.com
PHP POSTERS: Please use comp.lang.php for PHP related questions,
alt.php* groups are not recommended.

Jul 17 '05 #2

jn wrote:
> I'm stripping out the attributes in <TD> tags...but I want to strip out
> everything BUT the COLSPAN attribute.
>
> The following strips out all attributes. What do I do if I want to keep a
> certain one?
>
> eregi_replace("<TD[^>]*>","<TD>", $string);


preg_replace is faster and more powerful

I tried this:
<?php

$data = '===<td a="b" colspan="3" x="y">===';

$regex = '<td([^>]*)( colspan=\S+)([^>]*)>';

$newdata = preg_replace("/$regex/i", '<td$2>', $data);

echo $newdata, "\n";

?>
The output was:
===<td colspan="3">===
HTH
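There is one caveat with this pattern. \S matches any non-whitespace character, including ">". As a result, " colspan=\S+" can run past the end of the tag when colspan is the last attribute and the next markup follows with no whitespace in between. The small comparison below uses a hypothetical wrapper, keep_colspan(), to contrast \S+ with [^\s>]+, which stops at whitespace or at the closing ">". It assumes PHP 7 or later:

```php
<?php
// Hypothetical wrapper around the regex above, with the colspan value
// pattern passed in so the two variants can be compared.
function keep_colspan(string $html, string $valuePattern): string
{
    $regex = '<td([^>]*)( colspan=' . $valuePattern . ')([^>]*)>';
    return preg_replace("/$regex/i", '<td$2>', $html);
}

// colspan is the last attribute and <a ...> follows with no whitespace.
$html = '<td colspan=3><a href="x">link</a></td>';

// \S+ swallows "3><a", so the href of the <a> tag gets stripped.
echo keep_colspan($html, '\S+'), "\n";     // <td colspan=3><a>link</a></td>
// [^\s>]+ stops at ">", leaving the rest of the markup alone.
echo keep_colspan($html, '[^\s>]+'), "\n"; // <td colspan=3><a href="x">link</a></td>
```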
--
I have a spam filter working.
To mail me include "urkxvq" (with or without the quotes)
in the subject line, or your mail will be ruthlessly discarded.
Jul 17 '05 #3

jn
"Pedro" <he****@hotpop.com> wrote in message
news:bo*************@ID-203069.news.uni-berlin.de...
> I tried this:
> $regex = '<td([^>]*)( colspan=\S+)([^>]*)>';
> $newdata = preg_replace("/$regex/i", '<td$2>', $data);
> The output was:
> ===<td colspan="3">===


That does indeed strip out everything but the colspan! But how do I strip
out everything in TD tags that don't have the colspan at the same time?
Maybe a pattern that matches TD tags if they don't contain colspan?

I wish I knew this stuff...it's very useful.
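One way to get both results is to use two passes. The first pass is Pedro's replacement for tags that contain colspan. The second is a replacement for <td> tags whose attribute text contains no "colspan=" at all. The "does not contain" test is a negative lookahead, (?!...), which succeeds only where its contents do not match. A minimal sketch, assuming PHP 7 or later and that colspan is preceded by a single space as in Pedro's pattern. clean_td_tags() is a hypothetical name:

```php
<?php
// Hypothetical two-pass cleaner for <td> tags.
function clean_td_tags(string $html): string
{
    // Pass 1: tags that contain colspan keep only that attribute.
    // [^\s>]+ (instead of \S+) stops the value at whitespace or ">".
    $html = preg_replace('/<td([^>]*)( colspan=[^\s>]+)([^>]*)>/i', '<td$2>', $html);
    // Pass 2: (?![^>]*colspan=) fails if "colspan=" appears before the
    // closing ">", so only tags without colspan are reduced to <td>.
    return preg_replace('/<td(?![^>]*colspan=)[^>]*>/i', '<td>', $html);
}

echo clean_td_tags('<td a="b" colspan="3" x="y">A</td><td a="b" rowspan="2">B</td>'), "\n";
// <td colspan="3">A</td><td>B</td>
```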
Jul 17 '05 #4

jn

"Justin Koivisto" <sp**@koivi.com> wrote in message
news:Qg****************@news7.onvoy.net...
> I'm sure someone will have something better though...


Thanks for the reply. That's pretty scary looking :)
Jul 17 '05 #5

jn wrote:
> > $regex = '<td([^>]*)( colspan=\S+)([^>]*)>';
> >
> > $newdata = preg_replace("/$regex/i", '<td$2>', $data);
>
> That does indeed strip out everything but the colspan! But how do I strip
> out everything in TD tags that don't have the colspan at the same time?
> Maybe a pattern that matches TD tags if they don't contain colspan?

The above regex is very elegant. If you add a ? after the second group, it
will make matching the colspan optional. This can be problematic in terms
of what gets assigned to $1 and $2, so you can add ?: to the other groups
to make them non-capturing, and then use $1, which should be either the
colspan attribute or empty (but I haven't tested it, so I don't guarantee
it).
So the new regex would be:
$regex = '<td(?:[^>]*)( colspan=\S+)?(?:[^>]*)>';
$newdata = preg_replace("/$regex/i", '<td$1>', $data);

Another approach is to use preg_replace_callback:
http://us4.php.net/manual/en/functio...e-callback.php
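With preg_replace_callback(), the regex only has to find each <td ...> tag. A PHP callback then decides what to keep, so one pattern no longer has to cover "colspan present" and "colspan absent" at the same time. A minimal sketch, assuming PHP 7 or later. keep_only_colspan() is a hypothetical name, and the rebuilt tag is always a lowercase <td>:

```php
<?php
// Hypothetical callback-based cleaner: the regex finds <td ...> tags,
// the closure rebuilds each one.
function keep_only_colspan(string $html): string
{
    return preg_replace_callback(
        '/<td(\s[^>]*)?>/i',
        function (array $m): string {
            // Group 1 is absent for a bare <td>, so default to ''.
            $attrText = $m[1] ?? '';
            // colspan with a double-quoted, single-quoted or bare value.
            if (preg_match('/\bcolspan\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i', $attrText, $c)) {
                return '<td colspan=' . $c[1] . '>';
            }
            return '<td>';
        },
        $html
    );
}

echo keep_only_colspan('<td a="b" colspan="3">A</td><td x="y">B</td>'), "\n";
// <td colspan="3">A</td><td>B</td>
```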
> I wish I knew this stuff...it's very useful.

I highly recommend the book Mastering Regular Expressions, by Jeffrey
Friedl. It's very easy to read and really helps you understand regexes.

Cheers,

Eric

Jul 17 '05 #6

jn

"Eric Ellsworth" <s@n> wrote in message
news:U4********************@speakeasy.net...
> So the new regex would be:
> $regex = '<td(?:[^>]*)( colspan=\S+)?(?:[^>]*)>';
> $newdata = preg_replace("/$regex/i", '<td$1>', $data);

Thanks, but it stripped out everything, including the colspan. I'll try to
tinker with it and see if I can get it to work though.
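That result is what the regex does as written. The first (?:[^>]*) is greedy and consumes every attribute, colspan included. The optional group ( colspan=\S+)? then succeeds by matching nothing, so the engine never backtracks into the attributes and $1 is always empty. Moving the "skip other attributes" part inside the optional group makes the engine look for colspan first and fall back to matching nothing only if that fails. The sketch below shows both on the thread's sample tag. The corrected pattern is an illustrative variation, not one posted in the thread:

```php
<?php
$data = '<td a="b" colspan="3" x="y">';

// As posted: the greedy (?:[^>]*) eats colspan too, the optional group
// matches empty, and $1 is empty, so every attribute disappears.
$tried = '<td(?:[^>]*)( colspan=\S+)?(?:[^>]*)>';
echo preg_replace("/$tried/i", '<td$1>', $data), "\n"; // <td>

// Variation: [^>]* sits INSIDE the optional group, so the engine
// backtracks to find " colspan=" before settling for an empty match.
$fixed = '<td(?:[^>]*( colspan=[^\s>]+))?[^>]*>';
echo preg_replace("/$fixed/i", '<td$1>', $data), "\n";                    // <td colspan="3">
echo preg_replace("/$fixed/i", '<td$1>', '<td a="b" rowspan="2">'), "\n"; // <td>
```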

Jul 17 '05 #7

Eric Ellsworth wrote:
> So the new regex would be:
> $regex = '<td(?:[^>]*)( colspan=\S+)?(?:[^>]*)>';

Maybe regexes aren't the best way to do this... however, I *had* to
manage it. Here it is for your enjoyment:

<?php
$s = ''; ### test data
$s.= 'CS ===<td a="b" color="blue" colspan="3" x="y">===' . "\n";
$s.= ' ===<td a="b" color="blue" rowspan="3" x="y">===' . "\n";
$s.= 'CS ===<td a="b" colspan="3" x="y">===' . "\n";
$s.= ' ===<td a="b" rowspan="3" x="y">===' . "\n";
$s.= 'CS ===<td colspan="3" x="y">===' . "\n";
$s.= ' ===<td rowspan="3" x="y">===' . "\n";
$s.= 'CS ===<td a="b" colspan="3">===' . "\n";
$s.= ' ===<td a="b" rowspan="3">===' . "\n";
$s.= 'CS ===<td colspan="3">===' . "\n";
$s.= ' ===<td rowspan="3">===' . "\n";
$s.= ' ===<td>===' . "\n";
$s.= ' ====== :)' . "\n";

$cs = '( colspan=[0-9\'"]+)?'; # optional " colspan=" followed by one or more digits or quotes
$ns = '(?:(?! colspan=[0-9\'"]+) \S+)*'; # zero or more non-captured " attr" tokens that are *NOT* colspan
# ^^^------------------^ negative lookahead assertion

$regex = "<td$cs$ns$cs$ns$cs>"; # colspan can be immediately after td,
# or in the middle of the
# parameters or at the last position
$newx = preg_replace("/$regex/i", '<td$1$2$3>', $s);

echo "original:\n", $s, "\n\nchanged:\n", $newx, "\n";
?>

> Another approach is to use preg_replace_callback:
> http://us4.php.net/manual/en/functio...e-callback.php


And not learn the "negative lookahead assertion"? :-))
This was a very challenging challenge!
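For anyone new to the construct: a negative lookahead (?!X) succeeds only at positions where X does not match, and it consumes no characters. The sketch below exercises Pedro's $ns piece on its own. The ^...$ anchors are added here so that the whole test string has to consist of non-colspan attributes:

```php
<?php
// Pedro's $ns building block, copied from the script above: a run of
// " name=value" tokens. Before each " \S+" token is consumed, the negative
// lookahead (?! colspan=[0-9'"]+) checks that the text at that position is
// NOT a colspan attribute; the lookahead itself consumes nothing.
$ns = '(?:(?! colspan=[0-9\'"]+) \S+)*';

// ^...$ anchoring (added for this demo only).
$pattern = '/^' . $ns . '$/i';

var_dump(preg_match($pattern, ' a="b" rowspan="3"')); // int(1)
var_dump(preg_match($pattern, ' a="b" colspan="3"')); // int(0)
```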

Jul 17 '05 #8

jn

"Pedro" <he****@hotpop.com> wrote in message
news:bo*************@ID-203069.news.uni-berlin.de...
> And not learn the "negative lookahead assertion"? :-))
> This was a very challenging challenge!

That was interesting :)

What I'm really doing is pasting from Excel into an "HTML Area" (
www.interactivetools.com). It's like a text area, but it's a little WYSIWYG
editor for content management systems. I'm stripping out all of the style
garbage Excel puts in its code, and replacing it with cleaned code. It works
great now, but I can't get it to preserve colspans because those get
stripped too.
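For cleaning up markup pasted from Excel, an HTML parser may be sturdier than regexes (as Pedro remarked, regexes may not be the best tool here). The sketch below uses PHP's bundled DOM extension (DOMDocument), a different technique from anything posted in this thread. It assumes PHP 7 or later and a pasted fragment with a single root element such as <table>. The unquoted class=xl24 style attributes are only sample input, and strip_td_attributes_dom() is a hypothetical name:

```php
<?php
// Sketch: keep only colspan on <td> elements by parsing the HTML with
// PHP's DOM extension instead of matching it with a regex.
function strip_td_attributes_dom(string $html): string
{
    $doc = new DOMDocument();
    // Pasted markup is rarely valid HTML; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    // NOIMPLIED/NODEFDTD: don't wrap the fragment in <html><body> or add a doctype.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('td') as $td) {
        // Collect names first, because removing attributes while
        // iterating the attribute map may skip some of them.
        $names = [];
        foreach ($td->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            if (strtolower($name) !== 'colspan') {
                $td->removeAttribute($name);
            }
        }
    }
    return $doc->saveHTML();
}

echo strip_td_attributes_dom(
    '<table><tr><td class=xl24 colspan=2 style="x:y">A</td><td class=xl25>B</td></tr></table>'
);
```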

I'll try some more things. Maybe I'll get it to work :)

Thanks guys
Jul 17 '05 #9

"jn" <js******@cfl.rr.com> wrote in message news:<TU**********************@twister.tampabay.rr.com>...
> I'll try some more things. Maybe I'll get it to work :)


Try http://weitz.de/regex-coach

---
"Learn from yesterday, live for today, hope for tomorrow. The
important thing is to not stop questioning."---Albert Einstein
Email: rrjanbiah-at-Y!com
Jul 17 '05 #10
