How can I remove tags which have no attributes?

P: n/a
I have a large HTML document. It has hundreds of <span>s which
have no attributes so these <span>s are redundant.

How can I remove these tags automatically?

The document also has <span>s with style attributes that I don't
want to remove.

Jul 24 '05 #1
12 Replies


P: n/a
Oberon <ob****@solstice.com> wrote:
I have a large HTML document. It has hundreds of <span>s which
have no attributes so these <span>s are redundant.

Non-semantic elements without attributes may still be used since they
can be styled by targeting them via CSS selectors.

How can I remove these tags automatically?

The document also has <span>s with style attributes that I don't
want to remove.


Do they contain content and/or other tags? If they contain other tags,
could that include other spans?

--
Spartanicus
Jul 24 '05 #2

P: n/a
On Sat, 28 May 2005 18:09:58 GMT, Spartanicus
<in*****@invalid.invalid> wrote:
Oberon <ob****@solstice.com> wrote:
I have a large HTML document. It has hundreds of <span>s which
have no attributes so these <span>s are redundant.


Non semantic elements without attributes may still be used since they
can be styled by targeting them via css selectors.


The <span>s have no styles attached to them. They have no
attributes at all. I usually attach styles to elements such as
<td>, <p> and <body>, and have only a few styles within a
document. I don't need to attach a style to <span> itself
because that would be redundant (my CSS already refers to
<td>, <p>, <body>, etc.).

I know that's not what I'm supposed to do but it's what I prefer
to do.
How can I remove these tags automatically?

The document also has <span>s with style attributes that I don't
want to remove.


Do they contain content and/or other tags? If they contain other tags,
could that include other spans?


All of them have content. I can use Dreamweaver's Clean up HTML
command to remove empty tags.

Some of these <span> tags are nested, but I don't mind losing
some of the formatting (which has been applied to nested <span>
tags with style attributes) if it means getting rid of the
worthless spans.

Is there any editor that has a command to just remove specific
attribute-less tags?

These huge documents, with at least 40 KB of content, are
often produced by MS Word. I apply Dreamweaver's Clean Up Word
HTML command, then do a fair amount of editing to remove nearly
all embedded styles (replacing them with styles applied to
specific tags). After doing that, 95%+ of the <span> tags have no
attributes and are redundant. The <span> tags I want to keep
will either enclose symbols or special formatting.

I know I should really save the Word file as text, load it into
a blank HTML file and reformat it by hand, but some of that
reformatting can be tricky (replacing symbols with the
appropriate character entity, for example). So it would be nice
if there was an alternative.

Jul 24 '05 #3

P: n/a
Spartanicus wrote:
Oberon <ob****@solstice.com> wrote:

I have a large HTML document. It has hundreds of <span>s which
have no attributes so these <span>s are redundant.
Non semantic elements without attributes may still be used


Indeed. But wanting to remove them is not unreasonable (unless
taken to obsession).

since they can be styled by targeting them via css selectors.


But that would be such bad practice as to merit summary removal.
How can I remove these tags automatically?

The document also has <span>s with style attributes that I don't
want to remove.


Do they contain content and/or other tags? If they contain other tags,
could that include other spans?


If they do (or may do), then you'd need context information to remove
the unwanted ones. So build a DOM and strip them.

If they're never nested, then you can just strip them with a regexp,
or on the fly with a SAX parser such as mod_publisher.
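
A minimal sketch of the on-the-fly idea, assuming Python and its standard html.parser module (neither is named in this thread, and it is not mod_publisher). The class and function names are made up for illustration. It keeps one flag per open <span> so that each closing tag is matched to the right opening tag, nested or not:

```python
from html.parser import HTMLParser


class BareSpanStripper(HTMLParser):
    """Re-emit the document, dropping <span> tags that carry no attributes."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.keep = []  # one flag per open <span>: True if it had attributes

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.keep.append(bool(attrs))
            if not attrs:
                return  # drop the bare opening tag
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br/> are passed through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Drop </span> only when the matching opening tag was dropped.
        if tag == "span" and self.keep and not self.keep.pop():
            return
        self.out.append(f"</{tag}>")  # note: tag names come back lower-cased

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def strip_bare_spans(html):
    parser = BareSpanStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because the output is rebuilt from parser events, closing tags are written in lower case and badly malformed markup may come out slightly different from the input, so compare the result against the original before trusting it on a large file.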

--
Nick Kew
Jul 24 '05 #4

P: n/a
Oberon <ob****@solstice.com> wrote:
I have a large HTML document. It has hundreds of <span>s which
have no attributes so these <span>s are redundant.

How can I remove these tags automatically?

Is there any editor that has a command to just remove specific
attribute-less tags?


Unlikely. To do so successfully you'd need a tool that understands SGML
and HTML; a regexp search-and-replace doesn't cut it. Perl has an HTML
parser module that could serve that purpose, AFAIK.
These huge documents, with at least 40 KB of content, are
often produced by MS Word. I apply Dreamweaver's Clean Up Word
HTML command, then do a fair amount of editing to remove nearly
all embedded styles (replacing them with styles applied to
specific tags). After doing that, 95%+ of the <span> tags have no
attributes and are redundant. The <span> tags I want to keep
will either enclose symbols or special formatting.


Have you tried using HTML Tidy's Word clean up routine instead of the
above procedure?

--
Spartanicus
Jul 24 '05 #5

P: n/a
Nick Kew <ni**@asgard.webthing.com> wrote:
Non semantic elements without attributes may still be used since they
can be styled by targeting them via css selectors.


But that would be such bad practice as to merit summary removal.


Nonsense.

--
Spartanicus
Jul 24 '05 #6

P: n/a
Tim
On Sat, 28 May 2005 17:31:51 +0000, Oberon wrote:
I have a large HTML document. It has hundreds of <span>s which have no
attributes so these <span>s are redundant.

How can I remove these tags automatically?

The document also has <span>s with style attributes that I don't want to
remove.


I just did a little experiment with a text editor and the HTML tidy
program. I made a small test HTML file with some <span>bogus</span>
contents splattered throughout it. Used the search and replace option in
the text editor to replace all <span> opening tags with nothing, then ran
HTML tidy on it. It stripped out the erroneous closing </span> tags.

Try that on a test file.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.

Jul 24 '05 #7

P: n/a
Oberon wrote:
I have a large HTML document. It has hundreds of <span>s which
have no attributes so these <span>s are redundant.

How can I remove these tags automatically?

Use a macro in your editor. Have it search for "<span>" and delete it;
then search for "</span>" and delete it. Repeat as required.
This works if there are no nested spans.
Or use the HTMLTidy method as Tim described.
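
For the no-nesting case, a regexp can do the same job as the macro while pairing each bare <span> with the next </span>, so closing tags of styled spans are left alone. This is only a sketch, assuming Python's re module; the names BARE_SPAN and unwrap_flat_spans are invented for the example:

```python
import re

# An attribute-less <span>, matched lazily up to the next </span>.
BARE_SPAN = re.compile(r"<span\s*>(.*?)</span\s*>", re.IGNORECASE | re.DOTALL)


def unwrap_flat_spans(html):
    """Unwrap bare spans; only safe when spans are never nested."""
    return BARE_SPAN.sub(r"\1", html)
```

With nesting it goes wrong: given <span>a <span style="x">b</span> c</span> the lazy match stops at the styled span's closing tag and produces a <span style="x">b c</span>, which moves " c" inside the styled span.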

--
jmm dash list (at) sohnen-moe (dot) com
(Remove .AXSPAMGN for email)
Jul 24 '05 #9

P: n/a
JRS: In article <pa****************************@mail.localhost.invalid>,
dated Sun, 29 May 2005 15:59:17, seen in
news:comp.infosystems.www.authoring.html, Tim <ti*@mail.localhost.invalid> posted:

I just did a little experiment with a text editor and the HTML tidy
program. I made a small test HTML file with some <span>bogus</span>
contents splattered throughout it. Used the search and replace option in
the text editor to replace all <span> opening tags with nothing, then ran
HTML tidy on it. It stripped out the erroneous closing </span> tags.

If you have

<span A> aaa <span> bbb <span B> ccc </span> ddd </span> eee </span>

in which A and B are useful, then, after you have removed the <span>
between aaa & bbb, how can TIDY possibly tell that it is the </span>
between ddd & eee that should be removed, and not the final one?

ISTM better to use something like MiniTrue or SED to remove the tags
from each detectable instance of
<span> *something*not*including*<span>*for*sure* </span>
which with any luck will remove the great majority, and should do no
harm. To allow *everything*not*including*<span>*for*sure may be
difficult; but to allow *everything*not*including*<*for*sure* may catch
a sufficient proportion to be useful.
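
The "nothing including <" variant might look like the sketch below. It assumes Python's re module rather than MiniTrue or SED, and the names are invented for the example. Each pass unwraps bare spans that contain only text; repeating the pass lets spans that were wrapped around other bare spans become eligible in turn:

```python
import re

# A bare <span> whose content has no "<" at all, so it cannot hold another tag.
LEAF_SPAN = re.compile(r"<span\s*>([^<]*)</span\s*>", re.IGNORECASE)


def unwrap_leaf_spans(html):
    """Repeatedly unwrap bare spans containing only text, until none remain."""
    while True:
        html, count = LEAF_SPAN.subn(r"\1", html)
        if count == 0:
            return html
```

A bare span that encloses any other tag, such as a styled span, is left in place, so this catches a proportion of the redundant tags rather than all of them.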

Alternatively, one might write a program in a general high-level
language, one that tracks the nesting level so that it can remove the
correct closing tag.
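
As a rough illustration of such a program, here is a sketch in Python (my choice of language; the function name is invented). It scans only the span tags, remembers for each open span whether it was kept, and copies everything else through untouched. It assumes no attribute value contains ">" and that no span tags appear inside comments or scripts:

```python
import re

# Any opening or closing span tag; group 1 is "/" for a closing tag,
# group 2 is whatever follows the tag name (the attributes, if any).
SPAN_TAG = re.compile(r"<(/?)span\b([^>]*)>", re.IGNORECASE)


def strip_bare_spans_nested(html):
    out = []
    keep = []  # one entry per currently open span: True if it had attributes
    pos = 0
    for m in SPAN_TAG.finditer(html):
        out.append(html[pos:m.start()])  # text between span tags, unchanged
        pos = m.end()
        closing, attrs = m.group(1), m.group(2).strip()
        if closing:
            # A stray </span> with nothing open is left as it was.
            emit = keep.pop() if keep else True
        else:
            emit = bool(attrs)
            keep.append(emit)
        if emit:
            out.append(m.group(0))
    out.append(html[pos:])
    return "".join(out)
```

On the example above, the <span> between aaa and bbb is dropped together with the </span> between ddd and eee, and both the A and B spans keep their own closing tags.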

Caveat : ISTM that CSS might, in some version, allow styling <span> with
added whitespace. If that should be, removing <span> from bbba<span>bbb
could make a visible difference.

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
I find MiniTrue useful for viewing/searching/altering files, at a DOS prompt;
free, DOS/Win/UNIX, <URL:http://www.idiotsdelight.net/minitrue/> Update hope?
Jul 24 '05 #11

P: n/a
Tim
Tim <ti*@mail.localhost.invalid> posted:
I just did a little experiment with a text editor and the HTML tidy
program. I made a small test HTML file with some <span>bogus</span>
contents splattered throughout it. Used the search and replace option in
the text editor to replace all <span> opening tags with nothing, then ran
HTML tidy on it. It stripped out the erroneous closing </span> tags.

Dr John Stockton <jr*@merlyn.demon.co.uk> posted:
If you have

<span A> aaa <span> bbb <span B> ccc </span> ddd </span> eee </span>

in which A and B are useful, then, after you have removed the <span>
between aaa & bbb, how can TIDY possibly tell that it is the </span>
between ddd & eee that should be removed, and not the final one?


Well, I did say do a test. It will depend on the data that you're working
with... Of course, if you have nested spans you're going to need something
smarter.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
Jul 24 '05 #13
