
a commandline tool to drop css and javascript?

Hello. I am looking for a command-line tool that takes an HTML document
(or an HTML fragment, i.e. one without the leading
"<html><head>..</head><body>"), removes all CSS style settings and
JavaScript, and outputs clean HTML/XHTML.

Optionally, it would be nice if this tool could take a list of
acceptable tags and remove all tags not in this list.

I need such a tool to process a lot of static HTML documents I am
working on. Do you happen to know of such a tool? I am still googling
around ;) I tried tidy, but it does not seem to have an option to
remove CSS.

Thanks a lot!
Jan 28 '07 #1


Gazing into my crystal ball I observed Zhang Weiwu
<zh********@realss.com> writing in
news:41lt84-nrt2.ln1@exupery.realss.com:
> Hello. I am looking for a commandline tool to take an html document
> (or html document segment, a.k.a. without beginning
> "<html><head>..</head><body>") and process it by removing all css
> style settings and javascripts, and output a clean html/xhtml.
>
> Optionally, it would be nice if this tool can take an
> acceptable tag list and remove all tags not in this list.
>
> I need such a tool to process a lot of static html document I am
> working on. Do you happen to know such a tool? I am still googling
> around ;) I tried tidy but there seems not to be an option to
> remove css.
>
> Thanks a lot!

Can you use search and replace? How about looking for style=" ? It
seems to me that search and replace will do what you want. Google for a
good search-and-replace tool, or I am sure someone will be around
shortly to tell you another way.
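
A minimal sketch of the search-and-replace idea, assuming Python's re
module as the tool (the thread does not name one). The function name
drop_style_attributes and the sample string are made up for
illustration. It only removes quoted inline style attributes; it does
not touch <style> or <script> elements, and a regular expression like
this can misfire on unusual markup, so treat it as a quick first pass
rather than a real HTML cleaner.

```python
import re

# Matches a style attribute preceded by whitespace, with a double- or
# single-quoted value. Assumption: values are quoted and do not contain
# the same kind of quote character inside them.
STYLE_ATTR = re.compile(r'''\s+style\s*=\s*("[^"]*"|'[^']*')''', re.IGNORECASE)


def drop_style_attributes(html: str) -> str:
    """Remove inline style="..." attributes from an HTML string."""
    return STYLE_ATTR.sub("", html)


# Hypothetical sample input, modelled on the kind of markup in question.
sample = (
    '<div style=" color : blue ;">'
    '<span style="font-style:italic" class="x">oink!</span>'
    '</div>'
)
print(drop_style_attributes(sample))
```

Because the pattern requires whitespace before "style", attributes
such as data-style are left alone.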
--
Adrienne Boswell at Home
Arbpen Web Site Design Services
http://www.cavalcade-of-coding.info
Please respond to the group so others can share

Jan 31 '07 #2

P: n/a
On Jan 28, 6:22 am, Zhang Weiwu <zhangwe...@realss.com> wrote:
> Hello. I am looking for a commandline tool to take an
> html document (or html document segment, a.k.a. without
> beginning "<html><head>..</head><body>") and process it by
> removing all css style settings and javascripts, and
> output a clean html/xhtml.
>
> Optionally, it would be nice if this tool can take an
> acceptable tag list and remove all tags not in this list.
>
> I need such a tool to process a lot of static html
> document I am working on. Do you happen to know such a
> tool? I am still googling around ;) I tried tidy but
> there seems not to be an option to remove css.

Unless your source HTML is so tag-soupy no sane HTML parser
can grok it, XSLT is great for this kind of stuff. Of
course, you'll also need an XSLT processor that can
transform HTML documents (libxslt can do that, and probably
many others).

pavel@debian:~/dev/xslt$ cat raw.html
<!DOCTYPE HTML PUBLIC
  "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <title>Test</title>
    <style type="text/css">
      body { font-family : monospace ; }
    </style>
    <script type="text/javascript">
      function oink ( ) { alert ( 'oink!' ) ; }
    </script>
  </head>
  <body>
    <div style=" color : blue ;">
      <span style=" font-style : italic ; "
            onclick=" oink ( ) ; ">oink!</span>
    </div>
  </body>
</html>

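pavel@debian:~/dev/xslt$ cat strip_jscss.xsl
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!-- Identity transform: copy every attribute and node unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Empty template: drop style and script elements, plus the
       style and onclick attributes, from the output. -->
  <xsl:template match="style|script|@style|@onclick"/>
</xsl:stylesheet>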

pavel@debian:~/dev/xslt$ xsltproc -html strip_jscss.xsl raw.html
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Test</title>
</head>
<body><div>
<span>oink!</span>
</div></body>
</html>

Naturally, you'll want to tinker with xsl:output to get
valid HTML as output (for example, its doctype-public and
doctype-system attributes), and you'll need to fine-tune the
exclusion template to handle all the event handler
attributes, etc. xsltproc is a command-line utility that
comes with libxslt, but as I said, I'd expect most XSLT
processors to be capable of transforming HTML as well.
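
For comparison, here is a rough sketch of the same filtering done
without XSLT, using only Python's standard html.parser module. This is
not the approach described above, just an illustration of the two
refinements mentioned in the thread: dropping every event handler
attribute (assumed here to be any attribute whose name starts with
"on") and an optional list of acceptable tags. The names Stripper,
strip_css_js and allowed_tags are invented for this example. Tags
outside the list are removed but their text is kept; comments are
dropped, and javascript: URLs in href attributes are not handled.

```python
from html import escape
from html.parser import HTMLParser

# Elements removed together with their content.
DROPPED_ELEMENTS = {"style", "script"}


class Stripper(HTMLParser):
    """Re-emit HTML without CSS, JavaScript, or disallowed tags."""

    def __init__(self, allowed_tags=None):
        super().__init__()  # character references arrive decoded
        self.allowed_tags = allowed_tags  # None means "keep every tag"
        self.out = []
        self.skip_depth = 0  # > 0 while inside <style> or <script>

    def _keep(self, tag):
        return self.allowed_tags is None or tag in self.allowed_tags

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_ELEMENTS:
            self.skip_depth += 1
            return
        if self.skip_depth or not self._keep(tag):
            return
        # Assumption: every attribute starting with "on" is an event handler.
        kept = [(k, v) for k, v in attrs
                if k != "style" and not k.startswith("on")]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in DROPPED_ELEMENTS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and self._keep(tag):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # pass the doctype through


def strip_css_js(html, allowed_tags=None):
    parser = Stripper(allowed_tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Fragment modelled on raw.html from the post above.
sample = (
    '<div style=" color : blue ;">'
    '<span style=" font-style : italic ; " onclick=" oink ( ) ; ">oink!</span>'
    '<script type="text/javascript">function oink(){alert("oink!");}</script>'
    '</div>'
)
print(strip_css_js(sample))
print(strip_css_js(sample, allowed_tags={"div"}))
```

The first call keeps both the div and the span; the second keeps only
the div and leaves the span's text in place. Because the parser works
on whatever it is given, the function accepts fragments as well as
full documents.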

--
Pavel Lepin

Feb 1 '07 #3
