
Convert some files from html to plaintext

I have many html files named like these:

c:\dir\femo-black.html
c:\dir\loren-white.html
c:\dir\spark-white.html
c:\dir\kim-black.html
c:\dir\paul-white.html

How can I convert only the files named "c:\dir\*-white.html" to
plaintext files named c:\dir\(original filename)-text.txt?

Is there a PHP module that does a good-quality conversion from HTML to
plaintext?

Nov 11 '07 #1
7 Replies


On Nov 11, 1:05 pm, Luca Villa <lucavi...@cashette.com> wrote:
> I have many html files named like these:
>
> c:\dir\femo-black.html
> c:\dir\loren-white.html
> c:\dir\spark-white.html
> c:\dir\kim-black.html
> c:\dir\paul-white.html
>
> How can I convert only the files named "c:\dir\*-white.html" to
> plaintext files named c:\dir\(original filename)-text.txt?
>
> Is there a PHP module that does a good-quality conversion from HTML to
> plaintext?

See this:

<http://www.php.net/strip_tags>
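
For what it's worth, here is a rough sketch of how strip_tags could be combined with glob() to handle only the *-white.html files. The helper names (html_to_text, text_name, convert_dir) are made up for illustration, and the output name assumes that "(original filename)-text.txt" means paul-white.html becomes paul-white-text.txt. It only strips markup; it does not lay the text out the way a browser would.

```php
<?php
// Hypothetical helper: reduce an HTML string to rough plain text.
function html_to_text(string $html): string
{
    // Drop <script> and <style> blocks entirely; strip_tags() alone would
    // remove the tags but keep their contents.
    $html = preg_replace('~<(script|style)\b[^>]*>.*?</\1>~is', '', $html);
    // Remove the remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of blank lines left behind by the removed markup.
    return trim(preg_replace('~\n\s*\n+~', "\n\n", $text));
}

// Hypothetical helper: map "paul-white.html" to "paul-white-text.txt"
// (an assumed reading of the naming scheme in the question).
function text_name(string $htmlPath): string
{
    return preg_replace('~\.html$~i', '-text.txt', $htmlPath);
}

// Convert every file matching the glob pattern.
// Returns the number of text files written.
function convert_dir(string $pattern): int
{
    $count = 0;
    foreach (glob($pattern) ?: [] as $htmlPath) {
        $text = html_to_text(file_get_contents($htmlPath));
        file_put_contents(text_name($htmlPath), $text);
        $count++;
    }
    return $count;
}
```

Calling convert_dir('c:/dir/*-white.html') would then leave the *-black.html files untouched, since glob() never returns them.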

Nov 11 '07 #2

> See this:
>
> <http://www.php.net/strip_tags>

Isn't there something of higher quality, like the rendering engine of
the textual browser Lynx?

Nov 11 '07 #3

On Nov 11, 8:58 pm, Luca Villa <lucavi...@cashette.com> wrote:
> > See this:
> > <http://www.php.net/strip_tags>
>
> Isn't there something of higher quality, like the rendering engine of
> the textual browser Lynx?

I guess you don't simply want to remove all the tags. Rather, you want
to make sure that the content of your <h1> element is followed by an
empty line, or that your <p> elements are indented, etc.

This might seem like overkill, but if all of your files have the
same structure, you might want to write an XSLT stylesheet and have PHP
transform the files into whatever structure you prefer.

Check out the PHP manual here http://www.php.net/ref.xsl and maybe
this tutorial on XSLT http://www.w3schools.com/xsl/
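
As a minimal sketch of that idea (assuming the dom and xsl extensions are enabled): the stylesheet below is only an illustration, not something tailored to your files, and html_to_text_xslt is a made-up name. It puts an empty line after each <h1>, indents each <p> by four spaces, and drops everything else.

```php
<?php
// Illustrative stylesheet (an assumption, not taken from real files):
// headings are followed by an empty line, paragraphs get a four-space
// indent, and all other text is suppressed.
const TEXT_XSL = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="head|script|style"/>
  <xsl:template match="h1">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>
  <xsl:template match="p">
    <xsl:text>    </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
XSL;

// Hypothetical helper: render an HTML string to text via the stylesheet.
function html_to_text_xslt(string $html): string
{
    // Real-world HTML is rarely well-formed XML, so parse it with the
    // lenient HTML parser and collect its warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xsl = new DOMDocument();
    $xsl->loadXML(TEXT_XSL);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);
    return (string) $proc->transformToXml($doc);
}
```

Every further element you care about (lists, tables, links) would need its own template, which is where the effort goes with this approach.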

Oli

Nov 11 '07 #4

Oli, there are ready-made open source converters available, like Lynx,
Links, ELinks, W3M etc...
I don't think it makes sense to rewrite in XSLT what others have
already done over many years of work.
I had hoped that PHP had an integrated solution for this, like the engine
of one of the textual browsers mentioned above...

Nov 11 '07 #5

Luca Villa wrote:
> Oli, there are ready-made open source converters available, like Lynx,
> Links, ELinks, W3M etc...
> I don't think it makes sense to rewrite in XSLT what others have
> already done over many years of work.
> I had hoped that PHP had an integrated solution for this, like the engine
> of one of the textual browsers mentioned above...

That is usually very simple. Install Lynx on your server and call it with
one of PHP's command execution functions:

http://php.net/exec

There are no other options that don't involve a lot of work...
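
Roughly like this, as a sketch only. It assumes the lynx binary is on the PATH of the user PHP runs as; lynx_command and lynx_to_text are names invented for this example. The -dump option prints the rendered page to standard output, and -nolist leaves out the numbered list of links that Lynx otherwise appends.

```php
<?php
// Build the shell command for one file. escapeshellarg() protects file
// names that contain spaces or shell metacharacters.
function lynx_command(string $htmlPath): string
{
    return 'lynx -dump -nolist ' . escapeshellarg($htmlPath);
}

// Run Lynx on one HTML file and return the rendered plain text.
function lynx_to_text(string $htmlPath): string
{
    exec(lynx_command($htmlPath), $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException("lynx failed for $htmlPath (exit $status)");
    }
    // exec() returns the output split into lines without line endings.
    return implode("\n", $lines) . "\n";
}
```

Something like file_put_contents($target, lynx_to_text($source)) inside a loop over glob('c:/dir/*-white.html') would then produce the text files.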

So long, Ulf

--
_,
_(_p Ulf [Kado] Kadner
\<_)
^^
Nov 12 '07 #6

> Usually very simple. Install Lynx on your server and call Lynx by one
> of the command executing functions of PHP:

That's the road I'm following, but calling an external program
thousands of times (I need to process thousands of files) is not very
efficient...

Nov 12 '07 #7

Luca Villa wrote:
> > Usually very simple. Install Lynx on your server and call Lynx by one
> > of the command executing functions of PHP:
>
> That's the road I'm following, but calling an external program
> thousands of times (I need to process thousands of files) is not very
> efficient...

Sure, it's no performance wonder :-)

You would be better off writing a shell script that reads the list of
files from a file (perhaps dynamically generated) and runs Lynx on each
one in a loop. That's faster.

So long, Ulf

--
_,
_(_p Ulf [Kado] Kadner
\<_)
^^
Nov 13 '07 #8
