Looking for a tool to make plain text document out of a simple HTML document

Hi,

Hopefully this is not too much offtopic.

I'm working on a FAQ. I want to make two versions of it, plain text and
HTML. I'm looking for a tool that will make a plain text doc out of the
HTML doc. The HTML version doesn't have anything fancy, just internal
links. So the tool must be able to delete internal links and anchors from
the HTML version, but leave external links in simplified form. That is, the
HTML version would say <a href="http://foo/bar.html">Bar</a> and the plain
text version would just say http://foo/bar.html. The tool should also be
able to make mailto links into plain text, that is, change <a
href="mailto:fo*@bar.com">Foo</a> into fo*@bar.com. In fact, I think both
of these changes might be possible with a regex search and replace engine. I
have one in my editor, but I don't know how to use it. So I'm looking for
either regex strings or a ready-made tool; both would be fine.
Jul 20 '05 #1
Akseli Mäki wrote:

I forgot to say that the tool should be a DOS or Windows one.
Jul 20 '05 #2
Jon
open the file in Word, select all, copy, open Notepad, paste, save :)

Jon

"Akseli Mäki" <ne********@aks eli-yok.utu.fi> wrote in message
news:0p******** *************** *********@4ax.c om...
Akseli Mäki wrote:

I forgot to say, that the tool should be Dos or Windows one.
Jul 20 '05 #3
On Sun, 21 Dec 2003 14:12:45 +0200, Akseli Mäki wrote:
Hi,

Hopefully this is not too much offtopic.

I'm working on a FAQ. I want to make two versions of it, plain text and
HTML. I'm looking for a tool that will make a plain text doc out of the
HTML doc. The HTML version doesn't have anything fancy, just internal
links. So the tool must be able to delete internal links and anchors from
the HTML version, but leave external links in simplified form. That is, the
HTML version would say <a href="http://foo/bar.html">Bar</a> and the plain
text version would just say http://foo/bar.html. The tool should also be
able to make mailto links into plain text, that is, change <a
href="mailto:fo*@bar.com">Foo</a> into fo*@bar.com. In fact, I think both
of these changes might be possible with a regex search and replace engine. I
have one in my editor, but I don't know how to use it. So I'm looking for
either regex strings or a ready-made tool; both would be fine.


While it doesn't *exactly* match your requirements, lynx is a very good
tool for doing this. "lynx --dump http://host/dir/page.ext" will produce
a plain-text output with links replaced with '[1]link text'; at the bottom
of the output is a list of all the links' destination URLs.

It is available for Windows at <http://jim.spath.com/lynx_win32/>.

--
Some say the Wired doesn't have political borders like the real world,
but there are far too many nonsense-spouting anarchists or idiots who
think that pranks are a revolution.

Jul 20 '05 #4
Jon wrote:

Please direct your attention to: http://www.allmyfaqs.com/faq.pl?How_to_post
open the file in Word, select all, copy, open Notepad, paste, save :)


How does that preserve external hyperlinks?

--
David Dorward <http://dorward.me.uk/>
Jul 20 '05 #5
Akseli Mäki wrote:

Hopefully this is not too much offtopic.
Perhaps slightly, but the ciwa-tools group seems to generate little
traffic outside of spam.
I'm working on a FAQ. I want to make two versions of it, plain text
and HTML.
May we ask why?
I'm looking for a tool that will make a plain text doc out of the
HTML doc. The HTML version doesn't have anything fancy, just
internal links. So the tool must be able to delete internal links
and anchors from the HTML version, but leave external links in
simplified form. That is, the HTML version would say <a
href="http://foo/bar.html">Bar</a> and the plain text version would
just say http://foo/bar.html. The tool should also be able to make
mailto links into plain text, that is, change <a
href="mailto:fo*@bar.com">Foo</a> into fo*@bar.com.
And in both cases, you want to *remove* the anchor text, is that right?
In fact, I think both of these changes might be possible with a regex
search and replace engine.
That's how I'd probably do it, but it would be a little time consuming
for me, because I'd need several steps to do it. My editor has a
search/replace dialogue box. If I were going to try to do what you're
doing, I'd copy the html files to a new directory, each file with a
new .txt extension. Then I'd run the search/replace.

Search: <a href="{[a-z/]*}">[a-zA-Z]*</a>

Replace: \1

This almost works in my text editor, NoteTab Light. Perhaps it'll
help you get started.
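
The search/replace idea above can also be tried outside an editor. What follows is a minimal Python sketch, not the method anyone in this thread used. It assumes double-quoted href values, no tags nested inside the link text, and that internal links (href starting with "#") and named anchors should keep their text and lose only the tags. The function names and the address foo@example.com are made up for illustration.

```python
import re

# <a ... href="...">text</a>; assumes double-quoted attribute values.
LINK = re.compile(r'<a\s+[^>]*?href="([^"]*)"[^>]*>(.*?)</a>',
                  re.IGNORECASE | re.DOTALL)
# Named anchors such as <a name="q1">text</a>.
ANCHOR = re.compile(r'<a\s+[^>]*?name="[^"]*"[^>]*>(.*?)</a>',
                    re.IGNORECASE | re.DOTALL)


def simplify_link(match):
    """Return the plain-text replacement for one matched link."""
    href, text = match.group(1), match.group(2)
    if href.lower().startswith("mailto:"):
        return href[len("mailto:"):]   # mailto link: keep only the address
    if href.startswith("#"):
        return text                    # internal link: keep the text (assumption)
    return href                        # external link: keep only the URL


def html_links_to_text(html):
    """Simplify links first, then strip the tags of named anchors."""
    html = LINK.sub(simplify_link, html)
    return ANCHOR.sub(r"\1", html)


print(html_links_to_text('<a href="http://foo/bar.html">Bar</a>'))
print(html_links_to_text('<a href="mailto:foo@example.com">Foo</a>'))
```

This only rewrites the a elements; every other tag in the file is left in place, so it covers the link part of the job and nothing more.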
I have one in my editor, but I don't know how to use it.


Google "regex" or "regular expression" -- lots of links to go
through. If you are going to go this route, then I doubt there's any
acceptable substitute for learning regular expressions. But then, if
your editor has them,

--
Brian
follow the directions in my address to email me

Jul 20 '05 #6
In article <fg************ *************** *****@4ax.com> in
comp.infosystems.www.authoring.html, Akseli Mäki wrote:
Hi,

Hopefully this is not too much offtopic.

I'm working on a FAQ. I want to make two versions of it, plain text and
HTML. I'm looking for a tool that will make a plain text doc out of the
HTML doc. The HTML version doesn't have anything fancy, just internal
links. So the tool must be able to delete internal links and anchors from
the HTML version, but leave external links in simplified form.


Lynx can almost do what you want, and it has the great virtue that
you can do the job with a batch file rather than navigate menus. The
form (from memory; check with "lynx -help") is
lynx -dump URL_or_file >outputfile
A local file can be done either as file:///c:/zonk/file or without
the leading "file:///".
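
For anyone who would rather script the call than write a batch file, here is a small Python sketch of the same command. It assumes lynx is installed and on PATH; the function names and the file names in the usage note are made up for illustration.

```python
import shutil
import subprocess


def lynx_dump_command(source):
    """Build the argument list for: lynx -dump <source>."""
    return ["lynx", "-dump", source]


def dump_to_file(source, output_path):
    """Run lynx and write its plain-text rendering to output_path."""
    if shutil.which("lynx") is None:
        raise FileNotFoundError("lynx was not found on PATH")
    # Binary mode: the bytes lynx emits are written unchanged.
    with open(output_path, "wb") as out:
        subprocess.run(lynx_dump_command(source), stdout=out, check=True)
```

Calling dump_to_file("faq.html", "faq.txt") would then be the equivalent of "lynx -dump faq.html >faq.txt".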

Lynx will insert bracketed numbers [1], [2], etc in the text before
each link, then put a list at the end, so you have a record of what
each link is. I don't think it makes any distinction among external
and internal links and mailtos, however. You could postprocess the
output to remove internal links from the link list.

http://www.fdisk.com/doslynx/lynxport.htm
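
The postprocessing step mentioned above could look something like the Python sketch below. It rests on two assumptions that should be checked against real output: that the dump ends with a "References" heading followed by lines of the form "   1. URL", and that an internal link shows up in that list as the page's own URL plus a "#fragment". The function name and the sample URLs are made up for illustration. Surviving entries keep their original numbers so that the [n] markers in the text still line up.

```python
import re

# One entry of the link list, e.g. "   2. http://foo/bar.html"
REF_LINE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s*$")


def drop_internal_refs(dump_text, page_url):
    """Remove link-list entries that point back into page_url itself."""
    kept = []
    in_refs = False
    for line in dump_text.splitlines():
        if line.strip() == "References":
            in_refs = True              # everything after this is the link list
            kept.append(line)
            continue
        if in_refs:
            match = REF_LINE.match(line)
            if match and match.group(2).startswith(page_url + "#"):
                continue                # internal link: drop this entry
        kept.append(line)
    return "\n".join(kept)


sample = (
    "See the answer[1] or the site[2].\n"
    "\n"
    "References\n"
    "\n"
    "   1. http://host/faq.html#q2\n"
    "   2. http://foo/bar.html\n"
)
print(drop_internal_refs(sample, "http://host/faq.html"))
```

The bracketed numbers in the body text are left alone; removing those as well would need a second pass.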

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #7
Jon
Whoops! missed that!!!!

Although this could be sorted with a Word VBA script... not really the
place for that

Jon
"David Dorward" <do*****@yahoo. com> wrote in message
news:bs******** ***********@new s.demon.co.uk.. .
Jon wrote:

Please direct your attention to: http://www.allmyfaqs.com/faq.pl?How_to_post
open the file in Word, select all, copy, open Notepad, paste, save :)


How does that preserve external hyperlinks?

--
David Dorward <http://dorward.me.uk/>
Jul 20 '05 #8
Akseli Mäki wrote:

Ok, thanks for all the suggestions. I already have Lynx so I'll use it.
Jul 20 '05 #9
Brian wrote:
I'm working on a FAQ. I want to make two versions of it, plain text
and HTML.
May we ask why?

Well, some people might prefer an HTML file, I don't know yet. I might decide
to drop the idea if no one downloads it. Naturally I would post only the
plaintext version to the NG.
And in both cases, you want to *remove* the anchor text, is that right?

Yes.
Jul 20 '05 #10
