Looking for a tool to make plain text document out of a simple HTML document

Hi,

Hopefully this is not too much offtopic.

I'm working on a FAQ. I want to make two versions of it, plain text and
HTML. I'm looking for a tool that will make a plain text doc out of the
HTML doc. The HTML version doesn't have anything fancy, just internal
links. So the tool must be able to delete internal links and anchors from
the HTML version, but leave external links in simplified form. That is, the
HTML version would say <a href="http://foo/bar.html">Bar</a> and the plain
text version would just say http://foo/bar.html. The tool should also be
able to make mailto links into plain text, that is, change <a
href="mailto:fo *@bar.com">Foo </a> into fo*@bar.com. In fact, I think both
of these changes might be possible with a regex search-and-replace engine. I
have one in my editor, but I don't know how to use it. So I'm looking for
either regex strings or a ready-made tool; either will be fine.
Jul 20 '05 #1
Akseli Mäki wrote:

I forgot to say that the tool should be a DOS or Windows one.
Jul 20 '05 #2
Jon
Open the file in Word, select all, copy, open Notepad, paste, save :)

Jon
Jul 20 '05 #3
On Sun, 21 Dec 2003 14:12:45 +0200, Akseli Mäki wrote:
Hi,

Hopefully this is not too much offtopic.

I'm working on a FAQ. I want to make two versions of it, plain text and
HTML. I'm looking for a tool that will make a plain text doc out of the
HTML doc. The HTML version doesn't have anything fancy, just internal
links. So the tool must be able to delete internal links and anchors from
the HTML version, but leave external links in simplified form. That is, the
HTML version would say <a href="http://foo/bar.html">Bar</a> and the plain
text version would just say http://foo/bar.html. The tool should also be
able to make mailto links into plain text, that is, change <a
href="mailto:fo *@bar.com">Foo </a> into fo*@bar.com. In fact, I think both
of these changes might be possible with a regex search-and-replace engine. I
have one in my editor, but I don't know how to use it. So I'm looking for
either regex strings or a ready-made tool; either will be fine.


While it doesn't *exactly* match your requirements, lynx is a very good
tool for doing this. "lynx --dump http://host/dir/page.ext" will produce
a plain-text output with links replaced with '[1]link text'; at the bottom
of the output is a list of all the links' destination URLs.

It is available for Windows at <http://jim.spath.com/lynx_win32/>.

--
Some say the Wired doesn't have political borders like the real world,
but there are far too many nonsense-spouting anarchists or idiots who
think that pranks are a revolution.

Jul 20 '05 #4
Jon wrote:

Please direct your attention to: http://www.allmyfaqs.com/faq.pl?How_to_post
open the file in Word, select all, copy, open Notepad, paste, save :)


How does that preserve external hyperlinks?

--
David Dorward <http://dorward.me.uk/>
Jul 20 '05 #5
Akseli Mäki wrote:

Hopefully this is not too much offtopic.

Perhaps slightly, but the ciwa-tools group seems to generate little
traffic outside of spam.

I'm working on a FAQ. I want to make two versions of it, plain text
and HTML.

May we ask why?

I'm looking for a tool that will make a plain text doc out of the
HTML doc. The HTML version doesn't have anything fancy, just
internal links. So the tool must be able to delete internal links
and anchors from the HTML version, but leave external links in
simplified form. That is, the HTML version would say <a
href="http://foo/bar.html">Bar</a> and the plain text version would
just say http://foo/bar.html. The tool should also be able to make
mailto links into plain text, that is, change <a
href="mailto:fo *@bar.com">Foo </a> into fo*@bar.com.

And in both cases, you want to *remove* the anchor text, is that right?

In fact, I think both of these changes might be possible with regex
search and replace engine.

That's how I'd probably do it, but it would be a little time-consuming
for me, because I'd need several steps to do it. My editor has a
search/replace dialogue box. If I were going to try to do what you're
doing, I'd copy the HTML files to a new directory, each file with a
new .txt extension. Then I'd run the search/replace.

Search: <a href="{[a-z/]*}">[a-zA-Z]*</a>

Replace: \1

This almost works in my text editor, NoteTab Light. Perhaps it'll
help you get started.
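
For readers who have a scripting language to hand, the same search-and-replace idea can be sketched with Python's `re` module. This is only an illustration, not the NoteTab expression above: the function names are made up for the example, and it assumes double-quoted `href` values and non-nested links. It also assumes that internal (`#...`) links should keep their text and lose only the markup, which is one reading of the original request. It rewrites links only and leaves all other HTML tags in place.

```python
import re

# <a ... href="...">text</a>, tolerating extra attributes and line breaks.
ANCHOR_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*"([^"]*)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
# Target-only anchors such as <a name="q1"></a> (no href attribute).
NAMED_ANCHOR_RE = re.compile(
    r'<a\b[^>]*\bname\s*=[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def simplify_link(match):
    """Return the plain-text replacement for one matched link."""
    href, text = match.group(1), match.group(2)
    if href.startswith("#"):
        # Assumption: internal links keep their text, minus the markup.
        return text
    if href.lower().startswith("mailto:"):
        # mailto links become the bare address; the anchor text is dropped.
        return href[len("mailto:"):]
    # External links become the bare URL; the anchor text is dropped.
    return href


def simplify_links(html):
    """Rewrite every link in `html`, then unwrap name-only anchors."""
    html = ANCHOR_RE.sub(simplify_link, html)
    return NAMED_ANCHOR_RE.sub(r"\1", html)
```

Calling `simplify_links('<a href="http://foo/bar.html">Bar</a>')` gives `http://foo/bar.html`. Regular expressions are adequate for a hand-written FAQ with simple markup; for arbitrary HTML a real parser would be the safer choice.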
I have one at my editor, but I don't know how to use it.


Google "regex" or "regular expression" -- lots of links to go
through. If you are going to go this route, then I doubt there's any
acceptable substitute to learning regular expressions. But then, if
your editor has them,

--
Brian
follow the directions in my address to email me

Jul 20 '05 #6
In article <fg************ *************** *****@4ax.com> in
comp.infosystems.www.authoring.html, Akseli Mäki wrote:
Hi,

Hopefully this is not too much offtopic.

I'm working on a FAQ. I want to make two versions of it, plain text and
HTML. I'm looking for a tool that will make a plain text doc out of the
HTML doc. The HTML version doesn't have anything fancy, just internal
links. So the tool must be able to delete internal links and anchors from
the HTML version, but leave external links in simplified form.


Lynx can almost do what you want, and it has the great virtue that
you can do the job with a batch file rather than navigate menus. The
form (from memory; check with "lynx -help") is
lynx -dump URL_or_file >outputfile
A local file can be done either as file:///c:/zonk/file or without
the leading "file:///".
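
The batch idea can also be sketched as a small Python script rather than a DOS batch file. This is a rough sketch under stated assumptions: `lynx` is on the PATH, the helper names are invented for the example, and the only Lynx option used is the `-dump` flag mentioned above.

```python
import subprocess
from pathlib import Path


def lynx_dump_command(source):
    """Build the argument list for `lynx -dump <source>`."""
    return ["lynx", "-dump", str(source)]


def text_path_for(html_path):
    """Return the output path: same name as the input, with a .txt suffix."""
    return Path(html_path).with_suffix(".txt")


def dump_to_text(html_path):
    """Run lynx on one HTML file and save its plain-text dump alongside it."""
    result = subprocess.run(
        lynx_dump_command(html_path),
        capture_output=True,
        text=True,
        check=True,  # raise if lynx exits with an error
    )
    out_path = text_path_for(html_path)
    out_path.write_text(result.stdout, encoding="utf-8")
    return out_path


def convert_all(directory):
    """Convert every .html file in `directory` (hypothetical layout)."""
    return [dump_to_text(page) for page in sorted(Path(directory).glob("*.html"))]
```

Running `convert_all(".")` in the folder that holds the FAQ would write one `.txt` file per `.html` file. Capturing standard output does the same job as the `>outputfile` redirection.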

Lynx will insert bracketed numbers [1], [2], etc. in the text before
each link, then put a list at the end, so you have a record of what
each link is. I don't think it makes any distinction among external
and internal links and mailtos, however. You could postprocess the
output to remove internal links from the link list.

http://www.fdisk.com/doslynx/lynxport.htm
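
The postprocessing step could look something like the Python sketch below. It rests on assumptions that should be checked against real output: that the dump ends with a line reading `References` followed by entries of the form `   1. URL`, and that "external" means a URL starting with `http://`, `https://`, `ftp://` or `mailto:`. The function name is made up for the example.

```python
import re

# Assumption: anything not starting with one of these counts as internal.
EXTERNAL_PREFIXES = ("http://", "https://", "ftp://", "mailto:")
# One entry of the link list, e.g. "   2. http://foo/bar.html".
REF_LINE_RE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s*$")


def filter_lynx_dump(dump):
    """Drop internal links from the link list and their [n] markers."""
    body, sep, ref_block = dump.partition("\nReferences\n")
    if not sep:
        return dump  # no link list found; leave the text untouched

    kept_lines = []
    dropped_numbers = set()
    for line in ref_block.splitlines():
        match = REF_LINE_RE.match(line)
        if not match:
            continue  # blank or unexpected line
        number, url = match.group(1), match.group(2)
        if url.lower().startswith(EXTERNAL_PREFIXES):
            kept_lines.append(line)
        else:
            dropped_numbers.add(number)

    def strip_marker(match):
        # Remove the marker only if its list entry was dropped.
        return "" if match.group(1) in dropped_numbers else match.group(0)

    body = re.sub(r"\[(\d+)\]", strip_marker, body)
    return body + "\nReferences\n\n" + "\n".join(kept_lines) + "\n"
```

The surviving entries keep their original numbers, so the remaining markers in the text still match the list. A literal "[3]" in the FAQ text that is not a link marker would also be removed if entry 3 is dropped, so the result is worth a quick read-through.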

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #7
Jon
Whoops! missed that!!!!

Although this could be sorted with some Word VBA script... not really the
place for that

Jon
Jul 20 '05 #8
Akseli Mäki wrote:

OK, thanks for all the suggestions. I already have Lynx so I'll use it.
Jul 20 '05 #9
Brian wrote:
I'm working on a FAQ. I want to make two versions of it, plain text
and HTML.
May we ask why?

Well, some people might prefer an HTML file, I don't know yet. I might decide
to drop the idea if no one downloads it. Naturally I would post only the
plaintext version to the NG.
And in both cases, you want to *remove* the anchor text, is that right?

Yes.
Jul 20 '05 #10
