Looking for a tool to make plain text document out of a simple HTML document

Hi,

Hopefully this is not too off-topic.

I'm working on a FAQ. I want to make two versions of it, plain text and
HTML. I'm looking for a tool that will make a plain text doc out of the
HTML doc. The HTML version doesn't have anything fancy, just internal
links. So the tool must be able to delete internal links and anchors from
the HTML version, but leave external links in simplified form. That is, the
HTML version would say <a href="http://foo/bar.html">Bar</a> and the plain
text version would just say http://foo/bar.html. The tool should also be
able to make mailto links into plain text, that is, change <a
href="mailto:fo*@bar.com">Foo</a> into fo*@bar.com. In fact, I think both
of these changes might be possible with a regex search and replace engine. I
have one in my editor, but I don't know how to use it. So I'm looking for
either regex strings or a ready-made tool; either will be fine.
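
The two replacements described above (plus dropping internal links and anchors) can be sketched with regular expressions. This is a minimal Python sketch, not a full HTML parser: it assumes double-quoted href values, no nested <a> elements, and that "deleting" an internal link means keeping its visible text while dropping the markup. The names simplify_links, LINK and ANCHOR are made up for this illustration.

```python
import re

# Matches <a ... href="VALUE" ...>LABEL</a>; VALUE is group 1, LABEL is group 2.
LINK = re.compile(r'<a\b[^>]*\bhref\s*=\s*"([^"]*)"[^>]*>(.*?)</a>',
                  re.IGNORECASE | re.DOTALL)
# Matches pure anchors such as <a name="top">text</a> (no href attribute).
ANCHOR = re.compile(r'<a\b(?![^>]*\bhref\b)[^>]*>(.*?)</a>',
                    re.IGNORECASE | re.DOTALL)


def _replace_link(match):
    href, label = match.group(1), match.group(2)
    if href.startswith("#"):
        return label                    # internal link: keep only the label
    if href.lower().startswith("mailto:"):
        return href[len("mailto:"):]    # mail link: keep only the address
    return href                         # external link: keep only the URL


def simplify_links(html):
    """Rewrite <a> elements: URLs for external links, plain text otherwise."""
    text = LINK.sub(_replace_link, html)
    return ANCHOR.sub(r"\1", text)


print(simplify_links('<a href="http://foo/bar.html">Bar</a>'))
```

The same patterns should carry over to most editors' search-and-replace dialogs, although the syntax for back-references and lookaheads varies between regex engines.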
Jul 20 '05
On Mon, 22 Dec 2003, Akseli Mäki wrote:
Ok thanks for all the suggestions. I already have Lynx so I'll use it.


Yes, Lynx does that job very well - except for tables. Real tables, I
mean - it can be actually beneficial what it does with
tables-for-layout, but tabular data can become unusable in Lynx.

(If you're in control of the HTML, I have an ancient web page
about how to make HTML tables which also present acceptably on
Lynx, but it's not really been updated since 1998: if you still
want to read it after that low-key introduction, then
http://ppewww.ph.gla.ac.uk/~flavell/www/tablejob.html

The nobreak-space stuffing technique is the one I would recommend now,
if you want to do anything at all.)
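
For reference, the Lynx conversion discussed here can be scripted. The sketch below assumes a lynx binary on PATH that accepts the -dump and -nolist options; the function names and the file name in the usage note are hypothetical.

```python
import shutil
import subprocess


def lynx_dump_command(html_path, keep_link_list=False):
    """Build the argument list for a non-interactive Lynx text dump."""
    command = ["lynx", "-dump"]
    if not keep_link_list:
        command.append("-nolist")   # suppress the numbered link list at the end
    command.append(html_path)
    return command


def html_to_text(html_path):
    """Return Lynx's plain-text rendering of a local HTML file."""
    if shutil.which("lynx") is None:
        raise RuntimeError("lynx was not found on PATH")
    result = subprocess.run(lynx_dump_command(html_path),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling html_to_text("faq.html") would return the rendered text. Note that with -nolist the link targets are dropped entirely, while without it Lynx numbers the links and appends a list of their URLs, so neither mode writes the URL inline as the original question asked.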
Jul 20 '05 #11
Alan J. Flavell wrote:
On Mon, 22 Dec 2003, Akseli Mäki wrote:
Ok thanks for all the suggestions. I already have Lynx so I'll use it.


Yes, Lynx does that job very well - except for tables. Real tables, I
mean - it can be actually beneficial what it does with
tables-for-layout, but tabular data can become unusable in Lynx.


Lynx actually does a pretty good job with tables these days; it guesses
whether it's a layout table or a real table, so it's not 100% reliable, though.

To take the example from the URI I snipped, it comes out quite happily as:

   Deutsch      British   USA
   Haube        Bonnet    Hood
   Kofferraum   Boot      Trunk
   Benzin       Petrol    Gas(oline)

(With the <th>s rendered in brown)

(That said, it is rather early here, and I only skimmed your document, so I
could be missing something).
--
David Dorward <http://dorward.me.uk/>
Jul 20 '05 #12
In article <Pi*******************************@ppepc56.ph.gla.ac.uk>, one of infinite monkeys
at the keyboard of "Alan J. Flavell" <fl*****@ph.gla.ac.uk> wrote:
Yes, Lynx does that job very well - except for tables. Real tables, I
mean - it can be actually beneficial what it does with
tables-for-layout, but tabular data can become unusable in Lynx.


Um - recent Lynx does a very nice job on tables.
At least, that's my experience with the Lynx bundled in Slackware 9.

--
Nick Kew

In urgent need of paying work - see http://www.webthing.com/~nick/cv.html
Jul 20 '05 #13
On Mon, 22 Dec 2003, David Dorward wrote:
Lynx actually does a pretty good job with tables these days; it guesses
whether it's a layout table or a real table, so it's not 100% reliable, though.
OK, I knew they had been working on it. Care to mention which version
you're using?
(That said, it is rather early here, and I only skimmed your document, so I
could be missing something).


OK, I *did* stress that my page is from about 5 years back. However,
I might remark that there used to be plenty of obsolete versions of
Lynx around. Hmmm, interesting: if I trawl the logs now, the oldest
version of Lynx that seems to be well represented is 2.8.3rel.1,
although I did see just the occasional 2.7.1.

A quick hunt around different machines here didn't show up too many
usable versions, but I tried a couple as below.

This one doesn't space the display out usefully: Lynx 2.8.2rel.1

This one does: Lynx 2.8.4rel.1

So it must have got sorted out somewhere in between.

I think the bottom line is that I withdraw my previous posting,
provided that the user has a recent-enough version of Lynx. Thanks.
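
A "recent-enough" check along these lines can be sketched by parsing the first line of the version output. This assumes the "Lynx Version X.Y.Z..." format that 2.8.4rel.1 prints, and it uses 2.8.4 as the threshold only because that is the version reported here as working; 2.8.3 was not tested in this thread, so it is conservatively treated as too old. The function names are hypothetical.

```python
import re

# Version reported in this thread as spacing table columns usefully.
KNOWN_GOOD = (2, 8, 4)


def parse_lynx_version(version_output):
    """Extract (major, minor, patch) from `lynx --version` output, or None."""
    match = re.search(r"Lynx Version (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def tables_probably_ok(version_output):
    """True if the reported version is at least the known-good 2.8.4."""
    version = parse_lynx_version(version_output)
    return version is not None and version >= KNOWN_GOOD
```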
Jul 20 '05 #14
Alan J. Flavell wrote:
On Mon, 22 Dec 2003, David Dorward wrote:
Lynx actually does a pretty good job with tables these days; it guesses
whether it's a layout table or a real table, so it's not 100% reliable, though.
OK, I knew they had been working on it. Care to mention which version
you're using?


david $ lynx --version
Lynx Version 2.8.4rel.1 (17 Jul 2001)
libwww-FM 2.14, SSL-MM 1.4.1, OpenSSL 0.9.7c
Built on linux-gnu Dec 7 2003 10:16:09
I think the bottom line is that I withdraw my previous posting,
provided that the user has a recent-enough version of Lynx. Thanks.


It's always nice to see browsers improve. (Hint to Microsoft :D)

--
David Dorward <http://dorward.me.uk/>
Jul 20 '05 #15
