converting documents to HTML

can anyone recommend a good tool to convert documents to HTML on the
fly. I need to integrate this tool with a VB app so it must have an
API.

thanks in advance
Davinder
da******@gujral.co.uk
Jul 20 '05
Ok, so my posting was mischievous - I knew there would be howls (and I was
full of beer at the time!). I also know that Word-generated HTML is bloated
with all sorts of stuff - there to support Word features when HTML is
chosen as a document's native format. MS have even provided a filter to
weed it out when no longer needed (well worth investigating). MS is anyway
moving towards XML format, which will offer new opportunities to improve
things.

But does "clean HTML" matter in itself, independently of the purpose of the
page? I don't think so. If you have a page which must load quickly, then
you'd probably optimise by hand, and I doubt anyone who writes a lot of web
pages will pick Word as their editor. But when you have an
occasional document which you'd like to make available on the web, it'll
depend on whether 28 seconds of a visitor's time is worth more than the time
it would take to make the conversion. It'll usually be worth running the
filter, but if the web performance is that important, then Word is not for
you. I'm not immune to prissiness about HTML - one of my sites has a page
which is taken from an Excel spreadsheet, and every time I copy and paste I
shudder momentarily at the redundancy in the resulting code. But it's only
occasionally visited (that's fine by me) and it's not worth additional
effort. Horses for courses - HTML "flower arranging" is not for me without
some real benefit in view.

Darin commented earlier that editing automatically-generated HTML is a
nightmare. So don't do it. The software engineering world is full of
useful products that shrink development times dramatically by generating
code. It's always hideous, but if you're paying for the time of software
engineers that's a compelling deal. The trick is to make sure you do all
editing through the generator, and separate out anything likely to need
hand-optimisation into a separate module.
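
As a rough illustration of that split (a minimal sketch only; the names `HAND_TUNED_HEADER`, `generate_table` and `build_page` are hypothetical and not taken from any product mentioned in this thread), the generated part is rebuilt from its source data on every run, while the hand-optimised part lives in its own fragment that the generator never overwrites:

```python
from html import escape

# Hand-maintained fragment (hypothetical): edited by a person and never
# overwritten by the generator.
HAND_TUNED_HEADER = "<h1>Price list</h1>"


def generate_table(rows):
    """Generated part: rebuilt from the source data on every run.

    Nobody edits this output by hand; to change it, change the data
    (or this function) and regenerate.
    """
    lines = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def build_page(rows):
    """Combine the hand-written fragment with freshly generated output."""
    return HAND_TUNED_HEADER + "\n" + generate_table(rows)


if __name__ == "__main__":
    print(build_page([["Tea", 2], ["Coffee", 3]]))
```

Because the two parts never share a file, regenerating the table cannot destroy hand edits, and the hideous generated markup never has to be read.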

--
######################
## PH, London ##
######################

"Stephen Poley" <sb*****@xs4all.nl> wrote in message
news:1f********************************@4ax.com...
> On Thu, 10 Jul 2003 00:03:37 +0000 (UTC), "Philip Herlihy"
> <fo******@REMOVEherlihy.eu.com> wrote:
>
> > Of course. But who cares?
>
> (Further context vanished because the quoted text was part of the sig.
> Please have a read of http://www.xs4all.nl/~sbpoley/toppost.htm).
>
> Maybe your readers might just care? I tried using Word-generated HTML
> just once. It was horrible. My hand-coded version took 2 seconds to load
> from my local hard disk. The Word-generated version took 30 seconds.
> (That's not a typo - it took about fifteen times as long!!) By the time
> it had come from a server over a modem link, you can be pretty sure that
> most of my visitors would have gone elsewhere.
>
> --
> Stephen Poley
>
> http://www.xs4all.nl/~sbpoley/webmatters/

Jul 20 '05 #11
Spotted this link to an HTML filter for Office 2000. The filter appears to
be built into Office XP.

http://office.microsoft.com/download.../Msohtmf2.aspx

--
######################
## PH, London ##
######################
da******@gujral.co.uk says...
> can anyone recommend a good tool to convert documents to HTML on the
> fly. I need to integrate this tool with a VB app so it must have an
> API.

Jul 20 '05 #12
Yikes:

http://www.microsoft.com/technet/tre...n/MS03-023.asp

--
######################
## PH, London ##
######################

"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote in message
news:be**********@titan.btinternet.com...
> Spotted this link to an HTML filter for Office 2000. The filter appears to be built into Office XP.
>
> http://office.microsoft.com/download.../Msohtmf2.aspx
>
> --
> ######################
> ## PH, London ##
> ######################
> da******@gujral.co.uk says...
> > can anyone recommend a good tool to convert documents to HTML on the
> > fly. I need to integrate this tool with a VB app so it must have an
> > API.


Jul 20 '05 #13
On Thu, 10 Jul 2003 13:32:24 +0000 (UTC), "Philip Herlihy"
<fo******@REMOVEherlihy.eu.com> wrote:
> But does "clean HTML" matter in itself, independently of the purpose of the
> page? I don't think so. If you have a page which must load quickly, then
> you'd probably optimise by hand

When writing a program, one typically starts by doing it in the most
straightforward fashion. If this isn't fast enough, then one starts
optimising.

But in HTML the most straightforward fashion normally *is* the optimised
version. One just has to avoid getting a lot of superfluous crud in
there in the first place.

Clean HTML also typically matters if you want your page to be properly
readable by browsers other than IE.

> But when you have an
> occasional document which you'd like to make available on the web, it'll
> depend on whether 28 seconds of a visitor's time is worth more than the time
> it would take to make the conversion.


Firstly - that 28 seconds was from my example on the local hard disk.
Over a modem we're probably talking about more than a minute extra time.

For the rest - it depends. Occasionally one might indeed resort to a
simple dump from Word. But the original question was "can anyone
recommend a good tool to convert documents to HTML" - note the word
'good' - and in that context Word is hardly appropriate.

--
Stephen Poley

http://www.xs4all.nl/~sbpoley/webmatters/
Jul 20 '05 #14
On Thu, 10 Jul 2003 13:32:24 +0000 (UTC), "Philip Herlihy"
<fo******@REMOVEherlihy.eu.com> wrote:
> I'm not immune to prissiness about HTML - one of my sites has a page
> which is taken from an Excel spreadsheet, and every time I copy and paste I
> shudder momentarily at the redundancy in the resulting code. But it's only
> occasionally visited (that's fine by me) and it's not worth additional
> effort.


A few months ago I found a nifty little program that converts Excel
spreadsheets to clean HTML. It's called XLS2HTML and you can find out
more about it at: http://www.finertechnologies.com/index-xls2html.html

When I got it the download was free. The site above now offers a free
trial version that times out in 10 days. Still, it was a godsend for
me, and still is. A quick look at the site also shows a program
called DOC2HTML. Worth checking out.

Haven't seen this mentioned in the thread, but another option for
converting pages for the web - Adobe Acrobat - .pdf.

Just more of my $.02.

Leslie
Jul 20 '05 #15
"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote in message
news:be**********@titan.btinternet.com...
> Spotted this link to an HTML filter for Office 2000. The filter appears to be built into Office XP.
>
> http://office.microsoft.com/download.../Msohtmf2.aspx


If anyone knows of a super Word code cleaner, I'd love to hear about it.
What I end up having to do is run the HTML filter, then remove all spans,
all divs, and all class and style attributes, and then manually set all the
list items. It's a royal pain (yet a process I've managed to get down to
about 5-10 mins per document).
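
The mechanical part of that cleanup (dropping span and div tags, stripping class and style attributes) could be scripted along the following lines. This is only a sketch under stated assumptions: it uses Python's standard `html.parser` module, the names `WordHtmlStripper` and `strip_word_markup` are made up for illustration, it is meant to run on output that has already been through the HTML filter, and it does nothing about the list items, which would still need attention by hand.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: these are the tags and attributes removed by hand above.
DROP_TAGS = {"span", "div"}
DROP_ATTRS = {"class", "style"}


class WordHtmlStripper(HTMLParser):
    """Re-emit HTML, unwrapping span/div and dropping class/style."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _format_tag(self, tag, attrs, closing=""):
        kept = []
        for name, value in attrs:
            if name in DROP_ATTRS:
                continue
            if value is None:
                kept.append(f" {name}")
            else:
                # The parser unescapes attribute values, so escape them again.
                kept.append(f' {name}="{escape(value, quote=True)}"')
        return f"<{tag}{''.join(kept)}{closing}>"

    def handle_starttag(self, tag, attrs):
        if tag not in DROP_TAGS:
            self.out.append(self._format_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag not in DROP_TAGS:
            self.out.append(self._format_tag(tag, attrs, " /"))

    def handle_endtag(self, tag):
        if tag not in DROP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_word_markup(html_text):
    """Return html_text without span/div tags or class/style attributes."""
    parser = WordHtmlStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

The text inside a removed span or div is kept; only the wrapper tags themselves disappear. Note that the parser lowercases tag and attribute names, so the output will not match the input's capitalisation.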

Jonathan
Jul 20 '05 #16
Thanks for the note, Mark, but I'm not really that interested in this
debate. We'll have to agree to differ.

--
######################
## PH, London ##
######################

"Mark Parnell" <we*******@clarkecomputers.com.au> wrote in message
news:3f***********************@freenews.iinet.net.au...
> Philip Herlihy wrote:
> On top-posting:

Jul 20 '05 #17
In article <qooPa.125390$x4o.46930@news04.bloor.is.net.cable.rogers.com>,
go***************@snook.ca says...
> "Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote in message
> news:be**********@titan.btinternet.com...
> > Spotted this link to an HTML filter for Office 2000. The filter appears .... http://office.microsoft.com/download.../Msohtmf2.aspx
>
> If anyone knows of a super Word code cleaner, I'd love to hear it. What I
> end up having to do is using the HTML filter, then removing all spans, all
> divs, all class and style attributes and then manually set all the list

....

http://www.jafsoft.com/detagger/ will do quite a lot of that.
Jul 20 '05 #18

"Jacqui or (maybe) Pete" <po****@spamcop.net> wrote in message
news:MP************************@news.CIS.DFN.DE...
> > If anyone knows of a super Word code cleaner, I'd love to hear it. What I end up having to do is using the HTML filter, then removing all spans, all divs, all class and style attributes and then manually set all the list
>
> http://www.jafsoft.com/detagger/ will do quite a lot of that.


That does do quite a bit. That, in combination with TidyHTML, gets it 90%
there! :-) Thank you very much for the link.

Jonathan
Jul 20 '05 #19
Tim
On Thu, 10 Jul 2003 12:57:48 +0000 (UTC),
"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote:
> However, I'm not going to stop top-posting, because I strongly
> prefer it, and I'm voting with my postings, as it were.


The most important thing to remember is that if you're posting to seek
solutions, you want to post in a manner that's most likely to get
you *USEFUL* answers.

a. Post the same as the others, i.e. in the preferred style for
where you're posting.

b. Post in a manner that's suitable for the recipients more than
your own prejudices.

c. You're most likely to get the correct information from the old
hands, and many of them will just ignore top-posting.

You're cutting off your own nose to spite your face, with what you've
said.

--
My "from" address is totally fake. (Hint: If I wanted e-mails from
complete strangers, I'd have put a real one, there.) Reply to usenet
postings in the same place as you read the message you're replying to.
Jul 20 '05 #20
