converting documents to HTML

can anyone recommend a good tool to convert documents to HTML on the
fly. I need to integrate this tool with a VB app so it must have an
API.

thanks in advance
Davinder
da******@gujral.co.uk
Jul 20 '05 #1
In article <de**************************@posting.google.com>,
da******@gujral.co.uk says...
> can anyone recommend a good tool to convert documents to HTML on the
> fly. I need to integrate this tool with a VB app so it must have an
> API.

What kind of document?
Jul 20 '05 #2
90% of all docs will be office documents. The other 10% are pdf's, gif, jpeg and bmp

Jul 20 '05 #3
On Wed, 09 Jul 2003 10:14:57 +0200, "Davinder"
<da******@gujral.co.uk> wrote:
> 90% of all docs will be office documents. The other 10% are pdf's, gif,
> jpeg and bmp


Which office documents?

PDFs can be run through any of a number of filters; try googling for
them. GIFs and JPEGs can already be displayed inline by most browsers
that are capable of displaying images. BMPs need to be converted to JPEGs,
PNGs or GIFs.
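
The image advice above can be sketched as a small routing helper. This is only an illustration in Python rather than the VB the original poster is working in; the function name `embed_or_convert` and the two extension sets are assumptions made for the example, not part of any tool mentioned in this thread.

```python
from html import escape
from pathlib import PurePosixPath

# GIF and JPEG are the inline-safe formats named in the post;
# PNG is included because it is one of the suggested conversion targets.
INLINE_OK = {".gif", ".jpg", ".jpeg", ".png"}
# Formats the post says should be converted before going on the web.
NEEDS_CONVERSION = {".bmp"}


def embed_or_convert(filename: str) -> str:
    """Return an <img> tag for inline-safe images, or a note for the rest."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext in INLINE_OK:
        return f'<img src="{escape(filename, quote=True)}" alt="">'
    if ext in NEEDS_CONVERSION:
        return f"convert {filename} to PNG, JPEG or GIF before embedding"
    return f"{filename} is not an image this sketch knows about"
```

The actual BMP conversion would need an imaging library or an external tool, which is deliberately left out here.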

Ciao

Zak

--
========================================================================
http://www.carfolio.com/ Searchable database of 10 000+ car specs
========================================================================
Jul 20 '05 #4
Have you not investigated the object models of each Office app? Word, for
example, gives you access to a Document object, which has a SaveAs method,
one of whose parameters is FileFormat, which can take a value of
wdFormatHTML (this is Word XP). You may also be able to save as Compact HTML by
experimenting with the FileConverters object.

Have a look at this, to get you started.
http://msdn.microsoft.com/library/de...ordObjects.asp
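
As a rough sketch of the SaveAs approach described above, the following Python drives Word through COM. It assumes Windows, an installed copy of Word, and the third-party pywin32 package; the helper names `html_target` and `word_to_html` are made up for this example. The same `Documents.Open` and `SaveAs` calls are available from VB, so treat this as an outline of the calls rather than a finished converter.

```python
from pathlib import PureWindowsPath

# Values from Word's WdSaveFormat enumeration.
WD_FORMAT_HTML = 8            # wdFormatHTML
WD_FORMAT_FILTERED_HTML = 10  # wdFormatFilteredHTML (not in older Word versions)


def html_target(doc_path: str) -> str:
    """Return the path of the .htm file to write next to the source document."""
    return str(PureWindowsPath(doc_path).with_suffix(".htm"))


def word_to_html(doc_path: str, file_format: int = WD_FORMAT_HTML) -> str:
    """Open a document in Word via COM and save it as HTML.

    Needs Windows, an installed copy of Word, and the pywin32 package.
    """
    import win32com.client  # imported lazily so the helpers above work anywhere

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(doc_path)
        out_path = html_target(doc_path)
        doc.SaveAs(out_path, FileFormat=file_format)
        doc.Close(False)  # False: do not prompt to save changes
        return out_path
    finally:
        word.Quit()  # always shut the hidden Word instance down
```

Starting Word for every document is slow, so a real "on the fly" service would probably keep one Word instance alive and reuse it; that is a design choice outside this sketch.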

--
######################
## PH, London ##
######################

Jul 20 '05 #5
In article <be**********@titan.btinternet.com>, fo******@REMOVEherlihy.eu.com
says...
> Have you not investigated the object models of each Office app? Word, for
> example, gives you access to a Document object, which has a SaveAs method,
> one of whose parameters is FileFormat, which can take a value of
> wdFormatHTML (this is Word XP). You may also be able to save as Compact HTML by
> experimenting with the FileConverters object.
>
> Have a look at this, to get you started.
> http://msdn.microsoft.com/library/de...ordObjects.asp

Yes, but the HTML that Word produces is absolute GARBAGE!
Jul 20 '05 #6
Philip,
I have tried the Word object model. It worked well, although I was
looking for something a little more sophisticated. For example,
converting a Word doc with 40+ pages would give me one large HTML file rather
than linked pages.
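
One way to get linked pages out of a single large HTML file is to post-process it. The sketch below splits an HTML body at each `<h1>` and adds Previous/Next links. It assumes every section of the document starts with an `<h1>` heading and that only the body fragment is passed in; the function name `split_into_pages` and the `page1.html` naming scheme are invented for the example, and real Word output (head section, styles, images) would need more care.

```python
import re


def split_into_pages(html_body: str, stem: str = "page") -> dict:
    """Split an HTML body at each <h1> and link the pieces with prev/next."""
    # Zero-width split just before every <h1 ...> start tag.
    parts = re.split(r"(?=<h1\b)", html_body, flags=re.IGNORECASE)
    chunks = [part.strip() for part in parts if part.strip()]
    names = [f"{stem}{i + 1}.html" for i in range(len(chunks))]

    pages = {}
    for i, chunk in enumerate(chunks):
        nav = []
        if i > 0:
            nav.append(f'<a href="{names[i - 1]}">Previous</a>')
        if i < len(chunks) - 1:
            nav.append(f'<a href="{names[i + 1]}">Next</a>')
        pages[names[i]] = (
            "<html><body>\n"
            + chunk
            + "\n<p>" + " | ".join(nav) + "</p>\n"
            + "</body></html>"
        )
    return pages
```

Each key of the returned dictionary is a file name and each value is the page content, so writing the pages to disk is a simple loop over the items.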

Currently I am using net-it-central. This works great, but it's TOO
expensive for us to buy another license.

Davinder Gujral
da******@gujral.co.uk
Jul 20 '05 #7
Of course. But who cares?

--
######################
## PH, London ##
######################

"Mr. Clean" <mr*****@protctorandgamble.com> wrote in message
news:MP************************@news-server.austin.rr.com...
> In article <be**********@titan.btinternet.com>,
> fo******@REMOVEherlihy.eu.com says...
> > Have you not investigated the object models of each Office app? Word, for
> > example, gives you access to a Document object, which has a SaveAs method,
> > one of whose parameters is FileFormat, which can take a value of
> > wdFormatHTML (this is Word XP). You may also be save as Compact HTML by
> > experimenting with the FileConverters object.
> >
> > Have a look at this, to get you started.
> > http://msdn.microsoft.com/library/de...ordObjects.asp
>
> Yes, but the HTML that word produces is absolute GARBAGE!

Jul 20 '05 #8
On Thu, 10 Jul 2003 00:03:37 +0000 (UTC), "Philip Herlihy"
<fo******@REMOVEherlihy.eu.com> wrote:
> Of course. But who cares?


(Further context vanished because the quoted text was part of the sig.
Please have a read of http://www.xs4all.nl/~sbpoley/toppost.htm).

Maybe your readers might just care? I tried using Word-generated HTML
just once. It was horrible. My hand-coded version took 2 seconds to load
from my local hard disk. The Word-generated version took 30 seconds.
(That's not a typo - it took about fifteen times as long!!) By the time
it had come from a server over a modem link, you can be pretty sure that
most of my visitors would have gone elsewhere.

--
Stephen Poley

http://www.xs4all.nl/~sbpoley/webmatters/
Jul 20 '05 #9
On top-posting:

Thanks for the link to that intelligent article, which did make me think
about it again, despite an initial hostile prejudice. In general I ignore
off-topic complaints about posting style as they mostly come from fuss-pots
and are mainly noise - I do acknowledge that your comments here are informed
and useful. However, I'm not going to stop top-posting, because I strongly
prefer it, and I'm voting with my postings, as it were. I also rather like
OE, which happens to make bottom-posting awkward. Even if there was an
option to reverse OE's top-posting into bottom-posting I wouldn't use it.
Some folk will rail against violation of "standards", and always against
Microsoft, but there are more important issues in my own life. I've taken
on board those points the article made about quoting, though.

Amusingly, one of my hobby horses is postings which take you off-topic
without changing the subject line. Tut, tut... :-)

I'll get back to the HTML thread in a reply to Stephen...
--
######################
## PH, London ##
######################

"Darin McGrew" <mc****@stanfordalumni.org> wrote in message
news:be**********@blue.rahul.net...
A: It's backwards and makes discussions harder to follow:
http://www.cs.tut.fi/~jkorpela/usenet/brox.html

Jul 20 '05 #10
Ok, so my posting was mischievous - I knew there would be howls (and I was
full of beer at the time!). I also know that Word-generated HTML is bloated
with all sorts of stuff - there to support Word features when HTML is
chosen as a document's native format. MS have even provided a filter to
weed it out when no longer needed (well worth investigating). MS is anyway
moving towards XML format, which will offer new opportunities to improve
things.

But does "clean HTML" matter in itself, independently of the purpose of the
page? I don't think so. If you have a page which must load quickly, then
you'd probably optimise by hand, and I doubt anyone who writes a lot of web
pages will choose Word as their editor of choice. But when you have an
occasional document which you'd like to make available on the web, it'll
depend on whether 28 seconds of a visitor's time is worth more than the time
it would take to make the conversion. It'll usually be worth running the
filter, but if the web performance is that important, then Word is not for
you. I'm not immune to prissiness about HTML - one of my sites has a page
which is taken from an Excel spreadsheet, and every time I copy and paste I
shudder momentarily at the redundancy in the resulting code. But it's only
occasionally visited (that's fine by me) and it's not worth additional
effort. Horses for courses - HTML "flower arranging" is not for me without
some real benefit in view.

Darin commented earlier that editing automatically-generated HTML is a
nightmare. So don't do it. The software engineering world is full of
useful products that shrink development times dramatically by generating
code. It's always hideous, but if you're paying for the time of software
engineers that's a compelling deal. The trick is to make sure you do all
editing through the generator, and separate out anything likely to need
hand-optimisation into a separate module.

--
######################
## PH, London ##
######################

Jul 20 '05 #11
Spotted this link to an HTML filter for Office 2000. The filter appears to
be built into Office XP.

http://office.microsoft.com/download.../Msohtmf2.aspx

--
######################
## PH, London ##
######################

Jul 20 '05 #12
Yikes:

http://www.microsoft.com/technet/tre...n/MS03-023.asp

--
######################
## PH, London ##
######################


Jul 20 '05 #13
On Thu, 10 Jul 2003 13:32:24 +0000 (UTC), "Philip Herlihy"
<fo******@REMOVEherlihy.eu.com> wrote:
> But does "clean HTML" matter in itself, independently of the purpose of the
> page? I don't think so. If you have a page which must load quickly, then
> you'd probably optimise by hand

When writing a program, one typically starts by doing it in the most
straightforward fashion. If this isn't fast enough, then one starts
optimising.

But in HTML the most straightforward fashion normally *is* the optimised
version. One just has to avoid getting a lot of superfluous crud in
there in the first place.

Clean HTML also typically matters if you want your page to be properly
readable by browsers other than IE.

> But when you have an
> occasional document which you'd like to make available on the web, it'll
> depend on whether 28 seconds of a visitor's time is worth more than the time
> it would take to make the conversion.


Firstly - that 28 seconds was from my example on the local hard disk.
Over a modem we're probably talking about more than a minute extra time.
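
To put rough numbers on the modem point, here is a small back-of-the-envelope calculation. The thread gives no file sizes, so the two page sizes below are purely hypothetical, and the 56 kbit/s line speed is an assumed nominal figure; the formula also ignores latency, compression and protocol overhead.

```python
def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Idealised transfer time: size in bits divided by line speed."""
    return size_bytes * 8 / bits_per_second


MODEM_BPS = 56_000  # assumed nominal modem speed

# Hypothetical page sizes, chosen only to illustrate the arithmetic.
clean_bytes = 20_000
bloated_bytes = 400_000

extra_download = (
    transfer_seconds(bloated_bytes, MODEM_BPS)
    - transfer_seconds(clean_bytes, MODEM_BPS)
)
print(f"extra download time: {extra_download:.1f} s")
```

With these made-up sizes the extra download time alone is a little over 54 seconds, and it comes on top of whatever additional time the browser needs to render the heavier markup.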

For the rest - it depends. Occasionally one might indeed resort to a
simple dump from Word. But the original question was "can anyone
recommend a good tool to convert documents to HTML" - note the word
'good' - and in that context Word is hardly appropriate.

--
Stephen Poley

http://www.xs4all.nl/~sbpoley/webmatters/
Jul 20 '05 #14
On Thu, 10 Jul 2003 13:32:24 +0000 (UTC), "Philip Herlihy"
<fo******@REMOVEherlihy.eu.com> wrote:
> I'm not immune to prissiness about HTML - one of my sites has a page
> which is taken from an Excel spreadsheet, and every time I copy and paste I
> shudder momentarily at the redundancy in the resulting code. But it's only
> occasionally visited (that's fine by me) and it's not worth additional
> effort.


A few months ago I found a nifty little program that converts Excel
spreadsheets to clean HTML. It's called XLS2HTML and you can find out
more about it at: http://www.finertechnologies.com/index-xls2html.html

When I got it the download was free. The site above now offers a free
trial version that times out in 10 days. Still, it was a godsend for
me, and still is. A quick look at the site also shows a program
called DOC2HTML. Worth checking out.

I haven't seen this mentioned in the thread, but another option for
putting documents on the web is Adobe Acrobat (.pdf).

Just more of my $.02.

Leslie
Jul 20 '05 #15
"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote in message
news:be**********@titan.btinternet.com...
> Spotted this link to an HTML filter for Office 2000. The filter appears to
> be built into Office XP.
>
> http://office.microsoft.com/download.../Msohtmf2.aspx


If anyone knows of a super Word code cleaner, I'd love to hear about it. What I
end up having to do is run the HTML filter, then remove all spans, all
divs, and all class and style attributes, and then manually set all the list
items. It's a royal pain (yet a process I've managed to get down to about
5-10 mins per document).
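
The mechanical part of that cleanup (dropping span and div tags and stripping class and style attributes) can be sketched with Python's standard `html.parser` module. The class name `WordHtmlCleaner` and the helper `clean` are invented for this example; it keeps the text inside removed tags, leaves comments and everything else alone, and does nothing about rebuilding list items, which would still be a manual step.

```python
from html import escape
from html.parser import HTMLParser

DROP_TAGS = {"span", "div"}      # tags removed, their contents kept
DROP_ATTRS = {"class", "style"}  # attributes removed from every tag


class WordHtmlCleaner(HTMLParser):
    def __init__(self):
        # Keep entity and character references exactly as written.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_start(self, tag, attrs, closing=""):
        if tag in DROP_TAGS:
            return
        kept = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}{closing}>")

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, closing=" /")

    def handle_endtag(self, tag):
        if tag not in DROP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean(html_text: str) -> str:
    """Return html_text without span/div tags or class/style attributes."""
    parser = WordHtmlCleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Note that the parser lower-cases tag and attribute names, so the output is normalised in that respect as well.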

Jonathan
Jul 20 '05 #16
Thanks for the note, Mark, but I'm not really that interested in this
debate. We'll have to agree to differ.

--
######################
## PH, London ##
######################

"Mark Parnell" <we*******@clarkecomputers.com.au> wrote in message
news:3f***********************@freenews.iinet.net. au...
Philip Herlihy wrote:
On top-posting:

Jul 20 '05 #17
In article <qooPa.125390$x4o.46930@news04.bloor.is.net.cable.rogers.com>,
go***************@snook.ca says...
> "Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote in message
> news:be**********@titan.btinternet.com...
> > Spotted this link to an HTML filter for Office 2000. The filter appears ....
> > http://office.microsoft.com/download.../Msohtmf2.aspx
>
> If anyone knows of a super Word code cleaner, I'd love to hear it. What I
> end up having to do is using the HTML filter, then removing all spans, all
> divs, all class and style attributes and then manually set all the list
>
> ....

http://www.jafsoft.com/detagger/ will do quite a lot of that.
Jul 20 '05 #18

"Jacqui or (maybe) Pete" <po****@spamcop.net> wrote in message
news:MP************************@news.CIS.DFN.DE...
> > If anyone knows of a super Word code cleaner, I'd love to hear it. What I
> > end up having to do is using the HTML filter, then removing all spans, all
> > divs, all class and style attributes and then manually set all the list
>
> http://www.jafsoft.com/detagger/ will do quite a lot of that.


That does do quite a bit. In combination with TidyHTML, it's 90% there!
:-) Thank you very much for the link.

Jonathan
Jul 20 '05 #19
Tim
On Thu, 10 Jul 2003 12:57:48 +0000 (UTC),
"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote:
> However, I'm not going to stop top-posting, because I strongly
> prefer it, and I'm voting with my postings, as it were.


The most important thing to remember is that if you're posting to seek
solutions, you want to post in a manner that's most likely to get
you *USEFUL* answers.

a. Post the same as the others, i.e. in the preferred style for
where you're posting.

b. Post in a manner that's suitable for the recipients more than
your own prejudices.

c. You're most likely to get the correct information from the old
hands, and many of them will just ignore top-posting.

You're cutting off your own nose to spite your face, with what you've
said.

--
My "from" address is totally fake. (Hint: If I wanted e-mails from
complete strangers, I'd have put a real one, there.) Reply to usenet
postings in the same place as you read the message you're replying to.
Jul 20 '05 #20
In message <be**********@hercules.btinternet.com> on Thursday July 10
2003 07:57, Philip Herlihy wrote:

> "Darin McGrew" <mc****@stanfordalumni.org> wrote in message
> news:be**********@blue.rahul.net...
> > A: It's backwards and makes discussions harder to follow:
> > http://www.cs.tut.fi/~jkorpela/usenet/brox.html

[and then at the end of the article]

> > Q. What's wrong with Text Over, Fullquote Under (TOFU) posting?
>
> On top-posting:
>
> Thanks for the link to that intelligent article, which did make me
> think about it again, despite an initial hostile prejudice. In
> general I ignore off-topic complaints about posting style as they
> mostly come from fuss-pots and are mainly noise - I do acknowledge
> that your comments here are informed and useful.


Well, I have found that generally TOFU posts come from clue-lacking
people and are mainly noise.

> However, I'm not going to stop top-posting, because I strongly prefer
> it,

The majority of the rest of Usenet does not, however.

> and I'm voting with my postings, as it were. I also rather like OE,
> which happens to make bottom-posting awkward.

How so?

> Even if there was an option to reverse OE's top-posting into
> bottom-posting I wouldn't use it.

The cursor is at the top of the post so you can edit the quotes down to
the relevant portion before replying. Top-posting has nothing to do
with your software. Your keyboard does have a working down arrow key,
correct? You do know how to type Control-End, correct? (I think that
takes you to the end of the post, I don't use Microsoft products as a
rule but that's the CUA standard key for doing so)

> Some folk will rail against violation of "standards", and always
> against Microsoft, but there are more important issues in my own life.


Have you stopped to think about why?

I often start reading a news article by scrolling down past the quoting.
TOFU articles completely foul this up, because now I have a blank
screen. I have to go all the way back up to find the original text.
Then, to find out what's being replied to, I'm usually out of habit
looking above, only to find the top of the article, so then I scroll
down to look at the quoted article. It's hard to blame a lot of the
Usenet veterans for killfiling the sources of TOFU articles on sight,
and I've been tempted to do so myself on many an occasion.

BTW, to quote from previous articles on a TOFU post with a signature, I
have to manually select the entire article in KNode before replying as
it thoughtfully trims everything below the signature, in this case, all
the quotes so I have no idea what is being replied to as I'm writing
*my* followup (even if I edit it out before posting, I like to have it
there).

--
Shawn K. Quinn
Jul 20 '05 #21
"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote in message
news:<be**********@hercules.btinternet.com>...
> On top-posting:
>
> Thanks for the link to that intelligent article, which did make me think
> about it again, despite an initial hostile prejudice.


I also have some comments on quoting style in my site:

http://mailformat.dan.info/quoting/

--
Dan
Jul 20 '05 #22
