
converting documents to HTML

P: n/a
Can anyone recommend a good tool to convert documents to HTML on the
fly? I need to integrate this tool with a VB app, so it must have an
API.

thanks in advance
Davinder
da******@gujral.co.uk
Jul 20 '05 #1
21 Replies


P: n/a
In article <de**************************@posting.google.com>,
da******@gujral.co.uk says...
can anyone recommend a good tool to convert documents to HTML on the
fly. I need to integrate this tool with a VB app so it must have an
API.

What kind of document?
Jul 20 '05 #2

P: n/a
90% of all docs will be Office documents. The other 10% are PDFs, GIFs, JPEGs and BMPs.

Jacqui or (maybe) Pete <po****@spamcop.net> wrote in message news:<MP************************@news.CIS.DFN.DE>...
In article <de**************************@posting.google.com>,
da******@gujral.co.uk says...
can anyone recommend a good tool to convert documents to HTML on the
fly. I need to integrate this tool with a VB app so it must have an
API.

What kind of document?

Jul 20 '05 #3

P: n/a
On Wed, 09 Jul 2003 10:14:57 +0200, Davinder <"Davinder"
<da******@gujral.co.uk>> wrote:
90% of all docs will be office documents. The other 10% are pdf's, gif,
jpeg and bmp


Which office documents?

pdfs can be run through any of a number of filters. Try googling for
them. .gifs and jpegs can already be displayed inline by most browsers
that are capable of displaying images. .bmps need to be converted to jpgs
or pngs or gifs.
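
The per-type handling described above can be sketched as a small router keyed on file extension. This is only a rough illustration in Python: the Office extensions and the strategy names are assumptions, since the thread never says which Office documents are involved.

```python
from pathlib import Path

# Hypothetical routing table based on the file types named in this thread.
# The Office extensions are an assumption; the thread does not list them.
OFFICE = {".doc", ".xls", ".ppt"}
INLINE_IMAGES = {".gif", ".jpg", ".jpeg", ".png"}


def plan_for(filename: str) -> str:
    """Return a rough handling strategy for one file, keyed on its extension."""
    ext = Path(filename).suffix.lower()
    if ext in OFFICE:
        return "convert-office"   # needs a real converter
    if ext == ".pdf":
        return "pdf-filter"       # run through a PDF-to-HTML filter
    if ext in INLINE_IMAGES:
        return "inline-image"     # browsers can display these directly
    if ext == ".bmp":
        return "convert-image"    # convert to JPG, PNG or GIF first
    return "unsupported"
```

For example, plan_for("scan.bmp") gives "convert-image", while plan_for("photo.jpeg") gives "inline-image".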

Ciao

Zak

--
========================================================================
http://www.carfolio.com/ Searchable database of 10 000+ car specs
========================================================================
Jul 20 '05 #4

P: n/a
Have you not investigated the object models of each Office app? Word, for
example, gives you access to a Document object, which has a SaveAs method,
one of whose parameters is FileFormat, which can take a value of
wdFormatHTML (this is Word XP). You may also be able to save as Compact HTML by
experimenting with the FileConverters object.

Have a look at this, to get you started.
http://msdn.microsoft.com/library/de...ordObjects.asp
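
As a rough illustration of the SaveAs route described above, here is a minimal sketch that drives Word over COM from Python. It assumes Windows with Word installed and the third-party pywin32 package; the helper names are hypothetical. The same Documents.Open and SaveAs calls are exposed to VB through the Word object model, so this is only meant to show the shape of the calls, not a finished converter.

```python
# Numeric values of Word's WdSaveFormat constants.
WD_FORMAT_HTML = 8            # wdFormatHTML
WD_FORMAT_FILTERED_HTML = 10  # wdFormatFilteredHTML (Word 2002/XP and later)


def html_format_code(filtered: bool = False) -> int:
    """Pick the FileFormat value to pass to Document.SaveAs."""
    return WD_FORMAT_FILTERED_HTML if filtered else WD_FORMAT_HTML


def word_to_html(src: str, dst: str, filtered: bool = False) -> None:
    """Open src in Word and save it as HTML at dst (use absolute paths).

    Assumption: Windows, Word installed, and pywin32 available.
    """
    import win32com.client  # imported lazily so the module loads anywhere

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(src)
        try:
            doc.SaveAs(dst, FileFormat=html_format_code(filtered))
        finally:
            doc.Close(False)  # close without saving changes to the source
    finally:
        word.Quit()
```

A call would look something like word_to_html(r"C:\docs\report.doc", r"C:\out\report.html", filtered=True); the paths here are made up.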

--
######################
## PH, London ##
######################

"Davinder" <da******@gujral.co.uk> wrote in message
news:de**************************@posting.google.com...
90% of all docs will be office documents. The other 10% are pdf's, gif, jpeg and bmp
Jacqui or (maybe) Pete <po****@spamcop.net> wrote in message

news:<MP************************@news.CIS.DFN.DE>...
In article <de**************************@posting.google.com>,
da******@gujral.co.uk says...
can anyone recommend a good tool to convert documents to HTML on the
fly. I need to integrate this tool with a VB app so it must have an
API.

What kind of document?

Jul 20 '05 #5

P: n/a
In article <be**********@titan.btinternet.com>, fo******@REMOVEherlihy.eu.com
says...
Have you not investigated the object models of each Office app? Word, for
example, gives you access to a Document object, which has a SaveAs method,
one of whose parameters is FileFormat, which can take a value of
wdFormatHTML (this is Word XP). You may also be able to save as Compact HTML by
experimenting with the FileConverters object.

Have a look at this, to get you started.
http://msdn.microsoft.com/library/de...ordObjects.asp

Yes, but the HTML that word produces is absolute GARBAGE!
Jul 20 '05 #6

P: n/a
Philip.
I have tried the Word object model... it worked well, although I was
looking for something a little more sophisticated. For example,
converting a Word doc with 40+ pages would give me one large HTML file rather
than linked pages.
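
One possible way to get linked pages out of a single large HTML file is to post-process it. The sketch below, in Python, assumes each section of the document starts with an h1 heading (an assumption about the document, not something Word guarantees) and works on the body markup only. The function name and the page naming scheme are hypothetical.

```python
import re


def split_into_pages(body_html: str, stem: str = "page") -> dict:
    """Split an HTML body at each <h1> and link the parts with Previous/Next."""
    # Zero-width split just before every <h1 ...> tag; drop empty chunks.
    chunks = [c for c in re.split(r"(?i)(?=<h1\b)", body_html) if c.strip()]
    names = [f"{stem}{i}.html" for i in range(1, len(chunks) + 1)]
    pages = {}
    for i, chunk in enumerate(chunks):
        nav = []
        if i > 0:
            nav.append(f'<a href="{names[i - 1]}">Previous</a>')
        if i < len(chunks) - 1:
            nav.append(f'<a href="{names[i + 1]}">Next</a>')
        # Append a simple navigation paragraph to the end of each part.
        pages[names[i]] = chunk + "<p>" + " | ".join(nav) + "</p>"
    return pages
```

Each value in the returned dict still needs wrapping in a full HTML page (head, title, body) before being written to disk.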

Currently I am using net-it-central... this works great, but it's TOO
expensive for us to buy another license.

Davinder Gujral
da******@gujral.co.uk
"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote in message news:<be**********@titan.btinternet.com>...
Have you not investigated the object models of each Office app? Word, for
example, gives you access to a Document object, which has a SaveAs method,
one of whose parameters is FileFormat, which can take a value of
wdFormatHTML (this is Word XP). You may also be able to save as Compact HTML by
experimenting with the FileConverters object.

Have a look at this, to get you started.
http://msdn.microsoft.com/library/de...ordObjects.asp

--
######################
## PH, London ##
######################

"Davinder" <da******@gujral.co.uk> wrote in message
news:de**************************@posting.google.com...
90% of all docs will be office documents. The other 10% are pdf's, gif,

jpeg and bmp

Jacqui or (maybe) Pete <po****@spamcop.net> wrote in message

news:<MP************************@news.CIS.DFN.DE>...
In article <de**************************@posting.google.com>,
da******@gujral.co.uk says...
> can anyone recommend a good tool to convert documents to HTML on the
> fly. I need to integrate this tool with a VB app so it must have an
> API.
What kind of document?

Jul 20 '05 #7

P: n/a
Of course. But who cares?

--
######################
## PH, London ##
######################

"Mr. Clean" <mr*****@protctorandgamble.com> wrote in message
news:MP************************@news-server.austin.rr.com...
In article <be**********@titan.btinternet.com>, fo******@REMOVEherlihy.eu.com says...
Have you not investigated the object models of each Office app? Word, for example, gives you access to a Document object, which has a SaveAs method, one of whose parameters is FileFormat, which can take a value of
wdFormatHTML (this is Word XP). You may also be able to save as Compact HTML by
experimenting with the FileConverters object.

Have a look at this, to get you started.
http://msdn.microsoft.com/library/de...ordObjects.asp

Yes, but the HTML that word produces is absolute GARBAGE!

Jul 20 '05 #8

P: n/a
On Thu, 10 Jul 2003 00:03:37 +0000 (UTC), "Philip Herlihy"
<fo******@REMOVEherlihy.eu.com> wrote:
Of course. But who cares?


(Further context vanished because the quoted text was part of the sig.
Please have a read of http://www.xs4all.nl/~sbpoley/toppost.htm).

Maybe your readers might just care? I tried using Word-generated HTML
just once. It was horrible. My hand-coded version took 2 seconds to load
from my local hard disk. The Word-generated version took 30 seconds.
(That's not a typo - it took about fifteen times as long!!) By the time
it had come from a server over a modem link, you can be pretty sure that
most of my visitors would have gone elsewhere.

--
Stephen Poley

http://www.xs4all.nl/~sbpoley/webmatters/
Jul 20 '05 #9

P: n/a
On top-posting:

Thanks for the link to that intelligent article, which did make me think
about it again, despite an initial hostile prejudice. In general I ignore
off-topic complaints about posting style as they mostly come from fuss-pots
and are mainly noise - I do acknowledge that your comments here are informed
and useful. However, I'm not going to stop top-posting, because I strongly
prefer it, and I'm voting with my postings, as it were. I also rather like
OE, which happens to make bottom-posting awkward. Even if there was an
option to reverse OE's top-posting into bottom-posting I wouldn't use it.
Some folk will rail against violation of "standards", and always against
Microsoft, but there are more important issues in my own life. I've taken
on board those points the article made about quoting, though.

Amusingly, one of my hobby horses is postings which take you off-topic
without changing the subject line. Tut, tut... :-)

I'll get back to the HTML thread in a reply to Stephen...
--
######################
## PH, London ##
######################

"Darin McGrew" <mc****@stanfordalumni.org> wrote in message
news:be**********@blue.rahul.net...
A: It's backwards and makes discussions harder to follow:
http://www.cs.tut.fi/~jkorpela/usenet/brox.html

Jul 20 '05 #10

P: n/a
Ok, so my posting was mischievous - I knew there would be howls (and I was
full of beer at the time!). I also know that Word-generated HTML is bloated
with all sorts of stuff - there to support Word features when HTML is
chosen as a document's native format. MS have even provided a filter to
weed it out when no longer needed (well worth investigating). MS is anyway
moving towards XML format, which will offer new opportunities to improve
things.

But does "clean HTML" matter in itself, independently of the purpose of the
page? I don't think so. If you have a page which must load quickly, then
you'd probably optimise by hand, and I doubt anyone who writes a lot of web
pages will choose Word as their editor of choice. But when you have an
occasional document which you'd like to make available on the web, it'll
depend on whether 28 seconds of a visitor's time is worth more than the time
it would take to make the conversion. It'll usually be worth running the
filter, but if the web performance is that important, then Word is not for
you. I'm not immune to prissiness about HTML - one of my sites has a page
which is taken from an Excel spreadsheet, and every time I copy and paste I
shudder momentarily at the redundancy in the resulting code. But it's only
occasionally visited (that's fine by me) and it's not worth additional
effort. Horses for courses - HTML "flower arranging" is not for me without
some real benefit in view.

Darin commented earlier that editing automatically-generated HTML is a
nightmare. So don't do it. The software engineering world is full of
useful products that shrink development times dramatically by generating
code. It's always hideous, but if you're paying for the time of software
engineers that's a compelling deal. The trick is to make sure you do all
editing through the generator, and separate out anything likely to need
hand-optimisation into a separate module.

--
######################
## PH, London ##
######################

"Stephen Poley" <sb*****@xs4all.nl> wrote in message
news:1f********************************@4ax.com...
On Thu, 10 Jul 2003 00:03:37 +0000 (UTC), "Philip Herlihy"
<fo******@REMOVEherlihy.eu.com> wrote:
Of course. But who cares?


(Further context vanished because the quoted text was part of the sig.
Please have a read of http://www.xs4all.nl/~sbpoley/toppost.htm).

Maybe your readers might just care? I tried using Word-generated HTML
just once. It was horrible. My hand-coded version took 2 seconds to load
from my local hard disk. The Word-generated version took 30 seconds.
(That's not a typo - it took about fifteen times as long!!) By the time
it had come from a server over a modem link, you can be pretty sure that
most of my visitors would have gone elsewhere.

--
Stephen Poley

http://www.xs4all.nl/~sbpoley/webmatters/

Jul 20 '05 #11

P: n/a
Spotted this link to an HTML filter for Office 2000. The filter appears to
be built into Office XP.

http://office.microsoft.com/download.../Msohtmf2.aspx

--
######################
## PH, London ##
######################
da******@gujral.co.uk says...
can anyone recommend a good tool to convert documents to HTML on the
fly. I need to integrate this tool with a VB app so it must have an
API.

Jul 20 '05 #12

P: n/a
Yikes:

http://www.microsoft.com/technet/tre...n/MS03-023.asp

--
######################
## PH, London ##
######################

"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote in message
news:be**********@titan.btinternet.com...
Spotted this link to an HTML filter for Office 2000. The filter appears to be built into Office XP.

http://office.microsoft.com/download.../Msohtmf2.aspx

--
######################
## PH, London ##
######################
da******@gujral.co.uk says...
> can anyone recommend a good tool to convert documents to HTML on the
> fly. I need to integrate this tool with a VB app so it must have an
> API.


Jul 20 '05 #13

P: n/a
On Thu, 10 Jul 2003 13:32:24 +0000 (UTC), "Philip Herlihy"
<fo******@REMOVEherlihy.eu.com> wrote:
But does "clean HTML" matter in itself, independently of the purpose of the
page? I don't think so. If you have a page which must load quickly, then
you'd probably optimise by hand
When writing a program, one typically starts by doing it in the most
straightforward fashion. If this isn't fast enough, then one starts
optimising.

But in HTML the most straightforward fashion normally *is* the optimised
version. One just has to avoid getting a lot of superfluous crud in
there in the first place.

Clean HTML also typically matters if you want your page to be properly
readable by browsers other than IE.
But when you have an
occasional document which you'd like to make available on the web, it'll
depend on whether 28 seconds of a visitor's time is worth more than the time
it would take to make the conversion.


Firstly - that 28 seconds was from my example on the local hard disk.
Over a modem we're probably talking about more than a minute extra time.

For the rest - it depends. Occasionally one might indeed resort to a
simple dump from Word. But the original question was "can anyone
recommend a good tool to convert documents to HTML" - note the word
'good' - and in that context Word is hardly appropriate.

--
Stephen Poley

http://www.xs4all.nl/~sbpoley/webmatters/
Jul 20 '05 #14

P: n/a
On Thu, 10 Jul 2003 13:32:24 +0000 (UTC), "Philip Herlihy"
<fo******@REMOVEherlihy.eu.com> wrote:
I'm not immune to prissiness about HTML - one of my sites has a page
which is taken from an Excel spreadsheet, and every time I copy and paste I
shudder momentarily at the redundancy in the resulting code. But it's only
occasionally visited (that's fine by me) and it's not worth additional
effort.


A few months ago I found a nifty little program that converts Excel
spreadsheets to clean HTML. It's called XLS2HTML and you can find out
more about it at: http://www.finertechnologies.com/index-xls2html.html

When I got it the download was free. The site above now offers a free
trial version that times out in 10 days. Still, it was a godsend for
me, and still is. A quick look at the site also shows a program
called DOC2HTML. Worth checking out...

Haven't seen this mentioned in the thread, but another option for
converting pages for the web - Adobe Acrobat - .pdf.

Just more of my $.02.

Leslie
Jul 20 '05 #15

P: n/a
"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote in message
news:be**********@titan.btinternet.com...
Spotted this link to an HTML filter for Office 2000. The filter appears to be built into Office XP.

http://office.microsoft.com/download.../Msohtmf2.aspx


If anyone knows of a super Word code cleaner, I'd love to hear about it. What I
end up having to do is run the HTML filter, then remove all spans, all
divs, and all class and style attributes, and then manually set all the list
items. It's a royal pain (yet a process I've managed to get down to about
5-10 mins per document).
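
The mechanical part of that clean-up (dropping spans, divs, and class and style attributes) can be roughed out with Python's standard html.parser module. This is only a sketch under the assumption that the input has already been through the HTML filter. The class and function names are hypothetical, it does not rebuild list items, and as a side effect it also drops comments and the doctype.

```python
from html import escape
from html.parser import HTMLParser


class WordHtmlCleaner(HTMLParser):
    """Re-emit HTML without span/div tags and without class/style attributes."""

    DROP_TAGS = {"span", "div"}
    DROP_ATTRS = {"class", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _attrs(self, attrs):
        # Keep every attribute except the dropped ones, re-escaping values.
        return "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
            if name not in self.DROP_ATTRS
        )

    def handle_starttag(self, tag, attrs):
        if tag not in self.DROP_TAGS:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag not in self.DROP_TAGS:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag not in self.DROP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")  # pass named entities through unchanged

    def handle_charref(self, name):
        self.out.append(f"&#{name};")  # pass numeric references through


def clean(html: str) -> str:
    """Return html with the unwanted tags and attributes removed."""
    parser = WordHtmlCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The text inside a dropped span or div is kept; only the tags themselves go.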

Jonathan
Jul 20 '05 #16

P: n/a
Thanks for the note, Mark, but I'm not really that interested in this
debate. We'll have to agree to differ.

--
######################
## PH, London ##
######################

"Mark Parnell" <we*******@clarkecomputers.com.au> wrote in message
news:3f***********************@freenews.iinet.net.au...
Philip Herlihy wrote:
On top-posting:

Jul 20 '05 #17

P: n/a
In article <qooPa.125390$x4o.46930
@news04.bloor.is.net.cable.rogers.com>, go***************@snook.ca
says...
"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote in message
news:be**********@titan.btinternet.com...
Spotted this link to an HTML filter for Office 2000. The filter appears .... http://office.microsoft.com/download.../Msohtmf2.aspx


If anyone knows of a super Word code cleaner, I'd love to hear it. What I
end up having to do is using the HTML filter, then removing all spans, all
divs, all class and style attributes and then manually set all the list

....

http://www.jafsoft.com/detagger/ will do quite a lot of that.
Jul 20 '05 #18

P: n/a

"Jacqui or (maybe) Pete" <po****@spamcop.net> wrote in message
news:MP************************@news.CIS.DFN.DE...
If anyone knows of a super Word code cleaner, I'd love to hear it. What I end up having to do is using the HTML filter, then removing all spans, all divs, all class and style attributes and then manually set all the list


http://www.jafsoft.com/detagger/ will do quite a lot of that.


That does do quite a bit. In combination with TidyHTML, it's 90% there!
:-) Thank you very much for the link.

Jonathan
Jul 20 '05 #19

P: n/a
Tim
On Thu, 10 Jul 2003 12:57:48 +0000 (UTC),
"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote:
However, I'm not going to stop top-posting, because I strongly
prefer it, and I'm voting with my postings, as it were.


The most important thing to remember is that if you're posting seeking
solutions, then you want to post in a manner that's most likely to get
you *USEFUL* answers.

a. Post the same as the others, i.e. in the preferred style for
where you're posting.

b. Post in a manner that's suitable for the recipients more than
your own prejudices.

c. You're most likely to get the correct information from the old
hands, and many of them will just ignore top-posting.

You're cutting off your own nose to spite your face, with what you've
said.

--
My "from" address is totally fake. (Hint: If I wanted e-mails from
complete strangers, I'd have put a real one, there.) Reply to usenet
postings in the same place as you read the message you're replying to.
Jul 20 '05 #20

P: n/a
In message <be**********@hercules.btinternet.com> on Thursday July 10
2003 07:57, Philip Herlihy wrote:
"Darin McGrew" <mc****@stanfordalumni.org> wrote in message
news:be**********@blue.rahul.net...
A: It's backwards and makes discussions harder to follow:
http://www.cs.tut.fi/~jkorpela/usenet/brox.html [and then at the end of the article] Q. What's wrong with Text Over, Fullquote Under (TOFU) posting?
On top-posting:

Thanks for the link to that intelligent article, which did make me
think about it again, despite an initial hostile prejudice. In
general I ignore off-topic complaints about posting style as they
mostly come from fuss-pots and are mainly noise - I do acknowledge
that your comments here are informed and useful.


Well, I have found that generally TOFU posts come from clue-lacking
people and are mainly noise.
However, I'm not going to stop top-posting, because I strongly prefer
it,
The majority of the rest of Usenet does not, however.
and I'm voting with my postings, as it were. I also rather like OE,
which happens to make bottom-posting awkward.
How so?
Even if there was an option to reverse OE's top-posting into
bottom-posting I wouldn't use it.
The cursor is at the top of the post so you can edit the quotes down to
the relevant portion before replying. Top-posting has nothing to do
with your software. Your keyboard does have a working down arrow key,
correct? You do know how to type Control-End, correct? (I think that
takes you to the end of the post, I don't use Microsoft products as a
rule but that's the CUA standard key for doing so)
Some folk will rail against violation of "standards", and always
against Microsoft, but there are more important issues in my own life.


Have you stopped to think about why?

I often start reading a news article by scrolling down past the quoting.
TOFU articles completely foul this up, because now I have a blank
screen. I have to go all the way back up to find the original text.
Then, to find out what's being replied to, I'm usually out of habit
looking above, only to find the top of the article, so then I scroll
down to look at the quoted article. It's hard to blame a lot of the
Usenet veterans for killfiling the sources of TOFU articles on sight,
and I've been tempted to do so myself on many an occasion.

BTW, to quote from previous articles on a TOFU post with a signature, I
have to manually select the entire article in KNode before replying as
it thoughtfully trims everything below the signature, in this case, all
the quotes so I have no idea what is being replied to as I'm writing
*my* followup (even if I edit it out before posting, I like to have it
there).

--
Shawn K. Quinn
Jul 20 '05 #21

P: n/a
"Philip Herlihy" <fo******@REMOVEherlihy.eu.com> wrote in message news:<be**********@hercules.btinternet.com>...
On top-posting:

Thanks for the link to that intelligent article, which did make me think
about it again, despite an initial hostile prejudice.


I also have some comments on quoting style in my site:

http://mailformat.dan.info/quoting/

--
Dan
Jul 20 '05 #22
