PDF to HTML conversion help...

Hello,

I tried the following post a few weeks ago and never received any replies,
so I figured I'd try again.

I'm seeking suggestions for an interesting problem I have. I'm building a
web application that presents legal documents that users must accept in order
to proceed on the site. The documents are in PDF format. The legal
department requires that I convert the PDF docs to HTML and display them on a
web page. I can't just display the PDF document (which would be simple
enough) but rather need to convert the PDF to HTML. This will allow me to
include the standard "I agree..." checkbox and submit button at the bottom of
the page. Users will only be able to get to the submit button by scrolling
down the entire HTML page, and thus theoretically reading the entire
agreement. I found a tool called "ABC Amber PDF Converter" that converts PDF
files to HTML, but the output quality leaves a LOT to be desired; the
result is unusable. Plus, it requires shelling out to launch the
command-line exe. Do you have any other ideas, or know of any other tools
that might do the trick?

Thanks for your help,
Greg S.
Nov 18 '05 #1
I know Adobe Acrobat Writer lets you embed forms in a PDF. You could
easily create a form that posts to your server and insert the data.

hope this helps,
Anthony

--
Anthony J Biondo Jr. - MCP
Senior Web Developer
Keystone Mercy Health Plan - Philadelphia, PA
www.keystonemercy.com
Nov 18 '05 #2
Hello Anthony,

Thanks for your suggestion. Actually, I'm using a tool named PDFKit.Net to
generate the PDF. Basically, I'm doing what you suggested in that I have a
PDF Form in which I dynamically insert data from the db. The only problem is
that the legal department wants me to email the customer the PDF version of
the document I generate, but display an HTML version of it on the web page.
Their requirement is that I display the whole thing as one long document
(with no scroll bars) so that the user is forced to scroll down the entire
page to get to the accept checkbox and submit button. They won't let me just
display the PDF document in a web page or in some sort of viewer control. So
I'm trying to find some way to convert the PDF document to HTML while keeping
most of the formatting intact. Hopefully there's something out there that
does this...

Thanks for your help,
Greg
Nov 18 '05 #3
You can buy Adobe Acrobat & convert them that way.

Or a better solution would probably be to embed the PDF into an HTML page
that has the "I Agree" button in it. That way you don't have to convert
anything at all. Search on "embed pdf html" -- here's a result I found
http://www.planetpdf.com/mainpage.asp?webpageid=1682

--
Ben Strackany
www.developmentnow.com
Nov 18 '05 #4
Ben,

Thanks for your suggestions. Unfortunately, because the server my web app
sits on is used for various other sites as well, and Adobe Acrobat tends to
be a bit of a resource hog, my bosses don't want to install it on the server.
Plus, since I will be generating the documents on the fly, I'm not sure how
Acrobat would handle simultaneous requests. I know MS doesn't recommend
using Office products in this fashion.

As for embedding the PDF in the HTML page, I thought of that one, too. The
problem is the legal department doesn't want it to be inside of a separate
control with scrollbars, as this conceivably would allow the user to scroll
down the page to get to the "I Agree" button without actually scrolling
through the legal agreement, so that one was a no go.

I really do appreciate your input, though.

Greg
Nov 18 '05 #5
webgreginsf wrote:
Hello Anthony,

Thanks for your suggestion. Actually, I'm using a tool named
PDFKit.Net to generate the PDF.


You are *generating* those PDFs yourself? Why not generate the
HTML version from the same data, instead of first going to PDF
and then to HTML?
If you want output in PDF format, use your existing code;
if you want output in HTML, generate that directly from the data,
exactly the way *you* want it.

Hans Kesting
Nov 18 '05 #6
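A minimal sketch of the suggestion above, in Python rather than the .NET stack used in the thread. The template text, the field names (user_name, company), and the /accept form target are all hypothetical; the point is only that the same per-user data that fills the PDF form can fill an HTML template directly, with the "I agree" checkbox placed at the end of the document.

```python
from html import escape
from string import Template

# Hypothetical HTML version of the agreement; $-placeholders mark the
# user-specific values that the PDF form also receives.
AGREEMENT_HTML = Template("""\
<h1>Service Agreement</h1>
<p>This agreement is between $company and $user_name.</p>
<form method="post" action="/accept">
  <label><input type="checkbox" name="agree" required> I agree</label>
  <button type="submit">Submit</button>
</form>
""")


def render_agreement(fields: dict[str, str]) -> str:
    """Fill the HTML template, escaping each value so user data cannot inject markup."""
    safe = {name: escape(value) for name, value in fields.items()}
    # substitute() raises KeyError if a placeholder has no matching value.
    return AGREEMENT_HTML.substitute(safe)


page = render_agreement({"user_name": "Greg S.", "company": "Example & Co."})
print(page)
```

Because the form sits at the bottom of one long page, there is no inner scroll area to skip past, which is what the legal department asked for.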
Hans,

Thanks for your idea. That is actually a very good idea and might be what I
have to do. The only reason I was trying to steer clear of generating the
HTML is that these documents I'm generating/displaying are 30+ page monster
legal contracts that can change fairly often. The legal department creates
the originals in MS Word, which we then convert to a PDF form using Acrobat
Writer. This form serves as a "template" that I place out on the web server
and populate certain user-specific pieces of data like the user's name,
company info, etc. at run time. We were hoping to automate the process so
that the users can generate the PDF templates without any manual intervention
from my team. If I use the HTML method like you talked about, I'll probably
have to manually convert the monsters to HTML on a fairly regular basis.
They don't want MS Word on the server, so I can't just use the built-in
Word-to-HTML conversion: the HTML it generates uses Word-specific tags
that require Word to be on the web server.

I think you're right, though, and doing the HTML myself is probably the best
option, so that I'll have two templates for each doc (both PDF and HTML
versions). I can then dynamically fill in the same info into both of them at
run time. It requires some manual steps, but hey, something is better than
nothing, right?

Thanks for your help,
Greg
Nov 18 '05 #7
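With two hand-maintained templates per document, the main risk is that they drift apart when the contract changes. A rough sketch of a check that could run whenever a template is updated is shown here in Python. It assumes the HTML template uses $name or ${name} placeholders and that the list of PDF form field names has already been obtained from whatever PDF library is in use; both of those are assumptions, not something stated in the thread.

```python
import re

# Matches $name and ${name} placeholders (hypothetical template convention).
# Note: a literal "$$name" in the text would also match; this is only a sketch.
PLACEHOLDER = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)\}?")


def html_placeholders(template_text: str) -> set[str]:
    """Return the set of placeholder names found in the HTML template."""
    return set(PLACEHOLDER.findall(template_text))


def template_mismatch(pdf_field_names, html_template_text):
    """Return (fields only in the PDF form, fields only in the HTML template)."""
    pdf_fields = set(pdf_field_names)
    html_fields = html_placeholders(html_template_text)
    return pdf_fields - html_fields, html_fields - pdf_fields


only_in_pdf, only_in_html = template_mismatch(
    ["user_name", "company", "signing_date"],
    "<p>$user_name of ${company} accepts these terms.</p>",
)
print("Missing from HTML:", sorted(only_in_pdf))
print("Missing from PDF:", sorted(only_in_html))
```

If either set is non-empty, one of the two templates was updated without the other.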
If you are using one of the latest versions of Office, save the document as
XML and transform that file into HTML on the server.
Nov 18 '05 #8
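A rough illustration of that idea, in Python. The post does not say which Office version or XML format is meant; this sketch assumes the Word 2003 XML format (WordprocessingML 2003), where paragraphs are w:p elements and text runs are w:t elements. The sample document is heavily simplified, and a real contract would also need handling for headings, lists, tables, and bold or italic runs.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace of Word 2003 XML documents (assumed format; newer .docx files
# use a different namespace and are zipped).
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"

SAMPLE = """\
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>1. Acceptance of Terms</w:t></w:r></w:p>
    <w:p><w:r><w:t>By using the site, </w:t></w:r><w:r><w:t>you agree to these terms &amp; conditions.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>
"""


def wordml_to_html(xml_text: str) -> str:
    """Convert each Word paragraph to an HTML <p>, dropping all formatting."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iter(f"{W}p"):
        # A paragraph's text is split across runs; join them back together.
        text = "".join(t.text or "" for t in p.iter(f"{W}t"))
        if text.strip():
            paragraphs.append(f"<p>{escape(text)}</p>")
    return "\n".join(paragraphs)


html_body = wordml_to_html(SAMPLE)
print(html_body)
```

This runs without Word installed on the server, since it only reads the saved XML file.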
