html to pdf

Hi
We have HTML datasheets, but now we want them in PDF format because
the page layout is very bad when HTML is printed.

I am through with the XML and XSLT part, but I don't have any idea of XSL-FO.
I guess I would need to use "fo" tags in the XSLT. Could someone
suggest some good reference material, or pointers for this?

It's a bit urgent, please do help.

Best Regards
Surbhi
Jun 27 '08 #1
http://www.antennahouse.com/XSLsampl...l-xhtml2fo.zip

--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
Jun 27 '08 #2
On 13 May, 10:23, Surbhi <surb...@gmail.com> wrote:
> We have HTML datasheets, but now we want them in PDF format because
> the page layout is very bad when HTML is printed.
I wouldn't give up on printed HTML, but I can understand how you're
thinking here.

You have two options. One is "print HTML to PDF", using tools such as
Adobe's (expensive) or Foxit's (simpler, free).

Otherwise, go down the XSLT, XSL:FO, PDF route. You'll probably find
Apache FOP to be the easiest route from XSL:FO to PDF. I do this a lot (a
quarter of my working day) and host it all within Java, with Ant as a
"make" framework to glue it all together. For bigger systems, Cocoon
or Apache Forrest are worth looking at too.
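For reference, the XSL-FO that such a pipeline hands to FOP always has the same basic skeleton: a layout-master-set describing page geometry, then a page-sequence whose flow holds the content as fo:block elements. The following is a minimal Python sketch (standard library only) that builds that skeleton around a list of paragraphs. The A4 size, the 20mm margins, the master name "A4" and the sample text are assumptions for illustration; in a real pipeline the XSLT would generate this markup instead.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialise as fo:root, fo:block, ...


def fo(tag):
    """Return the namespace-qualified name of an fo:* element."""
    return f"{{{FO_NS}}}{tag}"


def make_fo_document(paragraphs):
    """Wrap plain-text paragraphs in a minimal one-page-master XSL-FO document."""
    root = ET.Element(fo("root"))

    # Page geometry: one A4 page master (assumed size and margins).
    masters = ET.SubElement(root, fo("layout-master-set"))
    page_attrs = {"master-name": "A4", "page-width": "210mm", "page-height": "297mm"}
    for side in ("top", "bottom", "left", "right"):
        page_attrs[f"margin-{side}"] = "20mm"
    page = ET.SubElement(masters, fo("simple-page-master"), page_attrs)
    ET.SubElement(page, fo("region-body"))

    # Content: a page-sequence using that master, flowing into the body region.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        ET.SubElement(flow, fo("block"), {"space-after": "6pt"}).text = text
    return root


# Hypothetical sample content standing in for a real datasheet.
root = make_fo_document(["Widget X-100 datasheet", "Operating voltage: 5 V"])
fo_text = ET.tostring(root, encoding="unicode")
print(fo_text)
```

Saved to a file (for example with `ET.ElementTree(root).write("datasheet.fo", encoding="utf-8", xml_declaration=True)`, the file name being hypothetical), the result can be rendered with FOP's command line as `fop -fo datasheet.fo -pdf datasheet.pdf`.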

Learning to code the XSL:FO is painful; the rest is well-established
pipeline tools that just get on with it and work. You'll find that
good HTML + CSS knowledge is a good starting point for understanding
XSL:FO properties and rendering. If you have that, then the W3C
specs for XSL:FO alone are enough to work with. If you want a CSS
background, read Lie & Bos, "Cascading Style Sheets". The Usenet group
comp.infosystems.www.authoring.stylesheets (c.i.w.a.s) is good too.
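To make the CSS connection concrete: many XSL-FO properties were taken over from CSS2 with the same names and value syntax, so an inline style declaration can often be carried across almost verbatim. The Python sketch below assumes a small, hand-picked set of such shared properties (an illustrative subset, not the full list from the XSL specification) and simply drops anything else, such as screen-only properties like `cursor`.

```python
# Illustrative subset of CSS properties whose names and values carry over
# unchanged to XSL-FO attributes (not an exhaustive list).
SHARED_PROPERTIES = {
    "color", "background-color", "font-family", "font-size", "font-style",
    "font-weight", "line-height", "text-align", "text-indent",
    "margin-top", "margin-bottom", "margin-left", "margin-right",
    "padding-top", "padding-bottom", "padding-left", "padding-right",
}


def css_to_fo_attributes(style):
    """Turn an inline CSS declaration string into a dict of FO attributes.

    Declarations whose property is not in SHARED_PROPERTIES are dropped
    rather than guessed at.
    """
    attrs = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        if sep and value and name in SHARED_PROPERTIES:
            attrs[name] = value
    return attrs


print(css_to_fo_attributes("font-size: 9pt; text-align: right; cursor: pointer"))
# {'font-size': '9pt', 'text-align': 'right'}
```

Anything outside the shared subset (floats, widths, borders written as shorthands) needs a deliberate decision per property, which is where most of the real XSL:FO learning happens.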

This stuff isn't an easy bit of knowledge to learn, so start simple
and get _something_ working first, then look to expand it. It's very
useful long-term though, so it does repay the effort.

Jun 27 '08 #3
Whether you would be better off transforming your XML directly to
'Formatting Objects', or transforming it indirectly (to DocBook or
something similar, with an off-the-shelf transformation to XSL-FO),
is a moot point.

There are also a few off-the-shelf stylesheets that convert HTML
directly to XSL-FO (and so, in effect, to PDF), but the typographic
quality varies. Either way, you still need a 'serialiser' to convert
the XSL-FO into PDF. Apache FOP is a popular open-source one.

Wikipedia is a good place to start:

http://en.wikipedia.org/wiki/XSL_Formatting_Objects
If you want a complete system, so you can concentrate on learning one
bit at a time (such as XSL-FO), you could try Apache Cocoon and use
the 'Hello World' example, where the same XML is converted to MANY
output formats, including PDF, XHTML, SVG, PostScript, Flash,
OpenDocument, Excel ...

There is an example of the kind of stuff you need to do at:

http://cocoon.apache.org/2.1/howto/h...ublishing.html

Some of the major software vendors have tutorials on the use
of XSL-FO, for example RenderX and Antenna House; most of
the material is just as relevant to a free serialiser such as
Apache FOP. (If graphics are important, you may need a 'try and
see' approach, particularly with vector graphics and
transparency, which are degraded or lost by some serialisers.)

On the other hand, at least one serialiser now goes beyond the xsl-fo
specification, allowing rudimentary interactive forms in the pdf.

Finally, there are systems which you can use to convert your XML
to LaTeX, and from there you will get very high quality output. But
LaTeX is yet another massive learning task if none of your team already
knows it.

Jun 27 '08 #4
On 13 May, 15:24, Ken Starks <stra...@lampsacos.demon.co.uk> wrote:
> Whether you would be better off transforming your XML directly to
> 'Formatting Objects', or transforming it indirectly (to DocBook or
> something similar, with an off-the-shelf transformation to XSL-FO),
> is a moot point.
I wouldn't go down that route, via DocBook.

Of course this all depends a _lot_ on the quality of the HTML. HTML
3.2 with presentation guff all over it is a lot more trouble to work
with than pure-semantics HTML 4 + CSS. This is true for any processing
toolset. HTML 4 with a bad case of "divitis" is actually one of the
easiest targets for conversion to XSL:FO. Bad practice for coding
semantic HTML, but a closer match to your target here.
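As a rough illustration of why div-heavy markup is such a close match: block-level elements map more or less one-to-one onto fo:block, and inline elements onto fo:inline. This Python sketch assumes un-namespaced, well-formed XHTML (real XHTML puts elements in the `http://www.w3.org/1999/xhtml` namespace, which would need stripping first) and, as a simplification, treats every element not listed as inline as a block.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"
ET.register_namespace("fo", FO[1:-1])

# Assumed inline mapping for a tiny HTML subset; any other element becomes a block.
INLINE_STYLES = {
    "span": {},
    "b": {"font-weight": "bold"},
    "strong": {"font-weight": "bold"},
    "i": {"font-style": "italic"},
    "em": {"font-style": "italic"},
}


def convert(html_elem, fo_parent):
    """Append an FO equivalent of html_elem (and its subtree) to fo_parent."""
    style = INLINE_STYLES.get(html_elem.tag)
    if style is not None:
        target = ET.SubElement(fo_parent, FO + "inline", style)
    else:
        target = ET.SubElement(fo_parent, FO + "block")  # div, p, h1, ...
    target.text = html_elem.text
    for child in html_elem:
        convert(child, target)
    # Text after the HTML element's end tag stays after the FO element too.
    target.tail = html_elem.tail
    return target


html = ET.fromstring("<div><div>Voltage: <b>5 V</b> nominal</div></div>")
flow = ET.Element(FO + "flow", {"flow-name": "xsl-region-body"})
convert(html, flow)
print(ET.tostring(flow, encoding="unicode"))
```

HTML 3.2 with font tags and layout tables has no such tidy mapping, which is the point being made above about the quality of the source HTML.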

HTML is somewhat more generalised than DocBook, so when converting
"upwards" to DocBook, the result is unlikely to carry any more structure
than could be inferred automatically from the HTML. DocBook isn't
some fantastic panacea anyway; I've rarely used it in practice, as
its minor advantages over HTML are all too often outweighed by its being
yet another format. Unless you need book-level structuring (if all you
need is inline markup, paragraphs and headings), HTML 4 gives you
nearly as much anyway.

I'd consider going from HTML to DocBook if I were concatenating a
number of pages to make one single DocBook document representing the
whole set as a site, but very rarely for single-page stuff.

As to the use of pre-existing transforms from DocBook to XSL:FO, these
are certainly available and well done, but they're not as useful as
one might think. This is for three reasons: they're not as necessary
as they seem, it's not so hard to do without them, and for complex
formatting you'll probably end up writing your own anyway.

The off-the-shelf DocBook stylesheets have a big advantage in that
they're competent, full implementations of all DocBook elements. But
most of us just don't need that, because we only author a tiny subset
of DocBook anyway. I've never used the <kitchen-sink> element,
although I'm sure DocBook has one somewhere. This is particularly the
case for auto-generated DocBook out of HTML. Secondly, it's not that
hard to write a minimal XSLT to make simple (i.e. little formatting
subtlety) XSL:FO. Thirdly, it's harder to make XSL:FO with complex
formatting. If you don't need this, either use the pre-existing
stylesheet or write your own; neither is impossibly hard. If you _do_
need complex formatting, you probably have to write your own XSLT
whether you like it or not.
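To give a feel for how small a "minimal" transform can be, here is the usual XSLT shape (one template per element, plus a default rule that just recurses) mirrored in Python and applied to a datasheet-style table. Everything about the input is an assumption: the `<datasheet>`, `<title>`, `<specs>` and `<spec name=... value=...>` vocabulary is hypothetical, and the column widths are arbitrary. One detail worth knowing in any language: an fo:table-cell may only contain block-level objects, so each value is wrapped in its own fo:block.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"
ET.register_namespace("fo", FO[1:-1])


def on_title(elem, parent):
    """Like <xsl:template match="title">: a bold heading block."""
    block = ET.SubElement(parent, FO + "block",
                          {"font-size": "16pt", "font-weight": "bold"})
    block.text = elem.text


def on_specs(elem, parent):
    """Like <xsl:template match="specs">: a two-column name/value table."""
    table = ET.SubElement(parent, FO + "table",
                          {"table-layout": "fixed", "width": "100%"})
    for width in ("40%", "60%"):
        ET.SubElement(table, FO + "table-column", {"column-width": width})
    body = ET.SubElement(table, FO + "table-body")
    for spec in elem.findall("spec"):
        row = ET.SubElement(body, FO + "table-row")
        for text in (spec.get("name"), spec.get("value")):
            cell = ET.SubElement(row, FO + "table-cell")
            # Cells may only hold block-level objects, so wrap the text.
            ET.SubElement(cell, FO + "block").text = text


TEMPLATES = {"title": on_title, "specs": on_specs}


def apply_templates(elem, parent):
    """Like <xsl:apply-templates/>: dispatch each child element to its template.

    Unmatched elements fall back to recursing into their children; unlike
    XSLT's built-in rule, this sketch does not copy their text through.
    """
    for child in elem:
        TEMPLATES.get(child.tag, apply_templates)(child, parent)


def datasheet_to_flow(xml_text):
    """Build an fo:flow from a (hypothetical) datasheet document."""
    flow = ET.Element(FO + "flow", {"flow-name": "xsl-region-body"})
    apply_templates(ET.fromstring(xml_text), flow)
    return flow


sample = ('<datasheet><title>Widget X-100</title><specs>'
          '<spec name="Voltage" value="5 V"/><spec name="Current" value="20 mA"/>'
          '</specs></datasheet>')
print(ET.tostring(datasheet_to_flow(sample), encoding="unicode"))
```

The resulting flow still has to be placed inside an fo:root with a layout-master-set and page-sequence before a formatter such as FOP will accept it.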
Jun 27 '08 #5
I agree with you, Andy. DocBook is a poor example, being far too
heavy. I think I was really thinking of something more lightweight,
such as LinuxDoc (which you can take into LyX for tweaking). I have
also recently given DITA a quick spin, but it also seems to
be yet another format. (It too has many more elements than HTML,
by the way.)

Yours,

Ken.

Andy Dingley wrote:
> On 13 May, 15:24, Ken Starks <stra...@lampsacos.demon.co.uk> wrote:
>> Whether you would be better off transforming your XML directly to
>> 'Formatting Objects', or transforming it indirectly (to DocBook or
>> something similar, with an off-the-shelf transformation to XSL-FO),
>> is a moot point.
>
> I wouldn't go down that route, via DocBook.

... <snip> many good points.
Jun 27 '08 #6