Bytes | Software Development & Data Engineering Community

Choice of format for web publishing

I'd like to raise an issue that is somewhat outside the focus of this
newsgroup, although related: the ideal document format for web
publication.

In terms of likely future trends, what is the ideal format for the
publication of technical documents (by "technical", I mean documents
that are paginated and have a bibliography and footnotes)?

The reason for my question is that I've become involved in a project to
develop an on-line journal in the humanities. The publisher intends to
solicit manuscripts in Word and convert them to PDF (using Chicago
Style Sheet, which is another matter).

My instinct is to suggest to him that PDF has disadvantages (including
accessibility and not being machine readable), and that he consider
(X)HTML instead. I'd like to know reasons for choosing one over the
other.

--

Haines Brown, KB1GRM

Jun 27 '08 #1
On Jun 19, 3:00 pm, Haines Brown <bro...@teufel.hartford-hwp.com>
wrote:
> The reason for my question is that I've become involved in a project to
> develop an on-line journal in the humanities. The publisher intends to
> solicit manuscripts in Word and convert them to PDF (using Chicago
> Style Sheet, which is another matter).
>
> My instinct is to suggest to him that PDF has disadvantages (including
> accessibility and not being machine readable), and that he consider
> (X)HTML instead. I'd like to know reasons for choosing one over the
> other.
I'm involved in an online journal that uses Open Journal Systems.
Articles are stored on the server as OpenOffice documents, and HTML or
PDF versions are generated on the fly, according to the user's choice.
This method seems to offer the best of both: PDF lets you download and
store a single file and prints better; HTML is (I believe) more
accessible, as you say.
Jun 27 '08 #2
What I was hoping to see was someone suggest TEI/XML, with an
appropriate schema and style sheet, but since it was not mentioned, I
wonder if there's a problem going in that direction.
--

Haines Brown, KB1GRM

Jun 27 '08 #3
On 19 Jun, 14:00, Haines Brown <bro...@teufel.hartford-hwp.com> wrote:
> The reason for my question is that I've become involved in a project to
> develop an on-line journal in the humanities. The publisher intends to
> solicit manuscripts in Word and convert them to PDF (using Chicago
> Style Sheet, which is another matter).
>
> My instinct is to suggest to him that PDF has disadvantages (including
> accessibility and not being machine readable), and that he consider
> (X)HTML instead. I'd like to know reasons for choosing one over the
> other.
What do you mean by "publish" here?

By all means offer PDFs as one final format that your CMS can offer to
readers.

Don't _store_ your content as PDFs though. Use something else
(anything!) and generate PDFs on demand (with caching and maybe pre-
generation).
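A minimal Python sketch of the "store one source, generate derived formats on demand, cache them" idea. The renderer callables are hypothetical stand-ins for real converters (for example an XSL-FO processor for PDF or an XSLT stylesheet for HTML); only the caching logic is shown. The cache is keyed on a hash of the stored source, so an edited article gets re-rendered while an unchanged one is served from memory, and `pregenerate` covers the "maybe pre-generation" case.

```python
import hashlib
from typing import Callable, Dict, Tuple

# A renderer turns stored source bytes into one output format.
Renderer = Callable[[bytes], bytes]


class RenderCache:
    """Generate derived formats (HTML, PDF, ...) from stored source on demand.

    The renderers passed in are hypothetical placeholders for real
    converters; this class only illustrates the caching logic.
    """

    def __init__(self, renderers: Dict[str, Renderer]) -> None:
        self._renderers = renderers
        self._cache: Dict[Tuple[str, str], bytes] = {}
        self.render_count = 0  # how many times a renderer actually ran

    def get(self, source: bytes, fmt: str) -> bytes:
        # Key on a content hash so an edited source gets a fresh rendering.
        key = (hashlib.sha256(source).hexdigest(), fmt)
        if key not in self._cache:
            self._cache[key] = self._renderers[fmt](source)
            self.render_count += 1
        return self._cache[key]

    def pregenerate(self, source: bytes) -> None:
        # Render every known format ahead of the first request.
        for fmt in self._renderers:
            self.get(source, fmt)
```

An in-memory dictionary keeps the sketch short; a real site would more likely write the generated files to disk and evict stale entries.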

As a storage format, XHTML is one choice; DocBook or TEI would be
others.

I wouldn't use HTML for storage, although I'd publish my XHTML to
readers as HTML (for web-design reasons oft discussed hereabouts). The
reason is that whatever XML-based format you choose for internal
storage, it's likely to involve lots of namespacing, and composing the
overall schema by importing snippets from DocBook, Dublin Core, etc.
You really need the namespace and processing features that XHTML gives
you easily and HTML doesn't. XML tools will be far more useful than
SGML ones.
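To illustrate the namespacing point, here is a small Python sketch using the standard-library `xml.etree.ElementTree`. The stored document is hypothetical and simplified (real TEI keeps bibliographic metadata under `fileDesc`/`titleStmt`, so this is not schema-valid TEI); it only shows elements from two vocabularies, TEI and Dublin Core, living in one file and still being addressed unambiguously by namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored article mixing two vocabularies. Simplified for
# illustration: not schema-valid TEI.
STORED = """<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <teiHeader>
    <dc:title>On Web Publishing</dc:title>
    <dc:creator>A. Author</dc:creator>
  </teiHeader>
  <text><body><p>First paragraph.</p></body></text>
</TEI>"""

# Prefix-to-URI map for queries; the prefixes need not match the file's.
NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_metadata(xml_text: str) -> dict:
    """Pull Dublin Core fields and TEI paragraphs out of one document."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("tei:teiHeader/dc:title", namespaces=NS),
        "creator": root.findtext("tei:teiHeader/dc:creator", namespaces=NS),
        "paragraphs": [p.text for p in root.iterfind(".//tei:p", NS)],
    }
```

The queries match on namespace URI rather than on prefix, which is what lets independently designed vocabularies be combined in one stored document.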

I would favour DocBook over HTML for any "long" document that needs
structure at a scope greater than heading / para. Neither has much
semantic markup, and neither has any advantage in the quality of its
inline markup. DocBook does win out, though, for section/chapter/book
level structure.
Jun 27 '08 #4
Haines Brown wrote:
> What I was hoping to see was someone suggest TEI/XML, with an
> appropriate schema and style sheet, but since it was not mentioned, I
> wonder if there's a problem going in that direction.
Ask and ye shall receive - Andy Dingley just suggested TEI, though he
proposed (and I concur) that you store internally in TEI or DocBook
but serve HTML. I'm not sure whether that's what you were proposing
above, or whether you were thinking of serving XML + schema + style
sheet to user agents. The latter won't be handled properly by many
UAs, and will confuse non-technical users if they try to save content,
etc.

You might want to take a look at /Kairos/ [1]. They've been in the
online-humanities-journal biz for a while (about 12 years), so they
have a lot of experience with what works well for their authors and
readers.

They publish most content as HTML, but they also run multimedia
articles and the like. One factor to consider with an online
humanities journal is that authors will want to use the affordances of
the readers' systems, and that means accommodating things like video
and interactive applets. (Obviously not all readers will be able to,
or choose to, view that kind of content; but enough will.)

You can get some nice innovative work if you allow for things like
Karl Stolley's "Lo-Fi Manifesto" [2], for example.

(The current /Kairos/ design is ... aging, shall we say; but they have
a much nicer redesign coming out with the next issue that is prettier,
standards-compliant, and amply supplied with features that degrade
gracefully, like hCard markup on author information.)
[1] http://kairos.technorhetoric.net/
[2]
http://kairos.technorhetoric.net/12....lley/index.htm

--
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
Jun 27 '08 #5
Michael Wojcik <mw*****@newsguy.com> writes:
> Andy Dingley just suggested TEI, though he proposed (and I concur)
> that you store internally in TEI or DocBook but serve HTML. I'm not
> sure whether that's what you were proposing above, or whether you were
> thinking of serving XML + schema + style sheet to user agents. The
> latter won't be handled properly by many UAs, and will confuse
> non-technical users if they try to save content, etc.
Well, I _was_ toying with the idea of serving XML+schema+stylesheet. By
"UA" I presume you mean the average browser (IE). However, I didn't
realize that browsers have problems with XML + public schema +
stylesheet. Would you be more specific about the kinds of problems and
the likelihood of their occurring? And why would a non-technical user
be confused? Wouldn't the user see on his browser the same thing if the
document were instead served as HTML?

I'm unclear about just what is implied by "store internally". Do you
mean placing TEI or DocBook documents in a database on the server and
then processing them for display as HTML/XHTML for the user?
> You might want to take a look at /Kairos/ [1]. They've been in the
> online-humanities-journal biz for a while (about 12 years), so they
> have a lot of experience with what works well for their authors and
> readers.
I don't understand why you offered this as an example, and I probably
missed your point. The document I looked at from the Kairos site is
just some JavaScript that defines a framework and inserts into it an
old-fashioned document (using tables for layout, for example). If I
were to do this I'd use SSI, XHTML, and CSS, but in any case, at least
for the document I viewed, the internally stored document is only HTML,
not TEI or DocBook.

--

Haines Brown, KB1GRM

Jun 27 '08 #6
Haines Brown wrote:
> Michael Wojcik <mw*****@newsguy.com> writes:
>> Andy Dingley just suggested TEI, though he proposed (and I concur)
>> that you store internally in TEI or DocBook but serve HTML. I'm not
>> sure whether that's what you were proposing above, or whether you were
>> thinking of serving XML + schema + style sheet to user agents. The
>> latter won't be handled properly by many UAs, and will confuse
>> non-technical users if they try to save content, etc.
>
> Well, I _was_ toying with the idea of serving XML+schema+stylesheet. By
> "UA" I presume you mean the average browser (IE).
I mean user agent: whatever is processing the data you send. (That's
standard terminology in the W3C specs, the HTTP RFCs, etc.) Doesn't
particularly matter to me whether it's "average" or exotic, though of
course you may decide not to worry about supporting less-common UAs.
(Do you expect people to read your journal on their iPhones? On other
mobile devices? On browsers embedded in appliances?)
> However, I didn't
> realize that browsers have problems with XML + public schema +
> stylesheet. Would you be more specific about the kinds of problems and
> the likelihood of their occurring?
I was over-hasty with that comment. I assumed that there were many UAs
that won't handle XML + schema + style sheet. (IE, for example,
doesn't even handle XHTML properly.) And I believe I've read more
substantial claims to that effect. But I realized when I read your
response that I had not actually verified that suspicion.

Personally, if I were building this application, I'd be reluctant to
serve XML + schema + style sheet, simply because I'd rather not do the
interoperability testing (or limit my content to a handful of common
UAs), when it's not at all difficult to serve HTML 4.01 Strict instead.
> And why would a non-technical user
> be confused? Wouldn't the user see on his browser the same thing if the
> document were instead served as HTML?
Suppose you are a non-technical user. Suppose you are viewing a page
of this journal and decide to save a copy. You know, from prior
experience, that a saved web page is a file with an extension like
".htm" and possibly a folder containing some images and the like.
What's a ".xml" file? What's a ".xsd" file?

And whether the user sees "the same thing" is hard to say. Browsers
have built-in styles for HTML, which they will fall back on in various
circumstances. Some users have user style sheets, which select HTML
elements.
> I'm unclear about just what is implied by "store internally". Do you
> mean placing TEI or DocBook documents in a database on the server and
> then processing them for display as HTML/XHTML for the user?
You have to store content, and you have to serve it. Sometimes content
is static - that is, the server simply sends the stored representation
(often just by reading a file from a local filesystem). Often it's
dynamic: server-side includes, ASP and JSP and PHP and other sorts of
scriptable pages, CGI scripts, server extensions that execute
application code, etc.

I don't care (well, for these purposes) how you store content. I'm
suggesting that you store it in a form that works well for your
production toolchain and for the applications that use it - so TEI or
DocBook might well be a good choice. And I'm suggesting that you serve
it in a form that the UA is likely to handle well; I'd suggest HTML
4.01 Strict with external CSS 2.1 style sheets.

To go from the stored representation to the presentation
representation, XSLT looks like the obvious mechanism. The server
could do that on the fly, if it has sufficient resources; or it could
cache the generated HTML; or the HTML could be generated whenever the
XML is updated and served statically.
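As a rough illustration of the stored-to-presentation step: Python's standard library has no XSLT processor, so this sketch substitutes a hand-written `ElementTree` walk that performs the kind of element mapping an XSLT stylesheet would. The TEI subset (`div`, `head`, `p`, `hi`), the tag mapping, and the `journal.css` file name are assumptions for illustration; the output follows the HTML 4.01 Strict plus external CSS suggestion above.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical mapping from a small TEI subset to HTML elements.
TAG_MAP = {
    TEI + "div": "div",
    TEI + "head": "h2",
    TEI + "p": "p",
    TEI + "hi": "em",
}


def to_html(elem: ET.Element) -> str:
    """Recursively convert a TEI element to an HTML fragment string."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        # Text following a child element belongs to the parent.
        parts.append(escape(child.tail or ""))
    inner = "".join(parts)
    html_tag = TAG_MAP.get(elem.tag)
    if html_tag is None:
        # Unmapped elements keep their content but lose their wrapper.
        return inner
    return f"<{html_tag}>{inner}</{html_tag}>"


def render_page(xml_text: str, title: str) -> str:
    """Wrap the converted TEI body in an HTML 4.01 Strict page."""
    body = ET.fromstring(xml_text).find(f"{TEI}text/{TEI}body")
    if body is None:
        raise ValueError("no TEI text/body element found")
    return (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">\n'
        f"<html><head><title>{escape(title)}</title>"
        '<link rel="stylesheet" type="text/css" href="journal.css">'
        f"</head><body>{to_html(body)}</body></html>"
    )
```

In production, a real XSLT stylesheet run by a processor such as xsltproc or Saxon would be the more conventional choice, and its output could be cached or regenerated whenever the XML changes.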
>> You might want to take a look at /Kairos/ [1]. They've been in the
>> online-humanities-journal biz for a while (about 12 years), so they
>> have a lot of experience with what works well for their authors and
>> readers.
>
> I don't understand why you offered this as an example, and I probably
> missed your point. The document I looked at from the Kairos site is
> just some JavaScript that defines a framework and inserts into it an
> old-fashioned document (using tables for layout, for example).
I was unclear. I didn't mean /Kairos/ as an example of an implementation.

I suggested it because it's an online humanities journal of long
standing, relatively wide readership, and good reputation; because
they've had to deal with all of these issues, and these are the
compromises they arrived at; and because it demonstrates my other
point, which is that people writing for an online journal will want to
be able to use all the possible facilities. That means people will
want to submit articles with multimedia components, so you need to
think about how you'll handle non-text materials in your toolchain.
People will want to submit articles with dynamic content and scripting
- even applications, with any luck - so you'll need to handle that.
> If I were to do this I'd
> use SSI, XHTML, and CSS, but in any case, at least for the document I
> viewed, the internally stored document is only HTML, not TEI or DocBook.
How can you tell how the document is stored internally? What you see
is what the server sent you. You don't know what it did in producing
that content.

--
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
Jun 27 '08 #7
Michael, thank you for your wise comments and clarifications.

Equating your "user agent" with the browsers I see was too
restrictive. You are right; I do have to consider iPhones,
etc. Yes, that would be exotic today, but tomorrow perhaps less so. On
the other hand, there is perhaps reason to assume that "exotic" UAs will
at the same time learn to deal with XML.

I know that IE does not do HTML well, and I have to make the appropriate
accommodations. I'm too ignorant about the matter to say whether it would
do any worse with XML.

Your point about the user possibly defining the presentation style is
understood. That suggests serving the pages with a clear separation of
format and marked-up content, which can be either XML or HTML.

Thanks again, you were very helpful.
--

Haines Brown, KB1GRM

Jun 27 '08 #8
