RFC: A Distributed Universal SGML/XML Catalogue Management System

Rationale
=========

Many applications today benefit from an SGML and/or XML Entity Catalogue
to dereference entities referenced by a Public Identifier. For a
validating SGML parser this is an absolute requirement. For any
SGML or XML parser it serves to enable entities such as DTDs and
modules to be resolved locally.
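
As a minimal sketch of what such dereferencing involves, the following
Python reads PUBLIC entries in the conventional catalogue syntax
(PUBLIC "public id" "system id") and looks one up. The local paths in
the sample are hypothetical, and the parser is deliberately simplified:
it handles only double-quoted PUBLIC entries and does no whitespace
normalisation of identifiers.

```python
import re

# Matches one entry of the form: PUBLIC "public id" "system id"
_PUBLIC_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def parse_catalogue(text):
    """Return a dict mapping public identifiers to system identifiers."""
    return {pubid: sysid for pubid, sysid in _PUBLIC_ENTRY.findall(text)}


def resolve(catalogue, public_id):
    """Return the local system identifier for public_id, or None if unknown."""
    return catalogue.get(public_id)


# Sample catalogue text; the right-hand paths are made up for illustration.
SAMPLE = '''
PUBLIC "-//W3C//DTD HTML 4.01//EN" "html4/strict.dtd"
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "xhtml1/xhtml1-strict.dtd"
'''

catalogue = parse_catalogue(SAMPLE)
print(resolve(catalogue, "-//W3C//DTD HTML 4.01//EN"))
```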

Hitherto, different packages and applications have each distributed
their own entity catalogues. Examples are DocBook, HTML validators, the
OpenSP parser, and operating system distributions. However, there is
little coordination between the distributors of these, and no common
package that distributors can rely on. Even in tightly-controlled
environments such as the Debian packages, the W3C Validator includes its
own Entity Catalogue rather than relying on one being available as a
dependency.

This situation should be rationalised to allow for an SGML and XML
catalogue to be a single package on which other packages can depend.
In this note, we propose a framework for managing such a package.

Goals
=====
* To maintain a Universal Catalogue
* To provide an automated process for generating local installations of
  all or part of the Universal Catalogue.
* To minimise the effort and coordination required to ensure that the
  universal catalogues and local installations remain up-to-date.
  In particular, end-users should be offered a self-maintaining default
  installation that eliminates effort on their part altogether.
* To enable control of different parts of the catalogue to be delegated
  to the people/organisations responsible for them.

A loose analogy could be drawn to DNS. But since immediate lookup of
[SG|X]ML entities is dealt with by SYSTEM ids, we only have to deal with
efficient caching of local copies of PUBLIC ids. Entities are in
general long-lived, but by no means immutable (for example, the MathML 2
DTD modules have undergone several minor revisions).

Managing a Universal Catalogue
==============================

In principle, all organisations creating public identifiers should be
registered with ISO. But this is not widely practised, and the present
chaotic situation indicates that it is not effectively meeting today's
needs. We propose that a distributed architecture for automating
catalogue management is both feasible and preferable.

#### ISO registry: availability???

Our proposal envisages a central registry, cooperating with a set of
recognised repositories each managing its own entity catalogue locally.
For example, the W3C, WAP Forum and OASIS each manage their own
catalogues independently. Likewise, different groups acting
independently within W3C are responsible for different areas such as
HTML, MathML, SVG and SMIL. We propose that a universal catalogue will
work best if responsibility for each sub-catalogue is explicitly
devolved to the working group responsible for defining it. The central
registry will serve merely to reference the responsible groups, in a
manner somewhat analogous to DNS.

This is broadly in line with the registry already run by the ISO but
not widely used. What our proposal adds is the availability of the
registry online in machine-readable format, and its integration with
catalogues maintained by each participating organisation. It is
possible that tying the registry in to distribution of Markup libraries
and catalogues may in itself be an incentive for organisations to
register.

#### Implications for naming conventions?

Implementation
==============

Since the Universal Catalogue serves SGML and XML applications, it is
appropriate that it should itself be capable of implementation as an
SGML or XML application. This is straightforward: all we need is a
DTD for declaring catalogues and catalogue entries, and a list of
entities defining catalogues maintained by the groups entrusted with
doing so. This is then implemented by a program to fetch the data
required and write the catalogues. Local installations may be
customised by selecting which entities to include, while package
maintainers can ship a standard configuration.
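
The selection and writing steps described above might be sketched as
follows. This is an illustration only, not the demonstrator's code: the
Entry record, its field names and the local paths are assumptions made
for the example, and the output uses the conventional PUBLIC catalogue
syntax rather than whatever file format the demonstrator uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    group: str        # group maintaining the sub-catalogue (hypothetical field)
    public_id: str    # the entity's public identifier
    local_path: str   # path relative to the local library area (hypothetical)


def select(entries, groups):
    """Keep only the entries maintained by one of the chosen groups."""
    return [e for e in entries if e.group in groups]


def render_catalogue(entries):
    """Render entries as catalogue text, one PUBLIC line per entry."""
    ordered = sorted(entries, key=lambda e: e.public_id)
    lines = [f'PUBLIC "{e.public_id}" "{e.local_path}"' for e in ordered]
    return "\n".join(lines) + "\n"


# Stand-in for data fetched from the master catalogue.
ENTRIES = [
    Entry("W3C", "-//W3C//DTD XHTML 1.0 Strict//EN", "w3c/xhtml1-strict.dtd"),
    Entry("OASIS", "-//OASIS//DTD DocBook XML V4.2//EN", "oasis/docbookx.dtd"),
    Entry("W3C", "-//W3C//DTD HTML 4.01//EN", "w3c/html4-strict.dtd"),
]

# A local installation that wants only the W3C sub-catalogue.
print(render_catalogue(select(ENTRIES, {"W3C"})), end="")
```

A package maintainer's standard configuration would then amount to a
default set of groups passed to select().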

An implementation demonstrating the above, CatalogueManager, is
available at <URL:http://valet.webthing.com/catalogue/>. It fetches the
master catalogue, DTD and Entities by HTTP. It updates all entries
defined, but uses the HTTP If-Modified-Since header to avoid the
overhead of re-fetching anything that is already up-to-date in the local
installation. It can therefore be run regularly (e.g. monthly) with
minimal overhead.
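
The conditional fetch can be sketched with Python's standard library as
below. This is not the demonstrator's code. The function names are made
up for the example, and using the local file's modification time as the
If-Modified-Since date is a simplifying assumption; a more careful
client would record the server's Last-Modified value instead.

```python
import email.utils
import os
import urllib.error
import urllib.request


def conditional_headers(local_path):
    """Return request headers, adding If-Modified-Since if a local copy exists."""
    if not os.path.exists(local_path):
        return {}
    mtime = os.path.getmtime(local_path)
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    return {"If-Modified-Since": email.utils.formatdate(mtime, usegmt=True)}


def fetch_if_newer(url, local_path, timeout=30):
    """Download url to local_path unless the server reports 304 Not Modified.

    Returns True if the local file was written, False if it was up to date.
    """
    request = urllib.request.Request(url, headers=conditional_headers(local_path))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:   # local copy is still current; nothing to do
            return False
        raise
    with open(local_path, "wb") as out:
        out.write(body)
    return True
```

A periodic job would call fetch_if_newer() once per catalogue entry;
unchanged entities then cost only a request and an empty 304 response.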

CatalogueManager may be used as-is, but is intended as a proof-of-concept.
Non-technical issues such as how to delegate responsibility for different
sub-catalogues need to be addressed, and the file format used for
the demonstrator is likely to be subject to improvement.

Security
========

A package such as CatalogueManager that updates system files based on
third-party definitions has the potential to introduce malicious files.
It is strongly recommended that standard system security be used to
avoid serious consequences in the event of any of the sub-catalogues
being compromised. CatalogueManager should run as a user with no
privilege to write to the local filesystem except within a designated
SGML/XML library area, such as /usr/local/share/sgmlib.
Distributors creating a package of CatalogueManager, such as an RPM,
should take the same care to ensure their users' security.

A more inherently secure architecture would generate all local filenames
internally, rather than accepting them from third-party definitions, and
is probably preferable. The current implementation serves for backward
compatibility until the proposal can be considered stable.
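
One way to generate local filenames internally is to derive them from a
hash of the public identifier, so that nothing supplied by a remote
catalogue ever becomes part of a path. The sketch below is an assumption
about how this could be done, not part of the current implementation;
the function name, directory layout and ".ent" suffix are hypothetical.

```python
import hashlib
import os


def local_path_for(library_root, public_id, suffix=".ent"):
    """Return a path inside library_root derived only from a hash of public_id."""
    digest = hashlib.sha256(public_id.encode("utf-8")).hexdigest()
    root = os.path.realpath(library_root)
    # Two-character subdirectories keep any one directory from growing large.
    path = os.path.realpath(os.path.join(root, digest[:2], digest + suffix))
    # Defensive check: the result must stay inside the library area even if
    # a symbolic link has been planted below it.
    if os.path.commonpath([root, path]) != root:
        raise ValueError("generated path escapes the library area")
    return path


# Even a hostile identifier cannot influence where the file is written.
print(local_path_for("/usr/local/share/sgmlib", "../../etc/passwd"))
```

The catalogue written locally would then map each public identifier to
its generated path, so the names remain opaque to users and parsers.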

--
Nick Kew

In urgent need of paying work - see http://www.webthing.com/~nick/cv.html
Jul 20 '05 #1