XML-tagging text document from W3C-schema

Is there an XML editor that can assist you in marking up the data in an
unformatted text document, such that you mark the data and then
right-click to access the Schema you have assigned, and choose the
tag/attribute in the tree structure, which then magically appears with
your data enclosed? There must be, right?

My assignment is to bring order to a truckload of stats from various
Athletics events. Within an individual meeting, the formatting of the
list of results is fairly standardized, but as a rule, each meeting
follows its own standard, which means there's no hope of making a script
to automate the tagging, since there are several hundred meetings.

I have made a fairly simple schema which covers basically what I need in
the way of tags, which is:

<statistics>
  <season name="" year="">
    <meeting name="" date="" country="" location="" arranger="" arena="">
      <event name="" category="">
        <participant lastname="" firstname="" born="" club="" result=""
                     type="" place="" other=""/>
      </event>
    </meeting>
  </season>
</statistics>
Here's hoping for some suggestions!

--
A noise annoys an oyster
Apr 4 '06 #1
Jana wrote:
> Is there an XML editor that can assist you in marking up the data in an
> unformatted text document, such that you mark the data and then
> right-click to access the Schema you have assigned, and choose the
> tag/attribute in the tree structure, which then magically appears with
> your data enclosed? There must be, right?
Some, AFAIK. Although you don't pick the Schema/DTD after highlighting
the text, you pick it once at the start of the document. Then you
highlight the whole text and enclose it in the root element, and then
you break it into its component elements, tagging each one as you go.
And you probably wouldn't choose it from the tree structure but from
a menu.

I always use Emacs for this, as it has no problem in opening a non-XML
document and letting me add markup. But that's a personal choice, and
many people have serious concerns about seeing angle brackets in their
wild state.
> My assignment is to bring order to a truckload of stats from various
> Athletics events. Within an individual meeting, the formatting of the
> list of results is fairly standardized, but as a rule, each meeting
> follows its own standard, which means there's no hope of making a script
> to automate the tagging, since there are several hundred meetings.
One alternative, if it is politically acceptable, is to ship them all
off to one of the many excellent companies in the Indian subcontinent
or the Pacific Rim, who are expert at this kind of conversion.

Otherwise writing a script *is* going to be easier than using an
editor, even with Emacs macros. Have a look at the plaintext handling
of XSLT 2.0 or OmniMark for an "up-convert" (the term for what you are
trying to do).
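
As a rough illustration of the scripted route, here is a minimal Python
sketch (Python rather than XSLT 2.0 or OmniMark) that up-converts one
results layout into participant elements from the schema in this thread.
The line layout, the regular expression, the sample names and the
upconvert_event helper are all assumptions invented for the example;
each meeting's real layout would need its own pattern.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical layout for one meeting (an assumption, not from the thread):
#   "place. Lastname, Firstname (born) Club result"
LINE = re.compile(
    r"^(?P<place>\d+)\.\s+"
    r"(?P<lastname>[^,]+),\s+"
    r"(?P<firstname>.+?)\s+"
    r"\((?P<born>\d{4})\)\s+"
    r"(?P<club>.+?)\s+"
    r"(?P<result>\S+)$"
)

def upconvert_event(name, category, lines):
    """Wrap the result lines of one event in <event>/<participant> markup.

    Returns the event element plus the lines that did not match, so that
    those can still be tagged by hand.
    """
    event = ET.Element("event", {"name": name, "category": category})
    unmatched = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = LINE.match(line)
        if match is None:
            unmatched.append(line)
            continue
        # Each named group becomes an attribute of the same name.
        ET.SubElement(event, "participant", match.groupdict())
    return event, unmatched

# Invented sample data, for illustration only.
sample = [
    "1. Hansen, Kari (1984) Northside AC 12.31",
    "2. Olsen, Anne Marie (1986) Harbour Athletics Club 12.48",
    "DNS Berg, Ida",
]
event, unmatched = upconvert_event("100 m", "women", sample)
print(ET.tostring(event, encoding="unicode"))
print(unmatched)
```

With one such pattern per meeting layout, only the lines a pattern
rejects are left over for manual tagging in an editor.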
> I have made a fairly simple schema which covers basically what I need in
> the way of tags, which is:
Before you start, make sure all the athletic associations which are
generating your data will use this schema in future. Otherwise you
or your successor will have the same problem all over again in a few
years. Regular readers of this group may remember the suboptimal
format selected by one international winter sporting organisation
which could easily have been done right if they had thought to ask
someone with a clue.
> <statistics>
>   <season name="" year="">
>     <meeting name="" date="" country="" location="" arranger="" arena="">
>       <event name="" category="">
>         <participant lastname="" firstname="" born="" club="" result=""
>                      type="" place="" other=""/>
>       </event>
>     </meeting>
>   </season>
> </statistics>
> Here's hoping for some suggestions!


Make sure dates are always in the standard ISO 8601 format: yyyy-mm-dd.
Anything else cannot be sorted chronologically as plain text.
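
A minimal Python sketch of that normalisation step follows. The list of
input layouts and the to_iso_date helper are assumptions for
illustration (including the guess that ambiguous dates are day-first);
the real list depends on what the source files contain.

```python
from datetime import datetime

# Assumed input layouts, day-first; extend this for each layout
# actually found in the source data.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")

def to_iso_date(text):
    """Return the date as yyyy-mm-dd, or raise ValueError if no layout fits."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")

# Once normalised, an ordinary string sort is also a chronological sort.
dates = [to_iso_date(d) for d in ("4.6.2005", "12/08/2004", "2005-05-21")]
print(sorted(dates))
```

Raising an error on an unrecognised layout, rather than guessing, keeps
bad dates out of the tagged data.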

Make as many of the attributes as possible token lists (in W3C
Schema terminology, "enumerated"). This will identify spelling
and formatting errors when you validate. It is essential that
the data is 100% regularised, otherwise any reports you generate
will split one group into several differently-spelled ones.
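
To show the kind of error an enumerated attribute catches, here is a
small Python sketch that checks attribute values against permitted
sets. It is a stand-in for illustration, not a replacement for
validating against the Schema itself; the permitted values and the
find_unlisted_values helper are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical permitted values (assumptions); in the Schema these
# would be declared as enumerations on the attributes.
ALLOWED = {
    ("event", "category"): {"men", "women", "boys", "girls"},
    ("participant", "type"): {"final", "heat", "qualification"},
}

def find_unlisted_values(root):
    """Return (tag, attribute, value) for every value outside its permitted set."""
    problems = []
    for elem in root.iter():
        for (tag, attr), allowed in ALLOWED.items():
            if elem.tag != tag:
                continue
            value = elem.get(attr)
            if value is not None and value not in allowed:
                problems.append((tag, attr, value))
    return problems

# Invented sample with one capitalisation error and one misspelling.
doc = ET.fromstring(
    '<event name="100 m" category="Women">'
    '<participant lastname="Hansen" type="final"/>'
    '<participant lastname="Olsen" type="finale"/>'
    '</event>'
)
print(find_unlisted_values(doc))
```

Both "Women" and "finale" are reported, which is exactly the sort of
variation that would otherwise show up as separate groups in a report.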

Consider documenting the DTD or Schema and making it openly
available to the sporting community.

///Peter
--
XML FAQ: http://xml.silmaril.ie/
Apr 9 '06 #2
