Arbitrary definition of class names by user agents

Stefan Ram wrote (in "More than one language in a page"):
> In this case, one might even use Google's new attribute value:
>
> <p lang="en">The word
> <q><span lang="fr" class="notranslate">chef</span></q>
> is of French origin.</p>
>
> See
>
> http://googlewebmastercentral.blogsp...ge-barrier.htm

Is this a new trend of user-agent writers (Microformats, and now Google)
staking claims on the @class namespace? I'm surely not the only one
disturbed by this. Somehow, an author publishing on the web, with no
control over which user agents will access his page, has to avoid
clashes with the union of all names deemed special by all those user
agents, now and in the future?

I suppose the proponents justify this practice by a line in the HTML
spec (HTML4.01 §7.5.2), that class names are also for "general purpose
processing by user agents" as well as stylesheet selectors. It doesn't
go into any further detail, but I don't think it was the intention that
applications which the author has no control over (e.g. once a page is
published) should define class names willy-nilly. More likely, the
author would have opted in to some scheme, such as a company's internal
robot to do some advanced indexing on all its own pages.

Here are some ideas for external interpretation, i.e. by some 'third
party' such as Google:

* Opt in to a third party's scheme. Register one's URIs with Google,
so they know that 'notranslate' means what they think on those
pages. I don't fancy doing that with a lot of third parties, though.
* Third parties register class names with an authority (e.g. W3C).
But still, authors have to watch out for future uses of names.
And third parties shouldn't have to register with W3C when they've
already registered (for example) DNS names.
* Define a sub-namespace not used by CSS to form DNS-like names,
e.g. ':com:google:notranslate'. Okay, but potentially verbose if
used a lot. And it doesn't generally sidestep non-CSS mechanisms
of defining class names.
* Use head/@profile with a URI owned by the third party. This is
what Microformats seem to be doing, but I don't think it is
adequate. Independent microformats used in the same page still
have to avoid clashing with each other, which means going back to
some authority's third-party register. Plus, the author doesn't
have control over the class names - it's all or nothing for a
particular format.
* Extend CSS with properties not related to style. There's nothing
in the framework of CSS that limits it to just style (right?). I
favour this, and shall elaborate on it...

Google could define a CSS property which turns translation on or off,
and the author could associate any class he chooses (indeed, any CSS
selector) with that property:

.notranslate { /* Okay, so he chose the same one after all! ;-) */
  -google-translation: disable;
}

Then, to avoid Google having to scan his stylesheets just to find this
rule, the author links it in with:

<link rel="stylesheet" media="translator" href="...">

Other user agents won't touch it, because they don't recognise
"translator". Google won't touch other stylesheets because they're not
labelled with "translator".
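
To make the filtering step concrete, here is a rough sketch in Python of how a translating agent might pick out only the sheets meant for it. Note that "translator" is my hypothetical media type, not a registered one, and the class and function names below are invented for illustration; no real crawler is known to work this way.

```python
from html.parser import HTMLParser


class TranslatorSheetFinder(HTMLParser):
    """Collect hrefs of stylesheets explicitly labelled media="translator".

    "translator" is a hypothetical media type used only for illustration.
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        # rel is a space-separated token list; media is comma-separated.
        rels = (a.get("rel") or "").lower().split()
        media = [m.strip().lower() for m in (a.get("media") or "").split(",")]
        if "stylesheet" in rels and "translator" in media and a.get("href"):
            self.hrefs.append(a["href"])


def translator_sheets(html):
    """Return the hrefs of all translator-only stylesheets in an HTML page."""
    finder = TranslatorSheetFinder()
    finder.feed(html)
    return finder.hrefs
```

A sheet with no media attribute, or with media="screen", is skipped, which is the point: the agent never has to read sheets that were not addressed to it.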

A few issues raised by this approach are:

* It's not style/presentation, which is what CSS was designed for.
But I think this is a superficial problem - just regard the name
"CSS" and rel="stylesheet" as historical accidents, and CSS
becomes an application of arbitrary properties, that happens to
include ones related to style.
* It's now invading the CSS-property and media-type namespaces. But
both of these could go the same way as XML namespaces and
link/@rel schemas, if necessary.

To summarise: Rather than user agents stomping over the heretofore
author-defined namespace of class names, they should fit into it in the
same way that CSS properties do. This would scale better, and would be
less intrusive on the author's ability to choose.
Oct 26 '08 #1


Steven Simpson wrote:
> Is this a new trend of user-agent writers (Microformats, and now
> Google) staking claims on the @class namespace?

It surely is, and all the warnings seem to get ignored. The idea of
assigning fixed meanings to class names sounds _so_ cool and useful, and you
don't need anybody's permission or time-wasting discussions!

And it probably looks obvious that "notranslate" won't accidentally be used
for something else by someone else, so it looks safe to define it as you
like. It might be different with shorter and more vague class names like
"date" - does it refer to date notations, or dating, or something else? You
cannot possibly know what the string "date" might intuitively mean to
billions of people speaking hundreds of different languages. So by
declaring, say, "date" as predefined, you would assign arbitrary meanings to
an unknown number of constructs in documents, meanings that need not have
anything to do with the intentions of their authors.

In fact, "notranslate" is potentially very risky too. It is true that in any
existing document, it probably relates to someone's intentions of not having
something translated. But it might also mean that something _has not_ been
translated. Or it might mean 'do not translate (the content)' in a very
specific and limited technical meaning, _not_ a universal declaration that
the content should not be translated. For example, in some bilingual site
maintenance approach, it might be an instruction to human translators to
leave the content untranslated, since it shall be the same in both
languages - without meaning that it should be the same in _all_ languages.

The only sensible approach in using class attributes for purposes like
"notranslate" in the Google technique would have been to use a class name
that is syntactically malformed by existing specifications. That way, no
legitimate existing usage of the string as class attribute would have been
affected.

Even better, a new attribute (or element) should have been introduced.

Someone might say that from the viewpoint of generalized markup, a
processing instruction might have been the most adequate approach. But
generalized markup is water under the bridge, and we live with tag sets that
everyone can use as he likes and sees fit.

And on the realistic side, translation instructions should not really be
merged into markup. They are process-oriented, not data-oriented or
structure-oriented. You typically have words or phrases that should not be
translated, and would you really like to be forced to add
non-translatability markup into each and every occurrence in each document,
instead of having e.g. a site-wide glossary of terms that specifies them,
among other things?

Besides, the most common case for non-translatability that I can imagine
right now is English words and phrases in non-English text. For them, common
sense might say that it should suffice to declare their language as English.
When translating, say, some text from Dutch to French, you are normally not
supposed to translate any English words and phrases in them. If they are OK
in the original, they're usually the right choice in the translation as
well. So the only thing needed would be language markup.

> Google could define a CSS property which turns translation on or off,

That would be even more wrong than using "predefined" class names, since
translation issues are not presentational in the sense that CSS is supposed
to be.

> * It's not style/presentation, which is what CSS was designed for.
>   But I think this is a superficial problem - just regard the name
>   "CSS" and rel="stylesheet" as historical accidents, and CSS
>   becomes an application of arbitrary properties, that happens to
>   include ones related to style.

Excuse me while I fall into despair.

> To summarise: Rather than user agents stomping over the heretofore
> author-defined namespace of class names, they should fit into it in
> the same way that CSS properties do.

I cannot recognize parody any more, sorry.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Oct 26 '08 #2

"Jukka K. Korpela" <jk******@cs.tut.fi> writes:
> Steven Simpson wrote:
>> Is this a new trend of user-agent writers (Microformats, and now
>> Google) staking claims on the @class namespace?

<snip>

> In fact, "notranslate" is potentially very risky too. It is true that
> in any existing document, it probably relates to someone's intentions
> of not having something translated. But it might also mean that
> something _has not_ been translated. Or it might mean 'do not
> translate (the content)' in a very specific and limited technical
> meaning, _not_ a universal declaration that the content should not be
> translated. For example, in some bilingual site maintenance approach,
> it might be an instruction to human translators to leave the content
> untranslated, since it shall be the same in both languages - without
> meaning that it should be the same in _all_ languages.

Agreed. It could also relate to the other meaning of "translate" --
the geometric one. A paragraph which is to be left in its normal
position, not translated in any direction, might well be marked
"notranslate".

--
Ben.
Oct 26 '08 #3

Jukka K. Korpela wrote:
> Steven Simpson wrote:
>> Google could define a CSS property which turns translation on or off,
>
> That would be even more wrong than using "predefined" class names,
> since translation issues are not presentational in the sense that CSS
> is supposed to be.
>
>> * It's not style/presentation, which is what CSS was designed for.
>>   But I think this is a superficial problem - just regard the name
>>   "CSS" and rel="stylesheet" as historical accidents, and CSS
>>   becomes an application of arbitrary properties, that happens to
>>   include ones related to style.
>
> Excuse me while I fall into despair.

What's wrong? I'm not suggesting that we abandon the distinction
between content and presentation, merely recognising that only two
things constrain CSS technically to presentation:

* the set of properties defined by various specs,
* the media type/query filter,

...and by extending these together, you get a framework still capable of
separating presentation from content, but also capable of separating
other kinds of (erm) "interpretation" from content.

Looking at it another way, if you wanted to devise a framework for the
latter separation, you could easily come up with one identical to that
used for the former, except that:

* the file format's property set would differ from CSS's,
* you'd have a different set of @media,
* you wouldn't call the format CSS,
* your @rel type wouldn't mention 'style'.

It would be technically sufficient to continue using @rel="stylesheet",
and rely on @media to distinguish between presentation and 'other kinds
of interpretation'. But if that really is a problem, just use
@rel="propertysheet".
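
For what it's worth, a consumer of such a sheet needs very little machinery. The following Python sketch is only an illustration under heavy assumptions: it understands nothing but plain class selectors, and the property name -google-translation with the value "disable" is the hypothetical one from my earlier post, not anything Google has defined.

```python
import re

# Matches rules of the form ".classname { declarations }" only.
RULE = re.compile(r"\.([A-Za-z_][\w-]*)\s*\{([^}]*)\}")


def parse_property_sheet(text):
    """Map each class name to its {property: value} declarations."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)  # drop /* comments */
    sheet = {}
    for cls, body in RULE.findall(text):
        decls = sheet.setdefault(cls, {})
        for decl in body.split(";"):
            if ":" in decl:
                name, value = decl.split(":", 1)
                decls[name.strip()] = value.strip()
    return sheet


def may_translate(class_attr, sheet):
    """False if any class on the element maps to the hypothetical
    '-google-translation: disable' declaration in the author's sheet."""
    for cls in class_attr.split():
        if sheet.get(cls, {}).get("-google-translation") == "disable":
            return False
    return True
```

With an author's sheet of ".keep { -google-translation: disable; }", an element with class="note keep" is left alone, while class="notranslate" carries no special meaning at all unless the author says so.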
Oct 26 '08 #4

Jukka K. Korpela wrote:
> Steven Simpson wrote:
>> Is this a new trend of user-agent writers (Microformats, and now
>> Google) staking claims on the @class namespace?
>
> It surely is, and all the warnings seem to get ignored. The idea of
> assigning fixed meanings to class names sounds _so_ cool and useful, and
> you don't need anybody's permission or time-wasting discussions!
>
> And it probably looks obvious that "notranslate" won't accidentally be
> used for something else by someone else, so it looks safe to define it
> as you like. It might be different with shorter and more vague class
> names like "date" - does it refer to date notations, or dating, or
> something else? You cannot possibly know what the string "date" might
> intuitively mean to billions of people speaking hundreds of different
> languages. So by declaring, say, "date" as predefined, you would assign
> arbitrary meanings to an unknown number of constructs in documents,
> meanings that need not have anything to do with the intentions of their
> authors.
>
> In fact, "notranslate" is potentially very risky too. It is true that in
> any existing document, it probably relates to someone's intentions of
> not having something translated. But it might also mean that something
> _has not_ been translated. Or it might mean 'do not translate (the
> content)' in a very specific and limited technical meaning, _not_ a
> universal declaration that the content should not be translated. For
> example, in some bilingual site maintenance approach, it might be an
> instruction to human translators to leave the content untranslated,
> since it shall be the same in both languages - without meaning that it
> should be the same in _all_ languages.
>
> The only sensible approach in using class attributes for purposes like
> "notranslate" in the Google technique would have been to use a class
> name that is syntactically malformed by existing specifications. That
> way, no legitimate existing usage of the string as class attribute would
> have been affected.

If Google had specified class="google:notranslate" in place of
class="notranslate", despite the lack of any intrinsic significance of
the x: in class names it would have gone a long way toward eliminating
potential conflict.
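
As a rough illustration of how cheap that check would be for the user agent, here is a Python sketch. The function name is made up, and "google:notranslate" is only the hypothetical token suggested above, not something Google recognises.

```python
def has_vendor_token(class_attr, vendor, name):
    """True if the class attribute contains the exact token 'vendor:name'.

    A bare 'name' without the vendor prefix deliberately does not match.
    """
    return f"{vendor}:{name}" in class_attr.split()
```

An author who still wants to style such an element would have to escape the colon in a CSS selector, e.g. .google\:notranslate, which is a small price for keeping the plain name free.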
Oct 27 '08 #5
