Understanding simplest HTML page

I have been trying to get a better understanding of simple HTML, but I
am finding conflicting information is very common. Not only that, even
in what seemed elementary and without any possibility of getting wrong
it seems I am on very shaky ground.

For example, pretty much every book and web course on html that I have
read tells me I must include <html>, <head> and <body> tag pairs.

I have always done that, and never questioned it. It seemed perfectly
reasonable to me (and still does) to split meta information from
presented content, and indeed to require that browsers be told the
content was html. Although I guess having a server present mime type
text/html covers whether contents are html, as does having a doctype.

However on reading http://www.w3.org/TR/html401/struct/global.html I
noticed that the html, head and body tags were optional (although the
title tag is required). So I did a test page

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<title>Test whether required in head</title>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<p>Paragraph of text

This validates without any warning.

If you leave out either the title or some body content it will not
validate. So the validator at least is making an assumption about what
is head and what is body. I would imagine most user agent parsers would
also.
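
As a rough illustration of the kind of inference involved, here is a minimal Python sketch. It assumes a simplified rule (the first element that is not head-only content implicitly starts the body). The names HEAD_ELEMENTS, ImpliedSections and split_sections are made up for this example, and this is not how the W3C validator or any particular browser is actually implemented.

```python
from html.parser import HTMLParser

# Elements HTML 4.01 allows in HEAD. SCRIPT and OBJECT may also appear in
# BODY; this sketch treats them as head content until the body has started.
HEAD_ELEMENTS = {"title", "base", "meta", "link", "style", "script",
                 "object", "isindex"}


class ImpliedSections(HTMLParser):
    """Sort start tags into an implied head and an implied body."""

    def __init__(self):
        super().__init__()
        self.head = []
        self.body = []
        self.in_body = False

    def handle_starttag(self, tag, attrs):
        if tag in ("html", "head"):
            return  # explicit wrappers carry no content of their own
        if tag == "body":
            self.in_body = True
            return
        if tag not in HEAD_ELEMENTS:
            # Assumption: the first non-head element implicitly opens BODY.
            self.in_body = True
        (self.body if self.in_body else self.head).append(tag)


def split_sections(markup):
    """Return (head_tags, body_tags) for a document with optional tags omitted."""
    parser = ImpliedSections()
    parser.feed(markup)
    parser.close()
    return parser.head, parser.body
```

Fed the test page above, this sketch puts title and meta in the head and p in the body, which matches what the validator appears to assume.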

Does anyone have any suggestions about good tutorial texts about html
that get everything correct? At the moment I am gradually going through
the W3C documentation, but I tend to find myself missing some of the
implications.

--
http://www.ericlindsay.com
Nov 23 '05
Lars Eighner wrote:
In our last episode, the lovely and talented Beauregard T. Shagnasty
broadcast on comp.infosystems.www.authoring.html:
Eric Lindsay wrote:

I wonder how many people have a server that by default serves .css
as text/html?

Heh, my webmail pages serve css files as: "application/x-pointplus"
I've been reporting this error to them for several years!


So, does the browser believe:

1) The server
2) The type attribute in the LINK element
3) What it can deduce by looking at the file itself
?

I really don't know.


Perhaps 2 or 3, because 1 is wrong. The pages do have the link set
correctly: <link href="..." rel="stylesheet" type="text/css">

All my browsers display the pages correctly, so they all are able to
overcome the error. Even IE. <g>

--
-bts
-Warning: I brake for lawn deer
Nov 23 '05 #51
On Tue, 22 Nov 2005, Lars Eighner wrote:
In our last episode,
<1p****************************@40tude.net>,
the lovely and talented Beauregard T. Shagnasty
broadcast on comp.infosystems.www.authoring.html:
Heh, my webmail pages serve css files as:
"application/x-pointplus"
I just *knew* that was going to come up! I've been seeing that
reported, off and on, for as long as CSS has existed. But, once the
text/css media type had been officially registered in the list of
media types, there's little excuse to continue configuring a
service-provider's web server to *default* to any kind of x-
(unregistered) media type.
I've been reporting this error to them for several years!

I think it was last year that one of the victims of that issue
reported their service provider's response: "we do not support CSS,
and have no plans to do so". Talk about clue-free!?!?!
So, does the browser believe:

1) The server
2) The type attribute in the LINK element
3) What it can deduce by looking at the file itself
?


Typical browsers seem to assume that anything offered as a stylesheet
is going to be CSS, no matter what MIME type the server says it is:
as I said before, this is in violation of RFC2616, which basically
prohibits a client from making a unilateral determination of the media
type at variance with what the server said it was. Mozilla (in
standards mode) behaves more correctly, in ignoring the stylesheet
under these conditions.

After all, the /content/ is supposed to be still fully accessible
without the stylesheet, whereas a misinterpreted stylesheet can lead
to scrambled content; so it's a good working principle to say "if in
doubt, leave it out".
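
The two behaviours described above can be sketched in Python. This is only an illustration under stated assumptions: the function name and the strict flag are hypothetical, and real browsers apply more elaborate rules than a single comparison.

```python
def should_apply_stylesheet(content_type_header, strict=True):
    """Decide whether a linked stylesheet should be applied.

    strict=True  : believe the server, as RFC 2616 asks ("if in doubt,
                   leave it out").
    strict=False : lenient behaviour, treating anything linked as a
                   stylesheet as CSS regardless of the declared type.
    """
    if not strict:
        return True
    # Drop parameters such as "; charset=..." and compare case-insensitively.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == "text/css"
```

With the header from the webmail example, should_apply_stylesheet("application/x-pointplus") ignores the stylesheet in strict mode and applies it in lenient mode.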

The rest should, I suppose, consider themselves lucky that CSS is the
only widely used stylesheet language for HTML. If there were several
different stylesheet languages being used, this could get quite
exciting.

The RFC2616 rule isn't just there for fun - it's potentially relevant
to security. The fact that you or I might not be able to devise a way
to compromise security via a stylesheet doesn't necessarily mean that
there isn't a way. Still, I suppose those who design security
firewalls have already had to cope with the fact that RFC2616 is
widely disregarded by clients - particularly by the one (YKWIM) which
is vulnerable to security exploits for other reasons too.
Nov 23 '05 #52
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> writes:
The RFC2616 rule isn't just there for fun - it's potentially relevant
to security. The fact that you or I might not be able to devise a way
to compromise security via a stylesheet doesn't necessarily mean that
there isn't a way. Still, I suppose those who design security
firewalls have already had to cope with the fact that RFC2616 is
widely disregarded by clients - particularly by the one (YKWIM) which
is vulnerable to security exploits for other reasons too.


I can't think of any reported against stylesheets (CSS with CSS,
perhaps), but there was a very recent security issue reported that
involved uploading an image to a third-party server (a web board
avatar, in this case), constructed such that it would be interpreted
as HTML by IE, despite the server thinking it was an image and sending
an image mime type, and then placing a link to the image on the web to
do some cross-site scripting. At least it didn't work via <img
src=...>

Opinion was somewhat divided as to whether this was a security flaw _in
the application allowing the upload_ or not, since any
RFC2616-conforming client would just get a corrupted image returned,
which would do no harm at all.
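
To make the difference concrete, here is a small Python sketch. The sniffing rule (look for "<html" in the first 256 bytes) is a made-up heuristic for illustration only and is not claimed to be IE's actual algorithm; effective_type is a hypothetical name.

```python
def effective_type(declared_type, body, sniff=False):
    """Return the media type a client would act on.

    sniff=False models an RFC 2616-conforming client: the declared type wins.
    sniff=True models a client that second-guesses the server by inspecting
    the content (hypothetical heuristic, for illustration only).
    """
    if sniff and b"<html" in body[:256].lower():
        return "text/html"
    return declared_type


# An "image" whose bytes happen to contain markup.
upload = b"GIF89a<html><script>alert('xss')</script></html>"
```

A conforming client treats the upload as a (corrupt) image, while the sniffing client would treat it as HTML and run the script.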

--
Chris
Nov 23 '05 #53
Alan J. Flavell wrote:
In our last episode,
<1p****************************@40tude.net>,
the lovely and talented Beauregard T. Shagnasty
broadcast on comp.infosystems.www.authoring.html:
Heh, my webmail pages serve css files as:
"application/x-pointplus"

I just *knew* that was going to come up!


Aha! You are the one with the crystal ball! :-)
I think it was last year that one of the victims of that issue
reported their service provider's response: "we do not support CSS,
and have no plans to do so". Talk about clue-free!?!?!


Yes, the grunts at my ISP have been copiously sprayed with copious
amounts of clue repellant.

They have a new beta version of their webmail interface for us to see.
Gawd... what a mess.

--
-bts
-Clue repellant: I love this term
Nov 23 '05 #54

Eric Lindsay wrote:

Guy Macon <http://www.guymacon.com/> wrote:
There are a few minor DNS improvements that you might wish to consider:

http://www.dnsreport.com/tools/dnsre...riclindsay.com

Compare with:

http://www.dnsreport.com/tools/dnsre...w.guymacon.com


That is a really nice reporting tool. At least I only got warnings,
rather than a fail. I'll test all the other servers available to me as
well. Looks like it would be a good way to get an idea of what stuff an
ISP handles well before signing up. Thanks for that Guy.


Keep in mind that you don't have to use the DNS your hosting provider
gives you. Register your domains with (or transfer them to) 000Domains
and you will get a solid, fast no-down-time DNS that passes all of the
above tests and is easy to tweak.
Nov 23 '05 #55
Alan J. Flavell said the following on 11/20/2005 22:59 +0200:
On Sun, 20 Nov 2005, Guy Macon wrote:
the discussion at http://www.webstandards.org/learn/askw3c/sep2003.html

[snipped] Furthermore, their well-intentioned idea of having XHTML sent out with
different MIME types to different browsers seems to me to raise more
issues than it solves. If the material is sufficiently back-level (to
XHTML/1.0 Appendix C) then it can be processed as text/html by *any*
browser: sure, it'll be processed by them all as tag soup, but can
still be rendered in Standards Mode rather than quirks mode. Sending
it to e.g Mozilla as full-blown XHTML doesn't gain any clear
advantage, and actually brings some disadvantage for the recipient.


What is this disadvantage? Caching? I've read about that in previous
threads some time ago and the linked W3C document for the quoted URL
also says something about it, but is this what you're referring to?

I don't use XHTML (except in some experiments) because of the arguments
made against it in this group over the past year and a half (before that
I had jumped on the XHTML bandwagon, so this group has saved at least
one soul), but I would like to understand its problems.

--
Regards
Harrie
Nov 24 '05 #56
Alan J. Flavell said the following on 11/20/2005 13:22 +0200:
Apache is available to run on Windows, or Mac OS X, for example, and
And all known UNIX'es, but I think your point is that Windows or Mac OS
X users have the same open source, free alternative as UNIX/Linux users.
can do a pretty fine job of reflecting what the pages are going to
find when they're uploaded to the production web server. Including
any SSI or PHP processing that you might be doing, for example. [..]
Yes, I totally agree.
[..] And
correctly handling things like href="./" , which direct file system
access does not.


Are you referring to Apache's DirectoryIndex option? Apart from that,
href="./" locally gives the same directory listing as Apache would, when
there's no index file.

--
Regards
Harrie
Nov 24 '05 #57
On Thu, 24 Nov 2005, Harrie wrote:
Alan J. Flavell said the following on 11/20/2005 13:22 +0200:
Apache is available to run on Windows, or Mac OS X, for example,
and


And all known UNIX'es, but I think your point is that Windows or Mac
OS X users have the same open source, free alternative as UNIX/Linux
users.


Just so. The point that I really had in mind was that this might be
their -authoring- platform, no matter what OS would be running on
their -server-.

[...]
[..] And correctly handling things like href="./" , which direct
file system access does not.


Are you referring to Apache's DirectoryIndex option?


Yes, I mean referring to a default document - whose name is neither
specified nor even fixed - it might be index.html today, and
default.cgi tomorrow, without anyone outside of the server itself
needing to know or care. This is IMHO good web hygiene. I got really
upset when a new author changed them all to ./index.html pattern and
claimed this was the only way he could preview the site. He's now
happily running Apache on his authoring Mac - the references are back
to the previous convention, and we're both on good terms again ;-)
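
A small Python sketch of the idea, under stated assumptions: the URL is an example, pick_default_document is a hypothetical helper, and the candidate list merely stands in for whatever the server's DirectoryIndex setting happens to be.

```python
from urllib.parse import urljoin


def pick_default_document(files_in_directory,
                          directory_index=("index.html", "default.cgi")):
    """Return the first configured default document that exists, else None.

    The link itself never names the file, so the server is free to change
    directory_index without any page needing an edit.
    """
    for name in directory_index:
        if name in files_in_directory:
            return name
    return None


# href="./" resolves to the directory URL, whatever the current page is called.
directory_url = urljoin("http://www.example.org/epoc/dialler.html", "./")
```

Here directory_url is the directory itself, and the choice between index.html and default.cgi is made entirely on the server side.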

Nov 24 '05 #58
Harrie wrote:
Sending
it to e.g Mozilla as full-blown XHTML doesn't gain any clear
advantage, and actually brings some disadvantage for the recipient.


What is this disadvantage?


I think Alan was referring to the way an XHTML-compliant browser behaves
when it reads an invalid page sent as XHTML (application/xml+xhtml MIME
type).
For example, see:
http://pgoiffon.free.fr/info/inet/ht...html.ex2.xhtml
Only a parsing error is displayed...
Nov 24 '05 #59
Pierre Goiffon wrote:
I think Alan was referring to the way an XHTML-compliant browser behaves
when it reads an invalid page sent as XHTML (application/xml+xhtml MIME
type).


Oops... I should have written application/xhtml+xml, sorry.
By the way, a must-read document about the MIME types to use for XHTML:
http://www.w3.org/TR/xhtml-media-types/
Nov 24 '05 #60
On Thu, 24 Nov 2005, Harrie wrote:
Alan J. Flavell said the following on 11/20/2005 22:59 +0200:

[...]
it'll be processed by them all as tag soup, but can still be
rendered in Standards Mode rather than quirks mode. Sending it to
e.g Mozilla as full-blown XHTML doesn't gain any clear advantage,
and actually brings some disadvantage for the recipient.


What is this disadvantage?


Well, there are differences in incremental rendering, which can
mean that the reader has to wait longer until the document is fully
rendered. I don't know the precise differences - they may vary
from version to version, but it seems that sending as text/html will
get the results onto the screen faster, in general. Google for the
terms "incremental rendering" and/or "incremental display" in
conjunction with XHTML for lots of discussions.

And if the content provider hasn't ensured that the document is
well-formed, the reader will get shown an error report instead of a
best-effort at rendering. That's what is *meant* to happen, and (for
pedants) is one of the *benefits* of XML, but some readers take a
different view when they're actually trying to browse for content.
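
The contrast can be reproduced with Python's standard library, as a sketch rather than a model of any browser: xml.etree.ElementTree stands in for a strict XML parser and html.parser for a forgiving tag-soup parser. The sample markup and helper names are invented for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Crossed (improperly nested) tags: not well-formed XML.
CROSSED = "<p>Some <b>bold <i>and italic</b> text</i></p>"


def parses_as_xml(markup):
    """True if a strict XML parser accepts the markup, False on a parse error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Tag-soup style parsing: record start tags and never complain."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_soup_start_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```

The XML parser rejects the crossed tags outright, while the tag-soup parser carries on and reports all three elements.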

Hixie's briefing discusses this whole area, in a lot more detail than
I can do here.

regards
Nov 24 '05 #61
Alan J. Flavell said the following on 11/24/2005 12:38 +0200:
On Thu, 24 Nov 2005, Harrie wrote:
Alan J. Flavell said the following on 11/20/2005 22:59 +0200:
it'll be processed by them all as tag soup, but can still be
rendered in Standards Mode rather than quirks mode. Sending it to
e.g Mozilla as full-blown XHTML doesn't gain any clear advantage,
and actually brings some disadvantage for the recipient.


What is this disadvantage?


Well, there are differences in incremental rendering, which can
mean that the reader has to wait longer until the document is fully
rendered. I don't know the precise differences - they may vary
from version to version, but it seems that sending as text/html will
get the results onto the screen faster, in general. Google for the
terms "incremental rendering" and/or "incremental display" in
conjunction with XHTML for lots of discussions.


Thanks for the pointer, I'll look into it, it sounds interesting.
And if the content provider hasn't ensured that the document is
well-formed, the reader will get shown an error report instead of a
best-effort at rendering. That's what is *meant* to happen, and (for
pedants) is one of the *benefits* of XML, but some readers take a
different view when they're actually trying to browse for content.
Yes, I too think this is a benefit of X(HT)ML and I'm hoping that
authors spot their mistakes earlier, when they check it with an XHTML
capable browser.

But in my field of work (IT related) I see very few people who actually
check what they are doing and I guess this is also true for XHTML
authors :(
Hixie's briefing discusses this whole area, in a lot more detail than
I can do here.


I read his post a while ago and didn't fully understand everything,
but I'll read it again; some things I have to read several times before
I understand them.

--
Regards
Harrie
Nov 24 '05 #62
Pierre Goiffon said the following on 11/24/2005 09:39 +0200:
Pierre Goiffon wrote:
I think Alan was referring to the way an XHTML-compliant browser behaves
when it reads an invalid page sent as XHTML (application/xml+xhtml
MIME type).

Like Alan says, this is what is supposed to happen, and my point of view
is that an author should check his writing (for correct encoding, validity
and more), but sadly that is not what happens in the real world.
Oops... I should have written application/xhtml+xml, sorry.
By the way, a must-read document about the MIME types to use for XHTML:
http://www.w3.org/TR/xhtml-media-types/


Thanks, I know this one; it has been posted before in discussions
about HTML vs XHTML. I strongly believe that XHTML should be served as
application/xhtml+xml, and because MSIE doesn't like that, I don't use
XHTML (although MSIE is not my personal choice of browser).

I do like XML though, and therefore like XHTML, and I hope MSIE will
someday support application/xhtml+xml. I think it has benefits over
text/html, but I wasn't aware of the incremental rendering issue that
Alan pointed out.

I have some reading to do (although I will stay with HTML 4.01 Strict
awaiting better times).

--
Regards
Harrie
Nov 24 '05 #63
Doug Laidlaw said the following on 11/19/2005 03:27 +0200:
You can create your own .htaccess files. Windows won't (or wouldn't) let
you create a file with a name that is "all extension" so I once created a
file with a name that kept Windows happy, then changed it once it was on
the server. I am now running Linux, and have no problems. The problem
then was that dot-files were invisible.


It's true that Windows won't let you create a dot file, but you can make
it on UNIX/Linux and copy it to your Windows machine (FTP or by whatever
means). Once you remove the hidden attribute (on Windows), you can edit
it with Wordpad without a problem. That way you don't have to change
names. At least this works with my Win2k box.

--
Regards
Harrie
Nov 24 '05 #64
Doug Laidlaw said the following on 11/19/2005 03:27 +0200:

[about Windows]
[..] The problem then was that dot-files were invisible.


You have to make "hidden" files visible on Windows with Tools ==> Folder
Options ==> View and then check "Show hidden files and folders".

Maybe you also need to uncheck "Hide protected OS system files" and "Hide
file extensions for known types". I've always unchecked those because
I don't like an OS hiding things from me (the question is then, what am
I doing with Windows?! Answer: games).

--
Regards
Harrie
Nov 24 '05 #65
On Thu, 24 Nov 2005, Harrie wrote:
Doug Laidlaw said the following on 11/19/2005 03:27 +0200:
You can create your own .htaccess files. Windows won't (or wouldn't) let
you create a file with a name that is "all extension" so I once created a
file with a name that kept Windows happy, then changed it once it was on
the server. I am now running Linux, and have no problems. The problem
then was that dot-files were invisible.
It's true that Windows won't let you create a dot file,


I'm sorry, but I don't agree with that as stated. There's nothing in
*Windows* (the OS) which prevents you from creating a file with a name
beginning with a dot, such as .htaccess - in fact, when I list my
files on this Win/2K laptop I can see that I have quite a few such
files there.

*Some* *Applications* written for Windows will give you a hard time
with creating or opening them, it's true (myself I use PFE32 as my
plaintext editor for this kind of purpose, although it's old and no
longer supported). But if *your* favourite editor won't touch them,
you can always call them something else, and then copy them to
.htaccess or whatever it was you wanted, afterwards. A cmd window is
quite happy to perform "copy htaccess .htaccess".
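
Another route that sidesteps fussy editors is to write the file from a script. The Python sketch below is only an illustration: write_htaccess is a hypothetical helper, the temporary directory stands in for a real document root, and the AddType line is just an example of content such a file might hold.

```python
import tempfile
from pathlib import Path


def write_htaccess(directory, lines):
    """Create (or overwrite) a .htaccess file in the given directory."""
    path = Path(directory) / ".htaccess"
    path.write_text("\n".join(lines) + "\n", encoding="ascii")
    return path


if __name__ == "__main__":
    # Demonstration in a throwaway directory rather than a real site.
    with tempfile.TemporaryDirectory() as docroot:
        created = write_htaccess(docroot, ["AddType text/css .css"])
        print(created.name, created.exists())
```

The file name is passed to the operating system as-is, so the leading dot is not an obstacle here.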

Alternatively, since this is one's private Apache setup that we're
discussing here, it's quite legal to edit the httpd.conf and change
the name of this file with an AccessFileName directive. You just have
to remember that will then be a difference from a regular production
server, over which you don't control the configuration.
but you can make it on UNIX/Linux and copy it to your Windows
machine (FTP or by whatever means).


You *can* do it that way, but there's no need to. If unix-like
systems are your style, could I suggest installing the appropriate
components of Cygwin ? As a unix fan, one can work in a more
comfortable way there. But those who are not fans of unix just need
to find Windows applications that don't put artificial hurdles in
their way, IMHO.

Nov 24 '05 #66
Harrie wrote:
It's true that Windows won't let you create a dot file


yeah it will

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Nov 24 '05 #67
Harrie wrote:
the question is then, what am I doing with Windows?! Answer: games


get a ps2.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Nov 24 '05 #68
On Thu, 24 Nov 2005, Toby Inkster wrote:
Harrie wrote:
the question is then, what am I doing with Windows?! Answer: games


get a ps2.


We had one of those, once - like this:
http://www.seds.org/~spider/ps2/ps2.html

Oh, not *that* kind of PS2 ;-}

Nov 24 '05 #69
In article <43***********************@news.free.fr>,
Pierre Goiffon <pg******@free.fr.invalid> wrote:
I think Alan was referring to the way an XHTML-compliant browser behaves
when it reads an invalid page sent as XHTML (application/xml+xhtml MIME
type).
For example, see:
http://pgoiffon.free.fr/info/inet/ht...html.ex2.xhtml
Only a parsing error is displayed...


Just tried that with Safari, plus Macintosh version of Opera and
Firefox, all of which tell you about the error.

However I was under the impression that Google search bots did not
search files served as application/xhtml+xml. So I searched for the
contents of that page using

Ce parapgraphe contient des balises croises

and Google had it as the only hit. Did I misunderstand (again)?

--
http://www.ericlindsay.com
Nov 24 '05 #70
I was going to ask about the Title element, required in the head of a
web page (wow, I'm down to about line 4 of my web page so far).
Specifically, I am asking whether there is anything I am likely to get
wrong with the title.

http://www.w3.org/TR/REC-html40/stru...l.html#h-7.4.2
says the allowed attributes are lang (language information) and dir (text
direction). I don't believe I would need to use either of these
attributes.

I notice that http://www.htmlref.com/reference/appa/tag_title.htm says
title can include id="unique alphanumeric identifier" but I didn't see
that in the W3C site.

Title seems to be specifically for external titles. May show at the top
of a web browser open to that page. May be shown when a page is
bookmarked as a favourite. Probably used by some or all search engines,
so it seems pretty important to get the content exactly right.
Something very descriptive of the page contents, and especially not a
generic title that is the same for every page on a site.

So for example on one now almost abandoned hobby site
http://www.ericlindsay.com/epoc I have a bunch of pages about
applications for a PDA. The site titles for specific applications
generally say things like "Psion Epoc phone dialler hints and tips". I
tried to include the manufacturer, the operating system, the name of or
a descriptive term for the application or utility, plus what I was
providing on the page (hints and tips). In this case the more general
manufacturer and operating system are short words, so I could leave them
at the start of the title. If they were long, I'd probably put them
later in the line in case it was truncated when viewing, along the lines
of "phone dialler hints and tips for very long manufacturer name with an
even longer operating system name". I think that approach is generally
correct, but if anyone thinks it should be done in a different way I'd
like to hear reasons.

I was thinking in writing web pages I would probably generally generate
Title and h1 from the same input line, with title perhaps also getting a
shortened URL for the site (such as myname.com rather than a full URL).
I was thinking that would normally help confine the length of the title
to about one line.

Because of the importance of title and h1 to search engines, I was also
thinking of generating the metas for keywords and description from the
title or h1, in case any search engines still also use these metas. I
would imagine in a lot of pages one or more of title, h1, keywords and
description would also end up being edited manually.
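
A minimal Python sketch of generating both elements from one input line, under stated assumptions: title_and_h1 is a hypothetical helper, and the " - " separator and the short site name are illustrative choices rather than anything prescribed.

```python
from html import escape


def title_and_h1(heading, site_name=None):
    """Build <title> and <h1> markup from a single heading line.

    The title optionally gets a short site name appended, so it still
    makes sense out of context (bookmarks, search results).
    """
    title_text = heading if site_name is None else f"{heading} - {site_name}"
    return (f"<title>{escape(title_text)}</title>",
            f"<h1>{escape(heading)}</h1>")
```

For example, title_and_h1("Psion Epoc phone dialler hints and tips", "ericlindsay.com") yields a title carrying the site name and an h1 without it. Either result can still be edited by hand afterwards.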

--
http://www.ericlindsay.com
Nov 24 '05 #71
On Fri, 25 Nov 2005, Eric Lindsay wrote:

[TITLE element:]
http://www.w3.org/TR/REC-html40/stru...l.html#h-7.4.2
says the allowed attributes are lang (language information) and dir
(text direction).
right
I don't believe I would need to use either of these attributes.
It's going to be the same language as you defined on the HTML element,
presumably. Unless you're using right-to-left scripts, there's rarely
a reason to use dir. (As A.Prilop has demonstrated, browsers get
confused with rtl scripts, and it can be helpful to add appropriate
dir=rtl attributes fairly liberally - even where the rtl-ness ought to
be implicit.)
I notice that http://www.htmlref.com/reference/appa/tag_title.htm says
title can include id="unique alphanumeric identifier"
That seems to be wrong, and I can't see what the point of an id would be.
Title seems to be specifically for external titles. May show at the
top of a web browser open to that page. May be shown when a page is
bookmarked as a favourite. Probably used by some or all search
engines, so it seems pretty important to get the content exactly
right. Something very descriptive of the page contents, and
especially not a generic title that is the same for every page on a
site.
Good summary. So it needs to make sense *out* of context. But for
practical reasons, best confined to about 55-65 characters max. So
don't get carried away with too much detail.
So for example on one now almost abandoned hobby site
http://www.ericlindsay.com/epoc I have a bunch of pages about
applications for a PDA. The site titles for specific applications
generally say things like "Psion Epoc phone dialler hints and tips".
No arguments with that, but take care that the key part doesn't
risk getting chopped off at the right. Long-ish titles:

Acme Supporters and Social Club Bumbledon Branch News
Acme Supporters and Social Club Bumbledon Branch Membership
Acme Supporters and Social Club Bumbledon Branch Photo Gallery

and so on, all very worthy, but if the bookmarks get shown like:

Acme Supporters and Social Club Bumbledo...

then it's a bit irritating to users.
I was thinking in writing web pages I would probably generally
generate Title and h1 from the same input line,


<h1> doesn't necessarily have to be useful out of context, unlike the
<title>, but yes, often they can be the same or very similar.

I've nothing particular to add on your other points.

hope that's useful
Nov 24 '05 #72
Eric Lindsay <NO**********@ericlindsay.com> wrote:
http://pgoiffon.free.fr/info/inet/ht...html.ex2.xhtml
Only a parsing error is displayed...


Just tried that with Safari, plus Macintosh version of Opera and
Firefox, all of which tell you about the error.

However I was under the impression that Google search bots did not
search files served as application/xhtml+xml. So I searched for the
contents of that page using

Ce parapgraphe contient des balises croises

and Google had it as the only hit. Did I misunderstand (again)?


http://www.google.com/search?q=Ce+pa...alises+croises

--
Spartanicus
Nov 24 '05 #73
In article
<gf********************************@news.spartanicus.utvinternet.ie>,
Spartanicus <in*****@invalid.invalid> wrote:
Eric Lindsay <NO**********@ericlindsay.com> wrote:
http://pgoiffon.free.fr/info/inet/ht...html.ex2.xhtml
Only a parsing error is displayed...


Just tried that with Safari, plus Macintosh version of Opera and
Firefox, all of which tell you about the error.

However I was under the impression that Google search bots did not
search files served as application/xhtml+xml. So I searched for the
contents of that page using

Ce parapgraphe contient des balises croises

and Google had it as the only hit. Did I misunderstand (again)?


http://www.google.com/search?q=Ce+pa...alises+croises


Doh! I did see it, Spartanicus. However I cut and pasted my search
terms from the page source for the site. croisees seems not to have
survived being pasted into this newsreader. The first e of the two has
an acute accent (at least, it looks like that is it). &eacute; ?

http://www.google.com/search?hl=en&l...&q=Ce+parapgraphe+contient+des+balises+croises&btnG=Search

Same thing happened here. I don't think my newsreader can cope.

--
http://www.ericlindsay.com
Nov 25 '05 #74
Harrie wrote:
I do like XML though and therefore like XHTML and I hope MSIE will
someday support application/xhtml+xml,


Support is planned for IE8, it will not be supported in IE7, but even
then we'd have to wait for enough users to ditch older browsers to use
it. Hopefully, by then, Google, Lynx and others will have added support
also.

--
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://GetThunderbird.com/ Reclaim your Inbox
Nov 25 '05 #75
Eric Lindsay <NO**********@ericlindsay.com> wrote:
>> http://pgoiffon.free.fr/info/inet/ht...html.ex2.xhtml
>> Only a parsing error is displayed...
>
>However I was under the impression that Google search bots did not
>search files served as application/xhtml+xml. So I searched for the
>contents of that page using
>
>Ce parapgraphe contient des balises croises
>
>and Google had it as the only hit. Did I misunderstand (again)?


http://www.google.com/search?q=Ce+pa...alises+croises


Doh! I did see it, Spartanicus. However I cut and pasted my search
terms from the page source for the site. croisees seems not to have
survived being pasted into this newsreader. The first e of the two has
an acute accent (at least, it looks like that is it). &eacute; ?


I also pasted the actual text into the address bar of my browser, this
also produced a "not found". It does work when I paste the actual text
into the search field of the Google page.
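
One plausible explanation, offered as an assumption rather than a diagnosis, is that the accented character gets percent-encoded differently depending on the character set used. The Python sketch below shows the two encodings; encode_query_term is a hypothetical helper.

```python
from urllib.parse import quote


def encode_query_term(term, charset):
    """Percent-encode a search term using the given character encoding."""
    return quote(term, encoding=charset)


WORD = "crois\u00e9es"  # "croisees" with an acute accent on the first e

as_utf8 = encode_query_term(WORD, "utf-8")      # two bytes for the accented e
as_latin1 = encode_query_term(WORD, "latin-1")  # one byte for the accented e
```

A search service expecting one form may fail to match a query sent in the other, which would look like a "not found".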

The server may be configured to serve text/html to the Google bot. It
currently doesn't appear to use content negotiation based on the UA's
accept string (IE gets application/xhtml+xml), but it may have been at
the time when Google bot came round to index the site.
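
For reference, content negotiation on the Accept header could look roughly like the Python sketch below. It is a simplified illustration: choose_xhtml_media_type is a hypothetical function, wildcards such as */* are deliberately not treated as support, and a real server would implement the full negotiation rules.

```python
def choose_xhtml_media_type(accept_header):
    """Pick the media type to serve an XHTML document with.

    Returns application/xhtml+xml only if the client lists it explicitly
    with a non-zero q-value; otherwise falls back to text/html.
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"
```

A client that only sends something like "image/gif, image/jpeg, */*" would get text/html under this rule, and so would a crawler that sends no explicit XHTML type.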

http://www.spartanicus.utvinternet.ie/demo.xhtml isn't indexed by
Google:
http://www.google.com/search?hl=en&l...22&btnG=Search

--
Spartanicus
Nov 25 '05 #76
Harrie wrote:
I think Alan was referring to the way an XHTML-compliant browser behaves
when it reads an invalid page sent as XHTML (application/xml+xhtml
MIME type).

Like Alan says, this is what is supposed to happen, and my point of view
is that an author should check his writing (for correct encoding, validity
and more), but sadly that is not what happens in the real world.


In saying that, you're forgetting all the sites with dynamically generated
pages, and particularly all the CMSs where the content contains markup
written by non-IT users.
I do like XML though and therefore like XHTML and I hope MSIE will
someday support application/xhtml+xml


There was a post on the IE blog saying there won't be "true" XHTML
support in IE7:
http://blogs.msdn.com/ie/archive/2005/09/15/467901.aspx
Nov 25 '05 #77
Eric Lindsay wrote:
For example, see:
http://pgoiffon.free.fr/info/inet/ht...html.ex2.xhtml
Only a parsing error is displayed...
Just tried that with Safari, plus Macintosh version of Opera and
Firefox, all of which tell you about the error.


Yes, this is the normal behavior for "true" invalid XHTML
However I was under the impression that Google search bots did not
search files served as application/xhtml+xml. So I searched for the
contents of that page using

Ce parapgraphe contient des balises croises

and Google had it as the only hit. Did I misunderstand (again)?


The text was "ce paragraphe contient des balises croisées"

Plus, I recently corrected a mistake I had made: my .htaccess was badly
configured, and application/xml+xhtml was sent for this page instead of
application/xhtml+xml... Note also that there is no content negotiation
for this page, as it's an example for an article I wrote to sum up the
information collected in the various threads on the subject (choice of
HTML or XHTML) in fr.comp.infosystemes.www.auteurs, to show that a "true"
XHTML page with invalid markup displays an error in XHTML-compliant
browsers. The article is here:
http://pgoiffon.free.fr/info/inet/html_ou_xhtml.php
(oh yes, it's written in French)

So this page definitely cannot be taken as a good example of whether
search engines index "true" XHTML or not (it could be in the near
future, but you will have to wait some weeks, I presume).
Nov 25 '05 #78
On Fri, 25 Nov 2005, Eric Lindsay wrote:
User-Agent: MT-NewsWatcher/3.4 (PPC Mac OS X)

However I cut and pasted my search
terms from the page source for the site. croisees seems not to have
survived being pasted into this newsreader. The first e of the two has
an acute accent (at least, it looks like that is it). &eacute; ?
I don't think my newsreader can cope.


It can - at least when set up correctly:
http://www.smfr.org/mtnw/docs/Mime.html
http://www.smfr.org/mtnw/docs/Prefer...Send_with_MIME

--
Netscape 3.04 does everything I need, and it's utterly reliable.
Why should I switch? Peter T. Daniels in <news:sci.lang>

Nov 25 '05 #79
Tim
On Thu, 24 Nov 2005 23:16:52 +0000, Alan J. Flavell sent:
take care that the key part doesn't
risk getting chopped off at the right. Long-ish titles:

Acme Supporters and Social Club Bumbledon Branch News
Acme Supporters and Social Club Bumbledon Branch Membership
Acme Supporters and Social Club Bumbledon Branch Photo Gallery

and so on, all very worthy, but if the bookmarks get shown like:

Acme Supporters and Social Club Bumbledo...

then it's a bit irritating to users.


Obvious way to use long titles that can still be usable if truncated:

Branch News for Acme Supporters and Social Club Bumbledon
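
A small Python sketch of the effect, assuming a bookmark list that keeps the first 40 characters and appends "..." (the 40 is inferred from the truncated example above; real browsers differ):

```python
def truncate(title: str, keep: int = 40) -> str:
    """Keep the first `keep` characters and mark the cut with '...'."""
    if len(title) <= keep:
        return title
    return title[:keep] + "..."


club = "Acme Supporters and Social Club Bumbledon"
pages = ["Branch News", "Branch Membership", "Branch Photo Gallery"]

# Organisation first: every bookmark truncates to the same text.
for page in pages:
    print(truncate(f"{club} {page}"))

# Page-specific part first: the bookmarks stay distinguishable.
for page in pages:
    print(truncate(f"{page} for {club}"))
```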

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please destroy some files yourself.

Nov 25 '05 #80
In article
<Pi*************************************@s5b004.rrzn.uni-hannover.de>,
Andreas Prilop <nh******@rrzn-user.uni-hannover.de> wrote:
On Fri, 25 Nov 2005, Eric Lindsay wrote:
User-Agent: MT-NewsWatcher/3.4 (PPC Mac OS X)

However I cut and pasted my search
terms from the page source for the site. croisees seems not to have
survived being pasted into this newsreader. The first e of the two has
an acute accent (at least, it looks like that is it). &eacute; ?
I don't think my newsreader can cope.


It can - at least when set up correctly:
http://www.smfr.org/mtnw/docs/Mime.html
http://www.smfr.org/mtnw/docs/Prefer...Send_with_MIME


Thanks. I think this is the first time in 6 months I have tried sending
a non-US ASCII character to a newsgroup (and I didn't even notice it
when I cut and pasted the URL). Nice to know I can turn mime on if
sending such material (if I had seen mime when I configured MTM, I
certainly didn't remember it).

--
http://www.ericlindsay.com
Nov 26 '05 #81
On Sat, 26 Nov 2005, Eric Lindsay wrote:
Nice to know I can turn mime on if sending such material


This should not be necessary. :-(
The newsreader should *always* include a header line
MIME-Version: 1.0
and specify the Content-Type.
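
As a sketch of what such headers look like, the Python standard library `email` package adds them itself when a text body is set. The newsgroup name and subject below are just sample values, and UTF-8 is an assumption made for the example (ISO-8859-1 would do equally well for this text).

```python
from email.message import EmailMessage


def build_article(body: str) -> EmailMessage:
    """Build a message whose headers declare MIME version and charset."""
    msg = EmailMessage()
    msg["Newsgroups"] = "comp.infosystems.www.authoring.html"  # sample value
    msg["Subject"] = "Non-ASCII test"                          # sample value
    # set_content() adds MIME-Version, Content-Type and
    # Content-Transfer-Encoding headers for the text body.
    msg.set_content(body, charset="utf-8")
    return msg


article = build_article("ce paragraphe contient des balises croisées")
print(article["MIME-Version"])
print(article["Content-Type"])
```

With those two headers present, the receiving newsreader does not have to guess how the accented characters were encoded.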

--
Netscape 3.04 does everything I need, and it's utterly reliable.
Why should I switch? Peter T. Daniels in <news:sci.lang>

Nov 28 '05 #82
Pierre Goiffon said the following on 11/25/2005 10:32 +0200:
Harrie wrote:
Like Alan says, this is what is supposed to happen, and my point of view is
that an author should check his writing (for correct encoding,
validity and more), but this sadly is not a real world example.


In saying that, you're forgetting all the sites with dynamically generated pages,
and particularly all the CMSes where the content contains markup written
by non-IT users.


No, it just says something about my opinion of the quality of those
tools ;)

I validate (and I know validation is not everything, just a tool) the output
of my CGI scripts (or otherwise generated HTML sources) and try to check it as
much as possible for semantically correct coding (I don't think I used
the best words to describe that, but I hope you know what I mean). I
think CMSes and tools like that should check what they spew out
(preferably before such a tool is released).

A friend of mine uses NetObjects Fusion, which uses made-up attributes. I
know User Agents should ignore attributes they don't understand, so this
shouldn't be a big problem, but this is not a nice solution. HTML (SGML)
comments would be better, but I don't like the idea of polluting HTML
source with such things.

This is a sample of the source:

<BODY NOF="(MB=(ZeroMargins, 0, 0, 0, 0), L=(HomeLayout, 820, 597))">

Also, I really don't understand why NOF includes a DOCTYPE in the HTML
file and doesn't comply with it (it is HTML 4.0 Transitional). It
doesn't use the backup system identifier (the URL part) for triggering
Standards Compliance Mode, so what's the use of it then?
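
To check a generated page for this, a rough Python sketch could pull out the DOCTYPE and test whether the URL part is there. The class and function names are made up for the example, and it only reports whether a system identifier is present; which rendering mode a given browser then picks depends on that browser's own doctype-switching rules.

```python
import re
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Remember the first DOCTYPE declaration seen in the markup."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        if self.doctype is None and decl.upper().startswith("DOCTYPE"):
            self.doctype = decl


def has_system_identifier(markup: str) -> bool:
    """True if the DOCTYPE carries a system identifier (the URL part)."""
    sniffer = DoctypeSniffer()
    sniffer.feed(markup)
    if sniffer.doctype is None:
        return False
    m = re.match(r"DOCTYPE\s+\S+\s+(PUBLIC|SYSTEM)\b", sniffer.doctype, re.IGNORECASE)
    if not m:
        return False
    quoted = re.findall(r'"[^"]*"|\'[^\']*\'', sniffer.doctype)
    # PUBLIC needs a public id plus a system id; SYSTEM needs just the system id.
    needed = 2 if m.group(1).upper() == "PUBLIC" else 1
    return len(quoted) >= needed


without_url = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"><title>t</title>'
with_url = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
            '"http://www.w3.org/TR/html4/strict.dtd"><title>t</title>')
print(has_system_identifier(without_url))  # False
print(has_system_identifier(with_url))     # True
```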

And then all those blogging tools: most (all?) use XHTML 1.0
Transitional and send it as text/html instead of using HTML 4.01 Strict. I
thought (some time ago) that web authoring tool makers would have some
knowledge about web standards, but I think they all jumped on the XHTML
bandwagon (as I did, but I quickly changed back once I had read enough
posts in this group).

I'm always confused (but I'm still too naive from time to time) to see
how most people just want to make money and quality comes second
(if at all).

Just my own thoughts ..

--
Regards
Harrie
Nov 30 '05 #83

