
application/xhtml+xml not recognized by Google

Andreas Prilop
#1: Apr 3 '06
It seems that Google is unable to "recognize" application/xhtml+xml:

http://google.com/search?q=www.unics...otation.x.html
"File Format: Unrecognized"

Then follow the link "View as HTML" to
http://google.com/search?q=cache:www...otation.x.html
and look at the source text (as compared with the original)!

Are they mad?


David Dorward
#2: Apr 3 '06

Andreas Prilop wrote:

> It seems that Google is unable to "recognize" application/xhtml+xml:
> Are they mad?

Nor can a default installation of Internet Explorer (which still holds a
majority marketshare), so there aren't many true XHTML documents out there.
Thus it probably isn't worth all that much to Google to chain an XML parser
into their search indexer.

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is

Andreas Prilop
#3: Apr 4 '06

On Mon, 3 Apr 2006, David Dorward wrote:

> Andreas Prilop wrote:
>
>> It seems that Google is unable to "recognize" application/xhtml+xml:
>> Are they mad?
>
> Nor can a default installation of Internet Explorer (which still holds a
> majority marketshare), so there aren't many true XHTML documents out there.
> Thus it probably isn't worth all that much to Google to chain an XML parser
> into their search indexer.

(1)
Internet Explorer 6 on Windows XP SP2 does display
http://www.unics.uni-hannover.de/nht...otation.x.html
(because of the suffix .html)

(2)
You misquoted me! You have corrupted my text!

My question "Are they mad?" was NOT under the sentence that Google
does not recognize application/xhtml+xml. It was under the link to
Google's cached version
http://google.com/search?q=cache:www...otation.x.html

Look at the source of the above and compare with the original at
http://www.unics.uni-hannover.de/nht...otation.x.html

They *changed* my <h1> to
<p><font size="6" face="helvetica"><b>
etc. etc.

Therefore I ask "Are they mad?".

--
The 6th of June is Bill Gates Day.

Jukka K. Korpela
#4: Apr 4 '06

Andreas Prilop <nhtcapri@rrzn-user.uni-hannover.de> wrote:

> They *changed* my <h1> to
> <p><font size="6" face="helvetica"><b>
> etc. etc.
>
> Therefore I ask "Are they mad?".

Idiots savants, perhaps? Google has performed a nontrivial conversion from
XHTML 1.1 to quasi-XHTML (absurdly presentational XHTML-lookalike markup with
some syntax errors like <BASE> element before <html> element). It has clearly
parsed your XHTML (somehow) and mapped the logical elements to presentational
hacks. This uncalled-for transmogrification is "idiotic" in the common
figurative sense but has really required some (abuse of) intelligence and
mental capabilities far above the level of idiots.
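
To make the kind of mapping described here concrete, the following is a minimal Python sketch that turns a logical <h1> into the presentational <p><font size="6" face="helvetica"><b> structure quoted in this thread. It is only an illustration: the function name `presentationalize` and the sample markup are invented for the example, and nothing is known here about how Google's converter is actually implemented.

```python
import xml.etree.ElementTree as ET

def presentationalize(xhtml: str) -> str:
    """Rewrite every <h1> as <p><font size="6" face="helvetica"><b>...</b></font></p>.

    Hypothetical helper for illustration only; assumes well-formed,
    un-namespaced markup.
    """
    root = ET.fromstring(xhtml)
    # Take a snapshot of the headings first so the tree is not modified
    # while it is being traversed.
    for heading in list(root.iter("h1")):
        text = heading.text
        children = list(heading)
        for child in children:
            heading.remove(child)
        # Reuse the element in place so its position and trailing text survive.
        heading.tag = "p"
        heading.text = None
        font = ET.SubElement(heading, "font", {"size": "6", "face": "helvetica"})
        bold = ET.SubElement(font, "b")
        bold.text = text
        bold.extend(children)
    return ET.tostring(root, encoding="unicode")

sample = "<body><h1>Notation</h1><p>Body text.</p></body>"
print(presentationalize(sample))
```

The point of the sketch is that the heading's meaning is lost in the output: a program reading the result can no longer tell that "Notation" was a top-level heading.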

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Andreas Prilop
#5: Apr 5 '06

On Tue, 4 Apr 2006, Jukka K. Korpela wrote:

> Google has performed a nontrivial conversion from
> XHTML 1.1 to quasi-XHTML (absurdly presentational XHTML-lookalike markup with
> some syntax errors like <BASE> element before <html> element). It has clearly
> parsed your XHTML (somehow) and mapped the logical elements to presentational
> hacks.

They make quite an effort to index non-HTML files:
http://www.google.com/help/faq_filetypes.html
It should be trivial to index XHTML 1.1, no?

Jukka K. Korpela
#6: Apr 5 '06

Andreas Prilop <nhtcapri@rrzn-user.uni-hannover.de> wrote:

> They make quite an effort to index non-HTML files:
> http://www.google.com/help/faq_filetypes.html
> It should be trivial to index XHTML 1.1, no?

Indeed, especially since they obviously parse XHTML 1.1 (and then do
something nasty). I wonder why XML is not listed. It should be rather simple
to parse XML documents and index just their textual content.
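
As a rough illustration of how little is needed, here is a minimal Python sketch that parses a well-formed XML (or XHTML) document with the standard library and keeps only its text nodes. The function name `extract_text` and the sample document are made up for the example, and it makes no claim about how any real search indexer works.

```python
import xml.etree.ElementTree as ET

def extract_text(xml_source: str) -> str:
    """Return the textual content of a well-formed XML document.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_source)
    # itertext() yields every text node in document order, skipping the tags.
    # Joining with spaces and re-splitting collapses runs of whitespace.
    words = " ".join(root.itertext()).split()
    return " ".join(words)

sample = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Notation</title></head>
  <body><h1>Heading</h1><p>Some <em>marked-up</em> text.</p></body>
</html>"""

print(extract_text(sample))  # prints: Notation Heading Some marked-up text.
```

One caveat under these assumptions: a real XHTML 1.1 page that uses named entities such as &nbsp; will make this parser raise an error unless the entities are defined, so a production indexer would need to handle the DTD or substitute the entities first.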

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Andreas Prilop
#7: Apr 6 '06

On Wed, 5 Apr 2006, Jukka K. Korpela wrote:

>> http://www.google.com/help/faq_filetypes.html
>
> I wonder why XML is not listed. It should be rather simple
> to parse XML documents and index just their textual content.

Google writes "File Format: Unrecognized" for XML:
<http://google.com/search?q=site:groups.google.com+inurl:feed>

--
The 6th of June is Bill Gates Day.

