
Not the browsers, dummy, the search engines


If I understand the general direction of recent posts, the idea is to
improve the quality of html/css by soliciting help from the various
browsers. Browsers can certainly detect problems but they have no
sensible place to report them and no way to prevent the same problem
from happening over and over on multiple sites around the world. That
idea simply doesn't work.

But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping around
on the internet, and not post items with errors into their search
files. Or perhaps flag them on their lists as having errors.
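
As a rough illustration of what such a "cursory edit check" during a crawl
might look like (only a sketch, not anything a real search engine does), a
spider could run each fetched page through a lenient parser and count the
errors it has to recover from. The snippet below uses Python with the
third-party lxml library and a placeholder URL; it catches tag-soup
problems only, not full validation against a DTD.

import urllib.request

from lxml import etree


def cursory_html_check(url):
    """Return a list of parse-error messages for the page at `url`."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html_bytes = resp.read()
    parser = etree.HTMLParser(recover=True)  # recover from bad markup, roughly as browsers do
    etree.fromstring(html_bytes, parser)     # parse; errors accumulate in parser.error_log
    return ["line %d: %s" % (e.line, e.message) for e in parser.error_log]


if __name__ == "__main__":
    errors = cursory_html_check("http://www.example.com/")  # placeholder URL
    if errors:
        print("%d markup problem(s); a spider could flag or skip this page" % len(errors))
        for err in errors[:10]:
            print("  " + err)
    else:
        print("no recoverable parse errors found")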

This has some things going for it. It would prevent bad html from
being disseminated all over the world; it would inform the authors of
the bad html that they have a problem, and it would encourage them to
fix their problem since that's the only way anyone will be able to
find the kernels of wisdom they wish to share with the world.

The negative from the search engine point of view would be that the
spiders would take substantially longer to analyze a given file. There
may be a positive for them as well (Other than that warm glow inside
when they just know they're doing the right thing!), they would be
delivering a better product to their customers. If someone selected an
item from a Google list, they could be fairly sure it wouldn't end up
being a pile of pointy brackets and wall-to-wall text.

Regards,
Kent Feiler
www.KentFeiler.com
Feb 10 '07 #1
16 Replies


On 10 Feb, 15:11, Kent Feiler <z...@zzzz.com> wrote:
But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping around
on the internet,
Interesting idea...

This would be a solution to the problem of authors not knowing if their
sites were invalid. However, the real problem is that authors don't
_care_ (or even understand) if their sites are invalid. If someone
cares to check, it's not hard to tell. This "spider validation" idea
just doesn't solve the real issue.

Feb 10 '07 #2

In article <aa********************************@4ax.com>,
Kent Feiler <zz**@zzzz.com> wrote:
But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping around
on the internet, and not post items with errors into their search
files.
That won't work, because most Web content is erroneous but still useful
to users. Search engines compete on the usefulness of their results, so
excluding useful results that have errors that browsers are able to
silently recover from would be a very bad business move.
Or perhaps flag them on their lists as having errors.
Not a new idea. This has been discussed relatively recently on the
WHATWG list, for example (even though the discussion was off-topic
there).

This won't work, because flagging erroneous pages would mean that the
vast majority of search results would have an error flag next to them,
adding clutter to the search UI. A person performing a search isn't
primarily interested in the spec conformance of the pages.
This has some things going for it. It would prevent bad html from
being disseminated all over the world; it would inform the authors of
the bad html that they have a problem, and it would encourage them to
fix their problem since that's the only way anyone will be able to
find the kernels of wisdom they wish to share with the world.
Search engines aren't in the business of putting perpetrators of bad
HTML in the stocks.
There
may be a positive for them as well (Other than that warm glow inside
when they just know they're doing the right thing!), they would be
delivering a better product to their customers.
Why would having an error flag next to just about every search result
item constitute delivering a better product to their customers?

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Feb 10 '07 #3

In article <aa********************************@4ax.com>,
Kent Feiler <zz**@zzzz.com> wrote:
If I understand the general direction of recent posts, the idea is to
improve the quality of html/css by soliciting help from the various
browsers. Browsers can certainly detect problems but they have no
sensible place to report them and no way to prevent the same problem
from happening over-and-over in multiple sites around the world. That
idea simply doesn't work.

But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping around
on the internet, and not post items with errors into their search
files. Or perhaps flag them on their lists as having errors.
That's a nice idea, Kent, but think about it -- as a search engine user,
when you do a search for "rutabaga recipes", do you care if the page it
takes you to is valid or not? A search engine that penalized invalid
pages would mostly be punishing itself. In an ideal world, all pages
would validate, but I think valid pages will always be the minority so
search engines have no choice but to deal with tag soup as best they
can. And they do a pretty good job, IMHO.

Now if someone wrote a spider whose sole purpose was validation, *that*
would be pretty interesting...

;)

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
Feb 10 '07 #4

Gazing into my crystal ball I observed "Andy Dingley"
<di*****@codesmiths.com> writing in
news:11**********************@j27g2000cwj.googlegroups.com:
On 10 Feb, 15:11, Kent Feiler <z...@zzzz.com> wrote:
>But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping
around on the internet,

Interesting idea...

This would be a solution to the problem of authors not knowing if their
sites were invalid. However, the real problem is that authors don't
_care_ (or even understand) if their sites are invalid. If someone
cares to check, it's not hard to tell. This "spider validation" idea
just doesn't solve the real issue.

Actually, I am one of the few who does care, and I have seen good
results from it. For example, I live in Glendale, California, and I am
Catholic. What would be important to me? Finding out when Mass is in
Glendale. If I Google for mass glendale, the first two results are not
applicable, and the very next one, Holy Family Catholic Community,
happens to be a site I developed. Pretty good out of 1,080,000
results: 3rd overall, and the 1st relevant result.

I haven't used any "black hat" methods, or Merlin wizardry, just clean,
valid markup.

--
Adrienne Boswell at Home
Arbpen Web Site Design Services
http://www.cavalcade-of-coding.info
Please respond to the group so others can share

Feb 10 '07 #5

On Sat, 10 Feb 2007 12:38:26 -0500, Nikita the Spider
<Ni*************@gmail.com> wrote:
>Now if someone wrote a spider whose sole purpose was validation, *that*
would be pretty interesting...
... but needs a way to notify the author.

- "author" meta tag - none of the pages I work on have it
and IIUC isn't required
- e-mail found in <body- is it the author or someone else?
I have this for most but not all of my pages
- recognize a "contact us" <formon the page
- find and follow a "contact us" link to another page

These ideas all require active compliance on the part of the author,
lack of which being the main reason for the crawler's existence. So
how about ...

- notify the ISP which will relay to the site owner - this may work!?
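
For what it's worth, the first few of those heuristics are easy to sketch
in code. The snippet below (Python with the lxml library; the XPath
expressions are my own guesses at reasonable patterns, not a tested
recipe) tries the meta tag, then a mailto: link, then a "contact" link,
and gives up otherwise - which is where the notify-the-ISP fallback
would come in.

from lxml import html


def find_author_contact(page_bytes, base_url):
    """Best-effort guess at who to notify about errors on a page."""
    doc = html.fromstring(page_bytes, base_url=base_url)
    doc.make_links_absolute(base_url)

    # 1. <meta name="author" content="..."> - rarely present, and not required
    author = doc.xpath('string(//meta[@name="author"]/@content)').strip()
    if author:
        return "meta author: " + author

    # 2. Any mailto: link in the page - might be the author, might not
    mailtos = doc.xpath('//a[starts-with(@href, "mailto:")]/@href')
    if mailtos:
        return "e-mail: " + mailtos[0][len("mailto:"):]

    # 3. A link whose href or text mentions "contact" - follow it by hand
    contacts = doc.xpath(
        '//a[contains(@href, "contact") or '
        'contains(translate(text(), "CONTACT", "contact"), "contact")]/@href')
    if contacts:
        return "contact page: " + contacts[0]

    return None  # no luck - fall back to notifying the ISP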

--

Charles
Feb 10 '07 #6

On Sat, 10 Feb 2007 19:16:57 +0200, Henri Sivonen <hs******@iki.fi>
wrote:

In article <aa********************************@4ax.com>,
Kent Feiler <zz**@zzzz.com> wrote:
But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping
around on the internet, and not post items with errors into their
search files.
That won't work, because most Web content is erroneous but still useful
to users. Search engines compete on the usefulness of their results,
so excluding useful results that have errors that browsers are able to
silently recover from would be a very bad business move.
Or perhaps flag them on their lists as having errors.
Not a new idea. This has been discussed relatively recently on the
WHATWG list, for example (even though the discussion was off-topic
there).

This won't work, because flagging erroneous pages would mean that the
vast majority of search results would have an error flag next to them,
adding clutter to the search UI. A person performing a search isn't
primarily interested in the spec conformance of the pages.
This has some things going for it. It would prevent bad html from
being disseminated all over the world; it would inform the authors
of the bad html that they have a problem, and it would encourage
them to fix their problem since that's the only way anyone will be
able to find the kernels of wisdom they wish to share with the
world.

Search engines aren't in the business of putting perpetrators of bad
HTML in the stocks.
There may be a positive for them as well (Other than that warm glow
inside when they just know they're doing the right thing!), they
would be delivering a better product to their customers.
Why would having an error flag next to just about every search result
item constitute delivering a better product to their customers?
----------------------------------------------------------------------

Yeah, eliminating results for bad markup is a bad idea, but there are
likely a lot of variations on the thought of flagging bad markup. How
about something like a 1-10 rating where 1 = total crap and, of
course, 10 = the markup version of Bo Derek. As a user, if you had a
choice of two relevant items on a Google list, you might make your
choice on the basis of the markup score.
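
Just to make the idea concrete, the 1-10 score could be something as
simple as error density scaled by page size and clamped to the range.
The sketch below is Python; the thresholds are made-up numbers for
illustration, not anything a search engine has published.

def markup_score(error_count, html_length_bytes):
    """Map parse-error density to a 1 (tag soup) .. 10 (Bo Derek) score."""
    if html_length_bytes <= 0:
        return 1
    errors_per_kb = error_count / (html_length_bytes / 1024.0)
    score = 10 - int(round(errors_per_kb * 2))  # 0 errors -> 10, ~4.5/KB or worse -> 1
    return max(1, min(10, score))


# e.g. a 20 KB page with 12 recoverable parse errors scores a 9
print(markup_score(12, 20 * 1024))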

Would that motivate authors to improve their markup? Anyone in this
newsgroup who got less than a 10 would be horribly embarrassed, but
for others, it's hard to say.
Regards,
Kent Feiler
www.KentFeiler.com
Feb 11 '07 #7

Kent Feiler wrote:
Yeah, eliminating results for bad markup is a bad idea, but there are
likely a lot of variations on the thought of flagging bad markup. How
about something like a 1-10 rating where 1 = total crap and, of
course, 10 = the markup version of Bo Derek. As a user, if you had a
choice of two relevant items on a Google list, you might make your
choice on the basis of the markup score.
Along with markup ratings, search engines could also tell you how many
misspelled words are on each site, and how many times the word "the"
appears. Regardless of whether they could provide these pieces of
information, though, the search engine providers have no reason to.
Feb 11 '07 #8

Kent Feiler wrote:
Yeah, eliminating results for bad markup is a bad idea, but there
are likely a lot of variations on the thought of flagging bad
markup. How about something like a 1-10 rating where 1 = total crap
and, of course, 10 = the markup version of Bo Derek. As a user, if
you had a choice of two relevant items on a Google list, you might
make your choice on the basis of the markup score.
--------------------------------------------------------------------

Along with markup ratings, search engines could also tell you how many
misspelled words are on each site, and how many times the word "the"
appears. Regardless of whether they could provide these pieces of
information, though, the search engine providers have no reason to.
-------------------------------------------------------------------

You may be on to something there. Search engines are just like any
other company, they need to attract customers to their service rather
than their competitor's. So far the only difference between the search
engines seems to be the efficiency of their searches and the size of
their databases. The descriptions of the sites aren't very good.
Mostly the site author can control what appears on the search engine
list for his site. Maybe quality of markup could be one small part of
improving the site descriptions.

Regards,
Kent Feiler
www.KentFeiler.com
Feb 11 '07 #9

In article <91********************************@4ax.com>,
BobaBird <Bo******@aol.spam.com.free> wrote:
On Sat, 10 Feb 2007 12:38:26 -0500, Nikita the Spider
<Ni*************@gmail.com> wrote:
Now if someone wrote a spider whose sole purpose was validation, *that*
would be pretty interesting...

... but needs a way to notify the author.
My guess is that those who understand what validation is and care about
it put some effort into finding the tools to help them to make their
site valid. Those who don't will probably not react well to an
unsolicited email that says, "Your site has errors!" However
well-intentioned, criticism from strangers isn't usually well-received.
Just note how easily flamefests erupt on Usenet! It's enough to make one
litter one's posts with smileys. =)

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
Feb 12 '07 #10

Kent Feiler wrote:
You may be on to something there. Search engines are just like any
other company, they need to attract customers to their service rather
than their competitor's. So far the only difference between the search
engines seems to be the efficiency of their searches and the size of
their databases. The descriptions of the sites aren't very good.
Mostly the site author can control what appears on the search engine
list for his site. Maybe quality of markup could be one small part of
improving the site descriptions.
Again: where is the demand for this? What percentage of Google users
cares whether the pages returned comply with W3C recommendations? I, for
one, have no interest in seeing this information in Google search
results. What would I do with it? Choose not to look at pages that have
the information I need because Google says they're non-compliant? I
don't think so.
Feb 15 '07 #11

On Thu, 15 Feb 2007 07:24:15 -0500, Harlan Messinger
<hm*******************@comcast.net> wrote:

Kent Feiler wrote:
You may be on to something there. Search engines are just like any
other company, they need to attract customers to their service
rather than their competitor's. So far the only difference between
the search engines seems to be the efficiency of their searches and
the size of their databases. The descriptions of the sites aren't
very good. Mostly the site author can control what appears on the
search engine list for his site. Maybe quality of markup could be
one small part of improving the site descriptions.
------------------------------------------------------------------

Again: where is the demand for this? What percentage of Google users
cares whether the pages returned comply with W3C recommendations? I,
for one, have no interest in seeing this information in Google search
results. What would I do with it? Choose not to look at pages that
have the information I need because Google says they're non-compliant?
I don't think so.
-----------------------------------------------------------------

What I said was that quality of markup could be "one small part" of
improving site descriptions. The idea is that if you have a "site
description" spider that reads entire html files and tries to
determine what they're about and how well they accomplish what they're
trying to do, checking and evaluating the markup would be an easy
add-on to it.
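
To put a number on "easy": assuming the spider already parses the full
page with something like lxml to build its description record, the
markup evaluation is only a few extra lines. The sketch below is a
hypothetical illustration; the record fields (title, summary,
markup_score) are made-up names, not any search engine's actual schema.

from lxml import etree


def describe_page(url, html_bytes, title, summary):
    """Build a hypothetical search-listing record; the markup check is one extra field."""
    parser = etree.HTMLParser(recover=True)
    etree.fromstring(html_bytes, parser)    # the spider parses the page anyway;
    error_count = len(parser.error_log)     # the recovery errors come along for free
    errors_per_kb = error_count / max(len(html_bytes) / 1024.0, 0.001)
    return {
        "url": url,
        "title": title,       # whatever the description spider already extracts
        "summary": summary,
        "markup_score": max(1, min(10, 10 - int(round(errors_per_kb * 2)))),
    }
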
Regards,
Kent Feiler
www.KentFeiler.com
Feb 15 '07 #12

Kent Feiler wrote:
On Thu, 15 Feb 2007 07:24:15 -0500, Harlan Messinger
<hm*******************@comcast.net> wrote:

Kent Feiler wrote:
>You may be on to something there. Search engines are just like any
other company, they need to attract customers to their service
rather than their competitor's. So far the only difference between
the search engines seems to be the efficiency of their searches and
the size of their databases. The descriptions of the sites aren't
very good. Mostly the site author can control what appears on the
search engine list for his site. Maybe quality of markup could be
one small part of improving the site descriptions.
------------------------------------------------------------------

Again: where is the demand for this? What percentage of Google users
cares whether the pages returned comply with W3C recommendations? I,
for one, have no interest in seeing this information in Google search
results. What would I do with it? Choose not to look at pages that
have the information I need because Google says they're non-compliant?
I don't think so.
-----------------------------------------------------------------

What I said was that quality of markup could be "one small part" of
improving site descriptions. The idea is that if you have a "site
description" spider that reads entire html files and tries to
determine what they're about and how well they accomplish what they're
trying to do, checking and evaluating the markup would be an easy
add-on to it.
It doesn't matter how small or easy a part it would be. If the designers
of Google decide to add a feature that they believe will increase their
revenue, they're not also going to say, "Oh, and as long as we're doing
this let's also spend money adding a bunch of small and easy features
that almost nobody is interested in and won't do one thing to make us
more profitable."
Feb 15 '07 #13

On 15 Feb, 18:43, Kent Feiler <z...@zzzz.com> wrote:
checking and evaluating the markup would be an easy
add-on to [a spider]
Of course. The question is, what you would then do with this
assessment, and why it's valuable to generate it. Anyone who needs or
wants it can already get it for themselves for trivial effort. If they
aren't doing so already, we can only assume that it's because they
don't want to.

Feb 15 '07 #14

Kent Feiler <zz**@zzzz.com> wrote:
>checking and evaluating the markup would be an easy
add-on to it.
I imagine that parser resource usage is very important to an SE. Looking
at SE results and the options they offer to search the data, it is fair
to conclude that their parsers are very rudimentary, not least to reduce
resource usage. Adding validation would increase the resource usage many
times over.

--
Spartanicus
Feb 15 '07 #15

On 15 Feb 2007 11:05:32 -0800, "Andy Dingley" <di*****@codesmiths.com>
wrote:

On 15 Feb, 18:43, Kent Feiler <z...@zzzz.com> wrote:
checking and evaluating the markup would be an easy
add-on to [a spider]
--------------------------------------------------------------
Of course. The question is, what you would then do with this
assessment, and why it's valuable to generate it. Anyone who needs or
wants it can already get it for themselves for trivial effort. If they
aren't doing so already, we can only assume that it's because they
don't want to.

--------------------------------------------------------------

I would think this would appear under a heading something like
"readability", and be a 1-10 scale. You're thinking about html
authors; I'm thinking about ordinary web page surfers who might like
to know in advance that the page listed on Google that they're about
to click may be crapped up and hard to read.

Of course html authors will be able to use the same indication.

Regards,
Kent Feiler
www.KentFeiler.com
Feb 16 '07 #16

On 16 Feb, 15:43, Kent Feiler <z...@zzzz.com> wrote:
I would think this would appear under a heading something like
"readability",
That's a _very_ different idea, because it's based around users, not
creators. As such, it's potentially interesting and useful.

Feb 16 '07 #17
