
Not the browsers, dummy, the search engines


If I understand the general direction of recent posts, the idea is to
improve the quality of html/css by soliciting help from the various
browsers. Browsers can certainly detect problems, but they have no
sensible place to report them and no way to prevent the same problem
from happening over and over on multiple sites around the world. That
idea simply doesn't work.

But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping around
on the internet, and not post items with errors into their search
files. Or perhaps flag them on their lists as having errors.
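
To make the idea concrete, here's a rough sketch of what such a cursory
check might look like, assuming a Python crawler and nothing fancier
than a tag-balance heuristic. The error threshold and the URL in the
usage note are made up for illustration, not how any real engine works.

    # Cursory "edit check" a spider might run while crawling: fetch a page,
    # do a cheap tag-balance pass, and decide whether to index, flag, or
    # skip it. Sketch only; a real engine would need a full validator.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    # Void elements are legitimately left unclosed in HTML.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "param", "source", "track", "wbr"}

    class CursoryChecker(HTMLParser):
        """Counts obvious structural problems: stray and unclosed tags."""

        def __init__(self):
            super().__init__()
            self.open_tags = []
            self.errors = 0

        def handle_starttag(self, tag, attrs):
            if tag not in VOID:
                self.open_tags.append(tag)

        def handle_endtag(self, tag):
            if tag in self.open_tags:
                # Anything still open above the matching start tag was
                # never closed properly.
                while self.open_tags[-1] != tag:
                    self.open_tags.pop()
                    self.errors += 1
                self.open_tags.pop()
            else:
                self.errors += 1  # end tag with no matching start tag

        def finish(self):
            self.close()
            self.errors += len(self.open_tags)  # tags never closed at all
            return self.errors

    def check_page(url, max_errors=5):
        """Return (indexable, error_count); the threshold is arbitrary."""
        html = urlopen(url).read().decode("utf-8", errors="replace")
        checker = CursoryChecker()
        checker.feed(html)
        return checker.finish() <= max_errors, checker.errors

    # Hypothetical usage:
    # ok, n = check_page("http://www.example.com/")
    # print(("index it" if ok else "flag or skip it"), "-", n, "problems")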

This has some things going for it. It would prevent bad html from
being disseminated all over the world; it would inform the authors of
the bad html that they have a problem, and it would encourage them to
fix their problem since that's the only way anyone will be able to
find the kernels of wisdom they wish to share with the world.

The negative from the search engine point of view would be that the
spiders would take substantially longer to analyze a given file. There
may be a positive for them as well (other than that warm glow inside
when they just know they're doing the right thing!): they would be
delivering a better product to their customers. If someone selected an
item from a Google list, they could be fairly sure it wouldn't end up
being a pile of pointy brackets and wall-to-wall text.

Regards,
Kent Feiler
www.KentFeiler.com
Feb 10 '07 #1
On 10 Feb, 15:11, Kent Feiler <z...@zzzz.com> wrote:
But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping around
on the internet,
Interesting idea...

This would be a solution to a problem of authors not knowing if their
sites were invalid. However the real problem is that authors don't
_care_ (or even understand) if their sites are invalid. If someone
cares to check, it's not hard to tell. This "spider validation" idea
just doesn't solve the real issue.

Feb 10 '07 #2
In article <aa********************************@4ax.com>,
Kent Feiler <zz**@zzzz.com> wrote:
But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping around
on the internet, and not post items with errors into their search
files.
That won't work, because most Web content is erroneous but still useful
to users. Search engines compete on the usefulness of their results, so
excluding useful results that have errors that browsers are able to
silently recover from would be a very bad business move.
Or perhaps flag them on their lists as having errors.
Not a new idea. This has been discussed relatively recently on the
WHATWG list, for example (even though the discussion was off-topic
there).

This won't work, because flagging erroneous pages would mean that the
vast majority of search results would have an error flag next to them,
adding clutter to the search UI. A person performing a search isn't
primarily interested in the spec conformance of the pages.
This has some things going for it. It would prevent bad html from
being disseminated all over the world; it would inform the authors of
the bad html that they have a problem, and it would encourage them to
fix their problem since that's the only way anyone will be able to
find the kernels of wisdom they wish to share with the world.
Search engines aren't in the business of putting perpetrators of bad
HTML in the stocks.
There
may be a positive for them as well (Other than that warm glow inside
when they just know they're doing the right thing!), they would be
delivering a better product to their customers.
Why would having an error flag next to just about every search result
item constitute delivering a better product to their customers?

--
Henri Sivonen
hs******@iki.fi
http://hsivonen.iki.fi/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
Feb 10 '07 #3
In article <aa********************************@4ax.com>,
Kent Feiler <zz**@zzzz.com> wrote:
If I understand the general direction of recent posts, the idea is to
improve the quality of html/css by soliciting help from the various
browsers. Browsers can certainly detect problems but they have no
sensible place to report them and no way to prevent the same problem
from happening over-and-over in multiple sites around the world. That
idea simply doesn't work.

But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping around
on the internet, and not post items with errors into their search
files. Or perhaps flag them on their lists as having errors.
That's a nice idea, Kent, but think about it -- as a search engine user,
when you do a search for "rutabaga recipes", do you care if the page it
takes you to is valid or not? A search engine that penalized invalid
pages would mostly be punishing itself. In an ideal world, all pages
would validate, but I think valid pages will always be the minority so
search engines have no choice but to deal with tag soup as best they
can. And they do a pretty good job, IMHO.

Now if someone wrote a spider whose sole purpose was validation, *that*
would be pretty interesting...

;)

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
Feb 10 '07 #4
Gazing into my crystal ball I observed "Andy Dingley"
<di*****@codesmiths.com> writing in
news:11**********************@j27g2000cwj.googlegroups.com:
On 10 Feb, 15:11, Kent Feiler <z...@zzzz.com> wrote:
>But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping
around on the internet,

Interesting idea...

This would be a solution to a problem of authors not knowing if their
sites were invalid. However the real problem is that authors don't
_care_ (or even understand) if their sites are invalid. If someone
cares to check, it's not hard to tell. This "spider validation" idea
just doesn't solve the real issue.

Actually, I am one of the few who does care, and I have seen good
results from it. For example, I live in Glendale, California, and I am
Catholic. What would be important to me? To find out when mass is in
Glendale. If I Google for "mass glendale", I am greeted first by two
results that are not applicable, and the very next, Holy Family Catholic
Community, happens to be a site I developed. Pretty good for 1,080,000
results - 3rd overall, and 1st relevant result.

I haven't used any "black hat" methods, or Merlin wizardry, just clean,
valid markup.

--
Adrienne Boswell at Home
Arbpen Web Site Design Services
http://www.cavalcade-of-coding.info
Please respond to the group so others can share

Feb 10 '07 #5
On Sat, 10 Feb 2007 12:38:26 -0500, Nikita the Spider
<Ni*************@gmail.com> wrote:
>Now if someone wrote a spider whose sole purpose was validation, *that*
would be pretty interesting...
... but needs a way to notify the author.

- "author" meta tag - none of the pages I work on have it
  and IIUC isn't required
- e-mail found in <body> - is it the author or someone else?
  I have this for most but not all of my pages
- recognize a "contact us" <form> on the page
- find and follow a "contact us" link to another page

These ideas all require active compliance on the part of the author,
the lack of which is the main reason for the crawler's existence. So
how about ...

- notify the ISP, which will relay to the site owner - this may work!?
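
A rough sketch of how such a spider might dig those hints out of a page
it has already fetched, assuming Python and only the standard library.
The heuristics (matching "contact" in a link, scraping addresses from
the text) are illustrative guesses, not a proven recipe.

    # Pull the contact hints listed above out of a page: an "author" meta
    # tag, mailto: and plain-text e-mail addresses, "contact us" links,
    # and any form at all. None of these are guaranteed to reach the author.
    import re
    from html.parser import HTMLParser

    EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    class ContactHintFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.author_meta = None
            self.emails = set()
            self.contact_links = []
            self.has_form = False

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and (attrs.get("name") or "").lower() == "author":
                self.author_meta = attrs.get("content")
            elif tag == "a":
                href = attrs.get("href") or ""
                if href.lower().startswith("mailto:"):
                    self.emails.add(href[7:])
                elif "contact" in href.lower():
                    self.contact_links.append(href)
            elif tag == "form":
                self.has_form = True

        def handle_data(self, data):
            # Plain-text addresses sitting in the <body>.
            self.emails.update(EMAIL_RE.findall(data))

    def contact_hints(html):
        finder = ContactHintFinder()
        finder.feed(html)
        return {"author_meta": finder.author_meta,
                "emails": sorted(finder.emails),
                "contact_links": finder.contact_links,
                "has_form": finder.has_form}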

--

Charles
Feb 10 '07 #6
On Sat, 10 Feb 2007 19:16:57 +0200, Henri Sivonen <hs******@iki.fi>
wrote:

In article <aa********************************@4ax.com>,
Kent Feiler <zz**@zzzz.com> wrote:
But how about this one. Suppose we have all of those search engine
spiders do a cursory html/css edit check while they're creeping
around on the internet, and not post items with errors into their
search files.
That won't work, because most Web content is erroneous but still useful
to users. Search engines compete on the usefulness of their results,
so excluding useful results that have errors that browsers are able to
silently recover from would be a very bad business move.
Or perhaps flag them on their lists as having errors.
Not a new idea. This has been discussed relatively recently on the
WHATWG list, for example (even though the discussion was off-topic
there).

This won't work, because flagging erroneous pages would mean that the
vast majority of search results would have an error flag next to them,
adding clutter to the search UI. A person performing a search isn't
primarily interested in the spec conformance of the pages.
This has some things going for it. It would prevent bad html from
being disseminated all over the world; it would inform the authors
of the bad html that they have a problem, and it would encourage
them to fix their problem since that's the only way anyone will be
able to find the kernels of wisdom they wish to share with the
world.

Search engines aren't in the business of putting perpetrators of bad
HTML in the stocks.
There may be a positive for them as well (Other than that warm glow
inside when they just know they're doing the right thing!), they
would be delivering a better product to their customers.
Why would having an error flag next to just about every search result
item constitute delivering a better product to their customers?
----------------------------------------------------------------------

Yeah, eliminating results for bad markup is a bad idea, but there are
likely a lot of variations on the thought of flagging bad markup. How
about something like a 1-10 rating where 1 = total crap and, of
course, 10 = the markup version of Bo Derek. As a user, if you had a
choice of two relevant items on a Google list, you might make your
choice on the basis of the markup score.
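
One way such a score could be computed, purely as an illustration; the
scaling below is an arbitrary assumption, not anything a search engine
actually does:

    # Map an error count to the 1-10 "markup score" suggested above.
    # The scaling is arbitrary: zero errors earns a 10, the score drops
    # by roughly one point per error per kilobyte, and it bottoms out at 1.
    def markup_score(error_count, page_bytes):
        if error_count == 0:
            return 10
        errors_per_kb = error_count / max(page_bytes / 1024, 1)
        return max(1, round(10 - errors_per_kb))

    # e.g. 12 errors in a 6 KB page -> 2 errors/KB -> score 8,
    # while 80 errors in the same page -> score 1.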

Would that motivate authors to improve their markup? Anyone in this
newsgroup who got less than a 10 would be horribly embarrassed, but
for others, it's hard to say.
Regards,
Kent Feiler
www.KentFeiler.com
Feb 11 '07 #7
Kent Feiler wrote:
Yeah, eliminating results for bad markup is a bad idea, but there are
likely a lot of variations on the thought of flagging bad markup. How
about something like a 1-10 rating where 1 = total crap and, of
course, 10 = the markup version of Bo Derek. As a user, if you had a
choice of two relevant items on a Google list, you might make your
choice on the basis of the markup score.
Along with markup ratings, search engines could also tell you how many
misspelled words are on each site, and how many times the word "the"
appears. Regardless of whether they could provide these pieces of
information, though, the search engine providers have no reason to.
Feb 11 '07 #8
Kent Feiler wrote:
Yeah, eliminating results for bad markup is a bad idea, but there
are likely a lot of variations on the thought of flagging bad
markup. How about something like a 1-10 rating where 1 = total crap
and, of course, 10 = the markup version of Bo Derek. As a user, if
you had a choice of two relevant items on a Google list, you might
make your choice on the basis of the markup score.
--------------------------------------------------------------------

Along with markup ratings, search engines could also tell you how many
misspelled words are on each site, and how many times the word "the"
appears. Regardless of whether they could provide these pieces of
information, though, the search engine providers have no reason to.
-------------------------------------------------------------------

You may be on to something there. Search engines are just like any
other company, they need to attract customers to their service rather
than their competitor's. So far the only difference between the search
engines seems to be the efficiency of their searches and the size of
their databases. The descriptions of the sites aren't very good.
Mostly the site author can control what appears on the search engine
list for his site. Maybe quality of markup could be one small part of
improving the site descriptions.

Regards,
Kent Feiler
www.KentFeiler.com
Feb 11 '07 #9
In article <91********************************@4ax.com>,
BobaBird <Bo******@aol.spam.com.free> wrote:
On Sat, 10 Feb 2007 12:38:26 -0500, Nikita the Spider
<Ni*************@gmail.com> wrote:
Now if someone wrote a spider whose sole purpose was validation, *that*
would be pretty interesting...

... but needs a way to notify the author.
My guess is that those who understand what validation is and care about
it put some effort into finding the tools to help them to make their
site valid. Those who don't will probably not react well to an
unsolicited email that says, "Your site has errors!" However
well-intentioned, criticism from strangers isn't usually well-received.
Just note how easily flamefests erupt on Usenet! It's enough to make one
litter one's posts with smileys. =)

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
Feb 12 '07 #10
Kent Feiler wrote:
You may be on to something there. Search engines are just like any
other company, they need to attract customers to their service rather
than their competitor's. So far the only difference between the search
engines seems to be the efficiency of their searches and the size of
their databases. The descriptions of the sites aren't very good.
Mostly the site author can control what appears on the search engine
list for his site. Maybe quality of markup could be one small part of
improving the site descriptions.
Again: where is the demand for this? What percentage of Google users
cares whether the pages returned comply with W3C recommendations? I, for
one, have no interest in seeing this information in Google search
results. What would I do with it? Choose not to look at pages that have
the information I need because Google says they're non-compliant? I
don't think so.
Feb 15 '07 #11
On Thu, 15 Feb 2007 07:24:15 -0500, Harlan Messinger
<hm*******************@comcast.net> wrote:

Kent Feiler wrote:
You may be on to something there. Search engines are just like any
other company, they need to attract customers to their service
rather than their competitor's. So far the only difference between
the search engines seems to be the efficiency of their searches and
the size of their databases. The descriptions of the sites aren't
very good. Mostly the site author can control what appears on the
search engine list for his site. Maybe quality of markup could be
one small part of improving the site descriptions.
------------------------------------------------------------------

Again: where is the demand for this? What percentage of Google users
cares whether the pages returned comply with W3C recommendations? I,
for one, have no interest in seeing this information in Google search
results. What would I do with it? Choose not to look at pages that
have the information I need because Google says they're non-compliant?
I don't think so.
-----------------------------------------------------------------

What I said was that quality of markup could be "one small part" of
improving site descriptions. The idea is that if you have a "site
description" spider that reads entire html files and tries to
determine what they're about and how well they accomplish what they're
trying to do, checking and evaluating the markup would be an easy
add-on to it.
Regards,
Kent Feiler
www.KentFeiler.com
Feb 15 '07 #12
Kent Feiler wrote:
On Thu, 15 Feb 2007 07:24:15 -0500, Harlan Messinger
<hm*******************@comcast.net> wrote:

Kent Feiler wrote:
>You may be on to something there. Search engines are just like any
other company, they need to attract customers to their service
rather than their competitor's. So far the only difference between
the search engines seems to be the efficiency of their searches and
the size of their databases. The descriptions of the sites aren't
very good. Mostly the site author can control what appears on the
search engine list for his site. Maybe quality of markup could be
one small part of improving the site descriptions.
------------------------------------------------------------------

Again: where is the demand for this? What percentage of Google users
cares whether the pages returned comply with W3C recommendations? I,
for one, have no interest in seeing this information in Google search
results. What would I do with it? Choose not to look at pages that
have the information I need because Google says they're non-compliant?
I don't think so.
-----------------------------------------------------------------

What I said was that quality of markup could be "one small part" of
improving site descriptions. The idea is that if you have a "site
description" spider that reads entire html files and tries to
determine what they're about and how well they accomplish what they're
trying to do, checking and evaluating the markup would be an easy
add-on to it.
It doesn't matter how small or easy a part it would be. If the designers
of Google decide to add a feature that they believe will increase their
revenue, they're not also going to say, "Oh, and as long as we're doing
this let's also spend money adding a bunch of small and easy features
that almost nobody is interested in and won't do one thing to make us
more profitable."
Feb 15 '07 #13
On 15 Feb, 18:43, Kent Feiler <z...@zzzz.com> wrote:
checking and evaluating the markup would be an easy
add-on to [a spider]
Of course. The question is, what you would then do with this
assessment, and why it's valuable to generate it. Anyone who needs or
wants it can already get it for themselves for trivial effort. If they
aren't doing so already, we can only assume that it's because they
don't want to.

Feb 15 '07 #14
Kent Feiler <zz**@zzzz.com> wrote:
>checking and evaluating the markup would be an easy
add-on to it.
I imagine that parser resource usage is very important to a SE. Looking
at SE results and the options they offer to search the data it is fair
to conclude that their parsers are very rudimentary, not least to reduce
resource usage. Adding validation would increase the resource usage many
times over.
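
For anyone curious how large that overhead is on a given page, a rough
way to measure it, assuming Python; the regex pass and the nesting
check below are only stand-ins for a real engine's parser and a real
validator:

    # Time a regex-only pass (roughly what a minimal indexer needs) against
    # a tag-by-tag parse with a nesting check standing in for validation.
    # The ratio will vary from page to page.
    import re
    import timeit
    from html.parser import HTMLParser

    def rudimentary_pass(html):
        # Links plus crudely stripped text, no structural checks at all.
        links = re.findall(r'href="([^"]+)"', html)
        text = re.sub(r"<[^>]+>", " ", html)
        return links, text

    class NestingCheck(HTMLParser):
        # Minimal stand-in for a validating pass: visit every tag, track nesting.
        def __init__(self):
            super().__init__()
            self.stack, self.errors = [], 0

        def handle_starttag(self, tag, attrs):
            self.stack.append(tag)

        def handle_endtag(self, tag):
            if self.stack and self.stack[-1] == tag:
                self.stack.pop()
            else:
                self.errors += 1

    def checked_pass(html):
        checker = NestingCheck()
        checker.feed(html)
        return rudimentary_pass(html), checker.errors

    def compare(html, runs=200):
        cheap = timeit.timeit(lambda: rudimentary_pass(html), number=runs)
        full = timeit.timeit(lambda: checked_pass(html), number=runs)
        print(f"index-only: {cheap:.3f}s   with check: {full:.3f}s   "
              f"overhead: {full / cheap:.1f}x")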

--
Spartanicus
Feb 15 '07 #15
On 15 Feb 2007 11:05:32 -0800, "Andy Dingley" <di*****@codesmiths.com>
wrote:

On 15 Feb, 18:43, Kent Feiler <z...@zzzz.comwrote:
checking and evaluating the markup would be an easy
add-on to [a spider]
--------------------------------------------------------------
Of course. The question is, what you would then do with this
assessment, and why it's valuable to generate it. Anyone who needs or
wants it can already get it for themselves for trivial effort. If they
aren't doing so already, we can only assume that it's because they
don't want to.

--------------------------------------------------------------

I would think this would appear under a heading something like
"readability", and be a 1-10 scale. You're thinking about html
authors; I'm thinking about ordinary web page surfers who might like
to know in advance that the page listed on Google that they're about
to click may be crapped up and hard to read.

Of course html authors will be able to use the same indication.

Regards,
Kent Feiler
www.KentFeiler.com
Feb 16 '07 #16
On 16 Feb, 15:43, Kent Feiler <z...@zzzz.com> wrote:
I would think this would appear under a heading something like
"readability",
That's a _very_ different idea, because it's based around users, not
creators. As such, it's potentially interesting and useful.

Feb 16 '07 #17

