php and Googlebot

I've created a PHP page which is optimized for search engine indexing: no
images, tables or CSS, just plain HTML with relevant meta tags etc.

The page contains a list of records pulled from a database, and for each record
there is a link to the detail view for that record in this form: <a
href="<?=$_SERVER['PHP_SELF'] ?>?rec=<?=$recordId ?>">

I've made sure the URL string doesn't contain a variable called 'id' or
something similar, since I heard some bots will assume that is a session id and
not follow the link.
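
For reference, a minimal sketch of how the link list described above might be generated. The `$recordIds` array, the `detailUrl()` helper and the `/list.php` fallback are hypothetical stand-ins (the real page pulls its records from a database), and `SCRIPT_NAME` is swapped in for `PHP_SELF` because the latter can carry user-supplied path info:

```php
<?php
// Hypothetical record IDs; on the real page these come from a database query.
$recordIds = [101, 102, 103];

/**
 * Build the href for a record's detail page (hypothetical helper).
 */
function detailUrl(string $script, int $recordId): string
{
    // http_build_query() URL-encodes the value; htmlspecialchars() makes the
    // result safe to embed in an HTML attribute.
    $url = $script . '?' . http_build_query(['rec' => $recordId]);
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

// Use SCRIPT_NAME rather than PHP_SELF; fall back to an assumed file name
// if the server variable is not set.
$script = $_SERVER['SCRIPT_NAME'] ?? '/list.php';

foreach ($recordIds as $id) {
    echo '<a href="' . detailUrl($script, $id) . '">Record ' . $id . "</a>\n";
}
```

Each link is a plain anchor with a single `rec` parameter, which is the form a crawler needs in order to follow it.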

From my server stats I can see that the "master view" page is being indexed
fine by Google, but the bot apparently doesn't follow the links to the
detail pages.

Does anyone have any idea why?
..soma
Jul 17 '05 #1

"somaboy mx" <no****@fakemail.fk> wrote in message news:
42**********@x-privat.org...

From my server stats I can see that the "master view" page is being indexed fine by Google, but the bot apparently doesn't follow the links to the
detail pages.

Does anyone have any idea why?


Be patient...
Jul 17 '05 #2
Bots don't like dynamic URLs.

ECRIA
http://www.ecria.com
Jul 17 '05 #3
*** ECRIA Public Mail Buffer wrote (Thu, 30 Jun 2005 11:54:05
-0400):
Bots don't like dynamic URLs.


Since nowadays most interesting info comes from dynamic URLs, that doesn't
say much about bots.
--
-- Álvaro G. Vicario - Burgos, Spain
-- http://bits.demogracia.com - My site about web programming
-- Don't e-mail me your questions, post them to the group
--
Jul 17 '05 #4


Alvaro G Vicario wrote:
*** ECRIA Public Mail Buffer wrote (Thu, 30 Jun 2005 11:54:05
-0400):
Bots don't like dynamic URLs.


Since nowadays most interesting info comes from dynamic URLs, that doesn't
say much about bots.

What would you have the bots do, fill out forms, click buttons? Even if
they could, would you want a bunch of web crawler bots firing your CGIs and
PHP stuff all the time?


Brian

Jul 17 '05 #5

ECRIA Public Mail Buffer wrote:
Bots don't like dynamic URLs.


They don't seem to mind mine.

Jul 17 '05 #6


de***********@yahoo.com wrote:
Alvaro G Vicario wrote:
*** ECRIA Public Mail Buffer wrote (Thu, 30 Jun 2005 11:54:05
-0400):
Bots don't like dynamic URLs.


Since nowadays most interesting info comes from dynamic URLs, that doesn't
say much about bots.

What would you have the bots do, fill out forms, click buttons? Even if
they could, would you want a bunch of web crawler bots firing your CGIs and
PHP stuff all the time?


I would... And I wouldn't mind them firing my CGIs and PHP stuff a
few times a day. It wouldn't really hurt anything.

And they don't have to fill out forms most of the time, just follow the
given links. As long as there are links from the home page.

--
Contact Us Script:
http://www.douglassdavis.com

Jul 17 '05 #7

"ECRIA Public Mail Buffer" <ng***********@ecria.com> wrote in message
news: da**********@murdoch.acc.Virginia.EDU...
Bots don't like dynamic URLs.

Completely wrong.
Jul 17 '05 #8
"ECRIA Public Mail Buffer" <ng***********@ecria.com> wrote in message
news:da**********@murdoch.acc.Virginia.EDU...
Bots don't like dynamic URLs.


I'd be surprised if that were the case. I've had pages with a couple of
variables in the URL string which got excellent indexing.

That said, I believe there might be some benefit in "clean" URLs, if only
for human readability.
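
As an illustration of what a "clean" URL could look like for the record pages, here is a minimal sketch that reads the record number from a path such as /records/42 instead of ?rec=42. The `/records/` path layout and the `recordIdFromPath()` helper are assumptions made for this example, and the web server would still have to be configured to route such paths to the script:

```php
<?php
/**
 * Extract a record ID from a "clean" path such as /records/42.
 * Returns null when the path does not match the assumed layout.
 */
function recordIdFromPath(string $path): ?int
{
    // One or more digits after /records/, with an optional trailing slash.
    if (preg_match('#^/records/(\d+)/?$#', $path, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

var_dump(recordIdFromPath('/records/42'));   // int(42)
var_dump(recordIdFromPath('/records/abc'));  // NULL
```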
..soma
Jul 17 '05 #9
Google for one generally only indexes dynamic URLs for news sites and
forums. The reason is simple: Dynamic content changes unpredictably and
cannot be indexed reliably without additional credible information about the
content.

This is a discussion-based newsgroup, so disagree if you must. Feel free to
give us an example of your dynamic non-forum/news/blog URL that appears in
web search results. Doing so is a much better way to make your point than
anonymously posting "You don't know what you're talking about".

We have been in the Design/SEO business for years, and we know exactly what
we are talking about.

ECRIA
http://www.ecria.com
Jul 17 '05 #10
ECRIA Public Mail Buffer <ng***********@ecria.com> wrote:
Google for one generally only indexes dynamic URLs for news sites and
forums. The reason is simple: Dynamic content changes unpredictably and
cannot be indexed reliably without additional credible information about the
content.

We have been in the Design/SEO business for years, and we know exactly what
we are talking about.


Then you should know that there is no way to determine if a URL is
dynamically generated or not other than parsing its contents. If a
spider/indexer decides to mark something as dynamic based on the form of
a URL, the developers should be fired at once.

Jul 17 '05 #11
ECRIA Public Mail Buffer <ng***********@ecria.com> wrote:
Google for one generally only indexes dynamic URLs for news sites and
forums. The reason is simple: Dynamic content changes unpredictably and
cannot be indexed reliably without additional credible information about the
content.

We have been in the Design/SEO business for years, and we know exactly what
we are talking about.


Then you should know that there is no way to determine if a URL is
dynamically generated or not other than parsing its contents. And even
that doesn't mean anything (an unchanged page can still be dynamically
generated and a changed page could be manually updated).

If a spider/indexer decides to mark something as dynamic based on the
form of a URL, the developers should be fired at once.

Jul 17 '05 #12
"Then you should know that there is no way to determine if a URL is
dynamically generated or not other than parsing its contents. And even that
doesn't mean anything (an unchanged page can still be dynamically generated
and a changed page could be manually updated)."

Agreed. Furthermore, a static page URL may actually be a dynamically
generated page. For example, there is not a single HTML file on
http://www.ecria.com - but HTML is all anyone will see.

However, the point is that if there are variables in a URL, robots assume
that the page is generated dynamically - which is a fair assumption. We're
not condoning this - it's just what happens.

There are ways to get around it, but they don't change the fact that this is
the way robots work.

Other than blog/forum/news pages, how many web sites containing URL
variables show up in, say, a Google search?

See what I mean?

ECRIA
http://www.ecria.com
Jul 17 '05 #13
ECRIA Public Mail Buffer <ng***********@ecria.com> wrote:
However, the point is that if there are variables in a URL, robots assume
that the page is generated dynamically - which is a fair assumption.
Bad assumption, they could be used clientside.
There are ways to get around it, but they don't change the fact that this is
the way robots work.
That's why I say they should be fired. They are too lazy to fix bad
assumptions and even go out of their way to insert this kind of silly
behavior.
Other than blog/forum/news pages, how many web sites containing URL
variables show up in, say, a Google search?

See what I mean?


No, all pages of e.g. http://www.amsterdamchinafestival.nl/ appear to be
listed in Google. All are dynamically generated with hideous URLs, without
trying to hide that they're dynamic, and it's no blog, forum or news page. Even
if it was, how would a spider know it's one of the "special" sites?

Jul 17 '05 #14
There's a difference between being LISTED and being RANKED -
http://www.amsterdamchinafestival.nl/ does not show up in Google results for
"Amsterdam china festival" - its own title!

How does Google know it's a news/blog site? Beats me - ask them. They know.
Probably something to do with the Google Groups technology.

I think that will have to be my final word... this discussion is not going
anywhere.

ECRIA
http://www.ecria.com
Jul 17 '05 #15
ECRIA Public Mail Buffer <ng***********@ecria.com> wrote:
There's a difference between being LISTED and being RANKED -
Goal post shifting detected! From indexing to ranking.
http://www.amsterdamchinafestival.nl/ does not show up in Google results for
"Amsterdam china festival" - its own title!
In the results I get, it's at a pathetic 10th place (most probably due to
only having 2 incoming links).
How does Google know it's a news/blog site? Beats me - ask them. They
know. Probably something to do with the Google Groups technology.

I think that will have to be my final word... this discussion is not
going anywhere.


I have given you the requested example of the "dynamic
non-forum/news/blog URL that appears in web search results". Now please
enlighten us with knowledge of "the Design/SEO business for years"...

Jul 17 '05 #16
ECRIA Public Mail Buffer wrote:
Google for one generally only indexes dynamic URLs for news sites and
forums.
And how does Google tell if a URL is a news site or a forum?
The reason is simple: Dynamic content changes unpredictably
and cannot be indexed reliably without additional credible
information about the content.
Don't news sites & forums change unpredictably?
I've had more problems with Google being out-of-date on forums and news
sites than on other dynamic sites.
This is a discussion-based newsgroup, so disagree if you must. Feel
free to give us an example of your dynamic non-forum/news/blog URL
that appears in web search results.
http://www.mrbreakfast.com/superdisp...?recipeid=1325
cinnamon roll pull apart, #22

http://www.911cheferic.com/main/drec...fle&recipe=532
grand marnier souffle, #10

http://www.mrbreakfast.com/superdisp...p?recipeid=267
grand marnier souffle, #2

I'd say #10 and #2 is pretty good indexing, wouldn't you?
Doing so is a much better way to
make your point than anonymously posting "You don't know what you're
talking about".
True. It's generally better to prove your point, rather than making
unfounded assertions, right?
We have been in the Design/SEO business for years, and we know
exactly what we are talking about.


:)

--
Tony Garcia
Web Right! Development
Jul 17 '05 #17
Daniel Tryba wrote:
ECRIA Public Mail Buffer <ng***********@ecria.com> wrote:
There's a difference between being LISTED and being RANKED -


Goal post shifting detected! From indexing to ranking.


OK, so we're saying that Google doesn't RANK dynamic URLs, now.

Interesting, since http://www.mrbreakfast.com/superdisp...p?recipeid=267
shows up as #2 under a search for "grand marnier souffle"

How does Google know it's a news/blog site? Beats me - ask them. They
know. Probably something to do with the Google Groups technology.

I think that will have to be my final word... this discussion is not
going anywhere.


I have given you the requested example of the "dynamic
non-forum/news/blog URL that appears in web search results". Now
please enlighten us with knowledge of "the Design/SEO business for
years"...


As I said before:

:)

--
Tony Garcia
Web Right! Development
Jul 17 '05 #18
http://www.google.com/webmasters/2.html

FAQ: "My webpages have never been included in the Google index."

Google: "Your pages are dynamically generated. We're able to index
dynamically generated pages. However, because our web crawler could
overwhelm and crash sites that serve dynamic content, we limit the number of
dynamic pages we index. In addition, our crawlers may suspect that a URL
with many dynamic parameters might be the same page as another URL with
different parameters. For that reason, we recommend using fewer parameters
if possible. Typically, URLs with 1-2 parameters are more easily crawlable
than those with many parameters."

ECRIA: "Bots don't like dynamic URLs."

I think you're right - dynamic sites are indexed, but ecria does have a
point (ducking and covering)...
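
To make the "1-2 parameters" guidance quoted above concrete, here is a rough sketch that counts the query-string parameters in a URL. The `countQueryParams()` helper and the example.com URLs are made up for illustration, and the threshold of two is taken from the quoted FAQ text, not from any documented crawler limit:

```php
<?php
/**
 * Count the top-level query-string parameters in a URL.
 * Array-style parameters such as a[]=1&a[]=2 count as one.
 */
function countQueryParams(string $url): int
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return 0; // no query string, or URL could not be parsed
    }
    parse_str($query, $params);
    return count($params);
}

// Hypothetical URLs for illustration only.
$urls = [
    'http://example.com/page.php?rec=5',
    'http://example.com/page.php?cat=2&rec=5&sort=asc&page=3',
];

foreach ($urls as $url) {
    $n = countQueryParams($url);
    $note = $n > 2 ? ' (more than the 1-2 suggested)' : '';
    echo $url . ' has ' . $n . ' parameter(s)' . $note . "\n";
}
```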
Jul 17 '05 #19
On Fri, 1 Jul 2005 14:40:20 -0400, "ECRIA Public Mail Buffer"
<ng***********@ecria.com> wrote:
How does Google know it's a news/blog site? Beats me - ask them. They know.
Probably something to do with the Google Groups technology.


I heard that certain words (like 'blog' or 'forum') when detected are
like death to your site - a bot will either drop the links or the
search engine just won't rank you very highly.

Chris
Jul 17 '05 #20
ECRIA Public Mail Buffer wrote:
We have been in the Design/SEO business for years,
The SEO business, eh? That's nice to know.
and we know exactly what we are talking about.


I'm glad somebody does, cos I haven't the foggiest!

--
Jock
Jul 17 '05 #21
Somebody wrote:
I heard that certain words (like 'blog' or 'forum') when detected are
like death to your site - a bot will either drop the links or the
search engine just won't rank you very highly.


Do you believe that?

--
Jock
Jul 17 '05 #22
ECRIA Public Mail Buffer wrote:
Google for one generally only indexes dynamic URLs for news sites and
forums. The reason is simple: Dynamic content changes unpredictably and
cannot be indexed reliably without additional credible information about the
content.

This is a discussion-based newsgroup, so disagree if you must. Feel free to
give us an example of your dynamic non-forum/news/blog URL that appears in
web search results. Doing so is a much better way to make your point than
anonymously posting "You don't know what you're talking about".

We have been in the Design/SEO business for years, and we know exactly what
we are talking about.

ECRIA
http://www.ecria.com


http://www.google.com/search?hs=ec8&...rg&btnG=Search

For one. Completely dynamically generated. Not a news site or a forum. Many
of my sites have dynamic pages. And all get spidered eventually. Just not
necessarily all at once.

I can come up with plenty more because even though you've "been in the business
for several years" your statement is hogwash. Even Google indicates they spider
dynamic pages:

http://www.google.com/intl/en/webmasters/2.html
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jul 17 '05 #23
ch*********@hartslock.org.uk wrote:
On Fri, 1 Jul 2005 14:40:20 -0400, "ECRIA Public Mail Buffer"
<ng***********@ecria.com> wrote:
How does Google know it's a news/blog site? Beats me - ask them. They know.
Probably something to do with the Google Groups technology.

I heard that certain words (like 'blog' or 'forum') when detected are
like death to your site - a bot will either drop the links or the
search engine just won't rank you very highly.

Chris


Don't believe everything you hear - especially on the internet.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jul 17 '05 #24
"ECRIA Public Mail Buffer" <ng***********@ecria.com> wrote ...
This is a discussion-based newsgroup, so disagree if you must. Feel free
to give us an example of your dynamic non-forum/news/blog URL that appears
in web search results. Doing so is a much better way to make your point
than anonymously posting "You don't know what you're talking about".

Here's a list of dynamic URLs from a site I created a while back, all
indexed:
http://www.google.be/search?q=allinu....krikri.be+aID
We have been in the Design/SEO business for years, and we know exactly
what we are talking about.


Apparently you need to re-evaluate your assumptions...

..s

Jul 17 '05 #25
