Can you avoid that googlebot indexes PHPSESSID pages?

CAH
Hi

Is there a way to keep Googlebot from indexing PHPSESSID pages? Googlebot is
indexing pages with PHPSESSID in the URL, which makes it think my site has an
infinite number of pages. How can one avoid this?

Here is an example of the URLs that Google registers, which might make it
clearer what is happening:

http://www.winches.dk/winches.php?ar...6f0d46334659ff...
http://www.winches.dk/winches.php?ar...b6aed41fc142ea...

I do use a registered session ID, but when I visit my site I never see
URLs like those. So how does Google get hold of them?

Best regards
Mads

Apr 3 '06 #1
http://www.php.net/manual/en/ref.session.php

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Apr 3 '06 #2
CAH wrote:
> Hi
>
> Is there a way to keep Googlebot from indexing PHPSESSID pages? Googlebot is
> indexing pages with PHPSESSID in the URL, which makes it think my site has an
> infinite number of pages. How can one avoid this?

Well, one way to handle this is to check the User-Agent header to see
whether the client is Googlebot and, if it is, not start a session. Obviously,
if a page depends on the session, it ceases to be indexable.

> Here is an example of the URLs that Google registers, which might make it
> clearer what is happening:
>
> http://www.winches.dk/winches.php?ar...6f0d46334659ff...
> http://www.winches.dk/winches.php?ar...b6aed41fc142ea...
>
> I do use a registered session ID, but when I visit my site I never see
> URLs like those. So how does Google get hold of them?

If session.use_trans_sid is enabled, then PHP tries to compensate for
the lack of a cookie by inserting the session ID into every link it can
rewrite. Googlebot does not send cookies back, so it keeps getting the
rewritten links, while a browser that accepts the cookie does not.

I think you have quite a problem on your hands. Once those links are in
Google's database, the bot will keep returning to them. You'll need to
detect the condition and tell Googlebot to buzz off so it doesn't eat
up your bandwidth quota.
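
One way to "detect the condition" is to answer any Googlebot request that still carries PHPSESSID with a permanent redirect to the same URL without it. What follows is only a rough sketch under assumptions: stripSessionParam() is a hypothetical helper, the default session name PHPSESSID is assumed, and the User-Agent check is the simple substring test discussed in this thread. It only helps if session IDs are also kept out of newly generated links.

```php
<?php
// Hypothetical helper: returns $requestUri with the session parameter removed
// from the query string, leaving all other parameters in place.
function stripSessionParam(string $requestUri, string $paramName = 'PHPSESSID'): string
{
    $parts = explode('?', $requestUri, 2);
    if (count($parts) < 2) {
        return $requestUri; // no query string at all
    }
    parse_str($parts[1], $query);
    unset($query[$paramName]);
    $rebuilt = http_build_query($query, '', '&');
    return $rebuilt === '' ? $parts[0] : $parts[0] . '?' . $rebuilt;
}

// Usage sketch: run this before any output and before session_start().
$uri   = $_SERVER['REQUEST_URI'] ?? '/';
$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (stripos($agent, 'Googlebot') !== false && isset($_GET['PHPSESSID'])) {
    // 301 tells the crawler that the clean URL is the permanent address.
    header('Location: ' . stripSessionParam($uri), true, 301);
    exit;
}
```

The redirect is limited to Googlebot here so that visitors who rely on URL-based sessions are not affected.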

Apr 3 '06 #3
CAH

I am now testing this as a solution:

"Using .htaccess often, you need to put the following two lines in the
.htaccess file, if your host is using PHP as an Apache module:

php_value session.use_only_cookies 1
php_value session.use_trans_sid 0"

The downside is that my site now only works when the user has cookies
enabled, and I am still not sure whether this will do the trick.
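
If php_value lines in .htaccess are not available (they apply only when PHP runs as an Apache module), the same two settings can probably be applied at runtime instead. A minimal sketch, assuming it runs before session_start() and before any output; configureCookieOnlySessions() is a hypothetical helper name, not part of PHP:

```php
<?php
// Hypothetical helper: runtime equivalent of the two .htaccess lines above.
function configureCookieOnlySessions(): void
{
    // Accept the session ID from a cookie only, never from the URL.
    ini_set('session.use_only_cookies', '1');
    // Stop PHP from appending the session ID to links.
    ini_set('session.use_trans_sid', '0');
}

configureCookieOnlySessions();
// session_start() would follow here.
```

The trade-off is the same as with the .htaccess version: visitors with cookies disabled lose their session.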

Apr 3 '06 #4
IIRC, Google and other search engines look for a file called robots.txt that
gives directives on what they can and cannot index. Do a Google search for
robots.txt to see... (to verify, look in your web server log files; it
does show up as a request in my Apache log files...)

If your robots.txt includes the following directive, crawlers will skip the
entire site:

User-agent: *
Disallow: /

Or, to limit the scope of their crawl:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /*.php

Note that the * wildcard inside a path was not part of the original
robots.txt convention, so not every crawler may honor it.

Apr 3 '06 #5
CAH
I was testing this robots.txt:

User-agent: Googlebot
Disallow: /*PHPSESSID

That might solve it; I just do not know whether it works or not.

Mads

Apr 3 '06 #6
On Mon, 2006-04-03 at 01:20 -0700, CAH wrote:
> Hi
>
> Is there a way to keep Googlebot from indexing PHPSESSID pages? Googlebot is
> indexing pages with PHPSESSID in the URL, which makes it think my site has an
> infinite number of pages. How can one avoid this?
>
> Here is an example of the URLs that Google registers, which might make it
> clearer what is happening:
>
> http://www.winches.dk/winches.php?ar...6f0d46334659ff...
> http://www.winches.dk/winches.php?ar...b6aed41fc142ea...
>
> I do use a registered session ID, but when I visit my site I never see
> URLs like those. So how does Google get hold of them?
>
> Best regards
> Mads


There was some discussion of forcing cookies, but the author didn't want
to limit his users, so...

How about doing something like this:

// See if the user agent is Googlebot (the header may be missing)
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isGoogle = stripos($userAgent, 'Googlebot');

// If it is, use ini_set to only allow cookies for the session ID.
// This has to run before session_start().
if ($isGoogle !== false) {
    ini_set('session.use_only_cookies', '1');
}

Apr 4 '06 #7
CAH
That is a cool solution, but can one be sure of recognizing
Googlebot? And what about all the other robots? Could one write an "is not
a robot" test?

Thanks for the help
Mads

Apr 4 '06 #8
On Mon, 2006-04-03 at 23:57 -0700, CAH wrote:
> That is a cool solution, but can one be sure of recognizing
> Googlebot? And what about all the other robots? Could one write an "is not
> a robot" test?
>
> Thanks for the help
> Mads


I wouldn't expect all (or even most) robots to be easily identified by
the user-agent. Maybe you could make an array of the most common ones
(Googlebot, Inktomi, etc) and loop through it with the logic I
suggested. I also don't think you could check to see if it's a browser,
because firewalls & proxy servers may not send that information through.

Sorry! (It's not my internet. I just work here!)

Scott
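
The array-and-loop idea could look roughly like the sketch below. The function name isKnownRobot() and the list of names are assumptions made for illustration: "Slurp" is, as far as I know, the token the Inktomi crawler sends, and the list is nowhere near complete.

```php
<?php
// Hypothetical helper: returns true when the User-Agent string contains one
// of a small, hand-maintained list of crawler names (case-insensitive).
function isKnownRobot(string $userAgent, array $robotNames = ['Googlebot', 'Slurp', 'msnbot']): bool
{
    foreach ($robotNames as $name) {
        if (stripos($userAgent, $name) !== false) {
            return true;
        }
    }
    return false;
}

// Usage sketch: keep session IDs out of URLs for recognized crawlers.
// This has to run before session_start().
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (isKnownRobot($userAgent)) {
    ini_set('session.use_only_cookies', '1');
    ini_set('session.use_trans_sid', '0');
}
```

A robot that sends a browser-like User-Agent, or none at all, will slip past this check, so it narrows the problem rather than solving it.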

Apr 4 '06 #9
CAH
I see what you mean.

Do you think this solution will work?

"Using .htaccess often, you need to put the following two lines in the
.htaccess file, if your host is using PHP as an Apache module:

php_value session.use_only_cookies 1
php_value session.use_trans_sid 0"

I think it does, and even though you then have to rely on cookies, I
think it is the better solution, because requiring cookies is a small
drawback today compared to search engine problems.

If this solution works

User-agent: Googlebot
Disallow: /*PHPSESSID

it would be by far the simplest. However, I do not feel too sure that it
works, and I have no opportunity to check it at this time.

Regards
Mads

Apr 4 '06 #10
