Hi,
Can I prevent Googlebot from indexing PHPSESSID pages? Googlebot is
indexing pages with PHPSESSID in the URL, which makes it think my site has an
infinite number of pages. How can one avoid this?
Here is an example of the URLs that Google registers, which might make it
clearer what is happening: http://www.winches.dk/winches.php?ar...6f0d46334659ff... http://www.winches.dk/winches.php?ar...b6aed41fc142ea...
I do use a registered session ID, but when I visit my site I never see those
kinds of URLs. So how does Google get hold of them?
Best regards
Mads
CAH wrote:
> Can you avoid that Googlebot indexes PHPSESSID pages? Googlebot is indexing pages with PHPSESSID, which makes it think my page has an infinite number of pages. How can one avoid this?
Well, one way to handle this is to check the User-Agent header and, if
the client is Googlebot, not start a session. Obviously, a page that
depends on the session then ceases to be indexable.
> Here is an example of the URLs that Google registers, which might make it clearer what is happening:
> http://www.winches.dk/winches.php?ar...6f0d46334659ff... http://www.winches.dk/winches.php?ar...b6aed41fc142ea...
> I do use a registered session ID, but when I visit my site I never see those kinds of URLs. So how does Google get hold of them?
If session.use_trans_sid is enabled, PHP tries to compensate for
the lack of a cookie by inserting the session ID into every link it can
rewrite. Your browser accepts the cookie, so you never see those URLs; a
crawler that does not send the cookie back gets the rewritten links on every page.
I think you have quite a problem on your hands. Once those links are in
Google's database, the bot will keep returning to them. You'll need to
detect the condition and tell Googlebot to buzz off so it doesn't eat
up your bandwidth quota.
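One way to "detect the condition" could look like the sketch below. It is only an illustration under assumptions: the helper name strip_session_param() is made up, the session name is assumed to be the default PHPSESSID, and the crawler is assumed to identify itself with "Googlebot" in its User-Agent. When such a request carries a session ID in the query string, the script answers with a permanent redirect to the same URL without it.

```php
<?php
// Hypothetical helper: return $uri with the session parameter removed
// from its query string. Assumes the default session name, PHPSESSID.
function strip_session_param($uri, $param = 'PHPSESSID')
{
    $parts = explode('?', $uri, 2);
    if (count($parts) < 2) {
        return $uri; // no query string at all
    }
    parse_str($parts[1], $query);
    unset($query[$param]);
    $rest = http_build_query($query);
    return $rest === '' ? $parts[0] : $parts[0] . '?' . $rest;
}

// Only act on requests that carry the session parameter and whose
// User-Agent mentions Googlebot (the header may be missing entirely).
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (isset($_GET['PHPSESSID']) && stripos($agent, 'Googlebot') !== false) {
    // 301 = moved permanently; must be sent before any other output.
    header('Location: ' . strip_session_param($_SERVER['REQUEST_URI']), true, 301);
    exit;
}
```

This would have to run at the top of each page, before session_start() and before anything is printed. Note that http_build_query() re-encodes the remaining parameters, so the cleaned URL may differ cosmetically from the one originally requested.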
I am now testing this as a solution:
"Using .htaccess: often, you need to put the following two lines in the
.htaccess file, if your host is using PHP as an Apache module:
php_value session.use_only_cookies 1
php_value session.use_trans_sid 0"
The downside is that my site now only works when the user has cookies
enabled, and I am still not sure whether this will do the trick.
IIRC, Google and other search engines look for a file called robots.txt that gives
directives on what they can and cannot index. Do a Google search for
robots.txt to see... (To verify, look in your web server log files; it
does show up as a request in my Apache log files.)
If your robots.txt includes the following directives, crawlers will skip the
entire site:
User-agent: *
Disallow: /
Or, to limit the scope of the search:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /*.php
(The * wildcard inside a path is an extension that Googlebot understands; not every crawler honors it.)
I was testing this robots.txt:
User-agent: Googlebot
Disallow: /*PHPSESSID
That might solve it; I just do not know whether it works or not.
Mads
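To get a feel for what a rule like /*PHPSESSID would cover, here is a rough sketch of the matching logic. It assumes Google-style rules: * matches any run of characters, a trailing $ anchors the end of the URL, and everything else is a prefix match. The function name robots_pattern_matches() and the sample query strings are made up for illustration. This only mimics the rules locally; it says nothing certain about how a particular crawler behaves.

```php
<?php
// Hypothetical helper: does a robots.txt Disallow pattern match a URL path?
// Assumes Google-style rules: '*' matches any run of characters, a trailing
// '$' anchors the end, and everything else is a prefix match.
function robots_pattern_matches($pattern, $path)
{
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    // Quote regex metacharacters, then turn the escaped '*' back into '.*'.
    $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));
    return preg_match('#^' . $regex . ($anchored ? '$' : '') . '#', $path) === 1;
}

// Sample paths (the query strings are invented for this example).
var_dump(robots_pattern_matches('/*PHPSESSID', '/winches.php?PHPSESSID=abc123'));
var_dump(robots_pattern_matches('/*PHPSESSID', '/winches.php?page=2'));
```

Under these assumed rules the first path is blocked and the second is not, which is the behaviour the rule is meant to have. Whether pages already in the index are dropped is a separate matter.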
On Mon, 2006-04-03 at 01:20 -0700, CAH wrote:
> Can you avoid that Googlebot indexes PHPSESSID pages? [...]
There was some discussion of forcing cookies, but the author didn't want
to limit his users, so...
How about doing something like this:
// See if the user agent is Googlebot (the header may be missing)
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isGoogle = stripos($agent, 'Googlebot');
// If it is, use ini_set to allow only cookies to carry the session ID.
// This has to run before session_start().
if ($isGoogle !== false) {
    ini_set('session.use_only_cookies', '1');
}
That is a cool solution, but can one be sure to recognize
Googlebot? And how about all the other robots? Could one make an "is not
a robot" test?
Thanks for the help
Mads
On Mon, 2006-04-03 at 23:57 -0700, CAH wrote:
> That is a cool solution, but can one be sure to recognize Googlebot? And how about all the other robots? Could one make an "is not a robot" test?
I wouldn't expect all (or even most) robots to be easily identified by
the User-Agent. Maybe you could make an array of the most common ones
(Googlebot, Inktomi, etc.) and loop through it with the logic I
suggested. I also don't think you could check whether it's a browser,
because firewalls and proxy servers may not pass that information through.
Sorry! (It's not my internet. I just work here!)
Scott
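The array-and-loop idea might be sketched roughly like this. The token list is only an assumption and would need to be extended from your own server logs; the function name looks_like_crawler() is made up for the example. As noted above, a robot that sends no recognizable User-Agent will slip through.

```php
<?php
// Hypothetical list of User-Agent substrings; extend it from your own logs.
// "Slurp" is the token used by Yahoo's (formerly Inktomi's) crawler.
$crawlerTokens = array('Googlebot', 'Slurp', 'msnbot');

// Return true if the User-Agent contains any token (case-insensitive).
function looks_like_crawler($agent, $tokens)
{
    foreach ($tokens as $token) {
        if (stripos($agent, $token) !== false) {
            return true;
        }
    }
    return false;
}

// The header may be absent, so fall back to an empty string.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (looks_like_crawler($agent, $crawlerTokens)) {
    // Must run before session_start() to have any effect.
    ini_set('session.use_only_cookies', '1');
}
```

Keeping the tokens in one array means that adding another crawler is a one-line change, and ordinary visitors keep URL-based sessions if they have cookies switched off.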
I see what you mean.
Do you think this solution will work?
"Using .htaccess: often, you need to put the following two lines in the
.htaccess file, if your host is using PHP as an Apache module:
php_value session.use_only_cookies 1
php_value session.use_trans_sid 0"
I think it does, and even though you then have to rely on cookies, I
think it is the better solution, because today that is a small drawback
compared to the search engine problems.
If this solution works:
User-agent: Googlebot
Disallow: /*PHPSESSID
it would be by far the simplest. I do not, however, feel too sure that it
works, and I have no opportunity to check it at this time.
Regards
Mads