custom user agent

I have a simple crawler as part of a website. It uses file() to copy
HTML content from a web page on another website and reformat it into a
new HTML page using a template. Is there a simple way of having the
crawler supply a custom HTTP user agent header to identify itself to
the website being crawled? I don't want to alter the php.ini file, I
need to set the user agent value from PHP if possible.

Thanks

Jul 17 '05 #1
st****@stoxfx.com wrote:
> I have a simple crawler as part of a website. It uses file() to copy
> HTML content from a web page on another website and reformat it into a
> new HTML page using a template. Is there a simple way of having the
> crawler supply a custom HTTP user agent header to identify itself to
> the website being crawled? I don't want to alter the php.ini file, I
> need to set the user agent value from PHP if possible.
>
> Thanks


Hi,

AFAIK: headers are added to the actual HTML page by the software that is
sending the HTML page.
For example: The webserver will add a few headers.

So I do not know how you can 'tag' your pages on the second server, so the
webserver will know it should send some extra headers.

You can however always add META-tags into the document itself, but I don't
know if that is what you need.

just my 2 cents..

Regards,
Erwin Moller
Jul 17 '05 #2
> Is there a simple way of having the
> crawler supply a custom HTTP user agent header to identify itself


ini_set("user_agent", "MyCrappyCrawler/1.0");

---
Steve

Jul 17 '05 #3
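
For readers who would rather not change the script-wide setting, the user agent can also be attached to a single request through a stream context. The following is a minimal sketch, assuming PHP 5 or later (where file() accepts a context as its third argument). The crawler name, the contact URL, and the helper names build_crawler_context() and fetch_lines() are hypothetical and not from the thread.

```php
<?php
// Hypothetical crawler name and contact URL, for illustration only.
$userAgent = 'ExampleCrawler/1.0 (+http://example.com/crawler-info)';

// Option 1 (from the thread): change the default for every HTTP stream
// opened by this script. No trailing "\r\n" is needed.
ini_set('user_agent', $userAgent);

// Option 2: build a stream context so the header applies to one request only.
function build_crawler_context($userAgent)
{
    return stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $userAgent, // sent as the User-Agent request header
            'timeout'    => 10,         // seconds; an assumed, arbitrary value
        ),
    ));
}

// Fetch a page as an array of lines, as file() does, using the given context.
function fetch_lines($url, $context)
{
    return file($url, 0, $context);
}

$context = build_crawler_context($userAgent);
```

A call such as fetch_lines('http://example.com/page.html', $context) would then send the custom header for that request alone, while ini_set() covers every later file() or fopen() call on an http:// URL.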
"st****@stoxfx.com" wrote:
> I have a simple crawler as part of a website. It uses file() to copy
> HTML content from a web page on another website and reformat it into a
> new HTML page using a template. Is there a simple way of having the
> crawler supply a custom HTTP user agent header to identify itself to
> the website being crawled? I don't want to alter the php.ini file, I
> need to set the user agent value from PHP if possible.
>
> Thanks


There may be safe-mode restrictions to prevent you doing this, but have a go
anyway:

ini_set("user_agent","Marvin the Paranoid Android");

If you're planning to crawl sites that don't belong to you, be nice and
check the robots.txt file first.

--
phil [dot] ronan @ virgin [dot] net
http://vzone.virgin.net/phil.ronan/
Jul 17 '05 #4
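
The robots.txt check suggested above could be sketched roughly as follows. This is a deliberately simplified illustration, not a complete parser: it understands only User-agent and Disallow lines, uses plain prefix matching, and applies the rules of every matching group (including the * group) rather than only the most specific one, which errs on the side of caution. The function name is_path_allowed() and the sample rules are hypothetical.

```php
<?php
// Simplified robots.txt check. Handles only "User-agent" and "Disallow"
// lines with prefix matching; Allow, wildcards and Crawl-delay are ignored.
function is_path_allowed($robotsTxt, $agentName, $path)
{
    $applies = false;      // does the current group apply to this crawler?
    $inAgentLines = false; // was the previous rule line a User-agent line?

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = explode(':', $line, 2);
        $field = strtolower(trim($field));
        $value = trim($value);

        if ($field === 'user-agent') {
            // Consecutive User-agent lines belong to the same group.
            if (!$inAgentLines) {
                $applies = false;
            }
            $inAgentLines = true;
            if ($value === '*'
                || ($value !== '' && stripos($agentName, $value) !== false)) {
                $applies = true;
            }
        } else {
            $inAgentLines = false;
            // An empty Disallow value means "nothing is disallowed".
            if ($field === 'disallow' && $applies && $value !== ''
                && strpos($path, $value) === 0) {
                return false;
            }
        }
    }
    return true;
}

// Sample rules, made up for this example.
$robots = "User-agent: *\nDisallow: /private/\n\nUser-agent: BadBot\nDisallow: /\n";
$allowed = is_path_allowed($robots, 'ExampleCrawler', '/index.html');
```

In a real crawler the rules would come from the target site, for instance via file_get_contents() on its /robots.txt URL, and the agent name passed in would be the same one sent in the User-Agent header.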

We have permission to crawl this page and ini_set() works just fine.

Thanks all!

Jul 17 '05 #5
