screen scraping

I'm writing an application to scrape the code from client web sites to
look for links on the pages. I am using the file_get_contents() function
to grab the code, but I don't know how to handle web sites that may
be down or unavailable. I know file_get_contents() returns FALSE on
failure, but the error message still prints to the screen. How do I
avoid that?

Here is a snippet of my code:

function urlcheck($url, $sitelink) {
    // grab code from web site
    if (file_get_contents($url)) {
        $html = file_get_contents($url);
        // REGEX to pull the link code out of the array
        $relink = "/<a.+?href=[\"\'](.*?)[\"\'].+?\>/i";

        // Put the matching link code into an array called links
        preg_match_all($relink, $html, $links);

        // loop through links on the page and look for a match
        for ($i = 0; $i < count($links[0]); $i++) {
            if (strpos($links[1][$i], $sitelink) !== false) {
                return $links[0][$i];
            }
        }
    }
    else {
        print "Doesn't exist";
    }
}

Thanks
Clint Pidlubny

Jul 17 '05 #1
How about regexing for particular error codes? You can look for
specific ones such as 404 and 500, which should be the usual ones you
would get. If you see one of those in the response, you know you have
an error.
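
One way to act on this without regexing the page body is to read the status code from the response's status line. A minimal sketch, assuming the response headers are already available as an array of header lines (PHP's HTTP wrapper exposes them in `$http_response_header` after a `file_get_contents()` call); `last_http_status` is a hypothetical helper name, not something from the original code.

```php
<?php
// Hypothetical helper: return the status code of the last status line in
// $headers (redirects add one status line per hop), or null if none is found.
function last_http_status(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found".
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Sample header lines, made up for illustration.
$headers = ['HTTP/1.1 301 Moved Permanently', 'Location: /new', 'HTTP/1.1 404 Not Found'];
$code = last_http_status($headers);
if ($code === null || $code >= 400) {
    print "Doesn't exist";
}
```

Treating anything from 400 upward as a failure covers both the 404 and 500 cases; a missing status line is treated the same way here, which is a design choice rather than a rule.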

HTH

JJ

Jul 17 '05 #2
I discovered that if I store the result of the URL check in a variable
and test that variable, the error is not output.

i.e.

function urlcheck($url, $sitelink) {
    // grab code from web site once; @ suppresses the warning on failure
    $html = @file_get_contents($url);
    if ($html !== false) {
        // REGEX to pull the link code out of the page
        $relink = "/<a.+?href=[\"\'](.*?)[\"\'].+?\>/i";

        // Put the matching link code into an array called links
        preg_match_all($relink, $html, $links);

        // loop through links on the page and look for a match
        for ($i = 0; $i < count($links[0]); $i++) {
            // strpos() returns 0 for a match at the start, so compare strictly
            if (strpos($links[1][$i], $sitelink) !== false) {
                return $links[0][$i];
            }
        }
    }
    else {
        print "Doesn't exist";
    }
}
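
For what it's worth, it is the `@` operator in front of the call that silences the warning, not the assignment to a variable. Below is a minimal sketch of the same idea with the link matching split into its own function, so it can be tried without a network. `find_site_link` and `urlcheck_quiet` are hypothetical names, and the regex is a slightly tightened variant that stays inside a single tag, not the one from the post.

```php
<?php
// Hypothetical helper: return the first <a ...> tag whose href contains
// $sitelink, or null when no link on the page matches.
function find_site_link(string $html, string $sitelink): ?string
{
    // Capture the href value of each anchor tag; [^>]* keeps the match inside one tag.
    $relink = '/<a\b[^>]*?href=["\']([^"\']*)["\'][^>]*>/i';
    preg_match_all($relink, $html, $links);

    foreach ($links[1] as $i => $href) {
        // strpos() returns 0 for a match at the start, so compare against false strictly.
        if (strpos($href, $sitelink) !== false) {
            return $links[0][$i];
        }
    }
    return null;
}

// Hypothetical wrapper: fetch once, suppress the warning with @, and
// return false when the site cannot be read.
function urlcheck_quiet(string $url, string $sitelink)
{
    $html = @file_get_contents($url);
    if ($html === false) {
        return false; // site down or unreachable
    }
    return find_site_link($html, $sitelink);
}
```

Returning `false` for an unreachable site and `null` for "reachable, but no matching link" lets the caller tell the two cases apart with `===`.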

Jul 17 '05 #3
