
Grab data from another site! HELP ME PLEASE!

I have two scripts below to grab data from another site. I use them to
pull content from a news site. However, when I click on some of the links
inside the grabbed page (such as:
http://www.thuthao.info/news/chitiet...5/01/3B9DB0AC/
), I get errors. I tried to fix some of them, but without success. Can
anybody help me? Much appreciated.
demo: news.thuthao.info
real site: www.vnexpress.net

Code of Index.php
----------------------------------------------------------------

<?php
function grabData($source_to_grab, $delimiter_start, $delimiter_stop,
                  $str_to_replace = '', $str_replace = '', $extra_data = '') {
    $fd = "";
    $start_pos = $end_pos = 0;
    $source_to_grab = fopen($source_to_grab, "r");
    while (true) {
        if ($end_pos > $start_pos) {
            $result = substr($fd, $start_pos, $end_pos - $start_pos);
            $result .= $delimiter_stop;
            break;
        }
        $data = fread($source_to_grab, 8192);
        if (strlen($data) == 0) break;
        $fd .= $data;
        if (!$start_pos) $start_pos = strpos($fd, $delimiter_start);
        if ($start_pos) $end_pos = strpos(substr($fd, $start_pos), $delimiter_stop) + $start_pos;
    }
    fclose($source_to_grab);
    return str_replace($str_to_replace, $str_replace, $extra_data . $result);
}

$url = "http://vnexpress.net/Vietnam/Home/";

$delimiter_start = '<table width="100%" cellspacing=0 cellpadding=0 border=0><tr bgcolor="#CCCCCC">';
$delimiter_stop = '<td width=210 valign=top><A href="/Vietnam/Home/buuthiep.gif" class=Normal></A>';
$web = grabData($url, $delimiter_start, $delimiter_stop, 'img src="/', 'img src="http://vnexpress.net/', '');
$web = str_replace('href="', 'href="http://www.thuthao.info/news/chitiet.php?url=', $web);
$header = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
        . '<link rel="stylesheet" href="Default.css" type="text/css">'
        . '<title>NGUYEN HUYNH THU THAO NEWS</title></head>'
        . '<body topmargin=3 leftmargin=0 marginheight=3 marginwidth=0>';
$footer = '</tr></table></body></html>';
$full = $header . $web . $footer;
echo '<div align=center><a href="http://news.thuthao.info">Trang nhất</a> - '
   . '<a href="http://www.thuthao.info">Trang chủ</a> - '
   . '<a href="http://forum.thuthao.info">Diễn đàn</a></div>';
echo '<tr>&nbsp;</tr>';
echo $full;
-------------------------------------------------------------------

Code of chitiet.php
-------------------------------------------------------------------
<?php
function grabData($source_to_grab, $delimiter_start, $delimiter_stop,
                  $str_to_replace = '', $str_replace = '', $extra_data = '') {
    $fd = "";
    $start_pos = $end_pos = 0;
    $source_to_grab = fopen($source_to_grab, "r");
    while (true) {
        if ($end_pos > $start_pos) {
            $result = substr($fd, $start_pos, $end_pos - $start_pos);
            $result .= $delimiter_stop;
            break;
        }
        $data = fread($source_to_grab, 8192);
        if (strlen($data) == 0) break;
        $fd .= $data;
        if (!$start_pos) $start_pos = strpos($fd, $delimiter_start);
        if ($start_pos) $end_pos = strpos(substr($fd, $start_pos), $delimiter_stop) + $start_pos;
    }
    fclose($source_to_grab);
    return str_replace($str_to_replace, $str_replace, $extra_data . $result);
}

$url = 'http://vnexpress.net'.$url;
$begin1 = '<table id="CContainer" border=0 cellpadding=0 cellspacing=0 width="100%">';
$begin2 = '<table width="100%" cellspacing=0 cellpadding=0 border=0>';
$delimiter_stop = '</ul>';
$web = grabData($url, $begin1, $delimiter_stop, '', '', '');
if (strlen($web) == 0) $web = grabData($url, $begin2, $delimiter_stop, '', '', '');
$web = str_replace('src="', 'src="' . $url . '/', $web);
$web = str_replace('src="' . $url . '//', 'src="http://vnexpress.net/', $web);
$web = str_replace('href="', 'href="http://www.thuthao.info/news/chitiet.php?url=', $web);
$web = str_replace('href="www.thuthao.info/news/chitiet.php?url=javascript:history.go(-1)',
                   'href="javascript:history.go(-1)', $web);
$header = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
        . '<link rel="stylesheet" href="Default.css" type="text/css">'
        . '<title>NGUYEN HUYNH THU THAO - NEWS</title></head>'
        . '<body topmargin=3 leftmargin=0 marginheight=3 marginwidth=0>';
$footer = '</td></tr><tr><td align="center" nowrap></td></tr></table></body></html>';
$full = $header . $web . $footer;
echo '<div align=center><a href="http://news.thuthao.info">Trang nhất</a> - '
   . '<a href="http://www.thuthao.info">Trang chủ</a> - '
   . '<a href="http://forum.thuthao.info">Diễn đàn</a></div>';
echo '<tr>&nbsp;</tr>';
echo $full;
------------------------------------------------------------------

Jul 17 '05 #1
3 Replies


"Baby Blue" <da********@gmail.com> wrote in
news:11**********************@g14g2000cwa.googlegroups.com:
I have two scripts below to grab data from another site. I use them to
pull content from a news site. However, when I click on some of the links
inside the grabbed page (such as:
http://www.thuthao.info/news/chitiet...ress.net/Vietnam/Kinh-doanh/2005/01/3B9DB0AC/
), I get errors. I tried to fix some of them, but without success. Can
anybody help me? Much appreciated.
demo: news.thuthao.info
real site: www.vnexpress.net


Warning: fopen(http://vnexpress.nethttp://vnexpress.net/Vietnam/Kinh-doanh/2005/01/3B9DB0AC/): failed to open stream: HTTP request failed!

That's the error message I got when I tried to visit the address you
posted. It tells you what you need to know: you're trying to load an
invalid URL, because the host name appears in it twice. Before you load
the URL, you need to make sure that it contains "http://vnexpress.net"
only once. You can do this using str_replace.

At a glance, it looks like the problem is here:

$url = 'http://vnexpress.net'.$url;

At this point you should check to see if "http://vnexpress.net" is
already part of $url:

$url = 'http://vnexpress.net' . str_replace('http://vnexpress.net', '', $url);
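
If links keep breaking, a slightly stricter variant of the same idea may help. The following is only a sketch, not part of the original scripts: the helper name normalizeArticleUrl is made up for illustration, and it assumes the link target arrives in the "url" query parameter (read here from $_GET rather than relying on register_globals). It strips any leading copies of the host, with or without "www.", and then prepends the host exactly once.

```php
<?php
// Hypothetical helper (not from the original scripts): build one absolute
// vnexpress.net URL from whatever arrives in the "url" query parameter.
function normalizeArticleUrl($raw, $base = 'http://vnexpress.net')
{
    $path = trim($raw);

    // Strip every leading copy of the host, with or without "www.", so an
    // already-absolute link does not end up with the host prepended twice.
    $path = preg_replace('#^(?:https?://(?:www\.)?vnexpress\.net)+#i', '', $path);

    // Make sure the remainder starts with exactly one slash.
    $path = '/' . ltrim($path, '/');

    return $base . $path;
}

// Assumption: chitiet.php is called as chitiet.php?url=...; fall back to the
// home page path when the parameter is missing.
$raw = isset($_GET['url']) ? $_GET['url'] : '/Vietnam/Home/';
$url = normalizeArticleUrl($raw);
```

Unlike the plain str_replace call, this only touches the host at the start of the string, so a host name that legitimately appears later in the path or query is left alone.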

hth

--

Bulworth : PHP/MySQL/Unix | Email : str_rot13('f@fung.arg'); Web :
shaunc.com
Jul 17 '05 #2

It works completely now. Thank you very, very much!

Jul 17 '05 #3

I don't know why, but it only ran well for a short time. Now it has
errors again. Can anybody help me again?

Jul 17 '05 #4
