Bytes IT Community

Any ideas how to read a url that's changed by the server?

I apologize, but I posted this in the php general forum earlier and
realized that this is the more appropriate forum. Hopefully there's a
coder here who has done this in the past.

I've got code that uses CURL to go to a web page to read the data.

When I type in www.website.com, the server automatically adds a
session variable to the url. I need to be able to read that session
variable. Then I will use that session variable to input into a new
CURL session.

Any ideas how I can do this?

If I use code like this:

// find out the domain:
$domain = $_SERVER['HTTP_HOST'];
// find out the path to the current file:
$path = $_SERVER['SCRIPT_NAME'];

It gives me the values for where my script is sitting on my server
rather than the values for the web site that I'm trying to read.

Any ideas?

Thanks for your time!

Aug 20 '07 #1


On Mon, 20 Aug 2007 15:48:23 -0000, TechieGrl <cs*******@gmail.com> wrote:
> I apologize, but I posted this in the php general forum earlier and
> realized that this is the more appropriate forum. Hopefully there's a
> coder here who has done this in the past.
>
> I've got code that uses CURL to go to a web page to read the data.
>
> When I type in www.website.com, the server automatically adds a
> session variable to the url. I need to be able to read that session
> variable. Then I will use that session variable to input into a new
> CURL session.

As in, it redirects to something like http://example.com/?SESSIONID=blah ?

In which case, tell cURL to follow redirects:

http://uk.php.net/manual/en/function.curl-setopt.php

with option CURLOPT_FOLLOWLOCATION.

Then read the "effective URL" from the handle with:

http://uk.php.net/manual/en/function.curl-getinfo.php

with option CURLINFO_EFFECTIVE_URL.

You should then be able to extract the session ID from that using your choice
of text matching function.
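A minimal sketch of those two steps, assuming the session ID arrives as a query parameter named SESSIONID as in the example URL above (the parameter name is an assumption; use whatever the real site sends). The helper names fetchEffectiveUrl() and extractSessionId() are hypothetical.

```php
<?php
// Sketch: follow redirects, read the final URL, then pull a session ID out of it.
// Assumption: the ID is a query parameter called SESSIONID (hypothetical name).

// Returns the URL cURL ended up on after all redirects, or null on failure.
function fetchEffectiveUrl(string $startUrl): ?string
{
    $c = curl_init($startUrl);
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true); // follow Location: headers
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    if (curl_exec($c) === false) {
        curl_close($c);
        return null;
    }
    $url = curl_getinfo($c, CURLINFO_EFFECTIVE_URL); // last URL actually fetched
    curl_close($c);
    return $url;
}

// Extracts the SESSIONID value from a URL with a regular expression, or null.
function extractSessionId(string $url): ?string
{
    if (preg_match('/[?&]SESSIONID=([^&#]+)/', $url, $m)) {
        return $m[1];
    }
    return null;
}

// Usage (needs network access):
// $final = fetchEffectiveUrl('http://www.example.com/');
// $sid   = $final === null ? null : extractSessionId($final);
```

Because this sketch closes its own handle, any cookies the server set are thrown away, so on its own it won't keep a cookie-based session alive for later requests.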

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Aug 20 '07 #2

> You should then be able to extract the session ID from that using your choice
> of text matching function.

Now that I've extracted the unique key from the url, I'm finding that
I'm having problems opening up the actual pages. This unique key
appears to be their session variable, so their server treats my
first hit as session 1 and assigns it an ID. Then, when I
send in a new url with that session ID, it treats it as a new
session, and so that unique session ID is no longer valid.

Any thoughts?

Aug 20 '07 #3

Rik
On Mon, 20 Aug 2007 22:57:01 +0200, TechieGrl <cs*******@gmail.com> wrote:
>
>> You should then be able to extract the session ID from that using your
>> choice of text matching function.
>
> Now that I've extracted the unique key from the url, I'm finding that
> I'm having problems opening up the actual pages. This unique key
> appears to be their session variable, so their server treats my
> first hit as session 1 and assigns it an ID. Then, when I
> send in a new url with that session ID, it treats it as a new
> session, and so that unique session ID is no longer valid.
>
> Any thoughts?
Probably you need to use cookies.

<http://nl3.php.net/manual/en/function.curl-setopt.php>

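//create an 'anonymous' cookie file in the temporary directory
//(tempnam() needs both a directory and a filename prefix)
$cookiefile = tempnam(sys_get_temp_dir(), 'cookie');

//initialize curl
$c = curl_init();

//follow redirects
curl_setopt($c,CURLOPT_FOLLOWLOCATION,true);

//return the page as a string instead of printing it
curl_setopt($c,CURLOPT_RETURNTRANSFER,true);

//store & retrieve cookies
curl_setopt($c,CURLOPT_COOKIEFILE,$cookiefile);
curl_setopt($c,CURLOPT_COOKIEJAR,$cookiefile);

//some people still think the referrer should be checked, hideous:
curl_setopt($c,CURLOPT_AUTOREFERER,true);

//set the url
curl_setopt($c, CURLOPT_URL, "http://www.example.com/");

//and go:
$html = curl_exec($c);
if ($html === false) {
    echo 'cURL error: ' . curl_error($c);
}

//and close; on PHP 8+ curl_close() does nothing, so unset() frees the
//handle (and writes the cookie jar) before the file is deleted
curl_close($c);
unset($c);

//and delete cookiefile
unlink($cookiefile);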

--
Rik Wasmus
Aug 20 '07 #4

> //set the url
> curl_setopt($c, CURLOPT_URL, "http://www.example.com/");

Following your example worked great and gave me the
initial page information! But the problem is that I am not sure how
to get to the page that I really need, given how the url is created.

I need to hit this page - www.example.com/sessionId/page.html

My initial thought is to go to the main web site - www.example.com.
When I go to that site, I'm automatically redirected to a page that
has the session variable inserted into the url - www.example.com/sessionId/page.html

page.html is actually where the data is that I'm grabbing.

It seems as if I need to send in 2 CURLOPT_URL values, but that's
where the session variable becomes a problem because it now thinks
that I have 2 separate sessions.

Maybe I'm approaching this all wrong.

Aug 20 '07 #5

Rik
On Tue, 21 Aug 2007 00:16:05 +0200, TechieGrl <cs*******@gmail.com> wrote:
>
>> //set the url
>> curl_setopt($c, CURLOPT_URL, "http://www.example.com/");
>
> Following your example worked great and gave me the
> initial page information! But the problem is that I am not sure how
> to get to the page that I really need, given how the url is created.
>
> I need to hit this page - www.example.com/sessionId/page.html
>
> My initial thought is to go to the main web site - www.example.com.
> When I go to that site, I'm automatically redirected to a page that
> has the session variable inserted into the url -
> www.example.com/sessionId/page.html
>
> page.html is actually where the data is that I'm grabbing.
>
> It seems as if I need to send in 2 CURLOPT_URL values, but that's
> where the session variable becomes a problem because it now thinks
> that I have 2 separate sessions.

Requesting and discarding several pages before you reach the 'real' data
shouldn't be a problem this way.

> Maybe I'm approaching this all wrong.

If you have a cookie with a session ID, you probably don't need it in the
URL (it might be required, though; I don't know which site this is).
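To illustrate requesting and discarding pages on a single handle, here is a sketch. It assumes the redirect lands on http://www.example.com/<sessionId>/page.html as described above and that the session ID is the first path segment; firstPathSegment(), buildPageUrl(), fetchPageInSameSession() and the $pageName argument are hypothetical.

```php
<?php
// Sketch: reuse ONE cURL handle (with a cookie file) for every request, so the
// server sees a single session. Assumes redirect targets look like
// http://www.example.com/<sessionId>/page.html; all function names are hypothetical.

// First path segment of a URL ("abc123" for http://host/abc123/page.html), or null.
function firstPathSegment(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    $segment = explode('/', trim($path, '/'))[0];
    return $segment !== '' ? $segment : null;
}

// Builds http://host/<sessionId>/<pageName> from the home page URL.
function buildPageUrl(string $homeUrl, string $sessionId, string $pageName): string
{
    return rtrim($homeUrl, '/') . '/' . $sessionId . '/' . ltrim($pageName, '/');
}

// Request 1 follows the redirect (its body is discarded); request 2 fetches
// $pageName in the same session. Returns the page body, or null on failure.
function fetchPageInSameSession(string $homeUrl, string $pageName): ?string
{
    $cookiefile = tempnam(sys_get_temp_dir(), 'cookie');
    $c = curl_init();
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($c, CURLOPT_COOKIEFILE, $cookiefile);
    curl_setopt($c, CURLOPT_COOKIEJAR, $cookiefile);

    $result = null;
    curl_setopt($c, CURLOPT_URL, $homeUrl);
    if (curl_exec($c) !== false) {
        $sid = firstPathSegment(curl_getinfo($c, CURLINFO_EFFECTIVE_URL));
        if ($sid !== null) {
            // Same handle, so the same cookies go along with this request.
            curl_setopt($c, CURLOPT_URL, buildPageUrl($homeUrl, $sid, $pageName));
            $body = curl_exec($c);
            $result = $body === false ? null : $body;
        }
    }

    curl_close($c);
    unset($c); // free the handle before deleting its cookie file
    unlink($cookiefile);
    return $result;
}

// Usage (needs network access):
// $html = fetchPageInSameSession('http://www.example.com/', 'page.html');
```

If page.html is already where the redirect lands, the body of the first request is the data you want, and the second request can be skipped.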

--
Rik Wasmus
Aug 21 '07 #6

> Requesting and discarding several pages before you reach the 'real' data
> shouldn't be a problem this way.
> If you have a cookie with a session ID, you probably don't need it in the
> URL (it might be required, though; I don't know which site this is).

Here's an example of a redirect - not the same site that I'm using,
but you can see what happens here.

When I type in http://my.opera.com, I am redirected to http://my.opera.com/community

Then when I click on a link, I go to a page that includes "community"
in the url - http://my.opera.com/community/blog/2...er-of-the-week
I need to get from my.opera.com to the last url, but if the word
"community" was actually a changing session ID, then I would need to
check for that each time prior to getting to the page I really want,
member-of-the-week.

Does that make sense?

Aug 21 '07 #7

Rik
On Tue, 21 Aug 2007 15:44:12 +0200, TechieGrl <cs*******@gmail.com> wrote:
>
>> Requesting and discarding several pages before you reach the 'real' data
>> shouldn't be a problem this way.
>> If you have a cookie with a session ID, you probably don't need it in the
>> URL (it might be required, though; I don't know which site this is).
>
> Here's an example of a redirect - not the same site that I'm using,
> but you can see what happens here.
>
> When I type in http://my.opera.com, I am redirected to
> http://my.opera.com/community
>
> Then when I click on a link, I go to a page that includes "community"
> in the url -
> http://my.opera.com/community/blog/2...er-of-the-week
> I need to get from my.opera.com to the last url, but if the word
> "community" was actually a changing session ID, then I would need to
> check for that each time prior to getting to the page I really want,
> member-of-the-week.
>
> Does that make sense?
It could very well be. It all depends on how they implemented the session. If
you enable cookies in CURL, on most sites you'll just use the cookies,
without having to check the URL. If the site enforces a GET session ID, you'll
have to check for it and keep adding it to subsequent requests (rechecking
for changes, etc.).
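For the GET case, a sketch of adding the ID to every request and rechecking the final URL for a new one. It assumes the ID travels in a query parameter; the parameter name and the functions withQueryParam() and fetchWithGetSession() are hypothetical. A path-based ID like the "community" example would need the path rewritten instead.

```php
<?php
// Sketch for a site that enforces a GET session ID. Assumption: the ID is a
// query parameter; its name is passed in (e.g. the hypothetical "SESSIONID").

// Returns $url with query parameter $name set to $value (added or replaced).
function withQueryParam(string $url, string $name, string $value): string
{
    $parts = parse_url($url);
    $query = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
    }
    $query[$name] = $value;

    $result = ($parts['scheme'] ?? 'http') . '://' . ($parts['host'] ?? '');
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= ($parts['path'] ?? '/') . '?' . http_build_query($query);
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}

// Fetches several pages on one handle, adding the current ID to each request
// and re-reading it from the final URL in case the server changed it.
function fetchWithGetSession(array $urls, string $name, string $sid): array
{
    $pages = [];
    $c = curl_init();
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
    foreach ($urls as $url) {
        curl_setopt($c, CURLOPT_URL, withQueryParam($url, $name, $sid));
        $pages[$url] = curl_exec($c); // false on failure
        $query = parse_url(curl_getinfo($c, CURLINFO_EFFECTIVE_URL), PHP_URL_QUERY);
        if (is_string($query)) {
            parse_str($query, $params);
            if (isset($params[$name]) && is_string($params[$name])) {
                $sid = $params[$name]; // server issued a new ID: use it from now on
            }
        }
    }
    curl_close($c);
    return $pages;
}
```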

As said, you'll have to use curl_getinfo() to check the final URL, and
possibly use curl_setopt() to retrieve some headers that might be important.
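One way to get at those headers, as a sketch: CURLOPT_HEADER makes curl_exec() return the raw response headers in front of the body, and CURLINFO_HEADER_SIZE gives the total header length to split on. With redirects followed, the header part holds the headers of every hop. fetchWithHeaders() and headerValues() are hypothetical helper names.

```php
<?php
// Sketch: fetch a page together with its raw response headers, so redirects
// (Location) and cookies (Set-Cookie) can be inspected. Helper names are hypothetical.

// Returns ['url' => final URL, 'headers' => raw headers, 'body' => body], or null.
function fetchWithHeaders(string $url): ?array
{
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($c, CURLOPT_HEADER, true); // prepend headers to the returned string
    $raw = curl_exec($c);
    if ($raw === false) {
        curl_close($c);
        return null;
    }
    $size = curl_getinfo($c, CURLINFO_HEADER_SIZE); // size of all headers received
    $final = curl_getinfo($c, CURLINFO_EFFECTIVE_URL);
    curl_close($c);
    return [
        'url'     => $final,
        'headers' => substr($raw, 0, $size),
        'body'    => substr($raw, $size),
    ];
}

// Collects every value of one header from a raw header block,
// matching the header name case-insensitively.
function headerValues(string $rawHeaders, string $name): array
{
    $values = [];
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        $pos = strpos($line, ':');
        if ($pos !== false && strcasecmp(substr($line, 0, $pos), $name) === 0) {
            $values[] = trim(substr($line, $pos + 1));
        }
    }
    return $values;
}

// Usage (needs network access):
// $r = fetchWithHeaders('http://www.example.com/');
// if ($r !== null) { print_r(headerValues($r['headers'], 'Location')); }
```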

Useful functions here are also parse_url() & parse_str() for the returned
(final) URL. And if it doesn't work, check with a 'normal' browser which
redirects/headers get sent (Fiddler for MSIE & LiveHTTPHeaders for FF come
to mind), copy those to CURL, and remove them again one by one until you're
left with the ones that really matter. It's all about discovering
(knowing, or asking, which would be fastest) what the actual inner workings of
the site are.
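A sketch of the "copy, then remove one by one" step: keep the request headers you saw in the browser in an array and send them with CURLOPT_HTTPHEADER. The header values below are placeholders, not a real capture, and $browserHeaders, withoutHeader() and fetchWithHeaderList() are hypothetical names.

```php
<?php
// Sketch: replay browser request headers, then prune them one at a time.
// The values are placeholders; paste in what Fiddler/LiveHTTPHeaders shows.
$browserHeaders = [
    'User-Agent: Mozilla/5.0 (placeholder)',
    'Accept: text/html,application/xhtml+xml',
    'Accept-Language: en-us,en;q=0.5',
    'Referer: http://www.example.com/',
];

// Returns $headers without any "Name: value" line for $name (case-insensitive).
// "Accept" removes only "Accept:", not "Accept-Language:".
function withoutHeader(array $headers, string $name): array
{
    $prefix = $name . ':';
    return array_values(array_filter(
        $headers,
        fn(string $h): bool => strncasecmp($h, $prefix, strlen($prefix)) !== 0
    ));
}

// Sends a request with an explicit header list; returns the body, or false.
function fetchWithHeaderList(string $url, array $headers)
{
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($c, CURLOPT_HTTPHEADER, $headers);
    $body = curl_exec($c);
    curl_close($c);
    return $body;
}

// Usage (needs network access): does the site still work without a Referer?
// $body = fetchWithHeaderList('http://www.example.com/', withoutHeader($browserHeaders, 'Referer'));
```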

Keep in mind that CURL works great as long as the site doesn't use
JavaScript for critical browsing/display/session functions. If it
does, you're in for a very painstaking translation of the critical
JavaScript code into the equivalent requests, which may break in future
with even the smallest change to the site's setup.
--
Rik Wasmus
Aug 21 '07 #8
