I'm not developing web crawlers, but here's a quick thought of mine:
string link = "../../wohoo.asp";
string thisPageURL = "http://www.xyz.com/wohoo.asp";
string[] linkParts = System.Text.RegularExpressions.Regex.Split(link,
    @"\x2E\x2E/"); // split on "../"
string[] URLParts = System.Text.RegularExpressions.Regex.Split(thisPageURL,
    "/");
linkParts.Length - 1 now gives the wanted number of "../" "directory
recursions", and the last element of linkParts is the wanted page.
The URL of the new page can then be concatenated from the URLParts array,
excluding that many trailing elements, with the last element of linkParts
appended.
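A rough sketch of that idea in C# (a minimal sketch, assuming plain http URLs with no query strings or "./" segments; the Resolve method name and the example URLs are mine, for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

class LinkResolver
{
    // Count the "../" segments in the relative link, drop the page name
    // plus that many directories from the base URL, then append the
    // target page. Handles "../" only; no "./", query strings, or fragments.
    public static string Resolve(string baseUrl, string link)
    {
        string[] linkParts = Regex.Split(link, @"\.\./"); // split on "../"
        int up = linkParts.Length - 1;                    // number of "../"
        string page = linkParts[linkParts.Length - 1];    // the wanted page

        string[] urlParts = baseUrl.Split('/');
        // keep everything except the page name and one directory per "../"
        int keep = urlParts.Length - 1 - up;
        return string.Join("/", urlParts, 0, keep) + "/" + page;
    }

    static void Main()
    {
        Console.WriteLine(Resolve("http://www.xyz.com/a/b/page.asp",
                                  "../../wohoo.asp"));
        // -> http://www.xyz.com/wohoo.asp
    }
}
```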
Just a quick shot at a solution...
/mortb
"ask josephsen" <jaj(((a)))oticon.dk> wrote in message
news:40**********************@news.dk.uu.net...
Hi NG
I'm making a program to crawl the internet. It works by retrieving all
links in a page, downloading the page behind each link, and again retrieving
all the links. (If there are better ways, I'd like to hear them.)
My problem is relative links (like "../../wohoo.asp"). What is the smartest
way to get the full URL (http://www.xyz.com/wohoo.asp)? Do I have to parse
the relative link against the URL of the page where it was found and then
concatenate the two? Does anyone know how other search engines/crawlers walk
the net?
Thanks :)
./ask
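Follow-up thought: rather than hand-rolling the split, the .NET System.Uri class can resolve a relative link against a base URL directly (a minimal sketch; the example URLs are mine, for illustration):

```csharp
using System;

class UriDemo
{
    static void Main()
    {
        // The Uri(Uri, string) constructor resolves relative references,
        // including "../", "./", absolute paths, and fragments.
        Uri page = new Uri("http://www.xyz.com/a/b/page.asp");
        Uri resolved = new Uri(page, "../../wohoo.asp");
        Console.WriteLine(resolved.AbsoluteUri);
        // -> http://www.xyz.com/wohoo.asp
    }
}
```

This also clamps at the host root, so a link with too many "../" segments still yields a valid URL.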