I am trying to build a fairly simple spider that takes in a URL,
downloads the page, extracts all the links, and then downloads those.
The one problem I am struggling to crack is how to restrict the
downloaded links to those within the starting folder.
I am capturing the ResponseUri from the first request in order to cope
with redirection. From this I can determine the Host, but not the
folder.
e.g. if the URL is www17.brinkster.com/johnsmith/default.htm
I want www17.brinkster.com/johnsmith/
but Host = www17.brinkster.com
and LocalPath = /johnsmith/default.htm
I could use Uri.Segments, as this gives
1) /
2) johnsmith/
3) default.htm
So in this case I could remove the last item from the list and build
my path from the rest.
But I have tried sites where only the following segments are listed
1) /
2) johnsmith
In which case I want all of them, since the last segment is the folder itself!
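
To make the two cases concrete, here is a rough sketch of the rule I have in mind, written in Python rather than .NET only to keep it short. The function names base_folder and in_folder are made up for illustration, the http:// prefix is assumed (the real ResponseUri would be absolute anyway), and the "a dot in the last segment means it is a file" test is only a guess that will be wrong for some sites.

```python
from urllib.parse import urlsplit


def base_folder(url):
    """Guess the starting folder of an absolute URL (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if not path.endswith("/"):
        # Last segment has no trailing slash: decide whether it is a file or a folder.
        last = path.rsplit("/", 1)[1]
        if "." in last:
            # Assumption: a dot means a file name such as default.htm, so drop it.
            path = path[: len(path) - len(last)]
        else:
            # Assumption: no dot means a folder written without its slash, so keep it.
            path += "/"
    return f"{parts.scheme}://{parts.netloc}{path}"


def in_folder(link, base):
    """True if an absolute link lies inside the base folder (simple prefix test)."""
    return link.startswith(base)


base = base_folder("http://www17.brinkster.com/johnsmith/default.htm")
print(base)
print(in_folder("http://www17.brinkster.com/johnsmith/pics/a.jpg", base))
print(in_folder("http://www17.brinkster.com/other/index.htm", base))
```

The same decision could presumably be made on the last entry of Uri.Segments, but I do not know whether the dot test is safe enough.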
Is what I am trying to do possible?
Thanks,
Tony