So far I've managed to get it to scrape the entire destination page and, using a regular expression, more or less extract the image tags. The problem is that the image paths are sometimes relative, which means the images won't display.
Unless I'm missing something very obvious, does anyone know of a solution for this kind of thing? I'm also hoping to do the same with embed tags, so I can list things like Flash videos, etc.
I do realise that a lot of this code needs tidying up, but if anyone has any suggestions I'd be very grateful.
For example, if I searched for www.google.com, I would want it to list the main Google logo image, but instead of getting the path like this:
http://www.google.co.uk/intl/en_uk/images/logo.gif
I get the path like this:
/intl/en_uk/images/logo.gif
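// Needs: System, System.IO, System.Net, System.Text,
// System.Text.RegularExpressions, System.Web.UI.WebControls
public void doSearch(object sender, EventArgs e)
{
    results_tbl.Rows.Clear();
    string reqURL = url_searchBox_txt.Text.Trim();
    // Prepend a scheme only when none was given, so https:// URLs are left alone.
    if (!reqURL.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
        !reqURL.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
    {
        reqURL = "http://" + reqURL;
    }
    WebRequest req = WebRequest.Create(reqURL);
    string st;
    // Dispose the response and reader so the connection is released.
    // UTF-8 is assumed here; Encoding.ASCII would mangle non-ASCII characters.
    using (WebResponse resp = req.GetResponse())
    using (StreamReader sr = new StreamReader(resp.GetResponseStream(), Encoding.UTF8))
    {
        st = sr.ReadToEnd();
    }
    // Capture everything between "<img" and the closing ">".
    Regex r = new Regex(@"<img([^>]+)>", RegexOptions.IgnoreCase | RegexOptions.Compiled);
    Match m = r.Match(st);
    while (m.Success)
    {
        TableRow tr = new TableRow();
        TableCell tc1 = new TableCell(); // Item
        // Re-emit the tag as-is; relative src paths are still relative at this point.
        tc1.Text = "<img " + m.Groups[1].Value + "/>";
        TableCell tc2 = new TableCell(); // Second column (its contents were not shown in the post)
        tr.Cells.Add(tc1);
        tr.Cells.Add(tc2);
        results_tbl.Rows.Add(tr);
        m = m.NextMatch();
    }
}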
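One direction that might help with the relative paths, as a rough sketch rather than a finished solution: resolve each src value against the URL of the page it came from, using System.Uri, before writing the tag into the table. The class name ImageUrlResolver and the method MakeSrcAbsolute are made up for illustration, and the src pattern is an assumption about what the scraped markup looks like (quoted or unquoted attribute values); it has not been tried against real pages.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original code): rewrites the src
// attribute in an <img> tag's attribute text to an absolute URL.
public static class ImageUrlResolver
{
    // Matches src="...", src='...' or src=unquoted. The lookbehind keeps
    // attributes such as data-src from matching.
    private static readonly Regex SrcPattern = new Regex(
        @"(?<![\w-])src\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string MakeSrcAbsolute(string pageUrl, string imgAttributes)
    {
        Uri baseUri = new Uri(pageUrl);
        return SrcPattern.Replace(imgAttributes, delegate(Match m)
        {
            Uri absolute;
            // Resolves relative paths against the page URL; values that are
            // already absolute come back unchanged.
            if (!Uri.TryCreate(baseUri, m.Groups["url"].Value, out absolute))
            {
                return m.Value; // leave anything unparseable as it was
            }
            return "src=\"" + absolute.AbsoluteUri + "\"";
        });
    }
}
```

Inside the loop this would be called as something like tc1.Text = "<img " + ImageUrlResolver.MakeSrcAbsolute(reqURL, m.Groups[1].Value) + "/>";. If the request gets redirected (www.google.com to www.google.co.uk, say), resp.ResponseUri is probably a better base than the URL that was typed in. The same idea should carry over to embed tags by matching their src attribute in the same way.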