Scraping Just Images in C#

I'm trying to create a page where users can enter a URL into a textbox, click search, and then see a list of all the images on that exact page.

So far I've managed to download the entire destination page and, using a regular expression, extract the image tags. The problem is that the image paths are sometimes relative, which means the images won't display on my page.

Unless I'm missing something very obvious, does anyone know of a solution for this kind of thing? I'm also hoping to do the same with embed tags, so I can list things like Flash videos, etc.

I do realise that a lot of this code needs tidying up, but if anyone has anything I'd be very grateful.

For example, if I searched for www.google.com, I would want it to list the main Google image, but instead of getting the path like this:
http://www.google.co.uk/intl/en_uk/images/logo.gif

I get the path like this:
/intl/en_uk/images/logo.gif

 
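    // Requires: using System; using System.IO; using System.Net; using System.Text;
    //           using System.Text.RegularExpressions; using System.Web.UI.WebControls;
    public void doSearch(object sender, EventArgs e)
    {
        results_tbl.Rows.Clear();
        string reqURL = url_searchBox_txt.Text;

        // Default to http:// when the user omits the scheme
        if (!reqURL.StartsWith("http://") && !reqURL.StartsWith("https://"))
        {
            reqURL = "http://" + reqURL;
        }

        // Download the raw HTML of the requested page
        WebRequest req = WebRequest.Create(reqURL);
        WebResponse resp = req.GetResponse();
        Stream s = resp.GetResponseStream();
        StreamReader sr = new StreamReader(s, Encoding.ASCII);
        string st = sr.ReadToEnd();

        // Capture everything between "<img" and ">" for each image tag
        Regex r = new Regex(@"<img([^>]+)>", RegexOptions.IgnoreCase | RegexOptions.Compiled);
        Match m = r.Match(st);
        while (m.Success)
        {
            TableRow tr = new TableRow();
            TableCell tc1 = new TableCell();    //Item

            // Re-emit the tag as-is; relative src paths are not resolved here
            tc1.Text = "<img " + m.Groups[1].Value + "/>";
            tr.Cells.Add(tc1);
            results_tbl.Rows.Add(tr);
            m = m.NextMatch();
        }
    }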

Nov 12 '09 #1

Bassem

You have solved one case of three, not one of two!

Pay attention to this:
The src attribute of the img element holds a URL, so its value takes one of these forms:
1. Fully qualified URL (scheme and host name included).
2. Absolute path (begins with "/", relative to the site root).
3. Relative path (relative to the directory of the current page).

You have solved the first type; two more remain.

Anyway, consider this method:
1. "url_searchBox_txt.Text" holds the page URL. Whichever of the three forms the image paths take, the page URL contains the domain name (host name), which you can split off.
2. Extract the img's src attribute and check its value: if it begins with the scheme and domain name (for example "http://www.google.co.uk"), it is type #1.
Else if it begins with a "/" slash, it is type #2.
Else it is type #3.
3. For type #1: use it as is.
For type #2: insert the scheme and domain name at the start of the value. That's it, very simple.
For type #3: now you have a problem. You will need to work out which directory of the website the path is relative to, and I have no idea how to solve this.
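
The three cases above can also be handled in one step by resolving the src value against the URL of the page it came from, since a relative path is interpreted relative to that page. The following is a minimal sketch using the framework's System.Uri class; the class name ImageUrlResolver, the method name Resolve, and the www.example.com page are hypothetical and only there for illustration.

```csharp
using System;

public static class ImageUrlResolver
{
    // Resolves an img src value against the URL of the page it was found on.
    // Returns null when either value cannot be turned into an absolute URL.
    public static string Resolve(string pageUrl, string src)
    {
        Uri baseUri;
        if (!Uri.TryCreate(pageUrl, UriKind.Absolute, out baseUri))
            return null;

        Uri resolved;
        // Fully qualified values are kept; "/..." and relative values are
        // combined with the base URL.
        if (!Uri.TryCreate(baseUri, src, out resolved))
            return null;

        return resolved.AbsoluteUri;
    }

    public static void Main()
    {
        // Type #1: fully qualified, returned unchanged
        Console.WriteLine(Resolve("http://www.google.co.uk/", "http://www.google.co.uk/intl/en_uk/images/logo.gif"));
        // Type #2: begins with "/", resolved against the host
        Console.WriteLine(Resolve("http://www.google.co.uk/", "/intl/en_uk/images/logo.gif"));
        // Type #3: relative to the page's directory (hypothetical page URL)
        Console.WriteLine(Resolve("http://www.example.com/gallery/index.html", "images/01.jpg"));
        Console.WriteLine(Resolve("http://www.example.com/gallery/index.html", "../images/01.jpg"));
    }
}
```

In the original doSearch method, the page URL would be reqURL, and the src value would have to be pulled out of each matched img tag before calling Resolve. This sketch does not account for a base element in the page, which can change what relative paths are resolved against.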

Thanks,
Bassem

Nov 13 '09 #2

swapan das

The problem is simple. A web page can load an image or media file from its own server or from a remote server. When the page loads an image from an external server, the image tag looks like:
<img src="http://www.domain.com/01.jpg">
But when the page loads an image from its own server, the image reference looks like:
<img src="/images/01.jpg">
So to fix the problem, just add the site's http URL at the beginning, like this: "http://www.google.com" + img_result
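
As a rough sketch of that idea applied to a whole page, the snippet below pulls the src value out of each img and embed tag and prefixes the site URL where needed by resolving it against the page URL with System.Uri. The class name MediaLinkExtractor, the method name Extract, and the sample HTML are made up for illustration. A regular expression is only an approximation of HTML parsing, so this assumes quoted src attributes and will miss unquoted ones.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MediaLinkExtractor
{
    // Captures the quoted src value of <img> and <embed> tags.
    // Group 1 holds double-quoted values, group 2 single-quoted ones.
    private static readonly Regex SrcPattern = new Regex(
        @"<(?:img|embed)\b[^>]*?\bsrc\s*=\s*(?:""([^""]*)""|'([^']*)')",
        RegexOptions.IgnoreCase);

    // Returns the absolute URL of every img/embed src found in the HTML.
    public static List<string> Extract(string pageUrl, string html)
    {
        var results = new List<string>();
        var baseUri = new Uri(pageUrl);

        foreach (Match m in SrcPattern.Matches(html))
        {
            string src = m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;

            Uri absolute;
            // Leaves full URLs alone and prefixes the rest with the page's site
            if (Uri.TryCreate(baseUri, src, out absolute))
                results.Add(absolute.AbsoluteUri);
        }
        return results;
    }

    public static void Main()
    {
        string html = "<img src=\"/images/01.jpg\">"
                    + "<IMG alt='x' src='http://www.domain.com/01.jpg'>"
                    + "<embed src=\"media/clip.swf\">";

        foreach (string url in Extract("http://www.google.com/", html))
            Console.WriteLine(url);
    }
}
```

With the sample input, the three entries come back as http://www.google.com/images/01.jpg, http://www.domain.com/01.jpg and http://www.google.com/media/clip.swf. For anything beyond simple pages, a real HTML parser would likely be more reliable than the pattern used here.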

Sep 29 '10 #3
