how to screen scrape content + images

Hello,

I am currently contracted by a real estate agent. He
has a page that he created himself with a list of
homes, including their images and data, in HTML format.

He wants me to take this page and reformat it so that it
looks different.
Do I use screen scraping to do this?
Could someone please point me to a good screen scraping
article? I am using ASP.NET and C#.

Thanks,
Rachel
Nov 19 '05 #1
For doing that, you'd be better off with
a website copier, like Teleport Pro.

There are dozens of them at
http://www.tucows.com/offline95_default.html

Most have a 30 day trial period.
Some are freeware, too.

Make sure you check the rating (# of cows)
at that page before installing the one you choose.

You'll find more at
http://www.download.com/3120-20_4-0....fline+browsing
and at http://www.snapfiles.com/Freeware/do...fwoffline.html

Juan T. Llibre
ASP.NET MVP
Nov 19 '05 #2
Hi Juan,
Thanks for the quick reply.
Are you saying that I can use Teleport Pro with
ASP.NET to get the desired outcome?
I have to use ASP.NET as well because the website has
other functions that it performs.

Thanks for your help, I look forward to your reply.
Rachel
Nov 19 '05 #3
re:
"Are you saying that I can use Teleport Pro
with ASP.NET to get the desired outcome?"

No, no.

I thought you only wanted to get the pages and images,
so you could reformat the presentation, in order to
later proceed to write the code in ASP.NET.

Teleport Pro allows you to replicate the directory
structure of the site, and then you could write
your ASP.NET application using the same image
directory structure which your client is using currently.

There are some gotchas, such as if your client uses a database
to store the images (I hope not), but essentially,
using a website downloader lets you get the basics.
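
As a rough sketch of the directory-mirroring idea, the snippet below
collects the image URLs from a page and works out where each file would
sit in a local copy that keeps the site's folder layout. It is written in
Python for brevity and is only an illustration: the page URL, the
"mirror" folder name, and the helper names are all hypothetical, and it
does not describe how Teleport Pro itself works.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def mirrored_paths(html, page_url, local_root):
    """Map each absolute image URL to a local path under local_root
    that preserves the directory structure of the original site."""
    collector = ImageCollector()
    collector.feed(html)
    result = {}
    for src in collector.sources:
        absolute = urljoin(page_url, src)  # resolve relative links
        relative = urlparse(absolute).path.lstrip("/")
        result[absolute] = str(PurePosixPath(local_root) / relative)
    return result


# Hypothetical page fragment and URL, for illustration only.
page = '<img src="images/home1.jpg"><img src="/photos/big/home2.jpg">'
paths = mirrored_paths(page, "http://example.com/listings/index.html", "mirror")
for url, local in paths.items():
    print(url, "->", local)
```

Keeping the same relative paths locally means the image references in the
old HTML keep working while the new pages are being written.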

Hints :

There's a free open-source application called nGallery,
http://www.ngallery.org/, which might give you some ideas
about how to handle ASP.NET code for retrieving/displaying
images and manipulating descriptions, etc.

If you're familiar with ASP, maybe it would help you to take
a look at this free Real Estate website code at the Code Project :
http://www.codeproject.com/useritems...te-website.asp

Good luck!
Juan T. Llibre
ASP.NET MVP
Nov 19 '05 #4
Rachel,

If your extraction is a one-time effort, designed to gather the basic
content for your new version of the website, it's easiest to use a tool like
Juan recommended or even just extract the details by hand. Real-estate
listings can be fairly complex, containing a couple of hundred fields per
property listing, so you might consider whipping up some tools for yourself
to rend the data from the page. Regular expressions are very useful for
this purpose.
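
A minimal sketch of that kind of tool, assuming a made-up page layout
(the real page's markup is unknown, so the class names and the pattern
below are hypothetical). It is in Python for brevity; the same pattern,
with named groups written as (?<name>...), carries over to the Regex
class in C#.

```python
import re

# Hypothetical markup standing in for the agent's hand-built page.
SAMPLE_HTML = """
<div class="listing">
  <img src="images/home1.jpg">
  <span class="address">12 Oak St</span>
  <span class="price">$250,000</span>
</div>
<div class="listing">
  <img src="images/home2.jpg">
  <span class="address">48 Elm Ave</span>
  <span class="price">$319,500</span>
</div>
"""

# One pattern per listing block; named groups capture the fields we want.
LISTING_RE = re.compile(
    r'<div class="listing">\s*'
    r'<img src="(?P<image>[^"]+)"[^>]*>\s*'
    r'<span class="address">(?P<address>[^<]+)</span>\s*'
    r'<span class="price">\$(?P<price>[\d,]+)</span>',
    re.IGNORECASE,
)


def extract_listings(html):
    """Return one dict per listing found in the HTML text."""
    listings = []
    for match in LISTING_RE.finditer(html):
        listings.append({
            "image": match.group("image"),
            "address": match.group("address").strip(),
            # Strip thousands separators so the price is a plain integer.
            "price": int(match.group("price").replace(",", "")),
        })
    return listings


for listing in extract_listings(SAMPLE_HTML):
    print(listing)
```

A pattern like this is brittle by design: it only works while the page
keeps the exact layout it was written against, which is fine for a
one-time extraction.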

If your content-extraction need is recurring, I would at all costs avoid
screen scraping. That's akin to using their existing website as a database
for your new site. Among other things, it means they have to keep their old
site running somewhere and in good working order.

Instead, do some digging to find out where the content is originating from.
If they're taking the photographs and entering the content directly into
their website themselves, you'll probably have to mimic that functionality
through a set of web-based administrative tools. In that case you may be
able to skip the listing-content extraction entirely, build the tools, and
have your client re-enter all of the listings. Sell the idea as
"training"... =)

There's a good chance that they are using a third party provider to acquire
the listings, or are feeding the data in directly from their local MLS. In
the US, most multiple listing services (MLSs) now comply with the national
IDX and VOW standards for publishing listings. Assuming your client's MLS
does, you can acquire a developer license and pull the content yourself from
the MLS, store it in a database, and then embed the data in the website as
desired.

We do this for the Chicago region, so I should note that the whole effort is
fairly significant. The raw data is often published daily in large CSV
files (100 MB+ in size), retrieved from an FTP server. It's fully
de-normalized, so you will probably want to do a ton of scrubbing and
normalization to make it useful. You'll likely need to decode all of the
fields to English text so that the general public can make sense of the
listing content. Images are also often FTP'd, although some MLSs offer URL
access to the photos for active listings (i.e. you'd have to cache some if
you want to display sold listings for your client). In the VOW ("Virtual
Office Website") program, regulations are such that you also need to have an
enrollment process before visitors are permitted to see the listings, verify
each email address by sending an account activation email, and so on.
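
To give a feel for the decoding step, here is a small sketch that reads
CSV rows and swaps coded values for readable text. The column names and
code tables are invented for illustration (each MLS publishes its own),
and it is in Python for brevity rather than C#.

```python
import csv
import io

# Hypothetical lookup tables; a real feed defines its own codes.
CODE_TABLES = {
    "status": {"A": "Active", "S": "Sold", "P": "Pending"},
    "heat": {"GA": "Gas", "EL": "Electric"},
}

# Hypothetical sample of a raw feed file.
SAMPLE_CSV = (
    "mls_id,status,heat,price\n"
    "1001,A,GA,250000\n"
    "1002,S,EL,319500\n"
    "1003,X,GA,180000\n"
)


def decode_listings(csv_text):
    """Parse CSV text and replace coded fields with English text."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        decoded = dict(row)
        for field, table in CODE_TABLES.items():
            raw = row[field]
            # Unknown codes pass through unchanged so nothing is lost.
            decoded[field] = table.get(raw, raw)
        decoded["price"] = int(row["price"])
        rows.append(decoded)
    return rows


for listing in decode_listings(SAMPLE_CSV):
    print(listing)
```

For files of 100 MB or more you would stream rows from the file object
instead of holding the whole text in memory, and load the decoded rows
into database tables rather than a list.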

Nothing insurmountable, but expect to grind some code if you go this route.
Alternately, you may be able to find a third party service to handle the
listing display entirely, and if your client likes the appearance (you
rarely have choices...), then you can just focus on the rest of the website.

/// M
Nov 19 '05 #5
