how to screen scrape content + images

Hello,

I am currently contracted by a real estate agent. He
has a page that he created himself with a list of
homes, along with their images and data, in HTML format.

He wants me to take this page and reformat it so that it
looks different.
Do I use screen scraping to do this?
Could someone please point me to a good screen scraping
article? I am using ASP.NET and C#.

Thanks,
Rachel
Nov 19 '05 #1
For doing that, you'd be better off with
a website copier, like Teleport Pro.

There are dozens of them at
http://www.tucows.com/offline95_default.html

Most have a 30 day trial period.
Some are freeware, too.

Make sure you check the rating (number of cows)
on that page before installing the one you choose.

You'll find more at
http://www.download.com/3120-20_4-0....fline+browsing
and at http://www.snapfiles.com/Freeware/do...fwoffline.html

Juan T. Llibre
ASP.NET MVP
===========
Nov 19 '05 #2
Hi Juan,
Thanks for the quick reply.
Are you saying that I can use Teleport Pro with
ASP.NET to get the desired outcome?
I have to use ASP.NET as well because the website has
other functions that it performs.

Thanks for your help, I look forward to your reply.
Rachel
Nov 19 '05 #3
re:
"Are you saying that I can use Teleport Pro
with ASP.NET to get the desired outcome?"

No, no.

I thought you only wanted to get the pages and images,
so you could reformat the presentation, in order to
later proceed to write the code in ASP.NET.

Teleport Pro allows you to replicate the directory
structure of the site, and then you could write
your ASP.NET application using the same image
directory structure which your client is using currently.

There are some gotchas, such as your client using a database
to store the images ( I hope not ) but, essentially,
using a website downloader lets you get the basics.

Hints :

There's a free open-source application called nGallery
http://www.ngallery.org/ which might give you some ideas
about how to handle ASP.NET code for retrieving/displaying
images and manipulating descriptions, etc.

If you're familiar with ASP, maybe it would help you to take
a look at this free Real Estate website code at the Code Project :
http://www.codeproject.com/useritems...te-website.asp

Good luck!
Juan T. Llibre
ASP.NET MVP
===========
Nov 19 '05 #4
Rachel,

If your extraction is a one-time effort, designed to gather the basic
content for your new version of the website, it's easiest to use a tool like
the one Juan recommended, or even just to extract the details by hand.
Real-estate listings can be fairly complex, containing a couple of hundred
fields per property listing, so you might consider whipping up some tools for
yourself to pull the data out of the page. Regular expressions are very
useful for this purpose.
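
As a rough illustration of the regular-expression approach, here is a minimal sketch. It is written in Python only to keep it short; the same pattern idea carries over to the Regex class in System.Text.RegularExpressions in C#. The markup, the class names ("listing", "address", "price") and the function name extract_listings are all made up for the example; the agent's real page will look different, and the pattern would need adjusting to match it.

```python
import re
from html import unescape

# Hypothetical markup: the real page's tags and class names will differ.
SAMPLE_HTML = """
<div class="listing">
  <img src="images/home1.jpg" alt="Front view">
  <span class="address">12 Oak St</span>
  <span class="price">$250,000</span>
</div>
<div class="listing">
  <img src="images/home2.jpg" alt="Back yard">
  <span class="address">98 Elm Ave &amp; 3rd</span>
  <span class="price">$310,500</span>
</div>
"""

# One pattern per listing block, with named groups for the fields we want.
LISTING_RE = re.compile(
    r'<div class="listing">\s*'
    r'<img src="(?P<image>[^"]+)"[^>]*>\s*'
    r'<span class="address">(?P<address>.*?)</span>\s*'
    r'<span class="price">\$(?P<price>[\d,]+)</span>',
    re.DOTALL,
)


def extract_listings(html):
    """Return one dict per listing block found in the HTML text."""
    listings = []
    for match in LISTING_RE.finditer(html):
        listings.append({
            "image": match.group("image"),
            # Turn HTML entities such as &amp; back into plain characters.
            "address": unescape(match.group("address").strip()),
            # Drop the thousands separators so the price is a plain integer.
            "price": int(match.group("price").replace(",", "")),
        })
    return listings


if __name__ == "__main__":
    for item in extract_listings(SAMPLE_HTML):
        print(item)
```

A pattern like this is brittle by nature: it only works while the page keeps exactly this layout, which is fine for a one-time extraction but is one more reason not to depend on it long term.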

If your content-extraction need is recurring, I would at all costs avoid
screen scraping. That's akin to using their existing website as a database
for your new site. Among other things, it means they have to keep their old
site running somewhere and in good working order.

Instead, do some digging to find out where the content originates.
If they're taking the photographs and entering the content directly into
their website themselves, you'll probably have to mimic that functionality
through a set of web-based administrative tools. In that case you may be
able to skip the listing-content extraction entirely, build the tools, and
have your client re-enter all of the listings. Sell the idea as
"training"... =)

There's a good chance that they are using a third party provider to acquire
the listings, or are feeding the data in directly from their local MLS. In
the US, most multiple listing services (MLSs) now comply with the national
IDX and VOW standards for publishing listings. Assuming your client's MLS
does, you can acquire a developer license and pull the content yourself from
the MLS, store it in a database, and then embed the data in the website as
desired.
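
To make the "store it in a database, then embed it in the website" step concrete, here is a small sketch using SQLite, again in Python for brevity. The table layout, the column names and the three helper functions are assumptions for illustration only; a real MLS feed defines its own fields, and an ASP.NET site would more likely sit on SQL Server.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the listing store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings ("
        " listing_id TEXT PRIMARY KEY,"
        " address TEXT NOT NULL,"
        " price INTEGER NOT NULL,"
        " status TEXT NOT NULL)"
    )
    return conn


def save_listings(conn, rows):
    """Insert new listings and overwrite ones already stored.

    Keying on listing_id means a repeated import updates a listing
    instead of duplicating it.
    """
    conn.executemany(
        "INSERT OR REPLACE INTO listings (listing_id, address, price, status)"
        " VALUES (:listing_id, :address, :price, :status)",
        rows,
    )
    conn.commit()


def active_listings(conn):
    """Return the rows a listing page would display, cheapest first."""
    cur = conn.execute(
        "SELECT listing_id, address, price FROM listings"
        " WHERE status = 'active' ORDER BY price"
    )
    return cur.fetchall()


if __name__ == "__main__":
    store = open_store()
    save_listings(store, [
        {"listing_id": "A1", "address": "12 Oak St", "price": 250000, "status": "active"},
        {"listing_id": "B2", "address": "98 Elm Ave", "price": 310500, "status": "sold"},
    ])
    print(active_listings(store))
```

The point of the design is that the website only ever reads from your own database, so the display pages keep working regardless of where the data came from.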

We do this for the Chicago region, so I should note that the effort is
fairly significant. The raw data is often published daily in large CSV
files (100 MB+ in size), retrieved from an FTP server. It's fully
de-normalized, so you will probably want to do a ton of scrubbing and
normalization to make it useful. You'll likely need to decode all of the
coded fields into English text so that the general public can make sense of
the listing content. Images are also often FTP'd, although some MLSs offer
URL access to the photos for active listings (i.e. you'd have to cache some
if you want to display sold listings for your client). In the VOW ("Virtual
Office Website") program, regulations are such that you also need to have an
enrollment process before visitors are permitted to see the listings, verify
each email address by sending an account activation email, and so on.
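
The field-decoding step can be sketched roughly as follows, in Python for brevity. The column names, the code tables and the sample feed are invented for the example; every MLS publishes its own layout and its own code lists, so treat this as the shape of the job rather than a working importer.

```python
import csv
import io

# Hypothetical code tables; a real MLS supplies its own lookup lists.
STATUS_CODES = {"A": "Active", "S": "Sold", "P": "Pending"}
HEATING_CODES = {"GF": "Gas, forced air", "EB": "Electric baseboard"}

# Hypothetical feed layout, standing in for a large daily CSV file.
SAMPLE_FEED = """listing_id,status,heating,price
A1,A,GF,250000
B2,S,EB,310500
C3,X,GF,199900
"""


def decode(table, code):
    """Translate a feed code to readable text, flagging unknown codes."""
    return table.get(code, "Unknown (%s)" % code)


def decode_row(row):
    """Convert one raw CSV row into values fit for public display."""
    return {
        "listing_id": row["listing_id"],
        "status": decode(STATUS_CODES, row["status"]),
        "heating": decode(HEATING_CODES, row["heating"]),
        "price": int(row["price"]),
    }


def read_feed(stream):
    """Yield decoded rows one at a time, so a large file is never
    held in memory all at once."""
    for row in csv.DictReader(stream):
        yield decode_row(row)


if __name__ == "__main__":
    for listing in read_feed(io.StringIO(SAMPLE_FEED)):
        print(listing)
```

Flagging unknown codes instead of silently dropping them makes it easier to spot when the MLS adds a new value to one of its lists.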

Nothing insurmountable, but expect to grind some code if you go this route.
Alternately, you may be able to find a third party service to handle the
listing display entirely, and if your client likes the appearance (you
rarely have choices...), then you can just focus on the rest of the website.

/// M

Nov 19 '05 #5
