screen scraping

Am I correct in assuming that screen scraping just captures the response text
sent to the browser? If so, does that mean the following could not be screen scraped?

function moi() {
  // Assemble the mailto link from fragments so the complete address never
  // appears as one contiguous string in the page source.
  var tag = '<a href=';
  var tagType1 = '"mail' + 'to:', tagType2 = '">', tagType3 = '<\/a>';
  var user1 = 'web', user2 = 'master', user3 = '@';
  var dom1 = 'danger', dom2 = 'ous', dom3 = 'ly';
  var tld = '.us';
  document.write(tag + tagType1 + user1 + user2 + user3 + dom1 + dom2 + dom3 + tld +
      tagType2 + user1 + user2 + user3 + dom1 + dom2 + dom3 + tld + tagType3);
}

--
Roland Hall
/* This information is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of merchantability
or fitness for a particular purpose. */
Technet Script Center - http://www.microsoft.com/technet/scriptcenter/
WSH 5.6 Documentation - http://msdn.microsoft.com/downloads/list/webdev.asp
MSDN Library - http://msdn.microsoft.com/library/default.asp
Jul 22 '05 #1
Screen scraping is a technique, not a format. The technique is to intercept
the raw data (in this case HTML) that would normally be displayed on the
client system's screen and extract data from it. In an ASP context, screen
scraping would typically be done by having a server-side component (such as
XMLHttpRequest) perform a GET or POST to a URL and return the raw HTML as
text. A parser of some kind is then used to extract the desired information.
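As a rough illustration of that parsing step, here is a minimal sketch of a naive harvester in JavaScript for Node.js 18 or later (which provides a global `fetch`). The thread's context is ASP, so this is a swapped-in illustration rather than the component described above; the names `EMAIL_PATTERN`, `extractEmails` and `scrape` are hypothetical, and the pattern is deliberately simple rather than a full address grammar.

```javascript
// Deliberately naive address pattern (an illustration, not an email grammar).
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical helper: pull anything that looks like an address out of raw
// response text, removing duplicates.
function extractEmails(rawHtml) {
  return [...new Set(rawHtml.match(EMAIL_PATTERN) || [])];
}

// Hypothetical helper: GET the page and parse the raw HTML as text, the way a
// server-side scraper would. No script on the page is ever executed.
async function scrape(url) {
  const response = await fetch(url);
  const rawHtml = await response.text();
  return extractEmails(rawHtml);
}
```

Against a page with a plain `mailto:` link, `extractEmails` finds the address. Against the raw source of `moi()` it finds nothing, because the `@` appears only as the quoted fragment `'@'`, never next to the name and domain parts.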

The example you present would be difficult (though not impossible) to
screen-scrape server-side. The parser would have to be able to evaluate the
JavaScript function and inspect its output to get the data. I have seen
references to using the HTML browser component (MSHTML object) to do things
like this, but I don't think it works well server-side.
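To make "evaluate the JavaScript" concrete, here is a minimal sketch that swaps in Node.js's built-in `vm` module for the MSHTML component. It assumes the script needs nothing but a stub `document.write`; a real page would usually need a much fuller DOM. The helper name `evaluateAndCapture` is hypothetical, and `vm` is not a security sandbox, so untrusted code should not be run this way as-is.

```javascript
const vm = require('vm');

// Hypothetical helper: run page script in a fresh context whose only "DOM"
// is a document.write stub, and return everything the script wrote.
// Note: vm is NOT a security boundary; do not use it on untrusted code as-is.
function evaluateAndCapture(scriptSource, entryCall) {
  const written = [];
  const sandbox = {
    document: { write: (...parts) => { written.push(parts.join('')); } },
  };
  // The timeout guards against scripts that never finish.
  vm.runInNewContext(scriptSource + '\n' + entryCall, sandbox, { timeout: 1000 });
  return written.join('');
}

// The obfuscated function from the original post, as a scraper would see it.
const pageScript = `
function moi() {
  var tag = '<a href=';
  var tagType1 = '"mail'+'to:', tagType2 = '">', tagType3 = '<\\/a>';
  var user1 = 'web', user2 = 'master', user3 = '@';
  var dom1 = 'danger', dom2 = 'ous', dom3 = 'ly';
  var tld = '.us';
  document.write(tag+tagType1+user1+user2+user3+dom1+dom2+dom3+tld+tagType2+user1+user2+user3+dom1+dom2+dom3+tld+tagType3);
}`;

const html = evaluateAndCapture(pageScript, 'moi();');
console.log(html);
```

Once evaluated, the output is an ordinary `mailto:` link that any regex-based harvester would match. The catch is cost: the harvester has to execute every script it downloads.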

--
Mark Schupp
Head of Development
Integrity eLearning
www.ielearning.com
Jul 22 '05 #2

Anything can be scraped. If you want to hide an email address, put up a
form and send the email server-side, so that the address can never be
retrieved from the HTML.
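The form approach can be sketched in Node.js using only the built-in `http` module and `URLSearchParams`: the recipient lives in a server-side constant and never appears in any response. `RECIPIENT`, the `/contact` path and the injected `sendMail` function (standing in for whatever SMTP client or mail component you actually use) are all hypothetical, and since the thread is about ASP, treat this as an illustration of the idea rather than a drop-in implementation.

```javascript
const http = require('http');

// Hypothetical recipient: stored only on the server, never sent to the browser.
const RECIPIENT = 'webmaster@example.invalid';

// The page the browser receives contains a form but no address.
function renderContactForm() {
  return [
    '<form method="post" action="/contact">',
    '  <input name="from"> <textarea name="message"></textarea>',
    '  <button type="submit">Send</button>',
    '</form>',
  ].join('\n');
}

// Parse the URL-encoded POST body and pass the message to a mail transport.
// `sendMail` is a hypothetical, caller-supplied function (e.g. an SMTP client).
function handleSubmission(body, sendMail) {
  const fields = new URLSearchParams(body);
  // Strip CR/LF so a visitor cannot inject extra mail headers via "from".
  const from = (fields.get('from') || '').replace(/[\r\n]/g, '');
  return sendMail({ to: RECIPIENT, replyTo: from, text: fields.get('message') || '' });
}

function createServer(sendMail) {
  return http.createServer((req, res) => {
    if (req.method === 'POST' && req.url === '/contact') {
      let body = '';
      req.on('data', (chunk) => { body += chunk; });
      req.on('end', () => {
        // Promise.resolve() lets sendMail be synchronous or async, and turns
        // a thrown error into a rejection we can report.
        Promise.resolve()
          .then(() => handleSubmission(body, sendMail))
          .then(
            () => res.end('Thanks, your message was sent.'),
            () => { res.statusCode = 500; res.end('Sorry, the message could not be sent.'); },
          );
      });
    } else {
      res.setHeader('Content-Type', 'text/html');
      res.end(renderContactForm());
    }
  });
}
```

For example, `createServer(mySendMail).listen(8080)` would serve the form; whatever a scraper downloads from it, there is no address in it to harvest.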

Jul 22 '05 #3
<la**********@yahoo.com> wrote in message
news:11*********************@f14g2000cwb.googlegroups.com...
:
: Anything can be scraped. If you want to hide an email address, put a
: form up and send the email server side so that the email address can
: never be retrieved over HTML.

Hi Larry...

Thanks for responding...

I understand a form is best, but I was looking for a way to defeat the
JavaScript. Surely a spammer is not going to capture every script and
process it in the hope of finding a single email address. A spammer's goal
is to be lazy and get as much as possible with as little effort as possible.
There is no benefit to processing every script they spider with no guarantee
of finding an email address encoded in it somewhere. I see the benefit of
finding one in plain sight, since 99.99% of them will be that way.

I also shouldn't have said "screen" scraped, as it's not really the screen
memory that's being queried but rather the response text. JavaScript's
results don't show anywhere except in the browser. I have not seen a way to
grab those results, although I can think of some possibilities, all of which
appear to be a lot of effort. I just don't see the ROI, but I would welcome
any info on how it is accomplished.

--
Roland Hall
Jul 22 '05 #4
"Mark Schupp" wrote in message news:OC**************@tk2msftngp13.phx.gbl...
: Screen scraping is a technique, not a format.

Hi Mark...

Thanks for responding. I didn't realize I had said it was a format. I should
have said HTML scraping, since it's not really screen scraping like it would
be on a terminal.

: The technique is to intercept
: the raw data (in this case HTML) that would normally be displayed on the
: client system screen and extract data from it. In ASP context screen
: scraping would typically be done by having a server-side component (such as
: xmlhttprequest) perform a get or post to a url and return the raw HTML as
: text. Then a parser of some kind is used to extract the desired information.

Yes, I'm familiar with that process.

: The example you present would be difficult (though not impossible) to
: screen-scrape server-side. The parser would have to be able to evaluate the
: output of the JavaScript function to get the data. I have seen references to
: using the HTML browser component (MSHTML object) to do things like this but
: I don't think it works well server-side.

I have not been able to do it either. I think it may require HTML scraping
the site and then "screen" scraping my page, either by printing it to a text
file and then reloading and parsing that, or by capturing it from my screen
memory, the former being the easier of the two. This would require that the
result look like us**@domain.com instead of user at domain dot com. I think
I'll test the first approach, since so many people suggest using encoded
JavaScript to hide addresses from spammers.
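The distinction between the two forms matters because a simple harvester keys on the `@` sign. A minimal sketch, using a deliberately naive pattern, the hypothetical helper name `looksHarvestable` and placeholder addresses, shows why only the literal form would be picked up from rendered text.

```javascript
// Deliberately naive harvester pattern (an illustration, not an email grammar).
const NAIVE_EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// Hypothetical helper: would a regex-based harvester pick up this text?
function looksHarvestable(renderedText) {
  return NAIVE_EMAIL.test(renderedText);
}

console.log(looksHarvestable('Contact: user@example.com'));        // true
console.log(looksHarvestable('Contact: user at example dot com')); // false
```

A more determined harvester could of course also match spelled-out variants; the sketch only shows what the simplest pattern does.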

--
Roland Hall
Jul 22 '05 #5


Similar topics

Jonathan Epstein: I would like to perform a more classical type of "screen scraping" than what most people now associate with this term. I only want to find all the text on the current screen, and obtain associated...

Me: I am dealing with a poorly written Windows application that does not contain an API. I would like to use C# to run a predetermined set of steps in the application and scrape the resulting data...

Robert Martinez: I've seen a lot about screen scraping with .NET, mostly in VB.NET. I have been able to convert most of it over, but it is still just very basic stuff. Can someone help direct me toward some good...

_eee_: Does anyone know of a simple code module that can do screen scraping, including simulating user-entered pushbuttons, etc.? I can get the first screen on a website with HttpWebRequest, but I need...

Jim Giblin: I need to scrape specific information from another website, specifically the prices of precious metals from several different vendors. While I will credit the vendors as the data source, I do not...

niv: Hello, I would like to screen scrape certain parts of a webpage... how can I do this in ASP.NET? For instance, a stock ticker that's embedded on a webpage. I don't want the entire page. I...

rachel: Hello, I am currently contracted out by a real estate agent. He has a page that he has created himself that has a list of homes, their images and data in HTML format. He wants me to take...

different.engine: Folks: I am screen scraping a large volume of data from Yahoo Finance each evening, and parsing with Beautiful Soup. I was wondering if anyone could give me some pointers on how to make it...

WFDGW2: I want to write or obtain C++ code that will scrape text from a dialog box within a poker client, and then record that text somewhere else. What do I do? Thanks.
