General Web Scraping Question

I've been working on a web scraping program, and have the basics down.

But I don't understand the parameters.
Normally, you go to a URL (say a reverse yellow pages directory), and enter
some parameters (like area code, phone number, etc.) and POST this back to
the web. Then you parse the response, looking for the data you need.

Often I see examples where the data you post contains something like
"AreaCode=503&Number=5551212&x=1&y=2"

Where do the "x=1 and y=2" come from? I have some sites where my post
doesn't work. In one case, you are supposed to enter a contractor's license
number, and then click a button, and the result contains information about
the license. After I post what I think should work, the result coming back
is the same web page, with the contractor's number filled in.

Do the X and Y parameters involve invoking a button? How do you determine
what to use for the parameters?

Thanks in advance for any advice or pointers!
---Selden McCabe
Nov 18 '05 #1
I suspect X and Y are passed by the browser when the user clicks on an image
map. Have you tried passing &x=1&y=1 in your post?
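
One common source of such coordinates is an image submit button (an input element with type="image"): when it is clicked, the browser adds the click position to the form data as name.x and name.y, or as plain x and y when the button has no name. Below is a minimal sketch in Python of building such a POST body. The helper name build_form_body and the button name "go" are made up for illustration; the field names are taken from the example in the question.

```python
from urllib.parse import urlencode


def build_form_body(fields, image_button_name=None, click=(1, 1)):
    """Encode form fields plus the click coordinates of an image submit button.

    image_button_name is the button's name attribute (hypothetical here);
    when it is None the coordinates are sent as plain "x" and "y".
    """
    data = dict(fields)
    x, y = click
    if image_button_name:
        # Named image button: the browser sends <name>.x and <name>.y
        data[f"{image_button_name}.x"] = x
        data[f"{image_button_name}.y"] = y
    else:
        # Unnamed image button: the browser sends x and y
        data["x"] = x
        data["y"] = y
    return urlencode(data)


# Field names and values from the example in the question
body = build_form_body({"AreaCode": "503", "Number": "5551212"}, click=(1, 2))
print(body)  # AreaCode=503&Number=5551212&x=1&y=2
```

The server usually only checks that the coordinates are present, so any small values tend to work, but that depends on the site.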

--
Thanks,

Eric Lawrence
Program Manager
Assistance and Worldwide Services

This posting is provided "AS IS" with no warranties, and confers no rights.
Nov 18 '05 #2
The x and y parameters could be hidden fields used by the web application to
store session state on the client. In general, it's not easy to implement web
scraping for "foreign" web applications where you don't have access to the
code or at least some inside knowledge.
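
If hidden fields are the cause, one way to handle them is to fetch the form page first, collect every hidden input, and send those values back along with your own. The following is a rough sketch using only the Python standard library. The sample page and the names SessionToken, LicenseNumber, and go are invented for illustration; a real site will use its own names, so inspect the page source to see what it expects.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_post_body(page_html, user_fields):
    """Merge the page's hidden fields with the values the user would type."""
    collector = HiddenFieldCollector()
    collector.feed(page_html)
    data = dict(collector.fields)
    data.update(user_fields)  # user-supplied values win on a name clash
    return urlencode(data)


# Hypothetical form page; every name below is an assumption
SAMPLE_PAGE = """
<form method="post" action="lookup">
  <input type="hidden" name="SessionToken" value="abc123">
  <input type="text" name="LicenseNumber">
  <input type="image" name="go" src="go.gif">
</form>
"""

body = build_post_body(
    SAMPLE_PAGE, {"LicenseNumber": "12345", "go.x": 1, "go.y": 1}
)
print(body)  # SessionToken=abc123&LicenseNumber=12345&go.x=1&go.y=1
```

Getting the same page back with your value filled in often means the server did not see a field it expects, so comparing your POST body against what a real browser sends is a good first check.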

Cheers,

--
Joerg Jooss
jo*********@gmx.net

Nov 18 '05 #3
