help!! *extra* tricky web page to extract data from...

How can I extract the visible numerical data from this Microsoft financial
web site?

http://tinyurl.com/yw2w4h

If you simply download the HTML file you'll see the data is *not*
embedded in it but loaded from some other file.

Surely if I can see the data in my browser I can grab it somehow right
in a Python script?

Any help greatly appreciated.

Sincerely,

Chris

Mar 13 '07
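
A rough sketch of one possible starting point, assuming the numbers come from a frame or script that the page's HTML references by URL (the page behind the tinyurl link has not been inspected here, so that is only an assumption). The helper names `SourceFinder` and `find_sources` and the sample markup are made up for illustration; only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SourceFinder(HTMLParser):
    """Collect URLs of external resources a page pulls in (hypothetical helper)."""

    # Tags whose src attribute commonly points at separately loaded content.
    TAGS = {"script", "iframe", "frame"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the page's own URL.
                self.sources.append(urljoin(self.base_url, src))


def find_sources(html, base_url):
    """Return the absolute URLs of scripts and frames referenced by the HTML."""
    finder = SourceFinder(base_url)
    finder.feed(html)
    return finder.sources


# Made-up page for illustration; a real page would be downloaded first,
# for example with urllib.request.urlopen(url).read().
SAMPLE = """
<html><body>
<script src="/js/quotes.js"></script>
<iframe src="data/table.html"></iframe>
<p>No numbers here.</p>
</body></html>
"""

print(find_sources(SAMPLE, "http://example.com/finance/page.html"))
```

Each URL it reports can then be downloaded and searched for the figures. If the data is instead requested by JavaScript while the page runs, it will not show up in any `src` attribute, and this approach alone will not find it.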
Paul Rubin wrote:
> "Diez B. Roggisch" <de***@nospam.web.de> writes:
>> Obviously this wouldn't really help, as you can't predict which
>> events a website actually wants, or in which order. Especially if
>> the site does not _want_ to be scrapable: think of a simple "click
>> on the images in the order of the numbers shown on them" captcha.
>
> Sure, but most sites don't go to such lengths, and even captchas can
> be defeated if you're trying to scrape a specific site and are willing
> to spend effort on the particular captcha generator that it uses.
> Plus there is always www.captchasolver.com (!).

I especially like the terms and conditions they ask you to acknowledge if
you want to sign up as a worker:

http://www.captchasolver.com/join/worker#

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note: http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007

Mar 14 '07 #11
Steve Holden <st***@holdenweb.com> writes:
> I especially like the terms and conditions they ask you to acknowledge
> if you want to sign up as a worker:
> http://www.captchasolver.com/join/worker#

Heh, cute, I guess you have to solve a different type of puzzle to
read them.

I'm surprised anyone is purporting to pay actual money for captcha
solutions. The usual scheme I've heard of (dunno if anyone actually does
it) is to feed the captchas you want to solve into a porn site, so
people give you solutions in order to keep viewing porn. You then
funnel the solutions back to the forms you're actually trying to
automate.

I think captchas are proving reasonably effective as a speed bump but
they do get defeated all the time, whether through automatic means or
otherwise.
Mar 14 '07 #12
