How do I extract the visible numerical data from this Microsoft financial
web site? http://tinyurl.com/yw2w4h
If you simply download the HTML file you'll see the data is *not*
embedded in it but loaded from some other file.
Surely if I can see the data in my browser I can grab it somehow right
in a Python script?
Any help greatly appreciated.
Sincerely,
Chris
se******@spawar.navy.mil wrote:
> How do I extract the visible numerical data from this Microsoft financial
> web site?
> http://tinyurl.com/yw2w4h
> If you simply download the HTML file you'll see the data is *not*
> embedded in it but loaded from some other file.
> Surely if I can see the data in my browser I can grab it somehow right
> in a Python script?
> Any help greatly appreciated.
It's an AJAX site. You have to analyze it carefully and see what
actually happens in the JavaScript, then use that. Maybe something like
the HTTP header plugin for Firefox will help you there.
Diez
"se******@spawar.navy.mil" <se******@spawar.navy.mil> wrote:
> How do I extract the visible numerical data from this Microsoft
> financial web site?
> http://tinyurl.com/yw2w4h
> If you simply download the HTML file you'll see the data is *not*
> embedded in it but loaded from some other file.
> Surely if I can see the data in my browser I can grab it somehow
> right in a Python script?
> Any help greatly appreciated.
> Sincerely,
> Chris
The URL for the data is in an iframe. If you need to scrape the
original page for some reason (instead of using the iframe URL directly),
you can use urlparse.urljoin to resolve the relative URL.
max
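A minimal sketch of that suggestion, using only the standard library. Note that urlparse.urljoin is the Python 2 name; in Python 3 the same function lives at urllib.parse.urljoin. The page address and markup below are made up for illustration and are not the real site's.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Collect the src attribute of every <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_url, html):
    """Return absolute URLs for all iframes found in html."""
    finder = IframeFinder()
    finder.feed(html)
    # urljoin resolves each (possibly relative) src against the page URL.
    return [urljoin(page_url, src) for src in finder.sources]


# Hypothetical page address and markup, for illustration only.
PAGE_URL = "http://example.com/investor/quotes/page.aspx"
SAMPLE_HTML = (
    "<html><body>"
    '<iframe src="../data/financials.aspx?symbol=XYZ"></iframe>'
    "</body></html>"
)

print(iframe_urls(PAGE_URL, SAMPLE_HTML))
```

The resulting absolute URL can then be downloaded like any other page.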
> It's an AJAX-site. You have to carefully analyze it and see what
> actually happens in the javascript, then use that. Maybe something like
> the http header plugin for firefox helps you there.
Oops, obviously I wasn't looking closely enough at the site. Sorry for the confusion.
Still, some pages are AJAX, and you won't be able to scrape them easily
without analyzing the JS code.
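When the data turns out to sit in the page's script as a literal, one way to get at it without running any JavaScript is sketched below. It assumes the literal happens to be valid JSON (double-quoted keys and strings), which is often but not always the case; the variable name and sample script are hypothetical.

```python
import json
import re


def extract_js_object(script_text, var_name):
    """Return the JSON-compatible literal assigned to var_name, or None."""
    pattern = r"\bvar\s+" + re.escape(var_name) + r"\s*=\s*"
    match = re.search(pattern, script_text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever JavaScript follows it.
    value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return value


# Hypothetical inline script, for illustration only.
SAMPLE_SCRIPT = (
    'var quoteData = {"revenue": [1200, 950], "unit": "USD millions"}; '
    "render(quoteData);"
)

data = extract_js_object(SAMPLE_SCRIPT, "quoteData")
print(data["revenue"])
```

If the literal is not valid JSON, raw_decode raises json.JSONDecodeError, and you are back to reading the script by hand.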
Diez
"Diez B. Roggisch" <de***@nospam.web.de> writes:
> Still, some pages are AJAX, you won't be able to scrape them easily
> without analyzing the JS code.
Sooner or later it would be great to have a JS interpreter written in
Python for this purpose. It would do all the same operations on an
HTML/XML DOM that a browser does, basically all the stuff of a browser
except rendering into pixels. JS semantics are similar enough to
Python that maybe the JS could be compiled into Python byte code.
Paul Rubin wrote:
> "Diez B. Roggisch" <de***@nospam.web.de> writes:
>> Still, some pages are AJAX, you won't be able to scrape them easily
>> without analyzing the JS code.
> Sooner or later it would be great to have a JS interpreter written in
> Python for this purpose. It would do all the same operations on an
> HTML/XML DOM that a browser does, basically all the stuff of a browser
> except rendering into pixels. JS semantics are similar enough to
> Python that maybe the JS could be compiled into Python byte code.
Nice idea, but not really helpful in the end. Besides the rather nasty
parts of the DOMs that make JS programming the PITA it is, I think the
whole event-based stuff makes this basically impossible.
Diez
"Diez B. Roggisch" <de***@nospam.web.de> writes:
> Nice idea, but not really helpful in the end. Besides the rather nasty
> parts of the DOMs that make JS programming the PITA it is, I think the
> whole event-based stuff makes this basically impossible.
Obviously the Python interface would need ways to send events into the
DOM, simulating timer ticks, mouse clicks, and so forth, just like
urllib in a sense simulates a user navigating a browser.
Paul Rubin wrote:
> "Diez B. Roggisch" <de***@nospam.web.de> writes:
>> Nice idea, but not really helpful in the end. Besides the rather nasty
>> parts of the DOMs that make JS programming the PITA it is, I think the
>> whole event-based stuff makes this basically impossible.
> Obviously the Python interface would need ways to send events into the
> DOM, simulating timer ticks, mouse clicks, and so forth, just like
> urllib in a sense simulates a user navigating a browser.
Obviously this wouldn't really help, as you can't predict which events a
website actually wants, or in which order. Especially if the site does
not _want_ to be scrapable: think of a simple "click on the
images in the order of the numbers shown on them" captcha.
Most of the time it's easier to sniff the HTTP stream and grab the data directly.
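Once a sniffer has shown which URL the browser really fetches, requesting it directly could look roughly like this. The URLs and header values are placeholders, and whether a given site needs a Referer or User-Agent at all is an assumption you have to check against the captured traffic.

```python
import urllib.request


def build_data_request(data_url, referer):
    """Build a request for the data URL that resembles the browser's own."""
    return urllib.request.Request(
        data_url,
        headers={
            "Referer": referer,
            "User-Agent": "Mozilla/5.0 (compatible; example-scraper)",
        },
    )


def fetch_text(request, timeout=10):
    """Send the request and decode the body (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical URLs: substitute whatever the sniffer actually shows.
request = build_data_request(
    "http://example.com/data/financials.aspx?symbol=XYZ",
    referer="http://example.com/investor/page.aspx",
)
print(request.get_method(), request.full_url)
# text = fetch_text(request)  # uncomment to perform the real download
```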
Diez
"Diez B. Roggisch" <de***@nospam.web.de> writes:
> Obviously this wouldn't really help, as you can't predict which
> events a website actually wants, or in which order.
> Especially if the site does not _want_ to be scrapable: think
> of a simple "click on the images in the order of the numbers shown on
> them" captcha.
Sure, but most sites don't go to such lengths, and even captchas can
be defeated if you're trying to scrape a specific site and are willing
to spend effort on the particular captcha generator that it uses.
Plus there is always www.captchasolver.com (!).
> Most of the time it's easier to sniff the HTTP stream and grab the data directly.
Certainly true, but there are times when you have to pull stuff out of
the JS. It's usually possible to do that without actually
interpreting the JS, but an interpreter would make it a lot more
convenient some of the time.
se******@spawar.navy.mil wrote:
> How do I extract the visible numerical data from this Microsoft financial
> web site?
> http://tinyurl.com/yw2w4h
> If you simply download the HTML file you'll see the data is *not*
> embedded in it but loaded from some other file.
> Surely if I can see the data in my browser I can grab it somehow right
> in a Python script?
> Any help greatly appreciated.
Been there, done that, years ago. Try this: http://www.downside.com/cgi/testfina...-06-034196.txt
That will get you the data you're looking for.
If you want to try other companies, start at the query box on
"http://www.downside.com".
The data is actually coming from the United States Securities and Exchange
Commission's EDGAR web site, where companies are required to file their
financial statements. The filings are intended to be read by humans, but
it's possible to parse many filings mechanically. They're supposed to be
in HTML 3.2, but this isn't enforced.
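As a rough illustration of parsing such filings mechanically, the sketch below pulls rows out of a sloppy table with unclosed cells and converts the figures. It is not the parser described here; the sample markup and numbers are invented, and the assumption that parentheses mark negative amounts follows common accounting practice.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect table cell text row by row, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def _close_cell(self):
        if self._cell is not None:
            # Join text chunks and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # a new <tr> implicitly ends the previous row
            self._row = []
        elif tag in ("td", "th"):
            if self._row is None:
                self._row = []
            self._close_cell()  # a new cell implicitly ends the previous one
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def to_number(text):
    """Convert '1,200' to 1200.0 and '(25)' to -25.0; None if not numeric."""
    cleaned = text.replace(",", "").replace("$", "").strip()
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    try:
        value = float(cleaned)
    except ValueError:
        return None
    return -value if negative else value


# Invented sample with missing </TD> and </TR> tags, for illustration only.
SAMPLE = """<TABLE>
<TR><TD>Total revenue<TD align=right>1,200<TD align=right>950
<TR><TD>Net income<TD align=right>300</TD><TD align=right>(25)
</TABLE>"""

parser = TableRows()
parser.feed(SAMPLE)
for label, *figures in parser.rows:
    print(label, [to_number(f) for f in figures])
```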
There are many EDGAR parsers, some better than ours. To do a really good one,
you have to license a patent from Price Waterhouse. Try
"http://www.10kwizard.com/", which has an API for retrieving this info.
It's not free.
John Nagle