How do I extract data from an HTML file? - web scraping

Hi guys, I'm a little unfamiliar with Python. Hope you can take a look at this:

I'm trying to extract the number 7.2 from the HTML string below using Python:
    '''<a href="/ratings_explained">weighted average</a> vote of <a href="/List?ratings=7">7.2</a> / 10</p><p>'''
I thought this code would do it, but why doesn't it work?
    averageget = re.compile('<a href="/List?ratings=7">(.*?)</a>')
    average = averageget.findall(htmlr)
Could it be that there are some special structures in the HTML file that I missed again?
Dec 23 '09 #1
bvdet
Please use code tags when posting code.

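The question mark (?) is a special character in regular expression patterns: it makes the preceding item optional. In your pattern, "t?" therefore matches an optional "t", so the regex looks for "/Lisratings" or "/Listratings" and never matches the literal "?" in the HTML. To match the literal character, precede it with a backslash.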

    # Raw string (r'...') so the backslash reaches the regex engine unchanged.
    averageget = re.compile(r'<a href="/List\?ratings=7">(.*?)</a>')
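As a rough, self-contained sketch of the point above, the snippet below runs the unescaped pattern from the question next to the escaped one against the sample string. The names broken, fixed, prefix and escaped are only illustrative, and the re.escape() variant is an alternative added here, not something from the original posts.

```python
import re

# Sample input copied from the question.
htmlr = '''<a href="/ratings_explained">weighted average</a> vote of <a href="/List?ratings=7">7.2</a> / 10</p><p>'''

# Unescaped: "t?" means "an optional t", so the literal "?" is never matched.
broken = re.compile('<a href="/List?ratings=7">(.*?)</a>')

# Escaped by hand; the raw string passes the backslash through to re.
fixed = re.compile(r'<a href="/List\?ratings=7">(.*?)</a>')

# Alternative (an addition for illustration): let re.escape() quote the
# literal part of the pattern so no special character is overlooked.
prefix = re.escape('<a href="/List?ratings=7">')
escaped = re.compile(prefix + r'(.*?)</a>')

print(broken.findall(htmlr))   # []
print(fixed.findall(htmlr))    # ['7.2']
print(escaped.findall(htmlr))  # ['7.2']
```

Note that findall() returns a list of strings, so the rating itself would be average[0], and float(average[0]) if a number is needed.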
Dec 26 '09 #2
