BeautifulSoup: problems with parsing a website

Similar topics

by: Steve Young | last post by:

I tried using BeautifulSoup to make changes to the url links on html pages, but when the page was displayed, it was garbled up and didn't look right (even when I didn't actually change anything on the page yet). I ran these steps in python to see what was up: >>from BeautifulSoup import BeautifulSoup >>from urllib2 import build_opener, Request >> >>req = Request('http://www.python.org/')

Python

8611

scraping nested tables with BeautifulSoup

by: Gonzillaaa | last post by:

I'm trying to get the data on the "Central London Property Price Guide" box at the left hand side of this page http://www.findaproperty.com/regi0018.html I have managed to get the data :) but when I start looking for tables I only get tables of depth 1 how do I go about accessing inner tables? same happens for links... this is what I've go so far

Python

4634

BeautifulSoup vs. loose & chars

by: John Nagle | last post by:

I've been parsing existing HTML with BeautifulSoup, and occasionally hit content which has something like "Design & Advertising", that is, an "&" instead of an "&". Is there some way I can get BeautifulSoup to clean those up? There are various parsing options related to "&" handling, but none of them seem to do quite the right thing. If I write the BeautifulSoup parse tree back out with "prettify", the loose "&" is still in there. So...

Python

2664

BeautifulSoup bug when ">>>" found in attribute value

by: John Nagle | last post by:

This, which is from a real web site, went into BeautifulSoup: <param name="movie" value="/images/offersBanners/sw04.swf?binfot=We offer fantastic rates for selected weeks or days!!&blinkt=Click here And this came out, via prettify: <addresssnippet siteurl="http%3A//apartmentsapart.com" url="http%3A//www.apartmentsapart.com/Europe/Spain/Madrid/FAQ"> <param name="movie" value="/images/offersBanners/sw04.swf?binfot=We offer

Python

1774

"Subscribing" to topics?

by: Mizipzor | last post by:

Is there a way to "subscribe" to individual topics? im currently getting bombarded with daily digests and i wish to only receive a mail when there is activity in a topic that interests me. Can this be done? Thanks in advance.

Python

1893

BeautifulSoup vs. Microsoft

by: John Nagle | last post by:

Here's a construct with which BeautifulSoup has problems. It's from "http://support.microsoft.com/contactussupport/?ws=support". This is the original: <a href="http://www.microsoft.com/usability/enroll.mspx" id="L_75998" title="<!--http://www.microsoft.com/usability/information.mspx->" onclick="return MS_HandleClick(this,'C_32179', true);">

Python

3185

BeautifulSoup vs. real-world HTML comments

by: John Nagle | last post by:

The syntax that browsers understand as HTML comments is much less restrictive than what BeautifulSoup understands. I keep running into sites with formally incorrect HTML comments which are parsed happily by browsers. Here's yet another example, this one from "http://www.webdirectory.com". The page starts like this: <!Hello there! Welcome to The Environment Directory!> <!Not too much exciting HTML code here but it does the job! >...

Python

2022

Re: Where to get BeautifulSoup--www.crummy.com appears to be down.

by: John Nagle | last post by:

Mike Driscoll wrote: What on earth do you need a "Windows binary" for? "BeautifulSoup" is ONE PYTHON SOURCE FILE, "BeautifulSoup.py". It can be downloaded here: http://www.crummy.com/software/BeautifulSoup/download/BeautifulSoup.py And yes, the site is up.

Python

5598

simple Question about using BeautifulSoup

by: Alexnb | last post by:

Okay, I have used BeautifulSoup a lot lately, but I am wondering, how do you open a local html file? Usually I do something like this for a url soup = BeautifulSoup(urllib.urlopen('http://www.website.com') but the file extension doesn't work. So how do I open one? -- View this message in context: http://www.nabble.com/simple-Question-about-using-BeautifulSoup-tp19069980p19069980.html

Python

9607

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

10398

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...

Online Marketing

9215

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...

Career Advice

7676

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...

Microsoft Access / VBA

6897

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5567

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

5702

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

3881

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

3028

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General