Bytes IT Community

Help working Beautiful Soup into a Python script

Hello, I have a Python script that uses Beautiful Soup, but when I run it, the Beautiful Soup step fails. Could someone explain what I did wrong?

Screenshot of the error message: (image not preserved)

Oct 7 '10 #1


bvdet
Expert Mod 2.5K+
According to the error message, there is an invalid tag on line 2645 of the HTML you are trying to parse. I have never used Beautiful Soup, but according to its documentation you may be able to fix the HTML before it is parsed by passing the BeautifulSoup constructor a markupMassage argument. See the Beautiful Soup documentation for details.
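Beautiful Soup 3's markupMassage argument takes a list of (compiled regular expression, replacement function) pairs and applies them to the markup before parsing; Beautiful Soup 4 ignores that argument (it only warns), so running the same kind of cleanup yourself is the portable route. The sketch below assumes that pair shape. The names massage_markup and EXAMPLE_MASSAGE, and both rules, are hypothetical illustrations, since the actual error message is not available here.

```python
import re
from typing import Callable, List, Tuple

# A "massage" is a list of (compiled pattern, replacement function) pairs,
# the shape Beautiful Soup 3 accepted for markupMassage (assumed here).
Massage = List[Tuple[re.Pattern, Callable[[re.Match], str]]]

# Hypothetical example rules; write your own for the tag your error points at.
EXAMPLE_MASSAGE: Massage = [
    # "<! DOCTYPE html>" -> "<!DOCTYPE html>": drop whitespace after "<!".
    (re.compile(r"<!\s+([^<>]*)>"), lambda m: "<!" + m.group(1) + ">"),
    # "<br/>" -> "<br />": add a space before a self-closing slash.
    (re.compile(r"(<[^<>]*)/>"), lambda m: m.group(1) + " />"),
]


def massage_markup(markup: str, massage: Massage) -> str:
    """Apply each (pattern, fix) pair in order before the markup is parsed."""
    for pattern, fix in massage:
        markup = pattern.sub(fix, markup)
    return markup
```

You would then parse the cleaned string, e.g. BeautifulSoup(massage_markup(html, EXAMPLE_MASSAGE)), with rules written for the specific tag your own traceback reports.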
Oct 8 '10 #2

A common JavaScript pattern is to insert elements directly into the DOM. As a result, you will encounter many pages where an "improperly" coded script element (that is, one written without a CDATA section, a rare habit and one that I'm completely against) makes the parser grind to a screeching halt. The fix is simple: apply the following filter to your source string:

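  import re

  # Match whole <script ...>...</script> elements, including empty ones,
  # across line breaks and regardless of tag case.
  re_script = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.DOTALL | re.IGNORECASE)
  out = re_script.sub("", source)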
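This removes every script element, both the tags and their contents, from the source string; pass the result (out) to Beautiful Soup instead of the original source.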
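To check how a script-stripping filter behaves on an empty external script and on an upper-case tag, here is a small self-contained sketch. The names strip_scripts, SCRIPT_RE and the sample page are hypothetical. It uses a non-greedy `*?` body: with `+?`, an empty element such as `<script src="a.js"></script>` cannot match on its own, so the match runs on to a later `</script>` and deletes the ordinary markup in between.

```python
import re

# Non-greedy body (.*?) so each match stops at the nearest closing tag;
# "*" rather than "+" so an empty <script src="..."></script> matches by itself.
# DOTALL lets "." cross newlines; IGNORECASE also catches <SCRIPT>.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.DOTALL | re.IGNORECASE)


def strip_scripts(source: str) -> str:
    """Remove every <script>...</script> element, tags and contents."""
    return SCRIPT_RE.sub("", source)


page = (
    '<p>before</p><script src="a.js"></script>'
    "<p>keep me</p>\n<SCRIPT>\ndocument.write('<b>hi</b>');\n</SCRIPT><p>after</p>"
)
print(strip_scripts(page))
```

A regex is still only a heuristic for HTML (for example, a literal `</script>` inside a JavaScript string would end the match early), but it is usually enough to get a page past the parser.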
Oct 10 '10 #3
