Bytes IT Community

Some <head> clauses cause BeautifulSoup to choke?

I've got a simple script that looks like this:
---------------------------------------------------
import BeautifulSoup, urllib

# The URL is split across two string literals only to keep the lines short.
url = ("http://www.naco.faa.gov/digital_tpp_search.asp?fldIdent=klax"
       "&fld_ident_type=ICAO&ver=0711&bnSubmit=Complete+Search")
ifile = urllib.urlopen(url).read()

soup = BeautifulSoup.BeautifulSoup(ifile)
print soup.prettify()
----------------------------------------------------

and all I get out of it is garbage. Other similar URLs from the same site
work fine (use
http://www.naco.faa.gov/digital_tpp_search.asp?fldIdent=klax&fld_ident_type=ICAO&ver=0711&bnSubmit=Complete+Search
as one example).

I did some poking and prodding and it seems that there is something in the
<head> clause that is causing the problem. Heck if I can see what it is.

I'm new to BeautifulSoup (heck, I'm new to Python). If I'm doing something
dumb, you don't need to be gentle.

--
Frank Stutzman
Nov 19 '07 #1
2 Replies


On Nov 19, 2007 1:36 PM, Frank Stutzman <st******@skywagon.kjsl.com> wrote:
> I've got a simple script that looks like this:
> ---------------------------------------------------
> import BeautifulSoup, urllib
>
> url = ("http://www.naco.faa.gov/digital_tpp_search.asp?fldIdent=klax"
>        "&fld_ident_type=ICAO&ver=0711&bnSubmit=Complete+Search")
> ifile = urllib.urlopen(url).read()
>
> soup = BeautifulSoup.BeautifulSoup(ifile)
> print soup.prettify()
> ----------------------------------------------------
>
> and all I get out of it is garbage. Other similar URLs from the same site
> work fine (use
> http://www.naco.faa.gov/digital_tpp_search.asp?fldIdent=klax&fld_ident_type=ICAO&ver=0711&bnSubmit=Complete+Search
> as one example).
>
> I did some poking and prodding and it seems that there is something in the
> <head> clause that is causing the problem. Heck if I can see what it is.
>
> I'm new to BeautifulSoup (heck, I'm new to Python). If I'm doing something
> dumb, you don't need to be gentle.

You have the same URL as both your good and bad example.
Nov 19 '07 #2

Frank Stutzman <st******@skywagon.kjsl.com> wrote:
> I did some poking and prodding and it seems that there is something in
> the <head> clause that is causing the problem. Heck if I can see what
> it is.

Maybe BeautifulSoup believes the incorrect encoding in the meta tags?
Nov 19 '07 #3
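
One way to test the encoding guess from reply #3 is to compare the charset the page declares in its <meta> tag with what the raw bytes will actually decode as. The following is only a rough sketch, written for Python 3 with the standard library (the script in the thread is Python 2). The helper names and the sample page are made up for illustration, and this is not how BeautifulSoup itself detects encodings.

```python
import re

# Hypothetical helpers for illustration; not BeautifulSoup's own logic.
# Finds "charset=..." inside the first <meta ...> tag that carries one.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def declared_charset(html_bytes):
    """Return the charset named in a <meta> tag (lowercased), or None."""
    match = META_CHARSET.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None


def decodes_cleanly(html_bytes, encoding):
    """Return True if the bytes decode under `encoding` without errors."""
    try:
        html_bytes.decode(encoding)
        return True
    except (UnicodeDecodeError, LookupError):
        # LookupError covers a declared charset that Python does not know.
        return False


# Made-up sample: the page claims UTF-8 but contains a Latin-1 byte (0xE9).
sample = (b'<html><head><meta http-equiv="Content-Type" '
          b'content="text/html; charset=utf-8"></head>'
          b'<body>caf\xe9</body></html>')

charset = declared_charset(sample)
print(charset, decodes_cleanly(sample, charset))  # utf-8 False
print(decodes_cleanly(sample, "latin-1"))         # True
```

If the declared charset turns out not to match the bytes, the parser can be told which encoding to use instead of trusting the tag: BeautifulSoup 3 takes a fromEncoding keyword argument in its constructor, and the later bs4 package spells it from_encoding. Whether that is really what goes wrong with the FAA page is not confirmed anywhere in this thread.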
