
How to parse a name out of a web page?

with high accuracy...

My temporary plan is to first recognize two or three consecutive
initial-capitalized words, but surely we need to do more than that?
Does anyone have suggestions?

Thanks first.
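
The plan described above can be sketched with a regular expression. This is only a minimal illustration, not a reliable name detector: the pattern, the helper name candidate_names, and the sample sentence are all assumptions made for the example, and the text is assumed to have been stripped of HTML tags already.

```python
import re

# One capitalized word: an uppercase letter followed by lowercase letters.
# Simplifying assumption: this misses names such as "O'Brien" or "van Urk".
_CAP_WORD = r"[A-Z][a-z]+"

# Two or three capitalized words separated by single spaces.
_NAME_RE = re.compile(rf"\b{_CAP_WORD}(?: {_CAP_WORD}){{1,2}}\b")


def candidate_names(text):
    """Return runs of two or three consecutive initial-capitalized words."""
    return _NAME_RE.findall(text)


sample = "The report was written by Haibao Tang and reviewed by Andrew Gwozdziewycz."
print(candidate_names(sample))  # ['Haibao Tang', 'Andrew Gwozdziewycz']
```

The heuristic over-matches easily: a capitalized sentence opener next to a name ("Contact Haibao Tang") or a place name ("New York") is returned as well, which is why the candidates would need further filtering.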

Apr 5 '06 #1

Haibao Tang wrote:
with high accuracy...

My temporary plan is to first recognize two or three consecutive
initial-capitalized words, but surely we need to do more than that?
Does anyone have suggestions?

Thanks first.


It's not easy to say without seeing the HTML. If the structure
allows it, a couple of str.split() calls is probably the easiest way;
otherwise, there is always BeautifulSoup.

http://www.crummy.com/software/BeautifulSoup/
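
As a rough illustration of the str.split() route, the sketch below pulls the text between two known markers. The HTML fragment, the "author" class, and the helper name text_between are hypothetical; it only works when the page structure is regular enough that the markers are predictable.

```python
# Hypothetical page fragment; the markup and the "author" class are assumptions.
html = '<div class="post"><span class="author">Haibao Tang</span> wrote:</div>'


def text_between(source, start, end):
    """Return the text between the first `start` marker and the next `end` marker.

    Returns None when either marker is missing.
    """
    parts = source.split(start, 1)
    if len(parts) < 2:
        return None  # start marker not found
    value_parts = parts[1].split(end, 1)
    if len(value_parts) < 2:
        return None  # end marker not found after the start marker
    return value_parts[0].strip()


print(text_between(html, '<span class="author">', '</span>'))  # Haibao Tang
```

When the markup varies from page to page, fixed markers like these break quickly, and a real HTML parser such as BeautifulSoup is the safer choice.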

Apr 5 '06 #2

On Apr 5, 2006, at 4:50 PM, Haibao Tang wrote:
with high accuracy...

My temporary plan is to first recognize two or three consecutive
initial-capitalized words, but surely we need to do more than that?
Does anyone have suggestions?

Thanks first.


Surely this is a task for NLTK (http://nltk.sourceforge.net/),
especially if you want high accuracy.
---
Andrew Gwozdziewycz
ap****@gmail.com
http://ihadagreatview.org
http://and.rovir.us
Apr 5 '06 #3
