Beautiful parse joy - Oh what fun

Hi all,

I am trying to parse a table into a dictionary and I am having all
kinds of fun. Can someone please help me out?

What I want is this:

dic={'Division Code':'SALS','Employee':'LOO ABLE'}

Here is what I have:

html="""<table> <tr valign="top"><td width="24"><img
src="/icons/ecblank.gif" border="0" height="1" width="1" alt=""
/></td><td width="129"><b> <font size="2" face="Arial">Division Code:
</font></b></td><td width="693"><font size="2"
face="Arial">SALS</font></td></tr> <tr valign="top"><td width="24"><img
src="/icons/ecblank.gif" border="0" height="1" width="1" alt="" /> <td
width="129"><b> <font size="2" face="Arial">Employee:
</font></b></td> <td width="693"><font size="2"
face="Arial">LOO</font><b><font size="2" face="Arial"> </font></b><font
size="2" face="Arial">ABLE</font></td></tr></table> """
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup()
soup.feed(html)

dic={}
for row in soup('table')[0]('tr'):
    column = row('td')
    print column[1].findNext('font').string.strip(), column[2].findNext('font').string.strip()
    dic[column[1].findNext('font').string.strip()] = column[2].findNext('font').string.strip()

for key in dic.keys():
    print key, dic[key]

The problem is that I am missing the last name, ABLE. How can I get "ALL"
of the text? Clearly I have something wrong with my font string, but
I am not sure what.

Please and thanks!!

May 16 '06 #1

In reply to rh0dium:

In the last row you have 3 <font> tags. The first one
contains LOO, the second one is empty (it holds only a space),
and the third one contains ABLE.

<td width="693"><font size="2" face="Arial">LOO</font><b>
<font size="2" face="Arial"> </font></b>
<font size="2" face="Arial">ABLE</font></td>

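Your code only reads the first <font> tag after each cell
(findNext('font') returns a single tag), so it never sees the
second (empty) tag or the third one that holds ABLE.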

-Larry Bates
May 16 '06 #2
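To make Larry Bates's point concrete, here is a minimal sketch that lists the text inside every <font> tag of that one cell instead of only the first. It uses Python 3's standard-library html.parser rather than BeautifulSoup (an assumption made so the example runs without third-party packages), and the class name FontTextCollector is hypothetical.

```python
from html.parser import HTMLParser


class FontTextCollector(HTMLParser):
    """Collect the text content of every <font> element, in document order."""

    def __init__(self):
        super().__init__()
        self.runs = []          # one entry per <font> element
        self._in_font = False

    def handle_starttag(self, tag, attrs):
        if tag == 'font':
            self._in_font = True
            self.runs.append('')

    def handle_endtag(self, tag):
        if tag == 'font':
            self._in_font = False

    def handle_data(self, data):
        # Only text that sits directly inside a <font> element is recorded.
        if self._in_font:
            self.runs[-1] += data


# The value cell of the "Employee" row from the original post.
cell = ('<td width="693"><font size="2" face="Arial">LOO</font><b>'
        '<font size="2" face="Arial"> </font></b>'
        '<font size="2" face="Arial">ABLE</font></td>')

collector = FontTextCollector()
collector.feed(cell)
collector.close()
print(collector.runs)  # ['LOO', ' ', 'ABLE']

# Skip the whitespace-only run from the empty tag, then join the rest.
full_name = ' '.join(r.strip() for r in collector.runs if r.strip())
print(full_name)  # LOO ABLE
```

Reading all three runs, and dropping the blank one, is what recovers ABLE.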
KvS
Maybe a more robust approach is just to walk through the string with
a counter that goes up at each "<" and down at each ">". All the
relevant text occurs right after a ">" that brings the counter back
to 0 (meaning you're at the "highest level", outside any tag).
There's no relevant text if the next character is again a "<".

May 17 '06 #3
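KvS's bracket-counting idea can be sketched roughly as follows in Python 3. This assumes no attribute value contains a literal "<" or ">" (a quoted ">" would throw the counter off); the function name text_fragments and the trimmed sample row are hypothetical, cut down from the original post's table.

```python
def text_fragments(markup):
    """Return the non-blank text runs that sit outside any <...> tag."""
    fragments = []
    depth = 0        # > 0 while we are inside a tag
    current = []     # characters of the text run being collected
    for ch in markup:
        if ch == '<':
            # A new tag starts: close off any text collected at depth 0.
            if depth == 0 and current:
                fragments.append(''.join(current))
                current = []
            depth += 1
        elif ch == '>':
            if depth > 0:
                depth -= 1
        elif depth == 0:
            current.append(ch)
    if current:
        fragments.append(''.join(current))
    # Drop whitespace-only runs such as the space in the empty <font>.
    return [f.strip() for f in fragments if f.strip()]


sample = ('<tr><td><b> <font face="Arial">Employee:\n</font></b></td>'
          '<td><font face="Arial">LOO</font><b><font face="Arial"> </font></b>'
          '<font face="Arial">ABLE</font></td></tr>')

print(text_fragments(sample))  # ['Employee:', 'LOO', 'ABLE']
```

The result is a flat list: the counter shows where text sits outside tags, but not which cell it came from, so pairing "Employee:" with "LOO ABLE" still needs extra logic.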
Here's one way to do it:

import re
_any_re = re.compile('.+')

d = {}
for row in BeautifulSoup(html).fetch('tr'):
    columns = row.fetch('td')
    field = columns[1].firstText(_any_re).rstrip(' \t\n:')
    value = ' '.join(text.rstrip()
                     for text in columns[2].fetchText(_any_re))
    d[field] = value
print d

George

May 17 '06 #4
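For Python 3 readers without BeautifulSoup 3 installed, the idea behind George's answer (take the first non-blank text of the label cell, join all non-blank text of the value cell) can be sketched with the standard library's html.parser. This is a hedged translation, not George's code: TableTextParser and table_to_dict are hypothetical names, the column positions (label in the second cell, value in the third) are assumed from the sample table, and whitespace-only text runs are skipped explicitly rather than through a regular expression.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect, for each <tr>, a list of cells; each cell is a list of text runs."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
            self._in_cell = False
        elif tag == 'td' and self.rows:
            # Every <td> opens a new cell, even if the previous one was never
            # closed (the second row of the sample has no </td> after the <img>).
            self.rows[-1].append([])
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ('td', 'tr'):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1].append(data)


def table_to_dict(markup):
    """Map label text (cell 2, trailing colon removed) to value text (cell 3)."""
    parser = TableTextParser()
    parser.feed(markup)
    parser.close()
    result = {}
    for cells in parser.rows:
        if len(cells) < 3:
            continue
        label = [t.strip() for t in cells[1] if t.strip()]
        value = [t.strip() for t in cells[2] if t.strip()]
        if label:
            result[label[0].rstrip(':')] = ' '.join(value)
    return result


html = """<table> <tr valign="top"><td width="24"><img
src="/icons/ecblank.gif" border="0" height="1" width="1" alt=""
/></td><td width="129"><b> <font size="2" face="Arial">Division Code:
</font></b></td><td width="693"><font size="2"
face="Arial">SALS</font></td></tr> <tr valign="top"><td width="24"><img
src="/icons/ecblank.gif" border="0" height="1" width="1" alt="" /> <td
width="129"><b> <font size="2" face="Arial">Employee:
</font></b></td> <td width="693"><font size="2"
face="Arial">LOO</font><b><font size="2" face="Arial"> </font></b><font
size="2" face="Arial">ABLE</font></td></tr></table> """

print(table_to_dict(html))  # {'Division Code': 'SALS', 'Employee': 'LOO ABLE'}
```

Starting a new cell on every <td> start tag, instead of waiting for </td>, is what lets the second row parse correctly even though its first cell is never closed.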
