pyparsing with nested table

I'm using pyparsing to parse nested tables, and I want to keep each
table's structure and properties.
But the result gets cut off at the </td> tag of the inner table.

Any ideas?

Here's the program:
from pyparsing import *

mytable = """
<table id="leftpage_table" width="156" border="0" cellspacing="0"
cellpadding="0">
<tr id="trtd" height="24">
<td width="153" background="images/bt_kind.gif" align="center"
class="left_menu">system</td>
</tr>
<tr id="trtd_down" height="20">
<td id="trtd_down"><table id="inner_lefgpage_table" width="100%"
height="100%" border="0" cellspacing="0" cellpadding="0">
<tr id="inner_trtd" height="20">
<td background="images/bt_class.gif" align="center">art</td>
</tr>
<tr>
<td background="images/bt_class.gif" align="center">art</td>
</tr>
</table></td>
</tr>
</table>
"""

startTag = Literal("<")
endTag = Literal(">")
# id="..." is captured separately from the other attributes
idPattern = (CaselessLiteral("id").suppress() + Literal("=").suppress()
             + (quotedString.copy().setParseAction(removeQuotes)
                | Word(srange("[a-zA-Z0-9_~]"))))
attrPattern = Combine(Word(alphanums + "_") + Literal("=")
                      + (quotedString | Word(srange(r"[a-zA-Z0-9_~:&@#;?/\.]"))))

tablePattern = Forward()

def getItemCloseTag(x):
    itemCloseTag = Combine(startTag + Literal("/") + CaselessLiteral(x)
                           + endTag).suppress()
    return itemCloseTag

def getItemStartTag(x):
    itemStartTag = (startTag.suppress()
                    + Keyword(x, caseless=True).suppress()
                    + Group(ZeroOrMore(idPattern))
                    + Group(ZeroOrMore(attrPattern))
                    + endTag.suppress())
    return itemStartTag

def getItemPattern(x):
    tCloseTag = getItemCloseTag(x)
    itemPattern = (getItemStartTag(x) + Group(ZeroOrMore(tablePattern))
                   + Group(SkipTo(tCloseTag)) + tCloseTag)
    return itemPattern

def getMultiLevelPattern(x, y):
    tCloseTag = getItemCloseTag(x)
    itemPattern = getItemStartTag(x) + Group(OneOrMore(y)) + tCloseTag
    return itemPattern

tdPattern = getItemPattern(x='td')
trPattern = getMultiLevelPattern('tr', tdPattern)
tablePattern = getMultiLevelPattern('table', trPattern)
t = tablePattern
for toks, strt, end in t.scanString(mytable):
    print(toks.asList())
Output:
[['leftpage_table'], ['width="156"', 'border="0"', 'cellspacing="0"',
'cellpadding="0"'], [['trtd'], ['height="24"'], [[], ['width="153"',
'background="images/bt_kind.gif"', 'align="center"',
'class="left_menu"'], [], ['system']], ['trtd_down'], ['height="20"'],
[['trtd_down'], [], [], ['<table id="inner_lefgpage_table" width="100%"
height="100%" border="0" cellspacing="0" cellpadding="0">\n <tr
id="inner_trtd" height="20">\n <td
background="images/bt_class.gif" align="center">art']], [], [], [[],
['background="images/bt_class.gif"', 'align="center"'], [], ['art']]]]

Dec 8 '05 #1

astarocean wrote:
I'm using pyparsing to parse nested tables, and I want to keep each
table's structure and properties.
But the result gets cut off at the </td> tag of the inner table.

Any ideas?

Here's the program:
from pyparsing import *
<... snip ...>
tablePattern = Forward()
<... snip ...>
tablePattern = getMultiLevelPattern('table',trPattern)
t = tablePattern
for toks,strt,end in t.scanString(mytable):
    print(toks.asList())


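Load Forwards with '<<' instead of '='. Using '=' only rebinds the name
tablePattern to a new expression, so the Forward already referenced
inside tdPattern is never filled in. Change: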
tablePattern = getMultiLevelPattern('table',trPattern)
to:
tablePattern << getMultiLevelPattern('table',trPattern)

I think that is all you needed.
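
A minimal sketch of why the operator matters, using a small bracketed-list grammar instead of the table grammar from the question. The names item and nested_list are made up for illustration, it assumes pyparsing is installed, and it uses '<<=', the augmented form of '<<' found in later pyparsing releases.

```python
from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, alphas

# Placeholder for the recursive rule; its body is filled in further down.
item = Forward()

LBRACK, RBRACK = Suppress("["), Suppress("]")

# A list holds words or further lists, so it refers back to `item`.
nested_list = Group(LBRACK + ZeroOrMore(item) + RBRACK)

# Fill in the existing Forward object. Writing `item = ...` here would only
# rebind the name and leave the Forward inside nested_list empty.
item <<= Word(alphas) | nested_list

result = nested_list.parseString("[a [b c] d]").asList()
print(result)
```

With plain '=' on the last assignment, the inner list would not be recognized, which is the same symptom as the table grammar stopping at the inner </td>.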

Awesome job! (Also check out the pyparsing built-ins for making HTML
and XML tags, makeHTMLTags and makeXMLTags.)
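
As a rough sketch of the HTML tag helper, assuming pyparsing is installed: makeHTMLTags returns an opening and a closing tag expression, and the opening tag exposes attributes as named results. The variable names and the "body" results name are illustrative choices, not part of the original program.

```python
from pyparsing import SkipTo, makeHTMLTags

# makeHTMLTags builds matching open/close expressions for one tag name.
td_open, td_close = makeHTMLTags("td")

# A flat cell: opening tag, everything up to the closing tag, closing tag.
cell = td_open + SkipTo(td_close)("body") + td_close

html = '<td width="153" align="center" class="left_menu">system</td>'
tokens = cell.parseString(html)

# Attribute values are available by name, with the quotes removed.
# "class" is a Python keyword, so it is read with item access.
print(tokens.width, tokens.align, tokens["class"], tokens.body)
```

This only handles a cell without nested markup; nesting would still need a Forward as described above.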

-- Paul

Dec 8 '05 #2
Paul McGuire wrote:

Load Forward's with '<<' instead of '='. Change:
tablePattern = getMultiLevelPattern('table',trPattern)
to:
tablePattern << getMultiLevelPattern('table',trPattern)

I think that is all you needed.

Awesome job! (Also check out the pyparsing built-ins for making HTML
and XML tags.)

-- Paul


Thank you. I was wondering why my iteration wasn't working, so it was
my mistake.

Later, I checked other parsers like ClientTable and BeautifulSoup;
I think doing this job with BeautifulSoup is a better idea.
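
For comparison, here is a rough sketch of the tree-building approach that a dedicated HTML parser takes. It deliberately uses the standard library's html.parser rather than BeautifulSoup so that it runs without extra packages; the TableTreeBuilder class and the dict layout are assumptions made up for this example.

```python
from html.parser import HTMLParser


class TableTreeBuilder(HTMLParser):
    """Collect <table>, <tr> and <td> elements into nested dicts."""

    TAGS = ("table", "tr", "td")

    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "attrs": {}, "children": [], "text": ""}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            node = {"tag": tag, "attrs": dict(attrs), "children": [], "text": ""}
            self.stack[-1]["children"].append(node)
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open element.
        if tag in self.TAGS and len(self.stack) > 1 and self.stack[-1]["tag"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Text is attached to whichever element is currently open.
        self.stack[-1]["text"] += data.strip()


html = """
<table id="outer">
  <tr><td id="cell"><table id="inner">
    <tr><td align="center">art</td></tr>
  </table></td></tr>
</table>
"""

builder = TableTreeBuilder()
builder.feed(html)
outer = builder.root["children"][0]
inner = outer["children"][0]["children"][0]["children"][0]
inner_cell = inner["children"][0]["children"][0]
print(outer["attrs"]["id"], inner["attrs"]["id"], inner_cell["text"])
```

The stack keeps the inner table attached to the cell that contains it, so both the structure and the attributes survive.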

Dec 8 '05 #3
