remove contents between tags

I want to remove the "content" between two corresponding tags. What's the best way of doing it? For example, I want to remove the contents between the <SEC-HEADER> and <table> tags. I have several different tags whose contents I want to remove from the HTML code. What is the best way to do this while scanning the file just once? I know I can count the number of lines in between, but that doesn't seem efficient.

Thanks,



<SEC-HEADER>0001047469-08-001731.hdr.sgml : 20080226

MAIL ADDRESS:
STREET 1: 1 NEW ORCHARD RD
CITY: ARMONK
STATE: NY
ZIP: 10504
</SEC-HEADER>

<DIV ALIGN="CENTER"><TABLE WIDTH="100%" BORDER=0 CELLSPACING=0 CELLPADDING=0>
<TR VALIGN="TOP">
<TD WIDTH="38%" ALIGN="CENTER"><FONT SIZE=2><B>NEW YORK<BR> </B></FONT><FONT SIZE=2>(State of Incorporation)</FONT></TD>
<TD WIDTH="21%"><FONT SIZE=2>&nbsp;</FONT></TD>
<TD WIDTH="40%" ALIGN="CENTER"><FONT SIZE=2><B>13-0871985<BR> </B></FONT><FONT SIZE=2>(IRS Employer Identification Number)</FONT></TD>
</TR>
<TR VALIGN="TOP">
<TD WIDTH="38%" ALIGN="CENTER"><BR><FONT SIZE=2><B>ARMONK, NEW YORK<BR> </B></FONT><FONT SIZE=2>(Address of principal executive offices)</FONT></TD>
<TD WIDTH="21%"><FONT SIZE=2><BR>&nbsp;</FONT></TD>
<TD WIDTH="40%" ALIGN="CENTER"><BR><FONT SIZE=2><B>10504<BR> </B></FONT><FONT SIZE=2>(Zip Code)</FONT></TD>
</TR>
</TABLE></DIV>
Sep 23 '08 #1
3 Replies
numberwhun
Expert Mod 2GB
What have you tried thus far to do what you are describing?

Post your code here, surrounded by the necessary code tags, and we will try and help you.

Regards,

Jeff
Sep 23 '08 #2
Hi. If I understand your question correctly, this can be implemented easily. I'll give you a hint, and you can try to code it yourself: use Perl's built-in substitution operator, s///, which matches a pattern and replaces it in a single step.
That's all.
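
For reference, a minimal sketch of what the s/// hint could look like. It assumes the whole file has been read into one string, that the tag is not nested inside itself, and that the tags themselves should be kept. The helper name strip_between is hypothetical, not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: remove everything between <TAG ...> and </TAG>,
# keeping the tags themselves. Assumes the tag is never nested in itself.
sub strip_between {
    my ($text, $tag) = @_;
    # $1 = opening tag (with any attributes), .*? = shortest possible content,
    # $2 = closing tag. /s lets "." match newlines, /i ignores case,
    # /g repeats the substitution for every open/close pair.
    $text =~ s{(<\Q$tag\E\b[^>]*>).*?(</\Q$tag\E\s*>)}{$1$2}gis;
    return $text;
}

# Shortened version of the sample from the question.
my $html = <<'END';
<SEC-HEADER>0001047469-08-001731.hdr.sgml : 20080226
MAIL ADDRESS:
ZIP: 10504
</SEC-HEADER>
<DIV ALIGN="CENTER"><TABLE WIDTH="100%"></TABLE></DIV>
END

print strip_between($html, 'SEC-HEADER');
```

Run on the shortened sample, this prints an empty <SEC-HEADER></SEC-HEADER> pair followed by the untouched <DIV> line. Calling the helper once per tag name handles several different tags; each call is one scan of the string.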
Oct 15 '08 #3
KevinADC
Expert 2GB
The "best" way is to use an actual HTML parser, like HTML::Parser or HTML::TokeParser. Unfortunately, they are not exactly user-friendly modules.

If the HTML is very tidy and you know its exact structure, you can use regular expressions. But if the HTML is generated dynamically, subject to change, or poorly formatted, a parser is the way to go.
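
As a rough illustration of the parser route, here is a sketch using the event-handler interface of HTML::Parser (a CPAN module, so it must be installed). It assumes the goal is to keep the tags but drop whatever sits between them, and that copying everything else through unchanged is acceptable. The helper name strip_between is hypothetical.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: copy the document through unchanged, except for
# whatever lies between <TAG> and </TAG>. The tags themselves are kept.
sub strip_between {
    my ($html, $tag) = @_;
    $tag = lc $tag;          # the parser reports tag names in lower case
    my $out    = '';
    my $inside = 0;          # nesting depth inside the target tag

    my $p = HTML::Parser->new(
        api_version => 3,
        # Start tags: emit when outside the target, then note if we entered it.
        start_h => [ sub {
            my ($name, $text) = @_;
            $out .= $text unless $inside;
            $inside++ if $name eq $tag;
        }, 'tagname, text' ],
        # End tags: note if we left the target, then emit when outside it.
        end_h => [ sub {
            my ($name, $text) = @_;
            $inside-- if $name eq $tag && $inside;
            $out .= $text unless $inside;
        }, 'tagname, text' ],
        # Everything else (text, comments, declarations) passes through
        # only while we are outside the target tag.
        default_h => [ sub { $out .= $_[0] unless $inside }, 'text' ],
    );
    $p->parse($html);
    $p->eof;                 # flush any buffered text
    return $out;
}

print strip_between("<SEC-HEADER>a\nb\n</SEC-HEADER>\n<B>keep</B>\n", 'SEC-HEADER');
```

Because the handlers keep a depth counter instead of matching text, this variant copes with attributes, mixed case, and a target tag nested inside itself, all in a single pass over the input.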
Oct 15 '08 #4
