Bytes IT Community

remove contents between tags

P: 4
I want to remove the "content" between two corresponding tags. What's the best way of doing it? I want to remove the contents between <SEC-HEADER> and <table> tags. I have several different tags whose contents I want to remove from the HTML code, and I'd like to know the best way to do it while scanning the file just once. I know I could count the number of lines in between, but that doesn't seem efficient.


<SEC-HEADER>0001047469-08-001731.hdr.sgml : 20080226
STATE: NY
ZIP: 10504
</SEC-HEADER>
<TR VALIGN="TOP">
<TD WIDTH="38%" ALIGN="CENTER"><FONT SIZE=2><B>NEW YORK<BR> </B></FONT><FONT SIZE=2>(State of Incorporation)</FONT></TD>
<TD WIDTH="21%"><FONT SIZE=2>&nbsp;</FONT></TD>
<TD WIDTH="40%" ALIGN="CENTER"><FONT SIZE=2><B>13-0871985<BR> </B></FONT><FONT SIZE=2>(IRS Employer Identification Number)</FONT></TD>
</TR>
<TR VALIGN="TOP">
<TD WIDTH="38%" ALIGN="CENTER"><BR><FONT SIZE=2><B>ARMONK, NEW YORK<BR> </B></FONT><FONT SIZE=2>(Address of principal executive offices)</FONT></TD>
<TD WIDTH="21%"><FONT SIZE=2><BR>&nbsp;</FONT></TD>
<TD WIDTH="40%" ALIGN="CENTER"><BR><FONT SIZE=2><B>10504<BR> </B></FONT><FONT SIZE=2>(Zip Code)</FONT></TD>
</TR>
</TABLE></DIV>
Sep 23 '08 #1
3 Replies

Expert Mod 2.5K+
P: 3,503
What have you tried thus far to do what you are describing?

Post your code here, surrounded by the necessary code tags, and we will try and help you.


Sep 23 '08 #2

P: 4
Hi. If I understand your question correctly, this can be implemented easily. I'll give you a hint and you can try to code it yourself: use Perl's built-in substitution operator, s///, which matches a pattern and replaces it in a single step.
That's all.
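
To make the substitution hint a bit more concrete, here is a rough sketch of how it might look. It assumes the whole file has already been read into one string, that the tags are not nested inside themselves, and that you want to keep the tags and drop only what sits between them. The helper name strip_between_tags and the sample text are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: for every tag name given, remove whatever sits
# between the opening tag and the first matching closing tag.
# The tags themselves are kept. Assumes the tags are not nested.
sub strip_between_tags {
    my ($html, @tags) = @_;

    # Build an alternation such as SEC\-HEADER|TABLE from the tag names.
    my $alt = join '|', map { quotemeta } @tags;

    # /g = every occurrence, /i = ignore case, /s = let . match newlines.
    # $1 = opening tag, $2 = tag name, $3 = closing tag.
    $html =~ s{(<($alt)\b[^>]*>).*?(</\2\s*>)}{$1$3}gis;

    return $html;
}

# Made-up sample input, loosely modelled on the snippet in the question.
my $sample = <<'END';
<SEC-HEADER>0001047469-08-001731.hdr.sgml : 20080226
STATE: NY
ZIP: 10504
</SEC-HEADER>
<P>Keep this paragraph.</P>
<TABLE>
<TR VALIGN="TOP"><TD>NEW YORK</TD></TR>
</TABLE>
END

print strip_between_tags($sample, 'SEC-HEADER', 'TABLE');
```

Because of the /g modifier, the single s/// call handles all listed tags in one scan of the string. The non-greedy .*? stops at the first closing tag, so a table nested inside another table would be cut short; that is the kind of input where a regular expression stops being a good fit.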
Oct 15 '08 #3

Expert 2.5K+
P: 4,059
The "best" way is to use an actual HTML parser, like HTML::Parser or HTML::TokeParser. Unfortunately, they are not exactly user-friendly modules.

If the HTML is very tidy and you know its exact structure, you can use regular expressions. But if the HTML is generated dynamically, subject to change, or poorly formatted, a parser is the way to go.
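
For the parser route, something along these lines might work with HTML::TokeParser (it ships with the HTML::Parser distribution on CPAN, so that needs to be installed). Treat it as a sketch under a few assumptions: the tags are kept and only what is between them is dropped, same-name nesting is tracked with a simple counter, and strip_with_parser is a name invented here for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Position of the original source text inside each token type that
# get_token returns: start tag, end tag, text, comment, declaration,
# processing instruction.
my %TEXT_AT = (S => 4, E => 2, T => 1, C => 1, D => 1, PI => 2);

# Hypothetical helper: copy the input through token by token, but skip
# everything between a listed start tag and its matching end tag.
sub strip_with_parser {
    my ($html, @tags) = @_;

    # The parser reports tag names in lower case.
    my %wanted = map { lc($_) => 1 } @tags;

    my $p = HTML::TokeParser->new(\$html) or die "Cannot parse input";
    my ($out, $inside, $depth) = ('', '', 0);

    while (my $tok = $p->get_token) {
        my ($type, $name) = @$tok;
        my $text = $tok->[ $TEXT_AT{$type} ];

        if ($inside) {
            # Skipping: only watch for nested or closing tags of the same name.
            if ($type eq 'S' && $name eq $inside) { $depth++ }
            elsif ($type eq 'E' && $name eq $inside && --$depth == 0) {
                $out .= $text;    # keep the closing tag itself
                $inside = '';
            }
            next;
        }

        $out .= $text;
        if ($type eq 'S' && $wanted{$name}) { $inside = $name; $depth = 1 }
    }
    return $out;
}

my $sample = "<SEC-HEADER>STATE: NY\nZIP: 10504\n</SEC-HEADER>\n<P>Keep this.</P>\n";
print strip_with_parser($sample, 'SEC-HEADER', 'TABLE');
```

The input is still read only once, and unlike a plain regular expression this copes with attributes in any order, mixed-case tag names, and a table nested inside another table.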
Oct 15 '08 #4
