473,326 Members | 2,012 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,326 software developers and data experts.

Parsing HTML

1
I am wondering if anyone has run across any code to parse HTML and change any <h1> or <h2> tags from.

Expand|Select|Wrap|Line Numbers
  1. <h2>Intro to Cooking</h2>
to

Expand|Select|Wrap|Line Numbers
  1. <h2 id="Intro to Cooking">Intro to Cooking</h2>
Or something similar. I need some code to automatically ad HTML anchors to existing headings on some documentation.

Anyone help please?

Thanks!
Oct 4 '07 #1
2 1308
numberwhun
3,509 Expert Mod 2GB
I am wondering if anyone has run across any code to parse HTML and change any <h1> or <h2> tags from.

Expand|Select|Wrap|Line Numbers
  1. <h2>Intro to Cooking</h2>
to

Expand|Select|Wrap|Line Numbers
  1. <h2 id="Intro to Cooking">Intro to Cooking</h2>
Or something similar. I need some code to automatically ad HTML anchors to existing headings on some documentation.

Anyone help please?

Thanks!
There are a couple of ways you could probably do that. One way to be to write some code that would examine each line and remove each <element> with a substitution regex.

Or, you could just search cpan and check out the HTML::Parser module.

Always remember that whenever you want to do something in Perl to check CPAN for a module.

Regards,

Jeff
Oct 4 '07 #2
KevinADC
4,059 Expert 2GB
Expand|Select|Wrap|Line Numbers
  1. $_ = 'foo blah  <h2>Intro to Cooking</h2> foo blah';
  2. $_ =~ s/<h(\d)>([^>]+)<\/h\1>/<h$1 id="$2">$2<\/h$1>/;
  3. print;
  4.  
but parsing html code can be messy and sometimes just impossible. HTML::Parser is a good if not easy solution.
Oct 4 '07 #3

Sign in to post your reply or Sign up for a free account.

Similar topics

8
by: Gerrit Holl | last post by:
Posted with permission from the author. I have some comments on this PEP, see the (coming) followup to this message. PEP: 321 Title: Date/Time Parsing and Formatting Version: $Revision: 1.3 $...
14
by: Viktor Rosenfeld | last post by:
Hi, I need to create a parser for a Python project, and I'd like to use process kinda like lex/yacc. I've looked at various parsing packages online, but didn't find anything useful for me: -...
9
by: RiGGa | last post by:
Hi, I want to parse a web page in Python and have it write certain values out to a mysql database. I really dont know where to start with parsing the html code ( I can work out the database...
0
by: Fuzzyman | last post by:
I am trying to parse an HTML page an only modify URLs within tags - e.g. inside IMG, A, SCRIPT, FRAME tags etc... I have built one that works fine using the HTMLParser.HTMLParser and it works...
3
by: Willem Ligtenberg | last post by:
I decided to use SAX to parse my xml file. But the parser crashes on: File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError raise exception...
16
by: Terry | last post by:
Hi, This is a newbie's question. I want to preload 4 images and only when all 4 images has been loaded into browser's cache, I want to start a slideshow() function. If images are not completed...
1
by: yonido | last post by:
hello, my goal is to get patterns out of email files - say "message forwarding" patterns (message forwarded from: xx to: yy subject: zz) now lets say there are tons of these patterns (by gmail,...
4
by: Rick Walsh | last post by:
I have an HTML table in the following format: <table> <tr><td>Header 1</td><td>Header 2</td></tr> <tr><td>1</td><td>2</td></tr> <tr><td>3</td><td>4</td></tr> <tr><td>5</td><td>6</td></tr>...
9
by: ankitdesai | last post by:
I would like to parse a couple of tables within an individual player's SHTML page. For example, I would like to get the "Actual Pitching Statistics" and the "Translated Pitching Statistics"...
4
by: Neil.Smith | last post by:
I can't seem to find any references to this, but here goes: In there anyway to parse an html/aspx file within an asp.net application to gather a collection of controls in the file. For instance...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.