Hi, bariole,
Generally speaking, it is not possible to build a complete scanner and parser
for SGML-family languages (HTML and XML) using flex/bison and the like.
These languages are special: a full parser has to be built dynamically,
on the fly, from a Document Type Definition (DTD).
Moreover, a practical HTML scanner and parser also has to understand
JavaScript (or a subset of it), because you may run into the following:
<SCRIPT>
function foo() { a.write("</SCRIPT>"); }
</SCRIPT>
(not recommended by the spec, but it happens)
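To make the problem concrete, here is a minimal Python sketch (not taken from any real browser) that compares a naive search for the closing tag with one that skips over JavaScript string literals. The function name find_script_end is hypothetical, and the sketch assumes only single- and double-quoted strings matter; it ignores JS comments and regex literals. Note that a parser following the HTML spec to the letter ends the element at the first closing tag it sees, so this shows only the lenient approach.

```python
def find_script_end(src, start=0):
    """Return the index of the first </script outside a JS string, or -1.

    Hypothetical helper: tracks only '...' and "..." literals.
    """
    quote = None                      # current string delimiter, None if outside
    i = start
    while i < len(src):
        ch = src[i]
        if quote:
            if ch == "\\":            # skip the escaped character
                i += 2
                continue
            if ch == quote:           # end of the string literal
                quote = None
        elif ch in "\"'":
            quote = ch                # start of a string literal
        elif src[i:i + 8].lower() == "</script":
            return i                  # tag names are case-insensitive
        i += 1
    return -1


page = '<SCRIPT>\nfunction foo() { a.write("</SCRIPT>"); }\n</SCRIPT>'
body_start = len("<SCRIPT>")

# Naive scan: stops at the tag text inside the string literal.
naive_end = page.lower().find("</script", body_start)
# String-aware scan: stops at the closing tag that follows the function.
aware_end = find_script_end(page, body_start)
```

Here naive_end points into the middle of the a.write(...) call, while aware_end points at the final closing tag.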
In short, I don't think you will find ready-to-use lex/yacc files for HTML.
A hand-written finite state automaton for an HTML parser is not so hard to write,
and there are plenty of examples on the Net.
For example:
http://www.do.org/products/parser/
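To give an idea of what such a hand-written automaton looks like, here is a minimal two-state sketch in Python. It is only an illustration under simplifying assumptions, not the code from the link above: the tokenize function and its token shapes are made up for this example, and it ignores attributes, comments, entities and SCRIPT content.

```python
def tokenize(html):
    """Yield ("text", data), ("start", name) and ("end", name) tokens."""
    TEXT, TAG = "text", "tag"           # the two states of the automaton
    state, buf = TEXT, []
    for ch in html:
        if state == TEXT:
            if ch == "<":               # TEXT -> TAG: flush pending text
                if buf:
                    yield ("text", "".join(buf))
                state, buf = TAG, []
            else:
                buf.append(ch)
        else:                           # state == TAG
            if ch == ">":               # TAG -> TEXT: classify the tag
                raw = "".join(buf).strip()
                kind = "end" if raw.startswith("/") else "start"
                parts = raw.lstrip("/").split()
                # Attributes (parts[1:]) are dropped in this sketch.
                name = parts[0].lower() if parts else ""
                yield (kind, name)
                state, buf = TEXT, []
            else:
                buf.append(ch)
    if buf and state == TEXT:           # trailing text after the last tag
        yield ("text", "".join(buf))


tokens = list(tokenize("<P>Hello <B>world</B></P>"))
```

A real scanner adds more states (attribute name, attribute value, comment, script body and so on), but the structure stays the same: one state variable and one transition per input character.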
"Per aspera ad astra!" :)
Andrew Fedoniouk.
http://terrainformatica.com
"bariole" <ba*******@SPAMyahoo.com> wrote in message
news:ls********************************@4ax.com...
Hi
I am trying to do lexical analysis of some simplified HTML code with
the flex tool. However, this kind of work is new to me and I don't know
where to start. I have searched the web but didn't find anything
useful. I found tools like the LEXHTML.CXX library, but I have no need for
that.
What I need is a simple overview of how the most common HTML
lexical analyzers work, like the ones inside IE or Gecko. Something like a good
article or post that describes how lexical atoms, operators and
similar particles are recognized in HTML (what is what and where it
goes).
Kudos for your help..