Hi, bariole,
Generally speaking, it is impossible to build a complete scanner and parser
for SGML (and so for HTML and XML) with flex/bison and similar tools.
These languages are special because the parser has to be built dynamically,
on the fly, from a Document Type Definition (DTD).
Moreover, a practical HTML scanner and parser also has to include a parser for
JavaScript (or a subset of it), because you may run into the following:
<SCRIPT>
function foo() { a.write("</SCRIPT>"); }
</SCRIPT>
(not recommended by the spec, but it happens)
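
To make the problem concrete, here is a small sketch in Python (my choice of language for illustration; the helper name naive_script_body is made up). It shows what a scanner that knows nothing about JavaScript string literals would take as the script body: it stops at the first "</SCRIPT>" it sees, which is the one inside the string.

```python
import re

# Hypothetical helper, for illustration only: it treats the first
# "</script>" after the opening tag as the end of the element,
# which is what a scanner with no knowledge of JavaScript would do.
END_SCRIPT = re.compile(r"</script\s*>", re.IGNORECASE)

def naive_script_body(html: str) -> str:
    """Return the script text as a JavaScript-unaware scanner would see it."""
    start = re.search(r"<script\b[^>]*>", html, re.IGNORECASE)
    if start is None:
        return ""
    end = END_SCRIPT.search(html, start.end())
    stop = end.start() if end else len(html)
    return html[start.end():stop]

page = '<SCRIPT>\nfunction foo() { a.write("</SCRIPT>"); }\n</SCRIPT>'
print(repr(naive_script_body(page)))
```

The body comes back cut off in the middle of the string literal, and the rest of the function is then scanned as ordinary document text.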
In short, I don't think you will find ready-to-use lex/yacc files for HTML.
A hand-written finite state automaton for an HTML parser is not that hard to write,
and there are plenty of examples on the Net.
For example:
http://www.do.org/products/parser/
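
As a rough idea of what such a hand-written automaton looks like, here is a minimal sketch in Python. It is not taken from the parser linked above; the state names and the scan function are my own, and it assumes a very simplified HTML: no comments, no entities, no special handling of SCRIPT content.

```python
from enum import Enum, auto

class State(Enum):
    TEXT = auto()    # outside any tag
    TAG = auto()     # between '<' and '>'
    QUOTED = auto()  # inside a quoted attribute value within a tag

def scan(html):
    """Split simplified HTML into ('text', s) and ('tag', s) tokens."""
    tokens = []
    state = State.TEXT
    buf = []
    quote = ""
    for ch in html:
        if state is State.TEXT:
            if ch == "<":
                if buf:
                    tokens.append(("text", "".join(buf)))
                buf = []
                state = State.TAG
            else:
                buf.append(ch)
        elif state is State.TAG:
            if ch == ">":
                tokens.append(("tag", "".join(buf)))
                buf = []
                state = State.TEXT
            else:
                buf.append(ch)
                if ch in "\"'":
                    # Remember which quote opened the value, so that a
                    # '>' inside it does not end the tag.
                    quote = ch
                    state = State.QUOTED
        else:  # State.QUOTED
            buf.append(ch)
            if ch == quote:
                state = State.TAG
    # Trailing text is emitted; an unterminated tag is silently dropped.
    if buf and state is State.TEXT:
        tokens.append(("text", "".join(buf)))
    return tokens

print(scan('<p class="a>b">Hi</p>'))
```

The three states are the whole trick: the same character ('>') means different things depending on the state the scanner is in. A real HTML scanner adds more states (comments, entities, SCRIPT and STYLE content), but the structure stays the same.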
"Per aspera ad astra!" :)
Andrew Fedoniouk.
http://terrainformatica.com
"bariole" <barioleNO@SPAMyahoo.com> wrote in message
news:ls5la0tajasu3svh00cvh7p2g2750f5bs7@4ax.com...
> Hi
>
> I am trying to make lexical analysis of some simplified html code with
> flex tool. However that kind of work is new to me and I don't know
> where to start. I have searched a web but I didn't find anything
> useful. I found tools like LEXHTML.CXX library but I have no need for
> that.
>
> What I need is simple overview of working ideas of most usual html
> lexical analysators like ones inside IE or Gecko. Something like good
> article or post where is described how lexical atoms or operators and
> similar particles are recognized in HTML (what is what and where it
> goes).
>
> Kudos for your help..