Parsing an html/aspx file

Neil.Smith@cityofbristol.ac.uk
#1: Nov 28 '06
I can't seem to find any references to this, but here goes:

Is there any way to parse an html/aspx file within an asp.net
application to gather a collection of the controls in the file? For
instance, what I'm trying to do is upload an html file onto the web
server, convert it to an aspx file and then parse it for input
tags/controls, which in turn will become fields in a newly created
database table.

Clearly when the aspx file is called the controls become available, but
I want to gather the collection from another page, basically by parsing
the aspx file. I don't want to end up treating it like a text
file (reading it in and handling it char by char), but that may be my
only option unless anyone has any suggestions.

Cheers

Michael Lang
#2: Nov 28 '06

Well, I don't know of anything offhand that does this other than the .NET
framework itself. Microsoft must have some CLR code somewhere in one of the
.NET framework DLLs that does the opposite of what you're describing, i.e.
it generates HTML from ASPX files. Maybe there's some underlying
infrastructure there that you could reuse.

Alternatively, I would use the System.Xml namespace to parse the HTML files.
You should have no problems reading well-formed HTML as XML. If the HTML is
not well formed, that might be another matter.

Using the System.Xml namespace should be a lot easier than parsing it as a
text file manually.
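
To make the idea concrete, here is a minimal sketch of reading a well-formed page as XML and collecting its form controls. It is written in Python with the standard xml.etree.ElementTree module rather than System.Xml, purely to illustrate the approach; the sample markup, the collect_fields helper and the choice of input/select/textarea as the interesting tags are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical uploaded form. It must be well formed: every tag closed,
# every attribute value quoted.
XHTML = """<html>
  <body>
    <form action="/save" method="post">
      <input type="text" name="first_name" />
      <input type="text" name="surname" />
      <select name="country"><option>UK</option></select>
      <input type="submit" value="Send" />
    </form>
  </body>
</html>"""


def collect_fields(markup):
    """Return (tag, name) pairs for each form control that has a name attribute."""
    root = ET.fromstring(markup)
    fields = []
    # iter() walks the whole tree in document order.
    for element in root.iter():
        if element.tag in ("input", "select", "textarea") and element.get("name"):
            fields.append((element.tag, element.get("name")))
    return fields


print(collect_fields(XHTML))
```

The submit button is skipped because it has no name attribute, so it would not become a database field.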

Michael
http://www.mblmsoftware.com/

bruce barker (sqlwork.com)
#3: Nov 28 '06

Very little html is well formed enough for an xml parser to read it: one
<br> instead of <br /> and you're toast. Attributes also need to be quoted
correctly. .net parses asp.net files by looking for well-formed asp.net tags
(it is lax about quotes, though). Most of the html parsers I've seen are
really xml parsers and don't work well in the general case.
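
The strictness described above is easy to reproduce. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for any strict XML parser (it is not the .NET parser); the parses_as_xml helper and the three sample strings are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Self-closed <br />: well formed, so the parser accepts it.
print(parses_as_xml('<p>line one<br />line two</p>'))
# Bare <br>: the closing </p> no longer matches the open tag, so parsing fails.
print(parses_as_xml('<p>line one<br>line two</p>'))
# Unquoted attribute value: not well formed, so parsing fails.
print(parses_as_xml('<input type=text name="a" />'))
```

Only the first string is accepted; ordinary browser-tolerated html like the other two is rejected outright.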

You could compile the page with the asp.net compiler, then load the dll and
use reflection to walk the controls collection.

-- bruce (sqlwork.com)

Michael Lang
#4: Nov 28 '06

Compiling the ASPX and using reflection: I guess that's one way to use the
underlying infrastructure. That sounds pretty tricky as well though; you
may have to play around with some compiler settings.

In 1.0 I have doubts it would succeed because, if my memory serves me
correctly, it compiles just the code-behind and not the HTML within the ASPX.

In 2.0 at least you have the option when publishing the site to compile not
just the code-behind but also the contents of the ASPX into the dll. I'd
suggest having a play with Ildasm to see what's available in your assembly
before you try writing a lot of reflection code against it.

I did say it would be another matter if the HTML is not well formed. But I
thought perhaps you could tweak the XmlTextReader to read HTML by playing
with the XmlReaderSettings or something. I've never tried this, so I didn't
know. A quick look at the API tells me that you're quite right: it's tricky,
but not completely hopeless.

So I did a search to see if anyone has attempted it and found this...

http://www.gotdotnet.com/Community/U...4-c3bd760564bc

...which might be worth a look.

Cheers

Michael
http://www.mblmsoftware.com/

Neil.Smith@cityofbristol.ac.uk
#5: Nov 29 '06

Thanks for the replies. I do find it interesting that there are no
obvious methods for doing something that the framework must do
every time it loads an aspx page!

I will try parsing it as xml and try the SgmlReader API.
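
For anyone following the same route, here is a rough sketch of the overall idea: read messy html with a lenient parser, collect the names of the input controls, and turn them into a table definition. It uses Python's standard html.parser as a stand-in for a tolerant reader (it is not SgmlReader and not .NET code). The InputCollector class, the field_names and create_table_sql helpers, the uploaded_form table name and the VARCHAR(255) column type are all assumptions for the example.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect the name attribute of every input, select and textarea tag."""

    CONTROL_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; unquoted values are accepted.
        if tag in self.CONTROL_TAGS:
            name = dict(attrs).get("name")
            if name and name not in self.names:
                self.names.append(name)


def field_names(markup):
    """Return the control names found in the markup, in document order."""
    collector = InputCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


def create_table_sql(table, names):
    """Build a CREATE TABLE statement with one text column per field name.

    Assumption: the names are trusted. Real code must validate identifiers
    taken from an uploaded file before putting them into SQL.
    """
    columns = ", ".join(f'"{n}" VARCHAR(255)' for n in names)
    return f'CREATE TABLE "{table}" ({columns})'


# Deliberately sloppy html: bare <br>, unquoted attributes, mixed-case tags.
MESSY = (
    '<form><input type=text name=first_name><br>'
    '<INPUT name="surname"><textarea name=notes></textarea></form>'
)

names = field_names(MESSY)
print(names)
print(create_table_sql("uploaded_form", names))
```

The same markup would be rejected by a strict xml parser, which is why a tolerant reader is the more practical choice for uploaded pages.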
