
Parsing an html/aspx file

I can't seem to find any references to this, but here goes:

Is there any way to parse an html/aspx file within an asp.net
application to gather a collection of the controls in the file? For
instance, what I'm trying to do is upload an html file onto the web
server, convert it to an aspx file and then parse it for input
tags/controls, which in turn will become fields in a newly created
database table.
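
A rough sketch of that last step (input names becoming fields in a new table), assuming the names have already been collected from the file. The thread targets ASP.NET, so treat this Python version as an illustration only: SQLite stands in for whatever database the site uses, and `safe_identifier`, `create_table_sql`, `table_columns`, the `contact_form` table and the TEXT column type are all hypothetical choices, not anything from the original post.

```python
import re
import sqlite3

# Field names come from an uploaded file, so only plain identifiers are allowed.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def safe_identifier(name):
    """Return the name unchanged, or raise if it is not a plain identifier."""
    if not IDENTIFIER.fullmatch(name):
        raise ValueError("unsafe identifier: %r" % (name,))
    return name


def create_table_sql(table, field_names):
    """Build a CREATE TABLE statement with one TEXT column per field name."""
    columns = ['"id" INTEGER PRIMARY KEY']
    columns += ['"%s" TEXT' % safe_identifier(name) for name in field_names]
    return 'CREATE TABLE "%s" (%s)' % (safe_identifier(table), ", ".join(columns))


def table_columns(connection, table):
    """List the column names SQLite reports for the table."""
    rows = connection.execute("PRAGMA table_info(%s)" % safe_identifier(table))
    return [row[1] for row in rows]


# In-memory database, used here only so the sketch is self-contained.
connection = sqlite3.connect(":memory:")
connection.execute(create_table_sql("contact_form", ["first_name", "email"]))
print(table_columns(connection, "contact_form"))
```

Rejecting anything that is not a plain identifier is deliberate: the names come from a file somebody uploaded, so they should not be pasted into SQL unchecked.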

Clearly when the aspx file is called the controls become available, but
I want to gather the collection from another page, basically by parsing
the aspx file. I don't want to end up treating it like a text
file (reading it in and handling it char by char), but that may be my
only option unless anyone has any suggestions?

Cheers

Nov 28 '06 #1
4 Replies


Well, I don't know of anything offhand that does this other
than the .NET framework itself. Microsoft would have some CLR code somewhere
in one of the .NET framework DLLs that does the opposite of what
you're describing, i.e. it generates HTML from ASPX files. Maybe there's
some underlying infrastructure there that you could reuse.

Alternatively, I would use the System.Xml namespace to parse the HTML files.
You should have no problems reading well-formed HTML as XML. If the HTML is
not well formed, that might be another matter.

Using the System.Xml namespace should be a lot easier than parsing it as a text
file manually.
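
As a minimal sketch of the "read well-formed HTML as XML" idea: the example below uses Python's standard `xml.etree.ElementTree` rather than System.Xml, purely as an illustration, and the page content and the `input_fields` helper are made up for the example. It only works because every tag in the sample is closed and every attribute is quoted.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed (XHTML-style) upload: every tag is closed and
# every attribute is quoted, which is what an XML parser requires.
PAGE = """<html>
  <body>
    <form action="save.aspx" method="post">
      <input type="text" name="first_name" />
      <input type="text" name="email" />
      <input type="checkbox" name="subscribe" />
      <input type="submit" value="Send" />
    </form>
  </body>
</html>"""


def input_fields(markup):
    """Return (name, type) for every named <input> element in the markup."""
    root = ET.fromstring(markup)
    fields = []
    for element in root.iter("input"):
        name = element.get("name")
        if name:  # unnamed inputs (here, the submit button) are skipped
            fields.append((name, element.get("type", "text")))
    return fields


print(input_fields(PAGE))
```

Walking the parsed tree this way avoids any character-by-character handling, but it depends entirely on the upload being valid XML.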

Michael
http://www.mblmsoftware.com/

Nov 28 '06 #2

Very little HTML is well formed enough for an XML parser to read it. One
<br> instead of <br /> and you're toast. Attributes also need to be quoted
correctly. .NET parses ASP.NET files by looking for well-formed ASP.NET tags
(it is lax about quotes, though). Most of the HTML parsers I've seen are really XML
parsers and don't work well in the general case.
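
To make the point concrete, here is a small sketch, again in Python's standard library rather than .NET and with made-up sample markup and helper names (`parses_as_xml`, `InputCollector`, `input_names`). It shows a strict XML parser rejecting markup with an unclosed `<br>` and an unquoted attribute, while a lenient tag scanner (`html.parser.HTMLParser`) still finds the input tags.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical upload with typical real-world sloppiness: an unclosed <br>
# and an unquoted attribute value.
SLOPPY = '<form><input type=text name="city"><br><input name="zip"></form>'


def parses_as_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class InputCollector(HTMLParser):
    """Lenient scanner that records the name attribute of each <input> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attrs is a list of (name, value) pairs.
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def input_names(markup):
    """Collect input names without requiring well-formed markup."""
    collector = InputCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


print(parses_as_xml(SLOPPY))
print(input_names(SLOPPY))
```

The lenient scanner does not build a tree or check nesting; it only reacts to start tags, which is enough for listing inputs but not for validating the page.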

You could compile the page with the ASP.NET compiler, then load the DLL and
use reflection to walk the controls collection.

-- bruce (sqlwork.com)


Nov 28 '06 #3

Compiling the ASPX and using reflection is, I guess, one way to use the
underlying infrastructure. That sounds pretty tricky as well, though; you
may have to play around with some compiler settings.

In 1.0 I have doubts about it succeeding: if my memory serves me correctly, it just
compiles the code-behind and not the HTML within the ASPX.

In 2.0 at least you have the option, when publishing the site, to compile not
just the code-behind but also the contents of the ASPX into the DLL. I'd suggest
having a play with Ildasm to see what's available in your assembly before
you try writing a lot of code using reflection on it.

I did say it would be another matter if the HTML is not well formed. But I
thought perhaps you could tweak the XmlTextReader to read HTML by playing
with the XmlReaderSettings or something. I've never tried this, so I didn't
know. A quick look at the API tells me that you're quite right: it's tricky,
but not completely hopeless.

So I did a search to see if anyone has attempted it and found this...

http://www.gotdotnet.com/Community/U...4-c3bd760564bc

...which might be worth a look.

Cheers

Michael
http://www.mblmsoftware.com/



Nov 28 '06 #4

Thanks for the replies. I do find it interesting that there is no
obvious method for doing something that the framework must do
every time it loads an aspx page!

I will try parsing it as XML, and I'll try the SgmlReader API as well.

Nov 29 '06 #5
