The trouble with my code?

Any ideas on this? I am trying to loop through an XML document to remove
attributes, but I'm having so much trouble. Any help is appreciated.

//THIS IS THE EXCEPTION (SEE CODE LINE WHERE FAILURE OCCURS)

'//Unexpected XML declaration. The XML declaration must be the first node in
the document, and no white space characters are allowed to appear before it.
Line 13, position 11.

//THE XHTML TEXT WHICH IS BEING LOOKED AT

<table cellspacing="0" rules="all" border="1" id="dgArticles"
style="font-family:Arial;font-size:8pt;width:762px;border-collapse:collapse;">
<tr style="color:White;background-color:Blue;">
<td>&nbsp;</td><td style="width:0.75cm;">ID</td><td
style="width:7cm;">Title</td><td style="width:13cm;">Summary</td><td
style="width:1cm;">Published</td>
</tr><tr valign="Top">
<td><a href='Articles/Art226/Art226.html'
target=_blank>Open</a></td><td>226</td><td>SQL Server 2005
Permissions</td><td>See this article for a handy reference to the complete
list of permissons on SQL Server 2005 </td><td>28/12/2006</td>
</tr><tr valign="Top">
<td><a href='Articles/Art223/Art223.html'
target=_blank>Open</a></td><td>223</td><td>SQL Schemas In SQL
2005</td><td>Want to know a little more about schemas in SQL Server 2005,
take a look at this quick overview. </td><td>25/12/2006</td>
</tr><tr valign="Top">
<td><a href='Articles/Art224/Art224.html'
target=_blank>Open</a></td><td>224</td><td>SQL Server 2005 - Must_Change
option</td><td>When de-checking Enforce Password Policy, SQL Security
responds with an error and refers to Must_Change being in force. This
article shows you how to reverse this. </td><td>27/12/2006</td>
</tr><tr valign="Top">
<td><a href='Articles/Art220/Art220.html'
target=_blank>Open</a></td><td>220</td><td>Installing Adventureworks
Sample</td><td>If you dont install the samples for Adventureworks first
time, getting them on can be a little tricky. This article explains.
</td><td>23/12/2006</td>
</tr>
</table>

'// THE CODE WHICH PROCESSES THE xhtml

Private Sub useXmlDocButton_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles useXmlDocButton.Click

    GC.Collect()

    'Clear message
    Me.messageTextBox.Text = String.Empty

    Dim xmlString As String

    '//Some pre-processing here
    xmlString = Me.sourcetextBox.Text.ToLower

    '//Remove nbsp
    xmlString = Regex.Replace(xmlString, "&nbsp;", "")

    '//Remove any explorer codes
    xmlString = Regex.Replace(xmlString, "&[a-zA-Z0-9]*;", "")

    '//Remove any unquoted attributes which appear at the end of a tag
    xmlString = Regex.Replace(xmlString, " [A-Za-z0-9]*=[A-Za-z0-9_]*>", ">")

    '//Remove any unquoted attributes which appear before the end of a tag
    xmlString = Regex.Replace(xmlString, " [A-Za-z0-9]*=[A-Za-z0-9_]* ", "")

    'Finally prepend the xml declaration needed
    xmlString = "<?xml version='1.0' encoding='utf-8'?>" & xmlString

    Me.sourcetextBox.Text = xmlString

    'Get the xml into a stream
    Dim stream As New System.IO.MemoryStream
    stream.Write((New System.Text.UTF8Encoding).GetBytes(xmlString), 0, _
        xmlString.Length)
    stream.Position = 0

    Dim xDoc As New System.Xml.XmlDocument
    xDoc.Load(stream)
    stream.Position = 0

    Dim xreader As New System.Xml.XmlTextReader(stream)
    Dim xNode As System.Xml.XmlNode
    stream.Position = 0

    While xreader.Read()
        If xreader.NodeType = Xml.XmlNodeType.Element Then
            xNode = xDoc.ReadNode(xreader) '//************* THIS IS WHERE IT FAILS //
            xNode.Attributes.RemoveAll()
        End If
    End While

    Dim sr As New System.IO.StreamReader(stream)
    stream.Position = 0
    targetTextBox.Text = sr.ReadToEnd
    sr.Close()
    sr.Dispose()
    xreader.Close()
    stream.Close()
    stream.Dispose()

End Sub


Jan 6 '07 #1
Try something along these lines instead (VB.NET 2.0):

Dim xmlString As String = Me.sourcetextBox.Text.ToLower
xmlString = Regex.Replace(xmlString, _
    "&nbsp;", "")
xmlString = Regex.Replace(xmlString, _
    "&[a-zA-Z0-9]*;", "")
xmlString = Regex.Replace(xmlString, _
    " [A-Za-z0-9]*=[A-Za-z0-9_]*>", ">")
xmlString = Regex.Replace(xmlString, _
    " [A-Za-z0-9]*=[A-Za-z0-9_]* ", "")

'NOT SURE WHY YOU'D WANT THIS BUT NO HARM IN IT
xmlString = "<?xml version='1.0' encoding='utf-8'?>" & xmlString

Dim tmpDoc As New XmlDocument
tmpDoc.LoadXml(xmlString)
ZapAttributes(tmpDoc.SelectSingleNode("/table"))
Me.targetTextBox.Text = tmpDoc.InnerXml

[...]

'Recursively strip the attributes from a node and all of its descendants
Private Sub ZapAttributes(ByVal xNode As XmlNode)
    If xNode.Attributes IsNot Nothing Then
        xNode.Attributes.RemoveAll()
    End If
    For Each child As XmlNode In xNode.ChildNodes
        ZapAttributes(child)
    Next
End Sub
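
If the document does have to go through a MemoryStream, one detail in the original code may be worth a look: Stream.Write takes a byte count, while xmlString.Length is a character count. The two agree for pure ASCII text (which the posted sample appears to be once the entities are stripped, so this is probably not what causes the exception), but they differ as soon as a character needs more than one UTF-8 byte. A minimal sketch, where the module name StreamSketch and the helper ToUtf8Stream are made up for illustration:

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module StreamSketch
    ' Hypothetical helper (not from the thread): encodes the text once
    ' and writes every encoded byte to a new MemoryStream.
    Function ToUtf8Stream(ByVal text As String) As MemoryStream
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(text)
        Dim ms As New MemoryStream()
        ms.Write(bytes, 0, bytes.Length)   ' byte count, not text.Length
        ms.Position = 0                    ' rewind so a reader starts at the beginning
        Return ms
    End Function

    Sub Main()
        ' The accented character takes two bytes in UTF-8, so the string
        ' has fewer characters than the stream has bytes.
        Dim markup As String = "<td>café</td>"
        Using ms As MemoryStream = ToUtf8Stream(markup)
            Dim doc As New XmlDocument()
            doc.Load(ms)
            Console.WriteLine(doc.DocumentElement.InnerText)
        End Using
    End Sub
End Module
```

With xmlString.Length as the count, the last byte of a string like this one would be dropped and the closing tag would arrive truncated.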

Ralf
--
--
----------------------------------------------------------
* ^~^ ^~^ *
* _ {~ ~} {~ ~} _ *
* /_``>*< >*<''_\ *
* (\--_)++) (++(_--/) *
----------------------------------------------------------
There are no advanced students in Aikido - there are only
competent beginners. There are no advanced techniques -
only the correct application of basic principles.
Jan 6 '07 #2
Thanks for your help. But it doesn't really answer my question about my own
failing code. Where am I going wrong? This is important for me to learn, as I
need to know why it's failing.

Many Thanks
Jan 6 '07 #3

"Just Me" <news.microsoft.com> wrote in message
news:OY**************@TK2MSFTNGP04.phx.gbl...
:
: Thanks for your help. But it doesn't really answer my question about
: my own failing code. Where am I going wrong? This is important for
: me to learn, as I need to know why it's failing.
:
: Many Thanks

<snip>

Well, at first glance it would appear that the problem is here:

=============================
xmlString = "<?xml version='1.0' encoding='utf-8'?>" & xmlString
=============================

This is the Xml Declaration the exception is referring to. However, if
you remove this line you just end up with a different exception -
"There are multiple root elements" - so in reality, the xml
declaration isn't the actual problem.

What these two exceptions have in common is that they both report
the underlying xml as malformed, and I think that is an important
clue. I'm not an expert with the memory stream object, so I cannot
give you a specific answer as to what is happening, but it appears that
when the xml reader gets to the end of the memory stream, it is
looping back on itself. What the xml text reader object therefore ends
up seeing is something like this:

<?xml version='1.0'?>
<table>
<tr>
[...]
</tr>
</table>
<?xml version='1.0'?>
<table>
<tr>
[...]
</tr>
</table>

In the first exception message, it's objecting because it thinks it's
seeing the <?xml...?> declaration embedded in the complete document.
In the second exception, it's objecting to what it thinks is a second
root element.

As I've stated, I'm not familiar with the memory stream object, so I
don't know for a fact that this is what is happening, but it certainly
strikes me as plausible. This argument is reinforced when you consider
that if you copy the xml into a text file and make the following
change, the XmlExceptions go away:

'Dim xreader As New System.Xml.XmlTextReader(stream)
Dim xreader As New System.Xml.XmlTextReader("xhtmldoc.xml")
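
A related point that may be worth checking separately. This is only a sketch, not a diagnosis of the exception. As far as I know, XmlDocument.ReadNode builds a new node from whatever the reader is positioned on and does not attach that node to the document tree, so calling RemoveAll() on the result would leave both the loaded document and the stream untouched. The module name ReadNodeSketch and the function ReadNodeIsDetached are made up for illustration, and a StringReader stands in for the memory stream:

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module ReadNodeSketch
    ' Hypothetical check (not from the thread): returns True when the node
    ' produced by ReadNode has no parent and the loaded document still has
    ' its attributes after RemoveAll() was called on that node.
    Function ReadNodeIsDetached(ByVal markup As String) As Boolean
        Dim doc As New XmlDocument()
        doc.LoadXml(markup)

        Using reader As XmlReader = XmlReader.Create(New StringReader(markup))
            reader.MoveToContent()                     ' position on the root element
            Dim copy As XmlNode = doc.ReadNode(reader) ' new node built from the reader
            copy.Attributes.RemoveAll()                ' changes only the copy

            Return copy.ParentNode Is Nothing AndAlso _
                   doc.DocumentElement.Attributes.Count > 0
        End Using
    End Function

    Sub Main()
        Console.WriteLine(ReadNodeIsDetached( _
            "<table border='1'><tr><td>ID</td></tr></table>"))
    End Sub
End Module
```

If that holds, walking the nodes of the loaded XmlDocument directly (as the ZapAttributes routine earlier in the thread does) avoids the second reader altogether, and the result can be taken from the document rather than read back from the stream.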

Ralf
Jan 7 '07 #4
Ok Ralf

Thanks for your insight into this problem. I find this whole area a little
confusing; there seem to be so many ways of skinning the same cat. You have
the XPath stuff, the XmlDocument itself, the XmlReader, the streams.

Blows my head off sometimes.

I am trying to alter the code you gave me so that I can re-apply specific
class attributes: one to the first row, another to the table's cells, and one
for the table tag itself.

I seem to have almost got it, but not quite.

Thanks anyway for your help.
Jan 7 '07 #5

Just Me wrote :
<backposted/>

If what you want is to extract the contents of the html in a structured
way, then I suggest you use a tool to convert html to xml first --
there are so many details in dealing with html that any ad hoc approach
is sure to leave something out.

It seems HTML Tidy is such a tool (I have never used it, so I can't say
anything about it).

Another approach you may consider is using the WebBrowser control to
"navigate" the document structure. Maybe it's easier than your current
approach:

<aircode>
Private WithEvents WB As WebBrowser
Private mText As String

Sub ExtractText(ByVal Text As String)
    mText = ""
    If WB Is Nothing Then WB = New WebBrowser
    WB.DocumentText = Text
End Sub

Private Sub WB_DocumentCompleted( _
        ByVal sender As System.Object, _
        ByVal E As WebBrowserDocumentCompletedEventArgs _
        ) Handles WB.DocumentCompleted

    Dim S As New System.Text.StringBuilder
    MapHtmlItems(WB.Document.Body.Children, S, 0)
    mText = S.ToString
    Debug.Print(mText)
End Sub

Sub MapHtmlItems(ByVal Items As HtmlElementCollection, _
        ByVal Builder As System.Text.StringBuilder, _
        ByVal Level As Integer)

    For Each E As HtmlElement In Items
        MapHtmlItem(E, Builder, Level)
    Next

End Sub

Sub MapHtmlItem(ByVal Element As HtmlElement, _
        ByVal Builder As System.Text.StringBuilder, _
        ByVal Level As Integer)

    If Element.CanHaveChildren Then
        Dim Tag As String = Element.TagName
        Dim Text As String = Nothing

        If Element.Children.Count = 0 Then
            Text = Element.InnerText
        End If

        Select Case Element.TagName.ToLower
            Case "table", "tr", "td"
                'does nothing
            Case Else
                Tag = Nothing
        End Select

        Dim Tab As String = New String(" "c, Level * 2)
        If Not String.IsNullOrEmpty(Text) Then
            Dim S As String
            If Not String.IsNullOrEmpty(Tag) Then
                S = String.Format("{0}<{1}>{2}</{1}>", Tab, Tag, Text)
            Else
                S = String.Format("{0}{1}", Tab, Text)
            End If
            Builder.AppendLine(S)
        Else
            If Not String.IsNullOrEmpty(Tag) Then
                Builder.AppendLine(String.Format("{0}<{1}>", Tab, Tag))
            End If

            MapHtmlItems(Element.Children, Builder, Level + 1)

            If Not String.IsNullOrEmpty(Tag) Then
                Builder.AppendLine(String.Format("{0}</{1}>", Tab, Tag))
            End If

        End If

    End If

End Sub

</aircode>

The previous code will extract all table structures from the html text
you provide. To do this, just pass the text to ExtractText(); the result
will be saved in the mText field. Maybe this can give you new
ideas. ;-)

HTH.

Regards,

Branco.
Jan 8 '07 #6
In the end I was able to get just what I needed. Here you go!

Imports System.Xml
Imports System.Text.RegularExpressions

'(the Form class declaration was omitted from the post)

Private idNo As Integer
Private rowCount As Integer

Private Sub processButton_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles processButton.Click

    Dim xmlString As String

    idNo = 0
    rowCount = 0 'reset so that a second click still treats the first row as the header

    '//Some pre-processing here
    xmlString = Me.sourceTextbox.Text.ToLower

    '//Remove nbsp
    xmlString = Regex.Replace(xmlString, "&nbsp;", "")

    '//Remove any explorer codes
    xmlString = Regex.Replace(xmlString, "&[a-zA-Z0-9]*;", "")

    '//Remove any unquoted attributes which appear at the end of a tag
    xmlString = Regex.Replace(xmlString, "\sp*[A-Za-z0-9]*=[A-Za-z0-9_]*>", ">")

    '//Remove any unquoted attributes which appear before the end of a tag
    xmlString = Regex.Replace(xmlString, "\sp*[A-Za-z0-9]*=[A-Za-z0-9_]* ", "")

    Dim tmpDoc As New XmlDocument

    Try
        tmpDoc.LoadXml(xmlString)
        ZapAttributes(tmpDoc.SelectSingleNode("/table"), tmpDoc)
        Me.targetTextBox.Text = tmpDoc.InnerXml
    Catch ex As XmlException
        'Malformed input is silently ignored here
    End Try

End Sub

'Strip every attribute, then re-apply a class attribute by element type
Private Sub ZapAttributes(ByVal xNode As XmlNode, _
        ByVal xd As System.Xml.XmlDocument)

    If Not (xNode.Attributes Is Nothing) Then

        Dim xAttr As System.Xml.XmlAttribute

        xNode.Attributes.RemoveAll()

        Select Case xNode.Name
            Case "table"
                xAttr = xd.CreateAttribute("class")
                xAttr.Value = "ArticleTableTag"
                xNode.Attributes.Append(xAttr)
            Case "tr"
                rowCount += 1
            Case "td"
                If rowCount = 1 Then
                    'Cells of the first row are the header cells
                    xAttr = xd.CreateAttribute("class")
                    xAttr.Value = "ArticleTableHeader"
                    xNode.Attributes.Append(xAttr)
                ElseIf rowCount > 1 Then
                    xAttr = xd.CreateAttribute("class")
                    xAttr.Value = "ArticleTableCells"
                    xNode.Attributes.Append(xAttr)
                End If
            Case "a"
        End Select

    End If

    For Each child As XmlNode In xNode.ChildNodes
        ZapAttributes(child, xd)
    Next

End Sub

End Class
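
For anyone who wants to try the unquoted-attribute pattern in isolation, here is a small console sketch. It uses a plain \s in place of the \sp* from the post, on the assumption that the p* was not intended; the module name RegexSketch and the helper StripTrailingUnquoted are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexSketch
    ' Hypothetical helper (not from the thread): drops one unquoted
    ' attribute that sits immediately before the closing ">" of a tag.
    Function StripTrailingUnquoted(ByVal html As String) As String
        Return Regex.Replace(html, "\s[A-Za-z0-9]*=[A-Za-z0-9_]*>", ">")
    End Function

    Sub Main()
        ' The quoted href is kept; the unquoted target is removed.
        Console.WriteLine(StripTrailingUnquoted( _
            "<a href='x.html' target=_blank>open</a>"))
    End Sub
End Module
```

A quoted value at the end of a tag is left alone because the quote character is not in the value's character class. For the same reason an unquoted value such as width=100% would not be matched either, so the pattern only covers simple cases like the ones in this table.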
Jan 8 '07 #7
