Bytes | Software Development & Data Engineering Community

XML to plain text

I have a simple XML file that contains, in part, HTML content. I
wrap that content in <![CDATA[ ]]> sections. This works fine.

However, my application needs to output the XML (from a strongly typed
dataset) as a plain-text string. I am doing

theText = myDataset.GetXml()

This works, but it doesn't "remember" which portions were in CDATA sections,
so that content gets escaped, turning every HTML tag into &lt;b&gt;, etc.

Is there a simple way to output that data without escaping it, forcing
certain nodes to be written as CDATA sections? The GetXml() function accepts no
parameters.

Thanks!

MCD
Nov 20 '05 #1
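A CDATA section and escaped text are two spellings of the same character data, so a parser hands back identical strings for both; that is why GetXml() is free to pick the escaped form. A minimal VB.NET sketch of that equivalence, assuming only the System.Xml classes of the .NET Framework (the module name, the TextOf helper and the Body element are hypothetical). Note also that the keyword is case-sensitive: it must be written <![CDATA[ ... ]]>.

```vb
Imports System
Imports System.Xml

Module CDataEquivalence
    ' Parses an XML string and returns the character data of its document element.
    Public Function TextOf(ByVal xml As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xml)
        Return doc.DocumentElement.InnerText
    End Function

    Sub Main()
        ' The same content, written once as a CDATA section and once escaped.
        Dim asCData As String = "<Body><![CDATA[<b>bold</b>]]></Body>"
        Dim asEscaped As String = "<Body>&lt;b&gt;bold&lt;/b&gt;</Body>"

        ' Both parse to the identical string "<b>bold</b>", so this prints True.
        Console.WriteLine(TextOf(asCData) = TextOf(asEscaped))
    End Sub
End Module
```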
Richard
Since I believe this is something that can't be fixed in five seconds, try:
theText = Replace(theText, "&lt;", "<")
theText = Replace(theText, "&gt;", ">")
Nov 20 '05 #2
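Richard's Replace idea can be carried further. XML predefines five entities (&lt;, &gt;, &amp;, &quot; and &apos;), and &amp; has to be reversed last, otherwise escaped text such as "&amp;lt;" would wrongly turn into "<". A minimal VB.NET sketch; UnescapeXmlText is a hypothetical helper, numeric character references such as &#60; are deliberately not handled, and it is meant for one extracted value. Run over the whole GetXml() string, it would put raw markup characters back between the tags, which can leave a document that is no longer well-formed XML.

```vb
Imports System

Module UnescapeSketch
    ' Hypothetical helper: reverses the five predefined XML entities in one value.
    ' Numeric character references (for example &#60;) are not handled.
    Public Function UnescapeXmlText(ByVal s As String) As String
        s = s.Replace("&lt;", "<")
        s = s.Replace("&gt;", ">")
        s = s.Replace("&quot;", """")
        s = s.Replace("&apos;", "'")
        ' &amp; goes last, so an escaped "&amp;lt;" becomes the text "&lt;", not "<".
        s = s.Replace("&amp;", "&")
        Return s
    End Function

    Sub Main()
        ' HTML that itself contains an entity, as it would look after escaping.
        Console.WriteLine(UnescapeXmlText("&lt;b&gt;Tom &amp;amp; Jerry&lt;/b&gt;"))
    End Sub
End Module
```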
Cor
Hi BigD,

Did you mean this?

\\\
' Requires Imports System.Xml.Serialization at the top of the file.
' Start by building a sample dataset
Dim ds As New DataSet
Dim dt As New DataTable("parameters")
For c As Integer = 1 To 10
    Dim dc As New DataColumn("elem" & c.ToString)
    dt.Columns.Add(dc)
Next
For r As Integer = 1 To 10
    Dim dr As DataRow = dt.NewRow
    For c As Integer = 1 To 10
        ' or index by position with dr(c - 1), but the name shows the idea
        dr("elem" & c.ToString) = r.ToString & c.ToString
    Next
    dt.Rows.Add(dr) ' could also be added before filling, but I find this nicer
Next
ds.Tables.Add(dt)
' End of building the sample dataset

' Serialize the whole dataset to a string through a MemoryStream
Dim ser As XmlSerializer = New XmlSerializer(GetType(DataSet))
Dim ms As New IO.MemoryStream
Dim sw As IO.TextWriter = New IO.StreamWriter(ms)
ser.Serialize(sw, ds)
sw.Flush() ' make sure everything has reached the stream before reading it back
ms.Position = 0
Dim sr As IO.TextReader = New IO.StreamReader(ms)
Dim xmlstring As String = sr.ReadToEnd
sw.Close()
sr.Close()
ms.Close()
///
I hope this helps a little.

Cor
Nov 20 '05 #3

Hi Big D,

I have reviewed your issue and will spend some time researching it.

I will reply to you ASAP. Thanks for your understanding.

Best regards,
Jeffrey Tan
Microsoft Online Partner Support
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.

Nov 20 '05 #4
Cor
Hi Jeffrey,

You wrote that you will spend some time researching this issue and will
reply as soon as possible.

Meanwhile, there are already two answers in this thread that nobody has yet
checked to see whether they fit.

Does MOPS (Microsoft Online Partner Support) treat the people who ask
questions in this newsgroup differently?

If this is not an accident, I think it is embarrassing.

Cor
Nov 20 '05 #5
Richard,

Thanks for the reply.

Yes, obviously I could do that. However, for one thing, I'm not confident that
"<" and ">" are the only characters being escaped. Secondly, even if I could do
a find and replace on all the escaped characters, the content would no longer
be inside a <![CDATA[ ]]> section, so the result won't be a valid XML document.
(The GetXml() function just places the escaped data between the tags without
"knowing" that it was previously in a CDATA section.)

Is this part of the schema that I need to adjust so that it expects these
characters? If so, how?

Thanks for the input!

-MCD
Nov 20 '05 #6
Hey Cor,

Thanks for the reply. I haven't tried the code, but it doesn't seem
like what I need. First off, it appears that you are building the dataset
programmatically rather than from the schema, and building it from the schema
is a necessity for my design. The cool part of how I have it working is that
since it's a strongly typed dataset, it's super easy to work with: I don't
have to know everything about the schema in order to operate on parts of it,
and the GetXml() function does EXACTLY what I want, EXCEPT of course that it
escapes the "<" characters and such.

To me it seems like a schema issue. Previously I have just manually entered
the CDATA markup into fields where I knew there would be HTML. It seems like
VS should be able to know from a setting in the XSD that the element contains
characters that would otherwise need escaping. That way, when GetXml reads the
schema to output the data in the dataset, it would know what to do.

Maybe I'm dreaming.

;-)

Thanks!

MCD
Nov 20 '05 #7
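As far as this thread shows, nothing in the schema makes GetXml() emit CDATA, so one workaround is to post-process its output: load the string into an XmlDocument and replace the text of chosen elements with a CDATA section. A minimal VB.NET sketch under those assumptions; WrapElementsInCData, the Root/Item/Body names and the small untyped DataSet standing in for the strongly typed one are all hypothetical. Values containing "]]>" are left escaped, because a CDATA section cannot contain that sequence.

```vb
Imports System
Imports System.Data
Imports System.Xml

Module CDataSketch
    ' Hypothetical helper: post-processes XML such as DataSet.GetXml() output so that
    ' every element named elementName holds its text in a single CDATA section.
    ' Assumes those elements contain text only (no child elements).
    Public Function WrapElementsInCData(ByVal xml As String, ByVal elementName As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xml) ' insignificant whitespace (GetXml's indentation) is dropped

        ' Copy the matches first, because the node list returned here is live.
        Dim matches As XmlNodeList = doc.GetElementsByTagName(elementName)
        Dim targets(matches.Count - 1) As XmlNode
        For i As Integer = 0 To matches.Count - 1
            targets(i) = matches.Item(i)
        Next

        For Each el As XmlNode In targets
            Dim content As String = el.InnerText
            ' A CDATA section cannot contain "]]>", so such values stay escaped.
            If content.IndexOf("]]>") < 0 Then
                ' Remove the existing escaped text, then add one CDATA section.
                While el.HasChildNodes
                    el.RemoveChild(el.FirstChild)
                End While
                el.AppendChild(doc.CreateCDataSection(content))
            End If
        Next
        Return doc.OuterXml
    End Function

    Sub Main()
        ' A tiny untyped DataSet standing in for the strongly typed one.
        Dim ds As New DataSet("Root")
        Dim dt As New DataTable("Item")
        dt.Columns.Add("Body", GetType(String))
        dt.Rows.Add(New Object() {"<b>bold</b>"})
        ds.Tables.Add(dt)

        Console.WriteLine(WrapElementsInCData(ds.GetXml(), "Body"))
    End Sub
End Module
```

The result comes back without GetXml()'s indentation, since XmlDocument discards insignificant whitespace by default.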

Hi Big D,

Sorry to keep you waiting so long.

After consulting the product team, I have found the cause of the problem.

Actually, this behavior is by design.

This is the way XML is supposed to be serialized to a string. The "<"
character is not allowed to occur literally in text or attribute content
because it marks the beginning of markup, therefore we escape it as &lt;. We
also escape ">" for compatibility reasons. If you look at the content of the
dataset itself, though, you should see "<" and ">" in the value unescaped.

XML 1.0 specification, section 2.4 (Character Data and Markup):

The ampersand character (&) and the left angle bracket (<) may appear in
their literal form only when used as markup delimiters, or within a
comment, a processing instruction, or a CDATA section. If they are needed
elsewhere, they must be escaped using either numeric character references
or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>)
may be represented using the string "&gt;", and must, for compatibility, be
escaped using "&gt;" or a character reference when it appears in the string
"]]>" in content, when that string is not marking the end of a CDATA section.

So as a workaround, you may follow Richard's suggestion and post-process the
string yourself.

Best regards,
Jeffrey Tan
Microsoft Online Partner Support

Nov 20 '05 #8
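The rule Jeffrey quotes is easy to see with System.Xml.XmlTextWriter: WriteString escapes "<", ">" and "&" in text content, while WriteCData writes the value verbatim inside <![CDATA[ ... ]]>. A minimal VB.NET sketch; the WriteBody helper and the Body element are hypothetical. WriteCData throws an ArgumentException if the value contains "]]>", since that sequence would end the section early.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module EscapeSketch
    ' Hypothetical helper: writes <Body>value</Body>, either escaped or as CDATA.
    Public Function WriteBody(ByVal value As String, ByVal asCData As Boolean) As String
        Dim sw As New StringWriter()
        Dim w As New XmlTextWriter(sw)
        w.WriteStartElement("Body")
        If asCData Then
            ' Written verbatim; throws ArgumentException if value contains "]]>".
            w.WriteCData(value)
        Else
            ' "<", ">" and "&" in text content become &lt;, &gt; and &amp;.
            w.WriteString(value)
        End If
        w.WriteEndElement()
        w.Flush()
        Return sw.ToString()
    End Function

    Sub Main()
        Console.WriteLine(WriteBody("<b>a & b</b>", False))
        Console.WriteLine(WriteBody("<b>a & b</b>", True))
    End Sub
End Module
```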
Cor
Hi Jeffrey,

Now I am curious: can you tell me why I don't see that by-design behaviour
with the sample I sent?

Cor

Nov 20 '05 #9
Cor
Hi,

Looking over my sample again, I see that I forgot to mention that it needs
Imports System.Xml.Serialization, or else you have to put the namespace in
front of the type name (System.Xml.Serialization.XmlSerializer).

Cor
Nov 20 '05 #10

Hi Cor,

Sorry, but I cannot see how your solution solves the problem.

In your solution you build the dataset yourself; it contains no CDATA
section, and it contains no "special" characters (such as "<" or ">") either.

I have tested your approach properly in C#, with a dataset loaded from a file
that does contain a CDATA section, and it does not work either:
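// Requires: using System.Data; using System.IO; using System.Xml.Serialization;
private void button1_Click(object sender, System.EventArgs e)
{
    // Load a file whose HTML content is stored in a CDATA section
    DataSet ds = new DataSet();
    ds.ReadXml(@"D:\newtest.xml");

    // Serialize the whole dataset, exactly as in Cor's sample
    XmlSerializer ser = new XmlSerializer(typeof(DataSet));
    MemoryStream ms = new MemoryStream();
    TextWriter sw = new StreamWriter(ms);
    ser.Serialize(sw, ds);
    sw.Flush();

    ms.Position = 0;
    TextReader sr = new StreamReader(ms);
    string xmlstring = sr.ReadToEnd(); // the former CDATA content is now escaped
    sw.Close();
    sr.Close();
    ms.Close();
}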

Then, in the debugger, you will see that the content of the CDATA section in
my "D:\newtest.xml" is also escaped (that is, "<" becomes "&lt;").
Best regards,
Jeffrey Tan
Microsoft Online Partner Support

Nov 20 '05 #11
Cor
Hi Jeffrey,

Thank you for your message. It cleared up my confusion completely.

I was thinking of turning the dataset as a whole into a string, while the
problem is a portion of HTML text stored as a string in a dataset.

Thinking it over, the answer for Big D is of course very simple.

To get at those portions, read the file with dataset.ReadXml(path) and then
just write the items out to disk as needed with a StreamWriter, or just use
them directly.

(The answer may be "by design", but then I think it should be added that
data written this way by ds.WriteXml is read back in its proper form by
ds.ReadXml.)

Just my thoughts

Cor
Nov 20 '05 #12
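Cor's point can be checked directly: reading a document with ds.ReadXml stores the CDATA content as the plain string, HTML tags and all, and only writing it out again (GetXml or WriteXml) escapes it. A minimal VB.NET sketch, assuming DataSet schema inference maps the hypothetical <Root><Item><Body> document to a table named Item with a column named Body; the inline string stands in for a file such as D:\newtest.xml, and the ReadBody helper is hypothetical.

```vb
Imports System
Imports System.Data
Imports System.IO

Module ReadBackSketch
    ' Hypothetical helper: loads XML into a DataSet (schema inferred)
    ' and returns the Body value of the first Item row.
    Public Function ReadBody(ByVal xml As String) As String
        Dim ds As New DataSet()
        ds.ReadXml(New StringReader(xml))
        Return CStr(ds.Tables("Item").Rows(0)("Body"))
    End Function

    Sub Main()
        ' Stand-in for the file content, with the HTML held in a CDATA section.
        Dim xml As String = "<Root><Item><Body><![CDATA[<b>bold</b>]]></Body></Item></Root>"
        ' The stored value is the raw HTML; GetXml() would escape it again on output.
        Console.WriteLine(ReadBody(xml))
    End Sub
End Module
```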


Similar topics

Alfredo Agosti (3 replies): Hi folks, I have an Access 2000 db with a memo field. Into the memo field I put text with bold attributes, URL etc etc What I need to to is converting the rich text contained into the memo...
J. Alan Rueckgauer (10 replies): Hello. I'm looking for a simple way to do the following: We have a database that serves-up content to a website. Some of those items are events, some are news articles. They're stored in the...
pradeep gummi (3 replies): I have an XML FILE that is to be converted to Plain Text using an XSL file. Since I just want plain text, I do not want to set any root element during transformation.And if I do not any root...
Akseli Mäki (14 replies): Hi, Hopefully this is not too much offtopic. I'm working on a FAQ. I want to make two versions of it, plain text and HTML. I'm looking for a tool that will make a plain text doc out of the...
LRW (8 replies): I'm not sure this message is totally appropriate for this group, so please, if anyone has a better group suggestion, let me know! My company sends out a monthly newsletter in HTML format to our...
Mike Bridge (2 replies): Is there any way to get Internet explorer to treat a text/plain .net page as plain text using asp.net? It seems like IE doesn't trust text/plain as a mime type, and so it (ironically) displays it...
MarkMurphy (3 replies): I have a simple export.aspx page that allows a user to fill in a form to export some data. The postback logic writes the data to the response stream. I have two small issues: 1) The data is...
Doominato (8 replies): good day, I was just wondering how can I download a web page as plain text from a certain web site. I have tried to use the OpenURL() method from INET control in my VB.NET app, but it returns...
Eric Lindsay (10 replies): This may be too far off topic, however I was looking at this page http://www.hixie.ch/advocacy/xhtml about XHTML problems by Ian Hickson. It is served as text/plain, according to Firefox...
monomaniac21 (6 replies): hi all how can u send a plain text version of an email with the html so that the users mail client can access this plain text version? kind regards marc