Replace special characters in xml but do not replace tags

My requirement is this: I receive an XML file whose content may contain characters like "<" or ">".

For example:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<RecordDivision> abc>efg</RecordDivision>

I want to replace the ">" character in the string abc>efg.

This is just a sample XML; I may receive any type of XML.

The other methods I found by googling also replace the "<" and ">" characters of the XML tags.

My requirement is that the tags should not be replaced, but the content should be.
I am using C#.
Apr 16 '10 #1
8,658 Expert Mod 8TB
Ask your source to send you well-formed XML (which includes escaping such characters).
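
For reference, "escaping" here would mean the sender writes abc&gt;efg instead of abc>efg. Below is a minimal sketch of what the producing side might do, in Java since that is the language used for code later in this thread. The names XmlText and escape are made up for illustration.

```java
final class XmlText {
    private XmlText() {}

    /** Escapes the characters that are markup-significant in XML text content. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break; // must be escaped in text
                case '<': out.append("&lt;");  break; // must be escaped in text
                case '>': out.append("&gt;");  break; // commonly escaped as well
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this, XmlText.escape("abc>efg") gives abc&gt;efg. It only works on text that is known to be content, which is exactly what the sender knows and the receiver does not.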

If the XML is formatted linewise, you may be able to use a regex:

(not C# code)
// this ain’t correct either, but currently I don’t know how to describe it better
<([^ >]+)[^>]*>(.+)</\1>
// 1st parenthesis: opening XML tag name
// 2nd parenthesis: anything that is between 1 and 3
// 3rd: closing XML tag bearing the same content as 1
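
A rough sketch of how this pattern could be applied line by line, written in Java (porting it to C# should not be hard). LinewiseEscaper and escapeLine are hypothetical names. It only escapes < and > inside the second group, and it assumes one simple element per line; nested elements on the same line would have their inner tags escaped too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinewiseEscaper {
    // The pattern from this post: group 1 = tag name, group 2 = text between the tags.
    private static final Pattern ELEMENT_LINE =
            Pattern.compile("<([^ >]+)[^>]*>(.+)</\\1>");

    private LinewiseEscaper() {}

    /** Escapes < and > in the text between a start tag and its end tag on one line. */
    static String escapeLine(String line) {
        Matcher m = ELEMENT_LINE.matcher(line);
        if (!m.find()) {
            return line; // e.g. the XML declaration, or a line without a tag pair
        }
        String escaped = m.group(2)
                .replace("<", "&lt;")
                .replace(">", "&gt;");
        // Rebuild the line: everything before group 2, the escaped text, everything after.
        return line.substring(0, m.start(2)) + escaped + line.substring(m.end(2));
    }
}
```

For the sample line, escapeLine("<RecordDivision> abc>efg</RecordDivision>") returns <RecordDivision> abc&gt;efg</RecordDivision>, while the <?xml ... ?> line does not match and is returned unchanged.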
Apr 16 '10 #2
No, they won't.

I have to handle this on my side.

Apr 16 '10 #3
8,658 Expert Mod 8TB
"no they won't."
Notify them that they don’t serve well-formed XML (which contradicts the meaning of XML).
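
One way to show the sender where a file breaks is to run it through a standard parser and report the first error. A minimal sketch in Java using the JDK's SAX parser follows; WellFormedCheck and firstError are hypothetical names. Note that under the XML 1.0 grammar only < and & must be escaped in text, so a bare > as in the sample from the first post should pass this check, while a stray < will not.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {
    private WellFormedCheck() {}

    /** Returns null when the text parses, otherwise "line:column message" for the first error. */
    static String firstError(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // DefaultHandler rethrows fatal errors, so the position is available here.
            return e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.toString();
        }
    }
}
```

The returned position and message can be sent back to the source as evidence that the document is not well-formed.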
Apr 16 '10 #4
If you don't mind,

could you please explain how to use your regular expression <([^ >]+)[^>]*>(.+)</\1>?
Just a small example would be great, please.
Apr 16 '10 #5
8,658 Expert Mod 8TB
I don’t know C#, so I can’t. Besides, I would complain about the invalid XML a lot; there are web standards, after all.
Apr 16 '10 #6
Can we do something similar to this?

It checks an HTML file.
It is also in PHP, which I don't understand.

Any help?
Apr 16 '10 #7
8,658 Expert Mod 8TB
You will encounter the same problems discussed there. There is no helping it, because the invalid XML is the source of the problem.

Otherwise it would work for you too if, and only if, your input has the simplicity the regex requires.

I can’t help with C#, though.
Apr 16 '10 #8
2,057 Expert 2GB
This is a pretty good algorithm problem. You won't really be able to use any XML parsing tools since this isn't XML, unless you're using them to find the errors. An unpolished idea follows:
Create a stack to keep track of your elements. Assuming you know your root element is <Incident>, take in input and output it until you reach an <Incident> start node. From here the real work begins.

1. Push <Incident> to the stack. All following output will be considered within the domain of the <Incident> node.

While we have input:
2. Continue parsing until you reach a <. If you see a lone >, convert it immediately to &gt;

3. When you see a <, add all the preceding output to the current node.

4. Check whether this is a start element, an end element, a complete empty element, or just a lone <.
Lone <: easy, convert it to &lt;
Empty element: render the complete element.
Start element: push a new element to the stack and continue.
End element: peek at the stack. If it matches, good. End the output for this element, pop the element from the stack, and add its output to the preceding element. If there is no match, this means either:
1. One or more preceding tags are invalid.
2. This is an invalid tag; just render it.
To check 1, look through the stack for the corresponding start element. If it's not found, go the route of 2. If it is found, treat all the start elements in between as just text.

E.g. if we have <a><b><c><d></b>...
then we look at the top, d, and have no match. We go backwards: c, no match; b, match. So we add the following output to the a element: <b>&lt;c&gt;&lt;d&gt;</b>

If you have tag soup like <r> <b> </r> </b>,
this algorithm will render the first one first, e.g. <r> &lt;b&gt; </r> &lt;/b&gt;

This algorithm is not guaranteed to give you what you want. It depends on the volatility of your xml.

I am now seriously considering writing such a program for my own personal use, since I come across this problem a lot.
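
The steps above can be sketched roughly as follows in Java. This is an illustration under assumptions, not a finished tool. TagSoupEscaper, repair and the TAG pattern are made-up names, and the test for "looks like a tag" is a guess. Instead of a known root element such as <Incident>, it uses a nameless collector at the bottom of the stack. Comments, CDATA, DOCTYPE and a > inside attribute values are not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagSoupEscaper {
    // Assumed tag shape: optional '/', a name, optional attributes, optional '/'.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z_][\\w.:-]*)(\\s[^<>]*?)?(/?)>");

    private static final class Open {
        final String name;      // null for the synthetic bottom-of-stack collector
        final String startTag;  // raw start tag text, e.g. <b id="1">
        final StringBuilder body = new StringBuilder();
        Open(String name, String startTag) { this.name = name; this.startTag = startTag; }
    }

    private TagSoupEscaper() {}

    static String repair(String input) {
        Deque<Open> stack = new ArrayDeque<>();
        stack.push(new Open(null, ""));
        Matcher m = TAG.matcher(input);
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '>') { stack.peek().body.append("&gt;"); i++; continue; } // lone >
            if (c != '<') { stack.peek().body.append(c); i++; continue; }      // plain text
            if (input.startsWith("<?", i) && input.indexOf("?>", i) >= 0) {
                int end = input.indexOf("?>", i) + 2;  // copy <?xml ... ?> through unchanged
                stack.peek().body.append(input, i, end);
                i = end;
                continue;
            }
            if (!m.region(i, input.length()).lookingAt()) {                    // lone <
                stack.peek().body.append("&lt;");
                i++;
                continue;
            }
            String tag = m.group();
            i = m.end();
            if (!m.group(1).isEmpty()) {
                close(stack, m.group(2), tag);          // end element
            } else if (!m.group(4).isEmpty()) {
                stack.peek().body.append(tag);          // empty element, rendered as is
            } else {
                stack.push(new Open(m.group(2), tag));  // start element
            }
        }
        while (stack.size() > 1) truncateTop(stack);    // unclosed start tags become text
        return stack.peek().body.toString();
    }

    private static void close(Deque<Open> stack, String name, String endTag) {
        boolean found = false;
        for (Open o : stack) {                          // iterates from the top downwards
            if (name.equals(o.name)) { found = true; break; }
        }
        if (!found) {                                   // stray end tag: render as text
            stack.peek().body.append(escape(endTag));
            return;
        }
        while (!name.equals(stack.peek().name)) truncateTop(stack);
        Open done = stack.pop();
        stack.peek().body.append(done.startTag).append(done.body).append(endTag);
    }

    // Treats the top element's start tag as text and merges it into its parent.
    private static void truncateTop(Deque<Open> stack) {
        Open dead = stack.pop();
        stack.peek().body.append(escape(dead.startTag)).append(dead.body);
    }

    private static String escape(String s) {
        return s.replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

On the two examples above, repair("<a><b><c><d></b></a>") gives <a><b>&lt;c&gt;&lt;d&gt;</b></a> and repair("<r> <b> </r> </b>") gives <r> &lt;b&gt; </r> &lt;/b&gt;. Text that happens to look like a tag cannot be told apart from a real one, so the caveat about volatile input still applies.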
Apr 16 '10 #9
2,057 Expert 2GB
Unfortunately for you I use primarily Java, but it shouldn't be that hard to port.
/**
 * @author jkmyoung
 * Simple Class to store Element name and inner xml, allowing for easy addition
 */
public static class Element{
    StringBuffer sb; // contents of element
    String ename;    // name of element

    /**
     * Creates element with given start tag
     * @param startTag starting tag. May contain spaces.
     */
    Element(String startTag){
        ename = startTag;
        sb = new StringBuffer();
    }

    Element(){
        ename = null;
        sb = new StringBuffer();
    }
    /**
     * Adds input to the element
     * @param input  The string input to be added.
     */
    void Add(String input){
        sb.append(input);
    }
    /**
     * Adds input to the element
     * @param input  The string input to be added.
     */
    void Add(char input){
        sb.append(input);
    }

    /**
     * Abruptly ends the element, treating it as text.
     * Can be used as a debugging function as well.
     * There may be
     * @pre Element is not the root.
     */
    String Truncate(){
        return "&lt;"+ename+"&gt;"+sb.toString();
    }

    /**
     * Ends and outputs the entire element.
     * @param endTag given endTag.
     * @pre Element is not the root, endTag matches startTag.
     */
    String ElementEnd(String endTag){
        //sb.append("<"); // slight speed increase if added to sb first, but this makes it hard to debug.
        //sb.append(endTag);
        //sb.append(">");
        //return "<"+ename+">"+sb.toString();
        return "<"+ename+">"+sb.toString()+"<"+endTag+">";
    }

    /**
     * Outputs the root
     */
    void OutputRoot(){
        System.out.println(sb.toString());

    }
}
This is the basis for the code I am working on. Need to test more before I release here.
Apr 16 '10 #10
