Bytes | Software Development & Data Engineering Community

Replace special characters in xml but do not replace tags

I receive an XML file whose text content may contain characters like "<" or ">".

For example:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<Incident>
<RecordDivision> abc>efg</RecordDivision>
</Incident>

Now I want to replace the ">" character in the string abc>efg.

This is just a sample; I may receive any kind of XML.

The methods I found by googling also replace the "<" and ">" characters of the XML tags themselves.

My requirement is that the tags must not be replaced, but the content should be.
I am using C#.
Apr 16 '10 #1
Dormilich
8,658 Expert Mod 8TB
Ask your source to send you well-formed XML (which includes escaping such characters).

If the XML is formatted line by line (one complete element per line), you may be able to use a regex.

(not C# code)
// this isn't quite correct either, but I don't currently know how to describe it better
<([^ >]+)[^>]*>(.+)</\1>
// 1st parenthesis: name of the opening XML tag
// 2nd parenthesis: anything between the opening and the closing tag
// \1 (backreference): closing XML tag bearing the same name as the 1st parenthesis
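
A minimal Java sketch of how this regex might be applied, one line at a time. It assumes every element sits on a single line with no nested child elements, as in the sample. The class name `LinewiseEscaper` and the method `escapeLine` are made up for illustration. Only "<" and ">" are handled; "&" is left alone so that existing entities are not escaped twice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinewiseEscaper {
    // The pattern from the post: group 1 = tag name, group 2 = text between the tags.
    private static final Pattern ELEMENT_LINE =
            Pattern.compile("<([^ >]+)[^>]*>(.+)</\\1>");

    /** Escapes '<' and '>' in the text content of a single-line element. */
    static String escapeLine(String line) {
        Matcher m = ELEMENT_LINE.matcher(line);
        if (!m.find()) {
            return line; // declaration, lone tag, or anything else: leave untouched
        }
        String content = m.group(2)
                .replace("<", "&lt;")
                .replace(">", "&gt;");
        // Splice the escaped content back between the untouched tags.
        return line.substring(0, m.start(2)) + content + line.substring(m.end(2));
    }

    public static void main(String[] args) {
        // prints: <RecordDivision> abc&gt;efg</RecordDivision>
        System.out.println(escapeLine("<RecordDivision> abc>efg</RecordDivision>"));
    }
}
```

Lines such as the XML declaration or a bare <Incident> do not match the pattern and pass through unchanged. A line holding nested elements would have its inner tags escaped as text, which is the simplicity limit of this approach.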
Apr 16 '10 #2
No, they won't.

I have to handle this on my side.

Alas...
Apr 16 '10 #3
Dormilich
8,658 Expert Mod 8TB
"No, they won't."
Notify them that they are not serving well-formed XML (which contradicts the meaning of XML).
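
A small Java sketch of how one might test whether a document is well-formed before complaining, using the JDK's standard JAXP parser. The class name `WellFormedCheck` is made up for illustration. Note that under the XML 1.0 rules only "<" and "&" must be escaped in text; a lone ">" is permitted, so a conforming parser accepts abc>efg as content.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {
    /** Returns true if the JDK's XML parser accepts the text as well-formed. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the input
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a>abc>efg</a>")); // lone '>' in text
        System.out.println(isWellFormed("<a>abc<efg</a>")); // lone '<' in text
    }
}
```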
Apr 16 '10 #4
If you don't mind,

could you please explain how to use your regular expression <([^ >]+)[^>]*>(.+)</\1>?
Just a small example would be great, please.
Apr 16 '10 #5
Dormilich
8,658 Expert Mod 8TB
I don’t know C#, so I can’t. Besides, I would complain about the invalid XML a lot; there are web standards, after all.
Apr 16 '10 #6
Can we do something similar to this?
http://www.webdeveloper.com/forum/ar.../t-210232.html

It checks an HTML file.
Also, it is in PHP and I don't understand it.

Any help?
Apr 16 '10 #7
Dormilich
8,658 Expert Mod 8TB
You will encounter the same problems discussed there. There's no helping it, because the invalid XML is the source of the problem.

Otherwise it would work for you too if, and only if, your input has the simplicity the regex requires.

Can’t help with C#, though.
Apr 16 '10 #8
jkmyoung
2,057 Expert 2GB
This is a pretty good algorithm problem. You won't really be able to use any XML parsing tools, since this isn't XML, unless you're using them to find the errors. Unpolished idea follows:
Create a stack to keep track of your elements. Assuming you know your root element is <Incident>, take in input and output it until you reach an <Incident> start node. From here the real work begins.

1. Push <Incident> to the stack. All following output will be considered within the domain of the <Incident> node.

While we have input:
2. Continue parsing until you reach a <. If you see a lone >, convert it immediately to &gt;

3. When you see a <, add all the preceding output to the current node.

4. Check whether this is a start element, an end element, a complete empty element, or just a lone <.
Lone < - easy, convert it to &lt;
Empty element - render the complete element.
Start element - push a new element to the stack and continue.
End element - peek at the stack. If it matches, good: end the output for this element, pop it from the stack, and add its output to the preceding element. If there is no match, this means either:
1. One or more preceding tags are invalid.
2. This is an invalid tag; just render it.
To check 1, look through the stack for the corresponding start element. If it is not found, we go the route of 2. If it is found, we treat all the start elements in between as plain text.

E.g. if we have <a><b><c><d></b>...
then we look at the top, d, and have no match. We go backwards: c, no match; b, match. So we add the following output to the a element: <b>&lt;c&gt;&lt;d&gt;</b>

If you have tag soup like: <r> <b> </r> </b>
this algorithm will close the first element first, e.g. <r> &lt;b&gt; </r> &lt;/b&gt;


This algorithm is not guaranteed to give you what you want. It depends on the volatility of your xml.
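
The steps above could be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the program mentioned in this thread: the names `TagSoupEscaper`, `Frame` and `repair` are invented, a sentinel frame stands in for the known root element, "<" counts as a tag only if a ">" follows before the next "<" and the first character looks like a tag start, and comments or attribute values containing ">" are not handled. "&" is not touched.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TagSoupEscaper {
    /** One open element: its start-tag text plus the output collected inside it. */
    private static final class Frame {
        final String tag; // text between '<' and '>'; null for the sentinel root
        final StringBuilder out = new StringBuilder();
        Frame(String tag) { this.tag = tag; }
        String name() { return tag.split("\\s+", 2)[0]; }
        String asText() { return "&lt;" + tag + "&gt;" + out; }
    }

    static String repair(String in) {
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(null)); // sentinel that collects the final output
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '>') { // a '>' that did not close a tag
                stack.peek().out.append("&gt;");
                i++;
            } else if (c != '<') {
                stack.peek().out.append(c);
                i++;
            } else {
                int close = in.indexOf('>', i + 1);
                int nextOpen = in.indexOf('<', i + 1);
                boolean isTag = close > 0 && (nextOpen < 0 || nextOpen > close)
                        && looksLikeTag(in.substring(i + 1, close));
                if (!isTag) { // lone '<'
                    stack.peek().out.append("&lt;");
                    i++;
                } else {
                    handleTag(stack, in.substring(i + 1, close));
                    i = close + 1;
                }
            }
        }
        while (stack.size() > 1) { // elements that were never closed become text
            Frame open = stack.pop();
            stack.peek().out.append(open.asText());
        }
        return stack.peek().out.toString();
    }

    private static boolean looksLikeTag(String body) {
        if (body.isEmpty()) return false;
        char first = body.charAt(0);
        return Character.isLetter(first) || first == '_'
                || first == '/' || first == '?' || first == '!';
    }

    private static void handleTag(Deque<Frame> stack, String body) {
        char first = body.charAt(0);
        if (first == '/') {
            closeElement(stack, body);
        } else if (first == '?' || first == '!' || body.endsWith("/")) {
            // declaration, comment/doctype, or empty element: render as is
            stack.peek().out.append('<').append(body).append('>');
        } else {
            stack.push(new Frame(body)); // start element
        }
    }

    private static void closeElement(Deque<Frame> stack, String body) {
        String name = body.substring(1).trim();
        boolean isOpen = false;
        for (Frame f : stack) { // iterates from the top of the stack downwards
            if (f.tag != null && f.name().equals(name)) { isOpen = true; break; }
        }
        if (!isOpen) { // invalid end tag: just render it as text
            stack.peek().out.append("&lt;").append(body).append("&gt;");
            return;
        }
        while (!stack.peek().name().equals(name)) { // start tags in between become text
            Frame skipped = stack.pop();
            stack.peek().out.append(skipped.asText());
        }
        Frame done = stack.pop();
        stack.peek().out.append('<').append(done.tag).append('>')
                .append(done.out).append('<').append(body).append('>');
    }

    public static void main(String[] args) {
        // prints: <RecordDivision> abc&gt;efg</RecordDivision>
        System.out.println(repair("<RecordDivision> abc>efg</RecordDivision>"));
    }
}
```

With this sketch, `repair("<r> <b> </r> </b>")` gives `<r> &lt;b&gt; </r> &lt;/b&gt;`, matching the tag-soup example above.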

--edit
I am now seriously considering writing such a program for my own personal use, since I come across this problem a lot.
Apr 16 '10 #9
jkmyoung
2,057 Expert 2GB
Unfortunately for you I use primarily Java, but it shouldn't be that hard to port.
/**
 * @author jkmyoung
 * Simple Class to store Element name and inner xml, allowing for easy addition
 */
public static class Element{
    StringBuffer sb; // contents of element
    String ename;    // name of element

    /**
     * Creates element with given start tag
     * @param startTag starting tag. May contain spaces.
     */
    Element(String startTag){
        ename = startTag;
        sb = new StringBuffer();
    }

    Element(){
        ename = null;
        sb = new StringBuffer();
    }
    /**
     * Adds input to the element
     * @param input  The string input to be added.
     */
    void Add(String input){
        sb.append(input);
    }
    /**
     * Adds input to the element
     * @param input  The string input to be added.
     */
    void Add(char input){
        sb.append(input);
    }

    /**
     * Abruptly ends the element, treating it as text.
     * Can be used as a debugging function as well.
     * There may be
     * @pre Element is not the root.
     */
    String Truncate(){
        return "&lt;"+ename+"&gt;"+sb.toString();
    }

    /**
     * Ends and outputs the entire element.
     * @param endTag given endTag.
     * @pre Element is not the root, endTag matches startTag.
     */
    String ElementEnd(String endTag){
        //sb.append("<"); // slight speed increase if added to sb first, but this makes it hard to debug.
        //sb.append(endTag);
        //sb.append(">");
        //return "<"+ename+">"+sb.toString();
        return "<"+ename+">"+sb.toString()+"<"+endTag+">";
    }

    /**
     * Outputs the root
     */
    void OutputRoot(){
        System.out.println(sb.toString());

    }
}
This is the basis for the code I am working on. Need to test more before I release here.
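
A short sketch of how the class might be driven by a stack, replaying the <a><b><c><d></b> example from post #9. It carries a trimmed copy of Element so that it compiles on its own. The wrapper `ElementDemo` and the method `closeB` are invented for illustration, and passing "/b" (slash included) to ElementEnd is only a reading of the posted code, not something the thread confirms.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ElementDemo {
    // Trimmed copy of the Element class from the post.
    static class Element {
        final StringBuffer sb = new StringBuffer();
        final String ename;
        Element(String startTag) { ename = startTag; }
        void Add(String input) { sb.append(input); }
        String Truncate() { return "&lt;" + ename + "&gt;" + sb; }
        String ElementEnd(String endTag) {
            return "<" + ename + ">" + sb + "<" + endTag + ">";
        }
    }

    /** Replays <a><b><c><d></b> and returns what gets added to the a element. */
    static String closeB() {
        Deque<Element> stack = new ArrayDeque<>();
        for (String tag : new String[] {"a", "b", "c", "d"}) {
            stack.push(new Element(tag)); // four start tags seen so far
        }
        // "</b>" arrives: d and c were never closed, so they become text.
        while (!stack.peek().ename.equals("b")) {
            String text = stack.pop().Truncate();
            stack.peek().Add(text);
        }
        String b = stack.pop().ElementEnd("/b");
        stack.peek().Add(b); // the finished b element now belongs to a
        return b;
    }

    public static void main(String[] args) {
        // prints: <b>&lt;c&gt;&lt;d&gt;</b>
        System.out.println(closeB());
    }
}
```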
Apr 16 '10 #10
