Finding and replacing Invalid Tokens in an XML document

Hi all,

I have a system which allows users to enter a message on a (PHP) website.
This message is then put into a (MySQL) Database.

A perl script then picks up the message and creates an XML document.

The webpages, database and XML are all UTF-8. However, every now and then
the XML parser reports an invalid token. This occurs when the message
contains particular characters, although I don't know which ones: all I
can see in the logs is the ANSI representation (e.g. @^C). If I copy and
paste it into Word, I get a square box after the @ that takes two
right-cursor presses to get past.

My script catches the invalid-token error, but rather than fail the
message completely, I would like to replace the bad characters with a
space.
Is there a simple way to find these characters, or do I have to
write a function that looks at the output of $@ from the eval and works out
where the character is from the line/column/byte information in order to
fix it?

FYI, the XML is created and parsed with XML::Simple and UTF-8 encoded with
encode. I have included a simplified snippet (written into this post, so
may contain typos) at the end of the email.

Cheers,

Ben

-- Snippet of Code --

# $MessageText is pulled from the database and may contain bad characters.

use Encode qw(encode);
use XML::Simple;

# Build a hash of the elements
my %arr;
$arr{'Message'} = encode("UTF-8", $MessageText);

# Convert the hash into an XML document with XMLout
my $tempxml = XML::Simple->new(NoAttr => 1, RootName => 'WebMessage');
my $xmldoc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
$xmldoc .= $tempxml->XMLout(\%arr);

# Parse the XML document
my $tempxml2 = XML::Simple->new(ForceArray => 1);
eval { $tempxml2->XMLin($xmldoc); };
if ($@)
{
    # An error occurred. Usually an invalid token due to a bad character
    # in $MessageText
}
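
One alternative to working out the position from $@ might be to clean the text before the XML is built. The following is only a minimal sketch, assuming that $MessageText is a decoded Perl character string (not raw bytes) and that the failures come from characters outside the XML 1.0 Char range, such as the ^C (U+0003) seen in the logs. The name replace_invalid_xml_chars is a hypothetical helper, not part of XML::Simple or Encode.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character that XML 1.0 does not
# allow with a single space. The allowed set (the "Char" production) is
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Assumption: $text is a character string, i.e. it has not yet been
# passed through encode("UTF-8", ...).
sub replace_invalid_xml_chars {
    my ($text) = @_;
    $text =~ s/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/ /g;
    return $text;
}
```

Under those assumptions it would be called before encoding, for example `$arr{'Message'} = encode("UTF-8", replace_invalid_xml_chars($MessageText));`, so the eval around XMLin remains only as a safety net for other errors.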

Jan 6 '06 #1