Escape Sequences Not Working?

If you examine the complete XML below you will see an element "Notes"
consisting of...

<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>

As you can see I have properly (I think) escaped the ampersand (&) with
"&amp;". If I place this XML in a file and open it with Internet Explorer
the ampersand is properly dealt with. In my Java servlet I am using a SAX
parser to parse the XML and write it to a database. When that parser gets
to the "Notes" element all that is returned is the characters up to (not
including) the ampersand in the escape sequence. Everything after that is
truncated. I have found that this will happen with any escape sequence
(since they all start with the ampersand).

I get no errors and the record is written to the database, just with a
truncated Notes field.

Any ideas what I can look for?

<?xml version="1.0"?>
<MBO>
<Record>
<ID>-49781293</ID>
<OrderDate>2004-08-24 15:19:31</OrderDate>
<MemoBillType>5</MemoBillType>
<AccountNum>1</AccountNum>
<BillToAddress>TEST</BillToAddress>
<ShipToAddress>Same as Bill To Address</ShipToAddress>
<RegMgr>John Doe</RegMgr>
<SecCode>308040-860602</SecCode>
<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>
<RequireDate>TEST</RequireDate>
<RackInfo>TEST</RackInfo>
<CallPhoneNumber>TEST TEST</CallPhoneNumber>
<SubRecord_A>
<LineNum>1</LineNum>
<Quantity>1</Quantity>
<PartNum>TEST</PartNum>
<ShipDesignation>TEST</ShipDesignation>
<Price>NULL_VALUE</Price>
<Discount>NULL_VALUE</Discount>
<Notes>TEST TEST TEST</Notes>
</SubRecord_A>
</Record>
</MBO>
Jul 20 '05 #1


Rick Brandt wrote:
If you examine the complete XML below you will see an element "Notes"
consisting of...

<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>

As you can see I have properly (I think) escaped the ampersand (&) with
"&amp;". If I place this XML in a file and open it with Internet Explorer
the ampersand is properly dealt with. In my Java servlet I am using a SAX
parser to parse the XML and write it to a database. When that parser gets
to the "Notes" element all that is returned is the characters up to (not
including) the ampersand in the escape sequence. Everything after that is
truncated. I have found that this will happen with any escape sequence
(since they all start with the ampersand).


How does your SAX code look? You might get several chunks of character
data as the content of the <Notes> element.
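To see the chunking Martin describes, here is a minimal diagnostic sketch (not from the thread). It uses the JDK's built-in JAXP SAX parser (`SAXParserFactory`) on the `<Notes>` text from the question and records each `characters()` call separately. The class and method names are made up for illustration. How many chunks you get is up to the parser, so only their concatenation is predictable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Diagnostic handler: records every characters() chunk reported inside <Notes>
// so you can see how the parser splits the element's text.
class ChunkLogger extends DefaultHandler {
    private final List<String> chunks = new ArrayList<>();
    private boolean inNotes = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("Notes".equals(qName)) {
            inNotes = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Notes".equals(qName)) {
            inNotes = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start .. start + length) belongs to this call; whatever else
        // is in the array is parser-internal and must not be read.
        if (inNotes) {
            chunks.add(new String(ch, start, length));
        }
    }

    // Parses an XML string and returns the chunks seen inside <Notes>.
    static List<String> chunksOf(String xml) {
        ChunkLogger handler = new ChunkLogger();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.chunks;
    }

    public static void main(String[] args) {
        String xml = "<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>";
        for (String chunk : chunksOf(xml)) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

If more than one bracketed line is printed, a handler that keeps only the first call will truncate the field exactly as described in the question.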

--

Martin Honnen
http://JavaScript.FAQTs.com/
Jul 20 '05 #2
"Martin Honnen" <ma*******@yahoo.de> wrote in message
news:41***********************@newsread2.arcor-online.net...
How does your SAX code look? You might get several chunks of character
data as the content of the <Notes> element.


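public void characters(char[] ch, int start, int length)
        throws SAXException, DataSetException {
    try {
        if (elementStart) {
            // elementStart is cleared on the first call, so any later
            // characters() calls skip this block until it is set again.
            elementStart = false;
            String s = new String(ch, start, length);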

I'm using JBuilder 7 and it has a built in SAX parser object template that
extends DefaultHandler. The problem seems to be with the length argument
on the last line above. If I examine the ch[] array in debug mode it still
has all of the text from the "Notes" element, but the length argument being
passed from the parser is (for some reason) being set to the first
occurrence of an ampersand instead of extending to the element close tag.
So the String s that I use for insertion to the database is truncated.
--
I don't check the Email account attached
to this message. Send instead to...
RBrandt at Hunter dot com


Jul 20 '05 #3
In article <2p************@uni-berlin.de>,
Rick Brandt <ri*********@hotmail.com> wrote:
I'm using JBuilder 7 and it has a built in SAX parser object template that
extends DefaultHandler. The problem seems to be with the length argument
on the last line above. If I examine the ch[] array in debug mode it still
has all of the text from the "Notes" element, but the length argument being
passed from the parser is (for some reason) being set to the first
occurrence of an ampersand instead of extending to the element close tag.
So the String s that I use for insertion to the database is truncated.


And you don't get more calls to characters() with the rest of the string?
There's no guarantee you will get it all at once.

-- Richard
Jul 20 '05 #4
In <comp.text.xml> Rick Brandt <ri*********@hotmail.com> wrote:
If you examine the complete XML below you will see an element "Notes"
consisting of...

<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>

As you can see I have properly (I think) escaped the ampersand (&)
with "&amp;". If I place this XML in a file and open it with Internet
Explorer the ampersand is properly dealt with. In my Java servlet I am
using a SAX parser to parse the XML and write it to a database. When
that parser gets to the "Notes" element all that is returned is the
characters up to (not including) the ampersand in the escape sequence.
Everything after that is truncated. I have found that this will
happen with any escape sequence (since they all start with the
ampersand).

I get no errors and the record is written to the database, just with a
truncated Notes field.

Any ideas what I can look for?


At least with Expat XML parser, I get 3 calls, ie.
test replace test[LINE]
&
[LINE]replace
So, collect all data until end of <Notes> element.

--
William Park <op**********@yahoo.ca>
Open Geometry Consulting, Toronto, Canada
Jul 20 '05 #5
"Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message
news:cg***********@pc-news.cogsci.ed.ac.uk...
In article <2p************@uni-berlin.de>,
Rick Brandt <ri*********@hotmail.com> wrote:
I'm using JBuilder 7 and it has a built in SAX parser object template that
extends DefaultHandler. The problem seems to be with the length argument
on the last line above. If I examine the ch[] array in debug mode it still
has all of the text from the "Notes" element, but the length argument being
passed from the parser is (for some reason) being set to the first
occurrence of an ampersand instead of extending to the element close tag.
So the String s that I use for insertion to the database is truncated.


And you don't get more calls to characters() with the rest of the string?
There's no guarantee you will get it all at once.


Should I get those "more calls" automatically or do I have to put in some
kind of loop? Why wouldn't characters() return ALL characters between the
<> and </>? Isn't that what the parser's job is?

I was originally wrapping all of my text elements in CDATA sections, but I
ran into a problem where any CDATA section with the string "replace" in it
raised a Parse Error (previous newsgroup thread where I received no
answers).

I decided I would just escape all of the illegal XML characters instead of
using CDATA and now I have this truncation issue.

I appreciate the help.
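For the escaping approach Rick switches to, here is a sketch of a hypothetical `escapeXml` helper (the name is illustrative, not a library API). It replaces the five characters that have predefined XML entities in a single pass. It is shown in Java for consistency with the servlet; the client apps may be written in something else, and the same logic applies there. It assumes the text is otherwise legal XML 1.0 content. Control characters such as U+0001 are not allowed in XML 1.0 even as character references, and this helper does not filter them.

```java
// Hypothetical helper: escapes the five characters with predefined XML
// entities so arbitrary user text can be placed inside an element without
// a CDATA section.
class XmlEscape {
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // everything else passes through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("<Notes>" + escapeXml("test [LINE]& <b>") + "</Notes>");
    }
}
```

Because it is a single pass over the input, an `&` that is already part of an entity in the input is escaped again. Apply it once, to the raw user text.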

Jul 20 '05 #6
"William Park" <op**********@yahoo.ca> wrote in message
news:2p************@uni-berlin.de...
At least with Expat XML parser, I get 3 calls, ie.
test replace test[LINE]
&
[LINE]replace
So, collect all data until end of <Notes> element.


OK I found this at a SAX FAQ site...

*****************************************
The ContentHandler.characters() callback is missing data!

Please read the JavaDoc for this method. A parser may split text into any
number of separate chunks, and some characters may be reported using
ignorableWhitespace() instead of this callback. If you want all the text
inside an element, you need to collect the text from the various characters
callbacks into a buffer. Only when you see the endElement event can you be
sure that you have seen all the text, and some of it may really "belong" to
child elements.
******************************************

This appears to say that I am using the wrong event. It would be a major
rewrite to move my code to the endElement() event, but if I have to I
guess I have to, but then I might have child element characters included
that I don't want? How do I avoid the child element characters? The FAQ
doesn't go into that at all.
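On the question of child-element text, one common approach is to keep a stack of buffers, one per open element. This sketch assumes you want only the text directly inside each element. `characters()` appends to the innermost buffer, so text inside `<SubRecord_A><Notes>` never reaches the buffer for `<Record>`. `DirectTextHandler`, `parse`, and the slash-separated path keys are hypothetical names chosen for illustration. Whitespace between child elements is kept as-is in the container's text.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// One text buffer per open element: characters() always appends to the
// innermost element's buffer, so a parent never collects its children's text.
class DirectTextHandler extends DefaultHandler {
    private final Deque<String> names = new ArrayDeque<>();
    private final Deque<StringBuilder> buffers = new ArrayDeque<>();
    // Element path (e.g. "Record/SubRecord_A/Notes") -> direct text of each
    // element seen at that path, in document order.
    private final Map<String, List<String>> textByPath = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.push(qName);
        buffers.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        StringBuilder current = buffers.peek();
        if (current != null) {
            current.append(ch, start, length);   // may run several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String path = currentPath();             // build the path before popping
        names.pop();
        String text = buffers.pop().toString();
        textByPath.computeIfAbsent(path, k -> new ArrayList<>()).add(text);
    }

    // Joins the open element names from outermost to innermost.
    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = names.descendingIterator();
        while (it.hasNext()) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(it.next());
        }
        return sb.toString();
    }

    static Map<String, List<String>> parse(String xml) {
        DirectTextHandler handler = new DirectTextHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.textByPath;
    }

    public static void main(String[] args) {
        String xml = "<Record><Notes>a &amp; b</Notes>"
                + "<SubRecord_A><Notes>TEST TEST TEST</Notes></SubRecord_A></Record>";
        parse(xml).forEach((path, texts) -> System.out.println(path + " -> " + texts));
    }
}
```

With this layout the record-level `Notes` and the line-item `Notes` arrive under different keys, even though both elements share the same name.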

Jul 20 '05 #7
In article <2p************@uni-berlin.de>,
Rick Brandt <ri*********@hotmail.com> wrote:
Should I get those "more calls" automatically

Yes. Quite likely you will get three calls in this case.

I was originally wrapping all of my text elements in CDATA sections, but I
ran into a problem where any CDATA section with the string "replace" in it
raised a Parse Error (previous newsgroup thread where I received no
answers).


Maybe you should try a different parser!

-- Richard
Jul 20 '05 #8
"Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message
news:cg***********@pc-news.cogsci.ed.ac.uk...
In article <2p************@uni-berlin.de>,
Rick Brandt <ri*********@hotmail.com> wrote:
Should I get those "more calls" automatically


Yes. Quite likely you will get three calls in this case.

I was originally wrapping all of my text elements in CDATA sections, but I
ran into a problem where any CDATA section with the string "replace" in it
raised a Parse Error (previous newsgroup thread where I received no
answers).


Maybe you should try a different parser!


AFAIK I am using the one that comes with java 1.4.2_04-b05. The import
statements in my SAX class are...

import org.xml.sax.*;
import org.xml.sax.helpers.*;


Jul 20 '05 #9
"Rick Brandt" <ri*********@hotmail.com> wrote in message
news:2p************@uni-berlin.de...
"William Park" <op**********@yahoo.ca> wrote in message
news:2p************@uni-berlin.de...
At least with Expat XML parser, I get 3 calls, ie.
test replace test[LINE]
&
[LINE]replace
So, collect all data until end of <Notes> element.
OK I found this at a SAX FAQ site...

*****************************************
The ContentHandler.characters() callback is missing data!

Please read the JavaDoc for this method. A parser may split text into any
number of separate chunks, and some characters may be reported using
ignorableWhitespace() instead of this callback. If you want all the text
inside an element, you need to collect the text from the various characters
callbacks into a buffer. Only when you see the endElement event can you be
sure that you have seen all the text, and some of it may really "belong" to
child elements.
******************************************

This appears to say that I am using the wrong event. It would be a major
rewrite to move my code to the endElement() event, but if I have to I
guess I have to, but then I might have child element characters included
that I don't want? How do I avoid the child element characters? The FAQ
doesn't go into that at all.


Ok, I found yet another reference...

*********************************************
Note that a SAX driver is free to chunk the character data any way it
wants, so you cannot count on all of the character data content of an
element arriving in a single characters event.
*********************************************

So it appears that this is working "as designed" yet none of the examples I
see on these same pages describe methods for properly dealing with the
characters() event.

Immediately prior to the statement above the site uses an example for
pulling the data from the characters event that clearly will NOT work if
the parser decides to "chunk" the data into multiple pieces.

I guess I will look at collecting the pieces in characters and not writing
them until endElement(). I just wish I could fix the CDATA bug as this was
working fine for 3 or 4 years before that started happening. Either CDATA
forces all of the text in the characters event to be pulled in a single
block or we just got really lucky for all that time because I never saw any
truncation until the CDATA section was removed.
Jul 20 '05 #10
"Rick Brandt" <ri*********@hotmail.com> wrote in message
news:2p************@uni-berlin.de...
I was originally wrapping all of my text elements in CDATA sections, but I
ran into a problem where any CDATA section with the string "replace" in it
raised a Parse Error (previous newsgroup thread where I received no
answers).


Maybe you should try a different parser!


AFAIK I am using the one that comes with java 1.4.2_04-b05. The import
statements in my SAX class are...

import org.xml.sax.*;
import org.xml.sax.helpers.*;


Weird! I'm using the JAXP/DOM APIs built into Java SDK version 1.4.2_04.
(Linux) I can't reproduce an error with a CDATA section containing
"replace".

I think this CDATA problem is worth digging into. Can you post (or send me)
sample code and text?
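As a standalone check of the kind Keith is asking about, here is a minimal sketch. It feeds a `<Notes>` element whose CDATA section contains "replace" to the JDK's bundled SAX parser and returns the accumulated text. It only shows that such XML is well-formed and parseable with that parser. It cannot reproduce a problem that happens before the parser ever sees the data, such as in an application server's request handling. The class and method names are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Collects all character data in a small document, including CDATA content.
class CdataCheck extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // CDATA content is also reported through characters(), and a parser
        // may still split it into several calls, so append rather than assign.
        text.append(ch, start, length);
    }

    static String textOf(String xml) {
        CdataCheck handler = new CdataCheck();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);   // a real parse error would surface here
        }
        return handler.text.toString();
    }

    public static void main(String[] args) {
        String xml = "<Notes><![CDATA[test replace test & <b>]]></Notes>";
        System.out.println(textOf(xml));
    }
}
```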

/kmc
Jul 20 '05 #11
"Keith M. Corbett" <km*@world.std.com> wrote in message
news:Ta********************@comcast.com...
Weird! I'm using the JAXP/DOM APIs built into Java SDK version 1.4.2_04.
(Linux) I can't reproduce an error with a CDATA section containing
"replace".

I think this CDATA problem is worth digging into. Can you post (or send me) sample code and text?


Well, here's the full story on that. I think what I'm seeing is a bug in
iPlanet's web application server, which is what our production web servers
run.

About 2 months ago I had a user reporting errors when submitting data to my
Java servlet over an HTTP request. At the time we isolated it to when a
line-item note field was too long (or so we thought). The problem does NOT
happen when I point the client at the servlet running in my JBuilder
environment (which uses Tomcat) so I was stumped troubleshooting it. The
notes are somewhat of a non-critical field so I asked him to just keep them
short until I could investigate further.

Last week he reported the same problem only it was with a parent note
field. This time I was able to determine that it wasn't the length at all:
it failed any time the string "replace" occurred. I then tested my
other client apps which send data over HTTP in a similar fashion. Every
single one of them bombs if I include "replace" in a CDATA section.

The error reported from the servlet is "root node missing" which I believe
is being raised because the parser is in fact not being passed any data at
all. I then discovered that the word "replace" was harmless outside a
CDATA section, so, since I had few troubleshooting options, I
decided to just escape all illegal XML characters and drop the CDATA
section. At initial design the CDATA looked like the easiest way to handle
the data entered by the user instead of doing a bunch of Replace()
functions. Now I'll have to rewrite all of my SAX parsing code because of
this issue with characters() breaking the text into chunks. Apparently it
uses the ampersand as the "chunk delimiter".

This CDATA problem definitely has some variability to it because while I
can reproduce the problem myself, I have never had any other user complain
of this (around 30) and I can find records in the database that contain the
word "replace" which apparently made it through ok.
Jul 20 '05 #12
In <comp.text.xml> Rick Brandt <ri*********@hotmail.com> wrote:
I guess I will look at collecting the pieces in characters and not
writing them until endElement(). I just wish I could fix the CDATA
bug as this was working fine for 3 or 4 years before that started
happening. Either CDATA forces all of the text in the characters
event to be pulled in a single block or we just got really lucky for
all that time because I never saw any truncation until the CDATA
section was removed.


You were just lucky. :-)

If you're using (or can use) Bash shell, then collecting all texts
inside <Notes> or any other element is simple. Assuming elements
containing data are not nested,

start () { # Usage: start tag att=value ...
case $1 in
Notes) unset data;;
esac
}
middle () { # Usage: middle text
case ${XML_ELEMENT_STACK[1]} in
Notes) data+="$1" ;;
esac
}
end () { # Usage: end tag
case $1 in
Notes) echo "$data" ;;
esac
}

Then,
xml -s start -d middle -e end "<Notes>aa&amp;bb</Notes>"
produces
aa&bb

Ref:
http://freshmeat.net/projects/bashdiff/
http://home.eol.ca/~parkw/index.html#xml
help xml
Jul 20 '05 #13
On Wed, 25 Aug 2004 13:18:24 -0500, Rick Brandt wrote:
"Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message
news:cg***********@pc-news.cogsci.ed.ac.uk...
In article <2p************@uni-berlin.de>, Rick Brandt
<ri*********@hotmail.com> wrote:
I'm using JBuilder 7 and it has a built in SAX parser object template that
extends DefaultHandler. The problem seems to be with the length argument
on the last line above. If I examine the ch[] array in debug mode it still
has all of the text from the "Notes" element, but the length argument being
passed from the parser is (for some reason) being set to the first
occurrence of an ampersand instead of extending to the element close tag.
So the String s that I use for insertion to the database is truncated.


And you don't get more calls to characters() with the rest of the
string? There's no guarantee you will get it all at once.


Should I get those "more calls" automatically or do I have to put in
some kind of loop? Why wouldn't characters() return ALL characters
between the <> and </>? Isn't that what the parser's job is?

You should get these "more calls" more or less automatically, but your
characters method has to allow for multiple calls with partial data.

The basic strategy is to set up a StringBuffer in the startElement method,
collect text into it in the characters method, and pull the whole result
out in the endElement method.

I don't know what triggers the division into multiple events, but it
sounds like the implementation you're using may be stopping on ampersands
to handle entities. I'd hope that once your code deals with the
multiple calls, this will be transparent. Possibly the use of a CDATA section
simplified the parser's job so it didn't need to do this.

But the characters method is definitely not guaranteed to return the
entire enclosed text, so you should do something like what I described
above.
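Here is a minimal sketch of the startElement/characters/endElement strategy described above. It assumes `<Notes>` elements never nest inside one another and contain no child elements of their own. It uses `StringBuilder`, the unsynchronized equivalent of `StringBuffer` with the same append API, because the callbacks for one parse arrive on one thread. The class name and the `notesOf` helper are hypothetical. In the servlet, the database write would go where the comment in `endElement` indicates.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NotesHandler extends DefaultHandler {
    private StringBuilder buffer;                  // non-null only while inside <Notes>
    private final List<String> notes = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("Notes".equals(qName)) {
            buffer = new StringBuilder();          // 1. start a fresh buffer
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buffer != null) {
            buffer.append(ch, start, length);      // 2. append every chunk
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Notes".equals(qName) && buffer != null) {
            // 3. The text is complete only now; this is where the
            //    database write would go.
            notes.add(buffer.toString());
            buffer = null;
        }
    }

    static List<String> notesOf(String xml) {
        NotesHandler handler = new NotesHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.notes;
    }

    public static void main(String[] args) {
        String xml = "<Record><Notes>test replace test[LINE]&amp;[LINE]replace</Notes>"
                + "<SubRecord_A><Notes>TEST TEST TEST</Notes></SubRecord_A></Record>";
        notesOf(xml).forEach(System.out::println);
    }
}
```

Compared with the `elementStart` flag in the original handler, the only structural change is that nothing is written until `endElement` fires. The rest of the handler can stay the same.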
Jul 20 '05 #14
