Escape Sequences Not Working?

If you examine the complete XML below you will see an element "Notes"
consisting of...

<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>

As you can see I have properly (I think) escaped the ampersand (&) with
"&amp;". If I place this XML in a file and open it with Internet Explorer
the ampersand is properly dealt with. In my Java servlet I am using a SAX
parser to parse the XML and write it to a database. When that parser gets
to the "Notes" element all that is returned is the characters up to (not
including) the ampersand in the escape sequence. Everything after that is
truncated. I have found that this will happen with any escape sequence
(since they all start with the ampersand).

I get no errors and the record is written to the database, just with a
truncated Notes field.

Any ideas what I can look for?

<?xml version="1.0"?>
<MBO>
<Record>
<ID>-49781293</ID>
<OrderDate>2004-08-24 15:19:31</OrderDate>
<MemoBillType>5</MemoBillType>
<AccountNum>1</AccountNum>
<BillToAddress>TEST</BillToAddress>
<ShipToAddress>Same as Bill To Address</ShipToAddress>
<RegMgr>John Doe</RegMgr>
<SecCode>308040-860602</SecCode>
<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>
<RequireDate>TEST</RequireDate>
<RackInfo>TEST</RackInfo>
<CallPhoneNumber>TEST TEST</CallPhoneNumber>
<SubRecord_A>
<LineNum>1</LineNum>
<Quantity>1</Quantity>
<PartNum>TEST</PartNum>
<ShipDesignation>TEST</ShipDesignation>
<Price>NULL_VALUE</Price>
<Discount>NULL_VALUE</Discount>
<Notes>TEST TEST TEST</Notes>
</SubRecord_A>
</Record>
</MBO>
Jul 20 '05 #1


How does your SAX code look? You might get several chunks of character
data as the content of the <Notes> element.

--

Martin Honnen
http://JavaScript.FAQTs.com/
Jul 20 '05 #2
"Martin Honnen" <ma*******@yahoo.de> wrote in message
news:41***********************@newsread2.arcor-online.net...
How does your SAX code look? You might get several chunks of character
data as the content of the <Notes> element.


public void characters(char[] ch, int start, int length)
        throws SAXException, DataSetException {
    try {
        if (elementStart) {
            elementStart = false;
            String s = new String(ch, start, length);

I'm using JBuilder 7 and it has a built in SAX parser object template that
extends DefaultHandler. The problem seems to be with the length argument
on the last line above. If I examine the ch[] array in debug mode it still
has all of the text from the "Notes" element, but the length argument being
passed from the parser is (for some reason) being set to the first
occurrence of an ampersand instead of extending to the element close tag.
So the String s that I use for insertion to the database is truncated.
--
I don't check the Email account attached
to this message. Send instead to...
RBrandt at Hunter dot com


Jul 20 '05 #3
In article <2p************@uni-berlin.de>,
Rick Brandt <ri*********@hotmail.com> wrote:
I'm using JBuilder 7 and it has a built in SAX parser object template that
extends DefaultHandler. The problem seems to be with the length argument
on the last line above. If I examine the ch[] array in debug mode it still
has all of the text from the "Notes" element, but the length argument being
passed from the parser is (for some reason) being set to the first
occurrence of an ampersand instead of extending to the element close tag.
So the String s that I use for insertion to the database is truncated.


And you don't get more calls to characters() with the rest of the string?
There's no guarantee you will get it all at once.
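
To see the chunking directly, a handler can simply record every call. This is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name ChunkLogger and the helper chunksOf are made up for illustration. How many chunks arrive depends on the parser, so the only thing to rely on is the joined text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records each chunk the parser passes to characters().
class ChunkLogger extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start .. start+length-1] belongs to this call. The rest of
        // the array is the parser's own buffer and must not be read here.
        chunks.add(new String(ch, start, length));
    }

    // Parses the XML string and returns the chunks in arrival order.
    static List<String> chunksOf(String xml) {
        try {
            ChunkLogger handler = new ChunkLogger();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.chunks;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> chunks = chunksOf(
                "<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>");
        for (String chunk : chunks) {
            System.out.println("chunk: \"" + chunk + "\"");
        }
        // The chunk count is parser-dependent; only the joined text is fixed.
        System.out.println("joined: \"" + String.join("", chunks) + "\"");
    }
}
```

This also explains the debugger observation: ch[] can hold the whole element (or more), but each call only hands over the slice described by start and length.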

-- Richard
Jul 20 '05 #4
At least with the Expat XML parser, I get three calls, i.e.
test replace test[LINE]
&
[LINE]replace
So, collect all data until end of <Notes> element.
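
In SAX/Java terms, "collect all data until end of element" could look roughly like the sketch below. It assumes a non-namespace-aware JAXP parser (so qName is the plain tag name); NotesCollector and notesOf are illustrative names, not anything from the original servlet.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers the text of each <Notes> element and only
// treats it as complete once the matching endElement() arrives.
class NotesCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private boolean inNotes = false;
    final List<String> notes = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        if ("Notes".equals(qName)) {
            inNotes = true;
            buffer.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inNotes) {
            buffer.append(ch, start, length); // may be called several times
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inNotes && "Notes".equals(qName)) {
            notes.add(buffer.toString()); // now the text is complete
            inNotes = false;
        }
    }

    // Parses the XML string and returns the text of every <Notes> element.
    static List<String> notesOf(String xml) {
        try {
            NotesCollector handler = new NotesCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.notes;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(notesOf(
                "<Notes>test replace test[LINE]&amp;[LINE]replace</Notes>"));
    }
}
```

The database write would then go where notes.add(...) is, not inside characters().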

--
William Park <op**********@yahoo.ca>
Open Geometry Consulting, Toronto, Canada
Jul 20 '05 #5
"Richard Tobin" <ri*****@cogsci.ed.ac.uk> wrote in message
news:cg***********@pc-news.cogsci.ed.ac.uk...
In article <2p************@uni-berlin.de>,
Rick Brandt <ri*********@hotmail.com> wrote:
I'm using JBuilder 7 and it has a built in SAX parser object template that
extends DefaultHandler. The problem seems to be with the length argument
on the last line above. If I examine the ch[] array in debug mode it still
has all of the text from the "Notes" element, but the length argument being
passed from the parser is (for some reason) being set to the first
occurrence of an ampersand instead of extending to the element close tag.
So the String s that I use for insertion to the database is truncated.


And you don't get more calls to characters() with the rest of the string?
There's no guarantee you will get it all at once.


Should I get those "more calls" automatically, or do I have to put in some
kind of loop? Why wouldn't characters() return ALL of the characters between
the start tag and the end tag? Isn't that the parser's job?

I was originally wrapping all of my text elements in CDATA sections, but I
ran into a problem where any CDATA section with the string "replace" in it
raised a Parse Error (previous newsgroup thread where I received no
answers).

I decided I would just escape all of the illegal XML characters instead of
using CDATA and now I have this truncation issue.

I appreciate the help.

Jul 20 '05 #6
"William Park" <op**********@yahoo.ca> wrote in message
news:2p************@uni-berlin.de...
At least with Expat XML parser, I get 3 calls, ie.
test replace test[LINE]
&
[LINE]replace
So, collect all data until end of <Notes> element.


OK I found this at a SAX FAQ site...

*****************************************
The ContentHandler.characters() callback is missing data!

Please read the JavaDoc for this method. A parser may split text into any
number of separate chunks, and some characters may be reported using
ignorableWhitespace() instead of this callback. If you want all the text
inside an element, you need to collect the text from the various characters
callbacks into a buffer. Only when you see the endElement event can you be
sure that you have seen all the text, and some of it may really "belong" to
child elements.
*****************************************

This appears to say that I am using the wrong event. Moving my code to the
endElement() event would be a major rewrite, but if I have to, I guess I
have to. But then I might have child element characters included that I
don't want. How do I avoid the child element characters? The FAQ doesn't go
into that at all.
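
One common way to keep child text separate is to give every open element its own buffer on a stack. The sketch below is only an illustration under that assumption; PerElementText and textOf are invented names, and a later element with the same name simply overwrites an earlier one in the map.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: one StringBuilder per open element, so text is
// always appended to the innermost element and never leaks into its parent.
class PerElementText extends DefaultHandler {
    private final Deque<StringBuilder> stack = new ArrayDeque<>();
    final Map<String, String> textByElement = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        stack.push(new StringBuilder()); // new buffer for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!stack.isEmpty()) {
            stack.peek().append(ch, start, length); // innermost element only
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element's own text is complete here; child text was collected
        // in the children's buffers, which have already been popped.
        textByElement.put(qName, stack.pop().toString());
    }

    // Parses the XML string and returns each element's own text by tag name.
    static Map<String, String> textOf(String xml) {
        try {
            PerElementText handler = new PerElementText();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.textByElement;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf(
                "<Record>outer<Notes>a &amp; b</Notes>tail</Record>"));
    }
}
```

For a flat record like the MBO document, where text only appears in leaf elements, a single buffer that is reset in startElement() would probably do as well.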

Jul 20 '05 #7
In article <2p************@uni-berlin.de>,
Rick Brandt <ri*********@hotmail.com> wrote:
Should I get those "more calls" automatically
Yes. Quite likely you will get three calls in this case.
I was originally wrapping all of my text elements in CDATA sections, but I
ran into a problem where any CDATA section with the string "replace" in it
raised a Parse Error (previous newsgroup thread where I received no
answers).


Maybe you should try a different parser!

-- Richard
Jul 20 '05 #8
AFAIK I am using the one that comes with Java 1.4.2_04-b05. The import
statements in my SAX class are...

import org.xml.sax.*;
import org.xml.sax.helpers.*;


Jul 20 '05 #9
Ok, I found yet another reference...

*********************************************
Note that a SAX driver is free to chunk the character data any way it
wants, so you cannot count on all of the character data content of an
element arriving in a single characters event.
*********************************************

So it appears that this is working "as designed" yet none of the examples I
see on these same pages describe methods for properly dealing with the
characters() event.

Immediately prior to the statement above the site uses an example for
pulling the data from the characters event that clearly will NOT work if
the parser decides to "chunk" the data into multiple pieces.

I guess I will look at collecting the pieces in characters and not writing
them until endElement(). I just wish I could fix the CDATA bug as this was
working fine for 3 or 4 years before that started happening. Either CDATA
forces all of the text in the characters event to be pulled in a single
block or we just got really lucky for all that time because I never saw any
truncation until the CDATA section was removed.
Jul 20 '05 #10
