Hi,
I have a servlet (running under Tomcat 4.1, Java 1.4.2) that sends XML in
the HTTP body. I want the XML to be encoded in UTF-8.
When I run Tomcat on Windows 2000, the XML appears fine on the client end,
but when I run Tomcat on Debian woody Linux, accented characters don't appear
correctly. In the XML output stream, each accented character comes out as
two characters, so obviously the fact that it's supposed to be UTF-8 is
being lost.
Here's how I'm streaming the XML:
response.setContentType("text/xml");
OutputStream os = response.getOutputStream();
OutputStreamWriter osw = new OutputStreamWriter(os, "UTF-8");
PrintWriter pw = new PrintWriter(osw);
pw.print("..all the xml..");
If, instead of writing to the response object, I write to a
FileOutputStream, the accented characters appear OK in the file.
I'm a bit stuck here because when I wrote this code, I read up all about
character encoding and did what I thought was right, and it all worked on my
Win2000 test system. I can't figure out what could be going wrong on the
linux box.
Many thanks for any advice or hints,
Andy
Andy Fish wrote:
> Hi,
> I have a servlet (running under Tomcat 4.1, Java 1.4.2) that sends XML in the HTTP body. I want the XML to be encoded in UTF-8.
> When I run Tomcat on Windows 2000, the XML appears fine on the client end, but when I run Tomcat on Debian woody Linux, accented characters don't appear correctly. In the XML output stream, each accented character comes out as two characters, so obviously the fact that it's supposed to be UTF-8 is being lost.
How do you check the XML? With a browser?
> Here's how I'm streaming the XML:
> response.setContentType("text/xml");
Maybe you can add the encoding to the HTTP header:
response.setContentType("text/xml;charset=utf-8");
f'up2 c.t.x
--
Johannes Koch
In te domine speravi; non confundar in aeternum.
(Te Deum, 4th cent.)
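For anyone reading along, here is a minimal sketch of one plausible reading of the "two characters per accented character" symptom: the servlet writes UTF-8 bytes, but with no charset in the Content-Type the client decodes them as a single-byte encoding such as ISO-8859-1. The class and method names are made up for illustration, and it uses java.nio.charset.StandardCharsets (Java 7+), which is newer than the Java 1.4.2 in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original servlet.
class MisdecodeDemo {
    // Encode the text as UTF-8 (what the servlet sends), then decode it
    // with whatever charset the client assumes.
    static String decodeAs(String text, Charset assumedByClient) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, assumedByClient);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent, one character

        // Client wrongly assumes ISO-8859-1: the two UTF-8 bytes C3 A9
        // become two separate characters.
        String wrong = decodeAs(original, StandardCharsets.ISO_8859_1);
        // Client correctly assumes UTF-8: the original character comes back.
        String right = decodeAs(original, StandardCharsets.UTF_8);

        System.out.println("assumed ISO-8859-1: " + wrong.length() + " chars");
        System.out.println("assumed UTF-8:      " + right.length() + " chars");
    }
}
```

If that is what is happening, a charset in the header tells the client how to decode the bytes, which would be consistent with the suggestion above.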
Thanks very much Johannes, that fixed it perfectly - isn't usenet just the
best thing ever? :-)))
I'm still not sure why it works differently on Windows 2000; maybe it's down to
the native locale of the OS or something, I guess (or possibly a slightly
different version of Tomcat).
Andy Fish wrote:
> Hi,
> I have a servlet (running under Tomcat 4.1, Java 1.4.2) that sends XML in the HTTP body. I want the XML to be encoded in UTF-8.
> When I run Tomcat on Windows 2000, the XML appears fine on the client end, but when I run Tomcat on Debian woody Linux, accented characters don't appear correctly. In the XML output stream, each accented character comes out as two characters, so obviously the fact that it's supposed to be UTF-8 is being lost.
No, that's not obvious at all. Not from the information you have given.
Unicode provides for logical characters to be composed of two or more
characters; for instance, a lowercase u with an umlaut could be
represented as the latin lowercase 'u' followed by the umlaut "combining
character". Many of the more common combinations also have
single-character representations, including the u-umlaut example, and
pretty much all the "diacriticalized" characters used in Western
European languages. The alternative representations are equivalent as
far as Unicode is concerned, and Unicode processors are permitted to
freely substitute one for another. They should be displayed or printed
the same by a conformant processor.
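To make the composed versus decomposed point concrete, here is a small sketch using java.text.Normalizer (available since Java 6, so not on the Java 1.4.2 from the question). The class and helper names are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical demo class for the two representations of u-umlaut.
class NormalizationDemo {
    // Compose base letters and combining marks where a single code point exists.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Decompose precomposed characters into base letter plus combining marks.
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String composed = "\u00fc";    // single code point: u with diaeresis
        String decomposed = "u\u0308"; // 'u' followed by COMBINING DIAERESIS

        // The raw strings differ, but they normalize to the same thing.
        System.out.println(composed.equals(decomposed));
        System.out.println(toNfc(decomposed).equals(composed));
        System.out.println(toNfd(composed).equals(decomposed));
    }
}
```

Comparing strings only after normalizing both sides to the same form avoids treating the two spellings as different text.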
Moreover, the fact that you are making judgements about the "UTF-8ness"
of the stream based on the character count leads me to wonder whether
perhaps you are confusing characters with bytes / octets, or whether you
misunderstand the nature of character encodings. The character count
has little to do with whether the characters are encoded in UTF-8;
rather it has everything to do with which character or characters have
been encoded. The byte count has more relation to the encoding, but is
still closely tied to the characters that have been encoded.
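A quick sketch of the character count versus byte count distinction, assuming a modern JDK (StandardCharsets is Java 7+). The class name and the sample string are mine, not from the original post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: same characters, different byte counts.
class CountDemo {
    // Number of bytes the string occupies in the given encoding.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // four characters, the last one accented

        // The character count does not depend on any encoding.
        System.out.println("chars:      " + s.length());
        // The byte count depends on both the characters and the encoding.
        System.out.println("UTF-8:      " + byteCount(s, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + byteCount(s, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-16BE:   " + byteCount(s, StandardCharsets.UTF_16BE));
    }
}
```

The string is four characters in every case; only the accented character changes the UTF-8 byte count relative to ISO-8859-1.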
> Here's how I'm streaming the XML:
> response.setContentType("text/xml");
Better would probably be "text/xml; charset=UTF-8".
> OutputStream os = response.getOutputStream();
> OutputStreamWriter osw = new OutputStreamWriter(os, "UTF-8");
> PrintWriter pw = new PrintWriter(osw);
> pw.print("..all the xml..");
> If, instead of writing to the response object, I write to a FileOutputStream, the accented characters appear OK in the file.
As judged how?
> I'm a bit stuck here because when I wrote this code, I read up all about character encoding and did what I thought was right, and it all worked on my Win2000 test system. I can't figure out what could be going wrong on the Linux box.
The output part looks okay to me. I suspect you have a different
problem than you think you have.
John Bollinger jo******@indiana.edu
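On the question of how the output is judged: a viewer or browser applies its own decoding, so looking at the raw bytes is more reliable. Below is a minimal hex dump sketch; the class and method names are hypothetical, and in practice the byte array would come from the saved file or a captured HTTP response rather than from getBytes().

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting raw bytes.
class HexDump {
    // Render each byte as two uppercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as FFFFFFxx.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String accented = "\u00e9"; // e with acute accent
        // In UTF-8 this character is the two bytes C3 A9.
        System.out.println(toHex(accented.getBytes(StandardCharsets.UTF_8)));
        // In ISO-8859-1 it is the single byte E9.
        System.out.println(toHex(accented.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

If the dump of the response shows C3 A9 where an e-acute is expected, the bytes are valid UTF-8 and the problem lies in how the receiver decodes them.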
Andy Fish wrote:
> correctly. In the XML output stream, each accented character comes out as two characters, so obviously the fact that it's supposed to be UTF-8 is being lost.
No. Not "obviously".
Capture and list the actual *bytes* going across the wire. Inspect them
and then you can say one way or another.
> here's how I'm streaming the XML:
> response.setContentType("text/xml");
> OutputStream os = response.getOutputStream();
IIRC, you need to set the encoding before the call to getOutputStream().
"Jon A. Cruz" <jo*@joncruz.org> wrote in message
news:40************@joncruz.org...
> response.setContentType("text/xml");
> OutputStream os = response.getOutputStream();
> IIRC, you need to set the encoding before the call to getOutputStream().
The encoding needs to be specified on several levels. One is the HTTP
response header, one is the XML declaration ( <?xml version="1.0" encoding="..."?> ),
and finally the output sent to the response's output stream needs to use
the very same encoding as well.
The background is that an OutputStream just handles bytes. You must ensure
these bytes are in the above-mentioned encoding. This can be done by using an
OutputStreamWriter and setting the encoding in the constructor. Now you can
output characters and the OutputStreamWriter will ensure that the output stream
gets the correct bytes.
Hiran
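A rough sketch of keeping those three levels in sync by deriving all of them from one Charset value. It deliberately avoids the servlet API so it runs on a plain JDK: a ByteArrayOutputStream stands in for the response's output stream, and contentType() only builds the string one would pass to response.setContentType(). All names here are hypothetical, and the text is assumed to contain no characters that need XML escaping.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one Charset drives header, declaration and bytes.
class XmlEncodingDemo {
    // Level 1: value for the HTTP Content-Type header.
    static String contentType(Charset charset) {
        return "text/xml; charset=" + charset.name();
    }

    // Levels 2 and 3: the XML declaration names the charset, and the
    // OutputStreamWriter turns characters into bytes in that same charset.
    static byte[] render(Charset charset, String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stand-in for the response stream
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write("<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n");
            w.write("<greeting>" + text + "</greeting>\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Charset charset = StandardCharsets.UTF_8;
        byte[] body = render(charset, "caf\u00e9");
        System.out.println(contentType(charset));
        System.out.println(body.length + " bytes in body");
    }
}
```

Passing the same Charset object everywhere means the header, the declaration and the bytes cannot drift apart when the encoding is changed later.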
Hiran Chaudhuri wrote:
> The background is that outputstream just handles bytes. You must ensure these bytes are in the above mentioned encoding. This can be done by using a OutputStreamWriter and setting the encoding in the constructor. Now you can output characters and OutputStreamWriter will ensure that the outputstream gets the correct bytes.
My point is that the order of things is very important. In order to get
the response headers to properly reflect what you're going to send, you
need to set things *before* getOutputStream() or getWriter().
That's a point that trips up a lot of people.
"Jon A. Cruz" <jo*@joncruz.org> wrote in message
news:40**************@joncruz.org...
> My point is that the order of things is very important. In order to get the response headers to properly reflect what you're going to send, you need to set things *before* getOutputStream() or getWriter().
> That's a point that trips up a lot of people.
That's right.
It should be easy to catch, as I have seen servlet containers complain
about attempts to set headers after the response has been committed. That is
exactly what happens when you first fill the HTTP response body and only
afterwards take care of the headers.
Hiran