returning XML as UTF-8 from a servlet

Hi,

I have a servlet (running under Tomcat 4.1, Java 1.4.2) that sends XML in
the HTTP body. I want the XML to be encoded in UTF-8.

When I run Tomcat on Windows 2000, the XML appears fine on the client end,
but running Tomcat on Debian woody Linux, accented characters don't appear
correctly. In the XML output stream, each accented character comes out as
two characters, so obviously the fact that it's supposed to be UTF-8 is
being lost.

Here's how I'm streaming the XML:

response.setContentType("text/xml");
OutputStream os = response.getOutputStream();
OutputStreamWriter osw = new OutputStreamWriter(os, "UTF-8");
PrintWriter pw = new PrintWriter(osw);
pw.print("..all the xml..");

If, instead of writing to the response object, I write to a
FileOutputStream, the accented characters appear OK in the file.

I'm a bit stuck here because when I wrote this code, I read up all about
character encoding and did what I thought was right, and it all worked on my
Win2000 test system. I can't figure out what could be going wrong on the
Linux box.

Many thanks for any advice or hints,

Andy
Jul 20 '05 #1
Andy Fish wrote:
Hi,

I have a servlet (running under Tomcat 4.1, Java 1.4.2) that sends XML in
the HTTP body. I want the XML to be encoded in UTF-8.

When I run Tomcat on Windows 2000, the XML appears fine on the client end,
but running Tomcat on Debian woody Linux, accented characters don't appear
correctly. In the XML output stream, each accented character comes out as
two characters, so obviously the fact that it's supposed to be UTF-8 is
being lost.

How do you check the XML? With a browser?

Here's how I'm streaming the XML:

response.setContentType("text/xml");

Maybe you can add the encoding to the HTTP header:
response.setContentType("text/xml;charset=utf-8");
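
For anyone reading along, here is a minimal sketch of why the charset parameter can matter. It assumes (the thread does not confirm this) that the client fell back to ISO-8859-1 when the header named no charset. The class and method names are made up for illustration, and it uses `StandardCharsets`, which needs a newer JDK than the 1.4.2 mentioned above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    /** Encodes text as UTF-8, then decodes the bytes with the charset the client assumes. */
    static String decodeAs(String text, Charset assumedByClient) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, assumedByClient);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e

        // Client guesses ISO-8859-1: each UTF-8 byte becomes its own character.
        String wrong = decodeAs(original, StandardCharsets.ISO_8859_1);
        // Client is told UTF-8: the two bytes are read back as one character.
        String right = decodeAs(original, StandardCharsets.UTF_8);

        System.out.println(original.length()); // 4
        System.out.println(wrong.length());    // 5
        System.out.println(right.equals(original)); // true
    }
}
```

The bytes on the wire are identical in both cases; only the client's assumption changes, which would match the "two characters per accented character" symptom.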

f'up2 c.t.x
--
Johannes Koch
In te domine speravi; non confundar in aeternum.
(Te Deum, 4th cent.)
Jul 20 '05 #2
Thanks very much Johannes, that fixed it perfectly - isn't Usenet just the
best thing ever? :-)))

I'm still not sure why it works differently on Windows 2000 - maybe it's down to
the native locale of the OS or something, I guess (or possibly a slightly
different version of Tomcat).
Jul 20 '05 #3
Andy Fish wrote:
Hi,

I have a servlet (running under Tomcat 4.1, Java 1.4.2) that sends XML in
the HTTP body. I want the XML to be encoded in UTF-8.

When I run Tomcat on Windows 2000, the XML appears fine on the client end,
but running Tomcat on Debian woody Linux, accented characters don't appear
correctly. In the XML output stream, each accented character comes out as
two characters, so obviously the fact that it's supposed to be UTF-8 is
being lost.

No, that's not obvious at all. Not from the information you have given.
Unicode provides for logical characters to be composed of two or more
characters; for instance, a lowercase u with an umlaut could be
represented as the latin lowercase 'u' followed by the umlaut "combining
character". Many of the more common combinations also have
single-character representations, including the u-umlaut example, and
pretty much all the "diacriticalized" characters used in Western
European languages. The alternative representations are equivalent as
far as Unicode is concerned, and Unicode processors are permitted to
freely substitute one for another. They should be displayed or printed
the same by a conformant processor.

Moreover, the fact that you are making judgements about the "UTF-8ness"
of the stream based on the character count leads me to wonder whether
perhaps you are confusing characters with bytes / octets, or whether you
misunderstand the nature of character encodings. The character count
has little to do with whether the characters are encoded in UTF-8;
rather it has everything to do with which character or characters have
been encoded. The byte count has more relation to the encoding, but is
still closely tied to the characters that have been encoded.
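
To make the distinction concrete, here is a small sketch (not part of the original post) comparing the two representations of u-umlaut mentioned above. It counts Java chars and UTF-8 bytes for each and uses `java.text.Normalizer`, which is only available on Java 6 and later; the class and helper names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class ComposedVsDecomposed {
    /** Number of bytes the string occupies once encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String precomposed = "\u00fc"; // LATIN SMALL LETTER U WITH DIAERESIS
        String decomposed = "u\u0308"; // 'u' followed by COMBINING DIAERESIS

        System.out.println(precomposed.length() + " char(s), "
                + utf8Length(precomposed) + " byte(s)"); // 1 char(s), 2 byte(s)
        System.out.println(decomposed.length() + " char(s), "
                + utf8Length(decomposed) + " byte(s)");  // 2 char(s), 3 byte(s)

        // The two forms are canonically equivalent: normalizing maps one to the other.
        System.out.println(Normalizer.normalize(decomposed, Normalizer.Form.NFC)
                .equals(precomposed)); // true
        System.out.println(Normalizer.normalize(precomposed, Normalizer.Form.NFD)
                .equals(decomposed));  // true
    }
}
```

Both strings are valid UTF-8 once encoded, so neither the character count nor the byte count alone says whether the encoding was "lost".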

Here's how I'm streaming the XML:

response.setContentType("text/xml");

Better would probably be "text/xml; charset=UTF-8".

OutputStream os = response.getOutputStream();
OutputStreamWriter osw = new OutputStreamWriter(os, "UTF-8");
PrintWriter pw = new PrintWriter(osw);
pw.print("..all the xml..");

If, instead of writing to the response object, I write to a
FileOutputStream, the accented characters appear OK in the file.

As judged how?

I'm a bit stuck here because when I wrote this code, I read up all about
character encoding and did what I thought was right, and it all worked on my
Win2000 test system. I can't figure out what could be going wrong on the
Linux box.


The output part looks okay to me. I suspect you have a different
problem than you think you have.
John Bollinger
jo******@indiana.edu

Jul 20 '05 #4
Andy Fish wrote:
correctly. In the XML output stream, each accented character comes out as
two characters, so obviously the fact that it's supposed to be UTF-8 is
being lost.

No. Not "obviously".

Capture and list the actual *bytes* going across the wire. Inspect them
and then you can say one way or another.
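
As a rough illustration of inspecting bytes rather than characters, the sketch below pushes text through the same writer chain as the original post, but into an in-memory stream instead of the servlet response, and prints a hex dump. The class and method names are made up; on a real system you would capture the traffic itself, which this does not do.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WireBytes {
    /** Writes text through OutputStreamWriter + PrintWriter and returns the raw bytes. */
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        PrintWriter pw = new PrintWriter(new OutputStreamWriter(os, charset));
        pw.print(text);
        pw.flush(); // without a flush the bytes may still sit in the writer's buffer
        return os.toByteArray();
    }

    /** Formats bytes as space-separated lowercase hex pairs. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";
        System.out.println(toHex(encode(eAcute, StandardCharsets.UTF_8)));      // c3 a9
        System.out.println(toHex(encode(eAcute, StandardCharsets.ISO_8859_1))); // e9
    }
}
```

Seeing `c3 a9` for an e-acute means the stream really is UTF-8, and any problem lies in how the receiver was told to interpret it.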


Here's how I'm streaming the XML:

response.setContentType("text/xml");
OutputStream os = response.getOutputStream();

IIRC, you need to set the encoding before the call to getOutputStream().

Jul 20 '05 #5

"Jon A. Cruz" <jo*@joncruz.org> wrote in message
news:40************@joncruz.org...

response.setContentType("text/xml");
OutputStream os = response.getOutputStream();

IIRC, you need to set the encoding before the call to getOutputStream().


The encoding needs to be specified on several levels. One is the HTTP
response header, one is the XML declaration ( <?xml version="1.0"
encoding="..."?> ), and finally the output sent to the response's
output stream needs to use the very same encoding as well.

The background is that an output stream just handles bytes. You must ensure
these bytes are in the above-mentioned encoding. This can be done by using an
OutputStreamWriter and setting the encoding in the constructor. Now you can
output characters and the OutputStreamWriter will ensure that the output stream
gets the correct bytes.
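
A minimal sketch of keeping two of those levels in step, the XML declaration and the actual bytes, using one `Charset` value for both. The HTTP header is left out because it needs a servlet container. The class name, the `greeting` element and the helper methods are invented for illustration, and the text is not escaped, so it must not contain markup characters.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class XmlEncodingRoundTrip {
    /** Builds a tiny document whose declaration and byte encoding use the same charset. */
    static byte[] serialize(String text, Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>"
                + "<greeting>" + text + "</greeting>";
        return xml.getBytes(charset);
    }

    /** Parses the bytes; the parser picks the charset up from the XML declaration. */
    static String parseText(byte[] bytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        // Round trip succeeds for either charset as long as both levels agree.
        System.out.println(parseText(serialize(text, StandardCharsets.UTF_8)).equals(text));      // true
        System.out.println(parseText(serialize(text, StandardCharsets.ISO_8859_1)).equals(text)); // true
    }
}
```

Deriving the declaration from the same `Charset` object that encodes the bytes means those two cannot drift apart; the header would still have to be set separately.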

Hiran
Jul 20 '05 #6
Hiran Chaudhuri wrote:

The background is that an output stream just handles bytes. You must ensure
these bytes are in the above-mentioned encoding. This can be done by using an
OutputStreamWriter and setting the encoding in the constructor. Now you can
output characters and the OutputStreamWriter will ensure that the output stream
gets the correct bytes.

My point is that the order of things is very important. In order to get
the response headers to properly reflect what you're going to send, you
need to set things *before* getOutputStream() or getWriter().

That's a point that trips up a lot of people.

Jul 20 '05 #7

"Jon A. Cruz" <jo*@joncruz.org> wrote in message
news:40**************@joncruz.org...
Hiran Chaudhuri wrote:
My point is that the order of things is very important. In order to get
the response headers to properly reflect what you're going to send, you
need to set things *before* getOutputStream() or getWriter().

That's a point that trips up a lot of people.


That's right.

It should be easy to handle, as I have seen servlet containers complain
about attempts to set headers after the response has been committed. That is
exactly what happens when you fill the HTTP response body first and only
afterwards take care of the headers.

Hiran
Jul 20 '05 #8
