UNICODE in Java Help

Hello all.

I am trying to write a Java3D loader for a geometry file from a
game, which has Unicode characters (Korean) in it. I wrote the loader
and it works in Windows, but I recently brushed off Windows completely
and am now under Linux. When I try to load the filenames now, I get ??????.
This is the block of code in my loader which reads the strings from the
file:

/** get the number of texture files */
numTextures = in.readInt();

/** skip ahead 4 bytes */
in.skipBytes(4);

/** load the texture files strings */
textures = new String[numTextures];
for (int i = 0; i < numTextures; i++) {
    /** read in the 40 byte buffer */
    in.read(bmpPath);

    /** trim buffer to length and store */
    for (len = 0; len < 40; len++) {
        if (bmpPath[len] == 0)
            break;
    }
    textures[i] = new String(bmpPath, 0, len);

    /** skip ahead 40 bytes */
    in.skipBytes(40);
}

By the time it enters the String array it is all messed up and does not
properly represent the correct paths anymore.
The current reader takes in 40 bytes and then figures out how long the
string is from there. There is no string length indication in the file,
so I have to figure it out within the byte array.

Does anyone have any suggestions on how I fix this so I can read the
Korean text in both Windows and Linux (and other OSs)?

Thank you for any help!
Jul 17 '05 #1
On Thu, 27 May 2004 03:54:01 GMT, Nicholas Pappas
<no*****@rights tep.org> wrote or quoted :
numTextures = in.readInt();


The key is the declaration of in. What format are these data? What
encoding?

See http://mindprod.com/fileio.html
to select the correct method.

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Jul 17 '05 #2
Roedy Green wrote:
On Thu, 27 May 2004 03:54:01 GMT, Nicholas Pappas
numTextures = in.readInt();


The key is the declaration of in. What format are these data? What
encoding?


'in' is a LittleEndianInputStream, which extends FilterInputStream and
implements DataInput. In the case of reading bytes (as done to
construct the Strings in question), the read() function is simply a
pass-through -- no changes to the default behavior.
I received another suggestion about the encoding and will be trying
that this evening when I get home. However, I'm concerned that I am
going to get this working for Linux (perhaps by selecting UTF-8) and then
it will stop working in Windows.
Will forcing the input stream to a certain encoding under Linux break
Windows?
Jul 17 '05 #3
Under Java you are supposed to be shielded from machine-level details, and
that includes Unicode issues. Is the JRE the same as or later than the
one used on your Windows platform?

- perry

Jul 17 '05 #4
Nicholas Pappas wrote:
This is the block of code in my loader which reads the strings
from the file:
[...]
/** read in the 40 byte buffer */
in.read(bmpPath);

/** trim buffer to length and store */
for (len = 0; len < 40; len++) {
    if (bmpPath[len] == 0)
        break;
}
textures[i] = new String(bmpPath, 0, len);
[...]
Does anyone have any suggestions on how I fix this so I can read the
Korean text in both Windows and Linux (and other OSs)?


You need a basic understanding of the relationship between bytes and
characters, and of the concept of character encodings. And you need
more information about your input file; specifically, what character
encoding it uses. There are a number of potential problems here:

1. (Actually not related to character encodings) Your call to in.read is
flawed. Take a look at the API documentation for that method.
Specifically, the method is not guaranteed to fill the entire array. It
is only specified to read at least one byte but not more than the length
of the array, and to return the number of bytes that it has read. If you
want to read the entire byte array, you'll need to write a loop; sorta
like this:

int pos = 0;
while (pos < bmpPath.length)
{
    int len = in.read(bmpPath, pos, bmpPath.length - pos);

    if (len == -1) handlePrematureEOF();
    else pos += len;
}

Of course, handlePrematureEOF() should be replaced with appropriate
error-handling code, such as throwing an exception indicating the bad
file format.

2. You don't specify an encoding when you convert the data in the byte
array to text. That data was encoded in some specific encoding when
the file was written. The code you've written will work only if you get
lucky and the platform-default character encoding happens to match the
encoding in the file. To make this work reliably in a cross-platform
way, you need to discover what encoding was used in the file, and
specify that in a separate parameter, for example:

textures[i] = new String(bmpPath, 0, len, "UTF-8");

(That gets you UTF-8 encoding, which is probably a decent guess; but you
need to find out the real encoding to be sure this will work. It should
be documented with the file format spec.)

3. This is a bit of a subtle one, actually. The test for bytes to equal
zero, which you use to determine the end of the String, will not work
reliably across character encodings. In some multi-byte character
encodings (UTF-16, for example), a zero byte can appear inside a
character even though the character code itself is non-zero.

To work around this, you need to swap the order. If your strings are
null-terminated, then convert your byte array to characters first, then
look for a null character (i.e., Unicode value zero), rather than a zero
byte. That looks like this:

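// named "reader" so it does not clash with the loader's "in" stream
InputStreamReader reader = new InputStreamReader(
    new ByteArrayInputStream(bmpPath), "UTF-8");
StringWriter sw = new StringWriter();

// read decoded characters until a null character or end of data
int c;
while ((c = reader.read()) > 0) sw.write((char) c);

textures[i] = sw.toString();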

This is an alternative to the String constructor you used to convert to
characters, and notice that you still need to know the proper character
encoding.
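
The three points above can be combined into one helper. What follows is only a minimal sketch, not the loader's actual code: the class name FixedFieldReader and the method readFixedString are made up for illustration, the 40-byte field size comes from the original post, and UTF-16LE in the demo is an arbitrary choice to show the embedded-zero-byte case, not a claim about the game file. It also assumes a newer Java than the 1.4 releases mentioned in this thread, since it uses the String(byte[], Charset) constructor. DataInput.readFully replaces the hand-written read loop and throws EOFException if the file ends early.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.Charset;

public class FixedFieldReader {

    /**
     * Reads exactly fieldSize bytes, decodes them with the given charset,
     * and cuts the result at the first NUL character (if any).
     * Hypothetical helper, not part of the original loader.
     */
    static String readFixedString(DataInput in, int fieldSize, Charset cs)
            throws IOException {
        byte[] buf = new byte[fieldSize];
        in.readFully(buf);                 // fills the whole array or throws EOFException
        String decoded = new String(buf, cs);
        int end = decoded.indexOf('\0');   // search for NUL after decoding, not before
        return end < 0 ? decoded : decoded.substring(0, end);
    }

    public static void main(String[] args) throws IOException {
        // Demo data only: "A.bmp" in UTF-16LE, zero-padded to 40 bytes.
        // Its second byte is already 0x00, so a byte-level scan would stop too early.
        byte[] field = new byte[40];
        byte[] text = "A.bmp".getBytes(Charset.forName("UTF-16LE"));
        System.arraycopy(text, 0, field, 0, text.length);

        DataInput in = new DataInputStream(new ByteArrayInputStream(field));
        System.out.println(readFixedString(in, 40, Charset.forName("UTF-16LE")));
    }
}
```

Because LittleEndianInputStream is said to implement DataInput, it could presumably be passed in directly, but the charset argument still has to be the file's real encoding.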

Hope that gets you started,

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
Jul 17 '05 #5
That is what I thought too (the shielding part). :)

I last used 1.4.2 in Windows, but am using 1.4.1 right now under Linux.
I've been trying to upgrade to 1.5, but the self-installer bin doesn't
seem to want to install correctly on Gentoo Linux. :(

Jul 17 '05 #6
On Thu, 27 May 2004 09:43:04 -0400, Nicholas Pappas
<no*****@rights tep.org> wrote or quoted :
Will forcing the input stream to a certain encoding under Linux break
Windows?


If the files have different encodings on different platforms you are
going to be in trouble. If you both write and read the file you can
force the encoding and thereby be consistent on all platforms. If you
don't specify the encoding you get a pig in a poke: whatever the
locale thinks is reasonable, which is highly unlikely to be something
exotic like your file.
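
As a rough illustration of forcing the encoding on both sides, here is a small round trip through an in-memory buffer. The class and method names are invented for the example, UTF-8 is just one possible choice, and StandardCharsets needs a newer Java than the 1.4 releases discussed in this thread. The point is only that the writer and the reader name the same charset explicitly, so the platform default never comes into play.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ForcedEncodingRoundTrip {

    /** Encodes text with an explicit charset instead of the platform default. */
    static byte[] write(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8);
        out.write(text);
        out.close();  // flushes the encoder into the byte stream
        return bytes.toByteArray();
    }

    /** Decodes with the same explicit charset used by write(). */
    static String read(byte[] data) throws IOException {
        Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) sb.append((char) c);
        in.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "textures/\uD55C\uAE00.bmp";  // demo path, not from the game file
        System.out.println(original.equals(read(write(original))));
    }
}
```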

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Jul 17 '05 #7
On Thu, 27 May 2004 13:35:29 -0600, Chris Smith <cd*****@twu.ne t>
wrote or quoted :
You need a basic understanding of the relationship between bytes and
characters, and of the concept of character encodings. And you need
more information about your input file; specifically, what character
encoding it uses. There are a number of potential problems here:


Read up on encodings, http://mindprod.com/jgloss/encoding.html. OP
has not told us enough about what he is doing. Where did this file
come from? Is it encoded the same way on all platforms or is it being
provided in a variety of encodings by some third party software?

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
Jul 17 '05 #8
Chris Smith wrote:
2. You don't specify an encoding when you convert the data in the byte
array to text. That data was encoding in some specific encoding when
the file was written. The code you've written will work only if you get
lucky and the platform-default character encoding happens to match the
encoding in the file. To make this work reliably in a cross-platform
way, you need to discover what encoding was used in the file, and
specify that in a separate parameter, for example:

textures[i] = new String(bmpPath, 0, len, "UTF-8");


Well, I tried UTF-8 and all the encodings listed on this page:
http://java.sun.com/j2se/1.4.2/docs/...t/Charset.html
No luck. :(

All the directories show up correctly in Konqueror (KDE browser). Is
there some way I can detect the encoding being used there? Does anyone
know what the Windows default encoding is? :)

Thanks again for all the help!
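
On the default-encoding question: the JVM can report what it picked, so running the same small program on both machines shows whether the defaults differ. This is only a sketch with an invented class name. Charset.defaultCharset() needs Java 5 or later (on 1.4 the file.encoding system property gives similar information). A Western European Windows install typically reports windows-1252, while JDK 18 and later default to UTF-8 on every platform. The demo bytes are two Hangul syllables encoded as UTF-8, chosen purely for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    /** Decodes the same bytes under a caller-chosen charset. */
    static String decode(byte[] data, Charset cs) {
        return new String(data, cs);
    }

    public static void main(String[] args) {
        // What new String(bytes, 0, len) silently uses on this machine.
        System.out.println("default charset: " + Charset.defaultCharset().name());

        // Demo bytes only: the same six bytes read under two different charsets.
        byte[] utf8 = "\uD55C\uAE00".getBytes(StandardCharsets.UTF_8);
        System.out.println("as UTF-8     : " + decode(utf8, StandardCharsets.UTF_8));
        System.out.println("as ISO-8859-1: " + decode(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

The two decoded lines differ even though the bytes are identical, which is the same effect as moving the loader between platforms with different defaults.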
Jul 17 '05 #9
Roedy Green wrote:
On Thu, 27 May 2004 09:43:04 -0400, Nicholas Pappas
Will forcing the input stream to a certain encoding under Linux break
Windows?


If the files have different encodings in different platforms you are
going to be in trouble. If you both write and read the file you can
force the encoding and thereby be consistent on all platforms.


Thankfully I do not need to write to the files, so I only need to
figure out how to read them.
Is there a Linux/UNIX command (or even a Windows command) that will
display the character set? Seems unlikely, but I'll cross my fingers
anyway. The files show up correctly in the KDE browser -- might I be
able to figure something out from there?

Thanks again!
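
There is no way to read the encoding off raw bytes with certainty, but candidates can be narrowed by trial decoding. The sketch below (class and method names are invented) asks each candidate charset to decode the bytes strictly and keeps those that report no malformed or unmappable input. The candidate list in main is an arbitrary example, and charsets such as EUC-KR are only tried if the running JVM has them installed. Single-byte charsets like ISO-8859-1 accept any byte sequence, so passing this check only means "not ruled out"; the decoded text still has to be inspected by eye.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

public class CharsetCandidates {

    /** Returns true if the bytes decode under cs without malformed or unmappable input. */
    static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Filters the candidate names down to installed charsets that accept the bytes. */
    static List<String> plausible(byte[] data, String... names) {
        List<String> ok = new ArrayList<>();
        for (String name : names) {
            if (Charset.isSupported(name) && decodesCleanly(data, Charset.forName(name))) {
                ok.add(name);
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        // Demo bytes only: two Hangul syllables encoded as UTF-8.
        byte[] sample = "\uD55C\uAE00".getBytes(Charset.forName("UTF-8"));
        System.out.println(plausible(sample, "US-ASCII", "UTF-8", "EUC-KR", "ISO-8859-1"));
    }
}
```

In the loader, the sample would be the bytes of one 40-byte path field up to its terminator rather than the demo string.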
Jul 17 '05 #10
