Nicholas Pappas wrote:
This is the block of code in my loader which reads the strings
from the file:
[...]
/** read in the 40 byte buffer */
in.read(bmpPath);
/** trim buffer to length and store */
for (len=0; len < 40; len++) {
    if (bmpPath[len] == 0)
        break;
}
textures[i] = new String(bmpPath, 0, len);
[...]
Does anyone have any suggestions on how I fix this so I can read the
Korean text in both Windows and Linux (and other OSs)?
You need a basic understanding of the relationship between bytes and
characters, and of the concept of character encodings. And you need
more information about your input file; specifically, what character
encoding it uses. There are a number of potential problems here:
1. (Actually not related to character encodings) Your call to in.read is
flawed. Take a look at the API documentation for that method.
Specifically, the method is not guaranteed to read the entire array. It
is only specified to read at least one byte but not more than the length
of the array, and to return the number of bytes that it has read. If you
want to read the entire byte array, you'll need to write a loop; sorta
like this:
int pos = 0;
while (pos < bmpPath.length)
{
    int len = in.read(bmpPath, pos, bmpPath.length - pos);
    if (len == -1) handlePrematureEOF();
    else pos += len;
}
Of course, handlePrematureEOF() should be replaced with appropriate
error-handling code, such as throwing an exception indicating the bad
file format.
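
For what it's worth, here is a minimal self-contained sketch of that loop
wrapped in a helper that throws on premature EOF. The names ReadFullyDemo
and readFully are mine, not from your loader, and the 7-bytes-at-a-time
stream is only there to imitate short reads. If your stream is (or can be
wrapped in) a DataInputStream, its readFully method does the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadFullyDemo {
    /** Fills buf completely, or throws EOFException if the stream ends first. */
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int pos = 0;
        while (pos < buf.length) {
            // read() may return fewer bytes than requested, so keep asking
            int n = in.read(buf, pos, buf.length - pos);
            if (n == -1) {
                throw new EOFException(
                    "expected " + buf.length + " bytes, got only " + pos);
            }
            pos += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[40];
        data[0] = 'a';
        // Hypothetical stream that returns at most 7 bytes per call,
        // to mimic the short reads a file or socket may produce.
        InputStream in = new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 7));
            }
        };
        byte[] bmpPath = new byte[40];
        readFully(in, bmpPath);
        System.out.println("filled " + bmpPath.length + " bytes");
    }
}
```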
2. You don't specify an encoding when you convert the data in the byte
array to text. That data was encoded in some specific encoding when
the file was written. The code you've written will work only if you get
lucky and the platform-default character encoding happens to match the
encoding in the file. To make this work reliably in a cross-platform
way, you need to discover what encoding was used in the file, and
specify that in a separate parameter, for example:
textures[i] = new String(bmpPath, 0, len, "UTF-8");
(That gets you UTF-8 encoding, which is probably a decent guess; but you
need to find out the real encoding to be sure this will work. It should
be documented with the file format spec.)
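
To see why the encoding matters, here is a small sketch that decodes the
same bytes twice: once with the encoding they were written in, and once
with a different one. CharsetDemo and decode are made-up names, and
ISO-8859-1 is only standing in for "whatever the platform default happens
to be"; I'm assuming UTF-8 for the file purely for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    /** Decodes the first len bytes of buf using an explicitly named charset. */
    static String decode(byte[] buf, int len, Charset cs) {
        return new String(buf, 0, len, cs);
    }

    public static void main(String[] args) {
        // Two Hangul syllables followed by ".bmp", written with Unicode
        // escapes so the source file's own encoding doesn't matter.
        String original = "\uD55C\uAE00.bmp";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(bytes, bytes.length, StandardCharsets.UTF_8);
        String wrong = decode(bytes, bytes.length, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
    }
}
```

Passing a Charset object rather than a name also avoids the checked
UnsupportedEncodingException that the String-name constructor declares.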
3. This is a bit of a subtle one, actually. Testing bytes for zero to
find the end of the String will not work reliably across character
encodings. In some multi-byte encodings (UTF-16, for example), a
character can contain an embedded zero byte even though the character
code itself is non-zero. UTF-8 never does this, but you can't count on
that until you know the real encoding. To work around this, you need to
swap the order. If your strings are null-terminated, then convert your
byte array to characters first, then look for a null character (i.e.,
Unicode value zero), rather than a zero byte. That looks like this:
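// decode the bytes first, then stop at the first NUL character (or EOF)
InputStreamReader reader = new InputStreamReader(
    new ByteArrayInputStream(bmpPath), "UTF-8");
StringWriter sw = new StringWriter();
int c;
// read() returns -1 at end of input and 0 for a NUL character
while ((c = reader.read()) > 0) sw.write((char) c);
textures[i] = sw.toString();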
This is an alternative to the String constructor you used to convert to
characters, and notice that you still need to know the proper character
encoding.
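
Here is a rough sketch of the difference between the two approaches, using
UTF-16LE as an example of an encoding with embedded zero bytes. The class
and method names are hypothetical, and I'm not claiming your file uses
UTF-16; it just makes the failure easy to see.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class NulTerminatedDemo {
    /** Decodes the whole buffer, then cuts at the first NUL character, if any. */
    static String decodeNulTerminated(byte[] buf, Charset cs) {
        String s = new String(buf, cs);
        int end = s.indexOf('\0');
        return end >= 0 ? s.substring(0, end) : s;
    }

    /** The byte-level scan from the original loader, kept for comparison. */
    static int firstZeroByte(byte[] buf) {
        int len = 0;
        while (len < buf.length && buf[len] != 0) len++;
        return len;
    }

    public static void main(String[] args) {
        // A 40-byte, zero-padded buffer holding "a.bmp" in UTF-16LE
        byte[] buf = new byte[40];
        byte[] text = "a.bmp".getBytes(StandardCharsets.UTF_16LE);
        System.arraycopy(text, 0, buf, 0, text.length);

        // The byte scan stops inside the first character: 'a' is 0x61 0x00
        System.out.println(firstZeroByte(buf)); // 1
        // Decoding first finds the real terminator
        System.out.println(decodeNulTerminated(buf, StandardCharsets.UTF_16LE)); // a.bmp
    }
}
```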
Hope that gets you started,
--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation