Writing Unicode-16 to a text file

I tried to write some 16-bit Unicode characters (which were displayed
correctly on the screen, as expected) to a file, but it didn't work
out very well. I have them in a char[] as well as in a String. Both
give me a run of "?" characters.

What am I missing?

--

Kindly
Konrad
---------------------------------------------------
May all spammers die an agonizing death; have no burial places;
their souls be chased by demons in Gehenna from one room to
another for all eternity and more.

Sleep - thing used by ineffective people
as a substitute for coffee

Ambition - a poor excuse for not having
enough sense to be lazy
---------------------------------------------------


Jul 17 '05 #1
Konrad Den Ende wrote:
> I tried to write some 16-bit Unicode characters (which were displayed
> correctly on the screen, as expected) to a file, but it didn't work
> out very well. I have them in a char[] as well as in a String. Both
> give me a run of "?" characters.
>
> What am I missing?


When you wrote the characters to a file (what method did you use?) they
probably underwent a 16-bit to 8-bit conversion, using some encoding (what
encoding did you specify? or what is your Java installation using as its
default encoding?). When you looked at the file afterwards, the software
you used to do that (what did you use?) probably wasn't set up to grok that
encoding.
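
A minimal sketch of how such a 16-bit to 8-bit conversion can end up producing "?", assuming the target encoding is ISO-8859-1 (the thread has not established which encoding is actually in use) and a Java version that has java.nio.charset.StandardCharsets (7 or later). The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class LossyEncodingDemo {
    /** Encodes text to ISO-8859-1 bytes and decodes those bytes again. */
    static String roundTripLatin1(String text) {
        // String.getBytes(Charset) silently replaces every character the
        // charset cannot represent with the charset's replacement byte,
        // which is '?' for ISO-8859-1.
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E"; // three kanji, escaped to keep the source ASCII
        System.out.println(roundTripLatin1(japanese)); // prints ???
        System.out.println(roundTripLatin1("abc"));    // prints abc
    }
}
```

The original characters are gone once the bytes are written, so no viewer setting can bring them back from such a file.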

What happens when you read the file back into Java?

Good luck,

Chris

--
Chris Gray ch***@kiffer.eunet.be
/k/ Embedded Java Solutions

Jul 17 '05 #2
> When you wrote the characters to a file (what method did you use?) they
> probably underwent a 16-bit to 8-bit conversion

    try {
        BufferedWriter writer = new BufferedWriter(new FileWriter("nihongo.txt"));
        writer.write(cc); // cc is a char[] that stores the characters
        writer.close();
    }
    catch (Exception e) { System.out.println(e.getMessage()); }

> using some encoding (what encoding did you specify? or what is your Java
> installation using as its default encoding?).

I didn't specify any encoding, so I guess it's English. BUT I figured that,
since a char is nothing more than a number, my char[] variable is just an
array of some kind of integers (2-byte, I guess, so that it can hold all
the 65k characters).

> When you looked at the file afterwards, the software you used to do that
> (what did you use?) probably wasn't set up to grok that encoding.

I used MS Word and a text reader with Japanese enabled. Just to be sure, I
took a file whose Japanese text I can read with my usual software and opened
it in Notepad. I didn't see Japanese (oh, what a surprise), but I could see
a number of strange characters. Yet the file that my application creates
contains only "?"s.

> What happens when you read the file back into Java?

"?"s only.

Any hint?
--

Kindly
Konrad
---------------------------------------------------
May all spammers die an agonizing death; have no burial places;
their souls be chased by demons in Gehenna from one room to
another for all eternity and more.

Sleep - thing used by ineffective people
as a substitute for coffee

Ambition - a poor excuse for not having
enough sense to be lazy
---------------------------------------------------


Jul 17 '05 #4
Konrad Den Ende wrote:
>> When you wrote the characters to a file (what method did you use?) they
>> probably underwent a 16-bit to 8-bit conversion
>
>     try {
>         BufferedWriter writer = new BufferedWriter(new FileWriter("nihongo.txt"));
>         writer.write(cc); // cc is a char[] that stores the characters
>         writer.close();
>     }
>     catch (Exception e) { System.out.println(e.getMessage()); }
>
>> using some encoding (what encoding did you specify? or what is your Java
>> installation using as its default encoding?).
>
> Any hint?


Sure.

You have been writing Japanese with an encoding that doesn't support
it. I bet your default encoding, derived from your operating system
locale (you can see it via System.getProperties() ...), is ISO-8859
or something like that. It does not support Japanese.
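
A small sketch for checking that guess on your own machine. It assumes Java 5 or later for Charset.defaultCharset(). The class name and the canEncode helper are made up for illustration, and "ISO-8859-1" merely stands in for whatever your default turns out to be.

```java
import java.nio.charset.Charset;

final class DefaultCharsetCheck {
    /** Returns true if every character of text can be encoded in the named charset. */
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E"; // sample Japanese text, escaped

        // The charset that FileWriter picks when none is specified.
        Charset platform = Charset.defaultCharset();
        System.out.println("Default charset: " + platform.name());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        System.out.println("Default can encode the sample: "
                + platform.newEncoder().canEncode(japanese));

        System.out.println(canEncode("ISO-8859-1", japanese)); // prints false
        System.out.println(canEncode("UTF-8", japanese));      // prints true
    }
}
```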

You should look at OutputStreamWriter, of which you can make an instance
that uses an encoding that supports Japanese. You can get an idea of
what encodings are supported by looking at the Charset class in Java
1.4's java.nio.charset package. There is a static method there,
Charset.availableCharsets(), that returns a sorted map keyed by the
names of the supported encodings.
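
Something along these lines should work as a starting point. It is a sketch, not tested against your program: it assumes UTF-8 as the target encoding and a Java version with try-with-resources and StandardCharsets (7 or later; roundTrip also uses UncheckedIOException from Java 8). The class name and the write, read and roundTrip helpers are made up for illustration, while nihongo.txt and cc come from the snippet quoted above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Utf8FileDemo {
    /** Writes the characters to the file as UTF-8, ignoring the platform default. */
    static void write(File file, char[] cc) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            writer.write(cc);
        }
    }

    /** Reads the whole file back, decoding its bytes as UTF-8. */
    static String read(File file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    /** Writes cc to a temporary file and returns what comes back when it is read. */
    static String roundTrip(char[] cc) {
        try {
            File file = File.createTempFile("nihongo", ".txt");
            try {
                write(file, cc);
                return read(file);
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        char[] cc = "\u65E5\u672C\u8A9E".toCharArray(); // sample Japanese text, escaped
        write(new File("nihongo.txt"), cc);
        System.out.println(roundTrip(cc).equals(new String(cc))); // prints true

        // Names of all encodings this JVM supports.
        System.out.println(Charset.availableCharsets().keySet());
    }
}
```

Whatever you open nihongo.txt with afterwards has to be told that the file is UTF-8, and needs a font that contains the characters.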

You may end up using ISO-2022-something, but I prefer Unicode's UTF-8,
it's a lot nicer and cleaner, and it supports almost any language. You
will need Unicode fonts though.

An encoding is the mapping between bytes (sequences of 8 bits) and a
higher level of abstraction, namely characters. Streams are byte oriented,
readers/writers are character oriented, and encoding/decoding sits in
between.
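
To make that mapping concrete, here is a small sketch that prints the bytes one character turns into under two different encodings. The class name and the hex helper are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

final class EncodingMappingDemo {
    /** Formats bytes as space-separated two-digit hex values. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String day = "\u65E5"; // a single kanji, code point U+65E5

        // Same character, different byte sequences depending on the encoding.
        System.out.println(hex(day.getBytes(StandardCharsets.UTF_8)));    // prints E6 97 A5
        System.out.println(hex(day.getBytes(StandardCharsets.UTF_16BE))); // prints 65 E5

        // Decoding with the matching encoding gives the character back.
        byte[] utf8 = day.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(day)); // prints true
    }
}
```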

Hope that helped.
Soren
--
Fjern de 4 bogstaver i min mailadresse som er indsat for at hindre s...
Remove the 4 letter word meaning "junk mail" in my mail address.

Jul 17 '05 #6
