
How to write Unicode characters to an RTF doc?

I need to produce an RTF document which is filled with
data from a database.
I've created an RTF document in WordPad (a template,
so to speak) which contains 'placeholders', for example
'<dd01>', '<dd02>', etc.

I read the entire template into a StringBuilder and
then perform a simple 'replace' on it, using a Hashtable.
The keys in the Hashtable are strings representing the
placeholders and the Hashtable's values contain data
from the database.

After the replace-action, I write the content of the
StringBuilder to a file with extension '.rtf'
(See code below).

It works like a charm, I can read the file with Word
(or WordPad) and it looks alright.

---

But ... Problems arise when the data from the database
contains non-ASCII characters. (Are these
called 'unicode-characters' ?)

These characters get converted to 'gibberish' when viewing
the generated rtf-doc in Word.
Then I thought that I probably needed to add 'Encoding.Unicode'
when writing the file, but when I do that, the generated file
is no longer recognized by Word as a valid RTF-doc.
Word then complains, 'this is an encoded file, install importfilters,
etc ...'.
My two questions now are :

1. How can I write unicode-characters to my RTF-template 'the
right way' ?

2. Why does Word no longer recognize a simple RTF document
after it was written using 'Encoding.Unicode' ?
I thought an RTF document is basically just plain text (although
it contains a lot of mark-up code) and that by using 'Encoding.Unicode',
I'm only saying, 'this plain text may contain unicode-characters'.
Right ?

//---

This is the code :

// Needs: using System.Collections; using System.IO; using System.Text;

// Reads the RTF template, fills in the placeholders and writes the result.
private void writeForm(string pathTemplate, string pathTempFile, Hashtable formData)
{
    TextReader syncReader = TextReader.Synchronized(new StreamReader(pathTemplate));
    // No encoding is passed here, so StreamWriter uses its default.
    TextWriter syncWriter = TextWriter.Synchronized(new StreamWriter(pathTempFile));

    StringBuilder emptyTemplate = new StringBuilder(syncReader.ReadToEnd());
    StringBuilder filledDoc = fillTemplate(emptyTemplate, formData);
    syncWriter.Write(filledDoc);

    syncReader.Close();
    syncWriter.Close();
}

// Replaces every placeholder (key) in the template with its database value.
private StringBuilder fillTemplate(StringBuilder doc, Hashtable formData)
{
    IDictionaryEnumerator myEnumerator = formData.GetEnumerator();
    while (myEnumerator.MoveNext())
    {
        System.Diagnostics.Debug.WriteLine("1 : " + (string) myEnumerator.Value);
        doc = doc.Replace((string) myEnumerator.Key, (string) myEnumerator.Value);
    }
    return doc;
}

//---
Nov 17 '05 #1
3 Replies


<jo**@wezayzo.com> wrote:
> But ... Problems arise when the data from the database
> contains non-ASCII characters. (Are these
> called 'unicode-characters' ?)

Well, all characters in .NET are Unicode.

> These characters get converted to 'gibberish' when viewing
> the generated rtf-doc in Word.

Okay. I think you need to find the specifications for RTF and work out
which encoding to use. By default, StreamWriter will be using UTF-8. It
sounds like that's no good for you, but you shouldn't just pick
encodings at random - you could find one which appears to work, but
fails with some data you don't test it with.

Looking at the docs at www.wotsit.org, it looks like it *is* possible
to specify encodings, but that Word doesn't understand UTF-8 encoded
text. You may need to "manually" encode (with \uN) characters which
aren't in the appropriate code-page - I'd go with anything non-ASCII.

> 2. Why doesn't Word recognize a simple RTF-document no longer
> after it was written using 'Encoding.Unicode' ?
> I thought a RTF-document is basically just plain text (however
> containing a lot of mark-up code) and by using 'Encoding.Unicode',
> I'm only telling, 'this plain-text may contain unicode-characters'.
> Right ?


No - it's entirely changing what the file looks like. See
http://www.pobox.com/~skeet/csharp/unicode.html to understand what
Encodings are about.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #2
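
To make the point about encodings concrete, here is a small sketch that prints the bytes different encodings produce for the start of an RTF file. The class and method names (EncodingDemo, Hex) are made up for illustration. With Encoding.Unicode (UTF-16 little-endian) every ASCII character becomes two bytes, and a StreamWriter created with that encoding also writes the preamble (byte order mark) first, so the file no longer starts with the plain bytes of "{\rtf". That is presumably why Word stops treating it as RTF.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Returns the encoded bytes as a hex string such as "7B-5C-72".
    public static string Hex(Encoding enc, string text)
    {
        return BitConverter.ToString(enc.GetBytes(text));
    }

    static void Main()
    {
        string header = @"{\rtf1";

        // One byte per character: 7B-5C-72-74-66-31
        Console.WriteLine(Hex(Encoding.ASCII, header));

        // Two bytes per character: 7B-00-5C-00-72-00-74-00-66-00-31-00
        Console.WriteLine(Hex(Encoding.Unicode, header));

        // Byte order mark written at the start of the file: FF-FE
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetPreamble()));
    }
}
```

So the encoding is not a hint that the text "may contain" Unicode characters; it decides which bytes end up in the file.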

Thank you very much, Jon.
I'm gonna study your page on unicode.
Nov 17 '05 #3

The most portable form of RTF is ASCII only.
All high characters (everything above 127) should be escaped with \uN.
See the specs here: http://support.microsoft.com/kb/q86999/

--
Mihai Nita [Microsoft MVP, Windows - SDK]
------------------------------------------
Replace _year_ with _ to get the real email
Nov 17 '05 #4
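
A minimal sketch of the escaping suggested in the replies, not a complete RTF writer. The names RtfEscapeSketch and EscapeRtf are hypothetical. It assumes the reader uses the default of one fallback character after each \uN (hence the trailing '?'), and it also escapes the backslash and braces, since those are RTF syntax characters. \uN takes a signed 16-bit decimal number, so code units above 32767 come out negative.

```csharp
using System;
using System.Globalization;
using System.Text;

static class RtfEscapeSketch
{
    // Escapes a plain-text value so it can be substituted into RTF body text.
    public static string EscapeRtf(string value)
    {
        StringBuilder sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                // RTF syntax characters get a leading backslash.
                sb.Append('\\').Append(c);
            }
            else if (c <= 127)
            {
                // Plain ASCII passes through unchanged.
                sb.Append(c);
            }
            else
            {
                // \uN with N as a signed 16-bit decimal, followed by '?'
                // as the fallback for readers that do not understand \u.
                short n = unchecked((short)c);
                sb.Append("\\u")
                  .Append(n.ToString(CultureInfo.InvariantCulture))
                  .Append('?');
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Prints: Caf\u233? \{5\u8364?\}
        Console.WriteLine(EscapeRtf("Caf\u00E9 {5\u20AC}"));
    }
}
```

In the code from the question this would mean passing EscapeRtf((string) myEnumerator.Value) to Replace instead of the raw value. The output is then pure ASCII, so the writer could be created as new StreamWriter(pathTempFile, false, Encoding.ASCII). Line breaks inside the values are not handled here.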
