There is a method that will approximate most of the
characters, but it is not guaranteed to work!
I once needed a converter from any codepage to any codepage (as a
matter of fact, from all Windows codepages to all Macintosh codepages).
At this link you can get all the mappings you'll need from codepage
byte values to Unicode:
http://www.unicode.org/Public/MAPPINGS/VENDORS/
I wrote a parser that built a substitution matrix from two files, so
that it only switched the characters that had different byte values
for the same Unicode value. In your case, I'd suggest
you build your matrix from one single file (don't hard-code it, to keep
your solution flexible).
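
To give a rough idea of the parsing step, here is a minimal sketch in
Python (not my original code). It assumes the usual layout of the
mapping files: one entry per line, a hex byte value, a hex Unicode
value, and a trailing "#" comment. The names parse_mapping and
build_substitutions are made up for this example, and entries that do
not parse as two plain hex numbers are simply skipped.

```python
def parse_mapping(text):
    """Return {byte_value: unicode_code_point} from mapping-file text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, comment-only line, or undefined entry
        try:
            table[int(fields[0], 16)] = int(fields[1], 16)
        except ValueError:
            continue  # anything that is not two plain hex numbers is ignored here
    return table


def build_substitutions(src, dst):
    """Map source bytes to destination bytes, keeping only the bytes that
    differ between the two codepages for the same Unicode value."""
    reverse = {code_point: byte for byte, code_point in dst.items()}
    return {
        byte: reverse[code_point]
        for byte, code_point in src.items()
        if code_point in reverse and reverse[code_point] != byte
    }
```

Characters that exist in the source codepage but not in the target one
are left out of the result, so you still have to decide what to do with
them (this is where the approximation comes in).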
To make the substitutions I implemented an Aho-Corasick engine with
callbacks
(you'll definitely want to use this if you want your replacement to be
efficient when processing large files, let's say 1 GB):
http://en.wikipedia.org/wiki/Aho-Corasick_algorithm
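
For illustration only, a compact Aho-Corasick matcher with a callback
could look roughly like the Python sketch below. This is not my
original engine: the class name AhoCorasick and the helper replace_all
are invented for the example, patterns are assumed to be non-empty, and
replace_all keeps all matches in memory, so a real 1 GB job would want
to stream the output instead.

```python
from collections import deque


class AhoCorasick:
    def __init__(self, patterns):
        # Trie stored as parallel lists; node 0 is the root.
        self.goto = [{}]
        self.fail = [0]
        self.out = [[]]  # patterns that end at each node
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        node = 0
        for sym in pattern:
            nxt = self.goto[node].get(sym)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[node][sym] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            node = nxt
        self.out[node].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so shallower nodes are finished before deeper ones.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for sym, child in self.goto[node].items():
                f = self.fail[node]
                while f and sym not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(sym, 0)
                self.out[child] += self.out[self.fail[child]]
                queue.append(child)

    def scan(self, data, callback):
        """Call callback(start_index, pattern) for every match in data."""
        node = 0
        for i, sym in enumerate(data):
            while node and sym not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(sym, 0)
            for pattern in self.out[node]:
                callback(i - len(pattern) + 1, pattern)


def replace_all(data, table):
    """Replace each key of table found in data, leftmost and longest first."""
    matches = []
    AhoCorasick(table).scan(
        data, lambda pos, pat: matches.append((pos, -len(pat), pat)))
    matches.sort()
    pieces, end = [], 0
    for pos, _, pat in matches:
        if pos < end:
            continue  # overlaps a replacement that was already made
        pieces.append(data[end:pos])
        pieces.append(table[pat])
        end = pos + len(pat)
    pieces.append(data[end:])
    return data[:0].join(pieces)  # works for both bytes and str
```

A call such as replace_all(data, {b"\xe9": b"\x8e", b"\x80": b"\xdb"})
would then swap the two bytes in one pass over the data. If every
pattern is a single byte, Python's built-in bytes.translate is simpler;
the automaton pays off once the patterns are longer than one symbol.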
With this method you are in complete control of what you want to
change. It is also flexible, because you only need to change the file
which holds your substitutions.
Drop me a line and I'll send you some code,
Best Regards,
Joachim