I have been writing a parser for RTF (Rich Text Format) files and am having trouble understanding how to map escaped characters to the correct languages.
I basically want to convert all text to Unicode, along with links to images, for storage in a database.
Here is an example in Bulgarian.
RTF:
{\f192\fs20\cf1\lang1026\langfe1033\langnp1026\insrsid9335274\charrsid3752215
\'c7\'e0 \'e4\'e0 \'f1\'e0 \'e2}
The \lang1026 refers to the Bulgarian language, but how do I interpret hex values such as \'c7?
Here is the code I tried, but the language mappings do not seem to correspond to the code pages.
Any help appreciated.
Regards,
John
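// currentState.lang holds the \lang value (e.g. 1026); it is passed here as a code page,
// which is the mapping that does not seem to line up.
Encoding srcEncoding = Encoding.GetEncoding(currentState.lang);
Encoding unicodeEncoding = Encoding.GetEncoding(1200); // 1200 = UTF-16LE
// Wrap the single byte parsed from a \'hh escape.
byte[] inBytes = new byte[1];
inBytes[0] = (byte)hex_value;
byte[] outputBytes = Encoding.Convert(srcEncoding, unicodeEncoding, inBytes);
string unicodestring = Encoding.Unicode.GetString(outputBytes);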
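One direction that might be worth trying, sketched under assumptions rather than as a confirmed fix: the \lang value looks like a Windows locale ID (LCID), not a code page, so it may need to be translated to the culture's ANSI code page before calling Encoding.GetEncoding. The class and method names below (RtfHexDecoder, CodePageFromLcid, Decode) are hypothetical. As far as I understand, RTF readers more commonly take the code page from \ansicpgN in the header or from the \fcharsetN of the current font, so the LCID route is only an approximation.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class RtfHexDecoder
{
    // Hypothetical helper: treats the RTF \langN value as a Windows LCID and
    // returns the ANSI code page associated with that culture.
    public static int CodePageFromLcid(int lcid)
    {
        return new CultureInfo(lcid).TextInfo.ANSICodePage;
    }

    // Decodes the raw bytes collected from a run of \'hh escapes.
    public static string Decode(byte[] rawBytes, int lcid)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy Windows
        // code pages must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding source = Encoding.GetEncoding(CodePageFromLcid(lcid));

        // GetString already yields a .NET (UTF-16) string, so no separate
        // conversion step to code page 1200 is needed.
        return source.GetString(rawBytes);
    }
}
```

For the sample above, calling RtfHexDecoder.Decode(new byte[] { 0xC7, 0xE0 }, 1026) should give "За" if the Bulgarian culture resolves to code page 1251.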