My basic approach is to read some number of UTF-8 bytes from the stream, decode them into code points, and then encode those code points as UTF-16 bytes. I want to write this myself, but I don't understand how to examine and test the actual byte sequence.
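Here is a minimal hand-written sketch of that pipeline in Python 3, assuming well-formed input. The function names `utf8_to_code_points` and `code_points_to_utf16le` are hypothetical. The output is assumed to be UTF-16 little-endian without a BOM, and the decoder does not reject overlong forms or encoded surrogates. When a stream is read in fixed-size chunks, a chunk can end in the middle of a multi-byte sequence, so the leftover bytes would have to be carried into the next read. This sketch simply raises on a truncated sequence.

```python
def utf8_to_code_points(data):
    """Decode UTF-8 bytes into a list of integer code points.

    Sketch only: checks lead/continuation bit patterns and truncation,
    but does not reject overlong encodings or encoded surrogates.
    """
    code_points = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:              # 0xxxxxxx: 1-byte sequence (ASCII)
            cp, n = b0, 1
        elif b0 >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
            cp, n = b0 & 0x1F, 2
        elif b0 >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
            cp, n = b0 & 0x0F, 3
        elif b0 >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
            cp, n = b0 & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + n > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + n]:
            if b >> 6 != 0b10:     # continuation bytes must be 10xxxxxx
                raise ValueError(f"invalid continuation byte 0x{b:02X}")
            cp = (cp << 6) | (b & 0x3F)   # append 6 payload bits
        code_points.append(cp)
        i += n
    return code_points


def code_points_to_utf16le(code_points):
    """Encode code points as UTF-16 little-endian bytes (no BOM)."""
    out = bytearray()
    for cp in code_points:
        if cp < 0x10000:
            out += cp.to_bytes(2, "little")        # one 16-bit unit
        else:
            v = cp - 0x10000                       # 20 bits left to split
            high = 0xD800 | (v >> 10)              # high surrogate
            low = 0xDC00 | (v & 0x3FF)             # low surrogate
            out += high.to_bytes(2, "little") + low.to_bytes(2, "little")
    return bytes(out)
```

A quick way to check the hand-written version is to compare it against Python's built-in codecs, e.g. `code_points_to_utf16le(utf8_to_code_points(text.encode("utf-8"))) == text.encode("utf-16-le")` for some test string `text` that includes non-ASCII and non-BMP characters.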
Let's say I use the following code (from Evan Jones' Scratch Pad: http://evanjones.ca/python-utf8.html) to make sure I have UTF-8 encoded bytes:
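```python
# Python 3 version; the original Python 2 line was u = unicode(s, "utf-8")
s = b"hello normal string"        # raw bytes, assumed to be UTF-8
u = s.decode("utf-8")             # bytes -> str (a sequence of code points)
backToBytes = u.encode("utf-8")   # str -> UTF-8 encoded bytes
```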
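To look at the actual byte sequence, one option in Python 3 is to treat the encoded result as a sequence of integers. Indexing or iterating over a `bytes` object yields `int` values from 0 to 255, so bit patterns can be tested directly. (In Python 2, `backToBytes` is a `str`, and you would call `ord(c)` on each character to get the byte value.) The helper name `show_bytes` and the sample string are illustrative assumptions, not part of the original code.

```python
def show_bytes(label, data):
    """Print each byte of `data` as two lowercase hex digits."""
    print(label, " ".join(f"{b:02x}" for b in data))


u = "h\u00e9llo \u20ac"            # a str (code points), not bytes
utf8 = u.encode("utf-8")
utf16 = u.encode("utf-16-le")   # little-endian, no BOM

show_bytes("UTF-8: ", utf8)
show_bytes("UTF-16:", utf16)

# Each element of a bytes object is an int, so the top bits can be tested
# directly; here we look for lead bytes of 3-byte UTF-8 sequences (1110xxxx).
lead_offsets = [i for i, b in enumerate(utf8) if b & 0xF0 == 0xE0]
for i in lead_offsets:
    print(f"offset {i}: lead byte of a 3-byte sequence (0x{utf8[i]:02x})")
```

Printing the bytes in hex this way makes it easy to compare your own encoder's output, byte for byte, with what `encode("utf-16-le")` produces.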