For the most part, the process I had in place was working, but I noticed that certain characters in my test file were not decoded correctly. I inspected the corresponding byte values I received from the input stream and came to the conclusion that the stream was encoded as UTF-8 ( which I have also come to understand is the default in many places in .NET when no other encoding scheme is specified ). I think the problem is that every character in the stream was encoded using a single byte, so the few characters in the file whose UTF-8 encodings require more than one byte are decoded improperly.
An example:
One of the characters I am having an issue with is the dash ( hyphen, minus-sign operator ) character (-). It arrives from the input stream as a single byte with the value 0x96, but I found a resource that lists this character as requiring two bytes ( 0xc2, 0x96 ) in the UTF-8 encoding. The result is that when I convert the byte that is supposed to represent this character to a char, it ends up with the value ( bit-wise ) 0xfffd.
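To make the byte values above concrete, here is a minimal sketch that decodes them under different encodings. It is an illustration, not a confirmed diagnosis: the class and method names ( DashByteDemo, FirstCharHex ) are hypothetical, and the last call assumes that the file might have been saved as Windows-1252, where 0x96 is the en dash. That call also assumes code page 1252 is available, as it is on the .NET Framework; on .NET Core and later a code-pages encoding provider may need to be registered first.

```csharp
using System;
using System.Text;

public static class DashByteDemo
{
    // Hypothetical helper: decodes the bytes with the given encoding and
    // returns the first resulting char as a four-digit hex string.
    public static string FirstCharHex(Encoding encoding, params byte[] bytes)
    {
        string decoded = encoding.GetString(bytes);
        return ((int)decoded[0]).ToString("x4");
    }

    public static void Main()
    {
        // A lone 0x96 is a UTF-8 continuation byte, so it is invalid on its
        // own and is replaced with U+FFFD by the default decoder.
        Console.WriteLine(FirstCharHex(Encoding.UTF8, 0x96));

        // 0xc2 0x96 is valid UTF-8, but it decodes to U+0096, a control
        // character rather than a dash.
        Console.WriteLine(FirstCharHex(Encoding.UTF8, 0xc2, 0x96));

        // Assumption: the file is Windows-1252, where 0x96 maps to U+2013
        // ( en dash ). Requires code page 1252 to be available.
        Console.WriteLine(FirstCharHex(Encoding.GetEncoding(1252), 0x96));
    }
}
```

If the third call yields the expected dash, that would suggest the stream is not UTF-8 at all, which would explain why only some characters come out wrong.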
Here is the code I am trying to use to accomplish this:
    private string serializeFile( )
    {
        StringBuilder fileContents = new StringBuilder( );
        if( this._file != null )
        {
            HttpPostedFile file = this._file;
            byte[ ] fileBytes = new byte[file.ContentLength];
            file.InputStream.Read( fileBytes, 0, file.ContentLength );
            foreach( byte fileByte in fileBytes )
            {
                Char character = Convert.ToChar( fileByte );
                fileContents.Append( character );
            }
        }
        ...
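For comparison with the byte-by-byte loop above, here is a minimal sketch of decoding a whole stream in one pass with an explicit encoding, so that multi-byte sequences are kept together. The names ( StreamDecodeSketch, ReadAllText ) are hypothetical, a MemoryStream stands in for file.InputStream, and the choice of encoding is an assumption that has to match how the file was actually saved.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamDecodeSketch
{
    // Hypothetical helper: reads the entire stream and decodes it with the
    // given encoding, instead of converting one byte to one char.
    public static string ReadAllText(Stream input, Encoding encoding)
    {
        using (var reader = new StreamReader(input, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // "a", then U+2013 ( en dash ) as its three-byte UTF-8 sequence
        // 0xe2 0x80 0x93, then "b".
        byte[] utf8Bytes = { 0x61, 0xe2, 0x80, 0x93, 0x62 };

        string text = ReadAllText(new MemoryStream(utf8Bytes), Encoding.UTF8);

        // Five bytes decode to three chars; the middle one is U+2013.
        Console.WriteLine(text.Length);
        Console.WriteLine(((int)text[1]).ToString("x4"));
    }
}
```

Passing a different Encoding ( for example one obtained from Encoding.GetEncoding ) would be the only change needed if the file turns out not to be UTF-8.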
Thank you very much for taking the time to read this, and for helping if you choose to. Let me know if you need any more information.