newbie: how do I test a byte string?

How do I test a byte string in Python? I want to manually convert (no libraries or functions) a UTF-8 string into UTF-16.

My basic solution is to read from the stream some number of UTF-8 bytes, convert them into codepoints, then convert those codepoints into UTF-16 bytes. I want to code this myself, but I don't understand how to test the actual byte sequence.

Let's say I use the following code to make sure I have a UTF-8 encoded byte string (from Evan Jones' Scratch Pad: http://evanjones.ca/python-utf8.html):

    s = "hello normal string"
    u = unicode(s, "utf-8")            # decode the bytes into a unicode object (Python 2)
    backToBytes = u.encode("utf-8")    # encode back into a UTF-8 byte string

Now, I need to test the lead byte of the sequence for each character in "backToBytes", right? Is there a function that does this? Any help would be appreciated.
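
One way to test the lead byte is to look at its numeric value: the high bits say how many bytes the sequence has. The following is only a minimal sketch of that idea, not a complete validator. The function names `utf8_sequence_length` and `utf8_to_codepoints` are made up for illustration, and the code assumes the input is a byte string such as "backToBytes". Wrapping it in a `bytearray` gives integer byte values in both Python 2 and Python 3. It rejects bad lead and continuation bytes but does not catch every ill-formed case (for example, overlong three-byte forms or encoded surrogates).

```python
def utf8_sequence_length(lead):
    """Return how many bytes the UTF-8 sequence starting with `lead` has."""
    if lead < 0x80:              # 0xxxxxxx: ASCII, one byte
        return 1
    if 0xC2 <= lead <= 0xDF:     # 110xxxxx: two bytes (0xC0/0xC1 would be overlong)
        return 2
    if 0xE0 <= lead <= 0xEF:     # 1110xxxx: three bytes
        return 3
    if 0xF0 <= lead <= 0xF4:     # 11110xxx: four bytes (up to U+10FFFF)
        return 4
    # Anything else is a continuation byte (10xxxxxx) or an invalid value.
    raise ValueError("invalid UTF-8 lead byte: 0x%02X" % lead)


def utf8_to_codepoints(data):
    """Decode a UTF-8 byte string into a list of integer code points."""
    buf = bytearray(data)        # indexing/iterating a bytearray yields ints
    codepoints = []
    i = 0
    while i < len(buf):
        lead = buf[i]
        n = utf8_sequence_length(lead)
        if i + n > len(buf):
            raise ValueError("truncated sequence at offset %d" % i)
        # Keep only the payload bits of the lead byte.
        cp = lead if n == 1 else lead & (0x7F >> n)
        for b in buf[i + 1:i + n]:
            if (b & 0xC0) != 0x80:   # continuation bytes look like 10xxxxxx
                raise ValueError("bad continuation byte at offset %d" % i)
            cp = (cp << 6) | (b & 0x3F)
        codepoints.append(cp)
        i += n
    return codepoints
```

For instance, `utf8_to_codepoints(b"\xc5\x84")` should give `[0x144]`, the code point of ń.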
Jul 30 '08 #1
I guess I get to solve my own thread (thanks again to the Natural Language Toolkit's online tutorial). The function repr() appears to give me what I need:

    line = u'\u0144'
    line_utf = line.encode('utf8')

    # Python 2 print statements: the raw bytes, then their escaped form
    print 'line =', line_utf
    print 'line repr =', repr(line_utf)

Output:

    line = ń
    line repr = '\xc5\x84'

It's the '\xc5\x84' part that I needed.
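
Those two bytes, 0xC5 0x84, are the UTF-8 form of U+0144. For the last step of the original plan, turning code points into UTF-16 bytes by hand, something along these lines might work. This is a rough sketch under a few assumptions: the name `codepoints_to_utf16` and its `big_endian` parameter are hypothetical, the input is a list of integer code points, and no byte order mark is written.

```python
def codepoints_to_utf16(codepoints, big_endian=True):
    """Encode integer code points as UTF-16 bytes, without a byte order mark."""
    out = bytearray()
    for cp in codepoints:
        # Surrogate values and anything above U+10FFFF cannot be encoded.
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("not a Unicode scalar value: 0x%X" % cp)
        if cp < 0x10000:
            units = [cp]                      # one 16-bit code unit
        else:
            v = cp - 0x10000                  # 20 bits, split into two halves
            units = [0xD800 | (v >> 10),      # high (lead) surrogate
                     0xDC00 | (v & 0x3FF)]    # low (trail) surrogate
        for unit in units:
            hi, lo = unit >> 8, unit & 0xFF
            out.extend([hi, lo] if big_endian else [lo, hi])
    return bytes(out)
```

With the default byte order, `codepoints_to_utf16([0x144])` should return the two bytes 0x01 0x44. Passing `big_endian=False` swaps each pair.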
Jul 31 '08 #2
