Bytes IT Community

newbie: how do I test a byte string?

How do I test a byte string in Python? I want to manually convert (no libraries or functions) a UTF-8 string into UTF-16.

My basic plan is to read some number of UTF-8 bytes from the stream, convert them into code points, then convert those code points into UTF-16 bytes. I want to code this myself, but I don't understand how to examine the actual byte sequence.
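For the second half of that plan (code points to UTF-16 bytes), a minimal sketch might look like the following. It is written in Python 3, assumes big-endian output with no byte order mark, and the function name `codepoints_to_utf16be` is hypothetical. Code points below U+10000 become one 16-bit unit; anything above is split into a surrogate pair.

```python
def codepoints_to_utf16be(codepoints):
    """Encode a sequence of Unicode code points as UTF-16 big-endian bytes (no BOM).

    Hypothetical helper for illustration; byte order is an assumption.
    """
    out = bytearray()
    for cp in codepoints:
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("not a valid Unicode scalar value: %#x" % cp)
        if cp < 0x10000:
            # Basic Multilingual Plane: a single 16-bit code unit
            units = [cp]
        else:
            # Supplementary planes: subtract 0x10000, then split the
            # remaining 20 bits into a high and a low surrogate
            v = cp - 0x10000
            units = [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
        for unit in units:
            out.append(unit >> 8)      # high byte first (big-endian)
            out.append(unit & 0xFF)    # then the low byte
    return bytes(out)


print(codepoints_to_utf16be([0x68, 0x144, 0x1F600]).hex())  # 00680144d83dde00
```

The result can be cross-checked against Python's own codec, e.g. `"h\u0144".encode("utf-16-be")`, without relying on it in the conversion itself.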

Let's say I use the following code to ensure I have a UTF-8 encoding (from Evan Jones' Scratch Pad: http://evanjones.ca/python-utf8.html)

    # Python 2: decode the byte string to a unicode object, then re-encode it as UTF-8
    s = "hello normal string"
    u = unicode(s, "utf-8")
    backToBytes = u.encode("utf-8")
Now, I need to test the lead byte of the sequence for each character in "backToBytes", right? Is there a function that does this? Any help would be appreciated.
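Testing the lead byte can be done with bit masks rather than a library function. The sketch below is Python 3 (where indexing a `bytes` object returns an integer); the names `utf8_sequence_length` and `utf8_to_codepoints` are hypothetical. It checks the lead-byte patterns `0xxxxxxx`, `110xxxxx`, `1110xxxx` and `11110xxx`, requires continuation bytes to match `10xxxxxx`, and does not reject overlong forms or encoded surrogates, so it is not a full validator.

```python
def utf8_sequence_length(lead):
    """Return how many bytes a UTF-8 sequence has, judging only by its lead byte."""
    if lead < 0x80:            # 0xxxxxxx: ASCII, one byte
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: two-byte sequence
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: four-byte sequence
        return 4
    # 10xxxxxx is a continuation byte, and 0xF8..0xFF never appear in UTF-8
    raise ValueError("invalid lead byte: %#04x" % lead)


def utf8_to_codepoints(data):
    """Decode UTF-8 bytes into a list of code points (minimal validation only)."""
    codepoints = []
    i = 0
    while i < len(data):
        lead = data[i]                       # an int in Python 3
        n = utf8_sequence_length(lead)
        if i + n > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        # Keep the payload bits of the lead byte: 7, 5, 4 or 3 bits
        cp = lead & (0x7F, 0x1F, 0x0F, 0x07)[n - 1]
        for b in data[i + 1:i + n]:
            if b >> 6 != 0b10:               # continuation bytes look like 10xxxxxx
                raise ValueError("bad continuation byte after offset %d" % i)
            cp = (cp << 6) | (b & 0x3F)      # append 6 more payload bits
        codepoints.append(cp)
        i += n
    return codepoints


print(utf8_to_codepoints("hello \u0144".encode("utf-8")))
# [104, 101, 108, 108, 111, 32, 324]
```

Feeding the resulting list into a code-point-to-UTF-16 encoder completes the conversion described in the question.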
Jul 30 '08 #1

I guess I get to solve my own thread (thanks again to the Natural Language Toolkit's online tutorial). The function repr() appears to give me what I need:

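    # Python 2: encode U+0144 (LATIN SMALL LETTER N WITH ACUTE) as UTF-8
    line = u'\u0144'
    line_utf = line.encode('utf8')

    print 'line = ', line_utf              # raw bytes, rendered by the terminal
    print 'line repr = ', repr(line_utf)   # escaped byte values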
Output:
line = ń
line repr = '\xc5\x84'

It's the '\xc5\x84' part that I needed.
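On Python 3 the same inspection looks slightly different: `str` is Unicode text, `bytes` is the byte-string type, `repr()` shows `b'\xc5\x84'`, and indexing a `bytes` object returns an integer, so the lead byte can be tested directly with bit masks. A short sketch, reusing the variable names from the post above (the bit arithmetic is the standard two-byte UTF-8 layout `110xxxxx 10yyyyyy`):

```python
# Python 3: str is Unicode text, bytes is the byte-string type
line = '\u0144'
line_utf = line.encode('utf-8')

print(repr(line_utf))                # b'\xc5\x84'
print([hex(b) for b in line_utf])    # ['0xc5', '0x84']

# Rebuild the code point by hand from the two-byte sequence 110xxxxx 10yyyyyy
lead, cont = line_utf[0], line_utf[1]
cp = ((lead & 0x1F) << 6) | (cont & 0x3F)
print(hex(cp), cp == ord(line))      # 0x144 True
```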
Jul 31 '08 #2
