Bytes | Software Development & Data Engineering Community

unicode and socket

hello all,
I am new to Python, and I have a problem with Unicode.
I have a unicode string, and when I tried to send it out through a
socket with send(), I got an exception. How can I send the unicode string
to the remote end of the socket as it is, without any encoding conversion,
so that the remote end of the socket receives a unicode string?

Thanks

Jul 18 '05 #1
You can't. Unicode is an abstract data type. It must be encoded into
octets in order to be sent via a socket, and the other end must decode the
octets to retrieve the unicode string. Needless to say, the encoding scheme
must be consistent and understood by both ends.
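A minimal sketch of that encode/send/decode round trip, written for Python 3 (an assumption: the thread predates it, but in Python 3 `str` is Unicode text and `bytes` are the octets a socket actually carries). `socket.socketpair()` merely stands in for a real network connection, and UTF-8 is an arbitrary but common choice of encoding that both ends agree on.

```python
import socket

# A connected pair of sockets stands in for the two ends of a real
# network connection (available on Unix, and on Windows since Python 3.5).
sender, receiver = socket.socketpair()

text = "eggs and ham \u00e9"    # a Unicode (str) object

# Sending end: encode the abstract Unicode text into octets.
payload = text.encode("utf-8")
sender.sendall(payload)
sender.close()                   # the peer sees end-of-stream after the data

# Receiving end: collect all octets first, then decode with the SAME codec,
# so a multi-byte character is never split across two recv() calls.
chunks = []
while True:
    chunk = receiver.recv(4096)
    if not chunk:
        break
    chunks.append(chunk)
receiver.close()

received = b"".join(chunks).decode("utf-8")
print(received == text)          # True
```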
In reply to <zy*****@163.net> (#1), posted 18 Feb 2005 11:03:46 -0800.

Jul 18 '05 #2
In reply to aurora (#2):

So use pickle.

--Irmen
Jul 18 '05 #3
In reply to Irmen de Jong (#3):

Well, on second thought: don't use pickle.
If all you want to transfer is unicode strings (or normal strings),
it's safer to just encode them to, say, UTF-8, transfer
that octet stream across, and on the other side, decode the
UTF-8 octets back into a unicode string.
--Irmen
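To illustrate why both ends must agree on UTF-8, here is a small Python 3 sketch (the sample word is arbitrary): a non-ASCII character takes more than one octet, and decoding with a different codec silently yields the wrong text instead of raising an error.

```python
# 'é' is one character but two UTF-8 octets.
text = "caf\u00e9"                   # 'café'
octets = text.encode("utf-8")
print(len(text), len(octets))        # 4 5

# Decoding with the codec that was used for encoding restores the text.
assert octets.decode("utf-8") == text

# Decoding with a different codec does not fail here, but produces
# the wrong characters (mojibake).
wrong = octets.decode("latin-1")
print(ascii(wrong))                  # 'caf\xc3\xa9'
```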
Jul 18 '05 #4
You probably want to use UTF-16 or UTF-8 on both sides of the socket.

See http://www.python.org/moin/Unicode for more information.

So, we have a Unicode string...
>>> mystring = u'eggs and ham'
>>> mystring
u'eggs and ham'

Now, we want to send it over:
>>> to_send = mystring.encode('utf-8')
>>> to_send
'eggs and ham'

It's encoded in UTF-8 now.

On the other side (where, for this demonstration, received = to_send), we decode:
>>> received = to_send
>>> result = received.decode('utf-8')
>>> result
u'eggs and ham'

You have transferred a unicode string. {:)}=

Jul 18 '05 #5
It's really funny: I cannot send a unicode stream through a socket with
python, while all the other languages such as perl, c and java can do it.
Then how about converting the unicode string to a binary stream? Is it
possible to send binary data through a socket with python?

Jul 18 '05 #6
zy*****@163.net wrote:
It's really funny: I cannot send a unicode stream through a socket with
python, while all the other languages such as perl, c and java can do it.

You may really start laughing loudly <wink> once you find out that you
can send arbitrary python objects over sockets. If you want a language-specific
way of sending objects, see Irmen's first answer: use pickle.
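A hedged Python 3 sketch of the pickle route: `pickle.dumps()` turns an object into bytes suitable for `sendall()`, and `pickle.loads()` rebuilds it. This only works if the other end is also Python, and the Python documentation warns that unpickling untrusted data can execute arbitrary code, which fits Irmen's remark that plain UTF-8 encoding is safer. The dictionary below is a made-up example.

```python
import pickle

# Any picklable Python object, including Unicode text inside containers.
obj = {"greeting": "eggs and ham \u00e9", "count": 3}

# dumps() serialises the object into bytes that could be sent over a socket;
# loads() rebuilds an equal object on the receiving side.
data = pickle.dumps(obj)
print(type(data))          # <class 'bytes'>

restored = pickle.loads(data)
print(restored == obj)     # True

# Caveat: loads() can run arbitrary code while unpickling, so only
# unpickle data that comes from a peer you trust.
```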
Then how about converting the unicode string to a binary stream?

Sure, there are already three answers in this thread that suggest you
do that. Use the encode method of unicode strings.
Is it possible to send binary data through a socket with python?

Sure. If it weren't possible to send bytes through sockets with Python,
what else do you think could be sent? Perhaps you're confused by the fact that
bytes are stored in byte strings in Python, which are often called strings in
documentation and conversations? It will be fixed in Python 3.0, but
these days you have to store bytes in the str type.
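Python 3.0 did make that split: `bytes` holds octets and `str` holds Unicode text. A small sketch, assuming Python 3 and using `socket.socketpair()` only so it runs in one process, shows that a socket sends raw bytes unchanged but rejects a `str` with `TypeError` until it is explicitly encoded.

```python
import socket

a, b = socket.socketpair()
text = "eggs and ham"

# Sockets accept only bytes-like objects; a str raises TypeError
# instead of having an encoding guessed for it.
try:
    a.sendall(text)
    error = None
except TypeError as exc:
    error = exc
print(error is not None)               # True

a.sendall(b"\x00\x01\xff")             # raw binary data goes through as-is
a.sendall(text.encode("utf-8"))        # text goes through after encode()
a.close()

received = b""
while True:
    chunk = b.recv(4096)
    if not chunk:
        break
    received += chunk
b.close()
print(received)                        # b'\x00\x01\xffeggs and ham'
```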

Serge.


Jul 18 '05 #7
On 18 Feb 2005 19:10:36 -0800, <zy*****@163.net> wrote:
It's really funny: I cannot send a unicode stream through a socket with
python, while all the other languages such as perl, c and java can do it.
Then how about converting the unicode string to a binary stream? Is it
possible to send binary data through a socket with python?


I was answering your specific question:

"How can I send the unicode string to the remote end of the socket as it
is, without any encoding conversion"

The answer is that you can't. It's not that you cannot send unicode, but you
have to encode it. The same applies to perl, c or Java. The only difference is
the detail of how strings get encoded.

There are a few posts suggesting various means. Or you can check out
codecs.getwriter(), which more closely resembles Java's way.
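A minimal Python 3 sketch of the `codecs.getwriter()` approach, in which a stream wrapper does the encoding (roughly like wrapping a Java OutputStream in an OutputStreamWriter). `codecs.getwriter()`, `codecs.getreader()` and `socket.makefile()` are standard library calls; the socket pair and the newline-terminated message are assumptions made for the example.

```python
import codecs
import socket

a, b = socket.socketpair()

# The writer encodes str to UTF-8 bytes; the reader decodes bytes to str.
Writer = codecs.getwriter("utf-8")
Reader = codecs.getreader("utf-8")

out = Writer(a.makefile("wb"))
out.write("eggs and ham \u00e9\n")
out.close()        # delegated to the underlying file: flushes, then closes it
a.close()          # now the peer sees end-of-stream

inp = Reader(b.makefile("rb"))
line = inp.readline()
inp.close()
b.close()
print(line == "eggs and ham \u00e9\n")   # True
```

In current Python, `sock.makefile("r", encoding="utf-8")` (or `"w"` on the sending side) gives the same effect through `io` text streams.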
Jul 18 '05 #8
anonymous coward <zy*****@163.net> wrote:
It's really funny: I cannot send a unicode stream through a socket with
python, while all the other languages such as perl, c and java can do it.


Are you sure you understand what Unicode is, and how sockets work?

Sockets are used to transfer byte streams. If you want to transfer
a python-level object, you have to decide how to encode it as a
byte stream. For integers, you have to decide whether to use a single
byte, a string of decimal ascii characters, netstring syntax, etc. For
text, you have to decide what character encoding to use. For arbitrary
objects, you have to decide what serialisation protocol to use. etc.

(and yes, the same applies to all other languages. Java sockets and C
sockets are no different from Python sockets...)
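As one concrete instance of the netstring syntax mentioned above (D. J. Bernstein's format `<decimal length>:<bytes>,`), here is a simplified Python 3 sketch that frames UTF-8 text so the receiver knows where each message ends. The function names are hypothetical, and the parser omits checks a production version would want, such as rejecting leading zeros or capping the length.

```python
from typing import Tuple


def netstring_encode(payload: bytes) -> bytes:
    """Frame payload as b'<decimal length>:<payload>,'."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def netstring_decode(buf: bytes) -> Tuple[bytes, bytes]:
    """Parse one netstring from the front of buf; return (payload, rest).

    Simplified: raises ValueError on malformed or incomplete input.
    """
    length_part, sep, rest = buf.partition(b":")
    if not sep or not length_part.isdigit():
        raise ValueError("missing or invalid length prefix")
    n = int(length_part)
    if len(rest) < n + 1 or rest[n:n + 1] != b",":
        raise ValueError("truncated or unterminated netstring")
    return rest[:n], rest[n + 1:]


# The length counts encoded bytes, not characters.
text = "eggs and ham \u00e9"
frame = netstring_encode(text.encode("utf-8"))
print(frame)                                   # b'15:eggs and ham \xc3\xa9,'

stream = frame + netstring_encode(b"42")       # two messages back to back
first, rest = netstring_decode(stream)
second, rest = netstring_decode(rest)
print(first.decode("utf-8") == text, second, rest)   # True b'42' b''
```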

</F>

Jul 18 '05 #9
On 18 Feb 2005 19:10:36 -0800, rumours say that zy*****@163.net might have
written:
It's really funny: I cannot send a unicode stream through a socket with
python, while all the other languages such as perl, c and java can do it.


I don't know about perl. What I think you mean by unicode in C is most probably
wchar_t, which holds Unicode encoded as 'ucs-2' or 'utf-16' (little or big
endian, depending on your platform) or maybe as a 4-byte int, for which I don't
know a Python equivalent. And I /assume/ that in Java, Unicode is equivalent to
'utf-16' encoded strings for input/output.

Perhaps Unicode encoded as 'utf-16' is what you're after. However, Unicode
encoded as 'utf-8' (like others also suggested) might be what you /should/ be
using, given that this encoding has some attractive properties (no null bytes,
no spurious control characters etc).
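A quick Python 3 comparison of the two encodings (the sample text is arbitrary): UTF-16 puts a zero byte next to every ASCII character and depends on byte order, which the `utf-16` codec signals with a byte-order mark, whereas UTF-8 encodes ASCII text byte-for-byte and only produces a zero byte for U+0000 itself.

```python
text = "eggs"

utf8 = text.encode("utf-8")
utf16le = text.encode("utf-16-le")    # explicit little-endian, no BOM
utf16 = text.encode("utf-16")         # native byte order, with a BOM

print(utf8)                                   # b'eggs'
print(utf16le)                                # b'e\x00g\x00g\x00s\x00'
print(b"\x00" in utf8, b"\x00" in utf16le)    # False True

# The BOM lets the receiver tell little-endian from big-endian.
print(utf16[:2] in (b"\xff\xfe", b"\xfe\xff"))   # True
print(utf16.decode("utf-16") == text)            # True
```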

Don't interpret as weakness the explicitness requested from Python.
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #10