unicode and socket

Hello all,
I am new to Python, and I have a problem with unicode.
I have a unicode string. When I tried to send it through a
socket with send(), I got an exception. How can I send the unicode string
to the remote end of the socket as it is, without any encoding
conversion, so that the remote end receives a unicode string?

Thanks

Jul 18 '05 #1
You cannot. Unicode is an abstract data type. It must be encoded into
octets in order to be sent via a socket, and the other end must decode the
octets to retrieve the unicode string. Needless to say, the encoding scheme
must be consistent and understood by both ends.
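
A minimal sketch of that round trip, written for modern Python 3 (where `str` is the Unicode type and `bytes` holds the octets) rather than the Python 2 of this thread. Using `socket.socketpair()` to stand in for the two remote ends, and choosing UTF-8, are assumptions made purely for illustration.

```python
import socket

# Assumption: two connected sockets in one process stand in for the two ends.
sender, receiver = socket.socketpair()

text = "eggs and ham \u00e9\u00e8"   # a Unicode string (abstract characters)
payload = text.encode("utf-8")       # encode into octets before sending
sender.sendall(payload)
sender.close()                       # closing tells the reader the stream ended

# Read until the peer closes, then decode with the SAME encoding.
chunks = []
while True:
    chunk = receiver.recv(4096)
    if not chunk:
        break
    chunks.append(chunk)
receiver.close()

received_text = b"".join(chunks).decode("utf-8")
print(received_text == text)
```

Only the octets cross the socket; the receiver gets a Unicode string back solely because it decodes with the encoding the sender used.
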
On 18 Feb 2005 11:03:46 -0800, <zy*****@163.net> wrote:
> Hello all,
> I am new to Python, and I have a problem with unicode.
> I have a unicode string. When I tried to send it through a
> socket with send(), I got an exception. How can I send the unicode string
> to the remote end of the socket as it is, without any encoding
> conversion, so that the remote end receives a unicode string?
>
> Thanks


Jul 18 '05 #2
aurora wrote:
> You cannot. Unicode is an abstract data type. It must be encoded
> into octets in order to be sent via a socket, and the other end must decode
> the octets to retrieve the unicode string. Needless to say, the encoding
> scheme must be consistent and understood by both ends.


So use pickle.

--Irmen
Jul 18 '05 #3
Irmen de Jong wrote:
> aurora wrote:
>> You cannot. Unicode is an abstract data type. It must be encoded
>> into octets in order to be sent via a socket, and the other end must
>> decode the octets to retrieve the unicode string. Needless to say, the
>> encoding scheme must be consistent and understood by both ends.
>
> So use pickle.
>
> --Irmen


Well, on second thought: don't use pickle.
If all you want to transfer is unicode strings (or normal strings)
it's safer to just encode them to, say, UTF-8, transfer
that octet stream across, and on the other side, decode the
UTF-8 octets back into a unicode string.
--Irmen
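
A small sketch comparing the two approaches, in modern Python 3 and offered as an illustration rather than anything posted in the thread. Both round-trip the string, but a plain text encoding puts only the encoded characters on the wire, whereas `pickle.loads()` can execute arbitrary code when handed untrusted bytes.

```python
import pickle

text = "eggs and ham"

# Plain text encoding: the bytes on the wire are just the encoded characters,
# and decoding them can only ever produce a string (or a UnicodeDecodeError).
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text

# Pickle also round-trips, but it adds protocol framing, and unpickling
# untrusted input is unsafe. Only unpickle data from a peer you trust.
pickled_bytes = pickle.dumps(text)
assert pickle.loads(pickled_bytes) == text

print(len(utf8_bytes), len(pickled_bytes))
```
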
Jul 18 '05 #4
You probably want to use UTF-16 or UTF-8 on both sides of the socket.

See http://www.python.org/moin/Unicode for more information.

So, we have a Unicode string...
>>> mystring = u'eggs and ham'
>>> mystring
u'eggs and ham'

Now, we want to send it over:
>>> to_send = mystring.encode('utf-8')
>>> to_send
'eggs and ham'

It's encoded in UTF-8 now.

On the other side (where received == to_send), we decode:
>>> result = received.decode('utf-8')
>>> result
u'eggs and ham'

You have transferred a unicode string. {:)}=

Jul 18 '05 #5
It's really funny: I cannot send a unicode stream through a socket with
Python, while other languages such as Perl, C and Java can do it.
Then how about converting the unicode string to a binary stream? Is it
possible to send binary data through a socket with Python?

Jul 18 '05 #6
zy*****@163.net wrote:
> It's really funny: I cannot send a unicode stream through a socket with
> Python, while other languages such as Perl, C and Java can do it.

You may really start laughing loudly <wink> after you find out that you
can send arbitrary Python objects over sockets. If you want a
language-specific way of sending objects, see Irmen's first answer: use pickle.

> Then how about converting the unicode string to a binary stream?

Sure, there are already three answers in this thread that suggest you
do that. Use the encode method of unicode strings.

> Is it possible to send binary data through a socket with Python?

Sure. If it weren't possible to send bytes through sockets with Python,
what else do you think could be sent? Perhaps you're confused by the fact
that bytes are stored in byte strings in Python, which are often just called
strings in documentation and conversations? It will be fixed in Python 3.0,
but these days you have to store bytes in the str type.
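
For comparison, a short sketch of how this looks in Python 3, where text (`str`) and octets (`bytes`) are separate types. This assumes a current Python 3 interpreter and uses `socket.socketpair()` only as a stand-in for a real connection.

```python
import socket

# Assumption: a local socket pair replaces a real client/server connection.
a, b = socket.socketpair()

text = "eggs and ham"
try:
    a.send(text)                # str is Unicode text in Python 3, not bytes
    rejected = False
except TypeError:
    rejected = True             # the socket refuses un-encoded text

a.send(text.encode("utf-8"))    # bytes are what a socket actually carries
data = b.recv(1024)

a.close()
b.close()
print(rejected, type(data).__name__)
```
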

Serge.


Jul 18 '05 #7
On 18 Feb 2005 19:10:36 -0800, <zy*****@163.net> wrote:
> It's really funny: I cannot send a unicode stream through a socket with
> Python, while other languages such as Perl, C and Java can do it.
> Then how about converting the unicode string to a binary stream? Is it
> possible to send binary data through a socket with Python?


I was answering your specific question:

"How can I send the unicode string to the remote end of the socket as it
is without any conversion of encode"

The answer is that you cannot. It's not that you cannot send unicode, but
you have to encode it. The same applies to Perl, C or Java. The only
difference is the detail of how strings get encoded.

A few posts here suggest various means. Or you can check out
codecs.getwriter(), which more closely resembles Java's way.
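
A hedged sketch of the `codecs.getwriter()` approach in Python 3. Wrapping the socket with `makefile("wb")` and using `socket.socketpair()` are choices made for this example, not something specified in the thread.

```python
import codecs
import socket

# Assumption: a local socket pair replaces a real connection.
sender, receiver = socket.socketpair()

# Wrap the socket in a binary file object, then in a UTF-8 StreamWriter,
# so that text written to `writer` is encoded on the way out.
raw_out = sender.makefile("wb")
writer = codecs.getwriter("utf-8")(raw_out)
writer.write("caf\u00e9 and ham\n")
raw_out.close()     # flushes the buffered bytes to the socket
sender.close()      # closes the sending end

# The receiving side still sees plain octets and must decode them.
received = receiver.recv(1024).decode("utf-8")
receiver.close()
print(received == "caf\u00e9 and ham\n")
```

The writer hides the encode step on the sending side, much like a stream writer object in Java, but the encoding is still chosen explicitly when the writer is created.
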
Jul 18 '05 #8
anonymous coward <zy*****@163.net> wrote:
> It's really funny: I cannot send a unicode stream through a socket with
> Python, while other languages such as Perl, C and Java can do it.


Are you sure you understand what Unicode is, and how sockets work?

Sockets are used to transfer byte streams. If you want to transfer
a python-level object, you have to decide how to encode it as a
byte stream. For integers, you have to decide whether to use a single
byte, a string of decimal ascii characters, netstring syntax, etc. For
text, you have to decide what character encoding to use. For arbitrary
objects, you have to decide what serialisation protocol to use. etc.

(and yes, the same applies to all other languages. Java sockets and C
sockets are no different from Python sockets...)
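
To make those choices concrete, here is a small Python 3 sketch of a few ways the same integer and the same text can be turned into bytes. The helper name `netstring` is hypothetical, introduced only for this example; the framing it produces follows the usual netstring layout of decimal length, colon, data, comma.

```python
import struct

def netstring(data: bytes) -> bytes:
    """Hypothetical helper: frame bytes as <decimal length>:<data>,"""
    return str(len(data)).encode("ascii") + b":" + data + b","

number = 42
as_single_byte = struct.pack("B", number)        # one unsigned byte
as_decimal_ascii = str(number).encode("ascii")   # the characters "4" and "2"
as_netstring = netstring(as_decimal_ascii)       # length-prefixed framing

text = "h\u00e9llo"
as_utf8 = text.encode("utf-8")        # 1 or 2 bytes per character here
as_utf16 = text.encode("utf-16-be")   # 2 bytes per character here

print(as_single_byte, as_decimal_ascii, as_netstring)
print(len(as_utf8), len(as_utf16))
```

Each result is a different byte sequence for the same value, which is why both ends have to agree on the convention beforehand.
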

</F>

Jul 18 '05 #9
On 18 Feb 2005 19:10:36 -0800, rumours say that zy*****@163.net might have
written:
> It's really funny: I cannot send a unicode stream through a socket with
> Python, while other languages such as Perl, C and Java can do it.


I don't know about Perl. What I think you mean by unicode in C is most
probably wchar_t, which is Unicode encoded as 'ucs-2' or 'utf-16' (little or
big endian, depending on your platform), or maybe a 4-byte int, for which I
don't know a Python equivalent. And I /assume/ that in Java, Unicode is
equivalent to 'utf-16' encoded strings on input/output.

Perhaps Unicode encoded as 'utf-16' is what you're after. However, Unicode
encoded as 'utf-8' (like others also suggested) might be what you /should/ be
using, given that this encoding has some attractive properties (no null bytes,
no spurious control characters etc).
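
A quick Python 3 illustration of the null-byte point, using ASCII-range text as the sample (an assumption of this example; a string that itself contains U+0000 would of course produce a zero byte in UTF-8 too).

```python
text = "eggs and ham"

utf8 = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")   # explicit byte order, no BOM
utf16_be = text.encode("utf-16-be")

print(b"\x00" in utf8)            # False: no null bytes for this text
print(b"\x00" in utf16_le)        # True: every ASCII character gains a zero byte
print(utf16_le == utf16_be)       # False: byte order changes the octets
print(len(utf8), len(utf16_le))   # 12 24
```
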

Don't interpret the explicitness Python asks for as a weakness.
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #10
