''.join() with encoded strings

I'd love to know why calling ''.join() on a list of encoded strings
automatically results in converting to the default encoding. First of
all, it's undocumented, so if I didn't have non-ASCII characters in my
utf-8 data, I'd never have known until one day I did, and then the code
would break. Secondly you can't override (for valid reasons) the
default encoding, so that's not a way around it. So ''.join becomes
pretty useless when dealing with the real (non-ascii) world.

I won't miss the str class when it finally goes (in v3?).

How can I join my encoded strings efficiently?

Thanks,
-Sandra

Feb 27 '06 #1
Sandra-24 wrote:
> I'd love to know why calling ''.join() on a list of encoded strings
> automatically results in converting to the default encoding. First of
> all, it's undocumented, so if I didn't have non-ASCII characters in my
> utf-8 data, I'd never have known until one day I did, and then the code
> would break. Secondly you can't override (for valid reasons) the
> default encoding, so that's not a way around it. So ''.join becomes
> pretty useless when dealing with the real (non-ascii) world.
>
> I won't miss the str class when it finally goes (in v3?).
>
> How can I join my encoded strings efficiently?

By not mixing unicode objects with ordinary byte strings. Use

u''.join(some_unicode_objects)

to get a joined unicode object.

Diez
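
A minimal sketch of the advice above, written for Python 3 rather than the Python 2 of this thread: there, bytes plays the role of the old byte string and str the role of the old unicode object. The helper name join_text and the UTF-8 default are assumptions made for illustration; they do not come from the thread.

```python
# Python 3 sketch: bytes ~ Python 2 byte string, str ~ Python 2 unicode object.

def join_text(parts, encoding="utf-8"):
    """Hypothetical helper: decode any bytes items, then join everything as text."""
    decoded = [p.decode(encoding) if isinstance(p, bytes) else p for p in parts]
    return "".join(decoded)

# A list that accidentally mixes encoded bytes and already-decoded text.
parts = ["caf\u00e9".encode("utf-8"), " au ", "lait".encode("utf-8")]
joined = join_text(parts)
print(joined)
```

Decoding once, at the point where the data enters the program, usually removes the need for a helper like this at all.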
Feb 27 '06 #2
"Sandra-24" wrote:
> I'd love to know why calling ''.join() on a list of encoded strings
> automatically results in converting to the default encoding. First of
> all, it's undocumented, so if I didn't have non-ASCII characters in my
> utf-8 data, I'd never have known until one day I did, and then the code
> would break. Secondly you can't override (for valid reasons) the
> default encoding, so that's not a way around it. So ''.join becomes
> pretty useless when dealing with the real (non-ascii) world.

if all strings in a sequence are encoded strings (byte buffers), join does
the right thing.

if all strings in a sequence are Unicode strings, join does the right thing.

if all strings are ascii strings, join does the right thing.

the only way to mess up is to mix byte buffers containing encoded data
with decoded strings. the solution is simple: make sure to *decode* all
data you're using, *before* using it.

</F>
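
The cases listed above can be sketched as follows. This is an illustration in Python 3, not the Python 2 code under discussion, and the sample strings are made up. The difference that matters: Python 3 rejects a mixed join with a TypeError, whereas Python 2 attempted an implicit decode with the default (ASCII) encoding, which only failed once a non-ASCII byte appeared.

```python
# Sample data (assumed for illustration): UTF-8 encoded chunks and their decoded form.
encoded = ["na\u00efve".encode("utf-8"), b" ", "caf\u00e9".encode("utf-8")]
decoded = [chunk.decode("utf-8") for chunk in encoded]

# All byte strings: join with a bytes separator; the result stays encoded.
joined_bytes = b"".join(encoded)

# All text strings: join with a text separator.
joined_text = "".join(decoded)

# Mixing text and bytes is the failure mode described in the post.
try:
    "".join([decoded[0], encoded[2]])
    mixed_ok = True
except TypeError:
    # Python 3 refuses to guess an encoding for the bytes item.
    mixed_ok = False

print(joined_bytes.decode("utf-8") == joined_text, mixed_ok)
```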

Feb 27 '06 #3
Sorry, this was my mistake: I had some unicode strings in the list
without realizing it. I deleted the topic within 10 minutes, but
apparently I wasn't fast enough. You're right, join works the way it
should; I just wasn't aware I had the unicode strings in there.

-Sandra

Feb 27 '06 #4
