
str.title question after '

I have a text in ASCII. I use ' for an apostrophe. The problem is
that this causes trouble with the title method. I don't want letters
after a ' to be uppercased. Here are some examples:

argument        result          expected

't smidje       'T Smidje       't Smidje
na'ama          Na'Ama          Na'ama
al pi tnu'at    Al Pi Tnu'At    Al Pi Tnu'at
Is there an easy way to get what I want?

Should the current behaviour be considered a bug?
I would be inclined to answer yes, but that may be
because this behaviour would be wrong in Dutch. I'm
not so sure about English.

--
Antoon Pardon
Nov 13 '06 #1
3 Replies


Antoon Pardon wrote:
> I have a text in ASCII. I use ' for an apostrophe. The problem is
> that this causes trouble with the title method. I don't want letters
> after a ' to be uppercased. Here are some examples:
>
> argument        result          expected
>
> 't smidje       'T Smidje       't Smidje
> na'ama          Na'Ama          Na'ama
> al pi tnu'at    Al Pi Tnu'At    Al Pi Tnu'at
> Is there an easy way to get what I want?

Depends on your definition of "easy". Writing your own function that
will regard the apostrophe as a letter would be "easy" in my book.
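
A minimal sketch of such a function, assuming Python 3 and assuming that a "word" is any run of word characters and apostrophes. The name title_keep_apostrophe is made up for illustration; it is not part of the standard library.

```python
import re

# Assumption: a "word" is a run of word characters and apostrophes.
_WORD = re.compile(r"[\w']+")

def title_keep_apostrophe(s):
    # capitalize() cases the first character and lowercases the rest.
    # A leading apostrophe has no case, so "'t" stays "'t".
    return _WORD.sub(lambda m: m.group(0).capitalize(), s)

for text in ["'t smidje", "na'ama", "al pi tnu'at", "didn't"]:
    print(repr(text), "->", repr(title_keep_apostrophe(text)))
```

The trade-off is that this sketch lowercases everything after the first character of each word, so an input such as "O'Brien" would come out as "O'brien".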

> Should the current behaviour be considered a bug?

Its limitations could use some documentation.

> I would be inclined to answer yes, but that may be
> because this behaviour would be wrong in Dutch. I'm
> not so sure about English.

It's not very appropriate for English, either:

| >>> "didn't".title()
| "Didn'T"

It's OK for the English way of writing Irish surnames e.g. O'Brien, but
not IMHO very good behaviour for anything else.

The docs say: "Return a titlecased version of the string: words start
with uppercase characters, all remaining cased characters are
lowercase." Evidently the definition of "word" is the culprit.
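
To make that definition of "word" concrete, here is a rough emulation of the rule title() appears to follow: a character is titlecased when the character before it is not a cased letter, and lowercased otherwise. This is a sketch, assuming Python 3; title_like is a hypothetical name, and the cased-letter test below is an approximation that is only claimed to match for ASCII input like the examples in this thread.

```python
def title_like(s):
    out = []
    prev_cased = False  # was the previous character a cased letter?
    for ch in s:
        # Inside a run of letters: lowercase. At the start of a run: titlecase.
        out.append(ch.lower() if prev_cased else ch.title())
        # Approximation: a character is "cased" if lower() and upper() differ.
        prev_cased = ch.lower() != ch.upper()
    return "".join(out)

for text in ["didn't", "'t smidje", "na'ama", "bar.bar.spam"]:
    print(repr(text), "->", repr(title_like(text)), repr(text.title()))
```

Under this rule the apostrophe is simply "not a letter", so the letter after it starts a new word.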

Doing titlecasing properly depends heavily on the language/locale and
what data you are working on. For example, in the UK and anywhere that
Scots have migrated in reasonable numbers, you would probably want to
do McDonald and MacDonald. Avoiding nonsenses like MacE and MacHin :-)
takes some effort and a look-up table, and may not be cost-effective.
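
A minimal sketch of the look-up table idea, assuming Python 3. The table contents and the names NAME_EXCEPTIONS and title_name are invented for illustration; a real table would have to be built from the data at hand.

```python
# Hypothetical exception table: lowercase spelling -> preferred spelling.
NAME_EXCEPTIONS = {
    "mcdonald": "McDonald",
    "macdonald": "MacDonald",
}

def title_name(name):
    words = []
    for word in name.split():
        # Use the table entry if there is one, otherwise plain capitalisation.
        words.append(NAME_EXCEPTIONS.get(word.lower(), word.capitalize()))
    # Note: split() and join() collapse runs of whitespace to one space.
    return " ".join(words)

for name in ["ronald macdonald", "mace", "machin"]:
    print(repr(name), "->", repr(title_name(name)))
```

Because only listed names get the special treatment, "mace" and "machin" stay "Mace" and "Machin" rather than becoming "MacE" and "MacHin". The cost is maintaining the table.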

A related problem: some people mistakenly try too hard to correct
perceived data entry errors and also produce nonsenses -- a colleague
of Dutch extraction occasionally received mail addressed to Mr O'Belt
:-)

Cheers,
John

Nov 13 '06 #2


Antoon Pardon wrote:
> I have a text in ASCII. I use ' for an apostrophe. The problem is
> that this causes trouble with the title method. I don't want letters
> after a ' to be uppercased. Here are some examples:
>
> argument        result          expected
>
> 't smidje       'T Smidje       't Smidje
> na'ama          Na'Ama          Na'ama
> al pi tnu'at    Al Pi Tnu'At    Al Pi Tnu'at
> Is there an easy way to get what I want?

import re

def title_words(s):
    # Split on whitespace, keeping the separators so spacing is preserved.
    words = re.split(r'(\s+)', s)
    return ''.join(word[0:1].upper() + word[1:] for word in words)

> Should the current behaviour be considered a bug?

I believe it follows definition of \w from re module.
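
A quick check suggests the match with \w is only partial. Both treat the apostrophe as a separator, but \w also accepts digits and the underscore, while title() starts a new word after them. The sketch below assumes Python 3; w_runs is a hypothetical helper name.

```python
import re

def w_runs(s):
    # "Words" according to the re module's \w class.
    return re.findall(r"\w+", s)

for text in ["na'ama", "foo_bar", "a1b"]:
    print(repr(text), "->", w_runs(text), repr(text.title()))
```

For "foo_bar" and "a1b", \w sees a single word each, yet title() capitalises the letter after the underscore and after the digit.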

> I would be inclined to answer yes, but that may be
> because this behaviour would be wrong in Dutch. I'm
> not so sure about English.

The problem is more complicated. First of all, why should title() be
limited to human languages? What about programming languages? Is
"bar.bar.spam" three tokens or one in a foo programming language? There
are some problems with human languages too: how are you going to
process "out-of-the-box" and "italian-american"?

-- Leo

Nov 13 '06 #3

Leo Kislov wrote:
> > Is there an easy way to get what I want?
>
> def title_words(s):
>     words = re.split('(\s+)', s)
>     return ''.join(word[0:1].upper()+word[1:] for word in words)

nit: to work well also for Unicode strings using arbitrary alphabets,
you should use title() instead of upper(). a naive upper() will do the
wrong thing in some cases, as can be seen in the following example:
>>> import unicodedata
>>> u = u"\u01C9"
>>> unicodedata.name(u)
'LATIN SMALL LETTER LJ'
>>> unicodedata.name(u.upper())
'LATIN CAPITAL LETTER LJ'
>>> unicodedata.name(u.title())
'LATIN CAPITAL LETTER L WITH SMALL LETTER J'
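
Applied to the function quoted above, the suggestion might look like the following sketch. It assumes Python 3, where every str is Unicode; the only change from the quoted code is title() in place of upper() on the first character.

```python
import re
import unicodedata

def title_words(s):
    # Split on whitespace, keeping the separators so spacing is preserved.
    words = re.split(r"(\s+)", s)
    # title() picks the titlecase form of the first character, which can
    # differ from its uppercase form for digraphs such as U+01C9.
    return "".join(word[0:1].title() + word[1:] for word in words)

result = title_words("\u01c9ubav 't smidje")
print(unicodedata.name(result[0]))
```

For plain ASCII input the behaviour should be the same as with upper().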

</F>

Nov 13 '06 #4
