newby question: Splitting a string - separator

Hi all,

I have a text file which contains a single string of names.
I want to split this string into its records and put them into a list.
In "normal" cases I would do something like:
#!/usr/bin/python
inp = open("file")
data = inp.read()
names = data.split()
inp.close()


The problem is that the names contain spaces, and the records are also
just separated by spaces. The only thing I can rely on is that the
record separator is always more than a single whitespace character.

I thought of defining the separator for split() by using
a regex for "more than one whitespace". The regex for whitespace is \s, but
what would I use for "more than one"? \s+?

TIA,
Tom
Dec 8 '05 #1
13 Replies


\s+ matches one or more; you need \s{2,} for two or more:
>>> import re
>>> re.split(r"\s{2,}", "Guido van Rossum  Tim Peters  Thomas Liesner")
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
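
As a minimal sketch, the same pattern could be dropped into the original file-reading code. The helper name split_records and the sample text are assumptions for illustration. Stripping the text first is also an assumption, made so that a single trailing newline (which \s{2,} alone would not match) does not stay attached to the last name.

```python
import re

def split_records(text):
    """Split text into records separated by two or more whitespace characters."""
    # strip() removes leading/trailing whitespace, so a lone trailing
    # newline does not stay attached to the last record.
    text = text.strip()
    if not text:
        return []
    return re.split(r"\s{2,}", text)

# Sample data standing in for the contents of the file (hypothetical).
sample = "Guido van Rossum  Tim Peters   Thomas Liesner\n"
names = split_records(sample)
print(names)
```

With a real file, the string returned by inp.read() would be passed to split_records in place of sample.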


Michael

Dec 8 '05 #2

Thomas Liesner wrote:
...
The only thing I can rely on is that the
record separator is always more than a single whitespace character.

I thought of defining the separator for split() by using
a regex for "more than one whitespace". The regex for whitespace is \s, but
what would I use for "more than one"? \s+?


For your split regex you could say
"\s\s+"
or
"\s{2,}"

This should work for you:
YOUR_SPLIT_LIST = re.split(r"\s{2,}", YOUR_STRING)
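
One quick way to check that the two patterns behave alike, and that plain \s+ does not, is to run all three on the same input. The sample string and variable names below are assumptions for illustration.

```python
import re

data = "Guido van Rossum  Tim Peters \t Thomas Liesner"

two_plus_a = re.split(r"\s\s+", data)    # one whitespace char, then one or more
two_plus_b = re.split(r"\s{2,}", data)   # two or more whitespace chars
one_plus = re.split(r"\s+", data)        # one or more: also splits inside names

print(two_plus_a)
print(two_plus_b)
print(one_plus)
```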

Yours,
Noah

Dec 8 '05 #3

Hi Tom,
a regex for "more than one whitespace". RegEx for whitespace is \s, but
what would i use for "more than one"? \s+?


For more than one, I'd use

\s\s+

-Jim

Dec 8 '05 #4

The one I like best goes like this:

py> data = "Guido van Rossum Tim Peters Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

James
Dec 10 '05 #5

James Stroud wrote:
The one I like best goes like this:

py> data = "Guido van Rossum Tim Peters Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.


Unfortunately it gives the wrong result.
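
To see why, note that str.split() with no argument treats any run of whitespace, including a single space, as a separator, so the spaces inside each name are split as well. A small sketch, with a sample string assumed for illustration:

```python
data = "Guido van Rossum  Tim Peters  Thomas Liesner"

# No-argument split: every run of whitespace is a separator.
words = [n for n in data.split() if n]

# What the question asks for: three records, names left intact.
wanted = ["Guido van Rossum", "Tim Peters", "Thomas Liesner"]

print(words)
print(words == wanted)
```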

Kent
Dec 10 '05 #6

[James Stroud]
The one I like best goes like this:

py> data = "Guido van Rossum Tim Peters Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

[Kent Johnson] Unfortunately it gives the wrong result.


Still, it gets extra points for being such a pleasing example ;-)
Dec 10 '05 #7

Can I just use two spaces as the separator?

[ x.strip() for x in data.split("  ") ]

Dec 10 '05 #8

Kent Johnson wrote:
James Stroud wrote:
The one I like best goes like this:

py> data = "Guido van Rossum Tim Peters Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using
regexes.

Unfortunately it gives the wrong result.

Kent


Just an example. Here is the "correct version":
names = [n for n in data.split(" ") if n]

James
Dec 10 '05 #9

bo****@gmail.com wrote:
Can I just use two spaces as the separator?

[ x.strip() for x in data.split("  ") ]

If you like, but it will create dummy entries if there are more than two spaces:
>>> data = "Guido van Rossum  Tim Peters    Thomas Liesner"
>>> [ x.strip() for x in data.split("  ") ]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']

You could add a condition to the listcomp:
>>> [name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

But what if there is some other whitespace character?
>>> data = "Guido van Rossum  Tim Peters  \t  Thomas Liesner"
>>> [name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']

Perhaps a smarter condition?
>>> [name.strip() for name in data.split("  ") if name.strip(" \t")]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

But this is beginning to feel like hard work.
I think this is a case where it's not worth the effort to try to avoid the regexp:
>>> import re
>>> re.split(r"\s{2,}", data)
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']


Michael
Dec 10 '05 #10

On Fri, 09 Dec 2005 18:02:02 -0800, James Stroud wrote:
The one I like best goes like this:

py> data = "Guido van Rossum Tim Peters Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

Yes, but the correct result would be:

['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

Your code is short, elegant but wrong.

It could also be shorter and more elegant:

# your version
py> data = "Guido van Rossum Tim Peters Thomas Liesner"
py> [n for n in data.split() if n]
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

# my version
py> data = "Guido van Rossum Tim Peters Thomas Liesner"
py> data.split()
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

The "if n" in the list comp is superfluous, and without that, the whole
list comp is unnecessary.
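
The filter is redundant because str.split() with no argument never returns empty strings, whereas split with an explicit separator can. A short sketch, with a sample string assumed for illustration:

```python
data = "  Guido van Rossum  Tim Peters  "

no_arg = data.split()        # runs of whitespace collapse; no empty strings
explicit = data.split(" ")   # every single space is a separator

print(no_arg)
print(explicit)

# Filtering the no-argument result changes nothing.
print([n for n in no_arg if n] == no_arg)
```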

--
Steven.

Dec 10 '05 #11

Steven D'Aprano wrote:
The "if n" in the list comp is superfluous, and without that, the whole
list comp is unnecessary.

See my post from one hour before this one.
Dec 10 '05 #12

James Stroud <js*****@mbi.ucla.edu> wrote:

The one I like best goes like this:

py> data = "Guido van Rossum Tim Peters Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.


But it is slower than this, which produces EXACTLY the same (incorrect)
result:

data = "Guido van Rossum Tim Peters Thomas Liesner"
names = data.split()
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Dec 10 '05 #13

James Stroud wrote:
py> data = "Guido van Rossum Tim Peters Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using
regexes.


Unfortunately it gives the wrong result.


Just an example. Here is the "correct version":

names = [n for n in data.split(" ") if n]


where "correct" is "still wrong", and "theoretically faster" means "slightly
slower" (at least if you fix your version, and precompile the pattern).
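
For reference, precompiling the pattern might look like the sketch below. The names RECORD_SEP and split_names are hypothetical, and the timing calls are only a rough way to compare the approaches on your own data; no particular result is claimed here.

```python
import re
import timeit

# Compile once, reuse for every call.
RECORD_SEP = re.compile(r"\s{2,}")

def split_names(text):
    """Split on runs of two or more whitespace characters."""
    return RECORD_SEP.split(text.strip())

data = "Guido van Rossum  Tim Peters    Thomas Liesner"
print(split_names(data))

# Rough timing comparison; numbers vary by machine and Python version.
regex_time = timeit.timeit(lambda: split_names(data), number=10000)
listcomp_time = timeit.timeit(
    lambda: [n.strip() for n in data.split("  ") if n.strip()], number=10000
)
print(regex_time, listcomp_time)
```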

</F>

Dec 10 '05 #14
