newby question: Splitting a string - separator

Hi all,

I have a text file which contains a single string of names.
I want to split this string into its records and put them into a list.
In "normal" cases I would do something like:
#!/usr/bin/python
inp = open("file")
data = inp.read()
names = data.split()
inp.close()


The problem is that the names contain spaces, and the records are also
just separated by spaces. The only thing I can rely on is that the
record separator is always more than a single whitespace character.

I thought of defining the separator for split() by using
a regex for "more than one whitespace". The regex for whitespace is \s, but
what would I use for "more than one"? \s+?

TIA,
Tom
Dec 8 '05 #1
Thomas Liesner wrote:
Hi all,

i am having a textfile which contains a single string with names.
I want to split this string into its records an put them into a list.
In "normal" cases i would do something like:
#!/usr/bin/python
inp = open("file")
data = inp.read()
names = data.split()
inp.close()


The problem is, that the names contain spaces an the records are also
just seprarated by spaces. The only thing i can rely on, ist that the
recordseparator is always more than a single whitespace.

I thought of something like defining the separator for split() by using
a regex for "more than one whitespace". RegEx for whitespace is \s, but
what would i use for "more than one"? \s+?

TIA,
Tom

\s+ gives one or more; you need \s{2,} for two or more:
>>> import re
>>> re.split(r"\s{2,}", "Guido van Rossum  Tim Peters  Thomas Liesner")
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']


Michael

Dec 8 '05 #2
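
For reference, here is a minimal sketch of how the \s{2,} suggestion might slot into the script from the question. The names split_records and read_names are made up for illustration, and the strip() call is an added assumption: a file usually ends with a single newline, which \s{2,} would not match, so without it the last name would keep a trailing "\n".

```python
import re

# Two or more consecutive whitespace characters mark a record boundary.
RECORD_SEPARATOR = re.compile(r"\s{2,}")


def split_records(text):
    """Split text on runs of 2+ whitespace, keeping single spaces inside names."""
    # Strip first so leading/trailing whitespace does not end up in a record.
    stripped = text.strip()
    if not stripped:
        return []
    return RECORD_SEPARATOR.split(stripped)


def read_names(path):
    """Read the whole file at `path` (hypothetical helper) and return its records."""
    with open(path) as inp:
        return split_records(inp.read())


if __name__ == "__main__":
    sample = "Guido van Rossum  Tim Peters   Thomas Liesner\n"
    print(split_records(sample))
```

Using a with block also closes the file automatically, so the explicit inp.close() from the question is no longer needed.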

Thomas Liesner wrote:
...
The only thing i can rely on, ist that the
recordseparator is always more than a single whitespace.

I thought of something like defining the separator for split() by using
a regex for "more than one whitespace". RegEx for whitespace is \s, but
what would i use for "more than one"? \s+?


For your split regex you could use
r"\s\s+"
or
r"\s{2,}"

This should work for you:
import re
YOUR_SPLIT_LIST = re.split(r"\s{2,}", YOUR_STRING)

Yours,
Noah

Dec 8 '05 #3
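
A quick sketch comparing the three patterns mentioned so far, assuming a sample string in which records are separated by two or three spaces (the sample data is illustrative, not taken from the poster's file):

```python
import re

# Records separated by two and three spaces; names contain single spaces.
data = "Guido van Rossum  Tim Peters   Thomas Liesner"

# \s+ matches ONE or more whitespace characters, so it also splits inside names.
one_or_more = re.split(r"\s+", data)

# \s\s+ and \s{2,} both mean "two or more whitespace characters".
two_or_more_a = re.split(r"\s\s+", data)
two_or_more_b = re.split(r"\s{2,}", data)

print(one_or_more)
print(two_or_more_a)
print(two_or_more_b)
```

The last two patterns give the same list here; \s+ alone breaks every name into single words.
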
Jim
Hi Tom,
a regex for "more than one whitespace". RegEx for whitespace is \s, but
what would i use for "more than one"? \s+?


For more than one, I'd use

\s\s+

-Jim

Dec 8 '05 #4
Thomas Liesner wrote:
Hi all,

i am having a textfile which contains a single string with names.
I want to split this string into its records an put them into a list.
In "normal" cases i would do something like:

#!/usr/bin/python
inp = open("file")
data = inp.read()
names = data.split()
inp.close()

The problem is, that the names contain spaces an the records are also
just seprarated by spaces. The only thing i can rely on, ist that the
recordseparator is always more than a single whitespace.

I thought of something like defining the separator for split() by using
a regex for "more than one whitespace". RegEx for whitespace is \s, but
what would i use for "more than one"? \s+?

TIA,
Tom


The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

James
Dec 10 '05 #5
James Stroud wrote:
The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.


Unfortunately it gives the wrong result.

Kent
Dec 10 '05 #6
[James Stroud]
The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

[Kent Johnson] Unfortunately it gives the wrong result.


Still, it gets extra points for being such a pleasing example ;-)
Dec 10 '05 #7

Thomas Liesner wrote:
Hi all,

i am having a textfile which contains a single string with names.
I want to split this string into its records an put them into a list.
In "normal" cases i would do something like:
#!/usr/bin/python
inp = open("file")
data = inp.read()
names = data.split()
inp.close()


The problem is, that the names contain spaces an the records are also
just seprarated by spaces. The only thing i can rely on, ist that the
recordseparator is always more than a single whitespace.

I thought of something like defining the separator for split() by using
a regex for "more than one whitespace". RegEx for whitespace is \s, but
what would i use for "more than one"? \s+?

Can I just use two spaces as the separator?

[ x.strip() for x in data.split("  ") ]

Dec 10 '05 #8
Kent Johnson wrote:
James Stroud wrote:
The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using
regexes.

Unfortunately it gives the wrong result.

Kent


Just an example. Here is the "correct version":
names = [n for n in data.split("  ") if n]

James
Dec 10 '05 #9
bo****@gmail.com wrote:
Thomas Liesner wrote:
Hi all,

i am having a textfile which contains a single string with names.
I want to split this string into its records an put them into a list.
In "normal" cases i would do something like:
#!/usr/bin/python
inp = open("file")
data = inp.read()
names = data.split()
inp.close()

The problem is, that the names contain spaces an the records are also
just seprarated by spaces. The only thing i can rely on, ist that the
recordseparator is always more than a single whitespace.

I thought of something like defining the separator for split() by using
a regex for "more than one whitespace". RegEx for whitespace is \s, but
what would i use for "more than one"? \s+?

Can I just use two spaces as the separator?

[ x.strip() for x in data.split("  ") ]

If you like, but it will create dummy entries if there are more than two spaces:
>>> data = "Guido van Rossum  Tim Peters    Thomas Liesner"
>>> [ x.strip() for x in data.split("  ") ]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']

You could add a condition to the listcomp:
>>> [name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

but what if there is some other whitespace character?
>>> data = "Guido van Rossum  Tim Peters  \t  Thomas Liesner"
>>> [name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']

perhaps a smarter condition?
>>> [name.strip() for name in data.split("  ") if name.strip(" \t")]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

but this is beginning to feel like hard work.
I think this is a case where it's not worth the effort to try to avoid the regexp:
>>> import re
>>> re.split(r"\s{2,}", data)
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']


Michael
Dec 10 '05 #10
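
To make the comparison above easy to rerun, here is a small sketch that feeds a few sample strings to both approaches. The function names and the third sample (a newline between two single spaces) are illustrative additions, not from the original posts; that sample is a case where even the "smarter condition" fails, because the string never contains two consecutive spaces at that point.

```python
import re


def split_on_two_spaces(data):
    # The "smarter condition" list comprehension: split on exactly two
    # spaces, drop pieces that are only spaces/tabs, strip the rest.
    return [name.strip() for name in data.split("  ") if name.strip(" \t")]


def split_on_whitespace_run(data):
    # Split on any run of two or more whitespace characters.
    return re.split(r"\s{2,}", data)


samples = [
    "Guido van Rossum  Tim Peters    Thomas Liesner",    # 2 and 4 spaces
    "Guido van Rossum  Tim Peters  \t  Thomas Liesner",  # tab inside the run
    "Guido van Rossum  Tim Peters \n Thomas Liesner",    # space, newline, space
]

for sample in samples:
    print(repr(sample))
    print("  str.split:", split_on_two_spaces(sample))
    print("  re.split: ", split_on_whitespace_run(sample))
```

For the third sample the str.split variant returns only two records, while the regex still returns three.
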
On Fri, 09 Dec 2005 18:02:02 -0800, James Stroud wrote:
Thomas Liesner wrote:
Hi all,

i am having a textfile which contains a single string with names.
I want to split this string into its records an put them into a list.
In "normal" cases i would do something like:

#!/usr/bin/python
inp = open("file")
data = inp.read()
names = data.split()
inp.close()

The problem is, that the names contain spaces an the records are also
just seprarated by spaces. The only thing i can rely on, ist that the
recordseparator is always more than a single whitespace.

I thought of something like defining the separator for split() by using
a regex for "more than one whitespace". RegEx for whitespace is \s, but
what would i use for "more than one"? \s+?

TIA,
Tom


The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

Yes, but the correct result would be:

['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

Your code is short, elegant but wrong.

It could also be shorter and more elegant:

# your version
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> [n for n in data.split() if n]
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

# my version
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> data.split()
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

The "if n" in the list comp is superfluous, and without that, the whole
list comp is unnecessary.

--
Steven.

Dec 10 '05 #11
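
A short sketch of why the "if n" filter is superfluous: str.split() called with no argument treats any run of whitespace as one separator and never returns empty strings, whereas an explicit separator such as " " does return them. The sample string is illustrative.

```python
data = "Guido van Rossum  Tim Peters   Thomas Liesner"

# No argument: runs of whitespace collapse, so no empty strings appear.
no_separator = data.split()

# The filter therefore removes nothing.
filtered = [n for n in data.split() if n]

# Explicit single-space separator: every extra space yields an empty string.
explicit_separator = data.split(" ")

print(no_separator == filtered)
print(explicit_separator)
```

Filtering only changes the result when an explicit separator is passed to split().
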
Steven D'Aprano wrote:
On Fri, 09 Dec 2005 18:02:02 -0800, James Stroud wrote:

Thomas Liesner wrote:
Hi all,

i am having a textfile which contains a single string with names.
I want to split this string into its records an put them into a list.
In "normal" cases i would do something like:

#!/usr/bin/python
inp = open("file")
data = inp.read()
names = data.split()
inp.close()
The problem is, that the names contain spaces an the records are also
just seprarated by spaces. The only thing i can rely on, ist that the
recordseparator is always more than a single whitespace.

I thought of something like defining the separator for split() by using
a regex for "more than one whitespace". RegEx for whitespace is \s, but
what would i use for "more than one"? \s+?

TIA,
Tom


The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.


Yes, but the correct result would be:

['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

Your code is short, elegant but wrong.

It could also be shorter and more elegant:

# your version
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> [n for n in data.split() if n]
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

# my version
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> data.split()
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

The "if n" in the list comp is superfluous, and without that, the whole
list comp is unnecessary.

See my post from one hour before this one.
Dec 10 '05 #12
James Stroud <js*****@mbi.ucla.edu> wrote:

The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.


But it is slower than this, which produces EXACTLY the same (incorrect)
result:

data = "Guido van Rossum  Tim Peters  Thomas Liesner"
names = data.split()
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Dec 10 '05 #13
James Stroud wrote:
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using
regexes.


Unfortunately it gives the wrong result.


Just an example. Here is the "correct version":

names = [n for n in data.split("  ") if n]


where "correct" is "still wrong", and "theoretically faster" means "slightly
slower" (at least if you fix your version and precompile the pattern).

</F>

Dec 10 '05 #14
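
A rough sketch for checking both claims yourself, assuming a sample string with a three-space separator. The function names are hypothetical, and the timings depend on the machine and Python version, so no particular winner is asserted here.

```python
import re
import timeit

# Two spaces, then three spaces, between records.
data = "Guido van Rossum  Tim Peters   Thomas Liesner"

# Precompiled pattern for "two or more whitespace characters".
pattern = re.compile(r"\s{2,}")


def split_listcomp(text):
    # The two-space list comprehension posted as the "correct version".
    return [n for n in text.split("  ") if n]


def split_regex(text):
    return pattern.split(text)


# With three spaces, one space is left over at the start of the last name.
print(split_listcomp(data))
print(split_regex(data))

if __name__ == "__main__":
    for func in (split_listcomp, split_regex):
        seconds = timeit.timeit(lambda: func(data), number=100_000)
        print(func.__name__, seconds)
```

The list comprehension keeps a leading space on the last record, which is the sense in which it is "still wrong"; adding a strip() to fix that is extra work that the timing comparison should include.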
