Hi all,
I have a text file which contains a single string with names.
I want to split this string into its records and put them into a list.
In "normal" cases I would do something like:
#!/usr/bin/python
inp = open("file")
data = inp.read()
names = data.split()
inp.close()
The problem is that the names contain spaces and the records are also
separated only by spaces. The only thing I can rely on is that the
record separator is always more than a single whitespace character.
I thought of defining the separator for split() with
a regex for "more than one whitespace". The regex for whitespace is \s, but
what would I use for "more than one"? \s+?
TIA,
Tom
\s+ matches one or more; you need \s{2,} for two or more:
import re
re.split(r"\s{2,}", "Guido van Rossum  Tim Peters  Thomas Liesner")
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
Michael
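A minimal sketch of how the suggestion above could be combined with the file-reading code from the original post. The helper names `split_records` and `read_names` are hypothetical, and stripping the text first is an added assumption so that a trailing newline does not yield an empty record.

```python
import re

# Two or more consecutive whitespace characters separate the records.
RECORD_SEPARATOR = re.compile(r"\s{2,}")


def split_records(text):
    """Split text into records separated by runs of 2+ whitespace characters."""
    # strip() first, so leading/trailing whitespace (such as a final newline)
    # does not produce empty records at either end.
    stripped = text.strip()
    if not stripped:
        return []
    return RECORD_SEPARATOR.split(stripped)


def read_names(path):
    """Read the whole file at `path` (hypothetical) and split it into names."""
    # The with-statement closes the file even if reading fails.
    with open(path) as inp:
        return split_records(inp.read())


names = split_records("Guido van Rossum  Tim Peters   Thomas Liesner\n")
print(names)  # ['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
```

Single spaces inside a name never match the pattern, so the names stay intact.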
For your split regex you could say
"\s\s+"
or
"\s{2,}"
This should work for you:
import re
YOUR_SPLIT_LIST = re.split(r"\s{2,}", YOUR_STRING)
Yours,
Noah
Hi Tom,
For more than one, I'd use
\s\s+
-Jim
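The patterns mentioned so far can be compared side by side. This is only an illustrative sketch; the sample string, including the tab between the last two names, is an assumption and not data from the original file.

```python
import re

# Sample data: two spaces, then space-tab-space, between the records.
data = "Guido van Rossum  Tim Peters \t Thomas Liesner"

# \s+ matches a single space too, so it also splits inside the names.
one_or_more = re.split(r"\s+", data)

# \s\s+ and \s{2,} both mean "two or more whitespace characters".
two_or_more_a = re.split(r"\s\s+", data)
two_or_more_b = re.split(r"\s{2,}", data)

print(one_or_more)    # ['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']
print(two_or_more_a)  # ['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
print(two_or_more_b)  # ['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
```

The raw-string prefix (r"...") keeps Python from interpreting the backslash itself.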
The one I like best goes like this:
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']
I think it is theoretically faster (and more pythonic) than using regexes.
James
Unfortunately it gives the wrong result.
Kent
[Kent Johnson] Unfortunately it gives the wrong result.
Still, it gets extra points for being such a pleasing example ;-)
Can I just use "two spaces" as the separator?
[x.strip() for x in data.split("  ")]
Kent Johnson wrote:
Unfortunately it gives the wrong result.
Just an example. Here is the "correct version":
names = [n for n in data.split("  ") if n]
James

bo****@gmail.com wrote:
Can I just use "two spaces" as the separator?
[x.strip() for x in data.split("  ")]
If you like, but it will create dummy entries if there are more than two spaces:
data = "Guido van Rossum  Tim Peters    Thomas Liesner"
[x.strip() for x in data.split("  ")]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']
You could add a condition to the listcomp:
[name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
but what if there is some other whitespace character?
data = "Guido van Rossum  Tim Peters  \t  Thomas Liesner"
[name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']
perhaps a smarter condition?
[name.strip() for name in data.split("  ") if name.strip(" \t")]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
but this is beginning to feel like hard work.
I think this is a case where it's not worth the effort to try to avoid the regexp:
import re
re.split(r"\s{2,}", data)
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
Michael
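One further case, sketched here under assumptions: an odd number of spaces between records, which the two-space split handles differently from the regex. The sample string is made up for illustration, and the timing at the end is only a rough way to check the "theoretically faster" claim on your own machine; no particular result is implied.

```python
import re
import timeit

# Hypothetical sample: three spaces after the first name, two after the second.
data = "Guido van Rossum   Tim Peters  Thomas Liesner"

# Splitting on exactly two spaces leaves the odd third space attached.
by_two_spaces = [n for n in data.split("  ") if n]
print(by_two_spaces)  # ['Guido van Rossum', ' Tim Peters', 'Thomas Liesner']

# A precompiled "two or more whitespace" pattern consumes the whole run.
separator = re.compile(r"\s{2,}")
by_regex = separator.split(data)
print(by_regex)  # ['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

# Rough timing; absolute numbers depend on machine and Python version.
runs = 10_000
listcomp_time = timeit.timeit(
    lambda: [n.strip() for n in data.split("  ") if n.strip()], number=runs
)
regex_time = timeit.timeit(lambda: separator.split(data), number=runs)
print(f"list comprehension: {listcomp_time:.4f}s, regex: {regex_time:.4f}s")
```

The list comprehension in the timing calls strip() so that both candidates produce the same clean names before they are compared.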
On Fri, 09 Dec 2005 18:02:02 -0800, James Stroud wrote:
I think it is theoretically faster (and more pythonic) than using regexes.
Yes, but the correct result would be:
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
Your code is short, elegant but wrong.
It could also be shorter and more elegant:
# your version
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> [n for n in data.split() if n]
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']
# my version
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> data.split()
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']
The "if n" in the list comp is superfluous, and without that, the whole
list comp is unnecessary.
--
Steven.
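The point about the superfluous "if n" can be checked directly. A small sketch, using the same sample names with two spaces between records:

```python
data = "Guido van Rossum  Tim Peters  Thomas Liesner"

# With no argument, split() treats any run of whitespace as one separator
# and never yields empty strings, so an "if n" filter changes nothing.
no_arg = data.split()
filtered = [n for n in data.split() if n]
print(no_arg == filtered)  # True

# With an explicit separator, every single occurrence counts, so empty
# strings appear between adjacent separators and a filter starts to matter.
single_space = data.split(" ")
print(single_space)
# ['Guido', 'van', 'Rossum', '', 'Tim', 'Peters', '', 'Thomas', 'Liesner']
```

Neither form keeps the names together, which is why the thread ends up recommending a regex.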
Steven D'Aprano wrote:
The "if n" in the list comp is superfluous, and without that, the whole list comp is unnecessary.
see my post from 1 hr before this one.
James Stroud <js*****@mbi.ucla.edu> wrote:
The one I like best goes like this:
py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']
I think it is theoretically faster (and more pythonic) than using regexes.
But it is slower than this, which produces EXACTLY the same (incorrect)
result:
data = "Guido van Rossum  Tim Peters  Thomas Liesner"
names = data.split()
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
James Stroud wrote:
I think it is theoretically faster (and more pythonic) than using regexes.
Kent Johnson wrote:
Unfortunately it gives the wrong result.
James Stroud wrote:
Just an example. Here is the "correct version":
names = [n for n in data.split("  ") if n]
where "correct" is "still wrong", and "theoretically faster" means "slightly
slower" (at least if you fix your version, and precompile the pattern).
</F>