Newbie question: Splitting a string - separator

Hi all,

I have a text file which contains a single string of names.
I want to split this string into its records and put them into a list.
In "normal" cases I would do something like:
#!/usr/bin/python
# Read the whole file, then split on any run of whitespace.
with open("file") as inp:
    data = inp.read()
names = data.split()


The problem is that the names contain spaces, and the records are also
just separated by spaces. The only thing I can rely on is that the
record separator is always more than a single whitespace character.

I thought of defining the separator for split() with a regex for
"more than one whitespace". The regex for whitespace is \s, but
what would I use for "more than one"? \s+?

TIA,
Tom
Dec 8 '05 #1

\s+ gives one or more; you need \s{2,} for two or more:
>>> import re
>>> re.split(r"\s{2,}", "Guido van Rossum  Tim Peters  Thomas Liesner")
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']


Michael

Dec 8 '05 #2
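
As a quick illustration of the difference described above, the following sketch splits the same string with both patterns. The sample string is an assumption (single spaces inside names, two spaces between records), not data from the original poster's file.

```python
import re

# Sample data (an assumption): single spaces inside names,
# two spaces between records.
data = "Guido van Rossum  Tim Peters  Thomas Liesner"

# \s+ matches ONE or more whitespace characters, so it also
# splits inside each name.
one_or_more = re.split(r"\s+", data)

# \s{2,} matches TWO or more, so single spaces inside a name survive.
two_or_more = re.split(r"\s{2,}", data)

print(one_or_more)
print(two_or_more)
```

The raw-string prefix (r"...") keeps the backslash from being interpreted by Python before the regex engine sees it.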

Thomas Liesner wrote:
...
The only thing I can rely on is that the
record separator is always more than a single whitespace character.

I thought of defining the separator for split() with a regex for
"more than one whitespace". The regex for whitespace is \s, but
what would I use for "more than one"? \s+?


For your split regex you could say
"\s\s+"
or
"\s{2,}"

This should work for you:
YOUR_SPLIT_LIST = re.split(r"\s{2,}", YOUR_STRING)

Yours,
Noah

Dec 8 '05 #3
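
The two patterns suggested above should behave identically; the sketch below checks that on a few sample strings (all made up for illustration). It also shows why the quantifier should be written without a space inside the braces: as far as I know, Python's re module treats "{2, }" as literal text rather than as a repeat count, so nothing gets split.

```python
import re

# Hypothetical sample inputs, mixing spaces, tabs and newlines.
samples = [
    "Guido van Rossum  Tim Peters  Thomas Liesner",
    "Guido van Rossum \t Tim Peters\n\nThomas Liesner",
    "Only One Record",
]

# Split every sample with both spellings of "two or more whitespace".
pairs = [
    (re.split(r"\s\s+", s), re.split(r"\s{2,}", s)) for s in samples
]
for a, b in pairs:
    print(a == b, b)

# With a space inside the braces the pattern no longer describes a run
# of whitespace, so the string comes back as a single piece.
unsplit = re.split(r"\s{2, }", samples[0])
print(unsplit)
```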
Jim
Hi Tom,
a regex for "more than one whitespace". The regex for whitespace is \s, but
what would I use for "more than one"? \s+?


For more than one, I'd use

\s\s+

-Jim

Dec 8 '05 #4

The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

James
Dec 10 '05 #5
James Stroud wrote:
The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.


Unfortunately it gives the wrong result.

Kent
Dec 10 '05 #6
[James Stroud]
The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using regexes.

[Kent Johnson] Unfortunately it gives the wrong result.


Still, it gets extra points for being such a pleasing example ;-)
Dec 10 '05 #7

Can I just use two spaces as the separator?

[ x.strip() for x in data.split("  ") ]

Dec 10 '05 #8
Kent Johnson wrote:
James Stroud wrote:
The one I like best goes like this:

py> data = "Guido van Rossum  Tim Peters  Thomas Liesner"
py> names = [n for n in data.split() if n]
py> names
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

I think it is theoretically faster (and more pythonic) than using
regexes.

Unfortunately it gives the wrong result.

Kent


Just an example. Here is the "correct version":
names = [n for n in data.split("  ") if n]

James
Dec 10 '05 #9
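
Splitting on a literal two-space string works when every separator is exactly two spaces. A small sketch (the three-space sample string is an assumption) shows what happens when a separator has an odd number of spaces: the leftover space stays attached to the next name unless each piece is also stripped.

```python
# Hypothetical input: two spaces, then three spaces, between records.
data = "Guido van Rossum  Tim Peters   Thomas Liesner"

# Drop empty pieces only, as in the list comprehension above.
filtered = [n for n in data.split("  ") if n]

# Strip each piece as well, then drop anything left empty.
stripped = [n.strip() for n in data.split("  ") if n.strip()]

print(filtered)
print(stripped)
```

Even the stripped variant only recognizes space characters as separators; tabs or newlines between records would still need extra handling.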
bo****@gmail.com wrote:
Can I just use two spaces as the separator?

[ x.strip() for x in data.split("  ") ]

If you like, but it will create dummy entries if there are more than two spaces:
>>> data = "Guido van Rossum  Tim Peters    Thomas Liesner"
>>> [ x.strip() for x in data.split("  ") ]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']

You could add a condition to the listcomp:
>>> [name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

But what if there is some other whitespace character?
>>> data = "Guido van Rossum  Tim Peters  \t  Thomas Liesner"
>>> [name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']

Perhaps a smarter condition?
>>> [name.strip() for name in data.split("  ") if name.strip(" \t")]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

But this is beginning to feel like hard work.
I think this is a case where it's not worth the effort to try to avoid the regexp:
>>> import re
>>> re.split(r"\s{2,}", data)
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']


Michael
Dec 10 '05 #10
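
Putting the thread's conclusion together with the original file-reading code, a minimal sketch might look like the following. The names split_records, read_records and RECORD_SEP are hypothetical, and the stripping step is an addition not discussed in the thread: a file usually ends with a single newline, which \s{2,} alone would leave attached to the last name.

```python
import re

# Two or more consecutive whitespace characters mark a record boundary.
RECORD_SEP = re.compile(r"\s{2,}")


def split_records(text):
    """Split text into records separated by 2+ whitespace characters."""
    # Strip first: otherwise leading or trailing whitespace (such as the
    # newline at the end of a file) produces empty or padded entries.
    text = text.strip()
    if not text:
        return []
    return RECORD_SEP.split(text)


def read_records(path):
    """Read a whole file and return its records (path is hypothetical)."""
    with open(path) as inp:
        return split_records(inp.read())


print(split_records("  Guido van Rossum  Tim Peters \t Thomas Liesner\n"))
```

Compiling the pattern once is a small convenience when the function is called repeatedly; for a single call, re.split with the pattern string works just as well.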
