Help with splitting

I'm trying to split a string into pieces on whitespace, but I want to
save the whitespace characters rather than discarding them.

For example, I want to split the string '1 2' into ['1',' ','2'].
I was certain that there was a way to do this using the standard string
functions, but I just spent some time poring over the documentation
without finding anything.

There's a chance I was instead thinking of something in the re module,
but I also spent some time there without luck. Could someone point me
to the right function, if it exists?

Thanks in advance.

R.

Jul 18 '05 #1
On Fri, 01 Apr 2005 14:20:51 -0800, RickMuller wrote:
I'm trying to split a string into pieces on whitespace, but I want to
save the whitespace characters rather than discarding them.

For example, I want to split the string '1 2' into ['1',' ','2'].
I was certain that there was a way to do this using the standard string
functions, but I just spent some time poring over the documentation
without finding anything.


Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> whitespaceSplitter = re.compile("(\w+)")
>>> whitespaceSplitter.split("1 2 3 \t\n5")
['', '1', ' ', '2', ' ', '3', ' \t\n', '5', '']
>>> whitespaceSplitter.split(" 1 2 3 \t\n5 ")
[' ', '1', ' ', '2', ' ', '3', ' \t\n', '5', ' ']

Note the null strings at the beginning and end when the string starts or
ends with a match of the split RE (here, a run of word characters).
Pondering the second invocation should show why they are there, though
darned if I can think of a good way to put it into words.
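
One way to put it into code rather than words: a minimal sketch, assuming Python 3 and raw-string patterns. The helper name `split_keep_separators` is made up for illustration. `re.split` with a capturing group returns the separators as well, and it reports an empty string wherever a match touches the start or end of the input.

```python
import re

def split_keep_separators(text, pattern=r"(\s+)"):
    """Split text on pattern, keeping separators and dropping empty strings.

    The pattern should contain one capturing group so that re.split
    returns the matched separators alongside the pieces between them.
    """
    return [piece for piece in re.split(pattern, text) if piece]

# Splitting on words: the input starts and ends with a match, so re.split
# reports an empty string before the first match and after the last one.
print(re.split(r"(\w+)", "1 2 3 \t\n5"))

# Splitting on whitespace and filtering gives the list asked for originally.
print(split_keep_separators("1 2"))
```

Filtering out the empty strings is a design choice; keep them if you need the pieces to alternate strictly between non-match and match.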
Jul 18 '05 #2
RickMuller wrote:
There's a chance I was instead thinking of something in the re module,
but I also spent some time there without luck. Could someone point me
to the right function, if it exists?


The re solution Jeremy Bowers posted is what you want. Here's another
(probably much slower) way for fun (with no surrounding empty strings):

py> from itertools import groupby
py> [''.join(g) for k, g in groupby(' test ing ', lambda x: x.isspace())]
[' ', 'test', ' ', 'ing', ' ']
I tried replacing the lambda thing with an attrgetter, but apparently my
understanding of that isn't perfect... it groups by the identity of the
bound method instead of calling it...

--
Brian Beck
Adventurer of the First Order
Jul 18 '05 #3
[Brian Beck]
py> from itertools import groupby
py> [''.join(g) for k, g in groupby(' test ing ', lambda x: x.isspace())]
[' ', 'test', ' ', 'ing', ' ']
Brilliant solution!

That leads to a better understanding of groupby as a tool for identifying
transitions without consuming them.

I tried replacing the lambda thing with an attrgetter, but apparently my
understanding of that isn't perfect... it groups by the identity of the
bound method instead of calling it...


Right.
attrgetter gets but does not call.

If unicode isn't an issue, then the lambda can be removed:

>>> [''.join(g) for k, g in groupby(' test ing ', str.isspace)]
[' ', 'test', ' ', 'ing', ' ']
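
A sketch of the same idea wrapped in a function, assuming Python 3, where `str` is already Unicode so `str.isspace` can be passed directly as the key. The name `split_runs` is hypothetical, not something from the standard library.

```python
from itertools import groupby

def split_runs(text):
    """Return alternating runs of whitespace and non-whitespace characters."""
    # groupby starts a new group each time the key (is this character
    # whitespace?) changes, so every character lands in exactly one run.
    return ["".join(run) for _, run in groupby(text, key=str.isspace)]

print(split_runs(" test ing "))
```

Because no character is discarded, joining the runs back together reproduces the input string.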

Raymond Hettinger
Jul 18 '05 #4
On Fri, 01 Apr 2005 18:01:49 -0500, Brian Beck wrote:
py> from itertools import groupby
py> [''.join(g) for k, g in groupby(' test ing ', lambda x: x.isspace())]
[' ', 'test', ' ', 'ing', ' ']

I tried replacing the lambda thing with an attrgetter, but apparently my
understanding of that isn't perfect... it groups by the identity of the
bound method instead of calling it...


Unfortunately, as you pointed out, it is slower:

python timeit.py -s
"import re; x = 'a ab c' * 1000; whitespaceSplitter = re.compile('(\w+)')"

"whitespaceSplitter.split(x)"

100 loops, best of 3: 9.47 msec per loop

python timeit.py -s
"from itertools import groupby; x = 'a ab c' * 1000;"

"[''.join(g) for k, g in groupby(x, lambda y: y.isspace())]"

10 loops, best of 3: 65.8 msec per loop

(tried to break it up to be easier to read)

But I like yours much better theoretically. It's also a pretty good demo
of "groupby".
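
For anyone who wants to repeat the comparison, here is a rough sketch using the `timeit` module from inside a script, assuming Python 3. The function names `with_regex` and `with_groupby` are made up for illustration, and the numbers will differ by machine and Python version, so none are quoted.

```python
import re
import timeit
from itertools import groupby

text = "a ab c" * 1000
word_splitter = re.compile(r"(\w+)")

def with_regex():
    # Same call as in the timing above; includes the empty edge strings.
    return word_splitter.split(text)

def with_groupby():
    return ["".join(run) for _, run in groupby(text, key=str.isspace)]

# Run each candidate a few times and report the average per call.
loops = 10
for func in (with_regex, with_groupby):
    seconds = timeit.timeit(func, number=loops)
    print(f"{func.__name__}: {seconds / loops * 1000:.2f} msec per loop")
```

Apart from the empty strings at the edges of the regex result, both functions produce the same pieces for this input.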
Jul 18 '05 #5
Thanks to everyone who responded!! I guess I have to study my regular
expressions a little more closely.

Jul 18 '05 #6
Jeremy Bowers wrote:
On Fri, 01 Apr 2005 14:20:51 -0800, RickMuller wrote:
I'm trying to split a string into pieces on whitespace, but I want to
save the whitespace characters rather than discarding them.

For example, I want to split the string '1 2' into ['1',' ','2'].
I was certain that there was a way to do this using the standard string
functions, but I just spent some time poring over the documentation
without finding anything.

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> whitespaceSplitter = re.compile("(\w+)")
>>> whitespaceSplitter.split("1 2 3 \t\n5")
['', '1', ' ', '2', ' ', '3', ' \t\n', '5', '']
>>> whitespaceSplitter.split(" 1 2 3 \t\n5 ")
[' ', '1', ' ', '2', ' ', '3', ' \t\n', '5', ' ']

Note the null strings at the beginning and end when the string starts or
ends with a match of the split RE (here, a run of word characters).
Pondering the second invocation should show why they are there, though
darned if I can think of a good way to put it into words.


If you don't want any null strings at the beginning or the end, an
equivalent regexp is:

>>> whitespaceSplitter_2 = re.compile("\w+|\s+")
>>> whitespaceSplitter_2.findall("1 2 3 \t\n5")
['1', ' ', '2', ' ', '3', ' \t\n', '5']
>>> whitespaceSplitter_2.findall(" 1 2 3 \t\n5 ")
[' ', '1', ' ', '2', ' ', '3', ' \t\n', '5', ' ']

George

Jul 18 '05 #7
George Sakkis wrote:
If you don't want any null strings at the beginning or the end, an
equivalent regexp is:

>>> whitespaceSplitter_2 = re.compile("\w+|\s+")
>>> whitespaceSplitter_2.findall("1 2 3 \t\n5")
['1', ' ', '2', ' ', '3', ' \t\n', '5']
>>> whitespaceSplitter_2.findall(" 1 2 3 \t\n5 ")
[' ', '1', ' ', '2', ' ', '3', ' \t\n', '5', ' ']


Perhaps you may want to use "\s+|\S+" if you have non-alphanumeric
characters in the string.
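
A small illustration of the difference, assuming Python 3 and a made-up sample string: `\w+|\s+` can only match word characters and whitespace, so `findall` silently skips punctuation, while `\s+|\S+` accounts for every character.

```python
import re

text = "a, b-c!"

# Punctuation matches neither \w+ nor \s+, so it is dropped from the result.
lossy = re.findall(r"\w+|\s+", text)
print(lossy)

# Every character is either whitespace or non-whitespace, so nothing is lost
# and joining the pieces restores the original string.
pieces = re.findall(r"\s+|\S+", text)
print(pieces)
print("".join(pieces) == text)
```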

Reinhold
Jul 18 '05 #8
