I'm trying to split a string into pieces on whitespace, but I want to
save the whitespace characters rather than discarding them.
For example, I want to split the string '1 2' into ['1',' ','2'].
I was certain that there was a way to do this using the standard string
functions, but I just spent some time poring over the documentation
without finding anything.
There's a chance I was instead thinking of something in the re module,
but I also spent some time there without luck. Could someone point me
to the right function, if it exists?
Thanks in advance.
R.
On Fri, 01 Apr 2005 14:20:51 -0800, RickMuller wrote:
> I'm trying to split a string into pieces on whitespace, but I want to save the whitespace characters rather than discarding them.
> For example, I want to split the string '1 2' into ['1',' ','2']. I was certain that there was a way to do this using the standard string functions, but I just spent some time poring over the documentation without finding anything.
Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> whitespaceSplitter = re.compile("(\w+)")
>>> whitespaceSplitter.split("1 2 3 \t\n5")
['', '1', ' ', '2', ' ', '3', ' \t\n', '5', '']
>>> whitespaceSplitter.split(" 1 2 3 \t\n5 ")
[' ', '1', ' ', '2', ' ', '3', ' \t\n', '5', ' ']
Note the null strings at the beginning and end if there are no instances
of the split RE at the beginning or end. Pondering the second invocation
should show why they are there, though darned if I can think of a good way
to put it into words.
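One way to put it into words: re.split always returns the text before the first match and the text after the last match, even when that text is empty. Here is a minimal sketch, assuming Python 3; the helper name split_keep_whitespace is hypothetical and not from this thread. It filters out those empty edge strings:

```python
import re

# Hypothetical helper, not from the thread: split on runs of word
# characters, keeping both the words and the whitespace between them.
WORD_RUN = re.compile(r"(\w+)")

def split_keep_whitespace(text):
    # re.split returns the text before the first match and after the
    # last match even when it is empty, so drop the empty pieces.
    return [piece for piece in WORD_RUN.split(text) if piece]

print(split_keep_whitespace("1 2 3 \t\n5"))
print(split_keep_whitespace(" 1 2 3 \t\n5 "))
```

Because \w+ matches maximal runs, two matches can never be adjacent, so the only empty strings this filter removes are the ones at the edges.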
RickMuller wrote:
> There's a chance I was instead thinking of something in the re module, but I also spent some time there without luck. Could someone point me to the right function, if it exists?
The re solution Jeremy Bowers posted is what you want. Here's another (probably
much slower) way, just for fun (with no surrounding empty strings):
py> from itertools import groupby
py> [''.join(g) for k, g in groupby(' test ing ', lambda x: x.isspace())]
[' ', 'test', ' ', 'ing', ' ']
I tried replacing the lambda thing with an attrgetter, but apparently my
understanding of that isn't perfect... it groups by the identity of the
bound method instead of calling it...
--
Brian Beck
Adventurer of the First Order
[Brian Beck]
> py> from itertools import groupby
> py> [''.join(g) for k, g in groupby(' test ing ', lambda x: x.isspace())]
> [' ', 'test', ' ', 'ing', ' ']
Brilliant solution!
That leads to a better understanding of groupby as a tool for identifying
transitions without consuming them.
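To make the "transitions" point concrete, here is a small sketch (assuming Python 3; the variable names are only illustrative) that prints each key alongside the run of characters it covers:

```python
from itertools import groupby

text = " test ing "

# groupby yields one (key, group) pair per maximal run of items that
# share the same key value; no character is dropped along the way.
runs = [(is_space, "".join(chars)) for is_space, chars in groupby(text, str.isspace)]
print(runs)
# [(True, ' '), (False, 'test'), (True, ' '), (False, 'ing'), (True, ' ')]
```

Each flip of the key from True to False (or back) marks a boundary, and joining the runs back together reproduces the original string.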
> I tried replacing the lambda thing with an attrgetter, but apparently my understanding of that isn't perfect... it groups by the identity of the bound method instead of calling it...
Right.
attrgetter gets but does not call.
If unicode isn't an issue, then the lambda can be removed:
>>> [''.join(g) for k, g in groupby(' test ing ', str.isspace)]
[' ', 'test', ' ', 'ing', ' ']
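As a hedged illustration of "gets but does not call": the sketch below assumes a newer Python than the 2.3 shown in this thread, because operator.methodcaller only arrived in Python 2.6. It contrasts attrgetter, which hands back the bound method, with methodcaller, which invokes it:

```python
from itertools import groupby
from operator import attrgetter, methodcaller

text = " test ing "

# attrgetter("isspace") fetches the bound method from each character but
# never calls it, so the keys are method objects rather than booleans.
keys = [attrgetter("isspace")(ch) for ch in text]
print(all(callable(key) for key in keys))

# methodcaller("isspace") actually calls ch.isspace() for each character,
# which is what the lambda in the original post does.
pieces = ["".join(group) for _, group in groupby(text, methodcaller("isspace"))]
print(pieces)
```

Unlike str.isspace, methodcaller does not care about the concrete string type, so it behaves like the lambda in that respect.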
Raymond Hettinger
On Fri, 01 Apr 2005 18:01:49 -0500, Brian Beck wrote:
> py> from itertools import groupby
> py> [''.join(g) for k, g in groupby(' test ing ', lambda x: x.isspace())]
> [' ', 'test', ' ', 'ing', ' ']
> I tried replacing the lambda thing with an attrgetter, but apparently my understanding of that isn't perfect... it groups by the identity of the bound method instead of calling it...
Unfortunately, as you pointed out, it is slower:
python timeit.py -s
"import re; x = 'a ab c' * 1000; whitespaceSplitter = re.compile('(\w+)')"
"whitespaceSplitter.split(x)"
100 loops, best of 3: 9.47 msec per loop
python timeit.py -s
"from itertools import groupby; x = 'a ab c' * 1000;"
"[''.join(g) for k, g in groupby(x, lambda y: y.isspace())]"
10 loops, best of 3: 65.8 msec per loop
(tried to break it up to be easier to read)
But I like yours much better theoretically. It's also a pretty good demo
of "groupby".
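For anyone wanting to repeat the comparison from inside Python instead of the command line, here is a rough sketch using the timeit module. It assumes Python 3, the function names with_regex and with_groupby are made up for illustration, and the numbers will differ from the ones above depending on machine and interpreter version:

```python
import re
import timeit
from itertools import groupby

text = "a ab c" * 1000
splitter = re.compile(r"(\w+)")

def with_regex():
    # Same call as in the command-line timing; empty edge strings are kept.
    return splitter.split(text)

def with_groupby():
    return ["".join(group) for _, group in groupby(text, str.isspace)]

# Sanity check: apart from the empty edge strings, both give the same pieces.
assert [piece for piece in with_regex() if piece] == with_groupby()

for func in (with_regex, with_groupby):
    seconds = timeit.timeit(func, number=100)
    print(f"{func.__name__}: {seconds / 100 * 1000:.3f} msec per loop")
```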
Thanks to everyone who responded!! I guess I have to study my regular
expressions a little more closely.
Jeremy Bowers wrote:
> On Fri, 01 Apr 2005 14:20:51 -0800, RickMuller wrote:
>> I'm trying to split a string into pieces on whitespace, but I want to save the whitespace characters rather than discarding them.
>> For example, I want to split the string '1 2' into ['1',' ','2']. I was certain that there was a way to do this using the standard string functions, but I just spent some time poring over the documentation without finding anything.
> Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
> [GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> whitespaceSplitter = re.compile("(\w+)")
> >>> whitespaceSplitter.split("1 2 3 \t\n5")
> ['', '1', ' ', '2', ' ', '3', ' \t\n', '5', '']
> >>> whitespaceSplitter.split(" 1 2 3 \t\n5 ")
> [' ', '1', ' ', '2', ' ', '3', ' \t\n', '5', ' ']
> Note the null strings at the beginning and end if there are no instances of the split RE at the beginning or end. Pondering the second invocation should show why they are there, though darned if I can think of a good way to put it into words.
If you don't want any null strings at the beginning or the end, an
equivalent regexp is:
>>> whitespaceSplitter_2 = re.compile("\w+|\s+")
>>> whitespaceSplitter_2.findall("1 2 3 \t\n5")
['1', ' ', '2', ' ', '3', ' \t\n', '5']
>>> whitespaceSplitter_2.findall(" 1 2 3 \t\n5 ")
[' ', '1', ' ', '2', ' ', '3', ' \t\n', '5', ' ']
George
George Sakkis wrote:
> If you don't want any null strings at the beginning or the end, an equivalent regexp is:
> >>> whitespaceSplitter_2 = re.compile("\w+|\s+")
> >>> whitespaceSplitter_2.findall("1 2 3 \t\n5")
> ['1', ' ', '2', ' ', '3', ' \t\n', '5']
> >>> whitespaceSplitter_2.findall(" 1 2 3 \t\n5 ")
> [' ', '1', ' ', '2', ' ', '3', ' \t\n', '5', ' ']
Perhaps you may want to use "\s+|\S+" if you have non-alphanumeric
characters in the string.
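A quick sketch of the difference, assuming Python 3 and an arbitrary sample string chosen only for illustration: with punctuation in the input, "\w+|\s+" silently drops the characters that are neither word characters nor whitespace, while "\s+|\S+" keeps everything.

```python
import re

text = "foo-bar, baz!  qux"  # arbitrary sample input with punctuation

word_or_space = re.compile(r"\w+|\s+")
space_or_nonspace = re.compile(r"\s+|\S+")

# '-', ',' and '!' match neither \w nor \s, so findall skips over them.
print(word_or_space.findall(text))
# ['foo', 'bar', ' ', 'baz', '  ', 'qux']

# Every character is either whitespace or non-whitespace, so nothing is lost.
print(space_or_nonspace.findall(text))
# ['foo-bar,', ' ', 'baz!', '  ', 'qux']

print("".join(space_or_nonspace.findall(text)) == text)
# True
```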
Reinhold