multi split function taking delimiter list

martinskou

Hi, I'm looking for something like:

multi_split( 'a:=b+c' , [':=','+'] )

returning:
['a', ':=', 'b', '+', 'c']

What's the Python way to achieve this, preferably without regexps?

Thanks.

Martin

Nov 14 '06 #1


Raymond Hettinger

I think regexps are likely the right way to do this kind of
tokenization.

The string split() method doesn't return the split value, so it is
less than helpful for your application: 'a=b'.split('=') --> ['a', 'b']

The new str.partition() method will return the split value and is
suitable for successive applications: 'a:=b+c'.partition(':=') -->
('a', ':=', 'b+c')
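Applied successively, partition() is enough for a regexp-free multi_split. The sketch below is a hypothetical helper, not code from the thread: at each step it looks for the splitter that occurs earliest in the remaining text (preferring the longer splitter on a tie, so ':=' beats ':') and partitions there.

```python
def multi_split(s, splitters):
    """Split s on any string in splitters, keeping the splitters as tokens.

    Hypothetical helper built only on str.find() and str.partition().
    """
    splitters = [sep for sep in splitters if sep]  # partition('') raises ValueError
    tokens = []
    while True:
        # Find the splitter that occurs earliest; on a tie prefer the longer one.
        best_pos, best_sep = -1, None
        for sep in splitters:
            pos = s.find(sep)
            if pos == -1:
                continue
            if (best_sep is None or pos < best_pos
                    or (pos == best_pos and len(sep) > len(best_sep))):
                best_pos, best_sep = pos, sep
        if best_sep is None:
            tokens.append(s)  # no splitter left: the remainder is the last token
            return tokens
        head, sep, s = s.partition(best_sep)  # splits at best_pos
        tokens.extend([head, sep])


print(multi_split('a:=b+c', [':=', '+']))  # ['a', ':=', 'b', '+', 'c']
```

Each step rescans the remainder once per splitter, which is fine for short expressions but quadratic in the worst case. Like re.split() with a capturing group, it keeps empty strings: multi_split('a+', ['+']) returns ['a', '+', ''].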

FWIW, when someone actually does want something that behaves like
str.split() but with multiple split values, one approach is to replace
each of the possible splitters with a single splitter:

def multi_split(s, splitters):
    # map every splitter onto the first one, then split once
    first = splitters[0]
    for splitter in splitters:
        s = s.replace(splitter, first)
    return s.split(first)

print(multi_split( 'a:=b+c' , [':=','+'] ))   # ['a', 'b', 'c']

Raymond

Nov 14 '06 #2
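Raymond's replace trick can mis-split when one splitter contains another and the longer one is listed first: with [':=', ':'] it replaces ':' with ':=', so multi_split('a:=b:c', [':=', ':']) yields ['a', '=b', 'c']. A hedged variant, assuming some sentinel character never appears in the input (the `sentinel` parameter and its NUL default are hypothetical choices), replaces the longest splitters first:

```python
def multi_split(s, splitters, sentinel="\x00"):
    """str.split()-style split on several separators (separators are dropped).

    Assumes sentinel never occurs in s. Replacing longer separators first
    keeps ':' from breaking up ':=' when both are listed.
    """
    for sep in sorted(splitters, key=len, reverse=True):
        if sep:  # skip empty separators
            s = s.replace(sep, sentinel)
    return s.split(sentinel)


print(multi_split('a:=b+c', [':=', '+']))  # ['a', 'b', 'c']
print(multi_split('a:=b:c', [':=', ':']))  # ['a', 'b', 'c']
```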

Peter Otten

I think in this case the regexp approach is the simplest, though:

>>> import re
>>> def multi_split(text, splitters):
...     return re.split("(%s)" % "|".join(re.escape(splitter) for splitter in splitters), text)
...
>>> multi_split("a:=b+c", [":=", "+"])
['a', ':=', 'b', '+', 'c']

Peter

Nov 14 '06 #3
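Regex alternation takes the first alternative that matches, not the longest, so the order of the splitters matters in Peter's version: with [":", ":="] it returns ['a', ':', '=b', ':', 'c'] for "a:=b:c". A small sketch that sorts the escaped splitters longest-first and skips empty ones:

```python
import re


def multi_split(text, splitters):
    """re.split() on any of splitters, keeping them via a capturing group.

    Alternatives are ordered longest-first because the regex engine takes
    the first alternative that matches at a position.
    """
    ordered = sorted((sep for sep in splitters if sep), key=len, reverse=True)
    if not ordered:
        return [text]
    pattern = "(%s)" % "|".join(re.escape(sep) for sep in ordered)
    return re.split(pattern, text)


print(multi_split("a:=b:c", [":", ":="]))  # ['a', ':=', 'b', ':', 'c']
```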

Kent Johnson

What do you have against regexp? re.split() does exactly what you want:

In [1]: import re

In [2]: re.split(r'(:=|\+)', 'a:=b+c')
Out[2]: ['a', ':=', 'b', '+', 'c']

Kent

Nov 14 '06 #4
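With a capturing group, re.split() also returns empty strings wherever two delimiters touch or a delimiter starts or ends the text: re.split(r'(:=|\+)', '+a:=+b') gives ['', '+', 'a', ':=', '', '+', 'b']. If those are unwanted, a precompiled pattern plus a filter is one option (the name split_tokens is just for illustration):

```python
import re

# Precompile once if the same delimiters are used repeatedly.
SPLITTER = re.compile(r'(:=|\+)')


def split_tokens(text):
    """Split on ':=' or '+', keep the delimiters, drop empty strings."""
    return [tok for tok in SPLITTER.split(text) if tok]


print(split_tokens('a:=b+c'))  # ['a', ':=', 'b', '+', 'c']
print(split_tokens('+a:=+b'))  # ['+', 'a', ':=', '+', 'b']
```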

Paddy

I resisted my urge to use a regexp and came up with this:

>>> from itertools import groupby
>>> s = 'apple=blue+cart'
>>> [''.join(g) for k,g in groupby(s, lambda x: x in '=+')]
['apple', '=', 'blue', '+', 'cart']
>>>

For me, the regexp solution would have been clearer, but I need to
stretch my itertools skills.

- Paddy.

Nov 14 '06 #5

Sam Pointon

pyparsing <http://pyparsing.wikispaces.com/> is quite a cool package
for doing this sort of thing. Using your example:

from pyparsing import Literal, OneOrMore, Word, alphas

splitat = Literal(":=") | Literal("+")   # either operator
lexeme = Word(alphas)                    # a run of letters
grammar = OneOrMore(splitat | lexeme)    # a sequence of operators and words

print(grammar.parseString("a:=b+c").asList())
# ['a', ':=', 'b', '+', 'c']

--Sam

Nov 14 '06 #6

Paddy

Arghhh!
No colon!
Forget the above please.

- Pad.

Nov 15 '06 #7

Paddy

With colon:

>>> from itertools import groupby
>>> s = 'apple:=blue+cart'
>>> [''.join(g) for k,g in groupby(s,lambda x: x in ':=+')]
['apple', ':=', 'blue', '+', 'cart']
>>>

- Pad.

Nov 15 '06 #8
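Because the groupby() key tests one character at a time, Paddy's one-liner treats ':', '=' and '+' as a character class rather than as the delimiter list [':=', '+']: adjacent delimiters merge into one token, and a lone ':' or '=' is split out too. A small wrapper (the name group_split is just for illustration) makes that visible:

```python
from itertools import groupby


def group_split(s, delim_chars):
    """Group runs of delimiter characters and runs of everything else."""
    return [''.join(g) for _, g in groupby(s, lambda ch: ch in delim_chars)]


print(group_split('apple:=blue+cart', ':=+'))  # ['apple', ':=', 'blue', '+', 'cart']
print(group_split('a:=+b', ':=+'))             # ['a', ':=+', 'b']
print(group_split('a=b:c', ':=+'))             # ['a', '=', 'b', ':', 'c']
```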

Frederic Rentsch

Automatic grouping may or may not work as intended. If some combinations
of characters should not be split, the solution raises a new problem.

I have been demonstrating solutions based on SE so frequently of late
that I have begun to irritate some readers, and SE has been characterized,
in sarcastic exaggeration, as the 'Solution of Everything'. With some
trepidation I am going to demonstrate another SE solution, because the
truth behind the exaggeration is that SE is a versatile tool for handling
a variety of relatively simple problems in a simple, straightforward
manner.

>>> test_string = 'a:=b+c: apple:=blue:+cart'
>>> SE.SE (':\==/:\=/ +=/+/')(test_string).split ('/')   # For repeats the SE object would be assigned to a variable
['a', ':=', 'b', '+', 'c: apple', ':=', 'blue:', '+', 'cart']

This is a nuts-and-bolts approach. What you do is what you get. What you
want is what you do. By itself SE doesn't do anything but search and
replace, a concept without a learning curve. The simplicity doesn't
suggest versatility. Versatility comes from application techniques.
SE is a game of challenge. You know the result you want. You know
the pieces you have. The game is how to get the result with the pieces
using search and replace, either per se or as an auxiliary, as in this
case for splitting. That's all. The example above inserts some
appropriate split mark ('/'). It takes thirty seconds to write it up and
see the result. No need to ponder formulas and inner workings. If you
don't like what you see you also see what needs to be changed. Supposing
we should split single colons too, adding the corresponding substitution
and verifying the effect is a matter of another ten seconds:

>>> SE.SE (':\==/:\=/ +=/+/ :=/:/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', ' apple', ':=', 'blue', ':', '', '+', 'cart']

Now we see an empty field we don't like towards the end. Why?

>>> SE.SE (':\==/:\=/ +=/+/ :=/:/')(test_string)
'a/:=/b/+/c/:/ apple/:=/blue/://+/cart'

Ah! It's two slashes next to each other. No problem. We de-multiply
double slashes in a second pass:

>>> SE.SE (':\==/:\=/ +=/+/ :=/:/ | //=/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', ' apple', ':=', 'blue', ':', '+', 'cart']

On second thought the colon should not be split if a plus sign follows:

>>> SE.SE (':\==/:\=/ +=/+/ :=/:/ :+=:/+/ | //=/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', ' apple', ':=', 'blue:', '+', 'cart']

No, wrong again! 'Colon-plus' should be exempt altogether. And no spaces
please:

>>> SE.SE (':\==/:\=/ +=/+/ :=/:/ :+=:+ " =" | //=/')(test_string).split ('/')
['a', ':=', 'b', '+', 'c', ':', 'apple', ':=', 'blue:+cart']

etc.

It is easy to get carried away and to forget that SE should not be used
instead of Python's built-ins, or to get carried away doing contextual
or grammar processing explicitly, which gets messy very fast. SE fills a
gap somewhere between built-ins and parsers.

Stream editing is not a mainstream technique. I believe it has the
potential to make many simple problems trivial and many harder ones
simpler. This is why I believe the technique deserves more attention,
which, again, may explain the focus of my posts.

Frederic

Nov 16 '06 #9
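SE is Frederic's own module, and its rule syntax is not reproduced here. The underlying idea (substitute split marks in one pass, then split) can be sketched with the standard library instead, using a dict of replacements and a single re.sub() pass with the longest keys tried first. The function name and table below are hypothetical; the table mirrors the last command in the post: mark ':=', '+' and a lone ':' with '/', leave ':+' alone, and drop spaces.

```python
import re


def substitute(text, table):
    """Apply every replacement in table in one left-to-right pass.

    Longer keys are tried first, so ':=' wins over ':' at the same position,
    and replaced text is never rescanned (unlike chained str.replace calls).
    """
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


# Hypothetical table: ':+' maps to itself so it is exempt from splitting.
table = {':=': '/:=/', '+': '/+/', ':': '/:/', ':+': ':+', ' ': ''}

marked = substitute('a:=b+c: apple:=blue:+cart', table)
fields = [f for f in marked.split('/') if f]  # drop empty fields from '//'
print(fields)  # ['a', ':=', 'b', '+', 'c', ':', 'apple', ':=', 'blue:+cart']
```

Filtering empty fields plays the role of the second pass that collapses '//' in the post.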

Paul McGuire

On Nov 14, 5:41 pm, "Sam Pointon" <free.condime...@gmail.com> wrote:

pyparsing <http://pyparsing.wikispaces.com/> is quite a cool package
for doing this sort of thing.

Thanks for mentioning pyparsing, Sam!

This is a good example of using pyparsing for just basic tokenizing,
and it will do a nice job of splitting up the tokens, whether there is
whitespace or not.

For instance, if you were tokenizing using the string split() method,
you would get nice results from "a := b + c", but not so good from "a:=
b+ c". Using Sam Pointon's simple pyparsing expression, you can split
up the arithmetic using the symbol expressions, and the whitespace is
pretty much ignored.

But pyparsing can be used for more than just tokenizing. Here is a
slightly longer pyparsing example, using a new pyparsing helper method
called operatorPrecedence, which can shortcut the definition of
operator-separated expressions with () grouping. Note how this not
only tokenizes the expression, but also identifies the implicit groups
based on operator precedence. Finally, pyparsing allows you to label
the parsed results; in this case, you can reference the left-hand and
right-hand sides (LHS and RHS) of your assignment statement using the
attribute names "lhs" and "rhs". This can be really handy for complicated grammars.

-- Paul
from pyparsing import *

number = Word(nums)
variable = Word(alphas)
operand = number | variable

# operatorPrecedence takes the operand expression and a list of
# (operator, number of operands, associativity) tuples, highest precedence
# first. Later pyparsing releases call this helper infixNotation.
arithexpr = operatorPrecedence( operand,
    [("!", 1, opAssoc.LEFT),               # factorial
     ("^", 2, opAssoc.RIGHT),              # exponentiation
     (oneOf('+ -'), 1, opAssoc.RIGHT),     # leading sign
     (oneOf('* /'), 2, opAssoc.LEFT),      # multiplication
     (oneOf('+ -'), 2, opAssoc.LEFT),]     # addition
    )

assignment = (variable.setResultsName("lhs") +
              ":=" +
              arithexpr.setResultsName("rhs"))

test = ["a:= b+c",
        "a := b + -c",
        "y := M*X + B",
        "e := m * c^2",]

for t in test:
    tokens = assignment.parseString(t)
    print(tokens.asList())
    print(tokens.lhs, "<-", tokens.rhs)
    print()

Prints:
['a', ':=', ['b', '+', 'c']]
a <- ['b', '+', 'c']

['a', ':=', ['b', '+', ['-', 'c']]]
a <- ['b', '+', ['-', 'c']]

['y', ':=', [['M', '*', 'X'], '+', 'B']]
y <- [['M', '*', 'X'], '+', 'B']

['e', ':=', ['m', '*', ['c', '^', '2']]]
e <- ['m', '*', ['c', '^', '2']]

Nov 16 '06 #10
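For comparison, similar nesting can be produced without pyparsing by hand-written precedence climbing. This is a different technique from operatorPrecedence, and every name below is hypothetical. It assumes well-formed input, omits factorial, and nests each binary operation separately, so "a+b+c" becomes [['a', '+', 'b'], '+', 'c'] rather than one flat group per precedence level.

```python
import re

# Binary operators: precedence (higher binds tighter) and associativity.
BINARY = {'+': (1, 'left'), '-': (1, 'left'),
          '*': (2, 'left'), '/': (2, 'left'),
          '^': (3, 'right')}


def tokenize(text):
    # Numbers, names, operators and parentheses; whitespace is skipped.
    return re.findall(r"\d+|[A-Za-z]+|[-+*/^()]", text)


def parse_expr(tokens, min_prec=1):
    """Precedence climbing over a token list consumed from the front."""
    lhs = parse_operand(tokens)
    while tokens and tokens[0] in BINARY and BINARY[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        prec, assoc = BINARY[op]
        # Left-associative operators need a strictly tighter right operand.
        rhs = parse_expr(tokens, prec + 1 if assoc == 'left' else prec)
        lhs = [lhs, op, rhs]
    return lhs


def parse_operand(tokens):
    tok = tokens.pop(0)
    if tok == '(':
        inner = parse_expr(tokens)
        tokens.pop(0)  # assumes the matching ')' is present
        return inner
    if tok in ('+', '-'):
        # Leading sign binds looser than '^' but tighter than '*' and '/'.
        return [tok, parse_expr(tokens, 3)]
    return tok


def parse_assignment(text):
    lhs, sep, rhs = text.partition(':=')
    return [lhs.strip(), sep, parse_expr(tokenize(rhs))]


for t in ["a:= b+c", "a := b + -c", "y := M*X + B", "e := m * c^2"]:
    print(parse_assignment(t))
```

For these four test strings it prints the same nested lists as the asList() output shown in the post.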
