split a string with quoted parts into list

Hi there,

I'm experimenting with imaplib and came across strings like
(\HasNoChildren) "." "INBOX.Sent Items"
in which the quotes are part of the string.

Now I'm trying to convert this into a list. Assume the string is in the variable
f; I tried
f.split()
but I end up with
['(\\HasNoChildren)', '"."', '"INBOX.Sent', 'Items"']
so because of the space in "Sent Items" it is separated into two entries, which I
don't want.

Is there another way to convert a string with quoted sub-entries into a list
of strings?

Thanks a lot, olli
Jul 18 '05 #1
Try the standard module shlex
(http://www.python.org/dev/doc/devel/...le-shlex.html). It might be
that the quoting rules are not exactly the ones you need, though.

Daniel
Jul 18 '05 #2
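To make the shlex suggestion concrete, here is a minimal sketch in Python 3 (the helper name `split_imap_line` is made up for illustration). By default `shlex.split` applies POSIX shell rules, which treat the backslash as an escape character and so drop it from `\HasNoChildren`; that is presumably the kind of quoting mismatch warned about above. Passing `posix=False` leaves the backslash alone but keeps the quote characters on quoted tokens, so the sketch strips them afterwards.

```python
import shlex

line = r'(\HasNoChildren) "." "INBOX.Sent Items"'

# POSIX rules (the default): quotes are removed, but the backslash is
# consumed as an escape character.
posix_tokens = shlex.split(line)

def split_imap_line(text):
    """Hypothetical helper: split on whitespace, keeping quoted parts whole."""
    tokens = shlex.split(text, posix=False)
    # Non-POSIX mode keeps the surrounding quotes, so remove them here.
    return [t[1:-1] if len(t) >= 2 and t[0] == t[-1] == '"' else t
            for t in tokens]

print(posix_tokens)
print(split_imap_line(line))
```

Whether these rules are close enough to IMAP's own quoting depends on the server responses you actually see, so treat this as a starting point only.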
> is there another way to convert a string with quoted sub entries into a
> list of strings?


Try the csv module.
--
Regards,

Diez B. Roggisch
Jul 18 '05 #3
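A sketch of how the csv suggestion could look in Python 3, assuming the line can be treated as a single space-delimited record with double-quoted fields (the helper name `split_with_csv` is invented here). `csv.reader` accepts any iterable of lines, so a one-element list is enough.

```python
import csv

line = r'(\HasNoChildren) "." "INBOX.Sent Items"'

def split_with_csv(text):
    """Hypothetical helper: parse one space-delimited line with csv.reader."""
    reader = csv.reader([text], delimiter=' ', quotechar='"',
                        skipinitialspace=True)
    # The reader yields one list of fields per input line.
    return next(reader)

print(split_with_csv(line))
```

The csv module removes the quotes and leaves the backslash untouched, since no escapechar is set. It follows CSV conventions, though (a doubled quote inside a quoted field means one literal quote), which may or may not match what an IMAP server sends.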

In Twisted's protocols/imap4.py module there is a function called
parseNestedParens() that can be ripped out of the module.

I have used it for another project and put it into this attachment.

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science

"""
This code was stolen from Twisted's protocols/imap4.py module
"""

# re is needed by wildcardToRegexp; the original attachment did not import it.
import re, string

class IMAP4Exception(Exception):
    def __init__(self, *args):
        Exception.__init__(self, *args)

class MismatchedNesting(IMAP4Exception):
    pass

class MismatchedQuoting(IMAP4Exception):
    pass

def wildcardToRegexp(wildcard, delim=None):
    wildcard = wildcard.replace('*', '(?:.*?)')
    if delim is None:
        wildcard = wildcard.replace('%', '(?:.*?)')
    else:
        wildcard = wildcard.replace('%', '(?:(?:[^%s])*?)' % re.escape(delim))
    return re.compile(wildcard, re.I)

def splitQuoted(s):
    """Split a string into whitespace delimited tokens

    Tokens that would otherwise be separated but are surrounded by \"
    remain as a single token. Any token that is not quoted and is
    equal to \"NIL\" is tokenized as C{None}.

    @type s: C{str}
    @param s: The string to be split

    @rtype: C{list} of C{str}
    @return: A list of the resulting tokens

    @raise MismatchedQuoting: Raised if an odd number of quotes are present
    """
    s = s.strip()
    result = []
    inQuote = inWord = start = 0
    for (i, c) in zip(range(len(s)), s):
        if c == '"' and not inQuote:
            inQuote = 1
            start = i + 1
        elif c == '"' and inQuote:
            inQuote = 0
            result.append(s[start:i])
            start = i + 1
        elif not inWord and not inQuote and c not in ('"' + string.whitespace):
            inWord = 1
            start = i
        elif inWord and not inQuote and c in string.whitespace:
            if s[start:i] == 'NIL':
                result.append(None)
            else:
                result.append(s[start:i])
            start = i
            inWord = 0
    if inQuote:
        raise MismatchedQuoting(s)
    if inWord:
        if s[start:] == 'NIL':
            result.append(None)
        else:
            result.append(s[start:])
    return result

def splitOn(sequence, predicate, transformers):
    result = []
    mode = predicate(sequence[0])
    tmp = [sequence[0]]
    for e in sequence[1:]:
        p = predicate(e)
        if p != mode:
            result.extend(transformers[mode](tmp))
            tmp = [e]
            mode = p
        else:
            tmp.append(e)
    result.extend(transformers[mode](tmp))
    return result

def collapseStrings(results):
    """
    Turns a list of length-one strings and lists into a list of longer
    strings and lists. For example,

    ['a', 'b', ['c', 'd']] is returned as ['ab', ['cd']]

    @type results: C{list} of C{str} and C{list}
    @param results: The list to be collapsed

    @rtype: C{list} of C{str} and C{list}
    @return: A new list which is the collapsed form of C{results}
    """
    copy = []
    begun = None
    # list/tuple replace types.ListType/types.TupleType (same checks).
    listsList = [isinstance(s, list) for s in results]

    pred = lambda e: isinstance(e, tuple)
    tran = {
        0: lambda e: splitQuoted(''.join(e)),
        1: lambda e: [''.join([i[0] for i in e])]
    }
    for (i, c, isList) in zip(range(len(results)), results, listsList):
        if isList:
            if begun is not None:
                copy.extend(splitOn(results[begun:i], pred, tran))
                begun = None
            copy.append(collapseStrings(c))
        elif begun is None:
            begun = i
    if begun is not None:
        copy.extend(splitOn(results[begun:], pred, tran))
    return copy


def parseNestedParens(s, handleLiteral = 1):
    """Parse an s-exp-like string into a more useful data structure.

    @type s: C{str}
    @param s: The s-exp-like string to parse

    @rtype: C{list} of C{str} and C{list}
    @return: A list containing the tokens present in the input.

    @raise MismatchedNesting: Raised if the number or placement
    of opening or closing parenthesis is invalid.
    """
    s = s.strip()
    inQuote = 0
    contentStack = [[]]
    try:
        i = 0
        L = len(s)
        while i < L:
            c = s[i]
            if inQuote:
                if c == '\\':
                    contentStack[-1].append(s[i+1])
                    i += 2
                    continue
                elif c == '"':
                    inQuote = not inQuote
                contentStack[-1].append(c)
                i += 1
            else:
                if c == '"':
                    contentStack[-1].append(c)
                    inQuote = not inQuote
                    i += 1
                elif handleLiteral and c == '{':
                    end = s.find('}', i)
                    if end == -1:
                        raise ValueError("Malformed literal")
                    literalSize = int(s[i+1:end])
                    contentStack[-1].append((s[end+3:end+3+literalSize],))
                    i = end + 3 + literalSize
                elif c == '(' or c == '[':
                    contentStack.append([])
                    i += 1
                elif c == ')' or c == ']':
                    contentStack[-2].append(contentStack.pop())
                    i += 1
                else:
                    contentStack[-1].append(c)
                    i += 1
    except IndexError:
        raise MismatchedNesting(s)
    if len(contentStack) != 1:
        raise MismatchedNesting(s)
    return collapseStrings(contentStack[0])

if __name__=='__main__':

    # Raw string, so that \N and \U are not read as escape sequences.
    r = r'(\Noinferiors \Unmarked) "/" "INBOX"(\Unmarked) "/" "test"(\Noinferiors \Unmarked) "/" "Sent Items"(\Noinferiors \Unmarked) "/" "Calendar"(\Noinferiors \Unmarked) "/" "Checklist"(\Unmarked) "/" "Cabinet"(\Noinferiors \Marked) "/" "Trash"(\Unmarked) "/" "INBOX.Sent"(\Unmarked) "/" "Sent"'

    parsedParens = parseNestedParens(r)
    print(parsedParens)
    # Each folder entry is a (flags, separator, name) triple.
    for i in range(0, len(parsedParens), 3):
        (flags, seperator, folderName) = parsedParens[i:i+3]
        print(flags)
        print(seperator)
        print(folderName)
Jul 18 '05 #4
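For readers who only need the core idea of the attachment (quoted strings stay whole, parentheses become nested lists, a bare NIL becomes None), the following is a much smaller Python 3 sketch. It is not Twisted's code: the names `TOKEN` and `parse_nested` are invented here, and it deliberately ignores square brackets and `{n}` literals, which the attached module does handle.

```python
import re

# One token per match: a quoted string, a parenthesis, a bare atom, or a
# stray quote character that does not open a complete quoted string.
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|([()])|([^\s()"]+)|(")')

def parse_nested(text):
    """Hypothetical helper: turn '(a b) "c d"' into [['a', 'b'], 'c d']."""
    stack = [[]]
    for match in TOKEN.finditer(text):
        quoted, paren, atom, stray = match.groups()
        if stray:
            raise ValueError("unbalanced quote in %r" % text)
        if paren == '(':
            stack.append([])            # open a nested list
        elif paren == ')':
            if len(stack) < 2:
                raise ValueError("unexpected ')' in %r" % text)
            finished = stack.pop()
            stack[-1].append(finished)  # attach it to its parent
        elif atom is not None:
            stack[-1].append(None if atom == 'NIL' else atom)
        else:
            # Quoted string: drop the quotes and undo backslash escapes.
            stack[-1].append(re.sub(r'\\(.)', r'\1', quoted))
    if len(stack) != 1:
        raise ValueError("missing ')' in %r" % text)
    return stack[0]

print(parse_nested(r'(\HasNoChildren) "." "INBOX.Sent Items"'))
```

Applied to the line from the original question, the flags come back as a nested list and the two quoted parts as plain strings without their quotes.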
First break into strings, then space-split the non-strings.

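def splitup(somestring):
    gen = iter(somestring.split('"'))
    for unquoted in gen:
        for part in unquoted.split():
            yield part
        # The next piece sat between quotes; stop cleanly when none is left.
        try:
            quoted = next(gen)
        except StopIteration:
            return
        yield quoted.join('""')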

--Scott David Daniels
Sc***********@Acm.Org
Jul 18 '05 #5
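The same idea (split on the quote character first, then whitespace-split only the pieces that were outside quotes) can also be sketched as a plain function returning a list. This is a Python 3 variant with an invented name, `split_quoted`, and an invented `keep_quotes` flag; it assumes quotes are never escaped or nested.

```python
def split_quoted(text, keep_quotes=False):
    """Hypothetical helper: whitespace-split text, keeping "quoted parts" whole."""
    pieces = text.split('"')
    # Balanced quotes give an even number of quote characters and
    # therefore an odd number of pieces.
    if len(pieces) % 2 == 0:
        raise ValueError("unbalanced double quote in %r" % text)
    tokens = []
    for index, piece in enumerate(pieces):
        if index % 2:
            # Odd pieces were inside quotes: keep them as one token.
            tokens.append('"%s"' % piece if keep_quotes else piece)
        else:
            # Even pieces were outside quotes: split them on whitespace.
            tokens.extend(piece.split())
    return tokens

print(split_quoted(r'(\HasNoChildren) "." "INBOX.Sent Items"'))
```

Because odd-numbered pieces are appended unconditionally, an empty quoted string such as `""` survives as an empty token instead of disappearing.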
Oliver -

Here is a simpler approach, hopefully more readable, using pyparsing
(at http://pyparsing.sourceforge.net). I also added another test word
to your sample input line, one consisting of a lone pair of double
quotes, signifying an empty string.

-- Paul

data = r"""
(\HasNoChildren) "." "INBOX.Sent Items" ""
"""

from pyparsing import printables, Word, dblQuotedString, OneOrMore

nonQuoteChars = "".join( [ c for c in printables if c not in '"'] )
word = Word(nonQuoteChars) | dblQuotedString

words = OneOrMore(word)

for s in words.parseString(data):
    print(">%s<" % s)

Gives:
>(\HasNoChildren)<
>"."<
>"INBOX.Sent Items"<
>""<

But really, I'm guessing that you'd rather not have the quote
characters in there either. It's simple enough to have pyparsing
remove them when a dblQuotedString is found:

# add a parse action to remove the double quote characters
# one of the beauties of parse actions is that there is no need to
# verify that the first and last characters are "'s - this function
# is never called unless the tokens in tokenslist match the
# required expression
def removeDblQuotes(st,loc,tokenslist):
    return tokenslist[0][1:-1]
dblQuotedString.setParseAction( removeDblQuotes )

for s in words.parseString(data):
    print(">%s<" % s)

Gives:
>(\HasNoChildren)<
>.<
>INBOX.Sent Items<
><


Jul 18 '05 #6
