
splitting delimited strings

What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.

What's the most efficient way to process this? Failing all
else I will split the string into characters and use a FSM,
but it seems that's not very pythonesque.

@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @

(this is from a perforce journal file, btw)
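A character-at-a-time scan along those lines might look something like this rough sketch (illustrative only; the function name is made up):

def scan_line(line):
    """Rough sketch of the character-scan fallback: fields are separated
    by spaces, @-quoted fields may contain a doubled @@ for a literal @."""
    fields, buf, i, n = [], [], 0, len(line)
    while i < n:
        c = line[i]
        if c == ' ':                       # outside quotes, a space ends a field
            if buf:
                fields.append(''.join(buf))
                buf = []
            i += 1
        elif c == '@':                     # start of an @-quoted field
            i += 1
            while i < n:
                if line[i] == '@':
                    if i + 1 < n and line[i + 1] == '@':
                        buf.append('@')    # doubled @@ -> literal @
                        i += 2
                        continue
                    i += 1                 # closing @
                    break
                buf.append(line[i])
                i += 1
            fields.append(''.join(buf))
            buf = []
        else:                              # bare, unquoted token
            buf.append(c)
            i += 1
    if buf:
        fields.append(''.join(buf))
    return fields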

Many TIA!
Mark

--
Mark Harrison
Pixar Animation Studios
Jul 19 '05 #1
10 Replies


Mark Harrison wrote:
What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.
Have you taken a look at the csv module yet? No guarantees, but it may
just work. You'd have to set delimiter to ' ' and quotechar to '@'. You
may need to manually handle the double-@ thing, but why don't you see
how close you can get with csv?
@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @

(this is from a perforce journal file, btw)
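A minimal sketch of that suggestion (untested; the filename is made up):

import csv

# sketch: space-delimited fields, @ as the quote character; csv's
# doublequote option defaults to True, which turns a doubled @@ inside
# a quoted field back into a single @
for row in csv.reader(open('journal.txt', 'rb'), delimiter=' ', quotechar='@'):
    print row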

--
Paul McNett
http://paulmcnett.com

Jul 19 '05 #2

You could use regular expressions... it's an FSM of some kind, but it's
faster *g*. Check this snippet out:

import re

def ifelse(cond, a, b):
    # helper: call a() or b() depending on cond
    # (stand-in for a pre-2.5 conditional expression)
    if cond:
        return a()
    return b()

def mysplit(s):
    pattern = '((?:"[^"]*")|(?:[^ ]+))'
    tmp = re.split(pattern, s)
    res = [ifelse(i[0] in ('"', "'"), lambda: i[1:-1], lambda: i)
           for i in tmp if i.strip()]
    return res

mysplit('foo bar "baz foo" bar "baz"')

['foo', 'bar', 'baz foo', 'bar', 'baz']
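Adapting the same idea to the @-quoting in the original question, with @@ as the escape, might look something like this (a sketch; the names are made up and it hasn't been run against a real journal file):

import re

# sketch: a field is either an @-quoted string (where @@ means a
# literal @) or a bare run of non-space characters
_field = re.compile(r'@((?:[^@]|@@)*)@|(\S+)')

def split_at_quoted(line):
    out = []
    for quoted, bare in _field.findall(line):
        if bare:
            out.append(bare)
        else:
            out.append(quoted.replace('@@', '@'))
    return out

split_at_quoted('@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44')
# -> ['rv', '2', 'db.locks', '//depot/hello.txt', 'mh', 'mh', '1', '1', '44']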

Jul 19 '05 #3

On Wed, 15 Jun 2005 23:03:55 +0000, Mark Harrison wrote:
What's the most efficient way to process this? Failing all
else I will split the string into characters and use a FSM,
but it seems that's not very pythonesque.


like this ?
>>> s = "@hello@world@@foo@bar"
>>> s.split("@")
['', 'hello', 'world', '', 'foo', 'bar']
>>> s2 = "hello@world@@foo@bar"
>>> s2
'hello@world@@foo@bar'
>>> s2.split("@")
['hello', 'world', '', 'foo', 'bar']


bye
Jul 19 '05 #4

Mark Harrison wrote:
What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.

What's the most efficient way to process this? Failing all
else I will split the string into characters and use a FSM,
but it seems that's not very pythonesque.

@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @

>>> import csv
>>> list(csv.reader(file('at_quotes.txt', 'rb'), delimiter=' ', quotechar='@'))
[['rv', '2', 'db.locks', '//depot/hello.txt', 'mh', 'mh', '1', '1', '44'],
 ['pv', '0', 'db.changex', '44', '44', 'mh', 'mh', '1118875308', '0', ' :@: :@@: ']]
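(The doubled @@ comes back as a single @ because csv.reader's doublequote option defaults to True; spelling that out as a quick check:)

>>> list(csv.reader(['@a@@b@'], delimiter=' ', quotechar='@', doublequote=True))
[['a@b']]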

Jul 19 '05 #5

Nicola Mingotti wrote:
On Wed, 15 Jun 2005 23:03:55 +0000, Mark Harrison wrote:

What's the most efficient way to process this? Failing all
else I will split the string into characters and use a FSM,
but it seems that's not very pythonesque.

like this ?


No, not like that. The OP said that an embedded @ was doubled.

>>> s = "@hello@world@@foo@bar"
>>> s.split("@")
['', 'hello', 'world', '', 'foo', 'bar']
>>> s2 = "hello@world@@foo@bar"
>>> s2
'hello@world@@foo@bar'
>>> s2.split("@")
['hello', 'world', '', 'foo', 'bar']

bye

Jul 19 '05 #6

Paul McNett <p@ulmcnett.com> wrote:
Mark Harrison wrote:
What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.


Have you taken a look at the csv module yet? No guarantees, but it may
just work. You'd have to set delimiter to ' ' and quotechar to '@'. You
may need to manually handle the double-@ thing, but why don't you see
how close you can get with csv?


This is great! Everything works perfectly. Even the double-@ thing
is handled by the default quotechar handling.

Thanks again,
Mark

--
Mark Harrison
Pixar Animation Studios
Jul 19 '05 #7

Mark Harrison wrote:
What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.

>>> import re
>>> _at_re = re.compile('(?<!@)@(?!@)')
>>> def split_at_line(line):
...     return [field.replace('@@', '@') for field in _at_re.split(line)]
...
>>> split_at_line('foo@bar@@baz@qux')
['foo', 'bar@baz', 'qux']
Jul 19 '05 #8

Mark -

Let me weigh in with a pyparsing entry to your puzzle. It won't be
blazingly fast, but at least it will give you another data point in
your comparison of approaches. Note that the parser can do the
string-to-int conversion for you during the parsing pass.

If @rv@ and @pv@ are record type markers, then you can use pyparsing to
create more of a parser than just a simple tokenizer, and parse out the
individual record fields into result attributes.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

test1 = "@hello@@world@@foo@bar"
test2 = """@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @"""

from pyparsing import *

AT = Literal("@")
atQuotedString = (AT.suppress() +
                  Combine(OneOrMore((~AT + SkipTo(AT)) |
                                    (AT + AT).setParseAction(replaceWith("@")))) +
                  AT.suppress())

# extract any @-quoted strings
for test in (test1, test2):
    for toks, s, e in atQuotedString.scanString(test):
        print toks
    print

# parse all tokens (assume either a positive integer or @-quoted string)
def makeInt(s, l, toks):
    return int(toks[0])

entry = OneOrMore(Word(nums).setParseAction(makeInt) | atQuotedString)

for t in test2.split("\n"):
    print entry.parseString(t)

Prints out:

['hello@world@foo']

['rv']
['db.locks']
['//depot/hello.txt']
['mh']
['mh']
['pv']
['db.changex']
['mh']
['mh']
[':@: :@@: ']

['rv', 2, 'db.locks', '//depot/hello.txt', 'mh', 'mh', 1, 1, 44]
['pv', 0, 'db.changex', 44, 44, 'mh', 'mh', 1118875308, 0, ':@: :@@: ']

Jul 19 '05 #9

On Thu, 16 Jun 2005 09:36:56 +1000, John Machin wrote:
like this ?


No, not like that. The OP said that an embedded @ was doubled.


you are right, sorry :)

anyway, if @@ -> @, then what does an empty field map to?
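For what it's worth, with the csv settings suggested earlier a standalone @@ should come through as an empty field, while @@ inside a quoted field is the escaped @ (a quick check, assuming the same delimiter/quotechar):

>>> list(csv.reader(['@@ @x@@y@'], delimiter=' ', quotechar='@'))
[['', 'x@y']]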

Jul 19 '05 #10

Leif K-Brooks wrote:
Mark Harrison wrote:
What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.


>>> import re
>>> _at_re = re.compile('(?<!@)@(?!@)')
>>> def split_at_line(line):
...     return [field.replace('@@', '@') for field in _at_re.split(line)]
...
>>> split_at_line('foo@bar@@baz@qux')
['foo', 'bar@baz', 'qux']


The plot according to the OP was that the @s were quotes, NOT delimiters.
Jul 19 '05 #11

This discussion thread is closed

Replies have been disabled for this discussion.