splitting delimited strings

What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.

What's the most efficient way to process this? Failing all
else I will split the string into characters and use an FSM,
but it seems that's not very pythonesque.

@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @

(this is from a perforce journal file, btw)

Many TIA!
Mark

--
Mark Harrison
Pixar Animation Studios
Jul 19 '05 #1
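
For reference, the character-by-character state machine mentioned in the question can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `split_at_quoted` is made up here, fields are assumed to be separated by single spaces, and runs of spaces outside quotes are collapsed rather than treated as empty fields.

```python
def split_at_quoted(line):
    """Tokenize a space-separated line whose strings are quoted with @
    and whose embedded at-signs are doubled (@@). Hypothetical helper."""
    fields = []
    buf = []
    in_quotes = False   # currently inside an @...@ string
    in_field = False    # a field has been started but not yet emitted
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == "@":
                if line[i + 1:i + 2] == "@":   # doubled @@ -> literal @
                    buf.append("@")
                    i += 1
                else:                           # closing quote
                    in_quotes = False
            else:
                buf.append(ch)
        elif ch == "@":                         # opening quote
            in_quotes = True
            in_field = True
        elif ch == " ":                         # separator outside quotes
            if in_field:
                fields.append("".join(buf))
                buf = []
                in_field = False
        else:                                   # unquoted character
            buf.append(ch)
            in_field = True
        i += 1
    if in_field:                                # flush the last field
        fields.append("".join(buf))
    return fields
```

On the second sample line the last field should come back as ' :@: :@@: ', with the doubled at-signs collapsed and the inner spaces preserved.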
Mark Harrison wrote:
What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.
Have you taken a look at the csv module yet? No guarantees, but it may
just work. You'd have to set delimiter to ' ' and quotechar to '@'. You
may need to manually handle the double-@ thing, but why don't you see
how close you can get with csv?
@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @

(this is from a perforce journal file, btw)

--
Paul McNett
http://paulmcnett.com

Jul 19 '05 #2
You could use regular expressions... it's an FSM of some kind but it's
faster *g*
check this snippet out:

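import re

def mysplit(s):
    # Match either a double-quoted run or a run of non-space characters.
    pattern = '((?:"[^"]*")|(?:[^ ]+))'
    tmp = re.split(pattern, s)
    # Skip the whitespace between tokens and strip surrounding quotes.
    res = [i[1:-1] if i[0] in ('"', "'") else i for i in tmp if i.strip()]
    return res

>>> mysplit('foo bar "baz foo" bar "baz"')
['foo', 'bar', 'baz foo', 'bar', 'baz']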

Jul 19 '05 #3
On Wed, 15 Jun 2005 23:03:55 +0000, Mark Harrison wrote:
What's the most efficient way to process this? Failing all
else I will split the string into characters and use an FSM,
but it seems that's not very pythonesque.


like this ?
>>> s = "@hello@world@@foo@bar"
>>> s.split("@")
['', 'hello', 'world', '', 'foo', 'bar']
>>> s2 = "hello@world@@foo@bar"
>>> s2
'hello@world@@foo@bar'
>>> s2.split("@")
['hello', 'world', '', 'foo', 'bar']


bye
Jul 19 '05 #4
Mark Harrison wrote:
What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.

What's the most efficient way to process this? Failing all
else I will split the string into characters and use an FSM,
but it seems that's not very pythonesque.

@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @

>>> import csv
>>> list(csv.reader(file('at_quotes.txt', 'rb'), delimiter=' ', quotechar='@'))
[['rv', '2', 'db.locks', '//depot/hello.txt', 'mh', 'mh', '1', '1', '44'],
 ['pv', '0', 'db.changex', '44', '44', 'mh', 'mh', '1118875308', '0', ' :@: :@@: ']]

Jul 19 '05 #5
Nicola Mingotti wrote:
On Wed, 15 Jun 2005 23:03:55 +0000, Mark Harrison wrote:

What's the most efficient way to process this? Failing all
else I will split the string into characters and use an FSM,
but it seems that's not very pythonesque.

like this ?


No, not like that. The OP said that an embedded @ was doubled.

>>> s = "@hello@world@@foo@bar"
>>> s.split("@")
['', 'hello', 'world', '', 'foo', 'bar']
>>> s2 = "hello@world@@foo@bar"
>>> s2
'hello@world@@foo@bar'
>>> s2.split("@")
['hello', 'world', '', 'foo', 'bar']
bye

Jul 19 '05 #6
Paul McNett <p@ulmcnett.com> wrote:
Mark Harrison wrote:
What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.


Have you taken a look at the csv module yet? No guarantees, but it may
just work. You'd have to set delimiter to ' ' and quotechar to '@'. You
may need to manually handle the double-@ thing, but why don't you see
how close you can get with csv?


This is great! Everything works perfectly. Even the double-@ thing
is handled by the default quotechar handling.

Thanks again,
Mark

--
Mark Harrison
Pixar Animation Studios
Jul 19 '05 #7
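
The csv approach confirmed above can be tried without a file on disk. The sketch below is a Python 3 adaptation rather than the code posted in the thread: the helper name `read_journal` is invented for illustration, and the sample lines are fed through `io.StringIO` instead of being read from a file. The doubled at-signs are handled because the csv module's `doublequote` setting is on by default.

```python
import csv
import io

# The two sample lines from the original question.
sample = (
    "@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44\n"
    "@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @\n"
)

def read_journal(stream):
    """Parse space-delimited, @-quoted records from a text stream.
    Hypothetical helper; doublequote=True (the default) turns @@ into @."""
    return list(csv.reader(stream, delimiter=" ", quotechar="@"))

rows = read_journal(io.StringIO(sample))
for row in rows:
    print(row)
```

For a real file in Python 3, the stream would typically be opened in text mode with `open(path, newline='')` rather than with the `'rb'` mode used in the Python 2 session above. All fields come back as strings, so any integer conversion would be a separate step.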
Mark Harrison wrote:
What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.

>>> import re
>>> _at_re = re.compile('(?<!@)@(?!@)')
>>> def split_at_line(line):
...     return [field.replace('@@', '@') for field in
...             _at_re.split(line)]
...
>>> split_at_line('foo@bar@@baz@qux')
['foo', 'bar@baz', 'qux']
Jul 19 '05 #8
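
The split above treats a single @ as the field separator. For the layout in the original question, where fields are separated by spaces and @ acts as a quote character, a match-based pattern is one possible alternative. This is a sketch only: `_field_re` and `split_quoted_line` are names chosen here, and bare tokens are assumed to contain neither spaces nor at-signs.

```python
import re

# Either an @-quoted string (with @@ standing for a literal @)
# or a bare token made of non-space, non-@ characters.
_field_re = re.compile(r'@((?:[^@]|@@)*)@|([^ @]+)')

def split_quoted_line(line):
    """Return the fields of one line, unquoting @...@ strings."""
    fields = []
    for m in _field_re.finditer(line):
        if m.group(1) is not None:
            # Quoted field: collapse doubled at-signs.
            fields.append(m.group(1).replace('@@', '@'))
        else:
            # Bare token such as a number.
            fields.append(m.group(2))
    return fields
```

Because the quoted alternative consumes the opening @ before looking for doubled pairs, an empty quoted field written as @@ is matched as an empty string rather than as an escaped at-sign.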
Mark -

Let me weigh in with a pyparsing entry to your puzzle. It won't be
blazingly fast, but at least it will give you another data point in
your comparison of approaches. Note that the parser can do the
string-to-int conversion for you during the parsing pass.

If @rv@ and @pv@ are record type markers, then you can use pyparsing to
create more of a parser than just a simple tokenizer, and parse out the
individual record fields into result attributes.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

test1 = "@hello@@world@@foo@bar"
test2 = """@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @"""

from pyparsing import *

AT = Literal("@")
atQuotedString = AT.suppress() + Combine(OneOrMore((~AT + SkipTo(AT)) |
    (AT + AT).setParseAction(replaceWith("@")))) + AT.suppress()

# extract any @-quoted strings
for test in (test1, test2):
    for toks, s, e in atQuotedString.scanString(test):
        print toks
    print

# parse all tokens (assume either a positive integer or @-quoted string)
def makeInt(s, l, toks):
    return int(toks[0])
entry = OneOrMore(Word(nums).setParseAction(makeInt) | atQuotedString)

for t in test2.split("\n"):
    print entry.parseString(t)

Prints out:

['hello@world@foo']

['rv']
['db.locks']
['//depot/hello.txt']
['mh']
['mh']
['pv']
['db.changex']
['mh']
['mh']
[':@: :@@: ']

['rv', 2, 'db.locks', '//depot/hello.txt', 'mh', 'mh', 1, 1, 44]
['pv', 0, 'db.changex', 44, 44, 'mh', 'mh', 1118875308, 0, ':@: :@@: ']

Jul 19 '05 #9
On Thu, 16 Jun 2005 09:36:56 +1000, John Machin wrote:
like this ?


No, not like that. The OP said that an embedded @ was doubled.


you are right, sorry :)

anyway, if @@ -> @,
what does an empty field map to?

Jul 19 '05 #10
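
One way to explore the empty-field question is to feed single lines to the csv reader discussed earlier in the thread. The sketch below assumes Python 3, and `parse_line` is a hypothetical helper name; it relies on the fact that `csv.reader` accepts any iterable of strings, including a one-element list.

```python
import csv

def parse_line(line):
    """Parse one space-delimited, @-quoted line with the csv module."""
    return next(csv.reader([line], delimiter=" ", quotechar="@"))

empty_field = parse_line("@a@ @@ @b@")      # @@ on its own: open quote, close quote
literal_at = parse_line("@a@ @@@@ @b@")     # @@@@: a quoted field holding one @

print(empty_field)
print(literal_at)
```

Under these settings a standalone @@ is read as an opening quote followed immediately by a closing quote, so it yields an empty string, while a literal at-sign as a whole field has to be written as @@@@.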
