What is the best way to process a text file of delimited strings?
I've got a file where strings are quoted with at-signs, @like this@.
At-signs in the string are represented as doubled @@.
What's the most efficient way to process this? Failing all
else I will split the string into characters and use an FSM,
but it seems that's not very pythonesque.
@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @
(this is from a perforce journal file, btw)
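The character-by-character state machine mentioned above is not much code. Below is a minimal Python 3 sketch. It assumes, as in the sample, that fields are separated by single spaces, that unquoted fields never contain @, and that @@ inside an @-quoted field stands for one literal @. The name parse_journal_line is hypothetical.

```python
def parse_journal_line(line):
    """Split one journal line into fields with a small state machine.

    Assumes single spaces between fields; inside @...@, '@@' means '@'.
    """
    line = line.rstrip("\r\n")
    fields, buf = [], []
    state = "between"  # one of: between, quoted, bare
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if state == "between":
            if ch == "@":
                state = "quoted"              # opening @ of a quoted field
            elif ch != " ":
                buf.append(ch)
                state = "bare"                # unquoted field such as 44
        elif state == "quoted":
            if ch == "@" and i + 1 < n and line[i + 1] == "@":
                buf.append("@")               # doubled @ -> one literal @
                i += 1                        # skip the second @
            elif ch == "@":
                fields.append("".join(buf))   # closing @ ends the field
                buf, state = [], "between"
            else:
                buf.append(ch)
        else:  # state == "bare"
            if ch == " ":
                fields.append("".join(buf))
                buf, state = [], "between"
            else:
                buf.append(ch)
        i += 1
    if state == "quoted":
        raise ValueError("unterminated @-quoted field")
    if state == "bare":
        fields.append("".join(buf))
    return fields


line = "@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @"
print(parse_journal_line(line))
```

Note that an @@ with a space after it closes the field, so a lone @@ between spaces comes out as an empty string.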
Many TIA!
Mark
--
Mark Harrison
Pixar Animation Studios 10 1777
Mark Harrison wrote: What is the best way to process a text file of delimited strings? I've got a file where strings are quoted with at-signs, @like this@. At-signs in the string are represented as doubled @@.
Have you taken a look at the csv module yet? No guarantees, but it may
just work. You'd have to set delimiter to ' ' and quotechar to '@'. You
may need to manually handle the double-@ thing, but why don't you see
how close you can get with csv?
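The csv suggestion can be tried directly. Below is a minimal Python 3 sketch (the code elsewhere in the thread is Python 2). The names read_journal and SAMPLE are hypothetical, and the data is the OP's two sample lines held in memory with io.StringIO. Only documented csv.reader parameters are used. Setting doublequote=True is the default and is spelled out here because it is what turns @@ inside a quoted field back into @.

```python
import csv
import io

# The OP's two sample journal lines.
SAMPLE = (
    "@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44\n"
    "@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @\n"
)


def read_journal(stream):
    # Space-delimited fields, '@' as the quote character; with
    # doublequote=True a doubled '@@' inside quotes becomes one '@'.
    return list(csv.reader(stream, delimiter=" ", quotechar="@",
                           doublequote=True))


rows = read_journal(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

For a real journal file, open it with open(path, newline="") as the csv documentation recommends and pass that file object instead of the StringIO. All fields come back as strings, including the numeric ones.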
--
Paul McNett http://paulmcnett.com
You could use regular expressions... it's an FSM of some kind but it's
faster *g*
check this snippet out:
import re

def mysplit(s):
    # a double-quoted chunk, or a run of non-space characters
    pattern = '((?:"[^"]*")|(?:[^ ]+))'
    tmp = re.split(pattern, s)
    # drop the whitespace pieces; strip the surrounding quotes from quoted chunks
    return [i[1:-1] if i[0] in ('"', "'") else i for i in tmp if i.strip()]

>>> mysplit('foo bar "baz foo" bar "baz"')
['foo', 'bar', 'baz foo', 'bar', 'baz']
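The snippet above handles double quotes. Adapting the same regular-expression idea to the @ format mostly means allowing @@ inside a quoted field. Below is a minimal Python 3 sketch. It uses re.finditer instead of re.split so that each match says whether it was quoted or bare. TOKEN and split_at_quoted are hypothetical names, and the format rules are the ones stated by the OP.

```python
import re

# A field is either @...@ (where @@ inside stands for a literal @)
# or a bare run of non-blank characters such as 44.
TOKEN = re.compile(r"@((?:[^@]|@@)*)@|(\S+)")


def split_at_quoted(line):
    fields = []
    for m in TOKEN.finditer(line):
        quoted, bare = m.group(1), m.group(2)
        if quoted is not None:
            fields.append(quoted.replace("@@", "@"))  # undo the doubling
        else:
            fields.append(bare)
    return fields


print(split_at_quoted(
    "@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @"))
```

The two alternatives inside the starred group cannot both match at the same position, because one starts with a non-@ and the other with @. This keeps backtracking linear.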
On Wed, 15 Jun 2005 23:03:55 +0000, Mark Harrison wrote: What's the most efficient way to process this? Failing all else I will split the string into characters and use an FSM, but it seems that's not very pythonesque.
like this ?
>>> s = "@hello@world@@foo@bar"
>>> s.split("@")
['', 'hello', 'world', '', 'foo', 'bar']
>>> s2 = "hello@world@@foo@bar"
>>> s2
'hello@world@@foo@bar'
>>> s2.split("@")
['hello', 'world', '', 'foo', 'bar']
bye
Mark Harrison wrote: What is the best way to process a text file of delimited strings? I've got a file where strings are quoted with at-signs, @like this@. At-signs in the string are represented as doubled @@.
What's the most efficient way to process this? Failing all else I will split the string into characters and use an FSM, but it seems that's not very pythonesque.
@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @
>>> import csv
>>> list(csv.reader(file('at_quotes.txt', 'rb'), delimiter=' ',
...                 quotechar='@'))
[['rv', '2', 'db.locks', '//depot/hello.txt', 'mh', 'mh', '1', '1', '44'],
 ['pv', '0', 'db.changex', '44', '44', 'mh', 'mh', '1118875308', '0', ' :@: :@@: ']]
Nicola Mingotti wrote: On Wed, 15 Jun 2005 23:03:55 +0000, Mark Harrison wrote:
What's the most efficient way to process this? Failing all else I will split the string into characters and use an FSM, but it seems that's not very pythonesque.
like this ?
No, not like that. The OP said that an embedded @ was doubled.
Paul McNett <p@ulmcnett.com> wrote: Mark Harrison wrote: What is the best way to process a text file of delimited strings? I've got a file where strings are quoted with at-signs, @like this@. At-signs in the string are represented as doubled @@.
Have you taken a look at the csv module yet? No guarantees, but it may just work. You'd have to set delimiter to ' ' and quotechar to '@'. You may need to manually handle the double-@ thing, but why don't you see how close you can get with csv?
This is great! Everything works perfectly. Even the double-@ thing
is handled by the default quotechar handling.
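The "default quotechar handling" here is the csv module's doublequote option, which defaults to True and also applies when writing. Below is a sketch of writing records back out in the same format, in Python 3. It assumes numeric fields are already ints, so that csv.QUOTE_NONNUMERIC leaves them bare and puts @ around everything else. write_journal is a hypothetical helper.

```python
import csv
import io


def write_journal(rows):
    out = io.StringIO()
    # QUOTE_NONNUMERIC quotes every non-numeric field with '@'; with
    # doublequote=True (the default) an embedded '@' is written as '@@'.
    writer = csv.writer(out, delimiter=" ", quotechar="@",
                        quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


rows = [
    ["rv", 2, "db.locks", "//depot/hello.txt", "mh", "mh", 1, 1, 44],
    ["pv", 0, "db.changex", 44, 44, "mh", "mh", 1118875308, 0, " :@: :@@: "],
]
print(write_journal(rows), end="")
```

With these inputs the output should match the OP's two sample lines byte for byte. That makes a quick round-trip check of a reader.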
Thanks again,
Mark
Mark Harrison wrote: What is the best way to process a text file of delimited strings? I've got a file where strings are quoted with at-signs, @like this@. At-signs in the string are represented as doubled @@.
>>> import re
>>> _at_re = re.compile('(?<!@)@(?!@)')
>>> def split_at_line(line):
...     return [field.replace('@@', '@') for field in
...             _at_re.split(line)]
...
>>> split_at_line('foo@bar@@baz@qux')
['foo', 'bar@baz', 'qux']
Mark -
Let me weigh in with a pyparsing entry to your puzzle. It won't be
blazingly fast, but at least it will give you another data point in
your comparison of approaches. Note that the parser can do the
string-to-int conversion for you during the parsing pass.
If @rv@ and @pv@ are record type markers, then you can use pyparsing to
create more of a parser than just a simple tokenizer, and parse out the
individual record fields into result attributes.
Download pyparsing at http://pyparsing.sourceforge.net.
-- Paul
test1 = "@hello@@world@ @foo@bar"
test2 = """@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @"""

from pyparsing import *

AT = Literal("@")
# an @-quoted string: runs of text, with each doubled @ collapsed to one @
atQuotedString = AT.suppress() + Combine(OneOrMore((~AT + SkipTo(AT)) |
                 (AT + AT).setParseAction(replaceWith("@")))) + AT.suppress()

# extract any @-quoted strings
for test in (test1, test2):
    for toks, s, e in atQuotedString.scanString(test):
        print(toks)
    print("")

# parse all tokens (assume either a positive integer or @-quoted string)
def makeInt(s, l, toks):
    return int(toks[0])

entry = OneOrMore(Word(nums).setParseAction(makeInt) | atQuotedString)
for t in test2.split("\n"):
    print(entry.parseString(t))
Prints out:
['hello@world@foo']
['rv']
['db.locks']
['//depot/hello.txt']
['mh']
['mh']
['pv']
['db.changex']
['mh']
['mh']
[':@: :@@: ']
['rv', 2, 'db.locks', '//depot/hello.txt', 'mh', 'mh', 1, 1, 44]
['pv', 0, 'db.changex', 44, 44, 'mh', 'mh', 1118875308, 0, ':@: :@@: ']
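The record-type idea can also be sketched without pyparsing, starting from the string fields that csv (or any of the splitters above) returns. This is a Python 3 sketch under loud assumptions. The field names in JournalRecord are hypothetical and are inferred only from the two sample lines: a marker such as rv or pv, a number, a table name, then everything else. make_record and to_value are hypothetical helpers. Unlike the pyparsing grammar, which converts only unquoted digit runs, this also turns a quoted field like @44@ into an int, because csv does not report which fields were quoted.

```python
import re
from typing import List, NamedTuple, Union

Value = Union[int, str]


class JournalRecord(NamedTuple):
    # Hypothetical layout inferred from the sample lines only.
    kind: str          # e.g. 'rv' or 'pv'
    number: int        # the second field, e.g. 2 or 0
    table: str         # e.g. 'db.locks'
    values: List[Value]


def to_value(field: str) -> Value:
    # Like makeInt above: runs of ASCII digits become ints.
    return int(field) if re.fullmatch(r"[0-9]+", field) else field


def make_record(fields: List[str]) -> JournalRecord:
    kind, number, table, *rest = fields
    return JournalRecord(kind, int(number), table, [to_value(f) for f in rest])


rec = make_record(["pv", "0", "db.changex", "44", "44", "mh", "mh",
                   "1118875308", "0", " :@: :@@: "])
print(rec.kind, rec.table, rec.values)
```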
On Thu, 16 Jun 2005 09:36:56 +1000, John Machin wrote: like this ?
No, not like that. The OP said that an embedded @ was doubled.
you are right, sorry :)
anyway, if @@ -> @,
what does an empty field map to?
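With the csv settings used earlier in the thread, the two cases do not collide. An @@ standing alone between delimiters is an empty quoted field, and @@@@ is a quoted field holding one @. This is because the opening @ is consumed before any doubling is considered. Below is a small Python 3 check of both directions. The sample line is made up for illustration.

```python
import csv
import io

# Reading: a lone @@ is an empty field; @@@@ is a field containing '@'.
row = next(csv.reader(io.StringIO("@a@ @@ @@@@ 7\n"),
                      delimiter=" ", quotechar="@"))
print(row)

# Writing goes the other way: the empty string is written back as @@.
out = io.StringIO()
csv.writer(out, delimiter=" ", quotechar="@",
           quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n").writerow(row)
print(out.getvalue(), end="")
```

The written line quotes the 7 as @7@, because after reading it is a string rather than an int.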