Hi --
I am trying to use the csv module to parse a column of values
containing comma-delimited values with unusual escaping:
AAA, BBB, CCC (some text, right here), DDD
I want this to come back as:
["AAA", "BBB", "CCC (some text, right here)", "DDD"]
I think this is probably non-standard escaping, as I can't figure out
how to structure a csv dialect to handle it correctly. I can probably
hack this with regular expressions but I thought I'd check to see if
anyone had any quick suggestions for how to do this elegantly first.
Thanks!
Ramon
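For reference, this is what the stock csv module does with that line. It is a minimal Python 3 check that assumes the line is already in memory as a string. csv.reader treats every comma as a delimiter, so the parenthesized field comes back in two pieces:

```python
import csv

line = "AAA, BBB, CCC (some text, right here), DDD"

# csv.reader accepts any iterable of strings; skipinitialspace=True drops
# the blank that follows each delimiter.
row = next(csv.reader([line], skipinitialspace=True))
print(row)
# ['AAA', 'BBB', 'CCC (some text', 'right here)', 'DDD']
```

No dialect setting changes this, because nothing in the line marks the inner comma as special apart from the surrounding parentheses.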
Ramon> I am trying to use the csv module to parse a column of values
Ramon> containing comma-delimited values with unusual escaping:
Ramon> AAA, BBB, CCC (some text, right here), DDD
Ramon> I want this to come back as:
Ramon> ["AAA", "BBB", "CCC (some text, right here)", "DDD"]
Alas, there's no "escaping" at all in the line above. I see no obvious way
to distinguish one comma from another in this example. If you mean the fact
that the comma you want to retain is in parens, that's not escaping. Escape
characters don't appear in the output as they do in your example.
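If the parentheses are the only thing that distinguishes one comma from another, one way to honor them is to track nesting depth while scanning and split only on commas at depth zero. This is a minimal Python 3 sketch; `split_outside_parens` is a hypothetical helper name, and it assumes balanced parentheses and no quoted fields:

```python
def split_outside_parens(line, sep=","):
    """Split line on sep, ignoring separators nested inside (...)."""
    fields = []
    current = []
    depth = 0
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            # Top-level separator: close off the current field.
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append("".join(current).strip())
    return fields

print(split_outside_parens("AAA, BBB, CCC (some text, right here), DDD"))
# ['AAA', 'BBB', 'CCC (some text, right here)', 'DDD']
```

Because it counts depth rather than pattern-matching, it also copes with nested groups such as "B (x, (y, z))".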
Ramon> I can probably hack this with regular expressions but I thought
Ramon> I'd check to see if anyone had any quick suggestions for how to
Ramon> do this elegantly first.
I see nothing obvious unless you truly mean that the beginning of each field
is all caps. In that case you could wrap a file object like this:
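import csv
import re
class FunnyWrapper:
    """untested"""
    def __init__(self, f):
        self.f = f
    def __iter__(self):
        return self
    def next(self):
        # Drop the line terminator, turn each comma whose next non-blank
        # character is an upper-case letter into a field boundary, then
        # quote both ends of the line.
        line = next(self.f).rstrip('\r\n')
        return '"' + re.sub(r', *([A-Z]+)', r'","\1', line) + '"'
    __next__ = next  # name used by Python 3's iterator protocol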
and use it like so:
reader = csv.reader(FunnyWrapper(open("somefile.csv", "rb")))
for row in reader:
    print(row)
(I'm not sure what the ramifications are of iterating over a file opened in
binary mode.)
Skip
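The same all-caps heuristic can be sketched in Python 3 as a generator function instead of a wrapper class. This is an illustration, not Skip's code: `requote` and `FIELD_START` are hypothetical names, and `io.StringIO` stands in for a real file, which you would open with `open("somefile.csv", newline="")`:

```python
import csv
import io
import re

# Assumed heuristic: a comma whose next non-blank character is an
# upper-case letter starts a new field; other commas stay in the field.
FIELD_START = re.compile(r",\s*(?=[A-Z])")

def requote(lines):
    """Yield each line with every field wrapped in double quotes."""
    for line in lines:
        line = line.rstrip("\r\n")
        yield '"' + FIELD_START.sub('","', line) + '"'

data = io.StringIO("AAA, BBB, CCC (some text, right here), DDD\n")
for row in csv.reader(requote(data)):
    print(row)
# ['AAA', 'BBB', 'CCC (some text, right here)', 'DDD']
```

The lookahead keeps the capital letter out of the match, so the blank after each top-level comma disappears and the fields need no stripping. A field that already contains a double quote would break the re-quoting.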
Try this.
re.findall(r'(.+?\(.+?\))(?:,|$)', yourtexthere)
Oops, the above code doesn't quite work. Use this one instead.
re.findall(r'(.+?(?:\(.+?\))?)(?:,|$)', yourtexthere)
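Here is the second pattern applied to the sample line in Python 3. It is checked only against that line. Each match keeps the blank that followed the previous comma, so the fields are stripped afterwards:

```python
import re

line = "AAA, BBB, CCC (some text, right here), DDD"

# A lazy run of characters, optionally followed by one (...) group,
# ending at a comma or at the end of the string.
FIELD = re.compile(r"(.+?(?:\(.+?\))?)(?:,|$)")

fields = [f.strip() for f in FIELD.findall(line)]
print(fields)
# ['AAA', 'BBB', 'CCC (some text, right here)', 'DDD']
```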
Well, this doesn't have the terseness of an re solution, but it
shouldn't be hard to follow.
-- Paul
#~ This is a very crude first pass. It does not handle nested
#~ ()'s, nor ()'s inside quotes. But if your data does not
#~ stray too far from the example, this will probably do the job.
#~ Download pyparsing at http://pyparsing.sourceforge.net.
import pyparsing as pp
test = "AAA, BBB , CCC (some text, right here), DDD"
COMMA = pp.Literal(",")
LPAREN = pp.Literal("(")
RPAREN = pp.Literal(")")
# Combine keeps "(some text, right here)" together as one token
parenthesizedText = pp.Combine(LPAREN + pp.SkipTo(RPAREN) + RPAREN)
nonCommaChars = "".join([chr(c) for c in range(32, 127)
                         if chr(c) not in ", ()"])
nonCommaText = pp.Word(nonCommaChars)
# joinString=" " restores the blank between "CCC" and "(" that the
# default join string ("") would drop
commaListEntry = pp.Combine(pp.OneOrMore(parenthesizedText | nonCommaText),
                            joinString=" ", adjacent=False)
commaListEntry.setParseAction(lambda s, l, t: t[0].strip())
csvList = pp.delimitedList(commaListEntry)
print(csvList.parseString(test))
Why don't you use a different delimiter when you're writing the CSV?
fe******@gmail.com writes:
> I am trying to use the csv module to parse a column of values containing comma-delimited values with unusual escaping:
> AAA, BBB, CCC (some text, right here), DDD
> I want this to come back as:
> ["AAA", "BBB", "CCC (some text, right here)", "DDD"]
Quick and somewhat dirty: change your delimiter to a character that never occurs in
fields (e.g. the null character '\0').
Example:
>>> s = 'AAA\0 BBB\0 CCC (some text, right here)\0 DDD'
>>> [f.strip() for f in s.split('\0')]
['AAA', 'BBB', 'CCC (some text, right here)', 'DDD']
But then you'd need to be certain there's no null character in the input
lines by checking for it:
colsep = '\0'
class IllegalCharException(ValueError):
    pass
# inputs: the fields you are about to join with colsep
for field in inputs:
    if colsep in field:
        raise IllegalCharException('invalid chars in field %r' % field)
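Putting the check and the split together gives a small round trip. This is a Python 3 sketch: `join_fields`, `split_fields` and `COLSEP` are hypothetical names, `IllegalCharException` is the exception named in the post (defined here so the block runs), and it assumes the null character really never appears in the data:

```python
COLSEP = "\0"  # assumed never to occur inside a field

class IllegalCharException(ValueError):
    """Raised when a field already contains the column separator."""

def join_fields(fields, colsep=COLSEP):
    # Refuse to write a field that would be split apart on the way back in.
    for field in fields:
        if colsep in field:
            raise IllegalCharException("invalid chars in field %r" % field)
    return colsep.join(fields)

def split_fields(line, colsep=COLSEP):
    return [f.strip() for f in line.split(colsep)]

fields = ["AAA", "BBB", "CCC (some text, right here)", "DDD"]
line = join_fields(fields)
assert split_fields(line) == fields
```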
If you need to stick with comma as a separator and the format is relatively
fixed, I'd probably use some parser module instead. Regular expressions are
nice too, but it is easy to make a mistake with those, and for non-trivial
stuff they tend to become write-only.
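If you do control the writer and want to keep the comma as the separator, the csv module's own quoting already protects embedded commas. This is a minimal Python 3 sketch, assuming the default dialect, whose QUOTE_MINIMAL setting quotes only the fields that need it:

```python
import csv
import io

fields = ["AAA", "BBB", "CCC (some text, right here)", "DDD"]

buf = io.StringIO()
# Only the field containing a comma gets wrapped in double quotes.
csv.writer(buf).writerow(fields)
text = buf.getvalue()
print(repr(text))
# 'AAA,BBB,"CCC (some text, right here)",DDD\r\n'

# Reading the text back recovers the original fields.
assert next(csv.reader(io.StringIO(text))) == fields
```

This only helps for files you produce yourself; it does nothing for data that already arrives unquoted.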
--
# Edvard Majakari Software Engineer
Thanks for all the postings. I can't change delimiter in the source
itself, so I'm doing it temporarily just to handle the escaping:
import re
def splitWithEscapedCommasInParens(s, trim=False):
    # Swap commas inside (...) for a temporary "|" until none are left
    pat = re.compile(r"(.+?\([^\(\),]*?),(.+?\).*)")
    while pat.search(s):
        s = pat.sub(r"\1|\2", s)
    # Split on the remaining top-level commas, then restore the inner ones
    fields = [x.replace("|", ",") for x in s.split(",")]
    if trim:
        return [x.strip() for x in fields]
    else:
        return fields
Probably not the most efficient, but it's "the simplest thing that
works" for me :-)
Thanks again for all the quick responses.
Ramon
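For flat (non-nested) parentheses the temporary-delimiter loop can also be replaced by a single `re.split` with a negative lookahead. This is a different technique from the one above, sketched in Python 3; `TOP_LEVEL_COMMA` is a hypothetical name:

```python
import re

# Split on a comma (plus any following blanks) unless the next
# parenthesis after it is a ")", i.e. unless the comma sits inside a
# parenthesized group. Assumes the parentheses are not nested.
TOP_LEVEL_COMMA = re.compile(r",\s*(?![^()]*\))")

line = "AAA, BBB, CCC (some text, right here), DDD"
print(TOP_LEVEL_COMMA.split(line))
# ['AAA', 'BBB', 'CCC (some text, right here)', 'DDD']
```

With nested groups such as "B (x, (y, z))" the lookahead sees "(" first and splits inside the outer group, so a depth-counting scan is the safer choice there.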
felciano <fe******@gmail.com> wrote:
> Thanks for all the postings. I can't change delimiter in the source
> itself, so I'm doing it temporarily just to handle the escaping:
How about changing '(' or ')' into three double quotes '"""'? That will
solve the splitting issue. But I'm not sure how you would get back '(' or
')' without much coding.
--
William Park <op**********@y ahoo.ca>, Toronto, Canada