473,320 Members | 2,112 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

Regular expressions to parse a CSV line

MW
Dear All

Does anyone have a regular expression to parse a comma delimited line with
some fields optionally having string delimiters (text qualifiers)
I am currently testing with this regular expression and it works in almost
all my test cases. I found this on the internet in a C# solution.

,(?=([^\"]*"[^"]*")*(?![^"]*"))

However in some of my test cases it fails and I am having difficulty
interpreting it.

The VB.NET function I used is

Public Function parseCSVLine(ByVal sInputString As String) As ArrayList
Dim r As New Regex(",(?=([^\" & Chr(34) & "]*" & Chr(34) & "[^" &
Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))")
Dim iStart As Integer, m As Match
Dim oArrayList As New ArrayList()

For Each m In r.Matches(sInputString)
oArrayList.Add(sInputString.Substring(iStart, m.Index - iStart))
iStart = m.Index + 1
Next
oArrayList.Add(sInputString.Substring(iStart, sInputString.Length -
iStart))

Return oArrayList
End Function

My test cases are as follows:
#
CSV
Value 1
Value 2
Value 3
Value 4
Results

1
a,b,c
a
b
c

P

2
"a",b,c
a
b
c

P

3
'a',b,c
'a'
b
c

P

4
a , b , c
a
b
c

P

5
aa,bb;cc
aa
bb;cc
P

6

P

7
a
a

P

8
,b,

b
P

9
,,c
c

P

10
,,


P

11
"",b

b
P

12
" ",b
[SPACE]
b
P

13
"a,b"
a,b

P

14
"a,b",c
a,b
c
P

15
" a , b ", c
a , b
c
P

16
a b,c
a b
c
P

17
a"b,c
a"b
C
P

18
"a""b",c
a"b
c
P

19
a""b,c
a""b
c
P

20
a,b",c
a
b"
c

O

21
a,b"",c
a
b""
c

P

22
a,"B: ""Hi, I'm B""",c
a
B: "Hi, I'm B"
c

P

23
a,"b,c
a
"b
c

O

24
a,bc"d,e
a
bc"d
e
O

25
a,bc"d",e
a
bc"d"
e

O

26
a,"bc"d,e
a
"bc"d
e

O

Many thanks,
Wazir
Jul 21 '05 #1
18 12303
MW
Apologies for the formatting of test cases, I didnt realise I was posting in
Plain Text.

Here they are again, I hope it is more readable this time.

Some of them like case 20 doesnt work with the regular expression

# CSV Value 1 Value 2 Value 3 Value 4
1 a,b,c a b c
2 "a",b,c a b c
3 'a',b,c 'a' b c
4 a , b , c a b c
5 aa,bb;cc aa bb;cc
6
7 a a
8 ,b, b
9 ,,c c
10 ,,
11 "",b b
12 " ",b [SPACE] b
13 "a,b" a,b
14 "a,b",c a,b c
15 " a , b ", c a , b c
16 a b,c a b c
17 a"b,c a"b c
18 "a""b",c a"b c
19 a""b,c a""b c
20 a,b",c a b" c
21 a,b"",c a b"" c
22 a,"B: ""Hi, I'm B""",c a B: "Hi, I'm B"
c
23 a,"b,c a "b c
24 a,bc"d,e a bc"d e
25 a,bc"d",e a bc"d" e
26 a,"bc"d,e a "bc"d e
Jul 21 '05 #2
MW
Apologies for the formatting of test cases, I didnt realise I was posting in
Plain Text.

Here they are again, I hope it is more readable this time.

Some of them like case 20 doesnt work with the regular expression

# CSV Value 1 Value 2 Value 3 Value 4
1 a,b,c a b c
2 "a",b,c a b c
3 'a',b,c 'a' b c
4 a , b , c a b c
5 aa,bb;cc aa bb;cc
6
7 a a
8 ,b, b
9 ,,c c
10 ,,
11 "",b b
12 " ",b [SPACE] b
13 "a,b" a,b
14 "a,b",c a,b c
15 " a , b ", c a , b c
16 a b,c a b c
17 a"b,c a"b c
18 "a""b",c a"b c
19 a""b,c a""b c
20 a,b",c a b" c
21 a,b"",c a b"" c
22 a,"B: ""Hi, I'm B""",c a B: "Hi, I'm B"
c
23 a,"b,c a "b c
24 a,bc"d,e a bc"d e
25 a,bc"d",e a bc"d" e
26 a,"bc"d,e a "bc"d e
Jul 21 '05 #3
You can use the OleDbCommand class to read a csv file

Tu-Thac
www.ongtech.co

----- MW wrote: ----

Dear Al

Does anyone have a regular expression to parse a comma delimited line wit
some fields optionally having string delimiters (text qualifiers
I am currently testing with this regular expression and it works in almos
all my test cases. I found this on the internet in a C# solution

,(?=([^\"]*"[^"]*")*(?![^"]*")

However in some of my test cases it fails and I am having difficult
interpreting it

The VB.NET function I used i

Public Function parseCSVLine(ByVal sInputString As String) As ArrayLis
Dim r As New Regex(",(?=([^\" & Chr(34) & "]*" & Chr(34) & "[^"
Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"
Dim iStart As Integer, m As Matc
Dim oArrayList As New ArrayList(

For Each m In r.Matches(sInputString
oArrayList.Add(sInputString.Substring(iStart, m.Index - iStart)
iStart = m.Index +
Nex
oArrayList.Add(sInputString.Substring(iStart, sInputString.Length
iStart)

Return oArrayLis
End Functio

My test cases are as follows

CS
Value
Value
Value
Value
Result
a,b,



"a",b,



'a',b,
'a


a , b ,



aa,bb;c
a
bb;c







,b



,,


1
,


1
"",


1
" ",
[SPACE

1
"a,b
a,

1
"a,b",
a,

1
" a , b ",
a ,

1
a b,
a

1
a"b,
a"

1
"a""b",
a"

1
a""b,
a""

2
a,b",

b


2
a,b"",

b"


2
a,"B: ""Hi, I'm B""",

B: "Hi, I'm B


2
a,"b,

"


2
a,bc"d,

bc"

2
a,bc"d",

bc"d


2
a,"bc"d,

"bc"


Many thanks
Wazi

Jul 21 '05 #4
You can use the OleDbCommand class to read a csv file

Tu-Thac
www.ongtech.co

----- MW wrote: ----

Dear Al

Does anyone have a regular expression to parse a comma delimited line wit
some fields optionally having string delimiters (text qualifiers
I am currently testing with this regular expression and it works in almos
all my test cases. I found this on the internet in a C# solution

,(?=([^\"]*"[^"]*")*(?![^"]*")

However in some of my test cases it fails and I am having difficult
interpreting it

The VB.NET function I used i

Public Function parseCSVLine(ByVal sInputString As String) As ArrayLis
Dim r As New Regex(",(?=([^\" & Chr(34) & "]*" & Chr(34) & "[^"
Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"
Dim iStart As Integer, m As Matc
Dim oArrayList As New ArrayList(

For Each m In r.Matches(sInputString
oArrayList.Add(sInputString.Substring(iStart, m.Index - iStart)
iStart = m.Index +
Nex
oArrayList.Add(sInputString.Substring(iStart, sInputString.Length
iStart)

Return oArrayLis
End Functio

My test cases are as follows

CS
Value
Value
Value
Value
Result
a,b,



"a",b,



'a',b,
'a


a , b ,



aa,bb;c
a
bb;c







,b



,,


1
,


1
"",


1
" ",
[SPACE

1
"a,b
a,

1
"a,b",
a,

1
" a , b ",
a ,

1
a b,
a

1
a"b,
a"

1
"a""b",
a"

1
a""b,
a""

2
a,b",

b


2
a,b"",

b"


2
a,"B: ""Hi, I'm B""",

B: "Hi, I'm B


2
a,"b,

"


2
a,bc"d,

bc"

2
a,bc"d",

bc"d


2
a,"bc"d,

"bc"


Many thanks
Wazi

Jul 21 '05 #5
In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote:
Dear All

Does anyone have a regular expression to parse a comma delimited line with
some fields optionally having string delimiters (text qualifiers)


I just replied to a similar post.

String object has Split call:

"a,b,c,de,fg,hegk".Split(',')
Jul 21 '05 #6
In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote:
Dear All

Does anyone have a regular expression to parse a comma delimited line with
some fields optionally having string delimiters (text qualifiers)


I just replied to a similar post.

String object has Split call:

"a,b,c,de,fg,hegk".Split(',')
Jul 21 '05 #7
>> Does anyone have a regular expression to parse a comma delimited line with
some fields optionally having string delimiters (text qualifiers)
I just replied to a similar post.

String object has Split call:

"a,b,c,de,fg,hegk".Split(',')


This is a problem that is known for a long time to the Perl programmers.
It cannot be solved with regular expressions only.
The simple string above is works, but what about the next one, also valid
CSV:

Item 1,"string with comma, not to split",1234,"now, does it work?",done

This should be split into
Item 1
string with comma, not to split
1234
now, does it work?
done

--
Mihai
-------------------------
Replace _year_ with _ to get the real email
Jul 21 '05 #8
>> Does anyone have a regular expression to parse a comma delimited line with
some fields optionally having string delimiters (text qualifiers)
I just replied to a similar post.

String object has Split call:

"a,b,c,de,fg,hegk".Split(',')


This is a problem that is known for a long time to the Perl programmers.
It cannot be solved with regular expressions only.
The simple string above is works, but what about the next one, also valid
CSV:

Item 1,"string with comma, not to split",1234,"now, does it work?",done

This should be split into
Item 1
string with comma, not to split
1234
now, does it work?
done

--
Mihai
-------------------------
Replace _year_ with _ to get the real email
Jul 21 '05 #9
MW
Sergei,

Split will only work for a comma delimited file and will not work with text
qualifiers (string delimiters)

"Sergei Gnezdov" <no****@blackhole.net> wrote in message
news:sl*******************@STA01052.itron.com...
In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote:
Dear All

Does anyone have a regular expression to parse a comma delimited line with some fields optionally having string delimiters (text qualifiers)


I just replied to a similar post.

String object has Split call:

"a,b,c,de,fg,hegk".Split(',')

Jul 21 '05 #10
MW
Sergei,

Split will only work for a comma delimited file and will not work with text
qualifiers (string delimiters)

"Sergei Gnezdov" <no****@blackhole.net> wrote in message
news:sl*******************@STA01052.itron.com...
In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote:
Dear All

Does anyone have a regular expression to parse a comma delimited line with some fields optionally having string delimiters (text qualifiers)


I just replied to a similar post.

String object has Split call:

"a,b,c,de,fg,hegk".Split(',')

Jul 21 '05 #11
so when you iterate Split's result, replace (^"|$") with nothing...

MW wrote:
Sergei,

Split will only work for a comma delimited file and will not work with text
qualifiers (string delimiters)

"Sergei Gnezdov" <no****@blackhole.net> wrote in message
news:sl*******************@STA01052.itron.com...
In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote:
Dear All

Does anyone have a regular expression to parse a comma delimited line
with
some fields optionally having string delimiters (text qualifiers)


I just replied to a similar post.

String object has Split call:

"a,b,c,de,fg,hegk".Split(',')


Jul 21 '05 #12
MW
Uri,

The split itself will be incorrect if there is a comma in a column with a
string delimiter

For example:
123, "Mathew, Dale", 24, "london"

Thanks,
Wazir

"Uri Dor" <re***************@mivzak.com> wrote in message
news:uT**************@TK2MSFTNGP12.phx.gbl...
so when you iterate Split's result, replace (^"|$") with nothing...

Jul 21 '05 #13
so when you iterate Split's result, replace (^"|$") with nothing...

MW wrote:
Sergei,

Split will only work for a comma delimited file and will not work with text
qualifiers (string delimiters)

"Sergei Gnezdov" <no****@blackhole.net> wrote in message
news:sl*******************@STA01052.itron.com...
In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote:
Dear All

Does anyone have a regular expression to parse a comma delimited line
with
some fields optionally having string delimiters (text qualifiers)


I just replied to a similar post.

String object has Split call:

"a,b,c,de,fg,hegk".Split(',')


Jul 21 '05 #14
MW
Uri,

The split itself will be incorrect if there is a comma in a column with a
string delimiter

For example:
123, "Mathew, Dale", 24, "london"

Thanks,
Wazir

"Uri Dor" <re***************@mivzak.com> wrote in message
news:uT**************@TK2MSFTNGP12.phx.gbl...
so when you iterate Split's result, replace (^"|$") with nothing...

Jul 21 '05 #15
not everything can be done with a regex. maybe you need a parser. a
quick search yielded this (didn't try it myself):
http://www.codeproject.com/csharp/sp...&select=697728
it's a port from Boost's Spirit

MW wrote:
Uri,

The split itself will be incorrect if there is a comma in a column with a
string delimiter

For example:
123, "Mathew, Dale", 24, "london"

Thanks,
Wazir

"Uri Dor" <re***************@mivzak.com> wrote in message
news:uT**************@TK2MSFTNGP12.phx.gbl...
so when you iterate Split's result, replace (^"|$") with nothing...


Jul 21 '05 #16
not everything can be done with a regex. maybe you need a parser. a
quick search yielded this (didn't try it myself):
http://www.codeproject.com/csharp/sp...&select=697728
it's a port from Boost's Spirit

MW wrote:
Uri,

The split itself will be incorrect if there is a comma in a column with a
string delimiter

For example:
123, "Mathew, Dale", 24, "london"

Thanks,
Wazir

"Uri Dor" <re***************@mivzak.com> wrote in message
news:uT**************@TK2MSFTNGP12.phx.gbl...
so when you iterate Split's result, replace (^"|$") with nothing...


Jul 21 '05 #17
On 2004-04-19, Uri Dor <re***************@mivzak.com> wrote:
so when you iterate Split's result, replace (^"|$") with nothing...

MW wrote:
Sergei,

Split will only work for a comma delimited file and will not work with text
qualifiers (string delimiters)


This is true. You could parse CVS line in like this:
1) replace qualifiers with some unique id in a hashtable
2) run split
3) replace ids with with their content

Since the description above is no longer that simple, I'd go for parse
it yourself. It is not as hard as it sounds. It will involve one loop,
and a couple variables: start of the substring and a qualifier state
(on/off). When qualifier is on you search for the end of the qualifier.

I think that regular expressions are too hard to work with in this case,
but that's me.
Jul 21 '05 #18
On 2004-04-19, Uri Dor <re***************@mivzak.com> wrote:
so when you iterate Split's result, replace (^"|$") with nothing...

MW wrote:
Sergei,

Split will only work for a comma delimited file and will not work with text
qualifiers (string delimiters)


This is true. You could parse CVS line in like this:
1) replace qualifiers with some unique id in a hashtable
2) run split
3) replace ids with with their content

Since the description above is no longer that simple, I'd go for parse
it yourself. It is not as hard as it sounds. It will involve one loop,
and a couple variables: start of the substring and a qualifier state
(on/off). When qualifier is on you search for the end of the qualifier.

I think that regular expressions are too hard to work with in this case,
but that's me.
Jul 21 '05 #19

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: dmbkiwi | last post by:
I have a problem. I have written a python based theme for a linux app called superkaramba, which is effectively an engine for desktop applets that utilises python as its theming language. The...
8
by: Michael McGarry | last post by:
Hi, I am horrible with Regular Expressions, can anyone recommend a book on it? Also I am trying to parse the following string to extract the number after load average. ".... load average:...
11
by: Martin Robins | last post by:
I am trying to parse a string that is similar in form to an OLEDB connection string using regular expressions; in principle it is working, but certain character combinations in the string being...
10
by: hclugano | last post by:
Hi! There is a SQL Create Table statement, for example: CREATE TABLE . ( IDENTITY (1, 1) NOT NULL, NOT NULL, ( 15) NOT NULL ..... ) ON
8
by: moondaddy | last post by:
I'm writing an app in vb.net 1.1 and I need to parse strings that look similar to the one below. All 5 rows will make up one string. I have a form where a use can copy/paste data like what you...
18
by: MW | last post by:
Dear All Does anyone have a regular expression to parse a comma delimited line with some fields optionally having string delimiters (text qualifiers) I am currently testing with this regular...
6
by: John Salerno | last post by:
Ok, this might look familiar. I'd like to use regular expressions to change this line: self.source += '<p>' + paragraph + '</p>\n\n' to read: self.source += '<p>%s</p>\n\n' % paragraph ...
25
by: Mike | last post by:
I have a regular expression (^(.+)(?=\s*).*\1 ) that results in matches. I would like to get what the actual regular expression is. In other words, when I apply ^(.+)(?=\s*).*\1 to " HEART...
13
by: Wiseman | last post by:
I'm kind of disappointed with the re regular expressions module. In particular, the lack of support for recursion ( (?R) or (?n) ) is a major drawback to me. There are so many great things that can...
0
by: DolphinDB | last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation. Take...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
0
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.