Dear All
Does anyone have a regular expression to parse a comma delimited line with
some fields optionally having string delimiters (text qualifiers)
I am currently testing with this regular expression and it works in almost
all my test cases. I found this on the internet in a C# solution.
,(?=([^\"]*"[^"]*")*(?![^"]*"))
However in some of my test cases it fails and I am having difficulty
interpreting it.
The VB.NET function I used is
Public Function parseCSVLine(ByVal sInputString As String) As ArrayList
Dim r As New Regex(",(?=([^\" & Chr(34) & "]*" & Chr(34) & "[^" &
Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))")
Dim iStart As Integer, m As Match
Dim oArrayList As New ArrayList()
For Each m In r.Matches(sInputString)
oArrayList.Add(sInputString.Substring(iStart, m.Index - iStart))
iStart = m.Index + 1
Next
oArrayList.Add(sInputString.Substring(iStart, sInputString.Length -
iStart))
Return oArrayList
End Function
My test cases are as follows:
#
CSV
Value 1
Value 2
Value 3
Value 4
Results
1
a,b,c
a
b
c
P
2
"a",b,c
a
b
c
P
3
'a',b,c
'a'
b
c
P
4
a , b , c
a
b
c
P
5
aa,bb;cc
aa
bb;cc
P
6
P
7
a
a
P
8
,b,
b
P
9
,,c
c
P
10
,,
P
11
"",b
b
P
12
" ",b
[SPACE]
b
P
13
"a,b"
a,b
P
14
"a,b",c
a,b
c
P
15
" a , b ", c
a , b
c
P
16
a b,c
a b
c
P
17
a"b,c
a"b
C
P
18
"a""b",c
a"b
c
P
19
a""b,c
a""b
c
P
20
a,b",c
a
b"
c
O
21
a,b"",c
a
b""
c
P
22
a,"B: ""Hi, I'm B""",c
a
B: "Hi, I'm B"
c
P
23
a,"b,c
a
"b
c
O
24
a,bc"d,e
a
bc"d
e
O
25
a,bc"d",e
a
bc"d"
e
O
26
a,"bc"d,e
a
"bc"d
e
O
Many thanks,
Wazir 18 12303
Apologies for the formatting of test cases, I didnt realise I was posting in
Plain Text.
Here they are again, I hope it is more readable this time.
Some of them like case 20 doesnt work with the regular expression
# CSV Value 1 Value 2 Value 3 Value 4
1 a,b,c a b c
2 "a",b,c a b c
3 'a',b,c 'a' b c
4 a , b , c a b c
5 aa,bb;cc aa bb;cc
6
7 a a
8 ,b, b
9 ,,c c
10 ,,
11 "",b b
12 " ",b [SPACE] b
13 "a,b" a,b
14 "a,b",c a,b c
15 " a , b ", c a , b c
16 a b,c a b c
17 a"b,c a"b c
18 "a""b",c a"b c
19 a""b,c a""b c
20 a,b",c a b" c
21 a,b"",c a b"" c
22 a,"B: ""Hi, I'm B""",c a B: "Hi, I'm B"
c
23 a,"b,c a "b c
24 a,bc"d,e a bc"d e
25 a,bc"d",e a bc"d" e
26 a,"bc"d,e a "bc"d e
Apologies for the formatting of test cases, I didnt realise I was posting in
Plain Text.
Here they are again, I hope it is more readable this time.
Some of them like case 20 doesnt work with the regular expression
# CSV Value 1 Value 2 Value 3 Value 4
1 a,b,c a b c
2 "a",b,c a b c
3 'a',b,c 'a' b c
4 a , b , c a b c
5 aa,bb;cc aa bb;cc
6
7 a a
8 ,b, b
9 ,,c c
10 ,,
11 "",b b
12 " ",b [SPACE] b
13 "a,b" a,b
14 "a,b",c a,b c
15 " a , b ", c a , b c
16 a b,c a b c
17 a"b,c a"b c
18 "a""b",c a"b c
19 a""b,c a""b c
20 a,b",c a b" c
21 a,b"",c a b"" c
22 a,"B: ""Hi, I'm B""",c a B: "Hi, I'm B"
c
23 a,"b,c a "b c
24 a,bc"d,e a bc"d e
25 a,bc"d",e a bc"d" e
26 a,"bc"d,e a "bc"d e
You can use the OleDbCommand class to read a csv file
Tu-Thac www.ongtech.co
----- MW wrote: ----
Dear Al
Does anyone have a regular expression to parse a comma delimited line wit
some fields optionally having string delimiters (text qualifiers
I am currently testing with this regular expression and it works in almos
all my test cases. I found this on the internet in a C# solution
,(?=([^\"]*"[^"]*")*(?![^"]*")
However in some of my test cases it fails and I am having difficult
interpreting it
The VB.NET function I used i
Public Function parseCSVLine(ByVal sInputString As String) As ArrayLis
Dim r As New Regex(",(?=([^\" & Chr(34) & "]*" & Chr(34) & "[^"
Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"
Dim iStart As Integer, m As Matc
Dim oArrayList As New ArrayList(
For Each m In r.Matches(sInputString
oArrayList.Add(sInputString.Substring(iStart, m.Index - iStart)
iStart = m.Index +
Nex
oArrayList.Add(sInputString.Substring(iStart, sInputString.Length
iStart)
Return oArrayLis
End Functio
My test cases are as follows
CS
Value
Value
Value
Value
Result
a,b,
"a",b,
'a',b,
'a
a , b ,
aa,bb;c
a
bb;c
,b
,,
1
,
1
"",
1
" ",
[SPACE
1
"a,b
a,
1
"a,b",
a,
1
" a , b ",
a ,
1
a b,
a
1
a"b,
a"
1
"a""b",
a"
1
a""b,
a""
2
a,b",
b
2
a,b"",
b"
2
a,"B: ""Hi, I'm B""",
B: "Hi, I'm B
2
a,"b,
"
2
a,bc"d,
bc"
2
a,bc"d",
bc"d
2
a,"bc"d,
"bc"
Many thanks
Wazi
You can use the OleDbCommand class to read a csv file
Tu-Thac www.ongtech.co
----- MW wrote: ----
Dear Al
Does anyone have a regular expression to parse a comma delimited line wit
some fields optionally having string delimiters (text qualifiers
I am currently testing with this regular expression and it works in almos
all my test cases. I found this on the internet in a C# solution
,(?=([^\"]*"[^"]*")*(?![^"]*")
However in some of my test cases it fails and I am having difficult
interpreting it
The VB.NET function I used i
Public Function parseCSVLine(ByVal sInputString As String) As ArrayLis
Dim r As New Regex(",(?=([^\" & Chr(34) & "]*" & Chr(34) & "[^"
Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"
Dim iStart As Integer, m As Matc
Dim oArrayList As New ArrayList(
For Each m In r.Matches(sInputString
oArrayList.Add(sInputString.Substring(iStart, m.Index - iStart)
iStart = m.Index +
Nex
oArrayList.Add(sInputString.Substring(iStart, sInputString.Length
iStart)
Return oArrayLis
End Functio
My test cases are as follows
CS
Value
Value
Value
Value
Result
a,b,
"a",b,
'a',b,
'a
a , b ,
aa,bb;c
a
bb;c
,b
,,
1
,
1
"",
1
" ",
[SPACE
1
"a,b
a,
1
"a,b",
a,
1
" a , b ",
a ,
1
a b,
a
1
a"b,
a"
1
"a""b",
a"
1
a""b,
a""
2
a,b",
b
2
a,b"",
b"
2
a,"B: ""Hi, I'm B""",
B: "Hi, I'm B
2
a,"b,
"
2
a,bc"d,
bc"
2
a,bc"d",
bc"d
2
a,"bc"d,
"bc"
Many thanks
Wazi
In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote: Dear All
Does anyone have a regular expression to parse a comma delimited line with some fields optionally having string delimiters (text qualifiers)
I just replied to a similar post.
String object has Split call:
"a,b,c,de,fg,hegk".Split(',')
In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote: Dear All
Does anyone have a regular expression to parse a comma delimited line with some fields optionally having string delimiters (text qualifiers)
I just replied to a similar post.
String object has Split call:
"a,b,c,de,fg,hegk".Split(',')
>> Does anyone have a regular expression to parse a comma delimited line with some fields optionally having string delimiters (text qualifiers)
I just replied to a similar post.
String object has Split call:
"a,b,c,de,fg,hegk".Split(',')
This is a problem that is known for a long time to the Perl programmers.
It cannot be solved with regular expressions only.
The simple string above is works, but what about the next one, also valid
CSV:
Item 1,"string with comma, not to split",1234,"now, does it work?",done
This should be split into
Item 1
string with comma, not to split
1234
now, does it work?
done
--
Mihai
-------------------------
Replace _year_ with _ to get the real email
>> Does anyone have a regular expression to parse a comma delimited line with some fields optionally having string delimiters (text qualifiers)
I just replied to a similar post.
String object has Split call:
"a,b,c,de,fg,hegk".Split(',')
This is a problem that is known for a long time to the Perl programmers.
It cannot be solved with regular expressions only.
The simple string above is works, but what about the next one, also valid
CSV:
Item 1,"string with comma, not to split",1234,"now, does it work?",done
This should be split into
Item 1
string with comma, not to split
1234
now, does it work?
done
--
Mihai
-------------------------
Replace _year_ with _ to get the real email
Sergei,
Split will only work for a comma delimited file and will not work with text
qualifiers (string delimiters)
"Sergei Gnezdov" <no****@blackhole.net> wrote in message
news:sl*******************@STA01052.itron.com... In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote: Dear All
Does anyone have a regular expression to parse a comma delimited line
with some fields optionally having string delimiters (text qualifiers)
I just replied to a similar post.
String object has Split call:
"a,b,c,de,fg,hegk".Split(',')
Sergei,
Split will only work for a comma delimited file and will not work with text
qualifiers (string delimiters)
"Sergei Gnezdov" <no****@blackhole.net> wrote in message
news:sl*******************@STA01052.itron.com... In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote: Dear All
Does anyone have a regular expression to parse a comma delimited line
with some fields optionally having string delimiters (text qualifiers)
I just replied to a similar post.
String object has Split call:
"a,b,c,de,fg,hegk".Split(',')
so when you iterate Split's result, replace (^"|$") with nothing...
MW wrote: Sergei,
Split will only work for a comma delimited file and will not work with text qualifiers (string delimiters)
"Sergei Gnezdov" <no****@blackhole.net> wrote in message news:sl*******************@STA01052.itron.com...
In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote:
Dear All
Does anyone have a regular expression to parse a comma delimited line with some fields optionally having string delimiters (text qualifiers)
I just replied to a similar post.
String object has Split call:
"a,b,c,de,fg,hegk".Split(',')
Uri,
The split itself will be incorrect if there is a comma in a column with a
string delimiter
For example:
123, "Mathew, Dale", 24, "london"
Thanks,
Wazir
"Uri Dor" <re***************@mivzak.com> wrote in message
news:uT**************@TK2MSFTNGP12.phx.gbl... so when you iterate Split's result, replace (^"|$") with nothing...
so when you iterate Split's result, replace (^"|$") with nothing...
MW wrote: Sergei,
Split will only work for a comma delimited file and will not work with text qualifiers (string delimiters)
"Sergei Gnezdov" <no****@blackhole.net> wrote in message news:sl*******************@STA01052.itron.com...
In article <#h*************@tk2msftngp13.phx.gbl>, MW wrote:
Dear All
Does anyone have a regular expression to parse a comma delimited line with some fields optionally having string delimiters (text qualifiers)
I just replied to a similar post.
String object has Split call:
"a,b,c,de,fg,hegk".Split(',')
Uri,
The split itself will be incorrect if there is a comma in a column with a
string delimiter
For example:
123, "Mathew, Dale", 24, "london"
Thanks,
Wazir
"Uri Dor" <re***************@mivzak.com> wrote in message
news:uT**************@TK2MSFTNGP12.phx.gbl... so when you iterate Split's result, replace (^"|$") with nothing...
not everything can be done with a regex. maybe you need a parser. a
quick search yielded this (didn't try it myself): http://www.codeproject.com/csharp/sp...&select=697728
it's a port from Boost's Spirit
MW wrote: Uri,
The split itself will be incorrect if there is a comma in a column with a string delimiter
For example: 123, "Mathew, Dale", 24, "london"
Thanks, Wazir
"Uri Dor" <re***************@mivzak.com> wrote in message news:uT**************@TK2MSFTNGP12.phx.gbl...
so when you iterate Split's result, replace (^"|$") with nothing...
not everything can be done with a regex. maybe you need a parser. a
quick search yielded this (didn't try it myself): http://www.codeproject.com/csharp/sp...&select=697728
it's a port from Boost's Spirit
MW wrote: Uri,
The split itself will be incorrect if there is a comma in a column with a string delimiter
For example: 123, "Mathew, Dale", 24, "london"
Thanks, Wazir
"Uri Dor" <re***************@mivzak.com> wrote in message news:uT**************@TK2MSFTNGP12.phx.gbl...
so when you iterate Split's result, replace (^"|$") with nothing...
On 2004-04-19, Uri Dor <re***************@mivzak.com> wrote: so when you iterate Split's result, replace (^"|$") with nothing...
MW wrote:
Sergei,
Split will only work for a comma delimited file and will not work with text qualifiers (string delimiters)
This is true. You could parse CVS line in like this:
1) replace qualifiers with some unique id in a hashtable
2) run split
3) replace ids with with their content
Since the description above is no longer that simple, I'd go for parse
it yourself. It is not as hard as it sounds. It will involve one loop,
and a couple variables: start of the substring and a qualifier state
(on/off). When qualifier is on you search for the end of the qualifier.
I think that regular expressions are too hard to work with in this case,
but that's me.
On 2004-04-19, Uri Dor <re***************@mivzak.com> wrote: so when you iterate Split's result, replace (^"|$") with nothing...
MW wrote:
Sergei,
Split will only work for a comma delimited file and will not work with text qualifiers (string delimiters)
This is true. You could parse CVS line in like this:
1) replace qualifiers with some unique id in a hashtable
2) run split
3) replace ids with with their content
Since the description above is no longer that simple, I'd go for parse
it yourself. It is not as hard as it sounds. It will involve one loop,
and a couple variables: start of the substring and a qualifier state
(on/off). When qualifier is on you search for the end of the qualifier.
I think that regular expressions are too hard to work with in this case,
but that's me. This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: dmbkiwi |
last post by:
I have a problem. I have written a python based theme for a linux app
called superkaramba, which is effectively an engine for desktop applets
that utilises python as its theming language. The...
|
by: Michael McGarry |
last post by:
Hi,
I am horrible with Regular Expressions, can anyone recommend a book on it?
Also I am trying to parse the following string to extract the number
after load average.
".... load average:...
|
by: Martin Robins |
last post by:
I am trying to parse a string that is similar in form to an OLEDB connection string using regular expressions; in principle it is working, but certain character combinations in the string being...
|
by: hclugano |
last post by:
Hi!
There is a SQL Create Table statement, for example:
CREATE TABLE . (
IDENTITY (1, 1) NOT NULL,
NOT NULL,
( 15) NOT NULL
.....
) ON
|
by: moondaddy |
last post by:
I'm writing an app in vb.net 1.1 and I need to parse strings that look
similar to the one below. All 5 rows will make up one string. I have a
form where a use can copy/paste data like what you...
|
by: MW |
last post by:
Dear All
Does anyone have a regular expression to parse a comma delimited line with
some fields optionally having string delimiters (text qualifiers)
I am currently testing with this regular...
|
by: John Salerno |
last post by:
Ok, this might look familiar. I'd like to use regular expressions to
change this line:
self.source += '<p>' + paragraph + '</p>\n\n'
to read:
self.source += '<p>%s</p>\n\n' % paragraph
...
|
by: Mike |
last post by:
I have a regular expression (^(.+)(?=\s*).*\1 ) that results in
matches. I would like to get what the actual regular expression is.
In other words, when I apply ^(.+)(?=\s*).*\1 to " HEART...
|
by: Wiseman |
last post by:
I'm kind of disappointed with the re regular expressions module. In
particular, the lack of support for recursion ( (?R) or (?n) ) is a
major drawback to me. There are so many great things that can...
|
by: DolphinDB |
last post by:
The formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. However, some formulas are complex, leading to challenges in calculation.
Take...
|
by: DolphinDB |
last post by:
Tired of spending countless mintues downsampling your data? Look no further!
In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM).
In this month's session, we are pleased to welcome back...
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM).
In this month's session, we are pleased to welcome back...
|
by: jfyes |
last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
|
by: ArrayDB |
last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
|
by: CloudSolutions |
last post by:
Introduction:
For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
|
by: Defcon1945 |
last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
|
by: af34tf |
last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
| |