Parsing Comma-delimited records?

VMI
Guest
 
Posts: n/a
#1: Jul 17 '06
I'm parsing a comma-delimited record, but I need it to behave differently
when part of the string is enclosed in double quotes. How can I do this? The
Excel import handles it correctly. I'm using String.Split().
Basically, this is what I want to do: use String.Split() on the whole string
UNLESS part of it is between double quotes. Commas inside the double quotes
should be ignored by String.Split().

Thanks.

Bruce Wood
Guest
 
Posts: n/a
#2: Jul 17 '06

re: Parsing Comma-delimited records?



I would abandon String.Split and split the string myself, coping with
quotation marks and escape characters as necessary. It's a very simple
little loop to do it, easy to write and maintain. Sometimes the
Framework-supplied methods aren't the best way to solve a problem,
IMHO.

Unless someone else has a clever trick...?

Nicholas Paldino [.NET/C# MVP]
Guest
 
Posts: n/a
#3: Jul 17 '06

re: Parsing Comma-delimited records?


VMI,

You will have to cycle through the string character by character and
take note of the state, or use regular expressions in this case.
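
As a rough illustration of the regular-expression route, here is a minimal sketch. The class and method names (QuoteAwareSplit, Split) are made up for this example, and it assumes the quotes in a record are balanced and that quoted fields contain no escaped quotes. The pattern splits on a comma only when an even number of double quotes follows it, which means the comma sits outside any quoted field.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareSplit
{
    // Matches a comma only when an even number of double quotes follows it,
    // i.e. when the comma is not inside a quoted field.
    // Assumption: quotes are balanced and never escaped.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] Split(string record)
    {
        string[] parts = CommaOutsideQuotes.Split(record);
        for (int i = 0; i < parts.Length; i++)
        {
            // Drop surrounding whitespace, then the enclosing quotes, if any.
            parts[i] = parts[i].Trim().Trim('"');
        }
        return parts;
    }
}
```

With this sketch, QuoteAwareSplit.Split("a,\"b,c\",d") yields the three fields a, b,c and d. The lookahead rescans the rest of the record for every comma, so for long records a character-by-character loop is likely to be faster.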

Hope this helps.


--
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com

William Stacey [MVP]
Guest
 
Posts: n/a
#4: Jul 17 '06

re: Parsing Comma-delimited records?


Here is a method on my blog.
http://staceyw.spaces.msn.com/blog/cns!F4A38E96E598161E!352.entry?

--
William Stacey [MVP]


Tom Spink
Guest
 
Posts: n/a
#5: Jul 17 '06

re: Parsing Comma-delimited records?


Hi,

I wrote a wee lexer that did this, somewhere earlier in the newsgroup. I've
found it, and here it is:

///
public string[] GetStringParts ( string inputString )
{
    List<string> retVal = new List<string>();
    string currentPart = string.Empty;
    int lexerState = 0; // 0 = outside quotes, 1 = inside quotes

    for ( int i = 0; i < inputString.Length; i++ )
    {
        switch ( lexerState )
        {
            case 0:
                if ( inputString[i] == ',' )
                {
                    retVal.Add( currentPart.Trim() );
                    currentPart = string.Empty;
                }
                else if ( inputString[i] == '"' )
                    lexerState = 1;
                else
                    currentPart += inputString[i];
                break;

            case 1:
                if ( inputString[i] == '"' )
                    lexerState = 0;
                else
                    currentPart += inputString[i];
                break;
        }
    }

    // Bug: the last field is still in currentPart here and is never added.
    return retVal.ToArray();
}
///

--
Hope this helps,
Tom Spink

Google first, ask later.
Bruce Wood
Guest
 
Posts: n/a
#6: Jul 17 '06

re: Parsing Comma-delimited records?


Tom / VMI:

If you don't mind a bit of reworking, this version of Tom's lexer uses
StringBuilder rather than String, and so will not create so many
intermediate strings on the heap that later have to be
garbage-collected. For large volumes of data it will make a significant
difference:

public string[] GetStringParts ( string inputString )
{
    List<string> retVal = new List<string>();
    StringBuilder currentPart = new StringBuilder();
    bool withinQuotes = false;

    for ( int i = 0; i < inputString.Length; i++ )
    {
        char c = inputString[i];
        if (withinQuotes)
        {
            if (c == '"')
            {
                withinQuotes = false;
            }
            else
            {
                currentPart.Append(c);
            }
        }
        else
        {
            if (c == ',')
            {
                retVal.Add( currentPart.ToString().Trim() );
                currentPart.Length = 0;
            }
            else if ( c == '"' )
            {
                withinQuotes = true;
            }
            else
            {
                currentPart.Append(c);
            }
        }
    }
    retVal.Add( currentPart.ToString().Trim() );

    return retVal.ToArray();
}

This version also fixes a bug whereby the last item in the
comma-separated list wasn't being added to the return array.

Anyway, this is the kind of simple solution I was talking about: easy
to read, easy to maintain. It's also easy to add refinements like
backslash-escapes for quote characters, etc.
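
To make the backslash-escape refinement concrete, here is one way it might look. This is only a sketch: the names CsvSketch and SplitRecord are invented for the example, and the rule it applies (inside quotes, a backslash means "take the next character literally") is an assumption about what the data uses, not something the thread specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvSketch
{
    // Hypothetical helper: splits one record on commas, honouring double
    // quotes, and treating a backslash inside quotes as an escape character.
    public static string[] SplitRecord(string input)
    {
        List<string> parts = new List<string>();
        StringBuilder current = new StringBuilder();
        bool withinQuotes = false;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (withinQuotes)
            {
                if (c == '\\' && i + 1 < input.Length)
                {
                    // Escape: append the following character verbatim.
                    current.Append(input[++i]);
                }
                else if (c == '"')
                {
                    withinQuotes = false;
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == ',')
            {
                parts.Add(current.ToString().Trim());
                current.Length = 0;
            }
            else if (c == '"')
            {
                withinQuotes = true;
            }
            else
            {
                current.Append(c);
            }
        }
        // Add the final field, which has no trailing comma.
        parts.Add(current.ToString().Trim());
        return parts.ToArray();
    }
}
```

Given the record a, "b, \"c\"", d this returns three fields: a, then b, "c" (with the inner quotes kept), then d. If the data doubles its quotes instead (as Excel-style CSV usually does), the escape branch would need to look for a second '"' rather than a backslash.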

Tom Spink
Guest
 
Posts: n/a
#7: Jul 18 '06

re: Parsing Comma-delimited records?


Hi Bruce,

Thanks for spotting the bug. I actually spotted it myself when I first
posted this lexer a while ago, and posted the fix, but I neglected to
include it when I copied and pasted the code here.

--
Hope this helps,
Tom Spink

Google first, ask later.
Shawn Wildermuth (C# MVP)
Guest
 
Posts: n/a
#8: Jul 18 '06

re: Parsing Comma-delimited records?


Hello Tom,

I have used the ODBC CSV driver before with some success, but I have had
much more success with this:

http://www.heikniemi.net/jhlib/


Thanks,
Shawn Wildermuth
Speaker, Author and C# MVP
http://adoguy.com

Jesse Houwing
Guest
 
Posts: n/a
#9: Jul 18 '06

re: Parsing Comma-delimited records?


* William Stacey [MVP] wrote, On 17-7-2006 22:21:
Quote:
Here is a method on my blog.
http://staceyw.spaces.msn.com/blog/cns!F4A38E96E598161E!352.entry?
>
I've got a regex lying around that does basically the same:

^((((?<value>[^|""\r\n]*)|""(?<value>([^""]|\\"")*)"")[|])+((?<value>[^|""\r\n]*)|""(?<value>([^""]|\\"")*)""))\r?$

It captures one line, separated by | (quoted strings may contain
newlines, unquoted ones may not).

Use this as follows:

static void Main(string[] args)
{
    ParsePipeSeparatedLine(input);
}

static string input =
@"1|2
3|4";

public static List<string[]> ParsePipeSeparatedLine(string input)
{
    Regex rx = new Regex(
        @"^((((?<value>[^|""\r\n]*)|""(?<value>([^""]|\\"")*)"")[|])+((?<value>[^|""\r\n]*)|""(?<value>([^""]|\\"")*)""))\r?$",
        RegexOptions.Compiled | RegexOptions.Multiline);

    Match m = rx.Match(input);

    List<string[]> lines = new List<string[]>();
    while (m.Success)
    {
        int elemCount = m.Groups["value"].Captures.Count;
        string[] values = new string[elemCount];

        for (int i = 0; i < elemCount; i++)
        {
            values[i] = m.Groups["value"].Captures[i].Value;
        }
        lines.Add(values);
        m = m.NextMatch();
    }
    return lines;
}
William Stacey [MVP]
Guest
 
Posts: n/a
#10: Jul 18 '06

re: Parsing Comma-delimited records?


Gotta love regex. I'll have to look at that harder. Looks interesting.

--
William Stacey [MVP]


Tom Spink
Guest
 
Posts: n/a
#11: Jul 18 '06

re: Parsing Comma-delimited records?


Hi Shawn,

Interesting, but I've provided a small code-sample that you can embed
directly into your own code, that doesn't require you to download and
reference libraries.

--
Hope this helps,
Tom Spink

Google first, ask later.
mark.joyal@gmail.com
Guest
 
Posts: n/a
#12: Jul 18 '06

re: Parsing Comma-delimited records?


you could try this:

public static ArrayList Read(string file, bool hasHeader)
{
    // Returns an ArrayList of rows, each of which is an ArrayList of fields.
    ArrayList csvData = new ArrayList();
    string path = Path.GetDirectoryName(file);
    OleDbConnection con = new OleDbConnection();
    if (hasHeader)
        con.ConnectionString =
            @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
            @";Extended Properties=""Text;HDR=Yes;FMT=Delimited""";
    else
        con.ConnectionString =
            @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
            @";Extended Properties=""Text;HDR=No;FMT=Delimited""";
    con.Open();
    OleDbCommand cmd = new OleDbCommand("SELECT * FROM " +
        Path.GetFileName(file), con);
    OleDbDataReader rs = cmd.ExecuteReader();
    while (rs.Read())
    {
        ArrayList csvRow = new ArrayList();
        // Loop through every field.
        for (int i = 0; i < rs.FieldCount; i++)
            csvRow.Add(rs.GetValue(i));
        csvData.Add(csvRow);
    }
    con.Close();
    return csvData;
}

It's what I use. It's simple but can be extended to use your own
custom collection types, etc...



Jesse Houwing
Guest
 
Posts: n/a
#13: Jul 18 '06

re: Parsing Comma-delimited records?


* William Stacey [MVP] wrote, On 18-7-2006 4:03:
Quote:
Gotta love regex. I'll have to look at that harder. Looks interesting.
>
I give lectures in Regex, it's become a way of life ;)

Jesse
GhostInAK
Guest
 
Posts: n/a
#14: Jul 20 '06

re: Parsing Comma-delimited records?


Hello VMI,

I assume the CSV file lives on disk somewhere. If so, then how about using
OleDb to talk to it..

Connection String:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\TxtFilesFolder\;Extended Properties='text;HDR=Yes;FMT=Delimited'

CommandText:
SELECT * FROM file.csv

-Boo

William Stacey [MVP]
Guest
 
Posts: n/a
#15: Jul 21 '06

re: Parsing Comma-delimited records?


You're sick! :-)

--
William Stacey [MVP]

"Jesse Houwing" <jesse.houwing@nospam-sogeti.nl> wrote in message
news:ub2ypPrqGHA.4812@TK2MSFTNGP04.phx.gbl...
|* William Stacey [MVP] wrote, On 18-7-2006 4:03:
| Gotta love regex. I'll have to look at that harder. Looks interesting.
| >
|
| I give lectures in Regex, it's become a way of life ;)
|
| Jesse


TChris
Guest
 
Posts: n/a
#16: Oct 17 '06

re: Parsing Comma-delimited records?


VMI,
I recently encountered the same issue and found a solution that made my life
very easy.
I added a reference to the Microsoft.VisualBasic component and used the
TextFieldParser object.
An example of my code follows...
TextFieldParser tfp = new TextFieldParser(txtFileName.Text);
tfp.TextFieldType = FieldType.Delimited;
tfp.SetDelimiters(",");
string[] columns;
while (!tfp.EndOfData)
{
    try
    {
        columns = tfp.ReadFields();
        Address address = new Address();
        address.Address1 = columns[1];
        address.Address2 = columns[2];
        address.City = columns[3];
        address.State = columns[4];
        address.Zip = columns[5];
        address.CarrierRoute = columns[6];
        address.DeliveryPoint = columns[7];
        address.AddressCode = columns[8];
        addresses.Add(address); // addresses is my <List> reference
    }
    catch
    {
        // Rows that fail to parse or map are silently skipped.
    }
}


By reading the file into a <List> object, I can look at each record and do
whatever I like with it. You can see that the fields are separated, and thus
you can perform an operation on an individual field as well.
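
For anyone who wants to try TextFieldParser without a file or an Address class, the following is a small self-contained sketch. The wrapper names (TextFieldParserSketch, ParseCsv) are hypothetical; the parser members it calls are the same ones used above, plus HasFieldsEnclosedInQuotes, which is already true by default and is set here only to make the quote handling explicit. It still needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserSketch
{
    // Hypothetical wrapper: parses CSV text held in memory and returns
    // one string[] per record.
    public static List<string[]> ParseCsv(string text)
    {
        List<string[]> rows = new List<string[]>();
        using (TextFieldParser parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Already the default; commas inside "..." stay in the field.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Calling ParseCsv("a,\"b,c\",d\n1,2,3") should give two records, the first holding the fields a, b,c and d. ReadFields throws a MalformedLineException when a line cannot be parsed, so real code would probably want to catch that specifically rather than swallow every exception.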
Hope this helps!
TChris
--
.... but they that seek the Lord understand all things. Prov 28:5