Parsing Comma-delimited records?
I'm parsing a comma-delimited record, but I want it to behave differently when
part of the string is between double quotes (""). How can I do this? The Excel
import handles it correctly. I'm using String.Split().
Basically, this is what I want to do: use String.Split() on the whole string
UNLESS part of it is between double quotes. The part of the string between the
"" should be ignored by String.Split().
Thanks.

re: Parsing Comma-delimited records?
I would abandon String.Split and split the string myself, coping with
quotation marks and escape characters as necessary. It's a very simple
little loop to do it, easy to write and maintain. Sometimes the
Framework-supplied methods aren't the best way to solve a problem,
IMHO.
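A minimal sketch of such a loop follows. It is an illustration, not anything posted in this thread: `CsvRecord` and `SplitRecord` are made-up names, and it assumes the common CSV convention that a doubled quote ("") inside a quoted field stands for one literal quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvRecord
{
    // Hypothetical helper: splits one record on commas, except inside double quotes.
    // Assumption: "" inside a quoted field means a single literal quote character.
    public static List<string> SplitRecord(string record)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < record.Length; i++)
        {
            char c = record[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < record.Length && record[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote: keep one
                    i++;                 // and skip the second quote of the pair
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // commas in here are ordinary characters
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Length = 0;      // start the next field
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // the last field has no trailing comma
        return fields;
    }
}
```

Calling `CsvRecord.SplitRecord("a,\"b,c\",d")` yields the three fields `a`, `b,c` and `d`. Whitespace is left untouched here; add a `Trim()` if your data needs it.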
Unless someone else has a clever trick...?

re: Parsing Comma-delimited records?
VMI,
You will have to cycle through the string character by character and
take note of the state, or use regular expressions in this case.
Hope this helps.
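For the regular-expression route, one possible sketch is below. The names `RegexCsv` and `SplitRecord` are invented for illustration, and the pattern assumes quotes are balanced and never escaped inside a field. It splits on a comma only when an even number of quote characters follows it, which means the comma is outside any quoted section.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCsv
{
    // A comma followed by an even number of quotes up to the end of the string.
    // Assumption: quotes are balanced and there are no escaped quotes.
    private static readonly Regex Separator =
        new Regex(@",(?=(?:[^""]*""[^""]*"")*[^""]*$)");

    // Hypothetical helper: splits one record and strips the surrounding quotes.
    public static string[] SplitRecord(string record)
    {
        string[] fields = Separator.Split(record);
        for (int i = 0; i < fields.Length; i++)
        {
            // Trim('"') removes every leading and trailing quote character.
            fields[i] = fields[i].Trim().Trim('"');
        }
        return fields;
    }
}
```

The lookahead rescans the rest of the record for every comma, so for long records the character-by-character approach is likely to be faster.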
--
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com

re: Parsing Comma-delimited records?
Here is a method on my blog.
http://staceyw.spaces.msn.com/blog/cns!F4A38E96E598161E!352.entry?
--
William Stacey [MVP]

re: Parsing Comma-delimited records?
Hi,
I wrote a wee lexer that did this, somewhere earlier in the newsgroup. I've
found it, and here it is:
///
public string[] GetStringParts ( string inputString )
{
    List<string> retVal = new List<string>();
    string currentPart = string.Empty;
    int lexerState = 0;   // 0 = outside quotes, 1 = inside quotes
    for ( int i = 0; i < inputString.Length; i++ )
    {
        switch ( lexerState )
        {
            case 0:
                if ( inputString[i] == ',' )
                {
                    retVal.Add( currentPart.Trim() );
                    currentPart = string.Empty;
                }
                else if ( inputString[i] == '"' )
                    lexerState = 1;
                else
                    currentPart += inputString[i];
                break;
            case 1:
                if ( inputString[i] == '"' )
                    lexerState = 0;
                else
                    currentPart += inputString[i];
                break;
        }
    }
    return retVal.ToArray();
}
///
--
Hope this helps,
Tom Spink
Google first, ask later.

re: Parsing Comma-delimited records?
Tom / VMI:
If you don't mind a bit of reworking, this version of Tom's lexer uses
StringBuilder rather than String, and so will not create so many
intermediate strings on the heap that later have to be
garbage-collected. For large volumes of data it will make a significant
difference:
public string[] GetStringParts ( string inputString )
{
    List<string> retVal = new List<string>();
    StringBuilder currentPart = new StringBuilder();
    bool withinQuotes = false;
    for ( int i = 0; i < inputString.Length; i++ )
    {
        char c = inputString[i];
        if (withinQuotes)
        {
            if (c == '"')
            {
                withinQuotes = false;
            }
            else
            {
                currentPart.Append(c);
            }
        }
        else
        {
            if (c == ',')
            {
                retVal.Add( currentPart.ToString().Trim() );
                currentPart.Length = 0;
            }
            else if ( c == '"' )
            {
                withinQuotes = true;
            }
            else
            {
                currentPart.Append(c);
            }
        }
    }
    retVal.Add( currentPart.ToString().Trim() );
    return retVal.ToArray();
}
This version also fixes a bug whereby the last item in the
comma-separated list wasn't being added to the return array.
Anyway, this is the kind of simple solution I was talking about: easy
to read, easy to maintain. It's also easy to add refinements like
backslash-escapes for quote characters, etc.

re: Parsing Comma-delimited records?
Hi Bruce,
Thanks for spotting the bug. I actually spotted it myself when I posted
this lexer a while ago, and posted the fix, but I neglected to include it
when I copied and pasted it here.
--
Hope this helps,
Tom Spink
Google first, ask later.

re: Parsing Comma-delimited records?
Hello Tom,
I have used the ODBC CSV driver before with some success, but I have had
much more success with this library: http://www.heikniemi.net/jhlib/
Thanks,
Shawn Wildermuth
Speaker, Author and C# MVP http://adoguy.com

re: Parsing Comma-delimited records?
* William Stacey [MVP] wrote, On 17-7-2006 22:21:
> Here is a method on my blog.
> http://staceyw.spaces.msn.com/blog/cns!F4A38E96E598161E!352.entry?

I've got a regex lying around that does basically the same:
^((((?<value>[^|""\r\n]*)|""(?<value>([^""]|\\"")*)"")[|])+((?<value>[^|""\r\n]*)|""(?<value>([^""]|\\"")*)""))\r?$
It captures one line, separated by | (quoted strings may contain
newlines, unquoted ones may not).
Use this as follows:
// Needs: using System.Collections.Generic; using System.Text.RegularExpressions;
static void Main(string[] args)
{
    ParsePipeSeparatedLine(input);
}

static string input =
@"1|2
3|4";

public static List<string[]> ParsePipeSeparatedLine(string input)
{
    Regex rx = new Regex(
        @"^((((?<value>[^|""\r\n]*)|""(?<value>([^""]|\\"")*)"")[|])+((?<value>[^|""\r\n]*)|""(?<value>([^""]|\\"")*)""))\r?$",
        RegexOptions.Compiled | RegexOptions.Multiline);
    Match m = rx.Match(input);
    List<string[]> lines = new List<string[]>();
    while (m.Success)
    {
        // Each match is one line; every field on it is a capture of the "value" group.
        int elemCount = m.Groups["value"].Captures.Count;
        string[] values = new string[elemCount];
        for (int i = 0; i < elemCount; i++)
        {
            values[i] = m.Groups["value"].Captures[i].Value;
        }
        lines.Add(values);
        m = m.NextMatch();
    }
    return lines;
}

re: Parsing Comma-delimited records?
Gotta love regex. I'll have to look at that harder. Looks interesting.
--
William Stacey [MVP]

re: Parsing Comma-delimited records?
Hi Shawn,
Interesting, but I've provided a small code-sample that you can embed
directly into your own code, that doesn't require you to download and
reference libraries.
--
Hope this helps,
Tom Spink
Google first, ask later.

re: Parsing Comma-delimited records?
You could try this:
// Needs: using System.Collections; using System.Data.OleDb; using System.IO;
public static ArrayList Read(string file, bool hasHeader)
{
    // Returns an ArrayList of rows, each of which is an ArrayList of fields.
    ArrayList csvData = new ArrayList();
    string path = Path.GetDirectoryName(file);
    string header = hasHeader ? "Yes" : "No";
    using (OleDbConnection con = new OleDbConnection())
    {
        // The Jet text driver treats the folder as the database and each file as a table.
        con.ConnectionString =
            @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path +
            @";Extended Properties=""Text;HDR=" + header + @";FMT=Delimited""";
        con.Open();
        using (OleDbCommand cmd = new OleDbCommand(
            "SELECT * FROM " + Path.GetFileName(file), con))
        using (OleDbDataReader rs = cmd.ExecuteReader())
        {
            while (rs.Read())
            {
                ArrayList csvRow = new ArrayList();
                // Loop through every field.
                for (int i = 0; i < rs.FieldCount; i++)
                    csvRow.Add(rs.GetValue(i));
                csvData.Add(csvRow);
            }
        }
    }
    return csvData;
}
It's what I use. It's simple but can be extended to use your own
custom collection types, etc...
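As one illustration of that kind of extension, here is a rough sketch that copies rows into a typed `List<string[]>` instead of nested ArrayLists. `CsvRows` and `ReadRows` are hypothetical names, and converting every value to a string is an assumption about what the caller wants. The helper accepts any `IDataReader`, so the `OleDbDataReader` from the method above could be passed to it.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class CsvRows
{
    // Hypothetical helper: copies every row of a data reader into a typed list.
    public static List<string[]> ReadRows(IDataReader reader)
    {
        var rows = new List<string[]>();
        while (reader.Read())
        {
            var row = new string[reader.FieldCount];
            for (int i = 0; i < reader.FieldCount; i++)
            {
                // Assumption: a null cell (DBNull) is represented as an empty string.
                row[i] = reader.IsDBNull(i)
                    ? string.Empty
                    : Convert.ToString(reader.GetValue(i));
            }
            rows.Add(row);
        }
        return rows;
    }
}
```

Because the helper depends only on the `IDataReader` interface, it can be exercised without a CSV file, for example with the reader returned by `DataTable.CreateDataReader()`.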

re: Parsing Comma-delimited records?
* William Stacey [MVP] wrote, On 18-7-2006 4:03:
> Gotta love regex. I'll have to look at that harder. Looks interesting.

I give lectures in Regex, it's become a way of life ;)
Jesse

re: Parsing Comma-delimited records?
Hello VMI,
I assume the CSV file lives on disk somewhere. If so, then how about using
OleDb to talk to it?
Connection String:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\TxtFilesFolder\;Extended Properties='text;HDR=Yes;FMT=Delimited'
CommandText:
SELECT * FROM file.csv
-Boo

re: Parsing Comma-delimited records?
You're sick! :-)
--
William Stacey [MVP]

re: Parsing Comma-delimited records?
VMI,
I recently encountered the same issue and found a solution that made my life
very easy.
I added a reference to the Microsoft.VisualBasic component and used the
TextFieldParser object.
An example of my code follows...
// Needs a reference to Microsoft.VisualBasic and: using Microsoft.VisualBasic.FileIO;
TextFieldParser tfp = new TextFieldParser(txtFileName.Text);
tfp.TextFieldType = FieldType.Delimited;
tfp.SetDelimiters(",");
string[] columns;
while (!tfp.EndOfData)
{
    try
    {
        columns = tfp.ReadFields();
        Address address = new Address();
        address.Address1 = columns[1];
        address.Address2 = columns[2];
        address.City = columns[3];
        address.State = columns[4];
        address.Zip = columns[5];
        address.CarrierRoute = columns[6];
        address.DeliveryPoint = columns[7];
        address.AddressCode = columns[8];
        addresses.Add(address); // addresses is my List<Address> reference
    }
    catch
    {
        // Any exception raised for a record (for example, too few columns) is swallowed here.
    }
}
By reading the file into a List<Address> object, I can look at each record and do
whatever I like with it. You can see that the fields are separated, and thus
you can perform an operation on an individual field as well.
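The same parser can also be pointed at a single record held in a string rather than a file, which is closer to the original String.Split() scenario. The sketch below is only an illustration: `TfpRecord` and `SplitRecord` are made-up names, and it relies on the `TextFieldParser` constructor that takes a `TextReader`.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TfpRecord
{
    // Hypothetical helper: splits one comma-delimited record held in memory.
    public static string[] SplitRecord(string record)
    {
        using (var parser = new TextFieldParser(new StringReader(record)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Already the default; set explicitly to show what makes quoted commas work.
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

With the input `a,"b,c",d` this returns the fields `a`, `b,c` and `d`, so the comma inside the quotes is not treated as a delimiter.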
Hope this helps!
TChris
--
.... but they that seek the Lord understand all things. Prov 28:5