
parsing words in a string

Hi,

Is there a class that can handle splitting a string on commas such that
commas inside quotes are ignored?

I know we can use the Text::ParseWords module in Perl to do this, but I
am new to C#/.NET and couldn't find anything similar.

For example:

string str = "one field, two field, \"field val one, field val two, field val three\", three field";

Then I should get the following in str_arr:

str_arr[0] = "one field"
str_arr[1] = "two field"
str_arr[2] = "field val one, field val two, field val three"
str_arr[3] = "three field"

Thanks in advance,

Ashoo.

Jan 6 '06 #1
string[] str_arr = str.Split(',');

Jan 6 '06 #2
string[] str_arr = str.Split(',');

does not work; I already tried it. It gives me:

str_arr[0] = "one field"
str_arr[1] = "two field"
str_arr[2] = "\"field val one"
str_arr[3] = "field val two"
str_arr[4] = "field val three\""
str_arr[5] = "three field"

instead of:

str_arr[0] = "one field"
str_arr[1] = "two field"
str_arr[2] = "field val one, field val two, field val three"
str_arr[3] = "three field"

Look at the str_arr[2] value.
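To see why Split alone cannot do this: String.Split(',') cuts at every comma, whether or not it sits inside quotes, and it neither removes the quote characters nor trims spaces. A minimal check against the sample line (the class name SplitDemo is only for illustration):

```csharp
public static class SplitDemo
{
    // The sample line from the question, as a single C# string literal.
    public const string Sample =
        "one field, two field, \"field val one, field val two, field val three\", three field";

    // String.Split knows nothing about quotes: it cuts at every comma.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }
}
```

Run on the sample, this gives six pieces, and every piece after the first starts with a space (for example " two field"), in addition to the quoted field being broken into three.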

Thanks,
Ashoo
--
Sent via .NET Newsgroups
http://www.dotnetnewsgroups.com
Jan 6 '06 #3
Hi,
You can use a regular expression to get this result. Here is one
regular expression that should give you the desired result. I
assumed that you are using VS2003.

(?:(?<Vals>.*?),)+"(?<Vals>.*?)"(?:,(?<Vals>.*?))+$

The output for your example will be like this:

SubMatch: [Vals]
1:one field
2:two field
3:field val one, field val two,field val three
4:three field

Try it out.
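The pattern above is built around a single quoted field, with at least one comma-separated field before it and at least one after it. A different approach, not taken from this thread, is to let the regex match one field at a time and collect the matches. This is a sketch under stated assumptions: no escaped quotes inside quoted fields, and whitespace inside the quotes is left as-is. The class name FieldMatcher and the group name f are made up for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldMatcher
{
    // One match per field. A field starts at the beginning of the line or
    // right after a comma, and is either "quoted text" or unquoted text.
    // Whitespace around the field is skipped; the lookahead stops before
    // the next comma (or the end), so empty fields like "a,,b" still match.
    private static readonly Regex FieldRegex = new Regex(
        @"(?:^|(?<=,))\s*(?:""(?<f>[^""]*)""|(?<f>[^,""]*?))\s*(?=,|$)");

    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldRegex.Matches(line))
        {
            fields.Add(m.Groups["f"].Value);
        }
        return fields;
    }
}
```

On the sample line this returns "one field", "two field", "field val one, field val two, field val three" and "three field". A quoted field followed by stray text before the next comma (for example "abc"x) does not match and is silently skipped, so this sketch assumes well-formed input.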

Jan 7 '06 #4
At this time, using Split more than once seems to be the only option. I
can't recall any other function that can help you do this more quickly.
How do you do it in Perl? Post some code, and maybe that will ring a
bell for some C# developers.

Thanks
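Later versions of .NET do ship a class for this: Microsoft.VisualBasic.FileIO.TextFieldParser. It was added in .NET 2.0, so it is not available in the VS2003/.NET 1.1 setup discussed here; on .NET Framework the project needs a reference to Microsoft.VisualBasic.dll, but the class is usable from C# despite its namespace. A minimal sketch that parses one line (the wrapper class VbParserDemo is made up):

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class VbParserDemo
{
    // Parses a single comma-separated line with the framework's CSV reader.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside "..." are data
            parser.TrimWhiteSpace = true;            // drop spaces around fields
            return parser.ReadFields();
        }
    }
}
```

For a whole file, the same parser can be constructed from a path and ReadFields called in a loop until EndOfData is true.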

Jan 7 '06 #5
Jan 7 '06 #6
Thanks, I will try it out.

Ashoo

Jan 7 '06 #7
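This is how you could do it in Perl:

#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;

my $line = "one field, two field, \"field val one, field val two, field val three\", three field";

# parse_line(DELIMITER_REGEX, KEEP, LINE): KEEP = 0 strips the quotes;
# the delimiter ',\s*' also swallows the space after each comma.
my @fields = parse_line(',\s*', 0, $line);

for my $i (0 .. $#fields) {
    print "$fields[$i]\n";
}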
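A rough C# counterpart of the Perl call above can be written as a short loop. This is a sketch of a hypothetical ParseLine(delimiter, keep, line) that mirrors the Perl argument order; unlike Text::ParseWords it takes a single character instead of a regex, recognizes only double quotes (parse_line also accepts single quotes) and ignores backslash escapes.

```csharp
using System.Collections.Generic;
using System.Text;

public static class ParseWordsPort
{
    // Hypothetical analogue of Perl's Text::ParseWords::parse_line.
    // Splits on 'delimiter' outside double quotes; keep = true leaves the
    // quote characters in the fields, keep = false drops them.
    public static List<string> ParseLine(char delimiter, bool keep, string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;
                if (keep)
                {
                    current.Append(c);
                }
            }
            else if (c == delimiter && !inQuotes)
            {
                fields.Add(current.ToString());
                current.Length = 0;
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString()); // the last field has no delimiter after it
        return fields;
    }
}
```

With keep = false the quotes disappear but the space after each comma stays, so calling Trim() on each field gives the output asked for in the question.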

Thanks,
Ashoo
Jan 7 '06 #8
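// Splits s on commas that are not inside double quotes.
// The quote characters are removed; spaces around fields are kept.
public System.Collections.ArrayList parseWords(string s)
{
    if (s == null)
    {
        return null;
    }

    bool bQuote = false;   // true while inside "..."
    System.Collections.ArrayList al = new System.Collections.ArrayList();
    System.Text.StringBuilder sTemp = new System.Text.StringBuilder();

    for (int i = 0; i < s.Length; i++)
    {
        switch (s[i])
        {
            case ',':
                if (!bQuote)
                {
                    // Unquoted comma: the current field is complete.
                    al.Add(sTemp.ToString());
                    sTemp.Length = 0;
                }
                else
                {
                    // Comma inside quotes is part of the field.
                    sTemp.Append(s[i]);
                }
                break;

            case '\"':
                // Toggle the quoted state.
                bQuote = !bQuote;
                // requirement: remove the quote character
                //sTemp.Append(s[i]);
                break;

            default:
                sTemp.Append(s[i]);
                break;
        }
    }

    // Add the last field (it is not followed by a comma).
    // Note: an empty final field, e.g. after a trailing comma, is not added.
    if (sTemp.Length > 0)
    {
        al.Add(sTemp.ToString());
        sTemp.Length = 0;
    }

    return al;
}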
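The method above toggles on every quote, so a literal quote character inside a quoted field cannot be represented. CSV as described in RFC 4180 escapes such a quote by doubling it (""). A sketch of that variant, which also trims each field (including quoted content); the class QuotedCsv is hypothetical and not from the thread:

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuotedCsv
{
    // Like parseWords, but "" inside a quoted field becomes one literal quote,
    // and each field is trimmed.
    public static List<string> Split(string s)
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < s.Length && s[i + 1] == '"')
                {
                    field.Append('"'); // doubled quote inside quotes -> literal quote
                    i++;               // skip the second quote of the pair
                }
                else
                {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            }
            else if (c == ',' && !inQuotes)
            {
                fields.Add(field.ToString().Trim());
                field.Length = 0;
            }
            else
            {
                field.Append(c);
            }
        }
        fields.Add(field.ToString().Trim()); // last field, even if empty
        return fields;
    }
}
```

For example, the input x, "say ""hi"", ok", y yields three fields: x, say "hi", ok and y.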

Jan 7 '06 #9
Have you tried using regular expressions, instead of the Split()?


Jan 7 '06 #10
Hi,

I tried the regular expression; I am using VS.NET 2003.
This is how I used it:

Regex regEx = new
Regex("(??<Vals>.*?),)+\"(?<Vals>.*?)\"(?:,(?<Vals >.*?))+$");
string [] text1 = regEx.Split(text);

I am getting the following run-time error

"An unhandled exception of type 'System.ArgumentException' occurred in
system.dll

Additional information: parsing
"(??<Vals>.*?),)+"(?<Vals>.*?)"(?:,(?<Vals>.*?))+$ " - Unrecognized
grouping construct."

Can you please advise as to what I am doing wrong?

Thanks,
Ashoo
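The exception is thrown by the Regex constructor, before Split is ever called: the pattern in the post begins with (??<Vals>, and (? followed by ? is not a grouping construct .NET recognizes; the intended opening is (?:(?<Vals>. A small check, with hypothetical class and constant names; writing the pattern as a verbatim string (@"...") avoids one layer of escaping, since a quote is then written as "" and backslashes are literal:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // The pattern as posted: "(??<Vals>" is not a valid group opener.
    public const string AsPosted =
        "(??<Vals>.*?),)+\"(?<Vals>.*?)\"(?:,(?<Vals>.*?))+$";

    // The pattern as originally suggested, as a verbatim string.
    public const string Intended =
        @"(?:(?<Vals>.*?),)+""(?<Vals>.*?)""(?:,(?<Vals>.*?))+$";

    // True if the Regex constructor rejects the pattern.
    public static bool Rejects(string pattern)
    {
        try
        {
            new Regex(pattern);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

Even with a valid pattern, Regex.Split treats the pattern as a separator to split on, so the Vals captures have to be read from Regex.Match (or Matches) through Groups["Vals"].Captures instead.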
Jan 9 '06 #11
Thanks, Scott, for the method. It was very helpful.

Thanks,
Ashoo.

Jan 9 '06 #12
RSH
This is good stuff. How would the rest of the code look? I tried using
just the regex, but I couldn't get it to return the array.

Ron
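One way to get an array out of the regex is to use Regex.Match and copy every capture of the Vals group into a string[]. Note that the first pattern expects the quote to follow a comma directly (,"), while the sample line has a space there (, "), so it does not match the sample as written; this sketch therefore uses the variant with \s before the quote and after the following comma, as Lucky posts later in the thread. The class and method names are made up:

```csharp
using System.Text.RegularExpressions;

public static class ValsExtractor
{
    // Pattern with whitespace allowed before the opening quote and after
    // the comma that follows the closing quote.
    private static readonly Regex ValsRegex = new Regex(
        @"(?:(?<Vals>.*?),)+\s""(?<Vals>.*?)""(?:,\s(?<Vals>.*?))+$");

    // Returns every capture of the "Vals" group as a trimmed string[],
    // or an empty array when the line does not match the pattern.
    public static string[] GetVals(string line)
    {
        Match m = ValsRegex.Match(line);
        if (!m.Success)
        {
            return new string[0];
        }

        CaptureCollection caps = m.Groups["Vals"].Captures;
        string[] vals = new string[caps.Count];
        for (int i = 0; i < caps.Count; i++)
        {
            vals[i] = caps[i].Value.Trim(); // captures keep the space after commas
        }
        return vals;
    }
}
```

The Trim() is there because the capture for the second field includes the leading space (" two field").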

Jan 9 '06 #13
RSH
This one removes all the unwanted characters as well:

public Form1()
{
    InitializeComponent();
    ArrayList al = ParseString(" \"M1, M2, M3, M4, \"S1, S2, S3 , S4,\"M5, M6,\" S5, S6, \" M7, M8, M9 \"");
    foreach (string aItem in al)
    {
        Console.WriteLine(aItem);
    }
}

// Requires: using System; using System.Collections; using System.Text;
// Splits on commas outside double quotes. A closing quote ends a field,
// the quote characters are dropped and each field is trimmed.
public ArrayList ParseString(string strInput)
{
    StringBuilder sbField = new StringBuilder();
    bool bQuote = false;
    bool bJustClosed = false;   // a quoted field was just added
    ArrayList aParsedString = new ArrayList();

    foreach (char c in strInput)
    {
        if (c == '\"')
        {
            if (bQuote)
            {
                // Closing quote: the quoted text is a complete field.
                aParsedString.Add(sbField.ToString().Trim());
                sbField.Length = 0;
                bJustClosed = true;
            }
            bQuote = !bQuote;
        }
        else if (c == ',' && !bQuote)
        {
            string field = sbField.ToString().Trim();
            // Skip the empty piece between a closing quote and its comma.
            if (!(bJustClosed && field.Length == 0))
            {
                aParsedString.Add(field);
            }
            sbField.Length = 0;
            bJustClosed = false;
        }
        else
        {
            sbField.Append(c);
        }
    }

    // The last field has no trailing comma, so add it here.
    string last = sbField.ToString().Trim();
    if (!(bJustClosed && last.Length == 0))
    {
        aParsedString.Add(last);
    }
    return aParsedString;
}
Jan 9 '06 #14
Hi Ashoo,

Here is the implementation of the expression:
Regex reg = new Regex("(?:(?.*?),)+[\\s]\"(?.*?)\"(?:,[\\s](?.*?))+$");
MatchCollection MatchColl;
MatchColl = reg.Matches("one field, two field, \"field val one, field
val two, field val three\", three field");
string[] vals;
foreach (Match mat in MatchColl) {
vals = Array.CreateInstance(typeof(string),
mat.Groups["Vals"].Captures.Count);
int i = 0;
foreach (Capture cap in mat.Groups["Vals"].Captures) {
vals(i) = cap.Value();
i++;
}
}

I wrote this in VB.NET and converted it to C# for you, so check the
syntax. Anyway, the code runs.

Let me know if you have any questions about it.

Lucky

Jan 10 '06 #15
Also note that I have modified the expression slightly. You can set
RegexOptions such as IgnoreCase or Multiline as your requirements dictate,
but in your case I think only IgnoreCase is needed.

Jan 10 '06 #16
RSH
Lucky,

I wish I could get this to work, but I'm getting two major errors:

parsing "(?.*?),)+[\s]"(?.*?)"(?:,[\s](?.*?))+$" - Unrecognized grouping
construct.

And the looping part is generating a whole series of errors.

Jan 10 '06 #17
"RSH" <wa*************@yahoo.com> wrote in news:ud8yT#fFGHA.3120@TK2MSFTNGP10.phx.gbl:
parsing "(?.*?),)+[\s]"(?.*?)"(?:,[\s](?.*?))+$" - Unrecognized grouping
construct.


It looks like the poster missed a '(' at the beginning: the second ')' is
unmatched. I haven't tested it, but try just adding another '(' at the very
beginning.

-mdb

Jan 10 '06 #18
Hi,
As I said, I wrote this in VB.NET and converted it for you, but the
converter missed some parts, so I have rewritten the code by hand in C#.
Here is the code; try it and let me know.

Regex reg = new Regex("(?:(?<Vals>.*?),)+[\\s]\"(?<Vals>.*?)\"(?:,[\\s](?<Vals>.*?))+$");
MatchCollection MatchColl;
MatchColl = reg.Matches("one field, two field, \"field val one, field val two, field val three\", three field");
string[] vals;
foreach (Match mat in MatchColl)
{
    // One array slot per capture of the "Vals" group.
    vals = new string[mat.Groups["Vals"].Captures.Count];
    int i = 0;
    foreach (Capture cap in mat.Groups["Vals"].Captures)
    {
        vals[i] = cap.Value;
        i++;
    }
}

You need to import this namespace in order to use this code:

using System.Text.RegularExpressions;

Lucky
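Because this pattern is built around a single quoted field with at least one comma-separated field before and after it, lines of a different shape do not match at all; Matches then returns an empty collection and the loop never assigns vals. A quick check of a few shapes (the class and method names are made up for this sketch):

```csharp
using System.Text.RegularExpressions;

public static class PatternLimits
{
    // The pattern from the code above, written as a verbatim string.
    public const string Pattern =
        @"(?:(?<Vals>.*?),)+[\s]""(?<Vals>.*?)""(?:,[\s](?<Vals>.*?))+$";

    // True if the line has the shape the pattern expects.
    public static bool Fits(string line)
    {
        return Regex.IsMatch(line, Pattern);
    }
}
```

The sample line fits, but a line with no quoted field, or with the quoted field first or last, does not; for those a general field-by-field parser is needed.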

Jan 13 '06 #19
http://spaces.msn.com/members/staceyw/Blog/cns!1pnsZpX0fPvDxLKC6rAAhLsQ!352.entry

--
William Stacey [MVP]

Jan 13 '06 #20
