Bytes | Software Development & Data Engineering Community

parsing words in a string

Hi,

Is there a class that can split a string on commas while ignoring the
commas that appear inside quotes?

I know the Text::ParseWords module does this in Perl, but I am new to
C#/.NET and couldn't find anything similar.

For example, given

string str = "one field, two field, \"field val one, field val two, field val three\", three field";

I should get the following in str_arr:

str_arr[0] = "one field"
str_arr[1] = "two field"
str_arr[2] = "field val one, field val two, field val three"
str_arr[3] = "three field"

Thanks in advance,

Ashoo.

Jan 6 '06 #1
string[] str_arr = str.Split(',');

Jan 6 '06 #2
string[] str_arr = str.Split(',');

does not work; I had already tried it. It gives me

str_arr[0] = "one field"
str_arr[1] = " two field"
str_arr[2] = " \"field val one"
str_arr[3] = " field val two"
str_arr[4] = " field val three\""
str_arr[5] = " three field"

instead of

str_arr[0] = "one field"
str_arr[1] = "two field"
str_arr[2] = "field val one, field val two, field val three"
str_arr[3] = "three field"

Look at the str_arr[2] value.

Thanks,
Ashoo
--
Sent via .NET Newsgroups
http://www.dotnetnewsgroups.com
Jan 6 '06 #3
Hi,
You can use a regular expression to get this result. Here is one that
should give you the output you want; I've assumed you are using VS2003.

(?:(?<Vals>.*?),)+"(?<Vals>.*?)"(?:,(?<Vals>.*?))+$

The output for your example should look like this:

SubMatch: [Vals]
1:one field
2:two field
3:field val one, field val two, field val three
4:three field

Try it out.

Jan 7 '06 #4
At this point, calling Split more than once seems to be the only option.
I can't think of any other function that would do this more quickly.
How do you do it in Perl? Post some code; maybe it will ring a bell for
some C# programmers.

Thanks

Jan 7 '06 #5
Thanks, I will try it out.

Ashoo

Jan 7 '06 #7
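This is how you could do it in Perl:

#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;

my $line = "one field, two field, \"field val one, field val two, field val three\", three field";

# parse_line(DELIMITER_REGEX, KEEP_QUOTES, STRING): split on a comma plus any
# following whitespace, ignore commas inside quotes, and drop the quotes
my @fields = parse_line(',\s*', 0, $line);

print "$_\n" for @fields;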
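For comparison, .NET Framework 2.0 and later do ship a class that plays roughly the role of parse_line: Microsoft.VisualBasic.FileIO.TextFieldParser. It is not available on .NET 1.1/VS2003, which this thread targets, and from C# on the .NET Framework it needs a reference to Microsoft.VisualBasic.dll. The following is a minimal sketch; the ParseWordsDemo class and ParseLine method are hypothetical names. I'm assuming the parser's default whitespace handling accepts the space between a comma and an opening quote, as in this example, so verify that against your real data.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser lives in the Microsoft.VisualBasic assembly

public static class ParseWordsDemo
{
    // Parses one line of comma-separated text, honouring double-quoted fields.
    public static string[] ParseLine(string line)
    {
        using (TextFieldParser parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside "..." are not delimiters
            parser.TrimWhiteSpace = true;            // drop the space after each comma
            return parser.ReadFields();
        }
    }

    public static void Main()
    {
        string str = "one field, two field, \"field val one, field val two, field val three\", three field";
        foreach (string field in ParseLine(str))
        {
            Console.WriteLine(field);
        }
    }
}
```

ReadFields returns null once the input is exhausted, so when reading a whole file you would call it in a loop until it returns null.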

Thanks,
Ashoo
Jan 7 '06 #8
public System.Collections.ArrayList parseWords(string s)
{
    if (s == null)
    {
        return null;
    }

    bool bQuote = false; // true while inside a "..." section
    System.Collections.ArrayList al = new System.Collections.ArrayList();
    System.Text.StringBuilder sTemp = new System.Text.StringBuilder();

    for (int i = 0; i < s.Length; i++)
    {
        switch (s[i])
        {
            case ',':
                if (!bQuote)
                {
                    // Field separator: store the field without the space after the comma
                    al.Add(sTemp.ToString().Trim());
                    sTemp.Length = 0;
                }
                else
                {
                    // A comma inside quotes is part of the field
                    sTemp.Append(s[i]);
                }
                break;
            case '"':
                // Toggle the quote state; requirement: drop the quote character itself
                bQuote = !bQuote;
                break;
            default:
                sTemp.Append(s[i]);
                break;
        }
    }

    // Store the last field
    if (sTemp.Length > 0)
    {
        al.Add(sTemp.ToString().Trim());
    }

    return al;
}
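The same state-machine idea can be sketched with the generic List<string> from .NET 2.0 instead of ArrayList. As an extra assumption not made in the thread, a doubled quote ("") inside a quoted field is read as one literal quote, the way CSV files (RFC 4180) escape quotes. The CsvLine class name is hypothetical, and unlike the method above it returns an empty list rather than null for null input and keeps an empty last field after a trailing comma.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one line on commas, ignoring commas inside double quotes.
    // Assumption: "" inside a quoted field stands for one literal quote.
    public static List<string> ParseWords(string s)
    {
        List<string> fields = new List<string>();
        if (s == null)
        {
            return fields;
        }

        StringBuilder current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < s.Length && s[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote: keep one
                    i++;                 // and skip the second
                }
                else
                {
                    inQuotes = !inQuotes; // opening or closing quote, not copied
                }
            }
            else if (c == ',' && !inQuotes)
            {
                fields.Add(current.ToString().Trim());
                current.Length = 0;
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString().Trim()); // last field, even if empty
        return fields;
    }

    public static void Main()
    {
        string str = "one field, two field, \"field val one, field val two, field val three\", three field";
        foreach (string field in ParseWords(str))
        {
            Console.WriteLine(field);
        }
    }
}
```

Trimming each field also strips spaces just inside the quotes, so " a " becomes "a"; drop the Trim() calls if that whitespace matters.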
Jan 7 '06 #9
Have you tried using regular expressions instead of Split()?
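One regex approach that copes with any number of quoted fields is to split only on commas that are followed by an even number of double quotes up to the end of the line; provided the quotes are balanced, such a comma cannot be inside a quoted section. This is a swapped-in technique, not the pattern discussed elsewhere in this thread, and the QuoteAwareSplit name is made up. The pattern itself relies only on Regex.Split, which .NET 1.1 already has.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareSplit
{
    // A comma is a separator only if an even number of double quotes
    // follows it up to the end of the line, i.e. it is not inside "...".
    // Assumes the quotes in the line are balanced.
    private static readonly Regex Separator =
        new Regex(@",(?=(?:[^""]*""[^""]*"")*[^""]*$)");

    public static string[] Split(string line)
    {
        string[] parts = Separator.Split(line);
        for (int i = 0; i < parts.Length; i++)
        {
            // Remove the space after each comma, then the enclosing quotes.
            parts[i] = parts[i].Trim().Trim('"');
        }
        return parts;
    }

    public static void Main()
    {
        string str = "one field, two field, \"field val one, field val two, field val three\", three field";
        foreach (string part in Split(str))
        {
            Console.WriteLine(part);
        }
    }
}
```

Each comma's lookahead scans to the end of the line, so the cost grows quadratically with line length; for short lines like these that is not a concern. Trim('"') also removes quote characters that legitimately start or end an unquoted field.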

Jan 7 '06 #10
Hi,

I tried the regular expression (I am using VS.NET 2003).
This is how I used it:

Regex regEx = new Regex("(??<Vals>.*?),)+\"(?<Vals>.*?)\"(?:,(?<Vals>.*?))+$");
string[] text1 = regEx.Split(text);

I am getting the following run-time error:

"An unhandled exception of type 'System.ArgumentException' occurred in
system.dll

Additional information: parsing
"(??<Vals>.*?),)+"(?<Vals>.*?)"(?:,(?<Vals>.*?))+$ " - Unrecognized
grouping construct."

Can you please advise as to what I am doing wrong?

Thanks,
Ashoo
Jan 9 '06 #11
Thanks, Scott, for the method. It was very helpful.

Thanks,
Ashoo.

Jan 9 '06 #12
RSH
This is good stuff. What would the rest of the code look like? I tried
using a plain regex, but I couldn't get it to return the array.

Ron
Jan 9 '06 #13
RSH
This one removes all the unwanted characters as well:

public Form1()
{
    InitializeComponent();
    ArrayList al = ParseString(" \"M1, M2, M3, M4, \"S1, S2, S3 , S4,\"M5, M6,\" S5, S6, \" M7, M8, M9 \"");
    foreach (string aItem in al)
    {
        Console.WriteLine(aItem);
    }
}

// Splits on commas outside quotes; a closing quote also ends the current field.
// Fields are trimmed and blank fields are skipped.
// Needs: using System.Collections; using System.Text;
public ArrayList ParseString(string strInput)
{
    ArrayList aParsedString = new ArrayList();
    StringBuilder strTemp = new StringBuilder();
    bool bQuote = false;

    for (int i = 0; i < strInput.Length; i++)
    {
        char c = strInput[i];
        if (c == '"')
        {
            if (bQuote)
            {
                // Closing quote: the quoted text is a complete field
                AddField(aParsedString, strTemp);
            }
            bQuote = !bQuote; // quote characters are never copied
        }
        else if (c == ',' && !bQuote)
        {
            AddField(aParsedString, strTemp);
        }
        else
        {
            strTemp.Append(c);
        }
    }
    AddField(aParsedString, strTemp); // whatever is left after the last comma
    return aParsedString;
}

// Adds the trimmed field unless it is blank, then clears the buffer.
private static void AddField(ArrayList list, StringBuilder buffer)
{
    string field = buffer.ToString().Trim();
    if (field.Length > 0)
    {
        list.Add(field);
    }
    buffer.Length = 0;
}
Jan 9 '06 #14
Hi Ashoo,

Here is the implementation of the expression:
Regex reg = new Regex("(?:(?.*?),)+[\\s]\"(?.*?)\"(?:,[\\s](?.*?))+$");
MatchCollection MatchColl;
MatchColl = reg.Matches("one field, two field, \"field val one, field
val two, field val three\", three field");
string[] vals;
foreach (Match mat in MatchColl) {
vals = Array.CreateInstance(typeof(string),
mat.Groups["Vals"].Captures.Count);
int i = 0;
foreach (Capture cap in mat.Groups["Vals"].Captures) {
vals(i) = cap.Value();
i++;
}
}

I did this in VB.NET and converted it to C# for you, so check the syntax
in places. Anyway, the code runs.

Let me know if you have any questions about it.

Lucky

Jan 10 '06 #15
Also note that I've slightly modified the expression. You can set
RegexOptions such as IgnoreCase or Multiline to suit your requirements,
but in your case I think only IgnoreCase is needed.

Jan 10 '06 #16
RSH
Lucky,

I wish I could get this to work, but I'm getting two major errors:

parsing "(?.*?),)+[\s]"(?.*?)"(?:,[\s](?.*?))+$" - Unrecognized grouping
construct.

And the looping part is generating a whole series of errors.
Jan 10 '06 #17
"RSH" <wa*************@yahoo.com> wrote in news:ud8yT#fFGHA.3120@TK2MSFTNGP10.phx.gbl:
parsing "(?.*?),)+[\s]"(?.*?)"(?:,[\s](?.*?))+$" - Unrecognized grouping
construct.


Looks like the poster missed a '(' at the beginning. The second ')' is
unmatched. I haven't tested, but try just adding another '(' at the very
beginning.

-mdb
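mdb's diagnosis can be checked without running a match at all: the Regex constructor parses the pattern immediately and throws ArgumentException on a syntax error, so a small helper can report the problem up front. The sketch below (PatternCheck and Validate are hypothetical names) also writes Lucky's pattern as a verbatim string literal, where backslashes need no doubling and a quote is written as "", which leaves less room for copy-and-paste mistakes like the ones in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Returns null if the pattern compiles, otherwise the parser's message.
    // Regex reports syntax errors as ArgumentException when it is constructed.
    public static string Validate(string pattern)
    {
        try
        {
            new Regex(pattern);
            return null;
        }
        catch (ArgumentException ex)
        {
            return ex.Message;
        }
    }

    public static void Main()
    {
        // Verbatim string (@"..."): backslashes stay as-is and "" is one quote.
        string good = @"(?:(?<Vals>.*?),)+\s""(?<Vals>.*?)""(?:,\s(?<Vals>.*?))+$";
        // The same pattern with its first '(' removed; it no longer parses.
        string missingParen = good.Substring(1);

        Console.WriteLine(Validate(good) ?? "ok");
        Console.WriteLine(Validate(missingParen) ?? "ok");
    }
}
```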

Jan 10 '06 #18
Hi,
As I said, I wrote this in VB.NET and converted it for you, but the
converter missed some parts, so I've rewritten the code by hand in C#.
Here it is; try it and let me know.

Regex reg = new Regex("(?:(?<Vals>.*?),)+[\\s]\"(?<Vals>.*?)\"(?:,[\\s](?<Vals>.*?))+$");
MatchCollection MatchColl = reg.Matches("one field, two field, \"field val one, field val two, field val three\", three field");
string[] vals;
foreach (Match mat in MatchColl)
{
    vals = new string[mat.Groups["Vals"].Captures.Count];
    int i = 0;
    foreach (Capture cap in mat.Groups["Vals"].Captures)
    {
        // Trim, because the pattern captures the second field as " two field"
        vals[i] = cap.Value.Trim();
        i++;
    }
}

You need to import this namespace in order to use this code:

using System.Text.RegularExpressions;

Lucky
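Lucky's final version can be wrapped in a method that returns the array directly, which is what Ron asked for earlier. A minimal sketch, with QuotedFieldRegex and Parse as made-up names: it uses Match rather than Regex.Split (Split returns the pieces of text between matches, not the group's list of captures), trims each capture, and returns an empty array when the line does not fit the pattern. Note the pattern's built-in assumption: exactly one quoted field, with at least one plain field before it and one after it, and a comma plus one whitespace character on each side of it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedFieldRegex
{
    // Lucky's pattern as a verbatim string ("" is one literal quote).
    private static readonly Regex Pattern = new Regex(
        @"(?:(?<Vals>.*?),)+\s""(?<Vals>.*?)""(?:,\s(?<Vals>.*?))+$");

    public static string[] Parse(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success)
        {
            return new string[0]; // the line does not have the assumed shape
        }

        // Every repetition of the named group is kept in its Captures collection.
        CaptureCollection caps = m.Groups["Vals"].Captures;
        string[] vals = new string[caps.Count];
        for (int i = 0; i < caps.Count; i++)
        {
            vals[i] = caps[i].Value.Trim(); // e.g. " two field" -> "two field"
        }
        return vals;
    }

    public static void Main()
    {
        string str = "one field, two field, \"field val one, field val two, field val three\", three field";
        foreach (string v in Parse(str))
        {
            Console.WriteLine(v);
        }
    }
}
```

With more than one quoted field on a line the captures are no longer reliable, so for general input a quote-aware loop like Scott's is the safer choice.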

Jan 13 '06 #19
http://spaces.msn.com/members/staceyw/Blog/cns!1pnsZpX0fPvDxLKC6rAAhLsQ!352.entry

--
William Stacey [MVP]
Jan 13 '06 #20
