Bytes | Software Development & Data Engineering Community

Separate words

Hello

I want to split a string into separate words.
E.g. I have the string Client Server
It should be split as
Client
Server
I searched and found this:

string MainString = "String Manipulation";
string[] Split = MainString.Split(new Char[] { ' ' });
MessageBox.Show(Convert.ToString(Split[0]));
MessageBox.Show(Convert.ToString(Split[1]));
However, this does not work if there are two spaces between two words.
Moreover, I also want the string to be split when it contains several words.

Eg.
Client Server Connection
to be split into
Client
Server
Connection



Kindly do help
Regards
cmrhema
Jul 19 '07 #1
bharathreddy
Hi,

Please find the article below. I hope it helps.

Strings in .NET and C#
The System.String type (shorthand string in C#) is one of the most important types in .NET, and unfortunately it's much misunderstood. This article attempts to deal with some of the basics of the type.

What is a string?
A string is basically a sequence of characters. Each character is a Unicode character in the range U+0000 to U+FFFF (more on that later). The string type (I'll use the C# shorthand rather than putting System.String each time) has the following characteristics:

It is a reference type
It's a common misconception that string is a value type. That's because its immutability (see next point) makes it act sort of like a value type. It actually acts like a normal reference type. See my articles on parameter passing and memory for more details of the differences between value types and reference types.
It's immutable
You can never actually change the contents of a string, at least with safe code which doesn't use reflection. Because of this, you often end up changing the value of a string variable. For instance, the code s = s.Replace ("foo", "bar"); doesn't change the contents of the string that s originally referred to - it just sets the value of s to a new string, which is a copy of the old string but with "foo" replaced by "bar".
It can contain nulls
C programmers are used to strings being sequences of characters ending in '\0', the nul or null character. (I'll use "null" because that's what the Unicode code chart calls it in the detail; don't get it confused with the null keyword in C# - char is a value type, so can't be a null reference!) In .NET, strings can contain null characters with no problems at all as far as the string methods themselves are concerned. However, other classes (for instance many of the Windows Forms ones) may well think that the string finishes at the first null character - if your string ever appears to be truncated oddly, that could be the problem.
It overloads the == operator
When the == operator is used to compare two strings, the Equals method is called, which checks for the equality of the contents of the strings rather than the references themselves. For instance, "hello".Substring(0, 4)=="hell" is true, even though the references on the two sides of the operator are different (they refer to two different string objects, which both contain the same character sequence). Note that operator overloading only works here if both sides of the operator are string expressions at compile time - operators aren't applied polymorphically. If either side of the operator is of type object as far as the compiler is concerned, the normal == operator will be applied, and simple reference equality will be tested.
Interning
.NET has the concept of an "intern pool". It's basically just a set of strings, but it makes sure that every time you reference the same string literal, you get a reference to the same string. This is probably language-dependent, but it's certainly true in C# and VB.NET, and I'd be very surprised to see a language it didn't hold for, as IL makes it very easy to do (probably easier than failing to intern literals). As well as literals being automatically interned, you can intern strings manually with the Intern method, and check whether or not there is already an interned string with the same character sequence in the pool using the IsInterned method. This somewhat unintuitively returns a string rather than a boolean - if an equal string is in the pool, a reference to that string is returned. Otherwise, null is returned. Likewise, the Intern method returns a reference to an interned string - either the string you passed in if it was already in the pool, or a newly created interned string, or an equal string which was already in the pool.
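
The characteristics above can be checked with a small console program. This is only a sketch and not part of the original article: StringBasics and Build are made-up names, and Build exists purely to guarantee that two separate string objects are created at run time.

```csharp
using System;

public static class StringBasics
{
    // Builds a brand-new string object at run time (never a shared literal).
    public static string Build(params char[] chars)
    {
        return new string(chars);
    }

    public static void Main()
    {
        // Immutability: Replace returns a new string and leaves the old one alone.
        string original = "foo fighters";
        string replaced = original.Replace("foo", "bar");
        Console.WriteLine(original);  // foo fighters
        Console.WriteLine(replaced);  // bar fighters

        // Two distinct objects holding the same characters.
        string a = Build('h', 'e', 'l', 'l');
        string b = Build('h', 'e', 'l', 'l');

        Console.WriteLine(a == b);                  // True: contents are compared
        Console.WriteLine((object)a == (object)b);  // False: references are compared

        // Intern maps both objects to the single pooled instance.
        Console.WriteLine(ReferenceEquals(string.Intern(a), string.Intern(b)));  // True
    }
}
```

The casts to object are what switch == from the string overload to plain reference equality, as described under "It overloads the == operator".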

Literals
Literals are how you hard-code strings into C# programs. There are two types of string literals in C# - regular string literals and verbatim string literals. Regular string literals are similar to those in many other languages such as Java and C - they start and end with ", and various characters (in particular, " itself, \, and carriage return (CR) and line feed (LF)) need to be "escaped" to be represented in the string. Verbatim string literals allow pretty much anything within them, and end at the first " which isn't doubled. Even carriage returns and line feeds can appear in the literal! To obtain a " within the string itself, you need to write "". Verbatim string literals are distinguished by having an @ before the opening quote. Here are some examples of the two types of literal, and what they amount to:

Regular literal          Verbatim literal        Resulting string
"Hello"                  @"Hello"                Hello
"Backslash: \\"          @"Backslash: \"         Backslash: \
"Quote: \""              @"Quote: """            Quote: "
"CRLF:\r\nPost CRLF"     @"CRLF:                 CRLF:
                         Post CRLF"              Post CRLF

For other escape sequences, please see the relevant FAQ entry. Note that the difference is only for the compiler's sake. Once the string is in the compiled code, there's no such thing as a verbatim string literal vs a regular string literal.
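
To see that both literal forms produce the same string, something like the following can be run. It is a minimal sketch with a made-up class name (LiteralDemo). The multi-line row of the table is left out on purpose, because the line break inside a verbatim literal depends on how the source file itself is saved.

```csharp
using System;

public static class LiteralDemo
{
    public static readonly string RegularBackslash = "Backslash: \\";
    public static readonly string VerbatimBackslash = @"Backslash: \";
    public static readonly string RegularQuote = "Quote: \"";
    public static readonly string VerbatimQuote = @"Quote: """;

    public static void Main()
    {
        // Each pair is the same string once compiled.
        Console.WriteLine(RegularBackslash == VerbatimBackslash);  // True
        Console.WriteLine(RegularQuote == VerbatimQuote);          // True

        // The escape sequence \\ contributes a single character.
        Console.WriteLine(RegularBackslash.Length);                // 12
    }
}
```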

Strings and the debugger
Numerous people run into problems when inspecting strings in the debugger, both with VS.NET 2002 and VS.NET 2003. Ironically, the problems are often generated by the debugger trying to be helpful, and either displaying the string as a regular string literal with backslash-escaped characters in, or displaying it as a verbatim string literal complete with leading @. This leads to many questions asking how the @ can be removed, despite the fact that it's not really there in the first place - it's only how the debugger's showing it. Also, some versions of VS.NET will stop displaying the contents of the string at the first null character, and evaluate its Length property incorrectly, calculating the value itself instead of asking the managed code. Again, it then considers the string to finish at the first null character.

Given the confusion this has caused, I believe it's best to examine strings in a different way when debugging, at least if you think something odd is going on. I suggest using a method like the one below, which will print the contents of a string to the console in a safe way. Depending on what kind of application you're developing, you may want to write this information to a log file, to the debug or trace listeners, or pop it up in a message box.

static readonly string[] LowNames =
{
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US"
};

public static void DisplayString (string text)
{
    Console.WriteLine ("String length: {0}", text.Length);
    foreach (char c in text)
    {
        if (c < 32)
        {
            Console.WriteLine ("<{0}> U+{1:x4}", LowNames[c], (int)c);
        }
        else if (c > 127)
        {
            Console.WriteLine ("(Possibly non-printable) U+{0:x4}", (int)c);
        }
        else
        {
            Console.WriteLine ("{0} U+{1:x4}", c, (int)c);
        }
    }
}



Memory usage
In the current implementation at least, strings take up 20+(n/2)*4 bytes (rounding the value of n/2 down), where n is the number of characters in the string. The string type is unusual in that the size of the object itself varies. The only other classes which do this (as far as I know) are arrays. Essentially, a string is a character array in memory, plus the length of the array and the length of the string (in characters). The length of the array isn't always the same as the length in characters, as strings can be "over-allocated" within mscorlib.dll, to make building them up easier. (StringBuilder does this, for instance.) While strings are immutable to the outside world, code within mscorlib can change the contents, so StringBuilder creates a string with a larger internal character array than the current contents requires, then appends to that string until the character array is no longer big enough to cope, at which point it creates a new string with a larger array. The string length member also contains a flag in its top bit to say whether or not the string contains any non-ASCII characters. This allows for extra optimisation in some cases.
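
As a worked example of the formula above, here is a small sketch that just evaluates 20+(n/2)*4. StringMemory and EstimateBytes are made-up names, and the numbers reflect only the article's formula for the implementation it describes, not a measurement of any particular runtime.

```csharp
using System;

public static class StringMemory
{
    // The article's formula: 20 + (n/2)*4 bytes, with n/2 rounded down.
    // Integer division in C# already truncates for non-negative values.
    public static int EstimateBytes(int charCount)
    {
        if (charCount < 0) throw new ArgumentOutOfRangeException("charCount");
        return 20 + (charCount / 2) * 4;
    }

    public static void Main()
    {
        Console.WriteLine(EstimateBytes(0));   // 20
        Console.WriteLine(EstimateBytes(5));   // 28
        Console.WriteLine(EstimateBytes(10));  // 40
    }
}
```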

Although strings aren't null-terminated as far as the API is concerned, the character array is null-terminated, as this means it can be passed directly to unmanaged functions without any copying being involved, assuming the inter-op specifies that the string should be marshalled as Unicode.

Encoding
(If you don't know about character encodings and Unicode, please read my article on the subject first.)

As stated at the start of the article, strings are always in Unicode encoding. The idea of "a Big-5 string" or "a string in UTF-8 encoding" is a mistake (as far as .NET is concerned) and usually indicates a lack of understanding of either encodings or the way .NET handles strings. It's very important to understand this - treating a string as if it represented some valid text in a non-Unicode encoding is almost always a mistake.

Now, the Unicode coded character set (one of the flaws of Unicode is that the one term is used for various things, including a coded character set and a character encoding scheme) contains more than 65536 characters. This means that a single char (System.Char) cannot cover every character. This leads to the use of surrogates where characters above U+FFFF are represented in strings as two characters. Essentially, string uses the UTF-16 character encoding form. Most developers may well not need to know much about this, but it's worth at least being aware of it.
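
The surrogate behaviour is easy to see with a character above U+FFFF. The sketch below uses U+1D11E (musical symbol G clef) as an arbitrary example. SurrogateDemo and CountCodePoints are illustrative names, not framework APIs.

```csharp
using System;

public static class SurrogateDemo
{
    // Counts Unicode code points, treating each surrogate pair as one character.
    public static int CountCodePoints(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                i++; // skip the low surrogate of this pair
            }
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string clef = "\U0001D11E"; // U+1D11E MUSICAL SYMBOL G CLEF

        Console.WriteLine(clef.Length);                        // 2
        Console.WriteLine(char.IsHighSurrogate(clef[0]));      // True
        Console.WriteLine(char.IsLowSurrogate(clef[1]));       // True
        Console.WriteLine(CountCodePoints("a" + clef + "b"));  // 3
    }
}
```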

Culture and internationalization oddities
Some of the oddities of Unicode lead to oddities in string and character handling. Many of the string methods are culture-sensitive - in other words, what they do depends on the culture of the current thread. For example, what would you expect "i".ToUpper() to return? Most people would say "I", but in Turkish the correct answer is "İ" (Unicode U+0130, "Latin capital I with dot above"). To perform a culture-insensitive case change, you can use CultureInfo.InvariantCulture, and pass that to the overload of String.ToUpper which takes a CultureInfo.
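
A rough sketch of the two case conversions follows. CaseDemo, UpperIn and UpperInvariant are made-up names. The Turkish result assumes the runtime has Turkish culture data installed, which may not be true on a stripped-down system, so the code prints the code point instead of asserting it.

```csharp
using System;
using System.Globalization;

public static class CaseDemo
{
    // Upper-cases text using the rules of the named culture (e.g. "tr-TR").
    public static string UpperIn(string text, string cultureName)
    {
        return text.ToUpper(new CultureInfo(cultureName));
    }

    // Upper-cases text with culture-independent rules.
    public static string UpperInvariant(string text)
    {
        return text.ToUpper(CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(UpperInvariant("i"));  // I

        // With Turkish culture data available, the article's answer is U+0130.
        string turkish = UpperIn("i", "tr-TR");
        Console.WriteLine("U+{0:X4}", (int)turkish[0]);
    }
}
```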

There are further oddities when it comes to comparing, sorting, and finding the index of a substring. Some of these are culture-specific, and some aren't. For instance, in all cultures (as far as I can see), "lassen" and "la\u00dfen" (a "sharp S" or eszett being the Unicode-escaped character in there) are considered equal when CompareTo or Compare are used, but not when Equals is used. IndexOf will treat the eszett as the same as "ss", unless you use a CompareInfo.IndexOf and specify CompareOptions.Ordinal as the options to use.

Some other Unicode characters appear to be completely invisible to the normal IndexOf. Someone asked in the C# newsgroup why a search/replace method was going into an infinite loop. It was repeatedly using Replace to replace all double spaces with a single space, and checking whether or not it had finished by using IndexOf, so that multiple spaces would collapse to a single space. Unfortunately, this was failing due to a "strange" character in the original string between two spaces. IndexOf matched the double space, ignoring the extra character, but Replace didn't. I don't know which exact character was in the real data, but it can be easily reproduced using U+200C which is a zero-width non-joiner character (whatever that means, exactly!). Put one of those in the middle of the text you're searching in, and IndexOf will ignore it, but Replace won't. Again, to make the two methods behave the same, you can use CompareInfo.IndexOf and pass in CompareOptions.Ordinal. My guess is that there's a lot of code which would fail on "awkward" data like this. (I wouldn't for a moment claim that all my code is immune, either.)
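
One way to keep the two methods in step is to make the check ordinal as well. The sketch below does this with the IndexOf overload that takes a StringComparison, instead of the CompareInfo.IndexOf call named above. That is a substitution on my part, and SpaceCollapser and CollapseSpaces are made-up names.

```csharp
using System;

public static class SpaceCollapser
{
    // Collapses runs of spaces to a single space. Both the check and the
    // replacement work on raw characters, so they always agree on whether
    // a double space is present.
    public static string CollapseSpaces(string text)
    {
        while (text.IndexOf("  ", StringComparison.Ordinal) >= 0)
        {
            text = text.Replace("  ", " ");
        }
        return text;
    }

    public static void Main()
    {
        Console.WriteLine(CollapseSpaces("a    b"));  // a b

        // A zero-width non-joiner between two spaces: ordinally there is no
        // double space here, so the loop ends at once and the text is unchanged.
        string awkward = "a \u200c b";
        Console.WriteLine(CollapseSpaces(awkward) == awkward);  // True
    }
}
```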
Jul 19 '07 #2
RoninZA
That was a little long-winded, so you can try the following code...C# in VS2005, and I've replaced all space characters with + to improve readability:

private string[] splitSentence(string sentence)
{
    //First we're going to strip out any double spaces, and replace them with
    //single spaces. The ordinal comparison keeps IndexOf in step with Replace
    //(needs "using System;" for StringComparison)
    while (sentence.IndexOf("++", StringComparison.Ordinal) > -1)
        sentence = sentence.Replace("++", "+");

    //Now we're going to split the sentence into separate words, into a string
    //array, which we will be returning out of this function
    string[] words = sentence.Split('+');

    return words;
}
Hope this helps :)

PS: Remember to replace the '+' with spaces when you test the code!
Jul 19 '07 #3
cmrhema
Thanks, both of you.
It's resolved.
I did it the following way:

string MainString = lstClientSendData.Text;
string[] Split = MainString.Split(new Char[] { ' ' });
int s1 = Split.Length;
string final;

for (int i = 0; i < s1; i++)
{
    final = Split[i].ToString();
    if (final != "")
    {
        MessageBox.Show(final);
    }
}
Thanks once again
Jul 20 '07 #4
Plater
Better yet, the Split() function is overloaded to take a second parameter (a StringSplitOptions value) that says what to do with empty entries.
So if you split a string on ' ' (a space) with StringSplitOptions.RemoveEmptyEntries and you had the string
"I ran  fast" (2 spaces between ran and fast), the output array would be
"I"
"ran"
"fast"

instead of
"I"
"ran"
""
"fast"
Jul 20 '07 #5
cmrhema
Yes Plater, that's exactly what I wanted in my program.
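
For anyone finding this thread later, the two-parameter Split call can be sketched roughly like this. SplitDemo and SplitWords are just illustrative names. The framework call is String.Split(char[], StringSplitOptions) with StringSplitOptions.RemoveEmptyEntries.

```csharp
using System;

public static class SplitDemo
{
    // Splits on spaces and drops the empty entries that repeated spaces produce.
    public static string[] SplitWords(string text)
    {
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Two spaces between "Client" and "Server" still give three words.
        foreach (string word in SplitWords("Client  Server Connection"))
        {
            Console.WriteLine(word);
        }
        // Client
        // Server
        // Connection
    }
}
```

This handles any number of words and any number of spaces between them, so no manual check for empty strings is needed afterwards.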
Jul 21 '07 #6
