Parse a String and get unique values

Hi,

I'm looking for ideas for the most efficient way to accomplish this. I have a string representing names a person goes by.

"John Myers Joe John Myers"

I need to parse it so that I end up with an array of the UNIQUE strings that appear in the original string (in no particular order):

arr(0) = "John"
arr(1) = "Myers"
arr(2) = "Joe"

One way I can think of is to use a hashtable and add the words as keys, then iterate through the result. The keys should be unique, but I'm not sure that would be the best method for performance.
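
A minimal sketch of the hashtable idea, assuming the names are separated by spaces and that comparison is case-sensitive. `UniqueWords` is a hypothetical helper name, not something from a library. Each lookup and insert is constant time on average, so the whole pass should be roughly linear in the number of words.

```vbnet
Imports System
Imports System.Collections

Module UniqueNames
    ' Hypothetical helper: returns the distinct space-separated words of input.
    Function UniqueWords(ByVal input As String) As String()
        Dim seen As New Hashtable()
        Dim word As String
        For Each word In input.Split(" "c)
            ' Skip empty entries produced by consecutive spaces,
            ' and skip words that are already stored as keys.
            If word.Length > 0 AndAlso Not seen.ContainsKey(word) Then
                seen.Add(word, Nothing)
            End If
        Next
        ' Copy the keys out. Hashtable does not preserve insertion order,
        ' which is acceptable here because no particular order is required.
        Dim result(seen.Count - 1) As String
        seen.Keys.CopyTo(result, 0)
        Return result
    End Function

    Sub Main()
        Dim arr As String() = UniqueWords("John Myers Joe John Myers")
        Dim item As String
        For Each item In arr
            Console.WriteLine(item)
        Next
    End Sub
End Module
```

For the sample string this yields the three names John, Myers and Joe, in whatever order the hashtable enumerates them.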

Thanks for any ideas!
--Michael
Nov 21 '05 #1
Here is an example from the VS2003 help files for string
parsing. Note how you can list several delimiters;
you need to have delimiters in your string.

Sub strSplit()
    ' Four delimiters here: a space, comma, period, and colon
    Dim delimStr As String = " ,.:"
    Dim delimiter As Char() = delimStr.ToCharArray()
    Dim words As String = "one two,three:four. "
    Dim split As String() = Nothing
    Console.WriteLine("The delimiters are -{0}-", delimStr)
    Dim x As Integer
    For x = 1 To 5
        ' The second argument caps the number of substrings returned
        split = words.Split(delimiter, x)
        Console.WriteLine(ControlChars.Cr + "count = {0,2} ..............", x)
        Dim s As String
        For Each s In split
            Console.WriteLine("-{0}-", s)
        Next s
    Next x
End Sub
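
One thing to watch when splitting this way: adjacent or trailing delimiters produce empty strings in the result. A small sketch, using the same delimiters and sample string as the example above, that drops those empty entries before any further processing (the module name `SplitDemo` is just for illustration):

```vbnet
Imports System
Imports System.Collections

Module SplitDemo
    Sub Main()
        Dim delimiter As Char() = " ,.:".ToCharArray()
        Dim words As String = "one two,three:four. "
        Dim kept As New ArrayList()
        Dim s As String
        For Each s In words.Split(delimiter)
            ' The trailing ". " yields two empty strings; keep only real words.
            If s.Length > 0 Then
                kept.Add(s)
            End If
        Next
        Dim result As String() = DirectCast(kept.ToArray(GetType(String)), String())
        Console.WriteLine(String.Join("|", result))   ' prints one|two|three|four
    End Sub
End Module
```

On .NET 2.0 and later, the `Split` overload that takes `StringSplitOptions.RemoveEmptyEntries` should do the same filtering in a single call.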

Nov 21 '05 #2
something like:

(?=(\b\w+\b)|^)([^\1]*)\1+

could be used in a regex replace on the original string to give you unique
words, replacing with $2. From there you could just do a regex split on \b
(word boundary). Voilà, there's your unique array.
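
A caveat on the pattern: as far as I know, .NET (like most regex engines) does not treat `\1` inside a character class as a back-reference, so it may not behave as intended. As a simpler regex-based alternative, and not the replace-then-split approach described above, this sketch collects every `\w+` match and keeps only the first occurrence of each word. The module name is hypothetical, and the `ArrayList.Contains` check is a linear scan, which is only reasonable for short lists of names.

```vbnet
Imports System
Imports System.Collections
Imports System.Text.RegularExpressions

Module RegexUnique
    Sub Main()
        Dim input As String = "John Myers Joe John Myers"
        Dim unique As New ArrayList()
        Dim m As Match
        For Each m In Regex.Matches(input, "\w+")
            ' Keep only the first occurrence of each word (case-sensitive).
            If Not unique.Contains(m.Value) Then
                unique.Add(m.Value)
            End If
        Next
        Dim arr As String() = DirectCast(unique.ToArray(GetType(String)), String())
        Console.WriteLine(String.Join(", ", arr))   ' prints John, Myers, Joe
    End Sub
End Module
```

Unlike a hashtable, this keeps the words in their order of first appearance.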

hth,

steve
Nov 21 '05 #3
