AlgorithmNeeded

Hi,
I am looking for a routine (if someone has already written it), which can
convert the following string

"ILoveMyPCButIDon'tLoveWindows2000"

To

"I Love My PC But I Don't Love Windows 2000"

Didn't want to spend time if someone has already implemented it and wants to
share it.

If I don't find it I will write it.

Thanks.
Jun 1 '07 #1
So you want to determine where every word starts and prepend a space to
each word?

In your sample there is no clear way to separate words from each other.
You've got one acronym (PC), so the task is already quite complex. I regret
to inform you that in order to achieve this, you'll have to come up with
some artificially intelligent piece of software.
Jun 1 '07 #2
* Aamir Mahmood wrote, On 2-6-2007 0:23:
> I am looking for a routine (if someone has already written it), which can
> convert the following string
>
> "ILoveMyPCButIDon'tLoveWindows2000"
>
> To
>
> "I Love My PC But I Don't Love Windows 2000"

A regex should do the trick. But it's quite hard as ILove should be
split into I Love, but PCBut shouldn't be split into P C But. The
following expression comes pretty close:

"(?:(?<tag>[a-z])(?=[A-Z0-9])|(?<tag>[A-Z]*[A-Z])(?![a-z]))"

Just use Regex.Replace with the following replacement pattern:

"${tag} "

There might be a few examples which won't work, but most should.

// Regex.Replace returns the new string; input itself is not modified.
string result = Regex.Replace(
    input,
    "(?:(?<tag>[a-z])(?=[A-Z0-9])|(?<tag>[A-Z]*[A-Z])(?![a-z]))",
    "${tag} ",
    RegexOptions.None
);
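
For anyone who wants to try the expression, a self-contained sketch might look like the one below. The class and method names (CamelSplitDemo, SplitWords) are made up for illustration; the pattern and the replacement string are the ones posted above. The second call shows one of the inputs that does not come out cleanly: a run of capitals at the very end of the string also matches, so a trailing space is appended.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the posted expression; names are illustrative only.
public static class CamelSplitDemo
{
    // Pattern exactly as posted: .NET allows the group name "tag" to be reused
    // in both alternatives.
    private const string Pattern =
        "(?:(?<tag>[a-z])(?=[A-Z0-9])|(?<tag>[A-Z]*[A-Z])(?![a-z]))";

    public static string SplitWords(string input)
    {
        return Regex.Replace(input, Pattern, "${tag} ", RegexOptions.None);
    }

    public static void Main()
    {
        // Prints: I Love My PC But I Don't Love Windows 2000
        Console.WriteLine(SplitWords("ILoveMyPCButIDon'tLoveWindows2000"));

        // Prints: [My PC ]   (note the trailing space after "PC")
        Console.WriteLine("[" + SplitWords("MyPC") + "]");
    }
}
```

Calling Trim() on the result is probably the simplest way to get rid of the trailing space if it matters.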

Jesse
Jun 1 '07 #3
There is of course a dumb way to determine where an acronym is.

Loop through the characters; whenever you find a capital letter, a new
word starts.
While ( letter is capital && next_letter is capital too ) { add letter to
acronym }

There's a risk of having two adjacent acronyms - in this case, tough luck...

using System;
using System.Collections.Generic;
using System.Text;

namespace parsing
{
    class Program
    {
        static void Main( string[] args )
        {
            string strOriginal = "ILoveMyPCButIDon'tLoveWindows2000";
            string strResult = "";
            bool bReadingNumber = false;
            bool bReadingAcronym = false;

            for( int i = 0; i < strOriginal.Length; i++ )
            {
                if( strOriginal[i] >= 'A' && strOriginal[i] <= 'Z' )
                {
                    if( i > 0 )
                    {
                        // Next character is also a capital: we are inside an acronym.
                        if( i + 1 < strOriginal.Length &&
                            ( strOriginal[i + 1] >= 'A' && strOriginal[i + 1] <= 'Z' ) )
                        {
                            // Only the first letter of the acronym gets a space.
                            if( !bReadingAcronym )
                            {
                                bReadingAcronym = true;
                                strResult += ' ';
                            }
                        }
                        else
                        {
                            // Next character is not a capital (or this is the last
                            // character): treat this capital as the start of a word.
                            bReadingAcronym = false;
                            strResult += ' ';
                        }
                    }

                    bReadingNumber = false;
                }
                else if( strOriginal[i] >= '0' && strOriginal[i] <= '9' )
                {
                    // Only the first digit of a number gets a space.
                    if( !bReadingNumber )
                    {
                        if( i > 0 )
                            strResult += ' ';

                        bReadingNumber = true;
                    }
                }
                else
                    bReadingNumber = false;

                strResult += strOriginal[i];
            }

            Console.WriteLine( strResult );
        }
    }
}
Jun 1 '07 #4
If you look closely, every word starts with a capital letter.
So even if several capitals appear in a row, the last one should be the
beginning of the next word. All the preceding ones should make up either an
acronym or a one-letter word (as in "ILove").
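
That rule can be written down fairly directly with lookarounds. The following is only a sketch under a few assumptions: ASCII letters only, a third boundary between a letter and a digit added so that "Windows2000" splits as in the sample, and invented names (LastCapitalRuleDemo, SplitWords, Boundary). It is a different expression from the one posted earlier in the thread, not a correction of it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative only.
public static class LastCapitalRuleDemo
{
    // Zero-width positions where a space is inserted:
    //   1. between a lowercase letter and a capital         (e|M in "LoveMy")
    //   2. before the last capital of a run, when that
    //      capital is followed by a lowercase letter        (C|B in "PCBut")
    //   3. between a letter and a digit                     (s|2 in "Windows2000")
    private const string Boundary =
        "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|(?<=[A-Za-z])(?=[0-9])";

    public static string SplitWords(string input)
    {
        // Every match is empty, so nothing is consumed; a space is simply
        // inserted at each boundary position.
        return Regex.Replace(input, Boundary, " ");
    }

    public static void Main()
    {
        // Prints: I Love My PC But I Don't Love Windows 2000
        Console.WriteLine(SplitWords("ILoveMyPCButIDon'tLoveWindows2000"));

        // Prints: My PC
        Console.WriteLine(SplitWords("MyPC"));
    }
}
```

Because only positions are matched, an acronym at the end of the input does not pick up a trailing space. Two adjacent acronyms still cannot be told apart, of course.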
Jun 1 '07 #5
Nice, I haven't thought of regex's...

Can you come up with an internationalized one? :)
Jun 1 '07 #6
* Ashot Geodakov wrote, On 2-6-2007 1:36:
> Nice, I haven't thought of regex's...
>
> Can you come up with an internationalized one? :)

Replace [A-Z] with \p{Lu}
Replace [a-z] with \p{Ll}
Replace [0-9] with \p{N}

And you should be well under way. This certainly isn't going to cover
all possibilities (names like McDonalds and O'Neill, for example), but the
previous example didn't work well on those either.
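
Applied literally to the earlier expression, those three substitutions might give something like the sketch below. The class and method names are invented for illustration, and the sample input is just an assumed test string ("ÉcoleÀParis2007", written with \u escapes so the source file encoding does not matter). The character class [A-Z0-9] becomes [\p{Lu}\p{N}].

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper; names are illustrative only.
public static class UnicodeSplitDemo
{
    // \p{Lu} = uppercase letter, \p{Ll} = lowercase letter, \p{N} = any number.
    private const string Pattern =
        @"(?:(?<tag>\p{Ll})(?=[\p{Lu}\p{N}])|(?<tag>\p{Lu}*\p{Lu})(?!\p{Ll}))";

    public static string SplitWords(string input)
    {
        return Regex.Replace(input, Pattern, "${tag} ");
    }

    public static void Main()
    {
        // Input is "ÉcoleÀParis2007"; the returned string is "École À Paris 2007".
        // How it is displayed depends on the console's output encoding.
        Console.WriteLine(SplitWords("\u00C9cole\u00C0Paris2007"));
    }
}
```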

Jesse

Jun 2 '07 #7
I'm curious. Why do you have a string like this? Are you creating this
string for some reason? I'm sure that if we get a good understanding of why
your data is like this, we may be able to help you further in getting around
it. Otherwise, the previous replies are close to, if not exactly, what you'll
have to do...

HTH,
Mythran

Jun 4 '07 #8
