
Changing case in a sentence to Capitalize Case.

Hello,

I am a JavaScript newbie and I'm stuck at one place.

I have a requirement where I will get a sentence in a variable

example

var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps

So for above example the output should be

This Is A Sentence

I have found some scripts that can change case to uppercase or
lowercase but I'm not able to come up with a solution for this.

One more thing, there is no limit on the number of words that I'll get
in the sentence. I may get one word or even ten words.

I'm looking for a solution that will work for all scenarios,

Regards,
Rayne
Sep 22 '08 #1
12 Replies


On 2008-09-22 18:19, ja***********@gmail.com wrote:
var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps
v1 = v1.replace(/\b(\w)/g, function (s, c) {
return c.toUpperCase();
});

BTW, your question looks like a typical homework assignment. If that's
the case: letting other people solve your beginner assignments is not a
clever idea if you want to learn the language or have to pass
exams later. If this wasn't homework, please disregard.
- Conrad
Sep 22 '08 #2

ja***********@gmail.com wrote:
I have a requirement where I will get a sentence in a variable

example

var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps
You mean _letter_; *not* alphabet, which is a set of letters.

<http://en.wikipedia.org/wiki/Alphabet>
So for above example the output should be

This Is A Sentence
v1 = v1.replace(/(^|\s)([a-z])/g,
function(m, p1, p2) {
return p1 + p2.toUpperCase();
});

You may adapt the character class to fit your needs.
PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16
Sep 22 '08 #3

<ja***********@gmail.com> wrote in message
news:b2**********************************@z6g2000pre.googlegroups.com...
Hello,

I am a javascript newbie and I'm stick at one place.

I have a requirement where I will get a sentence in a variable

example

var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps

So for above example the output should be

This Is A Sentence
a) Regular Expressions, but I don't know how to use them.

b) 1:Split string into words; 2:capitalize first letters; 3:concatenate
words into string.
1: look up what var wordarray = []; wordarray = v1.split(' ') will do
2: for all k: wordarray[k] = wordarray[k].charAt(0).toUpperCase() +
wordarray[k].substr(1);
3: check out the join function: output = wordarray.join(' ');
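
The three steps above can be put together roughly as follows. This is only a minimal sketch: the function name capitalizeWords is made up for illustration, and it assumes the words are separated by single spaces.

```javascript
// Hypothetical helper; the name does not come from the thread.
function capitalizeWords(sentence) {
  // 1: split the string into words at each space
  var words = sentence.split(' ');
  // 2: upper-case the first character of every word
  for (var k = 0; k < words.length; k++) {
    words[k] = words[k].charAt(0).toUpperCase() + words[k].slice(1);
  }
  // 3: concatenate the words back into one string
  return words.join(' ');
}

var output = capitalizeWords('This is a sentence'); // "This Is A Sentence"
```

An empty word (for example from two spaces in a row) is harmless here, because charAt(0) and slice(1) both return an empty string for it.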

Tom
Sep 22 '08 #4

Conrad Lender wrote:
On 2008-09-22 18:19, ja***********@gmail.com wrote:
>var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps

v1 = v1.replace(/\b(\w)/g, function (s, c) {
return c.toUpperCase();
});
\b matches a word boundary; it does not work with non-ASCII letters.
\w matches ASCII letters, decimal digits and `_'.
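
A quick way to see the limitation, as a sketch: because ç is not matched by \w, \b reports a word boundary right after it, so the ASCII letter that follows is treated as the start of a word. The helper name upper is made up for this example.

```javascript
// Hypothetical helper used as the replace callback.
function upper(c) {
  return c.toUpperCase();
}

// Plain ASCII input behaves as expected.
var ascii = 'this is a sentence'.replace(/\b\w/g, upper); // "This Is A Sentence"

// With a leading non-ASCII letter the "boundary" lands inside the word.
var french = 'ça va'.replace(/\b\w/g, upper); // "çA Va"
```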
BTW, your question looks like a typical homework assignment. If that's
the case: letting other people solve your beginner assignments is not a
clever idea if you want to learn the language or have to pass
exams later. If this wasn't homework, please disregard.
Full ACK.
PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm>
Sep 22 '08 #5

SAM
ja***********@gmail.com wrote:
var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps
function capitalize( t ) {
t = t.split(' ');
for(var i=0; i<t.length; i++) {
t[i] = t[i].charAt(0).toUpperCase()+t[i].substring(1);
}
return t.join(' ');
}

alert(capitalize(v1));
I'm looking for a solution that will work for all scenarios,
alert(capitalize('ask google for charAt, join, '+
'substring and split in javaScript'));


HTML :
======
<a href="javascript:void(document.getElementById('here').innerHTML = v1)">
capitalize the variable v1</a>

<p id="here" style="text-transform: capitalize"></p>
Sep 22 '08 #6

On 2008-09-22 20:00, Thomas 'PointedEars' Lahn wrote:
Conrad Lender wrote:
..
>v1 = v1.replace(/\b(\w)/g, function (s, c) {
return c.toUpperCase();
});
\b matches a word boundary; it does not work with non-ASCII letters.
\w matches ASCII letters, decimal digits and `_'.
Yes, I was assuming simple English sentences, where \b will usually work
(and it doesn't matter when toUpperCase is applied to digits or the
underscore). In this case, my earlier example could even be simplified to:

v1 = v1.replace(/\b\w/g, function (c) {
return c.toUpperCase();
});

Your character class approach (in your other post) would work if the
character set is known and rather small. Latin1, for example, could use
[a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüý]. But if we're assuming a random
international setting, this is going to be a lot harder. Creating a
character class that would work on the complete Unicode set would be
almost impossible, and also error prone. It would be simpler to define
custom "word boundary" characters, and just let JavaScript uppercase
everything following them:

var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

v1 = v1.replace(rex, function (s, g1, g2) {
return g1 + g2.toUpperCase();
});

wBound would still have to be adjusted as required to include, for
example, different types of quotes, or the Japanese/Chinese full stop
character (。).
- Conrad
Sep 22 '08 #7

Conrad Lender wrote:
On 2008-09-22 20:00, Thomas 'PointedEars' Lahn wrote:
>Conrad Lender wrote:
>>v1 = v1.replace(/\b(\w)/g, function (s, c) { return c.toUpperCase();
});
\b matches a word boundary; it does not work with non-ASCII letters.
\w matches ASCII letters, decimal digits and `_'.

Yes, I was assuming simple English sentences, where \b will usually work
(and it doesn't matter when toUpperCase is applied to digits or the
underscore).
It matters because it would be needlessly inefficient.
In this case, my earlier example could even be simplified to:

v1 = v1.replace(/\b\w/g, function (c) {
return c.toUpperCase();
});
Correct, \b would match the empty string before the \w then.
Your character class approach (in your other post) would work if the
character set is known and rather small. Latin1, for example, could use
[a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüý]. But if we're assuming a random
international setting, this is going to be a lot harder.
Harder, granted.
Creating a character class that would work on the complete Unicode set
would be almost impossible, and also error prone.
I do not think any of the above would apply, though. ISTM you are
unaware of the fact that, while the Unicode Standard (4.0) already defines a
finite character set of which ECMAScript implementations only support the
Basic Multilingual Plane (U+0000 to U+FFFF), the number of characters that
can be subject to case switching is even more limited, and that character
ranges can be used in regular expressions, whereas their boundaries can also
be written as Unicode escape sequences.

All it takes is a bit of research on the defined Unicode character ranges
and the scripts (as in writing) they provide support for. Take some Latin
character ranges for example:

/[a-z\u00c0-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f]/i

(This can be optimized, of course, but it helps [you] to get the picture.)

See also: <http://www.unicode.org/charts/>
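
As a side note that postdates this thread: engines implementing ECMAScript 2018 support Unicode property escapes together with the `u` flag, so the letter ranges do not have to be collected by hand. The following is a hedged sketch under that assumption, not something available to the browsers discussed here; the function name capitalizeUnicode is hypothetical, and the input is assumed to be in precomposed (NFC) form.

```javascript
// Hypothetical helper; requires an engine with \p{...} support (ES2018+).
function capitalizeUnicode(text) {
  // (^|[^\p{L}\p{N}]) : start of string, or a character that is
  //                     neither a letter nor a digit
  // (\p{L})           : any Unicode letter
  return text.replace(/(^|[^\p{L}\p{N}])(\p{L})/gu, function (m, before, letter) {
    return before + letter.toUpperCase();
  });
}

var result = capitalizeUnicode('l\'éléphant ça "trompe" énormément');
// result is: L'Éléphant Ça "Trompe" Énormément
```

Note that this treats the apostrophe as a word separator, so a contraction such as "i'm" would come out as "I'M".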
It would be simpler to define custom "word boundary" characters, and just
let JavaScript uppercase everything following them:
Would it? ISTM the punctuation of languages is a lot more complicated than
their letters; take Spanish, for example. But then ISTM capitalizing titles
is not something that is common in other languages than English, and some
even consider it deprecated there already. However, for uniformity one
might be inclined to apply this formatting to non-English (song) titles as
well; I have seen that before.
var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

v1 = v1.replace(rex, function (s, g1, g2) {
return g1 + g2.toUpperCase();
});
That does not make much sense, though, since with the exception of white
space, and single and double quote, none of those (punctuation) characters
is likely to occur directly before something that can be considered a word
character. In fact, it is customary to have (white) space between those
characters and the word character to be uppercased, so there would never be
a match then.
wBound would still have to be adjusted as required to include, for
example, different types of quotes, or the Japanese/Chinese full stop
character (。).
I am afraid it would have to be rewritten entirely anyway.
PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
Sep 22 '08 #8

SAM
Thomas 'PointedEars' Lahn wrote:
Conrad Lender wrote:
>var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

v1 = v1.replace(rex, function (s, g1, g2) {
return g1 + g2.toUpperCase();
});

That does not make much sense, though, since with the exception of white
space, and single and double quote, none of those (punctuation) characters
is likely to occur directly before something that can be considered a word
character.
Huh? At least ' and " can be found there
(and the others too, if there is a typo).
In fact, it is customary to have (white) space between those
I see no white space after ' or " in following :
l'éléphant ça "trompe" énormément
characters and the word character to be uppercased, so there would never be
a match then.
In any case, we never need to capitalize every word of a sentence. In French
that is a spelling mistake, if not a grammar mistake.
>wBound would still have to be adjusted as required to include, for
example, different types of quotes, or the Japanese/Chinese full stop
character (。).

I am afraid it would have to be rewritten entirely anyway.
and your solution doesn't work for me

'l\'éléphant ça "trompe" énormément'.replace(/(^|\s)([a-z])/g,
function(m, p1, p2) {
return p1 + p2.toUpperCase();
});

result :
L'éléphant ça "trompe" énormément

While Conrad's code gives :
L'Éléphant Ça "Trompe" Énormément
if the charset is e.g. Latin 1 (and not utf-8)

--
sm
Sep 22 '08 #9

On 2008-09-23 00:20, Thomas 'PointedEars' Lahn wrote:
>Creating a character class that would work on the complete Unicode set
would be almost impossible, and also error prone.
I do not think any of the above would apply, though. ISTM you are
unaware of the fact that, while the Unicode Standard (4.0) already defines a
finite character set of which ECMAScript implementations only support the
Basic Multilingual Plane (U+0000 to U+FFFF), the number of characters that
can be subject to case switching is even more limited, and that character
ranges can be used in regular expressions, whereas their boundaries can also
be written as Unicode escape sequences.
You're mistaken, I'm quite familiar with Unicode and character set
support in JavaScript.

The BMP has a theoretical limit of 2^16 characters (although not all of
the code points are currently assigned, granted), which gives us an
upper limit of 65536 characters to consider. Yes, it's a finite set.
Note that I said "almost impossible", meaning not practically feasible,
because...
All it takes is a bit of research on the defined Unicode character ranges
and the scripts (as in writing) they provide support for.
... this is still quite an undertaking, and I wouldn't presume to
understand enough about, say, Mongolian or Burmese to decide which of
the characters could/should be converted to uppercase. There are over a
hundred scripts in the BMP, not including the symbol collections.
Take some Latin character ranges for example:

/[a-z\u00c0-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f]/i

(This can be optimized, of course, but it helps [you] to get the picture.)
Again, I think I've already got a pretty good picture, but thanks for
the effort. Just to illustrate the pitfalls of your approach - out of
only 591 characters (basic latin to latin extended-b), you have

- included all the uppercase characters like À (U+00C0)
- included the × character (U+00D7) which is a symbol
- used the "i" modifier, which is redundant because you have already
listed the exact code points that you want included
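
The second point can be checked directly. A small sketch using the character class quoted above (the variable names are made up for the example):

```javascript
// The character class from the quoted post, unchanged.
var latin = /[a-z\u00c0-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f]/i;

// U+00D7 (multiplication sign) lies inside \u00c0-\u00f6, so it matches.
var matchesTimes = latin.test('\u00d7'); // true

// U+00F7 (division sign) was left out of the ranges, so it does not.
var matchesDivide = latin.test('\u00f7'); // false

// The multiplication sign has no uppercase form; toUpperCase leaves it alone.
var timesChanges = '\u00d7'.toUpperCase() !== '\u00d7'; // false
```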

That's for a group of characters that we're largely familiar with. Now,
to find out which of the characters in the more exotic groups are
lowercase letters, that would take more than just "a bit of research".

Perhaps somebody else has already collected all the interesting
character ranges, and we could use that information in our character
class, but why should we, if JavaScript's toUpperCase() already does the
right thing with all types of characters? Hence:
>It would be simpler to define custom "word boundary" characters, and just
let JavaScript uppercase everything following them:
Would it? ISTM the punctuation of languages is a lot more complicated than
their letters; take Spanish, for example. But then ISTM capitalizing titles
is not something that is common in other languages than English, and some
even consider it deprecated there already. However, for uniformity one
might be inclined to apply this formatting to non-English (song) titles as
well; I have seen that before.
That's beside the point. For one thing, to a lesser extent, capitalising
the first letters in titles is also common in some Germanic languages,
in Italian, etc. More importantly, deciding which languages do or do not
use capitalisation, or have deprecated it, or are only using it for
certain words, is beyond what we can do in a simple script function;
that's up to the person requesting the functionality.
>var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

v1 = v1.replace(rex, function (s, g1, g2) {
return g1 + g2.toUpperCase();
});
That does not make much sense, though, since with the exception of white
space, and single and double quote, none of those (punctuation) characters
is likely to occur directly before something that can be considered a word
character. In fact, it is customary to have (white) space between those
characters and the word character to be uppercased, so there would never be
a match then.
"is likely"? "it is customary"? That's wishful thinking, just look at
some of the postings in this group (SCNR). You often see people omitting
the space after full stops or commas, for example:

"this is a sentence!and so is this,see?"

It may not be pretty, but there is no doubt that "and" and "see" are
both separate words, and thus should be capitalised.
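
Applied to that sentence, the wBound expression quoted above gives the following. This is just a sketch for illustration; v2 is a made-up variable name.

```javascript
// Same boundary list and expression as quoted above.
var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

var v2 = 'this is a sentence!and so is this,see?'.replace(rex, function (s, g1, g2) {
  // g1 is the boundary character (or empty at the start), g2 the first letter
  return g1 + g2.toUpperCase();
});
// v2 is now "This Is A Sentence!And So Is This,See?"
```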
I am afraid it would have to be rewritten entirely anyway.
I disagree, but YMMV.
- Conrad
Sep 23 '08 #10

On 2008-09-23 01:30, SAM wrote:
In any case, we never need to capitalize every word of a sentence. In French
that is a spelling mistake, if not a grammar mistake.
Neither my example nor Thomas's were meant as complete implementations.
You'd need quite a bit more logic for that, and knowledge about the
input languages. You'd need at least a list of exceptions for words like
"et" and "de" in French, or "of" and "in" in English, and that would
make the RegExp approach impossible, or at least unmaintainable.
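
For what it's worth, an exception list is easier to bolt onto the split/join approach than onto a regular expression. A rough sketch, assuming single-space separators; the names SMALL_WORDS and titleCase, and the choice of words in the list, are assumptions made up for this example:

```javascript
// Hypothetical exception list; a real one depends on the language and style guide.
var SMALL_WORDS = { of: true, 'in': true, a: true, the: true, and: true };

function titleCase(sentence) {
  var words = sentence.split(' ');
  for (var i = 0; i < words.length; i++) {
    var w = words[i];
    // Leave listed words alone, except when they start the sentence.
    if (i > 0 && SMALL_WORDS.hasOwnProperty(w.toLowerCase())) {
      continue;
    }
    words[i] = w.charAt(0).toUpperCase() + w.substring(1);
  }
  return words.join(' ');
}

var title = titleCase('the lord of the rings'); // "The Lord of the Rings"
```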

There are other, more important shortcomings, too: groups like "I'm" or
"we're" would be incorrectly capitalised. All in all, a perfect solution
isn't going to be posted here (at least not by me).
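
The contraction problem is easy to reproduce. In this sketch (variable names made up), the \b version treats the apostrophe as a boundary, while the whitespace-anchored version from earlier in the thread does not:

```javascript
var input = "we're sure i'm right";

// \b sees a boundary after the apostrophe and upper-cases the next letter.
var withBoundary = input.replace(/\b\w/g, function (c) {
  return c.toUpperCase();
});
// withBoundary is "We'Re Sure I'M Right"

// Anchoring on start-of-string or whitespace leaves contractions intact.
var withSpace = input.replace(/(^|\s)(\w)/g, function (m, p1, p2) {
  return p1 + p2.toUpperCase();
});
// withSpace is "We're Sure I'm Right"
```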
and your solution doesn't work for me

'l\'éléphant ça "trompe" énormément'.replace(/(^|\s)([a-z])/g,
function(m, p1, p2) {
return p1 + p2.toUpperCase();
});

result :
L'éléphant ça "trompe" énormément
To be fair, he did mention that the character class should be adapted.
If you use [a-z\u00DF-\u00FF] instead of [a-z], "ça" and "énormément"
will be capitalised as well.
- Conrad
Sep 23 '08 #11

Conrad Lender wrote:
On 2008-09-23 00:20, Thomas 'PointedEars' Lahn wrote:
[...]
>All it takes is a bit of research on the defined Unicode character ranges
and the scripts (as in writing) they provide support for.

... this is still quite an undertaking, and I wouldn't presume to
understand enough about, say, Mongolian or Burmese to decide which of
the characters could/should be converted to uppercase. There are over a
hundred scripts in the BMP, not including the symbol collections.
ISTM few writing systems have a concept of letter case, see below. (CMIIW)
>Take some Latin character ranges for example:

/[a-z\u00c0-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f]/i

(This can be optimized, of course, but it helps [you] to get the picture.)

Again, I think I've already got a pretty good picture, but thanks for
the effort. Just to illustrate the pitfalls of your approach - out of
only 591 characters (basic latin to latin extended-b), you have

- included all the uppercase characters like À (U+00C0)
That was done on purpose, though, because although it should,
case-insensitive matching might not recognize the proper uppercase character
for a non-ASCII lowercase letter and vice-versa.
- included the × character (U+00D7) which is a symbol
ACK, I overlooked that one.
- used the "i" modifier, which is redundant because you have already
listed the exact code points that you want included
It is *not* redundant because it would definitely be supported for /[a-z]/.
That's for a group of characters that we're largely familiar with. Now,
to find out which of the characters in the more exotic groups are
lowercase letters, that would take more than just "a bit of research".

Perhaps somebody else has already collected all the interesting
character ranges, and we could use that information in our character
class,
<http://www.unicode.org/Public/UNIDATA/CaseFolding.txt> looks promising.
but why should we, if JavaScript's toUpperCase() already does the
right thing with all types of characters?
Iff it does. And that would still not mean anything for other implementations.
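
A few spot checks give a feel for what toUpperCase() does. This is only a sketch of the behaviour in current ECMAScript engines; as noted above, other implementations may differ.

```javascript
// Non-ASCII letters with a single-character uppercase form are mapped.
var eAcute = '\u00e9'.toUpperCase(); // "\u00c9" (É)

// Characters without letter case come back unchanged.
var digit = '7'.toUpperCase(); // "7"

// Some letters expand: the German sharp s becomes two characters.
var sharpS = '\u00df'.toUpperCase(); // "SS"
```

The last case means that upper-casing the first character of a word can change the length of the string.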
>>It would be simpler to define custom "word boundary" characters, and just
let JavaScript uppercase everything following them:
Would it? ISTM the punctuation of languages is a lot more complicated than
their letters; take Spanish, for example. But then ISTM capitalizing titles
is not something that is common in other languages than English, and some
even consider it deprecated there already. However, for uniformity one
might be inclined to apply this formatting to non-English (song) titles as
well; I have seen that before.

That's beside the point. For one thing, to a lesser extent, capitalising
the first letters in titles is also common in some Germanic languages,
in Italian, etc. More importantly, deciding which languages do or do not
use capitalisation, or have deprecated it, or are only using it for
certain words, is beyond what we can do in a simple script function;
that's up to the person requesting the functionality.
My point was that ISTM punctuation is more difficult to handle than letters.
>>var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

v1 = v1.replace(rex, function (s, g1, g2) {
return g1 + g2.toUpperCase();
});
That does not make much sense, though, since with the exception of white
space, and single and double quote, none of those (punctuation) characters
is likely to occur directly before something that can be considered a word
character. In fact, it is customary to have (white) space between those
characters and the word character to be uppercased, so there would never be
a match then.

"is likely"? "it is customary"? That's wishful thinking, just look at
some of the postings in this group (SCNR). You often see people omitting
the space after full stops or commas, for example:

"this is a sentence!and so is this,see?"

It may not be pretty, but there is no doubt that "and" and "see" are
both separate words, and thus should be capitalised.
Non sequitur; one should only capitalize properly written text. YMMV.
PointedEars
Sep 23 '08 #12

SAM
Conrad Lender wrote:
On 2008-09-23 01:30, SAM wrote:
>In any case, we never need to capitalize every word of a sentence. In French
that is a spelling mistake, if not a grammar mistake.

Neither my example nor Thomas's were meant as complete implementations.
Yes. It was just a manner of speaking.
All in all, a perfect solution isn't going to be posted here
(at least not by me).
>and your solution doesn't work for me

'l\'éléphant ça "trompe" énormément'.replace(/(^|\s)([a-z])/g,
function(m, p1, p2) {
return p1 + p2.toUpperCase();
});

result :
L'éléphant ça "trompe" énormément

To be fair, he did mention that the character class should be adapted.
And that should not be necessary with your solution
(except with charset utf-8 in my Fx in quirksmode, and I do not really
understand why)

If you use [a-z\u00DF-\u00FF] instead of [a-z], "ça" and "énormément"
will be capitalised as well.
a little better :
L'éléphant Ça "trompe" Énormément
-----^------------^

Yes, but those \u... or \x... escapes do not look very clean.
Can't JS find the corresponding Unicode characters by itself?

--
sm
Sep 23 '08 #13
