Bytes | Software Development & Data Engineering Community

Splitting string into word array - regular expression

Hi,
What regex do I need to split a string, using javascript's split
method, into words-array?
Splitting according to whitespace only is not enough; I need to split
according to whitespace, comma, hyphen, etc...
Is there a regex that does the trick?
Thanks, Anat.

May 25 '06 #1
Anat wrote:
> Hi,
> What regex do I need to split a string, using javascript's split
> method, into words-array?
Of course, that depends on how you define a word.

> Splitting according to whitespace only is not enough; I need to split
> according to whitespace, comma, hyphen, etc...
> Is there a regex that does the trick?


To split at one or more non-word characters (basically any character
other than a letter or number):

var words = string.split(/\W+/);
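A quick way to check what \W treats as a separator, as a sketch (console.log stands in for window.alert so it also runs outside a browser; splitWords is a hypothetical helper name). In JavaScript regular expressions \w means exactly [A-Za-z0-9_], so the underscore counts as part of a word, non-ASCII letters such as "é" do not, and punctuation at either end of the string leaves an empty string in the result:

```javascript
// \W+ matches one or more characters outside [A-Za-z0-9_]
function splitWords(string) {
  return string.split(/\W+/);
}

// Hyphen and comma split; the underscore does not
console.log(splitWords("Hello world, well-known snake_case"));
// ["Hello", "world", "well", "known", "snake_case"]

// The trailing "!" leaves an empty string at the end
console.log(splitWords("Hello, world!"));
// ["Hello", "world", ""]

// "é" is not in \w, so the word is cut
console.log(splitWords("café au lait"));
// ["caf", "au", "lait"]
```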

--
Rob
Group FAQ: <URL:http://www.jibbering.com/faq/>
May 25 '06 #2
RobG wrote:
> Anat wrote:
>> Hi,
>> What regex do I need to split a string, using javascript's split
>> method, into words-array?


> Of course, that depends on how you define a word.

>> Splitting according to whitespace only is not enough; I need to split
>> according to whitespace, comma, hyphen, etc...
>> Is there a regex that does the trick?


> To split at one or more non-word characters (basically any character
> other than a letter or number):

> var words = string.split(/\W+/);


Not all browsers will tolerate regular expressions in split(); it may be
safer to replace all non-word characters with a space and then split on that:

var newString = string.replace(/\W+/g,' ');
var words = newString.split(' ');
For the OP to consider...
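The replace-then-split workaround can be sketched as follows. The empty-string filter is an addition, not part of the suggestion above, and splitViaReplace is a hypothetical name. Because the same \W+ pattern decides what a separator is, the words it finds are the same as those of split(/\W+/); the empty strings at the ends need extra handling either way:

```javascript
function splitViaReplace(string) {
  // Collapse every run of non-word characters into a single space
  var newString = string.replace(/\W+/g, " ");
  // Splitting on " " leaves "" where a separator run was at either end,
  // so drop those empty entries
  return newString.split(" ").filter(function (word) {
    return word !== "";
  });
}

console.log(splitViaReplace("...Hello, world!"));
// ["Hello", "world"]
```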

--
Zif
May 25 '06 #3
Thanks guys,
But actually, when I come to think of it, it's not a good solution for
what I'm trying to do.
I want to take a given string, and make certain words hyperlinks.
For example:
"Hello world, this is a wonderful day!"
I'd like the words world, wonderful and day to be hyperlinks, therefore
after my manipulation it should be:
"Hello <a href=...>world</a>, this is a <a href=...>wonderful</a> <a
href=...>day</a>!"
Using the split method is no good, because the whitespace, commas and
other punctuation marks are gone.
Instead of displaying
"Hello <a href=...>world</a>, this is a <a href=...>wonderful</a> <a
href=...>day</a>!"
I will display
"Hello <a href=...>world</a> this is a <a href=...>wonderful</a> <a
href=...>day</a>"
(note that the comma and exclamation mark are gone).
Any ideas on how I can locate words, replace them but not lose
punctuation marks on the way?
Thanks again!!!
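One way to keep the punctuation, sketched under the assumption of a split() that follows ECMAScript Edition 3: when the separator regular expression contains a capturing group, the matched separators are also placed in the result array, so individual words can be replaced and everything glued back together with join(""). Some older implementations, notably JScript in Internet Explorer before version 9, drop the captured separators, and \W still cuts non-ASCII words apart. The word list, the linkify name and the href target are hypothetical placeholders:

```javascript
// Hypothetical set of words to turn into links
var linkWords = { world: true, wonderful: true, day: true };

function linkify(text) {
  // The parentheses make split() keep each separator run in the array,
  // e.g. ["Hello", " ", "world", ", ", ...]
  var parts = text.split(/(\W+)/);
  for (var i = 0; i < parts.length; i++) {
    // hasOwnProperty avoids matching inherited names such as "constructor"
    if (Object.prototype.hasOwnProperty.call(linkWords, parts[i])) {
      parts[i] = '<a href="/wiki/' + parts[i] + '">' + parts[i] + "</a>";
    }
  }
  // Separators are still in the array, so nothing is lost on rejoining
  return parts.join("");
}

console.log(linkify("Hello world, this is a wonderful day!"));
// Hello <a href="/wiki/world">world</a>, this is a <a href="/wiki/wonderful">wonderful</a> <a href="/wiki/day">day</a>!
```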

May 25 '06 #4
Zifud <zi*@yahoo.com> writes:
> Not all browsers will tolerate regular expressions in split();


Can you name one that doesn't and is more recent than Netscape 3?
I can see that both IE 4 and Netscape 4.80 do support it.

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
May 25 '06 #5
RobG wrote:
> Anat wrote:
>> What regex do I need to split a string, using javascript's split
>> method, into words-array?


> Of course, that depends on how you define a word.


Exactly.
>> Splitting according to whitespace only is not enough; I need to split
>> according to whitespace, comma, hyphen, etc...
>> Is there a regex that does the trick?


> To split at one or more non-word characters (basically any character
> other than a letter or number):

> var words = string.split(/\W+/);


That is why one seldom wants that (consider the Unicode word characters,
such as "Ü" or "ß", that match \W), and probably the OP does not either.
They are looking for a character class instead:

var s = [
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do",
"eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim",
"ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut",
"aliquip ex ea commodo consequat. Duis aute irure dolor in",
"reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla",
"pariatur. Excepteur sint occaecat cupidatat non proident, sunt in",
"culpa qui officia deserunt mollit anim id est laborum."
].join(" ");

window.alert(s);

// splits on whitespace, comma and hyphen only ("etc." not included)
var words = s.split(/[\s,-]+/);

window.alert(words.join(" | "));
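The same character class on a short made-up input, again with console.log in place of window.alert, shows both sides of this approach: a non-ASCII word such as "Überlandstraße" stays intact, while punctuation left out of the class (here the full stop) stays attached to the last word. splitOnClass is a hypothetical name:

```javascript
// Split on runs of whitespace, comma or hyphen only
function splitOnClass(s) {
  return s.split(/[\s,-]+/);
}

var words = splitOnClass("Die Überlandstraße, gut-gemacht, endet hier.");
console.log(words.join(" | "));
// Die | Überlandstraße | gut | gemacht | endet | hier.
```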
PointedEars
May 26 '06 #6
Zifud wrote:
> RobG wrote:
>> Anat wrote:
>>> Splitting according to whitespace only is not enough; I need to split
>>> according to whitespace, comma, hyphen, etc...
>>> Is there a regex that does the trick?
>> To split at one or more non-word characters (basically any character
>> other than a letter or number):
>>
>> var words = string.split(/\W+/);


> Not all browsers will tolerate regular expressions in split();


The RegExp object and Regular Expression literals were introduced with
JavaScript 1.2 (NN 4.0, June 1997), and JScript 3.0 (IE 4.0, October 1997).

Since then, the ECMA WG has produced two more editions of ECMAScript, of
which Edition 3 (December 1999, March 2000) finally specified that feature
formally. No scriptable user agent can survive in the medium term without
supporting it nowadays.

I'd say your information is /slightly/ outdated.
> it may be safer to replace all non-word characters with a space and then
> split on that:
Unlikely.
> var newString = string.replace(/\W+/g,' ');
That does not recognize "Überlandstraße" as one word ...
> var words = newString.split(' ');
... and makes ["", "berlandstra", "e"] out of it.
> For the OP to consider...


... and to reject.
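The "Überlandstraße" result above can be reproduced directly. For comparison, engines that implement ECMAScript 2018 or later (long after this thread) support Unicode property escapes with the u flag, where \p{L} matches any letter and \p{N} any number. The unicodeWords function is a hypothetical sketch of that newer option, not something available to the posters at the time:

```javascript
var input = "Überlandstraße";

// The workaround criticised above: \W treats "Ü" and "ß" as separators
var oldWay = input.replace(/\W+/g, " ").split(" ");
console.log(oldWay);
// ["", "berlandstra", "e"]

// ES2018+ only: split on runs of anything that is neither a letter
// nor a number, then drop empty strings left by leading/trailing runs
function unicodeWords(s) {
  return s.split(/[^\p{L}\p{N}]+/u).filter(function (w) {
    return w !== "";
  });
}

console.log(unicodeWords("Die Überlandstraße, gut-gemacht!"));
// ["Die", "Überlandstraße", "gut", "gemacht"]
```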
PointedEars
--
But he had not that supreme gift of the artist, the knowledge of
when to stop.
-- Sherlock Holmes in Sir Arthur Conan Doyle's
"The Adventure of the Norwood Builder"
May 26 '06 #7
Anat wrote:
> I want to take a given string, and make certain words hyperlinks.
> For example:
> "Hello world, this is a wonderful day!"
> I'd like the words world, wonderful and day to be hyperlinks, therefore
> after my manipulation it should be:
> "Hello <a href=...>world</a>, this is a <a href=...>wonderful</a> <a
> href=...>day</a>!"
> Using the split method is no good, because the whitespace, commas and
> other punctuation marks are gone.
[...]
> Any ideas on how I can locate words, replace them but not lose
> punctuation marks on the way?
From your use of the `a' element, I assume this is for `innerHTML'.
Please note that this property is proprietary, and its behavior is
both implementation-dependent and context-dependent.

You could use \b of course, but that will get you in trouble with
words containing non-ASCII characters. Therefore:

var s = ...innerHTML;
s = s.replace(
/(^|[\s-])(world|wonderful|day)([\s,;.?!-]|$)/g,
'$1<a href="http://en.wikipedia.org/wiki/$2">$2<\/a>$3');
...innerHTML = s;

Or with positive lookahead (requires JavaScript 1.5, JScript 5.5,
ECMAScript Ed. 3 [1]):

...
s = s.replace(
/([\s-]|^)(world|wonderful|day)(?=([\s,;.?!-]|$))/g,
'$1<a href="http://en.wikipedia.org/wiki/$2">$2<\/a>');
...

(Use those character classes, unless you want to code all UCS
[non-]word characters as compactly defined in the XML grammar.)
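Both expressions can be compared on the example sentence. This sketch assumes the text is held in a plain string instead of innerHTML, and uses a single-quoted replacement string so the embedded double quotes need no escaping. Because the first pattern consumes the delimiter after each word, "wonderful day" leaves no delimiter in front of "day", so only the lookahead version links all three words:

```javascript
var text = "Hello world, this is a wonderful day!";

// Version 1: the trailing delimiter is captured (and consumed) as $3,
// so it cannot also serve as the leading delimiter of the next word
var v1 = text.replace(
  /(^|[\s-])(world|wonderful|day)([\s,;.?!-]|$)/g,
  '$1<a href="http://en.wikipedia.org/wiki/$2">$2<\/a>$3');

// Version 2: the lookahead only tests the next character without
// consuming it, so the space before "day" is still available
var v2 = text.replace(
  /([\s-]|^)(world|wonderful|day)(?=([\s,;.?!-]|$))/g,
  '$1<a href="http://en.wikipedia.org/wiki/$2">$2<\/a>');

console.log(v1); // "day" is left unlinked
console.log(v2); // "world", "wonderful" and "day" are all linked
```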

I remember suggesting a probably more sophisticated replacement approach a
few months ago, one that also points out the difficulties of general-purpose
replacing. Search the (Google Groups) archives for "IBM replace
author:PointedEars" or so.

When implementing this, you should also take into account that too many
hyperlinks in continuous text can make that text hard to read.
> Thanks again!!!


You are welcome. But please get your Exclamation Mark key repaired.
PointedEars
___________
[1] <URL:http://pointedears.de/es-matrix>
--
Indiana Jones: The Name of God. Jehovah.
Professor Henry Jones: But in the Latin alphabet,
"Jehovah" begins with an "I".
Indiana Jones: J-...
May 26 '06 #8
