Bytes | Software Development & Data Engineering Community

Splitting string into word array - regular expression

Hi,
What regex do I need to split a string, using javascript's split
method, into words-array?
Splitting according to whitespaces only is not enough, I need to split
according to whitespace, comma, hyphen, etc...
Is there a regex that does the trick?
Thanks, Anat.

May 25 '06 #1
Anat wrote:
> Hi,
> What regex do I need to split a string, using javascript's split
> method, into words-array?

Of course, that depends on how you define a word.

> Splitting according to whitespaces only is not enough, I need to split
> according to whitespace, comma, hyphen, etc...
> Is there a regex that does the trick?


To split at one or more non-word characters (basically any character
other than a letter or number):

var words = string.split(/\W+/);
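
As a rough illustration of what this split returns (the sample sentence is borrowed from later in the thread, and the variable names are made up for the sketch): \W matches anything other than the ASCII letters, digits and the underscore, and a separator at the very end of the string leaves an empty string as the last element.

```javascript
// Sample input; any string with punctuation would do.
var sentence = "Hello world, this is a wonderful day!";

// Split at runs of non-word characters (\W = anything but A-Z, a-z, 0-9, _).
var words = sentence.split(/\W+/);
// The trailing "!" is a separator too, so the array ends with "".

// Dropping empty entries afterwards is one way to tidy the result.
var nonEmptyWords = words.filter(function (w) { return w.length > 0; });

console.log(words);
console.log(nonEmptyWords);
```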

--
Rob
Group FAQ: <URL:http://www.jibbering.com/faq/>
May 25 '06 #2
RobG wrote:
> Anat wrote:
>> Hi,
>> What regex do I need to split a string, using javascript's split
>> method, into words-array?
>
> Of course, that depends on how you define a word.
>
>> Splitting according to whitespaces only is not enough, I need to split
>> according to whitespace, comma, hyphen, etc...
>> Is there a regex that does the trick?
>
> To split at one or more non-word characters (basically any character
> other than a letter or number):
>
> var words = string.split(/\W+/);


Not all browsers will tolerate regular expressions in split(), it may be
safer to replace all non-word characters with a space then split on that:

var newString = string.replace(/\W+/g,' ');
var words = newString.split(' ');
For the OP to consider...
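
A minimal sketch of the replace-then-split variant, using a sample sentence and variable names chosen only for illustration. It produces the same array as splitting on /\W+/ directly, including the empty string left by the trailing punctuation.

```javascript
var text = "Hello world, this is a wonderful day!";

// Collapse every run of non-word characters into a single space ...
var spaced = text.replace(/\W+/g, " ");
// ... then split on a plain-string separator instead of a regex.
var parts = spaced.split(" ");

// For comparison: the direct regex split of the same input.
var direct = text.split(/\W+/);

console.log(parts);
console.log(direct);
```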

--
Zif
May 25 '06 #3
Thanks guys,
But actually, when I come to think of it, it's not a good solution for
what I'm trying to do.
I want to take a given string, and make certain words hyperlinks.
For example:
"Hello world, this is a wonderful day!"
I'd like the words world, wonderful and day to be hyperlinks, therefore
after my manipulation it should be:
"Hello <a href=...>world</a>, this is a <a href=...>wonderful</a> <a
href=...>day</a>!"
Using split method is not good, because the whitespaces, commas and
other punctuation marks are gone.
Instead of displaying
"Hello <a href=...>world</a>, this is a <a href=...>wonderful</a> <a
href=...>day</a>!"
I will display
"Hello <a href=...>world</a> this is a <a href=...>wonderful</a> <a
href=...>day</a>"
(note that the comma and exclamation mark are gone).
Any ideas on how I can locate words, replace them but not lose
punctuation marks on the way?
Thanks again!!!
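
The problem described here can be reproduced in a few lines, along with one possible way around it: when the separator regex contains a capturing group, split (as specified by ECMAScript) also returns the captured separators, so the pieces can be joined back without losing punctuation. This is only a sketch; the linkWords table and the "#" link target are invented for the example, since the real href is elided in the post.

```javascript
var input = "Hello world, this is a wonderful day!";

// Hypothetical table of words to link (taken from the example in the post).
var linkWords = { world: true, wonderful: true, day: true };

// Plain split: separators are discarded, so punctuation cannot be restored.
var lossy = input.split(/\W+/).join(" ");

// Split with a capturing group: the separators are kept as array elements.
var pieces = input.split(/(\W+)/);

var output = pieces.map(function (piece) {
  // "#" is a placeholder target, not the real URL.
  return linkWords.hasOwnProperty(piece)
    ? '<a href="#">' + piece + "</a>"
    : piece;
}).join("");

console.log(lossy);
console.log(output);
```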

May 25 '06 #4
Zifud <zi*@yahoo.com> writes:
> Not all browsers will tolerate regular expressions in split(),

Can you mention one that doesn't that is more recent than Netscape 3?
I can see that both IE 4 and Netscape 4.80 do support it.

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
May 25 '06 #5
RobG wrote:
> Anat wrote:
>> What regex do I need to split a string, using javascript's split
>> method, into words-array?
>
> Of course, that depends on how you define a word.

Exactly.

>> Splitting according to whitespaces only is not enough, I need to split
>> according to whitespace, comma, hyphen, etc...
>> Is there a regex that does the trick?
>
> To split at one or more non-word characters (basically any character
> other than a letter or number):
>
> var words = string.split(/\W+/);


Therefore, one seldom wants that (considering Unicode word characters that
match \W), and probably the OP does not. They are looking for character
classes instead:

var s = [
  "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do",
  "eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim",
  "ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut",
  "aliquip ex ea commodo consequat. Duis aute irure dolor in",
  "reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla",
  "pariatur. Excepteur sint occaecat cupidatat non proident, sunt in",
  "culpa qui officia deserunt mollit anim id est laborum."
].join(" ");

window.alert(s);

// "etc." not included
var words = s.split(/[\s,-]+/);

window.alert(words.join(" | "));
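
A shorter sketch of the same idea that runs outside a browser (the sample string and the extended class are assumptions made for illustration). Only the listed characters act as separators, so non-ASCII letters stay inside their words, and any punctuation that is not listed, such as the full stop, stays attached to the preceding word until it is added to the class.

```javascript
var sample = "amet, consectetur elit. Ut enim ad-minim Überlandstraße";

// Separators: whitespace, comma and hyphen only.
var byClass = sample.split(/[\s,-]+/);

// Hypothetical extension of the class with more punctuation marks.
var byWiderClass = sample.split(/[\s,.;:!?-]+/);

console.log(byClass.join(" | "));
console.log(byWiderClass.join(" | "));
```
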
PointedEars
May 26 '06 #6
Zifud wrote:
> RobG wrote:
>> Anat wrote:
>>> Splitting according to whitespaces only is not enough, I need to split
>>> according to whitespace, comma, hyphen, etc...
>>> Is there a regex that does the trick?
>>
>> To split at one or more non-word characters (basically any character
>> other than a letter or number):
>>
>> var words = string.split(/\W+/);
>
> Not all browsers will tolerate regular expressions in split(),



The RegExp object and Regular Expression literals were introduced with
JavaScript 1.2 (NN 4.0, June 1997), and JScript 3.0 (IE 4.0, October 1997).

Since then, the ECMA WG has produced two more editions of ECMAScript, where
Edition 3 (December 1999, March 2000) (finally) formally specified that
feature. No scriptable user agent can survive in the mid-term without
supporting it nowadays.

I'd say your information is /slightly/ outdated.

> it may be safer to replace all non-word characters with a space then split
> on that:

Unlikely.

> var newString = string.replace(/\W+/g,' ');

That does not recognize "Überlandstraße" as one word ...

> var words = newString.split(' ');

... and makes ["", "berlandstra", "e"] out of it.

> For the OP to consider...

... and to reject.
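
The result quoted above is easy to reproduce. In engines that implement Unicode property escapes (ES2018, long after this thread), a letter-aware separator is one alternative. The second pattern and its sample sentence are assumptions of this sketch, not something proposed in the thread.

```javascript
var street = "Überlandstraße";

// \W treats "Ü" and "ß" as separators, so the word falls apart.
var broken = street.replace(/\W+/g, " ").split(" ");

// Assumed alternative: split at runs of characters that are neither
// Unicode letters (\p{L}) nor Unicode numbers (\p{N}); needs the "u" flag.
var intact = "Die Überlandstraße, bitte!".split(/[^\p{L}\p{N}]+/u);

console.log(broken);
console.log(intact);
```
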
PointedEars
--
But he had not that supreme gift of the artist, the knowledge of
when to stop.
-- Sherlock Holmes in Sir Arthur Conan Doyle's
"The Adventure of the Norwood Builder"
May 26 '06 #7
Anat wrote:
> I want to take a given string, and make certain words hyperlinks.
> For example:
> "Hello world, this is a wonderful day!"
> I'd like the words world, wonderful and day to be hyperlinks, therefore
> after my manipulation it should be:
> "Hello <a href=...>world</a>, this is a <a href=...>wonderful</a> <a
> href=...>day</a>!"
> Using split method is not good, because the whitespaces, commas and
> other punctuation marks are gone.
> [...]
> Any ideas on how I can locate words, replace them but not lose
> punctuation marks on the way?

From your use of the `a' element, I assume this is for `innerHTML'.
Please note that this property is proprietary, and its behavior is
both implementation-dependent and context-dependent.

You could use \b of course, but that will get you in trouble with
words containing non-ASCII characters. Therefore:

var s = ...innerHTML;
s = s.replace(
  /(^|[\s-])(world|wonderful|day)([\s,;.?!-]|$)/g,
  '$1<a href="http://en.wikipedia.org/wiki/$2">$2<\/a>$3');
...innerHTML = s;
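
Run against the sample sentence from the question instead of innerHTML (an assumption made so the sketch works outside a browser), this version links "world" and "wonderful" but appears to leave "day" untouched: the space after "wonderful" is consumed as $3, so it is no longer available as the leading delimiter of the next match.

```javascript
var sentence = "Hello world, this is a wonderful day!";

// Same pattern and replacement as above; the trailing delimiter is
// captured (and therefore consumed) as $3.
var linked = sentence.replace(
  /(^|[\s-])(world|wonderful|day)([\s,;.?!-]|$)/g,
  '$1<a href="http://en.wikipedia.org/wiki/$2">$2<\/a>$3');

console.log(linked);
```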

Or with positive lookahead (requires JavaScript 1.5, JScript 5.5,
ECMAScript Ed. 3 [1]):

...
s = s.replace(
  /([\s-]|^)(world|wonderful|day)(?=([\s,;.?!-]|$))/g,
  '$1<a href="http://en.wikipedia.org/wiki/$2">$2<\/a>');
...
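
A runnable sketch of the lookahead variant, wrapped in a helper named linkify (a name invented here) and applied to the sample sentence rather than to innerHTML. Because the lookahead does not consume the trailing delimiter, adjacent target words such as "wonderful day" are both linked, while longer words that merely start with a target are left alone.

```javascript
// Hypothetical helper around the lookahead-based replacement.
function linkify(text) {
  return text.replace(
    /([\s-]|^)(world|wonderful|day)(?=([\s,;.?!-]|$))/g,
    '$1<a href="http://en.wikipedia.org/wiki/$2">$2<\/a>');
}

var linked = linkify("Hello world, this is a wonderful day!");
var untouched = linkify("daylight worldwide");

console.log(linked);
console.log(untouched);
```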

(Use those character classes, unless you want to code all UCS
[non-]word characters as compactly defined in the XML grammar.)
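
The \b caveat mentioned earlier can be seen with a small test. The word "passé" and the target "pass" are chosen here purely for illustration: \b is defined in terms of \w and \W, so it reports a word boundary in front of the non-ASCII "é", whereas the explicit delimiter classes do not.

```javascript
var word = "passé";

// \b sees a boundary between "s" (a \w character) and "é" (a \W character),
// so the target is found inside the longer word.
var withBoundary = /\bpass\b/.test(word);

// With explicit delimiter classes, "é" is not a delimiter: no match.
var withClasses = /(^|[\s-])(pass)(?=([\s,;.?!-]|$))/.test(word);

// A genuine occurrence is still found.
var realHit = /(^|[\s-])(pass)(?=([\s,;.?!-]|$))/.test("a mountain pass.");

console.log(withBoundary, withClasses, realHit);
```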

I remember having suggested a probably more sophisticated replacement
approach a few months ago, one that also points out the difficulties
with general replacing. Search the (Google Groups) archives for "IBM
replace author:PointedEars" or so.

When implementing this, you should additionally take into account that too
many hyperlinks in continuous text can make that text hardly legible.

> Thanks again!!!

You are welcome. But please get your Exclamation Mark key repaired.
PointedEars
___________
[1] <URL:http://pointedears.de/es-matrix>
--
Indiana Jones: The Name of God. Jehovah.
Professor Henry Jones: But in the Latin alphabet,
"Jehovah" begins with an "I".
Indiana Jones: J-...
May 26 '06 #8
