How to Parse Sentences into Words

Hi all,

I have a table of text and associated data. I want to break apart the text
into individual words, yet retain the data in other columns. For example:

Sentence:          Chapter:
--------------------------
I like cats.       1
Joe likes dogs.    2

Should become:

Word:              Chapter:
--------------------------
I                  1
like               1
cats               1
Joe                2
likes              2
dogs.              2

Are there built-in SQL parsing functions? If not, what text handling
features would be most useful for building them?

Thanks!
Jul 20 '05 #1
Here's a recursive solution:

CREATE PROCEDURE Split (@Sentence AS VARCHAR(1024), @Chapter AS INTEGER) AS
BEGIN
    -- @Rest must be as wide as @Sentence, or a long remainder is truncated
    DECLARE @Position AS INTEGER, @Word AS VARCHAR(50), @Rest AS VARCHAR(1024)

    -- Position of the first space; 0 means only one word is left
    SET @Position = CHARINDEX(' ', @Sentence)

    IF (@Position = 0)
        INSERT INTO Words (Word, Chapter) VALUES (@Sentence, @Chapter)
    ELSE
    BEGIN
        -- Store the first word, then recurse on everything after the space
        SET @Word = LEFT(@Sentence, @Position - 1)
        INSERT INTO Words (Word, Chapter) VALUES (@Word, @Chapter)
        SET @Rest = SUBSTRING(@Sentence, @Position + 1, LEN(@Sentence))
        EXEC Split @Rest, @Chapter
    END
END

This solution only works for sentences of 32 words or fewer, since each
word uses one level of recursion and 32 is the maximum nesting depth for
MS-SQLServer.
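
For comparison, the same first-space logic can be written as a loop, which has no nesting limit. This is only a minimal sketch in Python rather than T-SQL, and the names split_words and split_rows are made up for illustration. It splits on single spaces only, exactly like the procedure above, so punctuation stays attached to the word.

```python
def split_words(sentence):
    """Split a sentence on single spaces using a loop instead of recursion."""
    words = []
    rest = sentence
    while True:
        # str.find plays the role of CHARINDEX, but is 0-based and returns -1 when absent
        position = rest.find(" ")
        if position == -1:
            words.append(rest)          # no space left: this is the last word
            break
        words.append(rest[:position])   # the text before the first space
        rest = rest[position + 1:]      # continue with the text after the space
    return words


def split_rows(rows):
    """Turn (sentence, chapter) rows into (word, chapter) rows."""
    return [(word, chapter)
            for sentence, chapter in rows
            for word in split_words(sentence)]
```

Calling split_rows on the two example rows gives one (word, chapter) pair per word, and a sentence longer than 32 words is handled the same way as a short one.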

Joe Melville
Jul 20 '05 #2
Wow. This looks like a great way of handling it. If anything is longer than
32 words, I should be able to find a way of stopping at 32 and running
another pass later. I'll post the final code if it adds anything to what you
have written here.

Thanks!
Jul 20 '05 #3
DROP TABLE tmpTable
GO
CREATE TABLE tmpTable
(WordString VARCHAR(500))
INSERT INTO tmpTable (WordString) VALUES ('Test of some words')
INSERT INTO tmpTable (WordString) VALUES ('Another Test of some words')
GO
DROP TABLE tmpWords
GO
CREATE TABLE tmpWords
(Words VARCHAR(500))

BEGIN TRAN
DECLARE @PCStatus INTEGER
/* one character wider than WordString, to leave room for the appended space */
DECLARE @TmpWords VARCHAR(501)
DECLARE @NewWord VARCHAR(500)

/* cursor to read each row of tmpTable */
DECLARE PCcursor CURSOR FOR
    SELECT WordString
    FROM tmpTable
OPEN PCcursor
SET @PCStatus = 0
WHILE @PCStatus = 0
BEGIN
    /* read the next row; stop when there are no more */
    FETCH NEXT FROM PCcursor INTO @TmpWords
    SET @PCStatus = @@FETCH_STATUS
    IF @PCStatus <> 0
        BREAK

    /* append a space so the last word is terminated like the others */
    SET @TmpWords = @TmpWords + ' '
    WHILE CHARINDEX(' ', @TmpWords) > 0
    BEGIN
        /* take the text before the first space, then drop it from the string */
        SET @NewWord = LEFT(@TmpWords, CHARINDEX(' ', @TmpWords) - 1)
        SET @TmpWords = SUBSTRING(@TmpWords, CHARINDEX(' ', @TmpWords) + 1, LEN(@TmpWords))
        INSERT INTO tmpWords (Words) VALUES (@NewWord)
    END
END
CLOSE PCcursor
COMMIT TRAN
DEALLOCATE PCcursor

SELECT * FROM tmpWords

Randy
http://members.aol.com/rsmeiner
Jul 20 '05 #4
There's no built-in SPLIT() function, but this may help:

http://www.sommarskog.se/arrays-in-sql.html#iterative

In general, the string functions are limited in SQL, and it's often easier
to dump the data to a flat file and use an external script to do text
manipulation, especially if you need advanced features such as regular
expression support.
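
As a rough illustration of the external-script route, here is a minimal Python sketch. It assumes the table has been exported as tab-delimited lines of the form sentence, tab, chapter. The function name words_by_chapter, the export layout, and the definition of a word (a run of letters, digits, or apostrophes) are all assumptions made for the example, not something SQL Server provides.

```python
import csv
import io
import re

# Assumption: a word is a run of letters, digits, or apostrophes,
# so trailing punctuation such as "." is dropped.
WORD_PATTERN = re.compile(r"[A-Za-z0-9']+")


def words_by_chapter(lines):
    """Yield (word, chapter) pairs from tab-delimited 'sentence<TAB>chapter' lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        sentence, chapter = row
        for word in WORD_PATTERN.findall(sentence):
            yield word, int(chapter)


if __name__ == "__main__":
    # Stand-in for the exported flat file
    export = io.StringIO("I like cats.\t1\nJoe likes dogs.\t2\n")
    for word, chapter in words_by_chapter(export):
        print(word, chapter, sep="\t")
```

The output is again tab-delimited, so it could be loaded back into a Words table with whatever bulk import tool is at hand.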

Simon
Jul 20 '05 #5
