SP to parse string adding synonimes from table

The easiest way for me to explain what I'm trying to do is show an example.

I have the following tables:

WORDS
WordID Word
1 1,000,000
2 a million
3 one million

WORDLINKS
WordID WordLinkID
1 2
1 3
2 1
2 3
3 1
3 2

I would then pass the following sentance into the SP
a million times a day

What the SP needs to do is parse the string and find all the phrases within that match the words in the WORDS table. It would then take the WordID and do a lookup in the WORDLINKS table to find all the possible alternatives/synonimes.

The result would be in the following form:
{a million|1,000,000|one million} times a day

the matched phrase has been wrapped in {} with each matched alternatives/synonimes seperated by a pipe |.

there would be multiple phrases in each input string and possibly nested words
eg in the example above the phrase 'one million', 'one' could have an alternative of '1' in which case the output would be:
{a million|1,000,000|{one|1} million} times a day

i hope this makes sense. any help would really be appreciated.

Thanks

Josh

Aug 9 '07 #1

Subscribe Post Reply

1954

azimmer

200

Expert 100+

It's going to be a long one... (But a promise is a promise.) What makes it a bit more difficult is the nested replacements (whose proper handling I'm still not 100% sure of):

have it greedy or non-greedy (if you have replacements both for 'one million' and 'one' which one has precedence)?
If any replacement would result in an expanded string that could be further replaced, what to do? (I.e.: replacements: '1'->'one', 'one million'->'a million', what should the output be for '1 million'? And if replacements are '1'->'one', 'one million'->'1,000,000', what should the output be for '1 million'?)

The code below is greedy and processes nested replacements only to the length of the replaced term and not including the term being replaced (see examples at the end). It has two functions because the SQL Server wouldn't take the default value (don't know why) and I've had no more time to play around with it. You only need to use the one without the underscore.

So here goes the code:

Expand|Select|Wrap|Line Numbers

 
create function _syno (@instr as varchar(1000), @term_replace as bit = 0)

returns varchar(1000)

as

begin

declare @pos as int

declare @current_str as varchar(1000)

declare @syno_term as varchar(1000)

declare @replaced_length as int

declare @outstr as varchar(1000)

set @outstr=''
 
set @pos=1

while @pos<=len(@instr)

begin

    -- try from current position on

    set @current_str=right(@instr,len(@instr)-@pos+1)

    declare current_replacements cursor local for

        -- this is tricky

        select distinct top 1 with ties w.word [syno_term], len(w2.word) [replaced_length]

        from words w, wordlinks l, words w2

        where w.wordid=l.wordlinkid

        and (l.wordid=w2.wordid)

        and charindex(w2.word,@current_str)=1

        and (@term_replace=0 or (w2.word<>@current_str))

        order by len(w2.word) desc

    open current_replacements

    fetch next from current_replacements into @syno_term, @replaced_length

    if @@fetch_status = 0    -- there are replacements

    begin

        set @outstr=@outstr+'{'+left(@current_str,@replaced_length)

        -- iterate on current set of replacements (recursivity)

        while @@fetch_status = 0

        begin

            set @syno_term=dbo._syno(@syno_term, 1)

            set @outstr=@outstr+'|'+@syno_term

            fetch next from current_replacements into @syno_term, @replaced_length

        end

        set @outstr=@outstr+'}'

        set @pos=@pos+@replaced_length

    end

    else

    begin

        set @outstr=@outstr+left(@current_str,1)

        set @pos=@pos+1

    end

    deallocate current_replacements

end

return @outstr 

end

GO
 
create function syno (@instr as varchar(1000))

returns varchar(1000)

as

begin

return dbo._syno(@instr,0)

end

GO

Note: I found the easiest way to get the code itself (without line numbers) is to "Reply" to the message and copy-paste whatever is between '[ CODE]' and '[ /CODE]'.

My WORDS and WORDLINKS tables are as follows:

Expand|Select|Wrap|Line Numbers

 
WordID      Word                                               

----------- -------------------

1           1,000,000

2           a million

3           one million

4           one

5           1
 
WordID      WordLinkID  

----------- ----------- 

1           2

1           3

2           1

2           3

3           1

3           2

4           5

Examples and usage:

Expand|Select|Wrap|Line Numbers

print dbo.syno('I hope you run it a million times')

results: I hope you run it {a million|1,000,000|{one|1} million} times

Expand|Select|Wrap|Line Numbers

print dbo.syno('I hope you run it one million times')

results: I hope you run it {one million|1,000,000|a million} times
(note how it will replace 'one million', and will process the 'one' in the 'one million' because that's the whole term to be replaced in the first place)

Expand|Select|Wrap|Line Numbers

print dbo.syno('I hope someone runs it one million times')

results: I hope some{one|1} runs it {one million|1,000,000|a million} times

Hope it helps. I don't say it's impossible to refine it (especially in the handling of nested replacements) but I believe it's a good start.

P.S.: It, indeed, would have been a rather steep learning curve from a very basic level...

Aug 10 '07 #2

inspireuk

This is ammmaazzing! I can't thank you enough :)

However :) just a couple more things if you have the time. Would it be possible only to match full words and with regards to the nested replacements i think i may just programmically recall the sp for small parts if needed so no recursion needed (sorry).

the precedence i think should be for the longer phrase.

let me know if there is anything else.

Josh

Aug 10 '07 #3

inspireuk

so i've been playing with what you gave me. i've changed it to a stored procedure as i'm no longer doing recursion. if you get a chance can you look at the code below - it does seem to be working but i'm not sure whether it is written the best way. Thanks again for all your time.

Expand|Select|Wrap|Line Numbers

 
ALTER PROCEDURE ParseString

    @instr as varchar(max)

AS
 
    declare @pos as int

    declare @current_str as varchar(max)

    declare @syno_term as varchar(max)

    declare @replaced_length as int

    declare @outstr as varchar(max)
 
    set @outstr = ''

    set @pos=1

    while @pos<=len(@instr)

    begin

        -- try from current position on

        set @current_str=right(@instr,len(@instr)-@pos+1)

        declare current_replacements cursor local for

            -- this is tricky

            select distinct top 1 with ties w.word [syno_term], len(w2.word) [replaced_length]

            from words w, LinkedWords l, words w2

            where w.wordid=l.LinkedWordID

            and (l.wordid=w2.wordid)

            and charindex(w2.word,@current_str)=1

            and right(left(@current_str, len(w2.word)+1),1) = ' '

            order by len(w2.word) desc

        open current_replacements

        fetch next from current_replacements into @syno_term, @replaced_length

        if @@fetch_status = 0    -- there are replacements

        begin

            set @outstr=@outstr+'{'+left(@current_str,@replaced_length)

            -- iterate on current set of replacements (recursivity)

            while @@fetch_status = 0

            begin

                --set @syno_term=ammuser._syno(@syno_term, 1)

                set @outstr=@outstr+'|'+@syno_term

                fetch next from current_replacements into @syno_term, @replaced_length

            end

            set @outstr=@outstr+'} '

            set @pos=@pos+@replaced_length+1

        end

        else

        begin

            if charindex(' ',@current_str) = 0

                begin

                    set @outstr=@outstr+@current_str

                    set @pos=@pos+len(@current_str)                

                end

            else

                begin

                    set @outstr=@outstr+left(@current_str,charindex(' ',@current_str))

                    set @pos=@pos+charindex(' ',@current_str)

                end

        end

        deallocate current_replacements

    end

    SELECT @outstr

Aug 13 '07 #4

azimmer

200

Expert 100+

so i've been playing with what you gave me. i've changed it to a stored procedure as i'm no longer doing recursion. if you get a chance can you look at the code below - it does seem to be working but i'm not sure whether it is written the best way. Thanks again for all your time.

Expand|Select|Wrap|Line Numbers

ALTER PROCEDURE ParseString

    @instr as varchar(max)

AS

    declare @pos as int

    declare @current_str as varchar(max)

    declare @syno_term as varchar(max)

    declare @replaced_length as int

    declare @outstr as varchar(max)

    set @outstr = ''

    set @pos=1

    while @pos<=len(@instr)

    begin

        -- try from current position on

        set @current_str=right(@instr,len(@instr)-@pos+1)

        declare current_replacements cursor local for

            -- this is tricky

            select distinct top 1 with ties w.word [syno_term], len(w2.word) [replaced_length]

            from words w, LinkedWords l, words w2

            where w.wordid=l.LinkedWordID

            and (l.wordid=w2.wordid)

            and charindex(w2.word,@current_str)=1

            and right(left(@current_str, len(w2.word)+1),1) = ' '

            order by len(w2.word) desc

        open current_replacements

        fetch next from current_replacements into @syno_term, @replaced_length

        if @@fetch_status = 0    -- there are replacements

        begin

            set @outstr=@outstr+'{'+left(@current_str,@replaced_length)

            -- iterate on current set of replacements (recursivity)

            while @@fetch_status = 0

            begin

                --set @syno_term=ammuser._syno(@syno_term, 1)

                set @outstr=@outstr+'|'+@syno_term

                fetch next from current_replacements into @syno_term, @replaced_length

            end

            set @outstr=@outstr+'} '

            set @pos=@pos+@replaced_length+1

        end

        else

        begin

            if charindex(' ',@current_str) = 0

                begin

                    set @outstr=@outstr+@current_str

                    set @pos=@pos+len(@current_str)

                end

            else

                begin

                    set @outstr=@outstr+left(@current_str,charindex(' ',@current_str))

                    set @pos=@pos+charindex(' ',@current_str)

                end

        end

        deallocate current_replacements

    end

    SELECT @outstr

Is good that you have the ambition and time to play around with it as I have very little time at the moment. Still I've been thinking about whole word replacements. One of the tough issues is that you need to have a word terminator at both ends of the term to be replaced - and word terminators come in many forms (not only space, but period, comma, semicolon, dash, new line, tab, etc as well - perhaps whatever is not a letter (but then what about '@' and digits?).) It should somehow be taken care of in the cursor select to make sure no false replacements are signaled (an that's not easy). Alternatively it may be possible to modify the select so that it sometimes gives false possibilities and then check and disregard them (but that's ain't easy either).

Slowly we're venturing into the realms of grep and awk - implemented in a SQL Server SP :-) (Just kidding; we're miles away.)

Aug 14 '07 #5

by: N | last post by:

Hi, I would like to parse out each value that is seperated by a comma in a field and use that value to join to another table. What would be the easiest way to do so without having to write a...

Microsoft SQL Server

How to Parse a CSV formatted text file

by: Ram Laxman | last post by:

Hi all, I have a text file which have data in CSV format. "empno","phonenumber","wardnumber" 12345,2234353,1000202 12326,2243653,1000098 Iam a beginner of C/C++ programming. I don't know how to...

C / C++

How to parse various types without a switch?

by: | last post by:

Hi, I need to read a big CSV file, where different fields should be converted to different types, such as int, double, datetime, SqlMoney, etc. I have an array, which describes the fields and...

.NET Framework

XML parse/build metaphor mismatch

by: Steve Jorgensen | last post by:

I'm wondering if there's an approach to writing consistent code to read/write XML data in arbitrary order that I'm simply missing. It seems to be easy getting stuff -out- of a DOM via XPath, but...

.NET Framework

fastest way to parse a file; Most efficient way to store the data?

by: hoopsho | last post by:

Hi Everyone, I am trying to write a program that does a few things very fast and with efficient use of memory... a) I need to parse a space-delimited file that is really large, upwards fo a...

C# / C Sharp

Parse Date

by: Doug Bell | last post by:

Hi, I have some data in the format of yyyymmdd (Int32) that I want to display, in a datagrid, formatted in International Date Format dd MM yyyy. Any suggestions on an efficient approach? ...

Visual Basic .NET

Parse Nested Elements to Single DataGrid

by: BMeyer | last post by:

I have been losing my mind trying to parse an XML document (with nested child elements, not all of which appear in each parent node) into a DataGrid object. What I want to do is "flatten" the XML...

.NET Framework

Parse a string from a field and display in table

by: Dan Somdahl | last post by:

Hi, I am new to ASP but have what should be a fairly simple task that I can't figure out. I need to parse a string from a single, semi-colon delimited, 60 character field (el_text) in a recordset...

ASP / Active Server Pages

Error loading Stylesheet:Xpath parse Failure

by: Leighya | last post by:

Im currently working on this xml file but when i load it to Mozilla, i got an error "Error Loading Stylesheet: Xpath parse failure: invalid variable name" It loads on IE properly. Only with the...

XML

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

C# / C Sharp

SP to parse string adding synonimes from table

Similar topics