Bytes IT Community

Performance - string searching

Hi,

Does anyone have any suggestions on an approach to speed this up
please?

I need to check a given column in a given table to see whether there
are any characters within any of the records that fall outside of a
range of valid ASCII characters.

My problem is that of speed. The approach I have taken is to write a
function that iterates the string character by character and checks the
ASCII value at the current position. If it is deemed invalid I drop out
of the function and return True. The calling procedure then knows to
stop searching as the table column has already failed.
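The row-by-row check described above might look something like the following (a sketch only; the function name and the printable-ASCII range 32-126 are assumptions, not the OP's actual code):

```sql
-- Hypothetical sketch of the character-by-character check described above.
-- The valid range (printable ASCII, code points 32-126) is an assumption.
CREATE FUNCTION dbo.HasInvalidChar (@s nvarchar(4000))
RETURNS bit
AS
BEGIN
    DECLARE @i int
    SET @i = 1
    WHILE @i <= LEN(@s)
    BEGIN
        IF UNICODE(SUBSTRING(@s, @i, 1)) NOT BETWEEN 32 AND 126
            RETURN 1  -- invalid character found; caller can stop searching
        SET @i = @i + 1
    END
    RETURN 0
END
```

Calling a scalar UDF like this once per row over a million rows is exactly the kind of workload that gets slow, which is why the set-based answers below help.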

Scanning a table of a million rows can take up to an hour, depending
on the datatype and the level of data population.

If I write the same thing in Access 2003 VBA and run it via ODBC it
takes a matter of minutes.

I would like to find a set-based way of doing this, maybe PATINDEX or
something else creative, but am stumped.

Does anyone have any brilliant ideas?

Thank you!

Aug 3 '05 #1
5 Replies


It would probably help if you gave a specific example of what your data
looks like and which characters you consider valid, but something like
this might work for you - you can replace 'a-zA-Z0-9' with your range
of valid characters:

create table #t (col1 nchar(1))

insert into #t select 'A'
insert into #t select '0' -- zero
insert into #t select 'a'

if exists (
select * from #t
where col1 like '%[^a-zA-Z0-9]%' collate Latin1_General_BIN
)
print 'Column failed validation'
else
print 'Column passed'

insert into #t select N'ü' -- u umlaut, code point 252

-- now run the query again to get the 'failed' message

If this doesn't help, I suggest you post a short script to show a
sample of your data and exactly which rows should pass or fail.
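Since the original question mentioned PATINDEX: the same pattern works there too, and additionally reports where the first invalid character sits (a sketch against the sample table above, using the same assumed valid range):

```sql
-- PATINDEX returns the position of the first character outside the
-- allowed set, or 0 if the whole string is valid.
SELECT col1,
       PATINDEX('%[^a-zA-Z0-9]%', col1 COLLATE Latin1_General_BIN) AS first_bad_pos
FROM #t
WHERE col1 LIKE '%[^a-zA-Z0-9]%' COLLATE Latin1_General_BIN
```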

Simon

Aug 3 '05 #2

Hi

Hope this can help you:

SELECT *
FROM <TABLE>
WHERE <COLUMN> NOT LIKE '%[^A-Z0-9]%'

this returns only those rows whose column contains only valid characters.

hope this answers your question

best Regards,
Chandra
http://www.SQLResource.com/
http://chanduas.blogspot.com/
---------------------------------------

Aug 3 '05 #3

Chandra (ch*****@discussions.hotmail.com) writes:
Hope this can help you:

SELECT *
FROM <TABLE>
WHERE <COLUMN> NOT LIKE '%[^A-Z0-9]%'

this returns only those rows whose column contains only valid characters.


ingsms was looking for strings that contained at least one non-ASCII
character, so the condition should be LIKE and not NOT LIKE.

But it does not stop there: A-Z is evaluated in the collation of the
column, so with most collations it includes non-ASCII characters.

For instance:

CREATE TABLE #nonascii (a varchar(100) COLLATE Latin1_General_CI_AS
NOT NULL)
go
INSERT #nonascii (a) VALUES ('Räksmörgås')
INSERT #nonascii (a) VALUES ('Tête à tête')
INSERT #nonascii (a) VALUES ('naïve')
INSERT #nonascii (a) VALUES ('RÄKSMÖRGÅS')
go
SELECT *
FROM #nonascii
WHERE a LIKE '%[^A-Z0-9]%'
go
DROP TABLE #nonascii

finds only "Tête à tête", when it should find all four.

Simon's example with casting to a binary collation is correct.
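Applied to the sample table above, the binary-collation version might look like this (a sketch; the COLLATE clause forces a strict code-point comparison, so accented characters no longer fall inside the letter ranges):

```sql
-- Under a binary collation, [a-zA-Z] matches only the unaccented
-- ASCII letters, so every string with an accent, space, or other
-- non-ASCII character is flagged.
SELECT *
FROM #nonascii
WHERE a LIKE '%[^a-zA-Z0-9]%' COLLATE Latin1_General_BIN
-- finds all four rows
```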


--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Aug 3 '05 #4


Here's what I would do.

I would add a bit or int column, say 'VALID', check the field
when it's inserted, and then index VALID. That way you could do a
search on VALID and it would be lickety-split!

Or, if you don't want to change the table, make a new table that has
the PK of the original and VALID, and use a trigger upon insert to keep
track of the 'valid'ness of that row.
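A sketch of that side-table-plus-trigger approach (all table and column names here are hypothetical, and the valid-character range is assumed, not taken from the OP):

```sql
-- Side table tracking validity per row; names are illustrative only.
CREATE TABLE dbo.MyTable_Valid (
    pk int NOT NULL PRIMARY KEY,   -- matches the PK of the original table
    valid char(1) NOT NULL         -- 'Y' or 'N'
)
go
CREATE TRIGGER trg_MyTable_Valid ON dbo.MyTable
AFTER INSERT
AS
INSERT dbo.MyTable_Valid (pk, valid)
SELECT pk,
       CASE WHEN col1 LIKE '%[^a-zA-Z0-9]%' COLLATE Latin1_General_BIN
            THEN 'N' ELSE 'Y' END
FROM inserted
go
CREATE INDEX ix_valid ON dbo.MyTable_Valid (valid)
```

An UPDATE trigger would be needed as well if the column can change after insert.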

Glenn Dekhayser
Voyant Strategies, Inc.

Aug 4 '05 #5

Actually, I think what you would want to do (if you wanted to emulate
the Access methodology) is use a cursor on the table, and iterate until
you find the first record that fails.

Problem is, if the bad record is the millionth one, this will take WAY
too much time. I still like my previous solution better, which would
end up being:

IF EXISTS (SELECT * FROM myTable WHERE valid = 'N')
    SET @i_have_a_bad_letter = 1

That would be instant, especially if you indexed 'valid'.
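For completeness, the index Glenn mentions would be something like this (table and column names are hypothetical, carried over from his example):

```sql
-- A plain nonclustered index so the EXISTS check is a cheap seek
-- rather than a full table scan.
CREATE INDEX ix_myTable_valid ON myTable (valid)
```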

Glenn Dekhayser
Voyant Strategies, Inc.

Aug 4 '05 #6
