473,385 Members | 1,927 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

Performance-String searching

Hi,

Does anyone have any suggestions on an approach to speed this up
please?

I need to check a given column in a given table to see whether there
are any characters within any of the records that fall outside of a
range of valid ASCII characters.

My problem is that of speed. The approach I have taken is to write a
function that iterates the string character by character and checks the
ASCII value at the current position. If it is deemed invalid I drop out
of the function and return True. The calling procedure then knows to
stop searching as the table column has already failed.

Running down a table of million rows, can take up to an hour depending
on the datatype and level of data population.

If I write the same thing in Access 2003 VBA and run it via ODBC it
takes a matter of minutes.

I would like to find a set-based way of doing this maybe PATINDEX or
something else creative but am stumped.

Does anyone have any brilliant ideas?

Thank you!

Aug 3 '05 #1
5 2500
It would probably help if you gave a specific example of what your data
looks like and which characters you consider valid, but something like
this might work for you - you can replace 'a-zA-Z0-9' with your range
of valid characters:

create table #t (col1 nchar(1))

insert into #t select 'A'
insert into #t select '0' -- zero
insert into #t select 'a'

if exists (
select * from #t
where col1 like '%[^a-zA-Z0-9]%' collate Latin1_General_BIN
)
print 'Column failed validation'
else
print 'Column passed'

insert into #t select 'ü' -- u umlaut, ascii 252

-- now run the query again to get the 'failed' message

If this doesn't help, I suggest you post a short script to show a
sample of your data and exactly which rows should pass or fail.

Simon

Aug 3 '05 #2
Hi

Hope this can help you:

SELECT *
FROM <TABLE>
WHERE <COLUMN> NOT LIKE '%[^A-Z0-9]%'

this returns only those fields whose column have a valid char.

hope this answers your question

best Regards,
Chandra
http://www.SQLResource.com/
http://chanduas.blogspot.com/
---------------------------------------

*** Sent via Developersdex http://www.developersdex.com ***
Aug 3 '05 #3
Chandra (ch*****@discussions.hotmail.com) writes:
Hope this can help you:

SELECT *
FROM <TABLE>
WHERE <COLUMN> NOT LIKE '%[^A-Z0-9]%'

this returns only those fields whose column have a valid char.


ingsms was looking for strings that contained at least one non-ASCII
charcater, so the condition sbould be LIKE and not NOT LIKE.

But it does not stop there A-Z is evaluated in the collation of the
column, so with most collations, it includes non-ASCII characters.

For instance:

CREATE TABLE #nonascii (a varchar(100) COLLATE Latin1_General_CI_AS
NOT NULL)
go
INSERT #nonascii (a) VALUES ('Räksmörgås')
INSERT #nonascii (a) VALUES ('Tête à tête')
INSERT #nonascii (a) VALUES ('naîve')
INSERT #nonascii (a) VALUES ('RÄKSMÖRGÅS')
go
SELECT *
FROM #nonascii
WHERE a LIKE '%[^A-Z0-9]%'
go
DROP TABLE #nonascii

finds only "Tête à tête", when it should find all four.

Simon's example with casting into a binary collateion is correct.


--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Aug 3 '05 #4

Here's what I would do.

I would add a binary or int column, say 'VALID', and check the field
when it's inserted- and then index VALID. That way you could do a
search on Valid and it would be lickety-split!

Or, if you don't want to change the table, make a new table that has
the PK of the original and VALID, and use a trigger upon insert to keep
track of the 'valid'ness of that row.

Glenn Dekhayser
Voyant Strategies, Inc.

Aug 4 '05 #5
Actually, I think what you would want to do (if you wanted to emulate
the Access methodology) is use a cursor on the table, and iterate until
you find the first record that fails.

Problem is, if the bad record is the millionth one, this will take WAY
too much time. I still like my previous solution better, which would
end up being:

if exists(select valid from myTable where valid = 'N') then set
@i_have_a_bad_letter = 1

That would be instant, especially if you indexed 'valid'.

Glenn Dekhayser
Voyant Strategies, Inc.

Aug 4 '05 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

25
by: Brian Patterson | last post by:
I have noticed in the book of words that hasattr works by calling getattr and raising an exception if no such attribute exists. If I need the value in any case, am I better off using getattr...
12
by: Fred | last post by:
Has anyone a link or any information comparing c and c++ as far as execution speed is concerned? Signal Processing algorithms would be welcome... Thanks Fred
12
by: serge | last post by:
I have an SP that is big, huge, 700-800 lines. I am not an expert but I need to figure out every possible way that I can improve the performance speed of this SP. In the next couple of weeks I...
6
by: teedilo | last post by:
We have an application with a SQL Server 2000 back end that is fairly database intensive -- lots of fairly frequent queries, inserts, updates -- the gamut. The application does not make use of...
5
by: Scott | last post by:
I have a customer that had developed an Access97 application to track their business information. The application grew significantly and they used the Upsizing Wizard to move the tables to SQL...
115
by: Mark Shelor | last post by:
I've encountered a troublesome inconsistency in the C-language Perl extension I've written for CPAN (Digest::SHA). The problem involves the use of a static array within a performance-critical...
13
by: bjarne | last post by:
Willy Denoyette wrote; > ... it > was not the intention of StrousTrup to the achieve the level of efficiency > of C when he invented C++, ... Ahmmm. It was my aim to match the performance...
13
by: Bern McCarty | last post by:
I have run an experiment to try to learn some things about floating point performance in managed C++. I am using Visual Studio 2003. I was hoping to get a feel for whether or not it would make...
7
by: Michael D. Ober | last post by:
When calling Enqueue, the internal array may need to be reallocated. My question is by how much? In the old MFC array classes, you could tell MFC how many additional elements to add to the array...
5
by: Markus Ernst | last post by:
Hello A class that composes the output of shop-related data gets some info from the main shop class. Now I wonder whether it is faster to store the info in the output class or get it from the...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.