By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,659 Members | 1,714 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,659 IT Pros & Developers. It's quick & easy.

Find Credit Card numbers in text field

code green
Expert 100+
P: 1,726
Data Protection etc.... shouldn't be there.
How do I find credit card numbers buried amongst any length of text.

Googling finds credit card validation claims, which are too complicated.

I need to find groups of numbers that resemble credit card numbers.
I think the criteria is 13 - 24 digits in groups of 4 or 5
seperated by spaces or dashes or bunched together.

It looks like a regex using PATINDEX is the answer but regex is black magic to me
Have I got this right and could I please have some pattern suggestions?
Expand|Select|Wrap|Line Numbers
  1. SELECT notes FROM cust_order WHERE PATINDEX('pattern',notes) > 0
May 19 '09 #1
Share this Question
Share on Google+
4 Replies


Uncle Dickie
P: 67
Using a similar function to the one I posted last time you could get something that might help you on your way (I'm not sure what CK had in mind for an alternative approach with String Parsing so I'm bumbling on as before !):

Expand|Select|Wrap|Line Numbers
  1. CREATE FUNCTION [dbo].[R_Test_func2](@sub nvarchar(1000))
  2. RETURNS int
  3. AS
  4. BEGIN
  5.  
  6. DECLARE @Output int
  7. DECLARE @myOutput int
  8. SET @Output = 0
  9. SET @myOutput = 0
  10.  
  11. WHILE patindex('%[0123456789]%', @sub) <> 0
  12.  BEGIN
  13.   SET @Output = 
  14.    len(substring
  15.    (
  16.     substring(@sub,patindex('%[0123456789]%',@sub),len(@sub))
  17.     ,1
  18.     ,CASE
  19.      WHEN patindex('%[^0123456789 -]%',substring(@sub,patindex('%[0123456789]%',@sub),len(@sub))) = 0
  20.      THEN len(@sub)
  21.      ELSE patindex('%[^0123456789 -]%',substring(@sub,patindex('%[0123456789]%',@sub),len(@sub)))-1
  22.     END
  23.    ))
  24.  
  25.   SET @sub = 
  26.    CASE
  27.     WHEN patindex('%[^0123456789 -]%',substring(@sub,patindex('%[0123456789]%',@sub),len(@sub))) = 0
  28.     THEN 'a'
  29.     ELSE substring
  30.     (
  31.      substring(@sub,patindex('%[0123456789]%',@sub),len(@sub))
  32.      ,patindex('%[^0123456789 -]%',substring(@sub,patindex('%[0123456789]%',@sub),len(@sub)))
  33.      ,len(@sub)
  34.     )
  35.    END
  36.  
  37.   SET @myOutput = 
  38.    CASE
  39.     WHEN @Output > @myOutput THEN @Output ELSE @myOutput
  40.     END
  41.   END
  42.  RETURN  @myOutput
  43.  
  44. END
The general principle is the same as before

It looks for the 1st numeric character:
Expand|Select|Wrap|Line Numbers
  1. patindex('%[0123456789]%',@sub)
Then for the 1st non numeric character thereafter (this time also allowing for a space or -):
Expand|Select|Wrap|Line Numbers
  1. patindex('%[^0123456789 -]%',substring(@sub,patindex('%[0123456789]%',@sub),len(@sub)))
The length of this string is stored and then the rest of the string is searched.
The resulting output will be the length of the longest string of consecutive numbers (including spaces and -)

So, you could use a WHERE statement such as:

Expand|Select|Wrap|Line Numbers
  1. WHERE dbo.R_Test_Func2(notes) > 12
You could get rid of one of the two variables @Output and @myOutput by doing the SET as a CASE statement in one go but I felt this may complicate the (already less than easy to read) code too much!

As a note, another fix from the original function I posted is that it will now work if the last character is non-numeric, hence the addition of a couple of CASE statements buried in there. In the previous version if the last character was not a number then it would try and return a string of either length 0 or starting from the 0th character which I discovered it did not like very much!


Hope this helps
May 20 '09 #2

code green
Expert 100+
P: 1,726
Looks like it could work Uncle Dickie.
I am under a bit of pressure with this one now, so will give it a try
May 20 '09 #3

ck9663
Expert 2.5K+
P: 2,878
It meant the same thing :) You're basically parsing the TEXT which is by common definition is a string. As string parsing is reading the string one character at a time or finding patterns within it.

As I always said, if that algorithm gave you the performance you need, that should be fine, until someone suggested a better one :)

Thanks Uncle Dickie and Happy Coding CG :)


--- CK
May 20 '09 #4

Uncle Dickie
P: 67
Having looked a bit more at the patindex function, a much easier option than my previous function would simply be:

Expand|Select|Wrap|Line Numbers
  1. SELECT notes
  2. FROM   cust_order
  3. WHERE  patindex('%[0-9][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -]%',TranID) > 0
  4.  
This is just looking for a string of 13 consecutive numbers, spaces or '-' (it must start with a number).
May 21 '09 #5

Post your reply

Sign in to post your reply or Sign up for a free account.