By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
426,222 Members | 1,129 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 426,222 IT Pros & Developers. It's quick & easy.

Count of specific character in a text

P: 4
I have a list of texts and I need to find out the number of times each character shows up.

Example in the text "76DC4-34" the character '4' shows up twice. Is there a query that for this instance if I provide the text and character I'm looking for it will give the number of occurences as a result or if there is no such query how do I go about getting the result I want?

I assume I'm going to have to repeat a process 36 times (10 for each numeric digit and 26 for each letter, other characters in the text don't matter)
Feb 9 '09 #1
Share this Question
Share on Google+
44 Replies


ADezii
Expert 5K+
P: 8,624
@shahaba
I do not think that there is an easy Method of doing what you describe, but the following Function should point you in the right direction. It will count the Total Number of occurrences of each Number contained within the String passed to it:
Expand|Select|Wrap|Line Numbers
  1. Public Function fCountCharacters(strString As String) As String
  2. Dim intLenOfString As Integer
  3. Dim intCounter As Integer
  4. Dim strResult As String
  5. Dim intNumOf0s As Integer
  6. Dim intNumOf1s As Integer
  7. Dim intNumOf2s As Integer
  8. Dim intNumOf3s As Integer
  9. Dim intNumOf4s As Integer
  10. Dim intNumOf5s As Integer
  11. Dim intNumOf6s As Integer
  12. Dim intNumOf7s As Integer
  13. Dim intNumOf8s As Integer
  14. Dim intNumOf9s As Integer
  15.  
  16. If Len(strString) = 0 Then Exit Function
  17.  
  18. intLenOfString = Len(strString)
  19.  
  20. For intCounter = 1 To intLenOfString
  21.   Select Case Mid$(strString, intCounter, 1)
  22.     Case "0"
  23.       intNumOf0s = intNumOf0s + 1
  24.     Case "1"
  25.       intNumOf1s = intNumOf1s + 1
  26.     Case "2"
  27.       intNumOf2s = intNumOf2s + 1
  28.     Case "3"
  29.       intNumOf3s = intNumOf3s + 1
  30.     Case "4"
  31.       intNumOf4s = intNumOf4s + 1
  32.     Case "5"
  33.       intNumOf5s = intNumOf5s + 1
  34.     Case "6"
  35.       intNumOf6s = intNumOf6s + 1
  36.     Case "7"
  37.       intNumOf7s = intNumOf7s + 1
  38.     Case "8"
  39.       intNumOf8s = intNumOf8s + 1
  40.     Case "9"
  41.       intNumOf9s = intNumOf9s + 1
  42.   End Select
  43. Next
  44.  
  45. strResult = "Number of 0s: " & Format$(intNumOf0s, "00") & vbCrLf
  46. strResult = strResult & "Number of 1s: " & Format$(intNumOf1s, "00") & vbCrLf
  47. strResult = strResult & "Number of 2s: " & Format$(intNumOf2s, "00") & vbCrLf
  48. strResult = strResult & "Number of 3s: " & Format$(intNumOf3s, "00") & vbCrLf
  49. strResult = strResult & "Number of 4s: " & Format$(intNumOf4s, "00") & vbCrLf
  50. strResult = strResult & "Number of 5s: " & Format$(intNumOf5s, "00") & vbCrLf
  51. strResult = strResult & "Number of 6s: " & Format$(intNumOf6s, "00") & vbCrLf
  52. strResult = strResult & "Number of 7s: " & Format$(intNumOf7s, "00") & vbCrLf
  53. strResult = strResult & "Number of 8s: " & Format$(intNumOf8s, "00") & vbCrLf
  54. strResult = strResult & "Number of 9s: " & Format$(intNumOf9s, "00") & vbCrLf
  55.   fCountCharacters = strResult
  56. End Function
Expand|Select|Wrap|Line Numbers
  1. Debug.Print fCountCharacters("76DC4-34")
OUTPUT:
Expand|Select|Wrap|Line Numbers
  1. Number of 0s: 00
  2. Number of 1s: 00
  3. Number of 2s: 00
  4. Number of 3s: 01
  5. Number of 4s: 02
  6. Number of 5s: 00
  7. Number of 6s: 01
  8. Number of 7s: 01
  9. Number of 8s: 00
  10. Number of 9s: 00
Feb 10 '09 #2

missinglinq
Expert 2.5K+
P: 3,532
Here's a function that is a little more compact. It uses the Replace() function, so you'll have to be using Access 2000 or later for it to work. It covers everything you've specified, 0-9 and A -Z.

Expand|Select|Wrap|Line Numbers
  1. Public Function CountAllCharacters(ParsedString As String) As String
  2. Dim ReportString As String
  3. ReportString = ""
  4.  
  5. For x = 48 To 90
  6.  If x > 57 And x < 65 Then GoTo Skip2MeLoop
  7.  cnt = Len(ParsedString) - Len(Replace(ParsedString, Chr(x), ""))
  8.  
  9. If cnt <> 0 Then
  10.  If ReportString = "" Then
  11.    ReportString = "Number of " & Chr(x) & "'s: " & cnt
  12.  Else
  13.   ReportString = ReportString & vbCrLf & "Number of " & Chr(x) & "'s: " & cnt
  14.  End If
  15. End If
  16. Skip2MeLoop:
  17. Next x
  18.  
  19. CountAllCharacters = ReportString
  20.  
  21. End Function
  22.  
  23.  
This will give you an output, for the sample string 76DC4-34 of
Expand|Select|Wrap|Line Numbers
  1. Number of 3's: 1
  2. Number of 4's: 2
  3. Number of 6's: 1
  4. Number of 7's: 1
  5. Number of C's: 1
  6. Number of D's: 1
If you need to show zero for all the characters that do not appear, a simple modification will do that.

Welcome to Bytes!

Linq ;0)>
Feb 10 '09 #3

Expert 100+
P: 1,287
Linq, that's brilliant!
Feb 10 '09 #4

FishVal
Expert 2.5K+
P: 2,653
Not bad, not bad.

However I don't think iterating ASCII values is a good idea.
I suppose a better way is to:
  • Get the first character.
  • Replace this character with nothing throughout the string and get length difference.
  • Return to Step 1 until the string is empty.

This gives a more efficient code and allows (I'm almost sure though haven't checked this) to work with unicode strings.

Regards,
Fish.
Feb 10 '09 #5

ADezii
Expert 5K+
P: 8,624
@missinglinq
Nice job Linq! Your code is definitely more compact, efficient, practical, portable, digestible, etc., etc., etc.
Feb 10 '09 #6

ADezii
Expert 5K+
P: 8,624
@FishVal
Hello FishVal, just out of sheer curiosity, why don't you think that iterating ASCII values is a good idea? Would there be that much of a difference iterating ASCII Values as opposed to their Character Representations?
Feb 10 '09 #7

FishVal
Expert 2.5K+
P: 2,653
@ADezii
  • This is inefficient because not all characters are expected to be in a string being processed.
  • This is completely unacceptable for unicode strings.
Feb 10 '09 #8

ADezii
Expert 5K+
P: 8,624
@FishVal
Thanks Fish, couldn't the ChrW() Function be used to return a String containing a Unicode character as well as the Standard ASCII Characters from a String?
Feb 10 '09 #9

FishVal
Expert 2.5K+
P: 2,653
It certainly could, but there is no reason to search a string for all unicode characters.
Feb 10 '09 #10

missinglinq
Expert 2.5K+
P: 3,532
Thank you for pointing this out, FishVal, but until you entered the thread I don't remember anyone talking about unicode strings. Did I miss something?

Of course, taking the first letter of the string and replacing all instances of it, making a note of how many instances there were, then repeating this thru the string, would leave you with

1) A final string that listed the characters in the order that they appeared, not 0-9 and A-Z order

2) Having to account for repeating characters so that they weren't counted twice.

3) Having to account for characters other than 0-9 and A-Z, which is all that the OP stated they were concerned with.

Linq ;0)>
Feb 10 '09 #11

FishVal
Expert 2.5K+
P: 2,653
@missinglinq
:D Well. The initial question was not about counting all characters at all.

Of course, taking the first letter of the string and replacing all instances of it, making a note of how many instances there were, then repeating this thru the string, would leave you with

1) A final string that listed the characters in the order that they appeared, not 0-9 and A-Z order
Results array sorting would not take more time than running Replace() function for all possible characters.

2) Having to account for repeating characters so that they weren't counted twice.
They will not since they are excluded from string when found.

3) Having to account for characters other than 0-9 and A-Z, which is all that the OP stated they were concerned with.
Don't you like to go slightly beyond of what you guess about OP's question? :D

Kind regards,
Fish.
Feb 10 '09 #12

100+
P: 675
The original question is to count the number of occurrences for a given character. Function would return a single value, from 0 to n, the length of the original string. If this is a field in a query, then SELECT tbl.StringToTest AS [Match]. . . fCountOccur([Match],"A") AS CountA . . . FROM tbl
Expand|Select|Wrap|Line Numbers
  1. Public Function fCountOccur(strSource As String, strMatch As String) As String
  2. Dim iCount As Integer
  3. Dim iPosition As Integer
  4.     iCount = 0
  5.     For iPosition = 1 To Len(strSource)
  6.         If Mid(strSource, iPosition, 1) = strMatch Then iCount = iCount + 1
  7.     Next
  8.     fCountOccur = Format(iCount, "000")
  9. End Function
  10.  
One query could do all 36 conditions. Design grid would have 36 fields beyond original table. They might be CountA: fCountOccur([Match],"A"), CountB: fCountOccur([Match],"B"), etc.
These 36 fields could be concatinated into one long string, such that the count for 0's is in positions 1-3; the count for 1's in positions 4-6; etc. therefore, there are mid(strLongString,n*3+1,3) n's, n from 0 to 9. Letters would be n=Asc(letter)-55. Concatination could be in query or with an enclosing loop in function, in which case 2nd argument would not be used as inner loop would use ascii value from outer loop.

OldBirdman
Feb 11 '09 #13

ADezii
Expert 5K+
P: 8,624
Sorry guys, but I still feel that Linq's approach, or a variation thereof, is the way to go on this one!
Feb 11 '09 #14

missinglinq
Expert 2.5K+
P: 3,532
No, ADezii! You don't think we should try to see how #$%^&* complicated we could make this? I mean, what if some of the strings were in Klingon? Would it work then? Or in Esperanto and encrypted? Would it work then?

Linq ;0)>
Feb 11 '09 #15

FishVal
Expert 2.5K+
P: 2,653
Fathers,

Actually (now it is related to original question) I would not use a query with 36 calculated fields. Characters to count (in Klingon, Esperanto or even in Shakespare's Talk) could be stored in an additional table. This table could be joined using cartesian join with the target table to produce all possible combinations of [text] and [character] which are used to calculate character ocurence count. Thus obtained dataset could be used as a source for crosstab query.

Query: qryCount
Expand|Select|Wrap|Line Numbers
  1. SELECT tbl.txt, tblChars.txtChar,
  2. Len(tbl.txt)-Len(Replace(tbl.txt, tblChars.txtChar, "",1,-1,1)) AS lngCount
  3. FROM tbl, tblChars;
  4.  
Query: qryCrosstab
Expand|Select|Wrap|Line Numbers
  1. TRANSFORM First(qryCount.lngCount) AS lngCount
  2. SELECT qryCount.txt
  3. FROM qryCount
  4. GROUP BY qryCount.txt
  5. PIVOT qryCount.txtChar;
  6.  
Feb 11 '09 #16

100+
P: 675
We don't know, and nobody has asked, why this is needed. For the example given, text = "76DC4-34", I can't think of any reason why this might be needed. I assumed English (26 letters) and not Klingon, Russian, or Arabic.
I think the way to proceed depends on this answer. Is it 1) a One-Time study, 2) a query or table field(s) even if temporary, or 3) a calculated control? When shahaba says "I assume I'm going to have to repeat a process 36 times ... ", does this mean he will be working with 36 values, or can he reduce it to one value, say the largest? If the answer is 1), then a list might be appropriate. If 2) or 3), this has to be reduced to a single number or string. These require different solutions.
Feb 12 '09 #17

FishVal
Expert 2.5K+
P: 2,653
:D

Aim? Who does care. Just fire.
Feb 12 '09 #18

NeoPa
Expert Mod 15k+
P: 31,419
I think the OP forgot to come back (Either that, or they were frightened off by all the high-faluting "discussions").

In truth though, it's always interesting to get the points of view of other experts (Yes OB - That includes you too, regardless of tag), even when they are not strictly required for the original question.
Feb 16 '09 #19

100+
P: 675
NeoPa, as the only non-expert here, I am glad you appreciate my input.
"shahaba" is a Newbie (1 post), and needs to be invited back into this discussion.
shahaba, can you give a few more details, such as how you are going to use this number. Is it a field in a table, an unbound control on a form or report, a variable in a VBA formula, a one-time special study, or ...?
You write:
I assume I'm going to have to repeat a process 36 times (10 for each numeric digit and 26 for each letter, ... )
which means you may need 36 values. How are these to be used. Maybe you really don't need a query, only a function. You have 4 experts interested here, lets get to a solution.
Don't give up on this forum, it may be the best one out there!
Feb 16 '09 #20

NeoPa
Expert Mod 15k+
P: 31,419
@OldBirdman
I certainly do ;)

Particularly how you look to get the precise question sorted out as a basis for continuing (not just in this thread).
Feb 16 '09 #21

P: 4
Sorry for not responding sooner, this issue was put on the backburner for a while for various reasons. So far the information has been useful even though most of what is said here is going over my head. Let me answer what I can and if you want any more information I'll be checking back regularly.

I think that getting the results as fields in a table might be the most useful.

What I meant by 36 times is refering to if there was such built in function I'd be using it once for each digit 0-9 and once for each letter.

To be clear, I need the sum of the characters in all records of a table not the sum of each character in for each individual records

This in not a 1 time study. Currently the actual table I plan to run this on has 25000 records of varying length up to 14 characters. But I'll likely need to run this process everytime we get new data to append to this table.
Feb 18 '09 #22

NeoPa
Expert Mod 15k+
P: 31,419
Is the data for each record contained in a single field?

PS. With this new information we'd certainly be looking at doing it differently. I'll give it some thought.
Feb 18 '09 #23

100+
P: 675
Does it matter as strAll = Str1 [& Str2 [& Str3 ...]]?
A function is going to be called 350,000 times. Isn't the question whether to do this in queries or code?
We're calling f(strAll, strLetter) for each letter 0-9, A-Z for each record.
Solution 1 - Queries - 1) SELECT f(strAll, "A") AS CountA, ..., f(strAll, "9") AS Count9 will result in 25,000 rows, 36 fields. 2)SUM values in each field
Solution 2 - Code (logic) - Do Until EOF; Do strX = "A" to "Z", "0" to "9"; txtLetter&strX = txtLetter&strX + f(strAll, strX); where txtLetter&strX are textboxes on a form.
Function f returns Len(strAll)-Len((Replace(strAll,strX, "")) per ADezii and this code could replace actual call either in SQL Statement or Code.
Disclaimer - The above is not intended to be code in any language. It is intended to point this discussion in one of two directions.
shahaba - My inquisitive mind would like to know the reason behind this need. Can you tell us. OldBirdman
Feb 18 '09 #24

NeoPa
Expert Mod 15k+
P: 31,419
Firstly, I'm going to proceed here, but please remember to respond to OB's post too.

In my considered opinion, the very real benefit of using Replace() for a single character, is more than lost when it has to be run on all the characters in the alphabet + numerals. This may not be true for an enormous string, but if we have many records then just putting the data together into a string would be enormously processor intensive, not to say time consuming.

I'm thinking on the basis that if the data in a record is up to 14 characters long, it's almost certainly in a single field.

I can't think of a reliable and efficient way of arranging the data such that it can be worked on as a single piece of data (string) so I've concluded the best way to handle this is to use DSum() to sum the values produced (by a function call we'll get to later) for each record in the table.

The function must essentially return the count of qualifying characters in the string data passed.
Feb 18 '09 #25

NeoPa
Expert Mod 15k+
P: 31,419
The following function should do the trick :
Expand|Select|Wrap|Line Numbers
  1. Public Function CountValid(strData As String) As Long
  2.   Dim intX As Integer
  3.  
  4.   CountValid = 0
  5.   For intX = 1 To Len(strData)
  6.     If InStr(1, _
  7.              "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", 
  8.              Mid(strData, intX, 1), _
  9.              vbTextCompare) > 0 Then CountValid = CountValid + 1
  10.   Next intX
  11. End Function
Use vbBinaryCompare in line #9 if only upper case characters are required.

To get the total simply use :
Expand|Select|Wrap|Line Numbers
  1. Call MsgBox("Valid chars in database = " & _
  2.             Format(DSum("CountValid([YourFieldName])", _
  3.                         "[YourTablename]"), "#,##0\."))
Feb 18 '09 #26

NeoPa
Expert Mod 15k+
P: 31,419
If the format of the data is known always to fit any type of template, then that may well effect how best it can be done (the only characters that are ever included in the data, but which should not be counted, come from a list shorter than 36 for instance).

Otherwise there is an open invitation to post ideas. We're looking for those which are more efficient in execution rather than less though, so please review some of the known issue laid out earlier before contributing.
Feb 18 '09 #27

P: 4
The purpose of all this is creating Identity Tags by piecing these tags from single characters which must be requested in advance.

The data for each record is contained within a single field

Other characters that may appear are \ / -
Feb 18 '09 #28

P: 4
The purpose of all this is creating Identity Tags by piecing these tags from single characters which must be requested in advance.
Maybe this is an idea that has been proposed but should be forgotten. Random numbers or ? might be another approach to solving the underlying problem, which is to create Identity Tags.
In my defense, I realized that the data was probably in 1 field, but if not, the 2 or 3 fields could be concatinated, thus this issue could be ignored. I never considered creating a string of many thousands of characters, but 14 characters max.
The function "Function CountValid" does not do what is requested. It counts the total number of characters that are one of the 36 allowed. It does not count the number of each letter (i.e. How many "K"s?). From post #1:
I have a list of texts and I need to find out the number of times each character shows up.
Feb 18 '09 #29

NeoPa
Expert Mod 15k+
P: 31,419
@shahaba
How does counting the total number of these 36 characters relate to this :S
Feb 18 '09 #30

NeoPa
Expert Mod 15k+
P: 31,419
@shahaba
If these are the only other characters that may appear in the data anywhere, then the code can execute more quickly as :
Expand|Select|Wrap|Line Numbers
  1. Public Function CountValid(strData As String) As Long
  2.   CountValid = Len(Replace(Replace(Replace(strData, _
  3.                                            "/", ""), _
  4.                                    "\", ""), _
  5.                            "-", ""))
  6. End Function
Feb 18 '09 #31

NeoPa
Expert Mod 15k+
P: 31,419
Firstly, please PM me if there's anything wrong with your account. I will do what I can to resolve any problem.
@CannotUseSite
No comment. I'm lost as to how this fits in at all with what was asked for before :S
@CannotUseSite
No defense required OB. I was also considering proceeding on those lines (I mean concatenating records, not just fields), until I thought it through more thoroughly. My comments were certainly not directed at your idea.
@CannotUseSite
In response to your prompting, Shahaba replied with an explanation of what he was trying to do fundamentally :
@shahaba
It is this new understanding that I'm working to (I think). I am no longer trying to answer the original question as it seems that it was a completely wrong approach anyway (going by what Shahaba has since posted). My understanding of that is that Shahaba was intending to call the function for each possible character then sum the results to produce the total (The only item of interest as far as I now understand).
Feb 18 '09 #32

P: 4
Originally Posted by shahaba
To be clear, I need the sum of the characters in all records of a table not the sum of each character in for each individual records
Not clear! "...the sum of the individual characters in all records", or "...the sum of the valid characters in all records"?
Feb 18 '09 #33

NeoPa
Expert Mod 15k+
P: 31,419
@CannotUseSite
I took it as the latter, but please retry your OldBirdman account. It should be working now.

PS. Please use me in future if you have any such, or similar, difficulties again. They're actually pretty rare, and I'm happy to resolve them quickly if / when they do crop up.
Feb 18 '09 #34

100+
P: 675
I needless to say consider the first statement to be correct, as shahaba was clarifying, not changing the conditions for his request.
I assume I'm going to have to repeat a process 36 times (10 for each numeric digit and 26 for each letter, ...
About an efficient function - I would imagine that Instr() is much more efficient than Mid(), and that the code for calling a function is no worse than for executing a For...Next Loop. Replace() is least efficient. Therefore:
Expand|Select|Wrap|Line Numbers
  1. Private Function fCountOneLetter(iStart As Integer, strToTest As String, strFindThis As String) As Integer
  2. Dim i As Integer
  3. Static iCount As Integer
  4.     i = InStr(iStart, strToTest, strFindThis)
  5.     If i <> 0 Then
  6.         i = fCountOneLetter(i + 1, strToTest, strFindThis)
  7.         iCount = iCount + 1
  8.     Else
  9.         iCount = 0
  10.     End If
  11.     fCountOneLetter = iCount
  12. End Function
Feb 19 '09 #35

ADezii
Expert 5K+
P: 8,624
  1. I've created an Algorithm that will calculate the Total Number of Characters (A-Z, 0-9) in a Variable Length Field in a Table. Simply pass the Name of your Table and Field to the fCountCharacters() Function and it will do the rest. The code assumes that the Field containing the Characters is Not Null. This is purely from a code perspective, and is my interpretation of the what the OP is looking for given his last, modified request. Is there a more efficient Method of doing this? Probably, but this one came right off the Top-of-My-Head (no Comment please, NeoPa). The one drawback that is evident is that there could be a staggering number of Loop Iterations for large Recordsets since there are 3 nested levels.
  2. Function Definition:
    Expand|Select|Wrap|Line Numbers
    1. Public Function fCountCharacters(strTableName As String, strFieldName As String)
    2. Dim strCompareString As String
    3. Dim MyDB As DAO.Database
    4. Dim rstTest As DAO.Recordset
    5. Dim intCharCounter As Integer
    6. Dim intCharCtField As Integer
    7. Dim intFieldChar As Integer
    8. Dim intCumTotals As Integer
    9.  
    10. strCompareString = "ABCDEFGHIJKLMNOPQUSTUVWXYZ0123456789"
    11.  
    12. If DCount("*", strTableName) = 0 Then
    13.   MsgBox "No Records exist in " & strTableName, vbCritical, "No Records"
    14.     Exit Function
    15. End If
    16.  
    17. Set MyDB = CurrentDb()
    18. Set rstTest = MyDB.OpenRecordset(strTableName, dbOpenSnapshot)
    19.  
    20. DoCmd.Hourglass True
    21.  
    22. With rstTest
    23.   For intCharCounter = 1 To Len(strCompareString)       'A-Z and 0-9
    24.     Do While Not .EOF       'Loop through All Records in the Recordset
    25.       intCharCtField = Len(.Fields(strFieldName))       'Number of characters in the Field
    26.         For intFieldChar = 1 To intCharCtField      'Iterate each character
    27.           'Loop through all Characters in the Field for Current Record
    28.           If UCase$(Mid$(.Fields(strFieldName), intFieldChar, 1)) = Mid$(strCompareString, intCharCounter, 1) Then
    29.             intCumTotals = intCumTotals + 1     'Increment Counter for specific Counter
    30.           End If
    31.         Next
    32.         .MoveNext
    33.     Loop
    34.       If intCumTotals > 0 Then
    35.         Debug.Print "Number of " & Mid$(strCompareString, intCharCounter, 1) & "'s: " & _
    36.                      Format$(intCumTotals, "00000")
    37.       End If
    38.       intCumTotals = 0      'Reset
    39.       .MoveFirst    'Advance to 1st Record in ecordset
    40.   Next
    41. End With
    42.  
    43. DoCmd.Hourglass False
    44.  
    45. rstTest.Close
    46. Set rstTest = Nothing
    47. End Function
  3. Sample Function Call with required Arguments:
    Expand|Select|Wrap|Line Numbers
    1. Call fCountCharacters("Table1", "Field1")
  4. Sample Data used for analysis:
    Expand|Select|Wrap|Line Numbers
    1. Field1
    2. ABC-432-9923
    3. CAFG-826
    4. JJ-J-1924K
    5. BBZ-00-11-888-PP
    6. 7 7 7 7 7 7 7
    7. defrtyu-99-102-l
  5. OUTCOME:
    Expand|Select|Wrap|Line Numbers
    1. Number of A's: 00002
    2. Number of B's: 00003
    3. Number of C's: 00002
    4. Number of D's: 00001
    5. Number of E's: 00001
    6. Number of F's: 00002
    7. Number of G's: 00001
    8. Number of J's: 00003
    9. Number of K's: 00001
    10. Number of L's: 00001
    11. Number of P's: 00002
    12. Number of U's: 00001
    13. Number of T's: 00001
    14. Number of U's: 00001
    15. Number of Y's: 00001
    16. Number of Z's: 00001
    17. Number of 0's: 00003
    18. Number of 1's: 00004
    19. Number of 2's: 00005
    20. Number of 3's: 00002
    21. Number of 4's: 00002
    22. Number of 6's: 00001
    23. Number of 7's: 00007
    24. Number of 8's: 00004
    25. Number of 9's: 00005
  6. I'm done for now, download the Attachment is anyone is actually interested.
Feb 19 '09 #36

ADezii
Expert 5K+
P: 8,624
@OldBirdman
This in not a 1 time study. Currently the actual table I plan to run this on has 25000 records of varying length up to 14 characters. But I'll likely need to run this process everytime we get new data to append to this table.
Just for curiosity, do you think that a Recursive Procedure will work under these Parameters?
Feb 19 '09 #37

100+
P: 675
Wouldn't it be better to reverse these two statements?
Expand|Select|Wrap|Line Numbers
  1. For intCharCounter = 1 To Len(strCompareString)       'A-Z and 0-9 
  2.     Do While Not .EOF       'Loop through All Records in the Recordset 
Then the file(table) would not be read through 36 times, which has got to be slower than reading it once. Seems from other things I've done that Access will requery or reread for about any excuse. Of course, you'd need an array to hold the 36 cumulating totals until the end of the function.
Why call UCase() when:
Expand|Select|Wrap|Line Numbers
  1.        If UCase$(Mid$(.Fields(strFieldName), intFieldChar, 1)) = Mid$(strCompareString, intCharCounter, 1) Then 
  2. is the same as
  3.        If StrComp(Mid$(.Fields(strFieldName), intFieldChar, 1)) ,Mid$(strCompareString, intCharCounter, 1), vbTextCompare) = 0 Then
  4. is the same, if Option Compare Database, as
  5.       If Mid$(.Fields(strFieldName), intFieldChar, 1) = Mid$(strCompareString, intCharCounter, 1) Then
The function StrComp() may not actually be a function call, but only the way the code is handled when executed.
Feb 19 '09 #38

ADezii
Expert 5K+
P: 8,624
@OldBirdman
Wouldn't it be better to reverse these two statements?
Excellent point, OldBirdman. Your reversal should be more efficient and subsequently much faster.
Why call UCase() when:
To be honest with you, I cannot see how StrComp() would not be a Function Call, and I figured the String Version of the UCase$() Function would be faster, but of course I have no way of proving this. Whenever I get a chance, I'll perform some Benchmark Tests to see what approach is more efficient. We can even make it a horror movie: Ucase$() vs StrComp(). Seriously, thanks for the input.
Feb 19 '09 #39

100+
P: 675
Many compiles/intepreters have functions that aren't really calls. The code is executed as if it were a simple statement, not a function call. IsNull might be such a function. There may be others. I have no basis for assuming any of this for Access VBA however.
I've tried to time different methods for efficiency, but I cannot get meaningful results. Must be that something else starts in background and messes up the testing. If you have a test method, I'd like to see it.
By the way:
Just for curiosity, do you think that a Recursive Procedure will work under these Parameters?
was answered in post #35, but as you posted almost simultaneously with your post #36, I guess you missed it.
Feb 19 '09 #40

Expert Mod 2.5K+
P: 2,545
In the attempt I've made at this below I have taken a different approach, which is to do minimal processing at the counting end and do the A to Z selection when reporting the counts. I use an array which counts the frequency of all ASCII characters in the strings concerned. This does not limit the final counts to A-Z and 0-9 should it be necessary to change the range later.

On a recordset comprising 66,568 single-textfield records totalling 5,930,963 characters the count and report function took just over 4 seconds to run start-to-finish.

Expand|Select|Wrap|Line Numbers
  1. Public Function fCountCharsInRecords(strRSName As String) As Long
  2.     'This function counts the number of characters contained in the fields
  3.     'of the recordset whose name is supplied as the argument.
  4.     '
  5.     'Assumption: the recordset comprises fields which contain solely text
  6.     'to be counted. The number of fields is not predetermined, but all should be
  7.     'text based. Null values acceptable.
  8.     '
  9.     'Source of these should be a query on base data which extracts
  10.     'the text fields and no others.
  11.     '
  12.     'Method: this function generates a frequency count for all the ASCII character
  13.     'codes in the source text fields. Lower case characters are counted
  14.     'separately from uppercase, although the final results reported are for the
  15.     'combined cases (that is the count for, "A" is the sum of the counts for "A"
  16.     'and "a". The result file is a tab-separated text file on C containing the
  17.     'character counts. The listing function can be modified as necessary for other
  18.     'purposes.
  19.     '
  20.     'Returns the total number of characters in the character strings counted.
  21.     '
  22.     Dim chrCounts(255) As Long
  23.     Dim RS As DAO.Recordset
  24.     Dim intFields As Integer
  25.     Dim lngChrCount As Long
  26.     Dim strSomeString As String
  27.     Set RS = CurrentDb.OpenRecordset(strRSName)
  28.     intFields = RS.Fields.Count
  29.     Call fClearChrCounts(chrCounts())
  30.     Do While Not RS.EOF
  31.         For intFields = 0 To intFields - 1
  32.             strSomeString = Nz(RS.Fields(intFields))
  33.             Call fCountChars(strSomeString, chrCounts())
  34.             lngChrCount = lngChrCount + Len(strSomeString)
  35.         Next intFields
  36.         RS.MoveNext
  37.     Loop
  38.     RS.Close
  39.     Call fListLetterAndNumberCounts(chrCounts())
  40.     fCountCharsInRecords = lngChrCount
  41. End Function
  42.  
  43. Public Function fClearChrCounts(chrCounts() As Long)
  44.     Dim intI As Integer
  45.     For intI = 1 To UBound(chrCounts)
  46.         chrCounts(intI) = 0
  47.     Next intI
  48. End Function
  49.  
  50. Public Function fCountChars(strSomeString, chrCounts() As Long)
  51.     Dim intI As Integer, intAsciiCode As Integer
  52.     If Not IsNull(strSomeString) Then
  53.         For intI = 1 To Len(strSomeString)
  54.             intAsciiCode = Asc(Mid(strSomeString, intI, 1))
  55.             chrCounts(intAsciiCode) = chrCounts(intAsciiCode) + 1
  56.         Next intI
  57.     End If
  58. End Function
  59.  
  60. Public Function fListLetterAndNumberCounts(chrCounts() As Long)
  61.     'Outputs list of character counts for 'A' to 'Z' and
  62.     ''0' to '9', regardless of lower or uppercase original characters
  63.     'Output file is a tab-separated textfile on C:
  64.     '
  65.     Dim intAsciiCode As Integer
  66.     Open "C:\chrcount.txt" For Output As #1
  67.     Print #1, "Char" & Chr(9) & "Count"
  68.     For intAsciiCode = Asc("A") To Asc("Z")
  69.         Print #1, Chr(intAsciiCode) & Chr(9) & chrCounts(intAsciiCode) + chrCounts(intAsciiCode + 32)
  70.     Next intAsciiCode
  71.     For intAsciiCode = Asc("0") To Asc("9")
  72.         Print #1, Chr(intAsciiCode) & Chr(9) & chrCounts(intAsciiCode)
  73.     Next intAsciiCode
  74.     Close #1
  75. End Function
Results
Expand|Select|Wrap|Line Numbers
  1. Char Count
  2. A   350321
  3. B    87295
  4. C   196770
  5. D   211750
  6. E   441251
  7. F   147846
  8. G    87188
  9. H   111309
  10. I   244076
  11. J    41821
  12. K    39836
  13. L   164205
  14. M   162828
  15. N   256494
  16. O   288119
  17. P    67275
  18. Q     8000
  19. R   302632
  20. S   230041
  21. T   274570
  22. U   117314
  23. V    36758
  24. W    52701
  25. X    38013
  26. Y    90558
  27. Z     1316
  28. 0    19418
  29. 1    32999
  30. 2    20188
  31. 3    15608
  32. 4     9277
  33. 5    11932
  34. 6     6641
  35. 7     6484
  36. 8     6174
  37. 9     5676
Feb 19 '09 #41

FishVal
Expert 2.5K+
P: 2,653
@shahaba
Hmm.. Have you considered using one of known hashing algorithm.
Feb 19 '09 #42

ADezii
Expert 5K+
P: 8,624
@OldBirdman
I use the timeGetTime() API Function for testing purposes. It is illustrated below:
http://bytes.com/topic/access/insigh...ttime-vs-timer
Feb 19 '09 #43

ADezii
Expert 5K+
P: 8,624
Although not as efficient as Stewart's approach, I decided to take the much longer but compact route where all code functionality exists within the context of a single Function. The following code processess 100,000 Records consisting of a Fixed-Width Field of 14 Characters (MAX specified by OP), against the Comparison String (A-Z, 0-9). The code can definately be made more efficient as pointed out previously by Old Birdman in POst #38. The Function, a Sample Call, and subsequent Output are listed below:
Expand|Select|Wrap|Line Numbers
  1. Public Function fCountCharacters(strTableName As String, strFieldName As String)
  2. Dim strCompareString As String
  3. Dim MyDB As DAO.Database
  4. Dim rstTest As DAO.Recordset
  5. Dim intCharCounter As Integer
  6. Dim intCharCtField As Integer
  7. Dim intFieldChar As Integer
  8. Dim intCumTotals As Long
  9.  
  10. strCompareString = "ABCDEFGHIJKLMNOPQUSTUVWXYZ0123456789"
  11.  
  12. If DCount("*", strTableName) = 0 Then
  13.   MsgBox "No Records exist in " & strTableName, vbCritical, "No Records"
  14.     Exit Function
  15. End If
  16.  
  17. Set MyDB = CurrentDb()
  18. Set rstTest = MyDB.OpenRecordset(strTableName, dbOpenSnapshot)
  19.  
  20. DoCmd.Hourglass True
  21.  
  22. With rstTest
  23.   For intCharCounter = 1 To Len(strCompareString)       'A-Z and 0-9
  24.     Do While Not .EOF       'Loop through All Records in the Recordset
  25.       intCharCtField = Len(.Fields(strFieldName))       'Number of characters in the Field
  26.         For intFieldChar = 1 To intCharCtField          'Iterate each character
  27.           'Loop through all Characters in the Field for Current Record
  28.           If UCase$(Mid$(.Fields(strFieldName), intFieldChar, 1)) = Mid$(strCompareString, intCharCounter, 1) Then
  29.             intCumTotals = intCumTotals + 1     'Increment Counter for specific Counter
  30.           End If
  31.         Next
  32.         .MoveNext
  33.     Loop
  34.       If intCumTotals > 0 Then
  35.         Debug.Print "Number of " & Mid$(strCompareString, intCharCounter, 1) & "'s: " & _
  36.                      Format$(intCumTotals, "###,###,###")
  37.       End If
  38.       intCumTotals = 0      'Reset
  39.       .MoveFirst    'Advance to 1st Record in ecordset
  40.   Next
  41. End With
  42.  
  43. DoCmd.Hourglass False
  44.  
  45. rstTest.Close
  46. Set rstTest = Nothing
Expand|Select|Wrap|Line Numbers
  1. Call fCountCharacters("Table2", "Field1")
OUTPUT:
Expand|Select|Wrap|Line Numbers
  1. Number of A's: 99,980
  2. Number of C's: 117,427
  3. Number of D's: 33
  4. Number of E's: 16,988
  5. Number of M's: 33
  6. Number of N's: 2
  7. Number of O's: 99,967
  8. Number of P's: 200,317
  9. Number of S's: 47,860
  10. Number of T's: 17,294
  11. Number of 0's: 241,183
  12. Number of 1's: 39,604
  13. Number of 2's: 45,270
  14. Number of 3's: 177,512
  15. Number of 4's: 79,460
  16. Number of 5's: 75,140
  17. Number of 6's: 41,465
  18. Number of 7's: 36,615
  19. Number of 8's: 32,285
  20. Number of 9's: 31,466
Feb 19 '09 #44

P: 4
Problem solved. Next time I need help, I'll try to be a little more descriptive so there isn't any confusion. Thanks all,
Feb 19 '09 #45

Post your reply

Sign in to post your reply or Sign up for a free account.