
# Lix count (readability) / Flesch-Kincaid

For my thesis I'm using Access to analyze news articles. I would like to calculate a readability score, but I don't know how. I have a very simple word count already, and I also have an approximation of whole sentences (number of periods "."). If this group can help me count how many long words there are in an article, I can calculate the Danish LIX. A long word is defined as longer than 7 letters.

Thank you in advance

Nov 13 '05 #1
7 Replies

What's the definition of a long word: simply the number of letters in it?

Assuming you're using Access 2000 or newer, you can use the Split function to break each sentence into its individual words, then calculate the length of each word. Something like the following untested aircode (it assumes that the entire paragraph is stored in strFile):

    Dim lngCountLongWords As Long
    Dim lngCountWords As Long
    Dim lngSentence As Long
    Dim lngWords As Long
    Dim strFile As String
    Dim varSentences As Variant
    Dim varWords As Variant

    varSentences = Split(strFile, ".")
    If IsNull(varSentences) = False Then
        For lngSentence = LBound(varSentences) To UBound(varSentences)
            ' Extra spaces and the like may result in some elements in
            ' varSentences not actually being true sentences. For example,
            ' if there are 2 blanks in a row, you'll end up with an empty element
            ' in varSentences. You can vary the limit here if you like...
            If Len(varSentences(lngSentence)) > 0 Then
                varWords = Split(varSentences(lngSentence), " ")
                If IsNull(varWords) = False Then
                    For lngWord = LBound(varWords) To UBound(varWords)
                        If Len(varWords(lngWord)) > 0 Then
                            lngCountWords = lngCountWords + 1
                            ' You'll likely want to vary this number too
                            If Len(varWords(lngWord)) > 9 Then
                                lngCountLongWords = lngCountLongWords + 1
                            End If
                        End If
                    Next lngWord
                End If
            End If
        Next lngSentence
    End If

-- Doug Steele, Microsoft Access MVP http://I.Am/DougSteele (no e-mails, please!)

Nov 13 '05 #2

> What's the definition of a long word: simply the number of letters in it?

:-) Yeah - Danish is not that complicated.

> Something like the following untested aircode

Thank you SO much!!! I'm not good at VB and SQL, so I really don't understand everything that's going on in your code. But I'll try it out tomorrow (right now it's 2 am on a Saturday night) and let you know.

> (it assumes that the entire paragraph is stored in strFile)

It's an Access 'memo' field. Again: THX a bunch

Nov 13 '05 #3

A few more questions:

Is it correct that I cannot use this code in a query? So I must make a form, and then call it via a command button's event procedure?

What exactly is the syntax for strFile... is it simply the path to the mdb file? Or just the table and field name?

Where are lngCountLongWords, lngCountWords, lngSentence and lngWords created? In strFile? Not in a separate query?

I really am sorry for my incompetence, but beyond drag-and-drop Access, I have VERY little experience...

Again - thank you in advance

Nov 13 '05 #4

"achristoffersen" wrote in message news:11*********************@g44g2000cwa.googlegro ups.com...

> A few more questions: Is it correct that I cannot use this code in a query? So I must make a form, and then call via a cmd - on event function?

You could turn the snippet of code into a function that returns a value, but since I don't understand how you calculate readability, it's difficult to tell you what the returned value should be.

> What exactly is the syntax for the strfile... is it simply the path to the mdb file? Or just the tabel and field name?

The code I wrote assumed that strFile contained the actual text of the paragraph you're analysing. If you were to turn that code into a function, I'd assume that you'd pass strFile into the function as a parameter:

    Function AnalysisText(TextToAnalyse As String) As ?

then you'd replace

    varSentences = Split(strFile, ".")

with

    varSentences = Split(TextToAnalyse, ".")

> Where is lngCountLongWords, lngCountWords, lngSentence and lngWords created? In the strfile? Not in a separate query?

Not quite sure what you mean by "created". Those four values are variables used in the code. If you convert the code into a function, they'd be local to the function: not used (or accessible) anywhere else.

lngSentence and lngWord (that was a typo in the original code: I use lngWord, not lngWords, everywhere else) are simply looping variables. The first Split function takes the paragraph that's supplied and creates an array out of it, with each element in the array representing a different sentence in the paragraph. lngSentence is how the code loops through each sentence. Similarly, the second Split function takes each sentence and creates an array of the words in that sentence, and lngWord loops through each word.

lngCountWords and lngCountLongWords are the gist of the function: when all of the loops are done, lngCountWords contains how many words are in the paragraph, while lngCountLongWords contains how many of those words are longer than the given length (9 letters in my sample code).

If you need more help, you're going to have to provide details such as how your data is stored in the database, and what calculation needs to be done once you've determined how many words and how many "long words" are in the snippet of text being analysed.

-- Doug Steele, Microsoft Access MVP http://I.Am/DougSteele (no e-mails, please!)

Nov 13 '05 #5
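The suggestion above (pass the text in as a parameter and return a value) could be sketched roughly as follows. This is only an illustration, not code from the thread: the function name CountLongWords and the MinLength parameter are made up here, the default of 7 is an assumption to be set to whatever "long word" definition you use, and the punctuation handling is a guess at what a news article needs. It would have to live in a standard module (not a form module) to be callable from a query.

```vba
' Sketch only: CountLongWords and MinLength are illustrative names,
' not part of the code posted in this thread.
' Possible call from a query (field and table names as given by the poster):
'   SELECT artikelID, CountLongWords([artikel tekst], 7) AS LongWords
'   FROM tbl_anonymiser;
Public Function CountLongWords(ByVal TextToAnalyse As Variant, _
                               Optional ByVal MinLength As Long = 7) As Long
    Dim varWords As Variant
    Dim lngWord As Long
    Dim lngChar As Long
    Dim strText As String
    Dim strSeparators As String

    ' A memo field may be Null; the function then returns 0.
    If IsNull(TextToAnalyse) Then Exit Function
    strText = CStr(TextToAnalyse)

    ' Turn line breaks, tabs and common punctuation into spaces so that
    ' "word," or "word." is not measured with its punctuation attached.
    strSeparators = vbCr & vbLf & vbTab & ".,;:!?()" & Chr(34)
    For lngChar = 1 To Len(strSeparators)
        strText = Replace(strText, Mid(strSeparators, lngChar, 1), " ")
    Next lngChar

    ' Split on spaces; consecutive spaces give empty elements, which are
    ' skipped as long as MinLength is at least 1.
    varWords = Split(strText, " ")
    For lngWord = LBound(varWords) To UBound(varWords)
        If Len(varWords(lngWord)) >= MinLength Then
            CountLongWords = CountLongWords + 1
        End If
    Next lngWord
End Function
```

Accepting a Variant rather than a String is a deliberate choice in this sketch: a query passes Null for empty memo fields, and a String parameter would raise an error on those rows.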

> but since I don't understand how you calculate readability, it's difficult to tell you what that value that's returned should be.

The Danish readability number, Lix, is calculated on the basis of:

Number of words (W)
Number of periods (P)
Number of words with at least 7 letters (L)

Lix = W/P + (L/W x 100)

I.e. Lix = the number of words per sentence (one can argue that since abbreviations are more difficult to read, there is no need to correct for the periods they contain) + the percentage of long words in the text.

A Lix number of less than 24 is children's literature, and a number of 55 or more is difficult academic writing. 35-44 is normal, e.g. a newspaper article.

Since I already have the number of periods and the number of words returned by a quite simple query, my idea was to get the number of long words returned, and then simply do the math in a separate query field. Does this make sense?

The article is stored in a memo field ("artikel tekst") in tbl_anonymiser, along with an autoincrement primary key ("artikelID") and other fields.

Nov 13 '05 #6
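As a quick sanity check of the formula, here is a worked example with made-up numbers (they are not taken from any real article): a text with W = 200 words, P = 10 periods and L = 40 long words.

$$
\text{Lix} = \frac{W}{P} + \frac{L}{W} \times 100 = \frac{200}{10} + \frac{40}{200} \times 100 = 20 + 20 = 40
$$

A result of 40 would fall in the 35-44 band described above as normal newspaper prose.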

My advice would be to write a function that accepts a string as input, and then returns the Lix value. You say you already know the number of words and the number of periods, and I've shown you how to calculate the number of long words. Hopefully that's enough to get you going.

-- Doug Steele, Microsoft Access MVP http://I.Am/DougSteele (no e-mails, please!)

Nov 13 '05 #7
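A function along those lines might look like the sketch below. It is an assumption-laden illustration rather than anything posted in the thread: the name Lix, the MinLength parameter and the choice to return Null for texts with no words or no periods are all made up here. It reuses the two-level Split approach from the earlier aircode and the formula Lix = W/P + (L/W x 100) given in the previous post. Note that it only splits on periods and spaces, so a comma or other punctuation attached to a word still counts toward that word's length.

```vba
' Sketch only: the function name Lix and the MinLength parameter are
' illustrative assumptions. Place it in a standard module to use it in a query.
' Possible call from a query (names as given by the poster):
'   SELECT artikelID, Lix([artikel tekst], 7) AS LixValue FROM tbl_anonymiser;
Public Function Lix(ByVal TextToAnalyse As Variant, _
                    Optional ByVal MinLength As Long = 7) As Variant
    Dim varSentences As Variant
    Dim varWords As Variant
    Dim lngSentence As Long
    Dim lngWord As Long
    Dim lngCountWords As Long
    Dim lngCountLongWords As Long
    Dim lngCountPeriods As Long
    Dim strText As String

    ' Default result: Null when no score can be calculated.
    Lix = Null
    If IsNull(TextToAnalyse) Then Exit Function

    ' Treat line breaks and tabs as ordinary spaces.
    strText = CStr(TextToAnalyse)
    strText = Replace(strText, vbCr, " ")
    strText = Replace(strText, vbLf, " ")
    strText = Replace(strText, vbTab, " ")

    ' P: the number of periods is the length lost when they are removed.
    lngCountPeriods = Len(strText) - Len(Replace(strText, ".", ""))

    ' W and L: split into sentences, then words, as in the earlier aircode.
    varSentences = Split(strText, ".")
    For lngSentence = LBound(varSentences) To UBound(varSentences)
        varWords = Split(varSentences(lngSentence), " ")
        For lngWord = LBound(varWords) To UBound(varWords)
            If Len(varWords(lngWord)) > 0 Then
                lngCountWords = lngCountWords + 1
                If Len(varWords(lngWord)) >= MinLength Then
                    lngCountLongWords = lngCountLongWords + 1
                End If
            End If
        Next lngWord
    Next lngSentence

    ' Avoid division by zero for empty or period-free texts.
    If lngCountWords = 0 Or lngCountPeriods = 0 Then Exit Function

    Lix = lngCountWords / lngCountPeriods _
        + (lngCountLongWords / lngCountWords) * 100
End Function
```

Returning a Variant lets the query show an empty cell for articles that cannot be scored, instead of failing the whole query with a division-by-zero error.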

> You say you already know the number of words and the number of periods, and I've shown you have to calculate the number of long words. Hopefully that's enough to get you going.

It's certainly enough to get me going :-) If I go crashing into a concrete wall, I'll let you know. For now, I'll just say: thanx a million!!!

Sincerely
Andreas

Nov 13 '05 #8
