
to count the number of bigrams

P: 24
Hi guys, I have written a program to count the number of bigrams.
I take an input file using @ARGV[0] and count the number of lines in it. Using the split function I split each sentence into words, pushed them into an array, and counted how many times each word is repeated. That works for single words. In the same way I have paired up adjacent words from the array, and now I want to count the number of occurrences of each pair. Please help me with this. The code is here.


    #!/user/bin/perl
    $file=@ARGV[0];
    undef $/;
    open (FH,"$file");

    @f=<FH>;
    $n=<FH>;
    close (FH);
    print "$n\n";
    $a=@f;
    print "the number of lines in the file is:$a\n";

    foreach $lin(@f)
    {
        chomp($lin);
        push(@b,split (/\s+/,$lin));
    }
    print "@b \n";

    $c=@b;

    print "total numbers is $c\n";

    for $i(0 .. $#b-1)
    {
        $bigram=($b[$i] . " " . $b[$i+1]);
        print "$bigram\n";
    }

Up to this point I am getting the bigrams, but from here on I have to count
how many times each pair appears.

    $count=0;
    while($n =~ /this is/)
    {
    $count += "$n";
    $count++;
    }
    print "to play=$count\n";

Sep 11 '07 #1
13 Replies


numberwhun
Expert Mod 2.5K+
P: 3,503
Humaid,

It is common practice on this site that code posted to the forums be surrounded by code tags. I have added them to this post for you. This is the second time I have done this for you. I ask that after this, you please diligently apply the code tags to the code in your posts.

If you do not know how to use them, please refer to the "REPLY GUIDELINES" to the right of the posting window.

Regards,

Jeff
Sep 11 '07 #2

KevinADC
Expert 2.5K+
P: 4,059
humaid,

Is this school or course work of some kind?
Sep 11 '07 #3

numberwhun
Expert Mod 2.5K+
P: 3,503
Well, considering in his last post that he said he was "told to do" the program, I would say that all the stuff he is doing is school related.

Regards,

Jeff
Sep 11 '07 #4

P: 24
Jeff, first of all, sorry about the code tags; I'll see to it from now on. Next, I am not a student. I work in a research center where I programmed in Java for 3 years, but I have now been moved to a search engine project for a regional language, for which I have to learn Perl. My first posts were just practice; the day I posted my first question was my third day with Perl.

In this project I am trying to train the machine so that it can differentiate domains by recognizing repeated words. Right now I am working on bigrams, then trigrams, and so on.
Sep 12 '07 #5

KevinADC
Expert 2.5K+
P: 4,059
The first thing you need to do is compile all your perl programs using the "strict" and "warnings" pragmas:

    use strict;
    use warnings;

Don't bother to continue learning perl unless you do. These two pragmas will save you so many problems you have no idea. Start using them immediately even if you do not understand what the benefit is. Eventually you will understand.
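
As a rough illustration of what strict buys you (a minimal sketch, not taken from the code above; the variable names are made up): a misspelled variable name is rejected at compile time instead of silently becoming a new global. The string eval is only there so the sketch can capture and print the error rather than die.

```perl
use strict;
use warnings;

# Compile a tiny snippet that contains a typo ($cuont instead of $count).
# With "use strict" in effect the snippet refuses to compile; without it,
# $cuont would quietly spring into existence as a new global variable.
my $ok = eval q{
    my $count = 0;
    $cuont++;    # typo: this variable was never declared
    1;
};
my $error = $ok ? '' : $@;

print "strict caught: $error";
```

Without strict, the same typo would leave $count at 0 and give no hint as to why.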

Don't continue with the bad habit of double-quoting single scalars in perl like you did here:

open (FH,"$file");

This is asking for trouble. Using double-quotes like that can introduce very hard-to-find bugs in your perl code that not even strict and warnings will help you find. It also slows down your perl program because the double-quotes force perl to make an anonymous copy of what's inside the double-quotes. Since the scalar is already defined this is unnecessary.
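
One concrete case where quoting a lone scalar goes wrong, sketched here with made-up variable names: if the scalar holds a reference, the quotes turn it into a plain string such as ARRAY(0x...), and the data behind it can no longer be reached through the copy.

```perl
use strict;
use warnings;

my $words  = [ 'this', 'is' ];    # array reference
my $plain  = $words;              # still a reference to the same array
my $quoted = "$words";            # now only a string like "ARRAY(0x...)"

print "plain:  ", ( ref($plain)  ? "reference" : "plain string" ), "\n";
print "quoted: ", ( ref($quoted) ? "reference" : "plain string" ), "\n";
```

For the open call itself, writing open(FH, $file) without the quotes opens the same file.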

Don't use $a or $b in your perl programs except for sorting with the sort() function. They are special variables that should never be declared with "my" in your perl programs. Just don't use them except for sorting.
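
For reference, this is the one place $a and $b belong: inside a sort block, where perl sets them to the two elements being compared. A small sketch with invented counts (the hash name and its contents are assumptions for illustration only):

```perl
use strict;
use warnings;

# Hypothetical bigram counts, only for illustration.
my %count_of = ( 'to play' => 3, 'is to' => 1, 'play is' => 1 );

# $a and $b are set by sort itself; they are never declared with "my".
# Highest count first, ties broken alphabetically.
my @by_count = sort { $count_of{$b} <=> $count_of{$a} or $a cmp $b } keys %count_of;

print "$_ => $count_of{$_}\n" for @by_count;
```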

I am pretty sure there are ngram modules on CPAN. If you are going to learn perl you need to know about CPAN and how to install modules unless you have tech support that does that for you. You still need to know about CPAN though.

http://search.cpan.org/

I think that's enough for now.
Sep 12 '07 #6

P: 24
Thanks Kevin, I'll remember your words. The problem I am facing is this:
I am getting the bigrams in an array and printing them. Now I want to count how many times a particular bigram occurs.
Should I use a regex to match each bigram against the $n value, or something else?
Sep 12 '07 #7

KevinADC
Expert 2.5K+
P: 4,059
You would use a regexp if you are dealing with patterns. You would use a string operator if you are dealing with strings. If you are counting one set of ngrams a simple scalar will work. If you are counting multiple sets of ngrams a hash would be better.
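
A minimal sketch of those three cases, assuming the words are already in an array (the sample words and all variable names here are made up for illustration):

```perl
use strict;
use warnings;

my @words   = qw(this is a test this is fine);
my @bigrams = map { "$words[$_] $words[$_ + 1]" } 0 .. $#words - 1;

# One known bigram: a string comparison (eq) and a simple scalar counter.
my $target       = 'this is';
my $target_count = grep { $_ eq $target } @bigrams;

# A pattern rather than a fixed string: a regexp.
my $starts_with_this = grep { /^this\s/ } @bigrams;

# Every distinct bigram at once: a hash keyed on the bigram.
my %count_of;
$count_of{$_}++ for @bigrams;

print "'$target' occurs $target_count times\n";
print "$starts_with_this bigrams start with 'this'\n";
print "$_ => $count_of{$_}\n" for sort keys %count_of;
```

The hash does the work of one counter per bigram without having to know the bigrams in advance.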
Sep 12 '07 #8

P: 24
Thanks Kevin, I'll try it out.
Sep 12 '07 #9

P: 24
Thanks Kevin, I tried it out and got the result. Here is the code:

    foreach $w(@c)
    {
        $hash{$w}++;
    }
    foreach $w1(keys(%hash))
    {

    print "$w1 => $hash{$w1}\n";
    }
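
A fuller sketch of the same idea that also runs under strict and warnings, for anyone who wants the whole thing in one piece. This is only one possible arrangement, not the program from the first post: count_bigrams is a hypothetical helper name, and the in-memory sample text stands in for the real input file. It reads one line at a time and leaves $/ alone, and it pairs words across line breaks just as the array version does.

```perl
use strict;
use warnings;

# Hypothetical helper: count adjacent word pairs read from a filehandle.
sub count_bigrams {
    my ($fh) = @_;
    my ( %count_of, $previous );
    while ( my $line = <$fh> ) {
        # split ' ' splits on any whitespace and ignores leading blanks
        for my $word ( split ' ', $line ) {
            $count_of{"$previous $word"}++ if defined $previous;
            $previous = $word;
        }
    }
    return \%count_of;
}

# Sample text in place of a real file; with a real file this would be
# open(my $fh, '<', $file) using the name taken from $ARGV[0].
my $text = "to play is to play\nand to play again\n";
open( my $fh, '<', \$text ) or die "Cannot open in-memory text: $!";
my $count_of = count_bigrams($fh);
close($fh);

print "$_ => $count_of->{$_}\n" for sort keys %$count_of;
```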
Sep 12 '07 #10

P: 24
Hey guys, I have used the code tags, but why does it not look like that?
Sep 12 '07 #11

numberwhun
Expert Mod 2.5K+
P: 3,503
You need to look at the examples given in the "REPLY GUIDELINES" on the right side of your screen when replying.

Just like the QUOTE tags, the CODE tags start with a [ character and end with a ] character. In between you would have something like code=perl. That will do what you want. The end tag (there should always be an end tag, just like with the QUOTE tag) should be /code between the square brackets. If you reply to this message, just look at the tags I have used below:

    some code
    some more code

I have also fixed the post above. :-)

Regards,

Jeff
Sep 12 '07 #12

P: 24
    foreach $w(@c)
    {
        $hash{$w}++;
    }
    foreach $w1(keys(%hash))
    {
    print "$w1 => $hash{$w1}\n";
    }

Is it like this?
Sorry Jeff, I have edited this.
Sep 12 '07 #13

numberwhun
Expert Mod 2.5K+
P: 3,503
Exactly! Thank you for your diligence!

Regards,

Jeff
Sep 12 '07 #14
