to count the number of bigrams

humaid
24
Hi guys, I have written a program to count the number of bigrams.
I take an input file via @ARGV[0] and count the number of lines in the file. Using the split function I split each sentence into words, push them into an array, and count how many times each word is repeated. That part is for single words. Similarly, I have paired every two adjacent words from the array, and now I want to count the number of occurrences of each pair. Please help me with this. The code is here.


    #!/user/bin/perl
    $file=@ARGV[0];
    undef $/;
    open (FH,"$file");

    @f=<FH>;
    $n=<FH>;
    close (FH);
    print "$n\n";
    $a=@f;
    print "the number of lines in the file is:$a\n";

    foreach $lin(@f)
    {
        chomp($lin);
        push(@b,split (/\s+/,$lin));
    }
    print "@b \n";

    $c=@b;

    print "total numbers is $c\n";

    for $i(0 .. $#b-1)
    {
        $bigram=($b[$i] . " " . $b[$i+1]);
        print "$bigram\n";
    }

Up to this point I am getting the bigrams, but from here on I have to count how many times each pair appears.

    $count=0;
    while($n =~ /this is/)
    {
    $count += "$n";
    $count++;
    }
    print "to play=$count\n";

Sep 11 '07 #1
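A note on the first listing: with `undef $/` in effect, `@f=<FH>` returns the whole of a non-empty file as a single element, so the reported line count is 1, and the later `$n=<FH>` finds nothing left to read. Below is a minimal sketch of reading the file once and getting both the list of lines and the full text from it. The helper name `read_lines` is hypothetical and not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of a file with newlines removed.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @lines = <$fh>;    # default $/ in list context: one element per line
    close($fh);
    chomp(@lines);
    return @lines;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my @lines = read_lines($ARGV[0]);
    my $text  = join(' ', @lines);    # the whole text as one string, if needed
    print "the number of lines in the file is: ", scalar(@lines), "\n";
}
```

Leaving `$/` at its default keeps the line count meaningful; the joined string can stand in for the slurped `$n` if the full text is needed later.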
numberwhun
3,509 Expert Mod 2GB
Humaid,

It is common practice on this site that when you post code to the forums, code tags should surround your code. I have added them to this post for you. This is the second time I have done this for you. I ask that after this, you please diligently apply the code tags to the code in your posts.

If you do not know how to use them, please refer to the "REPLY GUIDELINES" to the right of the posting window.

Regards,

Jeff
Sep 11 '07 #2
KevinADC
4,059 Expert 2GB
humaid,

Is this school or course work of some kind?
Sep 11 '07 #3
numberwhun
3,509 Expert Mod 2GB
Well, considering in his last post that he said he was "told to do" the program, I would say that all the stuff he is doing is school related.

Regards,

Jeff
Sep 11 '07 #4
humaid
24
Jeff, first of all, sorry about the code tags; from now on I'll see to it. Next, I am not a student. I am working in a research center where I programmed in Java for 3 years, but I have now been moved to a search engine project for a regional language, for which I have to learn Perl. My first posts were just practice; the day I posted my first question was my third day in Perl.

Here I am trying to train the machine so that it can differentiate the domains by recognizing repeated words. Right now I am working on bigrams, then trigrams, and so on.
Sep 12 '07 #5
KevinADC
4,059 Expert 2GB
First thing you need to do is compile all your perl programs using the "strict" and "warnings" pragmas;

    use strict;
    use warnings;

Don't bother to continue learning perl unless you do. These two pragmas will save you so many problems you have no idea. Start using them immediately even if you do not understand what the benefit is. Eventually you will understand.
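As a small illustration of the benefit (a sketch, not part of the original reply): under `strict`, a misspelled variable name is a compile-time error instead of a silently created global. The variable names here are made up for the example, and a string `eval` is used only so the script can report the error rather than stop.

```perl
use strict;
use warnings;

my $count = 0;
$count++ for 1 .. 3;

# $cuont is a deliberate typo for $count. strict refuses to compile it,
# so eval returns undef and the error message lands in $@.
my $ok = eval q{ $cuont = 1; 1 };

print "count is $count\n";
print "strict rejected the typo: $@" unless $ok;
```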

Don't continue with the bad habit of double-quoting single scalars in perl like you did here:

open (FH,"$file");

This is asking for trouble. Using double-quotes like that can introduce very hard-to-find bugs in your perl code that not even strict and warnings will help you find. It also slows down your perl program, because the double-quotes force perl to make an anonymous copy of what's inside them. Since the scalar is already defined, this is unnecessary.
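One well-known case of such a bug, sketched here with made-up variable names: quoting a scalar that holds a reference turns it into a plain string, and neither strict nor warnings complains at the point where it happens.

```perl
use strict;
use warnings;

my @words = ('this', 'is');
my $ref   = \@words;

my $copy   = $ref;      # still an array reference
my $quoted = "$ref";    # now a plain string that looks like ARRAY(0x...)

print "unquoted: ", (ref($copy)   ? "reference" : "string"), "\n";
print "quoted:   ", (ref($quoted) ? "reference" : "string"), "\n";
```

For a file name the quoting is merely redundant, so `open(FH, $file)` behaves the same as the quoted form; the habit becomes costly once the scalar holds something other than a string.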

Don't use $a or $b in your perl programs except for sorting with the sort() function. They are special variables that should never be declared with "my" in your perl programs. Just don't use them except for sorting.
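For reference, here is a short sketch of the one place `$a` and `$b` belong, inside a `sort` block, where perl sets them for each comparison. The sample counts are invented for the example.

```perl
use strict;
use warnings;

my %count = ('this is' => 3, 'is a' => 1, 'a test' => 2);

# $a and $b are supplied by sort(); they are not declared with "my".
# Highest count first, ties broken alphabetically.
my @by_freq = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

print "$_ => $count{$_}\n" for @by_freq;
```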

I am pretty sure there are ngram modules on CPAN. If you are going to learn perl you need to know about CPAN and how to install modules unless you have tech support that does that for you. You still need to know about CPAN though.

http://search.cpan.org/

I think that's enough for now.
Sep 12 '07 #6
humaid
24
Thanks Kevin, I'll remember your words. The problem I am facing is this: I am getting the bigrams in an array and printing them; now I want to count how many times a particular bigram occurs.
Should I use a regex to match each bigram against the $n value, or something else?
Sep 12 '07 #7
KevinADC
4,059 Expert 2GB
You would use a regexp if you are dealing with patterns. You would use a string operator if you are dealing with strings. If you are counting one set of ngrams a simple scalar will work. If you are counting multiple sets of ngrams a hash would be better.
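A minimal sketch of the single-bigram case, assuming the words are already in an array (the sample sentence and variable names are invented): a scalar counter with the string operator `eq`, followed by the same count done with a pattern. Note that a match without `/g` in a `while` condition starts over from the beginning of the string on each pass, so it cannot step through successive occurrences.

```perl
use strict;
use warnings;

my @words = split ' ', 'this is a test and this is fine';

# One fixed bigram: a plain scalar counter and the string operator eq.
my $target = 'this is';
my $count  = 0;
for my $i (0 .. $#words - 1) {
    $count++ if "$words[$i] $words[$i+1]" eq $target;
}
print "$target = $count\n";

# The same count with a pattern: /g in list context returns every match,
# and assigning that list to a scalar through () yields its length.
my $text    = join ' ', @words;
my $matches = () = $text =~ /\bthis is\b/g;
print "regex count = $matches\n";
```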
Sep 12 '07 #8
humaid
24
Thanks Kevin, I'll try it out.
Sep 12 '07 #9
humaid
24
Thanks Kevin, I tried it out and got the result. Here is the code:

    foreach $w(@c)
    {
        $hash{$w}++;
    }
    foreach $w1(keys(%hash))
    {
        print "$w1 => $hash{$w1}\n";
    }

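The loop above reads from `@c`, which is presumably the array of bigrams built earlier (an assumption; the thread does not show that step). Below is a self-contained sketch that builds and counts the pairs in one pass under strict and warnings. The helper name `count_bigrams` and the sample sentence are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: counts each pair of adjacent words in a word list.
sub count_bigrams {
    my @words = @_;
    my %count;
    for my $i (0 .. $#words - 1) {
        $count{"$words[$i] $words[$i+1]"}++;
    }
    return %count;
}

my @words = split ' ', 'to play or not to play';
my %count = count_bigrams(@words);

# Sorted keys give a stable listing; hash order is otherwise arbitrary.
for my $bigram (sort keys %count) {
    print "$bigram => $count{$bigram}\n";
}
```

Extending the key to three words (`"$words[$i] $words[$i+1] $words[$i+2]"` with the loop ending at `$#words - 2`) would give trigram counts in the same way.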
Sep 12 '07 #10
humaid
24
Hey guys, I have used the code tags, but why does it not come out looking right?
Sep 12 '07 #11
numberwhun
3,509 Expert Mod 2GB
You need to look at the examples given in the "REPLY GUIDELINES" on the right side of your screen when replying.

Just like the QUOTE tags, the CODE tags start with a [ character and they end with a ] character. In between you would have something like code=perl. That will do what you want. The end tag (there should always be an end tag, just like with the QUOTE tag) should be /code between the square brackets. If you reply to this message, just look at the tags I have used below:

    some code
    some more code

I have also fixed the post above. :-)

Regards,

Jeff
Sep 12 '07 #12
humaid
24
    foreach $w(@c)
    {
        $hash{$w}++;
    }
    foreach $w1(keys(%hash))
    {
        print "$w1 => $hash{$w1}\n";
    }

Is it like this?
Sorry Jeff, I have edited this.
Sep 12 '07 #13
numberwhun
3,509 Expert Mod 2GB
Exactly! Thank you for your diligence!

Regards,

Jeff
Sep 12 '07 #14
