to count the number of bigrams

humaid
24 New Member
Hi guys, I have written a program to count the number of bigrams.
I take the input file name from $ARGV[0] and count the number of lines in the file. Using the split function, I split each sentence into words and push them onto an array, then count how many times each word is repeated. That works for single words. Similarly, I have paired up consecutive words from the array, and now I want to count the number of occurrences of each pair. Please help me with this. The code is here.


    #!/usr/bin/perl
    # Take the input file name from the first command-line argument.
    $file=$ARGV[0];
    open (FH,"$file") or die "cannot open $file: $!";

    # Read in list context with the default $/: one array element per line.
    @f=<FH>;
    close (FH);
    # Keep the whole text in one string as well.
    $n=join("",@f);
    print "$n\n";
    $a=@f;
    print "the number of lines in the file is:$a\n";

    # Split each line into words and collect them in @b.
    # split(' ') also skips leading whitespace, so no empty words appear.
    foreach $lin(@f)
    {
        chomp($lin);
        push(@b,split (' ',$lin));
    }
    print "@b \n";

    $c=@b;

    print "total numbers is $c\n";

    # Join each word with the one after it to form a bigram.
    for $i(0 .. $#b-1)
    {
        $bigram=($b[$i] . " " . $b[$i+1]);
        print "$bigram\n";
    }

Up to this point I am getting the bigrams, but from here on I have to work out
how many times each pair appears.

    # Count how many times the bigram "this is" occurs in the text $n.
    # The /g flag resumes each match after the previous one, so the loop ends.
    $count=0;
    while($n =~ /\bthis\s+is\b/g)
    {
        $count++;
    }
    print "this is=$count\n";

Sep 11 '07 #1
numberwhun
3,509 Recognized Expert Moderator Specialist
Humaid,

It is common practice on this site to surround any code you post to the forums with code tags. I have added them to this post for you. This is the second time I have done this for you. I ask that from now on you diligently apply the code tags to the code in your posts.

If you do not know how to use them, please refer to the "REPLY GUIDELINES" to the right of the posting window.

Regards,

Jeff
Sep 11 '07 #2
KevinADC
4,059 Recognized Expert Specialist
humaid,

Is this school or course work of some kind?
Sep 11 '07 #3
numberwhun
3,509 Recognized Expert Moderator Specialist
Well, considering that he said in his last post that he was "told to do" the program, I would say that all the stuff he is doing is school related.

Regards,

Jeff
Sep 11 '07 #4
humaid
24 New Member
Jeff, first of all, sorry about the code tags; from now on I'll see to it. Next, I am not a student. I work in a research center, where I programmed in Java for 3 years, but I have now been moved to a search engine project for a regional language, for which I have to learn Perl. My first posts were just practice; the day I posted my first question was my third day with Perl.

In this project I am trying to train the machine so that it can differentiate the domains by recognizing repeated words. Right now I am trying bigrams, then trigrams, and so on.
Sep 12 '07 #5
KevinADC
4,059 Recognized Expert Specialist
The first thing you need to do is compile all your perl programs using the "strict" and "warnings" pragmas:

    use strict;
    use warnings;
Don't bother to continue learning perl unless you do. These two pragmas will save you so many problems you have no idea. Start using them immediately even if you do not understand what the benefit is. Eventually you will understand.
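
As a rough illustration of what strict buys you, the sketch below compiles the same misspelled assignment twice with a string eval, once with strict switched off and once with it on. The variable names $total and $totl are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical snippet with a typo: $totl instead of $total.
my $snippet = 'my $total = 0; $totl = $total + 1; 1;';

# Without strict, the typo silently creates a new global variable.
my $ok_without = eval "no strict; no warnings; $snippet";

# With strict, the same code refuses to compile and $@ explains why.
my $ok_with = eval "use strict; $snippet";
my $error   = $@;

print "without strict: ", ($ok_without ? "compiled" : "failed"), "\n";
print "with strict:    ", ($ok_with    ? "compiled" : "failed"), "\n";
print "strict said:    $error";
```

In a real program the second case shows up as a compile-time error before anything runs, which is much cheaper than hunting for a counter that never changes.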

Don't continue with the bad habit of double-quoting single scalars in perl like you did here:

open (FH,"$file");

This is asking for trouble. Using double-quotes like that can introduce very hard-to-find bugs in your perl code that not even strict and warnings will help you find. It also slows down your perl program because the double-quotes force perl to make an anonymous copy of what's inside the double-quotes. Since the scalar is already defined this is unnecessary.
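
One concrete case where the quotes change behavior, offered as a small sketch rather than a full list of pitfalls: when the scalar holds a reference, quoting it turns the reference into a plain string. With a file name the quotes are merely unnecessary, but the habit carries over. The variable names here are made up.

```perl
use strict;
use warnings;

# A scalar that holds a reference to an array of words.
my $words = [ 'this', 'is', 'a', 'test' ];

my $plain  = $words;     # still a reference
my $quoted = "$words";   # stringified, looks like ARRAY(0x...)

print "plain is a ",  (ref($plain)  || 'plain string'), "\n";
print "quoted is a ", (ref($quoted) || 'plain string'), "\n";

# The reference can still reach the data; the quoted copy cannot.
print "words via plain: ", scalar(@$plain), "\n";
```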

Don't use $a or $b in your perl programs except for sorting with the sort() function. They are special variables that should never be declared with "my" in your perl programs. Just don't use them except for sorting.
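
For reference, this is the kind of use $a and $b are meant for. The sketch assumes a made-up hash of bigram counts and sorts its keys from most to least frequent.

```perl
use strict;
use warnings;

# Hypothetical bigram counts, just for the example.
my %count = ( 'this is' => 3, 'is a' => 1, 'a test' => 2 );

# sort() fills in the package variables $a and $b for each comparison,
# so they are used here without "my".
my @by_count = sort { $count{$b} <=> $count{$a} } keys %count;

print "$_ => $count{$_}\n" for @by_count;
```

Note that even under strict, $a and $b need no declaration, which is exactly why reusing them for ordinary data invites confusion.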

I am pretty sure there are ngram modules on CPAN. If you are going to learn perl you need to know about CPAN and how to install modules unless you have tech support that does that for you. You still need to know about CPAN though.

http://search.cpan.org/

I think that's enough for now.
Sep 12 '07 #6
humaid
24 New Member
Thanks Kevin, I'll remember your words. The problem I am facing is this:
I am getting the bigrams in an array and printing them; now I want to count how many times a particular bigram occurs.
Should I use a regex to match each bigram against the $n value, or something else?
Sep 12 '07 #7
KevinADC
4,059 Recognized Expert Specialist
You would use a regexp if you are dealing with patterns. You would use a string operator if you are dealing with strings. If you are counting one set of ngrams a simple scalar will work. If you are counting multiple sets of ngrams a hash would be better.
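
A minimal sketch of both options, assuming the words are already in an array (the sample sentence and variable names are made up): a scalar counter with the string operator eq for one particular bigram, and a hash keyed by the bigram when every bigram has to be counted.

```perl
use strict;
use warnings;

# Sample text, made up for the example.
my @words = split ' ', 'this is a test and this is fun';

# Build the bigrams by pairing each word with the next one.
my @bigrams = map { $words[$_] . ' ' . $words[$_ + 1] } 0 .. $#words - 1;

# One particular bigram: a string comparison and a scalar counter.
my $this_is = 0;
for my $bigram (@bigrams) {
    $this_is++ if $bigram eq 'this is';
}
print "this is => $this_is\n";

# Every bigram at once: a hash keyed by the bigram.
my %count;
$count{$_}++ for @bigrams;
print "$_ => $count{$_}\n" for sort keys %count;
```

The hash version makes a single pass over the data no matter how many distinct bigrams there are, which is why it scales better than one counter per bigram.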
Sep 12 '07 #8
humaid
24 New Member
Thanks Kevin, I'll try it out.
Sep 12 '07 #9
humaid
24 New Member
Thanks Kevin, I tried it out and got the result. Here is the code:

    # @c is assumed to hold the bigrams; use each one as a hash key and count it.
    foreach $w(@c)
    {
        $hash{$w}++;
    }
    # Print every distinct bigram with its count.
    foreach $w1(keys(%hash))
    {
        print "$w1 => $hash{$w1}\n";
    }

Sep 12 '07 #10
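
Since the thread mentions moving on from bigrams to trigrams, here is one way the same hash counting could be generalized. This is only a sketch: count_ngrams is a hypothetical helper and the sample sentence is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: count every run of $n consecutive words.
sub count_ngrams {
    my ($n, @words) = @_;
    my %count;
    # The range is empty when there are fewer than $n words.
    for my $i (0 .. $#words - $n + 1) {
        my $ngram = join ' ', @words[$i .. $i + $n - 1];
        $count{$ngram}++;
    }
    return %count;
}

my @words = split ' ', 'to be or not to be or not';

my %bigrams  = count_ngrams(2, @words);
my %trigrams = count_ngrams(3, @words);

# Most frequent first, ties in alphabetical order.
for my $ngram (sort { $trigrams{$b} <=> $trigrams{$a} || $a cmp $b } keys %trigrams) {
    print "$ngram => $trigrams{$ngram}\n";
}
```

Passing 2 reproduces the bigram counts, so the same routine can serve both steps.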
