Analysing text files with Perl

Analysing text files to obtain statistics on their content

You are to write a Perl program that analyses text files to obtain statistics on their content. The program should operate as follows:

1) When run, the program should check if an argument has been provided. If not, the program should prompt for, and accept input of, a filename from the keyboard.

2) The filename, either passed as an argument or input from the keyboard, should be checked to ensure it is in MS-DOS format. The filename part should be no longer than 8 characters and must begin with a letter or underscore character followed by up to 7 letters, digits or underscore characters. The file extension should be optional, but if given it should be ".TXT" (upper- or lowercase).

If no extension is given, ".TXT" should be added to the end of the filename. So, for example, if "testfile" is input as the filename, this should become "testfile.TXT". If "input.txt" is entered, this should remain unchanged.

3) If the filename provided is not of the correct format, the program should display a suitable error message and end at this point.

4) The program should then check to see if the file exists using the filename provided. If the file does not exist, a suitable error message should be displayed and the program should end at this point.

5) Next, if the file exists but the file is empty, again a suitable error message should be displayed and the program should end.

6) The file should be read and checked to display crude statistics on the number of characters, words, lines, sentences and paragraphs that are within the file.



Here is the code I have done so far and it doesn't seem to work. Can anybody see why??

    #usr/bin/perl

    use strict;
    use warnings;

    if ($#ARGV == -1) #no filename provided as a command line argument.
    print("Please enter a filename: ");
    $filename = <STDIN>;
    chomp($filename);
    else #got a filename as an argument.
    $filename = $ARGV[0];

    #perform the specified checks
    #check if filename is valid, exit if not
    if ($filename !~ m^/[a-z]{1,7}\.TXT$/i)
    die("File format not valid\n");)

    if ($filename !~ m/\.TXT$/i)
    $filename .= ".TXT";

    #check if filename is actual file, exit if it is.
    if (-e $filename)
    die("File does not exist\n");

    #check if filename is empty, exit if it is.
    if (-s $filename)
    die("File is empty\n");

    my $i = 0;
    my $p = 1;
    my $words = 0;
    my $chars = 0;

    open(READFILE, "<$data1.txt") or die "Can't open file '$filename: $!";

    #then use a while loop and series of if statements similar to the following
    while (<READFILE>) {
    chomp;    #removes the input record Separator
    $i = $.;    #"$". is the input record line numbers, $i++ will also work
    $p++ if (m/^$/);   #count paragraphs
    split (/\s+/);    #split sentences into "words"
    $words++     #count all characters except spaces and add to $chars
    $chars += tr/ //c;     #tr/ //c replaces everything in the string with itself, except spaces, and returns the number of such characters replaced


    #display results
    print "There are $i lines in $data1\n";
    print "There are $p Paragraphs in $data1\n";
    print "There are $words in $data1\n";
    print "There are $chars in $data1\n";

    close(READFILE);

Jun 25 '08 #1
numberwhun
3,509 Expert Mod 2GB
First off, the first line in your program is incorrect. You have:

    #usr/bin/perl

when what you should have is:

    #!/usr/bin/perl

The first two characters are "#" and "!". Together they make up the "shebang" (hashbang). You also need the "/" at the beginning of the path.

Another issue that I see is:

    split (/\s+/);    #split sentences into "words"

This will not do anything useful, and with warnings enabled Perl will complain about it. You are doing a split, but into what? You need to assign the result to an array, like so:

    my @words;

    @words = split (/\s+/);

If you don't have something to put the "words" into, the result of the split is simply thrown away.
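
To make the point about keeping the result of split concrete, here is a minimal sketch. The helper name count_words and the sample line are made up for illustration and are not from the original post. It uses split ' ' rather than split /\s+/ so that leading whitespace does not produce an empty first "word".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the number of
# whitespace-separated words in one line of text.
sub count_words {
    my ($line) = @_;

    # split ' ' is the special "awk" form: it splits on runs of whitespace
    # and skips leading whitespace, so no empty first field is produced.
    my @words = split ' ', $line;

    return scalar @words;    # an array in scalar context gives its length
}

# Made-up sample line with leading and trailing spaces.
my $sample = "  the quick brown fox ";
print "Words: ", count_words($sample), "\n";
```

Inside the while loop of the script, the same idea would be applied to $_ for each line and the count added to a running total.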


Other than that I have two notes. First, please use code tags any time you include code in your posting. You can see them in your original posting if you edit it. They are required, not optional, in the forums.

Second, when you say you are "getting an error" or "the code produces an error", it is customary to include the error(s) that you are seeing, as we cannot see them otherwise.

Regards,

Jeff
Jun 25 '08 #2
I am very new to Perl and have managed to put this code together using examples from various books. Could anyone look over this code and see how it could be improved?

    #!/usr/bin/perl

    use strict;
    use warnings;

    if ($#ARGV == -1) #no filename provided as a command line argument.
    print("Please enter a filename: ");
    $filename = <STDIN>;
    chomp($filename);
    else #got a filename as an argument.
    $filename = $ARGV[0];

    #perform the specified checks
    #check if filename is valid, exit if not
    if ($filename !~ m^/[a-z]{1,7}\.TXT$/i)
    die("File format not valid\n");)

    if ($filename !~ m/\.TXT$/i)
    $filename .= ".TXT";

    #check if filename is actual file, exit if it is.
    if (-e $filename)
    die("File does not exist\n");

    #check if filename is empty, exit if it is.
    if (-s $filename)
    die("File is empty\n");

    my $i = 0;
    my $p = 1;
    my $words = 0;
    my $chars = 0;

    open(READFILE, "<$data1.txt") or die "Can't open file '$filename: $!";

    #then use a while loop and series of if statements similar to the following
    while (<READFILE>) {
    chomp; #removes the input record Separator
    $i = $.; #"$". is the input record line numbers, $i++ will also work
    $p++ if (m/^$/); #count paragraphs
    $my @t = split (/\s+/); #split sentences into "words"
    $words += @t; #add count to $words
    $chars += tr/ //c; #tr/ //c count all characters except spaces and add to $chars


    #display results
    print "There are $i lines in $data1\n";
    print "There are $p Paragraphs in $data1\n";
    print "There are $words in $data1\n";
    print "There are $chars in $data1\n";

    close(READFILE);

Jun 25 '08 #3
numberwhun
3,509 Expert Mod 2GB
First, I distinctly remember asking you to use code tags when posting code in the forums. I even mentioned that it was not an option, but a requirement, that they be used. I am no longer asking; this is your only warning. PLEASE use code tags when posting code in the forums!

Second, DO NOT start a new thread on the exact same topic you previously posted about. Simply reply to your post and post your additions. I have merged your two threads accordingly.

Please be sure to read the Guidelines that are posted at the top of this forum, as they will tell you the proper way to post in the forums.

As for your issue, it looks as though this is school work, especially since it is formatted as a homework question. It is against this site's guidelines to post your homework here in hopes of getting us to do it for you. Other than the issue you had before, you did not mention any errors this time, so we don't have anything to fix. I believe that optimizing this code is probably part of your assignment (especially since you copied it out of books). You should learn the basics of Perl, examine the code, and see if you can first find any ways to optimize it yourself.

Please heed the warning(s) I have provided above as well.

Regards,

Jeff
Jun 25 '08 #4
nithinpes
410 Expert 256MB
There are many errors in your code. In the line:

    open(READFILE, "<$data1.txt") or die "Can't open file '$filename: $!";

where are you getting $data1 from? You have assigned the filename to $filename.

The following line is not correct. There should be no "$" in front of "my":

    $my @t = split (/\s+/);

The check of the file format should come after the statement that appends .TXT. Also, there are logical errors in the tests for the existence and size of the file: -e and -s are true when the file exists and is not empty, so you should use 'unless' instead of 'if'.
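
As a rough sketch of the ordering described above (append the extension first, then validate, then test the file), the checks could be wrapped in two small helpers. The names normalise_filename and file_problem are hypothetical and chosen only for this illustration. The pattern is one reading of the assignment's rule: a letter or underscore, then up to 7 letters, digits or underscores, then ".TXT" in either case.

```perl
use strict;
use warnings;

# Hypothetical helper: append .TXT when no extension was typed, then check
# the whole name against the assignment's rule. Returns the normalised
# name, or undef when the name is not valid.
sub normalise_filename {
    my ($name) = @_;
    $name .= '.TXT' unless $name =~ /\./;    # no extension given
    return $name =~ /\A[a-z_][a-z0-9_]{0,7}\.txt\z/i ? $name : undef;
}

# Hypothetical helper: returns an error message, or nothing (undef in
# scalar context) when the file exists and is not empty. Note the sense of
# the tests: -e is true if the file exists, -s is true if its size is
# non-zero, hence 'unless'.
sub file_problem {
    my ($name) = @_;
    return "File does not exist" unless -e $name;
    return "File is empty"       unless -s $name;
    return;
}

# Made-up inputs to show which names pass the format check.
for my $input (qw(testfile input.txt 9lives toolongname report.doc)) {
    my $name = normalise_filename($input);
    print "$input => ", (defined $name ? $name : 'not valid'), "\n";
}
```

Keeping the checks in functions that return a value, instead of calling die directly, makes each rule easy to try on sample names before wiring it into the script.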


Modify the script as below:
    #!/usr/bin/perl

    use strict;
    use warnings;

    my $filename;
    my $i     = 0;    # lines
    my $p     = 1;    # paragraphs (crude: one more than the number of blank lines)
    my $words = 0;
    my $chars = 0;

    # Perl requires braces around the body of every if/else/unless block
    if (@ARGV == 0) {    # no filename provided as a command line argument
        print "Please enter a filename: ";
        $filename = <STDIN>;
        die "No filename given\n" unless defined $filename;
        chomp $filename;
    }
    else {               # got a filename as an argument
        $filename = $ARGV[0];
    }

    # add the default extension first, then validate the complete name
    if ($filename !~ m/\.TXT\z/i) {
        $filename .= ".TXT";
    }

    # check if filename is valid, exit if not:
    # a letter or underscore, then up to 7 letters, digits or underscores
    if ($filename !~ m/\A[a-z_][a-z0-9_]{0,7}\.TXT\z/i) {
        die "File format not valid\n";
    }

    # check that the file exists, exit if it does not
    unless (-e $filename) {
        die "File does not exist\n";
    }

    # check that the file is not empty, exit if it is
    unless (-s $filename) {
        die "File is empty\n";
    }

    open(my $fh, '<', $filename) or die "Can't open file $filename: $!\n";

    while (<$fh>) {
        chomp;                # removes the input record separator
        $i = $.;              # $. is the current input line number
        $p++ if m/^$/;        # count paragraphs (blank lines)
        my @t = split ' ';    # split the line into "words"
        $words += @t;         # add count to $words
        $chars += tr/ //c;    # tr/ //c counts every character except spaces
    }

    close($fh);

    # display results
    print "There are $i lines in $filename\n";
    print "There are $p paragraphs in $filename\n";
    print "There are $words words in $filename\n";
    print "There are $chars characters in $filename\n";

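
Item 6 of the assignment also asks for a sentence count, which the script in this post does not produce. A rough sketch of how all five statistics might be gathered from a whole text held in one string follows. The helper name text_stats, the sample text, and the counting rules are assumptions made for illustration: a sentence is taken to be a run of ".", "!" or "?" followed by whitespace or the end of the text, and a paragraph is a block of text separated from the next by at least one blank line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the thread): crude statistics for a
# whole text held in one string. Returns a hash of counts.
sub text_stats {
    my ($text) = @_;
    my %stats;

    # tr with the /c flag and an empty replacement only counts the
    # characters NOT listed; here that is every non-whitespace character.
    $stats{chars} = ($text =~ tr/ \t\r\n//c);

    # "() =" forces list context so the number of matches is counted.
    $stats{words}     = () = $text =~ /\S+/g;
    $stats{sentences} = () = $text =~ /[.!?]+(?=\s|\z)/g;

    # Count newlines, plus one if the last line has no trailing newline.
    $stats{lines} = ($text =~ tr/\n//);
    $stats{lines}++ if length $text && $text !~ /\n\z/;

    # Blocks separated by blank lines; grep drops whitespace-only blocks.
    $stats{paragraphs} = grep { /\S/ } split /\n\s*\n/, $text;

    return %stats;
}

# Made-up sample text: two paragraphs, four sentences.
my $sample = "Hello world. How are you?\nFine!\n\nSecond paragraph here.\n";
my %s = text_stats($sample);
print "$_: $s{$_}\n" for sort keys %s;
```

Unlike the line-by-line loop, this version reads the whole file into memory first, which is fine for small text files but is a design choice to reconsider for very large ones. Abbreviations such as "e.g." will be over-counted as sentences, which is in keeping with the "crude statistics" the assignment asks for.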
Jun 25 '08 #5
KevinADC
4,059 Expert 2GB
Quote (numberwhun):

    Second, DO NOT start a new thread on the same exact topic as you previously posted. Simply reply to your post and post your additions. I have merged your two threads accordingly.

    Regards,

    Jeff

He did the same on the perlguru forum. Obnoxious.
Jun 25 '08 #6
