Q: Analyse data and provide a report - Arrays?

Hi,

I need to write a script which reads some data and reports the findings.
Just to give you an idea the structure is similar to the following.

Data input example:

HEADING 1
**********
ColumnA ColumnB ColumnC ColumnD ColumnE
Pete Male Marketing Single 40
Kate Female Marketing Married 30
John Male Sales Married 38
Pete Male Sales Single 52
John Male Sales Single 24
HEADING 2
**********
ColumnF ColumnG ColumnH ColumnI
whatever
whatever
whatever
whatever
Report Output example:
# of Pete's =
# of Males =
# of Salespeople =
# of Singles =
# of over 35s =
Since this is the first time I'm even writing such a script I would
appreciate some pointers.
1) Do I use arrays or associative arrays for this? Why or why not?
2) Is it possible for someone to give me a code example of counting how many
Singles we have?
3) What happens when I have read all the data under HEADING 1 and need to
move onto HEADING 2?
That is, how do I accomplish the jump from what I think is one loop onto the
next?

I imagine that there will be many more posts following this one so there's
no need to get into too much detail. Some guidance would be nice as I will
need to utilise Google and my references for the rest.

Thanks in advance.

Jul 19 '05 #1
Ga Mu,
Great stuff - thanks very much. :)

The headings differentiate blocks of data so once we count everything under
HEADING 1 we move onto HEADING 2 then HEADING 3 etc.

Does this help a bit?


"Ga Mu" <Ng******@SPcomcast.netAM> wrote in message
news:xl44b.306860$Ho3.43264@sccrnsc03...
Troll wrote:
1) Do I use arrays or associate arrays for this? Why or why not?


Use hashes (aka associative arrays) because they work so well for
counting occurrences of words. A hash element that does not exist yet is
treated as zero the first time you increment it, so, assuming you have
already declared the hash %names ('my %names;') and we are in your
parsing loop and have extracted the person's name into $name, all you
need is:

$names{$name}++ ; # increment the count for this name.
2) Is it possible for someone to give me a code example of counting how many Singles we have?


You could count everything with hashes.

Prior to your parsing loop:

my (%names, %sexes, %depts, %m_statuses, %ages);

Within your parsing loop:

# extract four words and a number into scalars:
my ($name, $sex, $dept, $m_status, $age) =
/^(\w+) (\w+) (\w+) (\w+) (\d+)$/;

# increment counts for each:
$names{$name}++;
$sexes{$sex}++;
$depts{$dept}++;
$m_statuses{$m_status}++;
$ages{$age}++;

After your parsing loop:

$names{'Pete'} gives the number of Petes.
$sexes{'Male'} gives the number of Males.
$depts{'Sales'} gives the number of sales people.
$m_statuses{'Single'} gives the number of single people.
$ages{'25'} gives the number of 25 year-olds.

To print a list of all names and the number of occurrences of each:

foreach my $key (keys %names) {
    print "$key: $names{$key}\n";
}

With the sample data above, this will output something like:

John: 2
Pete: 2
Kate: 1

The order is arbitrary; the list could be sorted by either name or count.
Do a 'perldoc -f keys' and a 'perldoc -f sort'.
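
For what it's worth, the pieces above can be put together into one small script. This is only a sketch under a few assumptions: the rows are the HEADING 1 sample from the original post, hard-coded in an array rather than read from a file, and @by_count is a name made up here for illustration.

```perl
use strict;
use warnings;

# Sample rows copied from the HEADING 1 block in the original post
# (hard-coded here; a real script would read them from a file).
my @rows = (
    'Pete Male Marketing Single 40',
    'Kate Female Marketing Married 30',
    'John Male Sales Married 38',
    'Pete Male Sales Single 52',
    'John Male Sales Single 24',
);

my (%names, %sexes, %depts, %m_statuses, %ages);

for my $row (@rows) {
    # Skip any row that is not four words followed by a number.
    my ($name, $sex, $dept, $m_status, $age) =
        $row =~ /^(\w+) (\w+) (\w+) (\w+) (\d+)$/ or next;

    $names{$name}++;
    $sexes{$sex}++;
    $depts{$dept}++;
    $m_statuses{$m_status}++;
    $ages{$age}++;
}

# Sorted alphabetically by name.
print "By name:\n";
for my $n (sort keys %names) {
    print "  $n: $names{$n}\n";
}

# Sorted by count, highest first; ties fall back to name order.
# (@by_count is a hypothetical name.)
my @by_count = sort { $names{$b} <=> $names{$a} or $a cmp $b } keys %names;
print "By count:\n";
for my $n (@by_count) {
    print "  $n: $names{$n}\n";
}

print "# of Singles = $m_statuses{Single}\n";
```

Sorting the keys, rather than the hash itself, is the usual approach because a hash has no inherent order.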
3) What happens when I have read all the data under HEADING 1 and need to move onto HEADING 2?
That is, how do I accomplish the jump from what I think is one loop onto the next?


Can't answer that, as you don't provide enough detail. What is the
significance of the headings? Would the results be the same if the
headings were completely ignored or do the headings signify some
distinction between blocks of data?

Greg

Jul 19 '05 #2
Ga Mu,

Pls disregard last post.

With regard to the jump between HEADINGS, will it be enough to do something
like:
while (<>)
....
if (/HEADING 1/ .. /HEADING 2/) {
# line falls between HEADING 1 and HEADING 2 in the text, inclusive.
# then do the string extraction
# then increment stuff
elsif (/HEADING 2/ .. /HEADING 3/) {
# line falls between HEADING 2 and HEADING 3 in the text, inclusive.
# then do the string extraction
# then increment stuff
etc?

I quite like the code example you provided - actually found a similar one in
http://www.oreilly.com/catalog/perlw...pter/ch08.html
Up until now I was under the impression that I would have to use split - can
you elaborate why you chose a different approach?

One other task I have to do is similar to:
If a line contains Single in the column then get the single person's name. I sort of came up with:
foreach $m_statuses{'Single'}
print $names{$name}

but that's probably totally wrong. Can you advise?

Thanks again.

Jul 19 '05 #3
Troll wrote:
Ga Mu,

Pls disregard last post.

With regard to the jump between HEADINGS, will it be enough to do something
like:
while (<>)
...
if (/HEADING 1/ .. /HEADING 2/) {
# line falls between HEADING 1 and HEADING 2 in the text, inclusive.
# then do the string extraction
# then increment stuff
elsif (/HEADING 2/ .. /HEADING 3/) {
# line falls between HEADING 2 and HEADING 3 in the text, inclusive.
# then do the string extraction
# then increment stuff
etc?
I am unclear as to the distinction between blocks. Is there a separate
group of totals for each heading or is everything totalled up together?
If the latter, then simply ignore the headings. If the former, then you
could parse out the heading name and use a multidimensional hash. I.e.,
replace this:

$names{$name}++;

with this:

$names{$heading}{$name}++;
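
A minimal sketch of the two-level hash, assuming both blocks share the same row layout. The input lines are made up for illustration, and 'none' is just a placeholder bucket for rows that appear before any heading.

```perl
use strict;
use warnings;

# Hypothetical input: the same row layout under two headings.
my @lines = (
    'HEADING 1',
    'Pete Male Marketing Single 40',
    'John Male Sales Married 38',
    'HEADING 2',
    'Pete Male Sales Single 52',
    'Kate Female Marketing Married 30',
    'Pete Male Sales Single 33',
);

my %names;               # $names{$heading}{$name} holds the count
my $heading = 'none';    # rows seen before any heading would land here

for my $line (@lines) {
    if ($line =~ /^HEADING (\S+)/) {
        $heading = $1;   # remember which block we are in
        next;
    }
    my ($name) = $line =~ /^(\w+) / or next;
    $names{$heading}{$name}++;   # the inner hash is created on first use
}

for my $h (sort keys %names) {
    print "HEADING $h\n";
    for my $n (sort keys %{ $names{$h} }) {
        print "  $n: $names{$h}{$n}\n";
    }
}
```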

I quite like the code example you provided - actually found a similar one in
http://www.oreilly.com/catalog/perlw...pter/ch08.html
Up until now I was under the impression that I would have to use split - can
you elaborate why you chose a different approach?
Either method produces the same results on well-formed lines. If you plan
on incorporating error checking, m// lets you define the expected format
precisely, e.g., four words and a number, whereas split simply breaks a
string up into a list. Whichever method makes you happy.
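
To make the difference concrete, here is a small sketch that runs both methods over a data row and over the column-title row. The helper names parse_with_split and parse_with_regex are made up for this example.

```perl
use strict;
use warnings;

# parse_with_split and parse_with_regex are hypothetical helper names.

# split: breaks the line on whitespace and returns whatever fields it finds.
sub parse_with_split {
    my ($line) = @_;
    return split ' ', $line;
}

# regex: returns the five fields only when the line is four words and a
# number; otherwise it returns an empty list.
sub parse_with_regex {
    my ($line) = @_;
    return $line =~ /^(\w+) (\w+) (\w+) (\w+) (\d+)$/;
}

my $good = 'Kate Female Marketing Married 30';
my $bad  = 'ColumnA ColumnB ColumnC ColumnD ColumnE';

my @split_good = parse_with_split($good);
my @split_bad  = parse_with_split($bad);
my @regex_good = parse_with_regex($good);
my @regex_bad  = parse_with_regex($bad);

print "split on data row:   ", scalar(@split_good), " fields\n";
print "split on title row:  ", scalar(@split_bad),  " fields\n";
print "regex on data row:   ", scalar(@regex_good), " fields\n";
print "regex on title row:  ", scalar(@regex_bad),  " fields\n";
```

The point is that split happily returns five fields for the title row, while the pattern match returns nothing, which makes non-data lines easy to skip.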
One other task I have to do is similar to:
If a line contains Single in the column then get the single person's name.


I sort of came up with:
foreach $m_statuses{'Single'}
print $names{$name}

but that's probably totally wrong. Can you advise?


Yes, it is totally wrong. $m_statuses{'Single'} is a scalar. It is the
count of lines where the marital status is 'Single'. Your foreach loop
above would produce a syntax error. Although it is not what you're
after, a valid foreach loop could look like this:

foreach my $sex ( keys %sexes ) {
    #
    # $sex will be 'Female' for one iteration of the loop and 'Male'
    # for the other. (Unless you have more than two sexes...)
    #
}

Perhaps a more meaningful foreach loop would look like this:

foreach my $age ( keys %ages ) {
    #
    # For each iteration, $age will be one of the ages that were found in
    # the data -->> IN NO PARTICULAR ORDER <<-- unless you sort it.
    #
}
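
As a sketch of the sorted version, and of how the "# of over 35s" line from the original question could be derived from %ages: the ages below are the five from the sample data, and $over_35 is a name made up here.

```perl
use strict;
use warnings;

# Ages taken from the sample data in the original post.
my %ages;
$ages{$_}++ for (40, 30, 38, 52, 24);

# Numeric sort; a plain 'sort' would compare the keys as strings.
my @sorted = sort { $a <=> $b } keys %ages;
for my $age (@sorted) {
    print "$age: $ages{$age}\n";
}

# Add up the counts for every age above 35 ($over_35 is a hypothetical name).
my $over_35 = 0;
$over_35 += $ages{$_} for grep { $_ > 35 } keys %ages;
print "# of over 35s = $over_35\n";
```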

To do what you propose, i.e., print the name of all single people, you
would have to include the logic for that in the parsing loop:

# extract four words and a number into scalars:
my ($name, $sex, $dept, $m_status, $age) =
/^(\w+) (\w+) (\w+) (\w+) (\d+)$/;

# increment counts for each:
$names{$name}++;
$sexes{$sex}++;
$depts{$dept}++;
$m_statuses{$m_status}++;
$ages{$age}++;

# take special actions:
print "$name is single.\n" if $m_status eq 'Single';
print "$name is over the hill!\n" if $age >= 40;
Hope this helps!

Greg
Jul 19 '05 #4
Thanks again !

1)
Sorry for being too vague. With regard to the HEADINGS they separate blocks
of data. But because the column names will be different [data is different]
then I'm not quite sure I could use:
$names{$heading}{$name}++;

So I'm looking at creating separate my () definitions for each HEADING and
just wanted to confirm how to jump out of one HEADING loop and start with
the next.

For example, under HEADING 1 we have these columns:
Name, Sex, Dept, M_Status, Age

and under HEADING 2 we have:
Address, Phone#, Mobile#, Salary

So at the beginning of the script I would have
my (%names, %sexes, %depts, %m_statuses, %ages)
my (%addresses, %phones, %mobiles, %salaries)
#then I have my while (<>) and parsing here
#I have my output at the end

Is that a little more clearer?
2)
With my last question regarding the printing of the names of single people,
if we include a print statement in the parsing loop would that give us
something like:
Pete is single.
John is single.
while the parsing is still running?

What I'm after is hopefully feeding that output into something else
[@array?] which can then print a list of the names [line by line] at the end
of the script, something like:
#this is the output structure
Number of Petes =
Number of Males =
Singles are:
Pete
John
Number of Salespeople =
Does this make sense?

Thanks Greg.

Jul 19 '05 #5
Troll wrote:
Thanks again !

1)
Sorry for being too vague. With regard to the HEADINGS they separate blocks
of data. But because the column names will be different [data is different]
then I'm not quite sure I could use:
$names{$heading}{$name}++;

So I'm looking at creating separate my () definitions for each HEADING and
just wanted to confirm how to jump out of one HEADING loop and start with
the next.

For example, under HEADING 1 we have these columns:
Name, Sex, Dept, M_Status, Age

and under HEADING 2 we have:
Address, Phone#, Mobile#, Salary

So at the beginning of the script I would have
my (%names, %sexes, %depts, %m_statuses, %ages)
my (%addresses, %phones, %mobiles, %salaries)
#then I have my while (<>) and parsing here
#I have my output at the end

Is that a little more clearer?
Yes. Much clearer. There are a couple of different ways you could do
this. One is to use a single loop that reads through the file and uses
a state variable (e.g., $heading) to keep track of where you are in the
parsing process. The other is to have a separate loop for each heading.
Again, six of one, half a dozen of another. It's more a matter of
preference than anything else.

An example of the first approach:

my $heading = 'initial';
my $fin_name = '/usr/local/blah/blah/blah';
# use 'or', not '||': '||' would bind to $fin_name and the die would never run
open(FIN, '<', $fin_name) or die "Can't open $fin_name: $!\n";

while (<FIN>) {

    # check for a new heading
    # I am assuming single word heading names
    if ( /HEADING (\S+)/ ) {

        $heading = $1; # set $heading equal to word extracted above

    # take appropriate action based on the heading we are under

    } elsif ( $heading eq 'NAMES' ) {

        my ( $name, $sex, $dept, $m_status, $age ) =
            /(\w+) (\w+) (\w+) (\w+) (\d+)/;

        # update counts, append to lists, etc...

    } elsif ( $heading eq 'ADDRESSES' ) {

        # I am assuming the address field is limited to 30 characters
        # here:
        my ( $address, $phone, $mobile, $salary ) =
            /(.{30}) (\S+) (\S+) (\d+)/;

        # update counts, append to lists, etc...

    }

}

close FIN;
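
Here is a runnable sketch of the single-loop approach that also shows the underline and column-title rows being skipped. Everything in $data is invented for illustration (including the suburb names and phone numbers), the heading names NAMES and ADDRESSES follow the assumption made above, and the data is read from an in-memory filehandle so the example needs no file.

```perl
use strict;
use warnings;

# Hypothetical data. The second block's columns and values are invented.
my $data = <<'END';
HEADING NAMES
**********
Name Sex Dept M_Status Age
Pete Male Marketing Single 40
Kate Female Marketing Married 30
HEADING ADDRESSES
**********
Suburb Phone Mobile Salary
Carlton 5550100 5550101 50000
Fitzroy 5550102 5550103 62000
END

# Reading from a scalar reference stands in for opening a real file.
open my $fh, '<', \$data or die "Can't open in-memory data: $!\n";

my (%names, %salaries);
my $heading = 'initial';

while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^HEADING (\S+)/) {
        $heading = $1;    # switch state; nothing else to do on this line
    }
    elsif ($heading eq 'NAMES') {
        # Underline and column-title rows fail this match and are skipped.
        my ($name, $sex, $dept, $m_status, $age) =
            $line =~ /^(\w+) (\w+) (\w+) (\w+) (\d+)$/ or next;
        $names{$name}++;
    }
    elsif ($heading eq 'ADDRESSES') {
        my ($address, $phone, $mobile, $salary) =
            $line =~ /^(\w+) (\d+) (\d+) (\d+)$/ or next;
        $salaries{$address} = $salary;
    }
}
close $fh;

print "Names seen: ", join(', ', sort keys %names), "\n";
print "Salary for $_: $salaries{$_}\n" for sort keys %salaries;
```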
And the second approach:

my $fin_name = '/usr/local/blah/blah/blah';
open(FIN, '<', $fin_name) or die "Can't open $fin_name: $!\n";

# scan for first heading
# (<FIN> only assigns to $_ automatically when it is the entire while
# condition, so the assignment is spelled out here)
while ( defined( $_ = <FIN> ) && ! /HEADING NAMES/ ) { }

# parse the names, etc...
while ( defined( $_ = <FIN> ) && ! /HEADING ADDRESSES/ ) {

    my ( $name, $sex, $dept, $m_status, $age ) =
        /(\w+) (\w+) (\w+) (\w+) (\d+)/;

    # update counts, append to lists, etc...

}

# parse the addresses, etc...
# for brevity, I am assuming only two headings
while ( <FIN> ) {

    my ( $address, $phone, $mobile, $salary ) =
        /(.{30}) (\S+) (\S+) (\d+)/;

    # update counts, append to lists, etc...

}

close FIN;
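
The separate-loops approach can be sketched the same way. Again the data is invented, NAMES and ADDRESSES are assumed heading names, and @people and @addresses are names made up for the example. Each loop ends with 'last' when it meets the heading that starts the next block.

```perl
use strict;
use warnings;

# Hypothetical data, read from an in-memory filehandle instead of a file.
my $data = <<'END';
some preamble line
HEADING NAMES
Pete Male Marketing Single 40
John Male Sales Single 24
HEADING ADDRESSES
Carlton 5550100 5550101 50000
END

open my $fh, '<', \$data or die "Can't open in-memory data: $!\n";

my (@people, @addresses);    # hypothetical names
my $line;

# Loop 1: discard everything up to and including the first heading.
while (defined($line = <$fh>)) {
    last if $line =~ /^HEADING NAMES/;
}

# Loop 2: read name rows until the next heading ends the block.
while (defined($line = <$fh>)) {
    last if $line =~ /^HEADING ADDRESSES/;
    my ($name) = $line =~ /^(\w+) \w+ \w+ \w+ \d+$/ or next;
    push @people, $name;
}

# Loop 3: whatever remains belongs to the last block.
while (defined($line = <$fh>)) {
    my ($address) = $line =~ /^(\w+) \d+ \d+ \d+$/ or next;
    push @addresses, $address;
}
close $fh;

print "People: @people\n";
print "Addresses: @addresses\n";
```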


2)
With my last question regarding the printing of the names of single people,
if we include a print statement in the parsing loop would that give us
something like:
Pete is single.
John is single.
while the parsing is still running?
Yes.

What I'm after is hopefully feeding that output into something else
[@array?] which can then print a list of the names [line by line] at the end
of the script, something like:
#this is the output structure
Number of Petes =
Number of Males =
Singles are:
Pete
John
Number of Salespeople =
Does this make sense?


Yes. It would be easy to create a list/array of, e.g., single people.
Prior to the loop, declare the array. Within the loop, test each person
for being single. If they are, push them onto the list:

# prior to your parsing loop, declare array @singles:

my @singles;

# within your parsing loop, after parsing out name, status, etc.:

push @singles, $name if $m_status eq 'Single';

# after loop, to print the list of singles:

print "Single persons:\n";
foreach my $single_person ( @singles ) {
    print " $single_person\n";
}
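
Putting that together with the counts, a sketch of the report layout you describe might look like this. The rows are the sample data from the original post, hard-coded for the example, and $report is a name made up here. Note that a name is pushed once per matching row, so Pete shows up twice.

```perl
use strict;
use warnings;

# Sample rows from the original post, hard-coded for the example.
my @rows = (
    'Pete Male Marketing Single 40',
    'Kate Female Marketing Married 30',
    'John Male Sales Married 38',
    'Pete Male Sales Single 52',
    'John Male Sales Single 24',
);

my (%names, %sexes, %depts, @singles);

for my $row (@rows) {
    my ($name, $sex, $dept, $m_status, $age) =
        $row =~ /^(\w+) (\w+) (\w+) (\w+) (\d+)$/ or next;

    $names{$name}++;
    $sexes{$sex}++;
    $depts{$dept}++;

    # Collect names now, print them later in the report.
    push @singles, $name if $m_status eq 'Single';
}

# Build the report after the loop ($report is a hypothetical name).
my $report = '';
$report .= "Number of Petes = $names{Pete}\n";
$report .= "Number of Males = $sexes{Male}\n";
$report .= "Singles are:\n";
$report .= "$_\n" for @singles;
$report .= "Number of Salespeople = $depts{Sales}\n";
print $report;
```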
Greg

Jul 19 '05 #6
Wow. I don't know how you get the time to respond to my queries in such
detail. It is greatly appreciated.
I just came back from work and it's like 2:30 am so I'll crash out soon and
have a closer read tomorrow [especially of the HEADINGS part].

With the push @array stuff I actually got to this today in my readings. I
saw an example of appending an array onto another array with a push and I
was wondering if we could just substitute a $variable for one of the arrays.
I'm glad you confirmed this. :)

I was also wondering if doing this at the beginning of the script:

my (%names, %sexes, %depts, %m_statuses, %ages); # declaring things locally

would be considered bad practice. I thought that one should declare things
as my ( ) if one is using things within a loop so as not to impact anything
external to the loop. But if one uses variables/arrays both within and
outside the loops, should we then still declare stuff as my ( )?
Maybe I'm just confused about my ( )...

Greg, if you could possibly keep an eye on this thread for the next few days
I would be very much in your debt. Your help has been invaluable so far in
allowing me to visualise quite a few things.

Thanks very much.

Jul 19 '05 #7
Now time for some stupid Qs:

Let's say that the data I have is in a file called employees.
How can I call this file so that I can parse it?

1) Can I do:
@HRdata = `cat employees`;
while (<@HRdata>) {
2) With regard to the HEADING sections, the script has to be able to
recognise the different sections by the following rules:
                          # there's a blank line before each heading
HEADING 1                 # this is the name of the heading - this is a string with a special character and a blank space as part of it
ColumnA ColumnB ColumnC   # these are the column names - these are strings which can also include a blank space if they have 2 or more words
*******                   # a sort of an underlining pattern

I guess this is to make sure that one does not include any silly heading
data as part of the arrays created and the parsing only takes place on
'real' data. Can you pls advise? Or do you need more info? I'm more in
favour of creating separate 'if' loops due to my 'newbie' status. I'll get
lost otherwise...

Thanks.

Jul 19 '05 #8
I'm getting heaps of the following errors when I run my script:
Use of uninitialized value in hash element at ...

The beginning of my script looks like:
my(%names, %sexes, %depts);
%names = ("name" => "0");
%sexes = ("sex" => "0");
%depts = ("dept" => "0");

$names = '0';
$sexes = '0';
$depts = '0';
$name = '0';
$sex = '0';
$dept = '0';

while (<>)
#and the parsing loop here...
The hash errors relate to only these 3 lines which are part of the parsing
loop:
$names{$name}++ ;
$sexes{$sex}++;
$depts{$dept}++ ;
Can you run over the variable declarations/initializations for me as I'm not
sure I'm doing this right?
Thanks.
"Troll" <ab***@microsof t.com> wrote in message
news:eh******** ***********@new s-server.bigpond. net.au...
Now time for some stupid Qs:

Let's say that the data I have is in a file called employees.
How can I call this file so that I can parse it?

1) Can I do:
@HRdata = `cat employees`;
while (<@HRdata>) {
2) With regard to the HEADING sections, the script has to be able to
recognise the different sections by the following rules:
# there's a blank line before each heading
HEADING 1 # this is the name of the heading - this is a string with a special character and a blank space as part of it
ColumnA ColumnB ColumnC # these are the column names - these are
strings which also can inlude a blank space if they have 2 or more words
******* # a sort of an underlining
pattern

I guess this is to make sure that one does not include any silly heading
data as part of the arrays created and the parsing only takes place on
'real' data. Can you pls advise? Or do you need more info? I'm more in
favour of creating separate 'if' loops due to my 'newbie' status. I'll get
lost otherwise...

Thanks.

"Troll" <ab***@microsof t.com> wrote in message
news:uR******** ***********@new s-server.bigpond. net.au...
Wow. I don't know how you get the time to respond to my queries in such
detail. It is greatly appreciated.
I just came back from work and it's like 2:30 am so I'll crash out soon

and
have a closer read tomorrow [especially of the HEADINGS part].

With the push @array stuff I actually got to this today in my readings. I saw an example of appending an array onto another array with a push and I was wondering if we could just substitute a $variable for one of the

arrays.
I'm glad you confirmed this. :)

I was also wondering if doing this at the beginning of the script:

my (%names, %sexes, %depts, %m_statuses, %ages) # declaring things locally

would be considered bad practice. I thought that one should declare things as my ( ) if one is using things within a loop so as not to impact

anything
external to the loop. But if one uses variables/arrays both within and
outside the loops, should we then still declare stuff as my ( )?
Maybe I'm just confused about my ( )...

Greg, if you could possibly keep an eye on this thread for the next few

days
I would be very much in your debt. Your help has been invaluabe so far in allowing me to visualise quite a few things.

Thanks very much.
"Ga Mu" <Ng******@SPcom cast.netAM> wrote in message
news:uR******** ***********@rwc rnsc52.ops.asp. att.net...
Troll wrote:
> Thanks again !
>
> 1)
> Sorry for being too vague. With regard to the HEADINGS they separate blocks
> of data. But because the column names will be different [data is different]
> then I'm not quite sure I could use:
> $names{$heading}{$name}++;
>
> So I'm looking at creating separate my () definitions for each HEADING and
> just wanted to confirm how to jump out of one HEADING loop and start with
> the next.
>
> For example, under HEADING 1 we have these columns:
> Name, Sex, Dept, M_Status, Age
>
> and under HEADING 2 we have:
> Address, Phone#, Mobile#, Salary
>
> So at the beginning of the script I would have
> my (%names, %sexes, %depts, %m_statuses, %ages)
> my (%addresses, %phones, %mobiles, %salaries)
> #then I have my while (<>) and parsing here
> #I have my output at the end
>
> Is that a little clearer?

Yes. Much clearer. There are a couple of different ways you could do
this. One is to use a single loop that reads through the file and uses a
state variable (e.g., $heading) to keep track of where you are in the
parsing process. The other is to have a separate loop for each heading.
Again, six of one, half a dozen of the other. It's more a matter of
preference than anything else.

An example of the first approach:

    my $heading  = 'initial';
    my $fin_name = '/usr/local/blah/blah/blah';

    # parentheses matter here: without them, || binds to $fin_name
    # and the die can never fire
    open( FIN, $fin_name ) || die "Can't open $fin_name: $!\n";

    while (<FIN>) {

        # check for a new heading
        # I am assuming single word heading names
        if ( /HEADING (\S+)/ ) {

            $heading = $1;    # set $heading equal to word extracted above

        # take appropriate action based on the heading we are under

        } elsif ( $heading eq 'NAMES' ) {

            my ( $name, $sex, $dept, $m_status, $age ) =
                /(\w+) (\w+) (\w+) (\w+) (\d+)/;

            # update counts, append to lists, etc...

        } elsif ( $heading eq 'ADDRESSES' ) {

            # I am assuming the address field is limited to 30 characters
            # here (.{30} is any 30 characters; \.{30} would be 30 dots):
            my ( $address, $phone, $mobile, $salary ) =
                /(.{30}) (\S+) (\S+) (\d+)/;

            # update counts, append to lists, etc...

        }

    }
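For anyone who wants to run the single-loop idea end to end, here is a minimal sketch. It is only an illustration under some assumptions: the heading names NAMES and ADDRESSES, the address rows, and the $over_35 and $salary_total variables are made up for the example, and the data is read from a string instead of a real file. It also shows one way to skip the underlining row and the column-name row.

```perl
use strict;
use warnings;

# Sample input modelled on the data earlier in the thread. The heading
# names and the address rows are invented for this sketch.
my $data = <<'END';
HEADING NAMES
**********
Name Sex Dept M_Status Age
Pete Male Marketing Single 40
Kate Female Marketing Married 30
John Male Sales Married 38
Pete Male Sales Single 52
John Male Sales Single 24
HEADING ADDRESSES
**********
Address Phone Mobile Salary
1_High_St 5550001 0400001 50000
2_Low_Rd 5550002 0400002 60000
END

my ( %names, %sexes, %depts, %m_statuses );
my $over_35      = 0;
my $salary_total = 0;
my $heading      = 'initial';

# Read from the string as if it were a file.
open( my $fh, '<', \$data ) or die "Can't read sample data: $!\n";

while ( my $line = <$fh> ) {
    chomp $line;

    if ( $line =~ /^HEADING (\S+)/ ) {
        $heading = $1;    # remember which block we are in
    }
    elsif ( $line =~ /^\*+$/ ) {
        next;             # skip the underlining row
    }
    elsif ( $heading eq 'NAMES' ) {
        # The column-name row fails this match (its last field is not
        # a number), so it is skipped by the "or next".
        my ( $name, $sex, $dept, $m_status, $age ) =
            $line =~ /^(\w+) (\w+) (\w+) (\w+) (\d+)$/
            or next;
        $names{$name}++;
        $sexes{$sex}++;
        $depts{$dept}++;
        $m_statuses{$m_status}++;
        $over_35++ if $age > 35;
    }
    elsif ( $heading eq 'ADDRESSES' ) {
        my ( $address, $phone, $mobile, $salary ) =
            $line =~ /^(\S+) (\S+) (\S+) (\d+)$/
            or next;
        $salary_total += $salary;
    }
}
close $fh;

print "# of Petes = $names{Pete}\n";
print "# of Males = $sexes{Male}\n";
print "# of Salespeople = $depts{Sales}\n";
print "# of Singles = $m_statuses{Single}\n";
print "# of over 35s = $over_35\n";
print "Total salary = $salary_total\n";
```

Relying on the failed match to drop the column-name row only works while the real data rows end in a number; a stricter script might count lines after each heading instead.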
And the second approach:

    my $heading = 'initial';
    my $fin_name = '/usr/local/blah/blah/blah';
    open( FIN, $fin_name ) || die "Can't open $fin_name: $!\n";

    # scan for first heading
    # (<FIN> only sets $_ by itself when it is the whole while condition,
    # so assign it explicitly when combining it with another test)
    while ( defined( $_ = <FIN> ) && ! /HEADING NAMES/ ) { }

    # parse the names, etc...
    while ( defined( $_ = <FIN> ) && ! /HEADING ADDRESSES/ ) {

        my ( $name, $sex, $dept, $m_status, $age ) =
            /(\w+) (\w+) (\w+) (\w+) (\d+)/;

        # update counts, append to lists, etc...

    }

    # parse the addresses, etc...
    # for brevity, I am assuming only two headings
    while ( <FIN> ) {

        my ( $address, $phone, $mobile, $salary ) =
            /(.{30}) (\S+) (\S+) (\d+)/;

        # update counts, append to lists, etc...

    }
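A runnable sketch of the separate-loops approach, again under assumptions: the sample rows, the heading names and the @people and @addresses arrays are invented for the illustration, and the input comes from a string. The detail worth noticing is that Perl assigns the line to $_ automatically only when the readline is the entire while condition, so the combined tests assign it explicitly.

```perl
use strict;
use warnings;

# Invented sample input; the first line stands for anything that
# precedes the first heading.
my $data = <<'END';
some preamble
HEADING NAMES
Pete Male Marketing Single 40
Kate Female Marketing Married 30
HEADING ADDRESSES
1_High_St 5550001 0400001 50000
END

open( my $fh, '<', \$data ) or die "Can't read sample data: $!\n";

my ( @people, @addresses );

# Loop 1: discard lines until the first heading has been read.
while ( defined( $_ = <$fh> ) && !/HEADING NAMES/ ) { }

# Loop 2: name rows, until the next heading line is read.
while ( defined( $_ = <$fh> ) && !/HEADING ADDRESSES/ ) {
    my ( $name, $sex, $dept, $m_status, $age ) =
        /(\w+) (\w+) (\w+) (\w+) (\d+)/
        or next;    # ignore rows that do not look like data
    push @people, $name;
}

# Loop 3: everything that is left belongs to the last heading.
while (<$fh>) {
    my ( $address, $phone, $mobile, $salary ) =
        /(\S+) (\S+) (\S+) (\d+)/
        or next;
    push @addresses, $address;
}
close $fh;

print "People: @people\n";
print "Addresses: @addresses\n";
```

Each loop consumes the heading line that ends it, so the next loop starts on the first row of its own block.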

>
> 2)
> With my last question regarding the printing of the names of single people,
> if we include a print statement in the parsing loop would that give us
> something like:
> Pete is single.
> John is single.
> while the parsing is still running?

Yes.

>
> What I'm after is hopefully feeding that output into something else
> [@array?] which can then print a list of the names [line by line] at the
> end of the script, something like:
> #this is the output structure
> Number of Petes =
> Number of Males =
> Singles are:
> Pete
> John
> Number of Salespeople =
>
>
> Does this make sense?
>

Yes. It would be easy to create a list/array of, e.g., single people.
Prior to the loop, declare the array. Within the loop, test each person
for being single. If they are, push them onto the list:

    # prior to your parsing loop, declare array @singles:

    my @singles;

    # within your parsing loop, after parsing out name, status, etc.:

    push @singles, $name if $m_status eq 'Single';

    # after loop, to print the list of singles:

    print "Single persons:\n";
    foreach my $single_person ( @singles ) { print "  $single_person\n"; }

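Put together as something that can be run, this might look like the sketch below. The @rows array is an assumption standing in for lines already read from the file, and split on whitespace is used in place of the regular expression purely to keep the example short.

```perl
use strict;
use warnings;

# Stand-in for rows already read from the input (data from the
# example at the top of the thread).
my @rows = (
    'Pete Male Marketing Single 40',
    'Kate Female Marketing Married 30',
    'John Male Sales Married 38',
    'Pete Male Sales Single 52',
    'John Male Sales Single 24',
);

my %m_statuses;    # count per marital status
my @singles;       # names of single people, in input order

foreach my $row (@rows) {
    my ( $name, $sex, $dept, $m_status, $age ) = split ' ', $row;
    $m_statuses{$m_status}++;
    push @singles, $name if $m_status eq 'Single';
}

# Nothing is printed until the whole input has been processed.
print "Number of Singles = $m_statuses{Single}\n";
print "Singles are:\n";
print "  $_\n" foreach @singles;
```

Note that a name appears once per matching row, so Pete is listed twice here. If each name should appear only once, a hash keyed on the name could be used instead of the array.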
Greg



Jul 19 '05 #9
Troll wrote:
> Greg,
> I decided to give you a glimpse at the code itself so as to make it clearer.
> Just be aware that the variable/array names have changed but the general
> idea is the same.
> The hash errors refer to the variables in the increment section.
>
> #!/usr/bin/perl -w
>
> open(NET, "netstat|") || die ("Cannot run netstat: $!");
>
> my(%UDP4localaddresses, %UDP4remoteaddresses, %UDP4states);
>
> $UDP4localaddress = '0';
> $UDP4remoteaddress = '0';
> $UDP4state = '0';

Why are you doing this (above)? This is initializing three variables to
zero. These three variables have nothing to do with the three variables
of the same name in the while loop, because those are declared with my
inside the loop.

> $UDP4localaddresses = '0';
> $UDP4remoteaddresses = '0';
> $UDP4states = '0';

Why are you doing this (above)? This is initializing three scalars to
zero. These three scalars have the same name, but have nothing else to
do with the hashes of the same name.

> $UDP4localaddresses{$UDP4localaddress} = '0';
> $UDP4remoteaddresses{$UDP4remoteaddress} = '0';
> $UDP4states = ($UDP4state} = '0';

Hash values do not need to be initialized before you count with them: a
value that does not exist yet is treated as zero the first time you
increment it. That is what makes hashes perfect for counting occurrences
of unknown words, numbers, etc. And even if you had to initialize them,
you are initializing $UDP4localaddresses{0} to zero.

> while (<NET>) {
> my($UDP4localaddress, $UDP4remoteaddress, $UDP4state)=
> /(\s+) (\s+) (\s+)$/;
>
> #increments start here
> $UDP4localaddresses{$UDP4localaddress}++;
> $UDP4remoteaddresses{$UDP4remoteaddress}++;
> $UDP4states = ($UDP4state}++;

If the increments above are failing, it is probably because your m// is
failing and one or more of the keys (the variables inside the {}) are
undefined. Try putting a print statement before the increments and
print each of the variables you are extracting, then play with the
regular expression until you get values for ALL of them.

> }
>
> #here comes the output
> Can you pls criticise my futile attempt to get this going? As one can see,
> I'm not that clear on initializations ...
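A minimal sketch of that debugging step, under stated assumptions: the three sample lines are made up and only stand in for netstat output (real output has more columns and differs between platforms), and the shorter variable names are mine. It prints what the regular expression extracted before counting, and it counts with hashes that were never initialized. One thing to check in the pattern is the case of the escape: \S+ matches a run of non-whitespace characters, while \s+ matches only whitespace.

```perl
use strict;
use warnings;

# Hypothetical lines standing in for netstat output.
my @lines = (
    'hostA.1024 hostB.80 ESTABLISHED',
    'hostA.1025 hostB.80 TIME_WAIT',
    'hostA.1026 hostC.22 ESTABLISHED',
);

# Declared but not initialized: the counts start from nothing.
my ( %local, %remote, %states );

foreach my $line (@lines) {
    # Upper-case \S+ captures each field; \s+ matches the gaps.
    my ( $local_addr, $remote_addr, $state ) =
        $line =~ /(\S+)\s+(\S+)\s+(\S+)$/;

    # If the match failed, all three variables are undefined.
    unless ( defined $state ) {
        print "no match: $line\n";
        next;
    }

    # Debugging print: see exactly what was extracted.
    print "local=$local_addr remote=$remote_addr state=$state\n";

    $local{$local_addr}++;
    $remote{$remote_addr}++;
    $states{$state}++;
}

# Output section: one line per state seen.
foreach my $state ( sort keys %states ) {
    print "$state = $states{$state}\n";
}
```

Once every line prints sensible values, the debugging print can be removed and the same loop body dropped into the while (<NET>) loop.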


Jul 19 '05 #10
