What I'm trying to do seems right up Perl's alley, but I can't get it to work. I'm using the WWW::Mechanize module to retrieve a sprawling HTML document from which I want to extract certain strings and save them. I can get this much to work:
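```perl
use strict;
use warnings;
use WWW::Mechanize;

my $url = "http://someurl";
my $mechanize = WWW::Mechanize->new(autocheck => 1);
$mechanize->get($url);
# content() returns the whole page as one string, so this array gets one element
my @array_of_data = $mechanize->content;
```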
The HTML doc is quite long and contains numbers that I want to extract. These numbers are always preceded by the same text string, as in:
<a href bla bla bla>bla bla bla<random tag>mydigits=493409834%bla bla bla<meaningless tag>bla bla</a>
where the string "mydigits=" always precedes the desired number and is sometimes all lowercase but can occasionally look like "MyDigits="; where the number itself may be anywhere from one to ten digits long; and where "%" might literally be "%" or any other non-digit character, including a space. Moreover, the desired string might appear more than once per line, assuming Perl doesn't see the HTML doc as just one single long line of text anyway.
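The three rules above translate fairly directly into a single regular expression. Below is a minimal sketch, assuming that a fully case-insensitive match is acceptable (so it would also accept a spelling like "MYDIGITS=") and that a number at the very end of the text should still count. `extract_digits` and the sample line are hypothetical, not part of WWW::Mechanize or any other module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every number that follows "mydigits=" in $text.
#   /i               : "mydigits=", "MyDigits=", "MYDIGITS=" all match
#   ([0-9]{1,10})    : capture one to ten digits
#   (?![0-9])        : the digits must be followed by a non-digit or the end
#   /g in while loop : step through every occurrence, not just the first
sub extract_digits {
    my ($text) = @_;
    my @found;
    while ($text =~ /mydigits=([0-9]{1,10})(?![0-9])/gi) {
        push @found, $1;
    }
    return @found;
}

# Hypothetical sample line modeled on the example above, with a second,
# mixed-case occurrence added to show more than one match per line.
my $line = '<a href="x">bla<b>mydigits=493409834%bla</b> MyDigits=7 end</a>';

print "$_\n" for extract_digits($line);    # prints 493409834, then 7
```

Using `[0-9]` instead of `\d` is a deliberate choice here: in Perl, `\d` can also match non-ASCII digits when the text is treated as Unicode.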
I have tried many extremely ugly variations on this:
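```perl
my $pattern = "[Mm]y[Dd]igits=[0-9]*[^0-9]";
foreach (@array_of_data) {
    if ( /$pattern/ ) {
        print "$_\n";    # prints the entire matching element, not just the digits
    }
}
```
What I would like to see printed is just the numbers, like: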
219824
2230239084
04598
98739874
etc., etc.
or better yet, assign the output to an array that looks like:
my @desired_array = ('219824', '2230239084', '04598', '98739874');    # quoted: a bare 04598 would be read as an invalid octal literal
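Because `$mechanize->content` returns the whole page as a single string, one possible approach is a global match in list context, which hands back every captured number at once, so no per-line loop is needed. This is a sketch under that assumption; the `$content` literal is a hypothetical stand-in for the fetched page so the example runs without network access.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $mechanize->content: the whole page as one string.
my $content = "x MyDigits=219824;y\nmydigits=2230239084 z mydigits=04598%\n<p>mydigits=98739874</p>";

# In list context, m//g with one capture group returns every captured
# substring, so the numbers land directly in the array.
my @desired_array = $content =~ /mydigits=([0-9]{1,10})(?![0-9])/gi;

# The elements are strings, so the leading zero of "04598" is preserved.
print join(', ', @desired_array), "\n";    # prints 219824, 2230239084, 04598, 98739874
```

If the numbers are later used in arithmetic, Perl converts them on demand; keeping them as strings only matters when leading zeros must survive.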
I know I must be missing something very fundamental, so if anyone can help steer me away from the major mistakes I'm making, I'd appreciate it. Thanks.