
Trying to extract a string from HTTP::Request object

I'm trying to fetch the HTML of a website as a string, and then I want to extract particular elements from that string using the substr function.
Here is the sample code I have so far:

    use HTTP::Request::Common;
    use LWP::UserAgent;
    use LWP::Simple;

    $ua = LWP::UserAgent->new;

    $request = HTTP::Request->new(GET => 'http://www.cnn.com');
    $response = $ua->request($request);
    $content = $response->content();

    my $result2 = substr $content, index($content, 'Headlines');

So, the variable $content seems to be an HTML object or something that is NOT a string. How can I convert $content to a string, so that I can use the substr function?

I have tried other methods including simpler code:

    my $content = get('http://securities.stanford.edu//1014/TCHC00');

However, I am not able to process $content as a string.

I have even tried putting the contents into a text file, but I am not able to extract a string from the text file either.

Any help is appreciated!
Mar 27 '07 #1


rickumali
I ran your program through the Perl debugger and confirmed that $response->content() definitely contains HTML. When I put the output into an editor, I found that the content does NOT contain "Headlines". Try another keyword, like "Weather".
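
To illustrate why a missing keyword looks like "not a string": index returns -1 when it finds nothing, and substr with an offset of -1 returns only the last character of the text. A minimal sketch of a guard follows. The helper name text_from_keyword and the sample $html string are made up for illustration, not part of the original program, and no network access is involved:

```perl
use strict;
use warnings;

# Return everything in $text from the first occurrence of $keyword onward.
# Returns undef (in scalar context) when the keyword does not occur at all.
sub text_from_keyword {
    my ($text, $keyword) = @_;
    my $pos = index($text, $keyword);   # -1 means "not found"
    return if $pos < 0;
    return substr($text, $pos);
}

# Hypothetical stand-in for the fetched page content.
my $html = '<html><body><a href="/weather">Weather</a>, Entertainment</body></html>';

for my $keyword ('Headlines', 'Weather') {
    my $tail = text_from_keyword($html, $keyword);
    if (defined $tail) {
        print "$keyword: found, tail starts with '", substr($tail, 0, 20), "'\n";
    }
    else {
        print "$keyword: not found\n";
    }
}
```

Checking the position before calling substr makes the "keyword is not on the page" case visible instead of silently yielding a one-character result.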

If you want to examine variables without the debugger, use this code (provided your Perl has the Dumpvalue module):
    use HTTP::Request::Common;
    use LWP::UserAgent;
    use LWP::Simple;
    use Dumpvalue;

    $dumper = Dumpvalue->new;

    $ua = LWP::UserAgent->new;

    $request = HTTP::Request->new(GET => 'http://www.cnn.com');
    $response = $ua->request($request);
    $content = $response->content();

    $dumper->dumpValue(\$response);

    my $result2 = substr $content, index($content, 'Headlines');

Then when you run it, save the output to a text file. On my Windows box, with ActiveState Perl, I used this:
    C:\cygwin\home\Rick\perl>perl getreq.pl > output.txt

In the Perl debugger, this is what I saw when I used 'Weather':
    main::(getreq.pl:15):   my $result2 = substr $content, index($content, 'Weather');
      DB<1>
    main::(getreq.pl:17):   print $result2;
      DB<1> print length($result2)
    105335
      DB<2> print substr $result2, 0, 20
    Weather, Entertainme

You're on the right track. Prove what each line does, and learn the Perl debugger to get interactive.
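
One way to prove what a variable holds without the debugger is to ask Perl directly whether it is a plain string or a reference/object. The sketch below assumes nothing beyond core Perl. The helper describe_value and the Fake::Response class are hypothetical names used only for illustration, standing in for the real $content and $response:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Describe what kind of value a variable holds.
sub describe_value {
    my ($value) = @_;
    return 'undef'                                   if !defined $value;
    return 'object of class ' . blessed($value)      if blessed($value);
    return 'unblessed ' . ref($value) . ' reference' if ref($value);
    return 'plain string of length ' . length($value);
}

# Hypothetical values standing in for $content and $response.
my $content  = '<html><body>Weather, Entertainment</body></html>';
my $response = bless { _content => $content }, 'Fake::Response';

print describe_value($content),  "\n";
print describe_value($response), "\n";
```

If the value standing in for $content is reported as a plain string, functions such as index, substr, and length can be applied to it directly, with no conversion step.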
Mar 29 '07 #2
