Bytes | Software Development & Data Engineering Community

Trying to extract a string from HTTP::Request object

I'm trying to fetch the HTML of a web page as a string, and then pull particular pieces out of that string with the substr function. Here is the sample code I have so far:

    use HTTP::Request::Common;
    use LWP::UserAgent;
    use LWP::Simple;

    $ua = LWP::UserAgent->new;

    $request = HTTP::Request->new(GET => 'http://www.cnn.com');
    $response = $ua->request($request);
    $content = $response->content();

    my $result2 = substr $content, index($content, 'Headlines');

So, the variable $content seems to be an HTML object or something that is NOT a string. How can I convert $content to a string, so that I can use the substr function?

I have tried other methods including simpler code:

    my $content = get('http://securities.stanford.edu//1014/TCHC00');

however, I am not able to process $content as a string.

I have even tried writing the contents to a text file, but I am not able to extract a string from the text file either.

Any help is appreciated!
Mar 27 '07 #1
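I ran your program through the Perl debugger and confirmed that $response->content() definitely contains HTML. When I put the output into an editor, I found that the content does NOT contain "Headlines". index() returns -1 when it finds no match, and substr with an offset of -1 returns only the last character of the string, which is why the result looks wrong. Try another keyword, like "Weather".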
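
To see the effect of a missing keyword without going to the network, here is a minimal sketch. The HTML string is a made-up stand-in for the fetched page (not CNN's actual markup), and the variable names $kind, $pos and $tail are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a fetched page, so no network access is needed.
my $content = '<html><body>Weather, Entertainment</body></html>';

# ref() returns '' for a plain scalar and a class name for an object.
my $kind = ref($content) eq '' ? 'plain string' : ref($content);
print "content is a $kind\n";

# index() returns -1 when the substring is absent.
my $pos = index($content, 'Headlines');
print "position of 'Headlines': $pos\n";

# A negative offset counts from the end, so this yields only the last character.
my $tail = substr($content, $pos);
print "substr with that offset: '$tail'\n";
```

Checking the return value of index() before calling substr is the usual way to avoid this surprise.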

If you want to examine variables without the debugger, use this code (provided your Perl has the Dumpvalue module):
    use HTTP::Request::Common;
    use LWP::UserAgent;
    use LWP::Simple;
    use Dumpvalue;

    # Direct method call instead of the indirect "new Dumpvalue" form
    $dumper = Dumpvalue->new;

    $ua = LWP::UserAgent->new;

    $request = HTTP::Request->new(GET => 'http://www.cnn.com');
    $response = $ua->request($request);
    $content = $response->content();

    # Print the whole response object, headers and body included
    $dumper->dumpValue(\$response);

    my $result2 = substr $content, index($content, 'Headlines');

Then when you run it, save the output to a text file. On my Windows box, with ActiveState Perl, I used this:
    C:\cygwin\home\Rick\perl>perl getreq.pl > output.txt

In the Perl debugger, this is what I saw when I used 'Weather':

    main::(getreq.pl:15):   my $result2 = substr $content, index($content, 'Weather');
      DB<1>
    main::(getreq.pl:17):   print $result2;
      DB<1> print length($result2)
    105335
      DB<2> print substr $result2, 0, 20
    Weather, Entertainme

You're on the right track. Verify what each line does, and learn the Perl debugger so you can inspect your variables interactively.
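
Putting the pieces together, one possible shape for the script is sketched below. This is not the code from either post: the helper names slice_from and fetch_page are hypothetical, the 15-second timeout is an arbitrary choice, and using decoded_content() instead of content() is my own suggestion. The script only touches the network when a URL is given on the command line.

```perl
use strict;
use warnings;

# Return $text from the first occurrence of $keyword onward,
# or undef (in scalar context) when the keyword is absent.
sub slice_from {
    my ($text, $keyword) = @_;
    my $pos = index($text, $keyword);
    return if $pos < 0;    # index() returns -1 on no match
    return substr($text, $pos);
}

# Fetch $url and return the body as a string; die with the status line on failure.
sub fetch_page {
    my ($url) = @_;
    # Loaded lazily so slice_from can be tried even where LWP is not installed.
    require LWP::UserAgent;
    my $ua       = LWP::UserAgent->new(timeout => 15);
    my $response = $ua->get($url);
    die "Request failed: ", $response->status_line, "\n"
        unless $response->is_success;
    # decoded_content() undoes Content-Encoding and charset encoding.
    return $response->decoded_content;
}

# Only go to the network when a URL is passed on the command line.
if (@ARGV) {
    my ($url, $keyword) = @ARGV;
    $keyword = 'Weather' unless defined $keyword;
    my $content = fetch_page($url);
    my $result  = slice_from($content, $keyword);
    if (defined $result) {
        print "Found '$keyword'; ", length($result), " characters follow\n";
    }
    else {
        print "'$keyword' does not appear in the page\n";
    }
}
```

Assuming the file is saved as getreq.pl, it could be run as: perl getreq.pl http://www.cnn.com Weather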
Mar 29 '07 #2
