Trying to extract a string from HTTP::Request object

1 New Member
I'm trying to fetch the HTML of a website as a string, and then extract particular elements from that string using the substr function.
Here is the sample code I have so far:

    use HTTP::Request::Common;
    use LWP::UserAgent;
    use LWP::Simple;

    $ua = LWP::UserAgent->new;

    $request = HTTP::Request->new(GET => 'http://www.cnn.com');
    $response = $ua->request($request);
    $content = $response->content();

    my $result2 = substr $content, index($content, 'Headlines');

The variable $content seems to be an HTML object, or at any rate something that is NOT a string. How can I convert $content to a string so that I can use the substr function?

I have also tried other methods, including simpler code:

    my $content = get('http://securities.stanford.edu//1014/TCHC00');

However, I am still not able to process $content as a string.

I have even tried writing the contents to a text file, but I am not able to extract a string from the text file either.

Any help is appreciated!
Mar 27 '07 #1
rickumali
20 New Member
I ran your program through the Perl debugger and confirmed that $response->content() definitely contains HTML. When I put the output into an editor, I found that the content does NOT contain "Headlines". Try another keyword, like "Weather".
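
It may help to spell out what happens when the keyword is missing: index returns -1 when there is no match, and substr with an offset of -1 returns only the last character of the string, which can make it look as though $content is not a real string. A minimal sketch of a guard against that, assuming a made-up stand-in for the page content (the sample $content value and the extract_from helper are illustrative and not part of the original script):

```perl
use strict;
use warnings;

# Hypothetical helper: return everything from the first occurrence of
# $keyword onward, or undef if $keyword does not occur in $text.
sub extract_from {
    my ($text, $keyword) = @_;
    my $pos = index($text, $keyword);
    return if $pos < 0;    # index returns -1 when there is no match
    return substr($text, $pos);
}

# Stand-in for the fetched page; the real content would come from
# $response->content().
my $content = '<html><body>Weather, Entertainment, Sports</body></html>';

for my $keyword ('Headlines', 'Weather') {
    my $result = extract_from($content, $keyword);
    if (defined $result) {
        print "$keyword: found, ", length($result), " characters from match\n";
    }
    else {
        print "$keyword: not found in content\n";
    }
}
```

Checking the position before calling substr makes a missing keyword show up as an explicit "not found" instead of a one-character result.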

If you want to examine variables without the debugger, use this code (provided your Perl has the Dumpvalue module):
    use HTTP::Request::Common;
    use LWP::UserAgent;
    use LWP::Simple;
    use Dumpvalue;

    # Dumpvalue prints nested data structures in a readable form
    $dumper = Dumpvalue->new;

    $ua = LWP::UserAgent->new;

    $request = HTTP::Request->new(GET => 'http://www.cnn.com');
    $response = $ua->request($request);
    $content = $response->content();

    # Dump the whole response object, headers and content included
    $dumper->dumpValue(\$response);

    my $result2 = substr $content, index($content, 'Headlines');

Then when you run it, save the output to a text file. On my Windows box, with ActiveState Perl, I used this:
    C:\cygwin\home\Rick\perl>perl getreq.pl > output.txt

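On the question of getting a string back out of a text file: if the page content itself has been saved to a file, the whole file can be read into a single scalar and searched like any other string. A rough sketch, assuming a temporary file as a stand-in for the saved page (the read_file_as_string helper and the sample HTML are hypothetical):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one string ("slurp" mode).
sub read_file_as_string {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;    # undefine the record separator so <$fh> reads everything
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Demonstration with a temporary file standing in for the saved page.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "<html>\n<body>Weather, Entertainment</body>\n</html>\n";
close $out or die "Cannot close $path: $!";

my $text = read_file_as_string($path);
my $pos  = index($text, 'Weather');
print substr($text, $pos, 20), "\n" if $pos >= 0;
```

Reading line by line would also work, but a keyword search with index is simpler when the whole page is in one scalar.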
In the Perl debugger, this is what I saw when I used 'Weather':

    main::(getreq.pl:15):   my $result2 = substr $content, index($content, 'Weather');
      DB<1>
    main::(getreq.pl:17):   print $result2;
      DB<1> print length($result2)
    105335
      DB<2> print substr $result2, 0, 20
    Weather, Entertainme

You're on the right track. Prove what each line does, and learn the Perl debugger so you can explore your program interactively.
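
One way to "prove what each line does" without the debugger is to check the response status and confirm that the content really is a plain string. The sketch below assumes a hand-built HTTP::Response in place of the live request, so it runs without network access; the body_as_string helper and the sample body are hypothetical:

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: return the body of a response as a string,
# or die with the status line if the request did not succeed.
sub body_as_string {
    my ($response) = @_;
    die 'Request failed: ' . $response->status_line . "\n"
        unless $response->is_success;
    return $response->content;
}

# A hand-built response stands in for $ua->request($request).
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html' ],
    '<html><body>Weather, Entertainment</body></html>',
);

my $content = body_as_string($response);

# ref() returns an empty string for a plain scalar, so an empty result
# here means $content is an ordinary string and not an object.
print 'ref($content) is "', ref($content), '"', "\n";
print 'length is ', length($content), "\n";
```

If ref($content) prints as empty and the length is non-zero, substr and index can be used on $content directly; no conversion step is needed.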
Mar 29 '07 #2
