Former Perl users ? : Spidering/Scraping in PHP

Would I be missing much if I stopped trying to learn Perl well enough to use for
spidering, screen scraping etc. and converted over to PHP? I am looking to do
all, or at least most, of the hacks described in the books "Spidering Hacks" and
"Perl & LWP". I am familiar with the book "Webbots, Spiders, and Screen Scrapers: A
Guide to Developing Internet Agents with PHP/CURL". Would anyone know of any other
sources of info related to this kind of thing?

TIA
Roger H.

The sender address of this message is not related to a real person
but to a fake address of an anonymous system.
For more info: https://www.mixmaster.it

Jun 27 '08 #1
George Orwell wrote:
> Would I be missing much if I stopped trying to learn Perl well enough to use for
> spidering, screen scraping etc. and converted over to PHP? I am looking to do
> all, or at least most, of the hacks described in the books "Spidering Hacks" and
> "Perl & LWP". I am familiar with the book "Webbots, Spiders, and Screen Scrapers: A
> Guide to Developing Internet Agents with PHP/CURL". Would anyone know of any other
> sources of info related to this kind of thing?
>
> TIA
> Roger H.

Hi Roger,

I am quite sure you can do whatever you want to do using PHP when it
comes to spidering/agents.

If you are familiar with Curl, you are on the right track.
I expect you can translate any Perl code to PHP if you understand both
languages.
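
For what it's worth, a bare-bones page fetch with PHP's cURL extension might look
something like the sketch below. The function name fetch_page(), the user-agent
string and the timeout values are placeholders I made up for illustration; they
do not come from any of the books mentioned.

```php
<?php
// Minimal sketch: fetch one page with the cURL extension.
// fetch_page() and the default user-agent are hypothetical, chosen for illustration.
function fetch_page(string $url, string $userAgent = 'ExampleSpider/0.1'): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 10,     // seconds
        CURLOPT_TIMEOUT        => 30,     // seconds
        CURLOPT_USERAGENT      => $userAgent,
    ]);
    $body = curl_exec($ch);               // string on success, false on failure
    // The handle is released automatically when $ch goes out of scope.
    return $body === false ? null : $body;
}
```

You would call it as fetch_page('http://example.com/') and check for null before
parsing the result.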

I do not know of any dedicated sources on the subject, but googling
around will surely give some relevant hits.

Some pointers (maybe you know this already):
REGEX:
If you need regular expressions, and I expect you will be using them, be
sure to use the PCRE (Perl Compatible Regular Expressions) flavor in PHP.
The function names all start with preg_.

The functions can be found here:
http://nl3.php.net/manual/en/ref.pcre.php

Using preg will save you a headache when using Perl-style regexes.
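
As a rough illustration of the preg_ functions, here is a small sketch that pulls
href values out of an HTML fragment with preg_match_all(). The helper name
extract_links() and the sample HTML are my own inventions, and a regex like this
is only good for quick hacks; it is not a real HTML parser.

```php
<?php
// Sketch: collect href values from anchor tags using a Perl-style pattern.
// extract_links() is a hypothetical helper name used only for this example.
function extract_links(string $html): array
{
    // ~...~ are the delimiters; i = case-insensitive, s = dot matches newlines.
    // Group 1 captures the quote character, group 2 the URL, \1 the closing quote.
    $pattern = '~<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1~is';
    if (preg_match_all($pattern, $html, $matches) === false) {
        return [];                        // pattern failed to run
    }
    return $matches[2];                   // all captured URLs, in document order
}

$html = '<p><a href="http://example.com/a">A</a> <A HREF=\'/b.html\'>B</A></p>';
print_r(extract_links($html));
```

The pattern syntax and modifiers are the same ones you would write in Perl, which
is the main reason to prefer preg_ when porting Perl code.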

Database:
If you use a database to store the information, I think your best bet is
using PDO to interface with the database of your choice, be it MySQL,
PostgreSQL, etc. (I am a fan of PostgreSQL.)

A lot of info can be found here:
http://nl3.php.net/manual/en/book.pdo.php
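
To give an idea of what PDO code looks like, here is a minimal sketch that stores
scraped pages using a prepared statement. To keep it runnable without a database
server I assumed an in-memory SQLite DSN (this needs the pdo_sqlite driver); the
table name "pages" and the helpers open_store() and save_page() are hypothetical.

```php
<?php
// Sketch: store scraped results through PDO with a prepared statement.
// Assumptions: SQLite in-memory DSN, hypothetical table "pages" and helper names.
function open_store(string $dsn = 'sqlite::memory:'): PDO
{
    $pdo = new PDO($dsn);
    // Throw exceptions on errors instead of failing silently.
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE IF NOT EXISTS pages (url VARCHAR(255) PRIMARY KEY, title VARCHAR(255))');
    return $pdo;
}

function save_page(PDO $pdo, string $url, string $title): void
{
    // Named placeholders keep scraped text from being interpreted as SQL.
    $stmt = $pdo->prepare('INSERT INTO pages (url, title) VALUES (:url, :title)');
    $stmt->execute([':url' => $url, ':title' => $title]);
}

$pdo = open_store();
save_page($pdo, 'http://example.com/', 'Example Domain');
foreach ($pdo->query('SELECT url, title FROM pages') as $row) {
    echo $row['url'], ' => ', $row['title'], "\n";
}
```

For MySQL or PostgreSQL the DSN would start with mysql: or pgsql: instead, and you
would pass a username and password as extra arguments to the PDO constructor; the
prepare/execute calls stay the same.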

Hope this helps a little.
Good luck.

Regards,
Erwin Moller
Jun 27 '08 #2
