Fetch the content of a given URL in Perl

Objective

To fetch the content located at a given URL in Perl

Methods

Overview

There are at least four different methods for fetching the content of a URL in Perl: LWP::UserAgent, LWP::Simple, WWW::Curl and IO::All.

Of these, LWP::UserAgent would be the author’s recommendation for general use. It supports a wide range of features, but these can mostly be ignored if you do not use them, and its API is not excessively complicated when handling simple cases.

Using LWP::UserAgent

Features of LWP::UserAgent include:

Useful variants of LWP::UserAgent include LWP::RobotUA (a user agent with built-in support for robots.txt) and WWW::Mechanize (for stateful navigation of a web site, with the ability to follow links and complete forms).

See Fetch the content of a given URL in Perl using LWP::UserAgent for further details.
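
As a rough illustration of the simple case, the following is a minimal sketch rather than a definitive implementation. The helper name fetch_url and the ten-second timeout are assumptions made for this example; the LWP::UserAgent calls (new, get, is_success, status_line and decoded_content) are part of the module's documented interface.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: returns the decoded body of $url, or dies with the
# status line if the request does not succeed.
sub fetch_url {
    my ($url) = @_;

    # The timeout is an arbitrary choice for this sketch.
    my $ua = LWP::UserAgent->new(timeout => 10);
    my $response = $ua->get($url);

    die "Failed to fetch $url: " . $response->status_line . "\n"
        unless $response->is_success;

    # decoded_content undoes any content encoding and, for text,
    # decodes the character set.
    return $response->decoded_content;
}
```

A caller might then write print fetch_url('http://www.example.com/'); where the URL is only a placeholder.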

Using LWP::Simple

LWP::Simple provides a simplified interface to LWP::UserAgent. Unfortunately it is rather too simple for many purposes, and if you do hit one of its limitations then it is usually necessary to start again with a different module. For this reason LWP::Simple is probably best avoided when writing non-trivial programs, but for simple throw-away scripts its brevity may compensate for any shortcomings.

See Fetch the content of a given URL in Perl using LWP::Simple for further details.
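
To give a feel for both the brevity and the limitation, here is a minimal sketch using the get function exported by LWP::Simple. The helper name fetch_url_simple is an assumption made for this example. Note that get returns undef on any failure, so the caller cannot tell why the request failed.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper: returns the body of $url, or dies if the fetch fails.
sub fetch_url_simple {
    my ($url) = @_;

    # get() returns the content on success and undef on failure; no status
    # code or error message is available through this interface.
    my $content = get($url);
    die "Failed to fetch $url\n" unless defined $content;

    return $content;
}
```

In a throw-away script the wrapper could be dropped entirely, leaving little more than a single call to get.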

Using WWW::Curl

WWW::Curl provides a Perl binding to libcurl, a widely used file transfer library that can be used from many different programming languages. It presents two separate interfaces, WWW::Curl::Easy and WWW::Curl::Multi, but both are more complex to use than LWP::UserAgent.

The functionality provided by WWW::Curl is generally narrower but deeper than that of LWP::UserAgent. For example, it supports a wider range of URL schemes (21 according to the libcurl website), but provides nothing comparable to LWP::RobotUA or WWW::Mechanize.
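
For comparison, the following sketch performs a basic fetch through WWW::Curl::Easy. It assumes a reasonably recent WWW::Curl (the 4.x series), in which CURLOPT_WRITEDATA accepts a reference to a scalar; the helper name fetch_with_curl is hypothetical. It is meant only to show the general shape of the interface.

```perl
use strict;
use warnings;
use WWW::Curl::Easy;

# Hypothetical helper: returns the body of $url, or dies if the transfer fails.
sub fetch_with_curl {
    my ($url) = @_;

    my $curl = WWW::Curl::Easy->new;
    my $body = '';

    $curl->setopt(CURLOPT_URL, $url);
    # Assumption: WWW::Curl 4.x, where a scalar reference may be given here
    # and the response body is appended to it.
    $curl->setopt(CURLOPT_WRITEDATA, \$body);

    # perform() returns 0 on success and a libcurl error code otherwise.
    my $retcode = $curl->perform;
    die "Transfer failed: " . $curl->strerror($retcode) . "\n"
        unless $retcode == 0;

    # By default libcurl does not treat an HTTP error status (such as 404)
    # as a failed transfer; $curl->getinfo(CURLINFO_HTTP_CODE) can be
    # inspected if that distinction matters.
    return $body;
}
```

Even in this small case there are more steps than with LWP::UserAgent: options are set one at a time, and the destination for the response body has to be supplied explicitly.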

Using IO::All

IO::All is a unifying framework for performing many different types of input/output through a common interface. In addition to files and URLs it can be used to interact with entities such as strings, sockets and processes. This is both a strength and a weakness.

For some types of program the ability to use URLs in the same manner as pathnames can be a useful convenience, and IO::All allows this to be provided without adding any significant complexity to a program. However, this functionality can be dangerous if the URL came from an untrusted source, because a string that was expected to be a URL could equally refer to a local file. IO::All is therefore usually best avoided in security-sensitive applications such as CGI scripts.
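
The sketch below shows one way the convenience and the risk might both be handled. It assumes that the separate IO::All::LWP distribution is installed so that io() can open http and https URLs, and the helper name fetch_web_url is hypothetical. The scheme check is a simple precaution for illustration, not a complete defence against hostile input.

```perl
use strict;
use warnings;
use IO::All;

# Hypothetical helper: fetch $location only if it looks like an http or
# https URL, and refuse anything else (for example a local pathname).
sub fetch_web_url {
    my ($location) = @_;

    die "Refusing to open '$location': not an http(s) URL\n"
        unless $location =~ m{\Ahttps?://}i;

    # Assumption: IO::All::LWP is installed, which lets io() treat the
    # URL like any other readable object.
    my $content = io($location)->all;
    return $content;
}
```

Without the check, the same call to io() would read a local file if it were handed a pathname instead of a URL.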
