Fetch the content of a given URL in Perl
To fetch the content located at a given URL in Perl
There are at least four different methods for fetching the content of a URL in Perl:
- LWP::UserAgent (or one of its derivatives);
- LWP::Simple;
- WWW::Curl; and
- IO::All.
LWP::UserAgent is the author’s recommendation for general use. It supports a wide range of features, but these can mostly be ignored if you do not need them, and its API is not excessively complicated for simple cases. Notable features include:
- support for the
- HTTP authentication (including Simple, Digest and Negotiate);
- sending and accepting cookies (either stored in memory or written to disc);
- use of a proxy server; and
- access to inbound and outbound HTTP headers.
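Several of the features listed above can be combined in a few lines. The following is a minimal sketch rather than a definitive implementation: the helper name fetch_response, the User-Agent string and the 30 second timeout are assumptions made for illustration, and the cookie jar is held in memory only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent;
use HTTP::Cookies;

# fetch_response is a hypothetical helper (not part of LWP).
# It returns the HTTP::Response object on success and dies otherwise.
sub fetch_response {
    my ($url) = @_;

    my $ua = LWP::UserAgent->new(
        agent      => 'example-fetcher/0.1',  # assumed User-Agent string
        timeout    => 30,                     # seconds (assumed value)
        cookie_jar => HTTP::Cookies->new,     # cookies kept in memory
    );
    $ua->env_proxy;  # honour http_proxy and similar environment variables

    # Extra arguments to get() are sent as outbound request headers.
    my $response = $ua->get($url, 'Accept' => 'text/html, */*;q=0.5');
    die "Cannot fetch $url: " . $response->status_line . "\n"
        unless $response->is_success;

    return $response;
}

# Only run when a URL is given on the command line.
if (@ARGV) {
    my $response = fetch_response($ARGV[0]);
    # Inbound headers are read from the response object.
    print STDERR 'Content-Type: ',
        $response->header('Content-Type') // 'unknown', "\n";
    print $response->decoded_content;
}
```

Returning the response object rather than a string keeps the status line and headers available to the caller, which is the main practical difference from the simpler interfaces.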
Useful variants of LWP::UserAgent include:
- LWP::RobotUA (a user agent with built-in support for robots.txt); and
- WWW::Mechanize (for stateful navigation of a web site, with the ability to follow links and complete forms).
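As an illustration of stateful navigation, the sketch below uses WWW::Mechanize to fetch a page and then follow a link found on it. The helper name follow_link_text and its arguments are hypothetical. It assumes that the starting page is HTML and contains a link whose text matches the given pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use WWW::Mechanize;

# follow_link_text is a hypothetical helper: fetch $start_url, follow the
# first link whose text matches $pattern, and return the resulting page.
sub follow_link_text {
    my ($start_url, $pattern) = @_;

    # With autocheck enabled, failed requests and missing links are fatal.
    my $mech = WWW::Mechanize->new(autocheck => 1);

    $mech->get($start_url);
    $mech->follow_link(text_regex => $pattern);

    return $mech->content;
}

# Usage: script.pl START_URL LINK_TEXT
print follow_link_text($ARGV[0], qr/\Q$ARGV[1]\E/i) if @ARGV == 2;
```

Forms can be completed in a similar manner using the submit_form method of the same object.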
See Fetch the content of a given URL in Perl using LWP::UserAgent for further details.
LWP::Simple provides a simplified interface to LWP::UserAgent. Unfortunately it is rather too simple for many purposes, and if you do hit one of its limitations then it is usually necessary to start again with a different module. For this reason LWP::Simple is probably best avoided when writing non-trivial programs, but for simple throw-away scripts its brevity may compensate for any shortcomings.
See Fetch the content of a given URL in Perl using LWP::Simple for further details.
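The brevity of LWP::Simple, and one of its limitations, can be seen in the following minimal sketch. The helper name fetch_or_die is hypothetical. Note that get returns undef on failure without saying why, so the error message cannot include a status code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple qw(get);

# fetch_or_die is a hypothetical wrapper around LWP::Simple::get.
sub fetch_or_die {
    my ($url) = @_;

    # get() returns the content, or undef if the request failed
    # for any reason (no status line or headers are available).
    my $content = get($url);
    die "Cannot fetch $url\n" unless defined $content;

    return $content;
}

# Only run when a URL is given on the command line.
print fetch_or_die($ARGV[0]) if @ARGV;
```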
WWW::Curl provides a Perl binding to libcurl, a widely used file transfer library that can be used from many different programming languages. It presents two separate interfaces, WWW::Curl::Easy and WWW::Curl::Multi, but both are more complex to use than LWP::UserAgent.
The functionality provided by WWW::Curl is generally narrower but deeper than that of LWP::UserAgent. For example, it supports a wider range of URL schemes (21 according to the libcurl website), but provides nothing comparable to
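For comparison with the other modules, here is a minimal sketch of a fetch using the WWW::Curl::Easy interface. The helper name curl_fetch is hypothetical, and the choice of options (following redirects, and treating HTTP error statuses as failures) is an assumption about what a typical caller would want.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use WWW::Curl::Easy;

# curl_fetch is a hypothetical helper built on WWW::Curl::Easy.
sub curl_fetch {
    my ($url) = @_;

    my $curl = WWW::Curl::Easy->new;
    my $body = '';

    $curl->setopt(CURLOPT_URL, $url);
    $curl->setopt(CURLOPT_FOLLOWLOCATION, 1);  # follow HTTP redirects
    $curl->setopt(CURLOPT_FAILONERROR, 1);     # HTTP status >= 400 is an error
    $curl->setopt(CURLOPT_WRITEDATA, \$body);  # collect the body in $body

    # perform() returns 0 on success, or a libcurl error code.
    my $retcode = $curl->perform;
    die "Cannot fetch $url: " . $curl->strerror($retcode) . "\n"
        if $retcode != 0;

    return $body;
}

# Only run when a URL is given on the command line.
print curl_fetch($ARGV[0]) if @ARGV;
```

Options are set one at a time through setopt, using constants named after their libcurl equivalents, which is where most of the extra complexity relative to LWP::UserAgent arises.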
IO::All is a unifying framework for performing many different types of input/output through a common interface. In addition to files and URLs it can be used to interact with entities such as strings, sockets and processes. This is both a strength and a weakness.
For some types of program the ability to use URLs in the same manner as pathnames can be a useful convenience, and IO::All allows this to be provided without adding any significant complexity to a program. However, this functionality can be dangerous if the URL came from an untrusted source, so IO::All is usually best avoided in security-sensitive applications such as CGI scripts.
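Both the convenience and the risk can be sketched as follows. The helper names and the list of permitted schemes are hypothetical. Reading http or https URLs in this way assumes that the IO::All::LWP plugin is installed, and the scheme check (which uses the URI module) is only one possible precaution, not a complete defence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use IO::All;
use URI;

# Hypothetical allow-list of URL schemes for untrusted input.
my %ALLOWED_SCHEME = map { $_ => 1 } qw(http https);

# Return 1 if $location is a URL with a permitted scheme, otherwise 0.
# Plain pathnames have no scheme and are therefore rejected.
sub is_permitted_url {
    my ($location) = @_;
    my $scheme = URI->new($location)->scheme;
    return defined $scheme && exists $ALLOWED_SCHEME{$scheme} ? 1 : 0;
}

# Convenient but unchecked: io() treats pathnames and URLs alike, so a
# caller who controls $location could read local files.
sub read_trusted {
    my ($location) = @_;
    return io($location)->all;
}

# The same operation, refusing anything that is not on the allow-list.
sub read_untrusted {
    my ($location) = @_;
    die "Refusing to read $location\n" unless is_permitted_url($location);
    return io($location)->all;  # URL support assumes IO::All::LWP
}

# Only run when a location is given on the command line.
print read_untrusted($ARGV[0]) if @ARGV;
```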