Configure Apache as a reverse proxy
Tested on
Debian (Lenny, Squeeze)
Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Oneiric, Precise, Quantal)
Objective
To configure an Apache HTTP server as a reverse proxy, forwarding requests for a given set of URLs to another server.
Background
A proxy server is one which forwards client requests to another server instead of fulfilling them itself. There are two main types:
- A forward proxy forwards to an arbitrary destination, typically on behalf of a particular set of clients.
- A reverse proxy forwards to a fixed destination, typically on behalf of arbitrary clients.
Common uses of a reverse HTTP proxy include:
- incorporating content hosted by one server into a larger website;
- making content hosted on a private network accessible from the public Internet; or
- adding features such as authentication or encryption to an existing web site that would not otherwise be able to support them.
Scenario
Suppose you have a web server attached to a private network which hosts the web site http://internal.example.com/. Most of the content of this site is intended to stay within the private network, but there is one part of it rooted at the URL http://internal.example.com/public which you want to be accessible from the outside world.
To achieve this you are able to make use of the web server that hosts your public web site at http://www.example.com/. It is located in a DMZ with access both to your private network and from the public Internet. You intend to use this public server as a proxy, incorporating content from the private server into the existing public web site. The public URL of the proxied content should be http://www.example.com/internal.
Method
Overview
The method described here has four steps:
- Enable the mod_proxy and mod_proxy_http Apache modules.
- Use the ProxyPass directive to map the required local path to the corresponding remote URL.
- Optionally, use the ProxyPassReverse directive to rewrite URLs in HTTP headers.
- Ensure that the proxy server allows access to the proxied content.
Note that the ProxyRequests directive is required only when configuring a forward proxy and should not normally be used for a reverse proxy.
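Taken together, the steps might produce a configuration along the following lines. This is a minimal sketch, assuming the modules are already enabled and that the public site is served by a name-based virtual host on port 80. The VirtualHost wrapper and the ServerName value are assumptions made for illustration and are not part of the scenario above.

```apache
# Sketch only: the VirtualHost wrapper and port are assumptions.
<VirtualHost *:80>
    ServerName www.example.com

    # Forward proxying is not wanted for a reverse proxy.
    # Off is the default; it is shown here only for emphasis.
    ProxyRequests Off

    # Map the local path to the remote URL.
    ProxyPass /internal http://internal.example.com/public

    # Rewrite URLs in redirect headers returned by the upstream server.
    ProxyPassReverse /internal http://internal.example.com/public

    # Access control for the proxied content depends on the Apache
    # version in use and is deliberately omitted from this sketch.
</VirtualHost>
```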
Enable the mod_proxy and mod_proxy_http Apache modules
Two separate modules are needed for Apache to act as a reverse HTTP proxy:
- mod_proxy to support aspects of proxying that are not specific to any particular protocol, and
- mod_proxy_http to support aspects that are specific to HTTP.
These are standard Apache modules, and on the systems tested no separate action is needed to install them, but they are not usually loaded by default. On Debian-based systems you can arrange for this to happen using the a2enmod command:
a2enmod proxy_http
service apache2 restart
There is no need to separately enable mod_proxy because a2enmod knows that it is a prerequisite of mod_proxy_http. Note that module names passed to a2enmod should have the mod_ prefix removed. On older systems you may need to replace service apache2 with /etc/init.d/apache2.
If for any reason you need to load the modules manually then the appropriate configuration directives are:
LoadModule proxy_module /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so
(where the pathnames of the modules should be replaced with whatever is appropriate for your system).
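If you do load the modules by hand on a system where another configuration file might also load them, one option is to guard each directive with a negated IfModule test so that a module is only loaded if it is not already present. This is a sketch rather than a requirement; the pathnames are the same placeholders as above.

```apache
# Load mod_proxy only if nothing else has loaded it already.
<IfModule !proxy_module>
    LoadModule proxy_module /usr/lib/apache2/modules/mod_proxy.so
</IfModule>

# Likewise for the HTTP-specific part.
<IfModule !proxy_http_module>
    LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so
</IfModule>
```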
Use the ProxyPass directive to map the required local path to the corresponding remote URL
The ProxyPass directive creates a mapping from a path within the local web site to a given remote URL. In this instance the local path is /internal and it should be mapped to http://internal.example.com/public. There are two ways in which this can be expressed. ProxyPass has a two-argument form, where both local path and remote URL are specified explicitly:
ProxyPass /internal http://internal.example.com/public
It also has a single-argument form, where the local path is implied by the context:
<Location /internal>
  ProxyPass http://internal.example.com/public
</Location>
The effect is the same either way. The mapping applies to any location within the subtree rooted at the given path, so (for example) the configuration above would cause /internal/logo.png to be mapped to http://internal.example.com/public/logo.png. Trailing slash characters are significant:
- If the slash is included then only the content of the specified directory is mapped, not the directory itself.
- Without the slash, both directory and content are mapped.
For this reason, the path and the URL should either both end with a slash or both end without one.
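The effect of mismatched slashes can be seen by following how the remainder of the request path is appended to the remote URL. The sketch below shows a consistent pairing, with the two mismatched variants left as comments. The resulting URLs in the comments are what the prefix substitution would be expected to produce, not captured output.

```apache
# Consistent: both arguments end with a slash.
# /internal/logo.png -> http://internal.example.com/public/logo.png
ProxyPass /internal/ http://internal.example.com/public/

# Mismatched (avoid): slash on the path only.
# /internal/logo.png -> http://internal.example.com/publiclogo.png
# ProxyPass /internal/ http://internal.example.com/public

# Mismatched (avoid): slash on the URL only.
# /internal/logo.png -> http://internal.example.com/public//logo.png
# ProxyPass /internal http://internal.example.com/public/
```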
Optionally, use the ProxyPassReverse directive to rewrite URLs in HTTP headers
If the proxied web site uses HTTP redirections then you will normally want to rewrite the returned URLs to refer to the proxy server in place of the upstream server. For example, the response:
HTTP/1.1 301 Moved Permanently
Location: http://internal.example.com/public/status.cgi
[...]
should become:
HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/internal/status.cgi
[...]
This can be achieved using the ProxyPassReverse directive, which rewrites any Location, Content-Location or URI headers that it sees in the response from the upstream server:
ProxyPassReverse /internal http://internal.example.com/public
Note that ProxyPassReverse does not itself cause the path to be proxied. For this reason it should be used in addition to ProxyPass (or ProxyPassMatch or RewriteRule), not as a replacement.
If there is a single ProxyPass directive then you will typically want a matching ProxyPassReverse directive with the same arguments. If there are several then you may be able to obtain the necessary coverage using a single ProxyPassReverse directive further up the path hierarchy, for example:
ProxyPass /internal/foo http://internal.example.com/public/foo
ProxyPass /internal/bar http://internal.example.com/public/bar
ProxyPass /internal/baz http://internal.example.com/public/baz
ProxyPassReverse /internal http://internal.example.com/public
It is generally good practice to include ProxyPassReverse even if the upstream site does not use HTTP redirection, because there is little harm in doing so and it may prevent future breakage. An exception would be if you have arranged for the proxy to respond to the same absolute URLs as the upstream server (as described below). Be aware that the original HTTP/1.1 specification (RFC 2616) does not allow redirection to non-absolute URLs, although its successor (RFC 7231) permits relative references in the Location header.
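Like ProxyPass, ProxyPassReverse can be written inside a Location section, in which case the local path is taken from the enclosing section and only the remote URL is given. A minimal sketch of the scenario in that form, assuming you prefer to keep both directives together in one block:

```apache
<Location /internal>
    # Forward requests under /internal to the upstream server.
    ProxyPass http://internal.example.com/public

    # Rewrite redirect headers from the upstream server so that
    # they refer to /internal on the proxy instead.
    ProxyPassReverse http://internal.example.com/public
</Location>
```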
Ensure that the proxy server allows access to the proxied content
Some versions of Apache deny all access by default, relying on the configuration file to explicitly specify what should be accessible and to whom. If you are using one of these versions then you may need to add a <Proxy> or <Location> section to cover the proxied content, for example:
<Proxy *>
  Order allow,deny
  Allow from all
</Proxy>
<Directory> sections are not sufficient here because proxied content does not result in a filesystem access. Note that the Order, Allow and Deny directives are deprecated as of Apache 2.4 (although they remain for the purpose of backward compatibility), so you should not make this addition without first considering whether it is appropriate.
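On Apache 2.4 the same intent is normally expressed with the Require directive. The following is a sketch of what the equivalent might look like, together with a commented-out variant that limits access to one address range. The range 192.0.2.0/24 is a documentation placeholder and not part of the scenario.

```apache
# Apache 2.4 syntax: allow any client to use the proxied content.
<Proxy *>
    Require all granted
</Proxy>

# Alternative (assumption): restrict the proxied path to one network.
# <Location /internal>
#     Require ip 192.0.2.0/24
# </Location>
```

As with the older syntax, check whether your existing configuration already grants the necessary access before adding either form.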
Variations
Proxying incomplete subtrees
It may be the case that the set of paths you want to proxy does not form a complete subtree. There are three ways in which this can be handled:
- by proxying a number of smaller subtrees that contain only the material you want to allow;
- by proxying everything, then adding rules to exclude paths that you do not want to allow; or
- by using the ProxyPassMatch directive to specify exactly what you want to allow in the form of a regular expression.
One consideration when choosing between these options is the complexity of the resulting configuration file. However, you should also consider what you would want to happen if new paths were added to the remote web site without updating the proxy server: whether you would want them proxied by default or not.
Excluding paths
Paths can be excluded by replacing the remote URL with an exclamation mark. For example, you could exclude the path /internal/excluded while continuing to provide access to the remainder of /internal with the following pair of ProxyPass directives:
ProxyPass /internal/excluded !
ProxyPass /internal http://internal.example.com/public
The order of these directives is important, since Apache acts on the first matching ProxyPass (or ProxyPassMatch) directive that it encounters.
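The same ordering rule extends to several exclusions: each one needs to appear before the general mapping that would otherwise match it. In the sketch below, /internal/private is a hypothetical second path added purely for illustration.

```apache
# Exclusions first, so that they are matched before the general rule.
ProxyPass /internal/excluded !
ProxyPass /internal/private !

# General mapping last: everything else under /internal is proxied.
ProxyPass /internal http://internal.example.com/public

# If the general mapping were listed first, it would match requests
# for /internal/excluded before the exclusions were ever consulted.
```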
Including or excluding paths using a regular expression
ProxyPassMatch allows for selective proxying using a regular expression (as opposed to a path prefix) to decide what is included or excluded. This is useful when you want to use rules that do not relate to the path hierarchy. For example, if you wanted to proxy JPEG and PNG files but nothing else then you could do so with the directive:
ProxyPassMatch ^/internal/(.*[.](jpeg|png))$ http://internal.example.com/public/$1
(The anchors ^ and $ require that the pattern match the whole path, not a substring. It will match if the path begins with the prefix /internal/ and ends with the suffix .jpeg or .png. The portion following the prefix is captured as $1 for use in the remote URL.)
As in the case of ProxyPass, you can supply an exclamation mark in place of the remote URL to prevent the selected paths from being proxied:
ProxyPassMatch ^/internal/.*[.](jpeg|png)$ !
ProxyPass /internal http://internal.example.com/public
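When a pattern contains nested groups it can be easier to keep track of backreferences if only the part you intend to reuse is a capturing group. The sketch below is a variation on the inclusion example that uses a non-capturing group for the extension. Accepting .jpg as well as .jpeg is an assumption about what you might want, not something the scenario requires.

```apache
# (?: ... ) groups the alternatives without capturing, so $1 is
# unambiguously the portion of the path that follows /internal/.
ProxyPassMatch ^/internal/(.*[.](?:jpe?g|png))$ http://internal.example.com/public/$1

# ProxyPassMatch does not rewrite redirect headers by itself, so a
# prefix-based ProxyPassReverse is still worth considering.
ProxyPassReverse /internal http://internal.example.com/public
```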
Rewriting URLs in hyperlinks
Rewriting of URLs is best avoided if you can because of the amount of extra processing it involves, and because it is rather more likely to go wrong than simply forwarding the content verbatim. Possible methods of avoidance include:
- using relative URLs instead of absolute ones, or
- arranging for the proxy server to respond to the same absolute URLs as the upstream server (see below).
(Fully relative URLs are the most robust, although they can break if the website is proxied in a way that changes its internal structure. Site-relative URLs are sufficient if only the hostname and/or port number have changed.)
If rewriting is necessary then it can be implemented using the mod_proxy_html module. This was originally a third-party module, but it became part of the Apache HTTP Server as of version 2.4. The required configuration directives for the scenario described above would be:
ProxyHTMLEnable On
ProxyHTMLURLMap http://internal.example.com/public /internal
Note that the arguments are passed in the opposite order to ProxyPassReverse, but otherwise similar considerations apply.
The default behaviour is relatively cautious: URLs are rewritten only within HTML and XHTML documents, and then only for particular attributes of particular element types. Coverage of embedded CSS and JavaScript can be enabled using the ProxyHTMLExtended directive, but this introduces a significant risk of false positives and you need to supply rules that are carefully matched to the website you want to proxy.
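In context, the rewriting directives would sit alongside the proxy mapping itself. The following is a sketch under stated assumptions: Apache 2.4 with mod_proxy_html loaded, and mod_headers loaded so that the proxy can ask the upstream server for uncompressed responses (the filter cannot parse compressed HTML). The RequestHeader line is an addition for illustration and may be unnecessary if the upstream server never compresses its output.

```apache
<Location /internal>
    ProxyPass http://internal.example.com/public
    ProxyPassReverse http://internal.example.com/public

    # Rewrite matching URLs inside HTML and XHTML responses.
    ProxyHTMLEnable On
    ProxyHTMLURLMap http://internal.example.com/public /internal

    # Assumption: mod_headers is available. Removing Accept-Encoding
    # from the forwarded request asks the upstream server not to
    # compress responses.
    RequestHeader unset Accept-Encoding
</Location>
```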
Responding to the same absolute URLs as the upstream server
Most of the complications associated with proxying arise from changing the URL at which the relevant content is located. If this could be avoided then there would be no need to make compensatory changes at the proxy server. The following method achieves this:
- Configure both the upstream server and the proxy server to respond to HTTP requests directed to the same virtual hostname.
- Ensure that end users are directed to the proxy server when they resolve the hostname.
- Ensure that the proxy server is directed to the upstream server when it resolves the hostname.
If the proxy is the only machine needing access to the upstream server then one option would be for the public DNS to refer to the proxy, but for this to be overridden in /etc/hosts on the proxy server. Alternatively, if you want clients on the private network to access the upstream server directly then that can be arranged using split-horizon DNS (where the IP address corresponding to the website hostname varies according to where it is resolved from).
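A sketch of how the proxy side of this arrangement might look is given below. It assumes that the upstream server has been reconfigured to answer for www.example.com and to serve the content under the same path (/internal) as the proxy, and that the hostname resolves to the upstream server when looked up on the proxy. The address 192.0.2.10 is a documentation placeholder.

```apache
# Assumption: /etc/hosts on the proxy server contains a line such as
#   192.0.2.10  www.example.com
# so that, from here, the hostname resolves to the upstream server.

<VirtualHost *:80>
    ServerName www.example.com

    # Local path and remote URL are identical apart from where the
    # hostname resolves, so absolute URLs in the content and in
    # redirects remain valid without rewriting.
    ProxyPass /internal http://www.example.com/internal
</VirtualHost>
```

Because the URLs seen by clients match those used by the upstream server, neither ProxyPassReverse nor mod_proxy_html should be needed in this variation.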
See also
Further reading
- mod_proxy, Apache HTTP Server Version 2.4 (documentation)
- mod_proxy_html, Apache HTTP Server Version 2.4 (documentation)
- Nick Kew, Running a Reverse Proxy in Apache, Apache Tutor