Configure Apache as a reverse proxy
Tested on:
- Debian (Lenny, Squeeze)
- Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Oneiric, Precise, Quantal)
The objective is to configure an Apache HTTP server as a reverse proxy, forwarding requests for a given set of URLs to another server.
A proxy server is one which forwards client requests to another server instead of fulfilling them itself. There are two main types:
- A forward proxy forwards to an arbitrary destination, typically on behalf of a particular set of clients.
- A reverse proxy forwards to a fixed destination, typically on behalf of arbitrary clients.
Common uses of a reverse HTTP proxy include:
- incorporating content hosted by one server into a larger website;
- making content hosted on a private network accessible from the public Internet; or
- adding features such as authentication or encryption to an existing web site that would not otherwise be able to support them.
Suppose you have a web server attached to a private network which hosts the web site http://internal.example.com/. Most of the content of this site is intended to stay within the private network, but there is one part of it, rooted at the URL http://internal.example.com/public, which you want to be accessible from the outside world.
To achieve this you can make use of the web server that hosts your public web site at http://www.example.com/. It is located in a DMZ with access both to your private network and from the public Internet. You intend to use this public server as a proxy, incorporating content from the private server into the existing public web site. The public URL of the proxied content should be http://www.example.com/internal.
The method described here has four steps:
- Enable the mod_proxy and mod_proxy_http modules.
- Use the ProxyPass directive to map the required local path to the corresponding remote URL.
- Optionally, use the ProxyPassReverse directive to rewrite URLs in HTTP headers.
- Ensure that the proxy server allows access to the proxied content.
Note that the ProxyRequests directive is needed only when configuring a forward proxy. For a reverse proxy it should be left at its default setting of Off, since switching it on without securing the server creates an open proxy.
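The following is a minimal sketch of a virtual host for the public server once the steps have been carried out. It shows roughly where the pieces end up. It assumes Apache 2.4 with a Debian-style layout, and the site file name in the comment is hypothetical. The access-control step is left out because whether it is needed depends on the Apache version.

```apache
# Hypothetical site file: /etc/apache2/sites-available/www.example.com.conf
# Assumes mod_proxy and mod_proxy_http have already been enabled.
<VirtualHost *:80>
    ServerName www.example.com

    # A reverse proxy does not need forward-proxy support; Off is the default
    ProxyRequests Off

    # Forward requests under /internal to the private server
    ProxyPass        /internal http://internal.example.com/public
    # Rewrite redirect headers from the private server so they point back here
    ProxyPassReverse /internal http://internal.example.com/public
</VirtualHost>
```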
Two separate modules are needed for Apache to act as a reverse HTTP proxy: mod_proxy to support aspects of proxying that are not specific to any particular protocol, and mod_proxy_http to support aspects that are specific to HTTP.
These are standard Apache modules, and on the systems tested no separate action is needed to install them, but they are not usually loaded by default. On Debian-based systems you can arrange for them to be loaded using the a2enmod command, then restart Apache:
a2enmod proxy_http
service apache2 restart
There is no need to separately enable mod_proxy, because a2enmod knows that it is a prerequisite of mod_proxy_http. Note that module names passed to a2enmod should have the mod_ prefix removed. On older systems that lack the service command you may need to run the init script /etc/init.d/apache2 directly in place of service apache2.
If for any reason you need to load the modules manually then the appropriate configuration directives are:
LoadModule proxy_module /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so
(where the pathnames of the modules should be replaced with whatever is appropriate for your system).
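The manual directives might end up alongside a configuration that already loads these modules. In that case, one option is to guard them with <IfModule>, as in this sketch. It assumes the Debian module paths shown above. mod_proxy is listed first because mod_proxy_http depends on it.

```apache
# Load each module only if it is not already loaded.
# mod_proxy must come before mod_proxy_http, which depends on it.
<IfModule !proxy_module>
    LoadModule proxy_module /usr/lib/apache2/modules/mod_proxy.so
</IfModule>
<IfModule !proxy_http_module>
    LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so
</IfModule>
```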
The ProxyPass directive creates a mapping from a path within the local web site to a given remote URL. In this instance the local path is /internal and it should be mapped to http://internal.example.com/public. There are two ways in which this can be expressed.
ProxyPass has a two-argument form, where both local path and remote URL are specified explicitly:
ProxyPass /internal http://internal.example.com/public
It also has a single-argument form, where the local path is implied by the context:
<Location /internal>
    ProxyPass http://internal.example.com/public
</Location>
The effect is the same either way. The mapping applies to any location within the subtree rooted at the given path, so (for example) the configuration above would cause /internal/logo.png to be mapped to http://internal.example.com/public/logo.png. Trailing slash characters are significant:
- If the slash is included then only the content of the specified directory is mapped, not the directory itself.
- Without the slash, both directory and content are mapped.
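Because the mapping is a simple substitution of one prefix for the other, the path and the URL should either both end with a slash or both end without one. Otherwise the resulting upstream URLs may have missing or doubled slashes.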
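Here is a sketch of the slash rule using the paths from this scenario. The first directive is the form used above. The second, commented out, is an equally consistent alternative. The third, also commented out, is the mismatch to avoid: ProxyPass substitutes one prefix for the other without adding or removing slashes.

```apache
# Consistent, no trailing slashes: /internal and everything beneath it is proxied
#   /internal/logo.png -> http://internal.example.com/public/logo.png
ProxyPass /internal http://internal.example.com/public

# Consistent, both with trailing slashes (an alternative to the line above,
# not an addition): only content beneath /internal/ is proxied
#   /internal/logo.png -> http://internal.example.com/public/logo.png
#ProxyPass /internal/ http://internal.example.com/public/

# Mismatched, do not use: the slash is lost during substitution
#   /internal/logo.png -> http://internal.example.com/publiclogo.png
#ProxyPass /internal/ http://internal.example.com/public
```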
If the proxied web site uses HTTP redirections then you will normally want to rewrite the returned URLs to refer to the proxy server in place of the upstream server. For example, the response:
HTTP/1.1 301 Moved Permanently
Location: http://internal.example.com/public/status.cgi
[...]
should be rewritten as:
HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/internal/status.cgi
[...]
This can be achieved using the ProxyPassReverse directive, which rewrites any Location, Content-Location or URI headers that it sees in the response from the upstream server:
ProxyPassReverse /internal http://internal.example.com/public
ProxyPassReverse does not itself cause the path to be proxied. For this reason it should be used in addition to ProxyPass (or an equivalent such as ProxyPassMatch or RewriteRule), not as a replacement.
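Like ProxyPass, ProxyPassReverse has a single-argument form for use inside a <Location> section, where the local path is taken from the section. This is a sketch of the same scenario written that way:

```apache
<Location /internal>
    # The local path /internal is implied by the enclosing <Location>
    ProxyPass        http://internal.example.com/public
    ProxyPassReverse http://internal.example.com/public
</Location>
```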
If there is a single ProxyPass directive then you will typically want a matching ProxyPassReverse directive with the same arguments. If there are several then you may be able to obtain the necessary coverage using a single ProxyPassReverse directive further up the path hierarchy, for example:
ProxyPass /internal/foo http://internal.example.com/public/foo
ProxyPass /internal/bar http://internal.example.com/public/bar
ProxyPass /internal/baz http://internal.example.com/public/baz
ProxyPassReverse /internal http://internal.example.com/public
It is generally good practice to include ProxyPassReverse even if the upstream site does not currently use HTTP redirection, because there is little harm in doing so and it may prevent future breakage. An exception would be if you have arranged for the proxy to respond to the same absolute URLs as the upstream server (as described below). Be aware that the original HTTP/1.1 specification (RFC 2616) did not allow redirection to non-absolute URLs, so upstream servers typically send absolute ones. RFC 7231 has since relaxed this requirement.
Some versions of Apache deny all access by default, relying on the configuration file to explicitly specify what should be accessible and to whom. If you are using one of these versions then you may need to add a <Proxy> section to cover the proxied content, for example:
<Proxy *>
    Order allow,deny
    Allow from all
</Proxy>
<Directory> sections are not sufficient here because proxied content does not result in a filesystem access. Note that the Order, Allow and Deny directives are deprecated as of Apache 2.4, although they remain available through mod_access_compat for backward compatibility. You should therefore not make this addition without first considering whether it is appropriate.
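On Apache 2.4 the same intent is normally expressed with the Require directive from mod_authz_core. The sketch below is one possible equivalent. Narrowing the <Proxy> section to the upstream URL instead of * is a choice made here, not something the method requires.

```apache
# Apache 2.4 style access control for the proxied content.
# Matching only the upstream URL (rather than *) limits the grant to this site.
<Proxy "http://internal.example.com/public*">
    Require all granted
</Proxy>
```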
It may be the case that the set of paths you want to proxy does not form a complete subtree. There are three ways in which this can be handled:
- by proxying a number of smaller subtrees that contain only the material you want to allow;
- by proxying everything, then adding rules to exclude paths that you do not want to allow; or
- by using the ProxyPassMatch directive to specify exactly what you want to allow in the form of a regular expression.
One consideration when choosing between these options is the complexity of the resulting configuration file. You should also consider what should happen if new paths are added to the remote web site without the proxy server being updated: should they be proxied by default or not?
Paths can be excluded by replacing the remote URL with an exclamation mark. For example, you could exclude the path /internal/excluded while continuing to provide access to the remainder of /internal with the following pair of ProxyPass directives:
ProxyPass /internal/excluded !
ProxyPass /internal http://internal.example.com/public
The order of these directives is important, since Apache acts on the first matching ProxyPass (or ProxyPassMatch) directive that it encounters.
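For illustration, here is the same pair in the wrong order. Because /internal is a prefix of /internal/excluded, the general directive matches first and the exclusion is never reached:

```apache
# Wrong order: requests for /internal/excluded/... match the first line
# and are proxied, so the exclusion on the second line has no effect.
ProxyPass /internal          http://internal.example.com/public
ProxyPass /internal/excluded !
```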
ProxyPassMatch allows for selective proxying using a regular expression (as opposed to a path prefix) to decide what is included or excluded. This is useful when you want to use rules that do not relate to the path hierarchy. For example, if you wanted to proxy JPEG and PNG files but nothing else then you could do so with the directive:
ProxyPassMatch ^/internal/(.*[.](jpeg|png))$ http://internal.example.com/public/$1
(The anchors ^ and $ require that the pattern match the whole path, not a substring. It will match if the path begins with the prefix /internal/ and ends with the suffix .jpeg or .png. The portion following the prefix is captured as $1 for use in the remote URL.)
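ProxyPassReverse has no regular-expression counterpart, so ProxyPassMatch is usually paired with a prefix-based ProxyPassReverse. The sketch below does that. It also widens the pattern to accept the common .jpg spelling, which is an adjustment made here and not part of the original example.

```apache
# Proxy only JPEG and PNG files beneath /internal/ (.jpg, .jpeg or .png)
ProxyPassMatch ^/internal/(.*[.](jpe?g|png))$ http://internal.example.com/public/$1
# Header rewriting still works by prefix
ProxyPassReverse /internal http://internal.example.com/public
```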
As in the case of ProxyPass, you can supply an exclamation mark in place of the remote URL to prevent the selected paths from being proxied:
ProxyPassMatch ^/internal/.*[.](jpeg|png)$ !
ProxyPass /internal http://internal.example.com/public
Rewriting of URLs within the proxied content is best avoided if you can, because of the extra processing it involves and because it is more likely to go wrong than simply forwarding the content verbatim. Possible methods of avoidance include:
- using relative URLs instead of absolute ones, or
- arranging for the proxy server to respond to the same absolute URLs as the upstream server (see below).
(Fully relative URLs are the most robust, although they can break if the website is proxied in a way that changes its internal structure. Site-relative URLs are sufficient if only the hostname and/or port number have changed.)
If rewriting is necessary then it can be implemented using the mod_proxy_html module. This was originally a third-party module, but it became part of the Apache HTTP Server as of version 2.4. The required configuration directives for the scenario described above would be:
ProxyHTMLEnable On
ProxyHTMLURLMap http://internal.example.com/public /internal
Note that the arguments are passed in the opposite order to ProxyPassReverse, but otherwise similar considerations apply.
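By default, URLs within inline scripts, event handlers and stylesheets are not rewritten. They can be included by enabling the ProxyHTMLExtended directive, but this introduces a significant risk of false positives and you need to supply rules that are carefully matched to the website you want to proxy.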
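This sketch combines header and content rewriting for the /internal section. It makes three assumptions:
- Apache 2.4 with mod_proxy_html, mod_xml2enc and mod_headers enabled.
- The ProxyHTMLLinks definitions from the stock proxy_html configuration are loaded. Without them mod_proxy_html does not know which HTML attributes contain URLs.
- Unsetting Accept-Encoding, one common approach, stops the upstream server from sending compressed responses, which the filter cannot parse.

```apache
<Location /internal>
    ProxyPass        http://internal.example.com/public
    ProxyPassReverse http://internal.example.com/public

    # Ask the upstream server for uncompressed content so that the
    # HTML can be parsed and rewritten (requires mod_headers)
    RequestHeader unset Accept-Encoding

    # Rewrite links in the HTML body (note the reversed argument order)
    ProxyHTMLEnable On
    ProxyHTMLURLMap http://internal.example.com/public /internal
</Location>
```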
Most of the complications associated with proxying arise from changing the URL at which the relevant content is located. If this could be avoided then there would be no need to make compensatory changes at the proxy server. The following method achieves this:
- Configure both the upstream server and the proxy server to respond to HTTP requests directed to the same virtual hostname.
- Ensure that end users are directed to the proxy server when they resolve the hostname.
- Ensure that the proxy server is directed to the upstream server when it resolves the hostname.
If the proxy is the only machine needing access to the upstream server then one option would be for the public DNS to refer to the proxy, but for this to be overridden in /etc/hosts on the proxy server. Alternatively, if you want clients on the private network to access the upstream server directly then that can be arranged using split-horizon DNS (where the IP address corresponding to the website hostname varies according to where it is resolved from).
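As a sketch of the /etc/hosts variant, suppose the shared hostname is app.example.com and the upstream server's private address is 10.0.0.10. Both are hypothetical. The proxy forwards to the same hostname and path that clients used, so no ProxyPassReverse or content rewriting is needed.

```apache
# On the proxy, /etc/hosts overrides public DNS so that the name
# resolves to the upstream server, with a line such as:
#   10.0.0.10   app.example.com
<VirtualHost *:80>
    ServerName app.example.com

    # Same hostname and path on both sides, so URLs need no rewriting
    ProxyPass / http://app.example.com/
</VirtualHost>
```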