Process an XML document using an XSLT stylesheet

Tested with xsltproc on

Debian (Etch, Lenny, Squeeze)
Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Trusty)

Tested with Xalan on

Debian (Etch, Lenny, Squeeze)
Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Trusty)

Tested with Saxon-B on

Debian (Lenny, Squeeze)
Ubuntu (Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Trusty)

Tested with Saxon-6 on

Debian (Etch, Lenny, Squeeze)
Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Trusty)

Objective

To process an XML document using an XSLT stylesheet

Scenario

Suppose that you have an XML document called input.xml that you wish to process using an XSLT stylesheet called style.xsl to produce either an XML document or an HTML document.

Methods

Overview

An XSLT processor is a program for applying an XSLT stylesheet. Popular ones at the time of writing include xsltproc, Xalan and Saxon.

At the time of writing the current version of the XSLT specification is v2.0, but Saxon is the only Open Source processor that implements it. For XSLT v1.0, which is still in widespread use, any of the processors described here should suffice.

Selecting XML or HTML

The output method can be forced to XML or HTML by placing an output element at the start of the stylesheet, for example:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"/>
<!-- ... -->

(These are not the only possible output methods: other standard ones are text and, when using XSLT v2.0, xhtml.)

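If the output method is not specified explicitly then the XSLT processor is supposed to choose an appropriate default. Under XSLT v1.0 this is html if the first element of the result is named html (in any combination of upper and lower case) and is in no namespace, and xml otherwise.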

Method (using xsltproc)

xsltproc is the front-end to libxslt, which is part of the GNOME project. It is written in C. Its only significant dependency is libxml2 (also part of GNOME), making it a relatively lightweight option. As of January 2011 it does not support XSLT v2.0; however, a separate project called LIBX* aims to create an XSLT 2.0-compatible processor based on libxslt.

On Debian-based systems the required package is xsltproc:

apt-get install xsltproc

Documents can then be processed using the xsltproc command:

xsltproc -o output.xml style.xsl input.xml

(Replace input.xml with a hyphen to read from standard input; omit -o output.xml to write to standard output.)
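
The standard input and standard output forms can be combined to use xsltproc as a filter in a pipeline. The following is a minimal sketch, assuming a POSIX shell; the stylesheet and the greeting document are throwaway examples invented for illustration and are not part of the scenario above. The transformation is skipped if xsltproc is not installed.

```shell
# Write a small throwaway stylesheet that extracts the text of <greeting>.
tmpxsl=$(mktemp)
cat > "$tmpxsl" <<'EOF'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="/"><xsl:value-of select="/greeting"/></xsl:template>
</xsl:stylesheet>
EOF

if command -v xsltproc >/dev/null 2>&1; then
  # The hyphen makes xsltproc read the document from standard input;
  # with no -o option the result goes to standard output.
  result=$(printf '<greeting>Hello</greeting>' | xsltproc "$tmpxsl" -)
  echo "$result"
else
  echo "xsltproc not found; skipping" >&2
fi

rm -f "$tmpxsl"
```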

Method (using Xalan)

Xalan is an XSLT processor published by the Apache Software Foundation. It exists in two versions, one written in C++ and the other in Java. The C++ version is usually preferable for use from the command line, as it has a smaller dependency footprint and is significantly faster.

On Debian-based systems the required package is xalan:

apt-get install xalan

Documents can then be processed using the xalan command:

xalan -xsl style.xsl -in input.xml -out output.xml

(Omit -in input.xml to read from standard input; omit -out output.xml to write to standard output.)

Method (using Saxon-B)

Saxon-B is the Open Source variant of Saxon for releases up to and including version 9.1. At the time of writing it was the most recent version available as a Debian or Ubuntu package. It is written in Java and fully supports XSLT v2.0.

(As of version 9.2 it has been replaced by Saxon-HE, but this deliberately omits some features so upgrading is not necessarily an advantage.)

On Debian-based systems the required package is libsaxonb-java, which first became available in Debian Lenny and Ubuntu Intrepid. You will also need a Java runtime environment, for which (as of Debian Lenny and Ubuntu Karmic) you can use default-jre:

apt-get install libsaxonb-java default-jre

Note that the package libsaxon-java is not Saxon-B (it contains Saxon-6, an older version described below).

Once installed you will need to place the Saxon JAR file on the CLASSPATH:

export CLASSPATH=$CLASSPATH:/usr/share/java/saxonb.jar

This affects only the process in which it is executed and processes descended from it. If you want to alter the CLASSPATH permanently then you will need to set it in a configuration file such as ~/.profile.
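
If the export line is placed in a file such as ~/.profile and that file is read more than once, the same JAR can end up on the CLASSPATH several times. One possible way to avoid this is sketched below, assuming a POSIX shell. The function name add_to_classpath is hypothetical; it is not provided by Java, Saxon or Debian.

```shell
# Hypothetical helper: append a JAR to CLASSPATH only if it is not already listed.
add_to_classpath() {
  case ":${CLASSPATH:-}:" in
    *":$1:"*)
      # Already present, so leave CLASSPATH unchanged.
      ;;
    *)
      # Append, inserting a colon only when CLASSPATH is non-empty.
      CLASSPATH="${CLASSPATH:+$CLASSPATH:}$1"
      ;;
  esac
  export CLASSPATH
}

add_to_classpath /usr/share/java/saxonb.jar
```

Calling the function a second time with the same path has no further effect, so it should be safe to leave in a file that is sourced repeatedly.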

Documents are processed by invoking the relevant Java class name, which (for this variant of Saxon) is net.sf.saxon.Transform:

java net.sf.saxon.Transform -o:output.xml -s:input.xml -xsl:style.xsl

If you attempt to process an XSLT v1.0 stylesheet then Saxon will print a warning. You can suppress this by appending the option -versionmsg:off.

Method (using Saxon-6)

Saxon-6 is an older version which does not support XSLT v2.0. There has been no upstream development since 2001 and no upstream maintenance since 2005. However, it is still supplied with current versions of Debian and Ubuntu for the purpose of backward compatibility (particularly in relation to DocBook processing).

On Debian-based systems the required package is libsaxon-java. As with Saxon-B, a Java runtime environment is needed:

apt-get install libsaxon-java default-jre

Documents can be processed by invoking the main class of the JAR file:

java -jar /usr/share/java/saxon.jar -o output.xml input.xml style.xsl

Alternatively the required class may be invoked explicitly. Note that its name differs from that in later versions of Saxon:

java com.icl.saxon.StyleSheet -o output.xml input.xml style.xsl

In the latter case it is necessary to first place the JAR file on the CLASSPATH:

export CLASSPATH=$CLASSPATH:/usr/share/java/saxon.jar

Testing

Given the following XSLT v1.0 stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml"/>

<xsl:template match="/input">
  <output><xsl:apply-templates/></output>
</xsl:template>

<xsl:template match="p">
  <p><xsl:apply-templates/></p>
</xsl:template>
</xsl:stylesheet>

and the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<input>
  <p>Hello, World!</p>
</input>

any of the methods described above should produce output equivalent to:

<?xml version="1.0" encoding="UTF-8"?>
<output>
  <p>Hello, World!</p>
</output>
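
The test above can be scripted. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the variable name workdir is arbitrary. It recreates the two files in a scratch directory and runs only the xsltproc method, skipping the transformation if xsltproc is not installed. The other processors could be substituted using the commands given earlier.

```shell
# Create a scratch directory so that existing files are not overwritten.
workdir=$(mktemp -d)

# The XSLT v1.0 stylesheet from the test above.
cat > "$workdir/style.xsl" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml"/>

<xsl:template match="/input">
  <output><xsl:apply-templates/></output>
</xsl:template>

<xsl:template match="p">
  <p><xsl:apply-templates/></p>
</xsl:template>
</xsl:stylesheet>
EOF

# The XML document from the test above.
cat > "$workdir/input.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<input>
  <p>Hello, World!</p>
</input>
EOF

if command -v xsltproc >/dev/null 2>&1; then
  xsltproc -o "$workdir/output.xml" "$workdir/style.xsl" "$workdir/input.xml"
  # Show the result for comparison with the expected output.
  cat "$workdir/output.xml"
else
  echo "xsltproc not found; skipping transformation" >&2
fi
```

The XML declaration and whitespace may differ slightly between processors, which is why the comparison is for equivalence rather than byte-for-byte identity.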

Errors

xsl:result-document is disabled when extension functions are disabled

When using Saxon, the error message:

xsl:result-document is disabled when extension functions are disabled

indicates that the stylesheet attempted to make use of the xsl:result-document element, but Saxon had not been configured to allow this.

Use of xsl:result-document is prohibited by default because it provides the ability to write to an arbitrary URI. This could be abused in a number of ways, most notably through use of the file: URI scheme to alter files on the local filesystem. The default can be overridden by setting the ext parameter to on:

java net.sf.saxon.Transform -o:output.xml -s:input.xml -xsl:style.xsl -ext:on

Before doing this you should assess whether the stylesheet poses any threat to your system. There are two points to consider:

The ext parameter also enables the use of extension functions. Like xsl:result-document these have significant potential for abuse, and require a similar degree of trust in the stylesheet.
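
As a starting point for that assessment, the stylesheet can be searched for the constructs in question. The following is a rough sketch, assuming a POSIX shell with grep; the function name audit_stylesheet is hypothetical. It is a purely textual check, so it does not follow xsl:import or xsl:include, and it is no substitute for reading a stylesheet that you do not trust.

```shell
# Hypothetical helper: print the lines of a stylesheet that deserve a closer look.
audit_stylesheet() {
  # Lines that mention result-document (secondary output documents).
  grep -n 'result-document' "$1"
  # Namespace declarations other than the XSLT namespace itself;
  # these may (or may not) indicate the use of extension functions.
  grep -n 'xmlns:' "$1" | grep -v 'http://www.w3.org/1999/XSL/Transform'
  # Finding nothing is not an error.
  return 0
}
```

It would be invoked as audit_stylesheet style.xsl, with each match printed alongside its line number.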

Tags: xslt