Process an XML document using an XSLT stylesheet in Java
Tested with OpenJDK 6: Debian (Lenny, Squeeze); Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Oneiric, Precise, Quantal)
Tested with OpenJDK 7: Ubuntu (Oneiric, Precise, Quantal)
Objective
To process an XML document using an XSLT stylesheet from within a Java program
Scenario
Suppose that you have an XML document called input.xml that you wish to process using an XSLT stylesheet called style.xsl to produce a new XML document called output.xml.
(The method described here is not limited to documents stored as files, but using files simplifies the task of providing a working example. Other input and output methods are considered later as variations.)
Method
Overview
The method described here has six steps:
1. Obtain a TransformerFactory object.
2. Present the stylesheet as a Source.
3. Use the factory object to obtain a Transformer object for the given stylesheet.
4. Present the input document as a Source.
5. Provide a Result object to accept the output document.
6. Use the Transformer object to process the input document.
Once created, the Transformer object can be used to process any number of documents by repeating steps 4 to 6. This can greatly improve processing efficiency if the stylesheet is large and used many times.
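As an illustration of that reuse, the following sketch builds one Transformer and then applies it to several documents. To keep it self-contained it reads from and writes to strings (a variation described later) rather than files. The class name ReuseTransformer, the method transformAll and the small stylesheet are assumptions made for this example only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ReuseTransformer {
    // Hypothetical stylesheet, for illustration only: it outputs the text
    // content of the document element.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"/*\"/></xsl:template>"
        + "</xsl:stylesheet>";

    static List<String> transformAll(List<String> inputs) throws TransformerException {
        // Steps 1 to 3: performed once for the stylesheet.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(
            new StreamSource(new StringReader(STYLESHEET)));

        List<String> outputs = new ArrayList<String>();
        for (String input : inputs) {
            // Steps 4 to 6: repeated for each input document.
            StringWriter writer = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(input)),
                new StreamResult(writer));
            outputs.add(writer.toString());
        }
        return outputs;
    }

    public static void main(String[] args) throws TransformerException {
        List<String> inputs = new ArrayList<String>();
        inputs.add("<a>first</a>");
        inputs.add("<b>second</b>");
        System.out.println(transformAll(inputs)); // prints [first, second]
    }
}
```

The stylesheet is parsed and compiled once, outside the loop, so only the per-document work is repeated.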
The following imports are assumed:
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
Obtain a TransformerFactory object
The Transformer object that will be used to process the input document cannot be created directly; it must instead be created by an instance of the factory class TransformerFactory. This in turn must be created using the factory method TransformerFactory.newInstance:
TransformerFactory factory = TransformerFactory.newInstance();
Present the stylesheet as a Source
In order to avoid making assumptions about how or where the stylesheet is stored, the javax.xml.transform package uses an abstraction called a Source. This has implementations which are capable of accepting the stylesheet in a number of different forms, including:
- as an unparsed XML document (a StreamSource),
- as a parsed XML document presented as a DOM tree (a DOMSource), or
- as a parsed XML document presented serially using the SAX API (a SAXSource).
In this instance the stylesheet exists as a file and is therefore in unparsed form. This makes it most readily presentable as a StreamSource:
String stylesheetPathname = "style.xsl";
Source stylesheetSource = new StreamSource(new File(stylesheetPathname).getAbsoluteFile());
(In this particular case it may be possible to pass the pathname directly into the StreamSource constructor, where it would be interpreted as a relative URL. This is not recommended for two reasons: firstly, the pathname could be misinterpreted if it contained any characters that have a special meaning within a URL, and secondly, the javax.xml.transform documentation does not appear to specify clearly how such URLs should be resolved.)
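The first of those risks can be seen without involving the XSLT classes at all. The sketch below uses a hypothetical pathname containing a '#' character and compares what happens when it is parsed as a relative URL with what happens when it is converted via a File. The class and method names are assumptions made for this example, and it illustrates only the general URL parsing issue, not the exact behaviour of any particular StreamSource implementation.

```java
import java.io.File;
import java.net.URI;

class PathnameAsUrl {
    // Treat the pathname as if it were a relative URL.
    static URI asRelativeUrl(String pathname) {
        return URI.create(pathname);
    }

    // Convert the pathname to a URL via File, which escapes special characters.
    static URI viaFile(String pathname) {
        return new File(pathname).getAbsoluteFile().toURI();
    }

    public static void main(String[] args) {
        // Hypothetical pathname: '#' introduces a fragment identifier in a URL.
        String pathname = "notes#1.xsl";

        URI naive = asRelativeUrl(pathname);
        System.out.println(naive.getPath());     // prints notes
        System.out.println(naive.getFragment()); // prints 1.xsl

        URI safe = viaFile(pathname);
        // The '#' has been escaped as %23, so the whole name stays in the path.
        System.out.println(safe.getRawPath().endsWith("notes%231.xsl")); // prints true
        System.out.println(safe.getFragment());  // prints null
    }
}
```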
Use the factory object to obtain a Transformer object for the given stylesheet
Given a stylesheet Source and a TransformerFactory, it is now possible to create the Transformer object:
Transformer transformer = factory.newTransformer(stylesheetSource);
Present the input document as a Source
Like the stylesheet, the input document must be presented as a Source:
String inputPathname = "input.xml";
Source inputSource = new StreamSource(new File(inputPathname).getAbsoluteFile());
Provide a Result object to accept the output document
The output document is represented by an abstraction called a Result. This is the opposite of a Source in that it specifies how and where the output document should be placed. Like a Source it has implementations which are capable of representing the document in a number of different forms, including:
- as an unparsed XML document (a StreamResult),
- as a parsed XML document presented as a DOM tree (a DOMResult), or
- as a parsed XML document presented serially using the SAX API (a SAXResult).
In this instance the requirement is to write the output document to a file, which is most readily achieved using a StreamResult:
String outputPathname = "output.xml";
Result outputResult = new StreamResult(new File(outputPathname).getAbsoluteFile());
Use the Transformer object to transform the input document
Given a Transformer (the stylesheet and XSLT processor), a Source (the input document) and a Result (the output document), it is now possible to perform the transformation:
transformer.transform(inputSource, outputResult);
If no errors are reported then the file output.xml should now contain the result of transforming input.xml using the stylesheet style.xsl.
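Errors are reported as exceptions: newTransformer is declared to throw TransformerConfigurationException, and transform is declared to throw TransformerException. The following is a minimal sketch of how a caller might distinguish the two. The class TransformWithErrors, the method tryTransform and the stylesheet are assumptions made for this example, and strings are used in place of files so that it is self-contained. Some implementations may additionally print diagnostics to standard error.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformWithErrors {
    // Hypothetical stylesheet, for illustration only.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"/*\"/></xsl:template>"
        + "</xsl:stylesheet>";

    // Returns null on success, or a short description of what went wrong.
    static String tryTransform(String stylesheet, String input) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(
                new StreamSource(new StringReader(stylesheet)));
            transformer.transform(
                new StreamSource(new StringReader(input)),
                new StreamResult(new StringWriter()));
            return null;
        } catch (TransformerConfigurationException e) {
            // The stylesheet could not be turned into a Transformer.
            return "stylesheet error: " + e.getMessage();
        } catch (TransformerException e) {
            // The input could not be read or transformed.
            return "transformation error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryTransform(STYLESHEET, "<foo>ok</foo>")); // prints null
        System.out.println(tryTransform(STYLESHEET, "<foo>"));         // input is not well-formed
        System.out.println(tryTransform("<not-a-stylesheet", "<foo/>")); // stylesheet is not well-formed
    }
}
```

The more specific exception is caught first because TransformerConfigurationException is a subclass of TransformerException.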
Example program
Here is a complete example program which takes the names of the stylesheet, input document and output document as command line arguments:
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

class Transform {
    public static void main(String[] args) throws TransformerException {
        String stylesheetPathname = args[0];
        String inputPathname = args[1];
        String outputPathname = args[2];

        // Steps 1 to 3: create a Transformer for the stylesheet.
        TransformerFactory factory = TransformerFactory.newInstance();
        Source stylesheetSource = new StreamSource(new File(stylesheetPathname).getAbsoluteFile());
        Transformer transformer = factory.newTransformer(stylesheetSource);

        // Steps 4 to 6: transform the input document into the output document.
        Source inputSource = new StreamSource(new File(inputPathname).getAbsoluteFile());
        Result outputResult = new StreamResult(new File(outputPathname).getAbsoluteFile());
        transformer.transform(inputSource, outputResult);
    }
}
If written to a file named Transform.java the program can be compiled and invoked as follows:
javac Transform.java
java Transform style.xsl input.xml output.xml
Variations
Read the input from a string
If the input document (or stylesheet) exists in the form of a string containing an unparsed XML document then this can be converted to a StreamSource via the intermediate form of a StringReader:
String inputString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo/>";
Source inputSource = new StreamSource(new StringReader(inputString));
(The conversion could be performed via an InputStream, but only by encoding the document as a byte stream and then decoding it back to a character stream. In addition to being somewhat inefficient, this can corrupt the document if not done correctly.)
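For comparison, here is a sketch of the byte stream route done with care: the string is encoded using the same character encoding that its XML declaration names, so the parser decodes it back to the original characters. The class ByteStreamInput, the method transformViaBytes and the stylesheet are assumptions made for this example. Calling getBytes() without an argument would use the platform default encoding instead, which is one way the corruption mentioned above can arise.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ByteStreamInput {
    // Hypothetical stylesheet, for illustration only: it outputs the text
    // content of the document element.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"/*\"/></xsl:template>"
        + "</xsl:stylesheet>";

    static String transformViaBytes(String inputString)
            throws TransformerException, UnsupportedEncodingException {
        // Encode using the encoding named in the XML declaration (UTF-8 here).
        byte[] inputBytes = inputString.getBytes("UTF-8");
        Source inputSource = new StreamSource(new ByteArrayInputStream(inputBytes));

        Transformer transformer = TransformerFactory.newInstance().newTransformer(
            new StreamSource(new StringReader(STYLESHEET)));
        StringWriter outputWriter = new StringWriter();
        transformer.transform(inputSource, new StreamResult(outputWriter));
        return outputWriter.toString();
    }

    public static void main(String[] args) throws Exception {
        // The document contains a non-ASCII character (e with acute accent).
        String inputString =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo>caf\u00e9</foo>";
        String outputString = transformViaBytes(inputString);
        System.out.println(outputString.equals("caf\u00e9")); // prints true
    }
}
```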
Write the output to a string
If the output document is wanted in the form of a string containing an unparsed XML document then that can be arranged by sending it via a StreamResult to a StringWriter:
Writer outputWriter = new StringWriter();
Result outputResult = new StreamResult(outputWriter);
transformer.transform(inputSource, outputResult);
String outputString = outputWriter.toString();
(The conversion could be performed via an OutputStream, but doing so has the same disadvantages as using an InputStream for reading.)
Read the input from a DOM tree
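If the input document (or stylesheet) exists in the form of a DOM tree then this can be presented to the Transformer object using a DOMSource. This class is in the package javax.xml.transform.dom, which must be imported in addition to those listed above. Here inputNode is the org.w3c.dom.Node at the root of the tree:
Source inputSource = new DOMSource(inputNode);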
You can optionally specify a second argument which is the base URI with respect to which any relative URIs should be resolved.
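The following sketch shows one way this might look in a complete program. It builds a small DOM tree in memory, wraps it in a DOMSource with a base URI, and transforms it to a string. The class DomInput, the method transformDom, the stylesheet and the base URI are all hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomInput {
    // Hypothetical stylesheet, for illustration only: it outputs the text
    // content of the greeting element.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"/greeting\"/></xsl:template>"
        + "</xsl:stylesheet>";

    static String transformDom() throws ParserConfigurationException, TransformerException {
        // Build a DOM tree equivalent to <greeting>hello</greeting>.
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        builderFactory.setNamespaceAware(true);
        Document inputDocument = builderFactory.newDocumentBuilder().newDocument();
        Element root = inputDocument.createElementNS(null, "greeting");
        root.appendChild(inputDocument.createTextNode("hello"));
        inputDocument.appendChild(root);

        // The second argument is a hypothetical base URI for resolving
        // relative URIs; it is not used by this particular stylesheet.
        Source inputSource = new DOMSource(inputDocument, "file:///hypothetical/input.xml");

        Transformer transformer = TransformerFactory.newInstance().newTransformer(
            new StreamSource(new StringReader(STYLESHEET)));
        StringWriter outputWriter = new StringWriter();
        transformer.transform(inputSource, new StreamResult(outputWriter));
        return outputWriter.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformDom()); // prints hello
    }
}
```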
Write the output to a DOM tree
If the output document is wanted in the form of a DOM tree then this can be arranged by sending it to a DOMResult:
DOMResult outputResult = new DOMResult();
transformer.transform(inputSource, outputResult);
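By default the DOMResult object creates a Document node to hold the output. (Document and Element are in the package org.w3c.dom, and DOMResult is in javax.xml.transform.dom; both need importing in addition to the packages listed earlier.)
Document outputDocument = (Document)outputResult.getNode();
Element outputRootElement = outputDocument.getDocumentElement();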
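Put together, a transformation into a DOM tree might look like the sketch below. It reads the input from a string, sends the output to a DOMResult, and then inspects the resulting tree. The class DomOutput, the method transformToDom and the stylesheet (which wraps the input text in a bar element) are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomOutput {
    // Hypothetical stylesheet, for illustration only: it produces a bar
    // element containing the text content of the input document element.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\">"
        + "<bar><xsl:value-of select=\"/*\"/></bar>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static Document transformToDom(String inputString) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(
            new StreamSource(new StringReader(STYLESHEET)));

        // With no node supplied, the DOMResult creates a Document to hold the output.
        DOMResult outputResult = new DOMResult();
        transformer.transform(
            new StreamSource(new StringReader(inputString)), outputResult);
        return (Document) outputResult.getNode();
    }

    public static void main(String[] args) throws TransformerException {
        Document outputDocument = transformToDom("<foo>text</foo>");
        Element outputRootElement = outputDocument.getDocumentElement();
        System.out.println(outputRootElement.getTagName());     // prints bar
        System.out.println(outputRootElement.getTextContent()); // prints text
    }
}
```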