Group XML elements by key using XSLT v1.0

Tested with xsltproc on

Debian (Etch, Lenny, Squeeze)
Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Precise, Trusty)

Tested with Xalan on

Debian (Etch, Lenny, Squeeze)
Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Precise, Trusty)

Tested with Saxon on

Debian (Etch, Lenny, Squeeze)
Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Precise, Trusty)

Objective

To organise a set of XML elements into groups, such that elements with the same key are placed in the same group and elements with different keys are placed in different groups.

Scenario

See Group XML elements by key using XSLT.

Method

Overview

XSLT v1.0 does not include any explicit support for grouping; however, it is possible to achieve the same effect through creative use of the xsl:key element and the key and generate-id functions. This technique is called the Muenchian Method, after its inventor Steve Muench, and consists of three steps:

  1. Create an index such that all elements with a given key can be retrieved quickly and efficiently.
  2. Iterate over all keys in the index.
  3. Use the index to retrieve the set of elements corresponding to each key.
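To see the three steps in isolation, here is a minimal sketch of the same method applied to a different, hypothetical input in which item elements carry a category attribute. The names item, category and items-by-category are illustrative assumptions, not part of the scenario used on this page, and the output is plain text to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<!-- Step 1: index every (hypothetical) item element by its category attribute. -->
<xsl:key name="items-by-category" match="item" use="@category"/>

<xsl:template match="/">
 <!-- Step 2: visit one canonical item per distinct category. -->
 <xsl:for-each select="//item[generate-id()=generate-id(key('items-by-category',@category)[1])]">
  <xsl:value-of select="@category"/>
  <xsl:text>:&#10;</xsl:text>
  <!-- Step 3: retrieve every item that shares this category, in document order. -->
  <xsl:for-each select="key('items-by-category',@category)">
   <xsl:text>  </xsl:text>
   <xsl:value-of select="."/>
   <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
 </xsl:for-each>
</xsl:template>
</xsl:stylesheet>
```

Given items with categories fruit, veg and fruit (in that order), this should print the fruit heading once followed by both fruit items in document order, then the veg heading followed by its single item. Items with no category attribute are not indexed, so they are skipped.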

The difficult part of the process is step 2, because it is not possible to extract a list of keys from the index. What you can do instead is construct a node set containing the same elements as those present in the index, then use the index to eliminate duplicates from that node set.

Duplicates are eliminated by designating one instance of each key as the canonical instance. The one chosen here is the first member of the node set that is returned when the key is looked up in the index (that is, the first matching element in document order).
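Comparing generate-id values is not the only way to recognise the canonical instance. Another form of the test, also in common use, relies on the fact that the XPath union operator removes duplicate nodes: the union of the current element and the canonical instance contains exactly one node only when they are the same element. A sketch of that variant, assuming the paths index declared in the stylesheet on this page:

```xml
<!-- Alternative canonical-instance test (assumes the paths key from this page).
     The union of the current path element and the first path element returned
     by the index has exactly one member only when they are the same node. -->
<xsl:for-each select="//path[count(. | key('paths',text())[1]) = 1]">
 <xsl:value-of select="text()"/>
</xsl:for-each>
```

Either form should select the same path elements; which one to use is largely a matter of taste.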

Create an index

The index is created by adding an xsl:key element to the top level of the stylesheet:

<xsl:key name="paths" match="path" use="text()"/>

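The three attributes of this element specify that:

  1. the index is named paths (the name attribute),
  2. it contains every path element in the document (the match attribute), and
  3. each element is indexed by its text content (the use attribute).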
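The use expression is evaluated relative to each element selected by the match pattern, so indexing the same elements on a different property only requires a different expression, and a stylesheet may declare several indexes side by side. The following sketch declares a hypothetical second index, paths-by-action, keyed on the action attribute of each path element, and shows a lookup with a literal key value (D is just an example value):

```xml
<!-- Hypothetical second index: the same path elements, keyed on their
     action attribute instead of their text content. -->
<xsl:key name="paths-by-action" match="path" use="@action"/>

<!-- Inside a template, this would output how many path elements
     have an action attribute equal to "D". -->
<xsl:value-of select="count(key('paths-by-action','D'))"/>
```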

Iterate over all keys in the index

This is done using an xsl:for-each element acting upon a very particular XPath expression:

<xsl:for-each select="//path[generate-id()=generate-id(key('paths',text())[1])]">

The expression selects all path elements in the document, then tests whether each one is a canonical instance or a duplicate:

  1. Extract the text from within the current path element.
  2. Use the index to identify all path elements with the same text content.
  3. Select the first of those path elements from the index (the canonical instance).
  4. Generate a string to uniquely identify that canonical instance.
  5. Generate a string to uniquely identify the current path element.
  6. Compare the two strings. If they are the same then the current path element is the canonical instance, otherwise it is a duplicate.
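Because the canonical instances are selected in document order, the groups are produced in the order in which each key first appears. If a different order is wanted, an xsl:sort element can be added as the first child of the xsl:for-each. A sketch that lists each distinct path once, sorted alphabetically by its text, assuming the paths index from this page:

```xml
<xsl:for-each select="//path[generate-id()=generate-id(key('paths',text())[1])]">
 <!-- xsl:sort must be the first child of xsl:for-each; it orders the groups
      by their key (the path text) instead of by document order. -->
 <xsl:sort select="text()"/>
 <xsl:value-of select="text()"/>
 <xsl:text>&#10;</xsl:text>
</xsl:for-each>
```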

Retrieve the set of elements corresponding to each key

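Given the value of a key, the key function retrieves the corresponding set of elements directly from the index, and that set can then be processed in whatever manner is needed. In the complete stylesheet this is done by the inner xsl:for-each, which calls key('paths',text()) for each canonical path element.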
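As an illustration of processing a retrieved set in some other way, the sketch below records the size of each group using count(). It assumes it is placed inside the outer xsl:for-each of a stylesheet that declares the paths index, so that the context node is a canonical path element; the changes attribute is a hypothetical addition, not part of the output format used on this page. The second part shows that key() can also be called with a literal value, here an invented path:

```xml
<!-- Inside the outer for-each (context node: a canonical path element).
     The changes attribute is hypothetical: it holds the group size. -->
<pathentry changes="{count(key('paths',text()))}">
 <path><xsl:value-of select="text()"/></path>
</pathentry>

<!-- key() with a literal value; "/trunk/README" is a made-up example. -->
<xsl:for-each select="key('paths','/trunk/README')">
 <xsl:value-of select="@action"/>
</xsl:for-each>
```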

Here is the complete stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- Step 1: index every path element by its text content. -->
<xsl:key name="paths" match="path" use="text()"/>

<xsl:template match="log">
 <log>
  <!-- Step 2: visit only the canonical (first-indexed) path element for each distinct text. -->
  <xsl:for-each select="//path[generate-id()=generate-id(key('paths',text())[1])]">
   <pathentry>
    <path><xsl:value-of select="text()"/></path>
    <!-- Step 3: retrieve every path element with the same text, canonical one included. -->
    <xsl:for-each select="key('paths',text())">
     <!-- revision and date come from the enclosing logentry, action from the path itself. -->
     <logentry revision="{ancestor::logentry/@revision}"
               action="{@action}"
               date="{ancestor::logentry/date/text()}"/>
    </xsl:for-each>
   </pathentry>
  </xsl:for-each>
 </log>
</xsl:template>
</xsl:stylesheet>

Testing

For sample input and output data see Group XML elements by key using XSLT.

To apply the stylesheet see Process an XML document using an XSLT stylesheet.

See also