Match arbitrary content using RELAX NG

Tested using xmllint on

Debian (Etch, Lenny, Squeeze)
Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Oneiric, Precise, Quantal)

Tested using Jing on

Debian (Squeeze)
Ubuntu (Lucid, Maverick, Natty, Oneiric, Precise, Quantal)

Tested using rnv on

Debian (Lenny)
Ubuntu (Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty)

Objective

To write a RELAX NG pattern that will match any content presented to it.

Background

An XML schema should ideally include constraints for all elements in a document, but this may not be worthwhile if your interest is confined to a particular part of the document structure. You may therefore want to use a wildcard pattern that is capable of matching any content.

Scenario

Suppose that you wish to check that an XHTML document has an html element at its root containing a head and a body element in that order. You do not wish to validate the content of the head and body, merely confirm that they exist. For simplicity you will allow the html, head and body elements to take any attributes.
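As a concrete illustration of the scenario, here is a minimal test document of the kind the finished schema is meant to accept. The title, paragraph text and attribute values are invented for the example and have no special significance:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical test input: html at the root, then head, then body. -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
 <head>
  <!-- The content of head is not checked. -->
  <title>Example</title>
 </head>
 <body class="demo">
  <!-- The content of body is not checked either. -->
  <p>Some <em>mixed</em> content.</p>
 </body>
</html>
```

Swapping the head and body elements, or omitting either of them, should cause validation to fail, because the scenario requires both to be present in that order.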

Method

The name of an element or attribute can be left unspecified in a RELAX NG schema by means of the anyName name class, written * in compact syntax and <anyName/> in XML syntax. There is no corresponding mechanism for leaving the content unspecified, so for both elements and attributes it is necessary to construct a pattern that matches any possible content. By making the element pattern recursive it is possible to match content nested to any number of levels.

Elements may contain further elements, or text, or an interleaved mixture of both. Attributes may only contain text. These constraints can be expressed using either compact syntax:

any_content = any_element* & text
any_element = element * { any_attribute*, any_content }
any_attribute = attribute * { text }

or XML syntax:

<define name="any_content">
 <interleave>
  <zeroOrMore>
   <ref name="any_element"/>
  </zeroOrMore>
  <text/>
 </interleave>
</define>

<define name="any_element">
 <element>
  <anyName/>
  <zeroOrMore>
   <ref name="any_attribute"/>
  </zeroOrMore>
  <ref name="any_content"/>
 </element>
</define>

<define name="any_attribute">
 <attribute>
  <anyName/>
 </attribute>
</define>
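To show what these definitions are intended to cover, the following fragment should be matched in its entirety by any_element. It is only a sketch: the element names, the namespace URI and the attribute values are all invented for the example.

```xml
<!-- Hypothetical input: every name here is arbitrary. -->
<note xmlns:x="http://example.com/ns" id="n1" x:flag="yes">
 Leading text
 <!-- Elements from any namespace, nested to any depth. -->
 <x:item>nested <b>deeply <i>nested</i></b> text</x:item>
 <!-- An empty element matches too, since text may be empty. -->
 <empty/>
 trailing text
</note>
```

The attributes id and x:flag are each matched by any_attribute, and the mixture of text and child elements is matched by any_content, which is then applied again inside each child element.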

Here is a complete schema, written using compact syntax, to check for the presence of the html, head and body elements in an XHTML document:

default namespace = "http://www.w3.org/1999/xhtml"

start = element html { any_attribute*, head, body }
head = element head { any_attribute*, any_content }
body = element body { any_attribute*, any_content }

any_content = any_element* & text
any_element = element * { any_attribute*, any_content }
any_attribute = attribute * { text }
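For validators that expect XML syntax, the same schema might be written as follows. This is a hand-made transliteration of the compact schema above rather than the output of a conversion tool, so treat it as a sketch. It assumes that the ns attribute on the grammar element plays the role of the default namespace declaration, and it spells out the <text/> content of the attribute pattern, which is the default when omitted.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.w3.org/1999/xhtml">

 <!-- html must contain head followed by body; attributes are unrestricted. -->
 <start>
  <element name="html">
   <zeroOrMore>
    <ref name="any_attribute"/>
   </zeroOrMore>
   <ref name="head"/>
   <ref name="body"/>
  </element>
 </start>

 <define name="head">
  <element name="head">
   <zeroOrMore>
    <ref name="any_attribute"/>
   </zeroOrMore>
   <ref name="any_content"/>
  </element>
 </define>

 <define name="body">
  <element name="body">
   <zeroOrMore>
    <ref name="any_attribute"/>
   </zeroOrMore>
   <ref name="any_content"/>
  </element>
 </define>

 <!-- Wildcard patterns: any mixture of text and elements, recursively. -->
 <define name="any_content">
  <interleave>
   <zeroOrMore>
    <ref name="any_element"/>
   </zeroOrMore>
   <text/>
  </interleave>
 </define>

 <define name="any_element">
  <element>
   <anyName/>
   <zeroOrMore>
    <ref name="any_attribute"/>
   </zeroOrMore>
   <ref name="any_content"/>
  </element>
 </define>

 <define name="any_attribute">
  <attribute>
   <anyName/>
   <text/>
  </attribute>
 </define>

</grammar>
```

Because any_attribute uses anyName, it is always referenced from inside zeroOrMore, as RELAX NG requires for attribute patterns with an unbounded name class.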
