Match arbitrary content using RELAX NG
Tested using xmllint on Debian (Etch, Lenny, Squeeze) and Ubuntu (Hardy, Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty, Oneiric, Precise, Quantal)
Tested using Jing on Debian (Squeeze) and Ubuntu (Lucid, Maverick, Natty, Oneiric, Precise, Quantal)
Tested using rnv on Debian (Lenny) and Ubuntu (Intrepid, Jaunty, Karmic, Lucid, Maverick, Natty)
Objective
To write a RELAX NG pattern that will match any content presented to it.
Background
An XML schema should ideally include constraints for all elements in a document, but this may not be worthwhile if your interest is confined to a particular part of the document structure. You may therefore want to use a wildcard pattern that is capable of matching any content.
Scenario
Suppose that you wish to check that an XHTML document has an html element at its root containing a head element and a body element, in that order. You do not wish to validate the content of the head and body elements, merely confirm that they exist. For simplicity you will allow the html, head and body elements to take any attributes.
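As an illustration of the scenario, the following is a minimal sketch of a document that a schema of this kind should accept. The attribute values and the content of head and body are invented for the example; only the presence and order of html, head and body matter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <!-- Any attributes are permitted on html, head and body. -->
  <head>
    <title>Example</title>
  </head>
  <body class="demo">
    <!-- The content here is not checked, so mixed text and elements are fine. -->
    <p>Any <em>mixed</em> content is accepted here.</p>
  </body>
</html>
<!-- Swapping head and body, or omitting either of them, should cause
     validation to fail. -->
```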
Method
The name of an element or attribute can be left unspecified in a RELAX NG schema by means of the anyName name class (written * in the compact syntax). There is no corresponding mechanism for leaving the content unspecified, so for both elements and attributes it is necessary to construct a pattern that matches any possible value. By making this pattern recursive it is possible to match content nested to any number of levels.
Elements may contain further elements, or text, or an interleaved mixture of both. Attributes may only contain text. These constraints can be expressed using either compact syntax:
any_content = any_element* & text
any_element = element * { any_attribute*, any_content }
any_attribute = attribute * { text }
or XML syntax:
<define name="any_content">
  <interleave>
    <zeroOrMore>
      <ref name="any_element"/>
    </zeroOrMore>
    <text/>
  </interleave>
</define>
<define name="any_element">
  <element>
    <anyName/>
    <zeroOrMore>
      <ref name="any_attribute"/>
    </zeroOrMore>
    <zeroOrMore>
      <ref name="any_content"/>
    </zeroOrMore>
  </element>
</define>
<define name="any_attribute">
  <attribute>
    <anyName/>
  </attribute>
</define>
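RELAX NG also provides a mixed pattern, which the specification treats as shorthand for interleaving its content with text. Assuming that equivalence, any_content could alternatively be sketched as shown below. This is a variant offered for comparison, not the form used in the rest of this article.

```xml
<!-- Alternative definition of any_content using the mixed shorthand.
     It is intended to match the same content as the interleave form. -->
<define name="any_content">
  <mixed>
    <zeroOrMore>
      <ref name="any_element"/>
    </zeroOrMore>
  </mixed>
</define>
```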
Here is a complete schema, written using compact syntax, to check for the presence of the html, head and body elements in an XHTML document:
default namespace = "http://www.w3.org/1999/xhtml"

start = element html { any_attribute*, head, body }
head = element head { any_attribute*, any_content }
body = element body { any_attribute*, any_content }

any_content = any_element* & text
any_element = element * { any_attribute*, any_content }
any_attribute = attribute * { text }
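Some validators expect the XML syntax rather than the compact syntax (xmllint is understood to be one of them). For that case, the complete schema might be written as in the sketch below. It is a hand translation of the compact schema, not the output of a conversion tool, and it assumes that the ns attribute on the grammar element is an acceptable stand-in for the default namespace declaration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hand translation of the compact-syntax schema. The ns attribute is
     inherited by the named elements html, head and body. -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.w3.org/1999/xhtml">
  <start>
    <element name="html">
      <zeroOrMore>
        <ref name="any_attribute"/>
      </zeroOrMore>
      <ref name="head"/>
      <ref name="body"/>
    </element>
  </start>
  <define name="head">
    <element name="head">
      <zeroOrMore>
        <ref name="any_attribute"/>
      </zeroOrMore>
      <ref name="any_content"/>
    </element>
  </define>
  <define name="body">
    <element name="body">
      <zeroOrMore>
        <ref name="any_attribute"/>
      </zeroOrMore>
      <ref name="any_content"/>
    </element>
  </define>
  <!-- Wildcard patterns: any mixture of text and elements, nested to any depth. -->
  <define name="any_content">
    <interleave>
      <zeroOrMore>
        <ref name="any_element"/>
      </zeroOrMore>
      <text/>
    </interleave>
  </define>
  <define name="any_element">
    <element>
      <anyName/>
      <zeroOrMore>
        <ref name="any_attribute"/>
      </zeroOrMore>
      <ref name="any_content"/>
    </element>
  </define>
  <define name="any_attribute">
    <attribute>
      <anyName/>
      <text/>
    </attribute>
  </define>
</grammar>
```

If this were saved as schema.rng (a hypothetical filename), a document could then be checked with a command along the lines of xmllint --noout --relaxng schema.rng document.xhtml.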