XPath and Default Namespace handling

Many questions about XPath expressions that do not return the expected results turn out to be related to the (ab)use of namespaces, and mostly to so-called "default namespaces". This article explains the problem and provides solutions using three popular XPath implementations: Jaxen, the JAXP XPathFactory and XSLT.

What's the Problem?

Let's assume the following XML:

<catalog>
  <cd>
    <artist>Sufjan Stevens</artist>
    <title>Illinois</title>
    <src>http://www.sufjan.com/</src>
  </cd>
  <cd>
    <artist>Stoat</artist>
    <title>Future come and get me</title>
    <src>http://www.stoatmusic.com/</src>
  </cd>
  <cd>
    <artist>The White Stripes</artist>
    <title>Get behind me satan</title>
    <src>http://www.whitestripes.com/</src>
  </cd>
</catalog>
With this document, the XPath expression '//cd' returns all the 'cd' elements, because it selects 'cd' elements that are not declared in a namespace.

Now let's take the same XML, but this time define all elements in the 'http://www.edankert.com/examples/' namespace.

Instead of prefixing every element (which would cause the same problem), we declare a so-called default namespace on the root element.

So the XML now looks like:

<catalog xmlns="http://www.edankert.com/examples/">
  <cd>
    <artist>Sufjan Stevens</artist>
    <title>Illinois</title>
    <src>http://www.sufjan.com/</src>
  </cd>
  <cd>
    <artist>Stoat</artist>
    <title>Future come and get me</title>
    <src>http://www.stoatmusic.com/</src>
  </cd>
  <cd>
    <artist>The White Stripes</artist>
    <title>Get behind me satan</title>
    <src>http://www.whitestripes.com/</src>
  </cd>
</catalog>
When we now use the same XPath expression as above, '//cd', nothing is returned. This is because an unprefixed name in an XPath expression only matches elements that are not in a namespace, and in the example above all the 'cd' elements are in the 'http://www.edankert.com/examples/' namespace.
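The difference is easy to reproduce. The following is a minimal sketch, assuming the JAXP DOM and XPath APIs that ship with the JDK; the class name DefaultNamespaceProblem and the shortened inline catalogs are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DefaultNamespaceProblem {

    // Shortened catalog without any namespace (illustrative data).
    static final String PLAIN =
            "<catalog>"
            + "<cd><title>Illinois</title></cd>"
            + "<cd><title>Future come and get me</title></cd>"
            + "</catalog>";

    // The same catalog with a default namespace declared on the root element.
    static final String NAMESPACED =
            "<catalog xmlns=\"http://www.edankert.com/examples/\">"
            + "<cd><title>Illinois</title></cd>"
            + "<cd><title>Future come and get me</title></cd>"
            + "</catalog>";

    /** Parses the XML namespace-aware and counts the nodes the expression selects. */
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(PLAIN, "//cd"));      // 2
        System.out.println(count(NAMESPACED, "//cd")); // 0
    }
}
```

Note that the parser has to be namespace-aware for this comparison to be meaningful.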

Namespace-Prefix mappings

We need some way to specify in our XPath expression that we are looking for all 'cd' elements in the 'http://www.edankert.com/examples/' namespace.

To handle this, the XPath specification allows us to use a QName to specify an element or an attribute. A QName can be either a name on its own, 'element', or a name with a prefix, 'pre:element'. The prefix, however, needs to be mapped to a namespace URI. So mapping the 'pre' prefix to the 'http://www.edankert.com/test' namespace URI allows us to find all 'element' elements defined in the 'http://www.edankert.com/test' namespace.
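Java models the same idea in its javax.xml.namespace.QName class, which makes it easy to see that the prefix is only a stand-in for the namespace URI. A small sketch; the class and field names are illustrative and not part of the original examples.

```java
import javax.xml.namespace.QName;

class QNameDemo {
    static final String TEST_NS = "http://www.edankert.com/test";

    // Same namespace URI and local name, different (arbitrary) prefixes.
    static final QName WITH_PRE = new QName(TEST_NS, "element", "pre");
    static final QName WITH_OTHER = new QName(TEST_NS, "element", "other");

    // A name on its own: no prefix and no namespace.
    static final QName BARE = new QName("element");

    public static void main(String[] args) {
        // QName equality compares the namespace URI and the local part only.
        System.out.println(WITH_PRE.equals(WITH_OTHER));      // true
        System.out.println(WITH_PRE.equals(BARE));            // false
        System.out.println(BARE.getNamespaceURI().isEmpty()); // true
    }
}
```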

In this case, for instance, we could use the 'edx' prefix and map it to the 'http://www.edankert.com/examples/' namespace URI. This results in the following XPath expression, which should return all 'cd' elements that are declared in the 'http://www.edankert.com/examples/' namespace: '//edx:cd'.

All XPath processors allow you to specify prefix-to-namespace mappings; how you do so depends on the specific implementation. See below for examples of how to map namespaces and prefixes using Jaxen (JDOM/dom4j/XOM), JAXP and XSLT.

Jaxen and Dom4J

The following code reads an XML document from the file system into an org.dom4j.Document and searches this document for 'cd' elements defined in the 'http://www.edankert.com/examples/' namespace.

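import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;
import org.jaxen.JaxenException;
import org.jaxen.SimpleNamespaceContext;
import org.jaxen.XPath;
import org.jaxen.dom4j.Dom4jXPath;

try {
  // Read the document into a dom4j Document.
  SAXReader reader = new SAXReader();
  Document document = reader.read( "file:catalog.xml");
  
  // Map the 'edx' prefix to the namespace URI used in the document.
  Map<String, String> map = new HashMap<String, String>();
  map.put( "edx", "http://www.edankert.com/examples/");
  
  XPath xpath = new Dom4jXPath( "//edx:cd");
  xpath.setNamespaceContext( new SimpleNamespaceContext( map));
  
  // The document is the context node for the search.
  List<?> nodes = xpath.selectNodes( document);
  
  ...
  
} catch ( JaxenException e) {
  // An error occurred parsing or executing the XPath
  ...
} catch ( DocumentException e) {
  // The document could not be read or is not well-formed.
  ...
}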

The first step is to create a SAXReader, which is used to read the 'catalog.xml' document from the file system and create a dom4j specific Document from it.

The next step is the same for all Jaxen implementations: create a HashMap of prefixes and namespace URIs.

To be able to use the Jaxen XPath functionality with dom4j we need to create a dom4j-specific XPath object (Dom4jXPath), passing our XPath expression into the constructor.

Now that we have created the XPath object, we can provide the map of prefixes and namespace URIs to the XPath engine by wrapping it in a SimpleNamespaceContext object, the default implementation of the Jaxen NamespaceContext interface.

The last step is to perform the search by calling the 'selectNodes()' method on the XPath object, passing the complete dom4j Document as the context node. Any node in the document can be used as the context node.

Jaxen and XOM

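XOM is the newest kid on the block of the simplified Java DOM APIs; its design promises an interface that is easy to learn and use.

import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.ParsingException;
import org.jaxen.JaxenException;
import org.jaxen.SimpleNamespaceContext;
import org.jaxen.XPath;
import org.jaxen.xom.XOMXPath;

try {
  // Read the document into a XOM Document.
  Builder builder = new Builder();
  Document document = builder.build( "file:catalog.xml");
  
  // Map the 'edx' prefix to the namespace URI used in the document.
  Map<String, String> map = new HashMap<String, String>();
  map.put( "edx", "http://www.edankert.com/examples/");
  
  XPath xpath = new XOMXPath( "//edx:cd");
  xpath.setNamespaceContext( new SimpleNamespaceContext( map));
  
  List<?> nodes = xpath.selectNodes( document);
  
  ...
  
} catch ( JaxenException e) {
  // An error occurred parsing or executing the XPath
  ...
} catch ( IOException e) {
  // An error occurred opening the document
  ...
} catch ( ParsingException e) {
  // An error occurred parsing the document
  ...
}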

We need to create a Builder object to read the 'catalog.xml' document from the file system and to create a XOM-specific Document.

Next we create the HashMap of prefixes and namespace URIs.

To be able to use the Jaxen XPath functionality with XOM, we need to create a XOM-specific XPath object (XOMXPath), passing our XPath expression into the constructor.

After we have created the XPath object, we again provide the map with prefix and namespace-uris to the XPath engine, wrapping this map in the SimpleNamespaceContext object.

Finally we perform the search by calling the 'selectNodes()' method on the XPath object, passing the XOM Document as the context node for this method.

Jaxen and JDOM

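JDOM was the first of the simplified XML APIs.

import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.jaxen.JaxenException;
import org.jaxen.SimpleNamespaceContext;
import org.jaxen.XPath;
import org.jaxen.jdom.JDOMXPath;
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

try {
  // Read the document into a JDOM Document.
  SAXBuilder builder = new SAXBuilder();
  Document document = builder.build( "file:catalog.xml");
  
  // Map the 'edx' prefix to the namespace URI used in the document.
  Map<String, String> map = new HashMap<String, String>();
  map.put( "edx", "http://www.edankert.com/examples/");
  
  XPath xpath = new JDOMXPath( "//edx:cd");
  xpath.setNamespaceContext( new SimpleNamespaceContext( map));
  
  List<?> nodes = xpath.selectNodes( document);
  
  ...
  
} catch ( JaxenException e) {
  // An error occurred parsing or executing the XPath
  ...
} catch ( IOException e) {
  // An error occurred opening the document
  ...
} catch ( JDOMException e) {
  // An error occurred parsing the document
  ...
}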

First we create a JDOM-specific Document using the SAXBuilder object.

Next we create the HashMap of prefixes and namespace URIs and a JDOM-specific XPath object (JDOMXPath), passing our XPath expression into the constructor.

After this, we can provide the map of prefixes and namespace URIs to the XPath engine by wrapping it in the SimpleNamespaceContext object.

Finally we perform the search by calling the 'selectNodes()' method on the XPath object, passing the JDOM Document as the context node for this method.

JAXP XPathFactory

Since version 1.3, JAXP also provides a generic mechanism to perform XPath searches on XML Object Models.

try {
  DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
  domFactory.setNamespaceAware( true);
  
  DocumentBuilder builder = domFactory.newDocumentBuilder();
  
  Document document = builder.parse( new InputSource( "file:catalog.xml"));
  
  XPathFactory factory = XPathFactory.newInstance();
  XPath xpath = factory.newXPath();
  xpath.setNamespaceContext( new NamespaceContext() {
    public String getNamespaceURI( String prefix) {
      if ( prefix.equals( "edx")) {
        return "http://www.edankert.com/examples/";
      } else if ... 
        ...
      }
      
      return XMLConstants.NULL_NS_URI;
    }
  
    public String getPrefix( String namespaceURI) {
      if ( namespaceURI.equals( "http://www.edankert.com/examples/")) {
        return "edx";
      } else if ... 
        ...
      }  
    
      return null;
    }
  
    public Iterator getPrefixes( String namespaceURI) {
      ArrayList list = new ArrayList();
    
      if ( namespaceURI.equals( "http://www.edankert.com/examples/")) {
        list.add( "edx");
      } else if ... 
        ...
      }
    
      return list.iterator();
    }
  });
  
  Object nodes = xpath.evaluate( "//edx:cd", document.getDocumentElement(), 
                                 XPathConstants.NODESET);
  
  ...
  
} catch ( ParserConfigurationException e) {
  ...
} catch ( XPathExpressionException e) {
  ...
} catch ( SAXException e) {
  ...
} catch ( IOException e) {
  ...
}

First we build an org.w3c.dom.Document using the JAXP DocumentBuilderFactory, making sure namespace processing is enabled.

We can now search this document by creating an XPath object using the XPathFactory.

To provide the prefix and namespace URI mappings to the XPath engine we need to implement the NamespaceContext interface ourselves, because there is currently no default implementation available. This means implementing the getNamespaceURI, getPrefix and getPrefixes methods and making sure they return the correct values, including for the 'xmlns' and 'xml' namespace prefixes.

After we have provided the NamespaceContext to the XPath engine, we can evaluate our XPath expression using the evaluate method, passing the expression, using the root element as the starting context and specifying XPathConstants.NODESET as the desired return type, which yields a NodeList.
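Because the interface has to be implemented by hand, a small map-backed helper can save repetition. The following is a sketch under assumptions, not a JAXP class: MapNamespaceContext and JaxpNamespaceExample are hypothetical names, and the inline catalog is shortened. It also shows how each matched element can serve as the context node of a relative expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JaxpNamespaceExample {

    /** Hypothetical helper: a NamespaceContext backed by a prefix-to-URI map. */
    static final class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> uriByPrefix = new HashMap<>();

        MapNamespaceContext(Map<String, String> mappings) {
            uriByPrefix.putAll(mappings);
            // The bindings of 'xml' and 'xmlns' are fixed by the NamespaceContext contract.
            uriByPrefix.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
            uriByPrefix.put(XMLConstants.XMLNS_ATTRIBUTE, XMLConstants.XMLNS_ATTRIBUTE_NS_URI);
        }

        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            // Unbound prefixes resolve to "no namespace".
            return uriByPrefix.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
        }

        @Override
        public String getPrefix(String namespaceURI) {
            Iterator<String> prefixes = getPrefixes(namespaceURI);
            return prefixes.hasNext() ? prefixes.next() : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            if (namespaceURI == null) {
                throw new IllegalArgumentException("namespaceURI must not be null");
            }
            List<String> prefixes = new ArrayList<>();
            for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
                if (entry.getValue().equals(namespaceURI)) {
                    prefixes.add(entry.getKey());
                }
            }
            return Collections.unmodifiableList(prefixes).iterator();
        }
    }

    static final String NS = "http://www.edankert.com/examples/";
    static final String XML = "<catalog xmlns=\"" + NS + "\">"
            + "<cd><title>Illinois</title></cd>"
            + "<cd><title>Future come and get me</title></cd>"
            + "</catalog>";

    /** Returns the title of every cd element in the inline catalog. */
    static List<String> titles() {
        try {
            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            domFactory.setNamespaceAware(true);
            Document document = domFactory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(
                    new MapNamespaceContext(Collections.singletonMap("edx", NS)));

            NodeList cds = (NodeList) xpath.evaluate("//edx:cd", document, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < cds.getLength(); i++) {
                // Each matched cd element is the context node of a relative expression.
                titles.add(xpath.evaluate("edx:title", cds.item(i)));
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles()); // [Illinois, Future come and get me]
    }
}
```

Keeping the mappings in a map mirrors what Jaxen's SimpleNamespaceContext does in the earlier examples, so the same prefix table can be reused for both engines.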

XSLT

XPath was originally designed to be used with XSLT. This, and perhaps the fact that XSLT is itself an XML vocabulary, might explain why declaring prefix to namespace URI mappings in XSLT seems very natural.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="//edx:cd" xmlns:edx="http://www.edankert.com/examples/">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>

To map the 'edx' prefix to a namespace URI we simply declare it with an xmlns:edx attribute, using the normal XML namespace mechanism.

To get the same output as for the previous examples, we can use an xsl:template that matches our //edx:cd XPath expression.
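To try this from Java, a stylesheet can be applied with the javax.xml.transform API that ships with the JDK. The following is a rough sketch: the class name XsltNamespaceExample, the shortened catalog and the stylesheet itself are made up for illustration (it is not the catalog.xsl from the archive), and it uses the pattern 'edx:cd' to print one title per line.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltNamespaceExample {

    // Shortened catalog using a default namespace (illustrative data).
    static final String XML =
            "<catalog xmlns=\"http://www.edankert.com/examples/\">"
            + "<cd><title>Illinois</title></cd>"
            + "<cd><title>Future come and get me</title></cd>"
            + "</catalog>";

    // Hypothetical stylesheet: the 'edx' prefix is declared with plain xmlns syntax.
    static final String XSL =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " xmlns:edx=\"http://www.edankert.com/examples/\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"edx:cd\">"
            + "<xsl:value-of select=\"edx:title\"/>"
            + "<xsl:text>&#10;</xsl:text>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the result as a string. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the title of each cd element on its own line.
        System.out.print(transform(XML, XSL));
    }
}
```

The prefix in the stylesheet ('edx') does not have to match anything in the source document; only the namespace URI has to be the same.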

Conclusion

So, to be able to use XPath expressions on XML content defined in a (default) namespace, we need to specify a namespace prefix mapping. As we have seen, it does not matter what prefix the namespace is mapped to.

This same mechanism can also be used to search for elements that have been defined using a different prefix. This means that the above examples will also work on the following XML where instead of using a default namespace, the namespace has been mapped to the 'examples' prefix:

<examples:catalog xmlns:examples="http://www.edankert.com/examples/">
  <examples:cd>
    <examples:artist>Sufjan Stevens</examples:artist>
    <examples:title>Illinois</examples:title>
    <examples:src>http://www.sufjan.com/</examples:src>
  </examples:cd>
  <examples:cd>
    <examples:artist>Stoat</examples:artist>
    <examples:title>Future come and get me</examples:title>
    <examples:src>http://www.stoatmusic.com/</examples:src>
  </examples:cd>
  <examples:cd>
    <examples:artist>The White Stripes</examples:artist>
    <examples:title>Get behind me satan</examples:title>
    <examples:src>http://www.whitestripes.com/</examples:src>
  </examples:cd>
</examples:catalog>

Using the XPath expression '//edx:cd' and namespace prefix mapping from the examples above will again return all 'cd' elements that are declared in the 'http://www.edankert.com/examples/' namespace.
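Both points, that the prefix in the expression is arbitrary and that the prefix used in the document is irrelevant, can be checked with a short JAXP sketch. The class name PrefixIndependence, the 'foo' prefix and the stripped-down catalogs are assumptions made for this illustration, and the NamespaceContext is deliberately minimal.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrefixIndependence {
    static final String NS = "http://www.edankert.com/examples/";

    // Catalog using a default namespace (cd elements left empty for brevity).
    static final String DEFAULT_NS_XML =
            "<catalog xmlns=\"" + NS + "\"><cd/><cd/></catalog>";

    // The same catalog using the 'examples' prefix instead.
    static final String PREFIXED_XML =
            "<examples:catalog xmlns:examples=\"" + NS + "\">"
            + "<examples:cd/><examples:cd/></examples:catalog>";

    /** Counts cd elements in NS, using the given prefix in the XPath expression. */
    static int count(String xml, String prefix) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Minimal context: binds only the one prefix chosen by the caller.
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String p) {
                    return prefix.equals(p) ? NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? prefix : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                            ? Collections.singletonList(prefix).iterator()
                            : Collections.<String>emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                    "//" + prefix + ":cd", document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(DEFAULT_NS_XML, "edx")); // 2
        System.out.println(count(PREFIXED_XML, "edx"));   // 2
        System.out.println(count(PREFIXED_XML, "foo"));   // 2
    }
}
```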

Sample Code

Download any of the archives to try out the examples above.

The archives consist of the ./catalog.xml document and 4 Java code examples (in the ./src directory) to search the document using DOM, JDOM, dom4j and XOM.

To run these examples, please use the following command-line options:

Model   Command Line
DOM     java -cp xpath-examples.jar com.edankert.examples.dom.XPathExample
JDOM    java -cp xpath-examples.jar;lib/jdom.jar;lib/jaxen-1.1.1.jar com.edankert.examples.jdom.XPathExample
dom4j   java -cp xpath-examples.jar;lib/dom4j-1.6.1.jar;lib/jaxen-1.1.1.jar com.edankert.examples.dom4j.XPathExample
XOM     java -cp xpath-examples.jar;lib/xom-1.0.jar;lib/jaxen-1.1.1.jar com.edankert.examples.xom.XPathExample

The archive also contains the example XML Stylesheet (./catalog.xsl). To process the XML with the stylesheet please invoke your favorite XML Processor from the command-line or use the transform.xhp project included in the ./xmlhammer-projects directory.

To process the transform.xhp project and the xpath.xhp project that is also included, you will need to have the XML Hammer application installed. This can be downloaded from:

http://www.xmlhammer.org/downloads.html.


Source: http://www.edankert.com/defaultnamespaces.html#Jaxen_and_Dom4J




Posted by linuxism