Programming
When working with XML, there are two standards you can
consider:
-
the XML Document Object Model (XML DOM).
With this specification, you access the XML data
through a hierarchical, object-oriented interface,
so you can traverse the hierarchy of the
document in any order (i.e., you can
also step back up to higher levels).
-
the Simple API for XML (SAX).
With this specification, the parser reads the
XML document sequentially and reports each element
to your code as an event. You cannot go back, nor
can you skip a sub-hierarchy:
all elements must be processed.
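The difference between the two access styles can be sketched in a few lines. The example below is only an illustration using Python's standard `xml.dom.minidom` and `xml.sax` modules; the sample document, the `TitleHandler` class and both function names are hypothetical and not part of either specification.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, made up for this illustration.
SAMPLE = (
    "<library>"
    "<book><title>XML Basics</title></book>"
    "<book><title>SAX in Depth</title></book>"
    "</library>"
)


def dom_titles_with_parent(xml_text):
    """DOM style: build the whole tree first, then move around it freely."""
    doc = minidom.parseString(xml_text)
    result = []
    for title in doc.getElementsByTagName("title"):
        # Step back up the hierarchy from the node we found.
        parent = title.parentNode
        result.append((parent.tagName, title.firstChild.data))
    return result


class TitleHandler(xml.sax.ContentHandler):
    """SAX style: events arrive in document order; we keep only what we store."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it piece by piece.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def sax_titles(xml_text):
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles
```

Note that the DOM function can ask a node for its parent after the fact, while the SAX handler has to remember for itself whether it is currently inside a `title` element.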
Each method has its benefits and drawbacks:
- XML DOM (cons)
-
The XML document must be
well-formed; otherwise the parser typically
fails to build the DOM tree, and its nodes
cannot be accessed.
-
The XML document must be loaded and parsed,
and the complete DOM tree must be built,
before you can access even a
single node. This means that DOM is rather
slow and memory intensive.
- XML DOM (pros)
-
You can step through the document at will;
you can access all nodes at any time.
-
You can easily make changes to the DOM
hierarchy, and easily save these changes
as well. E.g., you can alter values, or
rearrange, add and delete the nodes
themselves.
-
You can use XPath (or XSL queries) to find
groups of nodes; you do not need to write
your own search logic in your application
to find specific data.
-
You can use XSL Transformations (also
known as just XSL or XSLT) to
declaratively alter the XML structure
with XSL templates. For example, the
following rather useless template
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="nest">
    <sort>
      <xsl:for-each select=".//*">
        <xsl:sort select="name()"/>
        <xsl:element name="{name()}"/>
      </xsl:for-each>
    </sort>
  </xsl:template>
</xsl:stylesheet>
changes
into
- SAX (cons)
-
The SAX(2) specification is not widely
known, nor is it well documented. You're
pretty much on your own here.
-
As said before, SAX processes the XML
document strictly from start to finish, forcing
you to provide your own caching
mechanisms if you want to search the
XML document.
- SAX (pros)
-
SAX is generally a lot faster than DOM and has
far less overhead, because it does not build
a tree of the whole document in memory.
-
SAX is (theoretically speaking) better
equipped to handle malformed XML
documents. When you've got a malformed
XML document, it's likely the DOM method
will be useless, yet SAX will only
stumble over the errors as they occur.
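The last point, how the two methods behave on a malformed document, can be tried out with a small sketch. It again assumes Python's standard `xml.sax` and `xml.dom.minidom` modules; the broken input, the `Recorder` class and the function names are hypothetical, and other parsers may report errors differently.

```python
import xml.sax
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical malformed input: <b> is never closed before </root>.
BROKEN = "<root><a>first</a><b>second</root>"


class Recorder(xml.sax.ContentHandler):
    """Remember the name of every element whose start tag was reported."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


def sax_elements_before_error(xml_text):
    """Return the elements SAX reported, plus the parse error (or None)."""
    handler = Recorder()
    error = None
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), handler)
    except xml.sax.SAXParseException as exc:
        # Events delivered before the error are still in handler.seen.
        error = exc
    return handler.seen, error


def dom_parse_or_none(xml_text):
    """Return a DOM document, or None if no tree could be built at all."""
    try:
        return minidom.parseString(xml_text)
    except ExpatError:
        return None
```

With the broken input above, the SAX handler has already received the start events for the elements preceding the error when the exception is raised, whereas the DOM call yields no tree at all.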
There are several excellent free implementations of XML
parsers and processors, such as the MSXML3 implementation
by Microsoft; see the
References section.