XML Retrieval and Processing Comparison

I’m clearing my whiteboard, and need the space this was occupying, so I’m dumping it here for future reference. The following table compares the time it takes to retrieve XML from various sources (XML databases and the like) as well as to perform various types of processing (nothing, save as file, convert to DOM…). This table was very useful when we were coming up with our current XML storage strategy.

XML Store	Retrieving	Process	Time per query (ms, 289 records per query)	Time per record (ms)
XStreamDB	Minimal XML (handful of elements)	DOM via DOM4JKVC	3038.9	10.52
XStreamDB	Minimal XML (handful of elements)	NSDictionary	2684.6	9.28
XStreamDB	Full LOM (entire XML Document)	DOM4JKVC	3371.5	11.66
XStreamDB	Full LOM (entire XML Document)	null (no processing)	1854.3	6.41
XStreamDB	Full LOM (entire XML Document)	W3CDOM	3513.8	12.16
XStreamDB	Full LOM (entire XML Document)	NSDictionary	N/A *	14.28 *
JUD		Save File	33 minutes / 4056 records	488.17

* The NSDictionary conversion of the full LOM was flaky at best: it refused to convert some records to NSDictionary objects, so the multiple conversion method failed.
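For reference, the per-record column is just the per-query time divided by the 289 records each query returned, and the JUD figure is 33 minutes spread over 4056 records. A minimal sketch of that arithmetic follows; the dictionary keys are labels made up here for readability, and the numbers are copied from the table. A few table values look truncated rather than rounded, so the last digit may differ by one.

```python
# Per-query times (ms) copied from the table; each query returned 289 records.
RECORDS_PER_QUERY = 289

per_query_ms = {
    "Minimal XML / DOM4JKVC": 3038.9,
    "Minimal XML / NSDictionary": 2684.6,
    "Full LOM / DOM4JKVC": 3371.5,
    "Full LOM / null": 1854.3,
    "Full LOM / W3CDOM": 3513.8,
}

# Time per record is the per-query time divided by the batch size.
per_record_ms = {
    name: total / RECORDS_PER_QUERY for name, total in per_query_ms.items()
}

# JUD: 33 minutes for 4056 records, converted to milliseconds per record.
jud_per_record_ms = 33 * 60 * 1000 / 4056

for name, ms in per_record_ms.items():
    print(f"{name}: {ms:.2f} ms/record")
print(f"JUD / Save File: {jud_per_record_ms:.2f} ms/record")
```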

All XStreamDB tests were performed using a WebObjects application that ran some Java code to perform an XQuery against an XStreamDB database containing a copy of the CAREO repository (4056 records).

DOM4JKVC: a simple version of what has become the JavaEOXMLSupport.framework. It uses DOM4J to provide a Key Value Coding interface around a DOM Element, and involves parsing an XML string into a DOM Element.

NSDictionary: Uses WebObjects’ built-in XML-NSDictionary conversion (without a mapping file).

W3CDOM: Simple conversion of the XML document into a W3CDOM Document.

Save File: Just save the XML string to the filesystem with no additional processing.
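To make the DOM4JKVC idea a bit more concrete, here is a rough Python analogy of a Key Value Coding wrapper around a parsed element. It is not the DOM4J or JavaEOXMLSupport code: `KeyValueElement`, `value_for_key`, `value_for_key_path`, and the sample element names are all hypothetical, and `xml.etree.ElementTree` stands in for DOM4J.

```python
import xml.etree.ElementTree as ET


class KeyValueElement:
    """Hypothetical KVC-style wrapper around a parsed XML element."""

    def __init__(self, element):
        self.element = element

    @classmethod
    def from_string(cls, xml_string):
        # Parsing the XML string into an element tree is part of the cost.
        return cls(ET.fromstring(xml_string))

    def value_for_key(self, key):
        child = self.element.find(key)
        if child is None:
            return None
        # Leaf elements yield their text; elements with children yield a wrapper.
        if len(child) == 0:
            return child.text
        return KeyValueElement(child)

    def value_for_key_path(self, key_path):
        # A dotted path such as "general.title" is resolved one key at a time.
        value = self
        for key in key_path.split("."):
            if not isinstance(value, KeyValueElement):
                return None
            value = value.value_for_key(key)
        return value


record = KeyValueElement.from_string(
    "<lom><general><title>Sample record</title></general></lom>"
)
print(record.value_for_key_path("general.title"))
```

The wrapper itself does very little work per lookup, which fits with the wrapping step not adding noticeable time on top of the parse.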

The JUD test was performed by using a Python script that pulled every document out of the live CAREO repository, and saved them to the local filesystem as .xml files.
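The original script isn't reproduced here, but its shape is simple enough to sketch. In the version below, `save_all` and `fake_fetch` are hypothetical names, and `fake_fetch` is a placeholder for the real request to the repository, whose details are not given in this post. It assumes a non-empty list of record ids.

```python
import tempfile
import time
from pathlib import Path


def save_all(record_ids, fetch, out_dir):
    """Fetch each record's XML string and write it to out_dir as <id>.xml.

    Returns the average time per record in milliseconds.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    start = time.perf_counter()
    for record_id in record_ids:
        xml_string = fetch(record_id)  # no parsing, just the raw string
        (out_dir / f"{record_id}.xml").write_text(xml_string, encoding="utf-8")
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms / len(record_ids)


def fake_fetch(record_id):
    # Placeholder for the real repository request (an assumption for the demo).
    return f"<lom><general><identifier>{record_id}</identifier></general></lom>"


with tempfile.TemporaryDirectory() as tmp:
    ms_per_record = save_all(range(1, 11), fake_fetch, tmp)
    saved_files = sorted(p.name for p in Path(tmp).glob("*.xml"))

print(f"{len(saved_files)} files, {ms_per_record:.3f} ms/record")
```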

All tests were performed on my PowerBook, with the XML being retrieved from a separate server (XStreamDB on our commons webserver, JUD on the U of C IT appserver).

This was nowhere near a rigorously controlled test, with extra variables popping in all over the place. The goal was to get an order-of-magnitude idea of which strategy was fastest and which was slowest. Basically, I needed to see if pulling the full LOM from XStreamDB and converting the whole shebang into a DOM would kill us. Turns out it’s over an order of magnitude faster than what CAREO had been doing… I also needed to get an idea of the additional time it would take to wrap a DOM Element with the Key Value Coding interface. Turns out that didn’t add anything, and somehow actually shaved some time off (11.66 ms per record for DOM4JKVC vs. 12.16 ms for W3CDOM), although that is due to the DOM4J vs. W3C class performance.
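A comparison like this can be sketched as a small timing harness. The version below is only an approximation of the idea: `time_process` and the `strategies` names are hypothetical, Python's `xml.dom.minidom` and `ElementTree` stand in for the Java W3C DOM and DOM4J classes, and it times processing alone, whereas the table's numbers also include retrieval from a remote server.

```python
import time
import xml.dom.minidom
import xml.etree.ElementTree as ET


def time_process(process, documents, repeats=1):
    """Return average milliseconds per document for one processing strategy."""
    start = time.perf_counter()
    for _ in range(repeats):
        for doc in documents:
            process(doc)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms / (repeats * len(documents))


# Hypothetical stand-ins for the "Process" column of the table.
strategies = {
    "null (no processing)": lambda doc: doc,
    "W3C-style DOM (minidom)": xml.dom.minidom.parseString,
    "ElementTree": ET.fromstring,
}

sample = "<lom><general><title>Sample record</title></general></lom>"
documents = [sample] * 289  # same batch size as the queries in the table

results = {name: time_process(fn, documents) for name, fn in strategies.items()}
for name, ms in results.items():
    print(f"{name}: {ms:.4f} ms/record")
```

Absolute numbers from a harness like this mean little on their own; it is the relative ordering of the strategies that matters.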

I was surprised to find that the combination of XStreamDB + DOM was approximately 41 times faster than the JUD (a MySQL database that stores any XML document by breaking it into Elements, Attributes, and some other meta stuff, and reconstitutes it on the fly via a PHP script).
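The "approximately 41 times" figure can be checked against the per-record column of the table. A quick sketch, with variable names made up here and values copied from the table:

```python
# Per-record times (ms) copied from the table.
jud_ms = 488.17
xstreamdb_dom4jkvc_ms = 11.66  # Full LOM, DOM4JKVC
xstreamdb_w3cdom_ms = 12.16    # Full LOM, W3CDOM

# How many times faster XStreamDB + DOM is than the JUD, per record.
speedup_dom4jkvc = jud_ms / xstreamdb_dom4jkvc_ms
speedup_w3cdom = jud_ms / xstreamdb_w3cdom_ms

print(f"JUD vs. DOM4JKVC: {speedup_dom4jkvc:.1f}x")
print(f"JUD vs. W3CDOM:   {speedup_w3cdom:.1f}x")
```

Both ratios land between 40 and 42, so "approximately 41" holds for either DOM flavour, and both are comfortably more than one order of magnitude.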

Once I’ve got JavaXStreamDBAdaptor.framework and JavaEOXMLSupport.framework polished off a bit more, I’ll add the metrics for their performance to this chart.