I’ve spent the morning building a prototype WebObjects app to act as an XML metadata server. I’ve embedded eXistDB into the application, and it created the necessary database files and indices for me.
Then I wrote a short method to import XML documents from a path (with the added bonus of importing a whole directory if one is given). There are now 3600+ records in the embedded database.
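For illustration, here is a minimal sketch of what a "file or whole directory" import helper could look like. It is not the method from the prototype: the class name XmlImporter and the store callback are hypothetical, and the callback simply stands in for whatever call actually writes a document into the database.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

final class XmlImporter {
    // Hypothetical helper: hands every .xml file found at `path` to `store`
    // and returns how many files were handed over.
    // `store` is a placeholder for the real "write this document to the database" call.
    static int importPath(Path path, Consumer<Path> store) throws IOException {
        List<Path> files = new ArrayList<>();
        if (Files.isDirectory(path)) {
            // A directory was given: pick up the XML files directly inside it.
            try (Stream<Path> entries = Files.list(path)) {
                entries.filter(Files::isRegularFile)
                       .filter(XmlImporter::isXml)
                       .sorted()
                       .forEach(files::add);
            }
        } else if (Files.isRegularFile(path) && isXml(path)) {
            // A single document was given.
            files.add(path);
        }
        for (Path file : files) {
            store.accept(file);
        }
        return files.size();
    }

    // Case-insensitive check on the file extension.
    private static boolean isXml(Path file) {
        return file.getFileName().toString().toLowerCase().endsWith(".xml");
    }
}
```

This version does not descend into subdirectories; swapping Files.list for Files.walk would be one way to do that.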
And boy, is it fast. Queries are almost instantaneous (~100ms typical), but document retrievals are a wee bit slower, increasing linearly with the number of hits. I haven’t added any limits, so you can do a query for something lame like “*a*” and get the whole database back in one page.
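Since retrieval cost grows with the number of hits, one obvious limit is to retrieve only the hits for the page being displayed. The sketch below shows the slicing arithmetic only; ResultPager and its parameters are hypothetical names, and the hit list stands in for whatever handle the query actually returns.

```java
import java.util.Collections;
import java.util.List;

final class ResultPager {
    // Returns the hits belonging to a 1-based page number.
    // Only these hits would then need their documents retrieved.
    static <T> List<T> page(List<T> hits, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        // Use long arithmetic so large page numbers cannot overflow.
        long from = (long) (pageNumber - 1) * pageSize;
        if (from >= hits.size()) {
            return Collections.emptyList(); // past the last page
        }
        int to = (int) Math.min(hits.size(), from + pageSize);
        return hits.subList((int) from, to);
    }
}
```

With a page size of 25, a query with hundreds of hits would only ever retrieve 25 documents per request, so the retrieval time would depend on the page size rather than on the total hit count.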
The embedded eXist database doesn’t use the XML-RPC API like the standalone database does, so there isn’t any marshalling/unmarshalling overhead. Just native Java calls.
Considering that document retrieval isn’t optimized (it’s basically just a debug “dump the entire LOM as the item to display”), performance is quite acceptable already.
Here are the stats from a simple search for “biology”:
Query: //text() &= 'biology'
Query Time: 124 ms
Retrieval Time: 6059 ms
That retrieval time includes pulling the ENTIRE LOM for each and every one of the 377 results, which works out to roughly 16 ms per record.
UPDATE: Just ran some more tests, and cracked open the debug log file. Here’s what I found:
09 Dec 2003 18:42:00,401 – loading 3647 documents from 2collections took 3ms.
09 Dec 2003 18:42:00,411 – found image: 2800 in 4ms.
09 Dec 2003 18:42:00,414 – found nasa: 9 in 0ms.
09 Dec 2003 18:42:00,417 – found space: 13 in 1ms.
query: //text() &= 'image nasa space'
query time: 86
retrieve time: 22
Finding 2800 records containing the string “image” took 4ms. Holy freaking cow.
From the various test queries I’ve run, the vast majority of the time is spent retrieving the documents out of the database. The query itself runs extremely fast, but yanking the entire LOM out takes some time. I’m going to look at ways to pull only selected XPath values rather than the full record – that may be faster…
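To make the "only pull an XPath value" idea concrete, here is a small sketch using the standard javax.xml.xpath API from the JDK. It evaluates the expression against an XML string on the client side, so it only illustrates the shape of the idea; in the real app the expression would be handed to the database so that the full record never has to be serialized. The LomFields class and the simplified, namespace-free record are assumptions made up for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class LomFields {
    // Evaluates an XPath expression against an XML string and returns
    // the string value of the first matching node ("" if nothing matches).
    static String extract(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Simplified, made-up record; real LOM documents are larger and namespaced.
        String record = "<lom><general><title>Cell biology</title>"
                + "<description>A long description...</description></general></lom>";
        // Pull just the title instead of displaying the whole record.
        System.out.println(extract(record, "/lom/general/title"));
    }
}
```

A result list built this way would show one short field per hit instead of the whole record.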