eXist: Open Source XML Database - D'Arcy Norman, PhD

I initially sent this as an email to the group, but thought it might serve better on the weblog…

I’ve been playing around with eXist today. Holy crap.

I used Rob’s JUD export script to suck all 3600+ records out of the CAREO JUD (took almost 2 hours to process that), then ran the import function on eXist (took maybe 5 minutes to import them all).

It looks like it’s going to be able to do some pretty freaky stuff, search-wise. I’ve been playing around with some pretty loose XPath queries, and it returns excellent hits, pretty darned fast. It can be slow if I request, say, all documents with the letter “a” in them somewhere, but for normal queries, it’s stinky fast.

Even for some pretty compound queries, it’s fast, too.

Here’s an example:

document(*)//text() &= 'image biology water'

This basically says: Return any XML document that contains, somewhere in the text of its elements, all three of the strings “image”, “biology”, and “water”.

It might match “image” in /lom/technical/format, “biology” in /lom/classification/keywords/langstring, and “water” in /lom/general/description/langstring.
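To make that behaviour concrete, here is a rough Python sketch of the matching as I described it above. It is not how eXist implements it; the helper names (tokens, matches_all), the sample record, and the case-insensitive word splitting are all my own assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET


def tokens(text):
    """Split text into a set of lowercase alphanumeric words (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_all(xml_source, keywords):
    """Return True if every keyword appears in some text node of the document.

    The keywords may be spread across different elements, which mirrors
    the behaviour described above (not eXist's actual code).
    """
    root = ET.fromstring(xml_source)
    found = set()
    for text in root.itertext():
        found |= tokens(text)
    return tokens(keywords) <= found


# Hypothetical, heavily trimmed LOM-style record for illustration only.
record = """
<lom>
  <general><description><langstring>Water cycle diagram</langstring></description></general>
  <technical><format>image/jpeg</format></technical>
  <classification><keywords><langstring>biology</langstring></keywords></classification>
</lom>
"""

print(matches_all(record, "image biology water"))  # all three words are present
print(matches_all(record, "biology video"))        # "video" is missing
```

Each keyword is found in a different element here, which is exactly the kind of hit the query above returns.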

This particular search returned 60 hits, taking a total of 638 ms of processing. And that’s without having added any indexing.

I did another search for:
document(*)//text() &= 'biology video'

and it found records that would otherwise have been difficult to identify as videos (the technical/location had a value that had “/VIDEOS/” in it, so it matched).

Also, it seems to cache search results on the fly, so subsequent searches for the same thing return instantly. Very nice.