Expose Blob

It just struck me that the fancy Expose Blob on Mac OS X 10.3 is really quite useless. It’s not that convenient to mouse over to it and click the button. Much easier to just hit F9 or whatever you’ve set your Expose key to…

It would be MUCH more useful if you had a touch screen. Like, say, a SMART Board, or, say, a tablet. The Newton could have used the Expose Blob.


I’m just saying… Wonder what’s under wraps for MWSF next week…

eXistDB Prototype Database Server

I’ve got a prototype eXistDB server, built as a WebObjects application, running on an iMac on my desk. Works pretty well, and it does some great XQuery stuff. I’ve entered the CAREO metadata, current as of a week ago, so it’s got 3733 IMS LOM records to play with.

Check it out.

It has a simple search (just enter a term and hit the button), as well as a generic XQuery entry panel. Feel free to experiment with any XQuery statements you want (do a simple search and look at the source HTML of the result page for a starting point). Go ahead and try some funky boolean searches like “earth image satellite”. There are still some refinements left to go (like limits on search results – it’s currently possible to return the entire database as the result of a query – not recommended). I also have to play around with handling multiple schemas (IMS LOM, DublinCore, MPEG7, IMS CP, METS, …) for both querying and retrieval.

Basically, eXist is pretty darned good, and Wolfgang (the developer) is quite responsive. Not sure how it compares to XStreamDB performance-wise, but it does beat it on the cost…

Merry Christmas!

I’ve been trying to be offline for the last few days (and will keep trying until Jan. 5, when I’m Back In The Office). I’ll likely be lurking online, though.

Took Evan to see Santa last night. He has been fine with Santa before – sat on his knee several times, for lotsa photos – but last night, he decided that Santa Is Evil. Crying and fussing must be associated with attempted Santa Knee Sittings. Oh, well… It was still pretty cute.

Evan on Santa's Knee, 2003

Merry Christmas, everyone, and I’ll see you in 2004 (hopefully refreshed and raring to go…)

Federated Identity Management

Looking into techniques to allow us to decentralize user management in cross-institutional (and non-institutional) software, such as APOLLO.

Here are some links I’ve come across on the topic:

Many of these articles look like corporate shovelware “Read about how smart we are – give us money” but maybe there’s some good stuff in there, too.

This is stuff waaaay outside my normal realm of things, so I’ll be doing some reading/thinking about this stuff, and how it might affect CAREO/APOLLO.

The goal is to be able to do something like this scenario:

Bill is a professor at the University of Calgary. He securely logs into an APOLLO search application using his U of C login, and APOLLO is aware of the groups and roles that Bill has as part of his U of C identity.

Mary is a grad student at the University of British Columbia. She logs into an APOLLO collaborative application using her UBC login, and is able to access resources defined by her groups and roles described by her UBC identity.

Bill and Mary are working together on a project, and Bill creates an ad-hoc group in APOLLO for them to share resources privately while collaborating on their development. Once ready for publication, these resources are made available to individuals at both the U of C and UBC.
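To make the last part of that scenario concrete, here is a rough sketch of the access check it implies. Everything in it is hypothetical: the `SharedResource` class, the `login@institution` identity strings, and the idea that publishing is done per institution are my own placeholders, not anything APOLLO does. Authentication itself is assumed to have happened already at the home institution.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a resource shared first with an ad-hoc group,
// then published to whole institutions.
class SharedResource {
    // Identities are assumed to look like "login@institution".
    private final Set<String> groupMembers = new HashSet<>();
    private final Set<String> publishedTo = new HashSet<>();

    /** Adds one person to the private ad-hoc group. */
    void shareWith(String identity) {
        groupMembers.add(identity);
    }

    /** Opens the resource to everyone from the given institution. */
    void publishTo(String institution) {
        publishedTo.add(institution);
    }

    /** True if the person is in the ad-hoc group or belongs to a published institution. */
    boolean canAccess(String identity) {
        return groupMembers.contains(identity)
                || publishedTo.contains(institutionOf(identity));
    }

    /** Returns the part after the last '@', or "" if there is none. */
    static String institutionOf(String identity) {
        int at = identity.lastIndexOf('@');
        return at < 0 ? "" : identity.substring(at + 1);
    }

    public static void main(String[] args) {
        SharedResource draft = new SharedResource();
        draft.shareWith("bill@ucalgary.ca");
        draft.shareWith("mary@ubc.ca");
        System.out.println(draft.canAccess("mary@ubc.ca"));   // group member
        System.out.println(draft.canAccess("someone@ubc.ca")); // not yet published
        draft.publishTo("ubc.ca");
        System.out.println(draft.canAccess("someone@ubc.ca")); // now visible
    }
}
```

The hard part, of course, is not this check but trusting the identity string in the first place, which is what the federated identity reading is about.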

eXistDB and WebObjects

I’ve spent the morning building a prototype WebObjects app to act as an XML metadata server. I’ve embedded eXistDB into the application, and it created the necessary database files and indices for me.

Then I wrote a short method to import XML documents from a path (with the added bonus of importing a whole directory if one is given). There are now 3600+ records in the embedded database.
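For the curious, the "file or whole directory" part of an import like that can be sketched as below. This is not the actual method from the app: `XmlImportPaths` is a made-up name, the sketch assumes subdirectories should be walked too, and the step that hands each file to the database is left out on purpose.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: works out which XML files an import pass would load.
class XmlImportPaths {

    /** Returns the file itself if it is XML, or every XML file under a directory. */
    static List<Path> collect(Path path) throws IOException {
        if (Files.isRegularFile(path)) {
            return isXml(path)
                    ? Collections.singletonList(path)
                    : Collections.<Path>emptyList();
        }
        // Assumption: a directory import includes its subdirectories.
        try (Stream<Path> walk = Files.walk(path)) {
            return walk.filter(Files::isRegularFile)
                       .filter(XmlImportPaths::isXml)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    /** Case-insensitive check for a ".xml" file name. */
    static boolean isXml(Path file) {
        return file.getFileName().toString()
                   .toLowerCase(Locale.ROOT).endsWith(".xml");
    }

    public static void main(String[] args) throws IOException {
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : collect(start)) {
            // The real app would store the document in the database here.
            System.out.println("would import " + file);
        }
    }
}
```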

And boy, is it fast. Queries are almost instantaneous (~100ms typical), but document retrievals are a wee bit slower, increasing linearly with the number of hits. I haven’t added any limits, so you can do a query for something lame like “*a*” and get the whole database back in one page.
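Adding a limit should be simple enough. Here is a minimal sketch of the idea, assuming the hits can be treated as a plain `java.util.List` (the real eXist result type is different, and `ResultPage` is a name I made up for illustration). Only the hits on the requested page would then need their documents retrieved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: cuts a list of hits down to one page.
class ResultPage {

    /** Returns at most `limit` hits starting at `offset`; empty if offset is past the end. */
    static <T> List<T> page(List<T> hits, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be >= 0");
        }
        int from = Math.min(offset, hits.size());
        // Use long arithmetic so a huge limit cannot overflow.
        int to = (int) Math.min((long) from + limit, hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        for (int i = 1; i <= 377; i++) {
            hits.add("hit-" + i);
        }
        // Show only the first 25 hits instead of all 377.
        System.out.println(page(hits, 0, 25).size());
    }
}
```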

The embedded eXist database doesn’t use the XML-RPC API like the standalone database does, so there isn’t any marshalling/unmarshalling overhead. Just native Java calls.

Considering that document retrieval isn’t optimized (it’s basically a debug “dump the entire LOM as the item to display”), performance is already quite acceptable.

Here are the stats from a simple search for “biology”:
Query: //text() &= 'biology'
Hits: 377
Query Time: 124 ms
Retrieval Time: 6059 ms

That retrieval time includes pulling the ENTIRE LOM for each and every one of the 377 results, which works out to roughly 16 ms per record.

UPDATE: Just ran some more tests, and cracked open the debug log file. Here’s what I found:

09 Dec 2003 18:42:00,401 – loading 3647 documents from 2collections took 3ms.
09 Dec 2003 18:42:00,411 – found image: 2800 in 4ms.
09 Dec 2003 18:42:00,414 – found nasa: 9 in 0ms.
09 Dec 2003 18:42:00,417 – found space: 13 in 1ms.
query: //text() &= 'image nasa space'
hits: 3
query time: 86
retrieve time: 22

Finding 2800 records containing the string “image” took 4ms. Holy freaking cow.

From the various test queries I’ve run, the vast majority of the time is spent retrieving the documents out of the database. The query itself runs extremely fast, but yanking the entire LOM out takes some time. I’m going to look at ways to pull only selected XPath values rather than the full record. That may be faster…
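As a rough idea of what "only selected XPath values" could look like, here is a sketch using the standard Java XPath API rather than eXist itself. The sample record is a simplified, un-namespaced stand-in for a LOM, and the class name and field choices are mine. Since it parses the whole document in memory, it only shows the field selection; whether this is actually faster inside eXist is exactly what still needs testing.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls a few display fields out of a LOM-like record.
class LomSummary {

    // Simplified sample record (an assumption, not a real CAREO record).
    static final String SAMPLE =
        "<lom><general>"
        + "<title><langstring>Earth from orbit</langstring></title>"
        + "<description><langstring>Satellite image</langstring></description>"
        + "</general><technical>"
        + "<location>http://example.org/earth.jpg</location>"
        + "</technical></lom>";

    /** Returns just the title, description, and location of one record. */
    static Map<String, String> summarize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Map<String, String> fields = new LinkedHashMap<>();
        // evaluate() returns the string value of the first matching node,
        // or an empty string when nothing matches.
        fields.put("title",
                xpath.evaluate("/lom/general/title/langstring", doc));
        fields.put("description",
                xpath.evaluate("/lom/general/description/langstring", doc));
        fields.put("location",
                xpath.evaluate("/lom/technical/location", doc));
        return fields;
    }

    public static void main(String[] args) throws Exception {
        for (Map.Entry<String, String> e : summarize(SAMPLE).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```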