Bags-of-Keywords vs. Nested Taxonomies


It just struck me that the shortcut I described for easily searching del.icio.us and Flickr would have been darned near impossible if they'd used some form of complex nested taxonomy instead of bags-of-keywords.

The more I play with tools that use bags-of-keywords, the more I see their power. Sure, you lose some of the elegance on the metadata side of things (it's hard to define relationships and parent-child links with bags of keywords, and easy with taxonomies), but do users really care about (or understand) that stuff anyway? Just enter a few more keywords, and eventually you'll have described it as richly as you would have with a Library of Congress classification, without having to memorize LoC...
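
To make that a bit more concrete, here's a rough sketch in Python. The bookmark names, the keywords, and the `search` function are all made up for illustration; this is not how del.icio.us or Flickr actually work inside. The point is just that with flat bags of keywords, a search is nothing more than "does this bag contain all the words I asked for?", with no tree to walk and no classification scheme to memorize.

```python
# Hypothetical bookmarks, each described by a flat bag (set) of keywords.
bookmarks = {
    "photo-of-banff": {"photo", "mountains", "alberta", "travel"},
    "lsa-tutorial": {"search", "semantics", "tutorial"},
    "calgary-skyline": {"photo", "alberta", "city"},
}


def search(items, *keywords):
    """Return the names of items whose bag contains every given keyword."""
    wanted = set(keywords)
    # `wanted <= tags` is a subset test: every requested keyword is in the bag.
    return sorted(name for name, tags in items.items() if wanted <= tags)


print(search(bookmarks, "photo", "alberta"))
# ['calgary-skyline', 'photo-of-banff']
```

Describing something more richly is just a matter of adding another word to its set, which is about as low-friction as metadata gets.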

I'm thinking the real benefit of the bags-of-keywords approach is that it makes describing things more elegant for the user (possibly at the expense of elegance in the metadata). We should be able to make the software do some really funky stuff under the covers to make up for that, though... Latent semantic analysis, anyone? Intelligent algorithms? I think this is where the real value will come from: make it easy for folks to mess around with the data, and provide a powerful chunk (or set of chunks) of code to manage the complexity behind the scenes...
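
As a very small taste of what "under the covers" could mean, here's a sketch that guesses which keywords are related by counting how often they show up in the same bag. To be clear, this is plain co-occurrence counting, not latent semantic analysis (which would factor a term-by-document matrix), and the `related_keywords` function and sample bags are hypothetical. It only hints at how software might recover some of the relationships a taxonomy would have spelled out by hand.

```python
from collections import Counter


def related_keywords(bags, keyword, top=3):
    """Rank other keywords by how often they share a bag with `keyword`."""
    counts = Counter()
    for bag in bags:
        if keyword in bag:
            counts.update(k for k in bag if k != keyword)
    # Sort by count (highest first), then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [k for k, _ in ranked[:top]]


# Hypothetical bags of keywords, one per tagged item.
bags = [
    {"photo", "alberta", "mountains"},
    {"photo", "alberta", "city"},
    {"photo", "travel"},
    {"search", "semantics"},
]

print(related_keywords(bags, "photo", top=2))
```

Nobody had to declare that "alberta" belongs anywhere near "photo"; the relationship falls out of how people actually tagged things.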

Just thinking out loud here (and wanting to capture that thinking so I can look back in a few months and note how silly I was).

