Thursday, February 14, 2013

#ItsAProcess

Apologies for my absence last week! Much of what I did here was technical research, which doesn't translate into an exciting blog post. One thing I did that does need to be written about was organizing keywords into a hierarchy. Keywords, as most (if not all) of my readers know, are different from subjects. Subjects are the overarching concepts that the material concerns.

[Image caption: adorable]
For instance, a documentary about koalas entitled "The Lives of Koala Bears" might be filed under the subject 'marsupials', while a keyword search would pick up the words in its title: lives, koala, bears. A justifiably curious user looking for information about different species of bear would find the aforementioned documentary in their results. However, if the newly cautious user searched for 'brown bear' in subjects, they would find the materials actually related to their desired godless killing machine.

[Image caption: terrifying]

At PBS, the behind-the-scenes keywords fall somewhere between the above definitions of 'keyword' and 'subject'. Most of the videos on the web site have searchable captions, so even a passing mention of 'bears' will bring a video up in the results. This works for animals, but let's switch to a different scenario.

[Image caption: Barcelona]
A high school student has an assignment about Barcelona due. Being a visual learner, they decide to search the PBS website for episode segments that contain the keyword 'Barcelona'. Only one result in three pages of material is actually about the city of Barcelona. The other results concern the Olympics, Woody Allen, and Javier Bardem (who is from Las Palmas but starred in Allen's film 'Vicky Cristina Barcelona'). Keywords! Oh, you...
[Image caption: not Barcelona]

So I think we can all agree that subjects are important and generally (there are exceptions, to be sure) better than keywords. From my perspective, the PBS web site has some optimization to do (in all their spare time). Behind the scenes we can sort by subject, but all too often there are inconsistencies. A few weeks ago my supervisor e-mailed me his keyword list and asked me what I thought. While browsing it, I noticed that some cities (like Barcelona) were listed but others (like Buenos Aires) weren't, even though there was content that demanded it. I spent the day creating a hierarchical keyword list that could be re-sorted, reorganized, edited, and generally tweaked to our needs. What's a hierarchical keyword list, you ask? I'll show you!

Rather than appearing as:

World Geography
Barcelona

it would look like this:

World Geography
    Europe
        Spain
            Barcelona

Each term is narrower than the one above it, which makes the list easier to organize and keeps it consistent. If the list is created in Excel or a proper taxonomy management program, it's also easy to collapse and hide columns, sections, and useless words.
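To make the idea a bit more concrete, here is a minimal Python sketch of a hierarchical keyword list. It assumes the list is stored as nested dictionaries, where each key is a term and its value holds the narrower terms beneath it. The `keyword_tree` name, the `flatten()` helper, and the South America branch (added only to show where a missing city like Buenos Aires would slot in) are my own hypothetical illustrations, not how PBS actually stores its keywords.

```python
# Hypothetical sketch: a hierarchical keyword list stored as nested dicts.
# Each key is a term; its value is a dict of narrower terms (empty = leaf).
keyword_tree = {
    "World Geography": {
        "Europe": {
            "Spain": {
                "Barcelona": {},
            },
        },
        # Illustrative branch showing where a missing city would be added.
        "South America": {
            "Argentina": {
                "Buenos Aires": {},
            },
        },
    },
}


def flatten(tree, parents=()):
    """Yield every term as a full path, from broadest to narrowest."""
    for term, narrower in tree.items():
        path = parents + (term,)
        yield " > ".join(path)
        # Recurse into the narrower terms, carrying the path so far.
        yield from flatten(narrower, path)


for path in flatten(keyword_tree):
    print(path)
```

Because every term carries its full path, a city that is missing from one branch stands out as an obvious gap instead of a loose word that may or may not appear somewhere in a flat list.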

The downside?

When it comes to consistency, providing every capital for every country in Europe isn't practical. While there are three results for Chisinau (two of which are about the Moldovan capital), there are precisely zero for Slovenia's capital, Ljubljana. It is therefore pretty useless to include a keyword with no content attached. Furthermore, we don't have time to go through and create a new keyword list that would match everything in the extensive PBS collection. Instead, we have to narrow the list down to the terms most likely to be relevant to our current and forthcoming material.
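The "only keep terms with content attached" rule can be sketched in Python too. This is a hypothetical illustration that assumes each candidate term comes with a count of matching content items. The Chisinau count of 2 reflects the two relevant results mentioned above, Ljubljana's zero comes from the same search, and the nested-dictionary layout and the `prune_empty()` helper are assumptions made for the example.

```python
# Hypothetical sketch: drop keywords that have no content attached.
# Candidate terms are nested dicts (term -> narrower terms).
candidate_terms = {
    "Europe": {
        "Moldova": {"Chisinau": {}},
        "Slovenia": {"Ljubljana": {}},
    },
}

# Assumed counts of relevant content items per term; terms not listed
# are treated as having no content of their own.
content_counts = {"Chisinau": 2, "Ljubljana": 0}


def prune_empty(tree, counts):
    """Return a copy of tree keeping only terms that have content,
    either directly or through a surviving narrower term."""
    pruned = {}
    for term, narrower in tree.items():
        kept_children = prune_empty(narrower, counts)
        if counts.get(term, 0) > 0 or kept_children:
            pruned[term] = kept_children
    return pruned


print(prune_empty(candidate_terms, content_counts))
# {'Europe': {'Moldova': {'Chisinau': {}}}}
```

Broader terms survive as long as something beneath them still has content, so pruning Ljubljana also removes the now-empty Slovenia entry without touching the rest of Europe.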

The upside?

I am learning about managing taxonomies in special libraries, especially in one that deals with both current events and culture. It's fascinating, challenging, and no one's life depends on it. Our job is to optimize the collection and make as many resources accessible to the public as we can. Archives and libraries share this mission and largely use similar techniques to pursue it, so it's great to be able to talk to archivists about these challenges and have educational conversations.

On Tuesday I visited the University of Maryland's archives (Public Broadcasting Archives), and while I was there I had the privilege of speaking with Chuck Howell, one of their archivists. Talking with him about their archives and how they describe them taught me a lot and opened my mind to some other possibilities (and future opportunities). I was surprised to learn that they used MARC 21 records to ensure that their archives are included in the OPAC. They create just one record for each collection, which is enough to make it accessible to the curious researcher. Each record had descriptions, a brief summary, and applicable subject headings, as well as the name of the collection in the 650 tag. I look forward to going back to visit and see their collections.