Librarians are often considered nerdy, anti-social, not very much fun, and slightly quirky. These stereotypes may be true for some librarians, but they do not accurately reflect how smart and technologically sophisticated librarians are. Indeed, while they often look like they are simply typing in key words, it is important not to forget that they have also sorted those key words, attached them to the document through the library database, and indexed many different types of information for our benefit. The “background” work they have done, and their knowledge of different ways to search and of which key words might be helpful, is invaluable.

Let’s take the scenarios described in Cameron Blevins’ Topic Modeling Martha Ballard’s Diary as our example. Blevins provides a simple example, explaining that searching for “God” does not bring up phrases that effectively mean “God” but are worded differently. He gives the example of “Author of all my Mercies”, which also means “God” but in a more descriptive way. Sarahmcole and I had a conversation about this notion, in which sarahmcole agrees with the author about the limited nature of the search function and I agree with them both, but ask how we are supposed to know our search is missing vital key words if we are relying solely on “big data” and data mining. Searching becomes even more difficult when cultural differences are involved, so that “God” means something else or is described differently.
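To make the limitation concrete, here is a minimal sketch in Python of a literal key word search next to one that uses a list of alternative phrasings. It is not how Blevins or any library catalogue actually works. The sample entries are invented for illustration (they are not quotations from the diary), and the names ENTRIES, SYNONYMS, literal_search, and expanded_search are my own assumptions.

```python
# Invented sample entries for illustration only; not quotations from the diary.
ENTRIES = [
    "I thank God for a safe delivery.",
    "May the Author of all my Mercies grant us rest.",
    "Clear and cold. I have been at home.",
]

# Hypothetical, hand-built list of alternative phrasings for one concept.
# Someone who knows the source has to supply this; the search cannot guess it.
SYNONYMS = {
    "god": ["god", "author of all my mercies"],
}


def literal_search(term, entries):
    """Return the entries that contain the exact term, ignoring case."""
    needle = term.lower()
    return [entry for entry in entries if needle in entry.lower()]


def expanded_search(term, entries, synonyms):
    """Return the entries that contain the term or any listed alternative."""
    phrases = synonyms.get(term.lower(), [term.lower()])
    return [
        entry
        for entry in entries
        if any(phrase in entry.lower() for phrase in phrases)
    ]


if __name__ == "__main__":
    print(literal_search("God", ENTRIES))
    print(expanded_search("God", ENTRIES, SYNONYMS))
```

The literal search finds only the first entry, while the expanded search also finds the second. The improvement comes entirely from the hand-built list, which is exactly the knowledge we would not know we were missing.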

I suspect this is where people trained in this task come in: librarians. This is especially true at a university or archive, where the librarians are more likely to be specialists or experts in their particular field. Because they know the subject area better than we do, they can suggest alternative key terms to broaden our search.

However, before the librarian is able to run that deceptively simple search, someone must first have made the material searchable. From the brief exercises we have done on finding, extracting, mining, comparing, and labelling data, it is clearly not an easy task. When presented with a copy of A Midwife’s Tale, how is the librarian expected to know the contents, put a bibliographic entry together, connect it to search terms, and make it available? It is surely not possible for each document or piece of information destined for the digital world to be carefully read and annotated. And yet I suspect this must be the case, unless all key words are generated by digital frequency analysis, in which case the potential for inaccurate patterns and unrealised discoveries is great. Rachel_johnson alludes to these difficulties in her annotation.
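As a rough illustration of what key words generated by frequency analysis might look like, here is a minimal sketch in Python. It assumes a tiny, hypothetical stop-word list and an invented sample sentence; real cataloguing systems are far more elaborate, and I am not claiming that any library works this way.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "at", "for", "my", "all", "i", "was", "is"}


def frequency_keywords(text, top_n=3):
    """Return the top_n most frequent words that are not stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    # Invented sample text for illustration only.
    sample = (
        "The birth was safe. The birth was long. "
        "The Author of all my Mercies was kind."
    )
    print(frequency_keywords(sample, top_n=3))
```

Counting words this way would label the sample with “birth”, but it would never produce “God” as a key word, because that word does not appear in the text even though the idea does. That is the kind of unrealised pattern a purely automatic approach can leave behind.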

In Academic Journals: The Most Profitable Obsolete Technology in History, the author explains that making and publishing academic journals is hard. I agree, but suggest that as hard as making the journal is, compiling any data and making it searchable is harder. Let’s think this one through for a minute, taking our exercises and the discussions about GeoCities as our starting point. The old newspapers were originally in print. Someone had to scan them and ensure they were readable by OCR. This takes time and technological skill. Then key words had to be identified and a bibliographic entry made for each document. Even well-done OCR may need to be partially retyped, and all the data has to be linked to similar data so that associations and searches can be usefully made. In the case of public work like GeoCities, the process of making it available for searching most likely looks like something Ian Milligan and others in our readings have discussed. Even when each web page is archived, someone has to make it searchable. This can be done through a complicated process that I do not fully understand, but sort of attempted (I think) in this week’s assignments.

Assuming the librarian has read all of A Midwife’s Tale, issues around bias present themselves, both in the constructing and identifying of key words and in the assistance provided to the researcher. Sarahmcole and Csamuelson consider this, with Csamuelson suggesting an “open source” approach to counter the bias. Might this be a good place for librarians to go as well? Would it be helpful (and lessen the workload and responsibility) for librarians to share their initial work beforehand? The idea is certainly interesting and fits well with the readings and discussions we have been having recently.

Further questions to consider include the context, time period, and cultural background of the librarian and what effect these might have on the way the source is viewed. Might a librarian be more conscious of war-themed topics during a time of war? Does an African-American librarian see a text the same way a librarian of European descent sees it? Twenty years after the text was entered, might a previously glanced-over topic become relevant? Do we regularly need to revisit old search programs and update them? What about political correctness? If the data mining and word frequency analysis of a 1920s US novel identifies words which we now consider unacceptable, does the librarian include them? Is there a difference between information made available for historians, where the identification of such words may be valuable, and the use of such terms in a public library catalogue?

Finally, how is it possible for librarians to accurately and effectively catalogue the millions, even billions, of entries that the public would like to put on the web? Is it reasonable to expect librarians to verify, separate, and log every key word, create unique programs to process and store that information, and spit it out in a neat structure the public can understand? Remember, people are not born knowing how to use various databases; they have to be taught on user-friendly programs. And these thoughts are based solely on relatively simple search engine work. The more specialised the material gets, the more work is involved and the more support the librarian needs to provide to users.

So, the next time the librarian tells you to “pipe it down a little”, be considerate. After all, he or she may be poring over 100 years of a small-town newspaper on microfiche, trying to decide how to digitise it, how to categorise it, and how to make it easy for you to type in “hospital” and “1919” to discover something truly groundbreaking or, in the case of big data, to illustrate, confirm, or challenge what you already know.

