August 20: The wrap up

The big project is due today, so I am calling this the big wrap up. However, as we discussed in class, a digital history project is by its very nature never really “finished”. This post is therefore more of a summary of what I have done so far, the questions I have asked and begun to answer, and the steps and tools I envision for carrying this project forward.

So far, I have used wget to download Equity papers in .txt format, including the January 14, 1897 edition, which I used for this paper. After trying several cleanup methods, I used TEI to clean up and encode that single file. I encoded people, places, medicines, and sales items, and used these tags to begin answering the questions I settled on:

  1. How often are well-known people vs. regular people mentioned?
  2. How might this speak to the function or readership of the paper?
  3. How frequently are locations mentioned?
  4. Does this speak to the relative “world” they lived in?
  5. Are some items sold more than others? What and why?
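As a rough illustration of how TEI tags can feed questions like these, here is a minimal Python sketch that counts how often each tagged person and place appears in an encoded file. It assumes the standard TEI elements `persName` and `placeName` in the TEI namespace; the sample text and the helper name `count_tagged` are made up for the example and are not taken from my actual encoding of the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI namespace (assumed; the real file may be encoded differently).
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Invented sample, NOT text from the January 14, 1897 edition.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName>Jane Doe</persName> of <placeName>Shawville</placeName> visited
    <placeName>Ottawa</placeName>. <persName>Jane Doe</persName> returned to
    <placeName>Shawville</placeName>.</p>
  </body></text>
</TEI>"""


def count_tagged(xml_text, tag):
    """Count the text content of every element with the given TEI tag."""
    root = ET.fromstring(xml_text)
    names = (
        # Collapse line breaks and repeated spaces inside the tagged text.
        " ".join("".join(el.itertext()).split())
        for el in root.iter(TEI_NS + tag)
    )
    return Counter(name for name in names if name)


people = count_tagged(SAMPLE, "persName")
places = count_tagged(SAMPLE, "placeName")
print(people.most_common())
print(places.most_common())
```

Counts like these only say how often a name was tagged. They say nothing on their own about why the name is there, which is the interpretive part of the questions above.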


I used Voyant to visually map word frequencies and word connections using the Cirrus and Links tools. I produced two word pictures, which confirmed my suspicions about which words would be most frequent throughout the paper. However, I had to re-evaluate my original premise and accept that word frequency cannot automatically be equated with popularity, editorial intent, cultural importance, or local reality. This was especially true because the paper had been poorly digitized using OCR technology. It was unclear what the “real” results would have been had the OCR done a better job and more fully captured the paper’s contents.
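To make the OCR problem concrete, here is a small Python sketch of a plain word-frequency count. It is not how Voyant works internally; the stop-word list, the function name `word_frequencies`, and both sample sentences are assumptions invented for the illustration. A single misread letter is enough to split one word's count in two.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "at"}


def word_frequencies(text):
    """Lowercase the text, split it into alphabetic words, drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


# Invented sentences: the second simulates an OCR misreading of one letter.
clean = "The council of Shawville met in Shawville"
noisy = "The council of Shawvillc met in Shawville"

print(word_frequencies(clean)["shawville"])
print(word_frequencies(noisy)["shawville"])
```

The misread word is counted as a separate, rare word, so the "true" word looks less frequent than it was on the printed page.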

I explored, and was confronted by, the ethical and methodological challenges of doing digital history in public (on the web), and I re-evaluated many of my plans and initial suppositions. My project changed significantly over time, a product of my evolving understanding of the strengths and limitations of the digital media I was using and of the practical challenges I faced (described in my large fail-log).

I was able to conclude, in a preliminary sense, that yes, wealthier people did seem to appear more often in the paper, but that no, this couldn’t really provide an accurate reading of dynamics and daily realities in the Shawville region, both because the council elections skewed who was mentioned and because the sample size was too small to be reliable. I drew the same conclusion regarding locations and their relative importance to the region. I did not get far with the sales research I wanted to do, but I did notice an abundance of winter clothing (as it was January) and meat products on sale. I also noticed many quack medicines on sale, a common situation during the late 1800s and early 1900s.

Going forward, I would suggest running several visualisations through Gephi and some topic modelling through a program such as AntConc.


The link to my paradata is on this Google Drive. The file will need to be downloaded because it is a .md file and won’t display automatically.

It is also available on GitHub.

