With the paragraph tags (<p> and </p>) sorted out and everything aligned properly, it was time to begin the actual encoding. For this, I used the examples Professor Graham set out in his tutorial.
My motivation had changed since the beginning of the project, and so had my questions. As I wrote that day: “I will encode it and then people will be able to quickly search it for people’s names and such. If combined with many different but similar Equity files that have also been similarly coded, this can be visualised and used for much more useful research.”
My new research questions focused on the skimming I had done when breaking the text into paragraphs. I had noticed that the paper focused heavily on advertisements and political reporting. From this, I created these questions that framed my work going forward:
How often are well-known people vs. regular people mentioned?
How might this speak to the function or readership of the paper?
How frequently are locations mentioned? Does this speak to the relative “world” they lived in? E.g. closer contact with locals – what they were interested in?
Are some grocery items sold more than others? What and why?
These questions are vastly different from the ones I started out with because they represent a different focus. Rather than taking a strictly big data approach and doing topic modelling and comparisons of political sentiment over time, I focused on the local situation. I wanted to know how people were represented in the paper, how and why place names were talked about, and which aspects of domestic life were covered. I chose to look for these answers by identifying, counting, and mapping people’s names, place names, and sale item types.
I used the prof’s example encoding as the basis for a stylesheet. A stylesheet is necessary to tell the browser how to display the tags in the .xml file (a web-readable, encodable version of my text file). I began encoding for people, places, medicine, and sales.
Persons: <persName key="Last, First" from="YYYY" to="YYYY" role="Occupation" ref="http://www.website.com/webpage.html"> </persName>
Places: <placeName key="Sheffield, United Kingdom" ref="http://tools.wmflabs.org/geohack/geohack.php?pagename=Sheffield&amp;params=53_23_01_N_1_28_01_W_type:city_region:GB"> </placeName>
Medicine: <medicineName key="Name" from="Business" claim="medProperties" ref="website"> </medicineName>
Sale: <saleType key="type" from="Business" to="amount"> </saleType>
I began with people since I realised it would take the longest. Here is an excerpt from my fail-log:
“I used the formula below to encode the first name (Cation Thornloe, which appears to be a poorly rendered Captain Thornloe).
<p> <CationThornloe <key=”Thornloe, Cation” from=”?” to=”?” role=”Bishop” ref=”none”> </persName>
I received a parse error saying it was improperly formed.”
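Reading that line again, the parser had good reason to complain: <CationThornloe opens an element that is never finished, <key= starts a second tag inside it, and </persName> closes an element that was never opened. A well-formed version, following the persons template above, might look like the sketch below. The attribute values are copied from the failed attempt, and the closing </p> is added only to make the fragment complete.

```xml
<p>
  <!-- The element name is persName; key, from, to, role and ref are its attributes. -->
  <!-- Attribute values need straight quotes; XML parsers reject curly quotes here. -->
  <persName key="Thornloe, Cation" from="?" to="?" role="Bishop" ref="none">Cation Thornloe</persName>
</p>
```

The name as it appears in the text sits between the opening and closing persName tags, while the normalised form goes in the key attribute.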
Prof. Graham asked if I had a stylesheet for the xml file. I did not, nor did I know what it was. He explained and provided a sample. To me, it seems like a legend that formats the tags we put in the xml file.
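To make the legend idea concrete, here is a minimal sketch of what such a stylesheet could look like. It is not the sample Prof. Graham provided. The element names come from the tag templates above, while the colours and the use of the key attribute as hover text are assumptions chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Wrap the whole document in a basic HTML page. -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Keep paragraphs as paragraphs. -->
  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- People in blue (assumed colour); the key attribute appears on hover. -->
  <xsl:template match="persName">
    <span style="color: blue" title="{@key}"><xsl:apply-templates/></span>
  </xsl:template>

  <!-- Places in green (assumed colour); the key attribute appears on hover. -->
  <xsl:template match="placeName">
    <span style="color: green" title="{@key}"><xsl:apply-templates/></span>
  </xsl:template>
</xsl:stylesheet>
```

Each template works like one entry in the legend: it names a tag and says how anything wrapped in that tag should be displayed.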
After much trial and error and help from my brother and the kind professor, I made sure the .xsl was referred to in the .xml file. I also had to go through and make sure the tags were properly closed again. It worked eventually.
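For reference, the link between the two files is a single processing instruction placed near the top of the .xml file, before the root element. In the sketch below, the file name equity.xsl and the root element <text> are hypothetical stand-ins, not necessarily the names used in the project files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Tells the browser which stylesheet to apply; equity.xsl is a hypothetical file name. -->
<?xml-stylesheet type="text/xsl" href="equity.xsl"?>
<!-- <text> is a hypothetical root element; every tag opened inside it must also be closed. -->
<text>
  <p>
    <persName key="Thornloe, Cation" from="?" to="?" role="Bishop" ref="none">Cation Thornloe</persName>
  </p>
</text>
```

Browsers may refuse to load a stylesheet for a file opened directly from a local folder, so results can differ from viewing the same files through a web server.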
I also had some ethical and methodological concerns about the encoding I was doing, specifically about the information I was including in the tags, which were meant to help future researchers. I discussed this at length in my notes and worked out a methodology that I felt comfortable with. A snippet of that conversation concludes this posting. I also decided to encode only the first 200 lines of the document. I did this because of the enormous number of names in this issue (a result of the municipal elections reporting) and because I was short on time. I believed that 200 lines would be enough to gain a preliminary understanding of the answers to my research questions and to provide a proof of concept.
“The information I include in the brief description of the person also needs to be problematized because each person could have at least a full essay written about them, their politics, significance to early Ottawa history, influence at the Bank, trade influence, religious and social views, etc.
Each decision about what to include should be explicitly given, although I am not sure where it would be most appropriate to do so or how that type of paradata should be shared. For example, in looking up George Hay, I found one biography for him that seemed to give a rather complete history of his interactions with several different aspects of life. I chose not to look at any additional resources after briefly searching to make sure George Hay was the person I was looking for (context included connection to the Bank of Ottawa, that he was wealthy and influential). I also chose to exclude some of the information about his complex political ties and more detailed religious leanings. I did include that he was the leader of the Ottawa Bible Society because it seemed to include his broader position within the religious circles about Ottawa. I also included it because as a person of faith myself, I found myself identifying with him on this level.”