As mentioned previously, TEI encoding is relatively easy once you get the hang of it, but it takes a long time. On August 13-15, I continued working on the TEI encoding, moving from people, to place names, to medicines, and to sale items.
In some of the URLs in the people references, I had to change “&”, which, as discussed previously, an XML file will not accept on its own, to “&amp;”, the escaped form of the same character. When the page loads on the internet, “&amp;” is rendered exactly the same way as “&”, so the URL still loads.
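For anyone repeating this step, here is a minimal Python sketch of the escaping, using only the standard library. The URL and the name are hypothetical placeholders, and the `<persName>` element is shown without the TEI namespace for brevity. It shows that a bare “&” inside an attribute makes the parser reject the markup, while the escaped “&amp;” form parses and gives back the original URL.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical URL with a query string; the "&" separates two parameters.
url = "https://example.org/search?name=Smith&year=1897"

def persname_xml(name: str, ref: str) -> str:
    """Build a <persName> string (no TEI namespace) with the URL safely escaped."""
    # quoteattr() turns "&" into "&amp;" and wraps the value in quotes.
    return f"<persName ref={quoteattr(ref)}>{escape(name)}</persName>"

# Unescaped version: a bare "&" starts an entity reference, so parsing fails.
raw = f'<persName ref="{url}">John Smith</persName>'
try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# Escaped version: parses, and the attribute value comes back as the original URL.
safe = persname_xml("John Smith", url)
parsed = ET.fromstring(safe)

print(raw_ok)                    # False
print(safe)                      # ...ref="https://example.org/search?name=Smith&amp;year=1897"...
print(parsed.get("ref") == url)  # True
```

The escaping only exists in the file: once a parser or browser reads the markup, the reference resolves to the ordinary “&” URL, which is why the links still work.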
When encoding places, I decided to work only on recognised place names (mostly cities) rather than grocery stores or other buildings. I also decided to focus only on cities and states, because there really is not much to say about “Canada” or “Portugal”. Beyond the city, province, and country, and a reference to the appropriate website, I did not include any information about each location.
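As an illustration of how little each place entry holds, here is a minimal sketch that builds one entry with Python’s standard library. The element choice (`<listPlace>`, `<place>`, `<settlement>`, `<region>`, `<country>`, `<ptr>`) is my assumption based on the TEI place-name elements, not a copy of the encoding used in this project, and the place, `xml:id`, and URL are hypothetical examples rather than entries from the newspaper.

```python
import xml.etree.ElementTree as ET

# Clark-notation key that ElementTree serialises as the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def place_entry(xml_id, settlement, region, country, url):
    """Build one <place> record holding only city, province/state, country, and a link."""
    place = ET.Element("place", {XML_ID: xml_id})
    for tag, text in (("settlement", settlement), ("region", region), ("country", country)):
        ET.SubElement(place, tag).text = text
    # Pointer to an external reference page for the place (hypothetical URL).
    ET.SubElement(place, "ptr", {"target": url})
    return place

list_place = ET.Element("listPlace")
list_place.append(place_entry("toronto", "Toronto", "Ontario", "Canada",
                              "https://example.org/places/toronto"))
ET.indent(list_place)  # pretty-printing; requires Python 3.9+
xml_text = ET.tostring(list_place, encoding="unicode")
print(xml_text)
```

Place names in the running text can then point at the entry (for example with a `ref="#toronto"` attribute), so the extra details live in one place instead of being repeated at every mention.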
I chose to present medicine as a separate category from other sale items because I had a personal interest in patent medicines (quack medicine). Here I used the PDF copy of the original newspaper to search for medical advertisements, because I was less familiar with this category and because doing so would ensure I did not mislabel any of the medicines (they all seemed to focus on the same types of ailments and keywords).
I had some trouble with the medicine category precisely because the OCR jumbled up the text and separated labels from product descriptions. As I noted that day:
“The problem with finding these items by keyword (e.g. ‘hair’ for Hall’s Hair Renewer) is that, because the advertisements are often broken up, other references such as scalp, bald, re-growth, etc., are not caught. This is the type of thing that topic modeling would catch and group together to give a more accurate representation of how often Hall’s Hair Renewer was mentioned and in what contexts (especially if using a topic model that takes positive and negative sentiment into account).
Interestingly, this is the case for ‘Warner’s Safe Kidney and Liver Cure’. While ‘kidney’ is sprinkled throughout the document, it appears without an explicit connection to Warner, so it would be unethical and unfair to assign those mentions to his medicine. This is especially true because Doan also has a kidney cure advertised in this edition; a mention that is not connected to a specific name could belong to either product, so it would be poor methodology to include it.”
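The two problems in this note, keyword searches that miss related wording and mentions that cannot be tied to a brand, can be shown with a small Python sketch. The fragments are invented stand-ins for broken-up OCR text, not quotations from the newspaper, and `mentions`, `keyword_hits`, and `attribute` are hypothetical helpers. Widening the search terms is only a crude stand-in for what topic modeling would do; it is not topic modeling.

```python
import re
from collections import defaultdict

# Invented OCR-style fragments; an advertisement may be split across several.
fragments = [
    "Hall's Hair Renewer restores gray hair to its natural color.",
    "Stops the scalp from itching and prevents baldness.",
    "Warner's Safe Kidney and Liver Cure for all kidney troubles.",
    "Weak kidneys? Backache? Ask your druggist.",
    "Doan's kidney cure is sold by all dealers.",
]

def mentions(text, term):
    """True if `term` starts a word in `text` (case-insensitive), e.g. 'bald' in 'baldness'."""
    return re.search(r"\b" + re.escape(term), text, re.IGNORECASE) is not None

def keyword_hits(frags, term):
    """Indices of the fragments a plain keyword search would find."""
    return [i for i, f in enumerate(frags) if mentions(f, term)]

def attribute(frags, topic_term, brands):
    """Credit a topic mention to a brand only if exactly one brand is named in the same fragment.

    Mentions with no brand (or several) are set aside as 'unattributed' instead of guessed.
    """
    result = defaultdict(list)
    for i, f in enumerate(frags):
        if not mentions(f, topic_term):
            continue
        named = [b for b in brands if mentions(f, b)]
        key = named[0] if len(named) == 1 else "unattributed"
        result[key].append(i)
    return dict(result)

hair_only = keyword_hits(fragments, "hair")
# Crude widening of the term list (not topic modeling): also catches the scalp/baldness fragment.
hair_related = sorted({i for t in ("hair", "scalp", "bald") for i in keyword_hits(fragments, t)})
kidney = attribute(fragments, "kidney", ["Warner", "Doan"])

print(hair_only)     # [0]
print(hair_related)  # [0, 1]
print(kidney)        # {'Warner': [2], 'unattributed': [3], 'Doan': [4]}
```

The “Weak kidneys?” fragment stays unattributed because neither brand is named next to it, which is the conservative choice described in the note.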
The sale items were challenging in that I did not know exactly what I wanted to do with the information. Originally, I thought about categorising them by merchant and then listing what each merchant sold. In the end, I decided it would make more sense for my later work to organise them by type of sale item (e.g. staples, meat, clothing). This would give me a list of meats and allow me to do further work on the frequency of each type of meat and its vendor. If this work were taken to a larger scale, it would be possible for the historian to track these variables over time and compare them by season. For example, we would be able to determine when merchant x started selling fur coats in 1897 and whether this had become earlier or later by 1904. This might be an indication of larger consumer patterns or market changes.
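To make the “search by type of sale item” idea concrete, here is a minimal Python sketch, assuming each advertised item has been recorded as an (edition date, vendor, category, item) tuple. The records are invented placeholders, not data from the newspaper; “Merchant X” and the 1897/1904 fur-coat comparison simply mirror the example above, and the helper `first_offered` is hypothetical.

```python
from collections import Counter
from datetime import date

# Invented placeholder records: (edition date, vendor, category, item).
sales = [
    (date(1897, 11, 2), "Merchant X", "clothing", "fur coat"),
    (date(1897, 11, 2), "Merchant Y", "meat", "beef"),
    (date(1897, 11, 9), "Merchant Y", "meat", "pork"),
    (date(1897, 11, 9), "Merchant Z", "meat", "beef"),
    (date(1904, 10, 18), "Merchant X", "clothing", "fur coat"),
]

# Frequency of each type of meat, and of each (vendor, meat) pair.
meat_counts = Counter(item for _, _, cat, item in sales if cat == "meat")
meat_by_vendor = Counter((vendor, item) for _, vendor, cat, item in sales if cat == "meat")

def first_offered(records, vendor, item):
    """Earliest edition date, per year, on which `vendor` advertised `item`."""
    firsts = {}
    for day, v, _, it in records:
        if v == vendor and it == item and (day.year not in firsts or day < firsts[day.year]):
            firsts[day.year] = day
    return firsts

firsts = first_offered(sales, "Merchant X", "fur coat")
# Compare position in the season by (month, day), ignoring the year itself.
earlier_in_1904 = (firsts[1904].month, firsts[1904].day) < (firsts[1897].month, firsts[1897].day)

print(meat_counts)      # Counter({'beef': 2, 'pork': 1})
print(firsts)
print(earlier_in_1904)  # True
```

Keying the records by category rather than by merchant keeps both views available: the vendor is still in every record, so a per-merchant list can be rebuilt later with a single filter.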
I had a discussion in my paradata document about what qualifies as “luxury” and “staple” items. Conversations like this are important to note because they preserve my integrity as a researcher aware of the potential problems within my methodology, and they raise issues that other researchers may not have considered. Digital history is a community of practice, and it is perfectly fine to work with others, to inform and be informed by the work, failures, and successes of others, and to base your initial work on that of others. In order to document this properly, be honest, and acknowledge the collaboration and expertise of others, I have done my best to note where I received help (largely class tutorials and my brother).