you have to have text to work with. In my case, I have text, but it’s handwritten on parchment. I spent a good part of last semester transcribing pages from University of Illinois MS 80, a small manuscript (about 3.5″ x 5.5″) of 122 folios (244 pages), into a Word document. My enthusiasm for recording as much of the manuscript’s information as possible served my paleography project well, but the data will need some adjusting for my digital humanities project. I plan to use plagiarism software to narrow down the original sources for the portions of MS 80’s text where the scribe says only that the writings come from St. Bernard, St. Anselm, and St. Augustine. To run MS 80 (in Middle English) and the candidate sources (some in Latin, some in Middle English, some in modern English, depending on the source and where I can obtain a digital copy) through the plagiarism software and (hopefully!) detect overlapping text, I need unmarked, unformatted, digitized text.
As my digitized version of MS 80 stands now, several problems are immediately apparent:
1. I need to remove the table (not a big fix).
2. The punctuation needs to be standardized. This is a little more complicated: a period sometimes marks a sense break (where we would use a comma), sometimes marks the end of a sentence (where we would use a semicolon or period), and sometimes simply tells the reader to pause briefly before continuing. I’m still not sure whether the slash marks indicate the end of a paragraph, a sentence, or both. It seems to depend on which scribe is copying at the time, and perhaps also on the formatting of the source material.
3. All abbreviations need to be spelled out, without the parentheses I used to mark the expanded letters. However, if the source material I’ll be checking this manuscript against is formatted with parentheses, I will either have to remove those as well, or leave them in my text and be prepared for screwier data.
4. If the source material is modernized, I will need to create a modernized version of my manuscript. This is problematic in itself, since my word choices may not always match those of other editors.
5. The spelling, too, is not standardized. If my sources are in Middle English (my preference, to help reduce the number of variables involved), their spelling will not be standardized either, nor will it be internally consistent.
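Steps 2, 3, and 5 above are mechanical enough to script once the table is gone and the transcription is saved as plain text. The Python sketch below is one hypothetical way to do it, assuming that expanded letters sit inside parentheses (as in the made-up example `Ih(es)u`), that periods and slashes can simply be dropped for matching purposes, and that thorn (þ) can be rendered as "th". Dropping punctuation sidesteps the question of what the periods and slashes mean, which is acceptable for overlap detection but not for an edition. The `SPELLING_VARIANTS` table is an illustrative placeholder, not a list taken from MS 80; a real one would have to be built by hand and applied to the sources as well.

```python
import re

# Hypothetical spelling table: each variant maps to one chosen spelling.
# These entries are illustrative only; a real table would be built by hand.
SPELLING_VARIANTS: dict[str, str] = {
    "ihesu": "jesu",
    "iesu": "jesu",
    "wyth": "with",
}


def normalize(text: str, variants: dict[str, str] = SPELLING_VARIANTS) -> str:
    """Reduce a plain-text transcription to lowercase, punctuation-free words."""
    # Expanded abbreviations: keep the supplied letters, drop the parentheses,
    # so "Ih(es)u" becomes "Ihesu". Done first so the word is not split apart.
    text = text.replace("(", "").replace(")", "")
    text = text.lower()
    # Assumption: render thorn as "th". Yogh is left alone because its
    # modern equivalent varies (gh, y, or z).
    text = text.replace("þ", "th")
    # Treat periods, slashes, and any other non-word character as a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse whitespace and regularize spelling one word at a time.
    words = [variants.get(word, word) for word in text.split()]
    return " ".join(words)
```

With the placeholder table, `normalize("Ih(es)u . wyth þe / grace")` returns `"jesu with the grace"`. Keeping the variant table as plain data means the same function can be run over MS 80 and over each source with one shared set of choices.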
So my original worry that I wouldn’t be able to find adequate software is fading as the issue of data preparation becomes more complex and unwieldy. I’m learning quite a bit about textual editing theory in the process, which I’m finding oddly enjoyable. The more I work on this, the less likely it seems that I will actually come away with new information about MS 80. I may end up simplifying the project by comparing two copies of a *known* borrowing (for example, MS 80 includes a well-known and oft-copied version of a prayer to Jesus), seeing how well the plagiarism software detects what scholars have already verified, and then deciding whether this is a viable method for detecting textual echoes.
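For a test case built on a known borrowing, it may help to have a transparent baseline to set beside the plagiarism software’s results. The sketch below lists the word trigrams two passages share and reports their Jaccard similarity (shared trigrams divided by all distinct trigrams in either passage). This is a common, simple overlap measure swapped in for illustration, not a description of how any particular plagiarism tool works. It assumes both passages have already been normalized the same way, since a single variant spelling breaks every trigram it touches; the two sample passages are invented.

```python
def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of consecutive n-word sequences in a normalized text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(a: str, b: str, n: int = 3) -> set[tuple[str, ...]]:
    """N-grams that appear in both passages: candidate textual echoes."""
    return word_ngrams(a, n) & word_ngrams(b, n)


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Shared n-grams as a fraction of all distinct n-grams in either passage."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0  # neither passage is long enough to form an n-gram
    return len(grams_a & grams_b) / len(union)


# Two made-up passages that differ only in their first and last words.
copy_a = "lord jesu crist have mercy on me"
copy_b = "swete jesu crist have mercy on us"
print(sorted(shared_ngrams(copy_a, copy_b)))
# 3 of 7 distinct trigrams are shared, so this prints 0.429
print(round(jaccard(copy_a, copy_b), 3))
```

Smaller values of `n` catch looser echoes but also more chance matches on common phrases; larger values are stricter. Running a verified borrowing through both this baseline and the plagiarism software would show whether the software finds at least what a simple count does.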