Alpo Honkapohja · Jukka Suomela

Lexical and function words or language and text type? Abbreviation consistency in an aligned corpus of Latin and Middle English plague tracts

Digital Scholarship in the Humanities · to appear


This study examines the consistency of medieval abbreviation practices in a parallel corpus consisting of Latin and Middle English copies of a plague treatise attributed to John of Burgundy (JB). Focusing on different versions of the same treatise enables us to maximize textual and lexical overlap while comparing differences caused by text type, word type, and language. We examine how the following variables affect the consistency of abbreviation across manuscript witnesses: A) language: Latin vs. English, B) text type: recipes vs. running text, C) word type: lexical vs. function words, and D) the number of characters in a word. Variables A-D are compared using a parallel corpus of automatically aligned, richly tagged TEI P5 XML transcriptions of six manuscript witnesses to the JB treatise. The alignment process is based on computer-human collaboration and a custom-built alignment tool that uses the sections tagged in the TEI XML files and word division. The results reveal that abbreviation was overwhelmingly more consistent in Latin than in Middle English, and somewhat more consistent in recipes than in running text. High token counts of frequent lexical items had a major effect on the results. Word length worked better as an explanatory variable than the division into lexical and function words.


This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.