Please note: This course will be taught in hybrid mode. Hybrid delivery includes synchronous live sessions in which on-campus and online students are taught simultaneously.
Dr Markl is a Research Fellow at the Institute for Analytics and Data Science and the Department of Language and Linguistics at the University of Essex. She is interested in language variation and change and the impact of speech technologies on speech communities. More broadly, her research explores the socio-technical contexts and impacts of computing technologies, in particular machine learning and “artificial intelligence”. Before joining Essex, she completed her PhD at the University of Edinburgh, where she conducted interdisciplinary work on algorithmic bias in automatic speech recognition, the development of language technologies for under-resourced languages, and computational methods for linguistic research.
Dr Redmon is a Lecturer in the Department of Language and Linguistics at the University of Essex, working primarily at the intersection of phonetics, psycholinguistics, and computing, but also addressing questions in morphology, historical linguistics, and Germanic. His research covers a variety of languages and language families, but is mainly focused on the South Asian region. This work began during his Masters at EFLU in Hyderabad, and continues through active collaborations with researchers at Jadavpur University in Kolkata, IIT in Guwahati, and the North-Eastern Hill University (NEHU) in Shillong. In the computational space, his main interests are in technological access for low-resource languages, and in the development of tools that are less dependent on large data sources (which definitionally exclude many communities), such as physiologically grounded automatic speech recognition.
Course Content
While there is great research interest in multimodal data (e.g., social media video), the large-scale analysis of such data is challenging. Complementing ESS courses on text analysis, we will focus on state-of-the-art tools to facilitate automatic transcription of audio and video data and handwritten or printed (non-digitised) text. We will furthermore briefly explore tools to automatically annotate video. These systems allow students to make use of complex multimodal data such as social media videos.
The first half of the course focuses on the theoretical background, introducing core concepts and techniques to ensure students have a basic understanding of the underlying technologies. The second half of the course focuses on the application of state-of-the-art tools. Each day consists of one theory session and one practical session, and students will complete in-class programming exercises. During the second week, students will complete a small data analysis project, including analysis design, data extraction, data analysis and visualisation.
By the end of this course, students will:
- Understand the theoretical foundations of state-of-the-art (multimodal) large language models and of more conventional tools for automatic speech recognition and optical character recognition
- Understand the limitations of state-of-the-art language technologies
- Be able to design an appropriate data analysis methodology for audio, video and text data using state-of-the-art language technologies
- Be able to prepare video, audio and non-digitised text data for semi-automated analysis
- Be able to analyse and visualise text data
Prerequisites:
Some familiarity with Python is recommended but not required. Attendees should bring laptops for practical work.
Representative Background Reading
O’Sullivan, James. 2022. The Bloomsbury Handbook to the Digital Humanities. Bloomsbury. ISBN: 978-1-3502-3213-6 (this will be provided by ESS)
Optional Reading:
Daniel Jurafsky and James H. Martin. 2025. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd edition. Online manuscript released January 12, 2025. https://web.stanford.edu/~jurafsky/slp3 (in particular chapters 10 and 16)
Jacob Eisenstein. 2019. Introduction to Natural Language Processing. MIT Press. Open Access copy: https://cseweb.ucsd.edu/~nnakashole/teaching/eisenstein-nov18.pdf (chapter 1)
Joe Nockels, Paul Gooding, Sarah Ames, and Melissa Terras. 2022. Understanding the application of handwritten text recognition technology in heritage contexts: a systematic review of Transkribus in published research. Archival Science 22, 367–392. https://doi.org/10.1007/s10502-022-09397-0 (Open Access)
For students who want to familiarise themselves with Python:
https://www.youtube.com/playlist?list=PL-osiE80TeTt2d9bfVyTiXJA-UTHn6WwU