2R Analysing Multimodal Language Data for Quantitative Social Science

Please note: This course will be taught in hybrid mode. Hybrid delivery includes synchronous live sessions in which on-campus and online students are taught simultaneously.

Dr Nina Markl is a Research Fellow at the Institute for Analytics and Data Science and the Department of Language and Linguistics at the University of Essex. She is interested in language variation and change and the impact of speech technologies on speech communities. More broadly, her research explores the socio-technical contexts and impacts of computing technologies, in particular machine learning and “artificial intelligence”. Before joining Essex, she completed her PhD at the University of Edinburgh, where she conducted interdisciplinary work on algorithmic bias in automatic speech recognition, the development of language technologies for under-resourced languages, and computational methods for linguistic research.

Dr Charles Redmon is a Lecturer in the Department of Language and Linguistics at the University of Essex. He works primarily at the intersection of phonetics, psycholinguistics, and computing, but also addresses questions in morphology, historical linguistics, and Germanic. His research covers a variety of languages and language families but focuses mainly on the South Asian region. This work began during his Master's at EFLU in Hyderabad and continues through active collaborations with researchers at Jadavpur University in Kolkata, IIT Guwahati, and the North-Eastern Hill University (NEHU) in Shillong. In the computational space, his main interests are technological access for low-resource languages and the development of tools that are less dependent on large data sources (which by definition exclude many communities), such as physiologically grounded automatic speech recognition.

Course Content
While there is great research interest in multimodal data (e.g., social media video), the large-scale analysis of such data is challenging. Complementing ESS courses on text analysis, we will focus on state-of-the-art tools for automatically transcribing audio and video data, as well as handwritten or printed (non-digitised) text. We will also briefly explore tools for automatically annotating video. These systems allow students to make use of complex multimodal data such as social media videos.

The first half of the course focuses on the theoretical background, introducing core concepts and techniques to ensure students have a basic understanding of the underlying technologies. The second half of the course focuses on the application of state-of-the-art tools. Each day consists of one theory session and one practical session, and students will complete in-class programming exercises. During the second week, students will complete a small data analysis project, including analysis design, data extraction, data analysis and visualisation.

Course Objectives
By the end of this course, students will:

Understand the theoretical foundations of state-of-the-art (multimodal) large language models and of more conventional tools for automatic speech recognition and optical character recognition
Understand the limitations of state-of-the-art language technologies
Be able to design an appropriate data analysis methodology for audio, video and text data using state-of-the-art language technologies
Be able to prepare video, audio and non-digitised text data for semi-automated analysis
Be able to analyse and visualise text data

Course Prerequisites
Some familiarity with Python is recommended but not required. Attendees should bring laptops for practical work.

Representative Background Reading
• O’Sullivan, James. 2022. The Bloomsbury Handbook to the Digital Humanities. Bloomsbury. ISBN: 978-1-3502-3213-6

Optional Reading

• Daniel Jurafsky and James H. Martin. 2025. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd edition. Online manuscript released January 12, 2025. https://web.stanford.edu/~jurafsky/slp3 (in particular chapters 10 and 16)
• Jacob Eisenstein. 2019. Introduction to Natural Language Processing. MIT Press. Open Access copy: https://cseweb.ucsd.edu/~nnakashole/teaching/eisenstein-nov18.pdf (chapter 1)
• Joe Nockels, Paul Gooding, Sarah Ames, and Melissa Terras. 2022. Understanding the application of handwritten text recognition technology in heritage contexts: a systematic review of Transkribus in published research. Archival Science 22, 367–392. https://doi.org/10.1007/s10502-022-09397-0 (Open Access)
For students who want to familiarise themselves with Python:
https://www.youtube.com/playlist?list=PL-osiE80TeTt2d9bfVyTiXJA-UTHn6WwU
https://automatetheboringstuff.com/

Week 1: Introducing multimodal language analysis

Day 1: Introduction
Day 2: Optical Character Recognition
Day 3: Automatic Speech Recognition and Transcription
Day 4: Image and video analysis
Day 5: Large Language Models

Week 2: Developing data analysis protocols

Day 6: Designing Data Analysis
• Data Ethics
• Developing a Data Collection Protocol
• Exploratory Analysis
• System validation

Day 7: Extracting digitised text data from different sources
• Automatic Transcription
• Optical Character Recognition
• Data Wrangling
• Data validation

Day 8: Text analysis
• Understanding differences between text and transcripts
• Making use of multimodality

Day 9: Data Visualisation
• Visualising data
• Interpreting data

Day 10: Future directions and limitations
• Developing independent data analysis protocols
