Please note: this course will be taught in hybrid mode, with synchronous live sessions in which on-campus and online students are taught simultaneously.

Xavier Fernández-i-Marín is a social science methodologist, currently serving as a "Ramón y Cajal" fellow at the Universitat de Barcelona. He develops and tailors solutions for social science research methods, including current developments in Bayesian inference, data visualization, probabilistic programming, experimental designs and machine learning. He has made substantial contributions in comparative politics, public administration, public policy, international relations and psychology. He has worked in the fields of global governance and IGOs, the diffusion of policies and institutions, and the processes of development of regulatory agencies. He has also worked on Internet and e-Government diffusion and other aspects of the public management of the Information Society, lately including the adoption of Artificial Intelligence in public administration. He was trained in methodology at the University of Essex, where he obtained a postgraduate degree and taught in the summer school for several years. He has ample experience with hierarchical/multilevel models and Bayesian inference, as well as with cluster analysis, principal component analysis, factor analysis, survival (event history) analysis, spatial models and quantitative methods in general, together with experimental designs and text analysis. This has led to collaborations across several disciplines, bringing scientific, methodological and systematic value to different teams. He is the creator and developer of several R packages, including ggmcmc [6], an R package for assessing and diagnosing the convergence of Markov Chain Monte Carlo simulations and for graphically displaying results from full MCMC analyses; and PolicyPortfolios, for working with data on comparative public policy.
Course Objectives
The purpose of the course is to give students an introduction to strategies for achieving systematic, flexible and appropriate measurement indicators for concepts of interest. The set of tools includes Bayesian inference modelling for combining variables and estimating latent traits, and the systematic use of Large Generative Artificial Intelligence Models (LGAIMs, also known as LLMs) to experiment with measures for textual data. The course aims to enable the development of strategies for transforming raw data into meaningful measures of concepts in the social sciences.
Developing good measures of the concepts to work with is the first step towards assessing causal relationships and associations amongst multiple variables. At the end of the course participants will be able to design a system to transform raw measures into meaningful variables; create clusters and typologies of observations; and perform measurement of such concepts through time. They will also be able to employ Large Generative AI Models (LLMs) hand-in-hand with inference-based Bayesian methods to produce latent variables of interest, especially in the context of observational data collection.
Course Prerequisites
Some familiarity with the linear model and/or with logit/probit regression is helpful, but not essential. There are no specific software requirements, since only Free and Open Source Software, as well as open-weight LLMs, will be used. R, JAGS and software for managing Large Language Models will be covered from the beginning.
Background Knowledge
Statistics:
OLS – elementary
Maximum Likelihood – elementary
Software:
R – moderate
Maths:
Linear Regression – elementary
Calculus – elementary
The course has 10 sessions, each consisting of 2 lecture hours and 1.5 lab hours. Labs give students the possibility to work with the datasets provided by the instructor or with their own datasets.
1. Introduction
The first session reviews challenges, opportunities and limitations of measurement, as well as notation and fundamental prerequisites (such as probability distributions) used later in the course. A basic introduction to Bayesian inference is also provided.
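As a minimal illustration of the Bayesian updating logic introduced in this session (the course itself uses R and JAGS; this Python sketch, with made-up numbers, only shows the simplest conjugate case):

```python
# Conjugate Beta-Binomial updating: a flat Beta(1, 1) prior on a proportion,
# then observing 7 successes and 3 failures in 10 trials (illustrative numbers).
a_prior, b_prior = 1, 1
successes, failures = 7, 3

# The posterior is Beta(a + successes, b + failures)
a_post, b_post = a_prior + successes, b_prior + failures
posterior_mean = a_post / (a_post + b_post)

print(a_post, b_post, round(posterior_mean, 3))  # prints: 8 4 0.667
```

The same prior-times-likelihood logic carries over to the non-conjugate measurement models later in the course, where the posterior is explored by MCMC instead of in closed form.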
2. Continuous variables: factor analysis
One session covers the fundamentals of factor analysis for continuous variables. The Human Development Index from the UNDP is presented and discussed.
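The core idea of the session, that several continuous indicators can be driven by one latent trait, can be sketched with simulated data. This is a simplified illustration (the first principal component as a rough one-factor score, using NumPy only), not the Bayesian factor model used in class; all numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
# One latent trait driving three continuous indicators with different loadings
theta = rng.normal(size=n)
loadings = np.array([0.9, 0.7, 0.5])
X = theta[:, None] * loadings + rng.normal(scale=0.5, size=(n, 3))

# First principal component of the centered data as a simple factor score
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[0]

r = abs(np.corrcoef(theta, scores)[0, 1])
print(r > 0.85)  # the scores track the simulated latent trait closely
```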
3. Binary variables: item–response models
One session covers item–response models for binary indicators. This includes the Rasch one-parameter model and the two- and three-parameter logistic models [4]. Besides their uses, the session also reviews the limitations of such models. The examples examined include the following: generating an indicator out of several binary survey questions; ideal points of US legislators; and a measure of the complexity of European Regulatory Agencies [8].
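The mechanics of the Rasch model can be sketched by simulation: the probability of a correct answer is a logistic function of ability minus item difficulty, so easier items attract more correct answers. A stdlib-only Python sketch with invented difficulties:

```python
import math
import random

random.seed(1)

def p_correct(theta, b):
    # Rasch (one-parameter logistic) model: P(y = 1) = logistic(theta - b)
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# 1000 simulated respondents answering an easy (b = -1) and a hard (b = +1) item
thetas = [random.gauss(0, 1) for _ in range(1000)]
easy = sum(random.random() < p_correct(t, -1.0) for t in thetas)
hard = sum(random.random() < p_correct(t, +1.0) for t in thetas)
print(easy, hard)  # the easy item gets substantially more correct answers
```

Fitting the model means inverting this simulation: recovering abilities and difficulties from the observed response matrix, which in the course is done with JAGS.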
4. Ordered variables and Mixed measurement (continuous, binary and ordinal)
This session covers the transformation of ordinal variables into continuous ones by generating data-driven cut points for each category. An application on how to measure democracy [17] is examined. The session also extends the continuous, binary and ordinal models into mixed factor analysis, which combines continuous and binary indicators with ordered variables. Relying on Quinn [13], an example about the degree of formalism of legal systems [15] is provided, as well as an example on the degree of independence and accountability of regulatory agencies.
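The cut-point idea can be illustrated by going in the other direction: a latent continuous value crossed against a set of cut points yields ordered categories whose frequencies match the areas of the latent distribution between the cuts. A stdlib-only sketch with hypothetical cut points:

```python
import math
import random

random.seed(7)
cuts = [-0.5, 0.8]  # hypothetical cut points on the latent scale

def categorize(z):
    # number of cut points the latent value exceeds -> ordered category 0, 1 or 2
    return sum(z > c for c in cuts)

def Phi(x):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

z = [random.gauss(0, 1) for _ in range(20000)]
observed = [sum(categorize(v) == k for v in z) / len(z) for k in range(3)]
implied = [Phi(cuts[0]), Phi(cuts[1]) - Phi(cuts[0]), 1.0 - Phi(cuts[1])]
print([round(p, 2) for p in observed], [round(p, 2) for p in implied])
```

Estimation reverses this: given the observed category frequencies, the model infers where the cut points sit on the latent scale.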
5. Missing data as latent data, and extensions: Temporal dynamics and Mixed sources
One session deals specifically with the advantages of Bayesian inference for dealing with missing data, and with how this approach can be seen as a way to model latent data. The session reviews how to extend measurement models to account for temporal variability, with special emphasis on simple time-series models and the use of Kalman filters. Examples on morality policies and on regional autonomy, federalism and decentralization are presented. It also covers how to mix data from several sources and aggregate it in order to obtain pooled indicators of the desired measures. An example about pooling data from different polling sources is reviewed [10], along with a discussion of the aggregation performed for the Worldwide Governance Indicators [18].
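The poll-pooling logic can be sketched as a local-level Kalman filter: each poll nudges a running estimate of the latent vote share, weighted by how noisy polls are relative to how much the share can drift. A minimal sketch with invented polls and variances (not the model of [10] itself):

```python
def kalman_pool(polls, obs_var, state_var, m0=0.5, v0=1.0):
    """Local-level Kalman filter: recursively pool noisy polls of a latent share."""
    m, v = m0, v0
    for y in polls:
        v = v + state_var           # predict: the latent share may have drifted
        gain = v / (v + obs_var)    # how much to trust the new poll
        m = m + gain * (y - m)      # update the pooled estimate toward the poll
        v = (1.0 - gain) * v        # posterior uncertainty shrinks
    return m, v

# Five hypothetical polls of the same latent vote share
polls = [0.52, 0.49, 0.51, 0.53, 0.50]
m, v = kalman_pool(polls, obs_var=0.0004, state_var=0.0001)
print(round(m, 3))
```

A missing poll on a given day simply means skipping the update step for that day: the filter carries the prediction forward, which is the sense in which missing data becomes latent data.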
6. Classification, clustering and typologies
One session provides an introduction to clustering and classification. Relying on [5], an example about the classification of welfare regimes in Western countries [1] is reviewed. Other examples include classifying European drug regimes, institutional features of organizations [11] and typologies of regions.
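To show the basic idea of grouping observations into types, here is a tiny k-means sketch on invented data. Note this is only an illustration of clustering in general; the readings above [1, 5] concern model-based (mixture) clustering, which additionally quantifies uncertainty about cluster membership:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical groups of countries along two policy dimensions
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(3.0, 0.3, (50, 2))])

def kmeans(X, k, iters=20):
    # Deterministic init: first and last observations as starting centers
    centers = X[[0, len(X) - 1]].copy() if k == 2 else X[:k].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centers) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels

labels = kmeans(X, 2)
# Each simulated group should end up entirely in one cluster
print(len(set(labels[:50].tolist())) == 1, len(set(labels[50:].tolist())) == 1)
```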
7. Conjoint analysis
This session provides a concrete example of the use of Hierarchical Bayes (HB) approaches to conjoint analysis, estimating, for every surveyed individual, the preference for each of the profiles. A review of [7] will be covered, with additional readings from [2] and [16].
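The key feature of the HB approach is partial pooling: individual-level preference estimates are shrunk toward the population mean, with less shrinkage for respondents who contribute more data. The precision-weighted compromise at its core can be sketched with invented numbers:

```python
def shrink(ind_mean, n_i, grand_mean, tau2, sigma2):
    """Precision-weighted compromise between an individual's own estimate
    and the population mean (the core of hierarchical partial pooling).
    tau2: between-individual variance; sigma2: within-individual noise."""
    w = (n_i / sigma2) / (n_i / sigma2 + 1.0 / tau2)
    return w * ind_mean + (1.0 - w) * grand_mean

# A respondent observed on only 2 choice tasks vs one observed on 20
few = shrink(ind_mean=1.0, n_i=2, grand_mean=0.0, tau2=1.0, sigma2=1.0)
many = shrink(ind_mean=1.0, n_i=20, grand_mean=0.0, tau2=1.0, sigma2=1.0)
print(round(few, 3), round(many, 3))  # prints: 0.667 0.952
```

The sparsely observed respondent is pulled strongly toward the population mean, while the well-observed one keeps an estimate close to their own data.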
8. Text analysis as a measurement challenge
A final set of three sessions introduces text analysis as part of a more general effort to produce valid, reliable and systematic measures of large streams of documents. It relies on employing Large Generative Artificial Intelligence Models (LGAIMs, also known as LLMs) to experiment with codification. The sessions provide specific tools on how to effectively employ LGAIMs / LLMs whose output can later be combined with proper measurement models.
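Treating LLM output as measurement means subjecting it to the same reliability checks as human coding. For instance, two independent coding runs of the same documents can be compared with Cohen's kappa, which corrects raw agreement for chance. A stdlib-only sketch with hypothetical binary codings:

```python
# Hypothetical codings of 10 documents by two independent LLM runs (0/1)
run_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
run_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

def cohens_kappa(a, b):
    # Chance-corrected agreement between two binary coders
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa1, pb1 = sum(a) / n, sum(b) / n
    p_chance = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (p_obs - p_chance) / (1 - p_chance)

kappa = cohens_kappa(run_a, run_b)
print(round(kappa, 2))  # prints: 0.58
```

Low kappa across runs (or across prompts) signals that the "measure" is unstable and should not be fed into downstream models as if it were error-free, which is precisely where the Bayesian measurement models from the earlier sessions come back in.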
References
[1] John S. Ahlquist and Christian Breunig. “Model-based Clustering and Typologies in the Social Sciences”. In: Political Analysis 20.1 (2012), pp. 92–112.
[2] Greg M Allenby, Jaehwan Kim, and Peter E Rossi. “Economic models of choice”. In: Handbook of Marketing Decision Models. Springer, 2017, pp. 199–222.
[3] P. Congdon. Bayesian statistical modelling. Wiley series in probability and mathematical statistics. Probability and mathematical statistics. John Wiley & Sons, 2006.
[4] S. McKay Curtis. “BUGS Code for Item Response Theory”. In: Journal of Statistical Software 36.CS-1 (2010).
[5] B.S. Everitt et al. Cluster Analysis. Wiley Series in Probability and Statistics. Wiley, 2011.
[6] Xavier Fernández-i-Marín. “ggmcmc: Analysis of MCMC Samples and Bayesian Inference”. In: Journal of Statistical Software 70.1 (2016), pp. 1–20.
[7] Xavier Fernández-i-Marín et al. “Discrimination against mobile European Union citizens before and during the first COVID-19 lockdown: Evidence from a conjoint experiment in Germany”. In: European Union Politics (2021), p. 14651165211037208.
[8] Susanna Salvador Iborra et al. “The Governance of Goal-Directed Networks and Network Tasks: An Empirical Analysis of European Regulatory Networks”. In: Journal of Public Administration Research and Theory (2017).
[9] Simon Jackman. Bayesian Analysis for the Social Sciences. New Jersey: John Wiley & Sons, 2009.
[10] Simon Jackman. “Polling the polls over an election campaign”. In: Australian Journal of Political Science 40.4 (2005), pp. 499–517.
[11] Jacint Jordana, Xavier Fernández-i-Marín, and Andrea C Bianculli. “Agency proliferation and the globalization of the regulatory state: Introducing a data set on the institutional features of regulatory agencies”. In: Regulation & Governance 12 (4 2018), pp. 524–540.
[12] Martyn Plummer. “JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling”. In: Proceedings of the 3rd International Workshop on Distributed Statistical Computing. Vienna, Austria, 2003.
[13] Kevin M. Quinn. “Bayesian Factor Analysis for Mixed Ordinal and Continuous Responses”. In: Political Analysis 12 (2004), pp. 338–353.
[14] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2016.
[15] Howard Rosenthal and Erik Voeten. “Measuring legal systems”. In: Journal of Comparative Economics 35 (2007), pp. 711–728.
[16] P.E. Rossi, G.M. Allenby, and R. McCulloch. Bayesian Statistics and Marketing. Wiley Series in Probability and Statistics. Wiley, 2012.
[17] Shawn Treier and Simon Jackman. “Democracy as a latent variable”. In: American Journal of Political Science 52.1 (2008), pp. 201–217.
[18] World Bank. Worldwide Governance Indicators.