Matt Loftis is an Assistant Professor of Political Science at Aarhus University in Aarhus, Denmark. He received his PhD from Rice University in Houston, Texas. His substantive research interests are in governance, bureaucracy, and party politics, and all of his work draws on the vast amounts of government data freely available on the Internet to build large original data sets.
This short course prepares students to acquire and process data from the Internet using the R statistical programming language. The course provides principles and a toolkit for several stages of that process. We begin with tools for accessing web data in a variety of forms, from the open web to several kinds of application programming interfaces (APIs). We also cover principles for archiving and cleaning web data, as well as advanced tools for data storage.
• Understand pitfalls and challenges to acquiring and processing Internet data
• Gain experience accessing data on the open web and via API calls
• Build a toolkit and best practices for your own research
Students will be able to:
• Acquire and process information on the open web using R
• Select appropriate tools for accessing and processing open web data
• Access and process information via API calls using R
• Process, archive, and munge many types of Internet data using R
• Store and work with data in certain types of advanced data structures
Each day includes a homework assignment. Students are strongly encouraged to make their best attempt at each one. The assignments serve as launching points for the next day’s class discussion, and they confront students with real-life problems they will encounter when scraping the open web.