The Augmented Social Scientist. Using Recent Advances in NLP to Annotate Millions of Texts with a Human-Level Accuracy
- Étienne Ollion, Professor in Sociology at l'École Polytechnique, Paris, France
- Rubing Shen, PhD Candidate at Sciences Po (Médialab) and at l'Institut polytechnique de Paris, France
This tutorial introduces participants to the logic of transfer learning applied to text data, so that they can carry out their own text analysis projects. It proceeds in three parts. The first reviews the existing literature on the topic, both classic and recent, to show the relevance of this approach. We show that a social scientist can, in a limited amount of time, train an algorithm that correctly annotates hundreds of thousands of texts. We will also show that, when efficiently trained, the algorithm performs this task better than most humans (who can get tired, bored, or inattentive). The second part will be fully hands-on. We will demonstrate how to use a BERT model on text data, using an online interface to walk participants through each step of the analysis. Finally, we will discuss practical questions that emerge during the training phase, and we will conclude by briefly discussing the downsides of this approach.
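Claims such as "human-level accuracy" are usually checked by comparing a model's annotations with a human-coded gold standard. The following is a minimal sketch of that comparison, not the presenters' own evaluation code. The function name, the labels, and the toy data are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(gold, predicted, positive_label):
    """Compare predicted annotations with gold-standard ones for one label.

    Returns (precision, recall, f1) for `positive_label`.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: both the human coder and the model chose the label.
    tp = sum(1 for g, p in pairs if g == positive_label and p == positive_label)
    # False positives: the model chose the label, the human coder did not.
    fp = sum(1 for g, p in pairs if g != positive_label and p == positive_label)
    # False negatives: the human coder chose the label, the model did not.
    fn = sum(1 for g, p in pairs if g == positive_label and p != positive_label)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical toy data: five sentences coded by a human and by a model.
gold_labels = ["political", "political", "other", "political", "other"]
model_labels = ["political", "other", "other", "political", "political"]

precision, recall, f1 = precision_recall_f1(gold_labels, model_labels, "political")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The same function can be run on a second human coder's labels, so that the model's score and the inter-human score are measured against the same gold standard.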
Capturing Human Values during Controversies
Appeal to values is fundamental in mass persuasive communication. In this tutorial, we give an overview of how the basic human and moral values are interpreted and quantified according to the psychological literature, how they can be assessed from user-generated data, and how they may be employed in persuasion and propaganda identification. In the first part of the tutorial, we provide an overview of traditional survey methods, and discuss their applicability and the validity of recruiting participants through the Internet and social media. We briefly cover the entailed biases of each source, and showcase several applications of value analysis in politics, health, and charitable giving. We discuss the role of appeal to values in disinformation campaigns and tools to detect them. In the second part of this tutorial, we will lead a hands-on demonstration of tools for (1) moral value extraction from text, (2) network analysis for opinion clustering, and (3) persuasion technique identification in two scenarios: the COVID-19 vaccination debate and the recent Russian invasion of Ukraine. No prior technical knowledge of natural language processing, network analysis, or machine learning is assumed. Familiarity with Python is helpful for getting the most out of the hands-on session.
Computational Social Science from Space
- Sanja Šćepanović, Research Scientist at Nokia Bell Labs, UK, and Beyond Fellow at AI4EO Lab at TUM/DLR, Munich, Germany
- Ingmar Weber, Alexander von Humboldt Professor and Chair for Societal Computing at Saarland University, Germany
The IC2S2 community studies how online social media and web data can be used to understand different aspects of society and human behavior. While these online data sources are valuable for sensing and quantifying the ‘social fabric’, they are not made for sensing the physical world. This tutorial will provide attendees with another data source with which to complement their analysis: satellite imagery. As the Earth Observation (EO) community is turning to use social media data, we aim to encourage a reciprocal interdisciplinary approach, too. At the same time, satellite-based remote sensing creates new needs for setting community standards around the responsible use of such data, something the IC2S2 community is well-positioned to contribute to. In this tutorial, we will give a short overview of existing work in the social sciences that uses EO to study social phenomena, encouraging discussion from the audience. Additionally, we will offer a hands-on introduction to using Google Earth Engine to demonstrate the effects of redlining, the practice of racially segregated housing policies, visible from space and still impacting communities today.
Creating multiplayer, interactive online experiments with Empirica
Empirica is a free and open-source platform that enables researchers to design and conduct synchronous experiments with groups of human participants in a virtual lab setting. The platform has been adopted by researchers at 30+ leading institutions and is becoming a popular choice for creating high-throughput group experiments. Empirica takes care of crucial tasks such as managing participants, randomizing groups, assigning treatments, and collecting data, and provides a simple and validated framework that can be customized to create arbitrarily complex experimental designs. This workshop is designed for researchers who are interested in conducting multiplayer, interactive online experiments, and would like to learn how Empirica can help them meet their research objectives. The tutorial will provide an overview of Empirica's purpose and structure, illustrate its capabilities with real-world case studies, and guide participants through the process of building their first multiplayer experiment. Workshop participants will learn how to install and run Empirica, modify participant interfaces, adapt server-side logic, and deploy their experiment to a cloud server. Additionally, the workshop will highlight the new features in the recent release of Empirica v2 and demonstrate some advanced patterns that take advantage of Empirica's power and flexibility.
Data access for researchers under the Digital Services Act
The Digital Services Act (DSA) was recently adopted by the European Union and contains an article (namely Article 31) that is of particular importance for the computational social sciences. It regulates data access from very large online platforms for academic researchers. In short, researchers will have the right to submit data access requests for specific research questions related to systemic risk (e.g., about misinformation or polarization/radicalization) and get access to the necessary datasets. This law will go into effect in 2023 for designated major online platforms and in 2024 for all. In this tutorial, we will learn more about the specifics of Article 31 and how it could be implemented and enforced. In the second part of the tutorial, we will draft data requests under the DSA, hands-on and as realistically as possible. As the implementation of the DSA is still underway, the CSS community could play a key role by preparing such research questions and requests in advance and having them ready when the law goes into effect, both to put pressure on platforms to comply and to reduce our research's dependence on them.
Geospatial Data Science
Geospatial data is ubiquitous. Massive geospatial data are generated every second from our smartphones, through our social media posts, or through many other means, allowing us to trace the movements and behavioral patterns of entire societies. As these data keep growing, it becomes more important to extract meaningful insights from location, relation, and position, for applications as diverse as business analytics, epidemiology, or species protection. This tutorial provides participants with core competences in Geospatial Data Science (GDS) using Python, with a focus on two core themes: 1) GDS basics for the computational social scientist, 2) Network science applications with OpenStreetMap, including:
- Data structures and principles of GIS; map projections and measurement
- Gathering and preprocessing large-scale geospatial data
- State-of-the-art computational tools for GDS
- Spatial network analysis
- Main methodologies available to the Geospatial Data Scientist, as well as the intuition for how and when they can be applied
- Real world applications of these techniques in an applied context
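As a small illustration of the "measurement" topic in the list above, the sketch below computes the great-circle distance between two latitude/longitude points with the haversine formula. It is not taken from the tutorial material. It assumes a spherical Earth with a mean radius of 6371.0088 km, and the function name and example coordinates are illustrative choices.

```python
import math

# Assumption: the Earth is treated as a sphere with this mean radius.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Approximate city-centre coordinates, used only as an example.
copenhagen = (55.6761, 12.5683)
oslo = (59.9139, 10.7522)
print(f"Copenhagen to Oslo: {haversine_km(*copenhagen, *oslo):.0f} km")
```

Measuring on the sphere avoids the distortion that comes from computing Euclidean distances on unprojected degrees. For high-precision work, an ellipsoidal model or a suitable projected coordinate system would be the more careful choice.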
- August Lohse, Ph.D. fellow at Center for Social Data Science (SODAS) University of Copenhagen (email@example.com)
- Simon P. von der Maase, Researcher at Department of Peace and Conflict Dynamics, The Peace Research Institute Oslo (PRIO) (firstname.lastname@example.org)
In this tutorial, we will introduce participants to working with images as data in a computational social science setting. The tutorial will cover a range of subjects, including importing images into Python, utilizing pre-trained deep learning models for image labeling, creating custom neural networks, and handling large datasets of images. Through hands-on coding exercises and the creation of their own models, attendees will gain practical experience and technical skills. The tutorial will also introduce participants to the vast potential of using images as data through the exploration of cutting-edge computational social science research and will showcase a selection of excellent research in this field. By the end of the tutorial, attendees will be both inspired and ready to pursue their own research questions involving images as data.
Thinking with Deep Learning: An exposition of deep (representation) learning for social science research
- James Evans, Professor of Sociology and Computation at the University of Chicago, external faculty member at the Santa Fe Institute, US
- Bhargav Srinivasa Desikan, Doctoral Researcher and Fellowship student in the Computer and Communication Sciences department at EPFL, Switzerland
A deluge of digital content is generated daily by web-based platforms and sensors that capture digital traces of communication and connection, and complex states of society, the economy, the human mind, and the physical world. Emerging deep learning methods enable the integration and analysis of these complex data in order to address research and real-world problems by designing and discovering successful solutions. Our tutorial serves as a companion to our book, “Thinking with Deep Learning”. This book takes the position that the real power of deep learning is unleashed by thinking with deep learning to reformulate and solve problems traditional machine learning methods cannot address. These include fusing diverse data like text, images, tabular and network data into integrated and comprehensive “digital doubles” of the subjects and scenarios you want to model, the generation of promising recommendations, and the creation of AI assistants to radically augment an analyst or system’s intelligence. For scientists, social scientists, humanists, and other researchers who seek to understand their subjects more deeply, deep learned representations offer the opportunity not only to predict and simulate them but also to provide novel insights, associations, and understanding available for analysis and reuse. The tutorial will walk attendees through various non-neural representations of social text, image and network data, and the various distance metrics we can use to measure distances between these representations. We then move on to introducing neural models and their use in modern science and computing, with a focus on the social sciences. After introducing neural architectures, we will explore how they are used with various multi-modal social data, and how their power can be unleashed by integrating and aligning these representations.
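To make the idea of a non-neural representation and a distance metric concrete, here is a minimal sketch that represents texts as bag-of-words count vectors and compares them with cosine distance. It is an illustrative example under simple assumptions (lowercasing and whitespace tokenization), not code from the book or the tutorial. The function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as token counts (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two count vectors."""
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for an empty text")
    return 1.0 - dot / (norm_a * norm_b)


doc_a = bag_of_words("the cat sat")
doc_b = bag_of_words("the dog sat")
print(f"cosine distance: {cosine_distance(doc_a, doc_b):.3f}")
```

Cosine distance ignores vector length, so a long and a short text with the same word proportions are treated as close. Other metrics, such as Euclidean distance, would weigh document length as well.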