The Augmented Social Scientist. Using Recent Advances in NLP to Annotate Millions of Texts with a Human-Level Accuracy
Time and location
09:00, Room D + streaming in Room F
- Étienne Ollion, Professor in Sociology at l'École Polytechnique, Paris, France
- Rubing Shen, PhD Candidate at Sciences Po (Médialab) and at l'Institut polytechnique de Paris, France
This tutorial introduces participants to the logic of transfer learning applied to text data, so that they can carry out their own text analysis projects. It proceeds in three parts. The first reviews the existing literature on the topic, both classic and recent, to show the relevance of this approach. We show that a social scientist can, in a limited amount of time, train an algorithm that correctly annotates hundreds of thousands of texts. We will also show that, when efficiently trained, the algorithm performs this task better than most humans (who can get tired, bored, or inattentive). The second part will be fully hands-on. We will demonstrate how to use a BERT model on text data, using an online interface to walk participants through each step of the analysis. Finally, we will discuss practical questions that emerge during the training phase, and we will conclude by briefly discussing the downsides of this approach.
Capturing Human Values during Controversies
Appeal to values is fundamental in mass persuasive communication. In this tutorial, we give an overview of how basic human and moral values are interpreted and quantified according to the psychological literature, how they can be assessed from user-generated data, and how they may be employed in persuasion and propaganda identification. In the first part of the tutorial, we provide an overview of traditional survey methods, and discuss their applicability and the validity of recruiting respondents through the Internet and social media. We briefly cover the biases entailed by each source, and showcase several applications of value analysis in politics, health, and charitable giving. We discuss the role of appeal to values in disinformation campaigns and tools to detect them. In the second part of this tutorial, we will lead a hands-on demonstration of tools for (1) moral value extraction from text, (2) network analysis for opinion clustering, and (3) persuasion technique identification in two scenarios: the COVID-19 vaccination debate and the recent Russian invasion of Ukraine. No prior technical knowledge of natural language processing, network analysis or machine learning is assumed. Familiarity with Python is helpful for getting the most out of the hands-on session.
Geospatial Data Science
Geospatial data is ubiquitous. Massive geospatial data are generated every second from our smartphones, through our social media posts, and through many other means, allowing us to trace the movements and behavioral patterns of entire societies. As these data keep growing, it becomes more important to extract meaningful insights from location, relation, and position, for applications as diverse as business analytics, epidemiology, or species protection. This tutorial provides participants with core competences in Geospatial Data Science (GDS) using Python, with a focus on two core themes: 1) GDS basics for the computational social scientist, 2) Network science applications with OpenStreetMap, including:
- Data structures and principles of GIS; map projections and measurement
- Gathering and preprocessing large-scale geospatial data
- State-of-the-art computational tools for GDS
- Spatial network analysis
- Main methodologies available to the Geospatial Data Scientist, as well as the intuition for how and when they can be applied
- Real world applications of these techniques in an applied context
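As a small illustration of the "map projections and measurement" item above, the following is a minimal sketch of measuring distance directly on latitude/longitude coordinates with the haversine formula. It is not taken from the tutorial materials: the function name `haversine_km` and the spherical-Earth radius constant are assumptions made for this example, and real GDS work would typically rely on a dedicated library and an appropriate projection.

```python
import math

# Assumption: Earth is treated as a sphere with its mean radius (in km).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


if __name__ == "__main__":
    # A quarter of the equator: from (0, 0) to (0, 90).
    print(round(haversine_km(0.0, 0.0, 0.0, 90.0), 1))
```

Computing Euclidean distance on raw degrees would distort results away from the equator, which is why a spherical formula (or a projected coordinate system) is used for measurement.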
Creating multiplayer, interactive online experiments with Empirica
Time and location
09:00, Room H (15th floor)
- Mohammed Alsobay, PhD candidate at the Information Technology group at MIT Sloan, US
- James Houghton, Postdoctoral Researcher at the University of Pennsylvania's Computational Social Science Lab, US
Empirica is a free and open-source platform that enables researchers to design and conduct synchronous experiments with groups of human participants in a virtual lab setting. The platform has been adopted by researchers at 30+ leading institutions and is becoming a popular choice for creating high-throughput group experiments. Empirica takes care of crucial tasks such as managing participants, randomizing groups, assigning treatments, and collecting data, and provides a simple and validated framework that can be customized to create arbitrarily complex experimental designs. This workshop is designed for researchers who are interested in conducting multiplayer, interactive online experiments, and would like to learn how Empirica can help them meet their research objectives. The tutorial will provide an overview of Empirica's purpose and structure, illustrate its capabilities with real-world case studies, and guide participants through the process of building their first multi-player experiment. Workshop participants will learn how to install and run Empirica, modify participant interfaces, adapt server-side logic, and deploy their experiment to a cloud server. Additionally, the workshop will highlight the new features present in the recent release of Empirica v2 and demonstrate some advanced patterns that take advantage of Empirica's power and flexibility.
Thinking with Deep Learning: An exposition of deep (representation) learning for social science research
Time and location
13:30, Room D + streaming in Room F
- James Evans, Professor of Sociology and Computation at the University of Chicago, external faculty member at the Santa Fe Institute, US
- Bhargav Srinivasa Desikan, Doctoral Researcher and Fellowship student in the Computer and Communication Sciences department at EPFL, Switzerland
A deluge of digital content is generated daily by web-based platforms and sensors that capture digital traces of communication and connection, and complex states of society, the economy, the human mind, and the physical world. Emerging deep learning methods enable the integration and analysis of these complex data in order to address research and real-world problems by designing and discovering successful solutions. Our tutorial serves as a companion to our book, “Thinking with Deep Learning”. This book takes the position that the real power of deep learning is unleashed by thinking with deep learning to reformulate and solve problems traditional machine learning methods cannot address. These include fusing diverse data like text, images, tabular and network data into integrated and comprehensive “digital doubles” of the subjects and scenarios you want to model, the generation of promising recommendations, and the creation of AI assistants to radically augment an analyst or system’s intelligence. For scientists, social scientists, humanists, and other researchers who seek to understand their subjects more deeply, deep learned representations make it possible not only to predict and simulate them but also to provide novel insights, associations, and understanding available for analysis and reuse. The tutorial will walk attendees through various non-neural representations of social text, image and network data, and the various distance metrics we can use to measure between these representations. We then move on to introducing neural models and their use in modern science and computing, with a focus on social sciences. After introducing neural architectures, we will explore how they are used with various multi-modal social data, and how their power can be unleashed by integrating and aligning these representations.
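To make the idea of a non-neural text representation and a distance metric concrete, here is a minimal sketch, not drawn from the book or the tutorial materials. It represents each text as a bag-of-words count vector and compares two texts with cosine distance; the helper names `bag_of_words` and `cosine_distance` and the whitespace tokenization are assumptions made for this illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a text as token counts (naive lowercase whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two count vectors (Counters)."""
    # Only tokens present in both vectors contribute to the dot product.
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen here: an empty text is maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = bag_of_words("social data from social media")
    doc2 = bag_of_words("network data from sensors")
    print(cosine_distance(doc1, doc2))
```

Neural representations replace the count vector with a learned dense embedding, but the same kind of distance can still be computed between the resulting vectors.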
Time and location
13:30, Room E + streaming in Room G
- August Lohse, Ph.D. fellow at Center for Social Data Science (SODAS) University of Copenhagen (firstname.lastname@example.org)
- Simon P. von der Maase, Researcher at Department of Peace and Conflict Dynamics, The Peace Research Institute Oslo (PRIO) (email@example.com)
In this tutorial, we will introduce participants to working with images as data in a computational social science setting. The tutorial will cover a range of subjects, including importing images into Python, utilizing pre-trained deep learning models for image labeling, creating custom neural networks, and handling large datasets of images. Through hands-on coding exercises and the creation of their own models, attendees will gain practical experience and technical skills. The tutorial will also introduce participants to the vast potential of using images as data through the exploration of cutting-edge computational social science research and will showcase a selection of excellent research in this field. By the end of the tutorial, attendees will be both inspired and ready to pursue their own research questions involving images as data.
Computational Social Science from Space
Time and location
13:30, Room C
- Sanja Šćepanović, Research Scientist at Nokia Bell Labs, UK, and Beyond Fellow at AI4EO Lab at TUM/DLR, Munich, Germany
- Ingmar Weber, Alexander von Humboldt Professor and Chair for Societal Computing at Saarland University, Germany
The IC2S2 community studies how online social media and web data can be used to understand different aspects of society and human behavior. While these online data sources are valuable for sensing and quantifying the ‘social fabric’, they are not made for sensing the physical world. This tutorial will provide attendees with another data source with which to complement their analysis: satellite imagery. As the Earth Observation (EO) community is turning to use social media data, we aim to encourage a reciprocal interdisciplinary approach, too. At the same time, satellite-based remote sensing creates new needs for setting community standards around the responsible use of such data, something the IC2S2 community is well-positioned to contribute to. In this tutorial, we will give a short overview of existing work in the social sciences that uses EO to study social phenomena, encouraging discussion from the audience. Additionally, we will offer a hands-on introduction to using Google Earth Engine to demonstrate the effects of redlining, the practice of racially segregated housing policies, visible from space and still impacting communities today.
Data access for researchers under the Digital Services Act
Time and location
13:30, Room H (15th floor)
- Philipp Lorenz-Spreen, Research Scientist at the Max Planck Institute for Human Development, Berlin, Germany
- Julian Jaursch, Project Director at Stiftung Neue Verantwortung, Berlin, Germany
For years, researchers have faced obstacles when they needed online platforms’ data to explore platforms’ functioning, potential risks and effects. A plan for a one-off cooperation between a platform and researchers failed miserably, tech companies have allegedly pressured researchers to stop their work and a key tool to study platforms has been closed. In early 2023, these challenges came to the fore again as one of the more open big platforms, Twitter, made it much harder to access data.
European policymakers – pushed by academia and civil society – have acknowledged the need for researchers to access platform data and the value of the resulting studies to help with independent platform oversight in the public interest. Most prominently, lawmakers in the European Union included a data access provision for researchers in the Digital Services Act (DSA). This article (Article 40) could potentially provide a huge opportunity for researchers to better understand platforms and, ultimately, help independent regulators oversee platforms based on scientific evidence.
Yet, despite the promises held by the DSA, many important open questions remain. Details on data access requests, the type of data to be requested, the timelines for full application and the communication exchanges between researchers, regulators and platforms will be hashed out over the coming months.
In this tutorial, participants will learn about the DSA’s Article 40 so they have a better understanding of who can request data for what type of research, how this process works and what open questions might be addressed in what way. They will also actively engage with one another to develop exemplary research questions and data access requests. Taken together, the tutorial thus aims to enable the CSS community to understand and utilize their data access rights under the DSA.