Sang Jung Kim


Emotion Recognition of Images and Videos using FER and DeepFace Python libraries

While social scientists widely apply computational analysis to texts, through techniques such as natural language processing and topic modeling, only recently have they begun applying computational analysis to images. One topical area of interest is the effect of emotional appeals in visual messages on audiences’ attitudes or engagement on social media. This tutorial delves into analyzing the emotions in facial expressions embedded in images, and invites social science researchers who plan to apply computer vision techniques to inform their research questions or hypotheses.


Facial emotion recognition (FER) is one of the most investigated fields in computer vision. There are generally two approaches in the computer science field to process automatic facial emotion recognition: (1) conventional FER approaches and (2) deep-learning-based FER approaches. Compared to conventional FER approaches that detect emotions based on the features extracted directly from the images, deep learning approaches enable “end-to-end” learning, largely reducing the sole dependency on the face-physics-based model (Ko, 2018).


Among the deep-learning-based FER models built by computer scientists, the Python packages “FER” and “deepface” are easy for social scientists to apply when exploring datasets of facial expressions or answering their research questions. The FER package uses a deep learning approach with a convolutional neural network trained on the dataset from the Kaggle competition “Challenges in Representation Learning” (Goodfellow et al., 2013). The “deepface” package, on the other hand, is a facial recognition framework wrapping multiple state-of-the-art deep learning models. Because social scientists can quickly learn both packages and compare the results by validating one against the other, a tutorial on these packages will significantly help social scientists inform their work.
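A minimal sketch of how the two packages are typically called. Both must be installed (e.g., `pip install fer deepface`); the image path and the `dominant_emotion` helper are illustrative, not part of either package.

```python
# Sketch: scoring facial emotions with the FER and deepface packages.
# Assumes both packages are installed; "group_photo.jpg" is a
# hypothetical image file supplied by the reader.

def dominant_emotion(scores):
    """Return the emotion label with the highest score from a score dict."""
    return max(scores, key=scores.get)

def emotions_with_fer(image_path):
    """Detect faces and score seven emotions per face with the FER package."""
    import cv2
    from fer import FER
    detector = FER(mtcnn=True)  # MTCNN improves face detection
    faces = detector.detect_emotions(cv2.imread(image_path))
    # Each result has a bounding box and a dict of emotion scores.
    return [(f["box"], dominant_emotion(f["emotions"])) for f in faces]

def emotions_with_deepface(image_path):
    """Analyze the same image with deepface, restricted to emotion."""
    from deepface import DeepFace
    return DeepFace.analyze(img_path=image_path, actions=["emotion"])

# Example usage (not run here):
#   emotions_with_fer("group_photo.jpg")
#   emotions_with_deepface("group_photo.jpg")
```

Running both functions on the same images is one simple way to cross-validate the two packages against each other, as suggested above.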


Machine learning with Pytorch for science of science and innovation

AI has made remarkable contributions to science and creativity in recent years. It has led to advances in difficult, fundamental problems such as protein folding in biology, has beaten humans in strategy games such as chess and Go, and has reached human-level performance in the recognition and synthesis of images and text. With the availability of powerful, high-level packages for AI, such as PyTorch, it is easier than ever to apply AI to computational social science (CSS) problems. However, a large fraction of the CSS community may find it difficult to get started with AI. One reason is that most AI tutorials are not designed with the CSS community in mind: they are either too high-level or too mathematical for CSS researchers, and they often don’t cover the range of problems in CSS, such as network analysis, natural language processing, and image processing. We bridge this gap by providing a tutorial tailored to the CSS audience. We discuss some of the most important AI techniques for CSS and review in detail a few papers applying AI to CSS problems. We then discuss the structure of the most important deep learning models for CSS and provide an interactive notebook that teaches how to design and use these methods in PyTorch.
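To give a flavor of the notebook material, here is a minimal PyTorch sketch: logistic regression trained by gradient descent on a toy dataset. The data and hyperparameters are illustrative only, not drawn from any paper covered in the tutorial.

```python
# Minimal PyTorch training loop: logistic regression on a toy dataset
# where the label simply equals the first feature.
import torch

torch.manual_seed(0)
X = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([[0.0], [0.0], [1.0], [1.0]])

model = torch.nn.Linear(2, 1)            # a single linear layer
loss_fn = torch.nn.BCEWithLogitsLoss()   # binary cross-entropy on logits
opt = torch.optim.SGD(model.parameters(), lr=1.0)

losses = []
for _ in range(200):
    opt.zero_grad()          # clear gradients from the previous step
    loss = loss_fn(model(X), y)
    loss.backward()          # backpropagate
    opt.step()               # update weights
    losses.append(loss.item())
```

The same zero-grad / forward / backward / step pattern scales directly to the larger text, image, and network models discussed in the tutorial.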


Jaren Haber

Web-Crawling: A Practical Introduction in Python

Web-crawling is an increasingly popular tool for collecting data from websites, online databases, and social media—but it also remains a common stumbling block for computational social science. With such a wide range of tools and languages available (HTML, APIs, and XPath are just a few), developing a crawling pipeline can be a frustrating experience for researchers unfamiliar with the promises and pitfalls of web-based data. This is especially the case for scholars without formal training in Computer Science, for whom attempting to collect new web corpora often means reinventing the flat tire.


Whatever your background, this workshop will give you the building blocks to use web-crawling in your research. We will tackle common problems including collecting web addresses (by automated Google search); focused, narrow crawling of a limited number of websites (with Requests and BeautifulSoup); and flexible, broad crawling of heterogeneous web corpora (with Scrapy). We will explore the tradeoff between precision and extensibility and challenge conventional skepticism toward noisier but scalable crawling.
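As a taste of the "narrow crawling" building block, the sketch below parses a page with BeautifulSoup. To keep the example self-contained it parses a literal HTML string; in a real crawl the HTML would come from `requests.get(url).text`.

```python
# Narrow-crawl sketch: extract a heading and all links from a page.
# The HTML below is a made-up stand-in for a fetched page.
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>School District News</h1>
  <a href="/about">About</a>
  <a href="https://example.org/report.pdf">Annual report</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1").get_text()
links = [a["href"] for a in soup.find_all("a")]
```

Scrapy generalizes exactly this extract-and-follow step into a scheduled, parallel pipeline for broad crawls.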


We will build on best practices to create a decision hierarchy promoting accessible and efficient workflows. First, if there’s an API, use that before scraping; the data provider prefers this and it’s probably easier. Second, precise scraping is good—but not necessarily at the expense of scale or for methods less sensitive to noise (e.g., raw word counts). Finally, when collecting web data representative of a population, it’s okay to break individual sites’ Terms of Service, but be polite and don’t release anything private or copyrighted.
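Being polite starts with reading a site's robots.txt, and Python's standard library can do that without any third-party tools. The sketch below feeds the parser a sample robots.txt inline rather than fetching a live URL; the bot name and URLs are illustrative.

```python
# Politeness sketch: check robots.txt rules with the standard library.
from urllib.robotparser import RobotFileParser

SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rp = RobotFileParser()
rp.parse(SAMPLE_ROBOTS.splitlines())

# Which paths may our (hypothetical) research bot fetch?
allowed = rp.can_fetch("my-research-bot", "https://example.org/public/page.html")
blocked = rp.can_fetch("my-research-bot", "https://example.org/private/data.html")
delay = rp.crawl_delay("my-research-bot")  # seconds to wait between requests
```

In a live crawl you would call `rp.set_url("https://example.org/robots.txt"); rp.read()` instead of parsing an inline string, and sleep for `delay` seconds between requests.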

This tutorial will: (1) engage participants in discussion and exercises exploring baseline and state-of-the-art approaches to sentiment classification, and (2) provide access to current and historical global and local newspaper titles, which are in high demand among computational social scientists for text-mining research.


Sentiment classification is a sub-task of text classification that attempts to assign an affective or emotional state to a text. It is less studied than the related task of sentiment analysis, which assigns a single valence or Likert score from ‘very negative’ to ‘very positive’. In this tutorial, we will analyze newspaper articles in order to identify the following emotions: ‘anger’, ‘disgust’, ‘fear’, ‘sadness’, ‘happiness’, ‘love’, ‘surprise’, and ‘neutral’.


Classifying sentiment can be especially valuable for newspaper content, with many potential applications, such as:

  • What are the ‘typical’ public emotions surrounding political events? Tragic events?

  • How do the emotions or affective states represented by a newspaper outlet vary by location and era?

  • What are the most common emotion transitions?


As part of the tutorial, participants will (1) create newspaper datasets focusing on sentiment-related topics and (2) walk through, develop an understanding of, and run Python code for both baseline BoW (bag-of-words) models and a state-of-the-art BERT-based model developed in collaboration with students and researchers at the University of Michigan. Since TDM Studio, a text and data mining solution from ProQuest, will be used for the tutorial, researchers will be given access to this solution and to hundreds of newspaper titles during the exercise.
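For orientation, a BoW baseline of the kind covered in the tutorial can be sketched in a few lines of scikit-learn. The sentences and labels below are invented stand-ins for the newspaper snippets used in the exercise, and the pipeline is a generic baseline, not the University of Michigan model.

```python
# Toy bag-of-words sentiment classifier: word counts + logistic regression.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "crowds celebrate the victory with joy",
    "families mourn after the tragic flood",
    "residents furious over the council decision",
    "a joyful parade filled the streets",
    "grief and sadness at the memorial",
    "anger erupts at the heated town meeting",
]
labels = ["happiness", "sadness", "anger",
          "happiness", "sadness", "anger"]

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

pred = clf.predict(["a joyful celebration in the streets"])[0]
```

A BERT-based model replaces the count vectors with contextual embeddings but is evaluated against exactly this kind of baseline.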


Dashun Wang

Yian Yin

Toward a quantitative understanding of failure

Despite its inevitable nature and the incontrovertible wisdom that “failure is the mother of success”, our quantitative understanding of failure remains limited, in part due to the lack of systematic datasets recording the frequently occurring yet often neglected failures within individuals, teams, and organizations. This situation is changing radically, however, thanks to newly available large-scale datasets spanning social, scientific, and technical domains. In this tutorial, we will touch on different examples of failure through behavioral experiments, sociological theories, data analytics, causal inference, and mathematical modeling, hoping to illustrate that a computational social science agenda toward failure (combining canonical social science frameworks, big data, and computational tools from AI and the complexity sciences) offers exciting new opportunities and challenges. By helping improve our understanding and prediction of the why, how, and when of failure, advances in this area not only hold potential policy implications; they could also substantially further our ability to imagine and create by revealing the full pipeline of creativity.

This tutorial will provide an introduction to building digital infrastructure for high-throughput experimentation in computational social science. The first half of the tutorial will introduce participants to three platforms on which researchers can build their own computational social science experiments: (1) Empirica, (2) Yourfeed, and (3) Thorat (This or that). Each of these platforms gives researchers the flexibility to conduct massive online experiments: Empirica is an open-source JavaScript framework for running multiplayer interactive experiments and games in the browser; Yourfeed is a website resembling a social media newsfeed, where the feed is made up of researcher-uploaded stimuli; and Thorat is an open-source Flask framework for running decision-making experiments on multimedia stimuli. Each of these frameworks is designed to enable computational social science experiments that can scale to tens of millions of participants. In the second half of the tutorial, we will split into three groups and work through specific examples of building an experiment from scratch in each framework. Participants will learn about available tools for high-throughput experimentation in computational social science and how to use web frameworks (e.g., Meteor in JavaScript; Flask and Django in Python) to build dynamic websites for custom experiments.
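The Flask pattern underlying such experiments can be sketched in a few lines: one endpoint serves a stimulus, another records a response. The route names and in-memory store below are illustrative, not part of any of the three platforms.

```python
# Minimal Flask sketch of a dynamic experiment backend.
from flask import Flask, jsonify, request

app = Flask(__name__)
responses = []  # a real experiment would persist responses to a database

@app.route("/stimulus")
def stimulus():
    # A real study would sample this per participant and condition.
    return jsonify({"id": 1, "text": "Example headline shown to participant"})

@app.route("/respond", methods=["POST"])
def respond():
    responses.append(request.get_json())
    return jsonify({"status": "ok"})

# Exercise the endpoints with Flask's built-in test client.
client = app.test_client()
stim = client.get("/stimulus").get_json()
client.post("/respond", json={"stimulus_id": stim["id"], "choice": "share"})
```

Scaling this pattern to millions of participants is then a matter of deployment (load balancing, a real datastore), not of changing the request/response logic.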


James Evans

Bhargav Srinivasa Desikan

Thinking with Deep Learning: An exposition of deep (representation) learning for social science research

A deluge of digital content is generated daily by web-based platforms and sensors that capture digital traces of communication and connection, and complex states of society, the economy, the human mind, and the physical world. Emerging deep learning methods enable the integration and analysis of these complex data in order to address research and real-world problems by designing and discovering successful solutions. Our tutorial serves as a companion to our book, “Thinking with Deep Learning”. This book takes the position that the real power of deep learning is unleashed by thinking with deep learning to reformulate and solve problems traditional machine learning methods cannot address. These include fusing diverse data like text, images, tabular and network data into integrated and comprehensive “digital doubles” of the subjects and scenarios you want to model, the generation of promising recommendations, and the creation of AI assistants to radically augment an analyst or system’s intelligence. For scientists, social scientists, humanists, and other researchers who seek to understand their subjects more deeply, deep learned representations facilitate the opportunity to not only predict and simulate them but also to provide novel insights, associations, and understanding available for analysis and reuse.


The tutorial will walk attendees through various non-neural representations of social text, image, and network data, and the distance metrics we can use to compare these representations. We then move on to introducing neural models and their use in modern science and computing, with a focus on the social sciences. After introducing neural architectures, we will explore how they are used with various multi-modal social data, and how their power can be unleashed by integrating and aligning these representations.
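Once text, images, or network nodes are represented as vectors, comparing them reduces to a distance or similarity metric. A plain-Python cosine similarity, applied to made-up toy embeddings, illustrates the idea:

```python
# Cosine similarity between vector representations (toy embeddings).
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical 3-dimensional embeddings of three documents.
doc_a = [0.9, 0.1, 0.0]
doc_b = [0.8, 0.2, 0.0]
doc_c = [0.0, 0.1, 0.9]

sim_ab = cosine_similarity(doc_a, doc_b)  # near 1: similar documents
sim_ac = cosine_similarity(doc_a, doc_c)  # near 0: dissimilar documents
```

The same metric applies unchanged whether the vectors come from word counts, word2vec, or the deep learned representations discussed in the book.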


Jason Jeffrey Jones

Using Twitter Profile Bios to Investigate Personally Expressed Identity

We will discuss a method (developed by the instructor) to study personally expressed identity at scale. Personally expressed identity is who or what individuals themselves say they are; at scale means with data on millions of individuals. Researchers can now collect such data from Twitter profiles, which contain a field named user-description holding each user’s response to the prompt "Describe yourself in 160 characters or less." Happily, this data comes timestamped and geocodable, meaning temporal trends and geographic comparisons at fine resolution are readily analyzable. Attendees will learn how to acquire Twitter profile data at scale using the Twitter API. They will also become familiar with publicly available data already collected by the instructor. To collect and analyze this data, the instructor will provide Python and R scripts and guide attendees through several examples of their use.
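Extracting the bio field from API results is straightforward. The payload below mimics the shape of a Twitter API v2 users-lookup response with invented values; the real data would come from an authenticated request, as covered in the tutorial's scripts.

```python
# Sketch: pulling self-description bios out of Twitter user objects.
import json

# Hypothetical API response (field values are invented for illustration).
sample_response = json.loads("""
{
  "data": [
    {"id": "1", "username": "user_a",
     "description": "Mom. Nurse. Dog lover.", "location": "Ohio"},
    {"id": "2", "username": "user_b",
     "description": "Economist | runner | she/her", "location": "Berlin"}
  ]
}
""")

# Keep only the fields relevant to expressed identity.
bios = [
    {"username": u["username"], "bio": u["description"],
     "location": u.get("location")}
    for u in sample_response["data"]
]
```

From records like these, tokenizing the bio strings and grouping by location or collection date yields the temporal and geographic comparisons described above.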

Call for Tutorials


Decisions have been delayed, but will be sent out by Monday, April 25, and conference deadlines will be extended correspondingly.


IC2S2 2022 will be preceded by a day of tutorials and skills workshops on Tuesday, July 19. These should give social science researchers and data analysts the opportunity to add new tools to their toolkit. Specifically, we are calling for proposals for tutorials that address methods, skills, and tools useful for conducting research in computational social science, including but not limited to the following topics:

  • data collection / text mining approaches for social scientists

  • deep neural networks applied to social data

  • new advances in (social) network analysis

  • new advances in text analysis

  • new advances in visual analysis

  • active and adaptive learning applied to experimental design

  • visual communication and visualizations

  • using sensors for studying behavior

  • combining digital trace data and additional data (e.g., surveys)

  • assessing biases in data collection

  • best practices for working with online communities (including crowdsourcing and participants recruitment)

  • legal and ethical dimensions of CSS research

  • reproducibility in CSS research

  • experimental design and development in CSS


We particularly invite proposals for "disciplinary state of the art sessions" that give a focused overview on the latest developments, trends and perspectives in a specific discipline or research area. They can also focus on a specific platform or environment (e.g., PyTorch). These should help researchers in the interdisciplinary field of CSS to catch up with developments in areas beyond their core expertise.

We also welcome proposals for tutorials on any other topics at the intersection of the social sciences, computer science, and/or statistics/data science. We will consider any topic, provided that the proposal makes a strong argument that the tutorial is important for the IC2S2 community.

Tutorials should be of interest to a substantial portion of the community and should represent a sufficiently mature area of research or practice. Tutorials should be comprehensive and should not focus only on the presenter’s previous work.

We anticipate that each accepted tutorial will be 3 hours long. However, we are also accepting half-tutorials (1.5 hours) and full-day tutorials (6 hours).

Submission Format


Proposals for tutorials should be no more than three pages in length and contain the following:


  • Title

  • Presenters / organizers: Please provide names, affiliations, email addresses, and short bios (up to 200 words) for each presenter. Bios should cover the presenters’ expertise related to the topic of the tutorial. If there are multiple presenters, please describe how the time will be divided between them.

  • Topic: An abstract describing the topic (approx. 250 words)

  • Audience: A short statement about the expected target audience. What prior knowledge, if any, do you expect from the audience?

  • Rationale: What is the objective / learning outcome of the tutorial? What is the benefit for the attendees? Why is this tutorial important to the IC2S2 community?

  • Format: A description of the proposed event format (tutorial, hands-on, workshop, etc.) and a list of proposed activities.

  • Equipment: A short note on equipment or features required for the tutorial format.

  • Previous tutorials: Has the tutorial (or a similar/highly related tutorial) been presented at another venue previously? If so, please list the dates and venues, and describe the similarities and differences between the previous tutorials and the proposed tutorial.

  • Proposed length of the tutorial: please choose from 1.5 hours (half session), 3 hours (full session), and 6 hours (full day). If you are flexible, please indicate in the outline the content that will not be included if a short/long version of the tutorial is given. If you would like to give a 6 hour tutorial please justify why a full day is necessary.


Organizers of accepted tutorials are expected to provide some information material to announce the tutorial on the conference website and to help to spread the word about the tutorial via mailing lists and social media. Our hope is that the tutorial day will be among the most highly visible and attended of the entire conference.

Important Dates


Proposal deadline (AoE, anywhere on Earth)

February 25

Decision notification

April 15

Participant registration deadline

April 29

Tutorial day

July 19