Tutorials at The Web Conference 2023

This paper summarizes the content of the 28 tutorials that have been given at The Web Conference 2023.


TUTORS
Lisette Espin-Noboa¹ is a postdoctoral researcher at Central European University (CEU) and at the Complexity Science Hub Vienna (CSH). Her research interests lie at the intersection of computational social science, network science, and AI for social good. She is particularly focused on understanding how edges form in social networks [11], and how these mechanisms of edge formation may affect machine learning algorithms [9,13,14] and human behavior [10,12,53].
Tiago Peixoto² is an Associate Professor in the Department of Network and Data Science at the Central European University (CEU). His research focuses on the development of methods to extract scientific understanding from network data, as well as the mathematical modeling of network behavior and evolution. He is particularly interested in problems of network inference, where meaningful structural and functional patterns are missing or cannot be obtained by direct inspection or low-order statistics, and instead require more sophisticated approaches based on large-scale generative models and efficient algorithms derived from them [44,45,57,59]. Many of the methods developed in his work are made available as part of the graph-tool library [41], which is extensively documented.
Fariba Karimi³ is an Assistant Professor at the Vienna University of Technology (TU Wien) and a group leader at the Complexity Science Hub Vienna (CSH). Her research mainly focuses on computational and network approaches to address societal challenges such as gender disparities in collaboration and citation networks [20,26], visibility of minorities in social and technical systems [14,23,39], algorithmic biases [9,17,38], and sampling hard-to-reach groups [13,51]. Her research also touches upon the emergence of culture in Wikipedia [22,47], spreading of information and norms [25], and perception biases [29] by using mathematical models, digital traces, and online experiments.

TOPIC AND RELEVANCE
In this tutorial, we aim to cover two paradigms related to social network modeling and some applications.

Social theories of edge formation
Understanding how networks form is a key interest for "The Web Conference" community. For example, social scientists are frequently interested in studying relations between entities within social networks, e.g., how friendship ties form between actors, and in explaining them based on attributes such as a person's gender, race, political affiliation, or age [48]. Similarly, the complex networks community has proposed a set of generative network models that aim to explain the formation of edges, focusing on the two core principles of popularity and similarity [40]. Thus, a series of approaches to study edge formation have emerged, including statistical tools [27,49] and model-based approaches [24,40,50] specifically established in the physics and complex networks communities. Other disciplines, such as computer science and political science, use these tools to understand how co-authorship networks [33] or online communities [1] form or evolve.
In terms of similarity, many social networks demonstrate a property known as homophily, which is the tendency of individuals to associate with others who are similar to them, e.g., with respect to gender or ethnicity [34]. Alternatively, individuals may also prefer to close triangles by connecting to people with whom they already share a friend [18], which in turn can explain the emergence of communities [5], high connectivity [37], and induced homophily [3]. Furthermore, the class balance or distribution of individual attributes over the network is often uneven, with coexisting groups of different sizes, e.g., one ethnic group may dominate the other in size. Popularity, on the other hand, often refers to how well connected a node is in the network, which in turn creates an advantage over poorly connected nodes. This is also known as the rich-get-richer or Matthew effect, in which new nodes attach preferentially to other nodes that are already well connected [4]. Many networks, including the World Wide Web, exhibit this property in the form of scale-free power-law degree distributions.
Here we will focus on the main mechanisms of edge formation, namely homophily, triadic closure, node activity, and preferential attachment. Moreover, we will pay special attention to certain structural properties of networks such as class (im)balance, directed edges, and edge density.
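As a minimal sketch of how homophily and class imbalance can be quantified, the following snippet compares the observed fraction of same-class edges with the fraction one would expect if edges ignored class labels. The function names and the toy network are illustrative assumptions and are not taken from the tutorial material.

```python
from collections import Counter


def same_class_edge_fraction(edges, labels):
    """Observed fraction of edges whose two endpoints share a class label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def expected_same_class_fraction(labels):
    """Fraction of same-class node pairs: the baseline when edges ignore labels."""
    n = len(labels)
    counts = Counter(labels.values())
    same_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return same_pairs / (n * (n - 1) // 2)


# Hypothetical toy network: 4 majority nodes "M" and 2 minority nodes "m".
labels = {0: "M", 1: "M", 2: "M", 3: "M", 4: "m", 5: "m"}
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (4, 5), (3, 4)]

observed = same_class_edge_fraction(edges, labels)
baseline = expected_same_class_fraction(labels)
print(f"observed={observed:.3f}, baseline={baseline:.3f}")
```

An observed value well above the baseline suggests homophilic mixing, while a value below it suggests heterophily; the baseline itself depends on the class balance.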

Network models
In this section, we will review a set of well-known network generator models. We will cover attributed graphs where nodes possess metadata information such as class membership, and edges are influenced by such information. The implementation of these models can be found in the netin Python package.
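To illustrate what such a generator may look like, here is a minimal stdlib-only sketch of a growth model that combines preferential attachment with homophily on a binary class attribute. It is not the netin implementation; the function name, the parameters (n, k, f_m, h), and the "+1" degree offset are assumptions made for this example.

```python
import random


def generate_pah(n, k, f_m, h, seed=None):
    """Grow an undirected attributed network (sketch).

    n: number of nodes; k: edges added per arriving node;
    f_m: probability that a node belongs to the minority (label 1);
    h: homophily in (0, 1); 0.5 is neutral, above 0.5 favours same-class ties.
    """
    rng = random.Random(seed)
    labels = [1 if rng.random() < f_m else 0 for _ in range(n)]
    degree = [0] * n
    edges = set()
    for new in range(k, n):  # nodes 0..k-1 start as isolated seeds
        candidates = list(range(new))
        targets = set()
        while len(targets) < k:
            # Weight = (degree + 1) * homophily term. The +1 is an assumption
            # of this sketch so that isolated seed nodes can be chosen.
            weights = [
                0.0 if c in targets
                else (degree[c] + 1) * (h if labels[c] == labels[new] else 1 - h)
                for c in candidates
            ]
            if sum(weights) == 0:  # no admissible target left (e.g. h = 0 or 1)
                break
            targets.add(rng.choices(candidates, weights=weights)[0])
        for t in targets:
            edges.add((t, new))
            degree[t] += 1
            degree[new] += 1
    return labels, sorted(edges)


labels, edges = generate_pah(n=50, k=2, f_m=0.2, h=0.8, seed=1)
print(len(edges), "edges;", sum(labels), "minority nodes")
```

Varying h and f_m while keeping n and k fixed gives a family of synthetic networks on which the effect of homophily and class imbalance can be studied in isolation.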

Model selection and validation
Identifying the model that best explains a given network remains an open challenge. First, we will show how to infer the hyper-parameters of each network model (e.g., homophily and triadic closure [44]) given a real-world network. Then, we will learn how to use and interpret different approaches including AIC [55], BIC [2], MDL [42], Bayes factors [19], and likelihood ratios [56], and highlight their strengths and limitations under specific tasks.
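As a deliberately simple illustration of likelihood-based model comparison, the sketch below fits two toy models to a small undirected network by maximum likelihood and reports AIC = 2k - 2 ln L and BIC = k ln N - 2 ln L, where N is taken to be the number of node pairs. The two candidate models (one edge probability for all pairs versus separate within-class and across-class probabilities) and all names are assumptions chosen for brevity; they stand in for, and are much simpler than, the models covered in the tutorial.

```python
import math
from collections import Counter


def bernoulli_loglik(successes, trials):
    """Maximised log-likelihood of independent Bernoulli trials."""
    if trials == 0 or successes in (0, trials):
        return 0.0
    p = successes / trials
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)


def compare_models(edges, labels):
    """Compare a 1-parameter and a 2-parameter edge model (simple graph assumed)."""
    n = len(labels)
    pairs = n * (n - 1) // 2
    m = len(edges)
    counts = Counter(labels.values())
    same_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    same_edges = sum(1 for u, v in edges if labels[u] == labels[v])

    # Model "random": one edge probability for every node pair (k = 1).
    ll_random = bernoulli_loglik(m, pairs)
    # Model "homophily": within-class and across-class probabilities (k = 2).
    ll_homophily = (bernoulli_loglik(same_edges, same_pairs)
                    + bernoulli_loglik(m - same_edges, pairs - same_pairs))

    results = {}
    for name, ll, k in [("random", ll_random, 1), ("homophily", ll_homophily, 2)]:
        results[name] = {
            "loglik": ll,
            "aic": 2 * k - 2 * ll,
            "bic": k * math.log(pairs) - 2 * ll,
        }
    return results


# Hypothetical toy network with two classes.
labels = {0: "M", 1: "M", 2: "M", 3: "M", 4: "m", 5: "m"}
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (4, 5), (3, 4)]
results = compare_models(edges, labels)
for name, scores in results.items():
    print(name, {key: round(value, 2) for key, value in scores.items()})
```

Lower AIC or BIC indicates the preferred model; because the second model nests the first, its log-likelihood can never be lower, and the penalty terms decide whether the extra parameter is justified.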

Applications
Here, we will demonstrate how to exploit network models to generate a wide range of synthetic networks to understand how certain algorithms are influenced by network structure and edge formation.
The idea is to evaluate the outcomes of the following algorithms and to observe how these outcomes change as the structure of the input network is varied.

Biases in node sampling.
A range of network properties such as degree and betweenness centrality have been found to be sensitive to the choice of sampling methods [30,31,52]. These efforts have shown that network estimates become more inaccurate with lower sample coverage, but there is a wide variability of these effects across different measures, network structures and sampling errors. In terms of benchmarking network sampling strategies, [7] shows that it is not enough to ask which method returns the most accurate sample (in terms of statistical properties); one also needs to consider API constraints and sampling budgets [9,13].
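The sensitivity of degree estimates to the sampling method can be sketched with two simple strategies: sampling nodes uniformly at random, and sampling a random edge and then one of its endpoints, which selects nodes in proportion to their degree. The function name and the toy network are assumptions for illustration; only nodes that appear in the edge list are considered.

```python
import random


def mean_degree_estimates(edges, sample_size, seed=None):
    """Estimate the mean degree under two node-sampling strategies."""
    rng = random.Random(seed)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    nodes = sorted(degree)

    # Strategy 1: uniform node sampling (with replacement).
    uniform = [rng.choice(nodes) for _ in range(sample_size)]
    # Strategy 2: pick a random edge, then one of its endpoints.
    # High-degree nodes are over-represented under this strategy.
    by_edge = [rng.choice(rng.choice(edges)) for _ in range(sample_size)]

    return {
        "true": sum(degree.values()) / len(nodes),
        "uniform": sum(degree[x] for x in uniform) / sample_size,
        "edge_endpoint": sum(degree[x] for x in by_edge) / sample_size,
    }


# Hypothetical toy network: one hub linked to 20 leaves, plus a short chain.
edges = [(0, i) for i in range(1, 21)] + [(21, 22), (22, 23)]
estimates = mean_degree_estimates(edges, sample_size=2000, seed=0)
print({name: round(value, 2) for name, value in estimates.items()})
```

On a network with a strongly skewed degree distribution, the edge-endpoint estimate lies far above the true mean degree, whereas the uniform estimate stays close to it.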

Inequalities in node rankings.
Previous studies have shown that homophily and group size affect the visibility of minorities in centrality rankings [14,15,23]. In particular, such structural rankings may reduce, replicate and amplify the visibility of minorities in top ranks when majorities are homophilic, neutral and heterophilic, respectively. In other words, minorities are not always under-represented; they are just not well connected, and this can be shown by systematically varying the structure of synthetic networks [14]. Here, we will also touch upon interventions on how to improve the visibility of minorities in degree rankings [36].
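One simple way to inspect the visibility of a minority in a degree ranking is to compare its share among the top-k nodes with its share in the whole network. The sketch below assumes a precomputed degree dictionary; the function name, the tie-breaking rule (by node id), and the toy data are illustrative assumptions.

```python
def minority_share_in_top(degree, labels, top_k, minority="m"):
    """Share of minority nodes among the top_k nodes ranked by degree."""
    # Ties are broken by node id; this is an arbitrary choice for the sketch.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    top = ranked[:top_k]
    return sum(1 for node in top if labels[node] == minority) / len(top)


# Hypothetical toy data: 10 nodes, of which nodes 4 and 7 are in the minority.
degree = {0: 5, 1: 4, 2: 3, 3: 3, 4: 2, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1}
labels = {node: ("m" if node in (4, 7) else "M") for node in degree}

overall = sum(1 for group in labels.values() if group == "m") / len(labels)
top3 = minority_share_in_top(degree, labels, top_k=3)
top5 = minority_share_in_top(degree, labels, top_k=5)
print(f"overall={overall:.2f}, top3={top3:.2f}, top5={top5:.2f}")
```

A top-k share below the overall share indicates that the ranking reduces minority visibility, an equal share that it replicates it, and a larger share that it amplifies it.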

Biases in network inference.
In recent years, there has been an increase of research focusing on mitigating bias [28,46] and guaranteeing individual and group fairness while preserving accuracy in classification algorithms [6,8,21,58]. While all this body of research focuses on fairness influenced by the attributes of the individuals, recent research proposes a new notion of fairness that is able to capture the relational structure of individuals [16,60]. An important aspect of explaining discrimination [35] via network structure is that we gain a better understanding of the direction of bias (i.e., why and when inference discriminates against certain groups of people) [9].
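To make the direction of bias concrete, the following sketch predicts the class of unlabeled nodes by a majority vote among their labeled (seed) neighbours and then reports accuracy separately for each group. This is a simple baseline chosen for illustration, not the inference method studied in the cited work; all names and the toy network are assumptions.

```python
from collections import Counter, defaultdict


def neighbor_vote(edges, labels, seeds):
    """Predict each non-seed node's class by majority vote of its seed neighbours."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Fallback for nodes without seed neighbours: most frequent seed class.
    prior = Counter(labels[s] for s in seeds).most_common(1)[0][0]
    predictions = {}
    for node in labels:
        if node in seeds:
            continue
        votes = Counter(labels[nb] for nb in neighbors[node] if nb in seeds)
        predictions[node] = votes.most_common(1)[0][0] if votes else prior
    return predictions


def accuracy_by_group(predictions, labels):
    """Accuracy computed separately for each true class."""
    correct, total = Counter(), Counter()
    for node, predicted in predictions.items():
        total[labels[node]] += 1
        correct[labels[node]] += int(predicted == labels[node])
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy network: minority node 7 is connected only to majority seeds.
labels = {0: "M", 1: "M", 2: "M", 3: "M", 4: "m", 5: "m", 6: "M", 7: "m"}
edges = [(0, 2), (1, 2), (0, 3), (1, 3), (4, 5), (0, 6), (0, 7), (1, 7)]
seeds = {0, 1, 4}

predictions = neighbor_vote(edges, labels, seeds)
accuracy = accuracy_by_group(predictions, labels)
print(accuracy)
```

Reporting accuracy per group rather than overall makes it visible when errors concentrate on one group, here the minority node whose neighbourhood is dominated by the majority.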

Inequalities in spreading dynamics.
Spreading processes may include simple and complex contagion mechanisms, different transmission rates within and across groups, and different seeding conditions. Here, we will study information access equality to demonstrate to what extent network structure influences a spreading process, which in turn may affect the equality and efficiency of information access [54].
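A simple contagion process can be sketched as a susceptible-informed (SI) simulation in which, at every step, each informed node informs each uninformed neighbour with probability beta. The sketch tracks the fraction of informed nodes per group over time. The single transmission rate, the function name, and the toy network are assumptions; group-specific rates or complex contagion would require extending it.

```python
import random
from collections import defaultdict


def simulate_spreading(edges, labels, seeds, beta, steps, seed=None):
    """Run an SI process and return the informed fraction per group at each step."""
    rng = random.Random(seed)
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    group_size = defaultdict(int)
    for group in labels.values():
        group_size[group] += 1

    informed = set(seeds)
    history = []
    for _ in range(steps):
        newly_informed = set()
        for node in sorted(informed):  # sorted for reproducibility
            for nb in sorted(neighbors[node]):
                if nb not in informed and rng.random() < beta:
                    newly_informed.add(nb)
        informed |= newly_informed
        history.append({
            group: sum(1 for n in informed if labels[n] == group) / size
            for group, size in group_size.items()
        })
    return history


# Hypothetical toy network: a path 0-1-2-3 with the minority at the far end.
labels = {0: "M", 1: "M", 2: "m", 3: "m"}
edges = [(0, 1), (1, 2), (2, 3)]
history = simulate_spreading(edges, labels, seeds={0}, beta=1.0, steps=3)
for step, fractions in enumerate(history, start=1):
    print(step, fractions)
```

Comparing the per-group curves shows how quickly each group gains access to the information; with beta below 1 the simulation should be repeated over several random seeds and averaged.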

Challenges and open questions
We will conclude by summarizing what we have learned, and by brainstorming future directions of what is still missing for producing realistic networks via synthetic data.

STYLE, DURATION, AND MATERIAL
This will be a 6-hour hybrid hands-on tutorial. We will provide ready-to-use Jupyter notebooks with all necessary code, libraries, and settings. We will be using python=3.9 and libraries such as: (1) networkx=2.8.8, (2) netin=1.0.7, (3) graph-tool=2.45, (4) matplotlib=3.6.0, (5) numpy=1.23.4, (6) pandas=1.5.1, and (7) jupyterlab=3.6. We will provide the slides of the tutorial beforehand, as well as code in the form of Python scripts and notebooks. We will also use publicly available real-world networks [43]. All materials can be found here,⁴ and a video teaser of this tutorial here.⁵
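Attendees who want to verify their setup before the session could run a small check such as the sketch below, which compares installed package versions against the pins listed above using the standard importlib.metadata module. The distribution names are assumed to match the library names; in particular, graph-tool is often installed through conda and may not be discoverable under this name.

```python
from importlib import metadata

# Version pins as listed in the text; distribution names are assumptions.
PINNED = {
    "networkx": "2.8.8",
    "netin": "1.0.7",
    "graph-tool": "2.45",
    "matplotlib": "3.6.0",
    "numpy": "1.23.4",
    "pandas": "1.5.1",
    "jupyterlab": "3.6",
}


def check_environment(pinned=PINNED):
    """Report, per package, the wanted and installed versions and whether they match."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        wanted_parts = wanted.split(".")
        # Match only on the components given in the pin (e.g. "3.6" accepts "3.6.1").
        matches = (installed is not None
                   and installed.split(".")[:len(wanted_parts)] == wanted_parts)
        report[name] = {"wanted": wanted, "installed": installed, "ok": matches}
    return report


report = check_environment()
for name, row in report.items():
    status = "ok" if row["ok"] else "check"
    print(f"{name:12s} wanted={row['wanted']:8s} installed={row['installed']} [{status}]")
```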

PREVIOUS EDITIONS
This is the first time the organizers have conceptualized and planned this tutorial together. However, it will not be the first time they organize and teach network science to a broad audience. Tiago Peixoto has an extensive record in organizing workshops⁶ and in teaching at seminars and international schools on topics about data science, network science, and probabilistic and statistical methods for networks.⁷ Fariba Karimi has given lectures and seminars on network science, theory, and dynamics to a broad audience including computer scientists and social scientists at the University of Koblenz-Landau and GESIS - The Leibniz Institute for the Social Sciences. In 2020, Lisette Espin-Noboa co-organized and co-lectured a 4-day virtual hands-on seminar for social scientists on how to do network analysis in Python [32]. Additionally, Karimi and Espin-Noboa co-organized a virtual satellite event at Networks 2021, where they invited a diverse group of researchers to talk about their research on network structure and social phenomena.⁸

EQUIPMENT
We will require an internet connection, a projector, and host permissions in Zoom for screen sharing, breakout room assignment, and remote access if necessary. Attendees may join the session online or in person using their own computers.

ORGANIZATION DETAILS
In case of unexpected events (e.g., restricted mobility, sickness, or bad internet connection) we will provide pre-recorded lectures of the entire tutorial. Moreover, all exercises will be given in advance as Python scripts and Jupyter notebooks.