Universality, criticality and complexity of information propagation in social media

Notarmuzi, Daniele; Castellano, Claudio; Flammini, Alessandro; Mazzilli, Dario; Radicchi, Filippo

doi:10.1038/s41467-022-28964-8

Article
Open access
Published: 14 March 2022

Nature Communications volume 13, Article number: 1308 (2022)

Abstract

Statistical laws of information avalanches in social media appear, at least according to existing empirical studies, not robust across systems. As a consequence, radically different processes may represent plausible driving mechanisms for information propagation. Here, we analyze almost one billion time-stamped events collected from several online platforms – including Telegram, Twitter and Weibo – over observation windows longer than ten years, and show that the propagation of information in social media is a universal and critical process. Universality arises from the observation of identical macroscopic patterns across platforms, irrespective of the details of the specific system at hand. Critical behavior is deduced from the power-law distributions, and corresponding hyperscaling relations, characterizing size and duration of avalanches of information. Statistical testing on our data indicates that a mixture of simple and complex contagion characterizes the propagation of information in social media. Data suggest that the complexity of the process is correlated with the semantic content of the information that is propagated.

Introduction

Social media have dramatically changed the way people produce, access and consume information¹, and there is increasing evidence that online discussions have the potential to impact society in unprecedented ways². For example, the public debate around the COVID-19 pandemic has been accompanied by the so-called Infodemic, which is affecting the outcome of the vaccination campaign by increasing hesitancy^3,4,5. Also, online discussions in the Reddit channel r/wallstreetbets induced many individuals to buy GameStop shares in opposition to the shorting operation carried out by hedge funds and professional investors. As a result, the market capitalization of the company increased by more than $22 billion in just a few days⁶. The renewed scientific interest in understanding the mechanisms that drive information propagation is therefore not surprising.

Analyses of the propagation of information in social media reveal, at least qualitatively, similarities with other natural phenomena such as the firing of neurons^7,8 and earthquakes⁹. These processes are characterized by bursty activity patterns. The activity consists of point-like events in time, and bursts (or avalanches) of activity are defined as sequences of close-by events. Bursts are separated by long periods of low activity. Activity can be characterized at the macroscopic level by the distributions P(S) and P(T) of the size S and the duration T of avalanches^{10,11,12,13,14,15}. In real-world systems^{7,8,9,12,16,17,18}, P(S) and P(T) have a power-law decay for large values of their arguments, i.e., P(S) ~ S^−τ and P(T) ~ T^−α. This property is interpreted as evidence of the system operating at, or in the vicinity of, a critical point. This statement is supported by the theory of absorbing phase transitions according to which, if the avalanche dynamics is at a critical point, then P(S) and P(T) must decay as power laws, see Eq. (3). Furthermore, in a process operating at criticality, the average size of avalanches with given duration must obey the hyperscaling relation^16,19,20 〈S〉 ~ T^γ, with γ = (α − 1)/(τ − 1). The specific values of the exponents τ and α typically differ for classes of systems. Their actual values are fundamental for the characterization of systems into universality classes, i.e., an ontology of processes with conceptual and practical relevance²¹.

Universality is the notion that nearly identical avalanche statistics are observed for a multitude of systems governed by different dynamical laws that nevertheless share some basic core mechanisms. Criticality instead refers to the fact that avalanche statistics are characterized by algebraic distributions. Classifying a system within a universality class is informative about the basic core mechanisms that drive the unfolding of the avalanches. Where information propagation (in general, and in online social media) is concerned, the issue of the existence of well-defined universality classes is far from settled. Existing analyses typically study data collected from a single source and over short observation windows. It is often found that distributions of avalanche size and duration obey power laws, but the estimated values of the exponents vary across studies: τ values range between τ ≃ 2 and τ ≃ 4^{13,14,22,23,24}, whereas α ≃ 3.6²⁵ or α ≃ 2.5^26,27. Also, empirical studies reporting on correlations between size and duration of avalanches fail to find a power law^28,29. This variability might be ascribed to multiple operative definitions of avalanches, which can be given in terms of hashtags time series^22,28 as well as reply trees or retweet chains^13,24,30. Furthermore, regardless of the definition, the temporal resolution can affect the avalanche distribution^12,31.

As a consequence of the variability in the distributions inferred, uncertainty about representative theoretical models remains. In particular, it is an open problem to determine when and if models based on simple contagion are more appropriate to describe the spreading of information online than those based on complex contagion. Stemming from the similarity between the spreading of disease and information, a widely accepted paradigm is that information propagates according to a simple contagion process, where only a single exposure to activity may be sufficient for its diffusion^{10,13,22,28,32,33}. Simple contagion is at the core of many theoretical models of information propagation used in the literature, all displaying critical properties of the mean-field branching process (BP), i.e., τ = 3/2 and α = 2^34,35,36,37, see Methods. However, there are quite a few studies in favor of the complex contagion paradigm^38,39,40,41. As originally introduced by Centola and Macy, in a complex contagion process the involvement of an individual in the propagation of information requires exposure from multiple acquaintances⁴². Complex contagion is exemplified by some models, such as the linear threshold model and the Random Field Ising Model^19,43 (RFIM), see Methods. Distinguishing between simple and complex contagion and, possibly, comprehending how they coexist within the same population⁴⁴, is fundamental to understand the spreading of (mis)information in online social media^38,45.

In this work, we perform a large-scale study of (hash)tag time series from Twitter, Telegram, Weibo, Parler, StackOverflow and Delicious [see Methods and Supplementary Information (SI) A for details about the data sets]. We consider a total of 206,972,692 time series. In our study, a time series consists of all posts that carry the same topic identifier, such as a hashtag on Twitter. Taken cumulatively, our time series consist of 905,377,009 events, collected over periods even longer than 10 years. The Twitter data, collected specifically for this work, are fully available together with codes to reproduce the results of this paper^46,47. To define avalanches in a principled fashion we adopt the approach inspired by percolation theory proposed in Ref. ³¹, see Methods. We provide evidence that social media share universal statistics of avalanches that are well described by power-law distributions. We also develop a novel statistical technique able to determine the level of criticality and complexity of individual time series, see Methods. We find that nearly 20% of the time series are less than 5% away from criticality. These account for 53% of all events in our data sets. At the aggregate level, each social medium displays a critical behavior that is compatible with the RFIM, indicating that, plausibly, processes compatible with complex contagion may play a preponderant role in information diffusion. A more detailed analysis reveals a more nuanced scenario, where about 50% of the individual time series are better explained in terms of a complex rather than a simple contagion process. A qualitative analysis of the most popular hashtags suggests that information concerning conversational topics, e.g., music or TV shows, spreads according to the rules of simple contagion, whereas information concerning political/societal controversies shows signatures of an underlying complex contagion process.

Results

Selection of temporal resolution

Here, an avalanche is defined as a maximal subset of contiguous events in a time series such that two consecutive ones are separated by a time interval smaller than Δ. A proper choice Δ^* of the time resolution Δ for the specific data set at hand is necessary to avoid significant distortion in the resulting avalanche statistics. This is true for synthetic time series generated by temporal point processes³¹, but also for empirical time series such as those analyzed in this paper (see SI E for details). To determine the value of Δ^* we use the principled method developed in Ref. ³¹ that identifies Δ^* as the critical point of a one-dimensional percolation model, see Methods for details. Results are presented in Fig. 1. Values of Δ^* for each data set are reported in the SI A; they vary substantially across data sets, from Δ^* ≃ 1500 s for Twitter to Δ^* ≃ 30,000 s for Telegram (Fig. 1b).

**Fig. 1: Universality of information propagation in online social media.**

Once the time resolution is rescaled according to Δ → Δ/Δ^*, the curves of the percolation strength for the different data sets exhibit a nearly identical quantitative behavior, see insets of Fig. 1. This fact suggests the possibility of seeing the propagation of information in social media as a universal process, with Δ^* representing the natural resolution for observing information avalanches. Figure 2a, b shows the distributions of avalanche size and duration obtained by setting Δ = Δ^*. Figure 2c shows the relation between average size and duration. The collapse of the curves relative to different data sets on a single curve hints once more, at least when data are considered at the aggregate level, at processes belonging to the same universality class.

**Fig. 2: Universality and criticality of information propagation in social media.**

Criticality and universality of avalanche statistics

The avalanche statistics of Fig. 2a–c seem well described by power laws, indicating that the underlying process is (nearly) critical, and that its universality class can be identified by estimating the value of the critical exponents τ, α, and γ, see Eq. (3)²¹. We rely on maximum likelihood estimation for τ and α⁴⁸; linear regression on the logarithm of the relation 〈S〉 ~ T^γ is used to estimate γ. Results are reported in Fig. 2d, see SI C for details. The estimated exponent $\hat{\tau }$ is compatible with that of the mean-field RFIM universality class, i.e., τ = 9/4¹⁹. The compatibility of the avalanche statistics with those of a homogeneous mean-field model is not surprising given that in some social media there is no underlying network among users and in others there are mechanisms for the propagation of information that bypass it. For example, in Telegram all users who subscribe to a channel receive all messages sent from any other user of that channel, meaning that there is an all-to-all network among all users of the channel as in the mean-field version of the RFIM. In StackOverflow there is no underlying network as users do not follow each other; rather, they search for content using common tools offered by the platform. Even in Twitter, where users have follower–followee relationships, the network can be easily bypassed by the way the platform manages users’ feeds. There is an apparent mismatch between our estimates $\hat{\alpha }$ and $\hat{\gamma }$ and the RFIM predictions α = 7/2 and γ = 2, due to finite-size effects. To properly address this issue, we performed numerical simulations of the RFIM, and measured the maximum likelihood estimators of τ and α. For consistency, we performed the same operation for the BP too. The results of Fig. 2 reveal that, overall, our data are compatible with the phenomenology of the RFIM and not with the phenomenology of the BP.
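
As an illustration of the two estimation steps mentioned above, the following is a minimal sketch and not the code used in the paper. It assumes the continuous-variable form of the power-law maximum likelihood estimator, whereas avalanche sizes are discrete, and the names `powerlaw_mle` and `gamma_fit` are hypothetical. The synthetic samples and the toy relation 〈S〉 = T² are made up for the example.

```python
import numpy as np

def powerlaw_mle(x, x_min):
    """Continuous-approximation MLE of the exponent a in P(x) ~ x^-a, x >= x_min."""
    x = np.asarray(x, dtype=float)
    tail = x[x >= x_min]
    return 1.0 + tail.size / np.sum(np.log(tail / x_min))

def gamma_fit(durations, mean_sizes):
    """Slope of log<S> versus log T, i.e. the exponent in <S> ~ T^gamma."""
    slope, _intercept = np.polyfit(np.log(durations), np.log(mean_sizes), 1)
    return slope

rng = np.random.default_rng(0)
tau_true = 2.25  # mean-field RFIM value quoted in the text
# Inverse-transform sampling of a continuous power law with x_min = 1.
samples = (1.0 - rng.random(200_000)) ** (-1.0 / (tau_true - 1.0))
tau_hat = powerlaw_mle(samples, x_min=1.0)

# Toy data obeying <S> = T^2 exactly, so the fitted slope should be 2.
T = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
gamma_hat = gamma_fit(T, T ** 2)
```

On real, finite data both estimates depend on the chosen lower cutoff, which is why the text defers the details to SI C.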

The proximity of exponents estimated across different data sets points to the existence of a genuine and distinctive universality class for information propagation in social media when considered at the aggregate level. In particular, this class seems to be different from that of the BP often invoked as a representative in phenomena related to information diffusion. This universal scaling is a genuine feature of social media, as if we repeat the same analysis on time series describing activity in very different types of systems, e.g., brain networks and earthquakes, avalanche duration and size still decay in a power-law fashion, but with radically different exponent values, see SI D for details. In particular, for neuronal avalanches in the brain, we recover exponents compatible with previous studies^8,49,50,51.

Complexity of avalanche statistics

To assess if the statistical properties obtained on aggregate data are representative of individual time series, we develop a maximum likelihood method to fit the time series against the BP and the RFIM. The technique is inspired by the work of Ref. ⁴⁸, see Methods for details. The method supports three different tests. First, it establishes the regime of a time series, depending on how the best estimate of the branching ratio parameter $\hat{n}$ compares to the critical value n_c = 1 for the BP, or how the best estimate of the disorder parameter $\hat{R}$ compares to the critical value ${R}_{c}=\sqrt{2/\pi }\simeq 0.8$ for the RFIM. Second, it evaluates the goodness of the individual fits via their p values. Similarly to the prescription of Ref. ⁴⁸, we set the threshold for statistical significance equal to p = 0.1. We verified, however, that the outcome of the analysis is not greatly affected by the choice of the threshold value, see SI J. Third, it establishes whether a time series is better modeled by the BP or by the RFIM by comparing their likelihood.

Results of our analysis are reported in Figs. 3 and 4. Our method is applied only to time series that contain at least two avalanches larger than S_min = 10. These two avalanches must also have different sizes, so that P(S) has at least two non-zero values. Tests of robustness for different S_min values are reported in the SI J. In all systems we find that the best fitting parameter assumes values over a broad range, encompassing a large portion of the subcritical phase and the critical point of the models (Fig. 3a, b). The majority of events belongs to a minority of time series giving rise to the largest avalanches. As a consequence, the large-scale behavior of each system is mainly determined by those few time series that are fitted in a narrow region of the parameter space close to the critical point for both the BP and the RFIM (insets of Fig. 3a, b). Also, our tests indicate that the vast majority of time series are well described by at least one of the two models (Fig. 4a). The model selection indicates that individual time series are divided into two nearly equally populated classes, one better described by the BP and the other by the RFIM (Fig. 4a). Simple and complex contagion thus coexist in social media, with only a mild dominance of complex over simple contagion (Fig. 3c). The individual-level analysis is not incompatible with the results obtained for the aggregate data (Fig. 2). If we aggregate data only from the time series that we attributed to the class of complex contagion, we consistently recover a power-law scaling compatible with that class for all avalanche sizes, see Fig. 3d. However, the aggregation of time series that are classified in the BP class generates a distribution characterized by a neat crossover from BP scaling for small avalanches to RFIM scaling for large avalanches (Fig. 3d). The mixture produces a universal distribution that is overall more compatible with the RFIM universality class than with the BP class (Fig. 2c).

**Fig. 3: Criticality and complexity of information propagation in online social media.**

**Fig. 4: Simple vs. complex contagion in online social media.**

Discussion

We showed that temporal patterns characterizing bursts of activity in online social media are conveniently classified in two universality classes. This finding suggests that few core mechanisms determine the large-scale behavior of information diffusion and that many peculiarities that characterize individual platforms are far less relevant. Also, in contrast with the vast majority of previous studies where purely diffusive models have been considered³⁷, we showed that information propagation in social media is often better described by complex contagion dynamics. Complex contagion is here exemplified by the RFIM, an agent-based model of activation originally formulated to describe the para-to-ferromagnetic phase transition in metals¹⁹. Recast in the language proper to the description of information propagation⁵², the RFIM prescribes that each agent (i) has a personal opinion, (ii) is subject to the social influence exerted by the agents she interacts with, and (iii) is also driven by an external force representing the public information about exogenous events. These appear reasonable assumptions for modeling many realistic discussions happening in social media. Figure 4b shows the 30 most popular Twitter hashtags identified by our method either in the simple or in the complex contagion classes. In the category of simple contagion, we find conversational topics, mostly related to music or cinema/TV shows. Hashtags belonging to the class of complex contagion display either periodic patterns or are related to political/controversial themes. This suggests the existence of a relation between the semantics of hashtags and the universality class of the corresponding time series. This qualitative picture fits with previous studies that have explicitly focused on the semantics of different hashtags in Twitter⁴⁵.

For both classes of information avalanches, we inferred the dynamics underlying their generation as critical, a fact that provides theoretical ground for the surprising but remarkable robustness of our findings. The presence of a large portion of social media content that acquires popularity via complex contagion dynamics calls for a reconsideration of predictive algorithms relying on the temporal characteristics of the signal only, because these algorithms often neglect the semantics of hashtags and, even more frequently, the characteristics of the network over which they spread^{53,54,55,56,57}. Both aspects are important for the successful characterization of the process underlying the propagation of information^38,45,58,59. We further speculate that our results extend beyond the six platforms considered here. If so, there must be a mechanism that explains the universality shown by the data, involving critical dynamics that is independent of the peculiarities implemented in the individual platforms. Understanding where this mechanism is rooted and how to exploit it for the prediction of the propagation of information in online social media remain open challenges for future research.

Methods

Data

We build a time series for each (hash)tag appearing in the data at our disposal. A time series contains the times, i.e., {t₁, t₂, …}, when the (hash)tag is observed in the data.

Specifically, the Twitter data set is composed of 2,353,192,777 tweets corresponding to a 10% random sample of all tweets posted on Twitter during the observation window from October 1 to November 30, 2019. These data were collected via the Indiana University OSoME Decahose stream^60,61. Telegram time series are extracted from a total of 317,224,715 messages, originally collected in Ref. ⁶². Parler time series are extracted from a total of 183,062,974 posts, originally collected in Ref. ⁶³. Weibo time series are extracted from 226,841,249 posts, originally collected in Ref. ⁶⁴. StackOverflow time series are extracted from a total number of 46,947,635 questions and answers. Delicious time series are extracted from 7,034,524 user actions, originally collected in Ref. ⁶⁵. Timestamps always have the temporal resolution of the second, except for the StackOverflow data set, whose temporal resolution is the millisecond.

We pre-process the data so that the number of events per unit time is roughly constant over the whole temporal window considered (see SI A for details) to obtain a corpus of 206,972,692 time series consisting of 905,377,009 total events.

Selection of the temporal resolution

We follow the same procedure as in Ref. ³¹. Given a time series {t₁, t₂, …}, we define an avalanche starting at t_b as a sequence of events {t_b, t_{b+1}, …, t_{b+S−1}} such that t_b − t_{b−1} > Δ, t_{b+S} − t_{b+S−1} > Δ and t_{b+i} − t_{b+i−1} ≤ Δ for all i = 1, …, S − 1, where Δ is the resolution parameter. The size S of an avalanche is the number of events within it and the duration T is the time lag between the first and last event in the avalanche, i.e., T = t_{b+S−1} − t_b. Depending on the value of Δ, the same time series is composed of different avalanches.
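
The avalanche definition above can be sketched in a few lines. This is an illustrative implementation under the stated definition, not the authors' released code; the function name `find_avalanches` and the toy event times are hypothetical.

```python
import numpy as np

def find_avalanches(times, delta):
    """Split a sequence of event times into avalanches at resolution delta.

    Returns two arrays: sizes S (events per avalanche) and durations T
    (time between the first and last event of each avalanche).
    """
    t = np.sort(np.asarray(times, dtype=float))
    if t.size == 0:
        return np.array([], dtype=int), np.array([], dtype=float)
    # A gap strictly larger than delta closes one avalanche and opens the next.
    breaks = np.flatnonzero(np.diff(t) > delta)
    starts = np.concatenate(([0], breaks + 1))
    ends = np.concatenate((breaks, [t.size - 1]))
    sizes = ends - starts + 1
    durations = t[ends] - t[starts]
    return sizes, durations

# Toy series: three avalanches at delta = 2.
sizes, durations = find_avalanches([0, 1, 2, 10, 11, 30], delta=2)
```

An isolated event forms an avalanche of size 1 and duration 0, which follows directly from the definition.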

We identify the optimal resolution Δ^* as the critical point of a one-dimensional percolation model that is used to describe the time series. Each time series in a data set is considered as an instance of the one-dimensional percolation model. We measure the size S_M of the largest avalanche within each time series. We define the percolation strength P_∞ and its associated susceptibility χ, respectively, as

$$\begin{array}{l}{P}_{\infty }=\langle {S}_{M}\rangle \\ \chi =\frac{\langle {S}_{M}^{2}\rangle -{\left\langle {S}_{M}\right\rangle }^{2}}{\langle {S}_{M}\rangle }\,,\end{array}$$

(1)

where 〈S_M〉 and $\langle {S}_{M}^{2}\rangle$ are, respectively, the first and second moments of the distribution of the size of the largest avalanche S_M across all time series in a data set. Δ^* is computed as the resolution maximizing χ, i.e.,

$${\Delta }^{* }=\arg \max \,\chi (\Delta )\,.$$

(2)

As time series with only one event introduce an offset in the measure of P_∞ and are not informative with respect to the optimal resolution Δ^* (S_M = 1 for any Δ in these time series), we remove them from the sample and compute P_∞ and χ considering only time series composed of at least two events.
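
A minimal sketch of Eqs. (1) and (2) follows. It assumes that Δ^* is searched over a user-supplied grid of candidate resolutions, which is a choice of this example; the function names and the toy data are hypothetical.

```python
import numpy as np

def largest_avalanche(times, delta):
    """Size S_M of the largest avalanche in one time series at resolution delta."""
    t = np.sort(np.asarray(times, dtype=float))
    breaks = np.flatnonzero(np.diff(t) > delta)
    # Avalanche k covers the event indices edges[k] + 1, ..., edges[k + 1].
    edges = np.concatenate(([-1], breaks, [t.size - 1]))
    return int(np.diff(edges).max())

def optimal_resolution(series, deltas):
    """Return (delta_star, chi), with chi evaluated on the candidate grid deltas."""
    series = [s for s in series if len(s) >= 2]  # single-event series are dropped
    chi = []
    for delta in deltas:
        s_max = np.array([largest_avalanche(s, delta) for s in series], dtype=float)
        mean = s_max.mean()  # percolation strength P_inf of Eq. (1)
        chi.append((np.mean(s_max ** 2) - mean ** 2) / mean)  # susceptibility, Eq. (1)
    chi = np.array(chi)
    return deltas[int(np.argmax(chi))], chi  # Eq. (2)

# Toy data set: two informative series and one single-event series.
series = [[0, 1, 2, 3], [0, 5, 10, 15], [7]]
delta_star, chi = optimal_resolution(series, deltas=[0.5, 2, 6])
```

In this toy case the susceptibility vanishes when every series is fully fragmented (Δ = 0.5) or fully merged (Δ = 6), and peaks at the intermediate resolution where only one of the two series has merged into a single avalanche.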

Values of the optimal resolution Δ^* are reported in SI A. Note that the avalanche statistics reported in Fig. 2 are obtained considering all avalanches except the largest one of each time series. This choice is due to the well-known fact that in percolation theory the largest cluster obeys different statistics from those of finite clusters⁶⁶.

The branching process

In the BP an individual initially active spreads activity to a random number of peers, who can in turn spread activity further³⁴. The process continues for a number T of time steps or generations, until there is a generation in which no individual further spreads activity. T is the duration of the avalanche. The size S of the avalanche is the total number of individuals activated during the avalanche. The average number of individuals who are activated from a single spreader is the branching ratio n and the model is critical for n = n_c = 1. The branching ratio is the only tunable parameter of the model.

Finite avalanches of activity in the BP obey the laws

$$\begin{array}{l}P(S)={S}^{-\tau }{\mathcal{D}}_{S}({S}^{\sigma }n^{\prime} )\\ P(T)={T}^{-\alpha }{\mathcal{D}}_{T}({T}^{1/z\nu }n^{\prime} )\\ \langle S\rangle (T)\propto {T}^{\gamma }\,,\end{array}$$

(3)

where 〈⋅〉 is the average over different avalanches, and P(S) and P(T) are the probability distributions of S and T, respectively. The functions $\mathcal{D}_{S}$ and $\mathcal{D}_{T}$ are known as scaling functions and introduce corrections at small values of their argument. Here $n^{\prime} =| n-{n}_{c}| /{n}_{c}$ is the reduced distance from the critical point. The BP is characterized by the exponents τ = 3/2, α = 2 and γ = 2. These exponents are not independent; they are related by γ = 1/(σzν) = (α − 1)/(τ − 1). The quantities σ, z and ν are additional critical exponents that we do not explicitly consider in our analysis.
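
To make the BP concrete, here is a minimal simulation sketch. The text does not specify the offspring distribution, so a Poisson distribution with mean n is assumed here; counting the seed as the first generation and capping the size with `max_size` are also conventions of this example. The function name is hypothetical.

```python
import numpy as np

def branching_avalanche(n, rng, max_size=10**6):
    """Simulate one avalanche of a branching process with Poisson offspring.

    n is the branching ratio (mean offspring per spreader). Returns (S, T):
    the total number of activated individuals and the number of generations.
    """
    active, size, duration = 1, 1, 1
    while active > 0 and size < max_size:
        # The sum of `active` independent Poisson(n) variables is Poisson(n * active).
        active = rng.poisson(n * active)
        if active > 0:
            size += active
            duration += 1
    return size, duration

rng = np.random.default_rng(1)
n = 0.5  # subcritical, since n_c = 1
sizes = np.array([branching_avalanche(n, rng)[0] for _ in range(20_000)])
mean_size = sizes.mean()  # for n < 1 the expected size is 1 / (1 - n)
```

For n < 1 the expected avalanche size is 1/(1 − n), which gives a quick sanity check of the simulation; the cap `max_size` only matters close to or above the critical point.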

The Random Field Ising Model

We consider the mean-field formulation of the zero-temperature RFIM. Agent i is characterized by the state variable y_i = ±1 indicating whether the agent is active, y_i = +1, or not, y_i = −1. Each agent i has a propensity h_i to become active, with h_i ∈ (−∞, +∞). A large value of h_i indicates that the agent is particularly prone to become active. Agents interact by means of ferromagnetic interactions that model social pressure, i.e., active neighbors push an inactive agent to become active. The whole system is further affected by public information that all agents have access to and that pushes users toward becoming active with intensity H ∈ (−∞, +∞). In the initial configuration, all agents are inactive. The external pressure H grows till the agent with the largest h_i value becomes active. This change of state can trigger an avalanche of activity in the other nodes. Specifically, agent j becomes active if the following condition is met

$$H+{h}_{j}+{N}^{-1}\mathop{\sum}\limits_{k\ne j}{y}_{k} \, > \, 0\,,$$

(4)

where N is the system size and the mean-field formulation is expressed by the all-to-all interaction. Once in the active state, agents cannot change their state back to inactive. When an avalanche ends, the external pressure H grows again until a new user becomes active and triggers a new avalanche. The field is frozen during the unfolding of avalanches, meaning that avalanches are characterized by a time scale much shorter than the one characterizing external pressure. In the long-term limit, when H = +∞, all agents become active. The size S of an avalanche is given by the number of users that are activated during the avalanche; its duration T is given by the activation rounds characterizing the avalanche.

The stochasticity of the model comes from the random nature of the propensities h_i, extracted from a normal distribution with zero mean and variance R. The choice of the normal distribution is quite standard both for ferromagnets and social systems⁵². R is the only tunable parameter of the model, and the model is critical for $R={R}_{c}=\sqrt{2/\pi }$. Avalanche statistics obey laws similar to those of Eq. (3). The functional form of the scaling functions, however, is not the same as in the BP; also, their argument is given in terms of the distance from the critical point of the RFIM, i.e., $n^{\prime} =| n-{n}_{c}| /{n}_{c}$ is replaced by $R^{\prime} =| R-{R}_{c}| /{R}_{c}$. The values of the critical exponents are τ = 9/4, α = 7/2 and γ = 2¹⁹. In SI F, we show that the peculiar form of the scaling function $\mathcal{D}_{T}$ introduces strong preasymptotic corrections to the functions P(T) and 〈S〉(T), affecting the measure of α and γ obtained through numerical simulations of the model.
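
The driving protocol described above can be sketched as follows. This is an illustrative implementation and not the authors' code. It exploits the fact that, in the mean-field setting, agents activate in order of decreasing propensity, so sorting the propensities once is enough. The sketch passes R to the Gaussian sampler as its width (standard deviation), which is an assumption of this example; the function name is hypothetical.

```python
import numpy as np

def rfim_avalanches(N, R, rng):
    """Mean-field zero-temperature RFIM driven from all-inactive to all-active.

    Returns arrays of avalanche sizes S and durations T (activation rounds).
    """
    # Negated propensities in ascending order, i.e. agents sorted by decreasing h.
    neg_h = np.sort(-rng.normal(0.0, R, size=N))
    sizes, durations = [], []
    a = 0  # number of active agents; they are always the first `a` in this order
    while a < N:
        # Raise H until the most prone inactive agent meets Eq. (4) marginally.
        # For an inactive agent j, sum_{k != j} y_k = 2a - N + 1.
        H = neg_h[a] - (2 * a - N + 1) / N
        a += 1
        size, rounds = 1, 1
        while True:
            # With H frozen, an inactive agent j activates if h_j > thr.
            thr = -H - (2 * a - N + 1) / N
            new_a = max(a, int(np.searchsorted(neg_h, -thr, side="left")))
            if new_a == a:
                break
            size += new_a - a
            rounds += 1
            a = new_a
        sizes.append(size)
        durations.append(rounds)
    return np.array(sizes), np.array(durations)

rng = np.random.default_rng(2)
N = 10_000
sizes, durations = rfim_avalanches(N, R=0.8, rng=rng)
```

Because every agent activates exactly once, the avalanche sizes sum to N, and each round activates at least one agent, so S ≥ T for every avalanche.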

Model selection

To ascribe each time series to a dynamical model, we first fit each model individually by maximizing its likelihood. We evaluate the p value of the fits and, if both hypotheses cannot be rejected, we select the best fit via the log-likelihood ratio test.

To perform the fit, we compare the probability distribution P(S) of the avalanche sizes identified in the time series with the conditional distributions of the avalanche size Q_RFIM(S∣R) and Q_BP(S∣n), respectively, obtained for the RFIM and the BP for a given value of the parameters R and n. The construction of the model distributions Q requires discretizing the parameter space of the models. In this study R varies in the interval [0.025, 2.7] by steps of length dR = 0.025 and n varies in [0.02, 1.7] by steps of length dn = 0.015. dR (dn) represents the uncertainty on the parameter. Instead of sampling avalanches from the model at a precisely given value of R (n), we consider model instances corresponding to R (n) values uniformly distributed over an interval of length dR (dn) centered at R (n). The distribution Q corresponding to a specific value of the parameter model is constructed as the superposition of 500 distributions whose parameter values are randomly sampled from the corresponding interval. Fitting a time series to a model means estimating the best parameter with an accuracy of dR (dn) for the RFIM (BP).

Given the empirical distribution P and the model distributions Q, we evaluate the log-likelihood function

$$\mathrm{L}(P| | Q)=\sum\limits_{S\ge {S}_{\min }}P(S)\log [Q(S)]\,.$$

(5)

The summation is performed over all avalanches with S ≥ S_min, a parameter we vary in our analysis. The distributions P and Q are normalized over the interval [S_min, ∞) to account for this fact. The best fit is obtained by finding the parameter value that maximizes the log-likelihood of Eq. (5). The maximization of the log-likelihood of Eq. (5) is equivalent to the minimization of the cross-entropy of the distribution Q relative to the distribution P. To avoid numerical problems in the estimation of the likelihood, we smoothen the function Q. Details are provided in SI G.
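
A minimal sketch of Eq. (5) and of the grid search over the model parameter is given below. It assumes that P and Q are stored as arrays indexed by avalanche size and that Q is strictly positive wherever P is, which is what the smoothing is meant to guarantee. The function names and the geometric toy family are hypothetical and stand in for the BP or RFIM distributions.

```python
import numpy as np

def log_likelihood(P, Q, s_min):
    """Eq. (5): sum over S >= s_min of P(S) log Q(S), with P and Q renormalized on the tail.

    P and Q are 1-D arrays of equal length indexed by avalanche size S.
    """
    P = np.asarray(P, dtype=float)[s_min:]
    Q = np.asarray(Q, dtype=float)[s_min:]
    P = P / P.sum()
    Q = Q / Q.sum()
    mask = P > 0  # terms with P(S) = 0 contribute nothing to the sum
    return float(np.sum(P[mask] * np.log(Q[mask])))

def best_parameter(P, model_Q, s_min):
    """Grid search: model_Q maps each parameter value to its size distribution Q."""
    return max(model_Q, key=lambda theta: log_likelihood(P, model_Q[theta], s_min))

# Toy family Q_theta(S) proportional to theta^S, for illustration only.
S = np.arange(50)
model_Q = {theta: theta ** S for theta in (0.5, 0.7, 0.9)}
P = model_Q[0.7]  # "empirical" distribution generated by theta = 0.7
theta_hat = best_parameter(P, model_Q, s_min=2)
```

Since maximizing Eq. (5) minimizes the cross-entropy of Q relative to P, the search recovers the parameter that generated P in this toy case.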

To assign a p value to a fit, we follow the prescription of Ref. ⁴⁸. Indicating with Z_tail/Z the fraction of avalanches with S ≥ S_min in the fitted time series, a synthetic sample of Z avalanches is created by sampling avalanches with S≥S_min from the selected model Q with probability Z_tail/Z and by sampling avalanches with S < S_min from the empirical distribution with complementary probability. Each of these synthetic samples is fitted analogously to the original sample obtained from the time series. We compute the Kolmogorov–Smirnov (KS) distance between the empirical distribution P and the selected model Q, as well as between the synthetic samples and their best model. The p value of the fit is defined as the fraction of synthetic samples whose KS distance from the selected model is larger than the KS distance between the real sample and its best model. The hypothesis that the sample has been generated by a certain dynamical model, say the RFIM, cannot be rejected if the p value of the fit to the RFIM is larger than a pre-established significance threshold. We set the threshold to 0.1 in the main text, following the prescription of Ref. ⁴⁸. Tests of robustness against the choice of this parameter value are reported in SI J.

If one of the two hypotheses can be rejected but the other cannot, the non-rejected model automatically becomes the selected one. If both hypotheses can be rejected, the time series is classified as “None.” If, however, both hypotheses cannot be rejected, we select as the best model the one with the largest likelihood⁴⁸. We neglect the possibility that a single time series could be described by a mixture of models. Empirical data are fitted only if the time series contains at least 50 events and at least 10 avalanches.
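
The decision rule above can be written compactly. This is a sketch of the rule as stated in the text; the function name, the argument names, and the handling of exact ties in the likelihood (assigned to the RFIM here) are choices of this example.

```python
def select_model(p_bp, p_rfim, loglik_bp, loglik_rfim, threshold=0.1):
    """Classify a time series as "BP", "RFIM" or "None".

    A model is rejected when the p value of its fit is not larger than threshold.
    """
    bp_ok = p_bp > threshold
    rfim_ok = p_rfim > threshold
    if bp_ok and rfim_ok:
        # Neither model is rejected: keep the one with the larger likelihood.
        return "BP" if loglik_bp > loglik_rfim else "RFIM"
    if bp_ok:
        return "BP"
    if rfim_ok:
        return "RFIM"
    return "None"

label = select_model(p_bp=0.30, p_rfim=0.45, loglik_bp=-2.1, loglik_rfim=-1.8)
```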

We validate our fitting procedure by applying it to synthetic distributions P generated by the RFIM or by the BP. The results, shown in SI I, confirm that the procedure identifies the ground-truth model and the correct value of the parameter.

More details about the fitting and model selection protocol, including tests of robustness against the threshold on the p value and on S_min, are given in the SI.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The Twitter data generated in this study have been deposited in Zenodo (https://zenodo.org/record/5779063#.Yg_aP-7MLCV) and GitHub (https://github.com/DaniMuzi/SocialMedia)^46,47. Telegram, Parler, Weibo, StackOverflow, and Delicious data used in this study have been generated in other works. URLs to each of these data sets are provided in SI A.

Code availability

The Python and C codes used for this project are available on Zenodo (https://zenodo.org/record/5779063#.Yg_aP-7MLCV) and GitHub (https://github.com/DaniMuzi/SocialMedia)^46,47.

References

Ahmad, A. N. Is Twitter a useful tool for journalists? J. Media Pract. 11, 145–155 (2010).
Kwak, H., Lee, C., Park, H. & Moon, S. What is Twitter, a social network or a news media? In Web Conf. 2010 – Proc. World Wide Web Conf. WWW 2010, 591–600 (2010).
Pierri, F. et al. The impact of online misinformation on US COVID-19 vaccinations. arXiv preprint arXiv:2104.10635 (2021).
Yang, K.-C., Torres-Lugo, C. & Menczer, F. Prevalence of low-credibility information on Twitter during the COVID-19 outbreak. Proc. ICWSM Intl. Workshop on Cyber Social Threats (CySoc) https://doi.org/10.36190/2020.16 (2020).
Yang, K.-C. et al. The COVID-19 infodemic: Twitter versus Facebook. Big Data Soc. 8, 20539517211013861 (2021).
Phillips, M. & Lorenz, T. ‘Dumb money’ is on GameStop, and it’s beating Wall Street at its own game. The New York Times (2021).
Dalla Porta, L. & Copelli, M. Modeling neuronal avalanches and long-range temporal correlations at the emergence of collective oscillations: continuously varying exponents mimic M/EEG results. PLoS Comput. Biol. 15, e1006924 (2019).
Beggs, J. M. & Plenz, D. Neuronal avalanches in neocortical circuits. J. Neurosci. 23, 11167–11177 (2003).
Bak, P., Christensen, K., Danon, L. & Scanlon, T. Unified scaling law for earthquakes. Phys. Rev. Lett. 88, 178501 (2002).
Gleeson, J. P., Ward, J. A., O’Sullivan, K. P. & Lee, W. T. Competition-induced criticality in a model of meme popularity. Phys. Rev. Lett. 112, 048701 (2014).
Barabasi, A.-L. The origin of bursts and heavy tails in human dynamics. Nature 435, 207–211 (2005).
Karsai, M., Kaski, K., Barabási, A.-L. & Kertész, J. Universal features of correlated bursty behaviour. Sci. Rep. 2, 1–7 (2012).
Nishi, R. et al. Reply trees in Twitter: data analysis and branching process models. Soc. Netw. Anal. Min. 6, 26 (2016).
Wegrzycki, K., Sankowski, P., Pacuk, A. & Wygocki, P. Why do cascade sizes follow a power-law? In Web Conf. 2017 – Proc. World Wide Web Conf. WWW 2017, 569–576 (2017).
Lerman, K. & Ghosh, R. Information contagion: an empirical study of the spread of news on Digg and Twitter social networks. In 4th Int. AAAI Conf. Web Soc. Media ICWSM 2010, vol. 4 (2010).
Munoz, M. A., Dickman, R., Vespignani, A. & Zapperi, S. Avalanche and spreading exponents in systems with absorbing states. Phys. Rev. E 59, 6175 (1999).
Onnela, J.-P. & Reed-Tsochas, F. Spontaneous emergence of social influence in online systems. Proc. Natl. Acad. Sci. USA. 107, 18375–18380 (2010).
Munoz, M. A. Colloquium: criticality and dynamical scaling in living systems. Rev. Mod. Phys. 90, 031001 (2018).
Sethna, J. P., Dahmen, K. A. & Myers, C. R. Crackling noise. Nature 410, 242–250 (2001).
Colaiori, F. Exactly solvable model of avalanches dynamics for Barkhausen crackling noise. Adv. Phys. 57, 287–359 (2008).
Ódor, G. Universality classes in nonequilibrium lattice systems. Rev. Mod. Phys. 76, 663 (2004).
Sreenivasan, S., Chan, K. S., Swami, A., Korniss, G. & Szymanski, B. K. Information cascades in feed-based networks of users with limited attention. IEEE Trans. Netw. Sci. Eng. 4, 120–128 (2016).
Zhou, F., Xu, X., Trajcevski, G. & Zhang, K. A survey of information cascade analysis: Models, predictions, and recent advances. ACM Comput. Surv. 54, 1–36 (2021).
Cao, Q., Shen, H., Cen, K., Ouyang, W. & Cheng, X. Deephawkes: bridging the gap between prediction and understanding of information cascades. In Proc. ACM Int. Conf. Inf. Knowl. Manag., 1149–1158 (2017).
Oliveira, D. F. & Chan, K. S. Diffusion of information in an online social network with limited attention. Inf. Secur. 43, 362–374 (2019).
Bild, D. R., Liu, Y., Dick, R. P., Mao, Z. M. & Wallach, D. S. Aggregate characterization of user behavior in Twitter and analysis of the retweet graph. ACM Trans. Internet Technol. 15, 1–24 (2015).
Weng, L., Flammini, A., Vespignani, A. & Menczer, F. Competition among memes in a world with limited attention. Sci. Rep. 2, 335 (2012).
Gleeson, J. P., O’Sullivan, K. P., Baños, R. A. & Moreno, Y. Effects of network structure, competition and memory time on social spreading phenomena. Phys. Rev. X 6, 021019 (2016).
Szabo, G. & Huberman, B. A. Predicting the popularity of online content. Commun. ACM 53, 80–88 (2010).
Li, W., Cranmer, S. J., Zheng, Z. & Mucha, P. J. Infectivity enhances prediction of viral cascades in Twitter. PLoS One 14, e0214453 (2019).
Notarmuzi, D., Castellano, C., Flammini, A., Mazzilli, D. & Radicchi, F. Percolation theory of self-exciting temporal processes. Phys. Rev. E 103, L020302 (2021).
O’Brien, J. D., Aleta, A., Moreno, Y. & Gleeson, J. P. Quantifying uncertainty in a predictive model for popularity dynamics. Phys. Rev. E 101, 062311 (2020).
Crane, R. & Sornette, D. Robust dynamic classes revealed by measuring the response function of a social system. Proc. Natl. Acad. Sci. USA. 105, 15649–15653 (2008).
Watson, H. W. & Galton, F. On the probability of the extinction of families. J.R. Anthropol. Inst. G.B. Irel. 4, 138–144 (1875).
Harris, T. E. et al. The Theory of Branching Processes, Vol. 6 (Springer Berlin, 1963).
Liggett, T. M. Interacting Particle Systems, Vol. 276 (Springer Science & Business Media, 2012).
Radicchi, F., Castellano, C., Flammini, A., Muñoz, M. A. & Notarmuzi, D. Classes of critical avalanche dynamics in complex networks. Phys. Rev. Res. 2, 033171 (2020).
Weng, L., Menczer, F. & Ahn, Y.-Y. Predicting successful memes using network and community structure. In 8th Int. AAAI Conf. Web Soc. Media ICWSM 2014, Vol. 8 (2014).
Vasconcelos, V. V., Levin, S. A. & Pinheiro, F. L. Consensus and polarization in competing complex contagion processes. J. R. Soc. Interface 16, 20190196 (2019).
State, B. & Adamic, L. The diffusion of support in an online social movement: evidence from the adoption of equal-sign profile pictures. In CSCW 2015 – Companion 2015 ACM Conf. Comput. Support. Coop. Work Soc. Comput., 1741–1750 (2015).
Hodas, N. O. & Lerman, K. The simple rules of social contagion. Sci. Rep. 4, 1–7 (2014).
Centola, D. & Macy, M. Complex contagions and the weakness of long ties. Am. J. Sociol. 113, 702–734 (2007).
Dodds, P. S. & Watts, D. J. A generalized model of social and biological contagion. J. Theor. Biol. 232, 587–604 (2005).
Guilbeault, D., Becker, J. & Centola, D. Complex contagions: a decade in review. Complex Spreading Phenomena in Social Systems 3–25 (2018).
Romero, D. M., Meeder, B. & Kleinberg, J. Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on Twitter. In Web Conf. 2011 – Proc. World Wide Web Conf. WWW 2011, 695–704 (2011).
Notarmuzi, D., Castellano, C., Flammini, A., Mazzilli, D. & Radicchi, F. GitHub. https://github.com/DaniMuzi/SocialMedia (2021).
Notarmuzi, D., Castellano, C., Flammini, A., Mazzilli, D. & Radicchi, F. Zenodo. https://zenodo.org/record/5779063#.Ybhyi33P1Yg (2021).
Clauset, A., Shalizi, C. R. & Newman, M. E. Power-law distributions in empirical data. SIAM Rev. 51, 661–703 (2009).
Haldeman, C. & Beggs, J. M. Critical branching captures activity in living neural networks and maximizes the number of metastable states. Phys. Rev. Lett. 94, 058101 (2005).
Friedman, N. et al. Universal critical dynamics in high resolution neuronal avalanche data. Phys. Rev. Lett. 108, 208102 (2012).
Shriki, O. et al. Neuronal avalanches in the resting MEG of the human brain. J. Neurosci. 33, 7079–7090 (2013).
Michard, Q. & Bouchaud, J.-P. Theory of collective opinion shifts: from smooth trends to abrupt swings. Eur. Phys. J. B 47, 151–159 (2005).
Kobayashi, R. & Lambiotte, R. Tideh: time-dependent Hawkes process for predicting retweet dynamics. In 10th Int. AAAI Conf. Web Soc. Media ICWSM 2016, Vol. 10 (2016).
Zhao, Q., Erdogdu, M. A., He, H. Y., Rajaraman, A. & Leskovec, J. Seismic: a self-exciting point process model for predicting tweet popularity. In Proc. 21th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 1513–1522 (2015).
Matsubara, Y., Sakurai, Y., Prakash, B. A., Li, L. & Faloutsos, C. Rise and fall patterns of information diffusion: model and implications. In Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 6–14 (2012).
Rizoiu, M.-A. et al. Expecting to be hip: Hawkes intensity processes for social media popularity. In Web Conf. 2017 – Proc. World Wide Web Conf. WWW 2017, 735–744 (2017).
Haimovich, D., Karamshuk, D., Leeper, T. J., Riabenko, E. & Vojnovic, M. Scalable prediction of information cascades over arbitrary time horizons. Preprint at arXiv:2009.02092 (2020).
Barzel, B. & Barabási, A.-L. Universality in network dynamics. Nat. Phys. 9, 673–681 (2013).
Hens, C., Harush, U., Haber, S., Cohen, R. & Barzel, B. Spatiotemporal signal propagation in complex networks. Nat. Phys. 15, 403–412 (2019).
Indiana University. OSoMe, Observatory on Social Media. https://osome.iu.edu (2020).
Twitter. Decahose stream. https://developer.twitter.com/en/docs/twitter-api/v1/tweets/sample-realtime/overview/decahose.
Baumgartner, J., Zannettou, S., Squire, M. & Blackburn, J. The pushshift telegram dataset. Proceedings of the International AAAI Conference on Web and Social Media. 14, 840–847 (2020).
Aliapoulios, M. et al. An early look at the parler online social network. Preprint at arXiv:2101.03820 (2021).
Fu, K.-w., Chan, C.-h. & Chau, M. Assessing censorship on microblogs in China: discriminatory keyword analysis and the real-name registration policy. IEEE Internet Comput. 17, 42–50 (2013).
Basile, V., Peroni, S., Tamburini, F. & Vitali, F. Topical tags vs non-topical tags: towards a bipartite classification? J. Inf. Sci. 41, 486–505 (2015).
Stauffer, D. & Aharony, A. Introduction to Percolation Theory (CRC Press, 2018).

Acknowledgements

F.R. acknowledges support from the National Science Foundation (CMMI-1552487). D.N. was partially funded by the National Science Foundation NRT grant 1735095. Any opinions, findings, and conclusions or recommendations expressed in this work are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. A.F. acknowledges support from DARPA award HR001121C0169.

Author information

Authors and Affiliations

Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, 47408, USA
Daniele Notarmuzi, Alessandro Flammini, Dario Mazzilli & Filippo Radicchi
Istituto dei Sistemi Complessi (ISC-CNR), Via dei Taurini 19, I-00185, Roma, Italy
Claudio Castellano
Centro Ricerche Enrico Fermi, Via Panisperna 89 A, Roma, Italy
Claudio Castellano & Dario Mazzilli

Authors

Daniele Notarmuzi
Claudio Castellano
Alessandro Flammini
Dario Mazzilli
Filippo Radicchi

Contributions

D.N., C.C., A.F., D.M., and F.R. designed the experiments and wrote the paper. D.N. performed the data collection and the experiments.

Corresponding author

Correspondence to Filippo Radicchi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Baruch Barzel, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Notarmuzi, D., Castellano, C., Flammini, A. et al. Universality, criticality and complexity of information propagation in social media. Nat Commun 13, 1308 (2022). https://doi.org/10.1038/s41467-022-28964-8

Received: 21 September 2021
Accepted: 21 February 2022
Published: 14 March 2022
Version of record: 14 March 2022
DOI: https://doi.org/10.1038/s41467-022-28964-8
