Published June 2026 | Version v1
Thesis
Early Diffusion Signals for Predicting Reach and Veracity in Social Media Rumor Cascades
Description
Misinformation response faces a timing constraint: harmful cascades can scale before reliable verification is available. This thesis asks whether early diffusion traces carry useful predictive signal for two related tasks: predicting a cascade's eventual reach and predicting its veracity. The main analysis uses FibVID, and Twitter15/16 serve as a comparative benchmark. The primary specification uses size-based early windows (the first k messages, k ∈ {10, 20, 30, 45, 60, 90, 120, 180}) because they hold the observed message count fixed; time-window analyses are reported as robustness checks. Features combine residualized structural shape descriptors, Hawkes-inspired dynamic parameters, and structure-tempo interaction terms. Residualization separates structural shape from early volume: after adjustment, the median absolute correlation between structural features and log early volume falls from about 0.248 to essentially zero in time windows.

The main empirical result is a clear task asymmetry. On FibVID under the harmonized cascade-compatible specification, reach is strongly predictable from early diffusion: OLS reaches about $R^{2}=0.886$ at $k=180$, and the interaction-full specification peaks at about $R^{2}=0.861$ at $k=120$. Veracity prediction is much weaker. In the same FibVID analysis, the best baseline logit reaches $\mathrm{AUC} \approx 0.591$ at $k=180$, and richer feature bundles do not produce consistent gains over that baseline. The Twitter15/16 benchmark shows the same broad ordering, but the FibVID pattern remains the main result. Time-window results do not overturn these conclusions, although they are generally weaker than the size-based results, especially for reach. The main implication is therefore not that early diffusion can reliably classify truth. Rather, early diffusion is far more informative about how far a cascade will spread than about whether it is false.
In practical terms, the results support a staged view of diffusion-based early warning: diffusion features are most defensible as a weak screening signal for misinformation risk and as a much stronger tool for prioritizing cascades likely to have high impact.
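A minimal sketch of the residualization step, assuming it means an OLS regression of each structural feature on an intercept and log early volume, with the residuals kept as volume-adjusted shape descriptors. The function names (`residualize`, `median_abs_corr`) and the synthetic data are illustrative assumptions, not the thesis's code or data; the diagnostic mirrors the median absolute correlation reported above.

```python
import numpy as np


def residualize(features: np.ndarray, log_volume: np.ndarray) -> np.ndarray:
    """Regress each feature column on [1, log_volume] by OLS and return residuals.

    Assumed form of the adjustment; the thesis may use a different model.
    """
    design = np.column_stack([np.ones_like(log_volume), log_volume])
    beta, *_ = np.linalg.lstsq(design, features, rcond=None)
    return features - design @ beta


def median_abs_corr(features: np.ndarray, log_volume: np.ndarray) -> float:
    """Median over feature columns of |Pearson correlation| with log_volume."""
    corrs = [abs(np.corrcoef(features[:, j], log_volume)[0, 1])
             for j in range(features.shape[1])]
    return float(np.median(corrs))


# Synthetic illustration only: not FibVID or Twitter15/16 data.
rng = np.random.default_rng(0)
n = 500
log_vol = rng.normal(3.0, 1.0, size=n)  # log early volume per cascade (varies in time windows)
feats = np.column_stack([
    0.5 * log_vol + rng.normal(size=n),   # shape feature that grows with volume
    -0.3 * log_vol + rng.normal(size=n),  # shape feature that shrinks with volume
    rng.normal(size=n),                   # shape feature unrelated to volume
])

before = median_abs_corr(feats, log_vol)
after = median_abs_corr(residualize(feats, log_vol), log_vol)
print(f"median |corr| before: {before:.3f}, after: {after:.1e}")
```

Because OLS residuals are orthogonal to both the intercept and the regressor, their sample correlation with log early volume is zero up to floating-point error, which is why a volume-adjusted feature set can drive this diagnostic to essentially zero. In size-based windows the early volume is fixed at k, so this adjustment matters mainly for the time-window specification.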
Additional details
Identifiers
- Other: oai:uchicago.tind.io:17121