Early Diffusion Signals for Predicting Reach and Veracity in Social Media Rumor Cascades

Zhang, Jiahao

doi:10.6082/uchicago.17121

Published June 2026 | Version v1

Thesis Open

Zhang, Jiahao¹

1. University of Chicago

Contributors

Advisor:

Clipperton, Jean

Committee member:

Wang, Zhao

Misinformation response faces a timing constraint: harmful cascades can scale before reliable verification is available. This thesis asks whether early diffusion traces contain useful predictive signal for two related tasks: eventual reach and veracity. The main analysis uses FibVID, while Twitter15/16 are used as a comparative benchmark. I study size-based early windows (k ∈ {10, 20, 30, 45, 60, 90, 120, 180}) as the primary specification, since they hold observed message count fixed, and I report time-window analyses as robustness checks. Features combine residualized structural shape descriptors, Hawkes-inspired dynamic parameters, and structure-tempo interaction terms. Residualization is used to separate structural shape from early volume; after adjustment, the median absolute correlation between structural features and log early volume falls from about 0.248 to essentially zero in time windows. The main empirical result is a clear task asymmetry. On FibVID under the harmonized cascade-compatible specification, reach is strongly predictable from early diffusion: OLS reaches about $R^{2}=0.886$ at $k=180$, and the interaction-full specification peaks around $R^{2}=0.861$ at $k=120$. Veracity is much harder to predict. In the same FibVID analysis, the best baseline logit result reaches $AUC \approx 0.591$ at $k=180$, while richer feature bundles do not produce consistent gains over that baseline. The Twitter15/16 benchmark shows the same broad ordering, but the main result remains the FibVID pattern. Time-window results do not overturn these conclusions, but they are generally weaker than the size-based specification, especially for reach. The main implication is therefore not that early diffusion can classify truth reliably. Rather, early diffusion is much more informative for how far a cascade will spread than for whether it is false.
In practical terms, the results support a staged view of diffusion-based early warning: diffusion features are most defensible as a weak misinformation-risk screening signal and a much stronger tool for prioritizing likely high-impact cascades.
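The residualization step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes residualization means taking OLS residuals of each structural feature on log early volume, and it runs on synthetic data. The function names `residualize` and `median_abs_corr`, the three placeholder feature columns, and all numeric settings are hypothetical.

```python
import numpy as np


def residualize(features, log_volume):
    """Return OLS residuals of each feature column on [1, log_volume].

    Assumption: a linear adjustment with an intercept; the thesis may use
    a different residualization scheme.
    """
    X = np.column_stack([np.ones_like(log_volume), log_volume])
    beta, *_ = np.linalg.lstsq(X, features, rcond=None)
    return features - X @ beta


def median_abs_corr(features, log_volume):
    """Median absolute Pearson correlation between each column and log volume."""
    corrs = [
        abs(np.corrcoef(features[:, j], log_volume)[0, 1])
        for j in range(features.shape[1])
    ]
    return float(np.median(corrs))


# Synthetic cascades (hypothetical data, not FibVID or Twitter15/16).
rng = np.random.default_rng(0)
n = 500
log_volume = np.log1p(rng.poisson(40, size=n).astype(float))

# Three placeholder "structural" features that partly track early volume.
slopes = np.array([2.0, -1.0, 1.5])
features = log_volume[:, None] * slopes + rng.normal(size=(n, 3))

resid = residualize(features, log_volume)
before = median_abs_corr(features, log_volume)
after = median_abs_corr(resid, log_volume)
print(f"median |corr| before: {before:.3f}, after: {after:.2e}")
```

Because OLS residuals are orthogonal to the regressors by construction, the post-adjustment correlation is zero up to floating-point error. That matches the qualitative pattern the abstract reports, although the thesis's exact procedure and figures come from its own data and specification.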

Files

Thesis_paper_by_Jiahao_Zhang_2026.pdf (947.8 kB)
md5:f38d63db0df0b68e7f0150d7bbe009c2

Additional details

Other: oai:uchicago.tind.io:17121

Division(s): Social Sciences Division
Department(s): Computational Social Sciences (MACSS)

