Published June 2026 | Version v1
Thesis
How Can We Understand How People Get Information from Video-Based Social Media Depending on Emotional Cues?
Description
Micro-video is a major channel for accessing public information. This research explores the role of emotional cues in information processing on TikTok, focusing on audio cues, visual facial expressions, and their temporal placement in videos. The study uses a sample of 5,645 TikTok videos created by top creators. Visual cues are extracted from video frames using DeepFace, and audio cues are extracted from the soundtrack using acoustic feature extraction. Audience engagement is operationalized using shares and comments as observable proxies. The results suggest that emotional indicators have low but statistically significant explanatory power. Multimodal models that incorporate audio and visual cues from early, full-length, and late temporal segments perform better than single-segment and single-modality models. In terms of discrete emotions, surprise and fear show consistent positive relationships with sharing and commenting, while arousal-based classifications are less informative than discrete, temporally specific emotional cues. With respect to temporal effects, full-duration features provide the best overall fit; however, early-stage features are more relevant for predicting shares, whereas late-stage features are more relevant for predicting comments. Supplementary robustness checks indicate that the links between emotional cues and audience engagement are partly driven by platform algorithms that promote visibility. In conclusion, these findings support the idea that information processing in short-form video settings is a multimodal and distributed process, rather than a reaction to discrete emotional cues.
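The temporal segmentation described above (early, late, and full-duration features) can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: the function names `mean_scores` and `temporal_features`, the one-third segment boundaries, and the toy per-frame scores are all assumptions made for illustration. In practice the per-frame emotion scores would come from a facial-expression model such as DeepFace, and the same aggregation could be applied to acoustic features.

```python
def mean_scores(frames):
    """Average each emotion score over a list of per-frame score dicts."""
    keys = frames[0].keys()
    return {k: sum(f[k] for f in frames) / len(frames) for k in keys}


def temporal_features(frame_scores, edge_fraction=1 / 3):
    """Build early, late, and full-duration emotion features for one video.

    frame_scores: list of dicts mapping emotion name -> score, one per
    sampled frame, in chronological order.
    edge_fraction: share of frames treated as the "early" and "late"
    segments (an assumed value, not taken from the thesis).
    """
    if not frame_scores:
        raise ValueError("frame_scores must contain at least one frame")

    n = len(frame_scores)
    k = max(1, round(n * edge_fraction))  # frames per edge segment

    segments = {
        "early": frame_scores[:k],
        "late": frame_scores[-k:],
        "full": frame_scores,
    }

    features = {}
    for name, frames in segments.items():
        for emotion, value in mean_scores(frames).items():
            features[f"{name}_{emotion}"] = value
    return features


# Toy example: surprise fades while fear rises over three sampled frames.
frames = [
    {"surprise": 0.9, "fear": 0.1},
    {"surprise": 0.5, "fear": 0.5},
    {"surprise": 0.1, "fear": 0.9},
]
features = temporal_features(frames)
```

The resulting flat dictionary (keys such as `early_surprise` or `late_fear`) could then serve as predictors in separate regression models for shares and comments, which is one way to compare single-segment against combined-segment specifications.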
Additional details
Identifiers
- Other
- oai:uchicago.tind.io:17122