Sentiment Analysis and Effect of COVID-19 Pandemic using College SubReddit Data

Yan, Tian; Liu, Fang

arXiv:2112.04351v1 (cs)

COVID-19 e-print

Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.

[Submitted on 30 Nov 2021 (this version), latest version 18 Sep 2023 (v3)]

Abstract: The COVID-19 pandemic has affected societies and human health and well-being in various ways. In this study, we collected Reddit data from 2019 (pre-pandemic) and 2020 (pandemic) from the subreddit communities associated with 8 universities, applied natural language processing (NLP) techniques, and trained graph neural networks on the social media data to study how the pandemic has affected people's emotions and psychological states compared to the pre-pandemic era. Specifically, we first applied a pre-trained Robustly Optimized BERT Pretraining Approach (RoBERTa) model to learn embeddings from the semantic information of Reddit messages, and then trained a graph attention network (GAT) for sentiment classification. Using a GAT allows us to leverage the relational information among the messages during training. We then applied subgroup-adaptive model stacking to combine the prediction probabilities from RoBERTa and GAT to yield the final sentiment classification. With the manually labeled and model-predicted sentiment labels on the collected data, we applied a generalized linear mixed-effects model to estimate the effects of the pandemic and of online teaching on people's sentiment and to test their statistical significance. The results suggest that the odds of negative sentiment in 2020 are $14.6\%$ higher than the odds in 2019 ($p$-value $<0.001$), and that the odds of negative sentiment are $41.6\%$ higher with in-person teaching than with online teaching in 2020 ($p$-value $=0.037$) in the studied population.
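The reported effects are statements about odds, not probabilities, so it may help to see how they translate. The sketch below is illustrative arithmetic only, not the authors' code: it reads "14.6% higher odds" as an odds ratio of 1.146 and "41.6% higher odds" as 1.416, converts each to the log-odds coefficient a logistic-type mixed model would report, and applies the odds ratio to a baseline probability. The baseline value `p0 = 0.30` and the helper name `apply_odds_ratio` are assumptions made for illustration and do not come from the paper.

```python
import math


def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Return the probability obtained by scaling the baseline odds by odds_ratio."""
    if not 0.0 < p_baseline < 1.0:
        raise ValueError("p_baseline must lie strictly between 0 and 1")
    odds = p_baseline / (1.0 - p_baseline)   # probability -> odds
    new_odds = odds * odds_ratio             # apply the multiplicative effect
    return new_odds / (1.0 + new_odds)       # odds -> probability


# Odds ratios implied by the percentages quoted in the abstract.
OR_2020_VS_2019 = 1.146        # odds of negative sentiment 14.6% higher in 2020
OR_INPERSON_VS_ONLINE = 1.416  # odds 41.6% higher with in-person teaching (2020)

# Equivalent coefficients on the log-odds scale.
beta_year = math.log(OR_2020_VS_2019)
beta_mode = math.log(OR_INPERSON_VS_ONLINE)

# Hypothetical baseline share of negative messages (NOT a figure from the paper).
p0 = 0.30
p_2020 = apply_odds_ratio(p0, OR_2020_VS_2019)
p_inperson = apply_odds_ratio(p0, OR_INPERSON_VS_ONLINE)

print(f"log-odds coefficients: year={beta_year:.4f}, mode={beta_mode:.4f}")
print(f"baseline {p0:.2f} -> 2020: {p_2020:.4f}, in-person: {p_inperson:.4f}")
```

Under this assumed baseline, a 14.6% increase in odds corresponds to a smaller change in probability, which is why odds ratios should not be read as percentage-point changes.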

Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2112.04351 [cs.CL]
	(or arXiv:2112.04351v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2112.04351

Submission history

From: Tian Yan
[v1] Tue, 30 Nov 2021 19:15:06 UTC (1,933 KB)
[v2] Thu, 2 Jun 2022 08:23:03 UTC (809 KB)
[v3] Mon, 18 Sep 2023 08:10:41 UTC (32 KB)
