Abstract
Social media platforms such as Twitter or StockTwits are widely used for sharing stock market opinions among investors, traders, and entrepreneurs. Previous work has shown empirically that the content posted on these platforms can be leveraged to predict various aspects of stock market performance. Nonetheless, actors on these platforms may not always have altruistic motivations and may instead seek to influence stock trading behavior through the (potentially misleading) information they post. While much previous work has sought to analyze how social media can be used to predict the stock market, many questions remain regarding the quality of the predictions and the behavior of active users on these platforms. To this end, this article seeks to address a number of open research questions: Which social media platform is more predictive of stock performance? What posted content is actually predictive, and over what time horizon? How does stock market posting behavior vary among different users? Are all users trustworthy, or do some users’ predictions consistently mislead about the true stock movement? To answer these questions, we analyzed data from Twitter and StockTwits covering almost 5 years of posted messages spanning 2015 to 2019.
The results of this large-scale study provide a number of important insights, among which we present the following: (i) StockTwits is a more predictive source of information than Twitter, leading us to focus our analysis on StockTwits; (ii) on StockTwits, users’ self-labeled sentiments are correlated with the stock market but are only slightly predictive in aggregate over the short term; (iii) there are at least three clear types of temporal predictive behavior for users over a 144-day horizon: short, medium, and long term; and (iv) users who are consistently wrong tend to exhibit what we conjecture to be “botlike” post content, and their removal from the data tends to improve stock market predictions from self-labeled content.
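Finding (iv) can be illustrated with a minimal sketch of how consistently wrong users might be identified and filtered out. This is not the procedure used in the article: the function names, the "Bullish"/"Bearish" label strings, the accuracy threshold, the minimum post count, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def score_users(posts, movements):
    """Return {user: (accuracy, scored_post_count)} for self-labeled posts.

    posts: list of (user, ticker, day, label) tuples; the label strings
        "Bullish" / "Bearish" are an assumed encoding.
    movements: dict mapping (ticker, day) to +1 if the price rose over the
        chosen horizon and -1 if it fell (hypothetical encoding).
    """
    stats = defaultdict(lambda: [0, 0])  # user -> [correct posts, scored posts]
    for user, ticker, day, label in posts:
        move = movements.get((ticker, day))
        if move is None:
            continue  # no price outcome known for this post; skip it
        predicted = 1 if label == "Bullish" else -1
        stats[user][0] += int(predicted == move)
        stats[user][1] += 1
    return {user: (hits / n, n) for user, (hits, n) in stats.items()}


def drop_consistently_wrong(posts, movements, max_accuracy=0.3, min_posts=3):
    """Remove posts by users whose accuracy is at or below max_accuracy.

    Both thresholds are illustrative values, not taken from the article.
    """
    scores = score_users(posts, movements)
    flagged = {
        user
        for user, (accuracy, n) in scores.items()
        if n >= min_posts and accuracy <= max_accuracy
    }
    kept = [post for post in posts if post[0] not in flagged]
    return kept, flagged


# Toy data (invented): one user is always right, the other always wrong.
movements = {("AAA", 1): 1, ("AAA", 2): -1, ("AAA", 3): 1}
posts = [
    ("alice", "AAA", 1, "Bullish"),
    ("alice", "AAA", 2, "Bearish"),
    ("alice", "AAA", 3, "Bullish"),
    ("bob", "AAA", 1, "Bearish"),
    ("bob", "AAA", 2, "Bullish"),
    ("bob", "AAA", 3, "Bearish"),
]
kept, flagged = drop_consistently_wrong(posts, movements)
```

In a realistic setting, user accuracy would need to be estimated on an earlier period than the one used to evaluate predictions. Otherwise the filter sees the outcomes it is later judged on.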
A User-Centric Analysis of Social Media for Stock Market Prediction