Abstract
The detection of vague, speculative, or otherwise uncertain language has been studied in the encyclopedic, political, and scientific domains, yet it remains relatively unexplored in finance. Finance, however, benefits from public sources of big financial data that can be linked with extracted measures of linguistic uncertainty as a means of extrinsic model validation. Doing so further helps in understanding how the linguistic uncertainty of financial disclosures might induce financial uncertainty in the market. To explore this field, we use term weighting methods to detect linguistic uncertainty in a large dataset of financial disclosures. As a baseline, we use an existing dictionary of financial uncertainty triggers; furthermore, we retrieve related terms in specialized word embedding models to automatically expand this dictionary. Apart from an industry-agnostic expansion, we create expansions incorporating industry-specific jargon. In a set of cross-sectional event study regressions, we show that the dictionary enriched in this way explains a significantly larger share of future volatility, a common measure of financial uncertainty, than the plain dictionary. Furthermore, we show that, unlike the plain dictionary, our embedding models are well suited to explaining future analyst forecast uncertainty. Notably, our results indicate that enriching the dictionary with industry-specific vocabulary explains a significantly larger share of financial uncertainty than an industry-agnostic expansion.
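The dictionary expansion and scoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors, the similarity threshold of 0.9, the function names `expand_dictionary` and `uncertainty_score`, and the use of a plain relative term frequency as the weighting scheme are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_dictionary(seeds, embeddings, threshold=0.9):
    """Add every vocabulary word whose vector is close to some seed term.

    The threshold is a hypothetical value chosen for this toy example.
    """
    expanded = set(seeds)
    for seed in seeds:
        if seed not in embeddings:
            continue  # seed term has no vector, so it cannot be expanded
        for word, vec in embeddings.items():
            if word not in expanded and cosine(embeddings[seed], vec) >= threshold:
                expanded.add(word)
    return expanded


def uncertainty_score(tokens, dictionary):
    """Share of tokens that are dictionary terms (relative term frequency)."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in dictionary)
    return hits / len(tokens)


# Hand-made toy vectors (assumption); a real model would be trained on disclosures.
embeddings = {
    "may": [0.9, 0.1, 0.0],
    "might": [0.85, 0.15, 0.05],
    "uncertain": [0.8, 0.3, 0.1],
    "contingent": [0.75, 0.35, 0.1],
    "revenue": [0.1, 0.9, 0.2],
    "profit": [0.05, 0.95, 0.1],
}

seeds = {"may", "uncertain"}  # stand-in for the baseline dictionary
expanded = expand_dictionary(seeds, embeddings)

tokens = "revenue might decline and profit is contingent on demand".split()
plain_score = uncertainty_score(tokens, seeds)
expanded_score = uncertainty_score(tokens, expanded)
```

In this toy setting the plain dictionary finds no uncertainty triggers in the sentence, whereas the expanded one picks up "might" and "contingent". An industry-specific expansion would, under the same assumptions, simply use embeddings trained on disclosures from a single industry.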
Index Terms
Explaining Financial Uncertainty through Specialized Word Embeddings