research-article
Open Access

Explaining Financial Uncertainty through Specialized Word Embeddings

Published: 12 March 2020

Abstract

The detection of vague, speculative, or otherwise uncertain language has been studied in the encyclopedic, political, and scientific domains, yet remains relatively untouched in finance. However, the latter benefits from public sources of big financial data that can be linked with extracted measures of linguistic uncertainty as a means of extrinsic model validation. Doing so further helps in understanding how the linguistic uncertainty of financial disclosures might induce financial uncertainty in the market. To explore this field, we use term weighting methods to detect linguistic uncertainty in a large dataset of financial disclosures. As a baseline, we use an existing dictionary of financial uncertainty triggers; furthermore, we retrieve related terms in specialized word embedding models to automatically expand this dictionary. Apart from an industry-agnostic expansion, we create expansions incorporating industry-specific jargon. In a set of cross-sectional event study regressions, we show that the enriched dictionary explains a significantly larger share of future volatility, a common financial uncertainty measure, than the plain dictionary does. Furthermore, we show that, unlike the plain dictionary, our embedding models are well suited to explain future analyst forecast uncertainty. Notably, our results indicate that enriching the dictionary with industry-specific vocabulary explains a significantly larger share of financial uncertainty than an industry-agnostic expansion.
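The dictionary expansion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embedding vectors, the function names `expand_dictionary` and `uncertainty_share`, and the parameters `top_k` and `min_sim` are all assumptions made for illustration, and a plain relative term frequency stands in for the term weighting methods mentioned in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_dictionary(seed_terms, embeddings, top_k=2, min_sim=0.7):
    """Add to each seed term its nearest neighbours in the embedding space.

    top_k and min_sim are hypothetical tuning parameters, not values
    taken from the paper.
    """
    expanded = set(seed_terms)
    for seed in seed_terms:
        if seed not in embeddings:
            continue  # seed has no vector, so nothing can be retrieved for it
        scored = [
            (cosine(embeddings[seed], vec), term)
            for term, vec in embeddings.items()
            if term not in seed_terms
        ]
        scored.sort(reverse=True)  # most similar candidates first
        expanded.update(term for sim, term in scored[:top_k] if sim >= min_sim)
    return expanded


def uncertainty_share(tokens, dictionary):
    """Share of tokens that are dictionary terms (a simple stand-in weighting)."""
    if not tokens:
        return 0.0
    return sum(tok in dictionary for tok in tokens) / len(tokens)


# Toy three-dimensional vectors, invented purely for this example.
toy_embeddings = {
    "uncertain": [0.9, 0.1, 0.0],
    "unclear": [0.8, 0.2, 0.1],
    "volatile": [0.7, 0.3, 0.0],
    "revenue": [0.0, 0.1, 0.9],
    "profit": [0.1, 0.0, 0.8],
}

seeds = {"uncertain"}
expanded = expand_dictionary(seeds, toy_embeddings)
tokens = "revenue growth remains unclear and volatile".split()

print(sorted(expanded))
print(uncertainty_share(tokens, seeds), uncertainty_share(tokens, expanded))
```

In this toy setting the seed dictionary matches no token of the example sentence, while the expanded dictionary does. An industry-specific expansion in the sense of the abstract would, under the same assumptions, amount to passing embeddings trained only on disclosures from one industry.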

