research-article

Applying Text Analytics to the Mind-section Literature of the Tibetan Tradition of the Great Perfection

Published: 15 April 2021

Abstract

Over the past decade, a mixture of optical character recognition and manual input has produced a growing corpus of Tibetan literature available as e-texts in Unicode format. With such a corpus in place, the text analytics techniques that have been applied to English and other modern languages may now be applied to Tibetan. In this work, we narrow our focus to a modest portion of that literature: the Mind-section literature of the Tibetan tradition of the Great Perfection. We use text analytics tools based on machine learning techniques to investigate a number of questions of interest to scholars of this and related traditions of the Great Perfection. It has been necessary for us to participate in every stage of this process: corpus identification and selection of text editions, rendering the texts as e-texts in Unicode using both optical character recognition and manual entry, data cleaning and transformation, implementation of software for text analysis, and interpretation of results. For this reason, we hope this study can serve as a model for work on other low-resource languages for which text analytics is only beginning to be developed.

Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 2
March 2021
313 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3454116

Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 15 April 2021
• Online AM: 7 May 2020
• Accepted: 1 April 2020
• Received: 1 January 2020

Qualifiers

• research-article
• Refereed
