research-article

Taming the Wild Etext: Managing, Annotating, and Sharing Tibetan Corpora in Open Spaces

Published: 23 April 2021

Abstract

Digital text is quickly becoming essential to modern daily life. The article you are reading right now is born digital; unlike texts of the not-so-distant past, it may never be printed at all. Worldwide, the trend is clear: digital text is on its way in, and print is on its way out. Year by year, more and more readers are turning to ebooks, internet news, and other forms of ereading, while generation by generation, print is becoming less and less relevant.¹

¹ Pew research shows 50% of Americans have a dedicated ereading device, with yearly gains in ereadership [1]; industry research, too, shows a definite trend toward ereading and non-traditional publishing, with ebooks making up 50% of fiction reading in 2016 [2], while journalism is also trending online [3].

These trends are not unique to English. To meet the demands and expectations of today's readers, Tibetan texts, too, are being digitized by many organizations and institutions with a shared appreciation for the Tibetan literary heritage. They include a variety of secular publishers, monastic institutions, and Buddhist foundations, among others. But while these organizations share common goals for common texts, their work is all too frequently disconnected from the community at large.

This situation negatively impacts what is already a minoritized and under-resourced language. While competition (from other languages, as well as from other publishers in the Tibetan etext world) has been a driver of innovation in the adoption of ereading technology, we believe that a rich, shared data source is not only in everyone's best interest but also the only practical way forward when we consider the time, effort, expertise, and money that quality digitization takes.

That is why we have designed OpenPecha to be a public, open platform for collaborative etext curation and annotation sharing. Its aim is to provide a wide range of users with the latest version of the exact “view” of any text needed, while maintaining the integrity of the text and its annotations and simultaneously allowing for community improvements and additions. In this article, we explore the details of how the project came to be, what it is, and how it works, while also presenting a few common use cases.

References

1. Kathryn Zickuhr and Lee Rainie. 2014. Tablet and E-reader Ownership. Pew Research Center. Retrieved from https://www.pewresearch.org/internet/2014/01/16/tablet-and-e-reader-ownership/.
2. Steve Bohme. 2017. Books and Consumers in 2016. Nielsen Book Research. Retrieved from https://quantum.londonbookfair.co.uk/RXUK/RXUK_PDMC/responsive/images/2017/Steve%20Bohme%20-%20The%202016%20Book%20Market%20Highlights%20from%20the%20Books%20and%20Consumers%202016%20Survey.pdf?v=636257834636180655.
3. Paul Grabowicz. 2020. The Transition to Digital Journalism. Berkeley Advanced Media Institute. Retrieved from https://multimedia.journalism.berkeley.edu/tutorials/digital-transform/.
4. V. Alfano (Ed.) and A. Stauffer (Series Ed.). Virtual Victorians: Networks, Connections, Technologies (1st ed.). Palgrave Macmillan.
5. Paul Eggert. 2016. The book, the e-text and the 'work-site'. In Text Editing, Print and the Digital World, Marilyn Deegan and Kathryn Sutherland (Eds.). Literary and Linguistic Computing, Vol. 25, 360–362. 10.1093/llc/fqq009.
6. Text Encoding Initiative (TEI). 2021. Guidelines for Electronic Text Encoding and Interchange. Retrieved from https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-standOff.html.
7. Javier Pose Rodriguez. 2013. “A Generic Formalism for Encoding Standoff annotations in TEI”. Retrieved from https://sourceforge.net/p/tei/feature-requests/_discuss/thread/1e0e4acb/13db/attachment/A%20Generic%20Formalism%20for%20Encoding%20Standoff%20annotations%20in%20TEI.pdf.
8. Nancy Ide, Christian Chiarcos, Manfred Stede, and Steve Cassidy. 2017. Designing Annotation Schemes: From Model to Representation.
9. https://www.knora.org/.
10. Desmond Allen Schmidt. 2016. Using standoff properties for marking-up historical documents in the humanities. Inf. Technol. 58, 2 (2016), 63–69.
11. See RATT retrieved from http://ratt.sourceforge.net/ or Geneious retrieved from https://www.geneious.com/.
12. Stephen M. Miller and Robert V. Huber. 2004. The Bible: A History. Good Books.
13. Bruce M. Metzger. 1977. The Early Versions of the New Testament: Their Origin, Transmission and Limitations. Oxford University Press.
14. James Strong. 1890. The Exhaustive Concordance of the Bible. Jennings & Graham, Cincinnati.
15. Logos Bible Software. 2020. Website https://www.logos.com/.
16. The Sefaria Library. Website https://www.sefaria.org/texts.
17. Bod lJongs Rig gNas Dra ba. Website http://zw.tibetculture.org.cn/.
18. Buddhist Digital Resource Center. 2017. Website https://www.tbrc.org/.
19. Dharma Downloads. 2020. Website http://www.dharmadownload.net/.
20. Dharma Ebooks. 2021. Website https://dharmaebooks.org/.
21. Drepung Gomang Library Ebook Portal. Website http://www.gomanglibrary.com/.
22. Sakya Digital Library. Website http://sakyalibrary.com/Home/Index.
23. Sera Jey Rigzod Chenmo. 2021. Website https://www.serajeyrigzodchenmo.org/.
24. Tibetan Ebooks. 2019. Website http://tibetanebooks.com/.
25. Timeless Treasuries. 2021. Website http://dharmacloud.tsadra.org/library/.
26. Dege Kangyur (sde dge bka’ ‘gyur). Degé Publishing House. Digitization/ed. Esukhia. 2014. Retrieved from http://www.thlib.org/encyclopedias/literary/canons/kt/catalog.php#cat=d.
27. Dege Kangyur (sde dge bka’ ‘gyur). Degé Publishing House. Digitization/ed. Esukhia. 2015. Retrieved from https://adarsha.dharma-treasure.org/kdbs/degekangyur?fbclid=IwAR3q8ZABIxXXvH02mW6IFO1D0rFFsy8ZGiKRWTsXJ8ScGz7qhe9udcV3Y8Q.
28. Dege Kangyur (sde dge bka’ ‘gyur). Degé Publishing House. Digitization/ed. Esukhia. 2018. Retrieved from https://www.tbrc.org/#!rid=W4CZ5369.
29. Dege Kangyur (sde dge bka’ ‘gyur). Degé Publishing House. Digitization/ed. Esukhia. 2019. Retrieved from https://github.com/84000/data-translation-memory.
30. Dege Kangyur (sde dge bka’ ‘gyur). Degé Publishing House. Digitization/ed. Esukhia. 2020. Retrieved from https://github.com/Esukhia/derge-kangyur.
31. https://staging.nalanda.works/.
32. Daniel Wakelin. 2014. Scribal Correction and Literary Craft. Cambridge University Press.
33. Prose. Website https://prose.io/.
34. https://github.com/OpenPecha/openpecha-toolkit.
35. https://github.com/OpenPecha/hfml.
36. Overview of the BIBFRAME 2.0 Model. 2016. Website https://www.loc.gov/bibframe/docs/bibframe2-model.html.
37. https://github.com/OpenPecha.
38. Hypothesis. Website https://web.hypothes.is/.
39. Graham Barwell, Chris Tiffin, Phillip Berrie, and Paul Eggert. 2001. The Authenticated Electronic Editions Project. Computing Arts. Retrieved from https://www.academia.edu/708529/The_authenticated_electronic_editions_project.
40. Universal Dependencies. 2017. CoNLL-U Format. Retrieved from https://universaldependencies.org/format.html.
41. Diff-Match-Patch. 2006. Google. Retrieved from https://github.com/google/diff-match-patch.
42. A. Korzybski and R. P. Pula. 2005. Science and Sanity: An Introduction to Non-Aristotelian Systems and General Semantics. Institute of General Semantics, Texas.
43. Eric Miller, Uche Ogbuji, Victoria Mueller, and Kathy MacDougall. 2012. Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services. Library of Congress. Retrieved from https://www.loc.gov/bibframe/pdf/marcld-report-11-21-2012.pdf.
44. BDRC and Esukhia. 2019. The OpenPecha Project: OpenPecha Catalog, OpenPecha Template, OpenPecha Toolkit, OpenPecha Abstract Demo. Retrieved from https://github.com/OpenPecha.
45. Thomas D. Otto, Gary P. Dillon, Wim S. Degrave, and Matthew Berriman. 2011. RATT: Rapid annotation transfer tool. Nucleic Acids Res. 39, 9 (2011), e57.
46. Robert Sanderson et al. (Eds.). 2013. Open Annotation Data Model. Retrieved from http://www.openannotation.org/spec/core/20130208/.
47. Peter L. Shillingsburg. 2006. From Gutenberg to Google. Cambridge University Press.
48. Standoff/RDF Text Markup. 2019. Knora Documentation. Retrieved from https://docs.knora.org/paradox/01-introduction/standoff-rdf.html.
49. John Weinstein. 2020. Field Museum. Retrieved from https://www.eurekalert.org/multimedia/pub/211088.php.
50. Retrieved from https://www.pexels.com/photo/selective-focus-photography-of-octopus-3046629/.
51. Bański. 2010. Why TEI standoff annotation doesn't quite work: and why you might want to use it nevertheless. In Proceedings of Balisage: The Markup Conference, 2010. Vol. 5 of Balisage Series on Markup Technologies.
52. Goecke Lüngen and Metzing Stührenberg. 2010. Different views on markup: Distinguishing levels and layers. In Linguistic Modeling of Information and Markup Languages, Text, Speech and Language Technology. 1–21.
53. Benjamin Geer. 2016. Redesign Standoff. Knora API: Github Issue Thread. Retrieved from https://github.com/dasch-swiss/knora-api/issues/101.
54. Project Gutenberg. Website https://www.gutenberg.org/.
55. Distributed Proofreaders. 2021. Website https://www.pgdp.net/.
56. Gitenberg. Website https://www.gitenberg.org/.
57. Wikibooks. Website https://en.wikibooks.org/wiki/Main_Page.

Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 2
March 2021
313 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3454116

Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 23 April 2021
• Accepted: 1 August 2020
• Revised: 1 May 2020
• Received: 1 November 2019

Qualifiers

• research-article
• Refereed