Abstract
Digital text is quickly becoming essential to modern daily life. The article you are reading right now was born digital; unlike texts of the not-so-distant past, it may never be printed at all. Worldwide, the trend is clear: Digital text is on its way in, and print is on its way out. Year by year, more readers are turning to ebooks, internet news, and other forms of ereading, while with each generation, print becomes less relevant.¹
¹ Pew Research Center data show that 50% of Americans have a dedicated ereading device, with ereadership growing year on year [1]; industry research likewise shows a clear trend toward ereading and non-traditional publishing, with ebooks making up 50% of fiction reading in 2016 [2], and journalism is also moving online [3].
These trends are not unique to English. To meet the demands and expectations of today's readers, Tibetan texts, too, are being digitized by many organizations and institutions that share an appreciation for the Tibetan literary heritage, among them secular publishers, monastic institutions, and Buddhist foundations. But while these organizations share common goals for common texts, their work is too often entirely disconnected from the wider community.
This situation harms what is already a minoritized and under-resourced language. While competition, both from other languages and among publishers in the Tibetan etext world, has driven innovation in the adoption of ereading technology, we believe that a rich, shared data source is not only in everyone's best interest but also the only practical way forward, given the time, effort, expertise, and money that quality digitization requires.
That is why we designed OpenPecha as a public, open platform for collaborative etext curation and annotation sharing. Its aim is to provide a wide range of users with the latest version of exactly the “view” of a text they need, while preserving the integrity of the text and its annotations and still allowing the community to contribute improvements and additions. In this article, we describe how the project came to be, what it is, and how it works, and we present a few common use cases.
- Kathryn Zickuhr and Lee Rainie. 2014. Tablet and E-reader Ownership. Pew Research Center. Retrieved from https://www.pewresearch.org/internet/2014/01/16/tablet-and-e-reader-ownership/.
- Steve Bohme. 2017. Books and Consumers in 2016. Nielsen Book Research. Retrieved from https://quantum.londonbookfair.co.uk/RXUK/RXUK_PDMC/responsive/images/2017/Steve%20Bohme%20-%20The%202016%20Book%20Market%20Highlights%20from%20the%20Books%20and%20Consumers%202016%20Survey.pdf?v=636257834636180655.
- Paul Grabowicz. 2020. The Transition to Digital Journalism. Berkeley Advanced Media Institute. Retrieved from https://multimedia.journalism.berkeley.edu/tutorials/digital-transform/.
- V. Alfano (Ed.) and A. Stauffer (Series Ed.). Virtual Victorians: Networks, Connections, Technologies (1st ed.). Palgrave Macmillan.
- Paul Eggert. 2016. The book, the e-text and the 'work-site'. In Text Editing, Print and the Digital World, Marilyn Deegan and Kathryn Sutherland (Eds.). Literary and Linguistic Computing, Vol. 25, 360–362. DOI: 10.1093/llc/fqq009.
- Text Encoding Initiative (TEI). 2021. Guidelines for Electronic Text Encoding and Interchange. Retrieved from https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-standOff.html.
- Javier Pose Rodriguez. 2013. A Generic Formalism for Encoding Standoff Annotations in TEI. Retrieved from https://sourceforge.net/p/tei/feature-requests/_discuss/thread/1e0e4acb/13db/attachment/A%20Generic%20Formalism%20for%20Encoding%20Standoff%20annotations%20in%20TEI.pdf.
- Nancy Ide, Christian Chiarcos, Manfred Stede, and Steve Cassidy. 2017. Designing Annotation Schemes: From Model to Representation.
- https://www.knora.org/.
- Desmond Allen Schmidt. 2016. Using standoff properties for marking-up historical documents in the humanities. Inf. Technol. 58, 2 (2016), 63–69.
- See RATT, retrieved from http://ratt.sourceforge.net/, or Geneious, retrieved from https://www.geneious.com/.
- Stephen M. Miller and Robert V. Huber. 2004. The Bible: A History. Good Books.
- Bruce M. Metzger. 1977. The Early Versions of the New Testament: Their Origin, Transmission and Limitations. Oxford University Press.
- James Strong. 1890. The Exhaustive Concordance of the Bible. Jennings & Graham, Cincinnati.
- Logos Bible Software. 2020. Website https://www.logos.com/.
- The Sefaria Library. Website https://www.sefaria.org/texts.
- Bod lJongs Rig gNas Dra ba. Website http://zw.tibetculture.org.cn/.
- Buddhist Digital Resource Center. 2017. Website https://www.tbrc.org/.
- Dharma Downloads. 2020. Website http://www.dharmadownload.net/.
- Dharma Ebooks. 2021. Website https://dharmaebooks.org/.
- Drepung Gomang Library Ebook Portal. Website http://www.gomanglibrary.com/.
- Sakya Digital Library. Website http://sakyalibrary.com/Home/Index.
- Sera Jey Rigzod Chenmo. 2021. Website https://www.serajeyrigzodchenmo.org/.
- Tibetan Ebooks. 2019. Website http://tibetanebooks.com/.
- Timeless Treasuries. 2021. Website http://dharmacloud.tsadra.org/library/.
- Dege Kangyur (sde dge bka' 'gyur). Degé Publishing House. Digitization/ed. Esukhia. 2014. Retrieved from http://www.thlib.org/encyclopedias/literary/canons/kt/catalog.php#cat=d.
- Dege Kangyur (sde dge bka' 'gyur). Degé Publishing House. Digitization/ed. Esukhia. 2015. Retrieved from https://adarsha.dharma-treasure.org/kdbs/degekangyur.
- Dege Kangyur (sde dge bka' 'gyur). Degé Publishing House. Digitization/ed. Esukhia. 2018. Retrieved from https://www.tbrc.org/#!rid=W4CZ5369.
- Dege Kangyur (sde dge bka' 'gyur). Degé Publishing House. Digitization/ed. Esukhia. 2019. Retrieved from https://github.com/84000/data-translation-memory.
- Dege Kangyur (sde dge bka' 'gyur). Degé Publishing House. Digitization/ed. Esukhia. 2020. Retrieved from https://github.com/Esukhia/derge-kangyur.
- https://staging.nalanda.works/.
- Daniel Wakelin. 2014. Scribal Correction and Literary Craft. Cambridge University Press.
- Prose. Website https://prose.io/.
- https://github.com/OpenPecha/openpecha-toolkit.
- https://github.com/OpenPecha/hfml.
- Overview of the BIBFRAME 2.0 Model. 2016. Website https://www.loc.gov/bibframe/docs/bibframe2-model.html.
- https://github.com/OpenPecha.
- Hypothesis. Website https://web.hypothes.is/.
- Graham Barwell, Chris Tiffin, Phillip Berrie, and Paul Eggert. 2001. The Authenticated Electronic Editions Project. Computing Arts. Retrieved from https://www.academia.edu/708529/The_authenticated_electronic_editions_project.
- Universal Dependencies. 2017. CoNLL-U Format. Retrieved from https://universaldependencies.org/format.html.
- Diff-Match-Patch. 2006. Google. Retrieved from https://github.com/google/diff-match-patch.
- A. Korzybski and R. P. Pula. 2005. Science and sanity: An introduction to non-aristotelian systems and general semantics. Institute of General Semantics, Texas.
- Eric Miller, Uche Ogbuji, Victoria Mueller, Kathy MacDougall. 2012. Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services. Library of Congress. Retrieved from https://www.loc.gov/bibframe/pdf/marcld-report-11-21-2012.pdf.
- BDRC and Esukhia. 2019. The OpenPecha Project: OpenPecha Catalog, OpenPecha Template, OpenPecha Toolkit, OpenPecha Abstract Demo. Retrieved from https://github.com/OpenPecha.
- Thomas D. Otto, Gary P. Dillon, Wim S. Degrave, and Matthew Berriman. 2011. RATT: Rapid annotation transfer tool. Nucleic Acids Res. 39, 9 (2011), e57.
- Robert Sanderson et al. (Eds.). 2013. Open Annotation Data Model. Retrieved from http://www.openannotation.org/spec/core/20130208/.
- Peter L. Shillingsburg. 2006. From Gutenberg to Google. Cambridge University Press.
- Standoff/RDF Text Markup. 2019. Knora Documentation. Retrieved from https://docs.knora.org/paradox/01-introduction/standoff-rdf.html.
- John Weinstein. 2020. Field Museum. Retrieved from https://www.eurekalert.org/multimedia/pub/211088.php.
- Retrieved from https://www.pexels.com/photo/selective-focus-photography-of-octopus-3046629/.
- Bański. 2010. Why TEI standoff annotation doesn't quite work: and why you might want to use it nevertheless. In Proceedings of Balisage: The Markup Conference, 2010. Vol. 5 of Balisage Series on Markup Technologies.
- Goecke, Lüngen, Metzing, and Stührenberg. 2010. Different views on markup: Distinguishing levels and layers. In Linguistic Modeling of Information and Markup Languages, Text, Speech and Language Technology. 1–21.
- Benjamin Geer. 2016. Redesign Standoff. Knora API: GitHub Issue Thread. Retrieved from https://github.com/dasch-swiss/knora-api/issues/101.
- Project Gutenberg. Website https://www.gutenberg.org/.
- Distributed Proofreaders. 2021. Website https://www.pgdp.net/.
- Gitenberg. Website https://www.gitenberg.org/.
- Wikibooks. Website https://en.wikibooks.org/wiki/Main_Page.
Taming the Wild Etext: Managing, Annotating, and Sharing Tibetan Corpora in Open Spaces