Abstract
Worldwide, literacy is on the rise. This historically unprecedented surge, especially over the past 200 years, has changed nearly everything about the ancient technology of reading. Who reads is changing: Literacy is no longer just for elite, professional readers, but for anyone and everyone. What and why we read is changing: We do not just read difficult texts for academic, religious, legal, or record-keeping purposes; we also read easy texts to be entertained, to access information, and to communicate with each other on a daily basis. And how we read is changing: Memorization, recitation, and oral performance have given way to a rapid, silent, individual activity.
Many of these democratizing changes have been made possible by technology, including advances in methods and materials, such as paper, the printing press, and the digital revolution, that have made reading and writing easy, cheap, and widely available. But perhaps the biggest reason literacy has become so widespread is its ability to reach people in their own natural languages. More recently, NLP tools such as readability editors have furthered this progress by helping authors, journalists, and other writing professionals produce simple, clear content suitable for both beginning readers and broad audiences.
To that end, this article introduces a new readability tool, "Dakje," alongside a specific use case, and demonstrates how it can benefit literacy in the Tibetan languages. This NLP software works by segmenting Tibetan text into words and analyzing those words against level lists derived from frequency analysis of corpora. Users then have instant access to statistics on the readability of their word choices, so they can revise toward easy-to-read text. In our test case, compared with an original English-to-Tibetan translation, Dakje helped us reduce sentence complexity by 34% and total word count by 10%, and cut non-level vocabulary use from 16% to 1%.
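As a rough illustration of the level-list analysis described above, the sketch below assumes the text has already been segmented into words (segmentation itself is not shown) and that a level list is a plain mapping from word to level number. The function names `level_profile`, `non_level_share`, and `percent_reduction`, and the Wylie-transliterated sample words and levels, are hypothetical placeholders, not Dakje's actual code or data.

```python
from collections import Counter
from typing import Dict, List, Optional


def level_profile(words: List[str], level_list: Dict[str, int]) -> Dict[Optional[int], float]:
    """Return the percentage of word tokens at each level.

    The key None collects tokens found on no level list
    (the "non-level" vocabulary).
    """
    if not words:
        return {}
    counts = Counter(level_list.get(word) for word in words)
    total = len(words)
    return {level: 100.0 * n / total for level, n in counts.items()}


def non_level_share(words: List[str], level_list: Dict[str, int]) -> float:
    """Percentage of tokens that do not appear in the level list."""
    return level_profile(words, level_list).get(None, 0.0)


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from one draft to the next, in percent."""
    if before == 0:
        raise ValueError("'before' must be non-zero")
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Hypothetical level list: word -> level (1 = most frequent / easiest).
    # Placeholder Wylie-transliterated entries, not real Dakje data.
    levels = {"kha lag": 1, "za": 1, "yag po": 2, "ri bong": 3}
    # A sentence assumed to be already segmented into words.
    words = ["ri bong", "gis", "kha lag", "yag po", "za", "song"]
    print(f"non-level share: {non_level_share(words, levels):.1f}%")  # 33.3%
    print(f"word-count reduction: {percent_reduction(100, 90):.1f}%")  # 10.0%
```

A before-and-after comparison such as the reported drop in non-level vocabulary is the kind of calculation these helpers make when run over two drafts. The abstract does not say how Dakje measures sentence complexity, so that metric is not sketched here.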
Grading Tibetan Children’s Literature: A Test Case Using the NLP Readability Tool “Dakje”