LanguageLogger: A Mobile Keyboard Application for Studying Language Use in Everyday Text Communication in the Wild

Published: 18 June 2020

Abstract

We present a concept and tool for studying language use in everyday mobile text communication (e.g. chats). For the first time, our approach enables researchers to collect comprehensive data on language use during unconstrained natural typing (i.e. without study tasks) while preserving privacy: no readable messages are logged. We achieve this by combining three customisable text abstraction methods that run directly on participants' phones. We report on our implementation as an Android keyboard app and on two evaluations. First, we simulate text reconstruction attempts on a large text corpus to inform conditions for minimising privacy risks. Second, we assess people's experiences in a two-week field deployment (N=20). We release our app as an open source project to the community to facilitate research on open questions in HCI, Linguistics and Psychology. We conclude with concrete ideas for future studies in these areas.