Abstract

We present a concept and tool for studying language use in everyday mobile text communication (e.g. chats). For the first time, our approach enables researchers to collect comprehensive data on language use during unconstrained natural typing (i.e. no study tasks) while preserving privacy by not logging readable messages. We achieve this with a combination of three customisable text abstraction methods that run directly on participants' phones. We report on our implementation as an Android keyboard app and on two evaluations: first, we simulate text reconstruction attempts on a large text corpus to inform conditions for minimising privacy risks; second, we assess people's experiences in a two-week field deployment (N=20). We release our app as an open source project to the community to facilitate research on open questions in HCI, Linguistics and Psychology. We conclude with concrete ideas for future studies in these areas.
LanguageLogger: A Mobile Keyboard Application for Studying Language Use in Everyday Text Communication in the Wild