1D-Touch: NLP-Assisted Coarse Text Selection via a Semi-Direct Gesture

Existing text selection techniques on touchscreen focus on improving the control for moving the carets. Coarse-grained text selection on word and phrase levels has not received much support beyond word-snapping and entity recognition. We introduce 1D-Touch, a novel text selection method that complements the carets-based sub-word selection by facilitating the selection of semantic units of words and above. This method employs a simple vertical slide gesture to expand and contract a selection area from a word. The expansion can be by words or by semantic chunks ranging from sub-phrases to sentences. This technique shifts the concept of text selection, from defining a range by locating the first and last words, towards a dynamic process of expanding and contracting a textual semantic entity. To understand the effects of our approach, we prototyped and tested two variants: WordTouch, which offers a straightforward word-by-word expansion, and ChunkTouch, which leverages NLP to chunk text into syntactic units, allowing the selection to grow by semantically meaningful units in response to the sliding gesture. Our evaluation, focused on the coarse-grained selection tasks handled by 1D-Touch, shows a 20% improvement over the default word-snapping selection method on Android.


INTRODUCTION
Text selection through direct manipulation on touchscreens is known to be difficult due to the Fat Finger Problem [28]. The target acquisition task itself consists of defining a range between the start and end positions. The fundamental interface used for this task involves moving a pair of carets to these positions between words or characters. To mitigate the challenge of direct manipulation with low precision input, previous research focused on introducing indirection to text selection by providing indirect control of caret positions [1,2,13].
Although character-level text selection is important, there are many text manipulation tasks that focus on a larger granularity of text that conveys semantic meanings with words, phrases, clauses, or sentences. Typical examples include highlighting key phrases for note-taking, selecting phrases to be translated, and copying and pasting Named Entities (NE) or other semantic segments. Furthermore, there are increasing needs and opportunities for supporting semantic-based text manipulation, with the rapid rise of speech-based multimodal text manipulation, which often operates on the word level [31], and the breakthroughs in Large Language Models (LLMs) that operate on tokens (i.e., words).
Previous research showed that providing word-snapping for manipulating carets can improve the performance of some text selection tasks [22]. In fact, both Android and iOS have integrated a word-snapping technique in their text selection method, in which users begin with a long-press (or double-tap) on a word to select it, and subsequently, without lifting the finger, drag it to the target. The selection snaps to the end of the word closest to the touch lift position. This technique is integrated seamlessly with the carets for character-level selection by having the handles shown after the word-level selection is finished as the finger first lifts.
Beyond word-snapping, semantic-based text selection includes selecting units of phrase, sentence, and paragraph [12,18] as well as "smart text selection" assisted by Natural Language Processing (NLP) that recognizes user intent [23] or common entities in the text [4], such as addresses, URLs, and phone numbers. Previous research also introduced the concept of "fuzzy text selection" and demonstrated how it can be used for text highlighting in sense-making tasks on mobile [6] and for assisting active reading with multimodal input [16].
Building on existing work, this paper introduces 1D-Touch, a new approach for coarse-grained text selection at the granularity of words and above. Using one-dimensional input, 1D-Touch shifts the concept of text selection, from specifying two 2D positions for defining a range, to selecting one word as the initial target and gradually expanding it to adjacent words to form a syntactic or semantic chunk of selection. Our technique can also be set to expand the selection by semantic units in sizes of words, sub-phrases, clauses, and sentences. We prototyped 1D-Touch by leveraging the constituency parsing in existing NLP tools to chunk text, providing a more gradual and advanced segmentation than basic units that only include words, phrases, sentences, and paragraphs.
The novelty of our approach is twofold: the use of a semi-direct one-dimensional sliding gesture for text selection; and the continuous and multi-scale linguistic chunking of selection units. To test the effects of the gestural control and the semantic chunking respectively, we conducted a controlled experiment that compared variants of 1D-Touch prototypes with expansion by words (WordTouch) and expansion by chunks (ChunkTouch) with the state-of-the-art selection technique on Android with word-snapping (Baseline). Our results showed that both of our 1D-Touch techniques outperformed the baseline significantly in overall performance time by 20%, with ChunkTouch being particularly good for selecting semantically meaningful units.
The three main contributions of this paper include 1) a new approach for text selection that combines direct and indirect manipulation gestures; 2) two 1D-Touch techniques that prototype the concept and enable evaluation of factors contributing to the advanced selection performance; and 3) empirical findings that demonstrate significant performance gain from both the simple semi-direct control and the NLP-assisted text chunking method, as well as provide insights into future improvements and integration with character-level selection methods.

RELATED WORK
This section summarizes existing text selection techniques, many of which have been employed to mitigate the difficulty of controlling carets on touchscreens, such as adding a magnifier [13], using the keyboard as a trackpad [2], or tilting the device to control selection [1]. We summarize these efforts in providing indirect or direct control for caret positioning, as well as other novel approaches leveraging alternative modalities, such as gaze or touch pressure, to extend the users' degrees of freedom of control.

Indirect Control
Applying mapping strategies to enable users to control the handle positions without directly touching them is one remedy to the Fat Finger Problem. The keyboard, as a familiar tool in text editing scenarios, is usually leveraged as the virtual controller [1,2,11,17,30]. Fuccella et al. presented a series of gesture controls for text selection, where the selection range can be extended by swiping two fingers on the keyboard left or right [11]. This method expands the selection by one word per swipe, which is designed for fine-tuning an existing selection during text editing rather than selecting longer targets across multiple sentences. Ando et al. pegged particular keys to navigate the cursor, leaving other keys for extending the selection range [2]. Press & Tilt controls the selection by tilting the phone while pressing a key [1]. However, unconventional tilting gestures require additional practice, which makes them more challenging than conventional touch- and handle-based techniques. Gedit proposes two types of gestures to control the cursor, ring and flick, which were shown to be easy to learn and conducive to efficient revision of selected text [30]. ForceSelect allows users to adjust the text selection granularity on mobile headsets by controlling the force they apply to press on the virtual trackpad with a mapping strategy [18]. Suzuki et al. investigated switching the frame of reference, enabling users to move the rest of the interface around while keeping the caret fixed at the middle of the screen [29]. Evaluations demonstrated significant improvements in selection time compared to the default UI, but only for very small text (12 pt).

Direct Control
A few techniques have been proposed to address the Fat Finger Problem while maintaining direct control over the caret handles. Smart Selection leverages linguistic analysis through an ensemble of learning approaches to determine the users' intentions based on their touchpoints [23]. Multi-step approaches have been leveraged to select precisely with touch: the user begins by providing a rough localization of the target text, after which they can adjust the precision of the selection with a zoom-in interface [7,20,27]. BezelCopy uses bezel gestures, whereby the sentence is selected by dragging from the left bezel to the sentence on the screen [7]. It then creates a new fullscreen page highlighting only the selected sentence with enlarged font and handle sizes, allowing the user to perform the conventional handle-based selection and copy with more confidence. Complete copy-paste workflows were evaluated, and the results showed that the technique outperforms the system-default pipeline for common tasks. Esteves et al. explored the use of touch-based motion matching to indirectly select out-of-reach targets [10]. Swap by Li et al. is useful for text selection during revision by swapping individual words within a given text [20]. After typing in a new word at the end of the text, the user taps the area around the word to be changed, which converts text in its vicinity into enlarged buttons for easy selection. This eliminates the need to focus on the tiny carets during text selection. Additionally, Goguey et al. introduced a force-sensitive selection technique that leveraged a virtual "mode gauge" that displays the touch pressure when performing selection [12]. Different pressure levels are mapped to different levels of chunking, and the gauge visualizes the pressure changes.

Alternative Modalities
While our technique focuses exclusively on touch input, we have also reviewed a few prior works that rely on alternative or multiple modalities to assist in text selection. Several techniques adopt touch pressure and force to provide an additional dimension of input control [12,18,19]. However, while empirical results of these efforts showed benefits for some selection targets, they did not exhibit an overall significant performance gain in terms of selection time and error rate, compared to the standard caret selection [18].
Gaze is also leveraged as additional information to assist text selection and editing [3,26]. However, empirical results also question the effectiveness, e.g., Gaze'N'Touch performs better than the touch-based technique when it comes to larger texts but not to a significant degree [26]. More recently, EyeSayCorrect leverages users' gaze trajectory to implicitly select a word to be edited with voice, which saves text selection time by 40% [31]. However, this technique is dedicated to error correction and can select only one word at a time. Gaze-Shifting, on the other hand, uses gaze to modulate input and combines direct pointing (e.g., through a touch pen) and indirect mapping through the user's gaze that offsets the input to the visual target [24].

Novelty of Our Approach
Prior research has investigated various ways to improve text selection, yet most of them focused on manipulating carets, either directly or indirectly. Although the Fat Finger Problem persists in the standard caret selection technique, new interventions demonstrate limited overall performance gain. Our work takes a different path and focuses on introducing a semi-direct control to improve text selection at the word level and above.
There are two main novelties in this approach. First, indirect input and user control are introduced differently in the technique. Instead of indirectly controlling the position of the cursor or carets, the technique employs a gesture that starts with direct manipulation (touch) on a word and ends with an indirect mapping between distance increment and the number of included semantic entities in the target. Second, we introduce a continuous and generic NLP-assisted text selection method that snaps the ends of the selection to a syntactic or semantic unit, which differs from related works that spot only special entities in the text, such as addresses or phone numbers.

1D-TOUCH TECHNIQUES
Text represents a one-dimensional stream of information that is placed sequentially into a two-dimensional form for printing or displaying on screens. Existing text selection techniques echo this dimensional dissonance as they accomplish the target acquisition by finding the positions of the two ends. Our work, on the other hand, introduces the use of one-dimensional control for text selection with 1D-Touch, a new technique that allows users to select text with a vertical 1D sliding gesture. The user slides up and down to expand or contract the selected range. The expansion and contraction of the selection range are controlled by the sliding distances.
How can we segment text into distinct semantic units that represent chunks of meaning? In cognitive science literature, such "meaning" is often described as the "gist" extracted from a given verbatim, functioning at varying degrees of abstraction [25]. Consequently, we define semantic units of text as chunks that can vary in size and represent meanings at different levels of abstraction. As an initial exploration, we choose the word as the smallest semantic unit, with the syntactic unit serving as a method to chunk text that mirrors its underlying semantics.
We built two variants of the technique that adopt the 1D control method: WordTouch and ChunkTouch. WordTouch employs a simple word-by-word expansion controlled by the sliding distance, while ChunkTouch expands by syntactic units that grow larger in chunks derived from NLP Constituency Parsing. This section introduces how we designed and implemented these two 1D-Touch techniques in detail.

Selecting by 1D Sliding
To start the text selection, the user activates the 1D-Touch techniques by directly long-pressing on a word, highlighting its background. We use the conventional 500 ms threshold for detecting the long-press, which is the same as the standard selection techniques on Android and iOS. Once activated, the user can slide up or down to expand the selection from the first selected word to the words before or after it, as shown in Figure 2. The selection keeps growing as the user continues sliding. Haptic feedback is provided for activation as well as every unitary expansion, similar to the default vibration feedback for long-press on Android and iOS devices.

Mapping the Sliding Distance to the Number of Expansion Units.
We map the sliding distance on the y-axis (as compared to the initial touchpoint, in millimeters) to the number of words (for WordTouch) or syntactic chunks (for ChunkTouch) to be added to the selection.
Equation 1 describes how we determine the number of words or chunks (n) to expand, where PPI is the pixel density of the screen, d is the vertical sliding distance in pixels, and t is the predefined triggering distance of expansion for each technique, in millimeters. Based on our own experience in selecting with 1D-Touch techniques on the device used for the following experiments (with a 6.7-inch screen), we set t_WordTouch = 1.5 mm and t_ChunkTouch = 10 mm for our implementation (Figure 3). We noticed that ChunkTouch expansion needed a longer triggering distance because users needed more time to confirm the next chunk of text to be selected. Shorter triggering distances caused more overshoots. Thus we chose 10 mm as a trade-off between selection efficiency and overshoot frequency.
More research needs to be conducted to optimize these thresholds in the future. They can also be adaptive for different screen sizes and user preferences (possibly through a calibration process to be discussed in Section 6).
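The mapping described above can be sketched in a few lines. This is a minimal illustration rather than the prototype's code: it assumes the mapping is n = floor((d / PPI × 25.4) / t), with d in pixels and t in millimeters, and that a unit is added only once a full triggering distance has been covered. The function name and the `TRIGGER_MM` table are hypothetical.

```python
import math

MM_PER_INCH = 25.4

# Triggering distances reported in the text, in millimeters per expansion unit.
TRIGGER_MM = {"WordTouch": 1.5, "ChunkTouch": 10.0}


def units_to_expand(distance_px: float, ppi: float, trigger_mm: float) -> int:
    """Number of words or chunks to add for a vertical slide of distance_px pixels.

    The sign of the slide (up or down) is ignored here; the caller is assumed
    to use it only to decide whether to expand before or after the selection.
    """
    # Convert the on-screen distance from pixels to millimeters.
    distance_mm = abs(distance_px) * MM_PER_INCH / ppi
    # Assumption: round down, so partial progress toward the next unit adds nothing.
    return math.floor(distance_mm / trigger_mm)
```

On a hypothetical 254 PPI screen (10 px per mm), a 110 px slide covers 11 mm, which corresponds to 7 words with the WordTouch threshold and 1 chunk with the ChunkTouch threshold.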

Rewinding the Overshot Selections.
Overshoot is a common problem in text selection where one selects more than they originally planned to select. 1D-Touch techniques support rewinding to fix an overshoot. After overshooting, before lifting the finger, the user can slide backward to rewind the current selection. The reference point for rewinding is the furthest point they have previously reached, so that the rewinding slide can go beyond the initial touchpoint. Rewinding is always by words, and we use the same mapping relationship as WordTouch for both techniques. As a result, when using ChunkTouch, if the expanded chunk is longer than the text the user intended to select, they can use rewinding as a remedy. ChunkTouch thus supports selecting from any word to any word in a corpus without sacrificing the efficiency of expanding the selection. This differs from other chunking methods in existing works that are limited to selecting up to five fixed levels of granularity: word, phrase, sentence, paragraph, and the whole text [12,20,22].
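The rewinding rule can be illustrated with a small state tracker. This is a sketch under stated assumptions, not the prototype's implementation: distances are already converted to millimeters along the initial slide direction, both counts are rounded down, and re-expanding after a rewind is not modeled. `SlideState` and its fields are hypothetical names.

```python
import math
from dataclasses import dataclass

WORD_TRIGGER_MM = 1.5  # WordTouch mapping, reused for rewinding in both techniques


@dataclass
class SlideState:
    trigger_mm: float         # expansion triggering distance of the active technique
    furthest_mm: float = 0.0  # furthest displacement reached during this gesture

    def update(self, current_mm: float) -> tuple[int, int]:
        """Return (units_expanded, words_rewound) for the current displacement."""
        self.furthest_mm = max(self.furthest_mm, current_mm)
        expanded = math.floor(self.furthest_mm / self.trigger_mm)
        # Rewinding is measured back from the furthest point, so current_mm may
        # become negative when the finger passes the initial touchpoint.
        rewound = math.floor((self.furthest_mm - current_mm) / WORD_TRIGGER_MM)
        return expanded, rewound
```

With a 10 mm ChunkTouch threshold, sliding to 25 mm expands two chunks; sliding back to 21 mm rewinds two words, and the rewind keeps counting if the finger continues past the starting point. A caller would still need to clamp the rewound count to the number of words actually selected.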

Clutching with Increasing Target Sizes.
1D-Touch supports clutching, which allows users to continue expanding an existing selection by sliding again. If one shot of sliding does not expand the selection enough to include all target text, users can lift the finger and start another sliding gesture from anywhere on the selected text. As a result, the selection becomes increasingly easier when the user clutches, as the target area grows bigger. While clutching, the selection always starts with expansion towards the sliding direction, while the user can retract by sliding backward without lifting the finger. Because our technique supports expanding in both directions (up and down), enabling clutching to directly retract would conflict with expanding in the other direction. After expanding, the user can then retract any number of words, even reducing the selection area to be smaller than before the clutching was performed, to correct an overshoot from a previous gesture.

Chunking Text by Syntactic Units
We prototyped ChunkTouch by chunking text in syntactic units, as a first-step approach to infer units that represent meaning. As shown by the example in Figure 4A, the next text chunks to expand can be of any length of phrases, clauses, sentences, etc. We discuss how we identify these syntactic chunks in Section 3.2.2. As the lengths of the expanding chunks vary, which increases the uncertainty for selection actions, we design the visual feedforward and feedback mechanisms to assist using the ChunkTouch technique.
Brackets show future chunks based on the current selection. We introduce brackets, placed before and after the current selection, as a visual feedforward to inform users of the future chunks that will be merged into the current selection. The brackets that are most visible and closest to the current selection indicate the endpoints of the next chunk to be merged, as in Figure 4A.1. The lighter brackets provide a preview of the next level of expansion. We display three sets of brackets that expand from the selected text in this implementation. As selection grows, the adjacent brackets move simultaneously, e.g., in Figure 4A, the closest bracket moves from position 1 to 3, and then to 5. From a different perspective, users can also see the opacity of the brackets change, as in Figure 4A.2, 4, and 5, signifying the approaching expansion to the end of this sentence. For ChunkTouch, we actively compute brackets for every change in the selection and always display them with the selection.
Changing the background highlights the immediate chunk to expand. As the user starts sliding, we color the background of the to-be-merged chunk semi-transparent (Figure 4B). As the sliding distance increases and the expansion gets closer to being triggered, the background of the text chunk becomes less transparent, up to sixty percent of the background density of the selected text, so that there is a discrete visual change when a chunk actually becomes selected. The semi-transparent background works as feedback to actively inform the user of the upcoming chunk and visualize their sliding progress. Together with the feedforward brackets, the color-changing background makes the action status clearer and increases the controllability of ChunkTouch.
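The background feedback can be sketched as a function from sliding progress to opacity. The text only states that the preview grows less transparent as the trigger approaches and is capped at sixty percent of the selected text's background density; the linear ramp, the function name, and the parameter names below are assumptions made for illustration.

```python
def preview_opacity(distance_mm: float, trigger_mm: float,
                    selected_opacity: float = 1.0, cap: float = 0.6) -> float:
    """Opacity of the to-be-merged chunk's background for the current slide.

    Assumption: progress toward the next trigger is the remainder of the slide
    distance after the last triggered expansion, mapped linearly to opacity.
    """
    progress = (distance_mm % trigger_mm) / trigger_mm  # in [0, 1)
    # The preview never reaches the selected background's opacity, so the jump
    # to full opacity marks the moment the chunk is actually selected.
    return cap * selected_opacity * progress
```

Because progress stays strictly below 1, the preview always remains below the 60% cap, and the step up to the selected background remains visible at the moment of expansion.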

Segmenting Text with NLP.
To segment text and identify neighboring syntactic chunks, we leverage an existing constituency parser from CoreNLP, a state-of-the-art NLP model to perform syntactic analysis on the corpus [21,32]. The corpus will first be pre-processed into a tree-like structure that can be used for the following selection activities (Figure 5). The CoreNLP parser divides the corpus by sentences into disconnected trees and assigns a ROOT for each of them. We then added one more whole-text-level ROOT as the parent of them to build a tree that connects all the sentences. We refer to the green part-of-speech (POS) tags for the level of each word. Although the parsing quality can be affected by the syntactic and grammatical accuracy of the original text, we observed that the CoreNLP parser remains robust across various texts and consistently delivers tree structures in a uniform format. The parser demonstrates sufficient robustness for handling in-the-wild user-provided text with typos and grammar errors.
We determine the next syntactic chunks by the user's current selection: we anchor the first and last word of the selected text to find the adjacent chunks, namely siblings, that are positioned before and after the current selection. We identify the sibling chunks to be the adjacent syntactic nodes at the same or closest possible higher levels in the constituency tree. This strategy balances the expansion efficiency and granularity with a presumption that users prefer to expand at the same level of parsing as they have initially selected, e.g., expanding by phrases when a phrase is selected, and expanding by sentences when a sentence is selected. As a result, the more one selects, the faster ChunkTouch expands with larger chunks.
More specifically, we look up the next sibling by finding the node that is adjacent to and shares the same parent with the anchoring node. If none is found, e.g., node NN-fox in Figure 5 has no adjacent node after it, we recursively set the parent of the current anchoring node as the new anchor, and search for its siblings, until reaching the whole-text ROOT, where we know that we have selected the first or last word of the whole corpus. Once an immediate sibling has been found, we place brackets at the respective position or adjust it so that the neighboring punctuation would be included (only for the chunks after). Then, we leverage the end of this chunk as the new anchor, and recursively repeat the steps above, until reaching the corpus ends or the maximum number of brackets needed (3 in the current implementation).
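The sibling lookup described above can be sketched on a hand-built constituency tree. This is an illustrative reimplementation, not the prototype's code: the `Node` class, the helper names, and the example parse are assumptions, the tree is written by hand instead of being returned by CoreNLP, and the punctuation adjustment is omitted.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    word: str | None = None
    parent: Node | None = field(default=None, repr=False)

    def add(self, *kids: Node) -> Node:
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def leaves(node: Node) -> list[Node]:
    """Words under a node, in reading order (a childless node is a word)."""
    if not node.children:
        return [node]
    return [w for child in node.children for w in leaves(child)]


def sibling(node: Node, step: int) -> Node | None:
    """Adjacent sibling (step=+1 after, -1 before); climbs to the parent when
    the anchor has none, and returns None once the whole-text root is reached."""
    while node.parent is not None:
        siblings = node.parent.children
        i = siblings.index(node) + step
        if 0 <= i < len(siblings):
            return siblings[i]
        node = node.parent
    return None


def bracket_positions(root: Node, anchor_index: int, step: int,
                      max_brackets: int = 3) -> list[int]:
    """Word indices where brackets go, starting from the selection's edge word."""
    words = leaves(root)
    anchor = words[anchor_index]
    positions: list[int] = []
    while len(positions) < max_brackets:
        chunk = sibling(anchor, step)
        if chunk is None:  # first or last word of the corpus reached
            break
        chunk_words = leaves(chunk)
        # The far end of the found chunk becomes the next anchor.
        anchor = chunk_words[-1] if step > 0 else chunk_words[0]
        positions.append(words.index(anchor))
    return positions


def token(tag: str, word: str) -> Node:
    return Node(tag, word=word)


# Hand-written parse of "The quick brown fox jumps over the lazy dog ."
np1 = Node("NP").add(token("DT", "The"), token("JJ", "quick"),
                     token("JJ", "brown"), token("NN", "fox"))
np2 = Node("NP").add(token("DT", "the"), token("JJ", "lazy"), token("NN", "dog"))
vp = Node("VP").add(token("VBZ", "jumps"), Node("PP").add(token("IN", "over"), np2))
text = Node("TEXT").add(Node("ROOT").add(Node("S").add(np1, vp, token(".", "."))))

after_fox = bracket_positions(text, 3, +1)    # selection ends at "fox"
after_quick = bracket_positions(text, 1, +1)  # selection ends at "quick"
```

Starting from "fox" (word index 3), the lookup climbs to the noun phrase, finds the verb phrase as its sibling, and places the first bracket after "dog" (index 8) and the second after the final period (index 9). Starting from "quick" it first steps word by word inside the noun phrase, which reflects the idea that expansion stays at the level of the current selection before moving up.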

Implementing a Working Prototype
We implement our technique as a web application with React. Each word and punctuation mark is wrapped in a SemanticNode element. A SemanticBlock contains multiple nodes and represents the current segmentation, which can be merged with other blocks or further divided into smaller units. A selected block renders the selection range in colors. A paragraph is represented by a SemanticText element and included by a SemanticBody element with other paragraphs. Touch-event listeners are bound to each paragraph node to detect selection events.
For ChunkTouch, we use Node.js [8] to run a wrapped version of CoreNLP [5], originally only available in Java. SemanticBody detects text changes and sends the new text, divided into paragraphs, to our server for parsing. The results are usually returned within only a fraction of a second and are then processed by our segmentation algorithm and filters. When waiting for the parsing result, or when an error occurs in parsing the text, our interface places every node at the same depth as an immediate child of the sentence ROOT as a fallback; ChunkTouch then expands by words (adjacent siblings) just like WordTouch.
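The fallback structure can be pictured as a one-level tree. The sketch below is only an illustration of that idea: the dictionary layout, the `TOKEN` label, and the regular-expression tokenizer are assumptions, and the prototype's own tokenization is not described in the text.

```python
import re

# Assumed tokenizer: words (with internal apostrophes or hyphens) and single
# punctuation marks each become one token.
TOKEN_PATTERN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")


def flat_fallback_tree(sentence: str) -> dict:
    """Place every token directly under the sentence ROOT.

    With no intermediate phrase nodes, the adjacent sibling of any token is
    simply the neighboring word, so chunk expansion degrades to word-by-word.
    """
    tokens = TOKEN_PATTERN.findall(sentence)
    return {
        "label": "ROOT",
        "children": [{"label": "TOKEN", "word": t, "children": []} for t in tokens],
    }
```

Once a parse arrives, such a flat tree would be replaced by the parsed one, and expansion would follow syntactic chunks again.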

STUDY
To evaluate the 1D-Touch approach, we conducted a controlled experiment to compare its efficiency and accuracy against a baseline technique for word-level selection. We compared WordTouch and ChunkTouch in separate experimental conditions in order to distinguish the effects of the 1D control and the semantic chunking. Based on our experience with the technique, the following hypotheses are made:
• H1: Both ChunkTouch and WordTouch outperform Baseline in overall performance time.
• H2: ChunkTouch performs better than WordTouch in selecting Semantic Units, while worse in selecting Non-semantic Units.

Baseline
The baseline technique we chose is the default selection technique provided in the Android system. As mentioned in the Introduction section, it has a word-snapping feature activated by a 500 ms long-press on a word as the start word. Without lifting the finger, the user drags it towards the end of the selection. The system actively snaps the end of the selection to the end of the word closest to the touch position. While the selection expands by words, its endpoint moves backward character-by-character when users retract the selection to allow finer adjustment. This word-snapping feature is also integrated with the caret by having the handles shown after users first release their fingers. Users can then further adjust the selection by character with the handles. To our knowledge, this is still the state-of-the-art text selection technique, which is shown to perform better or comparably well in recent literature [9,12]. We refer to this technique as Baseline in this section.
The 9 task types are applied to a collection of textual excerpts that are different in type and length, as well as their positions within the given corpus. Following prior research, these tasks were chosen as they cover a variety of selections that can be required in real-world use [9,12]. Instead of applying the original nine types of tasks like the prior works, we removed character-selection tasks and replaced character-to-end tasks with word-to-end tasks, as our technique is designed to support word level and above granularity of selection.
The selection tasks differ in two main parameters. One is the length: Word < Phrase and Non-phrase (≈ four words) < Clause and Half-sentence (≈ ten words) < Sentence < Two-sentences < Word-to-end < Whole-text. The other parameter is whether the selected text forms a semantically meaningful unit or not, which we will refer to as either Semantic Units or Non-semantic Units herein. We created three additional types of tasks with selection targets that do not form semantic units: Non-phrase in a similar length to Phrase, Half-sentence in a similar length to Clause, and Word-to-end. Since whether the selection target is a semantic unit is likely to affect the performance of ChunkTouch, we investigated this as an independent variable during the result analysis, in order to better understand the effect and the usability of ChunkTouch for tasks that align with the design of the technique and for tasks that do not.
Our technique leverages the same activation method as the baseline and can potentially be integrated with the caret-based character-level adjustment in the same way as well: carets show after the user finishes the sliding gesture and lifts the finger. Although the integration deserves additional testing, we chose to focus this experiment on testing the tasks that depend only on the novel aspects of 1D-Touch and avoid introducing additional factors by reverse-engineering the system-default carets interface, which is required for prototyping the integration.

Tasks and Instructions
For each trial, we display the target text at the top of the screen (Figure 7). We create the selection tasks based on the parameters mentioned above. We created 9 TaskType × 4 repetitions = 36 unique tasks for each Technique, resulting in 108 unique tasks in total. We composed all these texts to maintain similar lengths and structures for the same TaskType and to ensure the syntactic units are easily distinguishable. The texts are kept easy to understand for readers at high school level or above. To avoid any noise introduced by parser errors from CoreNLP, we tested the corpus to ensure the syntactic segmentation returned at runtime was free of error.

Participants
We recruited twelve undergraduate students (five male, seven female) from local universities as participants. None of the participants had knowledge about our techniques before.

Apparatus
We built a web application that depended on a local server (for CoreNLP, experiment monitoring, and data collection) to run the experiment. Due to technical reasons, the first half of the study was conducted on a Samsung Galaxy S20+ (6.7 inch, 3200 × 1440 px) for participants to complete the tasks with a server running on a Windows desktop. The other half of the study ran with a Google Pixel 7 (6.3 inch, 2400 × 1080 px) and a Mac server. In both cases, the server and client communicate through Socket.io. The implementation of ChunkTouch uses Node.js to run a wrapped version of CoreNLP (version 4.5.4) on the server side. No significant difference in results was identified for these two setups.
Instructions and text to be selected were displayed in typical size, font (Roboto Regular), and layout to facilitate reading (Figure 7). The font size is adjusted so that the lowercase 'x' is displayed at 2 mm in height on the screen. The left-aligned target text had a line height of 160%, with side margins of 3 mm from screen edges, and the content was placed 50 mm below the screen top.
[...] Whole-text. Meanwhile, we found no significant difference between the three techniques for Word and Whole-text tasks.

Selection Accuracy
Another major measure of selection performance is the error rate. Correcting errors in selection takes time and affects user experience. A lower error rate means higher selection accuracy. We measured the error rate through NumAttempts and NumOvershoots for each trial.

Selection Attempts.
NumAttempts is defined as the number of times one clears all the selections, by tapping outside the selected area, to restart the process. We found a significant main effect for TaskType on NumAttempts. The post hoc tests showed that the only significant pairwise differences existed between Baseline and WordTouch (M_Baseline = 1.113, M_WordTouch = 1.070, t(11) = 3.570, p < .05), and ChunkTouch and Baseline (M_ChunkTouch = 1.066, M_Baseline = 1.113, t(11) = 3.483, p < .05), for Phrase selections. Participants needed to restart significantly more often when using ChunkTouch than the other two techniques. We believe this is a main contributor to the time cost in ChunkTouch for Phrase tasks because of the 500 ms activation time for every restart.

Overshoot.
Both 1D-Touch techniques and Baseline start the selection at the word level, and expand to the text before and after this selection by either sliding up or down or by dragging the carets. Overshoot, i.e., expanding so far that the selection exceeds the target point, is a common selection inaccuracy that could lead to longer selection time and more selection attempts. Overshoot could be costly for 1D-Touch, especially ChunkTouch. For Baseline, however, it is not a problem as it is easy to correct. As shown in Figure 10B, participants overshot significantly more with Baseline than with our techniques. We did not find any significant difference between WordTouch and ChunkTouch. There is also no difference between Semantic Units and Non-semantic Units for each technique respectively. From our observation, since overshooting is more costly in ChunkTouch, participants tend to avoid it by reducing the touch speed, opting to undershoot.

Subjective User Experience
We collected the perceived task load with the NASA-TLX index [14], asked the participants to rank each technique, and interviewed them about their experience in using three selection techniques.The results are reported below.

Perceived Workload.
According to the collected feedback of a 7-point NASA Task Load Index questionnaire, none of the measured dimensions (Mental demand, Physical demand, Temporal demand, Performance, Effort, and Frustration) showed significant differences between Baseline, WordTouch, and ChunkTouch.

User Preference.
As shown in Figure 11, 9 out of 12 participants ranked WordTouch, one of the 1D-Touch techniques, as the most preferred technique (the first rank). While the participants found Baseline familiar (P0: "already used to it" and P11: "can go letter by letter as I am used to"), most of them preferred WordTouch as it is more efficient, intuitive, and easy to control (P2: "direct and fast" and P8: "intuitive to interact"). For ChunkTouch, some participants agreed that chunking helped them select Semantic Units and longer text (P5: "phrases and sentences were easy to select"). Some appreciated the visual feedback (P4: "the various levels of color and brackets are informative"). Despite the preferences, a chi-squared test on vote results did not show significant differences among the three techniques.
Participants also mentioned frustrations with all techniques. For Baseline, it was harder to select longer pieces of text (P0, P7, P10), extra attention was required to deal with characters and punctuation (P1 and P4), and the tiny carets were hard to grab (P2 and P8). For WordTouch, it was harder to rewind (P0, P2, P5, and P7) and the selection could be occluded by the finger body (P4). For ChunkTouch, some participants (P3-P5) found it hard to rewind due to the reduced rewind speed (i.e., expanding by syntactic chunks but rewinding by words). P3 was also confused by the visualization ("The color changed. So I would think I was selecting the sentences afterward."). Constructive suggestions were also brought up, including adding a zoom-in view as a remedy for occlusion (like the traditional handle-based techniques) and making the sensitivity customizable. The issues and suggestions, which will be further discussed in Section 6, will assist us in developing future iterations of 1D-Touch techniques and in exploring meaning-driven text selection.

Selection Strategies.
We observed several strategies that participants leveraged when using our techniques. P5 found brackets in ChunkTouch helpful in focusing on the adjacent text that is about to be selected ("as it made me look at the chunk rather than the whole text"). P6 found himself more careful when using our techniques due to the cost of rewinding ("ChunkTouch required me to be mindful about where I put my finger after my first selection. When using WordTouch, I struggled to make edits to my initial selection so I ended up just canceling the entire selection."). However, P0 drew the opposite conclusion: "I normally use carets by first deciding a range and then the details, but ChunkTouch can help me determine the range."

Semantic Chunking.
Most participants found the chunking of ChunkTouch "makes sense" (P6), especially in selecting longer text (P6: "chunking more flexibly than only by sentences or paragraphs makes a lot of sense" and P11: "easier to select long targets with chunking"). P6 found the chunking helped him select faster ("Having the system recognize semantically-related words for quicker selection of smaller chunks of text."). P3 also reported the inefficiency of ChunkTouch in selecting non-semantic units ("If you're trying to get parts of a phrase, I didn't like how you had to unselect after selecting something."). The comments indicate that the subjective experience with the interface is consistent with our quantitative findings that ChunkTouch performs better in selecting semantic units while having trade-offs in selecting non-semantic units.

Visual and Haptic Feedback.
The participants appreciated the various feedback provided in our techniques, including the brackets and color changes for ChunkTouch, as well as haptic feedback for both techniques. Many participants appreciated the visual design (P7: "color changing assisted me in making selection more accurately"). P3, however, found it a bit confusing and suggested "changing the color altogether rather than just having a lighter color". In addition, P10 found the brackets confusing ("I'm always confused by where the bracket is."). P6 suggested making the "anchor points" (i.e., brackets) more salient.
Most participants found the haptic feedback for both WordTouch and ChunkTouch helpful (P1: "The vibration was helpful, as you can feel what you have selected even before seeing the visual feedback."). P6 preferred haptic feedback and thought it improved the experience of indirect control ("Since my finger isn't directly on top of what I'm selecting, it's nice to have some haptic feedback, other than visual feedback, to tell me I'm doing something.").

User Acceptance and Use Cases.
We asked participants if they wanted to use WordTouch or ChunkTouch for daily text selection. Eight out of 12 participants wished to use WordTouch on a daily basis (P5: "selecting some words when sending emails to others" and P11: "WordTouch is probably best for smaller selections where selecting like one word at a time is useful."). Seven participants claimed that they wished to employ ChunkTouch in various daily activities, especially when selecting longer text (P0: "quoting some longer sentences" and P6: "working with a lot of for copy and paste"). While ChunkTouch has a learning curve, most participants believed it would become more convenient to use after sufficient practice.

DISCUSSION
In this work, we introduce the 1D-Touch approach, which uses one-dimensional semi-direct sliding gestures for text selection at the granularity of the word level and above. Our study showed that our approach outperformed the state of the art in efficiency and was on par with it in accuracy. Syntactic chunking segments the text into units, from words to larger entities parsed through NLP. While our choices of control (sliding distance) and chunking method (constituency parsing) can find alternatives in the future, evaluating these two example techniques, WordTouch and ChunkTouch, allowed us to gain insights into the 1D-Touch approach.

Text Selection as Linguistic Object Expansion
The fundamental conceptual shift 1D-Touch brings is that word-level text selection can be less about finding the positions of the first and last words (or characters) on the screen, and more about growing the selected text object with syntactically or semantically associated units. This may be counter-intuitive for beginners, as most people are used to position-based text selection, but it is easy to get used to based on our experience and user study observations. Multiple clutching also gets increasingly easier with each consecutive stroke as the expanded text increases the target area for the finger to touch. 1D-Touch's control also makes it potentially more suitable for selecting longer text where the target ends off-screen on a scrollable interface. With the baseline, the off-screen selection is time-based: users have to hold the handle at the edge of the screen and wait for the page to roll up while aiming to stop at a good position.
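To make the idea of growing a selection through syntactically associated units concrete, the sketch below lists the successively larger constituents that enclose a touched word in a bracketed constituency parse. The parse string `TREE` is hand-written for illustration rather than produced by a parser, and the helper names `parse_spans` and `expansion_levels` are hypothetical; whether the ChunkTouch prototype traverses its parse in exactly this way is an assumption.

```python
from typing import List, Tuple

# Hand-written Penn Treebank-style parse used only for illustration.
TREE = ("(S (NP (DT the) (JJ quick) (NN fox)) "
        "(VP (VBD jumped) (PP (IN over) (NP (DT the) (NN dog)))))")


def parse_spans(tree: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Return the words and the (start, end) word range of every constituent."""
    tokens: List[str] = []
    spans: List[Tuple[int, int]] = []
    stack: List[int] = []      # start index of each open constituent
    expect_label = False       # the piece right after "(" is a label
    for piece in tree.replace("(", " ( ").replace(")", " ) ").split():
        if piece == "(":
            stack.append(len(tokens))
            expect_label = True
        elif piece == ")":
            spans.append((stack.pop(), len(tokens)))
        elif expect_label:
            expect_label = False   # skip labels such as NP or VBD
        else:
            tokens.append(piece)
    return tokens, spans


def expansion_levels(tree: str, word_index: int) -> List[str]:
    """Constituents enclosing the touched word, from smallest to largest."""
    tokens, spans = parse_spans(tree)
    # Constituents that contain the same word are nested, so sorting by
    # length orders them from the inside out; the set removes duplicates.
    enclosing = sorted({s for s in spans if s[0] <= word_index < s[1]},
                       key=lambda s: s[1] - s[0])
    return [" ".join(tokens[a:b]) for a, b in enclosing]


for level in expansion_levels(TREE, word_index=6):   # touch "dog"
    print(level)
```

Starting from "dog", the levels grow to "the dog", "over the dog", "jumped over the dog", and finally the whole sentence, which mirrors the expansion from a word to larger syntactic units.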

Semi-Direct Input with Discrete CD Gain
A unique characteristic of this technique is that a selection starts with a direct touch on the first word and ends at a relative position based on indirect control. It is known in the literature that both direct and indirect input methods have unique advantages [15]. For example, direct manipulation allows users to input naturally and intuitively, while indirect input enables manipulating out-of-reach targets with enhanced precision. Our method allows an intuitive direct selection of the starting point, while the endpoint is specified indirectly by growing the target, thus mitigating the Fat Finger Problem. This transition from direct to indirect control is seamless and does not need mode switching or using additional interface components, leveraging the benefits of both input methods. However, one minor issue we observed from the study was that the upward sliding, for either expanding or rewinding, can cause occlusion posed by the finger body, instead of its tip. This problem was not as visible in WordTouch as in ChunkTouch, where the user needs to actively preview the next chunks to be merged.
Another interesting phenomenon ChunkTouch brings to the design space of input control techniques is its non-static Control-Display gain with a discrete change pattern as a function of the sliding distance.This introduces theoretical questions to target acquisition when semantics or other discrete parameters are involved.
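The discrete gain can be illustrated with a small sketch. The step sizes of 1.5 mm per word and 10 mm per chunk are those reported in Fig. 3; the function names and the example chunk sizes are hypothetical and chosen only to show how the number of words gained per millimetre changes from one step to the next.

```python
from typing import Sequence

WORD_STEP_MM = 1.5    # WordTouch: one more word per 1.5 mm of sliding (Fig. 3)
CHUNK_STEP_MM = 10.0  # ChunkTouch: one more chunk per 10 mm of sliding (Fig. 3)


def units_expanded(slide_mm: float, step_mm: float) -> int:
    """Whole units added after sliding `slide_mm` from the touch-down point.

    The sign of `slide_mm` (up or down) only picks the expansion direction,
    so the absolute distance is used here.
    """
    return int(abs(slide_mm) // step_mm)


def words_selected(slide_mm: float, step_mm: float,
                   unit_sizes: Sequence[int]) -> int:
    """Words added, given the word count of each successive unit."""
    steps = min(units_expanded(slide_mm, step_mm), len(unit_sizes))
    return sum(unit_sizes[:steps])


# Hypothetical chunk sizes: the next three chunks hold 2, 4, and 9 words.
chunk_sizes = [2, 4, 9]
for distance in (5.0, 10.0, 20.0, 30.0):
    print(distance, words_selected(distance, CHUNK_STEP_MM, chunk_sizes))

# WordTouch over the same text: every unit is exactly one word.
print(words_selected(30.0, WORD_STEP_MM, [1] * 15))
```

With WordTouch each step always adds one word, so the gain is constant. With ChunkTouch the same 10 mm of sliding adds 2, then 4, then 9 words in this example, so the gain is a step function that depends on the text being selected.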

Outlook for NLP-Assisted Text Selection
We believe coarse-grained text selection and NLP-assisted text selection can go hand in hand towards enabling new ways of interacting with text. Use cases of coarse-grained text selection include quick highlighting, copy-paste, translation, and selection of structured text (e.g., code). These can be done with the syntactic parsing used in our techniques. Reducing the input precision needed in text selection can help smartphone users interact with text on the go, as well as benefit low-vision user groups.
Furthermore, as language AI rapidly develops, the underlying semantic structure of text will be analyzed more accurately and concretely. Future work could investigate the use of Large Language Models to extract semantic structures from text, perhaps even in a customizable or interactive way, instead of the syntactic chunking used in our work. Effectively chunking text could help users consume and process large amounts of text. For instance, chunking text in transcripts of verbose speech may help users segment content based on topics or points. Chunking the text produced by generative AI may help improve the efficiency of users reviewing and processing the abundant content generated by the models.

Beyond Touchscreen and 2D Display
1D-Touch is shown to be efficient and easy to learn and perform thanks to its simple up-and-down sliding gesture that doesn't require extra hardware or interface components. Using such 1D control for text selection not only benefits touchscreen input but also can be used in other display environments. We argue that this approach shines even more when used in environments where position-based target acquisition is challenging to achieve. For example, we can imagine our approach being used on smart glasses with a small input device like the ring mouse, or on smartwatches where targets easily go off-screen. It could also be deployed in virtual reality environments where text is displayed as floating 3D objects in non-flat layouts, making it harder to control the cursor by position in the space. To learn about the gain and cost of 1D-Touch in different display and control environments, new prototypes for such environments need to be developed and evaluated in the future.

Limitations
One main limitation of the current ChunkTouch prototype is the choice and implementation of the chunking method. Although we propose 1D-Touch techniques to be meaning-driven, our prototype doesn't extract meanings directly. Instead, ChunkTouch uses syntactic units to infer semantics, which tends to be limited. The CoreNLP constituency parser also has its limitations: it only looks up constituents and provides parsing results regardless of their semantic or even syntactic correctness. It only works relatively well when the sentence itself is grammatically regular and syntactically correct, which is not the case in many real-world scenarios. While the chunking methods need to be improved in the future, we worked around these limitations by ensuring the syntactic correctness of the text used in our experimental tasks.
1D-Touch is relatively sensitive to where the selection target is located. The sliding gesture needs enough vertical room to perform. Sometimes the text is near the top or the bottom of the screen, which leaves limited space and may force repetitive clutching. We also noticed that when the first word of the selection target is very short, like "I" or "a," users sometimes struggled to select it. While expert users may realize they do not need to start from the first word since the expansion is bidirectional, beginners tend not to realize this. Although we believe our technique can be integrated with char-level selection with carets in the same way as the baseline, further studies could be conducted to assess the potential cost. One difference would be that for the baseline, the position where the finger lifts after word-level selection is close to the caret position, whereas in our case, the finger is further away due to the indirect control, potentially costing extra time for the finger to travel back to the caret position.

CONCLUSION AND FUTURE WORK
Our work contributes a novel concept for text selection as a gradual expansion and contraction process of semantic text objects and adopts a 1D semi-direct input method to control it. This conceptual shift benefits from the increasing capability of language AI in understanding the semantic structure of text. Our work makes the first step to facilitate semantic manipulation of text by supporting coarse-grained text selection, which is a fundamental operation for most higher-level interactions with text. Our study showed that the 1D-Touch approach significantly improved on the state of the art for coarse-grained selection by 20%, and we attribute most of the performance gain to the simple 1D semi-direct control. 1D-Touch can be used as a standalone technique in coarse-grained text selection scenarios, or as a complementary method used in combination with the carets for character-level adjustment. While WordTouch outperforms in most scenarios, ChunkTouch was shown to serve better for selecting semantically meaningful units.
In future work, we plan to improve the ChunkTouch technique with enhanced and optimized chunking methods given different text target types and selection tasks. We will try to provide ways to customize the sensitivity of distance control with empirical usage data. Different ways of combining WordTouch and ChunkTouch to leverage the advantages of both for varied selection tasks can also be prototyped. For example, sliding with one finger could drive WordTouch, while sliding with two fingers could activate ChunkTouch. We will also explore the mechanism in other display environments where reducing the need for input precision is beneficial, such as tiny smartwatch screens and virtual environments.

Fig. 2. 1D-Touch selects by sliding up or down. Activated by a conventional long-press, the word under the touch point gets selected as the initial state (B). When sliding up, the selection expands by words (WordTouch) or syntactic chunks (ChunkTouch) to the text before (A); and vice versa for sliding down (C).

Fig. 3. WordTouch (A) expands to include the next word when sliding every 1.5 mm, and ChunkTouch (B) merges the next chunk when sliding every 10 mm.

Fig. 4. (A) The brackets shift as the selection changes. (B) Changing backgrounds visualize sliding progress.

Fig. 5. An example NLP Constituency Parsing process and its result, as visualized by the brackets.

Visual Feedforward and Feedback Design.
Extra feedforward and feedback visualizations are provided for ChunkTouch to facilitate the gestural control and text selection.

Fig. 6. Examples of a target piece of text for each TaskType, for semantic units and non-semantic units.

Fig. 7. The screenshot of an example task in the study in the ChunkTouch condition.

Fig. 9. Performance time for each technique.

Fig. 10. (A) Performance time for each technique, for semantic and non-semantic targets. (B) The number of overshoots for each technique, for semantic (solid) and non-semantic (shaded) targets.