Effect of Stimuli Shape on Visual Search Performance: A Selective Literature Review

Background: Visual search tasks involve identifying target objects within a visual scene. Two fundamental visual features in such tasks are stimulus motion and shape. However, the research literature on their effects on visual search performance, and on the mechanisms that might underlie their influence, has not been reviewed. Objective: The aim of this review is to fill that gap by reviewing the impact of stimulus motion and shape on visual search performance. Method: We systematically reviewed and analyzed relevant studies from two primary perspectives: the relationship between stimulus shape and visual search performance, and theoretical models explaining the underlying mechanisms. Results: The effects of shape on search performance differ across motion conditions. Feature integration theory suggests that basic visual features are integrated serially through focused attention, whereas the Guided Search model holds that an activation map guides attention, resulting in an efficient search. Visual salience theory suggests that low-level features are integrated into a salience map before further processing. Conclusion: Stimulus shape may enhance or impair search performance under different conditions. This review provides valuable references for practice and further research in the area of visual search.


INTRODUCTION
Visual search, as defined, involves seeking objects within a visual scene [1]. Given that the majority of externally acquired human information is visual in nature, visual search holds significant importance in both daily routines and professional contexts [2]. This review concentrates on the associations between stimulus shape and visual search performance, along with the mechanisms underlying this influence. We chose shape for three reasons. First, shape is a basic visual feature of stimuli whose influence on visual search performance many researchers have studied. Second, shape is a complex visual dimension that can be coded into sub-feature dimensions; it is probably the most problematic basic feature in visual search because there is no common definition of shape or a widely accepted layout of shape space [1]. Third, shape combined with motion connects laboratory tasks to the real world. Unlike static search (i.e., search in which the stimulus remains static), many practical visual search tasks, such as driving and playing video games, are performed under dynamic conditions. This paper reviews and summarizes studies on the impact of stimulus shape on visual search performance and evaluates theoretical models that elucidate the effects of visual features on search performance. We discuss shape's influence on static and dynamic visual search performance, and three representative theoretical models that provide insights into how stimulus shape influences visual search performance.

STIMULUS SHAPE AND VISUAL SEARCH PERFORMANCE
Shape, like motion, is a basic visual feature. To some extent, shape is one of the most complex visual dimensions [1]. Various concepts have been proposed to describe shape, such as aspect ratio and curvature, but to date no universal definition of shape has been developed [3]. A significant reason why the shape of an object is difficult to define is that its local part features can combine and interact with other shape features [4]. Pramod and Arun (2016) studied how object shape attributes are combined in visual perception and proposed a simple additive rule for visual search: the perceived distinction between two objects results from the additive combination of local part differences, texture variations, and global property disparities. Huang (2020) explored the fundamental dimensions of shape features and defined the Segmentability-Compactness-Spikiness shape space, which can effectively differentiate 2D shapes within the existing framework [5]. In subsequent research [6], this shape classification also emerged as a pivotal feature in the framework of visual attention processing factors. The perception of shape is a multifaceted process requiring several different psychological functions [7]. The processing of shape information can be divided into two stages [8]. In stage 1, the shape is filtered by local linear filters tuned for orientation, spatial frequency, and other characteristics. In stage 2, higher-level mechanisms integrate these signals into overall shape information. Shape perception is influenced by the separation between items and by orientation, two vital features of shape [9]. In visual search tasks, reaction time is modulated by the structural relations between targets and their distractors. Notably, single-feature targets whose shape properties are linearly separable from those of their distractors yield significantly faster search rates than linearly separable targets composed of a conjunction of distractor features [3].
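The two-stage account above can be illustrated with a minimal numpy sketch. The 3×3 oriented gradient kernels and the binary bar image are hypothetical stand-ins for the local linear filters and stimuli discussed in the literature; this is an illustration of the idea, not any specific published model.

```python
import numpy as np

# Stage 1: local linear filters tuned for orientation (hypothetical 3x3 kernels).
FILTERS = {
    "vertical": np.array([[-1, 0, 1]] * 3, dtype=float),      # responds to vertical edges
    "horizontal": np.array([[-1, 0, 1]] * 3, dtype=float).T,  # responds to horizontal edges
}

def conv2_valid(img, kernel):
    """Plain 'valid' 2-D correlation for small arrays."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def shape_descriptor(img):
    """Stage 2: pool rectified filter responses into a coarse orientation
    histogram summarizing overall shape information."""
    energy = {name: np.abs(conv2_valid(img, k)).sum() for name, k in FILTERS.items()}
    total = sum(energy.values()) or 1.0
    return {name: e / total for name, e in energy.items()}

# A vertical bar: its contour energy is dominated by vertical edges.
bar = np.zeros((8, 8))
bar[:, 3:5] = 1.0
desc = shape_descriptor(bar)
```

For the vertical bar, `desc["vertical"]` dominates, matching the intuition that stage-2 pooling recovers the stimulus's global orientation from stage-1 local filter responses.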

Shape and Static Visual Search Performance
To investigate shape's relationship with visual search performance, numerous techniques for representing and describing shapes have been developed; these fall into two categories: contour-based and region-based methods [10]. Contour-based approaches are more prevalent because contours are a crucial source of shape information for the human visual system [11]. Shape similarity is one contour-based measure used in many visual search studies. As the similarity between the target and distractors increases and the similarity among distractors decreases, search performance declines [12]. High target-distractor similarity and low distractor-distractor similarity mean that many areas of the display are salient, reducing the likelihood of attending to the target early on [13]. High shape similarity among distractors benefits perceptual grouping, which results in higher search efficiency [14]. Gerlach et al. (2006) proposed a four-stage shape recognition model to explain shape's influence on visual search behavior [15]. The first stage registers primitive visual features. The second stage derives contour elements and other units. The third stage is an intermediate phase of shape configuration, in which local and global shape characteristics are integrated into more comprehensive shape descriptions corresponding to complete objects or substantial object parts. In the final stage, shape descriptions are compared with representations stored in long-term visual memory. While shape similarity might be advantageous during shape configuration, it could be detrimental for shape selection. Current research also examines the differences between 3D and 2D shapes. In a study by Zhang et al. (2015) on visual search involving cubes, cube search proved less challenging than searches under comparable hexagon and rhombus conditions. This phenomenon could potentially be attributed to the extensive array of 2D image statistics inherent to cubes, rather than to the characteristics of 3D scenes [16].
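The similarity principle above (harder search with high target-distractor similarity and low distractor-distractor similarity) can be captured in a toy cost function. The specific functional form below is a hypothetical illustration for intuition, not a model proposed in the cited studies.

```python
def search_difficulty(td_sim, dd_sim):
    """Toy difficulty score: grows with target-distractor similarity (td_sim)
    and shrinks as distractor-distractor similarity (dd_sim) rises.
    Both inputs are similarities in [0, 1); the result is a unitless score."""
    return td_sim / (1.0 - td_sim + 1e-9) * (1.0 - dd_sim)

# Homogeneous distractors that look unlike the target -> easy search.
easy = search_difficulty(td_sim=0.2, dd_sim=0.9)
# Target-like, heterogeneous distractors -> hard search.
hard = search_difficulty(td_sim=0.8, dd_sim=0.2)
```

The two calls reproduce the qualitative pattern reported in the literature: difficulty rises with target-distractor similarity and falls as distractors become more uniform.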

Shape and Dynamic Visual Search Performance
In addition to exploring shape's effects on visual search performance as a single discriminating feature, some scholars have investigated the influence of the conjunction of shape and motion on search performance. McLeod et al. (1988) established that search for a target defined by a conjunction of motion and shape operated as a parallel process, wherein search time remained unaffected by the number of items [17]. However, further study revealed that conjunction search over motion and shape was not always a parallel process [18]: if the target and distractors moved unpredictably, the search was slow and serial. The proposed motion filter theory elucidates these findings: the visual system can direct attention to segments of the visual scene exhibiting specific motion characteristics, thereby activating target features while suppressing non-target features [19]. Moreover, it can discriminate moving objects against a stationary backdrop, and items moving in one direction from those moving in another. Driver et al. (1992) used orientation as a shape characteristic and found similar results [19]. They took filter theory a step further and proposed that two mechanisms are involved during the search process: a stationary form system geared towards precise shape discrimination but with relatively low sensitivity to motion, alongside a motion filter designed to differentiate moving entities from stationary ones but with relatively reduced sensitivity to shape attributes. The impact of target-distractor shape representation on dynamic visual search performance was investigated within a display featuring uniform linear motion [20]. The experiments revealed that the representation of target-distractor shapes influenced the interaction between motion and visual search performance. Additionally, the presence of motion blur contributed to improved performance in dynamic conditions. Hulleman (2020) also investigated the guidance of orientation and motion in T-versus-L search and found that orientation could interfere with motion guidance: adding motion improved search performance, but adding Ls with different orientations actually made searching more difficult [21]. Muller et al. (1996) argued that visual quality can explain the performance difference in conjunction searches of motion and shape [22]. The shape system encoded both static and moving items, although the moving items' quality suffered due to factors like reduced luminance contrast and retinal effects. When shape discrimination was simple, the moving items' representation in the shape system enabled swift searches; in challenging discrimination scenarios, the degraded representation extended search effort, resulting in longer search times. Additionally, motion uniformity among distractors exerted substantial effects on search performance: observers found it simpler to consolidate distractors into a single object and dismiss them collectively than when all distractors moved in varying directions [23].

THEORETICAL MODELS EXPLAINING THE INFLUENTIAL MECHANISMS
To better understand visual search behavior, scholars have developed numerous theoretical models of visual attention, among which three of the best known are feature integration theory, the Guided Search model, and visual salience theory.

Feature Integration Theory
Feature integration theory was developed in 1980 by Treisman et al. [24] and stood as one of the most influential theories of visual information processing during the final quarter of the 20th century [25]. According to the theory, visual perception can be divided into two stages [26]. The first stage is preattentive: separable features of visual objects are detected by parallel search and coded into feature maps. The second stage is the cross-dimensional processing stage, in which features of different dimensions are conjoined serially by focused attention. In feature integration theory, features of different dimensions, such as motion status or shape, are processed in parallel by different regions of the brain and then combined in a serial process during the second stage. Feature integration theory can explain illusory conjunction, i.e., features from different locations being wrongly conjoined. Illusory conjunction happens when focused attention spreads over several items and varies from time to time [27]. Feature integration theory is typically a bottom-up account, emphasizing that human attention is stimulus-driven; thus, attention is attracted to more salient items.
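The two stages make a concrete prediction about reaction-time (RT) set-size functions: feature search is flat, while serial, self-terminating conjunction search grows linearly with display size. A minimal sketch follows; the timing constants are hypothetical, not values from the cited studies.

```python
def feature_search_rt(set_size, base_rt=0.40):
    """Preattentive stage: a unique feature 'pops out', so RT does not
    depend on the number of items (hypothetical base RT, in seconds)."""
    return base_rt

def conjunction_search_rt(set_size, base_rt=0.40, per_item=0.05):
    """Serial stage: focused attention inspects items one at a time; a
    self-terminating search checks (n + 1) / 2 items on average."""
    return base_rt + per_item * (set_size + 1) / 2

sizes = [4, 8, 16, 32]
feature_rts = [feature_search_rt(n) for n in sizes]          # flat function
conjunction_rts = [conjunction_search_rt(n) for n in sizes]  # linear in set size
```

Plotting these two functions against set size reproduces the classic flat-versus-sloped RT signature used to diagnose parallel versus serial search.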

Guided Search Model
The Guided Search model was probably the most successful model challenging feature integration theory [28]. It proposes that serial conjunction searches can be directed by parallel processes capable of categorizing stimulus items into distractors and potential targets. The search's effectiveness hinges on the quality of the guidance imparted by these parallel processes [29].
In Guided Search 2.0, each fundamental feature dimension can have its own maps [30]. Every feature module can be viewed as a pair of topographic maps that receive considerable bottom-up or top-down activation. The activations across all feature maps are summed to generate an activation map. This activation map facilitates attentional guidance by integrating information from multiple features: it retains the positions of individual items and directs attention towards the location with the highest activation [28]. Guided Search 4.0 adopts a parallel processing approach during the initial phases of visual input processing. Subsequently, object recognition processes match visual objects against an extensive range of stored representations. This progression is constrained by a selective bottleneck, restricting the number of visual objects sent for recognition to a small subset. The outcomes of the selective and non-selective pathways pass through an attentional blink bottleneck. Guided Search 4.0 thus shifts from a strictly sequential model to a hybrid model that combines serial and parallel processing [28]. The latest version, Guided Search 6.0, expands upon the fundamental principles of Guided Search and describes more diverse attention-guiding factors: guidance by the history of prior attention, by value, and by the structure and meaning of scenes [31]. Guided Search 6.0 introduces two different search templates, a top-down "guiding template" and a more specific "target template," and identifies three functional visual fields that describe the nature of foveal biases. In Guided Search 6.0, when a target is not located, the search concludes once an accumulating terminating signal surpasses an adaptable threshold.
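The activation-map idea in Guided Search 2.0 can be sketched in a few lines: per-feature maps are weighted, summed, and attention is directed to the peak of the combined map. The feature maps and weights below are hypothetical toy values chosen for illustration, not parameters of the model itself.

```python
import numpy as np

def guide_attention(feature_maps, weights):
    """Sum weighted per-feature activations into a single activation map
    (combining bottom-up and top-down guidance) and return the peak location."""
    activation = sum(weights[name] * fmap for name, fmap in feature_maps.items())
    peak = np.unravel_index(np.argmax(activation), activation.shape)
    return activation, peak

# Toy 3x3 display: searching for a red X, so both features are weighted.
color_map = np.zeros((3, 3)); color_map[0, 0] = 1.0; color_map[1, 2] = 1.0  # red items
shape_map = np.zeros((3, 3)); shape_map[2, 1] = 1.0; shape_map[1, 2] = 1.0  # X-shaped items
activation, peak = guide_attention(
    {"color": color_map, "shape": shape_map},
    {"color": 1.0, "shape": 1.0},
)
```

Only the location that is both red and X-shaped accumulates activation from both maps, so attention is guided there first even though neither feature alone singles it out.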

Visual Salience Theory
Both feature integration theory (FIT) and the Guided Search model are conceptual models, whereas the visual salience model is a computational model. Salience refers to the quality or state of being salient and was also termed "conspicuity" in early studies [32]. An object's salience is determined not solely by its inherent properties but also by the characteristics of neighboring items [33]. Salience matters in visual search because variations in salience produce distinct search performances [34]. Salience operates swiftly and is unaffected by the specific task at hand; it primarily functions through a bottom-up process [35], although it can be modulated by top-down information [36]. Visual salience theory was developed to quantitatively analyze and predict visual behavior and performance [37]. It divides the target detection process into three stages [38]. In stage 1, basic visual features, including contrast, color, and size, are extracted. In stage 2, the visual system computes individual feature maps to quantify the salience level of each dimension. In the last stage, these maps are integrated to generate a final salience map, similar to the activation map of the Guided Search models.
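The three stages above can be sketched computationally. The normalization step below, which promotes maps containing one strong peak over maps with many comparable peaks, is a simplified stand-in inspired by Itti-and-Koch-style salience models; the feature maps themselves are hypothetical.

```python
import numpy as np

def normalize(fmap):
    """Stage 2: scale a feature map to [0, 1], then weight it by how much
    its global peak stands out from the mean (single-peak maps score high)."""
    fmap = fmap / (fmap.max() + 1e-9)
    return fmap * (fmap.max() - fmap.mean()) ** 2

def salience_map(feature_maps):
    """Stage 3: integrate the normalized per-feature maps into one salience map."""
    return sum(normalize(m) for m in feature_maps)

# Stage 1 (assumed already done): toy contrast and color maps for a 4x4 scene.
contrast = np.zeros((4, 4)); contrast[1, 3] = 1.0  # one sharp contrast peak
color = np.full((4, 4), 0.5)                       # uniform color, uninformative
smap = salience_map([contrast, color])
loc = np.unravel_index(np.argmax(smap), smap.shape)
```

The uniform color map is suppressed by the normalization (its peak barely exceeds its mean), so the final salience map, like the activation map in Guided Search, points to the contrast singleton.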

CONCLUSIONS
Shape is one of the most complex visual dimensions. Shape information is first filtered into primitive visual features and then integrated into overall shape information. For static visual search, performance declines as the shape similarity between the target and distractors increases and as the shape similarity among distractors decreases. Studies of conjunction search over shape and motion indicate that this type of search can be serial or parallel; motion filter theory and visual quality theory were developed to explain these results. FIT, the Guided Search model, and visual salience theory are three popular models explaining visual search behavior and performance, and shape and motion are regarded as basic feature dimensions in all three. Feature integration theory suggests that basic visual features are integrated serially by focused attention. The Guided Search model proposes that an activation map can be used to guide attention, resulting in an efficient search. Visual salience theory focuses on bottom-up control and suggests that low-level features are integrated into a salience map.
Other models or theories of visual search could be seen, to some degree, as variants in spirit of these three theoretical models. This review provides a solid foundation for understanding key issues and valuable references for further research and practice in the area of visual search.