Analyzing Depth Perception in Virtual Environments: A Comparative Study of Varied Scaling Factors

With the advancement of computing and display technologies, 3D vision devices have been widely used in various fields. This state-of-the-art technology enables users to immerse themselves in photorealistic volumetric images within virtually simulated environments. However, due to known depth perception issues, its usage has been limited to qualitative purposes. Studies have been executed to enhance depth perception in virtual spaces. This paper comprehensively analyzes experimental studies for improving measurement accuracy in 3D spaces by incorporating artificially generated depth cues. We studied in-depth how scaling factors, applied to virtual environments, impact human depth perception. A series of experiments were performed in two strategically designed settings, where the content within 3D spaces was either minified or magnified for proper viewing. Our two-staged development and user tests revealed several noteworthy findings to develop and apply computer-generated techniques for enhancing depth perception accuracy. This paper summarizes the study outcomes with future research tasks to utilize the 3D technology beyond qualitative purposes.


INTRODUCTION
For many years, 3D vision technology was predominantly utilized for qualitative purposes, including entertainment, training, education, and non-quantitative virtual analysis. However, its potential for quantitative analysis is now becoming more apparent. With the advancement of computer graphics technology and enhanced precision in graphical content, 3D vision technology can now be applied to precise measurements within volumetric images. Various industries have exploited 3D vision technology in practices such as surgical operations and automated robotic movements [1]. The future of 3D vision could innovate and facilitate a wide range of activities, including surgery planning, remote monitoring and operations, disaster response, and more.
The complications and inaccuracies associated with measuring depth in 3D virtual environments (VEs) using stereoscopic devices are well-known issues [2,3]. These problems are compounded by the absence of natural depth cues in VEs and disparities between a user's real-world convergence and their convergence in a VE, which are known factors that can adversely affect depth perception during stereoscopic viewing. Researchers have conducted extensive investigations into human depth perception and the factors influencing it using various 3D vision devices. These studies have yielded mixed outcomes, with viewers' depth perception in VEs demonstrating variability across different experimental scenarios, types of display devices, and levels of immersion [2][3][4][5]. Livingston et al. conducted a comparative study on depth perception in indoor and outdoor environments, with and without augmented reality cues [4]. Their findings indicated a trend of depth underestimation in indoor environments, a phenomenon frequently observed in other studies as well [2,3]. In contrast, subjects tended to overestimate depth in outdoor environments.
Naceri et al. conducted a study to identify the factors influencing users' depth perception errors when using wide-screen stereoscopic displays versus head-mounted displays (HMDs) [6]. Their research revealed that participants could accurately compare depths for both systems when presented with objects in the appropriate positions. However, as the size of the objects changed, distance judgments worsened when using HMDs. In a separate investigation, Zhang et al. explored the impact of depth of field on depth perception. They observed that, in general, participants' depth perception decreased as the depth of field increased, particularly in scenes where the height-in-the-field cue was removed [7].
Researchers have developed various methods to enhance context and introduce depth cues into images to improve depth perception accuracy. These techniques involve the integration of computer-generated visual cues, graphics rendering methods, and supplementary objects to augment the user's viewing in VEs. For instance, Cipiloglu et al. attempted to improve depth perception by strategically selecting and utilizing several prevailing depth cues, including size, linear perspective, shading, shadows, texture gradients, convergence, binocular disparity, motion parallax, and more [8]. They developed a fuzzy logic system designed to identify the most suitable depth cues for a given scene to enhance depth perception as much as possible. Their study reported that the fuzzy logic system outperformed scenarios with no depth cues, fully random cue selection, cost-limited random cue selection, and the inclusion of all possible depth cues. They also reported that linear perspective cues led users to underestimate the depth of distant objects less. Similarly, Livingston et al. found that visual aids were an important factor in depth perception, particularly emphasizing the impact of linear perspective cues in a closed environment [4]. Yang et al. delved into the use of pseudo-depth and perspective projection techniques to improve the spatial perception of vascular structures in fused images. Their study reported that a distance detection-based contour rendering technique achieved 100% correctness in determining the aneurysm position in X-ray images [9].
We also investigated the effects of computer-generated depth cues on human depth perception within two strategically designed research settings. Two studies were conducted in minified VEs, where the objects and surroundings were rendered smaller than their actual dimensions [10,11]. Two other experiments explored depth perception in magnified VEs, where the content was enlarged [5,12]. Across four user tests, we observed variations in participants' depth perception influenced by the graphic techniques employed. Further, these studies produced different outcomes in estimating the depth in two contrasting VEs, despite using the same or similar graphic effects.
Adjustment of content size helps to ensure effective presentation and analysis on various display devices, including 3D vision. While many scholars have investigated methods to improve depth perception accuracy in 3D VEs using various setups, limited attention has been given to the impact of scaling factors on human depth perception in 3D vision. This paper provides a comprehensive analysis of prior research, drawing from cross-experimental data, to advance our understanding of human depth perception within virtual spaces created with varying scaling factors. This study also examines the influence of artificially generated depth cues on the viewers' spatial perception within VEs using opposite scaling factors.

METHOD
To explore the impact of computer-generated graphics techniques and varying scaling factors on human depth perception within VEs, we executed a two-phase experiment in two distinct virtual spaces: a magnified VE and a minified VE. The preliminary study was designed to identify key computer graphics techniques significantly influencing users' accuracy in perceiving depth. In this initial study, we employed foundational graphics techniques to construct the VEs, and its results singled out several notable techniques. Based on the preliminary study results, we conducted follow-up studies incorporating modified or redesigned graphics techniques.

Applied Graphics Techniques
Preliminary Study: In the initial study, two contrasting VEs were created using fundamental computer graphics techniques including shading, shadow rendering, background coloration, texturing, scaling, and linear perspective. These specific graphics effects were selected based on a comprehensive review of prior research that investigated human depth perception when utilizing stereoscopic devices. Each depth cue was individually analyzed to identify the graphic effects with the most significant influence on depth perception.
1) Baseline Image: This image is the benchmark for assessing the effects of introducing independent depth cues to the VEs. The VEs presented a minified terrain model or a magnified volumetric medical image. No additional graphics techniques were applied except for basic illumination with ambient and diffuse light components. The center of the projection was positioned and oriented to replicate the eye position and viewing direction of the test subject.
2) Shading and Shadowing: To assess the impact of lighting effects on depth perception, we examined two well-known graphics techniques: shadows and shading. These graphical enhancements are essential in perceiving the shape of objects and supplementing the depth cues. In the first view, we turned off directional lighting (diffuse and specular lights) to create a flat appearance of content in VEs. In another scene, the measurement markers cast shadows onto the terrain or the medical image. Further details about measurement markers can be found in Section III.
3) Background Colors and Textures: Previous research indicates that manipulating the color and texture of the background can influence the perceived depth of objects. Specifically, using a lighter, warmer background tends to make objects appear closer, while employing a darker, cooler background can create the illusion of greater distance. In this set of tests, we investigated how changes in the background context affect users' depth perception within virtual environments. Three background variations were used: solid blue, solid orange, and a texture containing semi-randomly generated patterns of blue and black. The choice of blue is intended to create a sense of greater distance, while orange is selected to convey closer proximity.
4) Inverted Scaling Factors: This test explored how scaling factors affect depth perception. The dimensions of the content were manipulated relative to the baseline image to investigate the resizing factors within the VEs. In the minified VE, the scene was scaled up to amplify the details of the environment, such as elevation changes in the terrain. In the magnified VE, the medical image was presented at its original size, which is smaller than the baseline image.
5) Linear Perspective: Gridlines add linear perspective perception to the VEs. To study the impact of spatial cues on depth perception, we incorporated computer-generated gridlines into the VEs. Two variations of gridlines were used: regular gridlines that showed natural convergence and altered gridlines that displayed modified convergence. The regular gridlines are directed towards a vanishing point positioned at the farthest point behind the image. The altered gridlines converged more rapidly than the regular gridlines as they extended deeper into the scene.
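The difference between regular and altered gridlines can be illustrated with a small perspective-projection sketch. It is not the 3DSV implementation: the `convergence` multiplier is a hypothetical way of making lines approach the vanishing point faster, and the ground height, line offset, and depths are illustrative values only.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]


def project_gridline(x: float, ground_y: float, depths: List[float],
                     eye_dist: float, convergence: float = 1.0) -> List[Point2D]:
    """Project samples of a gridline that runs into the screen.

    The line sits at lateral offset `x` and height `ground_y`; `depths` are
    distances behind the screen plane. The eye is `eye_dist` in front of the
    screen, looking at its center, so the vanishing point is (0, 0).
    `convergence` (hypothetical) is 1.0 for ordinary perspective; values
    above 1.0 make the line converge more rapidly with depth.
    """
    if eye_dist <= 0 or convergence <= 0:
        raise ValueError("eye_dist and convergence must be positive")
    points = []
    for z in depths:
        # Similar-triangles scale factor of the perspective projection.
        scale = eye_dist / (eye_dist + convergence * z)
        points.append((x * scale, ground_y * scale))
    return points


if __name__ == "__main__":
    depths = [0.0, 5.0, 10.0]
    regular = project_gridline(2.0, -1.0, depths, eye_dist=5.0)
    altered = project_gridline(2.0, -1.0, depths, eye_dist=5.0, convergence=2.0)
    for z, r, a in zip(depths, regular, altered):
        print(f"depth {z:4.1f}: regular {r}, altered {a}")
```

At the screen plane both variants coincide; deeper into the scene the altered line lies closer to the vanishing point than the regular one at the same depth.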
Follow-up Study: In the preliminary study, shadows and linear perspective using gridlines showed better performance in estimating depth, including reduced measurement variations. To comprehensively investigate the influence of linear perspective using strategically designed gridlines on depth perception, multiple variations of gridlines were tested in two distinct VEs. These gridlines had different styles, varying the convergence point, and were combined with projection methods. For the follow-up study, we devised seven variations of gridlines, categorized into four different styles across two VEs.
1) Normal Gridlines: In the minified VE, the gridlines were superimposed onto the terrain image, aligned with the XZ plane (converged into the screen). However, the gridlines positioned in the center of the image could potentially obscure the medical imagery in the magnified VE. To avoid this obstruction, a 3D wireframe or a 3D gridline box encloses the medical image in the middle of the VE. The 3D wireframe has the minimal edges required to outline its cube shape. Meanwhile, the 3D gridline box incorporated additional gridlines onto the 3D wireframe to augment the sense of linear perspective. See Figure 1 below.
2) Draped Gridlines: Indicating the computer-generated depth cue's proximity to the objects under examination may assist users in measuring the depth more effectively. For this purpose, gridlines are draped along the contours of the terrain. The gridlines are still aligned with the XZ plane, but they follow the variations of the terrain along the Y-axis rather than maintaining a fixed height relative to the terrain's normal. Figure 1 shows an example of a scene with draped gridlines.
3) Inverted Gridlines: In contrast to normal gridlines, inverted gridlines shift the convergence point in the opposite direction, placing it behind the viewer's position. Normal and inverted gridlines share the same characteristics except for the position of the convergence point. This change was designed to investigate the impact of convergence on depth perception.
4) Combining Gridlines with Projected Images: This technique was developed to assist users in estimating the depth by referencing a secondary gridline plane, onto which objects in a 3D space are projected as 2D representations. The preliminary study showed promising results in depth perception when users referenced gridlines and shadows (the projection of 3D objects). In the minified VE, in addition to the draped gridlines on the terrain, a 2D gridlines plane is positioned above the terrain. Shadows of markers on the terrain are rendered onto the 2D gridlines plane. In the magnified VE, markers are projected onto the four surrounding planes of the 3D gridline box using orthographic projection. Figure 2 shows a medical image with a 3D gridline box and the associated projected markers.
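The orthographic projection of markers onto the sides of the gridline box can be sketched as follows. This is an illustration under stated assumptions rather than the 3DSV code: the box is taken to be axis-aligned, the four surrounding planes are assumed to be the left, right, bottom, and top faces, and the function name and coordinates are hypothetical.

```python
import math
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]


def project_onto_box_sides(marker: Point3D, box_min: Point3D,
                           box_max: Point3D) -> Dict[str, Point3D]:
    """Orthographically project a marker onto four faces of an axis-aligned box.

    Orthographic projection onto an axis-aligned plane keeps two coordinates
    and replaces the third with the plane's fixed coordinate.
    Assumption: the four faces are left/right (fixed x) and bottom/top (fixed y).
    """
    x, y, z = marker
    return {
        "left": (box_min[0], y, z),
        "right": (box_max[0], y, z),
        "bottom": (x, box_min[1], z),
        "top": (x, box_max[1], z),
    }


if __name__ == "__main__":
    box_min, box_max = (0.0, 0.0, 0.0), (10.0, 10.0, 10.0)
    a, b = (1.0, 2.0, 3.0), (2.0, 4.0, 7.0)
    pa = project_onto_box_sides(a, box_min, box_max)
    pb = project_onto_box_sides(b, box_min, box_max)
    print("true distance:     ", math.dist(a, b))
    print("on the left face:  ", math.dist(pa["left"], pb["left"]))
    print("on the bottom face:", math.dist(pa["bottom"], pb["bottom"]))
```

Each projected pair drops one coordinate of the separation, so the distance read from a single face can never exceed the true 3D distance between the markers.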

Software System
For this study, a custom software application was developed and named the 3D Stereoscopic Viewer (3DSV). The 3DSV utilizes OpenSceneGraph and Blender to create two distinct VEs with varying scaling factors. The minified VEs presented terrain images that, in real life, are much larger than the dimensions of the projection screen. Thus, the size of the rendered topographical landscape images was reduced to fit on the projection system. The magnified VEs displayed volumetric medical images, compiled from a series of 2D CAT scan data of the human foot and skull. The medical images were rendered on the projection screen larger than their original dimensions, creating a zoomed-in effect, which is common practice in medical image analysis for better viewing. The data sets were sourced from the USGS National Map Elevation and the National Institutes of Health databases. To mitigate the potential influence of inherent depth cues present in the original images, such as natural scenery and prebaked illumination, the 3DSV system rendered images in grayscale solely with ambient and diffuse lighting.
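Grayscale rendering with only ambient and diffuse terms can be sketched with the standard Lambertian model. The coefficients below (0.2 ambient, 0.8 diffuse) and the function names are illustrative assumptions; the paper does not report the lighting values used by the 3DSV.

```python
import math
from typing import Sequence, Tuple


def normalize(v: Sequence[float]) -> Tuple[float, ...]:
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


def gray_intensity(normal: Sequence[float], to_light: Sequence[float],
                   ambient: float = 0.2, diffuse: float = 0.8) -> float:
    """Grayscale intensity in [0, 1] from ambient plus Lambertian diffuse light.

    `to_light` points from the surface toward the light source. No specular
    term is included, matching an ambient-plus-diffuse-only setup.
    The default coefficients are assumptions for illustration.
    """
    n = normalize(normal)
    l = normalize(to_light)
    lambert = max(0.0, sum(a * b for a, b in zip(n, l)))
    return min(1.0, ambient + diffuse * lambert)


if __name__ == "__main__":
    up = (0.0, 0.0, 1.0)
    print(gray_intensity(up, (0.0, 0.0, 1.0)))   # surface facing the light
    print(gray_intensity(up, (0.0, 0.0, -1.0)))  # surface facing away
```

Surfaces turned away from the light still receive the ambient term, which keeps the content visible while the diffuse term alone conveys shape.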
The 3DSV used the asymmetric frustum method, also known as the off-axis frustum method, in conjunction with quad-buffers to generate stereoscopic images. This viewing method provides users with a more comfortable 3D viewing experience than the symmetric frustum method by preventing vertical parallax. Our system generates views of virtual spaces that align with the configuration of the projection system. The images maintain the same aspect ratio as the dimensions of the projection screen. The camera was located at the position that replicates the viewpoint of users, viewing the rendered scenes from 5 feet away at the center of the screen.
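The off-axis frustum for each eye can be derived from the screen size and viewing distance reported in this paper (13.3 ft by 7.5 ft, viewed from 5 ft). The sketch below is a minimal illustration, not the 3DSV source: the eye separation of 0.21 ft (about 6.4 cm) and the near and far planes are assumed values, and the returned bounds are of the kind a glFrustum-style projection call takes.

```python
from dataclasses import dataclass


@dataclass
class Frustum:
    left: float
    right: float
    bottom: float
    top: float
    near: float
    far: float


def off_axis_frustum(eye_offset_x: float, screen_width: float,
                     screen_height: float, screen_distance: float,
                     near: float, far: float) -> Frustum:
    """Frustum bounds at the near plane for an eye shifted sideways.

    The eye sits `screen_distance` in front of the screen center, displaced
    horizontally by `eye_offset_x` (negative for the left eye). The screen
    edges are expressed relative to the eye and scaled back to the near plane.
    """
    if screen_distance <= 0 or near <= 0 or far <= near:
        raise ValueError("require screen_distance > 0 and 0 < near < far")
    scale = near / screen_distance
    return Frustum(
        left=(-screen_width / 2 - eye_offset_x) * scale,
        right=(screen_width / 2 - eye_offset_x) * scale,
        bottom=(-screen_height / 2) * scale,
        top=(screen_height / 2) * scale,
        near=near,
        far=far,
    )


if __name__ == "__main__":
    EYE_SEPARATION_FT = 0.21  # assumed value, not reported in the paper
    for name, sign in (("left", -1), ("right", 1)):
        f = off_axis_frustum(sign * EYE_SEPARATION_FT / 2, 13.3, 7.5, 5.0,
                             near=0.5, far=100.0)
        print(name, f)
```

Both eyes share the same top and bottom bounds, which is why this method introduces no vertical parallax; only the left and right bounds shift. In a full renderer the camera would also be translated by the same eye offset.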

EXPERIMENT
We conducted four user tests to assess the influence of integrated graphics techniques within VEs with different viewing scales. These tests were divided into two sets: two studies focused on the magnified VE, while the other two examined the minified VE. Both sets adhered to a consistent design and experimental procedure, comprising a preliminary test and a subsequent follow-up experiment. The preliminary test explored fundamental depth cues, while the follow-up experiment examined redesigned graphics techniques incorporating findings from the preliminary test.
To quantify the depth in virtual spaces, we strategically positioned two spherical markers and a reference scale (represented as a 3D cube) within the VEs. The red and green markers and the yellow reference cube are shown in Figures 1 and 2. Markers were placed at pseudo-random locations near the terrain surface or the medical image. Markers were mainly separated along the Z-axis (into the screen), signifying the starting and ending points of specific distances to be measured. We introduced small distances along the horizontal and vertical axes to add variations in measuring depth without overlapping the two markers. The reference scale is provided to establish the correlation between the scale of the VEs and the distance units that participants were tasked with reporting.
In each VE, participants encountered either minified or magnified imagery with two distance markers and a reference scale. Participants were tasked with estimating the distance between two markers three times, with the markers positioned at different locations for each graphics technique. The measured distances were recorded to analyze the impact of various graphics techniques on viewers' depth perception.
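The marker layout described above can be sketched as a pair of points separated mainly in depth, with the ground-truth distance expressed in units of the reference cube. The depth range, jitter bound, and function names below are hypothetical; the paper does not report the actual placement parameters.

```python
import math
import random
from typing import Tuple

Point3D = Tuple[float, float, float]


def place_marker_pair(rng: random.Random,
                      depth_range: Tuple[float, float] = (2.0, 8.0),
                      jitter: float = 0.5) -> Tuple[Point3D, Point3D]:
    """Return start and end markers separated mainly along Z (into the screen).

    `depth_range` and `jitter` are illustrative assumptions: the Z separation
    is drawn from `depth_range`, and small X/Y offsets up to `jitter` are added.
    """
    start = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), 0.0)
    depth = rng.uniform(*depth_range)
    end = (
        start[0] + rng.uniform(-jitter, jitter),
        start[1] + rng.uniform(-jitter, jitter),
        start[2] + depth,
    )
    return start, end


def distance_in_reference_units(a: Point3D, b: Point3D, cube_edge: float) -> float:
    """Euclidean marker distance expressed in edge lengths of the reference cube."""
    if cube_edge <= 0:
        raise ValueError("cube_edge must be positive")
    return math.dist(a, b) / cube_edge


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the layout is reproducible
    start, end = place_marker_pair(rng)
    print(start, end, distance_in_reference_units(start, end, cube_edge=0.5))
```

Expressing the actual distance in cube-edge units mirrors the role of the reference scale: it ties the reported units to the scale of the scene rather than to the physical size of the screen.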

Experimental Procedure & Environment
A total of 66 volunteers participated in the four experimental studies. In the minified VE, 36 participants, comprising both Computer Science majors and alumni, volunteered for the study. The preliminary study had 15 participants, while the follow-up study involved 21 participants. In the magnified VE, 30 participants engaged in the tests. The preliminary study included 15 participants, and the follow-up study involved 15 subjects.
The experimental test was executed using a large-screen 3D projection system featuring a projection screen measuring 13.3 feet in width and 7.5 feet in height. Two high-definition 1920 x 1080 resolution projectors generate two circularly polarized images on the rear projection screen. Figure 1 shows an example of a virtual space rendered on our projection system. Participants wore 3D polarized glasses to engage with the VE in a semi-immersive environment. They were seated in a chair positioned approximately 5 feet away from the projection screen and facing towards the screen. Participants were asked to estimate the distance between two markers displayed within the VEs without significant body and head movement. Only one participant was tested at a time.
The experiment included three phases: a stereoscopic depth perception test, training to acquaint participants with the VE, and the actual experimental assessments. Before commencing the training phase, all participants were presented with the NVIDIA 3D Stereoscopic Test to confirm their capacity to perceive depth in a 3D space using our system. After confirming that participants could perceive 3D stereoscopy, the training phase was initiated.
Different VEs and content were used for the training and the actual experiments. During training, participants were asked to judge the distance between the two markers within the scene. Following their distance estimation, participants were immediately provided with the actual distance between the markers. This feedback was given to enhance participant comfort and familiarize them with measuring distance within a virtual space before performing the actual experiment. To minimize bias acquired from prior testing, the training was repeated with markers placed at varying locations and orientations within the VEs. Participants were allowed to repeat the training exercises as many times as necessary to be ready for the depth perception test. Once test subjects had become familiar with the virtual space through training, the actual experiment was conducted with VEs constructed with different content. During the experimental tests, no feedback was provided regarding the accuracy of participants' distance measurements. Test subjects were allowed as much time as needed to estimate the distance between the two markers in each scene.

Experimental Results
To evaluate the efficacy of graphics techniques in enhancing depth perception, we computed the weighted average of the differences and standard deviations in the participants' estimated measurements for each distinct graphics technique. The weighted averages were calculated from the differences between the estimated and actual distances, where A_i is the actual answer for measurement item i, S_ij is the j-th participant's estimate for measurement item i, and N is the total number of participants. A positive weighted average value indicates an overestimation of the distance, while a negative weighted average value represents an underestimation of the distance. The scene without any strategically designed graphics techniques served as a baseline for all technique comparisons. Both preliminary and follow-up studies indicated that computer-generated graphics techniques have varying impacts on participants' depth perception. Table 1 presents the noteworthy results from the preliminary study. In the preliminary test, participants estimated the depth of objects in contrasting VEs using opposite scaling factors.
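One way to compute a signed error summary consistent with the definitions of A_i, S_ij, and N above is sketched below. The exact weighting used in the study is not reproduced here: dividing each difference by the actual distance A_i is an assumption (suggested by the "Percent Error" caption of Table 2), as is the use of the sample standard deviation. The function names and the data in the usage example are made up for illustration.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def signed_relative_errors(actual: float, estimates: Sequence[float]) -> List[float]:
    """Signed errors (S_ij - A_i) / A_i for one measurement item.

    Positive values are overestimates, negative values underestimates.
    Normalizing by A_i is an assumption, not the paper's stated formula.
    """
    if actual <= 0:
        raise ValueError("actual distance must be positive")
    return [(s - actual) / actual for s in estimates]


def summarize_technique(items: Sequence[Tuple[float, Sequence[float]]]) -> Tuple[float, float]:
    """Return (average signed error, sample standard deviation) for one technique.

    `items` holds one (A_i, [S_i1, ..., S_iN]) pair per measurement item.
    """
    errors = [e for actual, estimates in items
              for e in signed_relative_errors(actual, estimates)]
    return mean(errors), stdev(errors)


if __name__ == "__main__":
    # Made-up numbers: two measurement items, three participants each.
    items = [(10.0, [12.0, 11.0, 9.0]), (20.0, [18.0, 22.0, 24.0])]
    average, spread = summarize_technique(items)
    print(f"average signed error: {average:+.3f}, standard deviation: {spread:.3f}")
```

Keeping the sign of each difference preserves the direction of the bias, which is what allows overestimation and underestimation to be told apart; the standard deviation then captures how consistently participants judged the same distances.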
Participants, in general, overestimated the depth in the minified VE, whereas they tended to underestimate distance in the magnified VE. Similar phenomena of overestimation in large (minified) VEs have been observed in other studies [2,3]. We found related outcomes when an inverted scaling factor was applied (see the last row of Table 1), which magnifies the content in the minified VE and shrinks the medical image in the magnified VE. In the minified VE, only this technique yielded depth underestimation. The preliminary study also showed the effectiveness of linear perspective depth cues and shadows in reducing measurement variances among participants. The introduction of artificially generated gridlines and projected images in the VEs appeared to result in a smaller standard deviation.
Table 2 summarizes the outcomes from the follow-up experiment, examining the efficacy of the purposefully designed linear perspective depth cues. The inclusion of draped gridlines and a 3D gridline box led to reduced measurement variations among participants. However, when these depth cues were integrated with projected images, the approach further reduced the standard deviation but had an adverse effect on the weighted average value.

DISCUSSION AND FUTURE WORK
These comparative studies investigated the impact of the graphics effect combined with varying scaling factors within stereoscopic images, unveiling several new insights into virtual depth perception.
The subsequent sections summarize notable findings in three categories: trends observed in both types of VEs and trends observed only in magnified or minified environments.

Both Environments
The introduction of gridlines as a linear perspective depth cue in VEs served well in reducing measurement variations, irrespective of the scaling factors employed. Further, placing gridlines closer to the objects under examination resulted in improved depth perception measurements. In the minified follow-up study, the draped gridlines yielded superior results compared to normal gridlines. Similar findings were observed in magnified VEs when using the 3D gridline box and gridlines with projection. Additionally, our studies revealed that the incorporation of other graphics techniques with linear perspective depth cues had varying impacts on human depth perception. In line with a prior study [8], we plan to expand this study into systematically analyzing the correlations among linear perspectives, VEs' scaling factors, and other computer-generated depth cues.
There were noticeable accuracy differences between the studies conducted in minified and magnified VEs. In general, the results obtained in minified VEs revealed a lower standard deviation. One contributing factor could be the nature of the data: although both environments were displayed in 3D, the relatively flat terrain used in minified VEs has minimal height variation compared to the medical images rendered in magnified VEs. As a result, distance estimation primarily relied on two dimensions: width and depth. Perceiving only two distance factors is inherently easier, resulting in better depth measurement accuracy, especially when gridlines are used as references. It is worth noting that using terrain with greater height variation could yield different measurement results. In contrast, the markers on the medical image had equal distance variations in all three dimensions. Analyzing measurement differences across diverse VEs with depth variations along the viewing direction and changing viewer positions could be a future study to enhance the understanding of depth perception in virtual spaces.
Unnecessarily duplicated depth cues could negatively influence depth measurement. For instance, projecting markers to one plane in the minified VEs yielded better results compared to displaying projected markers on two gridline planes in the magnified environment. This duplicated depth information on both sides of the image may confuse viewers as they reconcile multiple pieces of information. Projecting markers onto just two sides of the 3D gridline box closer to them could aid viewers in measuring the depth while maintaining a clear correlation between the actual objects and a copy of the projected images.

Magnified Virtual Environment
Participants' depth perception results showed a decreased standard deviation as each depth feature was introduced. The progressive inclusion of additional depth cues, starting with adding gridlines to the 3D wireframe and further incorporating projected markers on the 3D gridline box, improved the consistency of distance estimation. It could be worthwhile to investigate the outcomes of blending even more detailed cues, such as variations in quantity and gridline styles, to determine the potential upper limit for enhancing depth perception without interfering with the users' viewing experience.
Projected markers on the 3D gridline box allowed the viewers to compare marker locations closely to the gridlines while eliminating distractions from the image. While this technique produced a low standard deviation, it showed a noticeable distance underestimation on average. This approach helped viewers independently measure the distances from two different projection planes. However, they may not realize that some dimensional information is lost under the orthographic projection employed. Applying different types of projection, such as oblique projection, may address this limitation while preserving measurement consistency.
The baseline scenes used for the magnified preliminary and follow-up studies were identical. However, they showed noticeably different performance: measurements in the preliminary baseline image had significantly more underestimation and a lower standard deviation. To minimize any potential bias acquired from the previous measures, the medical object was presented with varying orientations and marker positions in each test. The orientation of the object and the placement of markers could have influenced participants' depth perception. When viewers examine the objects from more comfortable or familiar angles, it may assist their depth perception. Allowing viewers to rotate objects can be a valuable depth cue for improving depth perception. Further research is expected to explore depth perception consistency when observing objects from different orientations and varying angles, which could serve as a motion parallax depth cue.

Minified Virtual Environment
Both studies showed that controlling convergence influences the viewer's depth perception. The inverted gridlines technique in the minified follow-up study and the altered gridlines in the magnified preliminary test had the lowest standard deviation of all methods used in the associated user tests. These graphics techniques adjusted the convergence point to be near the viewer. In a real environment, viewing an entire landscape would require the viewer to be high in the air. From a high elevation, the ground below seems to converge directly below the viewer. The adjusted convergence point simulates a view from above or closer, which may help the viewer feel more familiar with the presented VEs.

CONCLUSION
This comprehensive analysis of depth perception in VEs using different scaling factors combined with supplementary depth cues has revealed several noteworthy findings for improving measurement accuracy. A series of user tests conducted in two distinct VEs showed that depth perception in minified and magnified VEs differs. Participants tended to overestimate depth in the minified VE, and the opposite trend was observed in the magnified VE. Furthermore, our experimental studies showed that artificially generated depth cues influence depth perception within VEs by supplementing missing natural depth cues. Different forms of gridlines effectively added the linear perspective depth cue to the virtual spaces. The manipulation of convergence and scaling factors improved viewer familiarity and comfort, leading to better perception in VEs. While these findings contribute to our understanding of 3D depth perception, this study also suggested several areas for further investigation. It is essential to develop graphics techniques that position computer-generated depth cues near target objects without disrupting the users' viewing and that streamline the correlation of visual cues. Although certain graphics techniques were beneficial across multiple studies, the unique characteristics of each VE necessitate distinct combinations of graphics effects to achieve precise depth perception. Identifying the optimal depth cues for specific VEs is as important as developing universally effective cues. We anticipate that these findings will contribute to improving depth perception accuracy, ultimately facilitating practical quantitative analysis within virtual spaces.

Figure 1: The 3DSV system shows a minified VE with draped gridlines.

Figure 2: A 3D gridline box with projected markers in magnified VE.

Table 1: Depth Estimation Error and Variation with Basic Graphics Techniques

Table 2: Percent Error Using Variations of Gridlines