Effects of Point Size and Opacity Adjustments in Scatterplots

Systematically changing the size and opacity of points in scatterplots can induce more accurate perceptions of correlation in viewers. Evidence suggests that the mechanisms behind these two effects are similar, so one might expect their combined effect on correlation estimation to be additive. We present a fully-reproducible study in which we combine techniques for influencing correlation perception and show that, in reality, the effects of changing point size and opacity interact in a non-additive fashion. We show that there is a great deal of scope for using visual features to change viewers' perceptions of data visualizations. Additionally, we use our results to further interrogate the perceptual mechanisms at play when point size and opacity are changed in scatterplots.


INTRODUCTION
Scatterplots are used for a wide variety of communicative tasks, both in academic and non-academic contexts. While most commonly used to represent linear correlation, or the degree of linear relatedness between two variables, they can also be used to represent different groups (clustering), to aid in the detection of outliers, to characterize distributions, and to visualize non-linear correlations. Figure 1 contains examples of scatterplots optimized for different tasks. There is evidence that people generally interpret scatterplots in similar ways [21], and that they support the interpretation of correlation significantly better than competing data visualizations [25]. Rapid interpretation by viewers [40], along with low levels of interindividual variance, render scatterplots particularly suited for experimental work; they provide important insights into perception and visualization design while being simple to study [40].
While our interpretations of scatterplots are generally similar, the accuracy of those interpretations is generally poor. In particular, viewers systematically underestimate the correlation displayed in positively correlated scatterplots. This holds true for direct estimation [7,11,12,22,23,28,47] and estimation via bisection tasks [41], and is particularly pronounced for values of r between 0.2 and 0.6. The COVID-19 pandemic demonstrated that lay populations, in addition to those who work with data professionally, are now expected to use and accurately interpret data visualizations on a daily basis [6]. This expectation confers a responsibility on data visualization designers to design in such ways that people with limited statistical or graph training are able to correctly interpret data visualizations. Doing this requires us to understand human perception, apply this understanding to visualization design, and test those designs in rigorous empirical work. Here, we present a fully-reproducible, crowdsourced online experiment in which we systematically change visual features in scatterplots to correct for a well-known bias. We combine two techniques for correcting correlation underestimation in scatterplots and show that the effects are stronger than what might be expected from simple additive combination. Building on this, we present a framework for visualization design informed from the ground up by human perception.


RELATED WORK
Correlation perception in scatterplots has been studied extensively and can even be modeled by deep neural networks [58]. A large body of work throughout the 1970s to 1990s focused on participants' numerical estimates of correlation, and found evidence for a systematic underestimation of positive r values other than 0 and 1. This underestimation was especially pronounced for 0.2 < r < 0.6 [7,11,12,22,23,28,47] and is illustrated in Figure 2. More recent work has attempted to model participants' correlation estimation performance by using a combination of a bisection task, in which participants are asked to adjust a plot until
its correlation is halfway between that of two reference plots, and a staircase method task designed to produce Just Noticeable Differences between scatterplots such that their correlations are discriminable 75% of the time [42]. This work has been extended to incorporate Bayesian data analysis [21], which also identifies scatterplots as particularly suited for the communication of correlation [17,25]. The current experiment adapts techniques from previous work [48,49] and combines them to further push the envelope of how systematically adjusting visual features in scatterplots can radically alter people's perceptions of correlation. For this reason we use the same direct estimation paradigm to collect responses; this allows for a large number of judgements to be collected, and is simple enough that participants need little to no training.
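Stimuli in this paradigm are scatterplots drawn from distributions with a known objective r. As a minimal illustrative sketch (the study's actual stimuli were generated in R with ggplot2; the function name and parameters below are our own), points with a target correlation can be sampled from a bivariate normal distribution:

```python
import numpy as np

def correlated_points(r, n=100, seed=0):
    """Sample n (x, y) points from a bivariate normal distribution
    whose population correlation is r."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, r], [r, 1.0]]          # unit variances, correlation r
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return xy[:, 0], xy[:, 1]

x, y = correlated_points(0.6, n=2000)
sample_r = np.corrcoef(x, y)[0, 1]      # close to 0.6 for large n
```

For small n the sample correlation can drift noticeably from the target, which is why stimulus-generation pipelines for experiments like this one typically check (or re-sample until) the realized r is acceptably close to the objective value.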

Drivers of Correlation Perception
Evidence points towards correlation perception being driven by the shape of the underlying probability distribution represented by scatterplot points; however, this remains very much an open question, especially with regard to which low-level perceptual mechanisms may be at play. It is possible that different contributory perceptual mechanisms operate at different levels based on task-specific differences such as viewing time or levels of graph training. Increasing the x and y scales on a scatterplot such that the size of the point cloud decreases [11] is associated with an increase in viewers' judgements of bivariate association, despite the objective r value remaining the same. It was suggested in this case that viewers may have been using the area of the point cloud to judge association. Later work found that the relationship between objective and perceived r values could be described by a function that included the mean of the geometric distances between the points and the regression line [29]. Investigation of the idea that people use visual features to judge correlation provides evidence that, among three other equally predictive features, the standard deviation of all perpendicular distances from scatterplot points to the regression line is predictive of performance on a correlation estimation task [57]. Equations for both discrimination and magnitude estimation of correlation include a quantity that is small when r = 1 and increases as r approaches 0 [41]. This quantity is indifferent to the type of visualization used (e.g., line graphs and bar charts [17] or augmented stripplots [41]), and is functionally similar to that found in the work mentioned above [11,29,57]. In scatterplots, this quantity represents the average distance between data points and the regression line, and can be thought of as closely approximating the width of the underlying probability distribution. Findings from a convolutional neural network that learnt visual features related to correlation perception also support the idea that viewers use an aspect of the shape of the point cloud to judge correlation, or some measure of what has been termed dot entropy [58], again considered a candidate visual proxy for correlation judgements [39,41].
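The candidate visual proxy described above, the spread of distances between points and the regression line, is simple to compute. The following Python sketch (function names are our own, not from the cited work) shows how it shrinks as r grows:

```python
import numpy as np

def residual_spread(x, y):
    """Mean and SD of perpendicular distances from each point to the
    least-squares regression line, two of the candidate visual proxies
    for correlation discussed above."""
    slope, intercept = np.polyfit(x, y, 1)
    d = np.abs(y - (intercept + slope * x)) / np.hypot(1.0, slope)
    return d.mean(), d.std()

rng = np.random.default_rng(1)
spreads = {}
for r in (0.2, 0.6, 0.95):
    xy = rng.multivariate_normal([0, 0], [[1, r], [r, 1]], size=1000)
    spreads[r] = residual_spread(xy[:, 0], xy[:, 1])[0]
# mean distance decreases monotonically as r increases
```

This monotonic relationship is what makes the quantity plausible as a visual proxy: a viewer sensitive to the width of the point cloud around the trend line would implicitly be tracking something like this statistic.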
Recent work investigating the use of decay functions that change point size or opacity in scatterplots as a function of residual distance provides evidence for both point density and salience/perceptual weighting being drivers of correlation perception. The use of an inverted opacity decay function [49], such that the opacity of a point decreased the closer it was to the regression line, resulted in significantly lower and less accurate correlation estimates compared to uniformly full-opacity plots with the same overall shape. This finding suggests that lower opacity in the center of the scatterplot further biased participants towards underestimation. When point opacity or size is reduced as a function of distance from the regression line [48,49], viewers rate correlation as significantly higher and are significantly more accurate, findings that support a low-level data salience account. Our aims are to test the impact of systematically altering visual features on correlation perception and to provide empirically-derived tools for visualization designers to design better visualizations, informed by the use of a simple, reproducible framework.

Opacity and Contrast
Changing the opacity of scatterplot points is standard practice to deal with issues of overplotting [26]; scatterplots with very large numbers of points, especially those with high degrees of overlap, suffer from low individual-point visibility caused by high point density. Lowering the opacity of all points using alpha blending [14] addresses this, and makes data trends and distributions easier to see and interpret (see Figure 3). In the present study we use the ggplot2 package [56] to create our stimuli. This package uses an alpha parameter, or the level of linear interpolation [46] between foreground and background pixel values, to set the opacity of points. As demonstrated in Figure 4, an alpha value of 0 or 1 results in no interpolation and the rendering of either the background or foreground pixel values, respectively. Regarding the related concept of contrast, psychophysical definitions are often based on what is being presented (e.g., gratings) or are modeled to take into account aspects of human vision (e.g., visibility limits) [59]. Our crowdsourced methodology gives us no control over the exact luminances of our stimuli, only over the relative differences in luminance between scatterplot points and backgrounds. For this reason we do not report absolute luminance nor make any attempt to adopt a formal definition of contrast; instead we report the alpha value used. Given that our work aims to improve correlation perception without removing data from the scatterplot, we also incorporate point visibility testing (see Section 3.5). Informal point visibility piloting suggested that our smallest, lowest-opacity points had low visibility. We therefore implemented an alpha = 0.2 floor for these points, which we judged as conferring a sufficient level of point visibility for the range of point sizes we used. Previous work has found that uniformly lowering the opacity of all scatterplot points relative to the background can increase the level of underestimation error relative to
full opacity, and that lowering point opacity as a function of distance from the regression line can bias correlation estimates upwards to partially correct for the underestimation observed [49]. Evidence suggests point salience/perceptual weighting or spatial uncertainty as drivers of this effect. Lower stimulus contrast, which for isolated stimuli is functionally identical to lower opacity, is associated with lower salience [18], can bias judgements of mean point position [19], can increase error in positional judgements [55], and can result in greater uncertainty in speed perception [8]. Because mechanistic accounts of both salience/perceptual weighting and spatial uncertainty predict results in the same direction with regard to opacity adjustments, previous work [49] has been unable to determine the extent to which each is responsible for the observed reduction in correlation estimation error.
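Per pixel, the alpha blending described above reduces to a linear interpolation between foreground and background values. A minimal single-channel sketch (real compositing operates per colour channel):

```python
def alpha_blend(fg, bg, alpha):
    """Linear interpolation between a foreground and a background pixel
    value: alpha = 1 renders the point, alpha = 0 the background."""
    return alpha * fg + (1.0 - alpha) * bg

# A black point (value 0) on a white background (value 255):
full = alpha_blend(0, 255, 1.0)    # 0.0: fully opaque point
floor = alpha_blend(0, 255, 0.2)   # ~204: the alpha = 0.2 floor leaves
                                   # the point faintly visible
```

This makes clear why an alpha floor matters: as alpha approaches 0, the blended value approaches the background value and the point disappears entirely.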
Micallef et al. [30] found that "merging, dark dots" support correlation estimation; despite only changing point opacity in a uniform manner, the sheer number of points used in that study results in scatterplots that appear similar to those that reduce opacity as a function of residual error [49]. That this technique has been shown to produce more accurate correlation estimates than unadjusted scatterplots may explain why the optimization system employed by Micallef et al. [30] conferred benefits regarding correlation estimation.

Point Size
For discriminability reasons, scatterplots visualizing large datasets tend also to have smaller points. Bubble charts are a subclass of scatterplot that use point size to encode a third variable, but what experimental work there is on the impact of point size on correlation perception is inconclusive. Some work has found bias and variability in correlation perception to be invariant to both uniform and irregular changes in point size [38,40], while elsewhere a strong effect of changing point size as a function of distance from the regression line has been reported [48]. Evidence points towards a salience-dominant mechanism in the latter case, albeit with a small effect of spatial uncertainty. There is evidence that larger stimulus size is associated with lower levels of spatial certainty [2] but higher levels of salience [18], results which are supported by evidence that reaction times are slower for smaller stimuli [16,34]. The predicted effects of spatial certainty and salience on correlation perception operate in opposite directions, which has allowed researchers to provide evidence for the mechanistic dominance of salience when point size decay is used. When an inverted size decay function is used, such that smaller points are located nearer the regression line, correlation estimation is significantly more accurate than when all points are the same size [48]. In this case it was suggested that the higher spatial uncertainty brought on by larger exterior points caused a perceptual downweighting during correlation estimation, which is in line with work suggesting human perceptual systems make robust use of visuo-spatial information [48,53,54]. This effect was small, so we do not take it into account when making our hypotheses. Our hypotheses do, however, take into account the wider body of evidence regarding stimulus size, perception, and attention. Hong et al. [19] found that larger stimuli could significantly alter perceptions of global scatterplot means. Participants' estimates were significantly biased towards areas with larger points, especially when a wide range of point sizes was used and there was a high correlation between point size and position. We formulate H3 according to these findings.

Hypotheses
We present a single experiment based on established effects of adjusting point size and point opacity in scatterplots. In our study, we combine previously independently tested point size and point opacity decay functions in both typical orientation (opacity/size is reduced with residual magnitude) and inverted orientation (opacity/size is increased with residual magnitude). Throughout, we refer to congruent and incongruent conditions with respect to the combination of orientations of the size and opacity decay functions. We hypothesize that: (H1) an increased reduction in correlation estimation error will be observed when congruent typical orientation decay functions are used; (H2) the use of congruent inverted orientation decay functions will produce the least accurate estimates of correlation; and (H3) owing to the greater strength of the size channel observed in previous work [48], there will be a significant difference in correlation estimates between the two incongruent orientation conditions.

METHODOLOGY
In this section we begin by discussing our general research methods, including our approach to crowdsourcing, our implementation of open research practices, and the modeling methods we use to test our hypotheses. We also discuss the issues that arise regarding the use of these methods in our particular research paradigm. We then report on two additional pre-experimental tests that participants completed: point visibility testing and a screen scaling task to assess dot pitch. The section concludes by reporting on experimental procedure, participant recruitment and characteristics, and experimental design.

Crowdsourcing
Much prior work on correlation perception in scatterplots has taken place in person, most often with graduate students experienced in statistics. While this work is valuable, especially to perception audiences, it can struggle to provide data that is resilient to different viewing contexts and to the wide range of levels of statistical and graph experience present in lay populations. In addition, the ease and low cost afforded by online, crowdsourced experimental work are unmatched. Given our intended HCI and design audiences, we therefore chose to crowdsource all participants. We acknowledge, however, that the technique has been affected by low-quality data and skewed demographics in the past [9,10,35]. In light of these issues we follow published guidelines [35] to ensure the collection of high-quality data. Namely, we use the Prolific.co platform [1] with stringent pre-screening restrictions; participants were required to have completed at least 100 studies using Prolific, and to have a Prolific score of 100, representing a 99% approval rate. This is stricter than the 95% suggested in previous work [35], but has served the authors well in previous work.

Open Research
This study was conducted according to the principles of open and reproducible research [3]. All data and analysis code are included in a GitHub repository, which also contains instructions for building a Docker container [27] to fully reproduce the computational environment used, allowing for full replications of stimuli, analyses, and the paper itself. Ethical approval was granted by the University of Manchester's Computer Science departmental ethics panel (Ref: 2022-14660-24397). Hypotheses and analysis plans were pre-registered with the OSF, and there were no deviations from them.

Modeling
We use linear mixed effects models to model the relationships between the combination of size and opacity decay functions and participants' errors in correlation estimates. Models such as these allow us to compare differences in our IV across the full range of participant responses, as opposed to relying purely on aggregate data and ANOVA. These models also afford us the ability to include random effects for participants and items, and are particularly resilient to a range of distributional assumption violations [43]. As per our pre-registrations we preferred maximal models [4], including random intercepts and slopes for participants and items. The structures of these models were identified using the buildmer package (version 2.11 [52]) in R. This package takes a maximal random effects structure and then identifies the most complex model that converges by dropping terms that fail to explain a significant amount of variance.

Stimuli
When adjusting point size, we further transform values using a scaling factor of 4 and a constant of 0.2 to ensure that the minimum point size in the present study is both visible and consistent with that of previous work [48,49]. We chose b = 0.25; this value was used in the previous studies that the present work builds upon, and it produces a curve approximating a reflection, around the identity line, of the underestimation curve reported in previous studies [41,48,49]. We acknowledge that there may be other, more suitable values of b; however, testing these is outside the scope of the present work. We used this equation in typical and inverted orientation forms applied to point size and opacity in a factorial 2 × 2 design. Examples of the stimuli used can be seen in Figure 5.
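To make the transform concrete, a hypothetical Python sketch is given below. The exponential form of the decay term is an assumption on our part (the decay equation itself is the one referenced above); the scaling factor of 4, the 0.2 constant, and b = 0.25 are the values reported in this section:

```python
import math

def point_scale(d, b=0.25, inverted=False):
    """Hypothetical sketch of the size transform: a decay term in the
    residual distance d (exponential form assumed here), rescaled with
    the factor of 4 and the 0.2 constant so that the smallest points
    remain visible. inverted=True flips the orientation."""
    decay = math.exp(-d / b)
    if inverted:
        decay = 1.0 - decay
    return 4.0 * decay + 0.2

on_line = point_scale(0.0)   # largest point, on the regression line
far_out = point_scale(2.0)   # far from the line: approaches the 0.2 floor
```

In the typical orientation, points on the regression line are largest and sizes decay towards the floor with residual distance; the inverted orientation reverses this, matching the two orientations crossed in the 2 × 2 design.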

Point Visibility Testing
Discussions about the size and opacity of particular scatterplot points are inherently difficult in the context of online, crowdsourced experiments; controlling the devices participants use to complete these kinds of experiments, beyond insisting on laptop or desktop computers, is impossible. While this may result in a lack of consistency in scatterplot point sizes, opacities, or contrast ratios between participants, it also provides results that are more resilient to different viewing contexts than traditional lab-based experimental work. In addition to the measures implemented to ensure high-quality participant data (see Section 3.1), it is also key that we do not inadvertently remove data from scatterplots by including points whose size or opacity renders them invisible. We therefore included point visibility testing. Participants viewed six scatterplots, each made up of between 2 and 7 points. These points were of the same size and opacity as the smallest and lowest-opacity points used in the experimental items. Participants were asked to enter in a textbox how many points were present. Participants scored an average of 74.89% (SD = 32.25%). Despite our use of the opacity floor detailed in Section 2.3, it is clear that some of our small, low-opacity points were not reliably visible, most likely due to low contrast between point and background, as previous work [48] found point visibility largely invariant to size. We suggest this is due to differences in monitor specifications between participants. In practice, minimum visible point size and opacity would need to be calibrated on a per-monitor basis. We also include performance on the point visibility test as a fixed effect in Section 4.1.

Dot Pitch
We employed a method for obtaining the dot pitch, defined as the distance in millimetres between the centers of adjacent pixels, of participants' monitors [31]. Combining this with monitor resolution information allows us to calculate the physical on-screen size of scatterplot points. Participants were asked to hold a standard-size credit/debit/ID card (ISO/IEC 7810 ID-1) up to their screen and resize an on-screen card until the two sizes matched. We assumed a widescreen 16:9 aspect ratio and calculated dot pitch based on these measurements. Mean dot pitch was 0.60mm (SD = 0.09), corresponding to a physical on-screen size of 7.80mm for the smallest points displayed on a hypothetical 1920 × 1080 pixel monitor measuring 35.54 × 20.00cm. We include analysis with dot pitch as a fixed effect in Section 4.1.
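The conversion reduces to dividing the known physical width of an ID-1 card (85.60mm) by the pixel width at which the participant's resized on-screen card matched it. A minimal sketch (the 330-pixel match below is a hypothetical example, not a value from the study's data):

```python
CARD_WIDTH_MM = 85.60  # ISO/IEC 7810 ID-1 card width

def dot_pitch_mm(matched_card_width_px):
    """Dot pitch implied by the on-screen pixel width at which the
    resized card visually matched a physical ID-1 card."""
    return CARD_WIDTH_MM / matched_card_width_px

def physical_size_mm(size_px, matched_card_width_px):
    """Physical on-screen size of an element size_px pixels wide."""
    return size_px * dot_pitch_mm(matched_card_width_px)

pitch = dot_pitch_mm(330)          # ~0.259 mm per pixel (hypothetical)
point = physical_size_mm(13, 330)  # a 13 px point would span ~3.37 mm
```

Multiplying any stimulus dimension in pixels by the recovered dot pitch gives its approximate physical size on that participant's monitor, which is how the on-screen point sizes reported above are derived.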

Procedure
The experiment was built using PsychoPy [36] and hosted on Pavlovia.org. Participants were only permitted to complete the experiment on a desktop or laptop computer. Each participant was first shown the participant information sheet and provided consent through key presses in response to consent statements. They were asked to provide their age in a free text box, followed by their gender identity. Participants completed the 5-item Subjective Graph Literacy test [15], followed by the point visibility task described in Section 3.5 and the screen scaling task described in Section 3.6. Participants were given instructions, and were then shown examples of scatterplots with correlations of r = 0.2, 0.5, 0.8, and 0.95, as piloting of a previous experiment indicated that some of the lay population may be unfamiliar with the visual character of scatterplots. Section 4.1 contains further analysis of the potential training effects of displaying these plots. Two practice trials were given before the experiment began. Participants worked through a randomly ordered series of 180 experimental trials and were asked to use a slider to estimate correlation to 2 decimal places. Participants were asked to complete each trial as quickly and accurately as possible, although there were no time limits on individual trials. Visual masks were displayed for 1 second between scatterplot presentations. Interspersed were 6 attention check trials which explicitly asked participants to ignore the scatterplot and set the slider to 0 or 1.

Participants
Participants were recruited using the Prolific.co platform. Normal or corrected-to-normal vision and English fluency were required for participation. In addition, participants who had completed any of our previous studies on correlation estimation in scatterplots [48,49] were prevented from participating. Data were collected from 158 participants; 8 failed more than 2 of the 6 attention check questions and, as per pre-registration stipulations, were rejected from the study. Data from the remaining 150 participants were included in the full analysis (50.7% male, 48.7% female, and 0.7% non-binary). Participants' mean age was 30.6 (SD = 8.6), and their mean graph literacy score was 22.5 (SD = 3.5) out of 30. The average time taken to complete the experiment was 37 minutes (SD = 12.3), and is discussed further in Section 4.1.

Design
We employed a fully repeated-measures 2 × 2 factorial design. Each participant saw each combination of size and opacity decay function plots, for a total of 180 experimental items. Participants viewed these experimental items, along with 6 attention check items, in a fully randomized order. All experimental code, materials, and instructions are hosted on GitLab (https://gitlab.pavlovia.org/Strain/size_and_opacity_additive_exp).

RESULTS
All analyses were conducted using R (version 4.3.2). Deviation coding was used for each of the experimental factors, which allows us to compare means of r estimation error for each fixed effect to the grand mean. We used the buildmer and lme4 (version 1.1-35.1 [5]) packages to build a linear mixed effects model in which the difference between objective and rated r value was predicted by the size and opacity decay functions used. Semi-partial R² was calculated using the r2glmm package (version 0.1.2 [20]). The emmeans package (version 1.10.0 [24]) was used to calculate pairwise comparisons between levels of the size and opacity decay factors.
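As an illustration of the deviation coding scheme (sketched in Python for concreteness; the analysis itself used R-style contrasts), a two-level factor is coded +1/-1 so that the model intercept estimates the grand mean and each coefficient a deviation from it:

```python
import numpy as np

def sum_code(factor, reference):
    """contr.sum-style deviation coding for a two-level factor: +1 for
    the reference level, -1 otherwise, so the model intercept is the
    grand mean and coefficients are deviations from it."""
    return np.where(np.asarray(factor) == reference, 1.0, -1.0)

# The four cells of the 2 x 2 design (typical vs. inverted orientation):
size    = ["typical", "inverted", "typical", "inverted"]
opacity = ["typical", "typical", "inverted", "inverted"]
size_c    = sum_code(size, "typical")
opacity_c = sum_code(opacity, "typical")
interaction = size_c * opacity_c   # the interaction predictor
```

Each coded column sums to zero across the balanced design, which is what centres the intercept on the grand mean and lets the interaction term capture any non-additive combination of the two factors.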
Our first two hypotheses were fully supported in this experiment. The combination of typical orientation size and opacity decay functions produced the most accurate estimates of correlation, although this also resulted in a large over-correction and consequent overestimation for many values of r (see Figure 7). Our second hypothesis was also supported; the combination of inverted size and inverted opacity decay functions produced the least accurate estimates of correlation. We found no support for our third hypothesis; there was no significant difference in correlation estimates between typical orientation size/inverted orientation opacity decay plots and inverted orientation size/typical orientation opacity decay plots (Z = -2.26, p = .11). However, we did find a significant interaction effect that provides evidence that the combination of size and opacity decay functions is not additive in nature.
A likelihood ratio test revealed that the model including point size and opacity decay conditions as fixed effects explained significantly more variance than the null (χ²(3) = 5,286.81, p < .001). There were significant fixed effects of size decay and opacity decay, as well as a significant interaction between the two. The experimental model has random intercepts for items and participants, and random slopes for participants with regard to the size decay factor. Due both to our use of a linear mixed effects model with an interaction term and to our lack of a comparative baseline condition (i.e., no size or opacity decay function used), we do not report a traditional measure of effect size. Instead we report the amount of variance explained by each fixed effect term and the interaction term as semi-partial R² [32], which can be seen in Table 1 along with all model statistics. Pairwise comparisons between levels of the size and opacity decay factors can be seen in Table 2.

Additional Analyses
We find no effects of graph literacy (χ²(1) = 3.50, p = .061), performance on the point visibility task (χ²(1) = 1.29, p = .257), or dot pitch (χ²(1) = 1.52, p = .218) on participants' errors in correlation estimation. We did find a significant effect of training (χ²(1) = 23.78, p < .001), with participants rating correlation 0.01 lower during the second half of the experiment. This may imply that having more recently viewed the four training plots described in Section 3.7 increased participants' estimates of correlation. To analyse the potential variability of participant responses in more detail, we built a model including trial number. As the presentation order of experimental stimuli was randomized separately for each participant, this allows us to examine responses purely as a function of when they were made. We find a significant effect of trial number on participants' errors (χ²(1) = 29.31, p < .001). Figure 8 shows participants' unsigned mean errors in correlation estimation against trial number. Variability in error, as represented by the ribbon in Figure 8, stabilizes quickly and remains stable for most of the experiment, only widening again around trial number 170. We suggest that this is a result of participants knowing that they were coming to the end of the experiment and being less vigilant as a result. Regardless of its statistical significance, we do not consider this effect large enough to warrant further analysis, although future work will take into account potential effects of experiment length.

Table 1: Significances of fixed effects and the interaction between them. Semi-partial R² for each fixed effect and the interaction term is also displayed in lieu of effect sizes.

DISCUSSION
The finding of a significant interaction between point size and opacity decay provides evidence that these functions combine in non-additive ways. In addition, we provide further confirmatory evidence of what has been found previously: namely, that while manipulations of both point size and opacity have significant effects, the effect of changing point size is stronger, and that while we can influence correlation estimates in either direction, typical orientation manipulations are more powerful than inverted ones [48,49]. As one would expect, we also see an effect of orientation congruency on the extent to which a manipulation can bias correlation estimates; redundant encoding, such as that present here in the congruent conditions, is known to support visual grouping and segmentation [33]. We now provide evidence that redundancy can be exploited to change correlation perception. The lack of support for our third hypothesis, that there would be a difference in correlation estimates between incongruent conditions, was surprising given the greater strength of the size channel relative to opacity demonstrated in previous work [19,48,49], although this may be a facet of the non-additive interaction between size and opacity manipulations we found. Despite the lack of support for this hypothesis, we did find that the size decay channel explained more variance (.104) in our model than opacity decay (.087).
Taking into account the present work, which manipulates point size and opacity together, and previous work manipulating the same visual features in isolation, we provide the following recommendations to visualization designers and researchers:
• When r is between approximately 0.3 and 0.75, and the scatterplot in question is intended solely for the communication of correlation, designers may wish to implement the size decay function, as previous work [48] has shown it to produce the most accurate correlation estimates in this range.
• Outside of this range, and with the same caveats in place, designers may wish to implement the opacity decay function [49]; while its effect on correlation estimation is small, it does significantly increase estimation accuracy.
• There exists a combination of size and opacity decay functions that produces accurate correlation estimates while maintaining the increased r estimation precision that we would expect to see with high r values (see Section 5.2). Finding this will require extensive future testing.

Combining Manipulations
Figure 6 and Figure 7 show how, on average, the combination of typical orientation size and opacity decay functions results in an overestimation of r for the majority of values. While this does not solve the underestimation problem, it does demonstrate that, with regard to using point size and opacity to bias viewers' estimates of correlation in scatterplots, there would appear to be few limitations.
If we can over-correct correlation estimates, then we certainly have the ability to correct appropriately. The issue here is not our ability to change people's perceptions, but that of tuning the use of these visual features so that we can change people's perceptions in systematic ways. We explore what further work would need to be done to achieve this in Section 5.6. Combining inverted size and opacity decay functions also had the predicted effect, producing the lowest and least accurate estimates of correlation. Combining inverted manipulations did not, however, significantly change the shape of the estimation curve (see Figure 7). In addition to interacting non-additively, the effects we observe operate differently depending on the direction of the change induced in perception. This finding can also explain the lack of support found for our hypothesis that there would be a significant difference in r estimation error between the two incongruent conditions. Despite the size channel being more powerful with regard to influencing correlation estimates, the fact that this power depends on the orientation of the function causes incongruent functions to act against each other in ways we would not expect. Indeed, the incongruent condition that used a typical orientation size decay function exhibited lower mean error than the one using inverted orientation size decay (see Figure 7); however, in each case opacity decay appears to have blunted the power of the size decay function to the extent that the difference in errors is not statistically significant.

Estimation Precision
Much previous work is consistent in finding that r estimation precision increases with the objective r value [13,38,40-42]. More recent work using the same size or opacity decay functions (albeit in isolation and without an opacity floor) [48,49] found that in some cases, precision in r estimation is constant across the range of r values investigated. For example, the use of a size decay function, whether a typical or inverted orientation non-linear function or a linear decay function, results in no change in r estimation precision [48]. When opacity is used in the same ways, only an inverted decay function fails to exhibit the conventional increase in precision with r. In the present work, precision in r estimation increased whenever a typical orientation opacity decay function was used. We suggest this reflects the moderating effect of the point opacity decay function on the size decay function: the visual character of scatterplots with high r values that use the size decay function eliminates the usual increase in precision we would expect, but the introduction of the opacity decay function restores it.

Contributions of Size and Opacity Decay
Incorporating data from previous work [48,49] allows us to compare estimation curves for size decay and opacity decay in isolation and in combination. Figure 9 shows correlation estimation error curves in the present experiment, in two previous studies that used decay functions applied solely to size or opacity, and with no manipulations present. The "no manipulations present" plot is averaged over conditions from previous work [48,49]. Using opacity decay alone significantly changes the amplitude of the estimation error curve while leaving its shape intact (compare the opacity manipulated and no manipulation present plots in Figure 9). Using size decay, however, changes both the amplitude and shape of the underestimation error curve (compare the size manipulated and no manipulation present plots in Figure 9). When size and opacity decay functions are combined, the shape appears similar to that of size alone. This is in line with previous work establishing size as a more potent channel for the manipulation of correlation estimates [48] and positional means [19]. It would appear that the addition of the opacity curve moderates the effect of the size curve as a function of the objective r value itself, without affecting the general shape of the curve. In the following, we briefly discuss the effects of each manipulation in isolation and in combination, before making a case for the inclusion of both when tuning scatterplots for correlation estimation, due to the complementary benefits each confers. It should be noted that the analyses in this section include data from several separate experiments [48,49]. We argue that their use of near-identical decay equations (in isolation and without the use of an opacity floor) and experimental paradigms renders comparisons appropriate, but we acknowledge the potential for overstated conclusions.
Using an opacity decay function in isolation has a small effect on correlation estimation. It does little to change the shape of the underestimation curve (see Figure 2), but slightly biases r estimates upward, partially correcting the underestimation observed with normal scatterplots [49]. Importantly, it also preserves the increase in correlation estimation precision with r that we would expect to find during correlation estimation tasks. Using the size decay function in isolation has a more dramatic effect. The shape of the estimation curve is altered quite radically, and estimation precision does not increase with the objective r value [48]. Size decay over-corrects at lower values of r, leading to overestimation, while at high values the curve begins to change direction, leading to more severe underestimation. In the middle range of r values, however, the size decay function in isolation performs well. One option for tuning correlation estimation using these functions would therefore be to use the size decay function alone for mid-range values of r (0.3 to 0.75), and the opacity decay function outside of this range. Used together, however, we can exploit the power of the size decay function whilst maintaining the expected increase in precision with r that the opacity decay function confers. The simple combination used in the present study clearly does not represent an ideal tuning, as participants overestimated r for the majority of values, but it confirms that there is scope to bias r estimates significantly using the functions supplied here. Further work would be required to obtain precise measures of the contributions of each decay function when they are used together, as their combination is not additive. Doing so would allow us to tune each function according to both the objective r value and the tuning of the other function, facilitating more accurate correlation estimates. We can derive new curves that describe the effect that
each manipulation and the combination of manipulations has on correlation estimates (Figure 10) by comparing them to estimates without any manipulation present across the range of r values used. We term this 'power'. The dotted line on each plot shows the power that would be needed to correct for the standard underestimation curve (see Figure 2). As we can see, size alone comes closest to this requirement, while combining size and opacity decay functions results in gross overestimation. Figure 10 includes the integral of each power curve over r as a measure of the total power of each curve. We also display the difference between this integral and the integral of each required-power curve over r, which shows the extent to which the power we observed differs from what would be required.
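The power computation described above can be sketched numerically. The error curves below are placeholder values, not the study's data; `integrate` implements the trapezoidal rule.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal-rule integral of y over x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2))

r = np.linspace(0.1, 0.9, 9)
# Placeholder error curves (mean estimated r minus objective r), NOT the
# study's data: a typical underestimation baseline and an over-corrected
# combined-manipulation curve.
err_none = np.array([-.02, -.08, -.12, -.14, -.13, -.11, -.08, -.05, -.02])
err_combined = np.array([.05, .09, .11, .12, .10, .08, .05, .02, .00])

power = err_combined - err_none   # shift relative to the no-manipulation baseline
required = -err_none              # power that would exactly cancel the bias

total_power = integrate(power, r)                # total power of the manipulation
surplus = total_power - integrate(required, r)   # positive => over-correction
```

With these placeholder curves the surplus is positive, mirroring the over-correction observed for the combined condition.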

Mechanisms
Previous work has made the case for opacity and size decay acting primarily through salience/perceptual weighting with regard to correlation estimation, with the caveat that spatial certainty also plays a small part in the mechanism behind size decay [48]. Our results support this notion, with our highest and lowest estimates observed in the congruent typical and congruent inverted conditions respectively. These findings also support dot density [58] and feature-based attentional bias accounts [19,50]. As all of these mechanisms would be expected to operate in the same direction, drawing conclusions about the relative contributions of each is difficult. The body of evidence generally points to a high-level probability distribution account [39,41]. On a lower level, numerous candidate mechanisms exist, most of which are expected to act in the same direction. Previous work concluded that spatial certainty [48] may play a small role in the effects of size decay on correlation perception; our results neither confirm nor refute this, but instead provide further evidence for salience/perceptual weighting/dot density changing participants' perceptions of the width of a probability distribution, thereby affecting correlation estimates. Hong et al. [19] found that the inclusion of larger or more opaque scatterplot points was able to bias estimates of positional means, but that the relative contributions (weights) of these visual features change as a function of the ranges of sizes/opacities used. It is clear from this evidence and the present study that the perceptual weights of point size and opacity are not the same.

Limitations
Firstly, our participants' performance on the point visibility task was poor, with an average score of 74.89%. It is clear from these results that for many participants, the smallest and lowest opacity points we used were simply not visible, although this low visibility appears to have had no significant effect on correlation estimates. Regardless, for many of our participants it will have appeared that we were removing data, which violates our intended aims. Addressing this would require a by-participant calibration of point size and point opacity, which is beyond the scope of our current methodology; it would require a platform on which experimental stimuli could be regenerated according to a calibration task completed by each participant, and we aim to implement this in the future. We cannot say precisely what proportions of the observed effect in the typical orientation congruent condition were due to size or opacity decay. We can conclude that these effects are not linearly additive, but must defer to further work to define each of their contributions precisely. While our provision of dot pitch measurements is a step in the right direction, the present methodology leaves us unable to comment on the variability in participants' viewing contexts. We argue that our large sample size and recruitment from lay populations render our conclusions resilient to changes in viewing contexts, but further in-person experimentation is planned to confirm this. Due to extensive previous testing of no-adjustment, size-only, and opacity-only manipulation scatterplots [48,49], we chose not to include these as conditions in the present work. We did not consider the increased costs and experimental length worth the inclusion of three extra conditions, although we acknowledge that they would assist us in making claims about additivity and the relative contributions of point size and opacity decay.
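A by-participant calibration could, for instance, take the form of an adaptive staircase on minimum visible point size. This is a hypothetical sketch, not a procedure from the present study; the `can_see` callable stands in for querying the participant.

```python
def calibrate_threshold(can_see, start=8.0, floor_step=0.125, reversals_needed=6):
    """Hypothetical 1-up/1-down staircase estimating the smallest visible
    point size (in px) for one participant. Shrinks the size after each
    'seen' response and grows it after each 'not seen' response, halving
    the step at every reversal; the threshold is the mean reversal size."""
    size, step, reversals, last = start, 2.0, [], None
    while len(reversals) < reversals_needed:
        seen = can_see(size)
        if last is not None and seen != last:     # direction flipped: a reversal
            reversals.append(size)
            step = max(step / 2, floor_step)
        last = seen
        size = max(size - step, 0.25) if seen else size + step
    return sum(reversals) / len(reversals)
```

The returned threshold would then set the lower bound of the size (and, analogously, opacity) range used when regenerating that participant's stimuli.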
Channels such as point size, colour, opacity, and shape have been used in past work to encode variables beyond the standard two typically used in scatterplots [19,45]. While we focus purely on correlation estimation in the present work, we acknowledge that the use of our techniques is likely to lead to incorrect interpretations, especially when scatterplots are designed with other tasks in mind. Given evidence that size, shape, and colour are not entirely separable scatterplot features [45], if viewers assume that variations in point size/opacity correspond to additional encoded variables, confounds in interpretation may be introduced. As our contribution in this paper is providing evidence that changing certain visual features in systematic ways can alter viewers' estimates of correlation, we do not consider this to be problematic. If plots such as the ones we have presented were to appear in the wild, however, it would be necessary to clarify that they were designed to aid in the rapid and intuitive interpretation of correlation (and only this). Irrespective of the potential for misinterpretation, we provide strong baseline evidence for a perceptual effect of changing point size and opacity in scatterplots that may be expanded on and further exploited in future work.
While we put forward salience as the most likely driver of the effects on correlation estimation we observe when using point size/opacity manipulations, the data we have gathered do not allow us to comment on the reasons behind the differences in the shapes of the estimation error curves (see Figure 9). The context that the manipulation is used in (i.e. the r value of the scatterplot) interacts with point size and opacity manipulations in complex ways. To understand how point size/opacity manipulations operate in more detail, future work may wish to choose fewer r values and instead vary the type of decay equation or range of sizes/opacities, but this is beyond the scope of the current work.

Future Work
There is evidence that viewers overestimate correlation in negatively correlated scatterplots [44]. Findings that correlation perception in negatively correlated scatterplots functions symmetrically to that in positively correlated scatterplots [17] suggest that the techniques we have implemented in the current paper may be used (in a symmetrical manner) to address the overestimation bias. We found evidence that the influence of size and opacity decay functions changes according to the direction they operate in, meaning experimental work with negatively correlated scatterplots would be required, and results may differ significantly from findings related to the underestimation of correlation in positively correlated scatterplots. For size and opacity decay in the present work we used equation 1. Given our finding that the combination is non-additive, there are a multitude of adjustable parameters for each decay function that require rigorous testing in order to produce concrete values for the contributions of each. The value of b is one such parameter. We used the same value of b (0.25) as previous work [48,49]; changing this value can increase or decrease the severity of the decay function in question. Additionally, we used a constant and a scaling factor with the size decay manipulation to ensure our points were visible. These values could also be changed. Aside from changing aspects of equation 1, other equations could be used, including ones that take the objective r value into account when setting point size or opacity. Future experimental work may use the major axis through the probability ellipse instead of the regression line as a baseline for changing point sizes and opacities; evidence that people often report the major axis when asked to visually estimate the regression line [12] suggests that this may produce a different pattern of results from those seen here. If changes in dot density are driving changes in correlation estimates,
the congruent conditions here are an example of redundant encoding. Future work may explore using different channels to redundantly encode dot density, such as marker shape, orientation, or color. The present study opens the door for this future work, as it provides the necessary additional data to previous experiments using only size [48] or opacity [49] decay functions. Further testing of these manipulations in isolation and combination using different decay function parameters will allow researchers to build a more complete picture of how these visual features impact correlation estimation, and how we can exploit them to correct for well-known biases.
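As an illustration of the parameter sensitivity discussed above, suppose the decay takes a power-law form (an assumption; equation 1 itself is not reproduced here). Lower values of b then produce a more severe decay at a given distance:

```python
def decay(d_norm, b):
    """Weight for a point at normalized distance d_norm (in [0, 1]) from
    the regression line, under a hypothetical power-law decay."""
    return 1.0 - d_norm ** b

# A point halfway to the maximum distance: smaller b gives a smaller
# weight, i.e. a more severe decay of size or opacity.
for b in (0.1, 0.25, 0.5, 1.0):
    print(f"b = {b}: weight = {decay(0.5, b):.3f}")
```

Sweeping b (and the size channel's constant and scaling factor) across conditions would be one way to quantify each function's contribution.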
Through this work we also provide an example of an experimental framework that we argue should be employed to test a wider range of data visualizations, statistical summaries, and task types. Our framework is fully open source, and can be easily adapted for other charts and modalities. Doing so will further the cause of empirically-informed data visualization design.

Figure 1 :
Figure 1: Examples of scatterplots designed for different scatterplot-associated tasks. Both color and point shape have been used to delineate different clusters in the cluster separation plot.

Figure 2 :
Figure 2: Using a function relating objective to perceived r value [41] provides a visualization of the nature of correlation underestimation reported in previous work. An identity line has been included to illustrate where viewers are most and least accurate.

Figure 3 :
Figure 3: Adjusting point opacity to address overplotting. Contrast between the points and the background is full (alpha = 1, full opacity points, left) or low (alpha = 0.1, low opacity points, right). The dataset used has 40,000 points.

Figure 4 :

Figure 5 :
Figure 5: Examples of the experimental stimuli used with an r value of 0.6.

Figure 7 :
Figure 7: Plots showing how participants' correlation estimation errors change as a function of the r value for each combination of size and opacity decay factors. Overestimation occurs above the dashed line.

Figure 8 :
Figure 8: Comparing mean errors in correlation estimation by trial number. Points represent unsigned mean errors for each trial number. The plotted line is the locally estimated smoothed curve, with the ribbon representing standard errors.

Figure 9 :
Figure 9: Plotting r estimation error against the objective r value for opacity and size decay in isolation from previous work, and for their combination in the present study. The dashed line represents 0 error in correlation estimation, and standard deviations are shown as error bars. Note that these curves have been smoothed. Overestimation occurs above the dashed line.

Figure 10 :
Figure 10: Power curves for each manipulation and their combination, computed by comparing correlation estimates to what is observed when no manipulation is used. The dashed line represents the power that would be required to correct for the observed underestimation of correlation in scatterplots. The integral of each power curve over r is provided, as well as the difference between this integral and the integral of each required-power curve over r.

Table 2 :
Pairwise comparisons. TO = Typical Orientation, IO = Inverted Orientation. The interaction is driven by the non-additive nature of combining point size and contrast decay functions, and the only non-significant contrast is found when incongruent decay functions are compared.