Nested Fusion: A Method for Learning High Resolution Latent Structure of Multi-Scale Measurement Data on Mars

The Mars Perseverance Rover represents a generational change in the scale of measurements that can be taken on Mars; however, this increased resolution introduces new challenges for techniques in exploratory data analysis. The multiple instruments on the rover each measure specific properties of interest to scientists, so analyzing how underlying phenomena affect measurements from multiple instruments together is important to understanding the full picture. However, each instrument has a unique resolution, making the mapping between overlapping layers of data non-trivial. In this work, we introduce Nested Fusion, a method to combine arbitrarily layered datasets of different resolutions and produce a latent distribution at the highest possible resolution, encoding complex interrelationships between different measurements and scales. Our method is efficient for large datasets, can perform inference even on unseen data, and outperforms existing methods of dimensionality reduction and latent analysis on real-world Mars rover data. We have deployed Nested Fusion within a Mars science team at NASA Jet Propulsion Laboratory (JPL), and through multiple rounds of participatory design have greatly enhanced exploratory analysis workflows for real scientists. To ensure the reproducibility of our work we have open-sourced our code on GitHub at https://github.com/pixlise/NestedFusion.


INTRODUCTION
In scientific data analysis, the initial exploratory phase of visualizing and conceptualizing the relevant empirical phenomena in a dataset is both essential for effective work and comparatively understudied in the context of scientific applications, where skipping such inductive exploration in favor of immediately applying known models for analysis is the de facto standard. However, recent work has shown how unanticipated or anomalous phenomena can often mislead such analysis, motivating a workflow that at least starts with purely empirical exploration of the data after making measurements, in order to form a more informed prior over the distribution of actual phenomena within a dataset before applying more rigorous scientific models, ensuring the chosen models are appropriate [28]. While common data-centric techniques of exploratory analysis such as dimensionality reduction visualization have proven very effective in many domains of scientific inquiry [2,3,8,13,14,16,17,19,22,27,29], in domains with multiple measurement apparatuses of different resolutions and scales, existing techniques can fail to model some of the phenomena we wish to discover. This is because the standard formalization for dimensionality reduction assumes a single dataset of measurements of identical shape, corresponding one-to-one with the set of objects, and patterns between objects, that the analysis aims to visualize. However, it is often the case that underlying phenomena are differentiated at levels that do not align perfectly with the measurement resolution of each apparatus [28]. Rather, there may be multiple methods of measurement which each elucidate different aspects of an underlying structure, but which have varying resolution scales and thus are sensitive to different properties of various aggregations of the structure.
One such domain where scientists require more powerful exploratory analysis tools is the work done by the PIXL Science team with the Mars Perseverance Rover at NASA (National Aeronautics and Space Administration). In service of the high-level goal of searching for signs of a history of life on Mars, scientists are interested in the fine-grain mineral structure of target locations on the Martian surface [7]. The Perseverance Rover carries two scientific instruments (among many) to assist in this task: the Planetary Instrument for X-ray Lithochemistry (PIXL) [1], which includes an X-ray fluorescence (XRF) spectrometer, and a Micro-Context Camera (MCC) for multi-spectral imaging. When observing a specific target location of geological interest, the rover uses both of these instruments to conduct two co-aligned scans, as shown in Figure 1. While both instruments scan the same physical location, their resolutions differ greatly: each scan point's single XRF spectrum corresponds to a patch of approximately 100 MCC imaging pixels. At the same time, each instrument elucidates different aspects of the underlying mineralogy of the target. While the spatial precision of each MCC pixel corresponds much more closely to individual homogeneous mineral grains, it lacks the depth of information needed to accurately differentiate minerals based on chemistry. On the other hand, each XRF spectrum produces a detailed quantified distribution of the chemical composition of the scan point, but the larger diameter of this point may encompass multiple grains of different minerals, producing an aggregate chemical distribution. The ultimate scientific question concerns the distribution of underlying minerals. While both measurements offer extremely powerful signals concerning this distribution, neither alone encompasses all the available information, leading to the need to model these different measurement scales together.
To tackle these significant scientific challenges, we present the following major contributions: (1) A novel problem formulation tailored to exploratory analysis of nested measurement datasets, which consist of irregularly overlapping measurements at multiple scales (Sec 3.1). This formulation is rooted in addressing the practical needs of PIXL scientists at NASA who analyze XRF and MCC data collected by the Mars Perseverance Rover.
(2) The Nested Fusion algorithm, a new model for latent analysis and dimensionality reduction for nested measurement datasets (Sec 3.2). This method is significantly more effective than alternatives, yielding latent encodings at a resolution far higher than what existing dimensionality reduction techniques can achieve. We evaluate the effectiveness of Nested Fusion both qualitatively, within the context of initial data exploration, and quantitatively, in data reconstruction fidelity. Nested Fusion outperforms the state of the art in dimensionality reduction for nested measurement datasets, providing more interpretable and practically useful results (Sec 4).
(3) Deployment of Nested Fusion in scientific practice within the PIXL team for the Mars Perseverance Rover, enabling scientifically meaningful visual interpretation and efficient discovery of cross-modal patterns (Sec 5). We analyze how Nested Fusion is utilized in practice and how it fits within the scientists' existing analytic workflows. To ensure reproducibility of our technique and findings, we have open-sourced it at https://github.com/pixlise/NestedFusion

BACKGROUND AND RELATED WORK
In this section, we introduce the scientific problem statement and give an overview of the dataset from the PIXL instrument on the Mars Perseverance Rover, define our formalization of nested measurement datasets, and review related work in scientific exploratory data analysis and dimensionality reduction.

Mars Perseverance PIXL Data
The PIXL instrument aims to measure the mineral structure of small rock samples (called targets) on the surface of Mars, contributing toward the larger inquiry into any potential evidence of a history of life on Mars. For each individual target on the Martian surface, multiple scans are taken. First is the MCC multi-spectral imaging camera, which takes a series of four images illuminated by specific wavelengths of near-visible light: Near-Infrared (NIR), Green, Blue, and Ultraviolet (UV). This produces a single color image for each target with 4 primary channels, as opposed to the standard 3-channel RGB, and is often analyzed using the 16 distinct ratios between them. Each image contains on average about 500,000 of these 16-channel pixels, spanning a region of approximately 100 square centimeters, with each pixel corresponding to a resolution of approximately 15 microns. At roughly the same time, a scan of the same target is taken with the PIXL instrument for X-ray spectroscopy. This instrument produces much more detailed quantitative data, consisting of a grid of X-ray fluorescence spectra which are quantified to represent the distribution of elemental weight percentages at each scan point; we call this distribution a quantification. Each scan can consist of between 1,000 and 10,000 individual spectra (depending on the particular shape of the target), covering a smaller region of approximately 30 square millimeters. Each scan point is measured with a beam diameter of 50-200 microns, thus corresponding to a region covering approximately 100 MCC pixels, as shown in Figure 1.
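The 16 ratio features per MCC pixel can be sketched as follows. The exact ratio set used in PIXL analysis is not specified above, so this sketch assumes all ordered pairs of the 4 channels (4 × 4 = 16, including self-ratios); the function name and epsilon guard are illustrative only.

```python
import numpy as np

def channel_ratios(pixel, eps=1e-9):
    """Compute all ordered pairwise ratios between the 4 spectral
    channels (NIR, Green, Blue, UV) of a single MCC pixel.
    4 channels x 4 channels = 16 ratio features per pixel."""
    pixel = np.asarray(pixel, dtype=float)
    # Outer division: entry (i, j) is channel_i / channel_j.
    ratios = pixel[:, None] / (pixel[None, :] + eps)
    return ratios.ravel()  # flatten to a 16-dimensional feature vector

# A hypothetical pixel with NIR, Green, Blue, UV intensities:
feats = channel_ratios([0.8, 0.4, 0.2, 0.1])
```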
At the time of writing, over the course of the Perseverance Rover's operation, 103 target locations have been scanned, producing a total of 295,602 52-dimensional quantified spectra (52 being the number of unique elements included in all quantifications), as well as 26,966,169 MCC pixels. However, not all scans include both data types, and since this work focuses on combining information from both measurements, we restrict our attention to a total of 103,005 scan points which each contain a single quantification as well as 100 corresponding MCC pixels.

Related Work
Previous work in collaboration with PIXL scientists has shown how data science techniques can form an essential component of their scientific workflow, focusing specifically on modeling anomalies and visualizing distinct empirical phenomena [28]. The present work focuses on the problem of initial visualization, and thus on dimensionality reduction as an effective technique for enabling such visualization of the high dimensional PIXL data.
Another conceptualization that can produce comparable visualizations is latent analysis, which takes a probabilistic, generally Bayesian, approach to the problem of learning low dimensional representations. These approaches mostly stem from the development of the variational autoencoder (VAE) [11], and different latent models have been introduced to handle many scientific problems [8,27,29], including in planetary science [19], among many other domains.

PROPOSED METHOD: NESTED FUSION
Grounded in understanding from previous work with PIXL scientists [28], our aim is to develop a method for visualizing and determining the distribution of mineral phenomena within each PIXL target, and to assist in their identification based on their relationship to the past history of targets. Focusing on targets where both XRF and MCC data are present and overlapping, we hope to enable the discovery of new patterns that neither instrument can differentiate independently. While scientific interpretation is the end goal, the specific interpretations (i.e., "we see a grain of olivine here or a potential aqueous intrusion there") enabled by the method are out of the scope of this work. We therefore introduce a precise formalized problem statement which aims to properly encode the scientific priors and goals of the problem, with specific consideration for the non-standard mixed-scale measurements present in PIXL data, while simultaneously laying the foundation for generalizing such methods to new domains. After introducing the problem formulation, we describe our proposed method, Nested Fusion, which solves this problem.

Problem Formalization of Nested Measurements
As Figure 1 shows, the nested hierarchical structure of PIXL data is not immediately amenable to standard data science techniques, barring some flattening operation that leads to over-aggregation and loss of resolution (see Joint Models in Section 3.3). Thus we introduce a formalization of nested measurement datasets, which we use to model this structure and subsequently analyze the data in a more natural manner, while also outlining precisely the requirements any other dataset must meet in order to apply the methods introduced in Section 3 in other domains. Table 1 summarizes the notation and terminology introduced in this section and used throughout this paper.

We recursively define a nested measurement dataset D as a tuple of two components, D = (X, N). Here X = {x_i}, i = 1 ... n, is a standard dataset of n independent and identically distributed samples of d-dimensional data, representing the measurements at some specific scale. N is what we define as the nested scale: a tuple (D', ν) of another nested measurement dataset D' = (X', N') together with a nesting function ν : X → 2^X', which maps each data point in X to the set of corresponding data points in X' that cover the same underlying physical and latent area. To terminate this regress there must be a final scale X_∅ which has no further nested scale, notated N = ∅. Having no further nested scale means that X_∅ is the highest resolution available in the nested measurement dataset, and so we refer to it as the maximum resolution latent scale, since our aim is to model latent structure at this maximum resolution.

Table 1: Notation and terminology used in this paper
  Nested Measurement Dataset (D): A class of dataset which combines multiple kinds of measurements that cover a common area.
  Data Scale (X): The set of measurements of a particular kind that define a data layer of a specific resolution.
  Nested Scale (N): A scale at a higher resolution, in which multiple measurements correspond to a single measurement at the lower-resolution scale.
  Nesting Function (ν): A function which maps a specific data point at a scale to the set of data points in the corresponding nested scale that cover the same physical space.
  Maximum Resolution Latent Scale (X_∅): The scale for which no further nested scales exist, defining the highest resolution available in the dataset and thus the resolution at which latent structure can be modeled.
  Latent Base Scale Correspondence (β): A function which maps a specific data point at any scale to the set of data points at the maximum resolution latent scale that cover the same space, as defined by repeated nesting.
  x_i ∈ X: A specific data point at some data scale X.
  x_i^∅ ∈ X_∅: A specific data point at the maximum resolution latent scale.
  z_i: The latent encoding corresponding to x_i^∅.
The key assumption is that all of the information at lower resolution scales supervenes on latent information at the maximum resolution. That is, there is some more basic structure underlying the dataset that is approximately modeled at the maximum resolution as an unobserved latent variable, where each sample x_i^∅ ∈ X_∅ is generated from a random process involving a latent value z_i with prior probability distribution p(z), producing a conditional distribution p(x^∅ | z) that we aim to learn. For samples at all other scales, with nesting function ν, we then define the β correspondence, which returns the set of all latents at the base maximum resolution that correspond to a sample x_i: β(x_i) = {z_i} if x_i ∈ X_∅, and β(x_i) = ⋃_{x' ∈ ν(x_i)} β(x') otherwise. The supervenience assumption can then be restated probabilistically: all lower resolution scale variables are generated from the conditional distributions p(x_i | β(x_i)), and thus are conditionally independent of measurements at any scale other than the maximum. This structure is outlined in the graphical model for the PIXL dataset in Figure 3.
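The recursive definition and the β correspondence can be made concrete with a small sketch. The class and function names here are hypothetical illustrations, not part of the Nested Fusion codebase.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class NestedDataset:
    """A nested measurement dataset D = (X, N).  X is a list of data
    points at this scale; the nested scale N is either absent (making
    this the maximum resolution latent scale X_0) or given by a child
    dataset plus a nesting function nu."""
    X: list
    child: Optional["NestedDataset"] = None
    nu: Optional[Callable] = None  # nesting function: x -> set of child points

def beta(dataset, x):
    """Latent base scale correspondence: the set of maximum-resolution
    points under x, obtained by repeated nesting."""
    if dataset.child is None:       # x already lives at the maximum resolution
        return {x}
    out = set()
    for xp in dataset.nu(x):        # recurse one scale down
        out |= beta(dataset.child, xp)
    return out

# Toy example: one low-resolution point "A" nests over high-resolution points 1 and 2.
high = NestedDataset(X=[1, 2])
low = NestedDataset(X=["A"], child=high, nu=lambda x: {1, 2})
```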
While seemingly abstract, this underlying structure and the supervenience assumption of a nested measurement dataset are in fact pervasive in the sciences [5,6]. The natural sciences in particular commonly share the physicalist reduction assumption (at least within a single domain) that any given composite object of study is fully reducible to the set of underlying physical objects of which it is composed [20,25]. This assumption necessitates that if multiple kinds of measurement apparatus measure an overlapping subject in time and space, then there must be some correspondence relation between the two measurement modalities. Furthermore, this assumption enables us to study the intersections between these different layers of composed abstraction, as each class of composite structure is often best observed using separate kinds of measurement that very often do not have perfectly aligned scale and resolution. More complex composite structures tend to exhibit additional complexity and depth (note the high dimensionality of the PIXL quantified spectra), at the expense of necessarily being more spatially diffuse, while higher resolution measurements may be possible at the expense of more limited depth.

The Nested Fusion Algorithm
The previous section describes the formalized problem of learning latent maximum resolution scale variables from nested data. One important aspect to note when introducing our solution is that the formulation of the latent variables at this scale is itself already a modeling approximation. In reality we expect fundamental structures within a domain to exist at finer scales than are directly accessible, and so we simply use the highest resolution available in any given nested measurement dataset as a proxy scale for a 'true' latent z. This lends support to the use of variational inference as a method to efficiently learn approximate distributions of z: we do not in general have strong enough priors about the structure and properties of a 'true' z to justify other methods, which carry significant computational and other drawbacks when compared to the widespread empirical success of variational auto-encoding models. Therefore the approach taken in this work, Nested Fusion, is a variational auto-encoder model [11] structured to work on nested measurement datasets.
Figure 2 describes Nested Fusion's architecture. Without loss of generality, we explain how the framework is applied to the PIXL data scenario presented in Figure 1. Specifically, we show how a scan point, consisting of both low-resolution elemental quantification values and nested high-resolution imaging pixels, is jointly used to learn high-resolution latent vectors. The latents are learned by optimizing both encoder and decoder models via stochastic variational inference [9] to maximally reconstruct the original scan points. The 1-, 2-, or 3-dimensional latents are then used for visualization by PIXL scientists.
First, let us consider the encoder step. For the encoder model, which estimates the conditional latent distribution given the data q(z | x), we must choose a class of distributions for the latent prior p(z) and specify the relevant class of distributions for the data type of each measurement scale. We focus on a basic prior model in which latents are standard normal (z ∼ N(0, I)), which we can compare against other methods of dimensionality reduction. However, it is important to note that other latent structures can be modeled, including mixtures of categorical and other distributions, as relevant to the visualization technique and kind of analysis being done.
The task for the encoder is then to take the nested structure D as input and output the reparameterized latent distribution parameters μ_i and σ_i for each x_i^∅ at the maximum resolution latent scale. In order to do this, we must choose a network architecture that can adequately handle the structure of D and/or perform transformations on D to make it compatible with the chosen encoder network. The approach taken by Nested Fusion is to convert the hierarchical set of heterogeneous data points into a single sequence of tokens that can be used as input to an encoder model. This is done by first applying a learned mapping T, a linear transformation for each data scale into a common high-dimensional token space (with dimensionality equal to the sum of the dimensionalities of all data modalities, to ensure no bottleneck at this stage), giving a common shape for the encoder sequence. A sequence is then built recursively: starting at the lowest resolution dataset X in D, for each data point x ∈ X we append its token to the sequence, followed by the recursively constructed sequence for the nested scales of that point (below, addition/summation notation represents sequence concatenation).

Seq(X) = Σ_{x ∈ X} ( [T(x)] + Seq(ν(x)) ),  with Seq(ν(x)) = [ ] when x is at the maximum resolution latent scale X_∅, and where + denotes sequence concatenation.
Once a sequence of tokens is generated, it is passed into a sequence-to-sequence encoder model, which outputs a sequence of corresponding estimated latent parameterization means and variances. Only the output positions actually corresponding to x_i^∅ inputs are then used to sample a latent from the reparameterized distribution z_i ∼ N(μ_i, σ_i).
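The recursive sequence construction can be sketched as follows. The per-scale linear token projections are omitted for brevity, and the (X, child, nu) representation of a nested dataset is an assumption for illustration, not the repository's actual data layout.

```python
def build_sequence(X, child=None, nu=None):
    """Flatten a nested measurement dataset into one token sequence.
    X: data points at the current (lowest-resolution) scale.
    child: the nested scale's own (child, nu) pair, or (None, None)
           if the nested scale is the maximum resolution latent scale;
           child is None when X itself is at maximum resolution.
    Returns (tokens, is_max_res).  Only positions where is_max_res is
    True would be used to sample latents z_i ~ N(mu_i, sigma_i)."""
    tokens, mask = [], []
    for x in X:
        tokens.append(x)                 # parent token precedes its nested sequence
        mask.append(child is None)
        if child is not None:
            grand_child, grand_nu = child
            t, m = build_sequence(sorted(nu(x)), grand_child, grand_nu)
            tokens += t
            mask += m
    return tokens, mask

# One XRF scan point "q" nested over two MCC pixels:
tokens, is_max_res = build_sequence(["q"], child=(None, None),
                                    nu=lambda x: {"p1", "p2"})
```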
For decoding, recall the conditional distribution for data points defined as p(x_i | β(x_i)). What is required for decoding is thus a unique model for each scale in D, where a model takes as input either a single latent, in the case of the maximum resolution scale, or a set of latents as defined by the correspondence set β. For the latent scale decoder, a simple multi-layer perceptron is an appropriate architecture, while for the higher levels, which must decode sets of latents, we use transformers [26], among the best examples of contemporary models that effectively encode nested structures (grammar, in the case of language). Importantly, in order to prevent the potential pitfall of the model merely using positional information to encode aggregate information not corresponding to the actual specific latent at each point, our approach uses a transformer without positional embeddings in this step. Because such a transformer is order invariant, this ensures that the full distribution of latents, rather than a few arbitrarily picked-out latents, properly encodes lower resolution aggregate information.
Finally, given the encoder and decoder models as well as the latent prior distributions, the models are trained using stochastic variational inference on the evidence lower bound, as is standard for VAE-based architectures [11], implemented in our case using the probabilistic programming framework Pyro [4].
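The per-point objective is the standard VAE evidence lower bound. As a self-contained sketch (numpy rather than Pyro, and not the repository's implementation), the KL term for a diagonal Gaussian posterior against the N(0, I) prior has the familiar closed form:

```python
import numpy as np

def gaussian_kl(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims.
    Closed form: 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - np.log(sigma**2))

def elbo(log_likelihood, mu, sigma):
    """Evidence lower bound: reconstruction log-likelihood minus the
    KL divergence from the latent prior (maximized during training)."""
    return log_likelihood - gaussian_kl(mu, sigma)
```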
To evaluate Nested Fusion, we test model performance on the real, large-scale Mars Perseverance PIXL dataset introduced in Section 2.1, comparing against existing dimensionality reduction and latent analysis techniques. As analysis of this unique dataset representing the frontier of Mars exploration is the raison d'être for this work as a whole, we specifically focus on evaluation with direct relevance to the scientific goals and capabilities of scientists actively working at NASA JPL and around the globe on this data.
First, in order to utilize Nested Fusion we must define the relevant nested measurement dataset formulation for the PIXL data, which we define as D_PIXL = (X_XRF, ((X_MCC, ∅), ν)). Here X_XRF consists of 103,005 quantified spectra, represented as 52-dimensional non-negative real-valued vectors whose elements are the elemental weight percentage values produced from PIXL XRF scan points. X_MCC is the set of 1,983,506 MCC multispectral imaging pixels, which are 16-dimensional non-negative real-valued vectors. Finally, ν is the nesting function from XRF scan points to corresponding pixels. It is constructed using the known range of XRF beam diameters of the PIXL instrument, approximately 150 microns, together with the calibrated location alignment of MCC images with XRF scan points. This alignment provides a shared coordinate system in which we can calculate the physical distance between scan-point centroids and MCC pixels. We therefore define the nesting function to select all pixels within 75 microns of an XRF scan point, which results in the 100-pixel aggregations previously discussed: ν(x) = { x' ∈ X_MCC : dist(x, x') < 75 μm }.
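The distance-based nesting function can be sketched directly. This assumes scan-point centroids and pixel coordinates are already expressed in the shared calibrated coordinate frame (units: microns); the function name and coordinate values are illustrative.

```python
import numpy as np

def nesting_function(scan_xy, pixel_xy, radius=75.0):
    """Return the indices of MCC pixels within `radius` microns of an
    XRF scan-point centroid, i.e. nu(x) for one scan point."""
    d = np.linalg.norm(np.asarray(pixel_xy, float) - np.asarray(scan_xy, float),
                       axis=1)
    return np.nonzero(d < radius)[0]

# Hypothetical coordinates: three pixels, two within the 75-micron beam radius.
idx = nesting_function((0.0, 0.0), [(10.0, 10.0), (50.0, 50.0), (100.0, 0.0)])
```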

Comparing with Alternative Models
To demonstrate the effectiveness of Nested Fusion, we compare it with alternative dimensionality reduction models that can combine both scales of data. Since this problem is non-standard, we must introduce the set of alternative model classes that allows existing methods to be applied to our problem. We categorize these models into three types based on how they handle the nested structure of the PIXL nested measurement dataset: Nested Fusion (our method), Concatenative Models, and Joint Models. We describe these three classes of models in Figure 3 using the language of Bayesian graphical models, which illustrates how these classes encompass a full taxonomy of problem conceptualizations for nested measurement datasets. Within each of these classes, any particular model type (e.g., UMAP or VAE) can be used. (Note that the pixel count given earlier, 1,983,506, is smaller than one would expect given that each scan point with a quantification covers an area of 100 pixels; in reality many of these areas overlap, meaning the same pixel can be included in multiple different scan points. Our formalization of nested measurement datasets allows this without issue, and in fact it is preferable to strict partitioning, as we can better model the actual resolution of dependency for each measurement. The only complication occurs when converting back into physical space, as with the color plot in Figure 1. We address this by simply averaging the decoded pixel-level inferences for overlapping pixels; introducing more sophisticated disaggregation techniques is a promising direction for future work.)

For our comparisons we took both alternative modeling frameworks and trained three representative models for each. First, representing the most common approach to dimensionality reduction used ubiquitously in practice, is Principal Component Analysis (PCA). To represent the state of the art in dimensionality reduction we used UMAP [15] over t-SNE [24], as it provides state-of-the-art performance, has a well documented history of applications in science, is among the techniques least sensitive to hyperparameters, and is much more computationally efficient at our scale of data. Finally, we also trained a variational autoencoder to represent the most standard approach to generative latent analysis. Since Nested Fusion and the variational autoencoder are agnostic to the specific neural network sizes and architectures used, we trained multiple networks using simple multi-layer perceptron models (with the exception of a transformer encoder for the Nested Fusion decoding step, as described in Section 3), with hidden layer sizes from 64 to 256 and between 4 and 16 hidden layers, and selected the best-performing models at each latent dimensionality. Nested Fusion's open-source repository provides the pretrained tested models at https://github.com/pixlise/NestedFusion. Together these methods cover the most common latent analysis and dimensionality reduction techniques used in practice, including both parametric and non-parametric methods. Furthermore, as PIXL scientists are the ultimate users who visualize the latents in 1, 2, and 3 dimensions, we compare Nested Fusion with these alternatives at those dimensionalities.
Joint Models. The first class of alternative models are joint models, which attempt to model the joint distribution of a low resolution data point and its entire corresponding nested scales in a single latent. For the PIXL dataset, this framework tries to find a single latent z_i for each XRF scan point, jointly encoding the pair (x_i, ν(x_i)).

Concatenative Models. The other class of models considered are concatenative models, where each high resolution data point is used as the latent scale, and the lower resolution corresponding measurements are simply concatenated onto the high resolution sample vector. For PIXL, this means taking each XRF scan quantification, duplicating it, concatenating it onto each individual MCC pixel, and using the result to learn a high resolution latent.
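The concatenative construction described above can be sketched as duplicating each scan point's 52-dimensional quantification onto each of its nested 16-dimensional pixels. The function name and the list-based representation of ν are illustrative assumptions.

```python
import numpy as np

def concatenative_dataset(quants, nu, pixels):
    """Build the concatenative model's input: one 68-dimensional row
    (16 pixel channels + 52 quantified elements) per nested pixel.
    quants: (n_scan, 52) array; pixels: (n_pix, 16) array;
    nu: list mapping scan-point index -> indices of its nested pixels."""
    rows = []
    for i, pix_idx in enumerate(nu):
        for j in pix_idx:
            # Duplicate the scan-level quantification onto each pixel.
            rows.append(np.concatenate([pixels[j], quants[i]]))
    return np.stack(rows)

# Toy shapes: 2 scan points, 3 pixels total.
quants = np.zeros((2, 52))
pixels = np.zeros((3, 16))
data = concatenative_dataset(quants, [[0, 1], [2]], pixels)
```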

EVALUATION: NESTED FUSION EFFECTIVENESS

Conceptual Drawbacks of Alternative Methods Compared to Nested Fusion
Despite covering the full set of possible alternative approaches (given the nested measurement dataset framework), each of these method classes has substantial conceptual drawbacks, illustrated in Figure 4. A joint model has a much more difficult encoding task, in which each latent value is overloaded with encoding the whole set ν(x_i), making fidelity at low latent dimensionality very difficult. Furthermore, it produces a latent only at the lowest possible resolution, the exact opposite of the high resolution latents in Nested Fusion. Concatenative models can perform somewhat better, as they produce latents at a similarly high resolution to Nested Fusion. However, the concatenative method of combining layers erases all scale contextualization of each high resolution data point: encoders and decoders do not have access to the more complex distributional information within each nesting scale, which can affect the accuracy of the final low resolution estimates when such information is important. For instance, consider a case where two scan points include the same kinds of minerals but in different proportions. This affects the concatenated input vectors in such a way that any concatenative model must necessarily produce different embeddings, even for the exact same kind of mineral! This false encoding of confounding distributional information at the individual scale is inextricable from the concatenative method. Since Nested Fusion does have access to this distributional information in its encoder and decoder, it can in principle learn something close to a 'true embedding' that the concatenative model strictly cannot. Therefore, given these conceptual drawbacks of the entire range of alternative models, we have reason to prefer Nested Fusion based on our prior and theoretical understanding of what the different techniques can learn in principle.
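The proportion confound can be seen directly in a toy example (all values hypothetical): two identical pure-mineral pixel measurements, nested under scan points whose aggregate chemistry differs, receive different concatenated inputs, so any deterministic concatenative encoder must embed the same mineral differently.

```python
import numpy as np

# Hypothetical example: the exact same pure-mineral pixel measurement
# occurs in two different scan points whose aggregate chemistry differs.
pixel_in_a = np.array([0.3, 0.1, 0.2, 0.4])   # pixel under scan point A
pixel_in_b = np.array([0.3, 0.1, 0.2, 0.4])   # identical pixel under scan point B
quant_a = np.array([0.7, 0.3])                # A: 70/30 mineral proportions
quant_b = np.array([0.5, 0.5])                # B: 50/50 mineral proportions

# Concatenative models duplicate the scan-level quantification onto
# each pixel, so the two inputs for the same mineral differ.
input_a = np.concatenate([pixel_in_a, quant_a])
input_b = np.concatenate([pixel_in_b, quant_b])

identical_pixels = np.array_equal(pixel_in_a, pixel_in_b)
identical_inputs = np.array_equal(input_a, input_b)
```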

Qualitative Evaluation
It is important to restate that the success or failure of any of the presented latent analysis and dimensionality reduction techniques is determined entirely within the context of their actual use, which for the purposes of this paper is their application within PIXL science. Previous work has outlined the basic structure of how machine learning techniques have been successfully applied within the PIXL science team, by enabling an iterative semantic phenomena modeling process [28] that helps scientists map out the space of considerations before continuing with standard domain modeling. Therefore we begin our evaluation of the different methods of latent analysis at the same point that PIXL scientists begin their analysis: by directly visualizing the resultant latent distributions produced by each method as two dimensional heatmaps, in order to discover the distinct phenomena to consider in later modeling. Figure 5 shows the output of each method applied to the Dourbes target from Figure 1. Specifically, in such a two dimensional heatmap of the latents, scientists expect to see a small number of distinguishable regularities, which can either be regions visualized in the heatmap as distinct areas of higher density in bright green, or separable clusters which need not be high density but must otherwise be identifiable as standalone features to consider.
In Figure 5, notice how all of the joint methods (right column) learn a comparatively small set of regularities, each showing only three distinct modes. While this regularity and differentiability is certainly a positive, we know from previous authoritative analysis on this specific target [23] that more than three relevant phenomena must be distinguished, so we have reasonably high confidence that these representations are overly abstracting. The high resolution methods (left column) are more varied. Concatenative VAE, like the joint models, produces three primary clusters, while concatenative PCA encodes a continuous global structure with limited local differentiation, which in this context makes mineral identification much more difficult. Concatenative UMAP produces an extremely complex distribution with no consistent high-density regions. Like a Rorschach inkblot, such complexity cannot serve as a reliable basis for building trustworthy shared interpretations between scientists focused on finding specific, repeatable, and understandable regularities. Indeed, the UMAP visualization produced in this context is perhaps the least scientifically helpful of all the options for PIXL scientists working on mineral identifications. Finally, Nested Fusion (top left) produces the most distinguishable structure: two large high-density regions on the left (each clearly composed of a mixture of multiple overlapping but non-identical modes), accompanied by two lower-density clusters on the right and another on the left. The distribution produced by Nested Fusion matches the scientific priors much more closely, with a reasonable number (more than three and fewer than a few hundred) of identifiable regularities, likely corresponding to minerals, clearly visible.
To further explore this effect, we compare the latent sub-distributions of the highest performing methods (UMAP and Nested Fusion) when selecting known mineral grains, to assess how reliably the latent space can be used to identify minerals. Based on existing well-analyzed data for the Dourbes target [23], we compared methods on how well they could differentiate known distinct minerals. In Figure 6, the red region corresponds to known olivine while the green region corresponds to known pyroxene, two highly distinct mineral types present at the Dourbes target. We then select the sets of latent samples corresponding to these two spatial regions in the dataset and compare their latent sub-distributions. What we want to see is a high degree of differentiability between the distributions of these two classes of mineral. Using the Wasserstein distance metric [18] for empirical distributions, we found a distance of 1.416 for Nested Fusion versus 1.057 for UMAP, so Nested Fusion performs roughly 34% better on this metric of mineral differentiability. Owing to the diffuse structure of the UMAP embedding compared to the highly dense structure of Nested Fusion, the UMAP space was less able to form distinct modes for different minerals, which is the primary goal of utilizing dimensionality reduction in this application context.
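The separability comparison above can be sketched as follows, using synthetic latent samples (not the paper's actual encodings) and SciPy's one-dimensional Wasserstein distance for simplicity; the paper's two-dimensional latents would require a multivariate variant (e.g. `scipy.stats.wasserstein_distance_nd` in recent SciPy versions, or the POT library).

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(42)

# Synthetic stand-ins for latent samples selected from two mineral regions.
# Means and scales are assumed purely for illustration.
olivine_latents = rng.normal(loc=0.0, scale=0.5, size=500)
pyroxene_latents = rng.normal(loc=2.0, scale=0.5, size=500)
overlapping_latents = rng.normal(loc=0.2, scale=0.5, size=500)

# Larger Wasserstein distance between sub-distributions means the two
# mineral classes occupy more clearly separated modes in latent space.
d_separated = wasserstein_distance(olivine_latents, pyroxene_latents)
d_overlap = wasserstein_distance(olivine_latents, overlapping_latents)
print(d_separated > d_overlap)  # True: well-separated modes score higher
```

This mirrors the evaluation logic in the text: the method whose latent space places distinct minerals in more distant modes (here, the separated pair) earns the higher differentiability score.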
These results show how Nested Fusion produces a distribution more effective at identifying and representing distinct minerals or other phenomena which aligns precisely with what PIXL scientists hope to achieve in the scientific workflow of exploratory analysis, showing qualitatively the clear superiority of Nested Fusion to the alternative methods in assisting effective science.

Quantitative Evaluation
Besides the qualitative properties of the distributions that make them practically scientifically useful, PIXL scientists also require that the latent models be trustworthy enough to retain most of the meaningful information present in the underlying data; since we do not know a priori what is or is not meaningful, a representation must retain as much information as possible about the original data in order to reconstruct it completely. Good fidelity is thus a necessary but not sufficient condition for effective utilization, particularly the fidelity of the quantifications, which scientists trust as more authoritative when grounding mineral identification. We therefore compare Nested Fusion with the alternative models using reconstruction fidelity, a standard metric for evaluating autoencoding models, to quantify how much information is preserved in the latent encodings. For each model, we calculate the coefficient of determination R² for both the MCC imaging and the XRF quantification reconstructions in Table 2.
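The fidelity metric itself is straightforward to sketch. In this minimal illustration (all data is synthetic, and a one-component PCA stands in for any encoder/decoder pair; it is not the paper's model), data is encoded to a low-dimensional latent, decoded back, and the reconstruction is scored with the coefficient of determination R².

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic "measurement" matrix: 200 samples x 5 channels driven by a
# hidden 1-D factor plus small noise (mixing weights chosen arbitrarily).
latent_truth = rng.normal(size=(200, 1))
weights = np.array([[1.0, -0.5, 2.0, 0.8, -1.2]])
X = latent_truth @ weights + 0.05 * rng.normal(size=(200, 5))

# Encode to a 1-D latent and decode back; any autoencoding model fits here.
pca = PCA(n_components=1).fit(X)
X_hat = pca.inverse_transform(pca.transform(X))

# R^2 near 1.0 means the latent retains nearly all information in X.
r2 = r2_score(X, X_hat)
print(r2 > 0.9)  # True for this low-noise synthetic example
```

In the paper's setting the same score is computed once per measurement layer (imaging and quantification), so a model can be faithful on one layer while losing information on the other.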
Our results show that Nested Fusion significantly outperforms all joint models (joint VAE, joint UMAP, and joint PCA) at each reduced latent dimensionality used by PIXL scientists. This is expected, as explained in Section 4.1: the same-dimensional latent values are tasked with a much greater amount of encoding and would thus be expected to perform worse at the low dimensionalities tested, and it confirms the observation from the qualitative evaluation that important information is likely being lost in the encoding. Concatenative models, however, tend to perform relatively better on these metrics. Among them, concatenative PCA performs worst across all metrics, which is not surprising since PCA is a linear model with limited modeling capacity. Concatenative VAE and UMAP both reconstruct the imaging layer about as effectively as Nested Fusion.
However, the imaging layer contributes significantly less towards building trust for scientific interpretations as a standalone measurement; it is most effective only when augmented with the more solid source of scientific semantic grounding in the XRF quantifications. Considering the quantification reconstructions, we find, as predicted in Section 4.1, that Nested Fusion significantly outperforms concatenative VAE in reconstructing the XRF quantification layer. Finally, concatenative UMAP's quantification reconstruction fidelity is lower than but comparable to Nested Fusion's; however, given UMAP's inability to use this accuracy to practically assist in scientific exploration, its reconstruction performance is essentially irrelevant.
In summary, Nested Fusion attains higher reconstruction fidelity than the state of the art in dimensionality reduction and latent modeling, particularly on the XRF quantification reconstructions, the crucial metric used by PIXL scientists when assessing the scientific trustworthiness of methods, while producing substantially more useful latent codes for scientific analysis.

SCIENTIFIC DEPLOYMENT AND IMPACT
The ultimate importance of Nested Fusion is not found in its evaluation metrics but in its ability to have scientific impact by assisting PIXL scientists in visualizing and exploring combinations of datasets they simply could not easily or efficiently explore otherwise. Towards this end, we deployed Nested Fusion in multiple capacities within the PIXL science team. The primary way scientists have used Nested Fusion thus far is through its standalone implementation, which is now open source at https://github.com/pixlise/NestedFusion. This implementation works directly on existing and continuously incoming PIXL data, and pretrained models are also available. The pretrained models cover multiple latent dimensionalities, and some include latent categorical class assignments: their latent prior combines a latent class sampled from a Dirichlet prior with a regular continuous latent code vector, allowing the model to automatically differentiate seemingly categorically distinct regions.
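The hybrid prior described above can be sketched generatively. This is a hedged illustration of the sampling structure only (class counts, code dimension, and the concentration parameter are assumed, not taken from the released models): each draw pairs a categorical class, whose probabilities come from a Dirichlet prior, with a continuous Gaussian code.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative hyperparameters (not the deployed models' actual values).
n_classes = 4    # number of latent categorical classes
code_dim = 2     # dimensionality of the continuous latent code
alpha = 1.0      # symmetric Dirichlet concentration

def sample_latent():
    """Draw one latent: a class from a Dirichlet-categorical plus a code vector."""
    class_probs = rng.dirichlet(alpha * np.ones(n_classes))  # Dirichlet prior
    class_id = rng.choice(n_classes, p=class_probs)          # categorical class
    code = rng.normal(size=code_dim)                         # continuous code
    return class_id, code

class_id, code = sample_latent()
print(0 <= class_id < n_classes, code.shape)
```

The categorical component gives the model a mechanism for hard region assignments, while the continuous code preserves within-class variation.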
With this implementation, PIXL scientists can easily visualize the distribution of multiple kinds of latent encodings across many targets at once. PIXL scientists choose to visualize these distributions in a number of ways, including direct distributions in latent space (see Figure 5) as well as various mappings into color overlaid on the target image, such as the plot in Figure 1. Together, these two methods allow scientists to see both abstract and spatial patterns and regularities in the data. These visualization techniques help PIXL scientists quickly build heuristic understandings of the distribution of empirical phenomena present in a single target, as well as of commonalities in phenomena across multiple targets.
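The latent-space view reduces to a dense 2-D histogram of latent samples, as in the 300-bin plots of Figure 5. A minimal sketch with synthetic latents (not real PIXL encodings):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 2-D latent samples drawn from two "modes" (means and scales assumed).
latents = np.vstack([
    rng.normal(loc=[-1.0, 0.0], scale=0.3, size=(1000, 2)),
    rng.normal(loc=[1.5, 0.5], scale=0.3, size=(1000, 2)),
])

# Bin into a 300 x 300 grid; every sample falls inside the auto-computed range.
heatmap, xedges, yedges = np.histogram2d(latents[:, 0], latents[:, 1], bins=300)

# In practice this array would be rendered (e.g. with matplotlib's imshow);
# here we just confirm the binning accounts for all samples.
print(heatmap.shape, int(heatmap.sum()))
```

Scientists then read such heatmaps for a small number of high-density modes, each a candidate phenomenon to carry into later modeling.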
Through participatory design sessions over six months with nearly a dozen scientists, we discovered a primary (though not exclusive) workflow that Nested Fusion enables within the context of exploratory data analysis. When a new dataset is generated, the process by which PIXL scientists come to a consensus on its mineral composition is highly iterative, involving bringing forward various hypotheses and then devising ways to test them against the data. The mechanics of testing a hypothesis can be complex and difficult, so a method that provides a more informed starting position for this iteration can greatly increase the efficiency of the whole process, saving a large amount of extremely valuable and limited time. By forming a latent space over the whole history of PIXL data, with an encoder that can efficiently process new data before having to retrain, new data can be quickly visualized and broken down into a few key regularities, which can be compared against historical precedent for regions or even individual grains that bear a strong resemblance to those in the new dataset. This helps form a better initial assessment of the minerals present at a target and thus substantially speeds up the overall identification process. It transforms the workflow of initial exploratory analysis, which historically required a roughly 10-person team of spectroscopists approximately 21 days of collaboration to reach an initial determination of minerals, into one in which a single scientist can instantly generate a latent distribution and, through refinement, produce an identification of comparable quality in a matter of hours.
We found that the combination of Nested Fusion's high fidelity and its computational efficiency at inference time was essential, compared to existing or alternative models, for achieving buy-in from scientists. Furthermore, we found that non-parametric alternatives such as UMAP proved ineffective despite competitive fidelity: they cannot efficiently form distributions for new data, and they produce distributions that are difficult or impossible to reliably interpret when looking to understand specific phenomena, and thus they do not solve the scientific workflow problem that Nested Fusion addresses.
Nested Fusion provides a fundamentally new way for PIXL scientists to quickly visualize distributions of phenomena that span multiple measurement types and scales, and thus to explore new data more efficiently and effectively than was previously possible. This has provided a lesson for any interested applied data scientist: increasing the alignment between the machine learning problem statement and the scientific problem ontology, in this case by more accurately modeling multi-scale relationships, is an absolutely essential component of achieving genuine impact with these tools. We therefore hope that future work will continue to develop ways to improve the very frame from which we pose data science problems, just as much as the methods for how we solve them, so that we can not only do better data science, but do great science.

Figure 2 :
Figure 2: Model architecture and data processing pipeline for Nested Fusion as applied to PIXL data. High resolution latent vectors are encoded given a scan point containing an XRF quantification vector and a collection of MCC imaging pixels.

Figure 3 :
Figure 3: Plate notation for the graphical models representing different latent variable formulations for the PIXL MCC nested measurement dataset. From left to right: (left) Nested Fusion, where the latent corresponds to the maximum resolution data scale and informs higher level measurements through aggregation functions; (center) the concatenative model, where a latent at the maximum resolution scale affects higher level corresponding measurements not in aggregate but independently; and (right) the joint model, where a latent exists at low resolution and determines the whole distribution of all high resolution measurements.

Figure 4 :
Figure 4: Comparison between alternate models and their relative downsides. The left column shows the dependence mappings from the learned latent spaces to the two measurement spaces for Nested Fusion. The center column shows how a joint encoding learns a lower resolution representation which overloads the decoder for high resolution imaging data. The right column shows how a concatenative model ignores the full spatial context of the low resolution measurements by only forming a mapping from a single high resolution point.

Figure 5 :
Figure 5: Comparison of 2D latent distributions from different methods applied to the Dourbes target (RGB map of MCC image shown in top right). Axes are unitless latent values. High resolution models (left column: Nested Fusion and concatenative models) are displayed with 300 bins across each axis, while the low resolution joint models (right column) have 200 bins, due to the differing number of samples in each model type.

Figure 6 :
Figure 6: Comparison of Nested Fusion and concatenative UMAP with latent dimension 2 in differentiating distinct minerals in the Dourbes target. In green is shown a region of the target identified as pyroxene, while in red is a region identified as olivine, based on existing analysis [23]. Comparing the latent sub-distributions of these two samples, Nested Fusion produces a distribution with a greater degree of separation between the different minerals.

Table 2 :
Model reconstruction fidelity, measured as coefficient of determination (R²) values for both the MCC imaging layer (denoted R²_MCC) and the XRF quantification layer (denoted R²_XRF), for the latent dimensions of 1, 2, and 3 needed by PIXL scientists. Nested Fusion outperforms all models across all latent dimensions on R²_XRF (highlighted in bold).