A Fast Multi-objective Evolutionary Approach for Designing Large-Scale Optical Mode Sorter

Spatial mode division de-multiplexing of optical signals has many real-world applications, such as quantum computing and both classical and quantum optical communication. In this context, it is crucial to develop devices able to efficiently sort optical signals according to the optical mode they belong to and route them on different paths. Depending on the modes selected, this problem can be very hard to tackle. Recently, researchers have proposed using multi-objective evolutionary algorithms (MOEAs), and NSGA-II in particular, combined with Linkage Learning (LL) to automate the process of designing mode sorters. However, given the very large scale of the problem, the existing evolutionary-based solutions have a very slow convergence rate. In this paper, we propose a novel approach for mode sorter design that combines (1) stochastic linkage learning, (2) the adaptive geometry estimation-based MOEA (AGE-MOEA-II), and (3) an adaptive mutation operator. Our experiments with two and three objectives (beams) show that our approach is faster (better convergence rate) and produces better mode sorters (closer to the ideal solutions) than the state-of-the-art approach. A direct comparison with the vanilla NSGA-II and AGE-MOEA-II further confirms the importance of adopting LL in this domain.


INTRODUCTION
Our modern world heavily relies on optical communications. Simply put, it involves transmitting information from one place to another using light [1]. This process can be divided into three main phases: modulation, transport, and detection. Modulation involves encoding bits of information (or quantum bits) into the properties of an optical beam. The light must then be transported from one place to another, either through free propagation or guided by an optical fiber. Finally, the beam is detected to recover the information.
In theory, any property of an optical beam could be used to encode information. Besides amplitude, changes in phase, polarization, or wavelength of the carrier light can be used, but this requires a device able to discriminate optical beams based on that property. This requirement has restricted the use of spatial degrees of freedom in communication systems, which means encoding information in the shape of an optical beam. Distinguishing beams based on their shapes is a difficult problem [35,48], especially when it needs to be done on a miniaturized device [42]. Additionally, the beam shape must also remain unchanged during propagation, meaning it needs to be a mode. Encoding information into modes offers several advantages, including inherent resistance to noise [5], the capability to encode quantum information [17,33], and, in theory, an infinite number of modes to serve as the alphabet [12].
A device that distinguishes beams based on their mode is a device that routes every mode to a different output. This allows the encoded information to be recovered by placing a light detector at each output. A mode sorter can be realized by designing a refractive index distribution that causes each mode to propagate differently, ensuring that the energy of each mode can be efficiently coupled to a specific detector. Designing such devices is an inverse problem where the optimal refractive index distribution must be determined to perform different tasks on different inputs simultaneously.
To tackle this problem, researchers have used evolutionary algorithms [9] to evolve devices (or mode sorters) that can effectively demultiplex different modes (or light signals). In particular, Di Domenico et al. [9] used L2-NSGA [23], which combines the multi-objective NSGA-II [8] with linkage learning (LL) [41]. The search objectives to optimize are the distances between (1) the theoretical locations where the different modes should be redirected (target outputs) and (2) the actual locations where the device redirects the modes. L2-NSGA uses Pareto ranking to select and evolve better devices. Linkage learning, instead, is applied to identify and preserve linkage structures, i.e., features of the devices that must be replicated altogether into the offspring.
While prior studies show that L2-NSGA is effective at designing Pareto-efficient two-mode and three-mode devices, it requires many fitness evaluations to converge. Di Domenico et al. [9] reported that L2-NSGA requires a large population of 1000 devices and thousands of generations to converge. This is due to the very large-scale nature of the problem, since a device for demultiplexing two beams has more than 12K decision variables.
In this paper, we aim to tackle this slow convergence rate by proposing a novel evolutionary approach, namely L2-AGE-MOEA, that extends the improved Adaptive Geometry Estimation-based MOEA (AGE-MOEA-II) [26]. Our approach combines AGE-MOEA-II with linkage learning methods to preserve the linkage structures of the best devices generated through the generations. L2-AGE-MOEA maintains the environmental selection methods of AGE-MOEA-II but uses linkage-based recombination to generate new candidate devices in each iteration. Furthermore, our approach uses an adaptive mutation operator that linearly decreases the mutation rate through the generations. It increases exploration at the beginning of the search (higher mutation rate) while promoting exploitation in the later generations (lower mutation rate).
We conducted an empirical study with four different types of devices with two and three Hermite-Gauss beams to demultiplex. These devices differ in the number of beams to demultiplex and the device's size. Our experimental results show that (1) L2-AGE-MOEA outperforms the state-of-the-art L2-NSGA used in prior studies, and (2) L2-AGE-MOEA outperforms AGE-MOEA-II (i.e., without linkage learning and the adaptive mutation operator). Furthermore, a direct comparison of NSGA-II and AGE-MOEA-II with the linkage learning-based variants (i.e., L2-AGE-MOEA and L2-NSGA) further confirms that linkage learning is a fundamental ingredient for designing Pareto-efficient mode sorters.
The remainder of the paper is organized as follows. Section 2 summarizes the preliminaries on mode sorters, multi-objective optimization, and AGE-MOEA-II. Section 3 introduces our approach, highlighting its novel aspects over AGE-MOEA-II. Section 4 describes the empirical study and discusses the achieved results. Section 5 concludes the paper and discusses potential future directions.

BACKGROUND
Multi-objective optimization has a wide range of applications in engineering problems [21]. A multi-objective problem requires optimizing multiple objectives simultaneously:

min F(x) = (f1(x), . . . , fm(x))

where x = (x1, . . . , xn) is the decision vector with n decision variables from the feasible region Ω ⊂ R^n; F : Ω → R^m is an objective vector with m conflicting objectives to optimize. A solution x dominates another solution y (x ≺ y) if fi(x) ≤ fi(y) for all i ∈ {1, . . . , m} and there exists at least one objective fj such that fj(x) < fj(y). A Pareto optimal solution x* ∈ Ω is a solution that is non-dominated by any other solution in Ω, i.e., there is no y ∈ Ω such that y ≺ x*. The set of all Pareto optimal solutions is called the Pareto optimal solution set (PS), while the corresponding objective vectors form the so-called Pareto front (PF).
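As a minimal illustration (in Python, while the paper's implementation is in Matlab/PlatEMO), the Pareto dominance relation above can be checked, for minimization, as follows:

```python
import numpy as np

def dominates(fa, fb):
    """Return True if objective vector fa Pareto-dominates fb
    (minimization): fa is no worse in every objective and strictly
    better in at least one."""
    fa, fb = np.asarray(fa), np.asarray(fb)
    return bool(np.all(fa <= fb) and np.any(fa < fb))
```

For example, (1, 2) dominates (2, 2), while two identical objective vectors do not dominate each other.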

Mode Sorter
Mode sorters (or demultiplexers) are devices that distinguish beams based on their mode and route them to different outputs. The most common mode of the electromagnetic wave equation is the Gaussian beam, due to both theoretical and practical considerations. Theoretically, Gaussian beams have a simple mathematical formulation characterized by a radially symmetrical intensity profile that follows a Gaussian function with respect to the distance from the beam axis [3]. In practice, most lasers produce Gaussian beams due to the design and construction of laser resonators [13,36].
We focus on the family of Hermite-Gauss (HG) beams [12], whose intensity distribution is described by Hermite polynomials of various orders, with Gaussian beams being the first order. HG beams are ideal in communication systems due to their ability to resist noise [5] and their orthogonality, which allows them to be perfectly distinguished [32], limiting the crosstalk between channels. Besides, their unlimited number does not restrict the size of the alphabet that can be used [12].
Inverse design of optical devices. Researchers have proposed various techniques to automatically design and produce optical devices with various functionalities, such as wavelength demultiplexers [28,34], polarization splitters [30], and mode converters [16]. These methods differ in the underlying optimization techniques (e.g., line search [34] or steepest descent method [28]), which typically start with a randomly initialized device with a continuous refractive index distribution that is iteratively updated until converging to a binary device structure [20]. Furthermore, these approaches can successfully design small-scale devices, with sizes in the order of a few λ², targeting a pre-defined fixed number of beams and channels (e.g., three channels in [34]).
Inverse design of large-scale mode sorters. In this work, we focus on a specific type of optical device, namely large-scale mode sorters. In other words, we want to design large-scale (i.e., > 100 λ² in size) devices that can distinguish and route the input beams based on their mode (mode sorter) and with different numbers of beams. To the best of our knowledge, the only prior work that focused on large-scale mode sorters is the work by Di Domenico et al. [9]. The authors formulated the problem of designing mode sorters for HG beams on a surface plasmon polariton platform as a multi-objective optimization problem, which can be tackled using evolutionary algorithms. In particular, a device D is encoded as an n × m binary matrix, where each entry D(i, j) corresponds to a 400 × 400 region of the device. D(i, j) = 1 indicates that the region in row i and column j contains a polymethyl methacrylate (PMMA) layer, which increases the refractive index locally, while D(i, j) = 0 indicates the absence of such a layer. The refractive index distribution that makes up the device (i.e., a grid of PMMA layers) affects the propagation of the various beams differently and determines where each beam is routed.
Given a device D, the goal is to route k HG beams (modes) towards k different output locations. Therefore, the objectives to optimize are the distances between the target locations and the actual locations where the HG beams are routed by the device D. In other words, let B0, . . . , Bk−1 be the input beams; the objectives to optimize are:

min fi(D) = dist(T(Bi), O(D, Bi)),   i = 0, . . . , k − 1

where T(Bi) indicates the theoretical (target) output signal for the beam Bi, while O(D, Bi) denotes the actual output produced by the device D when given as input the beam Bi. Note that the output is simulated using the beam propagation method (BPM) [9], considering the material's characteristics (e.g., refractive index) that will be used to implement the device. The problem is inherently multi-objective since the goal is to design a device that routes multiple beams. Devices that find the optimal structure to route one individual beam may introduce obstacles for the other beams routed through the same device.

The Improved Adaptive Geometry Estimation based MOEA (AGE-MOEA-II)
AGE-MOEA-II is a many-objective evolutionary algorithm (MOEA) that adapts the diversity and proximity metrics based on the shape (or geometry) of the non-dominated front. Algorithm 1 [26] reports the pseudo-code of AGE-MOEA-II. There are four main steps in AGE-MOEA-II [26]: (1) non-dominated sorting, (2) front normalization, (3) computing the curvature p of the first non-dominated front, and (4) applying the environmental selection based on proximity and diversity, both computed using the estimated value of p. In the following, we briefly describe these main steps.

2.2.1 Non-dominated sorting and front normalization. First, the parent and offspring populations are combined, and their solutions are ranked in subsequent fronts using the fast non-dominated sorting algorithm from NSGA-II [8] (line 5 in Algorithm 1). Then, all fronts are normalized such that the objectives are scaled into the interval [0; 1]^m, where m is the number of objectives (line 6 in Algorithm 1). The normalization is done using the hyperplane method of NSGA-III [2]. In particular, each front Fk is translated towards the origin of the axes using the ideal point and divided by the intercepts of the hyperplane passing through the extreme points of the front [26].

2.2.2 Front modeling. After normalization, AGE-MOEA-II uses the Newton-Raphson method to estimate the geometry, or curvature p, of the first non-dominated front (line 7 in Algorithm 1). The Newton-Raphson method is an iterative procedure used in AGE-MOEA-II [26], which provides a better (more accurate) estimation of the curvature [26] compared to the heuristic used in AGE-MOEA [25]. More specifically, the iterative Newton-Raphson formula for computing p is the following:

p_{t+1} = p_t − f(p_t) / f'(p_t)    (3)

where p_t and p_{t+1} are the approximated roots obtained at iterations t and t + 1, respectively. To initialize the procedure, the initial value p_0 is set to 1, and the formula is applied for four iterations. As reported by Panichella [26], four iterations are enough to achieve an accurate approximation of p with an error lower than 0.0001.
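The four-iteration procedure can be sketched as follows (a Python illustration; we assume, for concreteness, that p is the root of f(p) = Σᵢ zᵢᵖ − 1 for a representative point z on the normalized front, while the exact f used by AGE-MOEA-II is given in [26]):

```python
import numpy as np

def estimate_curvature(z, iterations=4, p0=1.0):
    """Four-iteration Newton-Raphson sketch for the front curvature p,
    assuming p is the root of f(p) = sum_i z_i**p - 1 for a
    representative point z on the normalized front."""
    z = np.asarray(z, dtype=float)
    z = z[z > 0]                         # avoid log(0) below
    p = p0                               # initial value p_0 = 1
    for _ in range(iterations):
        f = np.sum(z ** p) - 1.0         # f(p_t)
        df = np.sum(z ** p * np.log(z))  # f'(p_t)
        p = p - f / df                   # Newton-Raphson update
    return p
```

A point on a linear front (e.g., z = (0.5, 0.5)) yields p = 1, while a point on a spherical front yields p ≈ 2 after four iterations.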

2.2.3 Environmental selection. Given the curvature p computed using Equation 3, the non-dominated solutions are projected onto the estimated front manifold with curvature p. Then, each solution is assigned a survival score that combines the convergence and diversity scores. The convergence score measures how distant each solution is from the utopia/ideal point. The diversity score, instead, measures the distance between the non-dominated points on the manifold with curvature p. The diversity of each non-dominated solution corresponds to the geodesic distance, which generalizes the Euclidean distance to manifolds with curvature p ≠ 1. In particular, the geodesic distance between two points A and B is as follows:

geodesic(A, B) = ||A − C⊥||2 + ||B − C⊥||2

where C⊥ is the central point of the arc (curved line) connecting A and B; ||A − C⊥||2 is the Euclidean distance between A and C⊥; and ||B − C⊥||2 is the Euclidean distance between B and the middle point C⊥.
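A compact sketch of this geodesic approximation (Python for illustration; we assume C⊥ is obtained by radially projecting the Euclidean midpoint of A and B onto the manifold, one plausible reading of the definition above):

```python
import numpy as np

def geodesic_distance(a, b, p):
    """Approximate geodesic distance between two normalized points a
    and b on a front manifold with curvature p: project the Euclidean
    midpoint radially onto the manifold (point C_perp), then sum the
    straight segments a--C_perp and b--C_perp."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    mid = (a + b) / 2.0
    c_perp = mid / np.sum(mid ** p) ** (1.0 / p)  # radial projection onto the front
    return np.linalg.norm(a - c_perp) + np.linalg.norm(b - c_perp)
```

For p = 1 (a flat front) the midpoint already lies on the front, so the formula reduces to the plain Euclidean distance; for curved fronts it is strictly larger, as expected.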

APPROACH
This paper introduces a variant of AGE-MOEA-II, hereafter referred to as L2-AGE-MOEA, which combines the adaptive geometry-based Pareto front modeling with linkage learning.
Motivation. We decided to use AGE-MOEA-II as the starting point for our approach since it was designed for large-scale problems [26] and outperformed other state-of-the-art MOEAs, including LMEA [46], NSGA-III [7], and MOEA/D [24]. Besides, in our preliminary experiments, we observed that the Pareto fronts for our problem (designing mode sorters) have a hyperbolic geometry (curvature). Prior studies [22,25,26,31] showed that AGE-MOEA-II performs better than other state-of-the-art algorithms for problems with hyperbolic front shapes.
Furthermore, not all elements in a device (i.e., entries in the device matrix, as explained in Section 2) contribute to routing the beams towards the correct locations. The core idea for using linkage learning methods is that, through the generations, these methods are able to infer which entries in the matrix structure contribute to the objectives (i.e., routing the beams successfully) and, therefore, should be inherited by the children in the next generation.
Device encoding and evaluation. We use the same encoding scheme used by Di Domenico et al. [9] and discussed in Section 2. Therefore, a device D is encoded as an n × m binary matrix, where each entry D(i, j) corresponds to a 400 × 400 region of the device. Each entry D(i, j) takes values in {0, 1} and denotes the absence or presence of a PMMA layer in that location of the device. The solutions/devices are evaluated using the beam propagation method [9] and considering the following device and beam characteristics. We consider a device with a plasmonic refractive index for the air-silver interface n0 = 1.008856. As for the beams, we consider a plasmonic field with wavelength λ = 1.064 · 10⁻⁶/n0 and vacuum wavevector 2π/λ. The difference in refractive index is 1.049321 − n0 for the case of 40nm of PMMA. Finally, for the target fields, we consider the width w0 = 1.6 · 10⁻⁶ and a field displacement equal to 12 · 10⁶.
Pseudo-code. The pseudo-code of L2-AGE-MOEA is outlined in Algorithm 2. The algorithm starts in line 2 with a randomly generated population of N individuals. In our case, an individual is a randomly generated binary matrix whose size corresponds to the dimensions (number of rows and columns) of the device to design. Then, the solutions in the initial population are ranked using the fast non-dominated sorting algorithm (line 3 of Algorithm 2). The first front F1 is used to derive the linkage structures and extract the family of subsets (FOS). We detail this procedure in Section 3.1.
The linkage model is then used in lines 8-12 to generate the new offspring population. For each solution A in the parent population, a new offspring solution O is created by recombining A with a donor solution selected from the population using binary tournament selection. The recombination procedure is detailed in Section 3.2. It uses the linkage model to determine which genes (linkage structures) of the donor solution will be replicated altogether in the offspring solution.
The binary tournament selection operator promotes solutions with a better (lower) dominance rank or, at the same level of dominance rank, a better (larger) survival score (see Section 2.2.3). The offspring O is created using the REPRODUCTION routine in line 10 of Algorithm 2, which uses the FOS to decide which genes will be copied from the parent or the donor solution. This procedure is detailed in Section 3.2. Furthermore, the new solution O is further mutated using the adaptive mutation operator detailed in Section 3.3.
The remainder of Algorithm 2 is identical to the main loop of the original AGE-MOEA-II. In a nutshell, the parent and offspring populations are combined and sorted using the fast non-dominated sorting algorithm (line 13) and normalized (line 14). The first front F1 is then used to compute the curvature p using the Newton-Raphson method (Equation 3). Finally, in lines 17-22, the survival score (line 20) is computed iteratively for all fronts using the geodesic distance (line 19).
The loop between lines 17 and 24 inserts as many individuals into the new population as possible, based on their ranks and until reaching the population size N. L2-AGE-MOEA first selects the non-dominated solutions from F1; if |F1| < N, the loop selects the non-dominated solutions from F2, and so on. The loop ends when inserting all the solutions from the current front Fd would exceed the maximum population size, as shown in line 24. In this case, the algorithm selects the remaining solutions from the front Fd according to the descending order of their survival scores in lines 24-25.
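This fill-and-truncate loop can be sketched compactly (Python for illustration; the front and score bookkeeping of Algorithm 2 is simplified to plain lists and a score map):

```python
def environmental_selection(fronts, survival_score, N):
    """Insert whole fronts while they fit; truncate the first front
    that would overflow the population, keeping its solutions in
    descending order of survival score."""
    selected = []
    for front in fronts:
        if len(selected) + len(front) <= N:
            selected.extend(front)  # the whole front fits
        else:
            # critical front: keep only the best-scoring solutions
            ranked = sorted(front, key=lambda s: survival_score[s], reverse=True)
            selected.extend(ranked[:N - len(selected)])
            break
    return selected
```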
Finally, the linkage model is retrained in line 25 based on the first non-dominated front F1 previously computed in line 13, i.e., on the latest population.

Linkage learning with Clustering Methods
Linkage learning [41] is a category of techniques applied to infer linkage structures from a population, i.e., groups or clusters of promising decision variable values that contribute to achieving better (more optimal) results.The linkage structures are critical to identifying and preserving good partial solutions.
Various techniques have been proposed in the literature to derive linkage structures, such as Bayesian networks [27], the Dependency Structure Matrix [44], and hierarchical clustering [23,37]. In this work, we focus on hierarchical clustering since it has been used in the past for binary problems [23,37] and by Di Domenico et al. [9] for the same problem we address in this paper, i.e., mode sorter design.
We selected UPGMA [14] (unweighted pair group method with arithmetic mean) as the hierarchical clustering method and used the Hamming distance to measure the dissimilarities between genes (or decision variables). UPGMA is a bottom-up approach that iteratively merges decision variables into larger clusters. The output of the clustering is a tree structure (or dendrogram), as in the example of Figure 1. In the dendrogram, the leaf nodes are the decision variables, which are, in turn, clustered together based on their distance. In the example of Figure 1, the variables x3 and x4 are first clustered together, and the resulting cluster is then augmented with the variable closest to this cluster. This process is repeated until all decision variables are clustered in the root of the dendrogram. The internal nodes of the dendrogram are the linkage structures, also referred to as the Family Of Subsets (FOS) in the literature [9,23,37].
The dendrogram has n leaf nodes and n − 1 internal nodes, where n is the number of decision variables. The FOS is a set {Ω0, Ω1, . . . , Ωn−1} in which an entry Ωi is the subset of decision variables clustered together at the i-th level (or node) of the dendrogram.
The distance between two decision variables xi and xj measures how frequently they take the same value in the best individuals of a population. For example, if xi and xj always have the same values (either zero or one) in all individuals, their distance will be zero. The distance between pairs of decision variables is computed using the Hamming distance, a well-known distance function for binary vectors. This is also the distance recommended in the literature due to its low (linear) computational complexity [9,23].
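The clustering step above can be sketched with SciPy (a Python illustration; the paper's implementation is in Matlab, and the `linkage_dendrogram` helper name is ours):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

def linkage_dendrogram(population):
    """UPGMA dendrogram over decision variables for a binary population
    (rows = individuals, columns = variables): variables that take the
    same value across the selected individuals get a small Hamming
    distance and are merged first."""
    genes = np.asarray(population).T          # one row per decision variable
    dists = pdist(genes, metric="hamming")    # pairwise Hamming distances
    return linkage(dists, method="average")   # "average" linkage = UPGMA
```

The internal nodes of the returned dendrogram correspond to the FOS subsets Ωi described above.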

Linkage-based Reproduction
The FOS derived using the method described in Section 3.1 is then used during reproduction. In line 10 of Algorithm 2, a new solution O is obtained by recombining a solution A (parent) with a donor solution selected via binary tournament selection. Recombination is done by copying all decision variable values from the parent A to the offspring solution O. Then, randomly selected decision variable values are copied from the donor (mask) into the offspring solution based on the clusters in the FOS. To this aim, the dendrogram is cut at the distance level such that the number of clusters is equal to n/4, where n is the number of decision variables. Then, 50% of these clusters are randomly selected and used as recombination masks. Let C be the set of clusters obtained from the dendrogram with cut point n/4, and let C* ⊂ C be the 50% of randomly selected clusters from C; the new offspring solution O is created as follows:

O_i = Donor_i if x_i belongs to a cluster in C*, and O_i = A_i otherwise

where O_i, A_i, and Donor_i denote the i-th decision variable of O, A, and Donor, respectively.
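A minimal sketch of this recombination (Python for illustration; `clusters` stands for the set C extracted from the dendrogram cut):

```python
import numpy as np

def linkage_recombination(parent, donor, clusters, rng):
    """Linkage-based reproduction sketch: start from a copy of the
    parent, then overwrite 50% of the clusters (chosen at random) with
    the donor's values, so that linked variables are inherited together."""
    offspring = np.array(parent, copy=True)
    chosen = rng.sample(clusters, k=len(clusters) // 2)  # the subset C*
    for cluster in chosen:                               # one recombination mask
        for i in cluster:
            offspring[i] = donor[i]
    return offspring
```

The key design choice is that genes are transferred cluster by cluster rather than independently, so the linkage structures inferred in Section 3.1 survive recombination intact.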

Adaptive Mutation
Adaptive mutation operators have been widely investigated in the related literature and shown to be both theoretically [10,11] and empirically [43] useful in improving the performance of evolutionary algorithms. In this paper, we propose and investigate a simple linear adaptive mutation in L2-AGE-MOEA. We aim to consider more advanced adaptive schemes as part of our future agenda.
Given that we handle a binary problem, L2-AGE-MOEA uses the bit-flip mutation, which randomly flips some decision variable values from one to zero (or vice versa) based on the mutation rate ρ. Instead of using a constant mutation rate, we use a linearly decreasing mutation rate, such that it has a large value in the initial generations (for better exploration) and a lower value in the last generations (for better exploitation). More precisely, the mutation rate ρ_g at generation g is computed as follows:

ρ_g = ρ_init − (ρ_init − ρ_final) × NFE_g / NFE_max

where ρ_init is the initial mutation rate, while ρ_final is the final mutation rate (with ρ_init > ρ_final); NFE_g denotes the number of solution evaluations performed up to generation g, and NFE_max is the maximum number of solution evaluations (total search budget).
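The schedule and the bit-flip operator can be sketched as follows (a Python illustration of the scheme described above):

```python
import numpy as np

def mutation_rate(nfe, nfe_max, rho_init, rho_final):
    """Linearly decreasing mutation rate: rho_init at the start of the
    search and rho_final once the evaluation budget is exhausted."""
    return rho_init - (rho_init - rho_final) * nfe / nfe_max

def bit_flip(individual, rho, rng):
    """Flip each bit of a binary individual independently with
    probability rho."""
    flips = rng.random(individual.shape) < rho
    return np.where(flips, 1 - individual, individual)
```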

EMPIRICAL STUDY
To assess the performance of L2-AGE-MOEA in generating mode sorters, we performed a computational experiment with different types of sorters.We describe the setup of our study in Section 4.1 while the results are discussed in Section 4.2.

Empirical Setup
In this paper, we consider four formulations of the mode-sorter design problem. First, we consider two-mode and three-mode sorters that aim at demultiplexing two and three Hermite-Gauss beams, respectively. In the following, we refer to these sorters as Two-Beams and Three-Beams. The two-beams sorter is encoded as a 95×130 binary matrix, corresponding to 12350 decision variables in total. The three-beams sorter is encoded as a 90×180 binary matrix with 16200 decision variables in total.
In addition, we consider an additional objective that aims at simplifying the generated devices. As described in Section 2, the main objectives measure how well the device routes the input beams towards the target output locations. On top of these objectives, we aim to minimize the number of entries in the device (binary matrix) equal to one, which leads to removing the PMMA layers that are not useful for sorting the beams, simplifying the device structure.
This results in two additional problem formulations, hereafter referred to as TwoBeams+Structure and ThreeBeams+Structure. Given a device D, the objectives for the TwoBeams+Structure problem are the two beam-routing objectives plus:

min Σ_i Σ_j D(i, j)    (7)

with D(i, j) being the value of the device/matrix in row i and column j. Similarly, the ThreeBeams+Structure problem has four objectives: three for the beam output locations and one for the device's structure.
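The structure objective of Equation 7 reduces to a simple count (Python sketch; the function name is ours):

```python
import numpy as np

def structure_objective(device):
    """Extra objective of Eq. (7): the number of PMMA cells, i.e. the
    number of ones in the binary device matrix (to be minimized)."""
    return int(np.sum(device))
```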

Baselines selection. As baselines, we selected L2-NSGA [9], AGE-MOEA-II [26], and NSGA-II [8]. We selected L2-NSGA since it is the algorithm used by Di Domenico et al. [9] to generate mode sorters. Hence, our goal is to determine whether our approach outperforms the state-of-the-art algorithm used in the literature for the problem addressed in this paper. Our second baseline is AGE-MOEA-II [26] since it corresponds to the starting point we used to develop/design our approach. Finally, we considered NSGA-II as the third baseline because it does not use any linkage learning or front modeling techniques. Furthermore, this baseline allows us to directly compare NSGA-II with L2-NSGA. Note that Di Domenico et al. [9] only reported the results of L2-NSGA without a direct comparison with vanilla NSGA-II, i.e., without linkage learning.

Parameter values. For all MOEAs, we used the same parameter settings for the shared genetic operators [45]. For all MOEAs, we set the population size N = 100 devices and the number of generations G = 1000, corresponding to 100K solution evaluations.
For AGE-MOEA-II and NSGA-II, we use the uniform crossover [45]. With this operator, each gene in the offspring solution has a 50% probability of being selected from the first parent and a 50% probability of being selected from the second parent. The choice is made using a number r randomly sampled from the uniform distribution over [0, 1]. For all MOEAs, we used the crossover probability pc = 1.00 [39]. As the mutation operator, we use the bit-flip mutation with mutation rate ρ = 1/n, where n is the number of decision variables [39,45].
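For reference, the uniform crossover used by these two baselines amounts to (Python sketch):

```python
import numpy as np

def uniform_crossover(parent_a, parent_b, rng):
    """Uniform crossover: each offspring gene is taken from parent_a
    or parent_b with probability 0.5 each."""
    mask = rng.random(parent_a.shape) < 0.5
    return np.where(mask, parent_a, parent_b)
```

Unlike the linkage-based recombination of Section 3.2, this operator treats every gene independently, which is exactly what linkage learning is meant to avoid.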
L2-AGE-MOEA requires setting some additional parameters. It uses the linkage-learning-based recombination described in Section 3.2. This operator requires setting the cut point for the dendrogram. In our preliminary experiments, we experimented with different values for the cut point used to extract clusters from the dendrogram. We obtained good results by setting the cut point to n/4. Furthermore, L2-AGE-MOEA uses the adaptive mutation rate described in Section 3.3. We set the final mutation rate ρ_final = 1/n and the initial mutation rate ρ_init = 2‰. Note that the latter is a large mutation rate considering that n = 12350 and n = 16200 for the devices with two and three HG beams to demultiplex, respectively. For both L2-NSGA and L2-AGE-MOEA, we recompute the linkage model (FOS) every Λ = 20 generations (the constant Λ in lines 4 and 25 of Algorithm 2), as suggested by Di Domenico et al. [9].

Implementation. We implemented L2-AGE-MOEA, the state-of-the-art L2-NSGA, and the four formulations of the device design problem in Matlab within the PlatEMO framework [39]. We selected this framework because it contains the original implementation of AGE-MOEA-II. It is also publicly available on GitHub, and it is easy to extend by adding more optimization problems and MOEAs. We reimplemented L2-NSGA in PlatEMO, re-using the original code by Di Domenico et al. [9], which is already written in Matlab and available on Zenodo. We re-adapted the code to be compatible with PlatEMO by applying as few changes as possible to the original code of L2-NSGA. Instead, for AGE-MOEA-II and NSGA-II, we used their implementations already available in PlatEMO.
The source code of L2-AGE-MOEA is also available on Zenodo at: https://doi.org/10.5281/zenodo.7851998. We conducted all experiments on the same machine with the following characteristics: 6-core Intel Core i7 processor running at 3.2GHz with 16GB RAM.

Performance metrics. Given the stochastic nature of the experimented MOEAs, we ran them 20 times for each problem formulation. In each run, we stored the non-dominated front produced at the end of the search by each MOEA. Then, we assessed the overall performance of the different MOEAs using the inverted generational distance (IGD) [49] as well as the hypervolume (HV) [29]. The two metrics measure different properties of the generated fronts. The IGD measures the distance of the front F_A generated by an algorithm A from the reference front; therefore, smaller IGD values are preferable. Instead, HV measures the (hyper)volume (or portion) of the feasible region that is dominated by the generated front F_A. Hence, a larger HV value indicates better results.
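The IGD computation can be sketched as follows (Python illustration; the metric averages, over the reference front, the distance to the nearest generated point):

```python
import numpy as np

def igd(front, reference):
    """Inverted generational distance: the average Euclidean distance
    from each reference-front point to its closest point in the front
    produced by the algorithm (smaller is better)."""
    front = np.asarray(front, dtype=float)
    reference = np.asarray(reference, dtype=float)
    d = np.linalg.norm(reference[:, None, :] - front[None, :, :], axis=2)
    return float(d.min(axis=1).mean())
```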
To compute the IGD and HV scores, we need to compare the non-dominated fronts produced by the MOEAs against the optimal (true) Pareto front. Since we do not know the true front for the mode sorter design problem, we build the so-called reference front, which combines the fronts produced by the evaluated MOEAs and selects the overall non-dominated solutions in this union set. More precisely, let {F_1, . . . , F_k} be all the non-dominated fronts produced by the different MOEAs; the reference front R is built as follows:

R = nondominated(F_1 ∪ · · · ∪ F_k)    (8)

To calculate the HV score, we have to specify a reference point [15] (or reference vector). In this study, we set the reference point to 2 · z_nad, where z_nad denotes the nadir point of the reference front R computed using Equation 8.
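Building the reference front amounts to a non-dominated filter over the union of fronts (Python sketch, quadratic for clarity):

```python
import numpy as np

def reference_front(*fronts):
    """Union of the fronts produced by all MOEAs, filtered down to its
    non-dominated subset (minimization)."""
    union = np.vstack(fronts)
    keep = []
    for i, p in enumerate(union):
        dominated = any(
            np.all(q <= p) and np.any(q < p)   # q dominates p
            for j, q in enumerate(union) if j != i
        )
        if not dominated:
            keep.append(p)
    return np.array(keep)
```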
To assess the significance of the differences among the different MOEAs, we use the Wilcoxon rank-sum test [4] with a significance level α = 0.05. A significant p-value indicates that an algorithm A achieves significantly lower IGD values (or significantly higher HV values) than another algorithm B across the 20 runs.

Results
Table 1 reports the median and interquartile range (IQR) values for the IGD (top half) and HV (bottom half) scores achieved by the different MOEAs over 20 independent repetitions. In the table, results highlighted with ▼ indicate that the IGD and HV values achieved by L2-AGE-MOEA are statistically better than the baselines, as indicated by the Wilcoxon rank-sum test [4].
From the tables, we can observe that L2-AGE-MOEA achieves significantly better (lower) IGD values and better (higher) HV values than all baselines, according to the Wilcoxon rank-sum test, independently of the problem formulation. In the following, we discuss the comparisons among the different MOEAs separately.

4.2.1 L2-AGE-MOEA vs. L2-NSGA. From a direct comparison between the approach used by Di Domenico et al. [9] and our approach, we can observe that the latter achieves both better HV and IGD values. We can also observe that, for Two-Beams, Two-Beams+Structure, and Three-Beams, there is one order of magnitude of difference between the IGD values yielded by L2-AGE-MOEA and the baseline. Our approach's IQR values are also smaller, indicating that it produces more stable (less variable) results over the different runs.
Furthermore, L2-NSGA always achieves smaller HV values than L2-AGE-MOEA, with differences of one order of magnitude for the Three-Beams and Three-Beams+Structure devices.

4.2.2 L2-AGE-MOEA vs. AGE-MOEA-II. Our approach also outperforms its predecessor AGE-MOEA-II in terms of IGD values. Note that the latter MOEA does not use linkage learning or the adaptive mutation operator. The differences are not only statistically significant according to the Wilcoxon test but also remarkably large. For example, on Two-Beams, L2-AGE-MOEA achieves (on average) less than 1/3 of the IGD values produced by the baseline. Similar results can be observed for the HV values, where our approach achieves better scores, e.g., 0.258 vs. 3.35 HV values for Three-Beams+Structure.

4.2.3 AGE-MOEA-II vs. L2-NSGA. This comparison is particularly noteworthy. The latter is the algorithm used by Di Domenico et al. [9] to generate mode sorters and also uses linkage learning. This also empirically supports the choice made by Di Domenico et al. [9] in using L2-NSGA over NSGA-II.

4.2.5 The impact of using linkage learning. Our results confirm the importance of using linkage learning methods for large-scale binary problems like designing mode sorters. Indeed, it is worth highlighting that L2-AGE-MOEA outperforms its predecessor AGE-MOEA-II, while L2-NSGA outperforms its predecessor NSGA-II, according to the median IGD values reported in Table 1.

4.2.6 Graphical comparison. To graphically compare the different MOEAs experimented with in this study, Figure 2 depicts the non-dominated fronts achieved for Two-Beams as well as the reference Pareto front R. For the sake of this analysis, we selected the front with the median IGD value across the 20 independent runs. As we can observe, L2-AGE-MOEA produced a set of non-dominated solutions (marked as red points) that are very close to the reference front R; the corresponding median IGD value is 0.2654. Unlike L2-AGE-MOEA, the baseline MOEAs generate fronts that are farther from the reference front. AGE-MOEA-II produces the second-best non-dominated front, followed by L2-NSGA, while NSGA-II produces the least optimal front.
We can also observe that the reference Pareto front R has a hyperbolic (convex) geometry with curvature 𝑝 < 1. Similarly, all fronts generated by the MOEAs follow a hyperbolic geometry. This observation explains why L2-AGE-MOEA and AGE-MOEA-II are the best-performing MOEAs, since they use front modeling methods that adapt the diversity and proximity measures based on this geometry. As reported by prior studies [25, 26], geometry-based MOEAs work better than NSGA-II, NSGA-III, and other traditional MOEAs (e.g., MOEA/D) for problems with Pareto front curvature 𝑝 < 1.

4.2.7 Generated devices and simulation results. In this section, we analyze the characteristics of the generated devices by selecting the knee points of the non-dominated front obtained by our approach. First, we selected the front with the median IGD value among the 20 independent runs. Then, from such a median front, we selected the knee point [47], which is the solution with the maximum marginal rate of return. We opted for the knee point since it is the point on the Pareto front that achieves the highest HV value [6]. The resulting device and its beam propagation simulation results are shown in Figure 3. The device in Figure 3a (right side) displays a distinct pattern in the central area, while the four corners do not contribute to beam routing. This suggests two potential improvements for future work. First, the current mutation operator targets all entries in the device/matrix structure with equal probability; a first direction for future improvement is thus to use non-uniform mutation probabilities for entries in different areas of the device (higher probability for the central area). Second, different methods could be considered to initialize the search with initial devices (individuals) such that more focus (variety) is given to the central area of the devices.
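The knee-point selection step described above can be sketched in a few lines. The paper follows [47] (maximum marginal rates of return); the geometric proxy below, namely the point farthest from the line joining the two extreme solutions, is a common stand-in and not necessarily the exact criterion used.

```python
# Hedged sketch of knee-point selection on a 2-D non-dominated front.
import numpy as np

def knee_point(front):
    f = np.asarray(front, dtype=float)
    f = f[np.argsort(f[:, 0])]          # sort by first objective
    a, b = f[0], f[-1]                  # extreme points of the front
    ab = b - a
    ab /= np.linalg.norm(ab)
    # Perpendicular distance of every point from the extreme-to-extreme line.
    proj = (f - a) @ ab
    foot = a + proj[:, None] * ab
    dist = np.linalg.norm(f - foot, axis=1)
    return f[np.argmax(dist)]

# On a convex (hyperbolic, p < 1) front, the knee sits in the bend.
front = [[0.0, 1.0], [0.05, 0.4], [0.1, 0.1], [0.4, 0.05], [1.0, 0.0]]
```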

Additional Analysis.
As explained in Section 3.1, we used the Hamming distance (HD) for the UPGMA method because of its linear complexity. Previous work on linkage learning methods suggested using mutual information (MI) instead [19, 38]. However, this function has a much larger computational complexity compared to HD. To assess this, we performed some preliminary experiments comparing MI with HD. Inferring the linkage structures and extracting the FOS (routine INFER-MODEL in Algorithm 1) takes 2.7 s on average with HD, compared to 7.42 minutes with MI, for the Two-Beam devices.
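The HD-based linkage inference discussed here can be approximated in a few lines: pairwise Hamming distances between decision variables across the population, followed by UPGMA (average-linkage) hierarchical clustering. This is a sketch; the actual INFER-MODEL routine may differ in detail.

```python
# Sketch of linkage inference: Hamming distance between decision
# variables over a binary population, then UPGMA clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# pop: rows = individuals (binary genotypes), columns = decision variables.
pop = np.array([[0, 0, 1, 1],
                [0, 0, 1, 0],
                [1, 1, 0, 1],
                [1, 1, 0, 0]])

# Hamming distance between *variables*, hence the transpose.
dist = pdist(pop.T, metric="hamming")
tree = linkage(dist, method="average")   # method="average" is UPGMA
# Each internal node of `tree` yields one variable subset of the FOS.
```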

CONCLUSION AND FUTURE WORK
Optical devices that demultiplex optical signals with different modes have a wide range of real-world applications. However, the design of such devices is particularly complex and cannot be performed manually. Multi-objective evolutionary algorithms (MOEAs) have recently been applied in the literature [9] to generate such devices in an automated fashion. The state-of-the-art approach used L2-NSGA, a variant of the NSGA-II algorithm that relies on linkage learning to produce highly accurate demultiplexers. However, this algorithm is very slow to converge due to the very large scale of the problem (more than 12K decision variables for a two-beam device).
In this paper, we proposed a novel multi-objective approach called L2-AGE-MOEA. The new MOEA incorporates linkage learning methods into AGE-MOEA-II, a many-objective algorithm that dynamically adapts its environmental selection based on the geometry of the Pareto front. First, we extended AGE-MOEA-II by introducing a recombination operator based on linkage learning and, in particular, hierarchical clustering (the UPGMA method) [14]. Second, we introduced an adaptive mutation operator that linearly decreases the mutation rate such that exploration and exploitation are promoted in different stages of the search process.
We conducted an empirical study to assess the performance of L2-AGE-MOEA in designing different demultiplexers for Hermite-Gauss beams. A comparison with L2-NSGA and AGE-MOEA-II showed that L2-AGE-MOEA achieved better performance w.r.t. the IGD and HV indicators. Besides, our results also indicate that linkage learning plays a critical role in achieving better results.
In our future work, we aim to assess the performance of our approach on different types of demultiplexers and beams, such as non-linear complex devices and devices capable of sorting beams from different families (e.g., Hermite-Gauss from Airy beams). We plan to investigate different adaptive schemes for the mutation operators (e.g., [10, 11, 43]) as well as different distance functions for the hierarchical clustering algorithm. We also aim to apply L2-AGE-MOEA to other large-scale engineering problems, such as feature selection [40] and sparse signal reconstruction [18].

Figure 2: Fronts produced by the different MOEAs for Two-Beams (two objectives) with population size 100. The reference front R includes all best (non-dominated) solutions produced among all runs and algorithms, as described in Section 4.1.4 (Equation 8).
(a) Generated device (right side) and the corresponding beam propagation (left side). (b) Target output signal (red line) and generated output signal (blue line) for the two HG input beams.

Figure 3: Example of a device generated by L2-AGE-MOEA for the Two-Beams+Structure problem.

Table 1: Median and Interquartile Range (IQR) values achieved by the different evolutionary algorithms. 𝑚 denotes the number of objectives, while 𝑛 is the number of decision variables. The best values are highlighted in grey.