Preference-Aware Constrained Multi-Objective Bayesian Optimization

This paper addresses the problem of constrained multi-objective optimization over black-box objective functions with practitioner-specified preferences over the objectives when a large fraction of the input space is infeasible (i.e., violates constraints). This problem arises in many engineering design problems, including analog circuits and electric power system design. We aim to approximate the optimal Pareto set over the small fraction of feasible input designs. The key challenges include the massive size of the design space, multiple objectives, a large number of constraints, and the small fraction of feasible input designs, which can be identified only after performing expensive experiments/simulations. We propose a novel and efficient preference-aware constrained multi-objective Bayesian optimization approach referred to as PAC-MOO to address these challenges. The key idea is to learn surrogate models for both output objectives and constraints, and select the candidate input for evaluation in each iteration that maximizes the information gained about the optimal constrained Pareto front while factoring in the preferences over objectives. Our experiments on synthetic and challenging real-world analog circuit design optimization problems demonstrate the efficacy of PAC-MOO over baseline methods.


Introduction
A large number of engineering design problems involve making design choices to optimize multiple objectives. Some examples include electric power systems design [45,8], aircraft design [46], analog circuit design [36,50], and nanoporous materials discovery [12]. The common challenges in such constrained multi-objective optimization (MOO) problems include the following. 1) The objective functions are unknown, and we need to perform expensive experiments to evaluate each candidate design choice. 2) The objectives are conflicting in nature, and they cannot all be optimized simultaneously. 3) The constraints need to be satisfied, but we cannot evaluate them for a given input design without performing experiments. 4) Only a small fraction of the input design space is feasible. Therefore, we need to find the Pareto optimal set of solutions from the subset of feasible inputs (i.e., inputs that satisfy all constraints). Additionally, in several real-world applications, the practitioners have specific preferences over the objectives. For example, the designer prefers efficiency over settling time when optimizing analog circuits.
Bayesian optimization (BO) is an efficient framework to solve black-box optimization problems with expensive objective function evaluations [38,28]. However, there are no BO algorithms that simultaneously handle black-box constraints, an input space in which a large fraction of designs is invalid (i.e., does not satisfy all constraints), and preferences over objectives. To fill this important gap, we propose a novel and efficient information-theoretic approach referred to as Preference-Aware Constrained Multi-Objective Bayesian Optimization (PAC-MOO). PAC-MOO builds surrogate models for both output objectives and constraints based on the training data from past function evaluations. PAC-MOO employs an acquisition function in each iteration to select a candidate input design for performing the expensive function evaluation. The selected input design maximizes the information gain about the constrained optimal Pareto front while factoring in the designer's preferences over objectives. The experimental results on two real-world analog circuit design benchmarks demonstrate that PAC-MOO finds circuit configurations with higher values of the preferred objective (efficiency), as intended, at the cost of a lower overall Pareto hypervolume indicator.
Figure 1: A high-level overview of the PAC-MOO algorithm. It takes as input the input space $X$ and preferences over objectives $p$, and produces a Pareto set of candidate points as per the preferences after $T$ iterations of PAC-MOO. In each iteration $t$, PAC-MOO selects a candidate point $x_t \in X$ to perform expensive function evaluations, and the surrogate models for both objective functions and constraints are updated based on training data from the evaluated point.

Contributions.
Our key contribution is the development and evaluation of the PAC-MOO algorithm to solve a general constrained multi-objective optimization problem. Specific contributions include:
• A tractable acquisition function based on information gain to select candidate points for performing expensive function evaluations.
• Approaches to increase the chances of finding feasible candidate designs and to incorporate preferences over objectives.
• Evaluation of PAC-MOO on two challenging analog circuit design problems and comparison with prior methods.

Related Work
There are three families of approaches for solving constrained multi-objective optimization problems with expensive black-box functions. First, we can employ heuristic search algorithms such as multi-objective variants of simulated annealing [43,20,23], genetic algorithms [21,22], and particle swarm optimization [29,16,40,44] to solve them. The main drawback of this family of methods is that they require a large number of expensive function evaluations. Second, Bayesian optimization (BO) methods employ surrogate statistical models to overcome the drawbacks of the previous families of approaches. The surrogate models are initialized using a small set of randomly sampled training data, i.e., input-output pairs of design parameters and objective evaluations. They are iteratively refined during the optimization process to actively collect a new training example in each iteration through an acquisition function (e.g., expected improvement). There is a large body of work on BO for single-objective optimization [13,15]. Standard BO methods have been applied to a variety of problems, including simple analog circuit design optimization and synthesis problems [27,33,31,49,26,34,42,32,41].
Multi-objective BO (MOBO) is a relatively less-studied problem setting compared to the single-objective problem. Some of the recent work on MOBO includes Predictive Entropy Search for Multi-objective Bayesian Optimization (PESMO) [24], Max-value Entropy Search for Multi-Objective Bayesian Optimization (MESMO) [3], Uncertainty-aware Search framework for Multi-Objective Bayesian Optimization (USEMO) [7], Pareto-Frontier Entropy Search (PFES) [39], and Expected Hypervolume Improvement [10,14]. Each of these methods has been shown to perform well on a […]
[…] where $y_{f_j} = f_j(x)$ for all $j \in \{1, \dots, K\}$ and $y_{c_i} = C_i(x)$ for all $i \in \{1, \dots, L\}$. We define an input vector $x$ as feasible if and only if it satisfies all constraints. The input vector $x$ Pareto-dominates another input vector $x'$ if $f_j(x) \le f_j(x')\ \forall j$ and there exists some $j \in \{1, \dots, K\}$ such that $f_j(x) < f_j(x')$. The optimal solution of the MOO problem with constraints is a set of input vectors $X^* \subset X$ such that no configuration $x' \in X \setminus X^*$ Pareto-dominates another input $x \in X^*$ and all configurations in $X^*$ are feasible. The solution set $X^*$ is called the optimal constrained Pareto set, and the corresponding set of function values $Y^*$ is called the optimal constrained Pareto front. The most commonly used measure of the quality of a given Pareto set is the Pareto hypervolume (PHV) indicator [2] of the corresponding Pareto front of $(y_{f_1}, y_{f_2}, \dots, y_{f_K})$ with respect to a reference point $r$. Our overall goal is to approximate the constrained Pareto set $X^*$ while minimizing the total number of expensive function evaluations. When a preference specification $p$ over the objectives is provided, the MOO algorithm should prioritize producing a Pareto set of inputs that optimize the preferred objective functions.
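As an illustration of the definitions above, the following is a minimal Python sketch of the Pareto-dominance test and of extracting the constrained Pareto set from already evaluated points. It follows the $\le$ convention of the dominance definition (smaller objective values are better) and assumes that feasibility of each evaluated point is already known as a boolean; the function names are illustrative and not taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b (smaller is better)."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def constrained_pareto_indices(
    objs: Sequence[Sequence[float]], feasible: Sequence[bool]
) -> List[int]:
    """Indices of feasible points not dominated by any other feasible point."""
    candidates = [i for i, ok in enumerate(feasible) if ok]
    return [
        i
        for i in candidates
        if not any(dominates(objs[j], objs[i]) for j in candidates if j != i)
    ]


# Toy data: the last point dominates everything but violates a constraint.
objs = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (0.5, 0.5)]
feasible = [True, True, True, False]
print(constrained_pareto_indices(objs, feasible))  # [0, 1]
```

The infeasible point is excluded before the dominance check, so it cannot remove feasible points from the front even though its objective values are better.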
Preferences over black-box functions. The designer/practitioner can define input preferences over multiple black-box functions through the notion of preference specification, which is defined as a vector of scalars $p = \{p_{f_1}, \dots, p_{f_K}, p_{c_1}, \dots, p_{c_L}\}$ with $0 \le p_i \le 1$ and $\sum_i p_i = 1$. Higher values of $p_i$ mean that the corresponding objective function $f_i$ is highly preferred. In such cases, the solution to the MOO problem should prioritize producing design parameters that optimize the preferred objective functions.

Preference-Aware Constrained Multi-Objective Bayesian Optimization
The general strategy behind the BO process is to employ an acquisition function to iteratively select a candidate input (i.e., design parameters) to evaluate using the information provided by the surrogate models. The surrogate models are updated based on new training examples (design parameters as input, and evaluations of objectives and constraints from function evaluations as output).
Overview of PAC-MOO. The PAC-MOO algorithm is an instance of the BO framework. It takes as input the input space $X$, preferences over objectives $p$, and an expensive evaluator for the objective functions and constraints, and produces a Pareto set of candidate inputs as per the preferences after $T$ iterations, as shown in Algorithm 1. In each iteration $t$, PAC-MOO selects a candidate input design $x_t \in X$ to perform a function evaluation. Consequently, the surrogate models for both objective functions and constraints are updated based on training data from the function evaluations.

Surrogate Modeling
Gaussian Processes (GPs) [47] are suitable for solving black-box optimization problems with expensive function evaluations since they are rich and flexible models that can mimic any complex objective function. Intuitively, two candidate design parameters that are close to each other will potentially exhibit approximately similar performance in terms of output objectives. We model the objective functions and black-box constraints by independent GP models with zero mean and i.i.d. observation noise. Let $D = \{(x_i, y_i)\}_{i=1}^{t-1}$ be the training data from the past $t-1$ function evaluations, where $x_i \in X$ is a candidate design and $y_i = (y^i_{f_1}, \dots, y^i_{f_K}, y^i_{c_1}, \dots, y^i_{c_L})$ is the output vector resulting from evaluating the objective functions and constraints at $x_i$.
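To make the surrogate model concrete, here is a minimal pure-Python sketch of the posterior mean and variance of one zero-mean GP, as would be fit independently for each objective and each constraint. It assumes a squared exponential kernel with fixed, hand-set hyperparameters (no hyperparameter fitting), and all helper names are hypothetical; a practical implementation would rely on a GP library.

```python
import math
from typing import List, Sequence, Tuple


def se_kernel(a: Sequence[float], b: Sequence[float],
              lengthscale: float = 1.0, signal_var: float = 1.0) -> float:
    """Squared exponential kernel k(a, b)."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return signal_var * math.exp(-0.5 * sq / lengthscale ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular factor Lo with Lo Lo^T = A."""
    n = len(A)
    Lo = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(Lo[i][k] * Lo[j][k] for k in range(j))
            Lo[i][j] = math.sqrt(s) if i == j else s / Lo[j][j]
    return Lo


def solve_lower(Lo: List[List[float]], b: Sequence[float]) -> List[float]:
    """Forward substitution: solve Lo z = b."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(Lo[i][k] * z[k] for k in range(i))) / Lo[i][i])
    return z


def solve_upper_t(Lo: List[List[float]], z: Sequence[float]) -> List[float]:
    """Back substitution: solve Lo^T a = z."""
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(Lo[k][i] * a[k] for k in range(i + 1, n))) / Lo[i][i]
    return a


def gp_posterior(X: Sequence[Sequence[float]], y: Sequence[float],
                 x_new: Sequence[float],
                 noise_var: float = 1e-6) -> Tuple[float, float]:
    """Posterior mean and (latent) variance of a zero-mean GP at x_new."""
    n = len(X)
    K = [[se_kernel(X[i], X[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    Lo = cholesky(K)
    alpha = solve_upper_t(Lo, solve_lower(Lo, y))  # (K + noise * I)^{-1} y
    k_star = [se_kernel(xi, x_new) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(Lo, k_star)
    var = se_kernel(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)


X_train = [(0.0,), (1.0,), (2.0,)]
y_train = [0.0, 1.0, 0.5]
print(gp_posterior(X_train, y_train, (1.0,)))   # close to the observed value, tiny variance
print(gp_posterior(X_train, y_train, (10.0,)))  # reverts to the prior far from the data
```

The predictive mean and standard deviation returned here play the role of $\mu(x)$ and $\sigma(x)$ in the acquisition function.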

Acquisition Function
The state-of-the-art MESMO approach for solving MOO problems [3] selects for evaluation the input that maximizes the information gain about the optimal Pareto front. However, this approach does not address the challenge of handling black-box constraints, which can be evaluated only through expensive function evaluations. To overcome this challenge, MESMOC [4] maximizes the information gain between the next candidate input for evaluation $x$ and the optimal constrained Pareto front $Y^*$:

$$\alpha(x) = I(\{x, y\}, Y^* \mid D) = H(y \mid D, x) - \mathbb{E}_{Y^*}\left[H(y \mid D, x, Y^*)\right]. \tag{1}$$

In this case, the output vector $y$ is $(K+L)$-dimensional: $y = (y_{f_1}, \dots, y_{f_K}, y_{c_1}, \dots, y_{c_L})$, where $y_{f_j} = f_j(x)$ and $y_{c_i} = C_i(x)$. Consequently, the first term in Equation (1), the entropy of a factorizable $(K+L)$-dimensional Gaussian distribution $P(y \mid D, x)$, can be computed in closed form as shown below:

$$H(y \mid D, x) = \frac{(K+L)(1+\ln(2\pi))}{2} + \sum_{j=1}^{K} \ln(\sigma_{f_j}(x)) + \sum_{i=1}^{L} \ln(\sigma_{c_i}(x)), \tag{2}$$

where $\sigma^2_{f_j}(x)$ and $\sigma^2_{c_i}(x)$ are the predictive variances of the $j$th function and $i$th constraint GPs, respectively, at input $x$. The second term in Equation (1) is an expectation over the Pareto front $Y^*$. We can approximately compute this term via Monte-Carlo sampling as shown below:

$$\mathbb{E}_{Y^*}\left[H(y \mid D, x, Y^*)\right] \approx \frac{1}{S} \sum_{s=1}^{S} H(y \mid D, x, Y^*_s), \tag{3}$$

where $S$ is the number of samples and $Y^*_s$ denotes a sample Pareto front. There are two key algorithmic steps to compute this part of the equation: 1) how to compute constrained Pareto front samples $Y^*_s$; and 2) how to compute the entropy with respect to a given constrained Pareto front sample $Y^*_s$. We provide solutions for these two questions below.
1) Computing constrained Pareto front samples via cheap multi-objective optimization. To compute a constrained Pareto front sample $Y^*_s$, we first sample functions and constraints from the posterior GP models via random Fourier features [25,37] and then solve a cheap constrained multi-objective optimization problem over the $K$ sampled functions and $L$ sampled constraints.
Cheap MO solver. We sample $\tilde{f}_j$ from the GP model $GP_{f_j}$ for each of the $K$ functions and $\tilde{C}_i$ from the GP model $GP_{c_i}$ for each of the $L$ constraints. A cheap constrained multi-objective optimization problem over the $K$ sampled functions $\tilde{f}_1, \tilde{f}_2, \dots, \tilde{f}_K$ and the $L$ sampled constraints $\tilde{C}_1, \tilde{C}_2, \dots, \tilde{C}_L$ is solved to compute the sample Pareto front $Y^*_s$. Note that we refer to this optimization problem as cheap because it is performed over sampled functions and constraints, which are cheaper to evaluate than performing expensive function evaluations. We employ the popular constrained NSGA-II algorithm [17,11] to solve the constrained MO problem with cheap sampled objective functions and constraints.
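The idea of a cheap sampled function can be illustrated with a short Python sketch of random Fourier features. For brevity it draws a function from a zero-mean GP prior with a squared exponential kernel of unit signal variance; sampling from the posterior, as the text describes, would additionally condition the feature weights on the data $D$ (Bayesian linear regression in feature space), which is omitted here. The function name and default parameters are assumptions made for illustration.

```python
import math
import random
from typing import Callable, Sequence


def sample_rff_function(dim: int, num_features: int = 200,
                        lengthscale: float = 1.0,
                        seed: int = 0) -> Callable[[Sequence[float]], float]:
    """Approximate draw from a zero-mean GP prior (SE kernel) via RFF."""
    rng = random.Random(seed)
    # Frequencies follow the spectral density of the SE kernel: N(0, 1/l^2).
    W = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
         for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    theta = [rng.gauss(0.0, 1.0) for _ in range(num_features)]  # feature weights
    scale = math.sqrt(2.0 / num_features)

    def f(x: Sequence[float]) -> float:
        return scale * sum(
            t * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bi)
            for t, w, bi in zip(theta, W, b)
        )

    return f


f_tilde = sample_rff_function(dim=2, seed=1)
print(f_tilde((0.1, 0.2)))
```

Each sampled function is an explicit, deterministic formula, so a solver such as NSGA-II can query it many thousands of times at negligible cost compared with a circuit simulation.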
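2) Entropy computation with a sample constrained Pareto front. Let $Y^*_s = \{v^1, \dots, v^l\}$ be the sample constrained Pareto front, where $l$ is the size of the Pareto front and each $v^i = \{v^i_{f_1}, \dots, v^i_{f_K}, v^i_{c_1}, \dots, v^i_{c_L}\}$ is a $(K+L)$-vector evaluated at the $K$ sampled functions and $L$ sampled constraints. The following inequality holds for each component $y_j$ of the $(K+L)$-vector $y = \{y_{f_1}, \dots, y_{f_K}, y_{c_1}, \dots, y_{c_L}\}$:

$$y_j \le \max\{v^1_j, \dots, v^l_j\} \quad \forall j \in \{f_1, \dots, f_K, c_1, \dots, c_L\}. \tag{4}$$

The inequality essentially says that the $j$th component of $y$ (i.e., $y_j$) is upper-bounded by the maximum of the $j$th components of all $l$ $(K+L)$-vectors in the Pareto front $Y^*_s$. This inequality was proven by contradiction for MESMO [3] for all objective functions $j \in \{f_1, \dots, f_K\}$. We assume the same for all constraints $j \in \{c_1, \dots, c_L\}$.

By combining inequality (4) with the fact that each function is modeled as an independent GP, we can approximate each component $y_j$ as a truncated Gaussian distribution, since the distribution of $y_j$ needs to satisfy $y_j \le y^{j*}_s$, where $y^{j*}_s = \max\{v^1_j, \dots, v^l_j\}$. Furthermore, a common property of the entropy measure allows us to decompose the entropy of a set of independent variables into a sum over entropies of individual variables [9]:

$$H(y \mid D, x, Y^*_s) \approx \sum_{j=1}^{K} H(y_{f_j} \mid D, x, y^{f_j*}_s) + \sum_{i=1}^{L} H(y_{c_i} \mid D, x, y^{c_i*}_s). \tag{5}$$

The r.h.s. is a summation over entropies of the $(K+L)$ variables $y = \{y_{f_1}, \dots, y_{f_K}, y_{c_1}, \dots, y_{c_L}\}$. The differential entropy for each $y_j$ is the entropy of a truncated Gaussian distribution [35] and is given by the following equations:

$$H(y_{f_j} \mid D, x, y^{f_j*}_s) \approx \frac{1+\ln(2\pi)}{2} + \ln(\sigma_{f_j}(x)) + \ln \Phi(\gamma^{f_j}_s(x)) - \frac{\gamma^{f_j}_s(x)\,\phi(\gamma^{f_j}_s(x))}{2\,\Phi(\gamma^{f_j}_s(x))}, \tag{6}$$

$$H(y_{c_i} \mid D, x, y^{c_i*}_s) \approx \frac{1+\ln(2\pi)}{2} + \ln(\sigma_{c_i}(x)) + \ln \Phi(\gamma^{c_i}_s(x)) - \frac{\gamma^{c_i}_s(x)\,\phi(\gamma^{c_i}_s(x))}{2\,\Phi(\gamma^{c_i}_s(x))}. \tag{7}$$

Consequently, we have:

$$H(y \mid D, x, Y^*_s) \approx \sum_{j=1}^{K} \left[\frac{1+\ln(2\pi)}{2} + \ln(\sigma_{f_j}(x)) + \ln \Phi(\gamma^{f_j}_s(x)) - \frac{\gamma^{f_j}_s(x)\,\phi(\gamma^{f_j}_s(x))}{2\,\Phi(\gamma^{f_j}_s(x))}\right] + \sum_{i=1}^{L} \left[\frac{1+\ln(2\pi)}{2} + \ln(\sigma_{c_i}(x)) + \ln \Phi(\gamma^{c_i}_s(x)) - \frac{\gamma^{c_i}_s(x)\,\phi(\gamma^{c_i}_s(x))}{2\,\Phi(\gamma^{c_i}_s(x))}\right], \tag{8}$$

where $\gamma^{f_j}_s(x) = \frac{y^{f_j*}_s - \mu_{f_j}(x)}{\sigma_{f_j}(x)}$, $\gamma^{c_i}_s(x) = \frac{y^{c_i*}_s - \mu_{c_i}(x)}{\sigma_{c_i}(x)}$, and $\phi$ and $\Phi$ are the p.d.f. and c.d.f. of a standard normal distribution, respectively. By combining equations (2) and (8) with equation (1), we get the final form of our acquisition function as shown below:

$$\alpha(x) \approx \frac{1}{S} \sum_{s=1}^{S} \left[\sum_{j=1}^{K} AF(f_j, x) + \sum_{i=1}^{L} AF(c_i, x)\right], \tag{9}$$

and

$$AF(i, x) = \frac{\gamma^{i}_s(x)\,\phi(\gamma^{i}_s(x))}{2\,\Phi(\gamma^{i}_s(x))} - \ln \Phi(\gamma^{i}_s(x)). \tag{10}$$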
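The entropy-based acquisition value can be sketched in a few lines of Python. The sketch assumes the MESMO-style truncated-Gaussian form, in which each objective or constraint contributes $\gamma\phi(\gamma)/(2\Phi(\gamma)) - \ln\Phi(\gamma)$ with $\gamma = (y^{*}_s - \mu(x))/\sigma(x)$, and averages the sum of these terms over the $S$ sampled Pareto fronts. The function names, the argument layout, and the numerical floor on $\Phi$ are illustrative assumptions.

```python
import math
from typing import Sequence


def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def af_term(mu: float, sigma: float, upper: float) -> float:
    """Contribution of one objective or constraint for one Pareto-front sample."""
    gamma = (upper - mu) / sigma
    cdf = max(std_normal_cdf(gamma), 1e-12)  # floor avoids log(0); an assumption
    return gamma * std_normal_pdf(gamma) / (2.0 * cdf) - math.log(cdf)


def acquisition(mus: Sequence[float], sigmas: Sequence[float],
                upper_bounds_per_sample: Sequence[Sequence[float]]) -> float:
    """Monte-Carlo average over S samples of the summed per-component terms.

    mus, sigmas: GP predictive means / std devs at x, length K + L.
    upper_bounds_per_sample: S lists of per-component maxima, each length K + L.
    """
    total = 0.0
    for uppers in upper_bounds_per_sample:
        total += sum(af_term(m, s, u) for m, s, u in zip(mus, sigmas, uppers))
    return total / len(upper_bounds_per_sample)


# One objective and one constraint, two sampled Pareto fronts.
print(acquisition([0.0, 0.0], [1.0, 1.0], [[0.0, 0.0], [0.0, 0.0]]))  # 2 * ln 2
```

A component whose sampled upper bound lies far above its predictive mean contributes almost nothing, since truncating at that bound barely changes the predictive distribution.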

Convex Combination for Preferences
We now describe how to incorporate the preference specification (when available) into the acquisition function. The derivation of the acquisition function proposed in Equation 9 resulted in a summation of one entropy term per objective function and constraint, denoted $AF(i, x)$. Given this expression, the algorithm will select an input while giving the same importance to each of the functions and constraints. However, in problems such as circuit design optimization, efficiency is typically the most important objective function.
The designer would like to find a trade-off between the objectives. Nevertheless, candidate circuits with high voltage and very low efficiency might be useless in practice. Therefore, we propose to inject preferences from the designer into our algorithm by associating different weights with each of the objectives. A principled approach is to assign appropriate preference weights, resulting in a convex combination of the individual components $AF(i, x)$ of the summation. Let $p_i$ be the preference weight associated with each individual component. The preference-based acquisition function is defined as follows (see Algorithm 2):

$$\alpha_{pref}(x) \approx \frac{1}{S} \sum_{s=1}^{S} \left[\sum_{j=1}^{K} p_{f_j}\, AF(f_j, x) + \sum_{i=1}^{L} p_{c_i}\, AF(c_i, x)\right], \quad \text{with } \sum_{j=1}^{K} p_{f_j} + \sum_{i=1}^{L} p_{c_i} = 1. \tag{11}$$

It is important to note that, in practice, if a candidate design does not satisfy the constraints, the optimization will fail regardless of the preferences over objectives. Therefore, the cumulative weight assigned to the constraints has to be at least equal to the total weight assigned to the objective functions:

$$\sum_{i=1}^{L} p_{c_i} \ge \sum_{j=1}^{K} p_{f_j}. \tag{12}$$

Given that satisfying all the constraints is equally important, the weights over the constraints are equal. Consequently, only the weights over the objective functions need to be explicitly specified.
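The weighting scheme can be sketched as follows, under the assumption stated in the text that only objective weights are specified and the remaining weight is split equally over the constraints. The per-component values passed in stand for the $AF(i, x)$ terms (already averaged over Pareto-front samples, which is valid because the weighted sum is linear); the function names and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def preference_weights(obj_weights: Sequence[float],
                       num_constraints: int) -> List[float]:
    """Objective weights as given; remaining mass split equally over constraints."""
    obj_total = sum(obj_weights)
    if any(w < 0.0 for w in obj_weights) or obj_total > 1.0:
        raise ValueError("objective weights must be non-negative and sum to at most 1")
    per_constraint = (1.0 - obj_total) / num_constraints
    return list(obj_weights) + [per_constraint] * num_constraints


def weighted_acquisition(af_terms: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Convex combination of the per-objective and per-constraint terms."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(p * a for p, a in zip(weights, af_terms))


# Two objectives (the first one preferred) and two constraints.
weights = preference_weights([0.4, 0.1], num_constraints=2)
print(weights)                                              # [0.4, 0.1, 0.25, 0.25]
print(weighted_acquisition([1.0, 2.0, 0.5, 0.5], weights))
```

With equal weights across all components, the ranking of candidate inputs matches that of the unweighted acquisition function, since the two differ only by a constant factor.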

Finding Feasible Regions of Design Space
The acquisition function defined in Equation 11 builds constrained Pareto front samples $Y^*_s$ by sampling functions and constraints from the Gaussian process posterior. The posterior of the GP is built based on the current training data $D$. The truncated Gaussian approximation defined in Equations 6 and 7 requires the upper bounds $y^{f_j*}_s$ and $y^{c_i*}_s$ to be defined. However, in the early iterations of Bayesian optimization, the configurations evaluated may not include any feasible design parameters. This is especially true for scenarios where the fraction of feasible design configurations in the entire design space is very small. In such cases, the sampling process of the constrained Pareto fronts $Y^*_s$ is susceptible to failure because the surrogate models have not yet gathered any knowledge about feasible regions of the design space. Consequently, the upper bounds $y^{f_j*}_s$ and $y^{c_i*}_s$ are not well-defined, and neither is the acquisition function in Equation 11. Intuitively, the algorithm should first aim at identifying feasible design configurations by maximizing the probability of satisfying all the constraints. We define a special case of our acquisition function for such challenging scenarios as shown below:

$$x_t = \arg\max_{x \in X} \prod_{i=1}^{L} \Pr\left(C_i \text{ is satisfied at } x \mid D\right).$$

This acquisition function enables an efficient feasibility search due to its exploitation characteristics [18]. Since each constraint is either satisfied or violated (a binary outcome), the algorithm will be able to quickly prune infeasible regions of the design space and move to other promising regions until it identifies feasible design configurations. This approach enables a more efficient search over feasible regions later and an accurate computation of the acquisition function. The complete pseudo-code of PAC-MOO is given in Algorithm 1.
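The feasibility search described above can be sketched in Python as the product of per-constraint satisfaction probabilities under independent Gaussian predictive distributions. The sketch assumes, purely for illustration, that a constraint is satisfied when $C_i(x) \ge 0$; this sign convention, the function names, and the toy candidates are assumptions and should be adapted to the actual constraint definition.

```python
import math
from typing import Dict, Sequence, Tuple


def prob_feasible(mu_c: Sequence[float], sigma_c: Sequence[float]) -> float:
    """Product over constraints of Pr(C_i(x) >= 0) for C_i(x) ~ N(mu_i, sigma_i^2)."""
    prob = 1.0
    for mu, sigma in zip(mu_c, sigma_c):
        prob *= 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))
    return prob


def most_likely_feasible(
    predictions: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> str:
    """Pick the candidate whose predicted probability of feasibility is highest."""
    return max(predictions, key=lambda name: prob_feasible(*predictions[name]))


# Hypothetical GP predictions (means, std devs) for two constraints per candidate.
predictions = {
    "x_a": ([0.0, 0.0], [1.0, 1.0]),    # 0.5 * 0.5 = 0.25
    "x_b": ([2.0, 1.0], [1.0, 1.0]),    # both constraints likely satisfied
    "x_c": ([3.0, -2.0], [1.0, 1.0]),   # second constraint likely violated
}
print(most_likely_feasible(predictions))  # x_b
```

Because the probabilities are multiplied, a single constraint that is very likely violated drives the score toward zero, which is what lets the search discard such regions quickly.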

Experimental Setup and Results
In this section, we present an experimental evaluation of PAC-MOO and baseline methods on two challenging analog circuit design problems.
Baselines. We compare PAC-MOO with state-of-the-art constrained MOO evolutionary algorithms, namely, NSGA-II [11] and MOEAD [48]. We also compare against a constrained MOO method based on BO, the Uncertainty-aware search framework for multi-objective Bayesian optimization with constraints (USEMOC) [5]. We evaluated two variants of USEMOC, USEMOC-EI and USEMOC-TS, which use the expected improvement (EI) and Thompson sampling (TS) acquisition functions, respectively.

PAC-MOO:
We employ a Gaussian process (GP) with a squared exponential kernel for all our surrogate models. We evaluated several preference values for the efficiency objective function. PAC-MOO-0 refers to the preference being equal over all objectives and constraints. PAC-MOO-1 refers to assigning 80% preference to the efficiency objective and equal importance to other functions and constraints, resulting in a preference value $p_i = 0.5 \times 0.8 = 0.4$ for the efficiency. With PAC-MOO-2, we assign a total preference of 85% to the objective functions with 92% importance to the efficiency, resulting in a preference value of $p_i = 0.85 \times 0.92 = 0.782$. We assign equal preference to all other functions. With PAC-MOO-3, we assign more importance to the objective functions by assigning a total of 0.65 preference to them and 0.35 to the constraints. Additionally, we provide 88% importance to the efficiency, resulting in a preference value of $p_i = 0.65 \times 0.88 = 0.572$.
Evaluation Metrics: The Pareto Hypervolume (PHV) indicator is a commonly used metric to measure the quality of the Pareto front [51].PHV is defined as the volume between a reference point and the Pareto front.After each circuit simulation, we measure the PHV for all algorithms and compare them.To demonstrate the efficacy of the preference-based PAC-MOO, we compare different algorithms using the maximum efficiency of the optimized circuit configurations as a function of the number of circuit simulations.
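For intuition about the PHV metric, the following Python sketch computes the hypervolume for the two-objective case. It assumes both objectives are maximized and that the reference point lies below the front in both coordinates; the real benchmarks have more than two objectives, for which dedicated hypervolume algorithms are needed. The function name is illustrative.

```python
from typing import Sequence, Tuple


def hypervolume_2d(points: Sequence[Tuple[float, float]],
                   ref: Tuple[float, float]) -> float:
    """Area dominated by `points` and bounded below by `ref` (maximization)."""
    # Keep only points that improve on the reference point in both objectives.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 > best_f2:  # dominated points add no new area
            area += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return area


front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]  # last point is dominated
print(hypervolume_2d(front, ref=(0.0, 0.0)))  # 6.0
```

A larger value means the front covers more of the objective space above the reference point, which is why an algorithm that concentrates on one preferred objective can score lower on this metric.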
Benchmarks: Considering that the fraction of feasible circuit configurations in the design space is extremely low (around 4%), every method is initialized with 32 initial feasible designs provided by a domain expert.
In all our preference-based experiments, we assign a preference value to the efficiency objective and assign all other black-box functions (the rest of the objectives and the constraints) equal preference.
It is noteworthy that neither the evolutionary algorithms nor the baseline BO method USEMOC is capable of handling preferences over objectives. This is an important advantage of our PAC-MOO algorithm, which we demonstrate through our experiments.
Hypervolume of Pareto set vs. number of circuit simulations. Figures 2a and 2b show the results for PHV of the Pareto set as a function of the number of circuit simulations for SCVR and HCR design, respectively. An algorithm is considered relatively better if it achieves higher hypervolume with a lower number of circuit simulations. We make the following observations. 1) PAC-MOO with no preferences (i.e., PAC-MOO-0) outperforms all the baseline methods. This is attributed to the efficient information-theoretic acquisition function and the exploitation approach to finding feasible regions in the circuit design space. 2) At least one version of USEMOC performs better than all evolutionary baselines: USEMOC-EI for both SCVR and HCR designs. These results demonstrate that BO methods have the potential for accelerating analog circuit design optimization over evolutionary algorithms.
3) The performance of PAC-MOO with preferences (i.e., PAC-MOO-1,2,3) is lower in terms of the hypervolume since the metric evaluates the quality of the general Pareto front, while our algorithm puts emphasis on specific regions of the Pareto front via the preference specification. This behavior is expected; nevertheless, we notice that the PHV with PAC-MOO-1 and PAC-MOO-2 is still competitive and degrades only when a significantly high preference is given to efficiency (PAC-MOO-3).
Efficiency of optimized circuits with preferences. Since efficiency is the most important objective for both SCVR and HCR circuits, we evaluate PAC-MOO by giving higher preference to efficiency over other objectives. Figures 2c and 2d show the results for maximum efficiency of the optimized circuit configurations as a function of the number of circuit simulations for SCVR and HCR design optimization. 1) As intended by design, PAC-MOO with preferences outperforms all baseline methods, including PAC-MOO without preferences.
2) The improvement in maximum efficiency of uncovered circuit configurations for PAC-MOO with preferences comes at the expense of loss in hypervolume metric as shown in Figure 2a and Figure 2b.

Summary
Motivated by challenges in hard engineering design optimization problems (e.g., large design spaces, expensive simulations, a small fraction of feasible configurations, and the existence of preferences over objectives), this paper proposed a principled and efficient Bayesian optimization algorithm referred to as PAC-MOO. The algorithm builds Gaussian process based surrogate models for both objective functions and constraints and employs them to intelligently select the sequence of input designs for performing experiments. The key innovations behind PAC-MOO include a scalable and efficient acquisition function based on the principle of information gain about the optimal constrained Pareto front; an effective exploitation approach to find feasible regions of the design space; and the incorporation of preferences over multiple objectives using a convex combination of the corresponding acquisition functions. Experimental results on two challenging analog circuit design optimization problems demonstrated that PAC-MOO outperforms baseline methods in finding a Pareto set of feasible circuit configurations with high hypervolume using a small number of circuit simulations. With preference specification, PAC-MOO was able to find circuit configurations that optimize the preferred objective functions better.

Figure 2: Hypervolume and efficiency of optimized circuits with preferences vs. number of simulations. Panels (a) and (b): PHV of the Pareto set vs. number of circuit simulations for SCVR and HCR design. Panels (c) and (d): maximum efficiency of the optimized circuit configurations vs. number of circuit simulations for SCVR and HCR design.

Algorithm 1: PAC-MOO
[…]
[…] $x_t$: $y_t \leftarrow (f_1(x_t), \dots, f_K(x_t), C_1(x_t), \dots, C_L(x_t))$
Aggregate data: $D \leftarrow D \cup \{(x_t, y_t)\}$
Update models $M_{f_1}, \dots, M_{f_K}$ and $M_{c_1}, \dots, M_{c_L}$ using $D$
end for
return the Pareto set of feasible design parameters from $D$

Algorithm 2: Preference based Acquisition function ($\alpha_{pref}$)
[…]

1. Switched-Capacitor Voltage Regulator (SCVR) design optimization setup. The constrained MOO problem for SCVR circuit design consists of 33 input design variables, nine objective functions, and 14 constraints. Every method is initialized with 24 randomly sampled circuit configurations.
2. High Conversion Ratio (HCR) design setup. The constrained MOO problem for HCR circuit design consists of 32 design variables, 5 objective functions, and 6 constraints.