A Comparison of Mesh-Free Differentiable Programming and Data-Driven Strategies for Optimal Control under PDE Constraints

The field of Optimal Control under Partial Differential Equations (PDE) constraints is rapidly changing under the influence of Deep Learning and the accompanying automatic differentiation libraries. Novel techniques like Physics-Informed Neural Networks (PINNs) and Differentiable Programming (DP) are to be contrasted with established numerical schemes like Direct-Adjoint Looping (DAL). We present a comprehensive comparison of DAL, PINN, and DP using a general-purpose mesh-free differentiable PDE solver based on Radial Basis Functions. Under Laplace and Navier-Stokes equations, we found DP to be extremely effective as it produces the most accurate gradients; thriving even when DAL fails and PINNs struggle. Additionally, we provide a detailed benchmark highlighting the limited conditions under which any of those methods can be efficiently used. Our work provides a guide to Optimal Control practitioners and connects them further to the Deep Learning community.


INTRODUCTION
In the physical sciences, ordinary and partial differential equations (ODEs and PDEs) are widely used to model natural phenomena.
Controlling the behaviour of systems governed by such equations is vital to many engineering disciplines. Optimal control (OC) is the task of minimising a cost objective J subject to specific dynamical constraints [45]. In fig. 1 for instance, given an input velocity u_in, the flow in the channel is fully determined by the Navier-Stokes equations. OC formalises related optimisation processes and describes how actions on u_in affect u_out at the outlet. It helps find the input condition for a desired output. Nowadays, automated systems based on OC are part of everyday life: spacecraft fuel control [8], population dynamics [3], computer file transfer [45], social distancing and COVID-19 [46]; the list goes on. Like many optimisation problems, gradients are at the heart of OC. One of the most established numerical algorithms for OC is direct-adjoint looping (DAL), which has been successfully applied to problems in shape optimisation [2], turbulent fluid flow [29], and many more. It builds on Lagrange's famous 1853 treatise [26], which laid the foundation for Lagrange multipliers and adjoint-based sensitivity analysis. At its core, it iteratively evaluates the gradient of J with respect to a control variable c until a stopping condition is met. Like most adjoint-based schemes, DAL works by solving a second PDE for an adjoint variable λ, termed the adjoint problem, in addition to the original (direct) governing equations.
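To make the loop concrete, here is a minimal discrete analogue of DAL on a toy linear-quadratic problem. The matrices A and B and the quadratic cost are hypothetical stand-ins for a discretised PDE and objective, not the solvers used later in this paper; the structure (direct solve, adjoint solve, gradient assembly, descent update) is the point.

```python
import numpy as np

# Minimal discrete analogue of DAL (hypothetical toy problem, not the RBF
# solver used later): the "direct" problem is the linear system A u = b + B c,
# and we minimise J(c) = 0.5 * ||u - u_target||^2 over the control c.
A = np.array([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]])  # "PDE" operator
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])                 # control-to-state map
b = np.ones(3)
c_true = np.array([1.0, -2.0])
u_target = np.linalg.solve(A, b + B @ c_true)      # a reachable target state

def dal_step(c):
    u = np.linalg.solve(A, b + B @ c)              # 1) direct solve
    lam = np.linalg.solve(A.T, u - u_target)       # 2) adjoint solve: A^T lam = dJ/du
    return 0.5 * np.sum((u - u_target) ** 2), B.T @ lam  # 3) J and dJ/dc

c = np.zeros(2)
for _ in range(3000):                              # 4) gradient-descent update on c
    J, grad = dal_step(c)
    c -= 0.1 * grad

print(J, c)   # J -> 0 and c -> c_true
```

Note that each iteration costs two linear solves regardless of the number of control parameters, which is precisely what makes the adjoint approach attractive for high-dimensional controls.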
DAL is a powerful OC approach, but it has significant drawbacks. Just setting up the adjoint problem in DAL is a non-trivial task that often overshadows the computational effort of interest. For simple OC problems with complex PDEs, it is hard to justify the use of DAL and its significant derivation overhead. For extremely hard problems like engineering design applications, we often have a set of constraints which must be satisfied in addition to the governing equations. Some of these may be geometric and others may be overly entangled with the PDE variables. The soft-constraints approach is typically employed to deal with those, thus eroding DAL's main selling points [17].
Physics-Informed Neural Networks (PINNs) [12,36] are among a new wave of methods for solving forward and inverse problems that followed the renaissance of Deep Neural Networks. The breathtaking success of Deep Learning is fuelled by the availability of large volumes of data and tensor-oriented accelerators, beautifully brought together with the backpropagation algorithm [38,50]. This algorithm for exact and efficient computation of gradients underpins transformative discoveries in fields such as computer vision [19], natural language processing [48], and scientific machine learning [12], to name just those. The intuition behind backpropagation goes back to Lagrange multipliers and their adjoint states [26], repackaged more generally as reverse-mode automatic differentiation (AD) [18]. Recognising its wide applicability outside Deep Learning, several works have leveraged AD to propagate gradients through an entire PDE's discretised solver: Φ-Flow [20] and Mitsuba [30] geared towards computer graphics, JAX-CFD [7] for fluid dynamics, JAX-MD [40] for molecular dynamics, and many more specialised differentiable solvers for geosciences [43], robotics [22], etc.
Like DAL, PINNs in their original form [36] are subject to crippling weaknesses. Surrogate models built with them struggle to generalise to unseen scenarios, and they become harder to train in multi-objective settings. It is particularly hard for PINNs to approximate discontinuous PDE solutions, which can develop from smooth initial conditions over time. In Computational Fluid Dynamics, for instance, PINNs routinely fail to learn multiscale, chaotic, or turbulent behaviours, hence limiting their appeal for robust engineering applications [12]. The main reason behind these issues is the variational crime [6]: by minimising the residual's norm in their loss functions, PINNs demand more regularity than what the theory would typically allow.
In light of this plethora of numerical methods in modern-day control settings, it is important to know the strengths and weaknesses of each method, along with their new use cases outside their original domains. OC practitioners need a robust yet flexible tool to quickly prototype models and control them under various conditions. Furthermore, educating oneself on the mechanisms and future trends surrounding these methods should benefit both the OC and Deep Learning communities.
One potential solution to the problems of DAL and PINNs is differentiable programming (DP), specifically its discretise-then-optimise (DTO) approach to leveraging AD. In the Python ecosystem, several libraries can enable DP. JAX [9] is one such library that exposes low-level composable transformations like AD, vectorisation, parallelisation, and just-in-time compilation to XLA kernels [39] for various hardware accelerators. In this work, we use JAX not only to implement PINNs, but also to build Updec: an end-to-end differentiable PDE solver suitable for optimal control [31]. The resulting framework for DP is compared to DAL and PINN for accuracy and computational performance on two problems in engineering settings: the Laplace problem on a unit square, and the fluid flow in a channel governed by the Navier-Stokes equations.
Before considering control, the underlying PDEs must be simulated with one of two categories of techniques: with or without a mesh. Using a mesh confers an important inductive bias, namely for discretising spatial differential operators. In contrast, mesh-free methods do not require such structure, and are therefore attractive when the geometry is complex. Radial Basis Functions (RBFs) are interpolants that allow for robust, intuitive, and mesh-free simulation of a large variety of PDEs [23]. RBFs are the backbone of our DAL and DP numerical schemes. The choice of such an inherently mesh-free method is further motivated by a desire to test our RBF-based implementations against the data-driven but equally mesh-free PINN.
Our contribution is twofold. First, our comparative study of DAL, DP, and PINN in a unified mesh-free context is, to our knowledge, the first of its kind. Our second contribution lies in identifying RBFs as an effective method not only for PDE simulation, but also for enhanced optimisation via the user-friendly Updec differentiable programming framework [31].

METHODS

Radial Basis Functions
Radial Basis Functions have a long history rooted in approximation theory [10,35]. They were revived in the last 30 years, particularly for their use as activation layers in Neural Networks [32]. Building on their universal and best approximation theorems [1,33], the expressive power of RBFs is typically used to model PDEs of the form

Du = f in Ω,  (1a)
u = g_d on Γ_d,  (1b)
∂u/∂n = g_n on Γ_n,  (1c)
αu + β ∂u/∂n = g_r on Γ_r,  (1d)

where Ω ⊂ R^d with boundary ∂Ω = Γ_d ∪ Γ_n ∪ Γ_r. The quantities f, g_d, g_n, and g_r are known fields defined on Ω, Γ_d, Γ_n, and Γ_r respectively (α and β are known coefficients in the Robin condition). The vector n is the outward-facing normal along the corresponding boundary. D is a (combination of) linear differential operator(s) applied to the unknown scalar field u, interpolated as

û(x) = Σ_{j=1,...,N} λ_j φ(∥x − x_j∥₂) + Σ_{k=1,...,M} γ_k p_k(x),  (2)

where φ is a radial basis function (RBF): a real-valued function of the Euclidean distance ∥·∥₂. The nodal points {x_j}_{j=1,...,N} represent centres (or centroids) of the RBFs, arbitrarily scattered in Ω and on ∂Ω. The {p_k}_{k=1,...,M} are appended polynomials as suggested in the RBF-FD framework [44]. Solving eq. (1) means collocating û at nodes {x_i}_{i=1,...,N} (which can be the same as the centroids), then applying D, resulting in a linear system of simultaneous equations whose solution yields the coefficients λ_j and γ_k. Under certain conditions, RBFs are known to suffer from the Runge phenomenon [13,14], leading to poor approximation near ∂Ω. The question of boundary handling with RBFs has always been tricky, with some of the most impactful applications of RBFs bypassing it by using background meshes [41,42]. Our implementation accounts for all three major boundary conditions in the literature by careful (re)ordering of the nodes: first the N_i internal nodes, then the N_d Dirichlet nodes, then the N_n Neumann nodes, and finally the N_r Robin nodes. We refer the reader to our code Updec [31] for details.
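As a concrete sketch of eq. (2), the snippet below interpolates a field on a handful of scattered nodes with the cubic polyharmonic spline φ(r) = r³ and appended linear polynomials. The node set and the interpolated field are illustrative, not the point clouds used in our experiments; the polynomial tail guarantees that linear fields are reproduced exactly.

```python
import numpy as np

# RBF interpolation sketch: cubic polyharmonic spline phi(r) = r^3 with
# appended linear monomials {1, x, y} (hypothetical scattered node set).
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                  [0.3, 0.7], [0.8, 0.2], [0.5, 0.5]])
f = lambda p: p[:, 0] + 2.0 * p[:, 1]              # field to interpolate

phi = lambda r: r ** 3                             # polyharmonic cubic spline
poly = lambda p: np.column_stack([np.ones(len(p)), p])  # appended polynomials

N, M = len(nodes), 3
Phi = phi(np.linalg.norm(nodes[:, None] - nodes[None, :], axis=-1))
P = poly(nodes)
A = np.block([[Phi, P], [P.T, np.zeros((M, M))]])  # augmented collocation matrix
rhs = np.concatenate([f(nodes), np.zeros(M)])      # interpolation + moment conditions
coef = np.linalg.solve(A, rhs)
lam, gam = coef[:N], coef[N:]                      # RBF and polynomial coefficients

def interp(p):                                     # evaluate u_hat at new points
    r = np.linalg.norm(p[:, None] - nodes[None, :], axis=-1)
    return phi(r) @ lam + poly(p) @ gam

test_pts = np.array([[0.25, 0.4], [0.9, 0.9]])
print(interp(test_pts))                            # reproduces x + 2y
```

Applying a linear operator D to the basis instead of evaluating it pointwise turns the same linear system into a collocation solver for eq. (1).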
In this work, we consider stationary limits of PDEs with no time dependence. Also, our control functions c : C → R are only applied to part of the boundaries. To provide a unified framework suitable for all the experiments we conduct, we generalise eqs. (1a) to (1d) by accounting for vector fields with

F(u) = 0 in Ω,  B(u, c) = 0 on ∂Ω,  (3)

where F and B are respectively the PDE and boundary residuals. The state u is fully dependent on the control c, and the constrained optimisation problem is to find

c* = argmin_c J(u(c)) subject to eq. (3).  (4)

Direct-Adjoint Looping
The use of continuous adjoint variables for optimisation under equality constraints has its origins in Lagrange multipliers [26].
Centuries of development, culminating in the Karush-Kuhn-Tucker conditions to account for inequality constraints [25] and Pontryagin's Maximum Principle in the calculus of variations [34], have set the groundwork for optimal control theory and computational experimentation via adjoint-based sensitivity analysis.
To derive the adjoint equations, one may use calculus of variations to consider ε > 0 and a small perturbation h ∈ C, then define the directional derivative

J′(c; h) = lim_{ε→0} [J(c + εh) − J(c)] / ε,

whose expansion should display the state sensitivity w.r.t. the control,

u′ = lim_{ε→0} [u_{c+εh} − u_c] / ε,

where the explicit notation u_c indicates the solution to eq. (3) given a control c. Linearity and continuity of u′ w.r.t. h may then be established by deriving a separate PDE based on eq. (3), both sides of which are multiplied by an adjoint state λ (independent of h). Integration by parts and careful consideration of boundary conditions are then used to isolate u′ from the differential operators, leading to an adjoint PDE (5) for λ, after which dJ/dc = ∇J can be expressed as a function of u, λ, and c. As showcased in fig. 2a, DAL is an optimise-then-discretise (OTD) scheme that initialises c, then solves eq. (3), then eq. (5), then evaluates ∇J, and finally updates c via gradient descent. Details of DAL derivations for the Laplace and Navier-Stokes PDEs using Euler-Lagrange equations can be found in [28].
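As an illustrative special case (the boundary split and sign conventions here are ours, not necessarily those of [28]), consider Dirichlet boundary control of the Laplace equation with a homogeneous Neumann condition on the observation boundary Γ_o. The machinery above then yields:

```latex
\begin{aligned}
&\text{direct:}   && \nabla^2 u = 0 \ \text{in } \Omega, \quad
                     u = c \ \text{on } \Gamma_c, \quad
                     \partial u / \partial n = 0 \ \text{on } \Gamma_o,\\
&\text{cost:}     && J(c) = \tfrac{1}{2} \int_{\Gamma_o} (u - u_d)^2 \,\mathrm{d}s,\\
&\text{adjoint:}  && \nabla^2 \lambda = 0 \ \text{in } \Omega, \quad
                     \lambda = 0 \ \text{on } \Gamma_c, \quad
                     \partial \lambda / \partial n = -(u - u_d) \ \text{on } \Gamma_o,\\
&\text{gradient:} && \nabla J = -\,\partial \lambda / \partial n \ \text{on } \Gamma_c,
\end{aligned}
```

where the adjoint boundary data follow from Green's second identity applied to u′ and λ: choosing λ = 0 on Γ_c removes the unknown flux of u′, and the Neumann data on Γ_o make the remaining boundary integral equal to J′(c; h).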

Physics-Informed Neural Networks
Similar to RBFs, Multilayer Feed-Forward Networks are universal approximators [21]. They are leveraged by PINNs [36] to approximate solutions to problems involving PDEs (see fig. 2b). PINNs are learnt by enforcing the governing equations as soft constraints at points in the domain and on its boundary. The core idea is to include these constraints as part of the loss function in eq. (6), where the subscripts on u and c indicate the dependence on the Neural Network weights θ. Adding a loss term for labelled data is optional, and the use of disconnected nodes in a point cloud makes the method mesh-free, just like RBFs. PINNs have been used in a wide range of application areas, and their efficiency for forward problems is the subject of considerable research interest. The usability of PINNs for optimal control, however, remains much less explored, something we address with this work. An impressive study of PINNs for such applications was recently conducted in [28]. Their strategy involved adding to the loss function the cost objective J weighted by an adjustable coefficient ω. Given the inherent difficulty of solving such a multi-objective optimisation problem, the authors of [28] presented a two-step line search strategy for ω, which we reproduce. Before the search starts, a Multilayer Perceptron (MLP) is trained to solve the forward PDE with a prescribed control term. This helps identify the set of hyperparameters suitable for the problem at hand: namely, a learning rate schedule and an architecture for the solution network. Then the two-step line search is as follows: (1) A range of coefficients ω are used to learn potential optimal controls. Each weight corresponds to a different pair (u_θ, c_θ) of solution-control networks, all trained with the same architecture and hyperparameters preliminarily determined. The weights of u_θ and c_θ are updated in an alternating manner to minimise the multi-objective loss function L_{F/B} + ω J. (2) Since fitting the PDE is an imperative, new solution
networks u′_θ are retrained for each ω. This is done using the control networks saved from step 1. The loss function in this step does not include the cost objective J. After training, the pair (u′_θ, c_θ) that results in the lowest cost objective is chosen as the optimal solution.
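To make the structure of eq. (6) concrete, the sketch below assembles a PINN-style loss for a 1D Poisson toy problem (the problem, point sets, and cost hook are illustrative, not the paper's setup). Plugging in the exact solution drives the physics and boundary terms to zero, which is exactly the property training exploits; in practice u would be an MLP u_θ.

```python
import jax
import jax.numpy as jnp
jax.config.update("jax_enable_x64", True)

# PINN-style loss sketch for u'' = f on (0, 1) with u(0) = u(1) = 0,
# plus an optional cost objective weighted by omega as in eq. (6).
f = lambda x: -jnp.pi ** 2 * jnp.sin(jnp.pi * x)   # forcing for u = sin(pi x)

def pde_residual(u, x):
    u_xx = jax.vmap(jax.grad(jax.grad(u)))(x)      # second derivative via AD
    return u_xx - f(x)

def loss(u, x_interior, x_boundary, omega=0.0, cost=lambda u: 0.0):
    L_F = jnp.mean(pde_residual(u, x_interior) ** 2)   # physics residual term
    L_B = jnp.mean(jax.vmap(u)(x_boundary) ** 2)       # boundary penalty term
    return L_F + L_B + omega * cost(u)                 # soft-constraint total

u_exact = lambda x: jnp.sin(jnp.pi * x)            # stand-in for an MLP u_theta
x_in = jnp.linspace(0.05, 0.95, 19)
x_bc = jnp.array([0.0, 1.0])
total = loss(u_exact, x_in, x_bc)
print(total)                                       # ~0 for the exact solution
```

The line search of [28] then amounts to sweeping omega, training a pair of networks per value, and retraining the solution network with omega effectively set to zero in step 2.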

Differentiable Programming
The rise of Deep Learning was accompanied by a substantial development in tools to efficiently calculate derivatives [9,18]. For instance, with PINNs, network weights are updated via backpropagation [49], which is a special case of automatic differentiation (AD) [5]. In contrast to Finite Differences, which suffers from truncation and round-off errors, AD returns exact gradients. It relies on the chain rule of differentiation applied to elementary built-in or custom-made units to compute exact values without ever analytically deriving the corresponding symbolic expressions.
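The contrast is easy to demonstrate: reverse-mode AD recovers a hand-derived derivative to machine precision, while a central finite difference carries an O(h²) truncation error (the function g below is an arbitrary example, not one from our solvers).

```python
import jax
import jax.numpy as jnp
jax.config.update("jax_enable_x64", True)

# AD vs finite differences: AD is exact up to floating-point rounding,
# finite differences trade truncation against round-off via the step h.
g = lambda x: jnp.exp(jnp.sin(x)) * x ** 2
x0 = 1.3
exact = jnp.exp(jnp.sin(x0)) * (jnp.cos(x0) * x0 ** 2 + 2 * x0)  # hand-derived

ad = jax.grad(g)(x0)                              # reverse-mode AD
fd = lambda h: (g(x0 + h) - g(x0 - h)) / (2 * h)  # central difference

print(abs(ad - exact))        # ~0 (machine precision)
print(abs(fd(1e-3) - exact))  # truncation error ~h^2
```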
In recent years, the tools that made AD so successful for Deep Learning have been leveraged for simulation in the DTO paradigm of differentiable programming (DP), also known as differentiable physics, differentiable simulation, or differentiable modelling. Showcase examples of applications of AD outside or in conjunction with Neural Networks include [7,20,30,40,43]. The same concept permeates all those examples: they are a succession of elementary operators {P_i}_{i=1,...,n} whose derivatives w.r.t. their inputs can be evaluated (see fig. 2c). That is how ∇J is ultimately obtained.
Our RBF software Updec [31] consists of such operations, which makes the solver end-to-end differentiable. We use JAX [9] and its reverse-mode implementation of AD, called grad, to compute ∇J. In addition, the grad transformation is used to define the differential operator D described in section 2.1, thus giving users the liberty to effortlessly choose or design new functions φ (see eq. (2)). Besides grad, JAX is particularly suited for this task because of its just-in-time compilation (jit) and batched vectorisation (vmap) transformations, which improve solver runtimes.
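As a sketch of this idea (using plain jax.hessian rather than Updec's actual operator API), a Laplacian operator can be built from AD and applied to the cubic RBF φ(x) = ∥x∥³, whose 2D Laplacian is analytically φ'' + φ'/r = 9r:

```python
import jax
import jax.numpy as jnp
jax.config.update("jax_enable_x64", True)

# Building a differential operator D from AD: the Laplacian of any scalar
# field is the trace of its Hessian, so users can swap in a new phi without
# hand-deriving D(phi).
phi = lambda x: jnp.linalg.norm(x) ** 3            # cubic polyharmonic RBF

def laplacian(f):
    return lambda x: jnp.trace(jax.hessian(f)(x))

x0 = jnp.array([3.0, 4.0])                         # r = 5
val = laplacian(phi)(x0)
print(val)                                         # analytic value: 9r = 45
```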

RESULTS
The accuracy and convergence rate of any RBF-based simulation depend heavily on the properties of the basis function φ. The merits and setbacks of several choices are discussed in the literature [42]: multiquadrics, Gaussians, thin plate splines, etc., most of which are parametrised by a shape parameter. To avoid tuning such a parameter, we opted for the polyharmonic cubic spline φ : r → r³, which, when augmented with polynomials of maximum degree 1, provided a robust and performant tool, capable of approximating even nonlinear PDEs such as the Navier-Stokes equations. For all our DAL, PINN, and DP experiments, we used the Adam optimiser [15] to perform gradient descent. While firmly established for Neural Network optimisation, the use of Adam with DAL or DP is less common. In this study, Adam helped increase robustness to noisy gradients at boundaries due to the Runge phenomenon [14]. To guide our schemes towards faster and more accurate convergence to a minimiser, the initial learning rate was divided by 10 after half the iterations or epochs, and again by 10 at 75% completion.

5 Our recommendation is to start with 1 and explore both directions with positive and negative powers of 10. See examples of ranges in section 3.
6 We find it crucial to train u′_θ at least until it matches c_θ on the appropriate boundaries.
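The schedule just described amounts to a simple piece-wise constant rule; a minimal sketch (the function name and the step-based interface are ours, not Updec's API):

```python
# Piece-wise constant learning-rate schedule: the initial rate is divided
# by 10 after 50% of the iterations, and by 10 again at 75% completion.
def lr_schedule(step, total_steps, lr0):
    if step < total_steps // 2:
        return lr0
    if step < (3 * total_steps) // 4:
        return lr0 / 10.0
    return lr0 / 100.0

total = 100
rates = [lr_schedule(s, total, 1e-2) for s in (0, 49, 50, 74, 75, 99)]
print(rates)   # 1e-2 for the first half, then 1e-3, then 1e-4
```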

The Laplace Problem
Consider the Laplace equation [28] in the unit square Ω with Dirichlet boundary conditions,

∇²u = 0 in Ω,  (7)

where c is the control potential applied to the top wall. We seek to solve the convex optimal control problem

c* = argmin_c J(c) subject to eq. (7),  (8)

with a cost objective J measuring the mismatch between the resulting state and a desired profile. This problem admits an analytical minimiser c* with a corresponding closed-form state solution. For the DAL and DP techniques, the PDE (7) was solved on a regular 100 × 100 grid, which resulted in better-conditioned collocation matrices compared with a scattered point cloud of the same size. ∇J was then evaluated to iteratively update c, initially set to identically 0. The initial learning rates were 10⁻² for DAL and DP, and 10⁻³ for the PINN.
With the PINN, training as seen in figs. 3c to 3e was done on a scattered cloud, while testing was performed on the same 100 × 100 regular grid as for DAL and DP. This regularised the PINN and improved generalisation. The preliminary step to the line search recommended using an MLP with 3 hidden layers of 30 neurons each, an architecture which provided a good balance between expressiveness, overfitting, and computational efficiency. Each layer was equipped with an infinitely differentiable tanh activation function. Like in [28], we tried 11 values of ω (from 10⁻³ to 10⁷) and analysed their effect on the total loss function, eq. (6). We found that ω* = 10⁻¹ gives rise to the most balanced solution for this problem. Results highlighting the unmatched performance of DP are reported in fig. 3, and a summary of the main hyperparameters used throughout the experiment is presented in table 1.
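The DP treatment of this problem can be sketched end to end on a coarse grid. The snippet below uses a plain finite-difference discretisation rather than our RBF solver, and a hypothetical target profile, but it illustrates the essential DTO step: the cost is differentiated straight through the linear solve with jax.grad.

```python
import jax
import jax.numpy as jnp
jax.config.update("jax_enable_x64", True)

# Discretise-then-optimise sketch for a Laplace control problem (coarse
# finite differences, hypothetical target; not the paper's RBF solver).
n = 6                                       # interior points per direction
I = jnp.eye(n)
T = 2 * I - jnp.eye(n, k=1) - jnp.eye(n, k=-1)
A = jnp.kron(I, T) + jnp.kron(T, I)         # 2D Dirichlet Laplacian, SPD

def solve_state(c):
    # Top-wall Dirichlet data enters the right-hand side (grid spacing
    # absorbed into c for simplicity).
    rhs = jnp.zeros((n, n)).at[-1, :].set(c)
    return jnp.linalg.solve(A, rhs.ravel()).reshape(n, n)

def J(c):
    u = solve_state(c)
    u_d = jnp.sin(jnp.pi * jnp.linspace(0.0, 1.0, n))  # hypothetical target
    return 0.5 * jnp.sum((u[0, :] - u_d) ** 2)         # mismatch on bottom row

c0 = jnp.zeros(n)
g = jax.grad(J)(c0)                         # gradient through the whole solver
print(g.shape)
```

From here, any gradient-based optimiser (Adam in our experiments) can iterate on c; no adjoint PDE ever has to be derived by hand.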

The Navier-Stokes Problem
The continued relevance of fluid flow between parallel plates is clear from extensive studies in the literature [4,37]. As such, it represents an excellent case study for our methods.
The setup and the domain of interest in [28] are illustrated in fig. 4a. Given some perturbations due to blowing at Γ_b and suction at Γ_s, what inflow velocity at Γ_i would cause a parabolic profile at the outlet Γ_o? The flow in Ω is governed by the stationary incompressible Navier-Stokes equations in their dimensionless form,

(u · ∇)u = −∇p + (1/Re) ∇²u,  (9a)
∇ · u = 0,  (9b)

where u = (u, v) is the velocity field.

8 We present hyperparameters for the first and most important step of the line search strategy, which often required values different from step 2.
9 The three methods are trained for different numbers of epochs/iterations, as per table 3.

Table 1: Hyperparameter summary for the Laplace problem, implemented in a Python [47] environment equipped with JAX [9] and Updec [31]. The learning rates follow a piece-wise constant schedule described early in section 3.
The network architecture indicates the number of hidden layers and neurons per layer in the MLP. The hyphen symbol "-" stands for hyperparameters not applicable (NA) to the method under consideration.

The (controlled) boundary conditions (10) prescribe, among others, the inflow velocity c at Γ_i. We want to find

c* = argmin_c J(c) subject to eqs. (9) and (10),  (11)

with the cost objective J (12) penalising the deviation of the outlet velocity profile from the target parabola. Upon deriving the adjoint equations, we decoupled the (adjoint) velocity components and the (adjoint) pressure. We employed a Chorin-inspired projection approach [11] to iteratively bring the fields to steady states [51]. We set the number of refinement steps to 3 for DAL and 10 for DP, both with initial learning rates of 10⁻¹. We set the Reynolds number to Re = 100, and the initial guess for the inflow velocity to a parabolic profile. Given the complexity of the domain and the benefits of mesh refinement near free surfaces [51], we meshed the domain with GMSH [16], from which we extracted 1385 scattered and disconnected nodes, used for all three methods (see table 2).
In the PINN's loss function, we included all Dirichlet and homogeneous Neumann boundary penalty terms for the velocity, as suggested in [28]. As for the pressure, only the Dirichlet boundary condition at the outlet was enforced, since including Neumann conditions made no difference during the preliminary calibration of the PINN architecture. During this preliminary step, we found that an MLP architecture with 5 fully connected hidden layers of 50 neurons each was well suited for this problem, and it offered a limited computational cost. As seen in table 2, the initial learning rate was set to 10⁻³. The line search strategy explored 9 values for ω from 10⁻³ to 10⁵, settling on ω* = 1.

Hyperparameter DAL PINN DP
Init. learning rate 10⁻¹ 10⁻³ 10⁻¹

Results at the outlet are compared in fig. 4d. The DAL fails to capture the solution due to RBF-related inaccuracies when computing the gradients of u, needed for resolving the adjoint advection operator. We found that this problem is lessened with a reduced Re = 10, which led to better solutions with DAL. The DP and the PINN, on the other hand, succeed in capturing minima, with the cost objectives getting as low as 2.61 × 10⁻⁴ and 1.04 × 10⁻³, respectively.

DISCUSSION
In the context of RBFs, the DAL approach was shown to perform well on the Laplace optimal control problem, although it compared poorly to the other methods (see fig. 3b). Here, DAL converged despite the gradients rising to very large values. The Adam optimiser, which alleviated this issue, was ultimately unable to repeat the same for the Navier-Stokes problem (see fig. 4b). This is common in OTD approaches, where round-off and numerical errors are often encountered [24]. In short, with DAL, numerical errors linked to the derivation of continuous adjoint equations should be handled with care.
A well-implemented PINN along with a two-step line search strategy is a powerful tool for optimal control. The PINN was able to find appropriate controls for the Laplace and Navier-Stokes problems (see figs. 3a and 4c). Although setting up and training a PINN is perhaps more costly compared to DAL (which is very problem-specific), its performance is good. These observations align with those of [28]. On the flip side, the PINN showed little generalisation capability, especially with regard to Re. We experimented with values from Re = 10 to Re = 100 and found that more laminar flows required significantly more epochs to train to satisfaction. The idea of repeating the line search strategy for each new set of parameters diminishes the appeal of PINNs. Furthermore, PINNs require a considerable amount of data and a substantial training time [43], as evident in table 3.
The most well-rounded approach is undeniably DP. It provides extremely low cost objectives for both the Laplace and Navier-Stokes problems (see figs. 3b and 4b). In addition to being relatively easy to set up, its discretise-then-optimise (DTO) approach is able to overcome the RBF-related inaccuracies that plague DAL. That said, DP as conceived in this study can be memory-inefficient due to the storage and optimisation of a computational graph [9,24]. For the Navier-Stokes problem, the computational complexity scales super-linearly with the number of refinement steps needed to account for the non-linear advection operator. As recommended in similar studies [24,27], if the number of refinement steps is small then DP should be prioritised, especially given that its gradients are the gold standard in optimisation 11 [24,43]. We also note from fig. 4c that the DP control is considerably less smooth than the other two. This could be resolved by increasing the learning rate and performing fewer gradient descent iterations, or by penalising the control's variations by adding the integral of |∇c|² along the inflow boundary to the cost objective. We refrained from doing the latter since it prevents a fair comparison between the methods. The computational performance of the methods is implementation-specific. However, we provide in table 3 a comparative summary of the three methods using Updec [31]. It indicates the resources required to ultimately achieve the reported values of the cost objectives J. Although only indicative, we believe that similar benchmarks should always be considered when solving an optimal control problem.
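For reference, such a smoothing penalty can be approximated on a discrete control in a few lines (the node spacing and the control profile below are hypothetical):

```python
import numpy as np

# Discrete smoothing penalty for a 1D control c sampled along the inflow
# boundary: integral of |grad c|^2 via the trapezoid rule.
def smoothness_penalty(c, dy):
    dc = np.gradient(c, dy)                     # finite-difference gradient
    w = (dc[:-1] ** 2 + dc[1:] ** 2) / 2.0      # trapezoid weights
    return np.sum(w) * dy

y = np.linspace(0.0, 1.0, 101)
c_linear = 2.0 * y                              # constant slope 2
penalty = smoothness_penalty(c_linear, y[1] - y[0])
print(penalty)                                  # ≈ 4.0 (slope^2 × interval length)
```

Added to J with its own weight, this term trades some cost-objective accuracy for a smoother control, which is why we left it out of the reported comparisons.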
Interpreting table 3, we see that DP is applicable in most optimal control scenarios, that is, until the memory requirements become prohibitively large. In those extreme cases, we believe the PINN serves as a good second choice. Furthermore, the rapidly growing body of research currently focusing on advanced PINN paradigms for inverse problems will likely address most issues we've outlined in this work. The best use case for DAL is when the problem formulation is not complex (either linear elliptic or parabolic PDEs), allowing for a seamless derivation of the adjoint equations, as we've shown with the Laplace problem.

CONCLUSION
This paper compared the expressiveness of Deep Neural Networks to that of Radial Basis Functions for mesh-free control of systems governed by PDEs. We find that a combination of ideas from both worlds via differentiable programming (DP) is the most promising approach. For optimal control problems under Laplace and Navier-Stokes constraints, DP compared favourably to Physics-Informed Neural Networks (PINNs) in terms of accuracy and computational performance. It even succeeded where the well-established direct-adjoint looping (DAL) algorithm failed. We developed a flexible JAX-based framework for carrying out a robust and diverse range of experiments, which allowed us to share indicative comparisons on the performance of each method. Future work on this project will aim at improving the stability of the adjoint DAL equations and incorporating time in order to tackle turbulent flows. This includes exploring alternative mesh-free methods like Smoothed Particle Hydrodynamics (SPH). Moreover, we aim to improve the memory and computational efficiency of DP by massively parallelising the framework.

11 Although it should be noted that classical Finite Differences was efficient in providing accurate gradients for our Navier-Stokes problem at a reduced memory cost.

Figure 1: Qualitative evaluation of our differentiable programming (DP) framework against direct-adjoint looping (DAL) and physics-informed neural networks (PINNs) when the goal is to control the inflow velocity at x = 0 to achieve a parabolic outflow profile at x = 1.5, given a cross-flow at the mid-point.

Figure 2: Illustration of a) DAL as it evaluates ∇J by simulating two distinct PDEs; b) PINN, which fits a Neural Network by enforcing soft PDE constraints, of which J is part; c) DP, which combines the fitting strategy of PINNs with the strict adherence to first principles of DAL; gradients travel backwards through the solver before influencing c.

Figure 3: Results for the Laplace control problem highlighting how well DP is able to minimise the cost objective J relative to DAL and PINN (see (a) and (b)). In (c), (d), and (e), we showcase the various stages of the PINN's line search strategy for ω. (f) and (g) provide a comparison of the state solutions after optimisation, confirming the superiority of DP through its low absolute error.

Figure 4: Setup adapted from [28] and results for the Navier-Stokes optimal control problem. (d) shows how DP and PINN succeed in achieving a near-parabolic profile at the outlet. However, from fig. 1, we see that PINN achieves good control at the expense of first principles.

The term "strided" in fig. 3b is meant to convey that the three methods are trained for different numbers of epochs/iterations (see table 3).

Table 2: Hyperparameter summary for the Navier-Stokes problem. Like in table 1, the hyphen "-" stands for not applicable (NA).

Table 3: Performance details for DAL, PINN, and DP, with each method run on the hardware that allowed optimal performance. DAL and DP results were obtained on an AMD Ryzen 9 5950X 16-Core Processor. The PINN was trained on an Nvidia GeForce RTX 3090 (we highlight this with a *).