# Mathematics of Sensitivity Analysis

## Forward Sensitivity Analysis

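The local sensitivity is computed using the sensitivity ODE:

$$
\frac{d}{dt}\frac{\partial u}{\partial p_{j}} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial p_{j}} + \frac{\partial f}{\partial p_{j}} = J \cdot S_{j} + F_{j}
$$

where

$$
J = \begin{pmatrix}
\frac{\partial f_{1}}{\partial u_{1}} & \frac{\partial f_{1}}{\partial u_{2}} & \cdots & \frac{\partial f_{1}}{\partial u_{k}} \\
\frac{\partial f_{2}}{\partial u_{1}} & \frac{\partial f_{2}}{\partial u_{2}} & \cdots & \frac{\partial f_{2}}{\partial u_{k}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_{k}}{\partial u_{1}} & \frac{\partial f_{k}}{\partial u_{2}} & \cdots & \frac{\partial f_{k}}{\partial u_{k}}
\end{pmatrix}
$$

is the Jacobian of the system,

$$
F_{j} = \begin{pmatrix}
\frac{\partial f_{1}}{\partial p_{j}} \\
\frac{\partial f_{2}}{\partial p_{j}} \\
\vdots \\
\frac{\partial f_{k}}{\partial p_{j}}
\end{pmatrix}
$$

are the parameter derivatives, and

$$
S_{j} = \begin{pmatrix}
\frac{\partial u_{1}}{\partial p_{j}} \\
\frac{\partial u_{2}}{\partial p_{j}} \\
\vdots \\
\frac{\partial u_{k}}{\partial p_{j}}
\end{pmatrix}
$$

is the vector of sensitivities. Since this ODE depends on the state $u$ itself, it is solved simultaneously with the original ODE system.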
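As a minimal sketch of solving the sensitivity ODE alongside the original system, consider the scalar test problem $u' = -pu$, for which $\partial f/\partial u = -p$ and $\partial f/\partial p = -u$. The test problem, the fixed-step RK4 integrator, and the names `rk4_step` and `forward_sensitivity` are assumptions made for illustration only; they are not part of any library.

```python
import math

def rk4_step(rhs, t, y, h):
    """One classical Runge-Kutta step for y' = rhs(t, y), with y a list."""
    k1 = rhs(t, y)
    k2 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = rhs(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def forward_sensitivity(p, u0=1.0, T=2.0, n=2000):
    """Integrate u' = -p*u together with its sensitivity s = du/dp."""
    def rhs(t, y):
        u, s = y
        J = -p   # df/du for the illustrative problem
        F = -u   # df/dp for the illustrative problem
        return [-p * u, J * s + F]  # original ODE and sensitivity ODE
    h = T / n
    t, y = 0.0, [u0, 0.0]  # s(t0) = 0 because u0 does not depend on p
    for _ in range(n):
        y = rk4_step(rhs, t, y, h)
        t += h
    return y  # [u(T), du/dp(T)]

u_T, s_T = forward_sensitivity(1.5)
# Closed form for this problem: u = u0*exp(-p*t), du/dp = -t*u0*exp(-p*t)
print(u_T, s_T, -2.0 * math.exp(-3.0))
```

The augmented state `[u, s]` is what makes the simultaneous solve necessary: the right-hand side for `s` reads the current value of `u` at every stage.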

Note that the Jacobian-vector product

$$
\frac{\partial f}{\partial u}\frac{\partial u}{\partial p_{j}}
$$

can be computed without forming the Jacobian. With finite differences, this is done using the following formula for the directional derivative

$$
Jv \approx \frac{f(x + v \epsilon) - f(x)}{\epsilon},
$$

or, alternatively and without truncation error, by using a dual number with a single partial dimension, $d = x + v \epsilon$, we get that

$$
f(d) = f(x) + Jv \epsilon
$$

as a fast way to calculate $Jv$. Thus, except when a sufficiently good function for `J` is given by the user, the Jacobian is never formed. For more details, consult the MIT 18.337 lecture notes on forward mode AD.
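The two ways of obtaining $Jv$ without forming $J$ can be compared in a small sketch. The `Dual` class below is a hand-rolled toy supporting only `+` and `*`, and the function `f` and the helpers `jvp_dual` and `jvp_fd` are hypothetical names chosen for this example, not an existing AD library.

```python
class Dual:
    """Toy dual number val + eps*epsilon with epsilon**2 == 0."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.eps + other.eps)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: the epsilon**2 term is dropped
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)
    __rmul__ = __mul__

def f(x):
    # Illustrative function with Jacobian [[x1, x0], [2*x0, 3]]
    return [x[0] * x[1], x[0] * x[0] + 3 * x[1]]

def jvp_dual(f, x, v):
    """J*v from one evaluation of f at the dual point x + v*epsilon."""
    out = f([Dual(xi, vi) for xi, vi in zip(x, v)])
    return [o.eps for o in out]

def jvp_fd(f, x, v, h=1e-6):
    """J*v approximated by a forward difference along direction v."""
    fx = f(x)
    fxh = f([xi + h * vi for xi, vi in zip(x, v)])
    return [(a - b) / h for a, b in zip(fxh, fx)]

x, v = [2.0, 5.0], [1.0, -1.0]
print(jvp_dual(f, x, v))  # [3.0, 1.0]
print(jvp_fd(f, x, v))    # close to the dual result, with truncation error
```

The dual-number result carries no step-size error, whereas the finite-difference result depends on the choice of `h`.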

## Adjoint Sensitivity Analysis

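Adjoint sensitivity analysis requires the definition of some scalar functional $g(u,p)$, where $u(t,p)$ is the (numerical) solution to the differential equation $\frac{d}{dt} u(t,p)=f(t,u,p)$ with $t\in [t_0,T]$ and $u(t_0,p)=u_0$. Adjoint sensitivity analysis finds the gradient of

$$
G(u,p) = G(u(\cdot,p)) = \int_{t_{0}}^{T} g(u(t,p),p)\,dt,
$$

some integral of the solution. It does so by solving the adjoint problem

$$
\frac{d\lambda^{\star}}{dt} = -\lambda^{\star}(t) f_{u}(t,u(t,p),p) - g_{u}(u(t,p),p), \qquad \lambda^{\star}(T) = 0,
$$

where $f_u$ is the Jacobian of the system with respect to the state $u$ while $f_p$ is the Jacobian with respect to the parameters. The adjoint problem's solution gives the sensitivities through the integral:

$$
\frac{dG}{dp} = \int_{t_{0}}^{T} \left( \lambda^{\star}(t) f_{p}(t) + g_{p}(t) \right) dt + \lambda^{\star}(t_{0}) u_{p}(t_{0})
$$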

Notice that since the adjoint problem requires the Jacobian of the system evaluated at the state, it requires the ability to evaluate the state at any point in time. Thus solving the adjoint problem requires a continuous forward solution, and the adjoint solution must itself be continuous in order to calculate the resulting integral.
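The following sketch works through a continuous adjoint on the scalar problem $u' = -pu$ with $g(u,p) = u$. It assumes the sign convention $\frac{d\lambda^{\star}}{dt} = -\lambda^{\star} f_u - g_u$ with $\lambda^{\star}(T) = 0$ and $\frac{dG}{dp} = \int_{t_0}^{T} (\lambda^{\star} f_p + g_p)\,dt + \lambda^{\star}(t_0) u_p(t_0)$. The closed-form `u(t)` stands in for a continuous forward solution, and `rk4_step` and `adjoint_gradient` are illustrative names, not a library API.

```python
import math

def rk4_step(rhs, t, y, h):
    """One classical Runge-Kutta step; h may be negative (backward in time)."""
    k1 = rhs(t, y)
    k2 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = rhs(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def adjoint_gradient(p, u0=1.0, T=2.0, n=2000):
    """dG/dp for G = integral of u over [0, T], with u' = -p*u."""
    u = lambda t: u0 * math.exp(-p * t)  # stand-in for a continuous forward solution

    def rhs(t, y):
        lam, _ = y
        f_u, f_p = -p, -u(t)   # Jacobians of f = -p*u at the state u(t)
        g_u, g_p = 1.0, 0.0    # derivatives of g = u
        # Adjoint ODE, plus the quadrature integrand carried as a second state
        return [-lam * f_u - g_u, lam * f_p + g_p]

    h = -T / n                 # integrate backward from T to 0
    t, y = T, [0.0, 0.0]       # lambda(T) = 0, quadrature starts at 0
    for _ in range(n):
        y = rk4_step(rhs, t, y, h)
        t += h
    lam0, backward_quad = y
    u_p0 = 0.0                 # u0 does not depend on p in this example
    # Integrating from T down to 0 yields minus the integral over [0, T]
    return -backward_quad + lam0 * u_p0

p, T = 1.5, 2.0
exact = -(1.0 / p) * ((1 - math.exp(-p * T)) / p - T * math.exp(-p * T))
print(adjoint_gradient(p, T=T), exact)
```

The right-hand side evaluates `u(t)` at arbitrary stage times during the backward solve, which is why a continuous forward solution is needed.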

There is one extra detail to consider. In many cases we would like to calculate the adjoint sensitivity of some discontinuous functional of the solution. One canonical function is the L2 loss against some data points $\tilde{u}(t_i)$, that is:

$$
L(u,p) = \sum_{i=1}^{n} \Vert \tilde{u}(t_{i}) - u(t_{i},p) \Vert^{2}
$$

In this case, we can reinterpret our summation as the distribution integral:

$$
G(u,p) = \int_{t_{0}}^{T} \sum_{i=1}^{n} \Vert \tilde{u}(t_{i}) - u(t_{i},p) \Vert^{2} \delta(t_{i} - t)\,dt
$$

where $\delta$ is the Dirac distribution. In this case, the integrand is continuous except at finitely many points. Thus the adjoint can be calculated between each $t_i$. At a given $t_i$, given that the $t_i$ are unique, we have that

$$
g_{u}(t_{i}) = -2 \left( \tilde{u}(t_{i}) - u(t_{i},p) \right)^{T}
$$

Thus the adjoint solution $\lambda^{\star}(t)$ is given by integrating the adjoint ODE over the intervals between data points and applying the jump given by $g_u$ at every data point $t_i$.
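The jump treatment can be sketched on the scalar problem $u' = -pu$ with an L2 loss against a few data points. The sketch assumes the convention $\frac{d\lambda^{\star}}{dt} = -\lambda^{\star} f_u - g_u$, under which integrating backward across $t_i$ gives $\lambda^{\star}(t_i^-) = \lambda^{\star}(t_i^+) + g_u(t_i)$ with $g_u(t_i) = -2(\tilde{u}(t_i) - u(t_i,p))$. The data values and the names `rk4_step` and `l2_adjoint_gradient` are made up for illustration.

```python
import math

def rk4_step(rhs, t, y, h):
    """One classical Runge-Kutta step; h may be negative (backward in time)."""
    k1 = rhs(t, y)
    k2 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = rhs(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def l2_adjoint_gradient(p, times, data, u0=1.0, steps=400):
    """dL/dp for L = sum_i (data_i - u(t_i))**2 with u' = -p*u, t0 = 0."""
    u = lambda t: u0 * math.exp(-p * t)  # stand-in for a continuous forward solution

    def rhs(t, y):
        lam, _ = y
        f_u, f_p = -p, -u(t)
        # Between data points g vanishes, so only the f_u term drives lambda
        return [-lam * f_u, lam * f_p]

    lam, backward_quad = 0.0, 0.0        # lambda is zero after the last data point
    points = sorted(zip(times, data), reverse=True)
    for idx, (t_i, d_i) in enumerate(points):
        lam += -2.0 * (d_i - u(t_i))     # jump by g_u(t_i) at the data point
        t_next = points[idx + 1][0] if idx + 1 < len(points) else 0.0
        h = (t_next - t_i) / steps       # negative step: backward in time
        t, y = t_i, [lam, backward_quad]
        for _ in range(steps):
            y = rk4_step(rhs, t, y, h)
            t += h
        lam, backward_quad = y
    # u_p(t0) = 0 here, so only the (sign-flipped) backward quadrature remains
    return -backward_quad

times, data, p = [0.5, 1.0, 2.0], [0.7, 0.3, 0.1], 1.2
u = lambda t: math.exp(-p * t)
exact = sum(2 * t * u(t) * (d - u(t)) for t, d in zip(times, data))
print(l2_adjoint_gradient(p, times, data), exact)
```

Each segment starts from the value of `lam` after the jump, so the adjoint is continuous within every interval and discontinuous only at the data times.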

We note that

$$
\lambda^{\star}(t) f_{u}(t)
$$

is a vector-transpose Jacobian product, also known as a `vjp`, which can be efficiently computed using the pullback of backpropagation on the user function `f` with a forward pass at `u` and a pullback vector $\lambda^{\star}$. For more information, consult the MIT 18.337 lecture notes on reverse mode AD.
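To make the forward-pass-then-pullback pattern concrete, the sketch below hand-writes a pullback for a Lotka-Volterra right-hand side. A reverse-mode AD tool would generate the pullback automatically; here it is derived by hand, and the function name `lotka_volterra_with_pullback` and the parameter layout `(a, b, c, d)` are assumptions for illustration.

```python
def lotka_volterra_with_pullback(u, p):
    """Forward pass of f(u, p), returning f and a pullback closure for f_u."""
    x, y = u
    a, b, c, d = p
    out = [a * x - b * x * y, -c * y + d * x * y]

    def pullback(lam):
        # Row vector lam times the Jacobian f_u, without building f_u
        l0, l1 = lam
        return [l0 * (a - b * y) + l1 * (d * y),
                l0 * (-b * x) + l1 * (-c + d * x)]

    return out, pullback

u, p = [1.0, 2.0], (1.5, 1.0, 3.0, 1.0)
f_val, pullback = lotka_volterra_with_pullback(u, p)
lam = [0.3, -0.7]
print(pullback(lam))
```

The closure captures the state `u` from the forward pass, which mirrors how the adjoint solve needs the forward state at each time where the `vjp` is evaluated.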