Fisher's identity is useful in maximum-likelihood parameter estimation problems. In this post, I give its proof. The main reference is Douc, Moulines and Stoffer, *Nonlinear Time Series: Theory, Methods and Applications*.

Let $(X_k)_{k \geq 1}$ and $(Y_k)_{k \geq 1}$ be two sequences of random variables. We call $Y:=(Y_1,\ldots,Y_n)$ the observed data and $X:=(X_1,\ldots,X_n)$ the hidden variables. Let the joint probability distribution of these variables be parameterized by $\theta \in \Theta$, where $\Theta \subset \bR^{d_\theta}$, and write $p_\theta$ for the corresponding joint, marginal and conditional densities. A useful quantity arising in parameter estimation problems is the gradient of the log-likelihood of the observations with respect to the parameter $\theta$, i.e. the score function; it is denoted $\nabla_\theta \log p_\theta(y_{1:n})$, where $n \in \bN$.

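Fisher's identity expresses the score as the expectation, under the posterior distribution of the hidden variables given the observations, of the complete-data score $\nabla_\theta \log p_\theta(x_{1:n},y_{1:n})$: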
\begin{align*}
\nabla_\theta \log p_\theta(y_{1:n}) = \int \nabla_\theta \log p_\theta(x_{1:n},y_{1:n}) p_\theta(x_{1:n}|y_{1:n}) \mbox{d}x_{1:n}
\end{align*}
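where we assume that all functions are regular enough to interchange differentiation and integration. For instance, by dominated convergence it suffices that $p_\theta(y_{1:n}) > 0$, that $\theta \mapsto p_\theta(x_{1:n},y_{1:n})$ is differentiable for every $x_{1:n}$, and that, on a neighbourhood of the point $\theta$ of interest, $\|\nabla_\theta p_\theta(x_{1:n},y_{1:n})\|$ is bounded by an integrable function of $x_{1:n}$ that does not depend on $\theta$.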
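
As a hedged illustration (this toy model is an assumption for the example, not taken from the reference), let $\theta \in \mathbb{R}$, let $X_1,\ldots,X_n$ be i.i.d. $\mathcal{N}(\theta,1)$ and let $Y_k | X_k \sim \mathcal{N}(X_k,1)$ independently. Then the $Y_k$ are i.i.d. $\mathcal{N}(\theta,2)$, the complete-data score is $\sum_{k=1}^n (x_k-\theta)$, and under the posterior the $X_k$ are independent with $X_k | Y_k = y_k \sim \mathcal{N}((\theta+y_k)/2, 1/2)$. Both sides of the identity can then be computed in closed form and agree:
\begin{align*}
\nabla_\theta \log p_\theta(y_{1:n}) &= \nabla_\theta \sum_{k=1}^n \left( -\frac{1}{2}\log(4\pi) - \frac{(y_k-\theta)^2}{4} \right) = \sum_{k=1}^n \frac{y_k-\theta}{2}, \\
\int \nabla_\theta \log p_\theta(x_{1:n},y_{1:n})\, p_\theta(x_{1:n}|y_{1:n}) \mbox{d}x_{1:n} &= \int \sum_{k=1}^n (x_k-\theta)\, p_\theta(x_{1:n}|y_{1:n}) \mbox{d}x_{1:n} = \sum_{k=1}^n \left( \frac{\theta+y_k}{2} - \theta \right) = \sum_{k=1}^n \frac{y_k-\theta}{2}.
\end{align*}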

*Proof.* We first note that the likelihood of the observations is obtained by marginalizing out the hidden variables,
\begin{align}\label{Marginal}
p_\theta(y_{1:n}) = \int p_\theta(x_{1:n},y_{1:n})\mbox{d}x_{1:n},
\end{align}
and that we will use the derivative of the logarithm of a positive function $f$,
\begin{align*}
\frac{\mbox{d}\log f(x)}{\mbox{d}x} = \frac{f'(x)}{f(x)}.
\end{align*}
Applying this rule to the log-likelihood, we write
\begin{align}
\nabla_\theta \log p_\theta(y_{1:n}) = \frac{\nabla_\theta p_\theta(y_{1:n})}{p_\theta(y_{1:n})}.
\end{align}
Substituting \eqref{Marginal} into the numerator gives
\begin{align*}
\frac{\nabla_\theta p_\theta(y_{1:n})}{p_\theta(y_{1:n})} = \frac{\nabla_\theta \int p_\theta(x_{1:n},y_{1:n}) \mbox{d}x_{1:n}}{p_\theta(y_{1:n})}.
\end{align*}
By our regularity assumption we may interchange differentiation and integration, and since $p_\theta(y_{1:n})$ does not depend on $x_{1:n}$ it can be moved inside the integral, so that
\begin{align}\label{InterM}
\nabla_\theta \log p_\theta(y_{1:n}) = \int \frac{\nabla_\theta p_\theta(x_{1:n},y_{1:n})}{p_\theta(y_{1:n})} \mbox{d}x_{1:n}.
\end{align}
Now the trick is to apply the same logarithm rule, in reverse, to the joint density:
\begin{align}\label{logtrick}
\nabla_\theta p_\theta(x_{1:n},y_{1:n}) = \nabla_\theta \log p_\theta(x_{1:n},y_{1:n})\, p_\theta(x_{1:n},y_{1:n}).
\end{align}
Substituting \eqref{logtrick} into \eqref{InterM}, we obtain
\begin{align}\label{InterM2}
\nabla_\theta \log p_\theta(y_{1:n}) = \int \nabla_\theta \log p_\theta(x_{1:n},y_{1:n}) \frac{p_\theta(x_{1:n},y_{1:n})}{p_\theta(y_{1:n})}\mbox{d}x_{1:n}.
\end{align}
By Bayes' rule, the ratio in the integrand of \eqref{InterM2} is the posterior density of the hidden variables,
\begin{align}\label{Bayes}
p_\theta(x_{1:n}|y_{1:n}) = \frac{p_\theta(x_{1:n},y_{1:n})}{p_\theta(y_{1:n})},
\end{align}
and therefore
\begin{align}
\nabla_\theta \log p_\theta(y_{1:n}) = \int \nabla_\theta \log p_\theta(x_{1:n},y_{1:n})\, p_\theta(x_{1:n}|y_{1:n})\mbox{d}x_{1:n},
\end{align}
which is Fisher's identity. $\blacksquare$
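
The identity is most useful when $p_\theta(y_{1:n})$ has no closed form, and it can be sanity-checked numerically in that case. The sketch below assumes a hypothetical toy model, not taken from the reference, with a single hidden variable ($n=1$): $X \sim \mathcal{N}(\theta,1)$ and $Y | X = x \sim \mathrm{Poisson}(e^{x})$. Integrals over $x$ are approximated with the trapezoid rule on the truncated grid $[-12, 12]$ (an arbitrary choice), and all function names are illustrative. The left-hand side is a central finite difference of $\log p_\theta(y)$; the right-hand side is the posterior mean of the complete-data score, which here is $\nabla_\theta \log p_\theta(x,y) = x - \theta$ because the Poisson term does not depend on $\theta$.

```python
import math

# Hypothetical toy model (an assumption for illustration, not from the reference):
#   X ~ N(theta, 1),   Y | X = x ~ Poisson(exp(x)),   one hidden variable (n = 1).


def log_joint(x, y, theta):
    """log p_theta(x, y) for the toy model."""
    log_prior = -0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2
    log_lik = y * x - math.exp(x) - math.lgamma(y + 1)
    return log_prior + log_lik


def grid(lo=-12.0, hi=12.0, m=4001):
    """Uniform grid on a truncated range [lo, hi] and its step size."""
    step = (hi - lo) / (m - 1)
    return [lo + i * step for i in range(m)], step


def trapezoid(values, step):
    """Trapezoid rule for samples taken on a uniform grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def log_marginal(y, theta):
    """log p_theta(y), integrating the joint density over x numerically."""
    xs, step = grid()
    return math.log(trapezoid([math.exp(log_joint(x, y, theta)) for x in xs], step))


def score_finite_difference(y, theta, h=1e-4):
    """Left-hand side: central finite difference of log p_theta(y) in theta."""
    return (log_marginal(y, theta + h) - log_marginal(y, theta - h)) / (2 * h)


def score_fisher_identity(y, theta):
    """Right-hand side: posterior mean of the complete-data score x - theta."""
    xs, step = grid()
    joint = [math.exp(log_joint(x, y, theta)) for x in xs]
    # Dividing by the integral of the joint, i.e. p_theta(y), turns the joint
    # density into the posterior density (Bayes' rule).
    numerator = trapezoid([(x - theta) * p for x, p in zip(xs, joint)], step)
    return numerator / trapezoid(joint, step)


if __name__ == "__main__":
    for y, theta in [(0, -1.0), (3, 0.5), (7, 2.0)]:
        print(y, theta, score_finite_difference(y, theta), score_fisher_identity(y, theta))
```

Because both functions use the same quadrature weights, the two printed scores should agree up to finite-difference error regardless of how accurate the grid is: for a finite sum, interchanging differentiation and summation is always allowed, which mirrors the interchange used to obtain \eqref{InterM}.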
