almost stochastic: Fisher's Identity

Fisher's identity is useful in maximum-likelihood parameter estimation problems. In this post, I give its proof. The main reference is Douc, Moulines, and Stoffer, Nonlinear Time Series: Theory, Methods and Applications with R Examples.

Let $(X_n)$ and $(Y_n)$ be two sequences of random variables. We call $Y:=(Y_1,\ldots,Y_n)$ the observed data and $X:=(X_1,\ldots,X_n)$ the hidden variables, and write $y_{1:n}$ and $x_{1:n}$ for the values they take. Let the joint probability distribution of these variables be parameterized by $\theta \in \Theta$, where $\Theta \subset \bR^{d_\theta}$. A useful quantity in parameter estimation problems is the gradient of the log-likelihood of the observations (i.e. the score function) with respect to the parameter $\theta$; it is denoted by $\nabla_\theta \log p_\theta(y_{1:n})$, where $n \in \bN$.

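Fisher's identity relates this quantity to another one involving the hidden variables, \begin{align*} \nabla_\theta \log p_\theta(y_{1:n}) = \int \nabla_\theta \log p_\theta(x_{1:n},y_{1:n}) \, p_\theta(x_{1:n}|y_{1:n}) \, \mbox{d}x_{1:n} \end{align*} where we assume all functions are regular enough to interchange integration and differentiation. In words, the score of the observations is the expectation of the gradient of the joint log-density, taken under the conditional density of the hidden variables given the observations.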
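
For concreteness, here is one standard set of sufficient conditions for this regularity assumption. It is a sketch added for illustration and is not taken from the reference; other sets of conditions work as well. Fix $y_{1:n}$ and suppose that (i) $p_\theta(y_{1:n}) > 0$, (ii) $\theta \mapsto p_\theta(x_{1:n},y_{1:n})$ is differentiable for almost every $x_{1:n}$, and (iii) there is a neighbourhood $U$ of $\theta$ and a function $g$ (a hypothetical dominating function, introduced here) such that \begin{align*} \sup_{\theta' \in U} \left\| \nabla_{\theta'} p_{\theta'}(x_{1:n},y_{1:n}) \right\| \leq g(x_{1:n}) \quad \text{and} \quad \int g(x_{1:n}) \, \mbox{d}x_{1:n} < \infty. \end{align*} Under (ii) and (iii), the dominated convergence theorem gives \begin{align*} \nabla_\theta \int p_\theta(x_{1:n},y_{1:n}) \, \mbox{d}x_{1:n} = \int \nabla_\theta p_\theta(x_{1:n},y_{1:n}) \, \mbox{d}x_{1:n}, \end{align*} while (i) ensures that dividing by $p_\theta(y_{1:n})$ is well defined.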

Proof. We first note that \begin{align}\label{Marginal} p_\theta(y_{1:n}) = \int p_\theta(x_{1:n},y_{1:n})\mbox{d}x_{1:n} \end{align} and recall the derivative of the logarithm of a function, \begin{align*} \frac{\mbox{d}\log f(x)}{\mbox{d}x} = \frac{f'(x)}{f(x)}. \end{align*} We start by writing $\nabla_\theta \log p_\theta(y_{1:n})$ as \begin{align} \nabla_\theta \log p_\theta(y_{1:n}) = \frac{\nabla_\theta p_\theta(y_{1:n})}{p_\theta(y_{1:n})} \end{align} by differentiating the logarithm. Substituting Eq. \eqref{Marginal} into the numerator gives \begin{align*} \frac{\nabla_\theta p_\theta(y_{1:n})}{p_\theta(y_{1:n})} = \frac{\nabla_\theta \int p_\theta(x_{1:n},y_{1:n}) \mbox{d}x_{1:n}}{p_\theta(y_{1:n})} \end{align*} Recall that we assumed all functions are regular enough to interchange integration and differentiation. Since $p_\theta(y_{1:n})$ does not depend on $x_{1:n}$, it can be moved inside the integral, and we have \begin{align}\label{InterM} \nabla_\theta \log p_\theta(y_{1:n}) = \int \frac{\nabla_\theta p_\theta(x_{1:n},y_{1:n})}{p_\theta(y_{1:n})} \mbox{d}x_{1:n} \end{align} Now the trick is to apply the same logarithm rule to the joint density, which gives \begin{align}\label{logtrick} \nabla_\theta p_\theta(x_{1:n},y_{1:n}) = \nabla_\theta \log p_\theta(x_{1:n},y_{1:n}) \, p_\theta(x_{1:n},y_{1:n}) \end{align} Substituting \eqref{logtrick} into \eqref{InterM}, we obtain \begin{align}\label{InterM2} \nabla_\theta \log p_\theta(y_{1:n}) = \int \nabla_\theta \log p_\theta(x_{1:n},y_{1:n}) \frac{p_\theta(x_{1:n},y_{1:n})}{p_\theta(y_{1:n})}\mbox{d}x_{1:n} \end{align} By Bayes' rule (the definition of the conditional density), the last factor in the integrand is \begin{align}\label{Bayes} p_\theta(x_{1:n}|y_{1:n}) = \frac{p_\theta(x_{1:n},y_{1:n})}{p_\theta(y_{1:n})} \end{align} so we have \begin{align} \nabla_\theta \log p_\theta(y_{1:n}) = \int \nabla_\theta \log p_\theta(x_{1:n},y_{1:n}) \, p_\theta(x_{1:n}|y_{1:n})\mbox{d}x_{1:n} \end{align} which is known as Fisher's identity. $\blacksquare$
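
As a sanity check, the identity can be verified by hand on a toy model. The model below is an illustrative assumption chosen for this check and does not come from the reference. Take $n=1$ with $X_1 \sim \mathcal{N}(\theta,1)$ and $Y_1 \,|\, X_1 = x_1 \sim \mathcal{N}(x_1,1)$, so that marginally $Y_1 \sim \mathcal{N}(\theta,2)$ and the left-hand side is \begin{align*} \nabla_\theta \log p_\theta(y_1) = \nabla_\theta \left( -\frac{(y_1-\theta)^2}{4} \right) = \frac{y_1-\theta}{2}. \end{align*} For the right-hand side, the joint log-density and its gradient are \begin{align*} \log p_\theta(x_1,y_1) = -\frac{(x_1-\theta)^2}{2} - \frac{(y_1-x_1)^2}{2} - \log(2\pi), \qquad \nabla_\theta \log p_\theta(x_1,y_1) = x_1-\theta. \end{align*} The conditional density of the hidden variable is Gaussian, $p_\theta(x_1|y_1) = \mathcal{N}\left(x_1; \tfrac{\theta+y_1}{2}, \tfrac{1}{2}\right)$, hence \begin{align*} \int (x_1-\theta) \, p_\theta(x_1|y_1) \, \mbox{d}x_1 = \frac{\theta+y_1}{2} - \theta = \frac{y_1-\theta}{2}, \end{align*} which agrees with the left-hand side, as the identity predicts.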

5 comments:

Anonymous said...: thanks for the notes, this helps!; 1 January 2016 at 19:08
Deniz said...: You are welcome!; 3 January 2016 at 12:52
Anonymous said...: thank you very much.; 19 July 2016 at 18:37
Anonymous said...: your post would be better if you were to mention all 'regularity conditions' in order to be able to reverse the sequence of differentiation/integration (used to move from in-between eq. 2 and 3, to 3 above); 30 August 2016 at 19:14
Deniz said...: Thanks for the feedback. I will add them whenever I find some time.; 30 August 2016 at 20:18

2014/06/12
