## Introduction

This simple note reviews the importance sampling. This discussion is adapted from here and here.

## Importance sampling

Suppose we have a random variable $X$ with density $\pi(x)$ and we wish to estimate its moments of the form, \begin{align} \bE_\pi(f(X)) = \int f(x) \pi(x) dx \end{align} Suppose also that, drawing samples from this density is impossible. In importance sampling, our aim is to use some another tractable density, say $q(x)$, to estimate the expectations of a target density $\pi(x)$.

We assume that, we can draw samples from $q(x)$ easily. Then, to estimate the expectations of $\pi$, we use the following trick, \begin{align*} \bE_\pi(f(X)) &= \int f(x) \pi(x) dx \\ &=\int f(x) \frac{\pi(x)}{q(x)} q(x) dx \end{align*} For theoretical issues, we assume some absolute continuity properties. Also, note that, $q(x)$ should capture the support of $\pi(x)$. We put, \begin{align*} W(x) = \frac{\pi(x)}{q(x)} \end{align*} $W(x)$ is called weight function. Therefore, we write, \begin{align*} \bE_\pi(f(X)) &= \int f(x) \frac{\pi(x)}{q(x)} q(x) dx \\ &= \bE_q( f(X) W(X)) \end{align*} Say, we have $N$ samples $\{x^{(i)}\}_{i=1}^N$ such that $x^{(i)} \sim q(x)$. Then, we can estimate the expectation as follows, \begin{align*} \bE_q( f(X) W(X)) \approx \frac{1}{N} \sum_{i=1}^N f(x^{(i)}) W(x^{(i)}) \end{align*}

## Application to Bayesian statistics

In Bayesian statistics, a problem of interest is to estimate the normalizing constant of some unnormalized probability distribution. Importance sampling is useful for this purpose.

Suppose we have, \begin{align} \pi(x) = \frac{1}{Z} \varphi(x) \end{align} then, $Z = \int \varphi(x) dx$. Suppose again, we wish to estimate the expectations of the form $\bE_\pi(f(X))$. Hence, \begin{align*} \bE_\pi(f(X)) &= \frac{1}{Z} \int f(x) \varphi(x) dx \\ &= \frac{1}{Z} \int f(x) \frac{\varphi(x)}{q(x)} q(x) dx \\ &= \frac{1}{Z} \bE_q(f(X) W(X)) \end{align*} We can also apply this trick for $Z$, \begin{align*} Z &= \int \varphi(x) dx \\ & = \int \frac{\varphi(x)}{q(x)} q(x) dx \\ & = \bE_q (W(X)) \end{align*} Then the expectation $\bE_\pi(f(X))$ becomes, \begin{align} \bE_\pi(f(X)) = \frac{\bE_q(f(X) W(X))}{\bE_q (W(X))} \end{align} Since we can draw $N$ samples $\{x^{(i)}\}_{i=1}^N$ such that $x^{(i)} \sim q(x)$, we can also estimate this expectation as, \begin{align} \bE_\pi(f(X)) &\approx \frac{\frac{1}{N} \sum_{i=1}^N f(x^{(i)}) W(x^{(i)})}{\frac{1}{N} \sum_{j=1}^N W(x^{(j)})} \\ & = \frac{\sum_{i=1}^N f(x^{(i)}) W(x^{(i)})}{\sum_{j=1}^N W(x^{(j)})} \nonumber \end{align} The notion of normalised weights is heavily used in the literature. We noted that, $W(x)$ is called as weight function, however in this particular sense, it is called unnormalised weights. To normalise it, we do the following, \begin{align} \bE_\pi(f(X)) &\approx \frac{f(x^{(1)}) W(x^{(1)}) + f(x^{(2)}) W(x^{(2)}) + \cdots + f(x^{(N)}) W(x^{(N)})}{\sum_{j=1}^N W(x^{(j)})} \nonumber \\ &= f(x^{(1)}) \frac{W(x^{(1)})}{\sum_{j=1}^N W(x^{(j)})} + f(x^{(2)}) \frac{W(x^{(2)})}{\sum_{j=1}^N W(x^{(j)})} + \cdots \nonumber \\ &+ f(x^{(N)}) \frac{W(x^{(N)})}{\sum_{j=1}^N W(x^{(j)})} \end{align} We put, \begin{align*} w^{(i)} = \frac{W(x^{(i)})}{\sum_{j=1}^N W(x^{(j)})} \end{align*} then the estimate of the expectation becomes, \begin{align} \bE_\pi(f(X)) &\approx \sum_{i = 1}^N w^{(i)} f(x^{(i)}) \end{align} here $w^{(i)}$ is called as normalised importance weights.