Suppose, given a convex function $f: \bR^d \to \bR$, we would like to find the minimum of $f$ by iterating \begin{align*} \theta_t = \theta_{t-1} - \gamma \nabla f(\theta_{t-1}). \end{align*} How fast do we converge to the minima of $f$?

## 2020/11/11

## 2019/06/14

### The Poisson estimator

Let's say you want to estimate a quantity $\mu$, but you have only access to unbiased estimates of its logarithm, i.e., $\log\mu$. Can you obtain an unbiased estimate of $\mu$?

## 2016/10/26

### A primer on filtering

Say that you have a dynamical process of interest $X_1,\ldots,X_n$ and you can only observe the process with some noise, i.e., you get an observation sequence $Y_1,\ldots,Y_n$. What is the optimal way to estimate $X_n$ conditioned on the whole sequence of observations $Y_{1:n}$?

categories:
dynamical systems,
stochastic filtering

## 2016/09/23

### A simple bound for optimisation using a grid

If I give you a function on $[0,1]$ and a computer and want you to find the minimum, what would you do? Since you have the computer, you can be lazy: Just compute a grid on $[0,1]$, evaluate the grid points and take the minimum. This will give you something close to the true minimum. But how much?

## 2016/01/17

### An $L_2$ bound for Perfect Monte Carlo

Suppose that you sample from a probability measure $\pi$ to estimate the expectation $\pi(f) := \int f(x) \pi(\mbox{d}x)$ and formed an estimate $\pi^N(f)$. How close are you to the true expectation $\pi(f)$?

## 2015/03/08

### Tinkering around logistic map

I was tinkering around logistic map $x_{n+1} = a x_n (1 - x_n)$ today and I wondered what happens if I plot the histogram of the generated sequence $(x_n)_{n\geq 0}$. Can it possess some statistical properties?

## 2015/03/04

### Monte Carlo as Intuition

Suppose we have a continuous random variable $X \sim p(x)$ and we would like to estimate its tail probability, i.e. the probability of the event $\{X \geq t\}$ for some $t \in \mathbb{R}$. What is the most intuitive way to do this?

## 2014/06/12

### Fisher's Identity

Fisher's identity is useful to use in maximum-likelihood parameter estimation problems. In this post, I give its proof. The main reference is Douc, Moulines, Stoffer; Nonlinear time series theory, methods and applications.

categories:
maximum-likelihood,
time series analysis

## 2014/06/04

### Batch MLE for the GARCH(1,1) model

## Introduction

In this post, we derive the batch MLE procedure for the GARCH model in a more principled way than the last GARCH post. The derivation presented here is simple and concise.

categories:
maximum-likelihood,
time series analysis

## 2014/01/30

### Convergence of gradient descent algorithms

## Introduction

In this post, I review the convergence proofs of gradient algorithms. Our main reference is: Leon Bottou, Online learning and stochastic approximations. I rewrite the proofs described in Bottou's paper but with more details about the points which are subtle to me. I tried to write the proofs as clear as possible so as to make them accessible to everyone.