Convergence rate of gradient descent for convex functions

Suppose, given a convex function $f: \mathbb{R}^d \to \mathbb{R}$, we would like to find the minimum of $f$ by iterating \begin{align*} \theta_t = \theta_{t-1} - \gamma \nabla f(\theta_{t-1}). \end{align*} How fast does this iteration converge to a minimum of $f$?
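A minimal numerical sketch follows. It assumes that $f$ is convex and $L$-smooth (its gradient is $L$-Lipschitz) and that the step size is $\gamma = 1/L$. Under these assumptions the standard bound is $f(\theta_t) - f(\theta^*) \le \|\theta_0 - \theta^*\|^2 / (2\gamma t)$, so the suboptimality decays like $O(1/t)$. The quadratic test function, the matrix `A` and the helper `gradient_descent` are illustrative choices, not taken from the post.

```python
import numpy as np

def gradient_descent(grad, theta0, gamma, n_steps):
    """Run theta_t = theta_{t-1} - gamma * grad(theta_{t-1}); return all iterates."""
    thetas = [np.asarray(theta0, dtype=float)]
    for _ in range(n_steps):
        thetas.append(thetas[-1] - gamma * grad(thetas[-1]))
    return thetas

# Illustrative convex quadratic f(theta) = 0.5 * theta^T A theta, minimiser theta* = 0, f* = 0.
A = np.diag([10.0, 1.0, 0.01])
f = lambda th: 0.5 * th @ A @ th
grad_f = lambda th: A @ th

L = np.max(np.linalg.eigvalsh(A))  # smoothness constant: largest eigenvalue of A
gamma = 1.0 / L                    # step size gamma = 1/L
theta0 = np.ones(3)
n_steps = 100
thetas = gradient_descent(grad_f, theta0, gamma, n_steps)

# Suboptimality f(theta_t) - f* against the bound ||theta_0 - theta*||^2 / (2 gamma t).
gaps = [f(th) for th in thetas[1:]]
bounds = [theta0 @ theta0 / (2 * gamma * t) for t in range(1, n_steps + 1)]
```

The badly conditioned direction (eigenvalue 0.01) is what makes progress slow here, which is why the rate for merely convex functions is only $O(1/t)$.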


The Poisson estimator

Let's say you want to estimate a quantity $\mu$, but you only have access to unbiased estimates of its logarithm $\log\mu$. Can you obtain an unbiased estimate of $\mu$ itself?
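A sketch of one common form of the Poisson estimator, under stated assumptions. Draw $N \sim \mathrm{Poisson}(\kappa)$ and independent unbiased estimates $X_1, \ldots, X_N$ of $\lambda = \log\mu$, then return $e^{\kappa + c} \prod_{i=1}^N (X_i - c)/\kappa$. Averaging over $N$ gives $e^{\kappa+c} \sum_n e^{-\kappa} \frac{\kappa^n}{n!} \left(\frac{\lambda - c}{\kappa}\right)^n = e^{\lambda}$, so the estimator is unbiased for $\mu$. The tuning constants `kappa` and `c`, and the Gaussian noise model for the log-estimates, are assumptions for illustration and may not match the post.

```python
import numpy as np

def poisson_estimates(n_draws, log_mu_sampler, kappa, c, rng):
    """Independent draws of exp(kappa + c) * prod_{i<=N} (X_i - c) / kappa with N ~ Poisson(kappa).

    log_mu_sampler(rng, n) must return n independent unbiased estimates of log(mu).
    """
    counts = rng.poisson(kappa, size=n_draws)
    out = np.empty(n_draws)
    for k, n in enumerate(counts):
        xs = log_mu_sampler(rng, n)  # empty array when n == 0, whose product is 1
        out[k] = np.exp(kappa + c) * np.prod((xs - c) / kappa)
    return out

# Assumed setting: log(mu) = 0.5, observed through Gaussian noise with standard deviation 0.1.
rng = np.random.default_rng(0)
log_mu = 0.5
sampler = lambda rng, n: rng.normal(log_mu, 0.1, size=n)
estimates = poisson_estimates(100_000, sampler, kappa=1.0, c=0.0, rng=rng)
mc_mean = estimates.mean()  # should be close to mu = exp(0.5)
```

The shift `c` does not affect unbiasedness; in practice it is chosen near $\log\mu$ to control the variance and the sign of the individual estimates.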


A primer on filtering

Say that you have a dynamical process of interest $X_1,\ldots,X_n$ that you can only observe through noise, i.e., you get an observation sequence $Y_1,\ldots,Y_n$. What is the optimal way to estimate $X_n$ given the whole sequence of observations $Y_{1:n}$?
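In the special linear-Gaussian case, the conditional mean $\mathbb{E}[X_n \mid Y_{1:n}]$ is computed exactly by the Kalman filter, which alternates a predict step and an update step. The scalar sketch below assumes the model $X_t = a X_{t-1} + \mathcal{N}(0, q)$ with $X_0 = 0$ and $Y_t = X_t + \mathcal{N}(0, r)$; the parameter values and function names are illustrative, and the post may treat more general models.

```python
import numpy as np

def simulate(n, a, q, r, rng):
    """Simulate X_t = a X_{t-1} + N(0, q) with X_0 = 0, and Y_t = X_t + N(0, r)."""
    x, y = np.zeros(n), np.zeros(n)
    state = 0.0
    for t in range(n):
        state = a * state + rng.normal(0.0, np.sqrt(q))
        x[t] = state
        y[t] = state + rng.normal(0.0, np.sqrt(r))
    return x, y

def kalman_filter(y, a, q, r, m0=0.0, p0=0.0):
    """Return filtering means E[X_t | Y_{1:t}] and variances, starting from X_0 ~ N(m0, p0)."""
    means, variances = np.zeros(len(y)), np.zeros(len(y))
    m, p = m0, p0
    for t, obs in enumerate(y):
        m_pred, p_pred = a * m, a * a * p + q  # predict: moments of p(x_t | y_{1:t-1})
        gain = p_pred / (p_pred + r)           # Kalman gain
        m = m_pred + gain * (obs - m_pred)     # update: moments of p(x_t | y_{1:t})
        p = (1.0 - gain) * p_pred
        means[t], variances[t] = m, p
    return means, variances

rng = np.random.default_rng(0)
a, q, r = 0.9, 1.0, 1.0
x, y = simulate(5000, a, q, r, rng)
means, variances = kalman_filter(y, a, q, r)
mse_filter = np.mean((means - x) ** 2)  # error of the filtering mean
mse_obs = np.mean((y - x) ** 2)         # error of using the raw observation
```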


A simple bound for optimisation using a grid

Suppose I give you a function on $[0,1]$ and a computer, and ask you to find its minimum. What would you do? Since you have the computer, you can be lazy: construct a grid on $[0,1]$, evaluate the function at the grid points and take the smallest value. This gives you something close to the true minimum. But how close?
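One simple answer, assuming $f$ is $L$-Lipschitz on $[0,1]$: with $n$ equally spaced grid points the spacing is $h = 1/(n-1)$, so the true minimiser $x^*$ lies within $h/2$ of some grid point $g$, and hence $\min_{\text{grid}} f \le f(g) \le f(x^*) + L h/2$. The sketch below checks this on an illustrative function; the Lipschitz assumption and the test function are not necessarily those of the post.

```python
import numpy as np

def grid_minimise(f, n_points):
    """Evaluate f on n_points equally spaced points of [0, 1]; return the best point and value."""
    grid = np.linspace(0.0, 1.0, n_points)
    values = f(grid)
    i = np.argmin(values)
    return grid[i], values[i]

f = lambda x: (x - 1.0 / 3.0) ** 2  # illustrative: minimum f* = 0 at x* = 1/3
lip = 4.0 / 3.0                     # |f'(x)| = 2|x - 1/3| <= 4/3 on [0, 1]
n_points = 11
x_hat, f_hat = grid_minimise(f, n_points)
h = 1.0 / (n_points - 1)            # grid spacing
bound = lip * h / 2.0               # guaranteed gap f_hat - f* <= L h / 2
```

The bound halves every time the number of grid intervals doubles, which is the price of making no assumption beyond Lipschitz continuity.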


An $L_2$ bound for Perfect Monte Carlo

Suppose that you sample from a probability measure $\pi$ to estimate the expectation $\pi(f) := \int f(x) \pi(\mbox{d}x)$ and form an estimate $\pi^N(f)$ from $N$ samples. How close are you to the true expectation $\pi(f)$?
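For i.i.d. samples and a bounded test function, the classical bound is $\mathbb{E}\left[(\pi^N(f) - \pi(f))^2\right]^{1/2} = \sqrt{\mathrm{Var}_\pi(f)/N} \le \|f\|_\infty / \sqrt{N}$. The sketch below checks it empirically with $\pi = \mathcal{N}(0,1)$ and $f = \cos$, for which $\pi(f) = e^{-1/2}$ and $\|f\|_\infty = 1$. These choices are assumptions for illustration; the post's constants may be stated differently.

```python
import numpy as np

def mc_estimate(f, sampler, n, rng):
    """Perfect Monte Carlo: average f over n i.i.d. samples from pi."""
    return np.mean(f(sampler(rng, n)))

rng = np.random.default_rng(0)
f = np.cos
sampler = lambda rng, n: rng.standard_normal(n)  # pi = N(0, 1)
true_value = np.exp(-0.5)                        # E[cos X] for X ~ N(0, 1)

n, reps = 100, 2000
estimates = np.array([mc_estimate(f, sampler, n, rng) for _ in range(reps)])
rmse = np.sqrt(np.mean((estimates - true_value) ** 2))  # empirical L2 error
bound = 1.0 / np.sqrt(n)                                # sup |cos| / sqrt(N)
```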


Tinkering around with the logistic map

I was tinkering around with the logistic map $x_{n+1} = a x_n (1 - x_n)$ today and wondered what happens if I plot the histogram of the generated sequence $(x_n)_{n\geq 0}$. Can a deterministic sequence like this possess statistical properties?
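A small experiment, assuming $a = 4$: for this value the map is known to have the invariant arcsine density $1/(\pi\sqrt{x(1-x)})$, whose CDF is $F(x) = \frac{2}{\pi}\arcsin\sqrt{x}$. The sketch bins a long orbit into a 10-bin histogram and compares the mass of the first bin with $F(0.1) \approx 0.2048$. The starting point and orbit length are arbitrary choices, and floating-point orbits only approximate the true dynamics.

```python
import math

def logistic_orbit(a, x0, n):
    """Return x_1, ..., x_n for the logistic map x_{k+1} = a x_k (1 - x_k)."""
    xs = []
    x = x0
    for _ in range(n):
        x = a * x * (1.0 - x)
        xs.append(x)
    return xs

xs = logistic_orbit(4.0, 0.2, 100_000)

# 10-bin histogram on [0, 1]; x == 1.0 is put in the last bin.
counts = [0] * 10
for x in xs:
    counts[min(int(x * 10), 9)] += 1

frac_first_bin = counts[0] / len(xs)
arcsine_mass = (2.0 / math.pi) * math.asin(math.sqrt(0.1))  # F(0.1) under the arcsine law
```

The histogram is U-shaped: the orbit spends far more time near 0 and 1 than near the middle, as the arcsine density predicts.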


Monte Carlo as Intuition

Suppose we have a continuous random variable $X \sim p(x)$ and we would like to estimate its tail probability, i.e., the probability of the event $\{X \geq t\}$ for some $t \in \mathbb{R}$. What is the most intuitive way to do this?
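The most direct Monte Carlo answer is to draw samples from $p$ and count the fraction that land in $\{X \ge t\}$, since $\mathbb{P}(X \ge t) = \mathbb{E}[\mathbf{1}\{X \ge t\}]$. The sketch assumes $X \sim \mathcal{N}(0,1)$ and $t = 1$ so that the exact answer $\frac{1}{2}\operatorname{erfc}(t/\sqrt{2})$ is available for comparison; these choices are illustrative.

```python
import math
import numpy as np

def tail_probability_mc(sampler, t, n, rng):
    """Estimate P(X >= t) by the fraction of n samples that are >= t."""
    samples = sampler(rng, n)
    return np.mean(samples >= t)

rng = np.random.default_rng(1)
t = 1.0
estimate = tail_probability_mc(lambda rng, n: rng.standard_normal(n), t, 100_000, rng)
exact = 0.5 * math.erfc(t / math.sqrt(2.0))  # P(X >= t) for X ~ N(0, 1)
```

For large $t$ the event becomes rare and this naive estimator needs many samples, which is where more refined schemes such as importance sampling come in.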


Fisher's Identity

Fisher's identity is useful in maximum-likelihood parameter estimation problems. In this post, I give its proof. The main reference is Douc, Moulines and Stoffer, Nonlinear Time Series: Theory, Methods and Applications.
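Fisher's identity states that, under regularity conditions allowing differentiation under the integral, $\nabla_\theta \log p_\theta(y) = \mathbb{E}_{p_\theta(x \mid y)}\left[\nabla_\theta \log p_\theta(x, y)\right]$. The sketch checks it numerically on a toy model chosen only for illustration (it is not from the post or the book): $X \sim \mathcal{N}(\theta, 1)$ and $Y \mid X \sim \mathcal{N}(X, 1)$. Then $Y \sim \mathcal{N}(\theta, 2)$, the complete-data score is $x - \theta$, and $X \mid Y = y \sim \mathcal{N}((\theta + y)/2, 1/2)$.

```python
import numpy as np

theta, y = 0.7, 2.0
rng = np.random.default_rng(2)

# Left-hand side: score of the marginal likelihood, with Y ~ N(theta, 2).
def log_marginal(th):
    return -0.5 * np.log(4.0 * np.pi) - (y - th) ** 2 / 4.0

lhs = (y - theta) / 2.0
fd = (log_marginal(theta + 1e-5) - log_marginal(theta - 1e-5)) / 2e-5  # finite-difference check

# Right-hand side: posterior mean of the complete-data score d/dtheta log p(x, y) = x - theta,
# estimated with samples from X | Y = y ~ N((theta + y) / 2, 1/2).
x_post = rng.normal((theta + y) / 2.0, np.sqrt(0.5), size=200_000)
rhs = np.mean(x_post - theta)
```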


Batch MLE for the GARCH(1,1) model


In this post, we derive the batch MLE procedure for the GARCH model in a more principled way than in the previous GARCH post. The derivation presented here is simple and concise.
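A sketch of the negative log-likelihood that batch MLE minimises, assuming Gaussian innovations, $x_t = \sigma_t \varepsilon_t$ with $\varepsilon_t \sim \mathcal{N}(0,1)$ and $\sigma_t^2 = \omega + \alpha x_{t-1}^2 + \beta \sigma_{t-1}^2$. Initialising $\sigma_1^2$ with the sample variance is an assumption, as are the parameter values and function names; the post's notation and initialisation may differ.

```python
import numpy as np

def simulate_garch(n, omega, alpha, beta, rng):
    """Simulate a GARCH(1,1) path started at the unconditional variance omega / (1 - alpha - beta)."""
    x = np.zeros(n)
    sigma2 = omega / (1.0 - alpha - beta)
    for t in range(n):
        x[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = omega + alpha * x[t] ** 2 + beta * sigma2  # variance for the next step
    return x

def garch_nll(params, x):
    """Gaussian negative log-likelihood 0.5 * sum(log(2 pi sigma_t^2) + x_t^2 / sigma_t^2)."""
    omega, alpha, beta = params
    sigma2 = np.empty(len(x))
    sigma2[0] = np.var(x)  # assumption: initialise the recursion with the sample variance
    for t in range(1, len(x)):
        sigma2[t] = omega + alpha * x[t - 1] ** 2 + beta * sigma2[t - 1]
    return 0.5 * np.sum(np.log(2.0 * np.pi * sigma2) + x ** 2 / sigma2)

rng = np.random.default_rng(0)
x = simulate_garch(5000, 0.1, 0.1, 0.8, rng)
nll_true = garch_nll((0.1, 0.1, 0.8), x)   # at the data-generating parameters
nll_wrong = garch_nll((0.1, 0.3, 0.5), x)  # at a mis-specified parameter
```

In practice `garch_nll` would be handed to a numerical optimiser such as `scipy.optimize.minimize`, with constraints $\omega > 0$, $\alpha, \beta \ge 0$ and $\alpha + \beta < 1$.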


Convergence of gradient descent algorithms


In this post, I review the convergence proofs of gradient algorithms. Our main reference is Léon Bottou, Online learning and stochastic approximations. I rewrite the proofs from Bottou's paper, adding more detail on the points I found subtle. I tried to write the proofs as clearly as possible so as to make them accessible to everyone.
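A small illustration of the stochastic setting, assuming the classical step-size conditions $\sum_t \gamma_t = \infty$ and $\sum_t \gamma_t^2 < \infty$ of the kind used in such proofs. The sketch runs stochastic gradient descent on the illustrative objective $\mathbb{E}[\frac{1}{2}(\theta - Z)^2]$, whose minimiser is $\mathbb{E}[Z]$, with $\gamma_t = 1/t$. With this particular step size the iterate coincides with the running sample mean, which makes convergence easy to verify.

```python
import numpy as np

def sgd_mean(samples, theta0=0.0):
    """SGD on E[0.5 * (theta - Z)^2] with step size gamma_t = 1/t."""
    theta = theta0
    for t, z in enumerate(samples, start=1):
        gamma = 1.0 / t              # sum of gamma_t diverges, sum of gamma_t^2 converges
        theta -= gamma * (theta - z)  # stochastic gradient of 0.5 * (theta - z)^2
    return theta

rng = np.random.default_rng(3)
samples = rng.normal(2.0, 1.0, size=100_000)  # illustrative: Z ~ N(2, 1), minimiser 2
theta_hat = sgd_mean(samples)
```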