<h1>almost stochastic</h1>
annals of computational statistics, by Deniz

<h2>Convergence rate of gradient descent for convex functions (2020-11-11)</h2>
Suppose, given a
<a href="https://en.wikipedia.org/wiki/Convex_function">convex function</a>
$f: \mathbb{R}^d \to \mathbb{R}$, we would like to find its minimum by iterating
\begin{align*} \theta_t = \theta_{t-1} - \gamma \nabla f(\theta_{t-1}).
\end{align*} How fast does this iteration converge to a minimum of $f$?
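As a quick pointer before the full derivation, and under a smoothness assumption that goes beyond bare convexity: if $f$ is convex and $L$-smooth and the step size satisfies $0 < \gamma \leq 1/L$, the standard result is an $O(1/t)$ rate,
\begin{align*}
f(\theta_t) - f(\theta^\star) \leq \frac{\|\theta_0 - \theta^\star\|^2}{2 \gamma t},
\end{align*}
where $\theta^\star$ is any minimizer of $f$.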
<a href="https://www.almoststochastic.com/2020/11/convergence-rate-of-gradient-descent.html">Read more</a>

<h2>The Poisson estimator (2019-06-14)</h2>
Let's say you want to estimate a quantity $\mu$, but you only have access to unbiased estimates of its logarithm, $\log\mu$. Can you obtain an unbiased estimate of $\mu$ itself?
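Yes, via the Poisson estimator. Here is a minimal NumPy sketch of the construction; the values of $\mu$, the Poisson intensity `lam`, and the noise level are arbitrary illustrative choices, and the helper `noisy_log_mu` stands in for whatever unbiased estimator of $\log\mu$ you actually have.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, lam = 2.0, 3.0            # true quantity and Poisson intensity (illustrative)
log_mu = np.log(mu)

def noisy_log_mu():
    """Unbiased estimate of log(mu): here, log(mu) plus zero-mean noise."""
    return log_mu + rng.normal(0.0, 0.5)

def poisson_estimator():
    """Unbiased estimate of mu from unbiased estimates of log(mu)."""
    n = rng.poisson(lam)
    # E[prod] = E[(E[R]/lam)^N] = exp(log_mu - lam), so the e^lam prefactor cancels it.
    return np.exp(lam) * np.prod([noisy_log_mu() / lam for _ in range(n)])

est = np.mean([poisson_estimator() for _ in range(200_000)])
print(est)   # close to mu = 2.0
```

The estimator is unbiased, but its variance depends heavily on the choice of `lam` and can be very large.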
<a href="https://www.almoststochastic.com/2019/06/the-poisson-estimator.html">Read more</a>

<h2>A primer on filtering (2016-10-26)</h2>
Say that you have a dynamical process of interest $X_1,\ldots,X_n$ and you can only observe the process with some noise, i.e., you get an observation sequence $Y_1,\ldots,Y_n$. What is the optimal way to estimate $X_n$ conditioned on the whole sequence of observations $Y_{1:n}$?
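In standard notation (mine, not necessarily the post's): writing $f(x_n \mid x_{n-1})$ for the transition density and $g(y_n \mid x_n)$ for the observation density, the answer is the filtering distribution $p(x_n \mid y_{1:n})$, computed by the prediction-update recursion
\begin{align*}
p(x_n \mid y_{1:n-1}) &= \int f(x_n \mid x_{n-1}) \, p(x_{n-1} \mid y_{1:n-1}) \, \mbox{d}x_{n-1}, \\
p(x_n \mid y_{1:n}) &\propto g(y_n \mid x_n) \, p(x_n \mid y_{1:n-1}).
\end{align*}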
<a href="https://www.almoststochastic.com/2016/10/a-primer-on-filtering.html">Read more</a>

<h2>A simple bound for optimisation using a grid (2016-09-23)</h2>
If I give you a function on $[0,1]$ and a computer and ask you to find its minimum, what would you do? Since you have the computer, you can be lazy: just compute a grid on $[0,1]$, evaluate the function at the grid points, and take the minimum. This will give you something close to the true minimum. But how close?
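To fix ideas, assume $f$ is $L$-Lipschitz (some such regularity is needed for any bound) and take the uniform grid $x_i = i/N$ for $i = 0, \ldots, N$. Every point of $[0,1]$ is within $1/(2N)$ of a grid point, so
\begin{align*}
\min_{0 \leq i \leq N} f(x_i) - \min_{x \in [0,1]} f(x) \leq \frac{L}{2N}.
\end{align*}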
<a href="https://www.almoststochastic.com/2016/09/a-simple-bound-for-optimisation-using.html">Read more</a>

<h2>An $L_2$ bound for Perfect Monte Carlo (2016-01-17)</h2>
Suppose that you sample from a probability measure $\pi$ to estimate the expectation $\pi(f) := \int f(x) \pi(\mbox{d}x)$ and form an estimate $\pi^N(f)$. How close are you to the true expectation $\pi(f)$?
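For the perfect (i.i.d.) Monte Carlo estimate $\pi^N(f) = \frac{1}{N}\sum_{i=1}^N f(X_i)$ with $X_i \sim \pi$, the answer has the familiar $1/\sqrt{N}$ form: for bounded $f$,
\begin{align*}
\mathbb{E}\left[\left(\pi^N(f) - \pi(f)\right)^2\right] = \frac{\mathrm{Var}_\pi(f)}{N} \leq \frac{\|f\|_\infty^2}{N}.
\end{align*}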
<a href="https://www.almoststochastic.com/2016/01/an-l2-bound-for-perfect-monte-carlo.html">Read more</a>

<h2>Tinkering around logistic map (2015-03-08)</h2>
I was tinkering with the logistic map $x_{n+1} = a x_n (1 - x_n)$ today and wondered what happens if I plot the histogram of the generated sequence $(x_n)_{n\geq 0}$. Does it exhibit some statistical regularity?
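A quick way to reproduce the experiment is below; $a = 4$ (the classical chaotic regime) and the seed are illustrative choices. For $a = 4$ the histogram famously matches the arcsine density $1/(\pi\sqrt{x(1-x)})$.

```python
import numpy as np
import matplotlib.pyplot as plt

a, n = 4.0, 200_000
x = np.empty(n)
x[0] = 0.3                        # arbitrary seed in (0, 1)
for i in range(n - 1):
    x[i + 1] = a * x[i] * (1.0 - x[i])

grid = np.linspace(0.001, 0.999, 400)
plt.hist(x, bins=100, density=True, alpha=0.5)
plt.plot(grid, 1.0 / (np.pi * np.sqrt(grid * (1.0 - grid))))   # arcsine density
plt.show()
```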
<a href="https://www.almoststochastic.com/2015/03/tinkering-around-logistic-map.html">Read more</a>

<h2>Monte Carlo as Intuition (2015-03-04)</h2>
Suppose we have a continuous random variable $X \sim p(x)$ and we would like to estimate its tail probability, i.e., the probability of the event $\{X \geq t\}$ for some $t \in \mathbb{R}$. What is the most intuitive way to do this?
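The intuitive answer is to sample and count: estimate $P(X \geq t)$ by the fraction of samples exceeding $t$. A minimal sketch, with $X \sim \mathcal{N}(0,1)$ and $t = 1.5$ as illustrative choices:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
t, n = 1.5, 1_000_000
x = rng.standard_normal(n)     # illustrative choice: X ~ N(0, 1)
print(np.mean(x >= t))         # Monte Carlo estimate of P(X >= t)
print(norm.sf(t))              # exact tail probability, for comparison
```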
<a href="https://www.almoststochastic.com/2015/03/monte-carlo-as-intuition.html">Read more</a>

<h2>Fisher's Identity (2014-06-12)</h2>
Fisher's identity is useful in maximum-likelihood parameter estimation problems. In this post, I give its proof. The main reference is Douc, Moulines, and Stoffer, <i>Nonlinear Time Series: Theory, Methods, and Applications</i>.
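For a latent-variable model with observed data $y$, latent variable $x$, and complete-data likelihood $p_\theta(x, y)$, the identity states that the score of the marginal likelihood is the conditional expectation of the complete-data score:
\begin{align*}
\nabla_\theta \log p_\theta(y) = \mathbb{E}_\theta\left[\nabla_\theta \log p_\theta(x, y) \mid y\right].
\end{align*}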
<a href="https://www.almoststochastic.com/2014/06/fishers-identity.html">Read more</a>

<h2>Batch MLE for the GARCH(1,1) model (2014-06-04)</h2>
<h3>Introduction</h3>
In this post, we derive the batch MLE procedure for the GARCH model in a more principled way than <a href="http://www.almoststochastic.com/2013/07/static-parameter-estimation-for-garch.html">the last GARCH post</a>. The derivation presented here is simple and concise.
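As a companion to the derivation, here is a numerical-optimisation sketch (not the post's own code): simulate a GARCH(1,1) series and maximise the Gaussian log-likelihood with scipy. The parameter values and the initialisation of the variance recursion are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

def simulate(alpha0, alpha1, beta, n):
    """GARCH(1,1): x_t = sigma_t eps_t, sigma_t^2 = a0 + a1 x_{t-1}^2 + b sigma_{t-1}^2."""
    x, s2 = np.zeros(n), np.zeros(n)
    s2[0] = alpha0 / (1 - alpha1 - beta)      # stationary variance as a starting value
    x[0] = np.sqrt(s2[0]) * rng.standard_normal()
    for t in range(1, n):
        s2[t] = alpha0 + alpha1 * x[t - 1]**2 + beta * s2[t - 1]
        x[t] = np.sqrt(s2[t]) * rng.standard_normal()
    return x

def neg_loglik(params, x):
    """Gaussian negative log-likelihood, up to an additive constant."""
    alpha0, alpha1, beta = params
    s2 = np.empty_like(x)
    s2[0] = np.var(x)                         # crude initialisation of the recursion
    for t in range(1, len(x)):
        s2[t] = alpha0 + alpha1 * x[t - 1]**2 + beta * s2[t - 1]
    return 0.5 * np.sum(np.log(s2) + x**2 / s2)

x = simulate(0.1, 0.2, 0.7, 5000)
res = minimize(neg_loglik, x0=[0.05, 0.1, 0.8], args=(x,),
               bounds=[(1e-6, None), (1e-6, 1.0), (1e-6, 1.0)])
print(res.x)   # roughly recovers (0.1, 0.2, 0.7)
```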
<a href="https://www.almoststochastic.com/2014/06/batch-mle-for-garch11-model.html">Read more</a>

<h2>Fatou's lemma and monotone convergence theorem (2013-11-18)</h2>
In this post, we deduce Fatou's lemma and the monotone convergence theorem (MCT) from each other.
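For the record, the two statements: for nonnegative measurable $f_n$, Fatou's lemma says
\begin{align*}
\int \liminf_{n \to \infty} f_n \, \mbox{d}\mu \leq \liminf_{n \to \infty} \int f_n \, \mbox{d}\mu,
\end{align*}
while the MCT says that if $0 \leq f_n \uparrow f$ pointwise, then $\int f_n \, \mbox{d}\mu \uparrow \int f \, \mbox{d}\mu$.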
<a href="https://www.almoststochastic.com/2013/11/fatous-lemma-and-monotone-convergence.html">Read more</a>

<h2>Young's, Hölder's and Minkowski's Inequalities (2013-11-14)</h2>
In this post, we prove Young's, Hölder's, and Minkowski's inequalities in full detail. We prove Hölder's inequality using Young's inequality, and then Minkowski's inequality using Hölder's.
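In the $L_p$ setting with $p, q > 1$ and $1/p + 1/q = 1$, the three inequalities read
\begin{align*}
ab &\leq \frac{a^p}{p} + \frac{b^q}{q} \quad (a, b \geq 0), \\
\|fg\|_1 &\leq \|f\|_p \, \|g\|_q, \\
\|f + g\|_p &\leq \|f\|_p + \|g\|_p \quad (p \geq 1).
\end{align*}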
<a href="https://www.almoststochastic.com/2013/11/youngs-holders-and-minkowskis.html">Read more</a>

<h2>Sequential importance sampling-resampling (2013-08-20)</h2>
<h3>Introduction</h3>
In this post, I review sequential importance sampling-resampling for state-space models. These algorithms are also known as particle filters. I derive these filters and discuss their application to general state-space models.
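As a taster, here is a minimal bootstrap particle filter on a toy linear-Gaussian model of my own choosing (not an example from the post): propagate particles through the transition, weight them by the likelihood, and resample.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 100, 1000
a, sx, sy = 0.9, 1.0, 1.0   # toy model: X_t = a X_{t-1} + sx*N(0,1), Y_t = X_t + sy*N(0,1)

# Simulate a latent path and noisy observations
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = a * x_true[t - 1] + sx * rng.standard_normal()
y = x_true + sy * rng.standard_normal(T)

# Bootstrap particle filter
particles = rng.standard_normal(N)
means = np.zeros(T)
for t in range(T):
    particles = a * particles + sx * rng.standard_normal(N)   # propagate
    logw = -0.5 * ((y[t] - particles) / sy) ** 2              # Gaussian log-likelihood, up to constants
    w = np.exp(logw - logw.max())
    w /= w.sum()
    means[t] = np.sum(w * particles)                          # estimate of E[X_t | Y_{1:t}]
    particles = rng.choice(particles, size=N, p=w)            # multinomial resampling
print(np.mean((means - x_true) ** 2))                         # the filter tracks the latent path
```

On a linear-Gaussian model the Kalman filter gives the exact answer, which makes this setup a convenient sanity check.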
<a href="https://www.almoststochastic.com/2013/08/sequential-importance-sampling.html">Read more</a>

<h2>Importance sampling (2013-07-30)</h2>
<h3>Introduction</h3>
This short note reviews importance sampling. The discussion is adapted from <a href="http://dl.dropboxusercontent.com/u/9787379/cmpe58n/cmpe58n-lecture-notes.pdf">here</a> and <a href="http://dl.dropboxusercontent.com/u/9787379/cmpe58n/mc-lecture02.pdf">here</a>.
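The one-line idea, in a NumPy sketch with an illustrative target $\pi = \mathcal{N}(0,1)$, proposal $q = \mathcal{N}(1, 4)$, and test function $f(x) = x^2$: weight samples from $q$ by $\pi/q$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 100_000
x = rng.normal(1.0, 2.0, n)                          # samples from the proposal q
w = norm.pdf(x) / norm.pdf(x, loc=1.0, scale=2.0)    # importance weights pi/q
print(np.mean(w * x**2))              # unnormalised IS estimate of E_pi[X^2] = 1
print(np.sum(w * x**2) / np.sum(w))   # self-normalised variant
```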
<a href="https://www.almoststochastic.com/2013/07/importance-sampling.html">Read more</a>

<h2>Static Parameter Estimation for the GARCH model (2013-07-22)</h2>
<h3>Introduction</h3>
In this post, we review online maximum-likelihood parameter estimation for the GARCH model, a dynamic variance model. GARCH can be seen as a toy volatility model and is used as a textbook example in financial time series modelling.
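In one common parameterisation, the GARCH(1,1) model referred to here is
\begin{align*}
x_t = \sigma_t \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, 1), \qquad
\sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + \beta_1 \sigma_{t-1}^2,
\end{align*}
with $\alpha_0 > 0$ and $\alpha_1, \beta_1 \geq 0$: the variance is a deterministic recursion driven by past observations.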
<a href="https://www.almoststochastic.com/2013/07/static-parameter-estimation-for-garch.html">Read more</a>

<h2>Nonnegative Matrix Factorization (2013-06-22)</h2>
<h3>Introduction</h3>
In this post, I derive the nonnegative matrix factorization (NMF) algorithm as proposed by <a href="https://www.nature.com/articles/44565">Lee and Seung (1999)</a>. I derive the multiplicative updates from a gradient-descent point of view, following the treatment in their later NIPS paper, <a href="https://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf">Algorithms for Non-negative Matrix Factorization</a>. The code for this blog post can be accessed from <a href="https://github.com/odakyildiz/Nonnegative-MF">here</a>.
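The multiplicative updates for the Frobenius objective $\|V - WH\|_F^2$ fit in a few lines of NumPy; the sketch below is mine (the repository linked above has the post's own implementation), with random data and a small `eps` guard as illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
V = rng.random((50, 40))             # nonnegative data matrix (illustrative)
k = 5
W = rng.random((50, k))
H = rng.random((k, 40))

eps = 1e-12                          # guard against division by zero
for _ in range(500):
    # Lee-Seung multiplicative updates for ||V - WH||_F^2
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

print(np.linalg.norm(V - W @ H))     # reconstruction error after the updates
```

Each update provably does not increase the objective, which is the main result of the NIPS paper.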
<a href="https://www.almoststochastic.com/2013/06/nonnegative-matrix-factorization.html">Read more</a>

<h2>The EM Algorithm (2013-05-25)</h2>
<h3>Introduction</h3>
In this post, we review the Expectation-Maximization (EM) algorithm and its use for maximum-likelihood problems.
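With $y$ observed and $x$ latent, one EM iteration from the current iterate $\theta^{(k)}$ is
\begin{align*}
Q(\theta, \theta^{(k)}) &= \mathbb{E}_{\theta^{(k)}}\left[\log p_\theta(x, y) \mid y\right] \quad \text{(E-step)}, \\
\theta^{(k+1)} &= \arg\max_{\theta} \, Q(\theta, \theta^{(k)}) \quad \text{(M-step)},
\end{align*}
and each iteration does not decrease the observed-data likelihood $p_\theta(y)$.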
<a href="https://www.almoststochastic.com/2013/05/the-em-algorithm.html">Read more</a>

<h2>Stochastic gradient descent (2013-05-23)</h2>
In this post, I introduce a widely used stochastic optimization technique, namely stochastic gradient descent. I also implement the algorithm for the linear-regression problem and provide the Matlab code.
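A Python version of the same experiment (the post provides Matlab): SGD on the least-squares loss, one randomly chosen data point per step, with an illustrative constant step size.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 10_000, 3
X = rng.standard_normal((n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.standard_normal(n)

theta = np.zeros(d)
gamma = 0.01                                  # constant step size (illustrative)
for t in range(n):
    i = rng.integers(n)                       # pick a random data point
    grad = (X[i] @ theta - y[i]) * X[i]       # gradient of (1/2)(x_i' theta - y_i)^2
    theta -= gamma * grad
print(theta)    # close to theta_true
```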
<a href="https://www.almoststochastic.com/2013/05/stochastic-gradient-descent.html">Read more</a>

<h2>Gaussianity, Least squares, Pseudoinverse (2013-05-20)</h2>
<h3>Introduction</h3>
In this post, we show the relationship between the Gaussian observation model, least squares, and the pseudoinverse. We start with a Gaussian observation model and move to least-squares estimation. We then show that the least-squares solution corresponds to the pseudoinverse operation.
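The punchline is easy to check numerically; in the sketch below (my illustrative data), the least-squares solution and the pseudoinverse solution coincide.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(100)

theta_ls   = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares solution
theta_pinv = np.linalg.pinv(X) @ y                  # pseudoinverse solution
print(np.allclose(theta_ls, theta_pinv))            # True: the two coincide
```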
<a href="https://www.almoststochastic.com/2013/05/gaussianity-least-squares-pseudoinverse.html">Read more</a>

<h2>The use of Ito-Doeblin formula to solve SDEs (2013-05-03)</h2>
<h3>Introduction</h3>
These notes are mostly based on the book Stochastic Calculus for Finance, vol. II, Chapter 4. I state a few propositions and work through exercises from Shreve, making use of the Ito-Doeblin formula. The Ito-Doeblin formula is used here almost purely practically, to solve continuous-time stochastic models. My treatment differs slightly from Shreve's in that I emphasize the differential forms of the formulas.
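To give the flavour: in differential form, for a twice-differentiable $f$ and an Ito process $X_t$,
\begin{align*}
\mbox{d}f(t, X_t) = f_t \, \mbox{d}t + f_x \, \mbox{d}X_t + \tfrac{1}{2} f_{xx} \, \mbox{d}[X, X]_t.
\end{align*}
Applied with $f(x) = \log x$ to geometric Brownian motion $\mbox{d}S_t = \mu S_t \, \mbox{d}t + \sigma S_t \, \mbox{d}W_t$, this gives $\mbox{d}\log S_t = (\mu - \sigma^2/2) \, \mbox{d}t + \sigma \, \mbox{d}W_t$, hence $S_t = S_0 \exp\left((\mu - \sigma^2/2) t + \sigma W_t\right)$.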
<a href="https://www.almoststochastic.com/2013/05/a-note-on-ito-doeblin-formula.html">Read more</a>