almost stochastic: A simple bound for optimisation using a grid

If I give you a function on $[0,1]$ and a computer and want you to find the minimum, what would you do? Since you have the computer, you can be lazy: Just compute a grid on $[0,1]$, evaluate the grid points and take the minimum. This will give you something close to the true minimum. But how much?

Mathematically you take $p+1$ points and partition $[0,1]$ to $p$ intervals. Let's call points as $x_0,\ldots,x_{p}$. You evaluate $f(x_0),\ldots,f(x_{p})$ using your computer and take the minimum, let's denote it with $f(\bar{x})$ where $\bar{x}$ is the argument which gives the minimum value on the grid. We would like to say something about $f(\bar{x}) - f(x^*)$ where $x^*$ is the true minimum.

In order to say something about how the computer solution relates to the true minimum, we have to say something about the function. If you think about it intuitively, this assumption about the function is necessary: What if your function is so crazy in ways that it wiggles wildly? Then, partitioning method will be more and more unreliable as these wild changes can occur in very small regions, where partitioning should be finer and finer in this case, in order to catch the variations. So the assumption we need is naturally an assumption about the function's "niceness": Lipschitz continuity. If your function is Lipschitz continuous, it can not change "too fast" from one coordinate to another, that is, at least the change of it is bounded by the distance you want to move. Here you have it, \begin{align} |f(x) - f(y)| \leq L |x - y| \end{align} As you can see, it means that moving from $x$ to $y$ on the function is in some way bounded by $|x-y|$. Now it depends on $L$ to determine how crazy your function can be. See the illustration.

Let's prove a very simple theorem. As I've said, we denote the solution provided by our simple algorithm with $\bar{x}$, the algorithm being simulating a grid then reporting back the minimum.

Theorem. Let $f$ be a Lipschitz continuous function. Then, \begin{align*} f(\bar{x}) - f(x^*) \leq \frac{L}{2p} \end{align*}

The theorem says two intuitive things: First, the bound depends on $L$, the crazier your function, the more you can end up in a distant place. And again intuitively, it is clear that, the bound is linearly dependent to grid size. Let's prove it.

Proof. We have $x_0,\ldots,x_{p}$ with $x_k \in [0,1]$ for each $k \in \{0,\ldots,p\}$. Surely, we can pick $i_1$ and $i_2$ such that $x \equiv x_{i_1} \leq x^* \leq x_{i_2} \equiv y$. So the minimum belongs to an interval: $x^* \in [x,y]$. And since we formed a regular grid, we have $|x-y| \leq \frac{1}{p}$. Let's denote the midpoint of this interval as $\hat{x} = \frac{x+y}{2}$. Now, let's form a new point: It will be the point to which $x^*$ is closer, i.e. either $x$ or $y$. Mathematically, \begin{align*} \tilde{x} = \left\{ \begin{array}{ll} x & \mbox{if } x^* \geq \hat{x} \\ y & \mbox{otherwise} \end{array} \right. \end{align*} So we have $|\tilde{x} - x^*| \leq \frac{1}{2p}$. Now, notice that $\tilde{x}$ belongs to the grid, so for $f(\bar{x})$, the reported minimum, it should be the case that $f(\bar{x}) \leq f(\tilde{x})$. From here it follows that, \begin{align*} f(\bar{x}) - f(x^*) \leq f(\tilde{x}) - f(x^*) \leq L |\tilde{x} - x^*| \leq \frac{L}{2p}. \end{align*} $\blacksquare$

Source: The superb book, Introductory Lectures on Convex Optimization.

almost stochastic

2016/09/23

A simple bound for optimisation using a grid

1 comments:

Post a Comment

search

mathematics village

archive

band pass filter

links