Comments feed for the blog "almost stochastic" by Deniz (last updated 2018-04-25).

Deniz (2018-02-12): Hi. I did not understand the first part of the comment. But I'd love to see other analytically tractable non-Gaussian models, if there are any.

Unknown (2018-02-11): I think you could show the sequential nature of Bayes' theorem and, from that, show the Kalman filter as a special case of it. There are other examples of sequential inference in reliability analysis with some uncommon probabilistic models.

Patricia J. Blank (2017-11-11): Thank you!

Deniz (2016-08-30): Thanks for the feedback. I will add them whenever I find some time.

Anonymous (2016-08-30): Your post would be better if you mentioned all the 'regularity conditions' needed to reverse the order of differentiation and integration (used to move from between eq.
2 and 3, to 3 above).

Unknown (2016-07-19): Thank you very much.

Deniz (2016-04-21): Ah, thanks! Both of them are corrected.

Anonymous (2016-04-21): Thanks for this helpful post. It's worth mentioning that $C < \infty$. Also, there's a repeated line in the equation array starting after "Let us compute the expectation:".

Jie Tian (2016-02-25): Hi there, sorry I missed your comment; I am reading it. Thanks!

Deniz (2016-02-25): Hi,
Did you look at the paper by Léon Bottou, "Online Learning and Stochastic Approximations"?

Jie Tian (2016-02-25): Hi there again. Some subscripts are not very clear to me, for example the $i$ and $t$ in the auxiliary sequences $\mu_n$. Also, about the second condition on $f$: where did you find this condition, and why is it not $\theta^*$ in $\nabla_{\theta} f(\theta)$? Thank you!

Deniz (2016-01-03): You are welcome!

Anonymous (2016-01-01): Thanks for the notes, this helps!

Deniz (2015-10-22): Jie, for the second comment, yes, you are right. In fact, the typo is in the definition of the gradient descent; it should read
\begin{align*} \theta_{n+1} = \theta_{n} - \gamma_n \nabla_{\theta} f(\theta, \mathbf{x}) |_{\theta = \theta_{n}}. \end{align*}
Thanks for catching it! Sorry for the late reply.
In any case, you can check Bottou's paper: http://leon.bottou.org/publications/pdf/online-1998.pdf

I think your proof will be valid for a maximisation process with gradient ascent as long as the negative of the cost function you use satisfies the conditions introduced in the paper (general convexity).

(The correction to the algorithm also resolves another inconsistency in the notation that I thought was a typo in Bottou's paper, but apparently it wasn't!)

Jie Tian (2015-10-19): With rearrangements we have $h_{n+1} - (1 - \gamma_n B) h_n$; I think it should be $h_{n+1} - (1 + \gamma_n^2 B) h_n$, right?

Jie Tian (2015-10-19): So did you use a fixed step size when you proved the convergence of discrete-time gradient descent? Now I know how to derive (10), but that step size should be $\gamma_{n+1}$, not $\gamma_n$. So how do you get $\gamma_{n+1}$ in (10)? And another question: if I now want to apply this proof to a maximization problem, with the update equation $\theta_{n} = \theta_{n-1} + \gamma_n \nabla_\theta f(\theta_{n-1})$, can I use this method, and how should it be adapted? Thank you!

Jovitian (2015-10-19): How do you get equation (10)? I tried several ways to rebuild $h_{n+1} - h_{n}$ but cannot get something like (10).
If we expand $h_{n+1}$ as $(\theta_{n+1} - \theta^*)^2$ and $h_n$ as $(\theta_n - \theta^*)^2$, then replace $\theta_n$ with $\theta_{n-1} - \gamma_n \nabla_\theta f(\theta_{n-1})$ and $\theta_{n+1}$ with $\theta_n - \gamma_{n+1} \nabla_\theta f(\theta_n)$; but then how do we calculate $\gamma_{n+1}$?

Deniz (2015-08-12): Yes, you are right: "$\gamma_n$ diverges" should be written as "$\sum_{n=1}^\infty \gamma_n$ is divergent". Thanks for the correction!

As for your question, for simplicity let's write $k_n = (\theta_n - \theta^*) \nabla_\theta f(\theta_n)$, and let $\sum_n$ denote the infinite sum over $n$. The statement then becomes: if $\sum_n \gamma_n = \infty$ (14) and $\sum_n \gamma_n k_n < \infty$ (13), then $k_n \to 0$ (15).

I'll try to make this rigorous with a proof by contradiction. Assume (13) and (14) hold but $k_n$ does not converge to zero; suppose in particular that there exists an $\epsilon > 0$ such that $k_n > \epsilon$ for all $n$. Then, since (14) holds,
\begin{align*}
\sum_n k_n \gamma_n > \epsilon \sum_n \gamma_n = \infty.
\end{align*}
So, assuming $k_n$ does not converge to zero, we arrive at a contradiction with (13), one of our assumptions. Hence $k_n \to 0$ must hold.

Unknown (2015-08-12): Hi, this is a nice post. Could you elaborate on the convergence (15) implied by (13) and (14) in a mathematically rigorous way? By "$\gamma_n$ diverges" you actually mean that the series $\sum_n \gamma_n$ diverges, right? This is because the sequence $\gamma_n$ itself converges to zero. I still haven't found a rigorous proof that (13) and (14) imply the convergence (15).
But my strategy would be to first prove that the left-hand side of (15) is bounded, and then prove that the upper limit of the left-hand side of (15) is less than or equal to zero (it is easy to see that the lower limit is zero).

Deniz (2013-10-23): You are welcome :)

Deniz (2013-10-18): Thanks! :)

Nevzat Yildiz (2013-10-18): Your blog is awesome, dude! Good luck!

Anonymous (2013-10-16): This has been quite a convenient and easy-to-understand approach. Thanks for the explanation!

Deniz (2013-05-26): Thanks, I added this as a note in the post.

bitkidoku (2013-05-26): Not only does it make sense to choose $p(\mathbf{y}|\mathbf{x},\theta)$ as $q(\mathbf{y})$; it is the best you can do.
If you find the $q(\mathbf{y})$ that maximizes $\mathbb{E}_{q(\mathbf{y})} \left[ \log \frac{p(\mathbf{x}, \mathbf{y} | \theta)}{q(\mathbf{y})} \right]$ by taking the derivative and setting it to zero, you end up with $p(\mathbf{y}|\mathbf{x},\theta)$.
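To spell out the claim in the last comment, here is a sketch of the variational argument (my notation, not from the post; the functional derivative is taken with a Lagrange multiplier $\lambda$ enforcing normalization of $q$):

```latex
% Maximize the lower bound F(q) subject to \int q(y) dy = 1.
\begin{align*}
\mathcal{F}(q) &= \int q(\mathbf{y}) \log \frac{p(\mathbf{x}, \mathbf{y} \mid \theta)}{q(\mathbf{y})} \,\mathrm{d}\mathbf{y}
  + \lambda \left( \int q(\mathbf{y}) \,\mathrm{d}\mathbf{y} - 1 \right), \\
\frac{\delta \mathcal{F}}{\delta q(\mathbf{y})} &= \log p(\mathbf{x}, \mathbf{y} \mid \theta) - \log q(\mathbf{y}) - 1 + \lambda = 0 \\
\Rightarrow \quad q(\mathbf{y}) &\propto p(\mathbf{x}, \mathbf{y} \mid \theta)
  \quad \Rightarrow \quad q(\mathbf{y}) = \frac{p(\mathbf{x}, \mathbf{y} \mid \theta)}{p(\mathbf{x} \mid \theta)} = p(\mathbf{y} \mid \mathbf{x}, \theta).
\end{align*}
```

At this choice of $q$ the bound is tight and equals $\log p(\mathbf{x} \mid \theta)$, which is why the posterior is "the best you can do".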
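Several comments above concern the convergence of gradient descent under the Robbins-Monro step-size conditions $\sum_n \gamma_n = \infty$ and $\sum_n \gamma_n^2 < \infty$. The following is a minimal numerical sketch of that setting, not code from the post: the quadratic cost, the noise level, and the schedule $\gamma_n = 1/n$ are illustrative assumptions.

```python
import random

def sgd(theta0, grad, n_steps):
    """Gradient descent with Robbins-Monro step sizes gamma_n = 1/n:
    sum(gamma_n) diverges while sum(gamma_n**2) converges."""
    theta = theta0
    for n in range(1, n_steps + 1):
        gamma = 1.0 / n
        theta = theta - gamma * grad(theta)
    return theta

# Noisy gradient of f(theta) = 0.5 * (theta - 3)^2, so the minimum is theta* = 3.
random.seed(0)
noisy_grad = lambda theta: (theta - 3.0) + random.gauss(0.0, 0.1)

theta_hat = sgd(theta0=0.0, grad=noisy_grad, n_steps=100000)
print(abs(theta_hat - 3.0) < 0.05)  # h_n = (theta_n - theta*)^2 shrinks toward 0
```

With this schedule the sum of the steps diverges (so the iterate can travel as far as it needs to), while the sum of squared steps converges (so the gradient noise is eventually averaged out), which is exactly the trade-off between conditions (13)-(15) discussed in the thread.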