I don’t remember where I first heard this argument, but it was an interesting one. I couldn’t recall the full reasoning, so as I began to think about it I worked some of it out myself, and I believe it is pretty sound. We have all heard that neural networks are difficult to optimize because they are non-convex (or non-concave, for that matter). What we don’t talk about so much is why they are non-convex.
Inspired by the optimization course I am taking this semester, I am going to talk a bit about strong convexity and strong smoothness, and how our very popular gradient descent behaves on functions with these properties. Before going into the details, let’s have a quick chat about convexity in general; there are a few ways of going about it.
I will talk about two definitions of convex functions, the first being more general than the second.
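For reference, here is a sketch of the two standard definitions this likely refers to (an assumption on my part, since they are the usual pair in optimization courses):

```latex
% Definition 1 (more general: no differentiability required).
% f is convex if for all x, y in its domain and all \lambda \in [0, 1]:
f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)

% Definition 2 (requires f to be differentiable).
% f is convex if it lies above all of its tangent planes:
f(y) \ge f(x) + \nabla f(x)^\top (y - x)
```

The first definition says the function never rises above the chord between any two points, while the second says a differentiable convex function never dips below any of its tangents, which is why the first is the more general of the two.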