Regarding question 3 section C, shouldn't we require that $w^* \in K$?

I might be missing something, but suppose that $w^* \notin K$. Then we can choose $K$ so that the distance between $K$ and $w^*$ is at least some $\varepsilon > 0$ (for instance, all vectors with norm at most $\|w^*\| - \varepsilon$).

Let $\hat{w} = \arg\min_{k \in K} f(k)$, that is, the vector in $K$ that minimizes $f$. Since every other vector in $K$ has an $f$ value at least as large as that of $\hat{w}$, so does the average $\bar{W}$ (the average is also in $K$, assuming $K$ is convex), and therefore so does its expectation.

This means that $E[f(\bar{W})] \geq f(\hat{w})$, because every value is taken from inside $K$.

If we assume $f$ is continuous, we get that there is some $\delta > 0$ such that $f(\hat{w}) = f(w^*) + \delta$, since $w^*$ is the absolute minimum. In other words, the value of $f$ at any point of $K$ is never closer than $\delta$ to $f(w^*)$.
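As a concrete instance (my own choice of $f$ and $K$, not taken from the exercise), take $f(w) = \|w - w^*\|^2$ and $K = \{w : \|w\| \leq \|w^*\| - \varepsilon\}$ with $0 < \varepsilon < \|w^*\|$. Then the minimizer over $K$ and the gap $\delta$ can be written down explicitly:

$$\hat{w} = \left(1 - \frac{\varepsilon}{\|w^*\|}\right) w^*, \qquad f(\hat{w}) = \|\hat{w} - w^*\|^2 = \varepsilon^2, \qquad f(w^*) = 0, \qquad \delta = f(\hat{w}) - f(w^*) = \varepsilon^2 > 0.$$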

Putting everything together, we get that $E[f(\bar{W})] \geq f(w^*) + \delta$ no matter the value of $T$.
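To sanity-check this numerically, here is a minimal sketch of projected SGD. Everything specific in it is my own assumption rather than part of the exercise: the objective $f(w) = \|w - w^*\|^2$, the ball-shaped $K$, the step size `eta`, the Gaussian gradient noise, and the helper names `project_onto_ball` and `projected_sgd_average`. For this particular $f$ and $K$ the gap is $\delta = \varepsilon^2$.

```python
import numpy as np

def project_onto_ball(w, radius):
    """Euclidean projection onto K = {w : ||w|| <= radius}."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def f(w, w_star):
    """Assumed objective: squared distance to the unconstrained minimizer."""
    return float(np.sum((w - w_star) ** 2))

def projected_sgd_average(w_star, radius, T, eta=0.1, noise=0.1, seed=0):
    """Run T steps of projected SGD on f and return the average iterate."""
    rng = np.random.default_rng(seed)
    w = np.zeros_like(w_star)          # start at the origin, which lies in K
    total = np.zeros_like(w_star)
    for _ in range(T):
        total += w
        # Noisy gradient of f at w (the noise model is an assumption).
        grad = 2.0 * (w - w_star) + noise * rng.standard_normal(w.shape)
        w = project_onto_ball(w - eta * grad, radius)
    return total / T

w_star = np.array([3.0, 4.0])                # ||w_star|| = 5
eps = 1.0
radius = np.linalg.norm(w_star) - eps        # K does not contain w_star
delta = eps ** 2                             # f(w_hat) - f(w_star) for this f

gaps = {}
for T in (10, 100, 1000, 10000):
    w_bar = projected_sgd_average(w_star, radius, T)
    gaps[T] = f(w_bar, w_star) - f(w_star, w_star)
    print(T, gaps[T])
```

In this setup the printed gap should shrink toward `delta` as `T` grows but never drop below it, which is exactly the behaviour the argument above predicts.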

This clearly contradicts Theorem 1.1, which states that the gap between those two values shrinks to zero as $T$ goes to infinity.

What am I missing here?

Thanks.