Part of the SGD algorithm is to pick an index uniformly at random and use the data point at that index.
For a uniform distribution on [a, b], F(x) = (x-a)/(b-a), which seems to mean that the higher x is (here, the index), the greater its chance of being picked.
Isn't that a problem, since it creates a high chance that we will train on only part of the data, and possibly only a small part, drawn from the higher indices?
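
A minimal sketch that may help check this concern empirically. It assumes the index is drawn with Python's standard `random` module; the helper names `uniform_cdf` and `sample_index_counts` are made up for illustration and are not part of the assignment. F(x) is the cumulative probability P(X <= x), so it grows with x, while the chance of drawing any single index should come out roughly the same for every index.

```python
import random
from collections import Counter


def uniform_cdf(x, a, b):
    """CDF of the continuous uniform distribution on [a, b]: P(X <= x)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def sample_index_counts(n, draws, seed=0):
    """Draw `draws` indices uniformly from 0..n-1 and count each index."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randrange(n) for _ in range(draws))
    return [counts[i] for i in range(n)]


# The CDF increases with x because it accumulates probability up to x ...
print([uniform_cdf(x, 0, 10) for x in (2, 5, 8)])

# ... but the per-index counts stay close to draws / n for every index.
counts = sample_index_counts(n=10, draws=100_000)
print(counts)
```

If the counts for high indices were clearly larger than for low ones, that would support the worry above; under uniform sampling they should all be close to 10,000.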
Since we are discussing q3, I have another question: I tried running task (a) with etas in the range (1^-10, 1^10) and in the range (1, 10), and in both cases the accuracy did not seem to follow any pattern as a function of eta (it looked like I got a random accuracy for each eta). Should I be worried?
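
For reference, a small sketch of how such an eta grid could be built. It assumes the intended range was powers of ten from 10^-10 to 10^10, which is a guess and not something stated in the task; `log_spaced_etas` is a hypothetical helper name. Note that, read literally, 1^-10 and 1^10 both equal 1, so a grid built from those endpoints would contain a single value.

```python
def log_spaced_etas(lo_exp, hi_exp):
    """Return learning rates 10^lo_exp, 10^(lo_exp+1), ..., 10^hi_exp."""
    return [10.0 ** k for k in range(lo_exp, hi_exp + 1)]


# Assumed range: exponents -10 through 10 (21 values, one per power of ten).
etas = log_spaced_etas(-10, 10)
print(len(etas), etas[0], etas[-1])

# Read literally, both endpoints of (1^-10, 1^10) collapse to 1.
print(1 ** -10, 1 ** 10)
```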