Hi,

1. Regarding programming Q3: in class we saw a theorem about the convergence of SGD when it outputs the average of the weight vectors (across iterations). In practice, we found that outputting the average weight vector gives lower accuracy on the validation set than outputting the weight vector from the last iteration. We can think of several reasons why this happens, but in practice, which one should we output in this case?
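To make the two options concrete, here is a minimal sketch that assumes a linear classifier trained by SGD on the L2-regularized hinge loss with step size eta_t = eta0 / sqrt(t). The loss, the step-size schedule, and the names `sgd_hinge` and `accuracy` are hypothetical and not taken from the assignment. The function returns both the last iterate and the uniform average of the iterates, so both can be scored on the same validation set.

```python
import random


def sgd_hinge(data, lam=0.01, eta0=0.1, epochs=5, seed=0):
    """SGD on lam/2 * ||w||^2 + hinge loss (hypothetical setup).

    data: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    Returns (w_last, w_avg): the weights after the final update and the
    uniform average of the weights after each update.
    """
    rng = random.Random(seed)
    d = len(data[0][0])
    w = [0.0] * d
    w_sum = [0.0] * d
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            t += 1
            eta = eta0 / (t ** 0.5)  # decaying step size eta_t = eta0 / sqrt(t)
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            for j in range(d):
                # Subgradient: regularizer term plus hinge term when margin < 1
                g = lam * w[j] - (y * x[j] if margin < 1 else 0.0)
                w[j] -= eta * g
            for j in range(d):
                w_sum[j] += w[j]  # accumulate iterates for the average
    w_avg = [s / t for s in w_sum]
    return w, w_avg


def accuracy(w, data):
    """Fraction of examples with y * <w, x> > 0."""
    correct = sum(1 for x, y in data if y * sum(wj * xj for wj, xj in zip(w, x)) > 0)
    return correct / len(data)


# Toy usage: the first feature is a constant 1.0 acting as a bias term.
train = [([1.0, 2.0], 1), ([1.0, -2.0], -1)]
w_last, w_avg = sgd_hinge(train)
print(accuracy(w_last, train), accuracy(w_avg, train))
```

Early iterates, which are far from the optimum, enter the average with the same weight as late ones. This is one plausible reason the averaged vector can score lower after a modest number of iterations.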

2. In Theory Q4c, you said to notice that alpha 1 should be lower than C, but we can't see how this affects the answer (we took the derivative of the Lagrangian with respect to alpha 1 and set it to zero). Can you clarify?

Guy Oren