Hi!
I didn't understand something in section 2a of this question, about the decision tree:
What does it mean that "now the best classifier is v+x0"?
For example, say our classifier is "if x < 0 then y = -1, else y = 1", we have the points x1 = -1 and x2 = 1, and our tree has only one decision node. Then x1 gets y1 = -1 and x2 gets y2 = 1.
Now let's choose x0 = 2, so the shifted points are x1 + x0 = 1 and x2 + x0 = 3.
With the same tree, T(x1) and T(x2) now get the same label, which means a different classification.
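To make my example concrete, here is a minimal Python sketch of what I mean. The function name `stump` and the threshold fixed at 0 are my own assumptions for illustration; they are not taken from the question itself.

```python
def stump(x, threshold=0.0):
    """Single-node tree: label -1 below the threshold, +1 otherwise."""
    return -1 if x < threshold else 1

points = [-1, 1]   # x1, x2 from my example
x0 = 2             # the shift I chose

# Shift every point by x0, but keep the tree (threshold 0) unchanged.
shifted = [x + x0 for x in points]

original_labels = [stump(x) for x in points]
shifted_labels = [stump(x) for x in shifted]

print(original_labels)  # [-1, 1]
print(shifted_labels)   # [1, 1]
```

So with the threshold left at 0, both shifted points land on the same side of the split.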
Where am I wrong?