Hi,

on programming question 1c, should we use all 10,000 training images? It's taking a long time…

For 1c, use n = 1000.

**Answer:**

I think they meant to use n = 1000.

Anyway, to improve the run-time of 1c, one may keep in memory the distances for all (test-sample, training-sample) pairs.

As a result, in 1c, don't use the function you wrote in 1a; instead, write another one that can reuse those precomputed distances.

If you can do it without code duplication, I would love to hear!
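A minimal sketch of the caching idea, assuming the images are flattened NumPy arrays (the function names here are hypothetical, not part of the assignment):

```python
import numpy as np

def pairwise_distances(test, train):
    """Compute all (test, train) L2 distances once, as an (n_test, n_train) matrix."""
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, vectorized over all pairs
    sq_test = np.sum(test ** 2, axis=1)[:, None]
    sq_train = np.sum(train ** 2, axis=1)[None, :]
    cross = test @ train.T
    return np.sqrt(np.maximum(sq_test - 2 * cross + sq_train, 0.0))

def knn_predict(dist_row, train_labels, k):
    """Predict a label from one precomputed row of the distance matrix."""
    nearest = np.argsort(dist_row)[:k]       # indices of the k closest training samples
    votes = train_labels[nearest]
    return np.bincount(votes).argmax()       # majority vote
```

The matrix is computed once; every later call to `knn_predict` (for any k, or any n that reuses the same samples) just slices it, avoiding repeated distance computations.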

**Closely related question:**

In 1d we fix k and run for all n = 100, 200, …, 5000.

If we normalize the test set and the training set separately for every n, then this caching optimization no longer applies.

What should we do ?

Would it be reasonable to normalize all training and test samples according to the first 100 training samples? Is that acceptable?

No one said anything about normalization. Perhaps you are confusing this exercise with a similar one from last year.

You are free to *additionally* use normalization if you like, as long as you report the results without it as well, and you explain exactly what you did.

Shouldn't the normalization just be dividing by 255, since we know the values range from 0 to 255?

I think the normalization should make all the dimensions have similar distributions,

so that the L2 norm is meaningful.

For an extreme case, imagine the first coordinate were in the range 0-1000, while the second coordinate is in the range 0-1.

Then the second coordinate is negligible when calculating the L2 distance.
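A toy illustration of that extreme case (the numbers and the per-feature scaling here are made up for illustration, not taken from the exercise):

```python
import numpy as np

# Two features on wildly different scales: the second is drowned out by the first.
a = np.array([500.0, 0.0])
b = np.array([510.0, 1.0])
raw_dist = np.linalg.norm(a - b)  # ~10.05: almost entirely the first coordinate

# Rescale each coordinate by its (assumed) range: 0-1000 for the first, 0-1 for the second.
scale = np.array([1000.0, 1.0])
scaled_dist = np.linalg.norm((a - b) / scale)
# Now each coordinate contributes in proportion to its difference *relative to its range*,
# so the second coordinate (a full-range difference) is no longer ignored.
```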

Anyway, regarding specifically dividing the whole vector by 255:

It scales all the distances by the same constant factor (255), so the nearest-neighbor ordering, and hence the algorithm, stays exactly the SAME.
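This can be checked directly (a toy sketch with random data; the point is that dividing every vector by the same constant scales every distance by that constant, so the neighbor ranking never changes):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(20, 5)).astype(float)  # 20 fake "images", 5 pixels each
query = rng.integers(0, 256, size=5).astype(float)

d_raw = np.linalg.norm(train - query, axis=1)
d_scaled = np.linalg.norm(train / 255.0 - query / 255.0, axis=1)

# Every distance shrinks by exactly the factor 255 ...
assert np.allclose(d_scaled, d_raw / 255.0)
# ... so the ranking of neighbors is identical.
assert np.array_equal(np.argsort(d_raw), np.argsort(d_scaled))
```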

BTW, in this question, normalization makes quite a big difference in accuracy (more than 0.2).

Of course I wasn't talking about the general case. And you are right about the 255 normalization.

But as for normalizing based on what you see in the samples: let's say, for example, we have 4 pictures of 2 pixels each (greyscale 0-255, as in the question):

[0,0], [1, 0], [0, 100], [0, 255]

You are saying that the distance between [0,0] and [1,0] should be given more weight than the distance between [0,0] and [0,100], because [0,255] exists.

While this might be true if the indices meant different things, with different units, in this case [0,0] and [1,0] are obviously more similar (you probably couldn't even distinguish between them on a screen).
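To make that concrete (assuming per-pixel max normalization, which is just one hypothetical way the data-driven scaling could be done):

```python
import numpy as np

# The four 2-pixel pictures from the example above.
X = np.array([[0, 0], [1, 0], [0, 100], [0, 255]], dtype=float)

# Normalize each pixel by the maximum value observed for it in the sample set.
scaled = X / X.max(axis=0)  # first pixel's max is 1, second pixel's max is 255

d_close = np.linalg.norm(scaled[0] - scaled[1])  # [0,0] vs [1,0]  -> 1.0
d_far = np.linalg.norm(scaled[0] - scaled[2])    # [0,0] vs [0,100] -> 100/255 ≈ 0.39

# The data-driven scaling ranks the visually indistinguishable pair as *farther* apart.
```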