It said $b \sim N(0_q, r^2 I_q) \Rightarrow Zb \sim N(0_n, r^2 Z Z^t)$, but why did we multiply by $Z Z^t$, and not by $Z$?

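Why would it be $Z$? This follows from the properties of the covariance matrix: for a fixed matrix $Z$, $\mathrm{Cov}(Zb) = Z\,\mathrm{Cov}(b)\,Z^t = r^2 Z Z^t$. Note also that $Zb$ is an $n$-vector, so its covariance must be $n \times n$, whereas $Z$ itself is $n \times q$.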
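If it helps, the identity $\mathrm{Cov}(Zb) = r^2 Z Z^t$ can be checked numerically. The following is only a rough simulation sketch: the matrix `Z`, the value of `r`, and the number of draws are made-up illustration values, not taken from the exam.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical small example: n = 3 observations, q = 2 random effects.
Z = [[1.0, 0.0],
     [1.0, 1.0],
     [0.0, 2.0]]
r = 1.5
n, q = len(Z), len(Z[0])

# Theoretical covariance r^2 * Z Z^t (an n x n matrix).
theory = [[r ** 2 * sum(Z[i][k] * Z[j][k] for k in range(q))
           for j in range(n)] for i in range(n)]

# Draw b ~ N(0, r^2 I_q) many times and average the outer products of Zb.
# Zb has mean zero, so this average estimates Cov(Zb).
num_draws = 50_000
empirical = [[0.0] * n for _ in range(n)]
for _ in range(num_draws):
    b = [random.gauss(0.0, r) for _ in range(q)]  # second argument is the std dev
    zb = [sum(Z[i][k] * b[k] for k in range(q)) for i in range(n)]
    for i in range(n):
        for j in range(n):
            empirical[i][j] += zb[i] * zb[j] / num_draws

# Largest entrywise gap between the simulated and theoretical covariance.
max_gap = max(abs(empirical[i][j] - theory[i][j])
              for i in range(n) for j in range(n))
print(max_gap)
```

The gap should shrink as `num_draws` grows, which is what we would expect if $r^2 Z Z^t$ is the right covariance.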

In that question (3.2), why do we need the distribution of $Zb$ in order to find the distribution of $y$?

Since $Z$ is given, the only variables whose distributions affect $y$ should be $b$ and $e$.

Therefore, I was thinking that the distribution would be $P(b) \cdot P(e)$, assuming they are independent.

Why (and where) am I wrong?

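$b$ affects $y$, but only through $Zb$. We're asking: what is the probability of getting a certain $y$? The product $P(b) \cdot P(e)$ is the joint density of the pair $(b, e)$, not the density of $y$. To get the latter, we have to derive the distribution of $y$, and for this we go through $Zb$.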
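To make the step through $Zb$ concrete, here is a sketch of the derivation, assuming the usual linear mixed model form $y = X\beta + Zb + e$ with $e \sim N(0_n, \sigma^2 I_n)$ independent of $b$. The fixed-effects term $X\beta$ and the noise variance $\sigma^2$ are assumptions for illustration; the exam's exact model is not quoted in this thread.

$$
\begin{aligned}
Zb &\sim N(0_n,\; r^2 Z Z^t), \\
Zb + e &\sim N(0_n,\; r^2 Z Z^t + \sigma^2 I_n) \quad \text{(sum of independent Gaussians)}, \\
y = X\beta + Zb + e &\sim N(X\beta,\; r^2 Z Z^t + \sigma^2 I_n).
\end{aligned}
$$

Under these assumptions, the randomness of $b$ enters $y$ only through the covariance term $r^2 Z Z^t$.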

And in the same exam:

Q2, part 2, in general: in section a, they are equal, because the transformation only influences $\mu$ and we can set $\mu \rightarrow T(\mu)$. But isn't that actually *changing* the value of the expectation?

In part b, it says they are different, because we would also need to change the value of the initial variance. But can't we also set $\sigma \rightarrow T(\sigma)$?

No. Work out the posterior probabilities and you will see that they have different weights.

I really don't see where the difference is. Could you please be more specific?

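The posterior probability that $\mathbf{x}_i$ belongs to cluster $j$ is proportional to $p_j f_j(\mathbf{x}_i)$, where $f_j(\mathbf{x}_i)$ is the Gaussian pdf with mean $\boldsymbol{\mu}_j$ and variance 1, which (with $d$ the dimension of $\mathbf{x}_i$) is:

$$f_j(\mathbf{x}_i) = (2\pi)^{-d/2} \exp\left(-\tfrac{1}{2}\|\mathbf{x}_i - \boldsymbol{\mu}_j\|^2\right) \qquad (1)$$

so the posterior is

$$P(j \mid \mathbf{x}_i) = \frac{p_j f_j(\mathbf{x}_i)}{\sum_k p_k f_k(\mathbf{x}_i)} \qquad (2)$$

If we transform each point by scaling it with $a$ (so that the means become $a\boldsymbol{\mu}_j$) but keep the variance at 1, the pdf we get is

$$f_j(a\mathbf{x}_i) = (2\pi)^{-d/2} \exp\left(-\tfrac{1}{2}\|a\mathbf{x}_i - a\boldsymbol{\mu}_j\|^2\right) = (2\pi)^{-d/2} \exp\left(-\tfrac{a^2}{2}\|\mathbf{x}_i - \boldsymbol{\mu}_j\|^2\right) \qquad (3)$$

which is simply not the same number, so the posterior weights change. If we had initialized the EM with a variance of $a^2$ for each cluster, the factor $a^2$ would cancel in the exponent and we would get the same posterior as before.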
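A small numeric sketch of this point, using one-dimensional data for simplicity. The point `x`, the two means, the equal weights, and the scale `a` are all made-up values for illustration, and `gaussian_pdf` and `posteriors` are hypothetical helper names, not anything from the exam.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posteriors(x, means, weights, var):
    """Posterior probability of each cluster: p_j f_j(x), normalized over j."""
    scores = [p * gaussian_pdf(x, m, var) for p, m in zip(weights, means)]
    total = sum(scores)
    return [s / total for s in scores]

# Made-up illustration values.
x = 0.5
means = [0.0, 2.0]
weights = [0.5, 0.5]
a = 3.0

# Original data, variance 1.
original = posteriors(x, means, weights, var=1.0)

# Scale the point and the means by a, but keep the variance at 1.
scaled_means = [a * m for m in means]
scaled_unit_var = posteriors(a * x, scaled_means, weights, var=1.0)

# Scale the point and the means by a, and also use variance a^2.
scaled_matched_var = posteriors(a * x, scaled_means, weights, var=a ** 2)

print(original, scaled_unit_var, scaled_matched_var)
```

In this toy setup the posteriors with variance 1 on the scaled data differ from the original ones, while those with variance $a^2$ agree with the original ones up to floating-point rounding.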