Hi, I have a question about proof 2.2 which was shown during the recitation.

During the proof that shows the class of all rectangles in $R^2$is PAC-learnable we constructed stripes $T_i$, and required that $\forall i: D(T_i) \leq \frac{\varepsilon}{4}$. We then proceed to argue that if one of our sample points will be placed within

the stripe $T_i$, our output rectangle $R'$ will be close enough to $R$ in the corresponding edge, in a way that the space between $R$ and $R'$ has probability lower than $D(T_i)$ which is lower than $\frac{\varepsilon}{4}$.

After that we try to argue that the probability that no sample point is selected inside $T_i$ is small. The proof states that $P(x\notin T_i) = 1 - \frac{\varepsilon}{4}$, **and this is the part i don't understand.**

Assuming that $P(x\notin T_i) = 1 - \frac{\varepsilon}{4}$ means that $P(x \in T_i) = \frac{\varepsilon}{4}$ but we only chose $T_i$ such that $D(T_i) \leq \frac{\varepsilon}{4}$. It is very possible that $P(x\in T_i) = \frac{\varepsilon}{100}$

for instance, and this creates a problem further on when we want to upper bound $1 - P(x\in T_i)$ with $1-\frac{\varepsilon}{4}$ but we only have an upper bound on $p(x\in T_i)$ and not a lower bound on it, as we would like.

The proof tries to tackle this problem by choosing the largest stripe $T_i$ such that $D(T_i) \leq \frac{\varepsilon}{4}$, but to my understanding this doesn't mean that $D(T_i) = \frac{\varepsilon}{4}$ without assuming that the Cumulative distribution function of D is continuous, however we state that we assume nothing on D.

For instance D could have a Cumulative distribution function which is as step function, such that up to a certain point the probability to be left of the stripe is $\frac{\varepsilon}{100}$ and from that point on the probability to be left of the stripe is $\frac{1}{2}$.

intuitively we would want to have a lower bound on $D(T_i)$ so we could say that the probability that no sample point landed in $T_i$ is small, since without such lower bound, one can compose a devious example such that $D(T_i) = \frac{1}{m}$ making

$P(x \notin T_i)^m = (1-\frac{1}{m}) ^m = e$.

Shouldn't we be interested in the smallest stripe $T_i$ that still satisfies$D(T_i) \geq \frac{\varepsilon}{4}$ rather than the largest that satisfies $D(T_i) \leq \frac{\varepsilon}{4}$?

This could work better since the Cumulative distribution function is continuousfrom from the right (but not necessarily from the left).