One common approach to dealing with rich classes in the realizable case is to relax the requirement of a uniform bound to pointwise bounds that may depend on the individual hypothesis in the class. Perhaps the best-known setup of this flavor in the learning theory literature is non-uniform PAC learnability and the concept of Structural Risk Minimization of Vapnik and Chervonenkis (1974); Vapnik (2013). A more closely related scenario in the online learning setup was studied in the theory of learning recursive functions. In this setup, one assumes the class H consists of functions from N → {0, 1}, and the instances are presented sequentially as {1, 2, · · · }. A class H is said to be recursively learnable if there exists a computable learner that makes only finitely many errors, no matter which model is selected from H. See Zeugmann and Zilles (2008) for a survey of this topic.
This paper generalizes the deterministic finite-error setting of the recursive function learning literature to a more realistic randomized setup. Instead of the learner observing instances in a predetermined order, we assume that they are sampled independently at each time step from a distribution µ over X. A class H of functions from X → {0, 1} is eventually almost surely (eas, for short) online learnable with respect to µ if there exists an online learner that makes only finitely many errors with probability 1, no matter which function is chosen from H.
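In symbols, one way to state this condition (a sketch in our own notation, which may differ from the paper's formal definition) is: there exists a learner producing predictions $\hat{y}_n$ from $(x_1, h(x_1)), \dots, (x_{n-1}, h(x_{n-1}))$ and $x_n$ such that, for every $h \in H$,
$$\Pr_{x_1, x_2, \dots \,\overset{\mathrm{i.i.d.}}{\sim}\, \mu}\left[\,\sum_{n=1}^{\infty} \mathbb{1}\{\hat{y}_n \neq h(x_n)\} < \infty\right] = 1.$$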
The conceptual basis of our problem setup lies in the almost sure hypothesis testing framework
studied in the statistics and information theory communities. See Cover (1973); Dembo and Peres
(1994); Kulkarni and Tse (1994); Santhanam and Anantharam (2015); Wu and Santhanam (2019)
for a sample of results along these lines in hypothesis testing, classification, and prediction. This line has also been pursued from the computability angle; see, for example, Leshem (2006). We also note related work on an infinite-horizon machine learning framework (Mitchell et al., 2018), the universal prediction framework of Merhav and Feder (1998), and the randomized online learning scenario of Haussler et al. (1994). Novel modifications of the Littlestone dimension have also been considered; see, for example, Sayedi et al. (2010); Li et al. (2011); Zhang and Chaudhuri (2016).
Characterization Our first result fully characterizes eas-online learnability in the binary label case. We show that for every instance space X satisfying a mild regularity condition (met by most common spaces, including, e.g., the Borel space over $\mathbb{R}^d$) and every distribution µ with potentially infinite support, a class H of measurable functions from X → {0, 1} is eas-online learnable if and only if H is effectively countable. Informally, H is effectively countable if it is countable once we identify hypotheses that agree with probability 1 under µ (see Definition 3 for the precise definition).
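One natural reading of this notion (our own sketch; the paper's Definition 3 may impose additional requirements, e.g. on how the countable family is presented) is that the quotient of H by the equivalence
$$h \sim h' \iff \mu\big(\{x : h(x) \neq h'(x)\}\big) = 0$$
is a countable set.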
Applying our characterization to the class T of all linear threshold functions from [0, 1] → {0, 1}, i.e. functions of the form
$$h_a(x) = \begin{cases} 1, & \text{if } x \geq a, \\ 0, & \text{otherwise,} \end{cases}$$
shows that T is not eas-online learnable for any probability measure over [0, 1] that admits a continuous density: under such a measure, distinct thresholds disagree on a set of positive probability, so T remains uncountable even after identifying almost surely equal hypotheses. However, the class T has VC dimension 1. For the uniform distribution over [0, 1], an empirical risk minimization approach yields an online learning strategy for the linear threshold functions whose probability of making an error at time step $n$ is at most $O\!\left(\frac{\log n}{n}\right)$. Because T is not effectively countable, and hence not eas-online learnable, we can immediately conclude that the empirical risk minimization bound cannot be improved from $\frac{\log n}{n}$ to, e.g., $\frac{1}{n \log^{1+\epsilon} n}$ for any $\epsilon > 0$. If it could, the Borel-Cantelli lemma would imply that T with the uniform distribution is eas-online learnable, since $\frac{1}{n \log^{1+\epsilon} n}$ is summable over $n$.
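To make the empirical risk minimization strategy concrete, the following minimal Python sketch (our own illustration, not a construction from the paper; the function name, true threshold a_true, horizon, and seed are hypothetical choices) runs the online ERM learner for linear thresholds under the uniform distribution on [0, 1], predicting with the midpoint of the interval of thresholds still consistent with the observed labels:

    import random

    def erm_threshold_mistakes(a_true=0.3, horizon=100_000, seed=0):
        """Online ERM for thresholds h_a(x) = 1[x >= a] under the uniform
        distribution on [0, 1] (illustrative parameters). The learner keeps
        the interval (lo, hi] of thresholds consistent with all examples
        seen so far and predicts with the midpoint threshold."""
        rng = random.Random(seed)
        lo, hi = 0.0, 1.0  # every threshold is initially consistent
        mistakes = 0
        for _ in range(horizon):
            x = rng.random()
            y = 1 if x >= a_true else 0             # label revealed by h_a
            y_hat = 1 if x >= (lo + hi) / 2 else 0  # midpoint ERM prediction
            mistakes += int(y_hat != y)
            if y == 1:
                hi = min(hi, x)  # label 1 means a <= x: shrink from above
            else:
                lo = max(lo, x)  # label 0 means a > x: shrink from below
        return mistakes

    print(erm_threshold_mistakes())  # typically a small mistake count

Any single run accumulates mistakes only slowly, consistent with the $O\!\left(\frac{\log n}{n}\right)$ per-step error probability; the point of the characterization is that this rate cannot be sharpened enough for the total number of mistakes to be almost surely finite.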