Lecture 2: A new model for multiclass boosting
Many natural problems in machine learning are best thought of as multiclass classification problems. Examples include recognizing a digit from its image, natural language processing tasks such as part-of-speech tagging, and object recognition in vision.
Boosting refers to a general technique of combining rules of thumb to form highly accurate classifiers. The exact requirements on a rule of thumb, also known as a weak learner, are well understood in the case of binary classification: any algorithm that predicts better than random on any distribution over the training set is said to satisfy the weak learning assumption. For multiclass classification, no such definition has been proposed, although many sufficient conditions have been implicitly assumed.
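The binary weak learning condition above can be sketched concretely: a hypothesis satisfies it if its weighted error is at most 1/2 minus some edge gamma, for any distribution over the training examples. The following is a minimal illustration of checking that condition for a single distribution; the function names, the toy data, and the edge value are assumptions for exposition, not from the lecture.

```python
# Sketch of the binary weak learning condition: a hypothesis h must
# achieve weighted error at most 1/2 - gamma (for some edge gamma > 0)
# under ANY distribution D over the training set. Here we check a
# single distribution; a true weak learner must pass for all of them.

def weighted_error(h, examples, labels, dist):
    """Weighted error of hypothesis h under distribution dist."""
    return sum(d for x, y, d in zip(examples, labels, dist) if h(x) != y)

def satisfies_weak_learning(h, examples, labels, dist, gamma):
    """Check the weak learning condition for one distribution."""
    return weighted_error(h, examples, labels, dist) <= 0.5 - gamma

# Toy check: a threshold rule on 1-D points (illustrative data).
examples = [0.1, 0.4, 0.6, 0.9]
labels = [-1, -1, +1, +1]
uniform = [0.25] * 4
h = lambda x: +1 if x > 0.5 else -1
print(satisfies_weak_learning(h, examples, labels, uniform, 0.1))  # True
```

Here h classifies all four points correctly, so its weighted error is 0 and the condition holds trivially; the interesting case in boosting is when the booster reweights the distribution to concentrate on hard examples.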
We provide a weak learning definition that is both necessary and sufficient for boosting to be possible in the multiclass setting. Our assumptions are strictly weaker than those made by popular algorithms like AdaBoost.MH. At the same time, they are strong enough not only to drive down training error exponentially fast with the number of rounds of boosting, but also to handle asymmetric loss functions. Such loss functions are appropriate in certain settings, e.g., spam detection, where the cost of a false positive is potentially far greater than that of a false negative.
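The spam-detection example can be made concrete with a cost matrix that charges false positives more than false negatives. The sketch below is purely illustrative: the cost values and label names are assumptions, not taken from the lecture.

```python
# Hypothetical asymmetric loss for spam detection: flagging legitimate
# mail as spam (false positive) is penalized ten times more heavily
# than letting spam through (false negative). Costs are illustrative.

COST = {
    ("ham", "ham"): 0.0,    # correct
    ("spam", "spam"): 0.0,  # correct
    ("ham", "spam"): 10.0,  # false positive: costly
    ("spam", "ham"): 1.0,   # false negative: mildly costly
}

def asymmetric_loss(true_labels, predictions):
    """Average cost of predictions under the asymmetric cost matrix."""
    total = sum(COST[(t, p)] for t, p in zip(true_labels, predictions))
    return total / len(true_labels)

y_true = ["ham", "spam", "ham", "spam"]
y_pred = ["ham", "spam", "spam", "ham"]  # one false positive, one false negative
print(asymmetric_loss(y_true, y_pred))  # (10 + 1) / 4 = 2.75
```

A symmetric 0-1 loss would score both mistakes equally; under this cost matrix, a classifier that avoids false positives is strongly preferred even at the price of a few extra false negatives.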