Fast Prediction for Large-Scale Kernel Machines

Cho-Jui Hsieh, Si Si, Inderjit Dhillon

Abstract: Kernel machines such as kernel SVM and kernel ridge regression usually produce high-quality models; however, their use in real-world applications remains limited due to their high prediction cost. In this paper, we present two novel insights for improving the prediction efficiency of kernel machines. First, we show that by adding "pseudo landmark points" to the classical Nyström kernel approximation in an elegant way, we can significantly reduce the prediction error without much additional prediction cost. Second, we provide a new theoretical analysis bounding the error of the solution computed using the Nyström kernel approximation, and show that this error is related to a weighted k-means objective function in which the weights are given by the model computed from the original kernel. This theoretical insight suggests a new landmark point selection technique for situations where we have knowledge of the original model. Based on these two insights, we propose a divide-and-conquer framework for improving prediction speed. First, we divide the whole problem into smaller local subproblems to reduce the problem size. In the second phase, we develop a kernel-approximation-based fast prediction approach within each subproblem. We apply our algorithm to real-world large-scale classification and regression datasets, and show that the proposed algorithm is consistently and significantly better than other competitors. For example, on the Covertype classification problem, our algorithm achieves more than a 10,000-fold speedup in prediction time over the full kernel SVM and a twofold speedup over the state-of-the-art LDKL approach, while obtaining much higher prediction accuracy than LDKL (95.2% vs. 89.53%).
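For context, the classical Nyström approximation that the abstract builds on can be sketched as follows. This is a minimal illustration with uniformly sampled landmarks and an RBF kernel, not the paper's actual implementation: the pseudo-landmark extension, the weighted k-means landmark selection, and the divide-and-conquer framework are the paper's contributions and are not shown here. All function and variable names are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    # Pairwise RBF kernel matrix between the rows of X and Y.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def nystrom_features(X, landmarks, gamma=0.1, reg=1e-8):
    # Standard Nystrom feature map: K is approximated by C W^+ C^T,
    # where C = k(X, landmarks) and W = k(landmarks, landmarks).
    C = rbf_kernel(X, landmarks, gamma)
    W = rbf_kernel(landmarks, landmarks, gamma)
    # Eigendecomposition of the (regularized) landmark kernel gives
    # the pseudo-inverse square root W^{-1/2}.
    vals, vecs = np.linalg.eigh(W + reg * np.eye(len(landmarks)))
    keep = vals > 1e-10
    W_inv_sqrt = vecs[:, keep] @ np.diag(vals[keep] ** -0.5)
    # Returns features Z with Z Z^T approximately equal to K; prediction
    # then costs O(m) kernel evaluations per test point (m = #landmarks)
    # instead of one per support vector.
    return C @ W_inv_sqrt

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Uniformly sampled landmarks; the paper argues that (weighted) k-means
# centers are a better choice.
landmarks = X[rng.choice(len(X), size=20, replace=False)]
Z = nystrom_features(X, landmarks)
K_approx = Z @ Z.T
K_exact = rbf_kernel(X, X)
rel_err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
```

After this step, any linear model trained on the features `Z` approximates the corresponding kernel machine, which is what makes prediction cheap.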