Paralelización eficiente en métodos de núcleo

Author:
  1. Díaz Morales, Roberto
Supervised by:
  1. Angel Navia Vázquez (Director)

Defence university: Universidad Carlos III de Madrid

Defence date: 10 February 2016

Committee:
  1. Inmaculada Mora Jiménez (Chair)
  2. Jerónimo Arenas García (Secretary)
  3. Rubén Casado Tejedor (Committee member)

Type: Thesis

Abstract

Kernel methods are very popular techniques in the Machine Learning area because they can produce highly competitive results in many practical tasks. The main reason for this success is their ability to create non-linear solutions very easily, by transforming the input data space into a high-dimensional space where inner products between projected vectors can be computed using a kernel function. Among the most widespread techniques are Support Vector Machines (SVMs), which perform very well in classification problems, and Gaussian Processes (GPs), which are widely used in regression problems.

Kernel methods have two main weaknesses that limit their practical application. The first one is the computational cost associated with their training procedure, which is O(n³), where n is the number of samples in the training set. Due to this scalability problem, the run time needed to obtain the models is too high when working with large-scale data, and the field of action of this family of algorithms is reduced. The second problem is associated with their non-parametric nature: in kernel methods the complexity of the models is not preset in advance, and the machine size depends on the training set size. This ability made them very popular, but the resulting models are often very slow at processing new samples when the training set contains a high number of samples.

In recent years, a number of factors including social networks, electronic commerce, mobile devices and collaboration using the new technologies have led to a huge increase in dataset sizes. We have gone from the problem of not having enough data to solve a task to the opposite problem of having so much data that it is very costly and complicated to process it. If, to the fact that currently there are much larger datasets, we add the computational cost and scalability problems of kernel methods, we find that the field of action of these techniques has been drastically reduced. Fortunately, new research lines such as parallel architectures and semi-parametric approximations of the models have also emerged to allow us to face this kind of problem. On the one hand, the number of processing cores in computers has increased significantly, giving rise to new research lines that adapt classical machine learning techniques to a parallel scenario in which multiple processing resources work simultaneously, following the principle that problems can be divided into smaller subtasks that can be carried out at the same time. On the other hand, semi-parametric solutions are approximations that fix the complexity of the resulting models in advance; thanks to them, the problem of obtaining a solution that is too slow to process new samples is avoided, and the computational cost and run time of the training procedure are also reduced.

This PhD Thesis makes use of these research lines and presents new parallel algorithms, for multicore and multiprocessor systems, of non-parametric and semi-parametric versions of different kernel method algorithms. First of all, we show a new parallel scheme that takes advantage of the matrix formulation used to solve kernel methods. This scheme proposes some low- and high-level design criteria to achieve an efficient parallelization when implementing the procedures that are carried out to solve this kind of algorithm.
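
The following is a minimal sketch, not code from the thesis, intended only to make two of the ideas above concrete: how a kernel function replaces inner products in an implicit high-dimensional space, why the full non-parametric solution costs O(n³), and how a semi-parametric approximation with a fixed number m of basis centroids bounds the machine size in advance. It uses kernel ridge regression as a stand-in; the function names (`rbf_kernel`, `fit_full`, `fit_semiparametric`) and the random choice of centroids are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2): an inner product between projections
    of x and y onto an implicit high-dimensional feature space."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d2)

# Full non-parametric fit: one weight per training sample, training dominated
# by an n x n linear system, i.e. O(n^3) in the training set size.
def fit_full(X, y, lam=1e-3):
    K = rbf_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)          # alpha: n weights

# Semi-parametric approximation: the machine size is fixed beforehand to m << n
# centroids C, so training drops to roughly O(n * m^2) and prediction to O(m).
def fit_semiparametric(X, y, C, lam=1e-3):
    Knm = rbf_kernel(X, C)                                        # n x m
    Kmm = rbf_kernel(C, C)                                        # m x m
    return np.linalg.solve(Knm.T @ Knm + lam * Kmm, Knm.T @ y)    # beta: m weights

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)
C = X[rng.choice(2000, size=100, replace=False)]                  # m = 100 fixed basis
alpha, beta = fit_full(X, y), fit_semiparametric(X, y, C)
x_new = rng.normal(size=(1, 5))
print(rbf_kernel(x_new, X) @ alpha, rbf_kernel(x_new, C) @ beta)
```

Because the semi-parametric model keeps m weights regardless of n, processing a new sample requires only m kernel evaluations instead of n, which is what fixing the complexity of the model in advance buys in practice.
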
With the aim of showing the usefulness of these techniques and of bringing together the different research lines related to scalability, several algorithms have been implemented: a semi-parametric version of SVMs and two GP solvers (a semi-parametric one and a fully non-parametric version). The results obtained when benchmarking these algorithms evidence the usefulness and efficiency of these techniques.

Secondly, after analyzing the main limitations of the previous model, which made use of a parallel block matrix inverse to solve linear systems, a new scheme based on a parallel Cholesky factorization has been proposed. This new architecture improves numerical stability and is more computationally efficient. Using this scheme, we have implemented two algorithms, a full and a semi-parametric SVM, and the results show its superiority over the previous model.

In addition, we show some indirect contributions that correspond to results and prizes obtained in machine learning competitions thanks to the knowledge about parallelization arising from this work. This document concludes by describing the main contributions and some suggestions about new research lines that emerge from this work.
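
As an illustration of why a Cholesky-based scheme is attractive on multicore systems, below is a minimal sketch of a right-looking blocked Cholesky factorization with the independent trailing-block updates dispatched to a thread pool. It is an assumption for illustration only, not the thesis's actual parallel architecture: the `blocked_cholesky` helper, the block size, and the synthetic matrix are all made up, and the intent is simply to show that solving the symmetric positive-definite systems that appear in kernel methods through a factorization and two triangular solves avoids forming an explicit (block) matrix inverse.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.linalg import cholesky, solve_triangular, cho_solve

def blocked_cholesky(A, block=256, workers=4):
    """Right-looking blocked Cholesky (lower triangular). Trailing-block updates
    touch disjoint block rows, so they can run in a thread pool; NumPy's BLAS
    calls release the GIL, letting several cores work simultaneously."""
    A = A.copy()
    n = A.shape[0]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for k in range(0, n, block):
            kb = min(block, n - k)
            # 1) Factor the diagonal block.
            A[k:k+kb, k:k+kb] = cholesky(A[k:k+kb, k:k+kb], lower=True)
            if k + kb < n:
                # 2) Panel solve: L_ik = A_ik * L_kk^{-T}.
                A[k+kb:, k:k+kb] = solve_triangular(
                    A[k:k+kb, k:k+kb], A[k+kb:, k:k+kb].T, lower=True).T
                # 3) Trailing update A_ij -= L_ik * L_jk^T, one block row per task.
                def update(i):
                    ib = min(block, n - i)
                    A[i:i+ib, k+kb:i+ib] -= A[i:i+ib, k:k+kb] @ A[k+kb:i+ib, k:k+kb].T
                list(pool.map(update, range(k + kb, n, block)))
    return np.tril(A)

# Solve a synthetic SPD "kernel-like" system through the factor (two triangular
# solves) instead of an explicit inverse, which is cheaper and more stable.
rng = np.random.default_rng(0)
M = rng.normal(size=(1000, 1000))
K = M @ M.T + 1000 * np.eye(1000)
b = rng.normal(size=1000)
L = blocked_cholesky(K, block=200)
x = cho_solve((L, True), b)
print(np.allclose(L, np.linalg.cholesky(K)), np.allclose(K @ x, b))
```

In this sketch the parallelism comes from the fact that, once the panel for block column k is known, the updates of the different trailing block rows are independent of each other, which is the kind of task decomposition a multicore or multiprocessor system can exploit.
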