Missing data is a key problem in many scientific fields: it severely degrades the power of learning and inference algorithms and frequently leads to erroneous decisions and loss of accuracy. In the emerging era of big data, gaps in data increase exponentially and manual handling becomes impossible, creating a big missing data problem. As a key solution, the KERNEO project will develop next-generation machine learning algorithms for big missing data, intended as a game-changer for future knowledge extraction. This will be achieved through a highly novel approach that combines the versatility of kernel methods with the probabilistic nature of information theoretic learning for latent (missing) variable analysis in big data. Earth observation (EO), a field where data is big and missing data is extremely common, for example because of clouds, will serve as the test bed for KERNEO, with a particular focus on tropical forest monitoring using the upcoming Sentinel satellites. In EO, cloud-contaminated images are currently handled with ad hoc solutions, such as simply discarding missing values, which ignores valuable information. The KERNEO next-generation missing data machine learning tools, providing superior knowledge extraction in challenging missing data scenarios, will be highly innovative in EO monitoring and will moreover transfer to scientific fields far beyond EO. KERNEO is high risk because of the profound challenges and the interdisciplinary nature of the endeavor, yet feasible because of the high quality of the PI and the team, the extensive mobility, and the unique network of researchers in kernel-based machine learning, statistics, computer science and EO, which creates the synergies needed to reach the ambitious project objectives.
Project leader: Robert Jenssen
Institution: Institutt for fysikk og teknologi