Machine Learning Working Group

by admin last modified 2008-01-30 16:24

NCEAS Project 10921: Machine learning for the environment

Using this Site

If you're a working group member, log in to access collaboration features.

Then check out the screencast, which will walk you through the site features.

Abstract

We believe that environmental science, ecology, and conservation biology would be greatly enriched by expanding the ecologist's analytical toolbox to include machine learning (ML) approaches to data analysis. We use the term ML loosely to distinguish a variety of new, computational methods for recognizing and analyzing patterns in data from parametric statistics. Generally, parametric methods assume highly restrictive theoretical properties of data, such as additivity, linearity, independence, and a specified distribution (e.g., normality). Ecological data, by contrast, represent highly complex systems and commonly violate these assumptions [1-3]. Unfortunately, failure to appreciate these subtleties of ecological data often results in misguided analysis and incomplete or incorrect conclusions. In recent years, ML researchers have developed techniques for analyzing data not suited to parametric statistics. Older machine learning algorithms include neural networks and decision trees. Newer techniques such as boosting and kernel methods (e.g., support vector machines) provide new opportunities for extracting subtle patterns from complex data, while hybrid methods integrate parametric models and ML to exploit computation and hard-won biological understanding simultaneously. Despite successes elsewhere (e.g., bioinformatics, astrophysics), ML has not been widely adopted by ecologists. Complex situations that might be addressed with ML include identifying optimal policies for managing ecological systems under uncertainty, forecasting, nonlinear modeling, and scientific inference with non-independent data. Accommodating these scientific and statistical difficulties within parametric statistics ranges from cumbersome to impossible. Therefore, we propose a working group to identify obstacles, scope out promising research, produce case studies, and develop a book-length tutorial for ecologists on the practical application of ML.

 
