In many ways, ESS present ideal use cases for ML applications because the problems being addressed, such as climate change, weather forecasting, and natural hazards assessment, are globally important; the data are often freely available, voluminous, and of high quality; and computational resources required to develop ML models are steadily becoming more affordable. Free computational languages and ML code libraries are also now available (e.g., scikit-learn, PyTorch, and TensorFlow), contributing to making entry barriers lower than ever. Nevertheless, our experience has been that many young scientists and students interested in applying ML techniques to ESS data do not have a clear sense of how to do so.
The Tools of the Trade
An ML algorithm can be thought of broadly as a mathematical function containing many free parameters (thousands or even millions) that takes inputs (features) and maps those features into one or more outputs (targets). The process of “training” an ML algorithm involves optimizing the free parameters to map the features to the targets accurately.
There are two broad categories of ML algorithms relevant in most ESS applications: supervised and unsupervised learning (a third category, reinforcement learning, is used infrequently in ESS). Supervised learning, which involves presenting an ML algorithm with many examples of input-output pairs (called the “training set”), can be further divided, according to the type of target that is being learned, as either categorical (classification; e.g., does a given image show a star cluster or not?) or continuous (regression; e.g., what is the temperature at a given location on Earth?). In unsupervised learning, algorithms are not given a particular target to predict; rather, an algorithm’s task is to learn the natural structure in a data set without being told what that structure is.
Supervised learning is more commonly used in ESS, although it has the disadvantage that it requires labeled data sets (in which each training input sample must be tagged, or labeled, with a corresponding output target), which are not always available. Unsupervised learning, on the other hand, may find multiple structures in a data set, which can reveal unanticipated patterns and relationships, but it may not always be clear which structures or patterns are “correct” (i.e., which represent genuine physical phenomena).
Applications in Earth and Space Sciences
Books and classes about ML often present a range of algorithms that fall into one of the above categories but leave people to imagine specific applications of these algorithms on their own. However, in practice, it is usually not obvious how such approaches (some seemingly simple) may be applied in a rich variety of ways, which can create an imposing obstacle for scientists new to ML. Below we briefly describe various themes and ways in which ML is currently applied to ESS data sets (Figure 1), with the hope that this list, necessarily incomplete and biased by our personal experience, inspires readers to apply ML in their research and catalyzes new and creative use cases.
Fig. 1. Ten ideas for applying machine learning (ML) in the Earth and space sciences, roughly organized by the degree of involvement of physics-based models (horizontal scale) and the degree to which ML codes are available and readily applicable versus being in development and requiring significant customization (vertical scale). Credit: Jacob Bortnik
1. Pattern Identification and Clustering
One of the simplest and most powerful applications of ML algorithms is pattern identification, which works particularly well with very large data sets that cannot be traversed manually and in which signals of interest are faint or highly dimensional. Researchers, for example, applied ML in this way to detect signatures of Earth-sized exoplanets in noisy data making up millions of light curves observed by the Kepler space telescope. Detected signals can be further split into groups through clustering, an unsupervised form of ML, to identify natural structure in a data set.
Conversely, atypical signals may be teased out of data by first identifying and excluding typical signals, a process called anomaly or outlier detection. This technique is useful, for example, in searching for signatures of new physics in particle collider experiments.
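To make clustering and anomaly detection concrete, here is a minimal sketch using only NumPy. It is an illustration under stated assumptions, not the method used in the studies mentioned above: the data are synthetic, the helper names (`farthest_point_init`, `nearest_centroid`, `kmeans`) are hypothetical, and the 99th-percentile distance threshold is an arbitrary choice. A plain k-means first learns what "typical" samples look like; new samples that sit far from every learned cluster center are then flagged as anomalies.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the
    sample farthest from all centroids chosen so far."""
    centroids = [X[0]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        d = np.linalg.norm(X[:, None, :] - chosen[None, :, :], axis=2).min(axis=1)
        centroids.append(X[d.argmax()])
    return np.array(centroids, dtype=float)

def nearest_centroid(X, centroids):
    """Return (index of nearest centroid, distance to it) for every sample."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), dists.min(axis=1)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels, _ = nearest_centroid(X, centroids)
        # Move each centroid to the mean of its members (keep it if empty).
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    labels, _ = nearest_centroid(X, centroids)
    return labels, centroids

# Synthetic "typical" signals (an assumption for illustration):
# two well-separated groups in a two-dimensional feature space.
rng = np.random.default_rng(0)
typical = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(100, 2)),
    rng.normal([10.0, 10.0], 0.5, size=(100, 2)),
])

# Unsupervised step: learn the natural grouping without any labels.
labels, centroids = kmeans(typical, k=2)

# Anomaly step: a sample is "atypical" if it lies farther from its nearest
# centroid than 99% of the typical samples do (arbitrary threshold).
_, typical_dist = nearest_centroid(typical, centroids)
threshold = np.percentile(typical_dist, 99)

new_batch = np.array([[0.1, -0.2], [9.8, 10.3], [5.0, 5.0]])
_, new_dist = nearest_centroid(new_batch, centroids)
is_anomaly = new_dist > threshold
print(is_anomaly)  # only the last sample, far from both groups, is flagged
```

In practice one would more likely reach for a library implementation (for example, the clustering and outlier-detection estimators in scikit-learn) and would choose the number of clusters and the threshold from domain knowledge rather than fixing them in advance.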
2. Time Series and Spatiotemporal Prediction
An important and widespread application of supervised ML is the prediction of time series data from instruments or from an index (or average value) that is intended to encapsulate the behavior of a large-scale system. Approaches to this application often involve using past data in the time series itself to predict future values; they also commonly involve additional inputs that act as drivers of the quantities measured in the time series. A typical example of ML applied to time series in ESS is local weather prediction, in which trends in observed air temperature and pressure, along with other quantities, are predicted.
In many instances, however, predicting a single time series of data is insufficient, and knowledge of the temporal evolution of a physical system over regional (or global) spatial scales is required. This spatiotemporal approach is used, for example, in attempts to predict weather across the entire globe as a function of time and 3D space using high-capacity models such as deep neural networks.
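The idea of predicting a time series from its own past values plus an external driver can be sketched with a very small supervised regression. The example below is an assumption-laden illustration: the series and its driver are synthetic, the helper `make_lagged` is hypothetical, and a linear least-squares fit stands in for the higher-capacity models (such as neural networks) used in real applications. The data are split chronologically so that the model is evaluated only on values that come after its training period, and a simple persistence forecast (tomorrow equals today) serves as the baseline.

```python
import numpy as np

def make_lagged(y, driver, n_lags):
    """Build feature rows [y[t-1], ..., y[t-n_lags], driver[t-1], 1]
    paired with the target y[t]."""
    rows, targets = [], []
    for t in range(n_lags, len(y)):
        past = y[t - n_lags:t][::-1]          # most recent value first
        rows.append(np.concatenate([past, [driver[t - 1], 1.0]]))
        targets.append(y[t])
    return np.array(rows), np.array(targets)

# Synthetic data (an assumption for illustration): the series depends on
# its previous value and on the previous value of an external driver.
rng = np.random.default_rng(1)
n = 500
driver = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + 0.5 * driver[t - 1] + 0.05 * rng.normal()

features, targets = make_lagged(y, driver, n_lags=2)

# Chronological split: never train on samples that follow the test period.
n_train = 400
A_train, b_train = features[:n_train], targets[:n_train]
A_test, b_test = features[n_train:], targets[n_train:]

# "Training" here is an ordinary least-squares fit of the free parameters.
coef, *_ = np.linalg.lstsq(A_train, b_train, rcond=None)

pred = A_test @ coef
rmse_model = np.sqrt(np.mean((pred - b_test) ** 2))

# Persistence baseline: predict that y[t] equals y[t-1] (first feature column).
rmse_persistence = np.sqrt(np.mean((A_test[:, 0] - b_test) ** 2))

print("fitted coefficients:", np.round(coef, 3))
print("model RMSE:", rmse_model, "persistence RMSE:", rmse_persistence)
```

Comparing against persistence is a useful habit for any time series model: a forecast that cannot beat "no change" has learned little, however low its error looks in isolation.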