Back to blog

Mar 05, 2026 · 2 min read

Machine learning for lead and sea ice classification

A method-oriented note on feature design, model choices, and validation ideas for lead and sea ice classification in altimetry workflows.

GeoAIClassificationCryosphere

Lead and sea ice classification is a key step in many altimetry workflows because leads can provide reference points for estimating local sea surface height. Machine learning can help combine waveform, elevation, and contextual features, but careful validation is essential.

Candidate features

Useful predictors may include:

  • waveform shape metrics
  • peakiness or echo width
  • backscatter-related variables
  • corrected elevation anomaly
  • local spatial variability
  • distance to coast or ice edge
  • auxiliary sea ice concentration

Model choices

Simple models are often good starting points. A random forest, gradient boosting model, or logistic regression baseline can make feature behavior easier to inspect before moving to deep learning.

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,
    max_depth=12,
    random_state=42,
)

# X_train contains waveform and geospatial features.
# y_train contains reference labels such as "lead" and "sea_ice".
model.fit(X_train, y_train)

Evaluation metrics

Classification accuracy alone may be misleading if leads are rare. Precision, recall, F1 score, and confusion matrices are usually more informative:

F1=2precisionrecallprecision+recallF_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}

Practical notes

The goal is not only to maximize a score. For freeboard retrieval, classification errors can propagate into local sea surface estimation. A conservative model with interpretable failure modes may be more useful than a complex model that is hard to diagnose.

This article is a learning note and does not report new experimental results.