How the models work.

A technical overview of DepoDart's machine learning approach to mineral prospectivity mapping — written for geoscientists and data scientists who want to understand what is happening inside the model before recommending it to their team.


Problem Framing

Mineral prospectivity mapping is a spatial classification problem: given a set of geoscientific observations across an area, predict where undiscovered mineral deposits are most likely to occur. The central challenge is severe class imbalance. Known deposits are rare, often only one to three occurrences per district, while non-mineralised terrain makes up the overwhelming majority of the area of interest. Standard supervised classifiers trained to maximise overall accuracy fail under this imbalance, because near-perfect accuracy can be achieved simply by predicting "no deposit" everywhere.
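To see why accuracy is the wrong objective, consider a hypothetical area of interest gridded into 10,000 cells, three of which host known deposits. The short sketch below (Python, standard library only; the grid size and deposit positions are invented) scores a trivial model that predicts "no deposit" everywhere.

```python
# Hypothetical area of interest: 10,000 grid cells, 3 of which host known deposits.
n_cells = 10_000
labels = [0] * n_cells
for idx in (1_234, 5_678, 9_012):  # arbitrary cell indices for the 3 deposits
    labels[idx] = 1

# A "model" that predicts "no deposit" in every cell.
predictions = [0] * n_cells

accuracy = sum(p == y for p, y in zip(predictions, labels)) / n_cells
recall = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / sum(labels)

print(f"accuracy = {accuracy:.4f}")  # accuracy = 0.9997
print(f"recall   = {recall:.4f}")    # recall   = 0.0000
```

The model is 99.97% accurate and finds none of the three deposits.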

DepoDart addresses this through a combination of positive-unlabelled (PU) learning and ensemble methods that explicitly model the asymmetry between known positives (confirmed deposits or mineralised intersections) and unlabelled locations (which may be undiscovered deposits or truly barren ground). This framing is more geologically honest than treating every location without a known deposit as a confirmed negative.
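One widely used PU technique is bagging over the unlabelled set. It repeatedly treats a random subset of unlabelled locations as provisional negatives, trains a model, and averages the scores each unlabelled location receives in rounds where it was left out of that subset. The sketch below illustrates that general idea and is not DepoDart's implementation. The nearest-centroid scorer is a hypothetical stand-in for a real base learner such as a gradient-boosted tree model, and the feature values are invented.

```python
import math
import random


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_centroid_scorer(pos, neg):
    """Toy base learner: higher score = closer to the positive centroid than the negative one."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: math.dist(x, cn) - math.dist(x, cp)


def pu_bagging(positives, unlabelled, n_rounds=50, seed=0):
    """Average out-of-bag score for every unlabelled location (PU bagging).

    Each round draws a random subset of the unlabelled set, the same size as
    the positive set, and treats it as provisional negatives. Only unlabelled
    locations left out of that subset are scored, so no location is scored by
    a model that was trained to call it barren.
    """
    rng = random.Random(seed)
    k = len(positives)
    totals = [0.0] * len(unlabelled)
    counts = [0] * len(unlabelled)
    for _ in range(n_rounds):
        bag = set(rng.sample(range(len(unlabelled)), k))
        scorer = fit_centroid_scorer(positives, [unlabelled[i] for i in bag])
        for i, x in enumerate(unlabelled):
            if i not in bag:  # out-of-bag for this round
                totals[i] += scorer(x)
                counts[i] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


# Hypothetical 2-feature example: three known occurrences, six unlabelled cells.
positives = [[0.9, 0.8], [1.0, 0.9], [0.8, 1.0]]
unlabelled = [[0.95, 0.85], [0.1, 0.2], [0.2, 0.1], [0.0, 0.1], [0.15, 0.05], [0.05, 0.2]]
scores = pu_bagging(positives, unlabelled)
print([round(s, 2) for s in scores])
```

The first unlabelled cell, which resembles the known occurrences, scores highest even though it was never labelled positive. That is the behaviour the PU framing is meant to allow.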

Training Data

The model trains on two data sources: the client's project data and, where available, publicly accessible geological survey data from government agencies (Geological Survey of Canada, USGS, Geoscience Australia, and state geological surveys).

Client data provides the local geological context: the specific combination of features that characterises the deposit type being targeted in that geological setting. Public survey data supplements sparse client datasets by providing regional geophysical and geochemical baselines. Before use, all public data is reprojected to the client's coordinate system, resampled to the project's grid resolution, and normalised.
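Reprojection itself is best left to a GIS library and is not shown here. Once a public layer sits in the project's coordinate system, matching resolution and putting layers on a common scale can be sketched as below. Both methods are assumptions chosen for illustration: nearest-neighbour resampling by an integer factor, and z-score standardisation that skips NaN no-data cells. This overview does not say which resampling or normalisation method DepoDart uses.

```python
import math
import statistics


def upsample_nearest(grid, factor):
    """Nearest-neighbour resampling of a coarse grid onto a grid `factor` times finer."""
    return [[row[c // factor] for c in range(len(row) * factor)]
            for row in grid for _ in range(factor)]


def zscore(grid):
    """Standardise to zero mean and unit standard deviation, leaving NaN no-data cells as NaN.

    Assumes the valid cells are not all identical (non-zero standard deviation).
    """
    values = [v for row in grid for v in row if not math.isnan(v)]
    mu, sigma = statistics.fmean(values), statistics.pstdev(values)
    return [[v if math.isnan(v) else (v - mu) / sigma for v in row] for row in grid]


# Hypothetical 2 x 2 regional layer with one no-data cell, resampled to a 4 x 4 project grid.
coarse = [[10.0, 20.0], [30.0, float("nan")]]
fine = upsample_nearest(coarse, 2)
normed = zscore(fine)
for row in normed:
    print([round(v, 3) for v in row])
```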

Known mineral occurrences from client drill logs or from public mineral occurrence databases (MINFILE, MRDS, MIRIS) serve as positive training labels. The model learns from these occurrences — not from generic deposit type templates.
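Turning point occurrences into training labels amounts to finding the grid cell each occurrence falls in. The sketch below assumes a north-up grid with square cells, a known top-left corner, and occurrence coordinates already in the same projected coordinate system; the function names and example coordinates are hypothetical. Cells without an occurrence are marked `None` (unlabelled) rather than 0, consistent with the positive-unlabelled framing.

```python
def occurrence_cells(occurrences, x_min, y_max, cell_size, n_rows, n_cols):
    """Map (x, y) occurrence coordinates to (row, col) cells of a north-up grid.

    (x_min, y_max) is the grid's top-left corner; points outside the grid are dropped.
    """
    cells = set()
    for x, y in occurrences:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)  # rows count downward from the top edge
        if 0 <= row < n_rows and 0 <= col < n_cols:
            cells.add((row, col))
    return cells


def pu_labels(positive_cells, n_rows, n_cols):
    """1 for cells with a known occurrence, None (unlabelled) everywhere else, never 0."""
    return [[1 if (r, c) in positive_cells else None for c in range(n_cols)]
            for r in range(n_rows)]


# Invented occurrences on a 100 m grid; the third point lies west of the grid.
occurrences = [(500_250.0, 6_001_900.0), (500_950.0, 6_001_050.0), (499_000.0, 6_000_000.0)]
cells = occurrence_cells(occurrences, x_min=500_000.0, y_max=6_002_000.0,
                         cell_size=100.0, n_rows=20, n_cols=10)
labels = pu_labels(cells, n_rows=20, n_cols=10)
print(sorted(cells))  # [(1, 2), (9, 9)]
```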

Feature Engineering

Raw geoscientific rasters are not fed directly to the model. A feature engineering stage extracts geologically meaningful derivatives from each input layer. For geophysics, this includes reduced-to-pole magnetic anomalies, vertical derivatives, tilt derivatives, and analytic signal amplitudes. For geochemistry, this includes element ratios, pathfinder element anomalies, and multivariate anomaly indices. For structural geology, this includes proximity to interpreted faults, lineament density, and structural complexity measures.
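Two of the magnetic derivatives named above have compact definitions once the gradients are available. The tilt derivative is the arctangent of the vertical derivative divided by the total horizontal derivative, and the analytic signal amplitude is the magnitude of the three-component gradient. The sketch below computes horizontal derivatives by central differences and takes the vertical derivative as a given input, since it is usually computed in the wavenumber domain. It assumes the data are already reduced to pole; the input grids are hypothetical.

```python
import math


def horizontal_derivatives(grid, cell_size):
    """Central-difference d/dx (east) and d/dy (north) for interior cells; edges are left at 0.0.

    Rows are assumed to run north to south, so d/dy takes the row above minus the row below.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    dx = [[0.0] * n_cols for _ in range(n_rows)]
    dy = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            dx[r][c] = (grid[r][c + 1] - grid[r][c - 1]) / (2 * cell_size)
            dy[r][c] = (grid[r - 1][c] - grid[r + 1][c]) / (2 * cell_size)
    return dx, dy


def tilt_and_analytic_signal(dx, dy, dz):
    """Tilt angle (radians) and analytic-signal amplitude from the three derivative grids."""
    tilt, asa = [], []
    for rx, ry, rz in zip(dx, dy, dz):
        thdr = [math.hypot(a, b) for a, b in zip(rx, ry)]  # total horizontal derivative
        tilt.append([math.atan2(z, h) for z, h in zip(rz, thdr)])
        asa.append([math.sqrt(a * a + b * b + z * z) for a, b, z in zip(rx, ry, rz)])
    return tilt, asa


ramp = [[2.0 * c for c in range(5)] for _ in range(5)]  # field rising eastward
dx, dy = horizontal_derivatives(ramp, cell_size=1.0)
dz = [[2.0] * 5 for _ in range(5)]  # hypothetical vertical-derivative grid
tilt, asa = tilt_and_analytic_signal(dx, dy, dz)
print(round(math.degrees(tilt[1][1]), 1), round(asa[1][1], 3))
```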

The feature set is constructed to capture the multi-scale nature of mineralising systems: deposit-scale features (direct geophysical response of the orebody), district-scale features (alteration halos, structural corridors), and regional-scale features (terrane boundaries, craton margins). Each input layer may contribute multiple derived features, and the model learns which features are most predictive for the target commodity in the given geological setting.
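One simple way to expose the same input layer at several scales is to add smoothed copies of it computed over progressively larger neighbourhoods. The sketch below uses square moving-window means. The window radii are hypothetical stand-ins for deposit-, district- and regional-scale windows, and suitable values would depend on cell size and the deposit model. This overview does not specify how DepoDart builds its multi-scale features.

```python
def focal_mean(grid, radius):
    """Mean over a (2 * radius + 1)-cell square window, truncated at the grid edges."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            window = [grid[i][j]
                      for i in range(max(0, r - radius), min(n_rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(n_cols, c + radius + 1))]
            out[r][c] = sum(window) / len(window)
    return out


def multiscale_features(grid, radii=(0, 2, 8)):
    """One derived feature per window radius (radii in cells are hypothetical defaults)."""
    return {f"mean_r{r}": focal_mean(grid, r) for r in radii}


# A single sharp anomaly in an otherwise flat 5 x 5 layer.
layer = [[0.0] * 5 for _ in range(5)]
layer[2][2] = 25.0
feats = multiscale_features(layer, radii=(0, 1, 2))
print({name: round(g[2][2], 3) for name, g in feats.items()})
```

The anomaly dominates the radius-0 feature and is progressively diluted at larger radii, so a model can weigh a sharp local response separately from a broad regional one.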

Model Architecture

DepoDart uses an ensemble of gradient-boosted decision trees as the primary model, combined with a spatial cross-validation scheme that prevents geographic data leakage. Gradient boosting was selected over deep learning alternatives for this application because: (1) it performs well on tabular geoscientific data without requiring large training sets, (2) it provides native feature importance rankings that are interpretable by geoscientists, and (3) it is robust to the mixed data types present in geoscientific datasets (continuous, categorical, and ordinal features).
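For readers less familiar with gradient boosting, the loop below shows the core mechanism on a toy problem. Each round fits a single-split tree (a "stump") to the negative gradient of the log-loss, adds a damped copy of it to the running prediction, and credits the split's error reduction to the feature it used. This is a deliberately simplified sketch, not DepoDart's model. Leaf values here are mean pseudo-residuals, a first-order simplification, and production libraries such as XGBoost and LightGBM add deeper trees, second-order leaf values, regularisation and subsampling. The toy data, learning rate and number of rounds are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Best single split (feature, threshold) by squared-error reduction on the residuals."""
    mean_all = sum(residuals) / len(residuals)
    base_sse = sum((g - mean_all) ** 2 for g in residuals)
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [g for x, g in zip(X, residuals) if x[f] <= t]
            right = [g for x, g in zip(X, residuals) if x[f] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((g - ml) ** 2 for g in left) + sum((g - mr) ** 2 for g in right)
            if best is None or base_sse - sse > best[0]:
                best = (base_sse - sse, f, t, ml, mr)
    return best


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting on log-loss with depth-1 trees. Needs both classes present in y."""
    p0 = sum(y) / len(y)
    base = math.log(p0 / (1 - p0))  # start from the log-odds of the positive rate
    logits = [base] * len(X)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to the logit is y - p.
        residuals = [yi - sigmoid(z) for yi, z in zip(y, logits)]
        best = fit_stump(X, residuals)
        if best is None:
            break
        gain, f, t, ml, mr = best
        importance[f] += gain  # credit the split's error reduction to its feature
        vl, vr = learning_rate * ml, learning_rate * mr
        stumps.append((f, t, vl, vr))
        logits = [z + (vl if x[f] <= t else vr) for z, x in zip(logits, X)]
    return stumps, base, importance


def predict_proba(model, x):
    stumps, base, _ = model
    return sigmoid(base + sum(vl if x[f] <= t else vr for f, t, vl, vr in stumps))


# Invented toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 0.5], [0.2, 0.1], [0.3, 0.9], [0.4, 0.3],
     [0.6, 0.2], [0.7, 0.8], [0.8, 0.4], [0.9, 0.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosted_stumps(X, y)
print("importance per feature:", model[2])
```

Because every split is recorded against a feature, the accumulated gain yields a per-feature importance ranking at no extra cost. Here all of it goes to the first feature, which separates the classes, and none goes to the second, which is noise.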

The ensemble produces an initial probability estimate at every point. This estimate is then calibrated using Platt scaling, a logistic mapping from raw scores to probabilities fitted on held-out data, so that stated probabilities correspond to empirical mineralisation rates. A point scored at 0.80 should host a deposit approximately 80% of the time when evaluated over many such points across different projects.
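Platt scaling fits a two-parameter logistic curve, p = 1 / (1 + exp(-(a * s + b))), that maps a raw score s to a probability. The sketch below fits a and b by plain gradient descent on the log-loss, using Platt's smoothed targets in place of hard 0/1 labels. It then bins the calibrated probabilities into a reliability table comparing mean predicted probability with the observed positive rate in each bin, which is the kind of check behind the 0.80 example. The held-out scores and labels are invented and far too few for real calibration, and the fixed step size assumes raw scores roughly in [0, 1].

```python
import math


def platt_fit(scores, labels, n_iter=5000, lr=1.0):
    """Fit p = sigmoid(a * s + b) on held-out (score, label) pairs by gradient descent.

    Targets are Platt's smoothed values (N+ + 1) / (N+ + 2) and 1 / (N- + 2).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos, t_neg = (n_pos + 1) / (n_pos + 2), 1 / (n_neg + 2)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    a, b, n = 0.0, 0.0, len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, t in zip(scores, targets):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - t) * s / n  # gradient of the mean log-loss
            grad_b += (p - t) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def platt_apply(a, b, s):
    return 1.0 / (1.0 + math.exp(-(a * s + b)))


def reliability_table(probs, labels, n_bins=5):
    """(mean predicted probability, observed positive rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    return [(sum(p for p, _ in cell) / len(cell), sum(y for _, y in cell) / len(cell), len(cell))
            for cell in bins if cell]


held_out_scores = [0.05, 0.1, 0.15, 0.2, 0.3, 0.35, 0.4, 0.5, 0.55, 0.6, 0.7, 0.75, 0.8, 0.9, 0.95]
held_out_labels = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
a, b = platt_fit(held_out_scores, held_out_labels)
calibrated = [platt_apply(a, b, s) for s in held_out_scores]
table = reliability_table(calibrated, held_out_labels)
for mean_p, observed, count in table:
    print(f"predicted {mean_p:.2f}  observed {observed:.2f}  (n={count})")
```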

Uncertainty Quantification

Every prospectivity surface is accompanied by an uncertainty map that quantifies model confidence at each location. Uncertainty arises from two sources: aleatoric uncertainty (irreducible noise in the data) and epistemic uncertainty (model uncertainty due to sparse training data).

Epistemic uncertainty is estimated through bootstrap resampling of the training set. The model is retrained on multiple bootstrap samples, and the variance in predictions across samples is used as an uncertainty estimate. High-variance locations — where different bootstrap models disagree — indicate areas where additional data collection would most improve target confidence. Low-variance, high-probability locations are the most defensible drill targets.
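The bootstrap procedure can be sketched generically: resample the training set with replacement, refit, predict at every location, and summarise each location's predictions by their mean and variance. In the sketch below, `fit` is a hypothetical placeholder for the real training routine. The one-dimensional threshold model and its data exist only to make the example runnable.

```python
import random
import statistics


def bootstrap_uncertainty(train, locations, fit, n_boot=100, seed=0):
    """Per-location (mean, variance) of predictions from models fitted to bootstrap resamples.

    `fit` takes a list of (features, label) pairs and returns a predict(features) -> float
    callable; it is a hypothetical placeholder for the real training routine.
    """
    rng = random.Random(seed)
    preds = [[] for _ in locations]
    for _ in range(n_boot):
        sample = rng.choices(train, k=len(train))  # resample with replacement
        predict = fit(sample)
        for i, x in enumerate(locations):
            preds[i].append(predict(x))
    return [(statistics.fmean(p), statistics.pvariance(p)) for p in preds]


def fit_threshold(sample):
    """Toy 1-D model: split halfway between the largest negative and smallest positive x."""
    neg = [x[0] for x, y in sample if y == 0]
    pos = [x[0] for x, y in sample if y == 1]
    cut = (max(neg, default=0.0) + min(pos, default=20.0)) / 2
    return lambda x: 1.0 if x[0] > cut else 0.0


# Invented training data: 20 points on a line, positive from x = 10 upward.
train = [((float(x),), 1 if x >= 10 else 0) for x in range(20)]
locations = [(0.0,), (9.6,), (19.5,)]
result = bootstrap_uncertainty(train, locations, fit_threshold)
for loc, (mean, var) in zip(locations, result):
    print(loc, f"mean={mean:.2f} var={var:.3f}")
```

Locations far from the decision boundary get the same prediction from every bootstrap model and therefore zero variance. The location near the boundary flips between models, which is the disagreement the uncertainty map is meant to expose.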

The uncertainty map is delivered as a separate GeoTIFF alongside the prospectivity surface. Geologists are encouraged to use both layers jointly: targeting high-probability, low-uncertainty zones first, and treating high-probability, high-uncertainty zones as data-collection priorities rather than immediate drill targets.
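Reading the two layers jointly amounts to a simple decision rule per cell. The thresholds below (probability 0.7, bootstrap variance 0.05) and the `triage` function are hypothetical placeholders; suitable values depend on the project and are not given in this overview.

```python
def triage(prob, var, p_high=0.7, var_high=0.05):
    """Hypothetical decision rule combining calibrated probability and bootstrap variance."""
    if prob >= p_high and var < var_high:
        return "drill target"
    if prob >= p_high:
        return "data-collection priority"
    return "lower priority"


# Invented (probability, variance) pairs for three cells.
cells = [(0.85, 0.01), (0.82, 0.12), (0.30, 0.01)]
print([triage(p, v) for p, v in cells])
# ['drill target', 'data-collection priority', 'lower priority']
```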

Validation

Model validation uses spatial cross-validation rather than random holdout. In spatial cross-validation, the area of interest is divided into geographic blocks, and models trained on some blocks are evaluated on held-out blocks. This prevents data leakage from spatial autocorrelation — a known failure mode where models appear to generalise well in random holdout tests but fail on genuinely new ground.
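Block assignment is the key step in spatial cross-validation: every point in a block goes to the same fold, so nearby, spatially autocorrelated points are held out together rather than split between training and test sets. The sketch below uses square blocks and random assignment of whole blocks to folds; the block size, fold count and point layout are assumptions for illustration.

```python
import math
import random


def block_ids(coords, block_size):
    """Assign each (x, y) point to a square geographic block."""
    return [(math.floor(x / block_size), math.floor(y / block_size)) for x, y in coords]


def spatial_folds(coords, block_size, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which whole blocks are held out together."""
    ids = block_ids(coords, block_size)
    blocks = sorted(set(ids))
    random.Random(seed).shuffle(blocks)
    fold_of = {blk: i % n_folds for i, blk in enumerate(blocks)}
    for k in range(n_folds):
        test = [i for i, blk in enumerate(ids) if fold_of[blk] == k]
        train = [i for i, blk in enumerate(ids) if fold_of[blk] != k]
        yield train, test


# Hypothetical 1 km x 1 km area sampled every 100 m, split into 500 m blocks.
coords = [(x * 100.0 + 50, y * 100.0 + 50) for x in range(10) for y in range(10)]
for train_idx, test_idx in spatial_folds(coords, block_size=500.0, n_folds=4):
    print(len(train_idx), len(test_idx))
```

With fewer blocks than folds some folds would be empty, so the block size needs to leave several blocks per fold.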

Where client datasets include multiple known occurrences, leave-one-out validation is also performed: each known occurrence in turn is removed from the training set, the model is retrained without it, and the test is whether the resulting prospectivity surface ranks the held-out occurrence in its top decile. Clients receive the validation results as part of the data provenance report.
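The leave-one-out check can be written as a loop over known occurrences. Here `train_and_score` is a hypothetical placeholder for retraining the full pipeline without the held-out occurrence. "Top decile" is taken to mean that at least 90% of cells score strictly lower than the held-out location, and the toy surface and occurrence indices are invented.

```python
def percentile_rank(surface, value):
    """Fraction of cells whose score is strictly below `value`."""
    return sum(s < value for s in surface) / len(surface)


def leave_one_out_top_decile(occurrences, train_and_score):
    """For each known occurrence, retrain without it and test whether it lands in the top 10%.

    `train_and_score(training_occurrences)` is a hypothetical placeholder returning
    (surface, score_at): all cell scores, and a lookup of the score at one occurrence.
    """
    results = {}
    for i, held_out in enumerate(occurrences):
        training = occurrences[:i] + occurrences[i + 1:]
        surface, score_at = train_and_score(training)
        results[held_out] = percentile_rank(surface, score_at(held_out)) >= 0.9
    return results


# Toy stand-in: a fixed 100-cell surface; occurrences are cell indices.
toy_surface = [i / 100 for i in range(100)]


def toy_train_and_score(training):
    return toy_surface, lambda occ: toy_surface[occ]


results = leave_one_out_top_decile([97, 50, 92], toy_train_and_score)
hit_rate = sum(results.values()) / len(results)
print(results, f"hit rate = {hit_rate:.2f}")
```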

We are transparent about the limitations of validation with sparse training data. In areas with fewer than three known occurrences, validation is inherently limited, and both the validation results and the uncertainty estimates should be interpreted with corresponding caution.
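A quick calculation shows why so few occurrences limit what validation can establish. Treating each leave-one-out result as an independent trial, which is itself an approximation, the Wilson score interval below gives a rough 95% confidence range for the underlying top-decile hit rate. The choice of the Wilson interval is an illustrative assumption, not something this overview specifies.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion such as a leave-one-out hit rate."""
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


for hits, n in [(2, 2), (9, 10), (45, 50)]:
    lo, hi = wilson_interval(hits, n)
    print(f"{hits}/{n} held-out occurrences in top decile -> 95% CI [{lo:.2f}, {hi:.2f}]")
```

With two of two held-out occurrences recovered, the interval runs from about 0.34 to 1.0. Even a perfect record on two occurrences is therefore consistent with a model that places deposits in the top decile only about a third of the time.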

AI prospectivity mapping does not replace geological judgment. The models identify statistically anomalous feature combinations — they do not understand geological processes. In geological settings with no analogues in the training data, model performance degrades and uncertainty is high. Results should always be reviewed by a qualified geoscientist before informing drill program decisions.

DepoDart discloses model limitations as part of every pilot deliverable. We do not claim certainty we cannot support, and we encourage clients to stress-test our outputs against their own geological knowledge of the area.

Questions about the methodology?

We are happy to discuss the technical details of our approach before any pilot commitment.
