// index

Uncertainty-Enabled Earth Observation

Methods and standards for quantifying, encoding, and propagating uncertainty through spatial environmental models and geospatial web service chains.

[01]
OGC web service architecture diagram showing WPS WCS WFS chain with uncertainty encoding annotations

Building Interoperable Geospatial Web Services with Uncertainty Metadata

The geospatial community has invested decades in open service standards (OGC’s Web Processing Service, Web Feature Service, and Web Coverage Service among them) designed to make spatial data and processing accessible across organizational and technological boundaries. These standards largely succeed at their primary goal: a client that speaks the protocol can request a dataset or trigger a computation without knowing anything about the server’s internal software stack.

What the base standards do not address is uncertainty. A WCS coverage response returns a grid of values; the protocol has no native mechanism for accompanying each cell with an estimate of how uncertain that value is, or for a WPS process to declare that its outputs carry a certain type and magnitude of uncertainty. This gap is consequential when services are chained together in a processing pipeline. Uncertainty generated at the first step is invisible to every subsequent step, producing outputs whose nominal precision is higher than the data can support.

The Service Chain Problem

Consider a representative earth observation workflow:

  1. A land-surface temperature (LST) retrieval service ingests raw thermal infrared radiance from a satellite archive and applies an atmospheric correction algorithm to produce per-pixel LST values.
  2. A spatial aggregation service averages those LST values over administrative boundaries to produce regional summary statistics.
  3. A model service ingests the regional LST statistics alongside other climate variables to project vegetation stress indices.

At each boundary, the output of one service becomes the input of the next. If the LST retrieval service produces per-pixel uncertainty estimates (derived from atmospheric correction residuals and sensor calibration uncertainty) but has no standardized way to attach them to its output coverage, those estimates are discarded at the first service boundary. The aggregation service treats all LST values as equally precise, and the vegetation stress model has no basis for expressing confidence in its projections.

This is not a theoretical problem. It describes the routine state of most operational earth observation processing chains in use today. Uncertainty information is generated (implicitly or explicitly) at the sensor level, discarded somewhere in the processing stack, and then re-estimated (often poorly) at the final stage when a quality flag or confidence band is appended to the finished product.

Encoding Uncertainty in OGC Services: The UncertWeb Approach

The UncertWeb project developed a set of uncertainty profile extensions for standard OGC web services intended to close this gap. The core design philosophy was conservative: do not invent new service types; instead extend existing standards at the data model layer so that uncertainty-aware clients can exploit the additional information while legacy clients continue to receive standard responses.

Three encoding patterns were defined for different use cases:

Inline uncertainty encoding embeds uncertainty information directly in the coverage or feature response. For a raster coverage, a companion band carries the per-pixel uncertainty estimate. The band’s semantics (whether it represents a standard deviation, the half-width of a 90% interval, or a qualitative quality class) are declared in the coverage metadata. This pattern is the most backward-compatible and the most widely deployable with existing software infrastructure.

Ensemble encoding returns a collection of equally plausible realizations of the output field rather than a single best estimate. Each realization is a complete coverage object, and the ensemble together spans the uncertainty distribution. This pattern is better suited to non-Gaussian uncertainties and to applications that need to propagate uncertainty correctly through subsequent non-linear processing steps. The cost is that payload size scales linearly with ensemble size.

Distribution parameter encoding encodes the parameters of a named probability distribution (mean and variance for a Gaussian, alpha and beta for a Beta distribution) at each location. This is compact and analytically tractable but requires choosing a distributional family in advance, which is a modeling assumption that may not hold everywhere in the domain.
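
As a concrete illustration, the sketch below shows simplified in-memory analogues of the three patterns for a small raster. The structures, field names, and values are illustrative only; the actual UncertWeb profiles define formal XML encodings, not these Python objects.

  import numpy as np

  rng = np.random.default_rng(42)
  ny, nx = 4, 5                               # a tiny raster for demonstration

  # 1. Inline encoding: a primary band plus a companion band whose semantics
  #    (here, one standard deviation) are declared in metadata.
  inline = {
      "lst_kelvin": 290.0 + rng.normal(0.0, 2.0, (ny, nx)),
      "lst_uncertainty": np.full((ny, nx), 1.5),
      "uncertainty_semantics": "standard_deviation",
  }

  # 2. Ensemble encoding: N equally plausible realizations of the full field.
  n_members = 20
  ensemble = 290.0 + rng.normal(0.0, 1.5, (n_members, ny, nx))

  # 3. Distribution parameter encoding: per-cell parameters of a named family.
  dist_params = {
      "family": "gaussian",
      "mean": ensemble.mean(axis=0),
      "variance": ensemble.var(axis=0, ddof=1),
  }

  print(inline["lst_uncertainty"].shape, ensemble.shape, dist_params["mean"].shape)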

A processing service declares which encoding patterns it can accept as input and which it produces as output in its WPS capabilities document. An orchestration layer (a workflow engine or a catalog-aware client) can then inspect these declarations and route data through a chain such that the encoding type is compatible at each step.

Capability Discovery and Automated Chain Composition

One of the more ambitious goals of the UncertWeb architecture was automated chain composition: the ability for a software agent to query a service registry, find services capable of processing uncertainty-encoded inputs, and assemble them into a valid workflow without manual configuration.

This requires two capabilities beyond what base OGC standards provide. First, services must declare their uncertainty handling behavior in machine-readable form in their capabilities documents: specifically, the type of uncertainty encoding each input parameter accepts and each output delivers. Second, a reasoning component must be able to match output types to input requirements and identify compatible compositions.
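
A minimal sketch of what such matching can look like, assuming simplified Python data structures in place of the actual XML capability annotations. Service and encoding names are invented for illustration, echoing the LST workflow above.

  # Hypothetical, simplified capability annotations (the real ones are XML
  # fragments in WPS process descriptions); encoding names are illustrative.
  from itertools import permutations

  SERVICES = {
      "lst_retrieval":    {"inputs": {"radiance": {"none"}},
                           "output": "inline_stddev"},
      "ens_converter":    {"inputs": {"field": {"inline_stddev"}},
                           "output": "ensemble"},
      "spatial_mean":     {"inputs": {"field": {"inline_stddev", "ensemble"}},
                           "output": "ensemble"},
      "veg_stress_model": {"inputs": {"regional_lst": {"ensemble"}},
                           "output": "distribution_params"},
  }

  def chain_is_valid(chain):
      """Check that each service's output encoding is accepted by the next one."""
      for upstream, downstream in zip(chain, chain[1:]):
          produced = SERVICES[upstream]["output"]
          accepted = set().union(*SERVICES[downstream]["inputs"].values())
          if produced not in accepted:
              return False
      return True

  # Brute-force search over three-service chains ending at the target model.
  candidates = [c + ("veg_stress_model",)
                for c in permutations([s for s in SERVICES if s != "veg_stress_model"], 2)]
  print([c for c in candidates if chain_is_valid(c)])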

The UncertWeb project implemented both components. Uncertainty capability annotations were added to WPS process descriptions using an agreed XML schema. A prototype orchestration service was built that could query a registry of annotated services and produce a candidate workflow graph for a user-specified processing objective. Evaluated against case studies in air quality modeling and hydrological forecasting, the prototype demonstrated that end-to-end uncertainty propagation through a three- to four-service chain was achievable within the GEOSS framework.

Practical Integration with Existing Data Infrastructure

The theoretical architecture of uncertainty-aware service chains is useful only if it can be implemented alongside (not instead of) existing operational data infrastructure. Most operational geospatial data systems are built around formats and conventions that predate the UncertWeb uncertainty encoding work: GeoTIFF files with no uncertainty companion, NetCDF archives with rudimentary quality flags, WMS services designed exclusively for visualization.

A pragmatic integration strategy proceeds incrementally. At the production stage, processing algorithms that already compute uncertainty internally (radiometric calibration routines, statistical interpolation algorithms, ensemble numerical weather prediction systems) are modified to write uncertainty outputs alongside primary outputs using an agreed encoding. NetCDF-CF’s ancillary_variables convention is a practical starting point that is already supported by major analysis toolchains including xarray, GDAL, and QGIS.
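
A minimal sketch of the pattern, assuming xarray is available; the variable and file names are illustrative, and only the CF attribute conventions themselves are standard.

  import numpy as np
  import xarray as xr

  lst = xr.DataArray(
      290.0 + np.random.randn(180, 360),
      dims=("lat", "lon"),
      name="lst",
      attrs={"standard_name": "surface_temperature",
             "units": "K",
             # Points CF-aware tools at the companion uncertainty variable.
             "ancillary_variables": "lst_uncertainty"},
  )
  lst_unc = xr.DataArray(
      np.full((180, 360), 1.5),
      dims=("lat", "lon"),
      name="lst_uncertainty",
      attrs={"standard_name": "surface_temperature standard_error",
             "units": "K"},
  )

  xr.Dataset({"lst": lst, "lst_uncertainty": lst_unc}).to_netcdf("lst_with_uncertainty.nc")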

At the service layer, WCS and WPS implementations are updated to expose the uncertainty companion fields when queried by uncertainty-aware clients. The additional capability is advertised in the capabilities document. Legacy clients receive the same primary data they always have.

At the catalog layer, ISO 19115 metadata records for datasets are extended to document the uncertainty model: which components are tracked, what distributional assumptions are made, what validation has been performed. This documentation is the minimum provenance required for a downstream service to correctly interpret an uncertainty encoding.

JavaScript Statistical Tools for Lightweight UQ in Web Applications

Not all UQ needs to happen in a heavy server-side processing pipeline. Web-based visualization and analysis tools increasingly need to compute summary statistics (confidence intervals, kernel density estimates, quantiles) on data retrieved from web services. The jStat library, developed partly in connection with the UncertWeb project, provides a comprehensive set of statistical distributions and numerical methods in pure JavaScript, enabling these computations in browser or Node.js environments without a round-trip to a server.

For lightweight UQ tasks (computing a confidence interval on a sample of ensemble members retrieved from a WPS service, or visualizing a probability density function encoded in a distribution parameter response), jStat removes the dependency on a server-side Python or R environment. This is particularly useful for client-side applications that need to display uncertainty information interactively.

Cross-References

This article connects to the broader UncertWeb topic areas on this site. For the foundational methods used to quantify uncertainty before it is encoded in service responses, see the article on quantifying uncertainty in spatial environmental models. For the architecture of the model web that these services are designed to compose within, see the article on the model web architecture for global earth observation.

[02]
Environmental monitoring station with sensors and field equipment, laptop showing earth observation spatial data layers

The Model Web Architecture for Global Earth Observation: Design, Components, and Future Directions

The Global Earth Observation System of Systems (GEOSS) was conceived as a federated infrastructure connecting the environmental monitoring assets of participating nations and organizations (satellite sensor archives, ground observation networks, numerical model outputs) through open standards that make the data findable and accessible without requiring central coordination. The “data web” layer of GEOSS addresses discovery and access: a user can find a dataset and retrieve it.

The model web extends this vision one step further. Rather than treating computation as something that happens locally after data retrieval, the model web makes computational models (atmospheric correction algorithms, hydrological simulators, land-surface models, statistical post-processors) discoverable and invokable as network services, composable into processing chains using the same standards-based interoperability that governs data access. The UncertWeb project’s contribution was to add uncertainty as a first-class property of this architecture.

Conceptual Foundations of the Model Web

The model web concept draws on two intellectual lineages. The first is the OGC web service stack (WPS, WFS, WCS), which established that spatial data processing could be exposed as standard web services with machine-readable interface descriptions. The second is the semantic web tradition, which established that machine-readable descriptions of service capabilities, including input and output data types, could support automated reasoning about how services can be composed.

In the model web, a numerical model is wrapped as a WPS process. Its inputs (forcing data, parameter files, boundary conditions) are typed using agreed vocabularies. Its outputs (state variables, derived diagnostics) are similarly typed. An orchestration layer can inspect these type declarations, query a service registry for processes whose output types match required input types, and assemble a candidate workflow graph. The user specifies a desired output; the orchestration layer finds a composition of services that can produce it.
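
The sketch below illustrates the idea with a toy registry and a backward-chaining search from the desired output type. The registry contents and type names are invented for illustration and do not correspond to a real GEOSS registry schema.

  # A toy sketch of type-driven composition: starting from a desired output type,
  # recursively find services whose outputs satisfy each unmet input type.
  REGISTRY = {
      "precip_forecast": {"needs": [],                                "makes": "precipitation_field"},
      "dem_archive":     {"needs": [],                                "makes": "dem"},
      "hydro_model":     {"needs": ["precipitation_field", "dem"],    "makes": "discharge_series"},
      "inundation_map":  {"needs": ["discharge_series"],              "makes": "inundation_probability"},
  }

  def compose(target_type):
      """Return an ordered list of services that produces target_type, or None."""
      for name, svc in REGISTRY.items():
          if svc["makes"] != target_type:
              continue
          plan = []
          for needed in svc["needs"]:
              sub = compose(needed)
              if sub is None:
                  break
              plan.extend(sub)
          else:
              return plan + [name]
      return None

  print(compose("inundation_probability"))
  # ['precip_forecast', 'dem_archive', 'hydro_model', 'inundation_map']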

This is a significant departure from conventional earth observation processing, where chains are hardwired: a fixed sequence of software tools, each configured to consume the exact output format of the preceding tool, is encoded in a workflow script maintained by a single team. Hardwired chains are brittle (any change to an upstream service’s output format breaks every downstream step) and they do not compose across organizational boundaries, because each organization’s chain uses its own internal format conventions.

Uncertainty as an Architectural Requirement

Standard model web architectures treat uncertainty as an afterthought: the service interface is designed around the primary data product, and uncertainty estimates, when they exist at all, are produced separately and documented in a companion file or quality report that travels outside the main processing chain.

The UncertWeb project argued, correctly, that this architecture systematically discards information. If an atmospheric correction service produces per-pixel uncertainty estimates but has no standardized output slot for them, those estimates are lost at the first service boundary. A downstream land-surface model that ingests the corrected surface reflectance has no basis for weighting observations by their reliability. The uncertainty information was computed, at some cost, and then thrown away.

The uncertainty-enabled model web resolves this by making uncertainty a typed property of service inputs and outputs. Each service interface declares not just the primary data type it produces (a floating-point raster coverage at a given spatial resolution) but also the uncertainty representation it attaches to that output: an inline standard-deviation companion band, an ensemble of N realizations, or a per-pixel distribution parameter set. Downstream services declare which uncertainty representation they can consume. The orchestration layer matches representations at composition time, inserting format-conversion services when necessary.

This design has a practical consequence for model developers: a model that consumes uncertainty-encoded inputs can use that information to weight its inferences, report calibrated output uncertainty, or decide where to request additional observations. These behaviors are not possible when uncertainty information is discarded at the service boundary.
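
One concrete form of such weighting is inverse-variance fusion of two uncertainty-encoded inputs, sketched below with illustrative numbers.

  import numpy as np

  obs_a, sigma_a = np.array([291.2, 288.4]), np.array([0.8, 0.8])   # e.g. satellite LST
  obs_b, sigma_b = np.array([290.1, 289.9]), np.array([2.0, 0.4])   # e.g. model background

  w_a, w_b = 1.0 / sigma_a**2, 1.0 / sigma_b**2
  fused = (w_a * obs_a + w_b * obs_b) / (w_a + w_b)
  fused_sigma = np.sqrt(1.0 / (w_a + w_b))

  print(fused, fused_sigma)   # the more precise input dominates; fused uncertainty shrinks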

Components of a Model Web Node

A model web node (a single organization’s contribution to the federated infrastructure) consists of several interacting components.

Service wrappers expose existing computational models as WPS processes. Writing a service wrapper involves defining an interface document that maps each model input and output to a typed parameter, implementing the HTTP request/response handling, and ensuring the model executable is invoked correctly on the server. Tools such as 52North’s WPS implementation framework reduce the boilerplate, but the modeling team still needs to make explicit decisions about how their model’s uncertainty outputs (if any) are encoded in the response.
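
The sketch below is framework-agnostic (it is not 52North’s API or any specific WPS library); it only illustrates the decisions a wrapper has to make explicit: typed inputs and outputs, and the uncertainty encoding attached to each. All names are illustrative.

  PROCESS_DESCRIPTION = {
      "identifier": "org.example.hydro_model",
      "inputs": {
          "precipitation": {"type": "coverage", "uncertainty_encoding": "ensemble"},
          "dem":           {"type": "coverage", "uncertainty_encoding": "none"},
      },
      "outputs": {
          "discharge": {"type": "timeseries", "uncertainty_encoding": "ensemble"},
      },
  }

  def run_model(precip_member, dem):
      # Placeholder for the real model invocation (subprocess call, file I/O, ...).
      return sum(precip_member) * 0.001

  def execute(inputs):
      """Run the wrapped model once per ensemble member and return an ensemble output."""
      members = inputs["precipitation"]          # one precipitation field per member
      return {"discharge": [run_model(m, inputs["dem"]) for m in members]}

  print(execute({"precipitation": [[1.0, 2.0], [1.5, 2.5]], "dem": None}))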

A local service registry lists the services the organization operates, their interface descriptions, and their uncertainty capability annotations. This registry is queryable by external orchestration layers. Publishing to a shared registry (such as the GEOSS Component and Service Registry) makes the services discoverable beyond the organization’s network.

Data access services expose the organization’s archived datasets as WCS or WFS endpoints. For uncertainty-enabled datasets, the archive must store uncertainty companion fields alongside primary data and serve them on request.

A workflow engine allows the organization to construct and execute multi-step processing chains internally, consuming both local and remote services. For external chains that span multiple organizations, a shared orchestration service handles cross-organization workflow coordination.

Case Studies from UncertWeb

The UncertWeb project validated the model web architecture against several real-world case studies within the GEOSS framework.

The air quality case study assembled a chain of services including an emissions inventory service, an atmospheric dispersion model, and a health impact model. Uncertainty in the emissions inventory (derived from the variability of activity data and emission factors) was propagated through the dispersion model using an ensemble approach and delivered to the health impact model as an ensemble input. The health impact model produced a probability distribution over adverse outcomes rather than a point estimate, directly informing risk communication outputs.

The hydrological case study applied the architecture to flood forecasting. An ensemble numerical weather prediction service provided probabilistic precipitation forcing. A distributed hydrological model consumed the ensemble precipitation and produced an ensemble of discharge projections. An inundation mapping service translated discharge quantiles into spatial probability-of-inundation maps. Each step preserved and propagated uncertainty through the standard encoding protocols, producing a final inundation product with explicit spatial uncertainty that could be ingested by emergency management decision systems.

Both case studies demonstrated that the architecture was technically feasible and that the uncertainty information produced by the end-to-end chain was more defensible (better calibrated against independent observations) than the single-run deterministic baselines.

Scalability and Operational Considerations

Deploying uncertainty-enabled model web chains at operational scale introduces engineering challenges beyond those of conventional deterministic processing.

Ensemble-based uncertainty propagation multiplies computational and bandwidth requirements by the ensemble size. A chain that runs in one hour deterministically requires on the order of 50 hours of compute for a 50-member ensemble unless the workload is parallelized. Cloud-native architectures with auto-scaling compute capacity are well-matched to this requirement: ensemble members are independent tasks that parallelize embarrassingly well.
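
Because members are independent, the fan-out can be expressed with nothing more than a process pool, as in the sketch below; in a cloud deployment each member would typically be a container or batch job rather than a local process, and the workload function here is a stand-in.

  from concurrent.futures import ProcessPoolExecutor

  def run_member(seed):
      """Stand-in for one deterministic chain run forced by ensemble member `seed`."""
      import random
      rng = random.Random(seed)
      return sum(rng.gauss(0.0, 1.0) for _ in range(10_000))   # placeholder workload

  if __name__ == "__main__":
      seeds = range(50)                         # a 50-member ensemble
      with ProcessPoolExecutor() as pool:
          results = list(pool.map(run_member, seeds))
      print(len(results), min(results), max(results))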

Network bandwidth is also a concern when ensemble outputs (each member being a complete spatial field) must be transferred between services operated by different organizations. Compression, subsampling, and localized pre-aggregation at the producing service reduce transfer volumes while preserving the information most relevant to downstream models.
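
A simple example of pre-aggregation is reducing the ensemble to a few per-cell quantiles before transfer, as sketched below with an illustrative grid size and synthetic values.

  import numpy as np

  # 50 members of a 500 x 500 precipitation-like field (synthetic values).
  ensemble = np.random.default_rng(0).gamma(2.0, 5.0, size=(50, 500, 500))

  # Keep only three per-cell quantiles for transfer to the downstream service.
  quantiles = np.quantile(ensemble, [0.05, 0.5, 0.95], axis=0)

  print(f"{ensemble.nbytes / 1e6:.0f} MB -> {quantiles.nbytes / 1e6:.0f} MB")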

Service availability and versioning are operational concerns in any federated architecture. When one service in a chain is upgraded and its interface changes, dependent chains break. Semantic versioning of service interfaces, combined with registries that track version histories, is the standard mitigation. The model web architecture benefits from separating the service interface from the model implementation: a new model version can be deployed behind the same interface without disrupting dependent chains, provided the output semantics are preserved.

Future Directions

The model web concept predates current cloud-native geospatial processing platforms by over a decade, but the problems it addressed remain unsolved in most operational systems. Modern platforms (commercial cloud APIs, open-source tools like Pangeo, and emerging analysis-ready data standards) have improved scalability and data access considerably. What is still missing in most of them is systematic, standards-based uncertainty propagation through multi-service chains.

Integrating the uncertainty encoding work from UncertWeb with contemporary data cube architectures and cloud-native processing frameworks is a productive direction for future development. The encoding patterns (inline companion bands, ensemble collections, distribution parameters) map cleanly onto the multi-band, multi-temporal data structures that modern earth observation platforms use.

For further detail on the uncertainty quantification methods that feed into the model web, see the article on quantifying uncertainty in spatial environmental models. For the service-layer encoding standards that make uncertainty portable across service boundaries, see the article on building interoperable geospatial web services with uncertainty metadata.

[03]
Geospatial satellite data visualization showing environmental model outputs with uncertainty bands on a GIS workstation

Quantifying Uncertainty in Spatial Environmental Models: Methods and Best Practices

Environmental models that operate over spatial domains (hydrological forecasts, air-quality simulations, land-surface temperature retrievals) carry uncertainty at every stage of their processing chain. Sensor noise, interpolation errors, model parameter estimates, and boundary condition assumptions compound in ways that are rarely communicated to downstream users. Addressing this gap requires systematic methods for uncertainty quantification (UQ) and interoperable standards for expressing that uncertainty alongside the primary data product.

Why Spatial Context Makes Uncertainty Harder

Scalar UQ problems (estimating the uncertainty on a single measured temperature, for example) are well-understood in classical statistics. Spatial UQ adds dimensions that complicate standard approaches. Observations are correlated across space, so naive independent-sample assumptions overstate the effective information content of a dataset. A grid cell’s estimated soil moisture value is not independent of its neighbors; both share common input forcing data and the same model structure.

This spatial autocorrelation must be preserved when propagating uncertainty through a model chain. Failing to account for it leads to underestimated joint uncertainties across a spatial domain: a critical failure mode when the downstream use case involves regional aggregation, such as computing total basin runoff or area-averaged surface reflectance.

Two structural choices govern how spatial uncertainty is handled in practice:

  1. Field-based representations store a full covariance or variogram model alongside the primary output. This is rigorous but expensive: for a global 0.1-degree grid of roughly 6.5 million cells, the covariance matrix has on the order of 10^13 entries, which is not tractable without approximations such as sparse precision matrices or Gaussian process approximations.

  2. Ensemble representations replace the analytical covariance with a collection of equally plausible realizations drawn from the joint distribution. Ensembles are more portable (each member is a valid spatial field that can be fed directly into a downstream model) and their summary statistics converge to the true moments as ensemble size grows.

Monte Carlo Propagation Through Model Chains

Monte Carlo (MC) propagation remains the most general method for UQ in complex model chains because it requires no analytical gradient information and handles non-linear and discontinuous models naturally. The workflow for a spatial MC experiment follows three steps.

First, define a probabilistic description of each uncertain input. For satellite-derived land-cover classifications, this might be a per-pixel confusion matrix that encodes the probability of each class. For a digital elevation model (DEM), it might be a spatially correlated Gaussian error field with a known semivariogram.

Second, draw N realizations from the joint input distribution. When inputs are spatially correlated, drawing must respect that structure: sequential Gaussian simulation or Cholesky decomposition of a local covariance approximation are standard approaches.

Third, propagate each realization through the full model chain and collect the ensemble of outputs. Summary statistics (mean, standard deviation, quantiles, probability of exceedance) can then be computed from the output ensemble at each grid cell.
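
The following compact sketch walks through the three steps on a synthetic one-dimensional transect: an exponential covariance describes DEM error, Cholesky factorization generates correlated realizations, and a simple non-linear flooding rule stands in for the model chain. The grid size, correlation length, and model are all illustrative.

  import numpy as np

  rng = np.random.default_rng(7)
  n_cells, n_real = 200, 500
  x = np.arange(n_cells) * 100.0                     # cell centres, metres

  # Step 1: spatially correlated Gaussian error, sigma = 2 m, range = 1 km.
  sigma, corr_len = 2.0, 1000.0
  cov = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)

  # Step 2: draw realizations that respect the covariance structure.
  L = np.linalg.cholesky(cov + 1e-10 * np.eye(n_cells))
  dem_true = 50.0 + 10.0 * np.sin(x / 2000.0)        # a synthetic "true" DEM
  dem_real = dem_true + (L @ rng.standard_normal((n_cells, n_real))).T

  # Step 3: propagate each realization through a non-linear "model"
  # (here, the fraction of cells lying below a 52 m water level).
  flooded_fraction = (dem_real < 52.0).mean(axis=1)

  print(f"mean={flooded_fraction.mean():.3f}, "
        f"90% interval=({np.quantile(flooded_fraction, 0.05):.3f}, "
        f"{np.quantile(flooded_fraction, 0.95):.3f})")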

The main limitation of MC propagation is computational cost. For high-fidelity atmospheric or hydrological models with run times measured in hours, generating even a modest ensemble of 100 members is prohibitive without dedicated high-performance computing resources. Surrogate modeling (replacing the expensive simulation with a fast statistical emulator trained on a small design-of-experiments sample) is the standard mitigation strategy. Polynomial chaos expansions and Gaussian process emulators are widely used in geophysical applications.
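
A minimal emulation sketch, assuming scikit-learn is available: a Gaussian process is fitted to a small design-of-experiments sample of a stand-in simulator and then queried cheaply in place of the full model. The simulator and parameter ranges are illustrative.

  import numpy as np
  from sklearn.gaussian_process import GaussianProcessRegressor
  from sklearn.gaussian_process.kernels import RBF

  def expensive_simulator(params):
      """Placeholder for the real model: peak discharge as a function of two parameters."""
      return 100.0 * params[:, 0] ** 0.6 * np.exp(-0.5 * params[:, 1])

  rng = np.random.default_rng(3)
  X_train = rng.uniform(0.1, 1.0, size=(30, 2))          # only 30 training runs
  y_train = expensive_simulator(X_train)

  emulator = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), normalize_y=True)
  emulator.fit(X_train, y_train)

  X_query = rng.uniform(0.1, 1.0, size=(10_000, 2))      # cheap Monte Carlo on the surrogate
  y_pred, y_std = emulator.predict(X_query, return_std=True)
  print(y_pred.mean(), y_std.mean())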

Interoperability and the Uncertainty-Enabled Model Web

Quantifying uncertainty within a single modeling system is necessary but not sufficient. Modern earth observation workflows are composed chains of web-accessible processing services (a sensor data archive, a land-surface model, an atmospheric correction service, a spatial aggregation step) each operated by different organizations using different software stacks. Unless uncertainty information is encoded in a machine-readable format that each service can consume and produce, the chain breaks: uncertainty generated in one service is discarded at the next service boundary.

The UncertWeb project (2010-2013), funded under the European Commission’s Seventh Framework Programme, addressed this directly. The project defined a set of uncertainty encoding profiles layered on top of existing OGC web service standards (WPS, WFS, WCS) so that a processing service could declare the uncertainty representation it accepts as input and the representation it produces as output. This enabled automated composition of model chains that preserved uncertainty end-to-end, integrated within the Global Earth Observation System of Systems (GEOSS).

Key design decisions from that work remain instructive. Uncertainty was classified by type (random, systematic, and structural) and each type was assigned a distinct encoding. A processing service that introduced systematic bias of known magnitude encoded that as a separate uncertainty component, not merged with random noise, so that downstream aggregations could handle each component appropriately.

Metadata Standards for Operational Deployment

For uncertainty information to be operationally useful, it must travel with the data through storage, catalog, and visualization systems, not only through processing pipelines. ISO 19115 geographic metadata allows uncertainty fields to be documented at the dataset level, but does not support per-observation or per-pixel uncertainty encoding. Observations and Measurements (O&M, ISO 19156) is more expressive and can represent a measurement as a probability distribution rather than a point value.

In practice, operational systems often use a simpler convention: a companion layer or band that encodes a per-pixel uncertainty estimate (commonly one standard deviation or the half-width of a 90% confidence interval) alongside the primary data variable. NetCDF-CF conventions support this pattern through the ancillary_variables attribute. When the companion layer is generated correctly (accounting for spatial correlation) this approach is tractable and widely supported by downstream visualization and analysis toolchains.

What is frequently missing from operational products is documentation of the uncertainty model: what assumptions were made about input error structure, what correlation lengths were assumed, and how structural model uncertainty was handled. Without this provenance, the companion uncertainty layer cannot be correctly interpreted in a downstream propagation step. Data producers should treat uncertainty model documentation with the same rigor as algorithm theoretical basis documents (ATBDs).

Validation Against Independent Observations

Uncertainty estimates derived from model propagation must be validated against independent observations before being treated as reliable. A well-calibrated uncertainty estimate should satisfy the property that a stated 90% confidence interval contains the true value 90% of the time when evaluated over a large independent validation sample. This is the coverage criterion.
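
The coverage criterion is straightforward to check empirically, as in the sketch below; the synthetic predictions are deliberately overconfident, so the computed coverage falls well short of the nominal 90%.

  import numpy as np
  from statistics import NormalDist

  def empirical_coverage(obs, pred_mean, pred_std, nominal=0.90):
      """Share of observations inside symmetric Gaussian intervals of the stated level."""
      z = NormalDist().inv_cdf(0.5 + nominal / 2.0)   # ~1.645 for a 90% interval
      inside = np.abs(obs - pred_mean) <= z * pred_std
      return inside.mean()

  rng = np.random.default_rng(11)
  truth = rng.normal(0.0, 1.0, 5_000)
  pred_mean = truth + rng.normal(0.0, 1.0, 5_000)   # predictions with unit error...
  pred_std = np.full(5_000, 0.5)                    # ...but stated uncertainty too small

  print(empirical_coverage(truth, pred_mean, pred_std))   # well below 0.90: overconfident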

Common validation failures in spatial environmental models include:

  • Underestimation of structural model uncertainty: tracking only input noise while ignoring the fact that the model itself is an approximation of the physical processes it represents.
  • Failure to account for representativity error: the difference in spatial support between point observations used for validation and the grid cell values being validated.
  • Temporal autocorrelation in the validation set: using validation observations drawn from a continuous time series without accounting for temporal dependence, which inflates the apparent sample size.

A rigorous validation workflow splits the independent observation set into subsets that span different climatic regimes, land cover types, and seasons, and checks coverage separately for each subset. Spatial maps of coverage failure are a diagnostic tool for identifying where the uncertainty model is most deficient.
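
A stratified check can be as simple as grouping the validation records by land-cover class, as sketched below with synthetic data in which only one class has understated uncertainty. The class labels and numbers are invented for illustration.

  import numpy as np
  import pandas as pd

  rng = np.random.default_rng(5)
  n = 6_000
  df = pd.DataFrame({
      "landcover": rng.choice(["forest", "cropland", "urban"], size=n),
      "obs": rng.normal(0.0, 1.0, n),
  })
  df["pred"] = df["obs"] + rng.normal(0.0, 1.0, n)
  # Stated uncertainty is deliberately too small for urban cells only.
  df["sigma"] = np.where(df["landcover"] == "urban", 0.4, 1.0)

  z90 = 1.6449
  df["inside"] = (df["obs"] - df["pred"]).abs() <= z90 * df["sigma"]
  print(df.groupby("landcover")["inside"].mean())   # urban coverage falls well short of 0.90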

Connecting UQ to Decision-Making

Uncertainty quantification is not an end in itself. Its value is realized when decision systems use it correctly: propagating it into risk assessments, using it to trigger additional data collection when uncertainty exceeds a threshold, or presenting it to analysts in forms that support rather than overwhelm interpretation.

For further reading on interoperability standards developed during the UncertWeb project, see the archived UncertWeb project resources and the associated publications on OGC uncertainty encoding profiles. Related topics on this site include building interoperable geospatial web services with uncertainty metadata and the model web architecture for global earth observation.