
DepthAnythingv2Pipe

DepthAnythingv2Pipe is registered as depth-anything-v2-pipe and implemented in src/autopipeline/components/modules/depth_anything_v2_pipe.py.

This module computes structural similarity in depth space rather than RGB space. It is useful when geometric consistency is more important than appearance similarity.

Overview

Registry Entry

| Field | Value |
| --- | --- |
| Registry key | depth-anything-v2-pipe |
| Class | DepthAnythingv2Pipe |
| Main mixins | DepthAnythingv2Mixin, SSIMMixin, MaskProcessor |
| Return type | float |


Constructor

DepthAnythingv2Pipe(**kwargs)

Supported init kwargs

| Key | Required | Meaning |
| --- | --- | --- |
| model_path | Yes in practice | Hugging Face depth-estimation checkpoint path. |
| device | No | Torch device for depth inference. |


Public Methods

| Method | Purpose |
| --- | --- |
| _normalize_depth_map(depth_map) | Rescale a raw depth map into uint8 [0, 255]. |
| calc_depth_ssim(...) | Predict depth for both images and compute masked SSIM in depth space. |
| __call__(...) | Dispatch to the depth_ssim branch. |


Call Signature

DepthAnythingv2Pipe.__call__(
    ref_image: Image.Image,
    edited_image: Image.Image,
    coords: List[Tuple[int, int, int, int]] = None,
    mask_mode: str = None,
    metric: str = "depth_ssim",
    **kwargs,
)


Runtime Inputs

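| Argument | Required | Meaning |
| --- | --- | --- |
| ref_image | Yes | Reference image. |
| edited_image | Yes | Edited image. |
| coords | No | Region boxes used for masking. |
| mask_mode | No | inner or outer, normally derived from pipeline scope. |
| metric | No | Defaults to depth_ssim, which is currently the only supported value. |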
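
To make the interaction between coords and mask_mode concrete, here is a minimal sketch of box-based masking. It is not the MaskProcessor implementation: the helper name build_box_mask, the (x1, y1, x2, y2) box layout with exclusive right and bottom edges, and the reading of inner as "keep pixels inside the boxes" and outer as "keep pixels outside them" are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels, x2 and y2 exclusive.
Box = Tuple[int, int, int, int]


def build_box_mask(
    height: int,
    width: int,
    coords: Optional[List[Box]] = None,
    mask_mode: Optional[str] = None,
) -> List[List[bool]]:
    """Hypothetical helper: True marks pixels that take part in scoring."""
    # Assumption: without boxes or a mode, the whole image is scored.
    if not coords or mask_mode is None:
        return [[True] * width for _ in range(height)]
    if mask_mode not in ("inner", "outer"):
        raise ValueError(f"Unsupported mask_mode: {mask_mode!r}")

    inside = [[False] * width for _ in range(height)]
    for x1, y1, x2, y2 in coords:
        # Clamp each box to the image bounds before filling it.
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                inside[y][x] = True

    if mask_mode == "inner":
        return inside
    # "outer" is assumed to be the complement of the boxed region.
    return [[not value for value in row] for row in inside]
```

The real module may use a different box convention or soften mask edges, so treat this only as a mental model for what the two modes select.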

Extra runtime kwargs

| Key | Default | Meaning |
| --- | --- | --- |
| resize_depth_maps | True | Resize predicted depth back to original image size before scoring. |
| win_size | 7 | SSIM window size. |
| win_sigma | 1.5 | SSIM window sigma. |
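
For intuition about win_size and win_sigma, the sketch below builds a normalized 1D Gaussian window of the kind commonly used in SSIM implementations. Whether SSIMMixin builds its window exactly this way is an assumption, and gaussian_window is a hypothetical name.

```python
import math
from typing import List


def gaussian_window(win_size: int = 7, win_sigma: float = 1.5) -> List[float]:
    """Return win_size Gaussian weights centered on the middle tap, summing to 1."""
    half = win_size // 2
    weights = [
        math.exp(-((i - half) ** 2) / (2 * win_sigma ** 2))
        for i in range(win_size)
    ]
    total = sum(weights)
    # Normalize so that filtering with the window preserves the local mean.
    return [w / total for w in weights]
```

A larger win_size widens the neighborhood each local statistic is computed over, while a larger win_sigma flattens the weights so that distant pixels count more.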

Supported Metric

| Metric | What it measures | Better direction |
| --- | --- | --- |
| depth_ssim | Structural similarity between predicted depth maps | Higher is better |

Execution flow

calc_depth_ssim(...) performs the following steps:

  1. predict depth maps for the reference and edited images
  2. optionally resize them back to original image size
  3. min-max normalize each depth map into uint8
  4. build a mask with MaskProcessor
  5. reduce the mask to one channel if necessary
  6. compute SSIM on single-channel depth tensors
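
Step 6 can be illustrated with a deliberately simplified SSIM that uses global statistics over the masked pixels, rather than the windowed computation controlled by win_size and win_sigma. The function name global_masked_ssim and the nested-list inputs are assumptions for this sketch; the constants 0.01 and 0.03 are the conventional SSIM stabilizers.

```python
import math
from typing import List, Optional

Grid = List[List[float]]


def global_masked_ssim(
    x: Grid,
    y: Grid,
    mask: Optional[List[List[bool]]] = None,
    data_range: float = 255.0,
) -> float:
    """Simplified SSIM over masked pixels of two single-channel maps."""
    xs: List[float] = []
    ys: List[float] = []
    for r, (row_x, row_y) in enumerate(zip(x, y)):
        for c, (px, py) in enumerate(zip(row_x, row_y)):
            if mask is None or mask[r][c]:
                xs.append(float(px))
                ys.append(float(py))
    n = len(xs)
    if n == 0:
        # Nothing selected by the mask: the score is undefined.
        return math.nan

    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((v - mu_x) ** 2 for v in xs) / n
    var_y = sum((v - mu_y) ** 2 for v in ys) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / n

    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical depth maps score 1.0 under this formula, and the score drops as the two maps diverge in mean, contrast, or structure.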

Return Value

The pipe returns a single float score.

As with other SSIM-backed paths:

  • higher is better
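  • SSIMMixin.compute(...) may replace a NaN result with the sentinel value -1e8, so a very large negative score indicates a failed computation rather than a genuinely dissimilar pair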
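
Because the NaN fallback yields an ordinary float, downstream code may want to filter it out before aggregating scores. The following is a minimal sketch of that pattern; sanitize_score and is_fallback are hypothetical helpers, and only the value -1e8 comes from the description above.

```python
import math

# Sentinel described above for NaN results; the helper names are illustrative.
NAN_FALLBACK = -1e8


def sanitize_score(score: float) -> float:
    """Mirror the described behavior: map NaN to the sentinel value."""
    return NAN_FALLBACK if math.isnan(score) else float(score)


def is_fallback(score: float) -> bool:
    """Caller-side check so sentinel scores can be excluded from averages."""
    return score == NAN_FALLBACK
```

Averaging a batch that contains even one sentinel will dominate the mean, so dropping or separately reporting those samples is usually the safer choice.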

Minimal Config Example

metric_configs:
  depth_ssim:
    pipe_name: depth-anything-v2-pipe
    default_config: ${pipes_default.depth-anything-v2-pipe}
    init_config:
      scope: edit_area
    runtime_params:
      win_size: 11
      win_sigma: 1.5
      resize_depth_maps: true


Failure Semantics

The module raises a ValueError for unsupported metrics.

Other important preconditions are implicit:

  • model_path must be valid at initialization
  • the predicted depth map should have non-degenerate range for stable normalization

The current _normalize_depth_map(...) implementation does not explicitly guard against a zero depth range, that is, a constant depth map whose maximum equals its minimum. Such degenerate inputs therefore rely on the fallback behavior inherited from SSIMMixin.
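
One way such a guard could look is sketched below. This is not the module's code: normalize_depth_map_guarded is a hypothetical function, it works on nested lists to stay dependency-free, and returning an all-zero map for a constant input is just one possible policy.

```python
from typing import List


def normalize_depth_map_guarded(depth_map: List[List[float]]) -> List[List[int]]:
    """Min-max rescale a depth map to integers in [0, 255], guarding zero range."""
    flat = [value for row in depth_map for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # Constant map: avoid dividing by zero and return a flat map instead.
        return [[0 for _ in row] for row in depth_map]
    # Rounding (rather than truncating) is a choice made for this sketch.
    return [
        [int(round(255 * (value - lo) / span)) for value in row]
        for row in depth_map
    ]
```

With a guard like this, two constant depth maps normalize to identical flat maps instead of producing undefined values that only the SSIM fallback can absorb.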


Extension Notes

  • Extend this pipe when the comparison remains "predict depth, then compare structure."
  • If you want a different depth model but the same scoring logic, the change belongs mostly in DepthAnythingv2Mixin.
  • If you want a different depth-space metric altogether, add a new metric branch and document the score direction clearly.
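
As a rough illustration of the last point, the sketch below shows a metric registry with one additional depth-space metric and the same ValueError behavior for unknown names. Everything here is hypothetical: the registry, the dispatch function, and the depth_mae_similarity metric are not part of the module, and the real __call__ may be structured differently.

```python
from typing import Callable, Dict, Sequence

MetricFn = Callable[[Sequence[float], Sequence[float]], float]


def depth_mae_similarity(ref: Sequence[float], edited: Sequence[float]) -> float:
    """Hypothetical metric: 1 - mean absolute difference / 255. Higher is better."""
    if len(ref) != len(edited) or not ref:
        raise ValueError("Depth maps must be non-empty and the same length")
    mean_abs_diff = sum(abs(a - b) for a, b in zip(ref, edited)) / len(ref)
    return 1.0 - mean_abs_diff / 255.0


# Each entry is one metric branch; document the score direction next to it.
# The existing depth_ssim branch would be registered here in the same way.
METRICS: Dict[str, MetricFn] = {
    "depth_mae_similarity": depth_mae_similarity,  # higher is better
}


def dispatch(metric: str, ref: Sequence[float], edited: Sequence[float]) -> float:
    """Route to the requested branch, rejecting unsupported metric names."""
    try:
        metric_fn = METRICS[metric]
    except KeyError:
        raise ValueError(f"Unsupported metric: {metric!r}") from None
    return metric_fn(ref, edited)
```

Keeping the score direction in a comment or docstring beside each branch makes it harder for a lower-is-better metric to be aggregated as if it were a similarity.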