DepthAnythingv2Pipe
DepthAnythingv2Pipe is registered as depth-anything-v2-pipe and implemented in src/autopipeline/components/modules/depth_anything_v2_pipe.py.
This module computes structural similarity in depth space rather than RGB space. It is useful when geometric consistency is more important than appearance similarity.
Class
Overview
Registry Entry
| Field | Value |
|---|---|
| Registry key | depth-anything-v2-pipe |
| Class | DepthAnythingv2Pipe |
| Main mixins | DepthAnythingv2Mixin, SSIMMixin, MaskProcessor |
| Return type | float |
Constructor
Constructor
DepthAnythingv2Pipe(**kwargs)
Supported init kwargs
| Key | Required | Meaning |
|---|---|---|
model_path | Yes in practice | Hugging Face depth-estimation checkpoint path. |
device | No | Torch device for depth inference. |
Methods
Public Methods
| Method | Purpose |
|---|---|
_normalize_depth_map(depth_map) | Rescale a raw depth map into uint8 [0, 255]. |
calc_depth_ssim(...) | Predict depth for both images and compute masked SSIM in depth space. |
__call__(...) | Dispatch to the depth_ssim branch. |
Signature
Call Signature
DepthAnythingv2Pipe.__call__(
ref_image: Image.Image,
edited_image: Image.Image,
coords: List[Tuple[int, int, int, int]] = None,
mask_mode: str = None,
metric: str = "depth_ssim",
**kwargs,
)
Input / Output
Runtime Inputs
| Argument | Required | Meaning |
|---|---|---|
ref_image | Yes | Reference image. |
edited_image | Yes | Edited image. |
coords | No | Region boxes used for masking. |
mask_mode | No | inner or outer, normally derived from pipeline scope. |
metric | Yes | Currently only depth_ssim. |
Extra runtime kwargs
| Key | Default | Meaning |
|---|---|---|
resize_depth_maps | True | Resize predicted depth back to original image size before scoring. |
win_size | 7 | SSIM window size. |
win_sigma | 1.5 | SSIM window sigma. |
Supported Metric
| Metric | What it measures | Better direction |
|---|---|---|
depth_ssim | structural similarity between predicted depth maps | higher is better |
Execution flow
calc_depth_ssim(...) performs the following steps:
- predict depth maps for the reference and edited images
- optionally resize them back to original image size
- min-max normalize each depth map into
uint8 - build a mask with
MaskProcessor - reduce the mask to one channel if necessary
- compute SSIM on single-channel depth tensors
Input / Output
Return Value
The pipe returns a single float score.
As with other SSIM-backed paths:
- higher is better
SSIMMixin.compute(...)may degradeNaNto-1e8
Config
Minimal Config Example
metric_configs:
depth_ssim:
pipe_name: depth-anything-v2-pipe
default_config: ${pipes_default.depth-anything-v2-pipe}
init_config:
scope: edit_area
runtime_params:
win_size: 11
win_sigma: 1.5
resize_depth_maps: true
Failure Mode
Failure Semantics
The module raises a ValueError for unsupported metrics.
Other important preconditions are implicit:
model_pathmust be valid at initialization- the predicted depth map should have non-degenerate range for stable normalization
The current _normalize_depth_map(...) implementation does not explicitly guard against a zero depth range, so extremely degenerate inputs rely on inherited SSIM fallback behavior.
Extension
Extension Notes
- Extend this pipe when the comparison remains "predict depth, then compare structure."
- If you want a different depth model but the same scoring logic, the change belongs mostly in
DepthAnythingv2Mixin. - If you want a different depth-space metric altogether, add a new metric branch and document the score direction clearly.