Visual Consistency Mixins
visual_consistency.py contains the reusable visual-metric backends for AutoPipeline. It is implemented in src/autopipeline/components/primitives/visual_consistency.py.
This file is the shared backend layer for:
- SSIM-based metrics
- LPIPS
- SAM-backed object extraction
- depth prediction
Primitive Group
Included Classes
| Class | Main role | Typical consumers |
|---|---|---|
SSIMMixin | masked or unmasked structural similarity | ssim-pipe, depth-anything-v2-pipe |
LPIPSMixin | perceptual similarity | lpips-pipe |
SAMSegmentationMixin | bbox-prompted SAM2 extraction | sam-pipe, clip-pipe, dino-v3-pipe |
DepthAnythingv2Mixin | depth-map prediction | depth-anything-v2-pipe |
Mixin
SSIMMixin
Public Methods
| Method | Purpose |
|---|---|
compute(ref_img_tensor, edited_img_tensor, mask=None, win_size=7, win_sigma=1.5) | Wrapper around the low-level SSIM implementation. |
Input / Output Contract
compute(
ref_img_tensor: torch.Tensor,
edited_img_tensor: torch.Tensor,
mask=None,
win_size=7,
win_sigma=1.5,
) -> float
Behavior
The method calls the local ssim(...) utility with:
data_range=255size_average=True
If the returned tensor is NaN, it replaces it with -1e8.
Mixin
LPIPSMixin
Class Signature
LPIPSMixin(**kwargs)
Constructor Parameters
| Key | Default | Meaning |
|---|---|---|
device | auto | Torch device for LPIPS. |
net | alex | LPIPS backbone name. |
Public Methods
| Method | Purpose |
|---|---|
compute(ref_img_tensor, edited_img_tensor) | Compute LPIPS after normalizing tensors to [-1, 1]. |
Input / Output Contract
Inputs are expected as image tensors in pixel scale [0, 255]. The method converts them internally to the LPIPS input range and returns one float score.
Mixin
SAMSegmentationMixin
Class Signature
SAMSegmentationMixin(**kwargs)
Constructor Parameters
| Key | Required | Meaning |
|---|---|---|
model_cfg | Yes in practice | SAM2 config path. |
model_path | Yes in practice | SAM2 checkpoint path. |
device | No | Torch device for inference. |
Public Methods
| Method | Purpose |
|---|---|
get_best_mask_in_bbox(image_np, bbox) | Run SAM and choose the highest-scoring mask for one bbox. |
crop_and_isolate_subject(image_np, bbox, mask, bg_color=(255,255,255)) | Replace background, then crop to bbox. |
extract_object_by_coord(image_np, coord, bg_color=(255,255,255)) | End-to-end bbox-to-isolated-crop helper. |
Input / Output Contract
get_best_mask_in_bbox(...)
- input:
- RGB numpy image
- bbox as numpy array
- output:
- best SAM mask
Noneif image setup fails
crop_and_isolate_subject(...)
- input:
- image array
- bbox
- binary mask
- output:
- cropped isolated subject image
extract_object_by_coord(...)
- output:
- cropped isolated object
Noneif mask extraction fails
Mixin
DepthAnythingv2Mixin
Class Signature
DepthAnythingv2Mixin(**kwargs)
Constructor Parameters
| Key | Required | Meaning |
|---|---|---|
model_path | Yes in practice | Depth-estimation checkpoint path. |
device | No | Torch device for inference. |
Public Methods
| Method | Purpose |
|---|---|
get_depth_map(image, resize_to_original=True) | Predict a dense depth map from a PIL image. |
Input / Output Contract
Returns a numpy depth map, optionally resized to the original image size.
Minimal Config Examples
SAM-backed module init
init_config:
sam:
model_cfg: ${user_config.model_paths.sam_cfg}
model_path: ${user_config.model_paths.sam_path}
Depth-backed module init
init_config:
model_path: ${user_config.model_paths.depth_anything_v2_path}
LPIPS-backed module init
init_config:
net: alex
Failure Semantics
Important behavior across this file:
SSIMMixinconvertsNaNto-1e8SAMSegmentationMixin.get_best_mask_in_bbox(...)returnsNoneifset_image(...)failsSAMSegmentationMixin.extract_object_by_coord(...)returnsNonewhen mask extraction fails- LPIPS and depth mixins do not add custom recovery logic around model loading or inference
Extension Notes
- Use this file for reusable backend logic, not metric-specific scoring semantics.
- If several modules need the same model-loading or preprocessing change, make that change here.
- If only one metric formula changes, keep the edit in the module layer instead.