MaskProcessor

MaskProcessor is the bbox-to-mask utility implemented in src/autopipeline/components/primitives/mask_processor.py.

It is one of the framework's most important reusable helpers: multiple modules depend on it for a single coordinate convention and a single definition of region polarity.

Utility
Overview

Class Role

MaskProcessor exists to convert:

  • pixel-space bbox lists

into:

  • full-resolution masks
  • resized masks
  • patch-level masks

in the tensor layout required by each downstream metric backend.

Methods

Public Methods

| Method | Purpose |
| make_mask(...) | Build a mask at the original image resolution. |
| make_resized_mask(...) | Rescale coordinates first, then build a mask at target resolution. |
| create_patch_mask_from_mask_2d(...) | Downsample a high-resolution binary mask into patch-space. |

make_mask(...)

make_mask(
    image_h: int,
    image_w: int,
    coords: list,
    *,
    return_format: str,
    mode: str = None,
)

Parameters

| Parameter | Required | Meaning |
| image_h | Yes | Source image height. |
| image_w | Yes | Source image width. |
| coords | Yes when mode is not None | List of (x1, y1, x2, y2) boxes. |
| return_format | Yes | 2d_numpy, 3d_numpy, or 4d_tensor. |
| mode | No | Region polarity. If None, the method returns None. |

Supported return formats

| return_format | Output shape | Typical consumer |
| 2d_numpy | (H, W) | CLIP / DINO patch masking |
| 3d_numpy | (3, H, W) | LPIPS |
| 4d_tensor | (1, 3, H, W) boolean tensor | SSIM-style computation |
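
The three layouts can be pictured as views of the same 2D mask. The sketch below is illustrative only: expand_mask_layout is a hypothetical helper, not part of MaskProcessor; it assumes the mask is repeated across the three channels, and it uses a NumPy array as a stand-in for the 4D boolean tensor.

```python
import numpy as np


def expand_mask_layout(mask_2d: np.ndarray, return_format: str) -> np.ndarray:
    """Hypothetical helper: expand an (H, W) mask into the documented layouts."""
    if return_format == "2d_numpy":
        return mask_2d  # (H, W)
    if return_format == "3d_numpy":
        # Assumption: the same mask is repeated across 3 channels -> (3, H, W)
        return np.repeat(mask_2d[None, :, :], 3, axis=0)
    if return_format == "4d_tensor":
        # NumPy stand-in for the boolean tensor layout -> (1, 3, H, W)
        return np.repeat(mask_2d[None, None, :, :], 3, axis=1).astype(bool)
    # Mirrors the documented behavior for unsupported formats
    raise ValueError(f"unsupported return_format: {return_format!r}")


mask = np.zeros((4, 6), dtype=np.uint8)
shapes = {fmt: expand_mask_layout(mask, fmt).shape
          for fmt in ("2d_numpy", "3d_numpy", "4d_tensor")}
```

Here shapes maps each format name to (4, 6), (3, 4, 6), and (1, 3, 4, 6) respectively.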

Region polarity

The implementation treats mode as:

  • outer: initialize to 1, fill bbox region with 0
  • anything else non-None: initialize to 0, fill bbox region with 1

In practice the pipeline uses:

  • edit_area -> inner
  • unedit_area -> outer
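
The polarity rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the actual implementation: sketch_mask is a hypothetical name, and the half-open slicing (x2 and y2 excluded) is an assumption about how box edges are treated.

```python
import numpy as np


def sketch_mask(image_h, image_w, coords, mode):
    """Hypothetical 2D-only sketch of the inner/outer polarity rule."""
    if mode is None:
        return None  # documented behavior: no mode, no mask
    # "outer": background 1, boxes 0; any other non-None mode: background 0, boxes 1
    background, box_value = (1, 0) if mode == "outer" else (0, 1)
    mask = np.full((image_h, image_w), background, dtype=np.uint8)
    for x1, y1, x2, y2 in coords:
        # Assumption: boxes are half-open, so x2 and y2 are excluded
        mask[y1:y2, x1:x2] = box_value
    return mask


boxes = [(1, 1, 3, 3)]
inner = sketch_mask(4, 4, boxes, "inner")  # 4 active pixels inside the box
outer = sketch_mask(4, 4, boxes, "outer")  # 12 active pixels outside the box
```

For any box list, the inner and outer masks are complements: every pixel is active in exactly one of them.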

make_resized_mask(...)

make_resized_mask(
    image_h: int,
    image_w: int,
    coords: list,
    *,
    return_format: str,
    mode: str = "outer",
    target_h: int,
    target_w: int,
)

What it does

This method:

  1. rescales each bbox from original image coordinates to target resolution
  2. delegates to make_mask(...)

It is the standard path for patch-based backbones that operate on resized inputs.
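
The rescaling step amounts to multiplying x coordinates by target_w / image_w and y coordinates by target_h / image_h. The sketch below shows that arithmetic; rescale_boxes is a hypothetical helper, and rounding to the nearest integer is an assumption, since the rounding policy of the real method is not documented here.

```python
def rescale_boxes(coords, image_h, image_w, target_h, target_w):
    """Hypothetical sketch: map boxes from source pixels to target pixels."""
    scaled = []
    for x1, y1, x2, y2 in coords:
        scaled.append((
            # Assumption: round to nearest; the real method may floor or ceil
            int(round(x1 * target_w / image_w)),
            int(round(y1 * target_h / image_h)),
            int(round(x2 * target_w / image_w)),
            int(round(y2 * target_h / image_h)),
        ))
    return scaled


# A 600x400 (width x height) image resized to 224x224
resized = rescale_boxes([(150, 100, 450, 300)], image_h=400, image_w=600,
                        target_h=224, target_w=224)
# resized == [(56, 56, 168, 168)]
```

The rescaled boxes would then be passed to make_mask(...) with target_h and target_w as the image size.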

create_patch_mask_from_mask_2d(...)

create_patch_mask_from_mask_2d(
    mask_2d: np.ndarray,
    patch_size: int,
    threshold: float = 0.5,
) -> np.ndarray

Parameters

| Parameter | Meaning |
| mask_2d | High-resolution binary mask. |
| patch_size | Patch width and height for the target backbone. |
| threshold | Fraction of active pixels required to mark a patch as active. |

Output

Returns a low-resolution patch mask with shape:

(H // patch_size, W // patch_size)

and values 0 or 1.
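
One way to picture the downsampling is a block-wise mean followed by a threshold. The sketch below is an approximation under stated assumptions: patch_mask_sketch is a hypothetical name, and comparing with >= (rather than >) is a guess about how the boundary case is handled.

```python
import numpy as np


def patch_mask_sketch(mask_2d: np.ndarray, patch_size: int,
                      threshold: float = 0.5) -> np.ndarray:
    """Hypothetical sketch: mark a patch active when enough of its pixels are."""
    h, w = mask_2d.shape
    ph, pw = h // patch_size, w // patch_size
    # Split into (ph, patch_size, pw, patch_size) blocks; reshape raises
    # ValueError if H or W is not divisible by patch_size
    blocks = mask_2d.reshape(ph, patch_size, pw, patch_size)
    active_fraction = blocks.mean(axis=(1, 3))
    # Assumption: a patch exactly at the threshold counts as active
    return (active_fraction >= threshold).astype(np.uint8)


mask = np.zeros((4, 4), dtype=np.uint8)
mask[0:2, 0:2] = 1   # top-left patch fully active
mask[3, 3] = 1       # bottom-right patch 25% active

strict = patch_mask_sketch(mask, patch_size=2, threshold=0.5)
loose = patch_mask_sketch(mask, patch_size=2, threshold=0.1)
```

With threshold 0.5 only the fully covered patch is active; with 0.1 the 25%-covered patch is active as well, which is why a low threshold keeps patches that a box only partially touches.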

Config

Minimal Integration Example

MaskProcessor is usually used implicitly inside modules, but the effective runtime contract looks like:

metric_configs:
  emd:
    pipe_name: clip-pipe
    scope: unedit_area
    runtime_params:
      patch_mask_threshold: 0.1

At runtime:

  • scope becomes mask_mode
  • coords come from parser-grounder
  • MaskProcessor converts them into the correct mask layout
Failure Mode

Failure Semantics

Important behavior to know:

  • mode is None -> returns None
  • unsupported return_format -> raises ValueError
  • create_patch_mask_from_mask_2d(...) assumes the mask height and width are divisible by patch_size

That last assumption is not explicitly validated in the current implementation.
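
Because the divisibility assumption is not checked, callers may want to guard against it themselves. The following is a minimal sketch of such a guard; check_patch_divisible is a hypothetical helper and not part of MaskProcessor.

```python
def check_patch_divisible(mask_shape, patch_size):
    """Hypothetical caller-side guard for the divisibility assumption."""
    h, w = mask_shape
    if h % patch_size != 0 or w % patch_size != 0:
        raise ValueError(
            f"mask shape {(h, w)} is not divisible by patch_size={patch_size}"
        )


check_patch_divisible((224, 224), 14)      # passes silently: 224 = 16 * 14

try:
    check_patch_divisible((225, 224), 14)  # 225 is not a multiple of 14
    guard_raised = False
except ValueError:
    guard_raised = True
```

Calling the guard before create_patch_mask_from_mask_2d(...) turns a silent shape mismatch into an explicit error.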

Extension

Extension Notes

  • If you change coordinate semantics, update this file first and document the change everywhere else.
  • Reuse this helper instead of hand-writing bbox masking in each module.
  • Keep inner and outer semantics stable. They are effectively part of the framework contract.