MaskProcessor
`MaskProcessor` is the bbox-to-mask utility implemented in `src/autopipeline/components/primitives/mask_processor.py`.
It is a core reusable helper: multiple modules depend on it for one shared coordinate convention and one shared set of region polarity semantics.
Class Role
MaskProcessor exists to convert:
- pixel-space bbox lists
into:
- full-resolution masks
- resized masks
- patch-level masks
in the tensor layout required by each downstream metric backend.
Public Methods
| Method | Purpose |
|---|---|
| `make_mask(...)` | Build a mask at the original image resolution. |
| `make_resized_mask(...)` | Rescale coordinates first, then build a mask at target resolution. |
| `create_patch_mask_from_mask_2d(...)` | Downsample a high-resolution binary mask into patch-space. |
make_mask(...)

    make_mask(
        image_h: int,
        image_w: int,
        coords: list,
        *,
        return_format: str,
        mode: str = None,
    )

Parameters
| Parameter | Required | Meaning |
|---|---|---|
| `image_h` | Yes | Source image height. |
| `image_w` | Yes | Source image width. |
| `coords` | Yes when `mode` is not `None` | List of `(x1, y1, x2, y2)` boxes. |
| `return_format` | Yes | `2d_numpy`, `3d_numpy`, or `4d_tensor`. |
| `mode` | No | Region polarity. If `None`, the method returns `None`. |
Supported return formats
| `return_format` | Output shape | Typical consumer |
|---|---|---|
| `2d_numpy` | `(H, W)` | CLIP / DINO patch masking |
| `3d_numpy` | `(3, H, W)` | LPIPS |
| `4d_tensor` | `(1, 3, H, W)` boolean tensor | SSIM-style computation |
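The three layouts can be illustrated with a minimal NumPy sketch. `sketch_expand_mask` is a hypothetical helper, not the method's actual code. It assumes the single-channel mask is repeated across three channels, and it uses a NumPy boolean array as a stand-in for the boolean tensor so the example has no extra dependencies.

```python
import numpy as np


def sketch_expand_mask(mask_2d, return_format):
    """Hypothetical helper: lay out a (H, W) mask in the documented shapes."""
    if return_format == "2d_numpy":
        return mask_2d
    # Assumption: the single-channel mask is repeated across 3 channels.
    mask_3d = np.repeat(mask_2d[None, :, :], 3, axis=0)
    if return_format == "3d_numpy":
        return mask_3d
    if return_format == "4d_tensor":
        # Stand-in only: the documented output is a boolean tensor, while
        # this sketch returns a boolean NumPy array of the same shape.
        return mask_3d[None, ...].astype(bool)
    raise ValueError(f"unsupported return_format: {return_format!r}")


base = np.zeros((8, 10), dtype=np.uint8)
shapes = {
    fmt: sketch_expand_mask(base, fmt).shape
    for fmt in ("2d_numpy", "3d_numpy", "4d_tensor")
}
```

Here `shapes` maps each format name to the shape listed in the table for an 8 x 10 image.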
Region polarity
The implementation treats mode as:
- `outer`: initialize to `1`, fill bbox region with `0`
- anything else non-`None`: initialize to `0`, fill bbox region with `1`
In practice the pipeline uses:
- `edit_area` -> `inner`
- `unedit_area` -> `outer`
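The polarity rules above can be sketched for the 2D case as follows. `sketch_mask_2d` is a hypothetical stand-in, not the real `make_mask(...)`. It assumes boxes are half-open pixel ranges clipped to the image bounds, which the source does not specify.

```python
import numpy as np


def sketch_mask_2d(image_h, image_w, coords, mode):
    """Hypothetical stand-in for the 2D case of make_mask(...)."""
    if mode is None:
        return None
    # "outer" starts active everywhere and clears the boxes;
    # any other non-None mode starts empty and fills the boxes.
    background, fill = (1, 0) if mode == "outer" else (0, 1)
    mask = np.full((image_h, image_w), background, dtype=np.uint8)
    for x1, y1, x2, y2 in coords:
        # Assumption: boxes are half-open ranges [x1, x2) x [y1, y2),
        # clipped to the image bounds.
        x1, x2 = max(0, int(x1)), min(image_w, int(x2))
        y1, y2 = max(0, int(y1)), min(image_h, int(y2))
        mask[y1:y2, x1:x2] = fill
    return mask


inner = sketch_mask_2d(4, 6, [(1, 1, 3, 3)], mode="inner")
outer = sketch_mask_2d(4, 6, [(1, 1, 3, 3)], mode="outer")
```

Under these assumptions `inner` and `outer` are exact complements, which is why a single bbox list can serve both `edit_area` and `unedit_area` metrics.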
make_resized_mask(...)

    make_resized_mask(
        image_h: int,
        image_w: int,
        coords: list,
        *,
        return_format: str,
        mode: str = "outer",
        target_h: int,
        target_w: int,
    )

What it does
This method:
- rescales each bbox from original image coordinates to target resolution
- delegates to `make_mask(...)`
It is the standard path for patch-based backbones that operate on resized inputs.
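The rescaling step can be sketched in plain Python. `sketch_rescale_boxes` is a hypothetical helper. Rounding to the nearest pixel is an assumption, and the real method may floor, ceil, or clip differently.

```python
def sketch_rescale_boxes(coords, image_h, image_w, target_h, target_w):
    """Hypothetical helper: map boxes from source pixels to target pixels."""
    scale_x = target_w / image_w
    scale_y = target_h / image_h
    rescaled = []
    for x1, y1, x2, y2 in coords:
        # Assumption: round each coordinate to the nearest integer pixel.
        rescaled.append((
            round(x1 * scale_x),
            round(y1 * scale_y),
            round(x2 * scale_x),
            round(y2 * scale_y),
        ))
    return rescaled


# A 512 x 512 source image resized to a 256 x 256 backbone input.
boxes = sketch_rescale_boxes([(100, 50, 300, 200)], 512, 512, 256, 256)
```

The rescaled boxes would then be passed to `make_mask(...)` with `target_h` and `target_w` as the image size, matching the delegation described above.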
create_patch_mask_from_mask_2d(...)

    create_patch_mask_from_mask_2d(
        mask_2d: np.ndarray,
        patch_size: int,
        threshold: float = 0.5,
    ) -> np.ndarray

Parameters
| Parameter | Meaning |
|---|---|
| `mask_2d` | High-resolution binary mask. |
| `patch_size` | Patch width and height for the target backbone. |
| `threshold` | Fraction of active pixels required to mark a patch as active. |
Output
Returns a low-resolution patch mask with shape `(H // patch_size, W // patch_size)` and values `0` or `1`.
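One way to picture the downsampling is a reshape followed by a per-patch mean. `sketch_patch_mask` is a hypothetical stand-in, not the real `create_patch_mask_from_mask_2d(...)`. It assumes a patch counts as active when its active fraction is greater than or equal to `threshold`, while the real comparison may be strict.

```python
import numpy as np


def sketch_patch_mask(mask_2d, patch_size, threshold=0.5):
    """Hypothetical stand-in for create_patch_mask_from_mask_2d(...)."""
    h, w = mask_2d.shape
    rows, cols = h // patch_size, w // patch_size
    # Split the mask into (rows, patch, cols, patch) blocks; this reshape
    # only works when h and w are exact multiples of patch_size.
    blocks = mask_2d.reshape(rows, patch_size, cols, patch_size)
    active_fraction = blocks.astype(np.float32).mean(axis=(1, 3))
    # Assumption: ">=" comparison against the threshold.
    return (active_fraction >= threshold).astype(np.uint8)


mask = np.zeros((4, 4), dtype=np.uint8)
mask[0:2, 0:2] = 1  # top-left patch: all 4 pixels active
mask[0, 2] = 1      # top-right patch: 1 of 4 pixels active
strict = sketch_patch_mask(mask, patch_size=2, threshold=0.5)
loose = sketch_patch_mask(mask, patch_size=2, threshold=0.1)
```

With `threshold=0.5` only the fully covered patch is active. Lowering it to `0.1` also activates the patch with a single active pixel, which shows how a low threshold widens the masked region.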
Minimal Integration Example
MaskProcessor is usually used implicitly inside modules, but the effective runtime contract looks like:

    metric_configs:
      emd:
        pipe_name: clip-pipe
        scope: unedit_area
        runtime_params:
          patch_mask_threshold: 0.1

At runtime:
- `scope` becomes `mask_mode`
- `coords` come from parser-grounder
- `MaskProcessor` converts them into the correct mask layout
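The first step, turning a configured `scope` into a mask mode, can be sketched as a lookup. `SCOPE_TO_MODE` and `sketch_resolve_mask_mode` are hypothetical names. Only the two pairs documented earlier are taken from this page, and treating any other scope as "no mask" is an assumption.

```python
# Hypothetical mapping mirroring the documented scope -> mode pairs.
SCOPE_TO_MODE = {"edit_area": "inner", "unedit_area": "outer"}


def sketch_resolve_mask_mode(scope):
    """Return the mask mode for a scope, or None when no mask applies."""
    # Assumption: scopes outside the mapping mean "no mask".
    return SCOPE_TO_MODE.get(scope)


# Mirrors the `emd` entry from the config above.
metric_config = {"scope": "unedit_area"}
mask_mode = sketch_resolve_mask_mode(metric_config["scope"])
```

For the `emd` entry this resolves to `outer`, so the bbox regions are zeroed out and the metric is computed only over the unedited area.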
Failure Semantics
Important behavior to know:
- `mode is None` -> returns `None`
- unsupported `return_format` -> raises `ValueError`
- `create_patch_mask_from_mask_2d(...)` assumes the mask height and width are divisible by `patch_size`
That last assumption is not explicitly validated in the current implementation.
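Callers that cannot guarantee divisibility can check it themselves before downsampling. The guard below is a hypothetical sketch, not part of `MaskProcessor`.

```python
import numpy as np


def sketch_check_patch_divisible(mask_2d, patch_size):
    """Hypothetical guard to run before patch downsampling."""
    if mask_2d.ndim != 2:
        raise ValueError(f"expected a 2D mask, got shape {mask_2d.shape}")
    h, w = mask_2d.shape
    if patch_size <= 0 or h % patch_size or w % patch_size:
        raise ValueError(
            f"mask shape {(h, w)} is not divisible by patch_size={patch_size}"
        )
    # Patch grid size that the downsampled mask should have.
    return h // patch_size, w // patch_size


grid = sketch_check_patch_divisible(np.zeros((224, 224)), patch_size=16)
```

Failing early with a clear message is usually easier to debug than a reshape error or a silently truncated patch grid further downstream.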
Extension Notes
- If you change coordinate semantics, update this file first and document the change everywhere else.
- Reuse this helper instead of hand-writing bbox masking in each module.
- Keep `inner` and `outer` semantics stable. They are effectively part of the framework contract.