MaskProcessor

MaskProcessor is the bbox-to-mask utility implemented in src/autopipeline/components/primitives/mask_processor.py.

It is one of the most important reusable helpers in the framework because multiple modules depend on the same coordinate convention and the same region polarity semantics.

Utility

Overview

Class Role

MaskProcessor exists to convert:

pixel-space bbox lists

into:

full-resolution masks
resized masks
patch-level masks

in the tensor layout required by each downstream metric backend.

Methods

Public Methods

| Method | Purpose |
| --- | --- |
| `make_mask(...)` | Build a mask at the original image resolution. |
| `make_resized_mask(...)` | Rescale coordinates first, then build a mask at target resolution. |
| `create_patch_mask_from_mask_2d(...)` | Downsample a high-resolution binary mask into patch-space. |

Methods

`make_mask(...)`

make_mask(
    image_h: int,
    image_w: int,
    coords: list,
    *,
    return_format: str,
    mode: str = None,
)

Parameters

| Parameter | Required | Meaning |
| --- | --- | --- |
| `image_h` | Yes | Source image height. |
| `image_w` | Yes | Source image width. |
| `coords` | Yes when `mode` is not `None` | List of `(x1, y1, x2, y2)` boxes. |
| `return_format` | Yes | `2d_numpy`, `3d_numpy`, or `4d_tensor`. |
| `mode` | No | Region polarity. If `None`, the method returns `None`. |

Supported return formats

| `return_format` | Output shape | Typical consumer |
| --- | --- | --- |
| `2d_numpy` | `(H, W)` | CLIP / DINO patch masking |
| `3d_numpy` | `(3, H, W)` | LPIPS |
| `4d_tensor` | `(1, 3, H, W)` boolean tensor | SSIM-style computation |

Region polarity

The implementation treats `mode` as follows:

`outer`: initialize the mask to 1, then fill each bbox region with 0
any other non-`None` value: initialize the mask to 0, then fill each bbox region with 1

In practice the pipeline uses:

edit_area -> inner
unedit_area -> outer
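The polarity rule can be illustrated with a small NumPy-only sketch. This is not the `MaskProcessor` source: `sketch_bbox_mask` is a hypothetical stand-in, and the `uint8` dtype, the exclusive `x2`/`y2` edges, and the clipping to image bounds are all assumptions made for the example.

```python
import numpy as np


def sketch_bbox_mask(image_h, image_w, coords, mode):
    """Hypothetical 2D stand-in for the polarity rule described above."""
    if mode is None:
        return None  # mirrors the documented "mode is None -> None" behavior
    if mode == "outer":
        mask = np.ones((image_h, image_w), dtype=np.uint8)
        fill = 0
    else:
        mask = np.zeros((image_h, image_w), dtype=np.uint8)
        fill = 1
    for x1, y1, x2, y2 in coords:
        # Assumption: x2/y2 are exclusive and boxes are clipped to the image.
        x1, x2 = max(0, int(x1)), min(image_w, int(x2))
        y1, y2 = max(0, int(y1)), min(image_h, int(y2))
        mask[y1:y2, x1:x2] = fill
    return mask


box = [(1, 1, 3, 3)]
inner = sketch_bbox_mask(4, 6, box, "inner")  # 1 inside the box
outer = sketch_bbox_mask(4, 6, box, "outer")  # 1 outside the box

# Repeating the 2D mask over three channels gives a (3, H, W) layout,
# matching the shape listed for `3d_numpy`.
stacked = np.repeat(inner[None, :, :], 3, axis=0)
```

For the same boxes, the two polarities are complements of each other, which is why `edit_area` and `unedit_area` can share one bbox list.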

Methods

`make_resized_mask(...)`

make_resized_mask(
    image_h: int,
    image_w: int,
    coords: list,
    *,
    return_format: str,
    mode: str = "outer",
    target_h: int,
    target_w: int,
)

What it does

This method:

rescales each bbox from original image coordinates to target resolution
delegates to make_mask(...)

It is the standard path for patch-based backbones that operate on resized inputs.
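The rescaling step can be sketched as below. `sketch_rescale_boxes` is a hypothetical helper, and rounding to the nearest integer is an assumption: the actual method may floor, ceil, or keep floats.

```python
def sketch_rescale_boxes(coords, image_h, image_w, target_h, target_w):
    """Map (x1, y1, x2, y2) boxes from source to target resolution."""
    scale_x = target_w / image_w
    scale_y = target_h / image_h
    return [
        (
            round(x1 * scale_x),
            round(y1 * scale_y),
            round(x2 * scale_x),
            round(y2 * scale_y),
        )
        for x1, y1, x2, y2 in coords
    ]


# A 448 x 896 (H x W) image resized to 224 x 224:
# x is scaled by 0.25 and y by 0.5.
rescaled = sketch_rescale_boxes([(100, 200, 300, 400)], 448, 896, 224, 224)
```

Note that the x and y scale factors differ whenever the resize changes the aspect ratio, so each axis has to be scaled independently.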

Methods

`create_patch_mask_from_mask_2d(...)`

create_patch_mask_from_mask_2d(
    mask_2d: np.ndarray,
    patch_size: int,
    threshold: float = 0.5,
) -> np.ndarray

Parameters

| Parameter | Meaning |
| --- | --- |
| `mask_2d` | High-resolution binary mask. |
| `patch_size` | Patch width and height for the target backbone. |
| `threshold` | Fraction of active pixels required to mark a patch as active. |

Output

Returns a low-resolution patch mask with shape:

(H // patch_size, W // patch_size)

and values 0 or 1.
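One common way to get this behavior is a reshape followed by a per-patch mean. The sketch below uses that approach under stated assumptions: `sketch_patch_mask` is a hypothetical name, and the `>=` comparison against `threshold` is a guess (the real method may use a strict `>`).

```python
import numpy as np


def sketch_patch_mask(mask_2d, patch_size, threshold=0.5):
    """Downsample a binary (H, W) mask to (H // p, W // p) patch space."""
    h, w = mask_2d.shape
    ph, pw = h // patch_size, w // patch_size
    # reshape raises ValueError if H or W is not divisible by patch_size.
    blocks = mask_2d.reshape(ph, patch_size, pw, patch_size)
    active_fraction = blocks.mean(axis=(1, 3))
    return (active_fraction >= threshold).astype(np.uint8)


mask = np.zeros((4, 4), dtype=np.uint8)
mask[0:2, 0:2] = 1  # top-left patch fully active
mask[0, 3] = 1      # top-right patch: 1 of 4 pixels active

strict = sketch_patch_mask(mask, patch_size=2, threshold=0.5)
loose = sketch_patch_mask(mask, patch_size=2, threshold=0.25)
```

Lowering `threshold` marks partially covered patches as active, so in this example the top-right patch is active only in `loose`.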

Config

Minimal Integration Example

MaskProcessor is usually used implicitly inside modules, but the effective runtime contract looks like:

metric_configs:
  emd:
    pipe_name: clip-pipe
    scope: unedit_area
    runtime_params:
      patch_mask_threshold: 0.1

At runtime:

scope becomes mask_mode
coords come from parser-grounder
MaskProcessor converts them into the correct mask layout
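The config-to-runtime mapping can be pictured with the sketch below. Everything here is illustrative: `SCOPE_TO_MODE` and `resolve_mask_settings` are hypothetical names, and it is an assumption that `patch_mask_threshold` feeds the `threshold` argument of `create_patch_mask_from_mask_2d(...)`.

```python
# Hypothetical mapping from the config `scope` value to a polarity string,
# following the edit_area -> inner / unedit_area -> outer convention.
SCOPE_TO_MODE = {"edit_area": "inner", "unedit_area": "outer"}

# The `emd` entry from the YAML above, as a plain dict.
metric_config = {
    "pipe_name": "clip-pipe",
    "scope": "unedit_area",
    "runtime_params": {"patch_mask_threshold": 0.1},
}


def resolve_mask_settings(cfg):
    """Return (mask_mode, patch_threshold) for one metric config entry."""
    # A missing or unknown scope yields None, i.e. no mask is built.
    mask_mode = SCOPE_TO_MODE.get(cfg.get("scope"))
    # Fall back to 0.5, the default in the documented signature.
    runtime_params = cfg.get("runtime_params", {})
    patch_threshold = runtime_params.get("patch_mask_threshold", 0.5)
    return mask_mode, patch_threshold


mask_mode, patch_threshold = resolve_mask_settings(metric_config)
```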

Failure Mode

Failure Semantics

Important behavior to know:

mode is None -> returns None
unsupported return_format -> raises ValueError
create_patch_mask_from_mask_2d(...) assumes the mask height and width are divisible by patch_size

That last assumption is not explicitly validated in the current implementation.
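Callers that cannot guarantee divisibility could add a guard of their own before building a patch mask. The helper below is a hypothetical sketch of such a check and is not part of `MaskProcessor`.

```python
def check_patch_divisible(height, width, patch_size):
    """Raise ValueError unless both dimensions are multiples of patch_size."""
    if patch_size <= 0:
        raise ValueError(f"patch_size must be positive, got {patch_size}")
    if height % patch_size or width % patch_size:
        raise ValueError(
            f"mask shape ({height}, {width}) is not divisible "
            f"by patch_size={patch_size}"
        )
    # Patch-grid shape that the downsampled mask would have.
    return height // patch_size, width // patch_size


grid = check_patch_divisible(224, 224, 14)

try:
    check_patch_divisible(225, 224, 14)
    rejected = False
except ValueError:
    rejected = True
```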

Extension

Extension Notes

If you change coordinate semantics, update this file first and document the change everywhere else.
Reuse this helper instead of hand-writing bbox masking in each module.
Keep inner and outer semantics stable. They are effectively part of the framework contract.
