Modules Overview
Modules are the executable units registered in PIPE_REGISTRY. They are the objects referenced by pipeline configs through pipe_name, and they are the components that actually compute scores, winners, or parsed region artifacts.
Runtime contract
At the pipeline layer, a module is selected through metric_configs or parser_grounder_config:
    metric_configs:
      lpips:
        pipe_name: lpips-pipe
        init_config:
          model_path: ...
        scope: unedit_area
        runtime_params:
          ...
The pipeline runtime then:
- resolves pipe_name
- instantiates the module with init_config
- maps scope to mask_mode
- calls the module with the metric name and runtime_params
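The four steps above can be sketched roughly as follows. This is a minimal illustration rather than the actual BasePipeline code: the registry contents, the DummyLpipsPipe class, the run_metric helper, and the call signature `module(metric_name, mask_mode=..., **runtime_params)` are all assumptions made for the example.

```python
# Hypothetical stand-ins: the real PIPE_REGISTRY and module classes live in the source tree.
PIPE_REGISTRY = {}


class DummyLpipsPipe:
    """Placeholder module; it only echoes its inputs instead of computing a score."""

    def __init__(self, model_path=None):
        self.model_path = model_path

    def __call__(self, metric_name, mask_mode=None, **runtime_params):
        return {"metric": metric_name, "mask_mode": mask_mode, "params": runtime_params}


PIPE_REGISTRY["lpips-pipe"] = DummyLpipsPipe

# Scope-to-mask mapping as documented for BasePipeline.
SCOPE_TO_MASK_MODE = {"edit_area": "inner", "unedit_area": "outer"}


def run_metric(metric_name, cfg):
    """Assumed flow for one metric_configs entry."""
    module_cls = PIPE_REGISTRY[cfg["pipe_name"]]           # 1. resolve pipe_name
    module = module_cls(**cfg.get("init_config", {}))      # 2. instantiate with init_config
    mask_mode = SCOPE_TO_MASK_MODE.get(cfg.get("scope"))   # 3. map scope to mask_mode
    runtime_params = cfg.get("runtime_params", {})
    return module(metric_name, mask_mode=mask_mode, **runtime_params)  # 4. call the module


result = run_metric(
    "lpips",
    {
        "pipe_name": "lpips-pipe",
        "init_config": {"model_path": "placeholder/path"},  # illustrative value only
        "scope": "unedit_area",
        "runtime_params": {},
    },
)
print(result)
```

An unknown pipe_name raises KeyError in this sketch; whether the real runtime fails hard or skips the metric is not specified here.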
Scope and masking
BasePipeline translates scopes into mask behavior:
- edit_area -> mask_mode="inner"
- unedit_area -> mask_mode="outer"
This matters for modules that consume full images plus coords, especially lpips-pipe, ssim-pipe, clip-pipe, dino-v3-pipe, and depth-anything-v2-pipe.
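To make the inner/outer distinction concrete, here is a small sketch of how a bounding box could be turned into a keep-mask. It assumes coords are `(x0, y0, x1, y1)` pixel boxes with exclusive upper bounds, that "inner" keeps pixels inside the box, and that "outer" keeps everything else. The build_mask helper is hypothetical, and the real modules may represent masks differently.

```python
def build_mask(height, width, coords, mask_mode):
    """Return a height x width grid of booleans; True marks pixels kept for scoring."""
    if mask_mode not in ("inner", "outer"):
        raise ValueError(f"unknown mask_mode: {mask_mode!r}")
    x0, y0, x1, y1 = coords  # assumed layout, upper bounds exclusive
    keep_inside = mask_mode == "inner"
    return [
        [(x0 <= x < x1 and y0 <= y < y1) == keep_inside for x in range(width)]
        for y in range(height)
    ]


# A 4x4 image with a 2x2 edit region in the middle.
inner = build_mask(4, 4, (1, 1, 3, 3), "inner")
outer = build_mask(4, 4, (1, 1, 3, 3), "outer")

print(sum(map(sum, inner)))  # pixels kept for edit_area
print(sum(map(sum, outer)))  # pixels kept for unedit_area
```

Under these assumptions the two masks are exact complements, so a metric scoped to unedit_area never sees pixels inside the edited box.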
Registered module catalog
| Registry key | Implementation | Role | Typical metric or output |
|---|---|---|---|
| parser-grounder | parser_grounder.py | Parse edit instructions and ground regions | parsed object dict + bbox list |
| pairwise-judge | judge.py | Compare two candidate images with an LLM or VLM | Image A, Image B, Tie, Failed |
| viescore | judge.py | Score one or two edited images with a client-backed prompt | single score or pairwise winner |
| clip-pipe | clip_pipe.py | CLIP-based semantic similarity | emd, sam_clip_cls_sim |
| dino-v3-pipe | dino_pipe.py | DINO-based structure or object similarity | dinov3_structure_similarity, sam_dino_cls_sim |
| lpips-pipe | lpips_pipe.py | Perceptual similarity | lpips |
| ssim-pipe | ssim_pipe.py | Structural similarity | ssim, L_channel_ssim |
| sam-pipe | sam_pipe.py | Region mask overlap inside a bbox | iou |
| depth-anything-v2-pipe | depth_anything_v2_pipe.py | Depth-map similarity | depth_ssim |
| face-geometry-pipe | face_pipe.py | Facial landmark geometry consistency | L2_distance |
| face-texture-pipe | face_pipe.py | Facial texture or color consistency | high_frequency_diff, color_similarity, energy_ratio |
| face-identity-pipe | face_pipe.py | Face identity preservation | face_ID_sim, bg_faceID_sim, max_match_face_ID_sim |
| hair-consistency-pipe | hair_pipe.py | Hair-region consistency | color_distance, texture_energy_diff, high_frequency_diff |
| body-pose-and-shape-pipe | human_body.py | Body pose and silhouette consistency | body_shape_iou, body_pose_position_error |
| body-appearance-pipe | human_body.py | Body appearance similarity | body_appearance_dino_cosine_sim |
Important design caveats
The module interface is intentionally flexible, not perfectly uniform:
- some modules return float
- some return dict
- parser-grounder returns a tuple
- many modules return None or 0.0 on soft failure
The pipelines already tolerate sparse or partially missing scores, so extension code should generally follow the same soft-failure pattern (return None or 0.0 rather than raise) unless a failure must be fatal.
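Caller code that consumes these mixed return types may find it convenient to flatten them into one score dict. The helper below is a sketch under assumptions, not part of the codebase: normalize_result is a hypothetical name, and it treats None as "no score" while passing 0.0 through, since a soft-failure 0.0 cannot be told apart from a genuine zero. The tuple returned by parser-grounder is not a score and is deliberately rejected here.

```python
def normalize_result(metric_name, result):
    """Flatten a module return value into {score_name: float}; empty on soft failure."""
    if result is None:
        return {}  # soft failure: contribute no score
    if isinstance(result, (int, float)):
        return {metric_name: float(result)}
    if isinstance(result, dict):
        # Keep only numeric entries so partially missing scores are simply skipped.
        return {k: float(v) for k, v in result.items() if isinstance(v, (int, float))}
    raise TypeError(f"unsupported result type for {metric_name}: {type(result).__name__}")


scores = {}
raw_results = [
    ("lpips", 0.12),                                               # float-returning module
    ("face_id", {"face_ID_sim": 0.9, "bg_faceID_sim": None}),      # dict with a missing entry
    ("ssim", None),                                                # soft failure
]
for name, raw in raw_results:
    scores.update(normalize_result(name, raw))

print(scores)  # {'lpips': 0.12, 'face_ID_sim': 0.9}
```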
How to extend this layer safely
- Reuse an existing module if your need is only a new task-specific config.
- Add a new module when you need a new executable metric or judge behavior.
- Import the new module in components/modules/__init__.py, otherwise it will not be registered.
- Document the module's expected input contract, because not all modules consume the same input shape.
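The import requirement suggests that registration happens as an import-time side effect. One common way to implement that is a decorator that writes into the registry, sketched below. The register_pipe decorator, the MyMetricPipe class, its threshold parameter, and its call signature are all hypothetical; check the existing modules for the actual registration mechanism before copying this pattern.

```python
# Stand-in for the real registry, which is defined in the source tree.
PIPE_REGISTRY = {}


def register_pipe(name):
    """Hypothetical decorator: adds the class to PIPE_REGISTRY when its file is imported."""
    def decorator(cls):
        if name in PIPE_REGISTRY:
            raise KeyError(f"duplicate pipe name: {name}")
        PIPE_REGISTRY[name] = cls
        return cls
    return decorator


@register_pipe("my-metric-pipe")
class MyMetricPipe:
    """Example module.

    Input contract (documented for callers): a source image, an edited image,
    and optional coords; returns a float, or None on soft failure.
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def __call__(self, metric_name, source=None, edited=None, coords=None, **runtime_params):
        if source is None or edited is None:
            return None  # soft failure, matching the pattern described above
        return 0.0  # placeholder score; a real module would compute something here


print(sorted(PIPE_REGISTRY))
```

Because the decorator body runs only when the defining file is imported, a module that is never imported from components/modules/__init__.py would never appear in the registry under this pattern, and any config naming it through pipe_name would fail to resolve.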
Source-aligned module pages
The module docs are now split to stay closer to the source tree: