SAMPromptEncoder
class keras_hub.layers.SAMPromptEncoder(
    hidden_size=256,
    image_embedding_size=(64, 64),
    input_image_size=(1024, 1024),
    mask_in_channels=16,
    activation="gelu",
    **kwargs
)
Prompt Encoder for the Segment Anything Model (SAM).
The prompt encoder generates encodings for three types of prompts:
- Point prompts: Points on the image, each with a label indicating whether the point is in the foreground (part of the mask) or in the background (not part of the mask).
- Box prompts: A batch of bounding boxes with format [(x1, y1), (x2, y2)], used to determine the location of the masks in the image.
- Masks: An input mask can be passed to refine the positional embeddings for the output mask.
First, the point prompts and box prompts are concatenated, and positional encodings are generated using random spatial frequencies. A point is represented as the sum of a positional encoding of the point's location and one of two learned embeddings that indicate whether the point is in the foreground or the background. A box is represented by a pair of embeddings: (1) the positional encoding of its top-left corner summed with a learned embedding representing "top-left corner", and (2) the positional encoding of its bottom-right corner summed with a learned embedding representing "bottom-right corner". The box and point encodings are referred to as "sparse encodings".

If a mask prompt is passed, a convolutional neural net downscales it to generate "dense encodings". If no mask prompt is passed, an embedding layer is used instead to generate a "no mask" embedding.
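The sparse-encoding scheme described above can be illustrated with a small sketch. This is not the layer's implementation: it uses plain Python lists instead of tensors, random vectors stand in for the learned embeddings, and the helper names (`make_frequencies`, `positional_encoding`, `encode_point`, `encode_box`), the normalization of coordinates to [-1, 1], the sine/cosine feature layout, and the label convention (1 for foreground, 0 for background) are all assumptions made for illustration. Dense (mask) encodings are not covered.

```python
import math
import random


def make_frequencies(num_feats, rng, scale=1.0):
    """Sample a 2 x num_feats matrix of random (Gaussian) spatial frequencies."""
    return [[rng.gauss(0.0, scale) for _ in range(num_feats)] for _ in range(2)]


def positional_encoding(x, y, image_size, freqs):
    """Encode a pixel location as random Fourier features [sin..., cos...]."""
    height, width = image_size
    # Normalize pixel coordinates to [-1, 1] (an assumption of this sketch).
    u = 2.0 * (x / width) - 1.0
    v = 2.0 * (y / height) - 1.0
    proj = [2.0 * math.pi * (u * fx + v * fy) for fx, fy in zip(freqs[0], freqs[1])]
    return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def encode_point(x, y, label, image_size, freqs, label_embeddings):
    """Point = positional encoding + foreground/background embedding."""
    return add(positional_encoding(x, y, image_size, freqs), label_embeddings[label])


def encode_box(box, image_size, freqs, corner_embeddings):
    """Box = pair of corner encodings, each with its own corner embedding."""
    (x1, y1), (x2, y2) = box
    top_left = add(positional_encoding(x1, y1, image_size, freqs),
                   corner_embeddings["top_left"])
    bottom_right = add(positional_encoding(x2, y2, image_size, freqs),
                       corner_embeddings["bottom_right"])
    return [top_left, bottom_right]


rng = random.Random(0)
hidden_size = 8            # tiny for readability; the layer defaults to 256
image_size = (1024, 1024)  # (height, width)
freqs = make_frequencies(hidden_size // 2, rng)


def random_vector():
    """Stand-in for a learned embedding vector (hypothetical values)."""
    return [rng.gauss(0.0, 1.0) for _ in range(hidden_size)]


label_embeddings = {0: random_vector(), 1: random_vector()}  # background, foreground
corner_embeddings = {"top_left": random_vector(), "bottom_right": random_vector()}

point_encoding = encode_point(512, 256, 1, image_size, freqs, label_embeddings)
box_encoding = encode_box([(100, 200), (400, 600)], image_size, freqs, corner_embeddings)

# One point plus two box corners gives three sparse encodings of size hidden_size.
sparse_encodings = [point_encoding] + box_encoding
print(len(sparse_encodings), len(sparse_encodings[0]))  # prints "3 8"
```

Each prompt location contributes one vector of length `hidden_size`, so a single box always adds two entries to the sparse encodings while a point adds one.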
Arguments
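- hidden_size: Defaults to 256.
- image_embedding_size: Defaults to (64, 64).
- input_image_size: Defaults to (1024, 1024).
- mask_in_channels: Defaults to 16.
- activation: Defaults to "gelu".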