SigLIPImageConverter

[source]

SigLIPImageConverter class

keras_hub.layers.SigLIPImageConverter(
    image_size=None,
    scale=None,
    offset=None,
    crop_to_aspect_ratio=True,
    pad_to_aspect_ratio=False,
    interpolation="bilinear",
    antialias=False,
    bounding_box_format="yxyx",
    data_format=None,
    **kwargs
)

Preprocess raw images into model-ready inputs.

This class converts raw images to model-ready inputs. The conversion proceeds in the following steps:

  1. Resize the image to image_size. If image_size is None, this step will be skipped.
  2. Rescale the image by multiplying by scale, which can be either global or per channel. If scale is None, this step will be skipped.
  3. Offset the image by adding offset, which can be either global or per channel. If offset is None, this step will be skipped.

The layer takes as input a raw image tensor in channels-last or channels-first format, and outputs a preprocessed image input for modeling. The input tensor can be batched (rank 4) or unbatched (rank 3).

This layer can be used with the from_preset() constructor to load a layer that will rescale and resize an image for a specific pretrained model. Using the layer this way allows writing preprocessing code that does not need updating when switching between model checkpoints.

Arguments

  • image_size: (int, int) tuple or None. The output size of the image, not including the channels axis. If None, the input will not be resized.
  • scale: float, tuple of floats, or None. The scale to apply to the inputs. If scale is a single float, the entire input will be multiplied by scale. If scale is a tuple, it's assumed to contain per-channel scale values to be multiplied against each channel of the input images. If scale is None, no scaling is applied.
  • offset: float, tuple of floats, or None. The offset to apply to the inputs. If offset is a single float, the entire input will be summed with offset. If offset is a tuple, it's assumed to contain per-channel offset values to be added to each channel of the input images. If offset is None, no offset is applied.
  • crop_to_aspect_ratio: If True, resize the images without aspect ratio distortion. When the original aspect ratio differs from the target aspect ratio, the output image will be cropped so as to return the largest possible window in the image (of size (height, width)) that matches the target aspect ratio. If False, aspect ratio may not be preserved. Defaults to True.
  • pad_to_aspect_ratio: If True, pad the images without aspect ratio distortion. When the original aspect ratio differs from the target aspect ratio, the output image will be evenly padded on the short side. Defaults to False.
  • interpolation: String, the interpolation method. Supports "bilinear", "nearest", "bicubic", "lanczos3", "lanczos5". Defaults to "bilinear".
  • antialias: Whether to use an antialiasing filter when downsampling an image. Defaults to False.
  • bounding_box_format: A string specifying the format of the bounding boxes, one of "xyxy", "rel_xyxy", "xywh", "center_xywh", "yxyx", "rel_yxyx". Bounding boxes in this format will be resized to image_size along with the image. To pass bounding boxes to this layer, pass a dict with keys "images" and "bounding_boxes" when calling the layer (see the sketch after this list).
  • data_format: String, either "channels_last" or "channels_first". The ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with shape (batch, height, width, channels) while "channels_first" corresponds to inputs with shape (batch, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "channels_last".
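
A rough sketch of the dict-calling convention described for bounding_box_format above. The outer "images" and "bounding_boxes" keys come from the description; the inner "boxes"/"labels" layout of bounding_boxes is an assumption borrowed from the Keras preprocessing-layer convention and may need adjusting for your setup.

import keras_hub
import numpy as np

converter = keras_hub.layers.SigLIPImageConverter(
    image_size=(224, 224),
    scale=1.0 / 255,
    bounding_box_format="yxyx",
)
# One image with two boxes in "yxyx" (y_min, x_min, y_max, x_max) pixel coordinates.
inputs = {
    "images": np.random.randint(0, 256, size=(1, 512, 512, 3)).astype("float32"),
    "bounding_boxes": {  # assumed inner structure
        "boxes": np.array([[[16.0, 16.0, 128.0, 128.0], [64.0, 64.0, 256.0, 256.0]]]),
        "labels": np.array([[0, 1]]),
    },
}
outputs = converter(inputs)  # Boxes are rescaled along with the resized image.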

Examples

import keras_hub
import numpy as np

# Resize raw images and scale them to [0, 1].
converter = keras_hub.layers.ImageConverter(
    image_size=(128, 128),
    scale=1.0 / 255,
)
converter(np.random.randint(0, 256, size=(2, 512, 512, 3)))

# Resize images to the specific size needed for a PaliGemma preset.
converter = keras_hub.layers.ImageConverter.from_preset(
    "pali_gemma_3b_224"
)
converter(np.random.randint(0, 256, size=(2, 512, 512, 3)))
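
A sketch of the per-channel scale and offset arguments, mapping uint8 inputs into roughly [-1, 1]. The normalization constants here are illustrative assumptions, not a specific checkpoint's values; from_preset() below sets the correct values for a given pretrained model.

# Per-channel rescaling and offsetting (illustrative values).
converter = keras_hub.layers.SigLIPImageConverter(
    image_size=(224, 224),
    scale=(1.0 / 127.5, 1.0 / 127.5, 1.0 / 127.5),  # multiply each channel
    offset=(-1.0, -1.0, -1.0),  # then add a per-channel offset
)
converter(np.random.randint(0, 256, size=(2, 512, 512, 3)))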

[source]

from_preset method

SigLIPImageConverter.from_preset(preset, **kwargs)

Instantiate a keras_hub.layers.ImageConverter from a model preset.

A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:

  1. a built-in preset identifier like 'pali_gemma_3b_224'
  2. a Kaggle Models handle like 'kaggle://user/paligemma/keras/pali_gemma_3b_224'
  3. a Hugging Face handle like 'hf://user/pali_gemma_3b_224'
  4. a path to a local preset directory like './pali_gemma_3b_224'

You can run cls.presets.keys() to list all built-in presets available on the class.
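
For example, a minimal sketch of listing the built-in presets registered for this converter class (the exact contents depend on your keras_hub version):

import keras_hub

# Print the identifiers of all built-in SigLIP converter presets.
print(keras_hub.layers.SigLIPImageConverter.presets.keys())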

Arguments

  • preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
  • load_weights: bool. If True, the weights will be loaded into the model architecture. If False, the weights will be randomly initialized.

Examples

import keras_hub
import numpy as np

batch = np.random.randint(0, 256, size=(2, 512, 512, 3))

# Resize images for `"pali_gemma_3b_224"`.
converter = keras_hub.layers.ImageConverter.from_preset(
    "pali_gemma_3b_224"
)
converter(batch)  # Output shape (2, 224, 224, 3)

# Resize images for `"pali_gemma_3b_448"` without cropping.
converter = keras_hub.layers.ImageConverter.from_preset(
    "pali_gemma_3b_448",
    crop_to_aspect_ratio=False,
)
converter(batch)  # Output shape (2, 448, 448, 3)
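
The same pattern applies to the SigLIP presets listed below. A minimal sketch using the siglip_base_patch16_224 preset from the table (its image size of 224 determines the output shape):

# Resize and rescale images for the `"siglip_base_patch16_224"` preset.
converter = keras_hub.layers.SigLIPImageConverter.from_preset(
    "siglip_base_patch16_224"
)
converter(batch)  # Output shape (2, 224, 224, 3)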

Preset | Parameters | Description
siglip_base_patch16_224 | 203.16M | 200 million parameter, image size 224, pre-trained on WebLi.
siglip_base_patch16_256 | 203.20M | 200 million parameter, image size 256, pre-trained on WebLi.
siglip_base_patch16_384 | 203.45M | 200 million parameter, image size 384, pre-trained on WebLi.
siglip_base_patch16_512 | 203.79M | 200 million parameter, image size 512, pre-trained on WebLi.
siglip_base_patch16_256_multilingual | 370.63M | 370 million parameter, image size 256, pre-trained on WebLi.
siglip2_base_patch16_224 | 375.19M | 375 million parameter, patch size 16, image size 224, pre-trained on WebLi.
siglip2_base_patch16_256 | 375.23M | 375 million parameter, patch size 16, image size 256, pre-trained on WebLi.
siglip2_base_patch32_256 | 376.86M | 376 million parameter, patch size 32, image size 256, pre-trained on WebLi.
siglip2_base_patch16_384 | 376.86M | 376 million parameter, patch size 16, image size 384, pre-trained on WebLi.
siglip_large_patch16_256 | 652.15M | 652 million parameter, image size 256, pre-trained on WebLi.
siglip_large_patch16_384 | 652.48M | 652 million parameter, image size 384, pre-trained on WebLi.
siglip_so400m_patch14_224 | 877.36M | 877 million parameter, image size 224, shape-optimized version, pre-trained on WebLi.
siglip_so400m_patch14_384 | 877.96M | 877 million parameter, image size 384, shape-optimized version, pre-trained on WebLi.
siglip2_large_patch16_256 | 881.53M | 881 million parameter, patch size 16, image size 256, pre-trained on WebLi.
siglip2_large_patch16_384 | 881.86M | 881 million parameter, patch size 16, image size 384, pre-trained on WebLi.
siglip2_large_patch16_512 | 882.31M | 882 million parameter, patch size 16, image size 512, pre-trained on WebLi.
siglip_so400m_patch16_256_i18n | 1.13B | 1.1 billion parameter, image size 256, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch14_224 | 1.14B | 1.1 billion parameter, patch size 14, image size 224, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch16_256 | 1.14B | 1.1 billion parameter, patch size 16, image size 256, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch14_384 | 1.14B | 1.1 billion parameter, patch size 14, image size 384, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch16_384 | 1.14B | 1.1 billion parameter, patch size 16, image size 384, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch16_512 | 1.14B | 1.1 billion parameter, patch size 16, image size 512, shape-optimized version, pre-trained on WebLi.
siglip2_giant_opt_patch16_256 | 1.87B | 1.8 billion parameter, patch size 16, image size 256, pre-trained on WebLi.
siglip2_giant_opt_patch16_384 | 1.87B | 1.8 billion parameter, patch size 16, image size 384, pre-trained on WebLi.