ViTDetBackbone
class keras_cv.models.ViTDetBackbone(
include_rescaling,
input_shape=(1024, 1024, 3),
input_tensor=None,
patch_size=16,
embed_dim=768,
depth=12,
mlp_dim=3072,
num_heads=12,
out_chans=256,
use_bias=True,
use_abs_pos=True,
use_rel_pos=True,
window_size=14,
global_attention_indices=[2, 5, 8, 11],
layer_norm_epsilon=1e-06,
**kwargs
)
A ViT image encoder that uses a windowed transformer encoder and relative positional encodings.
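The windowed-encoder arithmetic implied by the signature defaults can be sketched in plain Python. This is a back-of-the-envelope check, not part of the keras_cv API:

```python
import math

# Defaults taken from the signature above.
input_size = 1024   # square input, (1024, 1024, 3)
patch_size = 16
window_size = 14
depth = 12
global_attention_indices = [2, 5, 8, 11]

# The patch embedding turns the image into a 64x64 grid of tokens.
tokens_per_side = input_size // patch_size          # 64
num_tokens = tokens_per_side ** 2                   # 4096

# Windowed layers pad the token grid up to a multiple of window_size
# and attend within each 14x14 window.
windows_per_side = math.ceil(tokens_per_side / window_size)  # 5
padded_side = windows_per_side * window_size                 # 70

# Which of the 12 encoder blocks attend globally vs. within windows.
attention_schedule = [
    "global" if i in global_attention_indices else "windowed"
    for i in range(depth)
]
print(num_tokens, windows_per_side ** 2, attention_schedule.count("global"))
```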
Arguments

- input_shape: the size of the input image in (H, W, C) format. Defaults to (1024, 1024, 3).
- input_tensor: output of keras.layers.Input() to use as image input for the model. Defaults to None.
- include_rescaling: bool, whether to rescale the inputs. If set to True, inputs will be passed through a Rescaling(1/255.0) layer. Defaults to False.
- patch_size: the patch size used by the patch embedding layer. Defaults to 16.
- embed_dim: the latent dimensionality projected in the output of each stacked windowed transformer encoder. Defaults to 768.
- depth: the number of transformer encoder layers to stack. Defaults to 12.
- mlp_dim: the dimensionality of the hidden Dense layer in the transformer MLP head. Defaults to 768*4.
- num_heads: the number of heads to use in the MultiHeadAttentionWithRelativePE layer of each transformer encoder. Defaults to 12.
- out_chans: the number of channels in the output feature map of the encoder. Defaults to 256.
- use_bias: whether to use a bias to project the keys, queries, and values in the attention layer. Defaults to True.
- use_abs_pos: whether to add absolute positional embeddings to the output patches. Defaults to True.
- use_rel_pos: whether to use relative positional embeddings in the attention layer. Defaults to True.
- window_size: the size of the window used for windowed attention in the transformer encoder blocks. Defaults to 14.
- global_attention_indices: indices of the encoder blocks that use global attention. Defaults to [2, 5, 8, 11].
- layer_norm_epsilon: the epsilon to use in the layer normalization blocks. Defaults to 1e-6.

References
from_preset method
ViTDetBackbone.from_preset()
Instantiate ViTDetBackbone model from preset config and weights.
Arguments

- preset: string. Must be one of "vitdet_base", "vitdet_large", "vitdet_huge", "vitdet_base_sa1b", "vitdet_large_sa1b", "vitdet_huge_sa1b".
- load_weights: whether to load pre-trained weights into the model. Defaults to None, which follows whether the preset has pretrained weights available.

Examples
# Load architecture and weights from preset
model = keras_cv.models.ViTDetBackbone.from_preset(
    "vitdet_base_sa1b",
)

# Load randomly initialized model from preset architecture, without weights
model = keras_cv.models.ViTDetBackbone.from_preset(
    "vitdet_base_sa1b",
    load_weights=False,
)
| Preset name | Parameters | Description |
|---|---|---|
| vitdet_base | 89.67M | Detectron2 ViT backbone with 12 transformer encoders, embed dim 768, attention layers with 12 heads, and global attention on encoders 2, 5, 8, and 11. |
| vitdet_large | 308.28M | Detectron2 ViT backbone with 24 transformer encoders, embed dim 1024, attention layers with 16 heads, and global attention on encoders 5, 11, 17, and 23. |
| vitdet_huge | 637.03M | Detectron2 ViT backbone with 32 transformer encoders, embed dim 1280, attention layers with 16 heads, and global attention on encoders 7, 15, 23, and 31. |
| vitdet_base_sa1b | 89.67M | A base Detectron2 ViT backbone trained on the SA1B dataset. |
| vitdet_large_sa1b | 308.28M | A large Detectron2 ViT backbone trained on the SA1B dataset. |
| vitdet_huge_sa1b | 637.03M | A huge Detectron2 ViT backbone trained on the SA1B dataset. |
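For quick reference, the architecture hyperparameters from the table above can be transcribed into a plain dictionary. This is purely illustrative; the canonical values live in the keras_cv preset configs resolved by `from_preset`:

```python
# Hyperparameters transcribed from the preset table above (illustrative only).
VITDET_PRESETS = {
    "vitdet_base":  {"embed_dim": 768,  "depth": 12, "num_heads": 12,
                     "global_attention_indices": [2, 5, 8, 11]},
    "vitdet_large": {"embed_dim": 1024, "depth": 24, "num_heads": 16,
                     "global_attention_indices": [5, 11, 17, 23]},
    "vitdet_huge":  {"embed_dim": 1280, "depth": 32, "num_heads": 16,
                     "global_attention_indices": [7, 15, 23, 31]},
}

def preset_config(name: str) -> dict:
    """Look up a preset's architecture; the "_sa1b" presets share the
    architecture of their base name but carry SA1B-trained weights."""
    return VITDET_PRESETS[name.removesuffix("_sa1b")]

print(preset_config("vitdet_large_sa1b")["depth"])  # 24
```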
ViTDetBBackbone
class keras_cv.models.ViTDetBBackbone(
include_rescaling,
input_shape=(1024, 1024, 3),
input_tensor=None,
patch_size=16,
embed_dim=768,
depth=12,
mlp_dim=3072,
num_heads=12,
out_chans=256,
use_bias=True,
use_abs_pos=True,
use_rel_pos=True,
window_size=14,
global_attention_indices=[2, 5, 8, 11],
layer_norm_epsilon=1e-06,
**kwargs
)
ViTDetBBackbone model.
Reference
For transfer learning use cases, make sure to read the guide to transfer learning & fine-tuning.
Example
import numpy as np
import keras_cv

input_data = np.ones(shape=(1, 1024, 1024, 3))

# Randomly initialized backbone
model = keras_cv.models.ViTDetBBackbone()
output = model(input_data)
ViTDetLBackbone
class keras_cv.models.ViTDetLBackbone(
include_rescaling,
input_shape=(1024, 1024, 3),
input_tensor=None,
patch_size=16,
embed_dim=768,
depth=12,
mlp_dim=3072,
num_heads=12,
out_chans=256,
use_bias=True,
use_abs_pos=True,
use_rel_pos=True,
window_size=14,
global_attention_indices=[2, 5, 8, 11],
layer_norm_epsilon=1e-06,
**kwargs
)
ViTDetLBackbone model.
Reference
For transfer learning use cases, make sure to read the guide to transfer learning & fine-tuning.
Example
import numpy as np
import keras_cv

input_data = np.ones(shape=(1, 1024, 1024, 3))

# Randomly initialized backbone
model = keras_cv.models.ViTDetLBackbone()
output = model(input_data)
ViTDetHBackbone
class keras_cv.models.ViTDetHBackbone(
include_rescaling,
input_shape=(1024, 1024, 3),
input_tensor=None,
patch_size=16,
embed_dim=768,
depth=12,
mlp_dim=3072,
num_heads=12,
out_chans=256,
use_bias=True,
use_abs_pos=True,
use_rel_pos=True,
window_size=14,
global_attention_indices=[2, 5, 8, 11],
layer_norm_epsilon=1e-06,
**kwargs
)
ViTDetHBackbone model.
Reference
For transfer learning use cases, make sure to read the guide to transfer learning & fine-tuning.
Example
import numpy as np
import keras_cv

input_data = np.ones(shape=(1, 1024, 1024, 3))

# Randomly initialized backbone
model = keras_cv.models.ViTDetHBackbone()
output = model(input_data)