KerasHub

Model Overview

Model Summary

VideoPrism is a family of foundational video encoder models from Google Research, designed to be a universal "prism" for understanding the diverse facets of video content. Pre-trained on 36 million high-quality video-caption pairs and 582 million video clips, VideoPrism targets a wide range of video understanding tasks, including classification, localization, retrieval, and captioning. The models use a Vision Transformer (ViT) architecture and are pre-trained with a combination of video-text contrastive learning and masked video modeling. Together, these two objectives let the model capture both global semantic meaning and fine-grained spatio-temporal detail, which makes VideoPrism a strong backbone for video understanding applications.
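To make the role of video-text contrastive learning concrete, here is a minimal sketch of how embeddings from a contrastively trained model are typically compared for retrieval. It is a toy illustration and not VideoPrism code: the three-dimensional vectors stand in for real video and caption embeddings, and the helper names (`l2_normalize`, `cosine_similarity`, `rank_captions`) are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_captions(video_embedding, caption_embeddings):
    """Return (order, scores): caption indices sorted from best to worst match."""
    scores = [cosine_similarity(video_embedding, c) for c in caption_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy stand-ins for real embeddings (assumption: real ones are much larger).
video = [0.9, 0.1, 0.0]
captions = [
    [0.0, 1.0, 0.0],  # caption 0
    [1.0, 0.0, 0.0],  # caption 1, closest in direction to the video vector
    [0.0, 0.0, 1.0],  # caption 2, orthogonal to the video vector
]
order, scores = rank_captions(video, captions)
print(order)
```

In a real pipeline the vectors would come from the video and text encoders, and the ranking step would stay essentially the same.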

Links

Installation

Keras and KerasHub can be installed with:

pip install -U -q keras-hub
pip install -U -q "keras>=3"

JAX, TensorFlow, and Torch come pre-installed in Kaggle Notebooks. For instructions on installing them in another environment, see the Keras Getting Started page.
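Keras 3 reads the `KERAS_BACKEND` environment variable to decide which of these backends to use, and the variable has to be set before `keras` is first imported. The sketch below shows one way to do that from Python; the helper `select_keras_backend` is a hypothetical convenience wrapper, not part of Keras.

```python
import os

# The three backends mentioned in this document.
SUPPORTED_BACKENDS = ("jax", "tensorflow", "torch")


def select_keras_backend(name):
    """Set KERAS_BACKEND; call this before the first `import keras`."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    os.environ["KERAS_BACKEND"] = name
    return name


select_keras_backend("jax")

# Import Keras only after the backend has been chosen:
# import keras
# import keras_hub
```

Setting the variable in the shell (for example `export KERAS_BACKEND=jax`) before starting Python has the same effect.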

Presets

The following model checkpoints are provided by the Keras team. For the Video-Text (LvT) variants, both the video encoder and the text encoder are provided to enable multimodal tasks like zero-shot retrieval.

| Preset name | Parameters | Description |
|---|---|---|
| videoprism_public_v1_base | 114.00M | 114 million parameter, 12-layer ViT-B, 16-frame, 288x288 resolution, video-only encoder for spatio-temporal representation. |
| videoprism_public_v1_large | 354.00M | 354 million parameter, 24-layer ViT-L, 16-frame, 288x288 resolution, video-only encoder for spatio-temporal representation. |
| videoprism_lvt_public_v1_base | 248.00M | 248 million parameter, 12-layer ViT-B video encoder + text encoder, 16-frame, 288x288 resolution, for multimodal video-language tasks. |
| videoprism_lvt_public_v1_large | 580.00M | 580 million parameter, 24-layer ViT-L video encoder + text encoder, 16-frame, 288x288 resolution, for multimodal video-language tasks. |
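As a rough sketch of how these presets might be selected and loaded, the snippet below records the table in a small dictionary and wraps the generic KerasHub loader `keras_hub.models.Backbone.from_preset`. The dictionary layout and the helpers `presets_with_text_encoder` and `load_backbone` are hypothetical. The snippet also assumes the installed KerasHub release registers these preset names; if it does not, a full handle such as `hf://keras/videoprism_public_v1_base` may be needed instead.

```python
# Preset metadata copied from the table above.
PRESETS = {
    "videoprism_public_v1_base": {
        "params_millions": 114.0, "layers": 12, "text_encoder": False,
    },
    "videoprism_public_v1_large": {
        "params_millions": 354.0, "layers": 24, "text_encoder": False,
    },
    "videoprism_lvt_public_v1_base": {
        "params_millions": 248.0, "layers": 12, "text_encoder": True,
    },
    "videoprism_lvt_public_v1_large": {
        "params_millions": 580.0, "layers": 24, "text_encoder": True,
    },
}


def presets_with_text_encoder():
    """Names of the video-text (LvT) presets, usable for zero-shot retrieval."""
    return [name for name, info in PRESETS.items() if info["text_encoder"]]


def load_backbone(preset_name):
    """Load a preset via KerasHub (downloads weights on first use)."""
    if preset_name not in PRESETS:
        raise KeyError(f"unknown preset: {preset_name}")
    # Deferred import so the catalog above works without KerasHub installed.
    import keras_hub

    # Assumption: the generic Backbone loader resolves VideoPrism presets.
    return keras_hub.models.Backbone.from_preset(preset_name)


print(presets_with_text_encoder())
# Example (needs keras-hub and network access):
# backbone = load_backbone("videoprism_public_v1_base")
```

The video-only presets are the lighter choice when only spatio-temporal features are needed. The LvT presets add the text encoder required for video-language tasks.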