BLIP-2

by Salesforce Research
API Available

BLIP-2 (Bootstrapping Language-Image Pre-training 2) is a vision-language model from Salesforce Research. It connects a frozen pre-trained image encoder to a frozen large language model through a lightweight Querying Transformer (Q-Former), so only the small bridging module is trained. At release it reported state-of-the-art results on several vision-language tasks while using significantly fewer trainable parameters than comparable models.

Specifications

Context Window: 512 tokens
Released: January 2023

Capabilities

Image Captioning
Visual Question Answering
Image-Text Retrieval
Vision-Language Understanding
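
As a rough illustration of the captioning and visual question answering capabilities listed above, here is a minimal sketch. It assumes the Hugging Face transformers library and the public "Salesforce/blip2-opt-2.7b" checkpoint, neither of which is named on this page; the API this listing advertises may work differently. The helper names build_prompt and describe_image are hypothetical. The "Question: ... Answer:" prompt follows the convention shown in the transformers BLIP-2 examples.

```python
from typing import Optional


def build_prompt(question: Optional[str] = None) -> Optional[str]:
    """Return a VQA-style prompt, or None when a plain caption is wanted."""
    if question is None:
        return None
    return f"Question: {question.strip()} Answer:"


def describe_image(
    image_path: str,
    question: Optional[str] = None,
    checkpoint: str = "Salesforce/blip2-opt-2.7b",  # assumed checkpoint name
) -> str:
    """Caption an image, or answer a question about it (hypothetical helper)."""
    # Heavy dependencies are imported lazily so build_prompt works without them.
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    processor = Blip2Processor.from_pretrained(checkpoint)
    model = Blip2ForConditionalGeneration.from_pretrained(checkpoint)

    image = Image.open(image_path).convert("RGB")
    prompt = build_prompt(question)
    if prompt is None:
        # No text prompt: the model generates a caption from the image alone.
        inputs = processor(images=image, return_tensors="pt")
    else:
        # With a prompt: the language model continues the text after "Answer:".
        inputs = processor(images=image, text=prompt, return_tensors="pt")

    generated_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

Calling describe_image("photo.jpg") would request a caption, and describe_image("photo.jpg", "how many cats are there?") would request an answer. Both calls download a multi-gigabyte checkpoint on first use, so a GPU and reduced precision are advisable in practice.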
