BLIP-2
by Salesforce Research

BLIP-2 (Bootstrapping Language-Image Pre-training 2) is a vision-language model from Salesforce Research that efficiently leverages frozen pre-trained image encoders and frozen large language models, connecting them with a lightweight Querying Transformer (Q-Former). It achieves state-of-the-art performance on a range of vision-language tasks with significantly fewer trainable parameters than end-to-end pre-trained models.
Specifications
- Context Window: 512 tokens
- Released: January 2023
Capabilities
- Image Captioning
- Visual Question Answering
- Image-Text Retrieval
- Vision-Language Understanding