NVLM-D-72B

Name: NVLM-D-72B
Availability: Online only
Rating: 4.2 (6 reviews)
Author: NVIDIA

API Available

NVLM-D-72B is NVIDIA's frontier-class multimodal large language model. It uses a decoder-only architecture and is reported to achieve state-of-the-art results on vision-language tasks while also performing strongly on text-only benchmarks.

Specifications

Context Window: 32,768 tokens
Released: September 2024

Capabilities

Vision-Language Understanding, Image Analysis, Visual Question Answering, Text Generation, Multimodal Reasoning

Related Models

Alpamayo-R1

by NVIDIA

NVIDIA announced Alpamayo-R1, an open reasoning vision language model designed for autonomous driving research. It is positioned as the first vision language action model focused specifically on autonomous driving, letting vehicles process both text and images to perceive their surroundings and make informed decisions. Alpamayo-R1 is based on NVIDIA's Cosmos-Reason model, which emphasizes reasoning in decision-making. Such reasoning is presented as critical for achieving Level 4 autonomous driving, meaning full autonomy in defined areas under specific conditions.