AI Models
Compare 73 models from leading companies
BLIP-2
Salesforce Research
BLIP-2 (Bootstrapping Language-Image Pre-training 2) is a vision-language model from Salesforce that leverages frozen pre-trained image encoders and frozen large language models, bridging them with a lightweight Querying Transformer (Q-Former). It achieves state-of-the-art performance on various vision-language tasks with significantly fewer trainable parameters than end-to-end pre-training approaches.
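A minimal image-captioning sketch using the BLIP-2 integration in Hugging Face transformers; the checkpoint is one of the published Salesforce checkpoints, and the image path is a placeholder.

# Requires: pip install transformers pillow torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

image = Image.open("photo.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())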
Whisper
OpenAI
Whisper is OpenAI's automatic speech recognition model, trained on 680,000 hours of multilingual audio and supporting 99 languages. It is robust to accents, background noise, and technical language.
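A minimal transcription sketch with the open-source whisper package (pip install openai-whisper); the audio path is a placeholder.

import whisper

model = whisper.load_model("base")      # sizes include tiny, base, small, medium, large
result = model.transcribe("audio.mp3")  # language is auto-detected by default
print(result["text"])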
MetaSeq
Meta
MetaSeq is Meta's library for training large-scale sequence models, released as open source alongside OPT (Open Pre-trained Transformer), which it was used to train. It provides the infrastructure for efficient distributed training of language models with billions of parameters.
Codex
OpenAI
OpenAI Codex is an AI system that translates natural language to code. Codex powers GitHub Copilot and is a descendant of GPT-3, trained on both natural language and billions of lines of code from publicly available sources, including GitHub repositories.
GPT-J
EleutherAI
GPT-J is a 6-billion-parameter open-source autoregressive language model developed by EleutherAI. It was one of the first large-scale open alternatives to GPT-3 and demonstrated that the open-source community could train competitive language models.
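A minimal generation sketch loading the GPT-J-6B weights through Hugging Face transformers; note that the full-precision model needs roughly 24 GB of memory, so this is illustrative rather than laptop-ready.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

inputs = tokenizer("Open-source language models matter because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))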
Swin Transformer
Microsoft Research
Swin Transformer is a hierarchical vision transformer that computes self-attention within shifted local windows. Developed by Microsoft Research, it achieves excellent performance on image classification, object detection, and semantic segmentation while keeping computational complexity linear in image size.
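The core mechanism can be sketched in a few lines: attention runs inside fixed-size local windows, and alternating layers cyclically shift the feature map so information flows across window boundaries. The shapes and window size below are illustrative toy values, not the model's real configuration.

import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    # Split a (B, H, W, C) feature map into (num_windows*B, ws*ws, C) windows.
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

x = torch.randn(1, 8, 8, 96)                           # toy feature map: 8x8 tokens, 96 channels
shifted = torch.roll(x, shifts=(-2, -2), dims=(1, 2))  # the "shifted window" step
windows = window_partition(shifted, window_size=4)
print(windows.shape)  # torch.Size([4, 16, 96]); self-attention runs per window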
ViLT
Academic Research
ViLT (Vision-and-Language Transformer) is a minimal vision-and-language model that feeds raw image patches directly into the transformer, without a separate deep visual encoder such as a CNN backbone or pre-extracted region features. This makes it significantly faster than comparable models while maintaining competitive performance.
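A minimal visual question answering sketch with the public VQAv2-finetuned ViLT checkpoint on Hugging Face; the image path and question are placeholders.

from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

image = Image.open("scene.jpg")  # placeholder path
inputs = processor(image, "How many people are in the picture?", return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])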
Turing-NLG
Microsoft Research
Turing-NLG (Natural Language Generation) was one of the largest language models at its release in February 2020, with 17 billion parameters. Developed by Microsoft, it demonstrated strong performance on various language generation tasks and pushed the boundaries of model scale.
ERNIE
Baidu
ERNIE (Enhanced Representation through kNowledge IntEgration) is a series of language models developed by Baidu. It incorporates knowledge graphs and entity-level masking to achieve better understanding of semantic relationships and world knowledge.
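Entity-level masking can be illustrated with a toy sketch: instead of masking individual tokens, every token belonging to a named entity is masked as one unit, forcing the model to predict whole entities from context. The tokens and spans below are hand-made examples, not ERNIE's real tokenizer output.

import random

tokens = ["Harry", "Potter", "is", "a", "series", "by", "J.", "K.", "Rowling"]
entities = [(0, 2), (6, 9)]  # [start, end) spans for "Harry Potter" and "J. K. Rowling"

def entity_level_mask(tokens, entities, mask_prob=0.5, mask_token="[MASK]"):
    masked = list(tokens)
    for start, end in entities:
        if random.random() < mask_prob:
            for i in range(start, end):  # mask the whole entity span at once
                masked[i] = mask_token
    return masked

random.seed(1)
print(entity_level_mask(tokens, entities))  # with this seed, "Harry Potter" is masked as a unit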
GoogleNet
Google
GoogleNet (Inception v1) is a deep convolutional neural network architecture that won the ImageNet 2014 competition. It introduced the Inception module, which performs convolutions at multiple scales simultaneously, achieving high accuracy with computational efficiency.
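The Inception idea is easy to sketch: several parallel convolution branches with different receptive-field sizes, concatenated along the channel dimension. The channel counts below follow the shape of the original module but are illustrative.

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)  # 1x1 conv
        self.branch3 = nn.Sequential(                       # 1x1 bottleneck, then 3x3
            nn.Conv2d(in_ch, 96, kernel_size=1),
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(                       # 1x1 bottleneck, then 5x5
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(                   # 3x3 max-pool, then 1x1
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        branches = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat(branches, dim=1)  # concatenate multi-scale features

x = torch.randn(1, 192, 28, 28)
print(InceptionModule(192)(x).shape)  # torch.Size([1, 256, 28, 28])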
VGG
Academic Research
VGG is a deep convolutional neural network architecture developed by the Visual Geometry Group at Oxford. Known for its simplicity and depth (16-19 layers), VGG demonstrated that network depth is critical for good performance and became widely used for transfer learning.
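A minimal transfer-learning sketch with torchvision's VGG-16: freeze the pretrained convolutional backbone and replace the final classifier layer for a new task (num_classes is a placeholder).

import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

for param in model.features.parameters():  # freeze the convolutional backbone
    param.requires_grad = False

num_classes = 10  # placeholder for the target task
model.classifier[6] = nn.Linear(4096, num_classes)  # swap the final fully connected layer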
AlexNet
Academic Research
AlexNet is a landmark convolutional neural network that won the ImageNet Large Scale Visual Recognition Challenge in 2012 by a significant margin. Developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, it sparked the deep learning revolution in computer vision.