AI Models
Compare 73 models from leading companies
BLIP-2
Salesforce Research
BLIP-2 (Bootstrapping Language-Image Pre-training 2) is a vision-language model from Salesforce that leverages frozen pre-trained image encoders and frozen large language models, bridging them with a lightweight Querying Transformer (Q-Former). It achieves state-of-the-art performance on various vision-language tasks with significantly fewer trainable parameters than end-to-end pre-training approaches.
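A minimal image-captioning sketch using the BLIP-2 integration in Hugging Face transformers; the checkpoint is one of the published Salesforce checkpoints, and the image path is a placeholder.

# Requires: pip install transformers pillow torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

image = Image.open("photo.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())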
Whisper
OpenAI
Whisper is OpenAI's automatic speech recognition model, trained on 680,000 hours of multilingual audio and supporting 99 languages. It is robust to accents, background noise, and technical language.
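A minimal transcription sketch with the open-source whisper package (pip install openai-whisper); the audio path is a placeholder.

import whisper

model = whisper.load_model("base")      # sizes include tiny, base, small, medium, large
result = model.transcribe("audio.mp3")  # language is auto-detected by default
print(result["text"])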
MetaSeq
Meta
MetaSeq is Meta's library for training large-scale sequence models, released as open source alongside OPT (Open Pre-trained Transformer), which it was used to train. It provides the infrastructure for efficient distributed training of language models with billions of parameters.
Codex
OpenAI
OpenAI Codex is an AI system that translates natural language to code. Codex powers GitHub Copilot and is a descendant of GPT-3, trained on both natural language and billions of lines of code from publicly available sources, including GitHub repositories.
GPT-J
EleutherAI
GPT-J is a 6-billion-parameter open-source autoregressive language model developed by EleutherAI. It was one of the first large-scale open alternatives to GPT-3 and demonstrated that the open-source community could train competitive language models.
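A minimal generation sketch loading the GPT-J-6B weights through Hugging Face transformers; note that the full-precision model needs roughly 24 GB of memory, so this is illustrative rather than laptop-ready.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

inputs = tokenizer("Open-source language models matter because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))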
Swin Transformer
Microsoft Research
Swin Transformer is a hierarchical vision transformer that computes self-attention within shifted local windows. Developed by Microsoft Research, it achieves excellent performance on image classification, object detection, and semantic segmentation while keeping computational complexity linear in image size.
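The core mechanism can be sketched in a few lines: attention runs inside fixed-size local windows, and alternating layers cyclically shift the feature map so information flows across window boundaries. The shapes and window size below are illustrative toy values, not the model's real configuration.

import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    # Split a (B, H, W, C) feature map into (num_windows*B, ws*ws, C) windows.
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

x = torch.randn(1, 8, 8, 96)                           # toy feature map: 8x8 tokens, 96 channels
shifted = torch.roll(x, shifts=(-2, -2), dims=(1, 2))  # the "shifted window" step
windows = window_partition(shifted, window_size=4)
print(windows.shape)  # torch.Size([4, 16, 96]); self-attention runs per window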
ViLT
Academic Research
ViLT (Vision-and-Language Transformer) is a minimal vision-and-language model that feeds raw image patches directly into the transformer, without a separate deep visual encoder such as a CNN backbone or pre-extracted region features. This makes it significantly faster than comparable models while maintaining competitive performance.
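A minimal visual question answering sketch with the public VQAv2-finetuned ViLT checkpoint on Hugging Face; the image path and question are placeholders.

from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

image = Image.open("scene.jpg")  # placeholder path
inputs = processor(image, "How many people are in the picture?", return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])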
Turing-NLG
Microsoft Research
Turing-NLG (Natural Language Generation) was one of the largest language models at its release in February 2020, with 17 billion parameters. Developed by Microsoft, it demonstrated strong performance on various language generation tasks and pushed the boundaries of model scale.
ERNIE
Baidu
ERNIE (Enhanced Representation through kNowledge IntEgration) is a series of language models developed by Baidu. It incorporates knowledge graphs and entity-level masking to achieve better understanding of semantic relationships and world knowledge.
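Entity-level masking can be illustrated with a toy sketch: instead of masking individual tokens, every token belonging to a named entity is masked as one unit, forcing the model to predict whole entities from context. The tokens and spans below are hand-made examples, not ERNIE's real tokenizer output.

import random

tokens = ["Harry", "Potter", "is", "a", "series", "by", "J.", "K.", "Rowling"]
entities = [(0, 2), (6, 9)]  # [start, end) spans for "Harry Potter" and "J. K. Rowling"

def entity_level_mask(tokens, entities, mask_prob=0.5, mask_token="[MASK]"):
    masked = list(tokens)
    for start, end in entities:
        if random.random() < mask_prob:
            for i in range(start, end):  # mask the whole entity span at once
                masked[i] = mask_token
    return masked

random.seed(1)
print(entity_level_mask(tokens, entities))  # with this seed, "Harry Potter" is masked as a unit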
GoogleNet
Google
GoogleNet (Inception v1) is a deep convolutional neural network architecture that won the ImageNet 2014 competition. It introduced the Inception module, which performs convolutions at multiple scales simultaneously, achieving high accuracy with computational efficiency.
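The Inception idea is easy to sketch: several parallel convolution branches with different receptive-field sizes, concatenated along the channel dimension. The channel counts below follow the shape of the original module but are illustrative.

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)  # 1x1 conv
        self.branch3 = nn.Sequential(                       # 1x1 bottleneck, then 3x3
            nn.Conv2d(in_ch, 96, kernel_size=1),
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(                       # 1x1 bottleneck, then 5x5
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(                   # 3x3 max-pool, then 1x1
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        branches = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat(branches, dim=1)  # concatenate multi-scale features

x = torch.randn(1, 192, 28, 28)
print(InceptionModule(192)(x).shape)  # torch.Size([1, 256, 28, 28])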
VGG
Academic Research
VGG is a deep convolutional neural network architecture developed by the Visual Geometry Group at Oxford. Known for its simplicity and depth (16-19 layers), VGG demonstrated that network depth is critical for good performance and became widely used for transfer learning.
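A minimal transfer-learning sketch with torchvision's VGG-16: freeze the pretrained convolutional backbone and replace the final classifier layer for a new task (num_classes is a placeholder).

import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

for param in model.features.parameters():  # freeze the convolutional backbone
    param.requires_grad = False

num_classes = 10  # placeholder for the target task
model.classifier[6] = nn.Linear(4096, num_classes)  # swap the final fully connected layer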
AlexNet
Academic Research
AlexNet is a landmark convolutional neural network that won the ImageNet Large Scale Visual Recognition Challenge in 2012 by a significant margin. Developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, it sparked the deep learning revolution in computer vision.