Models Directory
Browse all 77 available language models and their capabilities
Free Models (24)
These models are available to all users without any subscription or pay-as-you-go charges.
liquid/lfm-7b
liquid/lfm-3b
mistralai/ministral-3b
mistralai/ministral-8b
gryphe/mythomax-l2-13b
One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge
Context: 4096 tokens
Max output: 4096 tokens
amazon/nova-micro-v1
Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length...
Context: 128000 tokens
Max output: 5120 tokens
microsoft/phi-4
Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion...
Context: 16384 tokens
Max output: 16384 tokens
microsoft/wizardlm-2-7b
google/gemini-flash-1.5-8b
mistralai/mistral-7b-instruct
google/gemma-2-9b-it
meta-llama/llama-3.2-3b-instruct
Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...
Context: 131072 tokens
Max output: N/A
meta-llama/llama-3.2-1b-instruct
Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate...
Context: 131072 tokens
Max output: N/A
meta-llama/llama-3.1-8b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to...
Context: 131072 tokens
Max output: 16384 tokens
qwen/qwen-2-7b-instruct
mistralai/mistral-7b-instruct-v0.3
meta-llama/llama-3-8b-instruct
Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong...
Context: 8192 tokens
Max output: 8192 tokens
mistralai/mistral-nemo
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,...
Context: 131072 tokens
Max output: N/A
sao10k/l3-lunaris-8b
Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge....
Context: 8192 tokens
Max output: 16384 tokens
nousresearch/hermes-2-pro-llama-3-8b
Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced...
Context: 8192 tokens
Max output: 8192 tokens
openchat/openchat-7b
undi95/toppy-m-7b:nitro
amazon/nova-lite-v1
Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focuses on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite...
Context: 300000 tokens
Max output: 5120 tokens
mistralai/pixtral-12b
Pro Models (32)
These models are available to Pro subscribers with unlimited usage included in the subscription.
thedrummer/unslopnemo-12b
UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.
Context: 32768 tokens
Max output: 32768 tokens
meta-llama/llama-3.1-70b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases. It has demonstrated strong...
Context: 131072 tokens
Max output: 16384 tokens
nousresearch/hermes-3-llama-3.1-70b
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...
Context: 131072 tokens
Max output: 16384 tokens
deepseek/deepseek-chat
DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations...
Context: 163840 tokens
Max output: 16384 tokens
microsoft/phi-3.5-mini-128k-instruct
ai21/jamba-1-5-mini
mistralai/codestral-mamba
openai/gpt-4o-mini
GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...
Context: 128000 tokens
Max output: 16384 tokens
anthropic/claude-3-haiku
Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance.
#multimodal
Context: 200000 tokens
Max output: 4096 tokens
cognitivecomputations/dolphin-mixtral-8x22b
google/gemma-2-27b-it
Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of...
Context: 8192 tokens
Max output: 2048 tokens
mistralai/mixtral-8x7b-instruct
mistralai/mistral-small-24b-instruct-2501
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed...
Context: 32768 tokens
Max output: 16384 tokens
gryphe/mythomist-7b
anthropic/claude-instant-1:beta
nvidia/llama-3.1-nemotron-70b-instruct
deepseek/deepseek-chat-v3-0324
DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the DeepSeek V3 model and performs really well...
Context: 163840 tokens
Max output: 16384 tokens
thedrummer/rocinante-12b
Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and expressive word choices - Enhanced creativity for vivid narratives -...
Context: 32768 tokens
Max output: 32768 tokens
eva-unit-01/eva-qwen-2.5-14b
mistralai/mistral-tiny
mistralai/mistral-small
qwen/qwen-turbo
qwen/qwen-plus
Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination.
Context: 1000000 tokens
Max output: 32768 tokens
deepseek/deepseek-r1-distill-qwen-1.5b
deepseek/deepseek-r1-distill-qwen-32b
DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new...
Context: 128000 tokens
Max output: 32768 tokens
deepseek/deepseek-r1-distill-llama-70b
DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across...
Context: 131072 tokens
Max output: 16384 tokens
qwen/qvq-72b-preview
qwen/qwq-32b-preview
qwen/qwen-2.5-coder-32b-instruct
Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in code generation, code reasoning...
Context: 128000 tokens
Max output: N/A
mistralai/codestral-2501
meta-llama/llama-3.3-70b-instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...
Context: 131072 tokens
Max output: 16384 tokens
deepseek/deepseek-r1-distill-llama-3.1-70b
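The Context and Max output figures in these listings can be combined to estimate how much room a prompt has. The sketch below is illustrative only: it assumes the prompt and the completion share the context window, which the listing does not state, and it treats "N/A" as "no separate output cap". The helper name prompt_budget and the LIMITS table are hypothetical; the numbers are copied from entries in this directory.

```python
from typing import Optional


def prompt_budget(context: int, max_output: Optional[int], desired_output: int) -> int:
    """Return how many prompt tokens fit alongside the requested completion.

    Assumption: prompt and completion share the context window.
    """
    # Clamp the requested completion length to the listed cap, if there is one.
    output = desired_output if max_output is None else min(desired_output, max_output)
    # Whatever the completion does not use is left for the prompt; never negative.
    return max(context - output, 0)


# (context, max output) pairs copied from this directory; None stands for "N/A".
LIMITS = {
    "meta-llama/llama-3.1-70b-instruct": (131072, 16384),
    "google/gemma-2-27b-it": (8192, 2048),
    "qwen/qwen-2.5-coder-32b-instruct": (128000, None),
}

for model, (context, max_output) in LIMITS.items():
    print(model, prompt_budget(context, max_output, desired_output=4096))
```

For google/gemma-2-27b-it the requested 4096 output tokens exceed the listed cap of 2048, so the sketch clamps the request before subtracting it from the context window.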
Pro Metered Models (21)
These premium models are available on a pay-as-you-go basis with per-token pricing.
anthropic/claude-3.7-sonnet
anthropic/claude-3.7-sonnet:thinking
deepseek/deepseek-r1
Input: $0.0000007 per token
Output: $0.0000025 per token
DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass....
Context: 163840 tokens
Max output: 16000 tokens
✗ Unmoderated
openai/gpt-4o-2024-11-20
Input: $0.0000025 per token
Output: $0.00001 per token
The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded...
Context: 128000 tokens
Max output: 16384 tokens
✓ Moderated
openai/o3-mini-high
Input: $0.0000011 per token
Output: $0.0000044 per token
OpenAI o3-mini-high is the same model as o3-mini with reasoning_effort set to high. o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and...
Context: 200000 tokens
Max output: 100000 tokens
✓ Moderated
allenai/llama-3.1-tulu-3-405b
aion-labs/aion-1.0
Input: $0.000004 per token
Output: $0.000008 per token
Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree...
Context: 131072 tokens
Max output: 32768 tokens
✗ Unmoderated
qwen/qwen-max
openai/o1
Input: $0.000015 per token
Output: $0.00006 per token
The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason...
Context: 200000 tokens
Max output: 100000 tokens
✓ Moderated
x-ai/grok-2-1212
mistralai/mistral-large-2411
Input: $0.000002 per token
Output: $0.000006 per token
Mistral Large 2 2411 is an update of Mistral Large 2 released together with Pixtral Large 2411. It provides a significant upgrade on the previous Mistral Large 24.07, with notable...
Context: 131072 tokens
Max output: N/A
✗ Unmoderated
neversleep/llama-3.1-lumimaid-70b
x-ai/grok-beta
inflection/inflection-3-pi
Input: $0.0000025 per token
Output: $0.00001 per token
Inflection 3 Pi powers Inflection's Pi chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay. Pi...
Context: 8000 tokens
Max output: 1024 tokens
✗ Unmoderated
cohere/command-r-plus-08-2024
Input: $0.0000025 per token
Output: $0.00001 per token
command-r-plus-08-2024 is an update of the Command R+ with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint...
Context: 128000 tokens
Max output: 4000 tokens
✓ Moderated
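Per-token prices this small are easy to misread, so a short calculation may help. The sketch below estimates the cost of a single request from the Input and Output prices listed for deepseek/deepseek-r1. The function name estimate_cost and the token counts are hypothetical, and it assumes billing is simply tokens multiplied by the listed price, with no minimums or rounding rules. Decimal is used so the tiny prices are not distorted by binary floating point.

```python
from decimal import Decimal

# Per-token prices copied from the deepseek/deepseek-r1 entry in this directory.
INPUT_PRICE = Decimal("0.0000007")
OUTPUT_PRICE = Decimal("0.0000025")


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: Decimal = INPUT_PRICE,
                  output_price: Decimal = OUTPUT_PRICE) -> Decimal:
    """Estimate the USD cost of one request (assumes plain per-token billing)."""
    return input_tokens * input_price + output_tokens * output_price


# Hypothetical request: 10,000 prompt tokens and 2,000 completion tokens.
cost = estimate_cost(10_000, 2_000)
print(f"Estimated cost: ${cost.normalize():f}")

# The same prices expressed per million tokens, which is often easier to compare.
print(f"Input per 1M tokens: ${(INPUT_PRICE * 1_000_000).normalize():f}")
print(f"Output per 1M tokens: ${(OUTPUT_PRICE * 1_000_000).normalize():f}")
```

Passing another model's listed prices as input_price and output_price gives a like-for-like comparison for the same token counts.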