Poly AI

Models Directory

Browse all 77 available language models and their capabilities

Free Models (24)

These models are available to all users without any subscription or pay-as-you-go charges.

liquid/lfm-7b

liquid/lfm-3b

mistralai/ministral-3b

mistralai/ministral-8b

gryphe/mythomax-l2-13b

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge

Context: 4096 tokens

Max output: 4096 tokens

amazon/nova-micro-v1

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length...

Context: 128000 tokens

Max output: 5120 tokens

microsoft/phi-4

Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion...

Context: 16384 tokens

Max output: 16384 tokens

microsoft/wizardlm-2-7b

google/gemini-flash-1.5-8b

mistralai/mistral-7b-instruct

google/gemma-2-9b-it

meta-llama/llama-3.2-3b-instruct

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...

Context: 131072 tokens

Max output: N/A

meta-llama/llama-3.2-1b-instruct

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate...

Context: 131072 tokens

Max output: N/A

meta-llama/llama-3.1-8b-instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to...

Context: 131072 tokens

Max output: 16384 tokens

qwen/qwen-2-7b-instruct

mistralai/mistral-7b-instruct-v0.3

meta-llama/llama-3-8b-instruct

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong...

Context: 8192 tokens

Max output: 8192 tokens

mistralai/mistral-nemo

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,...

Context: 131072 tokens

Max output: N/A

sao10k/l3-lunaris-8b

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge....

Context: 8192 tokens

Max output: 16384 tokens

nousresearch/hermes-2-pro-llama-3-8b

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced...

Context: 8192 tokens

Max output: 8192 tokens

openchat/openchat-7b

undi95/toppy-m-7b:nitro

amazon/nova-lite-v1

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite...

Context: 300000 tokens

Max output: 5120 tokens

mistralai/pixtral-12b
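
The "Context" and "Max output" figures above can be used to check whether a request is likely to fit a given model. The following is a minimal sketch, not part of the directory itself: the `LIMITS` table copies three entries from the listing, and the `fits` helper is a hypothetical name. It assumes that the prompt and the generated output share the context window, which is common but not stated on this page, and it treats "N/A" as "no separate output cap listed".

```python
from typing import Dict, Optional, Tuple

# (context tokens, max output tokens) copied from the listing above.
# None stands for "Max output: N/A".
LIMITS: Dict[str, Tuple[int, Optional[int]]] = {
    "gryphe/mythomax-l2-13b": (4096, 4096),
    "amazon/nova-micro-v1": (128000, 5120),
    "meta-llama/llama-3.2-3b-instruct": (131072, None),
}


def fits(model: str, prompt_tokens: int, requested_output: int) -> bool:
    """Return True if the request stays within the listed limits.

    Assumption: prompt and output tokens both count against the context window.
    """
    context, max_output = LIMITS[model]
    # Reject requests that exceed the listed output cap, when one is listed.
    if max_output is not None and requested_output > max_output:
        return False
    return prompt_tokens + requested_output <= context
```

For example, a 3,000-token prompt with 1,000 requested output tokens fits `gryphe/mythomax-l2-13b` under these assumptions, while a 3,500-token prompt with the same output request does not.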

Pro Models (32)

These models are available to Pro subscribers with unlimited usage included in the subscription.

thedrummer/unslopnemo-12b

UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.

Context: 32768 tokens

Max output: 32768 tokens

meta-llama/llama-3.1-70b-instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases. It has demonstrated strong...

Context: 131072 tokens

Max output: 16384 tokens

nousresearch/hermes-3-llama-3.1-70b

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Context: 131072 tokens

Max output: 16384 tokens

deepseek/deepseek-chat

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations...

Context: 163840 tokens

Max output: 16384 tokens

microsoft/phi-3.5-mini-128k-instruct

ai21/jamba-1-5-mini

mistralai/codestral-mamba

openai/gpt-4o-mini

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...

Context: 128000 tokens

Max output: 16384 tokens

anthropic/claude-3-haiku

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance.

#multimodal

Context: 200000 tokens

Max output: 4096 tokens

cognitivecomputations/dolphin-mixtral-8x22b

google/gemma-2-27b-it

Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of...

Context: 8192 tokens

Max output: 2048 tokens

mistralai/mixtral-8x7b-instruct

mistralai/mistral-small-24b-instruct-2501

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed...

Context: 32768 tokens

Max output: 16384 tokens

gryphe/mythomist-7b

anthropic/claude-instant-1:beta

nvidia/llama-3.1-nemotron-70b-instruct

deepseek/deepseek-chat-v3-0324

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the DeepSeek V3 model and performs really well...

Context: 163840 tokens

Max output: 16384 tokens

thedrummer/rocinante-12b

Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and expressive word choices - Enhanced creativity for vivid narratives -...

Context: 32768 tokens

Max output: 32768 tokens

eva-unit-01/eva-qwen-2.5-14b

mistralai/mistral-tiny

mistralai/mistral-small

qwen/qwen-turbo

qwen/qwen-plus

Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination.

Context: 1000000 tokens

Max output: 32768 tokens

deepseek/deepseek-r1-distill-qwen-1.5b

deepseek/deepseek-r1-distill-qwen-32b

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new...

Context: 128000 tokens

Max output: 32768 tokens

deepseek/deepseek-r1-distill-llama-70b

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across...

Context: 131072 tokens

Max output: 16384 tokens

qwen/qvq-72b-preview

qwen/qwq-32b-preview

qwen/qwen-2.5-coder-32b-instruct

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in code generation, code reasoning...

Context: 128000 tokens

Max output: N/A

mistralai/codestral-2501

meta-llama/llama-3.3-70b-instruct

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...

Context: 131072 tokens

Max output: 16384 tokens

deepseek/deepseek-r1-distill-llama-3.1-70b

Pro Metered Models (21)

These premium models are available on a pay-as-you-go basis with per-token pricing.

anthropic/claude-3.7-sonnet

anthropic/claude-3.7-sonnet:thinking

deepseek/deepseek-r1

Input: $0.0000007 per token

Output: $0.0000025 per token

DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass....

Context: 163840 tokens

Max output: 16000 tokens

✗ Unmoderated

openai/gpt-4o-2024-11-20

Input: $0.0000025 per token

Output: $0.00001 per token

The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded...

Context: 128000 tokens

Max output: 16384 tokens

✓ Moderated

openai/o3-mini-high

Input: $0.0000011 per token

Output: $0.0000044 per token

OpenAI o3-mini-high is the same model as o3-mini with reasoning_effort set to high. o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and...

Context: 200000 tokens

Max output: 100000 tokens

✓ Moderated

allenai/llama-3.1-tulu-3-405b

aion-labs/aion-1.0

Input: $0.000004 per token

Output: $0.000008 per token

Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree...

Context: 131072 tokens

Max output: 32768 tokens

✗ Unmoderated

qwen/qwen-max

openai/o1

Input: $0.000015 per token

Output: $0.00006 per token

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason...

Context: 200000 tokens

Max output: 100000 tokens

✓ Moderated

x-ai/grok-2-1212

mistralai/mistral-large-2411

Input: $0.000002 per token

Output: $0.000006 per token

Mistral Large 2 2411 is an update of Mistral Large 2, released together with Pixtral Large 2411. It provides a significant upgrade on the previous Mistral Large 24.07, with notable...

Context: 131072 tokens

Max output: N/A

✗ Unmoderated

neversleep/llama-3.1-lumimaid-70b

x-ai/grok-beta

inflection/inflection-3-pi

Input: $0.0000025 per token

Output: $0.00001 per token

Inflection 3 Pi powers Inflection's Pi chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay. Pi...

Context: 8000 tokens

Max output: 1024 tokens

✗ Unmoderated

cohere/command-r-plus-08-2024

Input: $0.0000025 per token

Output: $0.00001 per token

command-r-plus-08-2024 is an update of the Command R+ with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint...

Context: 128000 tokens

Max output: 4000 tokens

✓ Moderated

ai21/jamba-1-5-large

01-ai/yi-large

neversleep/llama-3-lumimaid-70b

anthropic/claude-3-opus

anthropic/claude-3-sonnet

alpindale/goliath-120b
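
Because the metered models are billed per token, the cost of a request can be estimated from the listed input and output prices. The sketch below is illustrative only: the `PRICES` table copies three entries from the listing above, `estimate_cost` is a hypothetical helper, and it assumes billing is simply tokens multiplied by the listed price, with no minimums, rounding, or extra fees. `Decimal` is used to avoid floating-point error on such small numbers.

```python
from decimal import Decimal
from typing import Dict, Tuple

# (input price, output price) in USD per token, copied from the listing above.
PRICES: Dict[str, Tuple[Decimal, Decimal]] = {
    "deepseek/deepseek-r1": (Decimal("0.0000007"), Decimal("0.0000025")),
    "openai/gpt-4o-2024-11-20": (Decimal("0.0000025"), Decimal("0.00001")),
    "openai/o1": (Decimal("0.000015"), Decimal("0.00006")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: cost = input_tokens * input price + output_tokens * output price.
    """
    input_price, output_price = PRICES[model]
    return input_price * input_tokens + output_price * output_tokens


if __name__ == "__main__":
    # 1,000 prompt tokens and 500 completion tokens on openai/o1.
    print(estimate_cost("openai/o1", 1000, 500))
```

Under these assumptions, 1,000 input tokens and 500 output tokens on `openai/o1` come to $0.045, and one million tokens each way on `deepseek/deepseek-r1` comes to $3.20.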