Best LLMs for Writing in 2026

Aggregated benchmark data across EQ-Bench Creative Writing, LMArena Text, and Artificial Analysis — covering 25 models, updated weekly.

Last updated: March 8, 2026 · 25 models tracked · 3 tiers: Premium · Mid-Range · Budget

How we rank models for writing

This ranking combines three independent data sources (EQ-Bench, LMArena, and Artificial Analysis) because no single benchmark captures writing quality across frontier LLMs. The aggregated signals are:

EQ Creative (EQ-Bench Creative Writing): a specialist benchmark that uses trained raters to assess narrative quality, emotional depth, prose style, and character voice. Elo scale ~1400–1940. The most relevant signal for marketing copy, long-form content, and creative work.
Arena Text (LMArena Text): a crowd-sourced human preference leaderboard. Broad signal across all text tasks: a model that consistently wins votes is generally pleasant, clear, and useful to read. Elo scale ~1460–1510.
EQ General (EQ-Bench General): measures emotional intelligence in roleplay scenarios. A proxy for character voice quality and tonal control, useful for brand voice work. Note: high EQ-General does not automatically mean strong creative writing; interpret alongside EQ Creative.
Speed (Artificial Analysis): median output tokens per second across providers. Matters for iterative draft workflows where waiting costs time. ~75 tokens ≈ 55 words.

Prices are per 1M tokens (input / output) and reflect standard API pricing. A dash (—) means the model has not yet appeared on that leaderboard — never an estimated or interpolated value.
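
The rule that a dash is never an estimated value can be sketched in code. The following is a minimal sketch, assuming table cells arrive as strings such as "1,936.2", "50 t/s", or "$3.00"; the function name parse_cell is hypothetical and is not part of any tool named on this page.

```python
from typing import Optional

# The dash character the table uses for "not on this leaderboard".
MISSING = "\u2014"

def parse_cell(cell: str) -> Optional[float]:
    """Convert a table cell to a float, keeping missing values missing.

    Returns None for a dash or an empty cell, so a missing score is
    never treated as 0 and never replaced by an estimate.
    """
    text = cell.strip()
    if text == MISSING or text == "":
        return None
    # Drop thousands separators, a leading dollar sign, and any trailing unit.
    number = text.replace(",", "").lstrip("$").split()[0]
    return float(number)
```

Returning None rather than 0 keeps missing cells out of any later sorting or averaging unless the caller handles them explicitly.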

EVY uses this data automatically

Instead of picking one model and hoping it fits every task, EVY routes each writing request — brand copy, long-form content, quick social posts — to the model best suited for that specific job. You get top-tier output without managing a single API key.

Model	Tier	Provider	Coverage	EQ Creative	Arena Text	EQ General	Speed	Input / 1M	Output / 1M
Claude Sonnet 4.6	Premium	Anthropic	EQ-Bench	1,936.2	—	1,890.9	50 t/s	$3.00	$15.00
Claude Opus 4.6	Premium	Anthropic	Consensus	1,931.7	1,504	1,874.8	45 t/s	$5.00	$25.00
*gpt-5.3-chat	Mid-Range	OpenAI	EQ-Bench	1,816.8	—	1,402.6	—	—	—
claude-sonnet-4.5	Premium	Anthropic	EQ-Bench	1,744.5	—	1,529.4	—	$3.00	$15.00
claude-opus-4-5-20251101	Premium	Anthropic	EQ-Bench	1,733.7	—	1,619.6	—	$5.00	$25.00
O3	Mid-Range	OpenAI	EQ-Bench	1,730.9	—	1,500	—	$2.00	$8.00
Kimi K2	Mid-Range	Moonshot AI	EQ-Bench	1,661.4	—	1,601.8	44 t/s	$0.55	$2.20
openrouter/horizon-alpha	Mid-Range	Unknown	EQ-Bench	1,633.6	—	1,545.2	—	—	—
GLM-5	Budget	Zhipu AI	EQ-Bench	1,626.1	—	1,650.2	80 t/s	$0.80	$2.50
claude-opus-4	Premium	Anthropic	EQ-Bench	1,619.7	—	1,421.9	—	$15.00	$75.00
GPT-5.2	Premium	OpenAI	Consensus	1,593.6	1,480	1,607.4	—	$1.25	$10.00
Kimi K2.5	Mid-Range	Moonshot AI	EQ-Bench	1,542.3	—	1,563.1	—	—	—
*moonshotai/Kimi-K2.5	Mid-Range	Moonshot AI	EQ-Bench	1,542.3	—	1,563.1	—	—	—
DeepSeek V3.2	Budget	DeepSeek	EQ-Bench	1,495.8	—	—	—	$0.28	$0.42
Gemini 3 Pro	Mid-Range	Google	Consensus	1,474.5	1,486	1,559.2	80 t/s	$2.00	$12.00
Qwen3-235B	Budget	Alibaba	EQ-Bench	1,459	—	1,233.8	—	$0.18	$0.54
Gemini 3.1 Pro	Mid-Range	Google	EQ-Bench	1,447.9	—	1,546	—	$2.11	$12.66
Mistral Medium 3	Budget	Mistral AI	EQ-Bench	1,445.3	—	—	—	$0.40	$2.00
GPT-4o	Premium	OpenAI	EQ-Bench	1,443	—	1,393	185 t/s	$2.50	$10.00
GLM-4.7	Budget	Zhipu AI	EQ-Bench	1,363.4	—	1,442.4	—	$0.38	$1.70
MiniMax M2.5	Budget	MiniMax	EQ-Bench	1,295.2	—	—	395 t/s	$0.30	$1.20
GPT-5.4	Premium	OpenAI	No Data	—	—	—	—	—	—
Gemini 3 Flash	Mid-Range	Google	Arena	—	1,473	—	250 t/s	$0.50	$3.00
Gemini 3.1 Flash-Lite	Budget	Google	No Data	—	—	—	—	$0.25	$1.50
Grok 4.1	Mid-Range	xAI	Arena	—	1,473	—	163 t/s	$0.20	$0.50

The right model depends on the task

Benchmark leaderboards rank models globally — but the best model for a 2,000-word thought leadership article is not necessarily the best model for a 15-word social media headline. Here's how the leading models split across common writing tasks:

Narrative & long-form

Thought leadership, case studies, email newsletters, ghostwriting. Requires emotional depth, tonal consistency, and the ability to sustain voice across thousands of words.

Best picks: Claude Sonnet 4.6 · Claude Opus 4.6

Structured commercial copy

Product descriptions, landing pages, ad copy, LinkedIn posts. Requires clarity, persuasion structure, and format adherence more than creative flair.

Best picks: GPT-5.2 · Claude Sonnet 4.6

High-volume / fast drafts

Social media scheduling, meta descriptions, bulk content variation. Speed and cost matter more than peak quality; fast iteration wins here.

Best picks: Gemini 3 Flash · Grok 4.1 · Kimi K2

Brand voice & consistency

Any content where staying on-brand is non-negotiable. Requires strong instruction-following, tonal control, and memory of brand guidelines.

Best picks: Claude Sonnet 4.6 · Gemini 3.1 Pro
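
The four task categories and their best picks can be written down as a simple lookup. This is a minimal sketch only: the category keys, the pick_model function, and the fallback order are hypothetical illustrations and do not describe how EVY or any other product actually routes requests.

```python
from typing import Optional, Set

# Best picks per task type, in the order listed above.
# The dictionary keys are hypothetical labels for the four sections.
BEST_PICKS = {
    "narrative_long_form": ["Claude Sonnet 4.6", "Claude Opus 4.6"],
    "structured_commercial_copy": ["GPT-5.2", "Claude Sonnet 4.6"],
    "high_volume_fast_drafts": ["Gemini 3 Flash", "Grok 4.1", "Kimi K2"],
    "brand_voice_consistency": ["Claude Sonnet 4.6", "Gemini 3.1 Pro"],
}

def pick_model(task: str, available: Optional[Set[str]] = None) -> str:
    """Return the first listed pick for a task that the caller can use.

    `available` is the set of model names the caller has access to;
    None means every model is usable. Unknown tasks raise KeyError.
    """
    for name in BEST_PICKS[task]:
        if available is None or name in available:
            return name
    raise LookupError(f"no available model for task {task!r}")
```

For example, pick_model("high_volume_fast_drafts", {"Kimi K2"}) falls through the first two picks and returns "Kimi K2".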

Managing this complexity manually — four API keys, four pricing tiers, a decision tree for every task type — is exactly the overhead that kills creative momentum. EVY eliminates the routing problem entirely.

EVY picks the right model.
Every time, for every task.

EVY is an AI co-creator that runs inside any app on your Mac. Press the EVY-key, speak your idea, and EVY routes it to the ideal model — then writes, edits, or transforms it into finished content in your brand voice.

Frequently asked questions

Which LLM is best for creative writing in 2026?

Claude Sonnet 4.6 (Anthropic) leads the EQ-Bench Creative Writing leaderboard with an Elo score of 1936 as of March 2026, followed closely by Claude Opus 4.6 at 1932. Both excel at narrative quality, emotional depth, and character voice — the core skills that separate great writing from generic AI output.

What is EQ-Bench and why does it matter for writing?

EQ-Bench is an independent benchmark that evaluates large language models on emotional intelligence and narrative quality, using a panel of human raters. Its Creative Writing sub-leaderboard specifically measures story quality, emotional resonance, and prose style — making it the most relevant benchmark for marketing copy, long-form content, and creative work. Scores are on an Elo scale where higher is better, typically ranging from ~1400 to ~1940.

What is LMArena Text and how is it different from EQ-Bench?

LMArena Text (formerly LMSYS Chatbot Arena) measures human preference through head-to-head votes: two anonymous models answer the same prompt, and users pick the better response. It's a broad preference signal across all text tasks, not just writing. EQ-Bench Creative Writing is narrower and more specialist — it specifically evaluates narrative and emotional writing quality with trained raters rather than crowd votes.

Which LLM is the best value for writing tasks?

Kimi K2 by Moonshot AI offers the best performance-per-dollar for writing: an EQ-Bench Creative score of 1661 and an EQ-General score of 1602 at just $0.55 input / $2.20 output per 1M tokens. That is roughly 5 to 7× cheaper than Claude Sonnet 4.6, with ~86% of its EQ Creative score. GLM-5 (Zhipu AI) is another strong value option at $0.80/$2.50 with scores of 1626 EQ Creative and 1650 EQ General.
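
The value comparison can be reproduced from the table rows for Kimi K2 and Claude Sonnet 4.6. This is a minimal sketch, assuming the table values as of this update; the Model class and compare function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    eq_creative: float   # EQ-Bench Creative Writing Elo
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens

def compare(candidate: Model, baseline: Model) -> dict:
    """Express a candidate's score and prices relative to a baseline."""
    return {
        # Share of the baseline's EQ Creative score (a rough proxy only).
        "score_share": candidate.eq_creative / baseline.eq_creative,
        # How many times cheaper the candidate is on input and output.
        "input_price_ratio": baseline.input_price / candidate.input_price,
        "output_price_ratio": baseline.output_price / candidate.output_price,
    }

# Values copied from the table rows for these two models.
sonnet = Model("Claude Sonnet 4.6", 1936.2, 3.00, 15.00)
kimi = Model("Kimi K2", 1661.4, 0.55, 2.20)
result = compare(kimi, sonnet)
```

With these inputs the score share is about 0.86, and the price ratios are about 5.5 on input and 6.8 on output. Dividing Elo scores is a coarse shorthand, since Elo is defined by score differences rather than ratios.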

How often is this ranking updated?

Scores are updated weekly via an automated scraper that fetches the latest data from EQ-Bench and LMArena. Prices are reviewed manually and updated when providers announce changes. The 'Updated weekly' badge in the table header shows the date of the last successful update.

What does 'tokens per second' mean for writing?

Tokens per second (t/s) measures how fast a model outputs text; 75 tokens is about 55 words. For writing workflows, speed matters when you need rapid iteration on drafts or real-time dictation-to-copy conversion. MiniMax M2.5 is the fastest tracked model at 395 t/s at $0.30 / $1.20 per 1M tokens; Gemini 3 Flash is second at 250 t/s at $0.50 / $3.00.
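
The conversion from tokens per second to drafting time is simple arithmetic. This is a minimal sketch, assuming the 75 tokens ≈ 55 words rule of thumb from this page and ignoring network latency and time to first token; the function names are hypothetical.

```python
# Rule of thumb from this page: 75 tokens is about 55 words.
WORDS_PER_TOKEN = 55 / 75

def words_per_second(tokens_per_second: float) -> float:
    """Approximate output speed in words per second."""
    return tokens_per_second * WORDS_PER_TOKEN

def seconds_for_draft(word_count: int, tokens_per_second: float) -> float:
    """Approximate generation time for a draft of `word_count` words."""
    return word_count / words_per_second(tokens_per_second)
```

Under these assumptions, a 2,000-word article takes about 55 seconds at 50 t/s (Claude Sonnet 4.6) and about 7 seconds at 395 t/s (MiniMax M2.5).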

Does the best LLM for writing change depending on the task?

Yes — significantly. Claude Sonnet 4.6 and Claude Opus 4.6 lead on narrative and emotional writing. GPT-5.2 performs better on structured commercial copy where format consistency matters. Faster models like Gemini 3 Flash or Grok 4.1 suit high-volume, lower-stakes content. EVY handles this complexity automatically: it routes each writing request to the most suitable model based on task type, length, and brand requirements.

Can I use multiple LLMs for writing without switching between tools?

Yes — EVY runs on all major LLMs and automatically selects the best model for each task. You speak or type your request once; EVY decides whether the task calls for Claude's narrative depth, GPT-5.2's structured output, or a faster budget model for quick drafts. No API keys, no model-switching — EVY handles the routing silently in the background.
