Official-page-based evaluator

Paste your real work. Get the right model mix.

This site consolidates official product pages for GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, and Gemini 3.1 Flash Live, then turns those specs into a practical recommendation engine for coding, research, spreadsheets, live voice, and long-context work.

Model facts

Use this as the evidence layer before you trust the evaluator. The table separates official claims from product-fit notes so you can challenge the defaults.

Table columns: Model | Official page highlights | Pricing | Context / limits | Benchmark claims | Tool / live capabilities | Likely failure modes
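
To make the evidence layer concrete, here is a rough sketch of how one table row could be typed. The field names mirror the column headers above and are illustrative assumptions, not the site's actual schema.

    // Hypothetical shape of one row in the model facts table.
    interface ModelFactsRow {
      model: string;                  // e.g. "Claude Opus 4.6"
      officialHighlights: string[];   // claims quoted from the vendor's product page
      pricing: string;                // published pricing as stated by the vendor
      contextLimits: string;          // context window and output limits
      benchmarkClaims: string[];      // vendor-reported benchmark numbers
      toolLiveCapabilities: string[]; // tool use, live voice, multimodal support
      likelyFailureModes: string[];   // product-fit caveats, kept separate from official claims
    }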

Task evaluator

Paste a real task, choose the dominant work type, and tune cost, speed, and context sensitivity. The engine recommends one primary model and one backup.
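
As a sketch only, the evaluator's inputs might look like the following. The work types and slider names are assumptions drawn from the description above, not the engine's real API.

    // Hypothetical evaluator inputs: one pasted task plus three sensitivity sliders.
    type WorkType = "coding" | "research" | "spreadsheets" | "live-voice" | "long-context";

    interface EvaluatorInput {
      taskText: string;           // the real task you paste in
      workType: WorkType;         // the dominant work type you pick
      costSensitivity: number;    // 0..1, higher means cheaper models are weighted more
      speedSensitivity: number;   // 0..1, higher means lower latency is weighted more
      contextSensitivity: number; // 0..1, higher means long context is weighted more
    }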

Recommendation

The engine scores fit on a 100-point scale. It also exposes the tradeoff behind the recommendation so you can override it fast.
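
Continuing that sketch, a plausible way to turn those inputs into a 100-point fit score and a visible tradeoff is shown below. The weighting formula is an illustrative assumption, and the per-model ratings would come from the facts table, not from this code.

    // Placeholder per-model ratings on a 0..1 scale.
    interface ModelRatings {
      name: string;
      capability: number; // fit for the chosen work type
      cost: number;       // higher means cheaper
      speed: number;      // higher means faster
      context: number;    // higher means more usable context
    }

    function fitScore(m: ModelRatings, input: EvaluatorInput): number {
      const weights = {
        capability: 1, // capability always counts
        cost: input.costSensitivity,
        speed: input.speedSensitivity,
        context: input.contextSensitivity,
      };
      const total = weights.capability + weights.cost + weights.speed + weights.context;
      const raw =
        weights.capability * m.capability +
        weights.cost * m.cost +
        weights.speed * m.speed +
        weights.context * m.context;
      return Math.round((raw / total) * 100); // normalize to the 100-point scale
    }

    function recommend(models: ModelRatings[], input: EvaluatorInput) {
      const ranked = [...models].sort((a, b) => fitScore(b, input) - fitScore(a, input));
      const [primary, backup] = ranked;
      return {
        primary,
        backup,
        // Surface the tradeoff so the recommendation is easy to override.
        tradeoff: `${primary.name} scores ${fitScore(primary, input)} vs ` +
          `${backup.name} at ${fitScore(backup, input)} under your weights.`,
      };
    }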

Side-by-side prompt runners

These panes generate model-specific starter prompts and evaluation checklists. Paste them into your own playgrounds or vendor consoles, then score outputs consistently.
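
If you prefer to script the shootout instead of copying by hand, a minimal generator could look like this. The prompt wording and checklist items are defaults to adapt, not the panes' exact output.

    // Generates a starter prompt you can paste into a vendor console or playground.
    function starterPrompt(model: string, task: string): string {
      return [
        `Task: ${task}`,
        "Show your working, state assumptions explicitly, and flag anything you are unsure about.",
        `(Run this in the ${model} console, then score the output against the checklist.)`,
      ].join("\n");
    }

    // A default evaluation checklist, applied identically to every model's output.
    const checklist: string[] = [
      "Did it follow the stated constraints?",
      "Are factual claims sourced or clearly hedged?",
      "Is the output directly usable, or does it need rework?",
    ];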

Scorecards

Use a repeatable rubric after each run. The default scorecards balance observed delivery quality with operational fit.
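
A minimal scorecard sketch, assuming weighted criteria averaged into a single number. The criterion names and weights below are placeholders to tune to your own priorities, not the site's fixed rubric.

    interface Criterion {
      name: string;
      weight: number; // relative importance
      score: number;  // 1..5, filled in after you review the run
    }

    // Weighted average on a 1..5 scale so runs stay comparable over time.
    function scorecardTotal(criteria: Criterion[]): number {
      const weightSum = criteria.reduce((sum, c) => sum + c.weight, 0);
      const weighted = criteria.reduce((sum, c) => sum + c.weight * c.score, 0);
      return weighted / weightSum;
    }

    // Placeholder defaults: delivery quality weighted against operational fit.
    const defaultScorecard: Criterion[] = [
      { name: "delivery quality", weight: 3, score: 0 },
      { name: "instruction following", weight: 2, score: 0 },
      { name: "operational fit (cost, speed)", weight: 2, score: 0 },
    ];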

Best team of models

This router splits a workflow into subtasks and assigns each to the model most likely to perform well for that slice of work.
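
Under the hood this amounts to a split-and-assign step. A hypothetical version, reusing the WorkType union from the evaluator sketch; the strength numbers are illustrative placeholders, not vendor claims.

    interface Subtask {
      description: string;
      kind: WorkType; // the slice of work this subtask represents
    }

    // Placeholder strength table: illustrative defaults only.
    const strengths: Record<string, Partial<Record<WorkType, number>>> = {
      "GPT-5.4": { coding: 0.9, research: 0.8 },
      "Claude Opus 4.6": { coding: 0.9, "long-context": 0.9 },
      "Gemini 3.1 Pro": { research: 0.85, spreadsheets: 0.8, "long-context": 0.9 },
      "Gemini 3.1 Flash Live": { "live-voice": 0.95 },
    };

    // Assigns each subtask to the model with the highest strength for its kind.
    function assign(subtask: Subtask): string {
      let best = "";
      let bestScore = -1;
      for (const [model, byKind] of Object.entries(strengths)) {
        const score = byKind[subtask.kind] ?? 0;
        if (score > bestScore) {
          best = model;
          bestScore = score;
        }
      }
      return best;
    }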

Routed plan

The router is opinionated by default: give each specialized model its strongest lane, then hand off to a synthesizer if the pieces need to be merged.
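
A sketch of that default policy, building on the assign function above. The synthesizer default here is a placeholder, not a recommendation.

    interface PlanStep {
      subtask: Subtask;
      model: string;
    }

    // Route each subtask to its strongest lane, then add a synthesis hand-off
    // only when more than one model contributed.
    function routedPlan(subtasks: Subtask[], synthesizer = "Claude Opus 4.6"): PlanStep[] {
      const steps: PlanStep[] = subtasks.map((subtask) => ({ subtask, model: assign(subtask) }));
      const contributors = new Set(steps.map((step) => step.model));
      if (contributors.size > 1) {
        steps.push({
          subtask: { description: "merge and reconcile the subtask outputs", kind: "long-context" },
          model: synthesizer,
        });
      }
      return steps;
    }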

How to use this in practice

Start with the evaluator for a quick pick. Use prompt runners when you want a fair shootout on the same task. Use the router for compound workflows where no single model should own everything.