
A Microsoft MVP helping develop careers, scale and grow businesses by empowering everyone to achieve more with Microsoft 365
The YouTube video by Daniel Anderson [MVP] examines how model choice in Copilot Cowork changes real costs for identical tasks. In the demonstration, Anderson runs the same prompt against three models and records the resulting bills. The results show a surprising spread in cost, which matters for teams running many tasks. Consequently, the video warns decision makers to pick models deliberately rather than by habit.
The test compares outputs for an identical CSV of 2025 sales figures and three requested artefacts: an Excel workbook, a PowerPoint deck, and an interactive HTML dashboard. Each run produced the same deliverables but with different billed amounts. This makes the cost difference easier to compare because output quality and format stayed consistent. Therefore, the focus is squarely on price per model under real-world conditions.
Anderson reports that Sonnet 4.6 returned a bill of $3.98, Opus 4.8 $4.77, and GPT 5.5 $10.69 for the same prompt. That represents about a 2.7x spread between the cheapest and the most expensive run on identical work. The video reinforces that model choice can compound over time, especially in pipelines that run many tasks every day. Thus, even small per-task differences quickly become large line items on a monthly bill.
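The spread quoted above can be checked with a few lines of arithmetic. The sketch below uses only the three bills reported in the video; the variable names are illustrative.

```python
from decimal import Decimal

# Bills reported in the video for the same prompt (USD).
bills = {
    "Sonnet 4.6": Decimal("3.98"),
    "Opus 4.8": Decimal("4.77"),
    "GPT 5.5": Decimal("10.69"),
}

cheapest = min(bills, key=bills.get)
priciest = max(bills, key=bills.get)

# Ratio of the most expensive run to the cheapest run.
spread = bills[priciest] / bills[cheapest]
# Absolute difference per task between those two runs.
gap = bills[priciest] - bills[cheapest]

for model, bill in bills.items():
    ratio = bill / bills[cheapest]
    print(f"{model}: ${bill} ({ratio:.2f}x the cheapest run)")

print(f"Spread: {spread:.2f}x, gap per task: ${gap}")
```

On these figures the most expensive run costs $6.71 more per task than the cheapest one.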
The clip also includes short chapter timestamps that guide viewers through why model choice matters, the available models, and step-by-step runs with cost checks. Anderson demonstrates using the /cost command after each run to show the bill in real time. This practical approach makes the cost mechanics tangible for viewers who manage budgets. As a result, the video serves both as an experiment and a how-to for cost awareness.
Copilot Cowork moved to general availability on 16 June 2026 and uses pay-as-you-go pricing where each Copilot Credit costs $0.01. Anderson highlights that model selection now directly affects your bill because the model picker records which model runs each task. Moreover, the picker is a single click, so users may change models frequently without realizing the cumulative cost. Therefore, teams must understand cost drivers beyond model choice alone.
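To relate dollar bills to Copilot Credits, the stated price of $0.01 per credit can be applied directly. This is a minimal sketch that assumes the bill shown by /cost consists purely of credits at that rate; the helper name `usd_to_credits` is hypothetical and not part of any Copilot tooling.

```python
from decimal import Decimal

# Price per Copilot Credit as stated in the video (USD).
CREDIT_PRICE_USD = Decimal("0.01")


def usd_to_credits(bill_usd: Decimal) -> int:
    """Convert a dollar bill into whole credits (assumes the bill is credits only)."""
    return int(bill_usd / CREDIT_PRICE_USD)


# Bills reported in the video for the three runs.
for model, bill in [
    ("Sonnet 4.6", Decimal("3.98")),
    ("Opus 4.8", Decimal("4.77")),
    ("GPT 5.5", Decimal("10.69")),
]:
    print(f"{model}: {usd_to_credits(bill)} credits")
```

`Decimal` is used rather than `float` so that dividing by 0.01 yields exact credit counts.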
The video explains that model choice is one of four main cost drivers, along with context retrieval, tool calls, and runtime. Each of these factors can increase the credits consumed during a task, so they interact with model efficiency. For example, long context retrieval and heavy tool use can magnify the gap between models that emit fewer tokens and models that emit more. Consequently, assessing all four elements together gives a clearer view of total cost of ownership.
Anderson discusses tradeoffs: GPT 5.5 carries a higher per-token price but often uses far fewer output tokens, which can make it cost-efficient on tasks that require more compact responses. In contrast, Opus tends to score higher on certain reasoning benchmarks and may produce richer single-turn reasoning, albeit at higher token usage. Meanwhile, Sonnet aims to balance quality and price, often providing a middle ground and becoming a practical default for many developers. These differences mean teams must weigh quality, speed, and token efficiency when choosing a model.
The video underscores specific challenges. For instance, a model that uses fewer tokens might still miss nuanced edge cases, while a model with better reasoning may use more tokens and cost more. Also, models that excel at long-context retrieval or agentic workflows may reduce failures and reruns, which saves time and money overall. Thus, the best choice depends on task type: high-volume, repetitive automation favors token efficiency, while complex refactors or migrations may justify higher cost for better reasoning.
Anderson’s main recommendation is to test deliberately and track costs. He shows how to run the same prompt on different models and use the /cost check after each run, so teams can measure real billing differences. Over a month, these per-task gaps compound, so small tests reveal large budget impacts. Accordingly, teams should adopt a pilot process before defaulting to a single model across projects.
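How per-task gaps compound over a month can be estimated with a simple projection. The per-task bills below come from the video; the workload (20 tasks per day over 22 working days) is a hypothetical assumption chosen only for illustration, and it further assumes every task costs the same as the test prompt.

```python
from decimal import Decimal

# Per-task bills reported in the video (USD).
BILLS = {
    "Sonnet 4.6": Decimal("3.98"),
    "Opus 4.8": Decimal("4.77"),
    "GPT 5.5": Decimal("10.69"),
}

# Hypothetical workload, NOT from the video.
TASKS_PER_DAY = 20
WORKING_DAYS_PER_MONTH = 22


def monthly_cost(bill_per_task, tasks_per_day=TASKS_PER_DAY,
                 days=WORKING_DAYS_PER_MONTH):
    """Project a monthly bill, assuming every task costs the same as the test run."""
    return bill_per_task * tasks_per_day * days


monthly = {model: monthly_cost(bill) for model, bill in BILLS.items()}
for model, total in monthly.items():
    print(f"{model}: ${total:,.2f} per month")

# Difference between the most and least expensive monthly totals.
gap = max(monthly.values()) - min(monthly.values())
print(f"Monthly gap between cheapest and most expensive: ${gap:,.2f}")
```

Substituting a team's own task volume and measured /cost figures would give a more realistic estimate than these placeholder numbers.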
Finally, the video encourages transparency and governance around model selection. Teams should document which model suits each workflow, monitor credits, and consider automated rules that choose models by task category. Although choosing the cheapest model may look attractive, managers must balance it against reliability, latency, and the cost of rework. By weighing these tradeoffs and running targeted tests, organizations can make informed choices that match performance needs to budget limits.
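One way to picture "automated rules that choose models by task category" is a small lookup table with a default. The sketch below is purely illustrative: the category names, the mapping, and the `choose_model` function are hypothetical and do not correspond to any Copilot Cowork API, and the assignments should come from a team's own test results rather than from this example.

```python
# Hypothetical routing table: task category -> model name.
# The assignments are placeholders; replace them with the outcome of your own tests.
MODEL_RULES = {
    "high_volume_automation": "Sonnet 4.6",  # cheapest run in the video's test
    "complex_refactor": "Opus 4.8",          # assumed to favour stronger reasoning
    "migration": "Opus 4.8",
}

# Fallback when a task category has no explicit rule.
DEFAULT_MODEL = "Sonnet 4.6"


def choose_model(task_category: str) -> str:
    """Return the documented model for a category, or the default if none is set."""
    return MODEL_RULES.get(task_category, DEFAULT_MODEL)


for category in ("high_volume_automation", "complex_refactor", "ad_hoc_question"):
    print(f"{category}: {choose_model(category)}")
```

Keeping the rules in one table also doubles as the documentation of which model suits each workflow.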