Cosmos DB: Tune for Speed and Savings
Databases
Apr 7, 2026 1:00 AM

by HubSite 365 about John Savill's [MVP]

Principal Cloud Solutions Architect

Optimize Azure Cosmos DB with Microsoft expert tips on RUs, autoscale, and partition key provisioning for better performance and lower cost.

Key insights

  • Cosmos DB optimization summary from John Savill's video: practical steps to cut latency and cost by improving queries, indexing, and partitioning.
    Main focus: reduce RUs (Request Units) and scale predictably while keeping queries fast.
  • Optimistic Direct Execution (ODE) speeds up single-partition queries by skipping extra client-side planning and rewrites.
    Use it for queries that specify a partition key or run on a single physical partition, but avoid it for multi-partition or paginated queries.
  • Global secondary index and smarter search reduce cross-partition "fan out" and improve query efficiency.
    Newer features like vector search and full-text/hybrid search help build faster similarity and text queries.
  • Choose the right Partition key to spread data evenly and prevent hot partitions that throttle throughput.
    Prefer high-cardinality keys and design queries to stay within a single partition when possible.
  • Understand throughput modes: Autoscale, manual, free/serverless, and provisioned throughput each suit different workloads.
    Autoscale helps variable workloads, while provisioned or reserved capacity fits steady, predictable loads; watch account-level throughput limits.
  • Practical RU and data design tips: tune your Indexing policy, reduce document size, and use composite indexes for multi-field filters.
    Monitor Request Units (RUs) and workload patterns (write-heavy, storage-heavy, read-heavy) to pick the right capacity and minimize cost.

Overview of the video and objectives

In a clear and practical session, John Savill's [MVP] explores how to optimize Cosmos DB deployments and data models for better performance and cost efficiency. He structures the talk into focused chapters that move from service choices to design patterns, and finally to hands-on tuning techniques. Consequently, the video serves both as an introduction for newcomers and as a focused update for experienced engineers seeking recent improvements.

Throughout the presentation, Savill emphasizes measurable outcomes such as reduced latency and lower cost per operation, often expressed in Request Units (RUs). He combines conceptual guidance with configuration tips and practical examples to show where savings and speed-ups typically come from. Therefore, viewers can expect actionable advice alongside explanations of tradeoffs.

Service options, throughput modes, and RUs

Savill walks through the available service options and throughput modes, explaining the differences between serverless, provisioned, and autoscale models. He highlights how choice of mode directly affects operational cost and elasticity, and he explains how each mode maps to different workload patterns. For example, low and unpredictable traffic often suits serverless, while steady, predictable throughput benefits from provisioned capacity.
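The serverless-versus-provisioned tradeoff comes down to arithmetic: provisioned capacity bills for allocated RU/s around the clock, while serverless bills only for RUs actually consumed. The sketch below compares the two for a month; the unit prices are illustrative placeholders, not current Azure list prices, so check the Azure pricing page for your region before drawing conclusions.

```python
# Illustrative comparison of provisioned vs. serverless billing for a month.
# Both unit prices below are assumptions for the sake of the example.

PROVISIONED_PRICE_PER_100RUS_HOUR = 0.008   # assumed: $ per 100 RU/s per hour
SERVERLESS_PRICE_PER_MILLION_RUS = 0.25     # assumed: $ per 1M consumed RUs
HOURS_PER_MONTH = 730

def provisioned_monthly_cost(provisioned_rus: int) -> float:
    """Cost of keeping a fixed RU/s allocation up all month."""
    return provisioned_rus / 100 * PROVISIONED_PRICE_PER_100RUS_HOUR * HOURS_PER_MONTH

def serverless_monthly_cost(consumed_rus: int) -> float:
    """Cost of paying only for the RUs actually consumed."""
    return consumed_rus / 1_000_000 * SERVERLESS_PRICE_PER_MILLION_RUS

# A bursty workload consuming 50M RUs/month is far cheaper serverless;
# a workload that genuinely needs 1,000 RU/s all month favors provisioned.
print(f"provisioned 1000 RU/s: ${provisioned_monthly_cost(1000):.2f}/month")
print(f"serverless 50M RUs:    ${serverless_monthly_cost(50_000_000):.2f}/month")
```

The crossover point depends entirely on how steady the workload is, which is exactly why Savill says the mode choice must follow the workload pattern.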

In addition, the video dives into RUs as the central unit for pricing and throttling in Cosmos DB. Savill explains how reads, writes, and queries consume RUs differently and why understanding those costs is essential for sound capacity planning. Consequently, he recommends monitoring RU consumption and aligning application behavior to reduce expensive operations.
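Monitoring RU consumption per operation type is straightforward to sketch. A minimal local aggregator like the one below (the operation names and charge values are made up for illustration) makes the expensive operation class stand out; in the azure-cosmos Python SDK, the charge for the last call is exposed through the `x-ms-request-charge` response header.

```python
from collections import defaultdict

class RuTracker:
    """Accumulates per-operation-type RU charges so hot spots stand out.
    Feed it whatever charges your client observes (e.g. the value of the
    x-ms-request-charge response header after each call)."""

    def __init__(self) -> None:
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def record(self, operation: str, charge: float) -> None:
        self.totals[operation] += charge
        self.counts[operation] += 1

    def average(self, operation: str) -> float:
        return self.totals[operation] / self.counts[operation]

    def most_expensive(self) -> str:
        return max(self.totals, key=self.totals.get)

tracker = RuTracker()
tracker.record("point-read", 1.0)            # a 1 KB point read costs about 1 RU
tracker.record("cross-partition-query", 48.7)
tracker.record("cross-partition-query", 52.3)
print(tracker.most_expensive())              # the fan-out query dominates
```

Seeing that a single query class consumes most of the budget is usually the trigger to revisit partitioning or indexing, per Savill's advice.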

He also outlines autoscale and manual scaling options and their implications for risk and cost control. Autoscale can absorb spikes with less operational work but may yield higher peak costs if not tuned properly. Conversely, manual or provisioned modes can deliver lower steady-state costs but require careful forecasting to avoid throttling.
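The "higher peak costs" point can be made concrete. Autoscale bills each hour at the highest RU/s the system scaled to during that hour, never below 10% of the configured maximum. A sketch under those rules (the traffic shape is invented for illustration):

```python
def autoscale_billed_rus(hourly_peaks, max_rus):
    """Sum the billable RU/s across hours under autoscale rules:
    each hour is billed at its observed peak, with a floor of 10%
    of the configured maximum RU/s."""
    floor = max_rus * 0.10
    return sum(max(peak, floor) for peak in hourly_peaks)

# A spiky day: mostly idle (billed at the 10% floor) plus two busy hours.
peaks = [100] * 22 + [4000, 9500]
billed = autoscale_billed_rus(peaks, max_rus=10_000)
print(billed)  # 22 idle hours billed at the 1,000 RU/s floor, plus the peaks
```

An oversized autoscale maximum raises the idle-hour floor, which is one concrete way an untuned autoscale setting inflates cost.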

Partitioning and indexing: design fundamentals

A significant portion of the video focuses on partitioning strategy and indexing, which Savill calls the “foundation” of performance. He describes how a carefully chosen partition key distributes data and traffic, preventing hot partitions that create bottlenecks and spikes in RU consumption. Therefore, selecting a high-cardinality, evenly distributed key is essential for large-scale deployments.
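Skew toward a hot partition is easy to quantify before it bites. A quick sketch (key choices and threshold are illustrative): measure what fraction of documents land on the most popular logical partition key value.

```python
from collections import Counter

def partition_skew(keys) -> float:
    """Fraction of documents landing on the hottest logical partition
    key value. High skew across many keys signals hot-partition risk."""
    counts = Counter(keys)
    return max(counts.values()) / len(keys)

# Low-cardinality key: a country code concentrates traffic on one value.
by_country = ["US"] * 80 + ["DE"] * 15 + ["JP"] * 5
# High-cardinality key: per-user IDs spread load evenly.
by_user = [f"user-{i}" for i in range(100)]
print(partition_skew(by_country))  # 0.8 -- one partition takes 80% of traffic
print(partition_skew(by_user))     # 0.01 -- evenly spread
```

Running a check like this against a sample of production documents gives an early, cheap signal on a candidate partition key before committing to it.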

Moreover, Savill addresses indexing policies and recommends customizing indexes to match query patterns rather than relying on the default full indexing. By excluding unused properties and adding composite indexes where needed, teams can reduce RU overhead and improve query latency. However, he also warns that overly aggressive index pruning can complicate later queries and require index rebuilds.
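A customized indexing policy of the kind described might look like the sketch below, shown as the JSON shape the Cosmos DB NoSQL API accepts (here as a Python dict, e.g. for the `indexing_policy` argument when creating a container). The property paths `/payload/*`, `/category`, and `/price` are illustrative, not from the video.

```python
# Sketch of a trimmed indexing policy: index everything except a bulky
# nested property that is never filtered on, and add a composite index
# to serve a two-field ORDER BY without extra RU cost.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [
        {"path": "/payload/*"},      # large blob, never queried
        {"path": '/"_etag"/?'},
    ],
    "compositeIndexes": [
        [
            {"path": "/category", "order": "ascending"},
            {"path": "/price", "order": "descending"},
        ]
    ],
}
```

Excluded paths cut write RU cost (fewer index updates per write), while the composite index cuts read RU cost for the specific multi-field query it matches, which is the balance Savill describes.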

Document size and schema design also influence both storage cost and RU usage, according to the video. Smaller, focused documents reduce read cost, while large documents or heavy nested structures can increase the RU price of simple operations. Thus, designers must balance normalization and denormalization with access patterns in mind.
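The size effect is easy to check before deploying a schema. A rough sketch (the two example documents are invented): approximate stored size as compact-JSON byte length and compare an embedded-history design with a reference-only design.

```python
import json

def doc_size_bytes(doc: dict) -> int:
    """Approximate stored size as the compact-JSON byte length."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))

# Embedding the full order history inflates every point read of the
# profile; keeping only a count (with orders in their own documents)
# keeps the profile near the 1 KB / ~1 RU point-read sweet spot.
fat = {"id": "u1", "orders": [{"sku": f"sku-{i}", "qty": 1} for i in range(200)]}
slim = {"id": "u1", "orderCount": 200}
print(doc_size_bytes(fat), doc_size_bytes(slim))
```

Which shape wins depends on access patterns: if the history is always read with the profile, embedding saves a query; if not, the slim document saves RUs on every read.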

New optimizations: ODE, search, and vector features

Savill highlights recent engine-level improvements such as Optimistic Direct Execution (ODE), which speeds up many single-partition queries by skipping client-side plan generation and rewrite steps. He explains that ODE shines for queries that target a specific partition and include operations like GROUP BY, ORDER BY, DISTINCT, and aggregations. Nevertheless, he cautions that ODE may not help for multi-partition queries or heavy pagination, where it can sometimes increase costs.
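A query shaped to benefit from single-partition execution pins itself to one partition key value. The sketch below just builds the query specification (the `tenantId` property is an illustrative partition key); in the azure-cosmos Python SDK, the same pinning is done by passing the `partition_key` argument to `query_items`.

```python
def single_partition_query(pk_value: str) -> dict:
    """Build a parameterized aggregation query pinned to one partition
    key value. Supplying the partition key alongside execution makes
    the query eligible for single-partition fast paths such as ODE."""
    return {
        "query": ("SELECT c.category, COUNT(1) AS n FROM c "
                  "WHERE c.tenantId = @pk GROUP BY c.category"),
        "parameters": [{"name": "@pk", "value": pk_value}],
    }

spec = single_partition_query("tenant-42")
print(spec["query"])
```

The same GROUP BY issued without a partition key filter would fan out across partitions and fall outside the cases where ODE helps, matching Savill's caution.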

The video also notes expanded search and indexing features, including a form of global secondary index and general availability of vector search for similarity queries. These capabilities reduce cross-partition fan-out and enable advanced search scenarios, but Savill points out that they add complexity to indexing strategy and may carry additional compute or storage cost. Consequently, teams should evaluate whether richer search functionality justifies the extra configuration and RU impact.
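For orientation, vector search configuration in the NoSQL API involves a container-level embedding policy plus a vector index entry in the indexing policy. The sketch below shows the general JSON shape as Python dicts; the path, dimensions, and index type are illustrative, so verify the exact schema against the current Cosmos DB vector search documentation.

```python
# Sketch (illustrative values): declare where embeddings live and how to
# compare them, then add a matching vector index entry.
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",          # property holding the vector
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 1536,            # must match the embedding model
        }
    ]
}

# Goes inside the container's indexing policy alongside included/excluded paths.
vector_indexes = [{"path": "/embedding", "type": "quantizedFlat"}]
```

Queries then rank by similarity with an expression like `VectorDistance(c.embedding, @queryVector)`, which is where the extra compute and RU cost Savill mentions comes in.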

Additionally, the ability to switch capacity modes and the arrival of smarter query execution offer more flexibility for mixed workloads. Still, Savill stresses that flexibility requires disciplined monitoring and continuous adjustment to avoid runaway costs. Therefore, he recommends staged changes and thorough benchmarking before rolling optimizations into production.

Tradeoffs and practical challenges

Throughout the talk, Savill balances recommendations with warnings about tradeoffs, such as between cost predictability and automatic scaling convenience. He explains that autoscale reduces manual operations but can expose teams to unpredictable billing if workloads are poorly understood. Meanwhile, provisioned throughput gives control and predictability but demands accurate forecasting and capacity management.

Another challenge he raises is the tension between optimal partition keys and application design constraints. A theoretically ideal key might complicate query semantics or client code, so teams often compromise and then mitigate hotspots with other techniques. For example, rethinking access patterns, reshaping documents, or using synthetic keys can help but add development complexity.
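A synthetic key is one such mitigation: append a stable, hash-derived suffix so a single hot natural key is spread across several logical partitions. A minimal sketch (the tenant/order naming and bucket count are illustrative):

```python
import hashlib

def synthetic_pk(tenant_id: str, doc_id: str, buckets: int = 8) -> str:
    """Combine the natural key with a stable suffix derived from the
    document id, splitting one hot tenant's writes across `buckets`
    logical partitions. Reads covering the whole tenant must then fan
    out over all suffixes, so keep `buckets` small."""
    suffix = int(hashlib.sha256(doc_id.encode()).hexdigest(), 16) % buckets
    return f"{tenant_id}-{suffix}"

print(synthetic_pk("tenant-42", "order-1001"))
print(synthetic_pk("tenant-42", "order-1002"))
```

The suffix must be deterministic so any client can reconstruct a document's partition key from its id; the cost is exactly the added query fan-out and client complexity the video warns about.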

Conclusion and practical recommendations

In closing, John Savill's [MVP] provides a pragmatic roadmap: start with good partition and indexing choices, measure RU consumption, and apply new engine features like ODE where they fit query patterns. He urges teams to adopt iterative testing, using realistic workloads to validate changes and to avoid sweeping adjustments made without measurement. As a result, the path to meaningful savings and performance gains becomes systematic rather than accidental.

Finally, Savill recommends maintaining observability to detect regression and to tune over time, because workloads evolve and so should the optimization approach. By combining sound design, targeted indexing, and cautious use of new features, teams can balance performance, complexity, and cost in practical ways. Readers and viewers will find the video useful for both planning and hands-on tuning of their Cosmos DB systems.

Keywords

Cosmos DB performance tuning, Azure Cosmos DB best practices, Cosmos DB indexing strategies, Cosmos DB partitioning design, Cosmos DB throughput optimization, Cosmos DB query performance, Azure Cosmos DB cost optimization, Cosmos DB latency reduction