Cosmos DB Query Performance & Cost

April 13, 2026

This is Chapter 7 of Azure Cosmos DB for .NET Developers. Previous: Chapter 6: Cosmos DB Partition Keys & Data Modeling.

There's a well-worn adage in software development that goes something like this: only optimize performance for the problems you know you have. Why is this interesting? Because preemptively optimizing your code for performance problems that you don't actually know you have is usually a waste of time -- or just plain wrong -- and it almost always adds weird complexity.

Now why am I bringing this up? Because it's one of those things that drives me crazy. It drives me crazy to see developers fixing non-existent performance problems. But it launches me into space when they do it based on received wisdom -- some vague rumor about some best practice that some buddy's sister's neighbor heard someone say on a plane about what sometimes might cause blah blah blah issue.

They're wasting time and adding complexity on a performance problem that they don't actually know they have, based on a hypothetical understanding of how something works.

So now let's pivot this to software development with Cosmos DB.

Cosmos follows a fairly common cloud service pattern where they bill you for a combination of throughput, storage, and data transfer. Cosmos is amazingly fast even when you do stuff in your code that performs terribly. That means that Cosmos can often just absorb your logic's inefficient operations and still give you great performance.

But there's a catch.

You might end up with a big ol' bill. Basically, it's a new kind of performance problem where the cloud systems you're relying on do the right things for real-time performance -- and those right things end up costing you money.

Those systems are making a good decision, too -- why be artificially slow if you don't have to be?

This chapter is all about showing you real numbers so that you know how to optimize for efficiency and cost at the same time.

What Was My Process?

Throughout this book, I've been dropping hints. In Chapter 3, you saw response.RequestCharge for the first time. In Chapter 4, I showed you how cross-partition queries silently cost more. In Chapter 5, I showed you my library's RU logging and cross-partition detection. I kept saying "we'll talk about cost later."

We've gotten to that part. Now has become later.

To develop this chapter, I built a test harness, ran it against a real Cosmos DB account in Azure, and measured the RequestCharge on every operation — inserts, reads, updates, deletes, queries, bulk operations, etc. My goal was to think of all the things that might make a difference and then measure them. I also ran the same tests against the local emulator. (Yah. Uhhh...don't pay attention to the RequestCharge numbers from the emulator. Seriously.)

I was legit surprised by what was and wasn't important.

But before the numbers make sense, you need to understand how Cosmos charges you — because the billing model changes what the numbers mean.

How Cosmos DB Charges You

Cosmos DB costs break down into three components: throughput (Request Units), storage, and data egress. Throughput is where most of your money goes, and it's where most of your optimization opportunities are.

Request Units: The Currency of Cosmos

A Request Unit is Cosmos DB's way of normalizing compute cost. Instead of tracking CPU, memory, and I/O separately, Cosmos rolls them all into a single number. Every operation — every read, write, query, and delete — consumes some number of RUs, and that number is deterministic. The same operation on the same data costs the same RUs every time.

The baseline: a point read of a 1 KB document (that's a ReadItemAsync with partition key + id) costs 1 RU. Everything else is measured relative to that. A write costs more than a read. A larger document costs more than a smaller one. A query costs more than a point read. But the 1 RU point read is your anchor.
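In the .NET SDK, every response carries its charge, so you can verify that 1 RU anchor yourself. Here's a minimal sketch — the database/container names, document id, partition key value, and the MyDoc type are all placeholder assumptions:

```csharp
using Microsoft.Azure.Cosmos;

// Assumes an existing CosmosClient named 'client' and a MyDoc POCO.
Container container = client.GetContainer("mydb", "mycontainer");

// Point read: partition key + id. For a ~1 KB document,
// RequestCharge should come back as 1.0.
ItemResponse<MyDoc> response = await container.ReadItemAsync<MyDoc>(
    id: "doc-123",
    partitionKey: new PartitionKey("tenant-1"));

Console.WriteLine($"RequestCharge: {response.RequestCharge} RUs");
```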

The RU, the Metric System, and Water

Ok. So I'm an American. We use "freedom units" (or whatever they're actually called) to measure stuff rather than the Metric System. Inches, feet, yards, miles, degrees in Fahrenheit — all that stuff. It's a mess. It's based on nothing coherent.

The metric system makes so much more sense. A milliliter is a cubic centimeter. A milliliter of water weighs a gram. Everything in the metric system grows out of that. It's all based on some basic measurements of water.

In Cosmos DB, the RU is the water. A point read of a 1 KB document by partition key and id costs 1 RU. All the other operations are calculated relative to that. It makes sense...just like the metric system.

Storage

You pay for the data you store — currently about $0.25 per GB per month for most regions. This includes your documents, your indexes, and your backups. For most applications, storage cost is small relative to throughput cost. But it's worth knowing that indexes take up space too, which we'll come back to when we talk about indexing policies.

Data Egress

Data leaving Azure or moving between regions costs money. Data coming in is free. For single-region applications, egress is usually negligible. For multi-region setups, it adds up.

A note on geo-replication and cost

This is getting a little into the weeds for this chapter, but Cosmos DB has geo-replication options for distributing your data across Azure regions. And guess what — they all feed into cost, too. From a developer point of view, you don't exactly have to worry about it — it's more of an Azure portal configuration kind of thing. But if your application has front-ends in multiple regions for performance or redundancy/failover, then it's probably something you'll want to look into for Cosmos, too. The short version: every region you replicate to increases your throughput bill. We'll cover geo-replication in more detail in Chapter 11.

Serverless vs. Provisioned vs. Autoscale

Before the RU numbers mean anything, you need to understand how you're being billed for them. Cosmos DB has three throughput modes, and each one changes the math.

Provisioned (Manual)

You reserve a specific number of Request Units per second (RU/s) — say, 400 — and you're billed by the hour for that reserved amount, whether you use it or not. If you provision 400 RU/s and your application uses 200 RU/s, you pay for 400. If it spikes to 600, you get throttled — Cosmos returns HTTP 429 (Too Many Requests) and your application has to retry.

400 RU/s is the minimum, and it's what we've been using throughout this book. It's fine for development and low-traffic applications. For production, you'll typically provision higher and adjust based on monitoring.

So this provisioned model has an upside and a downside. The upside: predictable cost. The downside: you're guessing. Over-provision and you waste money. Under-provision and you get throttled.

BTW, getting throttled isn't necessarily bad as long as you and your application planned for it and handle it gracefully.
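In fact, the .NET SDK retries 429s for you automatically; "planning for it" can be as simple as tuning how patient the SDK is. A sketch — the specific values here are illustrative, not recommendations:

```csharp
using Microsoft.Azure.Cosmos;

var options = new CosmosClientOptions
{
    // How many times the SDK retries a throttled (429) request
    // before surfacing the CosmosException to your code.
    MaxRetryAttemptsOnRateLimitedRequests = 5,

    // Total time the SDK may spend waiting and retrying.
    MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(20)
};

// connectionString comes from your configuration.
var client = new CosmosClient(connectionString, options);
```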

Autoscale

You set a maximum RU/s — say, 4,000 — and Cosmos automatically scales between 10% of your max (400) and your max (4,000) based on actual demand. You're billed for the peak RU/s used in each hour.

This is often the sweet spot for production workloads with variable traffic. You don't waste money during quiet periods, and you don't get throttled during spikes. The cost per RU is slightly higher than manual provisioned, but the waste reduction usually more than compensates.

Serverless

No provisioning at all. You pay per RU consumed, period. No reservations, no 429 throttling (within reason), no capacity planning.

Serverless is great for development, testing, proof-of-concept projects, and low-traffic applications. It's also good for workloads with very spiky, unpredictable traffic — the kind where provisioned throughput would mean either constant throttling or massive over-provisioning.

The per-RU cost is higher than provisioned, so at sustained high throughput, serverless gets expensive. But for many applications — especially early in their lifecycle — it's the simplest and cheapest option.

The Free Tier

New Cosmos DB accounts are eligible for a free tier: 1,000 RU/s of provisioned throughput and 25 GB of storage per month, at no cost. One free-tier account per Azure subscription. If you're learning or prototyping, this is the way to start with real, actual, running-in-Azure Cosmos DB.

The Local Emulator

Remember, there's also always the local emulator. That's totally free. And you don't have to create an Azure subscription. But it's not the real, actual, running-in-Azure version of Cosmos DB. The RU numbers from the emulator are almost hilariously unhelpful, so if you care about proving out your RU usage, that's when it's time to sign up.

For the rest of the chapter, I'm going to stop reminding you that the emulator exists, because those RU numbers aren't helpful for usage profiling. Ok? Good. And that pretty much wraps it up for the emulator.

Which One Should You Pick?

For development and learning: serverless or the free tier. Zero capacity planning, zero waste.

For production with variable traffic: autoscale. Let Cosmos handle the scaling.

For production with predictable, steady traffic: manual provisioned. You know your baseline, you set it, you monitor it.

You can change modes later, so don't overthink it at the start. Get your application working first, then optimize the billing model based on actual usage patterns.

What Operations Actually Cost: The Numbers

Okay. Let's look at real numbers. Everything that follows comes from a test harness I built and ran against a real Cosmos DB account in Azure East US 2. Not the emulator — a real account with real RU reporting. Additionally, I spun up a Windows 11 virtual machine in East US 2 and ran the tests from that VM so that I could remove/minimize network latency as much as possible.

I tested six document types across a range of sizes and property counts:

Doc Type Size Properties Description
SmallDoc 220 B 5 Tiny — id, tenantId, entityType, title, date
MediumDoc 1.3 KB 22 Moderate — core properties plus 15 strings, tags, address
WideDoc 1.8 KB 100 Same size as MediumDoc but 100 short properties
LargeDoc 27 KB 18 Nested objects, lists of objects, long text
BlobDoc 1.5 MB 4 One giant text property, minimal indexing surface
WideBlobDoc 1.5 MB 503 Same size as BlobDoc but 500+ short properties

Two containers: one with Cosmos's default indexing policy (index everything), one with a custom policy that only indexes the four properties I actually query on.

CRUD Costs — Default Indexing

Operation SmallDoc MediumDoc WideDoc LargeDoc BlobDoc WideBlobDoc
Insert 7.43 18.86 44.95 55.24 1,252 1,443
Point Read 1.00 1.05 1.05 2.19 291.80 292.18
Upsert 10.29 12.95 12.95 22.86 2,500 2,537
Replace 10.67 13.33 13.33 23.24 2,500 2,537
Delete 6.29 6.29 6.29 6.29 6.29 6.29

A few things jump out.

Point reads are cheap and proportional to size. SmallDoc is 1 RU. LargeDoc at 27 KB is 2.19 RUs. Even BlobDoc at 1.5 MB is about 292 RUs — expensive in absolute terms, but proportional to the massive payload. If you're designing for cost, keep your frequently-read documents small.

Deletes are always 6.29 RUs. This surprised me. Whether the document is 220 bytes or 1.5 megabytes, the delete costs exactly 6.29 RUs. The delete operation apparently has a fixed cost regardless of document size.

Upsert and Replace cost the same. I expected Replace (ReplaceItemAsync) to be cheaper than Upsert (UpsertItemAsync) since Upsert has to check whether the document exists first. But the numbers are nearly identical — 10.29 vs 10.67 for SmallDoc, dead even at 2,500 for BlobDoc. The existence check apparently costs almost nothing. It seems like there’s no cost reason to prefer one over the other.

Inserts cost more than updates for WideDoc. Look at WideDoc: the insert costs 44.95 RUs, but the upsert and replace cost only 12.95 and 13.33. That seems backwards until you realize that the insert is creating 100 new index entries from scratch, while the upsert/replace is updating existing index entries. Index updates are apparently cheaper than index creation.

But the most interesting finding is in the insert column.

The Property Count Bombshell

Look at MediumDoc and WideDoc. MediumDoc is 1.3 KB with 22 properties. WideDoc is 1.8 KB with 100 properties. Almost the same size — within 500 bytes of each other.

But the insert cost? MediumDoc: 18.86 RUs. WideDoc: 44.95 RUs. More than double.

The only meaningful difference is the number of properties. Same basic size, same partition key, same container — but WideDoc has 78 more properties, and each one gets indexed under the default policy.

This is the indexing tax. Every property in your document that gets indexed costs you RUs on each write. More properties means more index entries means more write cost. Document size matters, but property count matters independently of size.

This has real implications for document design. If you're storing documents with lots of small properties — configuration objects, metadata dictionaries, denormalized records with many fields — the default indexing policy is silently inflating your write costs.

The Fix: Custom Indexing

The default indexing policy is to index everything. So what happens when you stop indexing everything?

I created a second container with a custom indexing policy that only indexes the four properties that participate in actual queries: tenantId, entityType, title, and createdDate. Everything else is excluded. Then I ran the same inserts.

Doc Type Props Default Index Custom Index Savings
SmallDoc 5 7.43 7.05 5%
MediumDoc 22 18.86 8.38 56%
WideDoc 100 44.95 8.38 81%
LargeDoc 18 55.24 13.33 76%
BlobDoc 4 1,252 1,251 ~0%
WideBlobDoc 503 1,443 1,251 13%

WideDoc drops from 44.95 to 8.38 RUs — an 81% reduction in write cost.

And here's the kicker: with custom indexing, MediumDoc and WideDoc now cost exactly the same to insert: 8.38 RUs. Remove the indexing overhead, and the cost is driven by payload size, not property count. Those extra 78 properties were pure indexing tax.

BlobDoc barely changes — 1,252 to 1,251. It only has 4 properties, so there's almost nothing to index. The cost is purely payload. But WideBlobDoc (same size, 503 properties) drops from 1,443 to 1,251 — the 13% difference is entirely indexing overhead.

How to Customize Your Indexing Policy

Cosmos DB indexes everything by default. The default policy looks like this:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        { "path": "/*" }
    ],
    "excludedPaths": [
        { "path": "/\"_etag\"/?" }
    ]
}

The /* wildcard means every property on every document gets indexed. To customize, you flip the approach: exclude everything, then include only what you need.

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        { "path": "/tenantId/*" },
        { "path": "/entityType/*" },
        { "path": "/title/*" },
        { "path": "/createdDate/*" }
    ],
    "excludedPaths": [
        { "path": "/*" }
    ]
}

A few important details here:

id and _ts are always indexed regardless of your policy. You can't turn them off, and you don't need to include them explicitly.

The partition key is NOT automatically indexed. This one is subtle and important. If you're using hierarchical partition keys (/tenantId,/entityType), you should include those paths in your indexing policy. Otherwise, queries filtering on the partition key hierarchy can force full scans, costing more RUs.

You can change the policy at any time without downtime. Cosmos transforms the index in the background using your provisioned RUs at lower priority than your application's operations. Your reads and writes continue normally during the transformation.

The tradeoff is real. If you exclude a property from the index and then query on it, Cosmos falls back to a full scan within the partition — which costs more RUs than an indexed query. The key is knowing your query patterns. Index what you query on. Exclude everything else.

Read cost is not affected by indexing policy. Indexing is a write-time cost. The savings show up on inserts and upserts. Reads and point reads cost the same regardless of your indexing configuration.
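You don't have to set the policy in the Azure portal, either — the .NET SDK exposes it on ContainerProperties. A sketch, assuming an existing CosmosClient named client; the database and container names are placeholders:

```csharp
using Microsoft.Azure.Cosmos;

var properties = new ContainerProperties(
    id: "mycontainer",
    partitionKeyPath: "/tenantId");

// Exclude everything by default...
properties.IndexingPolicy.ExcludedPaths.Add(
    new ExcludedPath { Path = "/*" });

// ...then include only the paths you actually query on.
foreach (var path in new[]
    { "/tenantId/*", "/entityType/*", "/title/*", "/createdDate/*" })
{
    properties.IndexingPolicy.IncludedPaths.Add(
        new IncludedPath { Path = path });
}

Database database = client.GetDatabase("mydb");
await database.CreateContainerIfNotExistsAsync(properties);
```

You can also assign a new IndexingPolicy to an existing container's properties and call ReplaceContainerAsync — that's the no-downtime, background transformation described above.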

The Patch API: Not the Savings You'd Expect

Cosmos DB has a Patch API (PatchItemAsync) that lets you update individual properties on a document without reading and replacing the whole thing. On paper, this sounds like it should be dramatically cheaper for large documents — why send 1.5 MB back to Cosmos when you're only changing the title?

I tested single-property and multi-property patches across all document sizes.

Doc Type Size Upsert Replace Patch (1 prop) Patch (3 props)
SmallDoc 220 B 10.29 10.67 10.79 11.80
MediumDoc 1.3 KB 12.95 13.33 13.49 13.57
WideDoc 1.8 KB 12.95 13.33 13.49 14.57
LargeDoc 27 KB 22.86 23.24 23.52 24.85
BlobDoc 1.5 MB 2,500 2,500 2,532 2,596
WideBlobDoc 1.5 MB 2,537 2,537 2,568 2,632

Patch costs the same or slightly more than Upsert and Replace across every document size. Patching one property on a 1.5 MB BlobDoc costs 2,532 RUs — versus 2,500 for replacing the entire document. Cosmos appears to charge based on the document size, not the amount of data you're changing.

I honestly did not see this coming. I expected Patch would be dramatically cheaper for large documents. It isn’t. The Patch API is a convenience for avoiding read-modify-write cycles, but it's not a cost optimization. I suppose you could make an argument that avoiding the ‘read’ saves you some RUs but by itself, Patch isn’t moving the RU needle.
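For completeness, here's what the Patch call looks like in the SDK — handy for skipping read-modify-write, just don't expect it to save RUs. The document id, partition key value, and path are placeholders:

```csharp
using Microsoft.Azure.Cosmos;

// Set one property without sending the whole document over the wire.
// The RU charge is still driven by the document's total size.
ItemResponse<MyDoc> response = await container.PatchItemAsync<MyDoc>(
    id: "doc-123",
    partitionKey: new PartitionKey("tenant-1"),
    patchOperations: new[]
    {
        PatchOperation.Set("/title", "Updated title")
    });

Console.WriteLine($"Patch cost: {response.RequestCharge} RUs");
```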

Why This Matters for Application Design

Let’s say for a moment that I hypothetically ran these tests and found that the patch operations saved a ton of RUs versus Upsert. (To make sure I’m 100% clear: the data shows the opposite.)

Why would this matter? The answer is: complexity.

The reason is that you might be tempted to add change-detection logic to your application — track which properties changed, issue targeted Patch calls instead of full Upserts, optimize away unnecessary full-document writes.

Next, if you care about change detection, that starts to point you toward using EF Core to help with your Cosmos traffic. Which would make sense, because EF Core has a built-in change tracker, so it knows which properties are “dirty.” EF Core would be a natural technology for detecting individual property changes and issuing targeted patch operations.

But back to what the numbers say. The numbers say: don't bother. The RU cost is the same.

If you rolled your own change detection logic, the complexity is non-trivial. If you used EF Core, then that adds an additional dependency in your app and almost certainly more code.

Everything you write has to be maintained. It’s code inventory and feature inventory that you have to worry about.

As a software architect, I’ve got to say that I’m seriously relieved to find that there’s no reason to add this complexity to your apps. Just Upsert the whole document and keep your code simple. The end.

BTW, if you're using my Benday.CosmosDb library, SaveAsync does the right thing and calls Upsert.

Point Read vs. Query: Use the Right Tool

Here’s a mistake that I’ve made myself. It’s sneaky: thinking that you’re doing an efficient point read when in fact you’re running a query. I made this mistake enough that I actually added [Obsolete] markers to certain methods in my library to give me a compiler warning if I was doing something sub-optimal.

I tested three ways to retrieve a single SmallDoc:

Approach RU Cost
ReadItemAsync (partition key + id) 1.00
Query WHERE id = @id (with partition key) 3.02
Query WHERE title = @title (with partition key) 3.02

A point read costs 1 RU. That’s the basis of the whole RU system. A query that returns the exact same single document costs 3.02 RUs — three times more expensive. And it doesn't matter whether you're querying by id or by title; the query machinery itself has a baseline cost that exceeds the point read.

The lesson is simple: if you have the partition key and the id, use ReadItemAsync. Always. Don't use a query to fetch something you can point-read. Otherwise, you're throwing away 2 RUs every time.
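Side by side, here's what the two approaches look like in the SDK — the document type, id, and partition key value are placeholders:

```csharp
using Microsoft.Azure.Cosmos;

// The efficient path: point read, ~1 RU.
ItemResponse<MyDoc> read = await container.ReadItemAsync<MyDoc>(
    "doc-123", new PartitionKey("tenant-1"));

// The same single document via a query: ~3 RUs.
var query = new QueryDefinition("SELECT * FROM c WHERE c.id = @id")
    .WithParameter("@id", "doc-123");

using FeedIterator<MyDoc> iterator =
    container.GetItemQueryIterator<MyDoc>(
        query,
        requestOptions: new QueryRequestOptions
        {
            PartitionKey = new PartitionKey("tenant-1")
        });

FeedResponse<MyDoc> page = await iterator.ReadNextAsync();

Console.WriteLine(
    $"Read: {read.RequestCharge} RUs, Query: {page.RequestCharge} RUs");
```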

This is why the library's CosmosTenantItemRepository<T> has GetByIdAsync(string tenantId, string id) as the primary method, and the parameterless GetByIdAsync(string id) (which runs a cross-partition query) is marked [Obsolete]. It's not really obsolete but I didn't want anyone (or me) to use the inefficient version without warning. The API nudges you — and your AI coding assistant — toward the efficient path. And if you actually need the inefficient path — and you might — then it's there for you.

Query Cost Scales with Results

What about queries that return multiple documents? Does cost scale linearly with the result set?

Query Docs Returned RU Cost
SELECT TOP 5 5 3.04
SELECT TOP 50 50 4.36
SELECT * (all 500) 500 17.57

Cost grows with the result set, but not linearly. There's a baseline query cost (~3 RUs) that dominates at low result counts. At higher counts, each additional document adds only a small incremental cost — maybe a few hundredths of an RU per document.

The practical lesson: pagination matters. If you only need 50 results, don't fetch 500 and filter in memory. Use TOP, use continuation tokens, use GetPagedAsync from the library. Every document you fetch that you don't need is a fraction of an RU wasted — and fractions add up at scale.
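If you're working with the raw SDK rather than the library, paging looks roughly like this — the query, page size, and partition key value are illustrative:

```csharp
using Microsoft.Azure.Cosmos;

var query = new QueryDefinition(
        "SELECT * FROM c WHERE c.entityType = @type")
    .WithParameter("@type", "order");

string? continuation = null;
do
{
    using FeedIterator<MyDoc> iterator =
        container.GetItemQueryIterator<MyDoc>(
            query,
            continuationToken: continuation,
            requestOptions: new QueryRequestOptions
            {
                PartitionKey = new PartitionKey("tenant-1"),
                MaxItemCount = 50  // page size: fetch only what you show
            });

    FeedResponse<MyDoc> page = await iterator.ReadNextAsync();
    continuation = page.ContinuationToken;

    // ...render this page; stop when the user stops asking for more.
} while (continuation != null);
```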

The Heat-Death of the Universe vs. Wasted RUs

The heat-death of the universe is the inevitable time in the far, far, far, distant future where all the usable energy in the universe has been spent. Everything reaches the same temperature. Nothing can happen anymore because there are no energy differences left to drive anything.

My point though is that the heat-death of the universe is not imminent.

So I just wrote a moment ago that those wasted fractions of RUs add up to something meaningful at scale. And now I'm going to undermine my point about wasted RUs.

If you're running at the scale of Amazon or Facebook or Azure DevOps Services, those wasted RUs will add up to meaningful waste.

But if you're not running at that scale or anything close to that scale, maybe you should just write the code that's easy to write, easy to maintain, and easy to understand. Sure, there's waste...but at the scale of your application, it'll take until the heat-death of the universe for it to amount to anything measurable. So don't worry about it.

Projection: Does SELECT * Waste Money?

In SQL Server, selecting only the columns you need is a well-established best practice. Is the same true in Cosmos?

Query Docs RU Cost
SmallDoc SELECT * 50 4.36
SmallDoc SELECT id, title, date 50 4.56
LargeDoc SELECT * 50 20.29
LargeDoc SELECT id, title 50 6.84

For small documents, projection actually costs slightly more — 4.56 vs. 4.36. The projection machinery has overhead, and when the documents are already small, that overhead exceeds the savings.

For large documents, the story is completely different. SELECT * on 50 LargeDocs costs 20.29 RUs. Projecting just id and title drops it to 6.84 — a 66% savings.

So the rule isn't "always use projection." It's "use projection when your documents are large and you don't need every property." For small documents, don't bother — the overhead isn't worth it. And if you're not running at huge scale, then you might not ever see the difference even with a higher RU cost. (see "The Heat-Death of the Universe vs. Wasted RUs" discussion above)

Server-Side COUNT vs. Fetch-and-Count

Here's another common mistake: fetching all the documents and counting them in C# when all you need is the count. Let's say that you want to know how many orders were placed in Massachusetts in the last 24 hours. You don't care about the orders themselves — only the count.

There are two ways to count them:

  • Server-side count
  • C# count in memory using Length or Count() from System.Linq

Approach SmallDoc (500 docs) LargeDoc (50 docs)
Server-side SELECT VALUE COUNT(1) 3.09 3.09
Client-side fetch all + .Count() 17.57 20.29
Savings 5.7x 6.6x

A server-side COUNT in Cosmos costs about 3 RUs regardless of document size. Why would that be the case? Well, the answer is that doing a server-side count doesn't need to fetch any data. It just counts the matching records.

The other approach isn't any different than retrieving the data as if you were going to display it in a list or a grid. Cosmos finds the matching records and then retrieves the data. Cosmos is done at this point, but there's more work to do in your app, because that data needs to be transferred over the wire and then deserialized into objects in something like an IEnumerable<T>.

The LargeDoc comparison makes this especially clear: the server-side count costs the same 3.09 RUs whether the documents are 220 bytes or 27 KB. The client-side cost jumps from 17.57 to 20.29 because you're transferring bigger documents.

The lesson: if you just need a count, use SELECT VALUE COUNT(1) in your query. Don't use LINQ's .Count() on a result set you fetched from Cosmos — you've already paid for the full fetch by the time you count.
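Using the Massachusetts-orders example from above, a server-side count looks like this — the property names and partition key value are placeholder assumptions:

```csharp
using System.Linq;
using Microsoft.Azure.Cosmos;

var query = new QueryDefinition(
        "SELECT VALUE COUNT(1) FROM c " +
        "WHERE c.entityType = @type AND c.createdDate >= @since")
    .WithParameter("@type", "order")
    .WithParameter("@since", DateTime.UtcNow.AddHours(-24));

// The query returns a single scalar, so we iterate over int.
using FeedIterator<int> iterator = container.GetItemQueryIterator<int>(
    query,
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = new PartitionKey("tenant-ma")
    });

FeedResponse<int> result = await iterator.ReadNextAsync();
int count = result.First();  // the count — no documents transferred

Console.WriteLine($"{count} orders, {result.RequestCharge} RUs");
```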

How Does LINQ's Count() Method Work?

Nerd alert! It's important to know how LINQ actually works. Everything is based on IEnumerable or IEnumerable<T>. It's all riding on top of the Iterator Pattern. (That's also how foreach works.)

All those handy methods in LINQ like Where(), Select(), OrderBy(), etc. — they're all looking at your data in some form of IEnumerable. Notice that I'm saying IEnumerable<T> and not something like List<T>. That dependency on an interface instead of a concrete implementation allows a TON of flexibility and also a lot of performance optimization.

One thing that you'll see if you look at EF Core or the Cosmos DB SDK for .NET is that there's an interface called IQueryable. IQueryable implements IEnumerable, by the way, and this lets .NET define a potential query against a datastore and only run that query when the data is actually needed.

If you build an IQueryable LINQ query against Cosmos and stick it in a variable called myQuery but then never run it, then you've used no RUs.

But then take that same IQueryable instance and call myQuery.Count(). Now, a smart IQueryable provider might translate that into a server-side COUNT — it never fetches the rows. But depending on the provider and how you've built the query, it might just fetch everything and count client-side. And if you call myQuery.ToList() first and then check .Count, you've definitely fetched everything — and you're paying for the full data transfer.

If you want the data AND you want the count, you might actually be running the Cosmos query twice — or worse, fetching everything twice.
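Here's a sketch of the two paths using the Cosmos SDK's LINQ provider. MyDoc and its EntityType property are placeholder assumptions:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;

// Building the query costs nothing — no RUs consumed yet.
IQueryable<MyDoc> myQuery = container
    .GetItemLinqQueryable<MyDoc>()
    .Where(d => d.EntityType == "order");

// Server-side: the SDK's CountAsync() translates to a COUNT query.
// No documents are fetched.
Response<int> countResponse = await myQuery.CountAsync();
int serverCount = countResponse.Resource;

// Client-side: fetch every matching document, then count in memory.
// You pay for the full transfer.
var everything = new List<MyDoc>();
using FeedIterator<MyDoc> iterator = myQuery.ToFeedIterator();
while (iterator.HasMoreResults)
{
    everything.AddRange(await iterator.ReadNextAsync());
}
int expensiveCount = everything.Count;
```

Note that the synchronous LINQ .Count() isn't even supported by the Cosmos provider unless you opt into synchronous execution — CountAsync() is the path that stays server-side.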

The Cross-Partition Surprise

A cross-partition query is a query that doesn't use a partition key value and therefore causes Cosmos to look at all the partitions in your container.

I've been telling you throughout this book that cross-partition queries are expensive. So I tested it. And the results were — well — surprising. Confusing, too.

This data is nuts.

Query Docs RU Cross-Partition Detected?
With partition key 500 17.57 No
Without partition key 500 15.70 No
All tenants, no PK 1,002 29.28 No

The query without a partition key cost 15.70 RUs — actually less than the 17.57 with a partition key. And none of the queries were flagged as cross-partition.

What's going on? The answer is that I don't know for sure but I've got a guess. My guess is all of our test data was fitting into a single physical partition in Cosmos.

Let's remember that there's a difference between "logically partitioned" and "physically partitioned" in Cosmos DB. Logically partitioned means that our data is stored in a way that describes a usable partitioning scheme. Physically partitioned means that Cosmos has moved that partition's data onto a separate node inside of Cosmos.

Just because data can be physically partitioned doesn't mean that Cosmos DB will always physically partition your data. Cosmos DB waits until it figures out that the data volume warrants physically rearranging your data.

Considering our test data, with only a thousand or so documents totaling a few megabytes, there's not much to fan out onto physical partitions. So even though we might have had a whole bunch of logical partitions, our data all lived in one physical partition. That's why we ran queries and didn't see any warnings about cross-partition queries.

Cross-partition queries are only expensive when there are actually multiple physical partitions to query — and that only happens when your data grows large enough for Cosmos to split.

But you need to plan ahead! You need to think about your logical partitioning scheme ahead of time because you can't change the partition key config for a container. So this is one of those rare places where you kinda have to optimize for a performance problem you don't actually know you have.

Why This Matters

This is actually a really important finding for this book. Here's what it means in practice:

You can't feel the cross-partition pain during development. Your test data is too small for Cosmos to split across physical partitions. Every query hits one partition no matter what. The RU difference between a good query and a bad query is negligible.

The cross-partition detection in the library won't fire during development either. The library checks the diagnostics for signs of multi-partition fan-out. If there's only one physical partition, there's nothing to detect.

By the time cross-partition queries become expensive, you're in production. You've got real data, real traffic, and real money on the line. The query patterns that were "fine" in development are suddenly costing 10x or 50x more than they should.

This is why the library's guardrails exist. The partition-key-aware methods, the [Obsolete] markers, the cross-partition detection — they're there because you can't test for this problem. You have to design for it from the beginning, trusting that the pain will materialize later if you don't.

I verified this by checking the raw Cosmos diagnostics. Every query, regardless of whether it specified a partition key, hit PartitionKeyRangeId: "0" — a single partition key range. Even with 1,000+ documents across 100+ different tenantId values, Cosmos kept everything in one physical partition.

Bulk Operations: Speed, Not Savings

Is there an RU discount for inserting documents in bulk versus one at a time?

Approach Total RU Per Doc Time Throttled
Individual inserts 743.00 7.43 3,263 ms 0
TransactionalBatch 742.86 7.43 136 ms 0
Bulk execution 743.00 7.43 2,221 ms 30

No. The per-document RU cost is virtually identical across all three approaches — about 7.43 RUs per SmallDoc. Cosmos charges for the work regardless of how you submit it.

But the time difference is dramatic. TransactionalBatch finished in 136 milliseconds — one network round trip for 100 documents. Individual inserts took over 3 seconds — 100 round trips. That's a 24x speedup.

Bulk execution (firing all requests concurrently with Task.WhenAll using the AllowBulkExecution client option) hit 30 throttled requests at our 400 RU/s provisioned throughput. It's fast in theory, but at low throughput, you'll spend most of your time retrying after 429s.

The lesson: batch for speed, not for cost savings. TransactionalBatch is the clear winner for throughput when all items share a partition key. Just remember that all items in a batch must share the same partition key and the total batch size must stay under 2 MB.
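A TransactionalBatch sketch, assuming a container and a collection of documents that all share one partition key value (the names here are placeholders):

```csharp
using Microsoft.Azure.Cosmos;

// Every item in the batch must share this partition key.
// Limits: at most 100 operations and 2 MB total per batch.
TransactionalBatch batch =
    container.CreateTransactionalBatch(new PartitionKey("tenant-1"));

foreach (MyDoc doc in docsToInsert)
{
    batch.CreateItem(doc);
}

using TransactionalBatchResponse response = await batch.ExecuteAsync();

if (!response.IsSuccessStatusCode)
{
    // All-or-nothing: if any item fails, the whole batch is rolled back.
    throw new InvalidOperationException(
        $"Batch failed: {response.StatusCode}");
}

Console.WriteLine($"Batch cost: {response.RequestCharge} RUs");
```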

The 2 MB Limit: What the Error Looks Like

Speaking of 2 MB — in Chapter 6, I mentioned the maximum document size. Let's see what actually happens when you exceed it.

I created a document at 2,150,539 bytes — just over the 2 MB line — and tried to save it.

Real Cosmos account: CosmosException with HTTP status code 413 (RequestEntityTooLarge) and the message "Request size is too large." Clear, immediate, no ambiguity.
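If you want to handle this case explicitly, the exception filter looks something like this (`hugeDoc` is a stand-in for whatever oversized document you're saving):

```csharp
// Requires: using System.Net; using Microsoft.Azure.Cosmos;
try
{
    await container.CreateItemAsync(hugeDoc, new PartitionKey(hugeDoc.TenantId));
}
catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.RequestEntityTooLarge)
{
    // HTTP 413: the serialized document exceeded the 2 MB limit.
    Console.WriteLine($"Document too large: {ex.Message}");
}
```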

The emulator? Status 200. Document saved successfully. The emulator accepted a document that the real service rejects. More on that next.

The Emulator Lies to You: It Can't Count

I ran every single test against both the real Cosmos account and the vNext emulator. I mentioned back in Chapter 4 that the emulator doesn't report accurate RU values. Now let me show you what that actually looks like and exactly how inaccurate it is.

Operation          Doc Type   Real RU   Emulator RU
Insert             SmallDoc   7.43      1.00
Insert             WideDoc    44.95     1.00
Insert             BlobDoc    1,252     1.00
Point Read         SmallDoc   1.00      1.00
Point Read         BlobDoc    291.80    1.00
Patch (1 prop)     BlobDoc    2,532     1.00
Query (500 docs)   SmallDoc   17.57     1.00
COUNT (500 docs)   SmallDoc   3.09      1.00
Batch (100 docs)   SmallDoc   742.86    1.00

Every. Single. Operation. 1.00 RU.

Insert a 220-byte document? 1 RU. Insert a 1.5 megabyte document? 1 RU. Patch a single property on a 1.5 MB document? 1 RU. Query 500 documents? 1 RU. Batch insert 100 documents? 1 RU. The emulator reports 1.00 for everything regardless of document size, operation type, or complexity.

And it's not just RU reporting. The emulator also:

  • Accepted our 2.1 MB over-limit document (real account: 413 error)
  • Never detected cross-partition queries (although that's understandable at small data volumes)
  • Reported 0.00 total RUs for bulk execution of 100 documents

The emulator is great for development. It's fast, it's free, it runs on my Mac, it runs on Windows and Linux, and the CRUD operations work correctly. But it cannot tell you what your operations actually cost, and it won't enforce all the limits that the real service does.

For cost estimation, for performance testing, and for validating document size limits — you need a real Azure account. Full stop.

Container Throughput Decisions

In Chapter 6, we talked about when to put entity types in separate containers. Now you can see the cost dimension of that decision.

When multiple entity types share a container, they share throughput. If your webhook event processor suddenly hammers 300 RU/s of writes, that leaves only 100 RU/s for your user-facing queries (at 400 RU/s provisioned). Your users see latency spikes or 429 errors because a background process consumed the budget.

A separate container with its own provisioned throughput isolates the blast radius. The webhook container can throttle without affecting user-facing operations. But separate containers mean separate provisioned throughput, which means a higher minimum spend — each container needs at least 400 RU/s (or use serverless).

Shared throughput at the database level is a middle ground. You provision throughput on the database (minimum 400 RU/s), and all containers in that database share it. Cosmos distributes the RUs across containers automatically. This is cheaper than per-container provisioning but gives you less isolation.
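Here's a sketch of both provisioning styles side by side. The database and container names are illustrative; the `throughput:` parameter on `CreateDatabaseIfNotExistsAsync` and `CreateContainerIfNotExistsAsync` is the real SDK mechanism:

```csharp
// Requires: using Microsoft.Azure.Cosmos;

// Option 1: shared throughput -- provision 400 RU/s on the DATABASE.
// Every container created inside it (without its own throughput) draws
// from that shared pool. Cheapest, least isolation.
Database db = await client.CreateDatabaseIfNotExistsAsync("appdb", throughput: 400);
Container shared = await db.CreateContainerIfNotExistsAsync(
    new ContainerProperties("events", "/tenantId"));

// Option 2: dedicated throughput -- this container gets its OWN 400 RU/s,
// so bursty webhook writes can throttle without starving user queries.
Container isolated = await db.CreateContainerIfNotExistsAsync(
    new ContainerProperties("webhooks", "/tenantId"), throughput: 400);
```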

The right choice depends on your traffic patterns. If all your entity types have similar, steady traffic — shared throughput or a single container is fine. If one entity type is bursty and could starve the others — give it its own container and its own throughput.

Practical Cost Checklist

Here's a checklist for thinking about and planning Cosmos DB cost:

Customize your indexing policy. If your documents have more than a handful of properties, don't index everything. Only index the properties you filter and sort on. The write savings can be 50-80% for wide documents.
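A sketch of an opt-in indexing policy — exclude everything, then include only the paths you actually filter and sort on (the container name and the `/tenantId` and `/createdAt` paths are placeholders for your own schema):

```csharp
// Requires: using Microsoft.Azure.Cosmos;
var props = new ContainerProperties("orders", "/tenantId");

// Exclude everything by default...
props.IndexingPolicy.ExcludedPaths.Add(new ExcludedPath { Path = "/*" });

// ...then opt in only the properties you query on. The "/?" suffix
// means "index the value at this exact path".
props.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/tenantId/?" });
props.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/createdAt/?" });

Container container = await database.CreateContainerIfNotExistsAsync(props);
```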

Use point reads when you have the id. ReadItemAsync costs 1 RU. A query returning the same document costs 3 RUs. Always point-read when you can.
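The two approaches side by side, as a sketch (`SmallDoc`, `id`, and `tenantId` are placeholders):

```csharp
// Requires: using Microsoft.Azure.Cosmos;

// Point read: ~1 RU for a small document.
ItemResponse<SmallDoc> read =
    await container.ReadItemAsync<SmallDoc>(id, new PartitionKey(tenantId));
Console.WriteLine($"Point read: {read.RequestCharge} RU");

// Equivalent query: same document back, roughly 3x the charge.
var query = new QueryDefinition("SELECT * FROM c WHERE c.id = @id")
    .WithParameter("@id", id);
using FeedIterator<SmallDoc> it = container.GetItemQueryIterator<SmallDoc>(
    query,
    requestOptions: new QueryRequestOptions { PartitionKey = new PartitionKey(tenantId) });
FeedResponse<SmallDoc> page = await it.ReadNextAsync();
Console.WriteLine($"Query: {page.RequestCharge} RU");
```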

Don't bother with Patch for cost savings. The Patch API costs the same as Upsert. It's a convenience for avoiding read-modify-write cycles, not a cost optimization. Just Upsert the whole document and keep your code simple.

Use server-side COUNT, not fetch-and-count. SELECT VALUE COUNT(1) costs ~3 RUs regardless of document size. Fetching all documents and counting client-side costs 6-7x more because you're paying for data transfer.
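The server-side version, sketched — only the integer crosses the wire:

```csharp
// Requires: using Microsoft.Azure.Cosmos; using System.Linq;
var countQuery = new QueryDefinition("SELECT VALUE COUNT(1) FROM c");
using FeedIterator<int> it = container.GetItemQueryIterator<int>(countQuery);

// The aggregate comes back as a single-element page.
int count = (await it.ReadNextAsync()).First();
```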

Use projection for large documents. SELECT * on small documents is fine. On large documents, projecting only the properties you need can save 60% or more on query cost.
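Projection is just SQL — name the properties instead of `SELECT *`. The property names here (`id`, `title`, `tenantId`) are placeholders for your own schema:

```csharp
// Requires: using Microsoft.Azure.Cosmos;
// Only c.id and c.title are serialized and transferred, so the RU
// charge scales with what you project, not the full document size.
var q = new QueryDefinition(
        "SELECT c.id, c.title FROM c WHERE c.tenantId = @t")
    .WithParameter("@t", tenantId);
```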

Paginate your queries. Don't fetch 500 documents when you need 50. Use TOP, continuation tokens, or GetPagedAsync.
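A continuation-token pagination sketch — `MaxItemCount` caps the page size, and the token lets you resume across requests (e.g. between web API calls):

```csharp
// Requires: using Microsoft.Azure.Cosmos;
string continuation = null;
do
{
    using FeedIterator<SmallDoc> it = container.GetItemQueryIterator<SmallDoc>(
        new QueryDefinition("SELECT * FROM c"),
        continuationToken: continuation,
        requestOptions: new QueryRequestOptions { MaxItemCount = 50 });

    FeedResponse<SmallDoc> page = await it.ReadNextAsync();
    continuation = page.ContinuationToken;

    // ...hand these 50 items to the caller before fetching more...
} while (continuation != null);
```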

Design for partition isolation. Cross-partition queries are cheap during development because everything fits in one physical partition. They get expensive at scale. Use partition-key-aware queries from the start.

Batch for speed, not cost. TransactionalBatch is 24x faster than individual inserts. The per-document RU cost is the same, but the latency improvement is dramatic.

Don't trust the emulator for cost or limits. Use it for development. Use a real Azure account for cost estimation and performance testing. The emulator reports 1.00 RU for everything and accepts documents over 2 MB.

Start with serverless or free tier. Don't provision throughput until you understand your traffic patterns. You can always switch to provisioned later.

Monitor with my library's RU logging. The RequestCharge on every operation tells you exactly what you're spending. Watch for operations that cost more than you expect — that's where the optimization opportunities hide.
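Even without the library, every SDK response exposes `RequestCharge`, so a generic version of this logging is a one-liner (the `logger` here is a hypothetical `ILogger`, not the library's API):

```csharp
// Requires: using Microsoft.Azure.Cosmos; using Microsoft.Extensions.Logging;
ItemResponse<SmallDoc> resp =
    await container.UpsertItemAsync(doc, new PartitionKey(doc.TenantId));

// Log the charge on every operation; alert on anything unexpectedly high.
logger.LogInformation("Upsert {Id} cost {RU} RU", doc.Id, resp.RequestCharge);
```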


Next up: Chapter 8, where we talk about the Change Feed — the feature that makes Cosmos DB more than just a database. Event-driven patterns, the boilerplate nobody warns you about, and an Azure Functions cold-start trick that saves you money.