Cosmos DB: Appendix B — Operations Reference - Benjamin Day Consulting, Inc.

Cosmos DB: Appendix B — Operations Reference

May 12, 2026

This is Appendix B of Azure Cosmos DB for .NET Developers. Previous: Appendix A: Cosmos DB Query Cookbook.

Same format as the Query Cookbook. Problem first, practical answer, watch-out-for gotchas. This appendix is oriented around the Azure portal and the operational side of running Cosmos DB — the stuff that shows up after your code works and somebody asks "okay, but how do we run this thing?"

The Azure portal's resource blade for a Cosmos DB account is basically the table of contents for this appendix. If you see something in the left nav and wonder what it does, there's probably an entry for it here.

Throughput Management

The situation: You're setting up a new container and the portal is asking you to choose between Manual, Autoscale, and Serverless. You have no idea what to pick.

The three modes:

Serverless is the simplest. You don't configure throughput at all. You pay per RU for every operation, with no minimums and no commitments. If your container sits idle, you pay nothing for throughput (storage is still billed). If it spikes, you pay for the spike. This is great for development, testing, low-traffic applications, and anything where the workload is unpredictable and bursty. The cocktail app ran on serverless for months during development. The downside: serverless throughput is capped (on the order of 5,000 RU/s per physical partition as of this writing), and storage is limited to 1 TB per container. You also can't use serverless with multi-region writes or Fabric mirroring (continuous backup isn't supported on serverless accounts as of this writing).

Manual provisioned throughput is the traditional model. You set a specific RU/s number — say 400 RU/s — and that's what you get. Go over it and your requests get throttled (HTTP 429). The advantage is cost predictability. The disadvantage is that you have to guess your throughput needs. Set it too low and requests fail. Set it too high and you're paying for capacity you're not using. The minimum is 400 RU/s per container (or 100 RU/s per container if you're sharing throughput at the database level).

Autoscale is the sweet spot for production. You set a maximum RU/s, and Cosmos DB scales between 10% of that maximum and the full amount based on actual load. If you set a max of 4,000 RU/s, the system scales between 400 and 4,000 RU/s. You're billed for the highest throughput the system scaled to during each hour. Autoscale absorbs spikes up to the configured maximum without throttling (go past the maximum and you still get 429s), and it can cost less than manual when traffic is spiky because it scales down during quiet periods. The per-RU/s rate for autoscale is higher than for manual, so steady, flat traffic is usually cheaper on manual. For most production workloads, autoscale is the right default.
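
To make the hourly billing rule concrete, here is a minimal C# sketch. It assumes the rule exactly as described above (each hour is billed at the highest RU/s the system scaled to, never less than 10% of the configured maximum). AutoscaleBilling and BillableRuPerSecond are hypothetical names made up for illustration, not part of any SDK, and the sketch ignores actual pricing rates.

```csharp
using System;

public static class AutoscaleBilling
{
    // Returns the RU/s billed for a single hour under autoscale:
    // the peak RU/s for that hour, clamped between 10% of the
    // configured maximum and the maximum itself.
    public static int BillableRuPerSecond(int maxRuPerSecond, int peakRuPerSecondThisHour)
    {
        if (maxRuPerSecond <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxRuPerSecond));

        int floor = maxRuPerSecond / 10; // autoscale never drops below 10% of max
        return Math.Clamp(peakRuPerSecondThisHour, floor, maxRuPerSecond);
    }
}
```

With a 4,000 RU/s maximum, an idle hour bills as 400 RU/s, an hour that peaks at 2,500 bills as 2,500, and an hour where demand exceeds 4,000 bills as 4,000 (the excess is throttled rather than billed).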

The decision tree: Developing or experimenting? Serverless. Production workload with predictable, steady traffic? Manual, if you want to optimize cost. Production workload with variable traffic? Autoscale. Not sure? Start with autoscale — you can always switch to manual later if cost analysis shows your traffic is flat enough.

One more thing: You can provision throughput at the database level instead of the container level. Database-level throughput is shared across all containers in that database. This is useful if you have multiple containers with low individual throughput needs: instead of paying a 400 RU/s minimum per container, you pay for one pool that all containers share. The tradeoff is that a burst in one container can starve the others.

Cross-reference: Chapter 3 (Cosmos DB structure), Chapter 7 (RU costs).

Consistency Levels

The situation: The portal shows five consistency levels and the documentation reads like a distributed systems textbook. You just want to know which one to pick.

The five levels, in practical terms:

Strong — after a write, any subsequent read from any region sees that write immediately. This is what you're used to from a single-server SQL Server. The cost: writes are slower because Cosmos has to replicate to a majority of regions before acknowledging the write. For single-region accounts, this is effectively the same as Bounded Staleness without the latency penalty of multi-region replication.

Bounded Staleness — reads might be stale, but only by a bounded amount (you configure the window in seconds or operations). This is a compromise between consistency and performance in multi-region setups. For single-region accounts, it behaves identically to Strong.

Session — this is the default and it's the right choice for most applications. Within a single client session, you always see your own writes. If User A writes a recipe and then reads it back, they see the updated version. User B might see the old version for a brief moment. This is natural "I see my own changes" behavior and it matches how most web applications already work. The Benday.CosmosDb library tracks the session token automatically.

Consistent Prefix — reads might be stale, but you'll never see things out of order. If writes happen as A → B → C, a reader might see just A, or A → B, but never A → C (skipping B). Useful for event-stream scenarios where ordering matters more than immediacy.

Eventual — the cheapest option. Reads might be stale and might be out of order. This sounds scary but for read-heavy workloads where staleness is acceptable (caching layers, analytics, read replicas), it gives the best throughput and lowest RU cost. Reads cost roughly half the RUs compared to Strong.
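
The Consistent Prefix guarantee is the easiest of the five to state as code. The sketch below is purely illustrative (ConsistentPrefixCheck and IsPrefixOf are hypothetical names, and nothing here talks to Cosmos): it checks whether what a reader observed is a leading slice of the write history, which is the property that level promises.

```csharp
using System.Collections.Generic;

public static class ConsistentPrefixCheck
{
    // True when 'observed' is a leading slice of 'writes': the reader may be
    // behind, but it has not skipped or reordered anything.
    public static bool IsPrefixOf<T>(IReadOnlyList<T> writes, IReadOnlyList<T> observed)
    {
        if (observed.Count > writes.Count) return false;

        var comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < observed.Count; i++)
        {
            if (!comparer.Equals(writes[i], observed[i])) return false;
        }
        return true;
    }
}
```

For writes A, B, C, the observations [A] and [A, B] pass the check, while [A, C] fails because it skips B. Eventual consistency makes no such promise.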

What to pick: Session consistency, unless you have a specific reason to choose something else. It gives you "I see my own writes" semantics at a reasonable cost (Session reads cost the same RUs as Eventual; only Strong and Bounded Staleness reads cost roughly double), and it's the default for a reason. The cocktail app uses Session consistency. I've never had a reason to change it.

Cross-reference: Chapter 3 (Cosmos DB fundamentals).

Backup and Restore

The situation: You want to know what happens if someone accidentally deletes a container or corrupts your data.

Two backup modes:

Periodic backup is the older model. Cosmos takes snapshots at intervals you configure (minimum every hour, maximum every 24 hours) and retains them for a configurable period (minimum 8 hours, maximum 720 hours / 30 days). To restore, you file a support ticket with Microsoft. They restore to a new account, not in-place. The restore is at the account level — you can't restore a single container. Recovery time depends on data size and support response time.

Continuous backup is the newer model and the one you probably want. It keeps a continuous log of changes and supports Point In Time Restore (PITR) — you pick a timestamp within the retention window and Cosmos restores to that exact moment. Two retention tiers: 7-day (free) and 30-day (has a cost per GB/month). Continuous backup also supports granular restore — you can restore a single container or database to a new account rather than restoring the entire account.

Choosing between them: If you're setting up a new account, choose continuous backup. The 7-day tier is free. There is essentially no reason to pick periodic backup for new accounts. Continuous backup is also a prerequisite for Fabric mirroring (Chapter 14), so if you think you might ever want analytics, you'll need it anyway.

The migration catch: If your existing account uses periodic backup, you can migrate to continuous backup. But the migration is irreversible — once you switch to continuous, you can't go back to periodic. This is fine because continuous is strictly better, but know that it's a one-way door.

Point In Time Restore (PITR)

The situation: Someone deleted a batch of documents at 2:47 PM and you want them back.

How it works: With continuous backup enabled, you go to the Azure portal, select your Cosmos account, navigate to Point in Time Restore, pick a timestamp (any time within your retention window), and specify which databases/containers to restore. Cosmos creates a new account with the data as it existed at that timestamp. You then migrate the recovered data back into your production account.

The gotchas:

It restores to a new account, not in-place. You can't "undo" a deletion on the live account. You get a copy of the data at the point in time, in a separate account, and you have to move it yourself.

The restore granularity is per-container. You can restore a specific container without restoring the entire account. But you can't restore individual documents — you restore the whole container and then find the documents you need in the restored copy.

RPO (Recovery Point Objective) is essentially the timestamp you pick. If you know when the bad thing happened, you can restore to seconds before it. If you don't know when it happened, you'll need to poke around in the restored data to find a good timestamp.

Restore time depends on data size. Small containers restore in minutes. Large containers can take hours. Don't assume this is instantaneous for disaster recovery planning.

The practical advice: Think of PITR as insurance for "oh no" moments — accidental deletions, bad deployments that corrupt data, bulk operations gone wrong. For genuine disaster recovery with tight RPOs and RTOs, you'll want multi-region configuration as well.

TTL (Time To Live)

The situation: You have documents that should expire automatically — session data, temp files, event logs, cache entries — and you don't want to build a cleanup job.

How it works: TTL is the number of seconds a document should live after its last modification (based on _ts). You can set it at two levels:

Container-level default: Set a default TTL on the container. Every document in the container inherits this value unless it overrides it.

// Container setting: default TTL of 86400 seconds (24 hours)
{
    "defaultTtl": 86400
}

Document-level override: Set a ttl property on individual documents to override the container default.

{
    "id": "session-abc123",
    "ttl": 3600,
    "...": "..."
}

This document expires 3,600 seconds (1 hour) after its last modification, regardless of the container default.

Special values: A ttl of -1 on a document means "never expire, even if the container has a default TTL." A container default of -1 means "TTL is enabled on this container (documents can have individual TTLs), but documents don't expire by default."
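
The interaction between the container default and the per-document override is easier to see as a function. This is a minimal sketch of the rules described above, plus one documented behavior worth calling out: if TTL is not enabled on the container at all, a ttl property on a document has no effect. TtlRules and ExpiresAt are hypothetical names for illustration; Cosmos evaluates this server-side and you never call anything like it.

```csharp
using System;

public static class TtlRules
{
    // containerDefaultTtl: null = TTL disabled on the container,
    //                      -1 = TTL enabled but no default expiry,
    //                      n > 0 = default lifetime in seconds.
    // itemTtl: null = no override, -1 = never expire, n > 0 = lifetime in seconds.
    // tsEpochSeconds: the document's _ts value (last modified, Unix seconds).
    // Returns the expiry instant, or null when the document never expires.
    public static DateTimeOffset? ExpiresAt(int? containerDefaultTtl, int? itemTtl, long tsEpochSeconds)
    {
        if (containerDefaultTtl is null) return null; // TTL off: item-level ttl is ignored

        int effective = itemTtl ?? containerDefaultTtl.Value;
        if (effective == -1) return null;             // "never expire"

        return DateTimeOffset.FromUnixTimeSeconds(tsEpochSeconds).AddSeconds(effective);
    }
}
```

Calling ExpiresAt(86400, 3600, ts) gives an expiry one hour after _ts, ExpiresAt(86400, -1, ts) gives null, and ExpiresAt(-1, null, ts) also gives null because the container has TTL enabled without a default.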

Deletion behavior: Expired documents are deleted in the background by a system process. On provisioned-throughput containers this process runs on leftover RUs, meaning it only uses capacity your own requests aren't consuming. If the container is busy and there are no leftover RUs, physical deletion is delayed, but expired documents stop showing up in reads and queries as soon as their TTL passes. Don't write code that depends on documents being physically removed at the exact second of expiration.

Common patterns: Session state with a 24-hour TTL. Event log entries with a 30-day TTL. Change feed processor lease documents with short TTLs for cleanup. Transient cache documents that auto-expire.

The storage benefit: Deleted documents stop costing you storage and stop taking up space in the index. If you have a high-write container with transient data, TTL can significantly reduce your storage costs over time with zero maintenance code.

Monitoring

The situation: Something is slow or expensive and you need to figure out what's happening.

Where to look in the portal:

Metrics blade — the first stop. Shows RU consumption, request count, throttled requests (HTTP 429s), storage, and document count over time. The most useful view is "Total Request Units" broken down by operation type. This tells you whether your reads, writes, or queries are consuming the most throughput. If you see spikes in throttled requests, your provisioned throughput is too low (or you have a runaway query).

Insights blade — a curated set of dashboards built on top of the same data. More visual, less configurable. Good for a quick health check.

Diagnostic settings — for deeper analysis, you can send Cosmos metrics and logs to Log Analytics, Event Hub, or Storage. Once the data is in Log Analytics, you can write KQL queries to analyze query patterns, identify expensive operations, and track performance trends over time. This is where the serious operational analysis happens.

Alerts — set up alerts on the metrics that matter. The ones I'd start with: throttled requests greater than zero (you're exceeding throughput), normalized RU consumption above 80% (you're approaching throttling), and storage utilization approaching your limits. Don't set alerts on everything — alert fatigue is real and it makes people ignore the alerts that matter.

Application-level diagnostics: The ICosmosQueryLogSink from Chapter 11 gives you query-level detail that the portal metrics can't — individual query SQL, RU cost per operation, timing, cross-partition flags, and (when enabled) index utilization metrics that show which indexes your queries actually used and which ones Cosmos recommends adding. The portal tells you how much throughput you're using. The sink tells you which queries are responsible and whether they're hitting the right indexes.

The practical approach: Use the portal Metrics blade for "is anything on fire right now?" Use the ICosmosQueryLogSink for "which specific query is causing the fire?" Turn on CaptureIndexMetrics for "is the fire burning hotter than it needs to because I'm missing an index?" Use Log Analytics for "what's the trend over the last month and should I change my throughput configuration?"
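
If you aren't using the library's sink, the plain SDK still reports the RU charge on every response, which is enough for a quick "what does this query cost?" check. A minimal sketch, assuming the Microsoft.Azure.Cosmos v3 SDK; RuProbe and MeasureQueryAsync are hypothetical names, and using dynamic for the result type is just a convenience for the example.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class RuProbe
{
    // Runs a query to completion and returns the total RU charge across all pages.
    public static async Task<double> MeasureQueryAsync(Container container, string sql)
    {
        double totalRu = 0;

        using FeedIterator<dynamic> iterator =
            container.GetItemQueryIterator<dynamic>(new QueryDefinition(sql));

        while (iterator.HasMoreResults)
        {
            FeedResponse<dynamic> page = await iterator.ReadNextAsync();
            totalRu += page.RequestCharge; // RU cost of this page only
            Console.WriteLine($"page: {page.Count} items, {page.RequestCharge:F2} RU");
        }

        return totalRu;
    }
}
```

This only covers ad hoc measurement. It doesn't replace the structured, per-operation logging the sink gives you.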

Cross-reference: Chapter 7 (RU costs), Chapter 11 (ICosmosQueryLogSink).

Locks

The situation: You want to make sure nobody accidentally deletes your production Cosmos DB account or database.

How it works: Azure Resource Manager locks prevent accidental deletion or modification of resources. Two types:

Delete lock — prevents the resource from being deleted. You can still modify it (change settings, update throughput, etc.), but you can't delete it. This is the one you almost certainly want on your production Cosmos account.

ReadOnly lock — prevents both deletion and modification. This is too restrictive for most Cosmos accounts because you can't change throughput, update indexing policies, or make any configuration changes. Use this only if you have a very specific reason.

Where to set it: In the Azure portal, navigate to your Cosmos DB account → Settings → Locks. Add a lock at the account level. You can also set locks at the resource group level, which cascades to everything in the group.

The catch: Locks apply to management-plane operations, not data-plane operations. A Delete lock on the Cosmos account prevents someone from deleting the account in the portal or via ARM, but it doesn't prevent someone from deleting documents or containers via the SDK or Data Explorer. For data-level protection, use RBAC permissions (Chapter 9) to restrict who can perform destructive operations.

Best practice: Put a Delete lock on every production Cosmos DB account. It takes 30 seconds and prevents the "someone accidentally deleted the account" disaster. You can always remove the lock when you legitimately need to delete the resource.

Cross-reference: Chapter 9 (RBAC and permissions).

Networking

The situation: Your security team says the Cosmos DB account needs to be locked down, and you need to understand the networking options.

Public network access is the default. Your Cosmos DB account is accessible from the internet via its endpoint URL, authenticated by access keys or Microsoft Entra ID (formerly Azure AD) tokens. For development and small projects, this is fine. For production, your security team will probably want more.

IP firewall rules restrict access to specific IP addresses or CIDR ranges. You configure them in the portal under Networking → Public Access → Selected Networks. Add your office IP, your build server's IP, your app's outbound IPs. Everything else gets rejected. This is the simplest security improvement and takes about two minutes.

Service endpoints allow Azure resources in a specific virtual network to reach your Cosmos DB account over the Azure backbone network (not the public internet). The traffic never leaves Microsoft's network. This is useful when your App Service or Azure Functions run in a VNet.

Private endpoints take it further — they assign a private IP address from your VNet to the Cosmos DB account. The public endpoint can be disabled entirely. All traffic goes through the private IP. This is the gold standard for production security and what most enterprise environments require. Fabric mirroring supports private endpoints as of March 2026.

The practical progression: Start with public access during development. Add IP firewall rules when you move to staging. Switch to private endpoints for production. Don't try to set up private endpoints on day one — it adds complexity that slows down development without adding value until you're running real workloads.

The Fabric mirroring note: If you're using Fabric mirroring (Chapter 14), you'll need either public access or a private endpoint that Fabric can reach. Service endpoints alone aren't sufficient for Fabric connectivity.

Cross-reference: Chapter 9 (security and permissions), Chapter 14 (Fabric mirroring prerequisites).

Dedicated Gateway and Integrated Cache

The situation: You have a read-heavy workload and you want to reduce RU costs by caching frequently-accessed documents.

What it is: The Dedicated Gateway is a compute layer that sits in front of your Cosmos DB account. It includes an integrated cache that stores the results of point reads and queries. Subsequent identical reads hit the cache instead of Cosmos, costing zero RUs.

How it works: You provision a Dedicated Gateway SKU (separate from your container throughput), change your application's connection string to use the dedicated gateway endpoint instead of the standard endpoint, connect in Gateway mode (the cache is bypassed in Direct mode), and read with Session or Eventual consistency. The gateway transparently caches responses and serves them for requests with matching keys/queries.
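
On the client side, those settings might look like the sketch below. It assumes the Microsoft.Azure.Cosmos v3 SDK; CachedReadOptions and its two methods are hypothetical helper names, and the staleness window is whatever you decide your application can tolerate.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class CachedReadOptions
{
    // Client options: the integrated cache is only used in Gateway connection mode.
    public static CosmosClientOptions ClientOptions() => new CosmosClientOptions
    {
        ConnectionMode = ConnectionMode.Gateway
    };

    // Per-request options for a point read that is allowed to be served from the cache.
    public static ItemRequestOptions PointReadOptions(TimeSpan maxStaleness) => new ItemRequestOptions
    {
        ConsistencyLevel = ConsistencyLevel.Eventual,
        DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions
        {
            // How old a cached response may be before the gateway goes back to the backend.
            MaxIntegratedCacheStaleness = maxStaleness
        }
    };
}
```

You would pass ClientOptions() when constructing the CosmosClient with the dedicated gateway connection string, and pass PointReadOptions(TimeSpan.FromMinutes(5)) (or whatever window fits) as the request options on ReadItemAsync. A cache hit shows up as a response with a request charge of zero.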

When it makes sense: High read-to-write ratios where the same documents or query results are read repeatedly. Think of it like a managed cache specifically for Cosmos DB. If your app reads the same recipe detail page 1,000 times between updates, only the first read is charged; the other 999 can be served from the cache at zero RUs (subject to the cache staleness window you configure).

When it doesn't make sense: Write-heavy workloads, workloads where every read is unique, or workloads where strong consistency is required on every read (the cache introduces staleness by definition). Also not worth it if your total RU consumption on reads is already low — the Dedicated Gateway has its own cost, and it only saves money if the RU savings exceed the gateway cost.

The cost math: The smallest Dedicated Gateway SKU has a monthly cost. Compare that cost against the RUs you'd save by caching. If your read RU bill is significantly higher than the gateway cost, it pays for itself. If it's close or lower, stick with direct reads.

Container Copy

The situation: You need to move data between containers — maybe you're restructuring your partition key, changing your indexing policy in a way that requires a new container, or migrating between accounts.

Why you need it: Some container properties can't be changed after creation. Partition key paths and vector indexing policies are the big ones. If you need to change either of those, you create a new container with the correct settings and copy the data over.

The options:

Container Copy (preview as of this writing) is an Azure-managed job that copies data between containers within the same account or across accounts. You configure it in the portal or via CLI, and it runs as a background job. It handles large datasets, doesn't consume your container's RUs (it uses dedicated copy throughput), and supports transforms during copy.

Change Feed processor — write a small application that reads the change feed from the source container and writes to the target container. This gives you complete control over the copy process, including the ability to transform documents during migration. The change feed examples from Chapter 8 give you the pattern.

Data export/import CLI — for the cocktail app, I built CLI commands that export documents to JSON files on disk (one file per document, organized by entity type subfolder) and import them back into a container. This is the brute-force approach but it works reliably for datasets that fit on disk. It's also handy for seeding test environments.

Bulk executor — the Cosmos SDK's bulk execution mode lets you write large batches of documents efficiently. Combine a change feed reader (or a file-based export) with bulk writes for a fast migration pipeline. The Benday.CosmosDb library's bulk operations include automatic throttling and retry logic.
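
For the change feed option, the core of the copy application can be quite small. The sketch below assumes the Microsoft.Azure.Cosmos v3 SDK with its default Newtonsoft-based serializer. ContainerCopier, the "container-copy" processor name, and the partitionKeyFor delegate are all hypothetical; you supply the delegate because only you know the target container's partition key.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json.Linq;

public static class ContainerCopier
{
    // Builds a change feed processor that replays every document in 'source'
    // (starting at the beginning of the feed) into 'target'.
    public static ChangeFeedProcessor Build(
        Container source,
        Container target,
        Container leases,
        Func<JObject, PartitionKey> partitionKeyFor)
    {
        return source
            .GetChangeFeedProcessorBuilder<JObject>(
                "container-copy",
                async (IReadOnlyCollection<JObject> changes, CancellationToken ct) =>
                {
                    foreach (JObject doc in changes)
                    {
                        // Transform 'doc' here if the target needs a different shape.
                        await target.UpsertItemAsync(doc, partitionKeyFor(doc), cancellationToken: ct);
                    }
                })
            .WithInstanceName(Environment.MachineName)
            .WithLeaseContainer(leases)
            .WithStartTime(DateTime.MinValue.ToUniversalTime()) // read from the start of the feed
            .Build();
    }
}
```

You'd call StartAsync() on the returned processor, let it drain, and then StopAsync(). Keep in mind that the default change feed mode only surfaces creates and updates, so documents deleted from the source during the copy won't be removed from the target.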

The practical advice: For one-time migrations with datasets under a few GB, export-to-disk-and-reimport is the simplest and most debuggable approach. For larger datasets or ongoing sync, Container Copy or a change feed processor is the way to go. Always test the migration on a non-production copy first. Always verify document counts before and after.
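
Verifying counts is a one-query job. A minimal sketch, assuming the Microsoft.Azure.Cosmos v3 SDK; CopyVerification and CountDocumentsAsync are hypothetical names.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class CopyVerification
{
    // Counts all documents in a container with a cross-partition aggregate query.
    public static async Task<long> CountDocumentsAsync(Container container)
    {
        long total = 0;

        using FeedIterator<long> iterator =
            container.GetItemQueryIterator<long>("SELECT VALUE COUNT(1) FROM c");

        while (iterator.HasMoreResults)
        {
            foreach (long partial in await iterator.ReadNextAsync())
            {
                total += partial;
            }
        }

        return total;
    }
}
```

Run it against the source and the target and compare the two numbers. On a large container this query is not free, so expect it to cost a noticeable number of RUs.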

Cross-reference: Chapter 8 (change feed), Chapter 12 (vector indexing requires container recreation).

Bulk Operations

The situation: You need to write a large number of documents — initial data import, batch updates, embedding generation — and doing them one at a time is painfully slow.

Enabling bulk mode:

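// endpoint and credential come from your existing client setup.
var clientOptions = new CosmosClientOptions
{
    // Group concurrent point operations into batched requests.
    AllowBulkExecution = true
};

var client = new CosmosClient(endpoint, credential, clientOptions);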

That AllowBulkExecution = true tells the SDK to batch multiple operations into grouped requests. Instead of sending 5,000 individual write requests, the SDK groups them into efficient batches and sends them in parallel.

The pattern:

var tasks = new List<Task>();

foreach (var recipe in recipes)
{
    tasks.Add(container.UpsertItemAsync(
        recipe,
        new PartitionKey(recipe.TenantId)));
}

await Task.WhenAll(tasks);

You kick off all the tasks without awaiting each one individually, then wait for them all at once. The SDK handles the batching, parallelism, and connection management.

Throttling and retries: Even with bulk mode, you can exceed your provisioned throughput and get HTTP 429 throttle responses. The SDK's built-in retry logic handles transient throttles automatically. For heavy bulk operations, consider temporarily scaling up your throughput (autoscale helps here) and scaling back down afterward.
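
The SDK's retry behavior on 429s is configurable through the same options object. A minimal sketch, assuming the Microsoft.Azure.Cosmos v3 SDK; BulkClientOptions is a hypothetical helper name, and the values 20 and 60 seconds are illustrative rather than recommendations.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class BulkClientOptions
{
    public static CosmosClientOptions Create() => new CosmosClientOptions
    {
        AllowBulkExecution = true,

        // How many times the SDK retries a throttled (429) request
        // before surfacing the failure to your code.
        MaxRetryAttemptsOnRateLimitedRequests = 20,

        // Upper bound on the total time spent waiting across those retries.
        MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(60)
    };
}
```

Once the retries are exhausted, the operation fails with a CosmosException whose StatusCode is TooManyRequests, so a long-running bulk job should still be prepared to catch that and decide whether to back off or give up.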

The Benday.CosmosDb library approach: The library includes bulk operation helpers that add structured logging (via ICosmosQueryLogSink), configurable parallelism, and throttle-aware batching. This is what the cocktail app's batch embedding process uses — it generates embeddings for thousands of recipes and writes them back with the embedding vectors.

When to use bulk: Initial data loads, batch transformations (adding a new property to every document), re-embedding after a model change, migrating data between containers. Any time you're touching more than a few hundred documents in a single operation.

Cross-reference: Chapter 12 (batch embedding process).

Multi-Region Setup

The situation: Your users are globally distributed and you want low-latency reads (or writes) from multiple regions.

Read replicas are the simplest multi-region configuration. Your Cosmos account has a single write region, and you add one or more read regions. Cosmos replicates data to the read regions automatically. Your application connects to the nearest region for reads, and writes go to the primary region. This reduces read latency for users far from your write region.

Multi-region writes are more complex. Multiple regions accept writes, and Cosmos handles conflict resolution (last-writer-wins by default, or custom conflict resolution policies). This gives you low-latency writes everywhere, but you need to design for potential write conflicts.
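
On the client side, "connects to the nearest region" is something you configure rather than something that happens by magic. A minimal sketch, assuming the Microsoft.Azure.Cosmos v3 SDK; RegionalClientOptions is a hypothetical helper name, and the two regions are placeholders for wherever your account is actually replicated.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;

public static class RegionalClientOptions
{
    // Regions are tried in the order listed; put the one closest to this app instance first.
    public static CosmosClientOptions Create() => new CosmosClientOptions
    {
        ApplicationPreferredRegions = new List<string>
        {
            Regions.WestEurope, // placeholder: nearest region for this deployment
            Regions.EastUS      // placeholder: fallback region
        }
    };
}
```

Each deployment of your app would typically get its own ordering, with the co-located region first.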

The cost math: RU cost scales with the number of regions, because you pay for the provisioned throughput in every region. Two regions = roughly 2x cost. Three regions = roughly 3x cost. This adds up fast. Make sure the latency improvement justifies the expense. For many applications, a single region with the Dedicated Gateway cache is more cost-effective than adding a second region.

Failover: With multiple regions, Cosmos can fail over automatically. With service-managed failover enabled, if the primary write region goes down, Cosmos promotes a read region to be the new write region, following the failover priority you configure in the portal. For single-region accounts, there's no automatic failover: if the region goes down, your data is unavailable until the region recovers.

When it's worth it: Genuinely global applications with users on multiple continents where latency matters. SLA requirements that demand multi-region redundancy. Applications where regional outages are unacceptable.

When it's not worth it: Most applications, honestly. If your users are primarily in one geography and you can tolerate occasional regional outages (which are rare), single-region is fine. The cost difference is significant.

Fleet Management (Multi-Tenancy Patterns)

The situation: You're building a multi-tenant application and you need to decide how to organize your Cosmos resources across tenants.

Three levels of isolation:

Shared container — all tenants share a single container, separated by partition key. This is what the cocktail app does: tenantId is the first level of the hierarchical partition key. Tenant A's data lives at partition ["TenantA", "CocktailRecipe"], Tenant B's at ["TenantB", "CocktailRecipe"]. Maximum cost efficiency, minimum isolation. One noisy tenant's query load can consume throughput that other tenants need.

Container per tenant — each tenant gets their own container within a shared database. Better isolation (each container can have independent throughput), but more management overhead. Useful when tenants have meaningfully different throughput needs or when regulatory requirements demand logical separation. If you've written logic into your app to track individual customer RU usage and want to bill for that, this might also be an interesting option.

Account per tenant — maximum isolation. Each tenant gets a separate Cosmos DB account. Completely independent throughput, networking, backup, and access control. The highest management overhead and the most expensive, but sometimes required for compliance reasons (data residency, for example).

The practical default: Start with shared containers and partition key separation. It's the simplest, cheapest, and most operationally lightweight approach. The Benday.CosmosDb library's TenantItemBase and hierarchical partition key pattern (/tenantId,/entityType) is designed for exactly this scenario. Move to container-per-tenant or account-per-tenant only when you have a specific isolation requirement that shared containers can't meet.
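
Using the plain SDK (not the Benday.CosmosDb library, whose API isn't shown here), the shared-container pattern comes down to two things: defining the container with a hierarchical partition key, and building the full key when you read or write. A minimal sketch, assuming a Microsoft.Azure.Cosmos v3 release with hierarchical partition key support; TenantPartitioning and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;

public static class TenantPartitioning
{
    // Container definition with a two-level hierarchical partition key.
    public static ContainerProperties ContainerDefinition(string containerId) =>
        new ContainerProperties(containerId, new List<string> { "/tenantId", "/entityType" });

    // Full partition key for one tenant's documents of one entity type.
    public static PartitionKey KeyFor(string tenantId, string entityType) =>
        new PartitionKeyBuilder()
            .Add(tenantId)
            .Add(entityType)
            .Build();
}
```

KeyFor("TenantA", "CocktailRecipe") produces the key for the ["TenantA", "CocktailRecipe"] partition described above. Point reads and writes need the full key; queries can target just the tenantId prefix.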

Cross-reference: Chapter 6 (partition keys and tenancy), Chapter 10 (multi-tenant cocktail app design).

Vector Embedding and Indexing Policies

The situation: You want to add vector search to a container and you need to know what to configure.

Two pieces to set up: Vector search requires both a vector embedding policy (which tells Cosmos the shape of your embedding data) and a vector index in your indexing policy (which tells Cosmos how to index it for similarity queries).

The configuration (for reference):

Vector embedding policy on the container:

{
    "vectorEmbeddingPolicy": {
        "vectorEmbeddings": [
            {
                "path": "/embedding",
                "dataType": "float32",
                "distanceFunction": "cosine",
                "dimensions": 1536
            }
        ]
    }
}

Vector index in the indexing policy:

{
    "vectorIndexes": [
        {
            "path": "/embedding",
            "type": "diskANN"
        }
    ]
}

Exclude the embedding path from your regular indexing policy. This one's easy to miss. A 1,536-float array indexed with range indexes wastes RUs on every write and provides zero query benefit — you never query embeddings with range operators. That's what the vector index is for.
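
If you create the container from code rather than the portal, the same three settings (embedding policy, vector index, and the exclusion) can be expressed in C#. This is a sketch, assuming a recent Microsoft.Azure.Cosmos v3 release that includes the vector policy types. VectorContainer and Definition are hypothetical names, and the container id and partition key path are left as parameters because they depend on your design.

```csharp
using System.Collections.ObjectModel;
using Microsoft.Azure.Cosmos;

public static class VectorContainer
{
    public static ContainerProperties Definition(string containerId, string partitionKeyPath)
    {
        var embeddings = new Collection<Embedding>
        {
            new Embedding
            {
                Path = "/embedding",
                DataType = VectorDataType.Float32,
                DistanceFunction = DistanceFunction.Cosine,
                Dimensions = 1536
            }
        };

        var properties = new ContainerProperties(containerId, partitionKeyPath)
        {
            VectorEmbeddingPolicy = new VectorEmbeddingPolicy(embeddings)
        };

        // Regular indexing on everything except the embedding array...
        properties.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/*" });
        properties.IndexingPolicy.ExcludedPaths.Add(new ExcludedPath { Path = "/embedding/*" });

        // ...which gets a vector index instead.
        properties.IndexingPolicy.VectorIndexes = new Collection<VectorIndexPath>
        {
            new VectorIndexPath { Path = "/embedding", Type = VectorIndexType.DiskANN }
        };

        return properties;
    }
}
```

Note the trailing /* on the excluded path: it excludes everything under /embedding from the regular index, while the vector index entry uses the bare /embedding path.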

Cross-reference: Chapter 12 (full vector search setup), Appendix A (VectorDistance queries).

Resource Visualizer

The situation: You're looking at the Azure portal and you want a quick picture of what's in your Cosmos DB account — how many databases, how many containers, what throughput is configured where.

Where to find it: In the portal, go to your Cosmos DB account → Resource Visualizer (under the Overview section in the left nav). It shows a tree view of your account hierarchy: account → databases → containers, with throughput information at each level.

Why it's useful: When you're working with an account that has multiple databases and containers — especially if some containers share database-level throughput and others have dedicated throughput — the Resource Visualizer gives you a single-screen overview. It's also useful for quickly confirming that a container you just created actually exists where you expected it.

What it doesn't do: It doesn't show document counts, storage utilization, or query performance. For those, use the Metrics blade or the individual container's Scale & Settings pane.

That's the operations reference. Between this and Appendix A, you've got a reference shelf you can come back to whenever Cosmos throws something at you that you haven't seen before — or that you've seen before and can't quite remember how you handled it last time.

These two appendices wrap up Azure Cosmos DB for .NET Developers. Thanks for reading. If any of this helped you ship something, I'd love to hear about it. Oh...and if you'd like to hire me to help you out on a project or to speak at an event or you just want to hang out, drop me a line at info@benday.com.

This is the final entry in Azure Cosmos DB for .NET Developers.
