This is Chapter 8 of Azure Cosmos DB for .NET Developers. Previous: Chapter 7: Query Performance & Cost.
When I was learning Cosmos DB, I kept hearing about this "Change Feed" thing. People seemed really excited about it. Blog posts. Conference talks. Microsoft's own documentation devoted significant real estate to it.
But for the longest time, I really never understood why it was all that interesting.
Probably because I was thinking about it wrong. I was thinking it was analogous to SQL Server triggers — a mechanism for knowing when data changed. And honestly, as someone who's been writing enterprise applications for a couple of decades, there have definitely been some times where I was darned glad that I had an audit trail — audit tables — corresponding to my primary tables in my SQL schema. Audit trails are valuable. I'm not dismissing them.
But triggers? Auditing? That's... fine. It's not exciting. And I just didn't understand why anyone would get that excited about knowing when data changed. I mean, wouldn't that just be some kind of application tier concern? My app wrote the data. My app already knows what happened. Why does the database need to tell me something I just did?
It took me a while to realize that I was asking the wrong question. Instead of asking "why would I want to know when data changed?", the question is better put as "why would someone else want to know?"
The Code That Writes Probably Isn't the Code That Responds
Here's the shift in thinking that made the Change Feed click for me.
When you write a document to Cosmos, your code knows that happened. But what if the code that responds to that change isn't your code? What if it's a different application? A different deployment? A different team's responsibility? What if it doesn't even have to be running at the same time as your write?
As a developer and an application architect, I want to keep integrations between apps as clean as possible. I really don't want to know much about other apps because I'll have a tendency to — well, let me put it a different way — the needs of those other apps tend to leak into my app like a water leak in a boat. Here's me. There's you. We have boundaries. The blood-brain barrier.
And then the really selfish reason: if your app goes down, I don't want you taking me down with you.
Think about it in SQL Server terms for a second. You've got a trigger that writes an audit record. If the audit table's disk is full, or the audit trigger throws an exception, or there's a deadlock in the audit logic — your primary INSERT fails. The whole transaction rolls back. Your user's operation fails because of something entirely unrelated to the primary function of your application. That's not great for resilience. That's not great for graceful failure handling.
Synchronous coupling means shared fate. When everything runs in the same transaction, everything fails together.
The Change Feed is a tool for breaking that coupling. The write to Cosmos succeeds on its own terms. Some other process, somewhere else, at some other time, picks up the change and does whatever it needs to do with it. If that other process is down, your write still succeeded. If that other app is running kind of slow, your write was still fast. If it crashes halfway through processing, your write is fine and the Change Feed still has the event waiting for when the processor comes back.
You don't even have to think about it. That's the difference. That's why people get excited.
What the Change Feed Actually Is
Let's get concrete. Every time you create or update a document in a Cosmos DB container, that change is recorded in the container's Change Feed. It's an ordered, persistent log of changes — ordered within each logical partition key, with each change appearing exactly once.
The Change Feed gives you a way to read that log. You set up a processor (or an Azure Function trigger) that monitors the container, and whenever new changes appear, your code runs.
A few important mechanical details:
Creates and updates, not deletes. In the default "latest version" mode, the Change Feed captures creates and updates but not deletes. If you delete a document, that deletion doesn't appear in the feed. If you need to react to deletions, you have two options: use soft deletes (set a deleted property to true and optionally set a TTL for auto-cleanup later), or use the newer "all versions and deletes" mode which captures everything including deletes and intermediate updates.
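Here's a minimal sketch of what a soft delete might look like with the .NET SDK. The document type and the Deleted property are my own illustrative conventions; the "ttl" property is the real per-document field Cosmos honors, provided TTL is enabled on the container:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;

// Hypothetical document type. The property names (other than "ttl") are
// illustrative conventions, not anything Cosmos requires.
public class NoteDocument
{
    [JsonProperty("id")] public string Id { get; set; }
    [JsonProperty("deleted")] public bool Deleted { get; set; }

    // "ttl" is special: seconds until Cosmos auto-deletes the document,
    // honored only when TTL is enabled on the container.
    [JsonProperty("ttl", NullValueHandling = NullValueHandling.Ignore)]
    public int? Ttl { get; set; }
}

public static class SoftDelete
{
    // "Deleting" is really an update, so it DOES appear in the Change Feed.
    public static async Task SoftDeleteAsync(Container container, NoteDocument note)
    {
        note.Deleted = true;
        note.Ttl = 86400; // let Cosmos physically remove it in 24 hours
        await container.ReplaceItemAsync(note, note.Id, new PartitionKey(note.Id));
    }
}
```

Downstream consumers then treat any document with deleted = true as a delete event.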
Latest version only. In the default mode, if a document is updated three times in quick succession, the Change Feed shows it once — with the final state after the third update. You don't see the intermediate versions. If you need every intermediate version, that's the "all versions and deletes" mode.
Ordered per partition key. Changes within the same partition key are guaranteed to arrive in order. Changes across different partition keys have no ordering guarantee. For most applications, this is fine — events for the same tenant or the same entity arrive in sequence, which is what matters.
Persistent. Unlike a message queue where messages expire, the Change Feed in latest version mode retains changes as long as the document exists in the container. You can replay from the beginning if you need to.
The persistence feature has a side benefit: since changes don't disappear after they're read, the same item in the Change Feed can be used as a trigger for multiple processors or systems. Contrast that with a message in something like an Azure Storage Queue, where once the message is dequeued (read), no one else can see it.
The SQL Server Trigger Analogy (and Where It Breaks Down)
If you're coming from SQL Server, the Change Feed is conceptually similar to triggers — something changes in the database, code runs in response.
But there's a crucial difference: SQL Server triggers are synchronous. The trigger runs inside the same transaction as your INSERT, UPDATE, or DELETE. The caller waits while the trigger runs. If the trigger fails, the transaction rolls back.
The Change Feed is asynchronous. Your write completes immediately and returns to the caller. The processing happens later — or not at all, if nobody's listening. The processing could happen within milliseconds, seconds, minutes, hours, or days. If the processor fails, the original write is fine.
This is a feature, not a limitation. It means a slow or failing consumer can't slow down or break your writes. It means you can deploy, restart, or rewrite the consumer without affecting the writer. It means the writer and the consumer have different lifecycles, different failure modes, and different scaling characteristics.
Why Async Processing Matters (Even in Small Apps)
You might be thinking that this async processing model is overkill for your app. "I'm not building a distributed system. I don't need event-driven architecture. I just have a web app." I get it. It adds complexity. It breaks your single application into something that looks a little more like two applications. More stuff to deploy. More stuff to break.
I appreciate and respect that thinking. That mindset has a healthy fear of the unknown: "I don't quite know how to do this async thing and it could be a bottomless pit of complexity and bugs." But implementing asynchronous processing in an app is pretty easy to do.
Once I wrapped my head around it, I've been surprised by how often I want to use async processing even in small-ish apps. Especially in web apps. There are all kinds of things that you could process on the same thread/request in an ASP.NET app — sending notification emails, generating PDFs, syncing data to an external system, updating search indexes, writing detailed audit logs, data parsing & processing — but every one of those adds latency to the user's request. It might scale just fine. But users get snippy when the app starts to "feel slow."
Moving that work out of band — write the primary data, queue up the side effects, respond to the user — makes your app feel faster even when the total work is the same. Your end user doesn't think about performance the same way that a developer would. They don't care that saving a record takes 1ms followed by 3 seconds to send an email. Instead, they see it as "the Save button took 3 seconds to respond" followed by "your app is slow".
The Chocolate and Peanut Butter
And now I'm going to start talking about Azure Functions. (I'll bet you didn't see that coming.)
The async processing pattern and Azure Functions running in Consumption Plan mode are the chocolate and peanut butter of this world. The rasam and rice. The champagne and pizza. (Seriously...if you haven't tried a dry sparkling wine with greasy, salty pizza, you really should give it a try.)
Azure Functions can be hosted in a whole bunch of different ways, and one of those ways is called the "Consumption Plan". Azure Functions running in Consumption Plan mode are dirt cheap. You pay per execution. Your Change Feed processor runs for 200 milliseconds to process a webhook event and then goes back to sleep. You're paying for 200 milliseconds of compute. Compare that to running a dedicated background worker in an App Service that's alive 24/7 whether there's work to do or not.
This combination is amazing for handling out-of-band processing work. Especially when speed of processing isn't a hard requirement. If "processed within a few seconds" is just as good as "processed immediately", you can save a TON of money by running on a Consumption Plan.
Keep Your Stuff Separated
As The Offspring said in their 1994 classic "Come Out and Play": You've gotta keep em separated. Prescient words of distributed software wisdom from the pre-internet era. (Yes, I am GenX. How could you tell?)
Keeping things separated and isolated is another reason why the async pattern matters. It's about keeping things separated in Azure for uptime and reliability.
If you're running your background processing on the same App Service as your web app, you've created shared fate again. If a deployment from your Azure DevOps or GitHub Actions pipeline makes something a liiiiiittttllleee bit slow or unstable on that App Service Plan, you're at risk of everything falling apart. It might not be for a long time — maybe just a minute or less — but Murphy's Law says that that's the moment that your biggest customer loses their changes. Your users see errors because your background PDF generation hogged the CPU. Or your background processing stalls because your web traffic spiked and consumed all the available threads.
Keeping the web app, the webhook receiver, and the background processor in separate Azure resources means they succeed and fail independently. Each one can be deployed independently. Each one can scale independently. The blast radius of any single problem is contained.
Public Service Announcement: App Services vs. App Service Plans
If you use Azure App Services, make sure you're 100% clear on the difference between an "App Service" and an "App Service Plan." An App Service is your application — your web app, your API. An App Service Plan is the underlying compute resource — the VM(s) that host your App Services.
The App Service Plan is the thing that you pay for and the thing that you scale. Multiple App Services can happily share a single App Service Plan. But if you need to scale one App Service, you're actually going to be scaling the PLAN and therefore scaling all the other App Services running on that plan.
The gotcha: if you put your web app and your background worker on the same App Service Plan, they share CPU, memory, and deployment slots. They're on the same VM. You haven't actually separated anything — you've just put two processes in the same blast radius. If the Plan goes down, the app services on that plan go down.
That's not good or bad. But it's absolutely essential that you know that that's how it works.
This has cost me money. And I've seen it cost at least a handful of customers extra money, too, where they over-provisioned plans and paid way more than they needed. It's a common misunderstanding and an easy mistake to make.
Make sure you know which plan each of your services is running on.
Honest Cheetah: The Change Feed in Production
Let me show you how this all comes together in a real application. My app, Honest Cheetah (https://www.honestcheetah.com), is a delivery intelligence tool that integrates with GitHub Projects & Issues. GitHub sends webhook events — issue changes, project updates, status transitions — and Honest Cheetah processes them into flow metrics like cycle time and throughput and lets the user forecast when work is likely to get done. (BTW, it's free and there's also a version for Azure DevOps that's also free. There's also a demo mode, if you just want to try it out.)
Those webhook events from GitHub are the lifeblood of Honest Cheetah (HC). If HC misses a webhook event, the project management data might get weird. GitHub has some amount of retry logic, but if my webhook receiver endpoint is down for a minute or two, GitHub is probably going to give up on that message. Also, if my receiver endpoint doesn't return an HTTP 200 (success) fast enough, GitHub will mark that delivery as failed and potentially try to deliver it again.
Stability and uptime of that webhook receiver endpoint is crucial.
The Architecture
The architecture for Honest Cheetah for GitHub has three pieces:
1. The Webhook Receiver (Azure Function — Consumption Plan)
GitHub POSTs a webhook event. The receiver validates the message signature (is this a legitimate message from GitHub?), extracts a few metadata fields from the payload, wraps the raw JSON into a RawWebhookEvent document with a 1-hour TTL, and writes it to a raw-webhooks container in Cosmos DB.
That's it. Super simple and minimalist. It doesn't parse the event. It doesn't update any domain data. It doesn't call any services. It just validates that it actually came from GitHub, captures the raw data, and tells GitHub that "we got it" as fast as possible.
Why? Because the webhook receiver needs to be up. If it's down, I miss events from GitHub. GitHub will retry a few times, but if my endpoint is consistently slow or unavailable, I lose data. So I kept it bare bones — minimal dependencies, minimal reasons to fail. Validate auth, grab the raw payload, store it, respond. Sort out any problems later.
In the Git repo for Honest Cheetah, the webhook receiver code is kept in a separate folder and has virtually no dependencies on anything else in the HC app. This means that the GitHub Actions continuous deployment pipeline almost never needs to run. That means the receiver almost never gets updated and has just about zero downtime for deployments.
2. The Change Feed Processor (Azure Function — Consumption Plan)
A second Azure Function monitors the raw-webhooks container via a Cosmos DB Change Feed trigger. When new raw webhook events appear in the change feed, the processor picks them up and does the actual work: deserializing payloads, routing by event type (issue events, project events, installation events), calling domain services to update ProjectIssue records in HC with status change history, and writing processed data to the main Data container in Cosmos.
This is where all the complexity lives. Domain logic, service dependencies, error handling. If this Function crashes or needs to be redeployed, the raw events are sitting safely in Cosmos waiting. When the processor comes back, it picks up from where it left off.
If this part of the HC application is down, it's not great...but it's also not that big of a deal. If it goes down for a while, some data in the UI might not get refreshed from GitHub...but overall the app is still running. When the change feed processor comes back up, it'll catch up on the backlog pretty quickly.
3. The Web Application (ASP.NET Core — App Service)
This is the user interface portion of the app. It's what the user sees and interacts with.
The main Honest Cheetah web app reads from the Data container to display dashboards, metrics, and reports. It doesn't know anything about webhooks or raw events. It just reads processed data.
Once again, there's that moat — the failure protection boundary — around this part of the app. If the webhook receiver goes down or the change feed processor goes down, the data in the UI gets a little stale but we're still up. This reliability goes in the other direction, too. If the web app goes down, it doesn't nuke the change feed processor or the receiver.
Three separate deployments. Three separate scaling profiles. Three separate failure domains.
The Transient Buffer Pattern
The raw-webhooks container deserves special attention. It has a 1-hour time to live (TTL) on every document. Raw webhook payloads arrive, sit in the container long enough for the Change Feed processor to pick them up, and then Cosmos automatically deletes them.
This means the raw container never grows. It's a transient buffer — basically a queue that happens to be a Cosmos container. The Change Feed is what makes it work as a queue: writes go in one side, the processor reads from the other side via the feed.
The 1-hour TTL is generous — the processor typically picks up events within seconds. But the buffer gives you breathing room. If the processor is down for maintenance or experiencing issues, you've got an hour of events safely stored before they start expiring. (I could make this TTL longer if I wanted.)
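If you want to build a transient buffer like this yourself, the container-level TTL is set when you create the container. A sketch under assumed names ("MyDatabase", "raw-webhooks" partitioned on /id), not Honest Cheetah's actual provisioning code:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class BufferSetup
{
    // DefaultTimeToLive = 3600 gives every document a 1-hour expiry unless a
    // document overrides it with its own "ttl" property. Cosmos deletes
    // expired documents automatically in the background.
    public static async Task<Container> CreateBufferContainerAsync(CosmosClient client)
    {
        Database db = client.GetDatabase("MyDatabase");
        ContainerResponse response = await db.CreateContainerIfNotExistsAsync(
            new ContainerProperties
            {
                Id = "raw-webhooks",
                PartitionKeyPath = "/id",
                DefaultTimeToLive = 3600 // seconds; this is the buffer window
            });
        return response.Container;
    }
}
```

Writers can drop documents in without thinking about cleanup; the container stays small on its own.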
The Flow Diagram
Here's how the pieces connect:
sequenceDiagram
participant GH as GitHub
participant WH as Webhooks Function
participant RW as raw-webhooks Container
participant CF as Change Feed
participant JF as Jobs Function
participant DV as Data Container
GH->>WH: POST webhook event
activate WH
WH->>WH: Validate HMAC signature
WH->>WH: Wrap raw JSON, set TTL=1hr
WH->>RW: Write RawWebhookEvent
WH-->>GH: 200 OK
deactivate WH
RW--)CF: Change Feed fires
CF--)JF: Deliver batch of changes
activate JF
JF->>JF: Route by event type
JF->>JF: Process into domain objects
JF->>DV: Write ProjectIssue records
deactivate JF
Note over RW: TTL expires (1 hour)
RW->>RW: Raw event auto-deleted
The KeepWarm Trick
There's a practical problem with Consumption Plan Azure Functions: cold starts. If a Function hasn't run in a while, Azure deallocates it. The next time it needs to run, there's a warmup delay while Azure spins up a new instance, loads your code, initializes your dependencies. That delay is unpredictable — could be a second, could be ten seconds, could be longer.
For Honest Cheetah, this is a real problem. GitHub webhook traffic comes in 24/7, but it's bursty. There's no reasonable, logical, predictable downtime. If I didn't have a mitigation strategy, the Azure Function would probably get shut down at least a handful of times per week, if not per day. That would almost certainly cause dropped or delayed message processing because the warmup time for Consumption Plan Functions can be wickedly unpredictable.
The fix is dead-simple: a timer trigger that fires every 5 minutes and does nothing.
[Function("KeepWarm")]
public void Run([TimerTrigger("0 */5 * * * *")] TimerInfo myTimer)
{
_logger.LogDebug("KeepWarm triggered at {Timestamp}", DateTime.UtcNow);
}
That's it. Five lines. The timer fires, the Function logs a debug message, and the Function goes back to waiting. But because it ran, Azure keeps the instance warm. The next time a Change Feed event arrives, there's no cold start delay.
Put one of these in the app that you're deploying to your Consumption Plan Function App and suddenly you're avoiding the cold start problem. You're telling Azure Functions, "I'm still using this, so don't turn me off."
I have this in both of my Function Apps — the webhook receiver and the Change Feed processor. A few pennies a month in execution costs. Zero dropped messages from cold starts.
If your Consumption Plan Functions process time-sensitive events, add a KeepWarm trigger. It's the best ROI of any code you'll write.
How to Set It Up: The Azure Functions Binding
There are two ways to consume the Change Feed in .NET: the Azure Functions Cosmos DB trigger (the easy way) and the SDK's ChangeFeedProcessor (the control-freak way). Let's start with the easy way, which is what Honest Cheetah uses.
The Trigger
[Function("WebhookProcessor")]
public async Task Run(
[CosmosDBTrigger(
databaseName: "%CosmosDatabase%",
containerName: "%CosmosRawWebhooksContainer%",
Connection = "CosmosConnection",
LeaseContainerName = "leases",
CreateLeaseContainerIfNotExists = false,
StartFromBeginning = false)]
IReadOnlyList<RawWebhookEvent> events)
{
foreach (var evt in events)
{
try
{
await ProcessEventAsync(evt);
}
catch (Exception ex)
{
_logger.LogError(ex, "Error processing event {EventId}", evt.Id);
// Don't rethrow — continue processing other events in the batch
}
}
}
The [CosmosDBTrigger] attribute tells Azure Functions to monitor a Cosmos container for changes and invoke this method whenever new documents appear. The %...% syntax reads values from app settings and saves you from having to hard-code values into the trigger attributes. Connection references a set of app settings for Cosmos authentication (endpoint, credential, client ID for managed identity).
The Function receives a batch of changed documents — not one at a time. You iterate through the batch and process each one.
The Lease Container
The LeaseContainerName = "leases" points to a separate Cosmos container that tracks the processor's position in the Change Feed. This is the checkpoint system. When your processor finishes a batch, the Azure Functions runtime updates the lease to record how far through the feed you've read. If the processor restarts, it picks up from the last checkpoint, not from the beginning.
You need to create this container yourself (note CreateLeaseContainerIfNotExists = false — I prefer explicit control over container creation). A simple container with /id as the partition key works. The lease documents are small and managed entirely by the runtime.
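Since the binding won't create the lease container for you with that setting, you create it yourself, whether in an ARM/Bicep template, the portal, or a few lines of SDK code like this sketch (database name and throughput value are illustrative):

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class LeaseSetup
{
    // One-time setup: the container the Change Feed checkpointing uses.
    // /id as the partition key is all the runtime needs; the lease documents
    // are tiny, so minimal throughput is plenty.
    public static async Task CreateLeaseContainerAsync(CosmosClient client)
    {
        await client.GetDatabase("MyDatabase").CreateContainerIfNotExistsAsync(
            new ContainerProperties { Id = "leases", PartitionKeyPath = "/id" },
            throughput: 400);
    }
}
```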
Don't delete the lease container. If the lease container is accidentally deleted, the processor loses its checkpoints. Depending on your StartFromBeginning setting, it will either start reprocessing from the beginning of the feed (potentially reprocessing millions of documents) or start from "now" and miss everything between the last checkpoint and now. Neither is great. (Yup...I've made this mistake, too.)
The Producer Side
On the write side, Honest Cheetah uses the [CosmosDBOutput] binding to write raw webhook events:
[Function("WebhookReceiver")]
[CosmosDBOutput("%CosmosDatabase%", "%CosmosContainer%",
Connection = "CosmosConnection")]
public async Task<object> Run(
[HttpTrigger(AuthorizationLevel.Anonymous, "post",
Route = "github/webhook")] HttpRequest req)
{
// Validate HMAC signature
// Extract metadata from payload
// Return a RawWebhookEvent object — the binding writes it to Cosmos
}
The [CosmosDBOutput] binding writes the returned object to Cosmos automatically. You don't even call a repository or a CosmosClient — you just return the document and the binding handles the write. About as thin as a webhook receiver can get.
Configuration
Both Function Apps use managed identity for Cosmos DB authentication — no connection strings. The app settings look like:
CosmosConnection__accountEndpoint = https://your-account.documents.azure.com:443/
CosmosConnection__credential = managedidentity
CosmosConnection__clientId = <your-managed-identity-client-id>
The __accountEndpoint, __credential, and __clientId suffixes are the Azure Functions convention for configuring Cosmos connections via managed identity. None of these are secrets — they're endpoint URLs and identity references, not keys or passwords. Whether to commit them to Git is a team decision, but there's nothing here that would grant access on its own.
How to Set It Up: The SDK's ChangeFeedProcessor
If you're not using Azure Functions — maybe your processor runs inside an ASP.NET Core app or a background worker service or even a command line app — you'll use the SDK's ChangeFeedProcessor directly. This gives you more control but requires more setup code.
Container monitoredContainer = cosmosClient
.GetDatabase("MyDatabase")
.GetContainer("my-container");
Container leaseContainer = cosmosClient
.GetDatabase("MyDatabase")
.GetContainer("leases");
ChangeFeedProcessor processor = monitoredContainer
.GetChangeFeedProcessorBuilder<MyDocument>(
processorName: "myProcessor",
onChangesDelegate: HandleChangesAsync)
.WithInstanceName("instance-1")
.WithLeaseContainer(leaseContainer)
.WithStartFromBeginning()
.Build();
await processor.StartAsync();
// Later, on shutdown:
await processor.StopAsync();
The handler delegate:
static async Task HandleChangesAsync(
ChangeFeedProcessorContext context,
IReadOnlyCollection<MyDocument> changes,
CancellationToken cancellationToken)
{
foreach (var doc in changes)
{
// Process each changed document
}
}
This approach requires you to manage the processor lifecycle yourself — starting it when your app starts, stopping it gracefully on shutdown. You also need to handle the lease container creation, instance naming (important if you're running multiple instances for scale), and error handling within the delegate.
The Azure Functions binding does all of this for you. For most applications, the binding is the right choice. Use the SDK directly when you need fine-grained control over polling intervals, batch sizes, error handling strategies, or when you're running in a host that isn't Azure Functions.
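For reference, here's roughly what that fine-grained control looks like on the builder, reusing the monitoredContainer, leaseContainer, MyDocument, and HandleChangesAsync names from the snippet above. The specific values are illustrative, not recommendations:

```csharp
// The knobs the Functions binding hides from you. These methods all exist
// on ChangeFeedProcessorBuilder in the Microsoft.Azure.Cosmos SDK (v3).
ChangeFeedProcessor processor = monitoredContainer
    .GetChangeFeedProcessorBuilder<MyDocument>("myProcessor", HandleChangesAsync)
    .WithInstanceName(Environment.MachineName)    // unique per instance for scale-out
    .WithLeaseContainer(leaseContainer)
    .WithPollInterval(TimeSpan.FromSeconds(1))    // how often to check for new changes
    .WithMaxItems(100)                            // cap the batch size per invocation
    .WithStartTime(DateTime.UtcNow.AddHours(-1))  // or WithStartFromBeginning()
    .Build();
```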
The Gotchas Nobody Warns You About
The Change Feed is powerful, but there are a few sharp edges. Here's what I've learned from running it in production.
Swallowed Errors = Silent Data Loss
Look at the error handling in my processor code:
foreach (var evt in events)
{
try
{
await ProcessEventAsync(evt);
}
catch (Exception ex)
{
_logger.LogError(ex, "Error processing event {EventId}", evt.Id);
// Don't rethrow — continue processing other events
}
}
If ProcessEventAsync throws, the exception is caught, logged, and swallowed. The loop continues to the next event. When the batch completes, the Change Feed checkpoint advances past the failed event.
That failed event is gone. It won't be retried. It exists in the logs, but it's not in a dead-letter queue. There's no built-in mechanism to replay it.
Why do I do it this way? Because the alternative is worse. If you rethrow the exception, the entire batch fails. The checkpoint doesn't advance. The Change Feed delivers the same batch again. If the first event in the batch is the one that's broken, you're stuck in an infinite retry loop — the poisoned event blocks everything behind it.
So you have a choice: swallow errors and accept that individual events can fail silently, or rethrow and risk a single bad event blocking your entire pipeline. Neither option is great.
The right long-term answer is a dead-letter pattern — catch the error, write the failed event to a separate dead-letter container, and continue processing. Then build monitoring and a replay mechanism for the dead-letter container. I'll be honest: Honest Cheetah doesn't have this yet. It's on the list.
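A dead-letter version of the processing loop might look something like this sketch. To be clear, the _deadLetterContainer field and its document shape are hypothetical, not Honest Cheetah's actual code:

```csharp
foreach (var evt in events)
{
    try
    {
        await ProcessEventAsync(evt);
    }
    catch (Exception ex)
    {
        // Park the failed event somewhere durable instead of only logging it.
        // The checkpoint still advances, but the event is now queryable and
        // replayable later, instead of existing only in the logs.
        _logger.LogError(ex, "Dead-lettering event {EventId}", evt.Id);
        await _deadLetterContainer.UpsertItemAsync(new
        {
            id = evt.Id,
            failedAtUtc = DateTime.UtcNow,
            error = ex.ToString(),
            originalEvent = evt
        }, new PartitionKey(evt.Id));
    }
}
```

From there, a monitoring alert on the dead-letter container and a small replay tool close the loop.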
In-Memory Idempotency Is Fragile
Honest Cheetah's processor uses a static ConcurrentDictionary to track which events it's already processed, preventing duplicate processing if the same event somehow arrives twice. This works fine... until the Function App restarts. On a cold start, that dictionary is empty. If the checkpoint didn't advance before the restart, events could be reprocessed.
The Change Feed guarantees "at least once" delivery. If your processing needs to be idempotent — and it almost always should be — you need durable idempotency tracking, not an in-memory dictionary. That means checking a database or a distributed cache before processing each event.
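One durable approach: record each processed event id in a small Cosmos container (with a TTL so it doesn't grow forever) and let a write conflict tell you it's a duplicate. A sketch, where _processedContainer and the retention window are my own assumptions:

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public class IdempotencyGuard
{
    private readonly Container _processedContainer;
    public IdempotencyGuard(Container processedContainer) =>
        _processedContainer = processedContainer;

    // Returns true the first time an event id is seen; false on a duplicate.
    // CreateItemAsync throws 409 Conflict if a document with that id already
    // exists in the partition, and that conflict IS the idempotency check.
    public async Task<bool> TryMarkProcessedAsync(string eventId)
    {
        try
        {
            await _processedContainer.CreateItemAsync(
                new { id = eventId, ttl = 7 * 24 * 3600 }, // remember for a week
                new PartitionKey(eventId));
            return true;
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.Conflict)
        {
            return false; // already processed, skip it
        }
    }
}
```

In the processor loop: only call ProcessEventAsync when TryMarkProcessedAsync returns true. This survives restarts because the record lives in Cosmos, not in process memory.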
Latest Version Mode Has Blind Spots
The default Change Feed mode ("latest version") has two significant limitations:
No intermediate updates. If a document is updated three times rapidly, you see the final state, not the journey. If your processing logic cares about the sequence of changes (state machine transitions, audit trails), you either need to encode that history in the document itself or switch to "all versions and deletes" mode.
No deletes. Deleting a document doesn't generate a Change Feed event in latest version mode. The most common workaround is soft deletes — instead of deleting, set a deleted property to true (which is an update and does appear in the feed), then set a TTL on the document so Cosmos cleans it up automatically later.
For Honest Cheetah, neither of these is a concern because I'm always reacting to a message that GitHub sends me. I'm not using the Change Feed to handle changes made to the core data of HC itself.
The Lease Container Is a Single Point of...Confusion
The lease container is just a regular Cosmos container with lease documents managed by the runtime. But if something goes wrong with it — accidental deletion, corruption, misconfigured partition key — your Change Feed processor loses its place. Depending on your StartFromBeginning setting, you'll either reprocess everything or skip to "now" and lose everything in between.
Treat the lease container like you'd treat any other production data. Don't delete it. Know what's in it and why it's there. And set CreateLeaseContainerIfNotExists = false so you're making explicit choices about container creation.
Other Things You Can Do with the Change Feed
Webhook processing is my use case, but the Change Feed enables several other patterns worth knowing about.
Cache invalidation and search index sync. When a document changes in Cosmos, a Change Feed processor updates a Redis cache or an Azure AI Search index. Your read path hits the cache/index (fast and cheap), while the write path goes to Cosmos (durable and consistent). The Change Feed keeps them in sync — eventually. This is a classic CQRS-adjacent pattern.
Denormalization propagation. Remember Chapter 6's discussion of denormalization? If a Person's name is copied into their Note documents for performance, and the Person changes their name, a Change Feed processor can propagate that change to all the denormalized copies. This is how you keep denormalized data from going stale without rebuilding everything on every write.
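A sketch of what that propagation handler could look like, reusing Chapter 6's Person/Note idea. The type shapes, property names, and _notesContainer field here are my own assumptions:

```csharp
// Change Feed handler on the container holding Person documents. For each
// changed Person, refresh the denormalized name on that person's Notes.
async Task HandlePersonChangesAsync(
    ChangeFeedProcessorContext context,
    IReadOnlyCollection<Person> changes,
    CancellationToken ct)
{
    foreach (var person in changes)
    {
        // Only touch Notes whose copy of the name is actually stale.
        var stale = _notesContainer.GetItemQueryIterator<Note>(
            new QueryDefinition(
                "SELECT * FROM n WHERE n.personId = @pid AND n.personName != @name")
                .WithParameter("@pid", person.Id)
                .WithParameter("@name", person.Name));

        while (stale.HasMoreResults)
        {
            foreach (var note in await stale.ReadNextAsync(ct))
            {
                note.PersonName = person.Name;
                await _notesContainer.ReplaceItemAsync(
                    note, note.Id, new PartitionKey(note.PersonId),
                    cancellationToken: ct);
            }
        }
    }
}
```

Note that this is itself eventually consistent: for a moment, some Notes still show the old name. That's the trade the chapter has been describing all along.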
Materialized views. A Change Feed processor that reads raw operational data and maintains pre-computed summaries — leaderboards, dashboards, aggregated metrics. Honest Cheetah's flow metrics (throughput, cycle time) are essentially materialized views built from raw webhook events.
Event sourcing. If you model all changes as append-only writes (never update, never delete), the Change Feed becomes your event log. You can replay events from the beginning to rebuild state, build multiple materialized views from the same event stream, or integrate with external systems. This is a deep architectural pattern that's beyond the scope of this chapter, but the Change Feed makes it possible in Cosmos.
Data tiering. Move "cold" data from your primary Cosmos container to cheaper storage (Azure Blob Storage, Table Storage) based on age or access patterns. A Change Feed processor watches for documents that meet your archival criteria and copies them to cold storage.
Eventual Consistency: A Philosophy
Let me close with something that might seem like a detour but isn't.
SQL Server and relational databases are built around ACID transactions: Atomicity, Consistency, Isolation, Durability. Either it all works or none of it does. If anything goes wrong, everything rolls back. The system is never in an inconsistent state — or at least, it tries very hard not to be.
The Change Feed and the async processing patterns in this chapter operate on a different model: eventual consistency. The raw webhook event is written, but it's not processed yet. The user's update is saved, but the search index doesn't reflect it yet. The Person's name changed, but the denormalized copies in their Notes still show the old name. For some period of time — usually milliseconds to seconds, occasionally longer — the system is in an inconsistent state.
And that's okay. It's a deliberate mindset. It's a choice for reliability and scaling.
If I were going to get a tattoo about my philosophy of life, it might very well be "eventual consistency." A belief that bad things can happen — that things can NOT go my way — that I can lose or make mistakes or whatever, and that I'll be able to handle it. I might not love it, but I can handle it.
I find myself quoting Rilke's poem "Already the Ripening Barberries Are Red" and the "believing that there are visions after visions" line a lot. You wouldn't think it, but a random German poet from the early 1900s has stuff to teach us about eventual consistency in our distributed systems 125 years later.
Eventual consistency isn't a compromise. It's a design choice that trades the control-freaky illusion of perfect consistency for real resilience and real decoupling. ACID transactions say "everything succeeds together or fails together." Eventual consistency says "things might be temporarily out of sync, but the system converges to the right state, and in the meantime, nothing is broken — just not yet complete."
The Change Feed is how Cosmos DB gives you the tools to build eventually consistent systems that are resilient, decoupled, and observable. It's not triggers. It's not auditing. It's the infrastructure for building systems that handle imperfection gracefully.
Next up: Chapter 9, where we talk about Security & Permissions — the stuff that makes people angry because the Azure Portal speaks in riddles and whispers about what permissions your app actually has.