Document Thinking: Why Cosmos DB Changes How You Design Software

March 24, 2026

This is Chapter 2 of Azure Cosmos DB for .NET Developers. Start with Chapter 1: How We Got Here if you haven't read it yet.

In the last chapter, we spent a lot of time talking about trees and boxes. Your domain model is a tree: hierarchical, branching, alive with behavior. Your relational database stores boxes: flat, uniform, stackable. Huge portions of your architecture exist solely to chop trees into boxes for storage and reassemble boxes back into trees for use.

All because, fifty-something years ago, disk space cost $700,000 a gigabyte and we couldn't afford to store a tree as a tree.

This chapter is about what happens when you can.

But first, I need to talk about the technology that made it possible. And I need to confess something about Star Trek.

JSON: The Transporter Beam for Trees

I'm a Star Trek nerd. (Specifically, a Star Trek: The Next Generation nerd.) And one of the things I've always thought about — because I'm also a nerd-nerd, the kind of nerd who thinks about the engineering behind the fiction — is what the data format would be for the transporter. When Scotty (or O'Brien, if you're a TNG purist) dematerializes someone, the transporter has to encode the complete structure of a living thing into some intermediate format, beam that format somewhere else, and rematerialize the original structure at the other end. The thing that comes out has to be the same thing that went in. Same structure. Same relationships. Same everything.

I think about this because that's what JSON does for object trees.

As someone who's been developing software for close to 30 years, I can say this with confidence: the invention of JSON has been world-changing. I know that sounds dramatic for a data format, but it's not an overstatement.

JSON lets you take a tree — a living, branching, hierarchical object — and create a dehydrated representation of it. A text format that captures the full structure: what's nested inside what, which branches contain which sub-branches, the whole shape. You can send that dehydrated representation across a network, write it to a file, store it somewhere. And then at the other end, you rehydrate it — and suddenly you've got a tree again. Same structure. Same nesting. Same shape.

It's a transporter beam for data.

If you got your start in distributed systems when I did — I was doing CORBA in the late '90s, then SOAP, then REST — you know what life was like before JSON. CORBA had IDL and binary protocols. SOAP had XML with schemas and envelopes and namespaces that would make your eyes bleed. Converting objects into a transmittable format and back was a significant engineering problem. Getting it wrong produced exactly the kind of bugs we talked about in the last chapter — something gets lost in the conversion, something ends up in the wrong place, and you find out in production.

JSON changed all of that. It maps naturally to object structures because it IS an object structure. Curly braces for objects. Square brackets for arrays. Nesting for hierarchy. Your C# object serializes to JSON that looks like your object. No schema files. No IDL. No XML namespaces. The tree goes in, the tree comes out.
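Here's a minimal sketch of that round trip using System.Text.Json. The `Order` and `LineItem` records and the `JsonRoundTrip` helper are names I made up for illustration; they aren't from any library.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical types for illustration only.
public record LineItem(string ProductName, decimal UnitPrice, int Quantity);

public record Order(string Id, string CustomerId, List<LineItem> LineItems);

public static class JsonRoundTrip
{
    // Dehydrate: turn the object tree into indented JSON text.
    public static string Dehydrate(Order order) =>
        JsonSerializer.Serialize(order, new JsonSerializerOptions { WriteIndented = true });

    // Rehydrate: rebuild the same tree shape from the JSON text.
    public static Order Rehydrate(string json) =>
        JsonSerializer.Deserialize<Order>(json)
            ?? throw new JsonException("JSON did not contain an order.");
}
```

Calling `JsonRoundTrip.Rehydrate(JsonRoundTrip.Dehydrate(order))` should hand back an order with the same nested line items. Nothing in between had to know how to flatten the tree.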

And there's a reason you see JSON everywhere now. It's not because it's trendy. It's because JSON lets you skip the tree-to-box conversion entirely. You don't have to chop the tree into flat pieces to transmit it. You don't have to reassemble it on the other end. The format preserves the shape. The tree survives the trip.

Which brings us to the question that the last chapter was building toward: what if your database stored JSON? What if, instead of chopping your tree into boxes and filing them in a warehouse, you could just beam the tree directly into storage — structure intact, branches in place, hierarchy preserved?

Cosmos DB Stores Trees

That's what Azure Cosmos DB does. It stores JSON documents. And a JSON document is a tree.

When you save an Order object to Cosmos DB, you don't chop it into an Orders box, an OrderItems box, and a ShippingInfo box. You serialize the entire Order tree to JSON — line items nested inside, shipping address nested inside, payment summary nested inside — and store the whole thing as a single document. One tree. One document. No chopping. No boxes.

When you read it back, you don't pull boxes from a warehouse and reassemble a tree. You read one document, deserialize the JSON, and you've got your tree back. Same structure. Same nesting. Same shape. The transporter worked.

No adapter layer to chop and reassemble. No entity classes shaped like boxes. No repository implementations juggling two different shapes. No unit tests verifying that the tree survived the round trip. The tree goes in, the tree comes out. The shape doesn't change.

This is the big insight: the domain object is the stored document. Your Order class — the one with CalculateTotal() and Cancel() and nested LineItems — serializes to JSON and gets stored directly. The thing you think about in your code is the thing that gets stored in the database. This is one of my favorite things about writing apps that use Cosmos DB.
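To make that concrete, here's a rough sketch of a behavior-bearing class being serialized as-is. The method names come from the description above, but the bodies, the string-based `Status`, and the `OrderDocument` helper are my assumptions for the example. It uses System.Text.Json rather than the Cosmos SDK so it runs without a database.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical, simplified Order aggregate for illustration.
public class LineItem
{
    public string ProductName { get; set; } = "";
    public decimal UnitPrice { get; set; }
    public int Quantity { get; set; }
}

public class Order
{
    public string Id { get; set; } = "";
    public string Status { get; set; } = "Open";
    public List<LineItem> LineItems { get; set; } = new();

    // Behavior lives on the same class that gets stored.
    public decimal CalculateTotal() => LineItems.Sum(li => li.UnitPrice * li.Quantity);

    public void Cancel() => Status = "Cancelled";
}

public static class OrderDocument
{
    // Methods are not serialized; only the properties that make up the tree are.
    public static string ToJson(Order order) => JsonSerializer.Serialize(order);

    public static Order FromJson(string json) =>
        JsonSerializer.Deserialize<Order>(json)
            ?? throw new JsonException("JSON did not contain an order.");
}
```

The object that comes back from `FromJson` is a regular `Order`, so `CalculateTotal()` and `Cancel()` work on it without any entity-to-domain mapping step.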

So you get to skip all that entity stuff and adapter stuff, but there is sort of a catch. You need to figure out how you expect to access your documents (a.k.a. your data). In relational databases, you can access your data starting from pretty much any relationship: write your query and get the data in the shape that you need. When you use a document database like Cosmos DB, you can query from pretty much anywhere in a document's structure, but what you end up getting back is still the same document.
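Here's a small sketch of that idea. To keep it runnable without a database, it uses plain LINQ over an in-memory list as a stand-in for a Cosmos query, so treat it as an analogy rather than SDK code. The types and the `OrderQueries` helper are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only.
public record LineItem(string ProductName, int Quantity);

public record Order(string Id, List<LineItem> LineItems);

public static class OrderQueries
{
    // Filter on something deep inside the tree, but hand back whole Order documents.
    public static List<Order> ContainingProduct(IEnumerable<Order> orders, string productName) =>
        orders.Where(o => o.LineItems.Any(li => li.ProductName == productName)).ToList();
}
```

The filter looks at a nested line item, but each result is the complete order, including the line items that didn't match.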

It's more of a mindset shift than a problem, but you need to know where to draw the boundaries around your documents. Which brings us to the most important concept in this entire book.

Aggregate Roots: The Key to Document Design

An aggregate root is a concept from domain-driven design. It sounds fancy, but it's pretty easy to grok. It's also not a new or Cosmos-specific idea. Even if you're writing applications that store data relationally in SQL Server, you still care about your aggregate roots. They're just relevant a couple of layers up the design stack, in your service layer.

Think about an Order in an e-commerce system. An Order has line items. It has a shipping address. It has a payment summary. These things are all part of the order — they don't make sense on their own. A line item without an order is meaningless. A shipping address floating around without being attached to something is just data noise.

The Order is the aggregate root. It's the thing that owns everything inside it. It's the consistency boundary — when you save an Order, you save the whole thing: header, line items, addresses, all of it. When you delete an Order, the line items go with it. They don't have an independent existence.

In a relational system, saving that order might be 10 insert statements and reading that data might take 4 or 5 joins. In Cosmos DB, you save the order and that's it. The query is equally simple — find the matching document and return that document.

Finding the Aggregate Root Using Deletes

Here's the key question that'll help you test if you've correctly identified your aggregate roots:

"If you delete the parent, should this other thing disappear too?"

If the answer is yes, it's inside the aggregate. It's part of the tree. It's part of the document.

If the answer is no, it's a separate aggregate. A separate tree. A separate document.

If the answer is maybe, there might be a flaw in your thinking somewhere.

Line items on an order? Delete the order, the line items are gone. They're inside the aggregate. Part of the Order document.

The Customer who placed the order? Delete the order, the customer still exists. The customer is a separate aggregate. A separate document. The order holds a reference to the customer (maybe a customer ID), not the customer itself.

In Cosmos DB, your aggregate root IS your document. Everything inside the aggregate boundary gets stored together as a single JSON document. Everything outside it is a separate document with a reference. If you can identify your aggregate roots, you can design your documents. If you can't, then there's some detail you haven't quite nailed yet.
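The delete test can be sketched in a few lines. This uses a hypothetical in-memory `InMemoryDocuments` class as a stand-in for a document store (it is not the Cosmos SDK), with one dictionary entry per aggregate root.

```csharp
using System.Collections.Generic;

// Hypothetical types for illustration only.
public record LineItem(string ProductName, int Quantity);

// The order owns its line items but only points at the customer by ID.
public record Order(string Id, string CustomerId, List<LineItem> LineItems);

public record Customer(string Id, string Name);

// In-memory stand-in for a document store: one entry per aggregate root.
public class InMemoryDocuments
{
    public Dictionary<string, Order> Orders { get; } = new();
    public Dictionary<string, Customer> Customers { get; } = new();

    // Deleting the order removes the whole tree, line items included.
    // There is nothing else to clean up, and the customer document is untouched.
    public bool DeleteOrder(string orderId) => Orders.Remove(orderId);
}
```

There's no "delete the line items first" step because the line items were never stored anywhere except inside the order. The customer survives because the order only ever held its ID.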

I really wish someone had told me this when I was first learning Cosmos DB: document = aggregate root. It's such an essential nugget of knowledge, and I don't think I saw it in other demos and documentation. Pretty much all of the tutorials I saw started with "here's a JSON document and here's how partition keys work". That's fine, but knowing your aggregate roots makes your documents make sense, and it makes your application architecture make sense too.

What Does It Look Like?

Here's what it looks like when you draw the aggregate boundaries:

classDiagram
    namespace Order_Aggregate_-_One_Cosmos_Document {
        class Order {
            +string Id
            +string CustomerId
            +DateTime OrderDate
            +OrderStatus Status
            +List~LineItem~ LineItems
            +ShippingAddress ShippingAddress
            +PaymentSummary Payment
            +CalculateTotal() decimal
            +AddLineItem(product, qty) void
        }

        class LineItem {
            +string ProductId
            +string ProductName
            +decimal UnitPrice
            +int Quantity
            +GetSubtotal() decimal
        }

        class ShippingAddress {
            +string Street
            +string City
            +string State
            +string Zip
        }

        class PaymentSummary {
            +decimal Subtotal
            +decimal Tax
            +decimal Total
            +PaymentMethod Method
        }
    }

    namespace Customer_Aggregate_-_Separate_Cosmos_Document {
        class Customer {
            +string Id
            +string Name
            +string Email
            +List~Address~ Addresses
            +GetDefaultShippingAddress() Address
        }

        class Address {
            +string Street
            +string City
            +string State
            +string Zip
            +bool IsDefault
        }
    }

    namespace Product_Aggregate_-_Separate_Cosmos_Document {
        class Product {
            +string Id
            +string Name
            +string Category
            +decimal CurrentPrice
            +string Description
        }
    }

    Order *-- LineItem : contains
    Order *-- ShippingAddress : contains
    Order *-- PaymentSummary : contains
    Customer *-- Address : contains

    Order ..> Customer : references via CustomerId
    LineItem ..> Product : references via ProductId

Look at the boundaries. The Order aggregate contains its LineItems, its ShippingAddress, and its PaymentSummary. Delete the order, all of that goes away. That's one tree. One JSON document in Cosmos DB.

The Customer aggregate contains its Addresses. Delete the customer, the addresses go away. That's a separate tree. A separate document.

The Product aggregate stands alone. Delete a product from the catalog, existing orders that reference it aren't affected — they captured the product name and price at the time of the order.

The dotted lines between aggregates are references, not containment. The Order holds a CustomerId string — not the Customer object itself. The LineItem holds a ProductId and a snapshot of the product name and price — not a live reference to the Product.

Each aggregate root becomes one JSON document. The Order document contains everything inside the Order aggregate boundary. The Customer document contains everything inside the Customer aggregate boundary. They live in the same container (or different containers — a design choice we'll get to later), but they're separate documents. Separate trees.
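Here's one way the snapshot idea might look in code. It's a trimmed-down sketch of the Order and Product aggregates: the property names follow the diagram, and the method bodies are my assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;

// Separate aggregate: lives in its own document.
public class Product
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
    public decimal CurrentPrice { get; set; }
}

public class LineItem
{
    public string ProductId { get; set; } = "";   // reference, not containment
    public string ProductName { get; set; } = ""; // snapshot at order time
    public decimal UnitPrice { get; set; }        // snapshot at order time
    public int Quantity { get; set; }

    public decimal GetSubtotal() => UnitPrice * Quantity;
}

public class Order
{
    public string Id { get; set; } = "";
    public string CustomerId { get; set; } = "";
    public List<LineItem> LineItems { get; } = new();

    // Copy the values we need out of the Product instead of holding on to it.
    public void AddLineItem(Product product, int qty)
    {
        LineItems.Add(new LineItem
        {
            ProductId = product.Id,
            ProductName = product.Name,
            UnitPrice = product.CurrentPrice,
            Quantity = qty
        });
    }

    public decimal CalculateTotal() => LineItems.Sum(li => li.GetSubtotal());
}
```

Because `AddLineItem` copies the name and price, a later change to `Product.CurrentPrice` doesn't ripple into orders that were already placed.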

Denormalization Is a Thing

The aggregate root version of a document is the official, definitive copy. To put that in relational database terms, it's the primary key version. Traditional relational normalization advice says "thou shalt only have one copy of the data". In Cosmos, I frequently break that rule.

I denormalize for convenience, performance, and debugging.

What do I mean by that? I'll often copy bits of relevant data from one aggregate root into another just so I have it. For example, the Order document might end up with a CustomerName property or an Address property. These become convenience properties that let me display data while avoiding extra calls to Cosmos.

Now I know what you're thinking: if the definitive version of the data changes, what do I do with those convenience values? I don't worry about it. Most of the time, it's actually kind of nice to have the snapshot of what the values were at the time that the document was created.

If I'm worried about those denormalized convenience values getting out of sync, then I'd just skip storing them. Or always get the definitive version and only display that one. Having those extra bits of data in the document can make debugging WAY easier.
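As a rough sketch, a denormalized convenience property might look like this. The `OrderFactory` helper and the trimmed-down types are hypothetical and only here to show the copy happening at creation time.

```csharp
// Definitive copy of the customer data lives in the Customer document.
public record Customer(string Id, string Name);

public class Order
{
    public string Id { get; init; } = "";
    public string CustomerId { get; init; } = "";   // reference to the definitive copy
    public string CustomerName { get; init; } = ""; // denormalized convenience copy
}

public static class OrderFactory
{
    // Copy the display value onto the order when the order is created.
    public static Order CreateFor(Customer customer, string orderId) => new Order
    {
        Id = orderId,
        CustomerId = customer.Id,
        CustomerName = customer.Name
    };
}
```

If the customer is renamed later, the order keeps the name as it was when the order was created, and `CustomerId` still points at the definitive version for the cases where I need the current value.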

Cosmos DB Enforces Your Aggregate Boundaries

Here's the thing that makes aggregate roots and Cosmos DB such a natural fit — and the thing that's fundamentally different from working with EF Core against a relational database.

In the relational world, your aggregate boundaries are a design guideline that you hope your team follows. EF Core gives you a DbSet<T> for every entity type. You can write context.OrderItems.Where(oi => oi.Quantity > 10) — reaching directly into order line items without going through the Order. EF Core doesn't stop you, because enforcing aggregate boundaries isn't its job. The discipline has to come from you.

In Cosmos DB, the aggregate boundaries are how the data is physically stored. You can't reach into the line items without loading the Order document, because the line items don't exist independently from the Order. They're nested inside the Order JSON. They don't have their own endpoint, their own query surface, their own independent existence. They exist inside the Order document or they don't exist at all.

You can't accidentally violate an aggregate boundary that Cosmos enforces for you. The storage model and the domain model are the same shape. Less flexibility? Absolutely. More correctness by default? Yeah, probably. Fewer object-relational impedance mismatch headaches? Definitely.

Remember what we said in the last chapter about all that infrastructure — the adapters, the entity classes, the repository implementations doing bidirectional shape conversion, the unit tests making sure the tree survived the round trip through the boxes? In Cosmos DB, most of that infrastructure simply isn't needed. The tree goes in, the tree comes out. There's nothing to chop and nothing to reassemble, which means there's nothing to get wrong in the conversion.

So What About EF Core's Cosmos DB Provider?

This is where I'll plant my contrarian flag.

Microsoft ships an EF Core provider for Cosmos DB. You can use DbContext and DbSet<T> against Cosmos just like you would against SQL Server. I think I understand why they're doing this (one data access library to rule them all), but I think this is the wrong tool for the job and a distraction.

For Cosmos DB, you just don't need it. It's just extra stuff.

(Disclaimer: I enthusiastically use EF Core for SQL Server and recommend it to my customers.)

I want to be specific about why, because this isn't a knee-jerk take. EF Core is an ORM — an Object-Relational Mapper. The "R" is doing important work in that acronym. It exists to bridge the gap between objects and relational schemas. That's its purpose. That's what it's good at. As I said in the last chapter — ORMs have a brutally complex problem to solve, and they do a surprisingly great job.

But Cosmos DB doesn't have relational schemas. It doesn't have tables. It doesn't have foreign keys or cross-document JOINs (the JOIN keyword in its query language only works within a single document). The gap that EF Core is designed to bridge doesn't exist. There are no boxes for the tree to get chopped into. The tree goes straight into storage as JSON.

So when you use the EF Core Cosmos provider, you're bringing a sophisticated solution to a problem you don't have — and in the process, you're reintroducing complexity that choosing Cosmos DB was supposed to eliminate. You get DbSet<T> giving you direct access to every entity type, which re-creates the aggregate boundary problem that Cosmos's document model had naturally solved. You get change tracking overhead that makes no sense when your unit of persistence is a whole document. You get an abstraction layer that hides the things you actually need to understand about Cosmos — partition keys, request units, cross-partition queries — behind a familiar-but-misleading interface.

This isn't EF Core's fault. EF Core is doing exactly what it's designed to do. In this case, it's just a misapplication of a good tool. Think about it in terms of our metaphor: EF Core is a brilliantly engineered tree-chopping-and-box-packing machine. Cosmos DB doesn't use boxes. Bringing EF Core to Cosmos is like hauling a wood chipper to a job site that has a perfectly good storage shed for whole trees. The machine works great — it's just not needed here.

When Cosmos DB Isn't the Answer

I promised honesty, so here it is.

Cosmos DB is the wrong choice when you need complex cross-entity reporting and ad-hoc queries across your entire dataset. "Show me all orders from Q3 where the customer is in Texas and the product category is electronics and the average item price is above $50" — that's a query that relational databases were born for. That's where the boxes-in-a-warehouse model shines. You can combine any box with any other box to answer questions nobody anticipated. That same query in Cosmos DB? Well, it might make your brain hurt. And performance might be "sub-optimal".

It's the wrong choice when your data model is genuinely relational — when the relationships between entities are as important as the entities themselves and you need to traverse those relationships in unpredictable ways.

It's also the wrong choice when your team doesn't have the bandwidth to learn a new mental model. Document thinking isn't harder than relational thinking, but it is different, and learning that transition has a cost. If your team is deep in relational expertise and the project doesn't justify the learning curve, that's a perfectly valid reason to stick with what you know.

It's the wrong choice when cost predictability matters more than scalability. Cosmos DB pricing is consumption-based (request units), which is great for elastic workloads but can be surprising for steady-state applications. SQL Server pricing is often more predictable.

None of these are weaknesses of Cosmos DB. They're characteristics of a tool designed for a specific set of problems and different scalability models. The mistake isn't using SQL Server when SQL Server is the right choice — the warehouse full of boxes is great when you need a warehouse full of boxes. The mistake is using Cosmos DB with a relational mental model (trying to cram boxes into a tree database), or using SQL Server when your data is naturally hierarchical and you're going to spend half your architecture chopping and reassembling.

What's Next

So here's where we are. You understand the shapes: trees for your domain, boxes for relational storage, and the cost of converting between them. You understand that JSON is the transporter that lets trees survive a trip to storage and back. You understand that Cosmos DB stores JSON documents — which means it stores trees — and that aggregate roots are the concept that tells you where to draw the document boundaries.

The rest of this book is about building real applications with this mental model. We'll start with the raw SDK — setting up accounts, writing CRUD operations, feeling the friction of doing everything by hand. Then we'll build the abstractions that make it productive, using my Benday.CosmosDb library to encode aggregate root thinking directly into the architecture.

We'll get deep into data modeling — using the "delete the parent" heuristic to make real design decisions about document boundaries, partition keys, and parent-child relationships. We'll talk about query performance and cost, because Cosmos DB will happily let you spend money you didn't need to spend if you don't understand request units and cross-partition queries.

We'll build a complete production application — ASP.NET Core with authentication, authorization, and admin management, all backed by Cosmos DB — to prove that this isn't just theory. And we'll cover the operational reality: Change Feed for event-driven patterns, security and permissions (which are genuinely bizarre and deserve their own chapter), and advanced patterns for teams running Cosmos in production.

But everything starts with the mental model we've built in these first two chapters. Your domain model is a tree. Relational databases store boxes. The impedance mismatch is the cost of converting between them. JSON is the transporter beam that preserves the tree's shape. Cosmos DB stores JSON. Your aggregate root is your document.

No more chopping. No more boxes. Just the tree.