This is Chapter 6 of Azure Cosmos DB for .NET Developers. Previous: Chapter 5: Benday.CosmosDb — The Foundation.
In Chapter 5, we introduced TenantItemBase and the repository pattern, and I mentioned that EntityType is the "species of tree" identifier — the thing that tells you what kind of document you're looking at in a shared container. I also said that TenantItemBase is an aggregate root implementation.
But we didn't really dig into what any of that means for how you design your data. We used a single Note class and moved on. Real applications aren't that simple. Real applications have Persons who have Notes. Notes that have Comments. Lookup tables that provide picklist values. Entities that reference each other, nest inside each other, and sometimes need to exist independently of each other.
This is the chapter where the conceptual vocabulary from Chapters 1 and 2 becomes a practical design framework. By the end of it, you should be able to look at your own domain model and know exactly how to map it into Cosmos DB documents.
One Question That Solves Most of It
When people ask how they should model data in Cosmos DB, I typically start with the following question:
"If you delete the parent, should this disappear too?"
That's it. That one question tells you more about your document structure than any number of blog posts about partition key cardinality or denormalization strategies.
If the answer is "yes, absolutely" — the data belongs inside the parent document. It's a nested object. An Address on a Person. A line item on an Order. Tags on a Note. These things have no independent existence. They're part of the tree.
If the answer is "no, it should survive on its own" — the data is its own document. Its own aggregate root. A Person and a Note might be related, but deleting the Person shouldn't destroy their Notes. (Maybe the Notes get reassigned, maybe they become orphaned, but they don't vanish.) These are separate trees in the same forest.
If the answer is "it's complicated" — congratulations, you've found a real design decision. And that ambiguity is actually useful information. It's telling you something about your domain that you need to think through before you write any code.
Let's work through each of these.
Aggregate Roots
The Aggregate Root design pattern is one of those things that doesn't get nearly enough recognition. And to be honest, I feel like a broken record sometimes talking about aggregate root, aggregate root, aggregate root, aggregate root, ad infinitum. In my mind, identifying your system's aggregate roots is the key that unlocks good design whether you're storing data in a relational database, document database, or some flat-file storage system you rolled yourself.
The idea isn't new — it's been around for a while — but it feels almost completely unknown to most developers.
An Aggregate Root is the entry point for your data access operations. In an application of any complexity, you'll probably have more than one. It's not unusual to have a half-dozen, a dozen, or more. The content management system for my website/blog has:
- Page
- BlogEntry
- Menu
- Contact
Let's look at the Menu class because it's fairly simple. My content management system (CMS) can have multiple menu definitions but the default is "main menu". Menu extends from TenantItemBase and that means it has its own identity — it's an aggregate root.
namespace Benday.Cms.Api.DomainModels;
public class Menu : TenantItemBase
{
public string Name { get; set; } = string.Empty;
public List<MenuItem> Items { get; set; } = new();
}
But the Menu has a collection of MenuItem objects and those make up the links that are part of the Menu. The MenuItem class doesn't have any identity. There's no MenuItem.Id property because you'd never get a MenuItem on its own. There's no point. You'd never care about a MenuItem unless you were doing something related to a Menu.
public class MenuItem
{
public string Title { get; set; } = string.Empty;
public string Url { get; set; } = string.Empty;
public bool OpenInNewTab { get; set; }
}
This means that I'll want a Repository for Menu that will at a minimum handle all my CRUD operations for that class.
Aggregate roots don't just organize your data — they organize your entire application. Each aggregate root gets a repository. Each repository can get a service layer on top of it. The service layer is where your business logic lives — validation, operations, authorization. The repositories know how to talk to Cosmos. The services know what the business rules are. Everything hides behind interfaces so it's loosely coupled, testable, and maintainable. We'll build this architecture out later in the book when we put a real application together. For now, know that the aggregate root decisions you make in this chapter ripple all the way up and down through your application design.
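To make that layering a little more concrete, here's a minimal sketch. It is not the library's API: IMenuRepository, InMemoryMenuRepository, and MenuService are hypothetical names made up for illustration, the dictionary stands in for Cosmos, and Menu is a simplified stand-in that carries its own Id instead of inheriting from TenantItemBase.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Simplified stand-in for the chapter's Menu (no TenantItemBase here).
public class Menu
{
    public string Id { get; set; } = Guid.NewGuid().ToString();
    public string Name { get; set; } = string.Empty;
}

// Hypothetical repository contract: it stores and fetches, nothing else.
public interface IMenuRepository
{
    Task SaveAsync(Menu menu);
    Task<Menu?> GetByIdAsync(string id);
}

// In-memory stand-in for a Cosmos-backed repository.
public class InMemoryMenuRepository : IMenuRepository
{
    private readonly Dictionary<string, Menu> _items = new();

    public Task SaveAsync(Menu menu)
    {
        _items[menu.Id] = menu;
        return Task.CompletedTask;
    }

    public Task<Menu?> GetByIdAsync(string id)
    {
        _items.TryGetValue(id, out var menu);
        return Task.FromResult(menu);
    }
}

// Hypothetical service: business rules live here, not in the repository.
public class MenuService
{
    private readonly IMenuRepository _repository;

    public MenuService(IMenuRepository repository) => _repository = repository;

    public async Task SaveAsync(Menu menu)
    {
        if (string.IsNullOrWhiteSpace(menu.Name))
        {
            throw new ArgumentException("A menu needs a name.", nameof(menu));
        }

        await _repository.SaveAsync(menu);
    }
}
```

Because MenuService only depends on the interface, a test can hand it the in-memory repository while production code hands it a Cosmos-backed one. The validation rule never has to know the difference.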
"Yes, Delete It" — Nested Objects
The simplest case. Some data only makes sense in the context of its parent. It has no identity of its own. You'd never query for it independently. It exists to describe or extend the parent.
Think about a shipping address on an order. You'd probably never say "give me address #47" without knowing which order it belongs to. You might ask to see all orders in Texas, though. The difference is subtle: your "where" clause reads data from the order's address, but what it returns is orders. You're using the address, but what you actually care about is the order.
The address is part of the order. If the order goes away, the address goes with it.
In Cosmos DB, this is just... a property on your document. No special pattern needed.
public class Order : TenantItemBase
{
public string CustomerId { get; set; } = string.Empty;
public DateTime OrderDate { get; set; } = DateTime.UtcNow;
public Address ShippingAddress { get; set; } = new();
public Address BillingAddress { get; set; } = new();
public List<LineItem> Items { get; set; } = new();
public decimal Total { get; set; }
}
public class Address
{
public string Street { get; set; } = string.Empty;
public string City { get; set; } = string.Empty;
public string State { get; set; } = string.Empty;
public string PostalCode { get; set; } = string.Empty;
}
public class LineItem
{
public string ProductId { get; set; } = string.Empty;
public string ProductName { get; set; } = string.Empty;
public int Quantity { get; set; }
public decimal UnitPrice { get; set; }
}
Address and LineItem don't inherit from TenantItemBase. They don't have their own Id and they don't have any use for an id. They don't have their own EntityType. They're just classes — nested objects that get serialized as part of the Order document's JSON tree.
This is the document database promise at its most direct. In a relational database, this Order would be spread across three or four tables: Orders, Addresses, OrderLineItems, maybe OrderAddresses if you're really normalized. You'd need JOINs to reconstruct the Order for display. You'd need transactions to keep the tables consistent. That's a lot of inserts, updates, and joins.
In Cosmos, the Order is one document. One read. One write. The tree stays intact.
The heuristic is simple: if the data has no independent identity and no reason to exist outside its parent, nest it.
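To see the "one document" idea in action, here's a small sketch that serializes a trimmed-down Order with System.Text.Json. The classes are simplified stand-ins (a plain Id instead of TenantItemBase, fewer properties), and OrderJsonDemo is a hypothetical helper. The serializer settings the library actually uses may differ.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class Address
{
    public string City { get; set; } = string.Empty;
    public string State { get; set; } = string.Empty;
}

public class LineItem
{
    public string ProductName { get; set; } = string.Empty;
    public int Quantity { get; set; }
}

// Simplified stand-in for the chapter's Order: a plain Id instead of TenantItemBase.
public class Order
{
    public string Id { get; set; } = string.Empty;
    public Address ShippingAddress { get; set; } = new();
    public List<LineItem> Items { get; set; } = new();
}

public static class OrderJsonDemo
{
    // Serializes the whole tree (order, address, line items) into one JSON object.
    public static string ToJson(Order order) =>
        JsonSerializer.Serialize(order, new JsonSerializerOptions { WriteIndented = true });
}
```

Pass an Order with a shipping address and a couple of line items to OrderJsonDemo.ToJson and you get a single JSON object back: the address shows up as a nested object and the line items as a nested array. Nothing gets its own row, table, or key.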
"No, It Survives" — Separate Aggregate Roots
Now consider the relationship between a Person and their Notes. If you delete a Person from the system, should all their Notes vanish? Maybe. But maybe not. Maybe those Notes contain institutional knowledge that should be preserved. Maybe they get reassigned to another person. Maybe the business rule is "archive the person but keep the notes."
When the answer to "should this disappear?" is no — or even "maybe not" — you're looking at separate aggregate roots. Separate documents. Separate trees.
In the Benday.CosmosDb library, this means both entities inherit from TenantItemBase:
public class Person : TenantItemBase
{
public string FirstName { get; set; } = string.Empty;
public string LastName { get; set; } = string.Empty;
public string Email { get; set; } = string.Empty;
}
public class Note : TenantItemBase
{
public string Title { get; set; } = string.Empty;
public string Body { get; set; } = string.Empty;
public DateTime CreatedDate { get; set; } = DateTime.UtcNow;
public List<string> Tags { get; set; } = new();
}
Each has its own Id. Each has its own EntityType (automatically set to "Person" and "Note"). Each gets its own repository. Each is independently queryable, independently saveable, independently deletable.
They can live in the same container, and typically they will. The EntityType property (the second level of the hierarchical partition key) is what keeps them distinct. When you call noteRepository.GetAllAsync(tenantId), the query is automatically scoped to documents where EntityType == "Note". The Person documents are right there in the same container, under the same tenant, but the repository doesn't see them. Different species of tree in the same forest.
The Container Is a Forest
This is the "species of tree" concept I introduced in Chapter 5, and it's worth expanding on here because it's central to how you think about Cosmos DB containers.
In a relational database, each table holds one type of thing. The Person table holds person records. The Note table holds notes. You know what you're getting because the table name tells you. And the database engine prohibits you from trying to store a Note in the person table. That's schema validation and enforcement.
In Cosmos DB, a container holds documents. Those documents can be anything. A single container can hold Person documents, Note documents, Comment documents, and LookupValue documents all mixed together. The container is a forest, and the different document types are different species of tree.
EntityType is the species label. It's what makes this work. Without it, you'd have to inspect each document to figure out what it is. With it, the repository can filter by type automatically, and the hierarchical partition key /tenantId,/entityType means that queries for "all Notes belonging to tenant X" hit exactly one partition. No scanning through Person documents to find the Notes. No cross-partition fan-out.
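Here's a conceptual sketch of what that type scoping amounts to, using an in-memory list in place of a container. StoredDocument and ForestDemo are hypothetical names for illustration only. This is not how the library or Cosmos DB executes the query: with a hierarchical partition key the narrowing happens through partition routing, not by scanning a list.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal document shape: just the fields used for scoping.
public record StoredDocument(string Id, string TenantId, string EntityType);

public static class ForestDemo
{
    // Conceptually what a type-scoped repository query does:
    // narrow to one tenant, then to one species of tree.
    public static List<StoredDocument> GetAll(
        IEnumerable<StoredDocument> container, string tenantId, string entityType) =>
        container
            .Where(d => d.TenantId == tenantId && d.EntityType == entityType)
            .ToList();
}
```

Given a mixed list of Person and Note documents for several tenants, ForestDemo.GetAll(container, "tenant-a", "Note") returns only tenant-a's Notes. The Person documents and the other tenants' Notes never show up.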
This is why my library defaults to a shared container with EntityType as the second partition key level. It gives you the organizational clarity of separate tables with the performance characteristics of co-located data. Most of the time, this is what you want.
(There are exceptions — we'll talk about when you'd put an entity type in its own container later in this chapter.)
"It's Complicated" — The Parented Pattern
Now here's where it gets interesting. What about Comments on a Note?
If you delete a Note, should its Comments disappear? Almost certainly yes. A Comment without its parent Note is meaningless. It's a response to something that no longer exists.
But Comments aren't quite like line items on an Order. A Comment has its own identity — you might need to edit a specific comment, delete a specific comment, query for "all comments by this user." It has its own lifecycle even though it exists in the context of a parent.
So it fails the "just nest it" test (it has independent identity) but it also fails the "separate aggregate root" test (it shouldn't survive deletion of its parent). It's somewhere in between.
This is what the ParentedItemBase class is for.
ParentedItemBase
Benday.CosmosDb.ParentedItemBase inherits from TenantItemBase and adds two properties that help us to handle these bridging relationships:
public abstract class ParentedItemBase : TenantItemBase, IParentedItem
{
public string ParentId { get; set; } = string.Empty;
public abstract string ParentEntityType { get; set; }
}
ParentId is the Id of the parent document. ParentEntityType is the EntityType of the parent — which species of tree is the parent. ParentEntityType is abstract, which means your concrete class must implement it. You can't forget to specify what kind of parent this entity belongs to.
One thing I want to point out here is data enforcement. In a relational database, you'd probably use a foreign key constraint to define, validate, and enforce this relationship. Cosmos DB doesn't do that kind of enforcement, but that doesn't mean you don't have to worry about it. Those data validation concerns just hop up a tier or two in the application and become part of the C# code instead of the relational database schema.
Snarky aside on data validation
As a halfway serious snarky aside... I kinda think we developers have gotten lazy about data validation because we know/think that the RDBMS will do it for us and therefore save us from our mistakes. That ends up putting your data validation logic in the data storage implementation, and I think that's just architecturally weird.
It might have worked fine when you had simple, two-tier client-server applications, but in complex service-oriented systems at scale?
Your data validation tends to happen way too late, and that causes you to write a lot of complex exception handling.
Here's what a Comment class looks like:
public class Comment : ParentedItemBase
{
public string AuthorName { get; set; } = string.Empty;
public string Body { get; set; } = string.Empty;
public DateTime CreatedDate { get; set; } = DateTime.UtcNow;
public override string ParentEntityType
{
get => nameof(Note);
set { } // Required by interface, but the value is fixed
}
}
A Comment is its own document in the container. It has its own Id, its own EntityType ("Comment"), its own Etag. But it also knows who its parent is (ParentId) and what kind of parent it has (ParentEntityType returns "Note").
This gives you the best of both worlds. Comments are independently queryable and editable — they're real documents. But the parent relationship is explicit in the data model, so you can find all comments for a note, and when you delete a note, you know exactly which comments to clean up.
Querying by Parent
The library provides CosmosDbParentedItemRepository<T> (or IParentedItemRepository<T>, if you resolve it through DI), which adds one key method:
var comments = await commentRepository.GetAllByParentIdAsync(
tenantId,
noteId,
"Note" // optional: filter by parent entity type
);
This queries within the tenant partition for all Comment documents whose ParentId matches the specified note. Efficient, partition-scoped, and type-aware.
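Since Cosmos DB won't cascade a delete for you, the cleanup has to live in your own code. Here's a minimal sketch of that idea. InMemoryCommentStore and NoteCleanup are hypothetical, the Comment class is a simplified stand-in, and the DeleteAsync signature is an assumption rather than the library's actual method.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Simplified stand-in for the chapter's Comment (no ParentedItemBase here).
public class Comment
{
    public string Id { get; set; } = string.Empty;
    public string TenantId { get; set; } = string.Empty;
    public string ParentId { get; set; } = string.Empty;
}

// Hypothetical store with just the two operations this sketch needs.
public class InMemoryCommentStore
{
    public List<Comment> Items { get; } = new();

    public Task<List<Comment>> GetAllByParentIdAsync(string tenantId, string parentId) =>
        Task.FromResult(Items
            .Where(c => c.TenantId == tenantId && c.ParentId == parentId)
            .ToList());

    public Task DeleteAsync(Comment comment)
    {
        Items.Remove(comment);
        return Task.CompletedTask;
    }
}

public static class NoteCleanup
{
    // Application-level "cascade delete": remove the children first,
    // then the caller deletes the Note itself.
    public static async Task<int> DeleteCommentsForNoteAsync(
        InMemoryCommentStore comments, string tenantId, string noteId)
    {
        var children = await comments.GetAllByParentIdAsync(tenantId, noteId);

        foreach (var child in children)
        {
            await comments.DeleteAsync(child);
        }

        return children.Count;
    }
}
```

Deleting the children before the parent is a deliberate choice in this sketch: if something fails halfway through, you still have the Note and can retry, rather than ending up with comments that point at nothing.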
Registration follows the same pattern from Chapter 5:
// Simple — no custom repository class needed
helper.RegisterParentedRepository<Comment>();
// With a custom repository
helper.RegisterParentedRepository<Comment, ICommentRepository, CommentRepository>();
// With a service layer too
helper.RegisterParentedRepositoryAndService<Comment>();
When to Use ParentedItemBase vs. Nesting
It's important to remember that the maximum size of a document in Cosmos DB is 2MB. Two megabytes. On the one hand, you can fit an awful lot of text into 2MB. On the other hand, there's a lot of stuff that's bigger than 2MB. 2MB is a comfortable size constraint, but it's not huge and you have to plan for it.
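One way to plan for it is to measure. Here's a rough sketch that estimates a document's size by serializing it with System.Text.Json. DocumentSizeCheck is a hypothetical helper, and treating 2MB as 2 × 1024 × 1024 bytes is an assumption on my part. What Cosmos actually stores also includes system properties and depends on the serializer in use, so read the number as an early warning rather than an exact measurement.

```csharp
using System.Text.Json;

public static class DocumentSizeCheck
{
    // The 2MB per-document limit, assumed here to mean 2 * 1024 * 1024 bytes.
    public const int MaxDocumentBytes = 2 * 1024 * 1024;

    // Rough estimate: the UTF-8 size of the document as System.Text.Json serializes it.
    public static int EstimateBytes<T>(T document) =>
        JsonSerializer.SerializeToUtf8Bytes(document).Length;

    // True when the estimate exceeds the given fraction of the limit (default 80%).
    public static bool IsNearLimit<T>(T document, double threshold = 0.8) =>
        EstimateBytes(document) > MaxDocumentBytes * threshold;
}
```

Call DocumentSizeCheck.IsNearLimit(note) before saving a document with a growing nested collection. A true result is a hint that the collection may want to become parented items instead.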
That said, here's my rule of thumb:
Nest it (plain property on the parent) when the child data has no independent identity, you'll never query for it outside the context of its parent, and the collection is small and bounded. Addresses on an Order. Tags on a Note. Phone numbers on a Person.
Make it a parented item (ParentedItemBase) when the child has its own identity, you need to query or edit it independently, the collection could grow large and eventually bump up against that 2MB limit, or the child has its own lifecycle (creation, modification, deletion at different times than the parent). Comments on a Note. Attachments on a ticket. Activity log entries on a project.
Make it a separate aggregate root (TenantItemBase) when the entity should survive the deletion of whatever it's currently associated with, when the relationship might change, or when it's referenced by multiple parents. A Person associated with multiple Projects. A Tag definition used across many Notes. Lookup values shared by the whole application.
But then again, maybe you solve the delete problem by not solving it. In the Notes and Comments scenario, maybe it's easier to just nest the data and never actually delete the Notes or Comments. Do a soft delete instead: add a boolean property called IsDeleted to the document, set it to true, and just hide the record. If you need to get at the comments, the data is all still there.
There are options.
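A soft delete can be sketched in a few lines. The Note below is a simplified stand-in with the IsDeleted flag added, and SoftDelete is a hypothetical helper. In a real app you'd save the document back after flipping the flag, and the filter would go into your query rather than run over an in-memory list.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for a Note; IsDeleted is the soft-delete flag.
public class Note
{
    public string Id { get; set; } = string.Empty;
    public string Title { get; set; } = string.Empty;
    public bool IsDeleted { get; set; }
}

public static class SoftDelete
{
    // "Deleting" only flips the flag; the document and anything nested in it stays put.
    public static void Delete(Note note) => note.IsDeleted = true;

    // Normal reads hide flagged documents.
    public static List<Note> Visible(IEnumerable<Note> notes) =>
        notes.Where(n => !n.IsDeleted).ToList();
}
```

The trade-off is that every read path has to remember the filter. Miss it in one query and "deleted" records come back from the dead.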
Denormalization: When Copying Data Is the Right Answer
If you've spent your career in relational databases, you've probably internalized a rule: THOU SHALT NOT DUPLICATE DATA! You know the drill and the buzzwords. Normalize. Single source of truth. If a customer's name appears in the Orders table and the Customers table, you've done something wrong.
In Cosmos DB, that rule doesn't apply the same way. In fact, strategic denormalization is often the right design choice.
Here's why. In a relational database, JOINs are cheap. Retrieving a customer name from a related table during an order query costs almost nothing. So you normalize — store the name once, JOIN when you need it. (Plus, storage space used to be expensive and every byte was precious.)
In Cosmos DB, there are no JOINs across documents. (Cosmos has a JOIN keyword, but it operates within a single document — joining across nested arrays, not across separate documents.) If your Order document needs to display the customer's name, you have two choices: make a separate read to the Customer document (that's another RU and another round trip), or store the customer's name directly on the Order.
Storing it on the Order is denormalization, and it's often the better choice. Here's when:
Denormalize when the data is a snapshot. An Order should record the customer's name at the time the order was placed. If the customer changes their name later, the Order should still show the original name. The denormalized copy isn't duplicate data — it's a historical record. The snapshot IS the point.
Denormalize when you need the data for display. If every time you display a list of Comments, you need to show the author's name, storing that name on the Comment saves you from making a separate Person read for every comment in the list. At scale, those extra reads add up fast — both in RUs and in latency.
Denormalize when the source data changes rarely. If a product's name changes once a year, the cost of updating the denormalized copies once a year is far less than the cost of a separate read of the Product document thousands of times a day.
Don't denormalize when the data changes frequently and consistency matters immediately. If a user changes their email address and every document in the system needs to reflect that change in real-time, denormalization creates a consistency problem. In those cases, either accept the extra reads or use the Change Feed (Chapter 8) to propagate updates asynchronously.
A Practical Example
Look at the LineItem class in our Order example:
public class LineItem
{
public string ProductId { get; set; } = string.Empty;
public string ProductName { get; set; } = string.Empty;
public int Quantity { get; set; }
public decimal UnitPrice { get; set; }
}
ProductName and UnitPrice are denormalized. The line item records what the product was called and what it cost at the time of the order. Even if the product's price changes tomorrow, this order still reflects what the customer actually paid. The denormalized copy isn't a shortcut — it's a business requirement.
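Here's a small sketch of the snapshot being taken. LineItem matches the class above. Product and LineItemFactory are hypothetical and exist only to show where the copied values come from.

```csharp
// Hypothetical product document: the "current" name and price live here.
public class Product
{
    public string Id { get; set; } = string.Empty;
    public string Name { get; set; } = string.Empty;
    public decimal Price { get; set; }
}

public class LineItem
{
    public string ProductId { get; set; } = string.Empty;
    public string ProductName { get; set; } = string.Empty;
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
}

public static class LineItemFactory
{
    // Copy the product's name and price as they are right now.
    // Later changes to the Product do not touch this line item.
    public static LineItem FromProduct(Product product, int quantity) => new()
    {
        ProductId = product.Id,
        ProductName = product.Name,
        Quantity = quantity,
        UnitPrice = product.Price
    };
}
```

If the product's price changes after the line item is created, the line item's UnitPrice stays at the value that was copied. ProductId is still there if you ever need to look up the current product, but the order no longer depends on it.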
This is one of those places where Cosmos DB's document model actually clarifies your thinking. In a relational database, it's easy to conflate "the current price of the product" with "the price the customer paid." The JOIN makes them feel like the same thing. In a document database, you have to make an explicit decision about what data the document carries, and that decision forces you to think about what the document means.
When to Use a Separate Container
Most of the time, you'll put all your entity types in a single container with the hierarchical partition key /tenantId,/entityType. That's the default, and it works well for most applications.
But sometimes you want a separate container. Here are the big three cases:
High-volume write-heavy entities. If you have an entity type that generates a huge volume of writes — webhook logs, telemetry data, audit trails — putting it in the same container as your core business data means they share throughput. A burst of webhook processing could throttle your user-facing queries. A separate container with its own provisioned throughput isolates the blast radius.
Different partition key requirements. Not every entity is going to fit my favorite /tenantId,/entityType partition key pattern. Maybe your webhook log entries are partitioned differently — by date, by source, by a custom key. A separate container lets you use a different partition key path without affecting the rest of your data. (This gets even more relevant when you start using the Cosmos DB Change Feed. More on that later.)
Different TTL policies. Cosmos DB supports automatic document expiration via time-to-live (TTL), and TTL has to be enabled at the container level. Individual items can override the container default with their own ttl value, but if your audit logs should expire after 90 days and your business data should live forever, separate containers keep the two policies from getting tangled.
In Chapter 5, we showed how to register an entity type in a different container:
helper.RegisterRepository<
WebhookLogEntry,
IWebhookLogRepository, WebhookLogRepository>(
containerName: "raw-webhooks",
partitionKey: "/partitionKey"
);
The library handles the routing. Your application code doesn't change — you still inject the repository and call SaveAsync. The container difference is a configuration detail, not a code change.
The Sample App: Four Entity Types, Three Patterns
The Benday.CosmosDb repository on GitHub includes a sample application that demonstrates everything we've covered in this chapter. Let's walk through it, because seeing all the patterns in one place is worth more than any amount of abstract explanation.
Person — An Aggregate Root
public class Person : TenantItemBase
{
public string FirstName { get; set; } = string.Empty;
public string LastName { get; set; } = string.Empty;
public string Email { get; set; } = string.Empty;
}
A Person is a standalone aggregate root. It has its own identity, its own lifecycle, and no parent. Deleting a Person doesn't cascade to anything else in the system. It's registered with a custom repository because the app needs Person-specific queries:
helper.RegisterRepository<Person, IPersonRepository, PersonRepository>();
Note — An Aggregate Root That's Also a Parent
public class Note : TenantItemBase
{
public string Title { get; set; } = string.Empty;
public string Body { get; set; } = string.Empty;
public DateTime CreatedDate { get; set; } = DateTime.UtcNow;
public List<string> Tags { get; set; } = new();
}
A Note is also a standalone aggregate root — TenantItemBase, not ParentedItemBase. But it's also a parent: Comments reference it via ParentId.
Note that Tags is a nested list — simple strings with no independent identity. They pass the "nest it" test: no one queries for a tag outside the context of its note. (If you needed a global tag directory, that would be a separate aggregate root.)
Comment — A Parented Item
public class Comment : ParentedItemBase
{
public string AuthorName { get; set; } = string.Empty;
public string Body { get; set; } = string.Empty;
public DateTime CreatedDate { get; set; } = DateTime.UtcNow;
public override string ParentEntityType
{
get => nameof(Note);
set { }
}
}
A Comment has its own document, its own identity, its own EntityType. But it belongs to a Note. If the Note is deleted, the Comments should be cleaned up. The ParentId links it back to its parent Note. ParentEntityType returns "Note" — the species of tree that's the parent.
AuthorName is denormalized. The Comment stores the author's name at creation time rather than referencing a Person by ID and requiring a separate read. If the author changes their display name later, existing Comments retain the name that was current when the comment was written. The snapshot is the point.
LookupValue — An Aggregate Root in a Separate Container
public class LookupValue : TenantItemBase
{
public string LookupType { get; set; } = string.Empty;
public string LookupKey { get; set; } = string.Empty;
// Named Value rather than LookupValue: C# doesn't allow a member to share its enclosing type's name.
public string Value { get; set; } = string.Empty;
public int DisplayOrder { get; set; }
}
LookupValues are reference data — the values that populate dropdown lists. Status values, categories, priorities. They're aggregate roots (each has independent identity), but they often live in a separate container because they have different access patterns than the core business data. They're read frequently, written rarely, and might benefit from different throughput settings.
helper.RegisterRepository<LookupValue>(
containerName: "LookupContainer"
);
BTW, I'm not saying you should do it this way, but you could. A separate container with its own provisioned throughput means more Azure spend. Want to keep your costs as low as possible? Maybe you stick to a single container.
The Patterns at a Glance
| Entity | Base Class | Pattern | Container |
|---|---|---|---|
| Person | TenantItemBase | Aggregate root | Default |
| Note | TenantItemBase | Aggregate root (also a parent) | Default |
| Comment | ParentedItemBase | Parented item (belongs to Note) | Default |
| LookupValue | TenantItemBase | Aggregate root | Separate |
Four entity types. Three patterns (aggregate root, parented item, separate container). One coherent application.
Designing Your Own Domain
You've now seen the full toolkit. Here's the process I use when modeling a new domain for Cosmos DB:
Start with your entities. List the things your application manages. Don't think about storage yet — just the domain objects.
Apply the delete question to each relationship. For every "X has Y" relationship, ask: if you delete X, should Y disappear? That answer determines whether Y is nested, parented, or independent.
Identify your aggregate roots. These are the entities that have independent identity and independent lifecycle. They inherit from TenantItemBase. Each gets its own repository.
Identify your parented items. These are entities with independent identity that exist in the context of a parent. They inherit from ParentedItemBase. Each knows its parent's ID and entity type.
Nest everything else. Data that has no independent identity and no reason to exist outside its parent is just a property. A class, a list, a nested object. No base class. No repository.
Default to one container. Put everything in a shared container with /tenantId,/entityType unless you have a specific reason not to. High-volume writes, different partition keys, different TTL policies — those are reasons. "It feels cleaner" is not.
Denormalize deliberately. For each cross-document reference, ask: will I need this data every time I read this document? Is the data a snapshot that should be preserved? If yes, denormalize it. If the data changes frequently and consistency matters immediately, don't.
Most applications I've built follow this pattern, and the modeling decisions become almost mechanical once you've internalized the delete question. The hard part isn't the pattern — it's being honest about your domain. The delete question forces that honesty.
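If you want to see just how mechanical it gets, the delete question plus the rule of thumb from earlier in the chapter fits in a tiny function. This is a deliberately crude sketch. ModelingChoice and DeleteQuestion are hypothetical names, and real domains will have cases that three booleans can't capture.

```csharp
public enum ModelingChoice
{
    Nested,         // plain property on the parent, no base class
    ParentedItem,   // ParentedItemBase
    AggregateRoot   // TenantItemBase
}

public static class DeleteQuestion
{
    // disappearsWithParent: "If you delete the parent, should this disappear too?"
    // hasOwnIdentity: do you need to query or edit it independently?
    // couldGrowLarge: could the collection grow toward the document size limit?
    public static ModelingChoice Decide(
        bool disappearsWithParent, bool hasOwnIdentity, bool couldGrowLarge = false)
    {
        if (!disappearsWithParent)
        {
            return ModelingChoice.AggregateRoot;
        }

        return hasOwnIdentity || couldGrowLarge
            ? ModelingChoice.ParentedItem
            : ModelingChoice.Nested;
    }
}
```

An Address on an Order (disappears, no identity) comes out as Nested. A Comment on a Note (disappears, has identity) comes out as ParentedItem. A Note associated with a Person (survives) comes out as AggregateRoot.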
Next up: Chapter 7, where we talk about the thing everyone actually cares about — money. Request Units, query cost, indexing policies, and how to stop your Cosmos DB bill from surprising you. The RU logging and cross-partition detection from Chapter 5 become essential tools.