Cosmos DB: Appendix A — Query Cookbook - Benjamin Day Consulting, Inc.

Software Delivery Rescue & Technical Leadership

Cosmos DB: Appendix A — Query Cookbook

May 11, 2026

This is Appendix A of Azure Cosmos DB for .NET Developers. Previous: Chapter 14: Reporting with Fabric.

This isn't a chapter. It's a reference shelf.

Every entry follows the same pattern: here's the problem, here's the query, here's why it works. The examples draw from the cocktail app (Chapters 10–12) and the Person/Note/Comment sample app (Chapters 5–6). Bookmark this page. You'll be back.

SELECT vs. SELECT VALUE

The problem: You ran a query expecting a number or a list of documents, and instead you got a JSON object wrapper you didn't ask for.

What's happening:

SELECT COUNT(1) FROM c
WHERE c.entityType = 'CocktailRecipe'

Returns this:

[{ "$1": 5800 }]

That's an object with an auto-generated property name. Not the number 5800. If you're deserializing this into a C# int, it won't work.

SELECT VALUE COUNT(1) FROM c
WHERE c.entityType = 'CocktailRecipe'

Returns this:

[5800]

That's the raw scalar. SELECT VALUE unwraps the result — it gives you the value instead of an object containing the value. Same thing applies when you're selecting documents:

-- Returns documents wrapped in an object: [{ "c": { ... } }]
SELECT c FROM c WHERE c.entityType = 'CocktailRecipe'

-- Returns the documents directly: [{ ... }]
SELECT VALUE c FROM c WHERE c.entityType = 'CocktailRecipe'

The rule of thumb: If you want to deserialize the results directly into your C# model (or into a scalar), use SELECT VALUE. If you need to project into a custom shape with named properties, use SELECT with aliases.

Cross-reference: Chapter 7 (query cost comparisons), Chapter 11 (LINQ vs. raw SQL).

SELECT DISTINCT VALUE

The problem: You're searching nested arrays with JOIN and getting duplicate results.

In the cocktail app, searching for recipes that contain "lime" looks like this:

SELECT VALUE c FROM c
JOIN i IN c.ingredients
WHERE c.entityType = 'CocktailRecipe'
AND CONTAINS(i.name, 'lime', true)

If a recipe has both "Lime Juice" and "Lime Wheel" in its ingredients, the JOIN matches twice, and you get the same recipe document twice in your results.

The fix:

SELECT DISTINCT VALUE c FROM c
JOIN i IN c.ingredients
WHERE c.entityType = 'CocktailRecipe'
AND CONTAINS(i.name, 'lime', true)

DISTINCT VALUE deduplicates at the document level. Two matches in the same recipe, one result. This is the standard pattern for any query that JOINs into nested arrays.

RU note: DISTINCT adds a small amount of processing overhead. It's worth it. The alternative is deduplicating in your application code, which means you've already paid the RU cost to read the duplicates and transfer them across the wire.

Cross-reference: Chapter 11 (ingredient search queries).

CONTAINS with Case-Insensitive Search

The problem: Your text search is case-sensitive and users are annoyed.

-- Case-sensitive: searching for "bourbon" won't match "Bourbon"
SELECT VALUE c FROM c
WHERE CONTAINS(c.name, 'bourbon')

The fix: The third parameter. It's a boolean.

SELECT VALUE c FROM c
WHERE CONTAINS(c.name, 'bourbon', true)

That true enables case-insensitive matching. It's easy to miss in the docs because the parameter is optional and not exactly advertised.

In C# with the SDK:

var query = context.Queryable
    .Where(x => x.Name.Contains(
        searchText, StringComparison.OrdinalIgnoreCase));

The Cosmos LINQ provider translates StringComparison.OrdinalIgnoreCase into that true third parameter. This is one of those cases where LINQ actually gives you a cleaner syntax than the raw SQL.

Cross-reference: Chapter 11 (SearchByTitleAsync).

Parameterized Queries

The problem: You're building query strings with string concatenation, and you're inviting both SQL injection and poor query plan caching.

Don't do this:

// NO. BAD. STOP.
var queryText = $"SELECT VALUE c FROM c WHERE c.name = '{userInput}'";

Do this:

var queryText = "SELECT VALUE c FROM c WHERE c.name = @name";

var queryDef = new QueryDefinition(queryText)
    .WithParameter("@name", userInput);

var iterator = container.GetItemQueryIterator<MyDocument>(
    queryDef,
    requestOptions: new QueryRequestOptions { PartitionKey = partitionKey });

Parameters are prefixed with @. You chain .WithParameter() calls for each one. The SDK handles escaping and type conversion.

Why it matters beyond security: Cosmos DB caches query plans based on the query text. If you concatenate values into the string, every distinct value produces a distinct query plan. With parameters, the query text stays the same and the plan gets reused. On a high-throughput system, this is measurable.

Cross-reference: Chapter 11 (SearchBySingleIngredientAsync raw SQL example).

Cosmos JOIN Is Not Relational JOIN

The problem: You wrote JOIN expecting it to work like SQL Server, and the results make no sense.

The critical distinction: In a relational database, JOIN connects rows from two different tables. In Cosmos DB, JOIN iterates a nested array within a single document. It's a cross-apply, not a relational join. There is no "join between documents" in Cosmos DB.

-- This iterates the ingredients array inside each recipe
SELECT c.name AS recipe, i.name AS ingredient
FROM c
JOIN i IN c.ingredients
WHERE c.entityType = 'CocktailRecipe'

For a recipe named "Margarita" with three ingredients, this produces three rows — one per ingredient. The JOIN expands the nested array into a flat result set. It's the same concept as CROSS APPLY in SQL Server or SelectMany in LINQ.

The implications: You can't join CocktailRecipe documents with IngredientClassification documents, even though they're in the same container. If you need to correlate data across entity types, you either do it in your application code or you use Fabric (Chapter 14). This is a structural limitation of the database architecture, not a missing feature.

Cross-reference: Chapter 10 (multi-entity design), Chapter 13 (why Cosmos is bad at reporting), Chapter 14 (Fabric for cross-entity analytics).

EXISTS Subqueries for Multi-Value AND Search

The problem: You want to find recipes that contain both bourbon AND bitters. A single JOIN with an OR gives you recipes that have either. You need AND logic across multiple values in a nested array.

The pattern:

SELECT DISTINCT VALUE c FROM c
WHERE c.entityType = 'CocktailRecipe'
AND EXISTS (
    SELECT VALUE 1 FROM i IN c.ingredients
    WHERE CONTAINS(i.name, 'bourbon', true)
)
AND EXISTS (
    SELECT VALUE 1 FROM i IN c.ingredients
    WHERE CONTAINS(i.name, 'bitters', true)
)

Each EXISTS subquery independently checks whether the ingredients array contains a match. Stacking them with AND means the recipe must match all of them. Add more EXISTS clauses for more ingredients.

In C# with dynamic query building:

var sb = new StringBuilder();
sb.Append("SELECT DISTINCT VALUE c FROM c ");
sb.Append("WHERE c.entityType = 'CocktailRecipe' ");
sb.Append("AND c.tenantId = @tenantId ");

var queryDef = new QueryDefinition(""); // placeholder
int paramIndex = 0;

foreach (var ingredient in ingredients)
{
    var paramName = $"@ingredient{paramIndex}";
    sb.Append($"AND EXISTS (SELECT VALUE 1 FROM i IN c.ingredients ");
    sb.Append($"WHERE CONTAINS(i.name, {paramName}, true)) ");
    paramIndex++;
}

queryDef = new QueryDefinition(sb.ToString())
    .WithParameter("@tenantId", tenantId);

paramIndex = 0;
foreach (var ingredient in ingredients)
{
    queryDef = queryDef.WithParameter(
        $"@ingredient{paramIndex}", ingredient.Trim());
    paramIndex++;
}

Yes, it's dynamic SQL. But it's parameterized dynamic SQL. The query structure varies by ingredient count, but the values are always parameters.

LINQ alternative: You can also do this with chained .Where(x => x.Ingredients.Any(...)) calls. As discussed in Chapter 11, the LINQ version actually works well here and sometimes outperforms hand-written SQL because the SDK generates some clever projections.

Cross-reference: Chapter 11 (LINQ vs. raw SQL head-to-head on ingredient search).

String Functions and LIKE

The problem: You're coming from SQL Server and you want to use LIKE '%something%' or call string functions you're used to.

What Cosmos DB has:

CONTAINS(str, substr) does substring matching, optionally case-insensitive with the true third parameter. STARTSWITH(str, prefix) does prefix matching and also has the case-insensitive option. ENDSWITH(str, suffix) does suffix matching. STRINGEQUALS(str1, str2, ignoreCase) does case-insensitive equality without paying for a full CONTAINS scan. Beyond those, you've got the usual suspects: UPPER, LOWER, LENGTH, SUBSTRING, REPLACE, CONCAT, TRIM, LTRIM, RTRIM, LEFT, RIGHT.

What about LIKE? Cosmos DB does have a LIKE operator, and it supports % (any characters) and _ (single character) wildcards. But here's the thing: CONTAINS is usually the better choice for substring matching. LIKE '%bourbon%' and CONTAINS(c.name, 'bourbon') do the same work, but CONTAINS reads better and has the handy case-insensitive third parameter. LIKE is most useful when you need pattern matching that isn't a simple substring — things like LIKE 'Mar%' for prefix matches (although STARTSWITH is cleaner for that too).

What's expensive: Any function that has to evaluate against every character of every string in every document. CONTAINS on a large field is inherently more expensive than STARTSWITH because STARTSWITH can short-circuit once the prefix doesn't match, while CONTAINS has to scan the entire string. If performance matters and a prefix match is sufficient, prefer STARTSWITH.

Aggregate Functions and Cross-Partition Behavior

The problem: You need COUNT, SUM, or AVG, and you're not sure what happens when the data spans partitions.

The basics:

-- Count all recipes
SELECT VALUE COUNT(1) FROM c
WHERE c.entityType = 'CocktailRecipe'

-- Average ingredient count (using a subquery)
SELECT VALUE AVG(ARRAY_LENGTH(c.ingredients)) FROM c
WHERE c.entityType = 'CocktailRecipe'

Available aggregates: COUNT, SUM, AVG, MIN, MAX.

Cross-partition behavior: If your query doesn't specify a partition key, aggregates run as a cross-partition query. The engine fans out to each partition, computes a partial aggregate on each, and then combines the results. For COUNT, SUM, MIN, and MAX, the combination is straightforward. For AVG, the engine needs to compute the sum and count on each partition and then combine them — which it handles correctly, but the RU cost is proportional to the number of partitions touched.

The RU trap: That cross-partition aggregation can be expensive on high-throughput accounts with many physical partitions. If you find yourself running frequent aggregate queries across all partitions, that's a signal that you want an analytical layer (Chapter 14) rather than running analytics on the transactional engine.

Conditional aggregation: Cosmos supports ternary expressions inside aggregates:

SELECT
    COUNT(1) AS totalCount,
    SUM(c.embedding != null ? 1 : 0) AS withEmbedding,
    SUM(c.embedding = null ? 1 : 0) AS withoutEmbedding
FROM c
WHERE c.entityType = 'CocktailRecipe'
AND c.tenantId = 'COCKTAILS'

This is the Cosmos equivalent of SUM(CASE WHEN ... THEN 1 ELSE 0 END) in SQL Server. LINQ can't express this — there's no Sum overload that takes a ternary. Raw SQL only.

Cross-reference: Chapter 7 (RU cost of aggregates), Chapter 11 (GetEmbeddingStatsAsync), Chapter 13 (why analytics belong in Fabric).

DateTime Math

The problem: You need to query by date range, and Cosmos DB's date handling is... not intuitive.

Background: Cosmos DB stores a system property called _ts on every document. It's a Unix timestamp — seconds since epoch. It's not a string, not a DateTime, not an ISO 8601 value. It's an integer. Every query involving "when was this last modified" eventually touches this property.

Querying items modified in the last N days:

SELECT VALUE c FROM c
WHERE c._ts > DateTimeToTimestamp(
    DateTimeAdd("dd", -2, GetCurrentDateTime())
) / 1000

Querying items modified in the last N minutes:

SELECT VALUE c FROM c
WHERE c._ts > DateTimeToTimestamp(
    DateTimeAdd("mi", -5, GetCurrentDateTime())
) / 1000

That /1000 is critical. GetCurrentDateTime() returns an ISO 8601 string. DateTimeAdd adds an interval to it using string codes. DateTimeToTimestamp converts the result to milliseconds since epoch. But _ts is in seconds. So you divide by 1000. Miss that division and your comparison is off by a factor of a thousand. You'll either get zero results or all results. Ask me how I know.

The interval codes: "yyyy" (year), "mm" (month), "dd" (day), "hh" (hour), "mi" (minute), "ss" (second), "ms" (millisecond). Yes, month is "mm" and minute is "mi". Don't confuse them. You will confuse them. You will debug it for 30 minutes before you notice.

If you store your own date fields as ISO 8601 strings on your documents, you can compare them directly using string comparison because ISO 8601 sorts lexicographically:

SELECT VALUE c FROM c
WHERE c.createdDate > '2026-01-01T00:00:00Z'

This works as long as your dates are consistently formatted with the Z suffix and zero-padded. If they're not — fix that first.

OFFSET/LIMIT

The problem: You want pagination and you're reaching for OFFSET 20 LIMIT 10 like you would in SQL Server.

The syntax:

SELECT VALUE c FROM c
WHERE c.entityType = 'CocktailRecipe'
ORDER BY c.name
OFFSET 20 LIMIT 10

This works, but there's a gotcha that'll get you on cost. Cosmos DB doesn't have an index cursor that efficiently skips to row 20. It processes all 20 skipped documents and charges you RUs for them. OFFSET 1000 LIMIT 10 reads 1,010 documents and charges accordingly. On page 100 of a 10-items-per-page result set, you're reading 1,000 documents to show 10.

The better approach for deep pagination: Continuation tokens. The SDK returns a continuation token with each page of results. You pass it back to get the next page. The engine picks up where it left off without re-reading skipped documents.

var options = new QueryRequestOptions
{
    MaxItemCount = 10,
    PartitionKey = partitionKey
};

string? continuationToken = null;

// First page
var iterator = container.GetItemQueryIterator<CocktailRecipe>(
    queryDef, continuationToken, options);

var response = await iterator.ReadNextAsync();
continuationToken = response.ContinuationToken;
// Store continuationToken somewhere (session, URL parameter, etc.)

// Next page
iterator = container.GetItemQueryIterator<CocktailRecipe>(
    queryDef, continuationToken, options);

When OFFSET/LIMIT is fine: Small datasets, shallow pagination (first few pages), admin UIs where cost isn't critical. For user-facing pagination over large datasets, use continuation tokens.

The Benday.CosmosDb library wraps this in a GetPagedResultsAsync method that handles the continuation token management, and the ICosmosQueryLogSink logs the RU cost per page so you can see exactly what each page request costs.

Cross-reference: Chapter 7 (RU costs scale with documents read).

ORDER BY and Composite Indexes

The problem: Your query with ORDER BY is slow, expensive, or throwing an error about missing indexes.

The basics: Cosmos DB can sort by any property that's included in the index. By default, all properties are indexed with range indexes, so single-property ORDER BY works out of the box:

SELECT VALUE c FROM c
WHERE c.entityType = 'CocktailRecipe'
ORDER BY c.name ASC

When you need a composite index: The moment you sort by two or more properties, or when you filter on one property and sort by another that's not the filter property, Cosmos DB may need a composite index. Without one, you'll get an error like: "The order by query does not have a corresponding composite index that it can be served from."

Creating a composite index requires updating your container's indexing policy. In the Azure portal, go to your container → Settings → Indexing Policy, and add a composite index:

{
    "indexingMode": "consistent",
    "includedPaths": [{ "path": "/*" }],
    "excludedPaths": [{ "path": "/\"_etag\"/?" }],
    "compositeIndexes": [
        [
            { "path": "/entityType", "order": "ascending" },
            { "path": "/name", "order": "ascending" }
        ]
    ]
}

The order matters. The sequence of properties in the composite index definition must match the sequence in your ORDER BY clause. And the sort direction (ascending/descending) must match too. If your query sorts by entityType ASC, name DESC, you need a composite index with those exact directions.

Cost impact: Custom indexing policies can dramatically reduce RU costs. The default "index everything" policy writes an index entry for every property on every document, which means every write pays for all those index updates. If you only ever query on three properties, exclude the rest.

Cross-reference: Chapter 7 (indexing policy and RU impact).

Cross-Partition Query Detection

The problem: Your query is slow and expensive, and you suspect it's scanning across partitions but you're not sure.

How to tell: The Benday.CosmosDb library's ICosmosQueryLogSink includes an isCrossPartition flag in every diagnostic entry. If that's true, your query is fanning out across physical partitions.

Common causes:

No partition key in the query. If you don't specify the partition key in QueryRequestOptions, every query is cross-partition by definition.
Incomplete hierarchical key. If your hierarchical partition key is /tenantId,/entityType and you only specify tenantId, Cosmos still has to check across all entity types within that tenant.
WHERE clause mismatch. Even with a partition key in the request options, if your query's WHERE clause contradicts or omits the partition key columns, the engine may not be able to limit the scan.

The fix is almost always: Make sure your query specifies the full partition key. With hierarchical partition keys, that means all levels. In the cocktail app, a query for recipes specifies both tenantId = 'COCKTAILS' and entityType = 'CocktailRecipe' — which narrows to a single logical partition.

Cross-reference: Chapter 6 (partition key design), Chapter 7 (RU cost of cross-partition queries), Chapter 11 (diagnostic sink output).

Indexing Policies

The problem: You're burning RUs on writes and you haven't touched the default indexing policy.

The default: Cosmos DB indexes every property on every document with range indexes. That's convenient for getting started — any query works without configuration. But every write pays for all those index updates. On a container with documents that have 50 properties, every write updates 50 index entries even if you only ever query on 3 of them.

A targeted policy:

Start by excluding everything:

{
    "indexingMode": "consistent",
    "includedPaths": [],
    "excludedPaths": [{ "path": "/*" }]
}

Cosmos DB automatically indexes the partition key paths and id, so your basic operations still work. From there, add back only what you actually query on:

{
    "indexingMode": "consistent",
    "includedPaths": [
        { "path": "/name/?" },
        { "path": "/entityType/?" },
        { "path": "/tenantId/?" },
        { "path": "/ingredients/[]/name/?" }
    ],
    "excludedPaths": [{ "path": "/*" }]
}

The /? suffix means "index this specific property." The /ingredients/[]/name/? syntax indexes the name property inside each element of the ingredients array.

When to do this: After your schema and query patterns have stabilized. Don't optimize indexing policies during early development when you're still figuring out what queries you need. Do it before production, or when your RU costs on writes are higher than expected.

The embedding gotcha: If you're storing vector embeddings (Chapter 12), you absolutely want to exclude the embedding property from the regular index. A 1,536-dimension float[] is expensive to index with range operators and you never query it that way — that's what the vector index is for.

Cross-reference: Chapter 7 (cost impact), Chapter 12 (vector indexing policy).

VectorDistance Queries

The problem: You want to do semantic similarity search using vector embeddings stored on your documents.

The query:

SELECT TOP @maxResults
    c.id,
    c.name,
    c.description,
    VectorDistance(c.embedding, @queryVector) AS similarityScore
FROM c
WHERE c.entityType = 'CocktailRecipe'
AND c.tenantId = @tenantId
ORDER BY VectorDistance(c.embedding, @queryVector)

In C# with the SDK:

var queryText = @"
    SELECT TOP @maxResults
        c.id, c.name, c.description,
        VectorDistance(c.embedding, @embedding) AS similarityScore
    FROM c
    WHERE c.entityType = 'CocktailRecipe'
    AND c.tenantId = @tenantId
    ORDER BY VectorDistance(c.embedding, @embedding)";

var queryDef = new QueryDefinition(queryText)
    .WithParameter("@maxResults", maxResults)
    .WithParameter("@tenantId", tenantId)
    .WithParameter("@embedding", queryEmbedding);

The @embedding parameter is a float[] in C#. The SDK serializes it to a JSON array of numbers. You don't need to convert it to a string or do any special formatting.

Projecting out the embedding: Notice that the SELECT clause lists specific properties and does not include c.embedding. This is deliberate. The embedding array is 1,536 floats — about 12 KB of data per document. If you're returning 20 results, that's 240 KB of embedding data you don't need transferred across the wire. Project it out. Only select the properties you'll actually display or use.

No LINQ equivalent: VectorDistance is a Cosmos SQL function with no LINQ mapping. This is always a raw SQL query.

Vector indexing policy: The vector index must be configured at container creation time. You can't add it later. The container policy looks something like:

{
    "vectorIndexes": [
        {
            "path": "/embedding",
            "type": "diskANN"
        }
    ]
}

And the corresponding vector embedding policy on the container:

{
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 1536
        }
    ]
}

Plan ahead. If you think you might want vector search, configure the policy when you create the container. It's free if you don't use it. I learned this the hard way with the cocktail app — I had to recreate the container and re-import all the data when I decided to add vector search after the fact.

Cross-reference: Chapter 12 (full vector search implementation).

Data Explorer Quirks

The problem: Queries that work fine in your C# code behave strangely in the Azure portal's Data Explorer.

The pagination trap: Data Explorer shows results in pages and has a "Load more" button. But it doesn't show you the total result count upfront. If your query returns 500 documents, you'll see the first ~100 and have to click "Load more" repeatedly. This is fine for browsing but misleading if you're trying to validate result counts. Run a SELECT VALUE COUNT(1) separately if you need the actual count.

The timeout: Data Explorer has a query timeout that's shorter than what the SDK allows. Complex queries — especially cross-partition aggregates — may time out in the portal but succeed from your application. If a query fails in Data Explorer, try it from code before concluding it doesn't work.

Parameter support: Data Explorer doesn't support parameterized queries. You have to inline the values. This means you can't copy a parameterized query directly from your C# code into the portal — you need to substitute the parameter values manually first.

Partition key scope: Data Explorer lets you set the partition key for a query. With hierarchical partition keys, you need to provide all levels as a JSON array: ["COCKTAILS", "CocktailRecipe"]. If you provide only the first level, you'll get cross-partition behavior and potentially different results (or higher RU costs) than your application sees.

SELECT * behavior: SELECT * FROM c in Data Explorer returns the documents with a c wrapper: { "c": { ... } }. Use SELECT VALUE c FROM c to get the clean documents. Your SDK-based application code won't have this issue because the SDK handles the unwrapping, but it trips people up in the portal constantly.

That's the query cookbook. Bookmark it, come back when you need it, and when you find a query pattern that should be in here but isn't — well, I'm not that hard to find on the internet.

Next: Appendix B: Cosmos DB Operations Reference.

Categories: cosmos-db-for-dotnet-developers

Tags: cosmosdb azure csharp queries reference