Query Arrays in Cosmos DB

As a database architect, I often wonder, “How do we effectively query arrays in Cosmos DB?”. In this article, I’ll cover everything you need to know about querying arrays in Cosmos DB, from basic concepts to advanced methods.

Query Arrays in Cosmos DB

Before discussing the main topic, let’s discuss arrays in Cosmos DB. Arrays are ordered collections of values enclosed in square brackets.

{
  "id": "customer-2589",
  "name": "Paul Smith",
  "location": "Chicago",
  "orders": [
    { "orderId": "ORD-001", "amount": 88.95, "items": 3 },
    { "orderId": "ORD-002", "amount": 138.50, "items": 2 },
    { "orderId": "ORD-003", "amount": 36.00, "items": 1 }
  ],
  "tags": ["premium", "midwest", "frequent-buyer"]
}

In this example, we have two arrays: orders (a variety of objects) and tags (an array of strings).

Let’s explore the different methods available for querying arrays in Cosmos DB.

Method 1: Using the ARRAY_CONTAINS Function

The ARRAY_CONTAINS function is the simple way to check if an array contains a specific value.

SELECT * FROM c WHERE ARRAY_CONTAINS(c.tags, "premium")

This query returns all documents where the tags array contains the string “premium”.

For optimal performance, create an index on the array property:

{
  "indexingPolicy": {
    "includedPaths": [
      {
        "path": "/tags/*",
        "indexes": [
          {
            "kind": "Range",
            "dataType": "String",
            "precision": -1
          }
        ]
      }
    ]
  }
}

Method 2: Array Iteration Using JOIN

When you need to filter on properties within array objects, the JOIN keyword provides an excellent solution.

SELECT c.id, order
FROM c
JOIN order IN c.orders
WHERE order.amount > 100

This query returns all documents with orders exceeding $100 and the matching order details.

Method 3: ARRAY Length Operations

Sometimes, you need to filter based on the size of an array. The query below can be used for this purpose.

SELECT c.id, c.name
FROM c
WHERE ARRAY_LENGTH(c.orders) > 2

This query finds all customers who have placed more than two orders.

Method 4: Using Sub-queries with EXISTS

The EXISTS clause allows for powerful sub-queries when working with arrays.

SELECT c.id, c.name
FROM c
WHERE EXISTS(
    SELECT VALUE order
    FROM order IN c.orders
    WHERE order.amount > 200 AND order.items > 3
)

This query finds customers with at least one large, multi-item order (over $200 with more than three items).

Method 5: Array Projections with SELECT VALUE

The SELECT VALUE syntax can be used when you want to transform array data during queries.

SELECT c.id, 
    ARRAY(SELECT VALUE order.orderId FROM order IN c.orders) AS orderIds
FROM c

This query returns each customer’s ID along with a list of just their order IDs, creating a projected array.

Working with Nested Arrays

We can handle this using the query below.

SELECT c.id
FROM c
JOIN order IN c.orders
JOIN item IN order.items
WHERE item.category = "Electronics" AND item.price > 500

This multi-level join query finds customers who have purchased expensive electronics.

Performance Optimization for Array Queries

1. Proper Indexing Strategy

Array indexing is crucial in terms of performance:

{
  "indexingPolicy": {
    "includedPaths": [
      {
        "path": "/orders/*/amount",
        "indexes": [
          {
            "kind": "Range",
            "dataType": "Number",
            "precision": -1
          }
        ]
      }
    ]
  }
}

2. Query Partition Key Whenever Possible

Always include your partition key in array queries.

SELECT * FROM c 
WHERE c.location = "Chicago" AND ARRAY_CONTAINS(c.tags, "premium")

3. Use TOP for Limit Operations

You can use the query below.

SELECT TOP 10 * FROM c
JOIN order IN c.orders
WHERE order.amount > 100
ORDER BY order.amount DESC

Real-Time Example

I built a customer analytics system for a nationwide retailer that needed to identify its most valuable customers.

SELECT c.id, c.name, c.location,
    (SELECT VALUE COUNT(1) FROM order IN c.orders WHERE order.amount > 100) AS largeOrderCount,
    ARRAY(SELECT VALUE order.amount FROM order IN c.orders) AS orderAmounts,
    AVG(SELECT VALUE order.amount FROM order IN c.orders) AS averageOrderValue
FROM c
WHERE c.location IN ("Chicago", "New York", "Los Angeles")
    AND EXISTS(SELECT VALUE order FROM order IN c.orders WHERE order.amount > 250)
    AND ARRAY_LENGTH(c.orders) >= 3

The above query performs the below actions.

  • Filters customers from major cities
  • Ensures they have at least one large order (>$250)
  • Counts their orders exceeding $100
  • Lists all order amounts
  • Calculates their average order value
  • Only includes customers with 3+ orders

Conclusion

Knowing all possible ways to array queries in Cosmos DB is crucial. As mentioned in this article, you can use everything from simple ARRAY_CONTAINS functions to complex nested array operations.

Remember the key principles:

  • Choose the right array querying method for your needs
  • Structure your data to support efficient querying
  • Implement proper indexing for array properties
  • Monitor and optimize RequestUnit consumption

You may also like the following articles below

Azure Virtual Machine

DOWNLOAD FREE AZURE VIRTUAL MACHINE PDF

Download our free 25+ page Azure Virtual Machine guide and master cloud deployment today!