Skip to main content

SQL Find Missing Values in a Sequence

Finding missing values in a sequence is crucial for maintaining data integrity, ensuring analytical accuracy, supporting informed decision-making, and contributing to the overall reliability and quality of data-driven systems.

In this post, we will demonstrate how to use Recursive CTE to construct a virtual RecursiveSequence table and then use LEFT JOIN technique to find missing values within a sequence.

tip

You can use this SQL Server Fiddle to follow along and practice finding missing values in a sequence

SQL Server DB Fiddle: Find Missing Values in Sequence

Orders Data

We create a table to store orders data, inserts orders, and then verifies the inserted data by retrieving all rows from the table.

Orders Table
-- Create a sample Orders table
CREATE TABLE Orders (
OrderID INT PRIMARY KEY
);

-- Insert some sample records with missing values
INSERT INTO Orders (OrderID) VALUES (1001), (1002), (1004), (1006), (1008);

SELECT * FROM Orders;
OrderID
1001
1002
1004
1006
1008

Find Missing Values in a Sequence

Find Missing Values in a Sequence
-- Use a recursive common table expression (CTE) to find missing values in the sequence
WITH RecursiveSequence
AS (SELECT MIN(OrderID) AS OrderID,
MIN(OrderID) AS StartRange,
MAX(OrderID) AS EndRange
FROM Orders
UNION ALL
SELECT OrderID + 1,
StartRange,
EndRange
FROM RecursiveSequence
WHERE OrderID < EndRange)
-- Select the missing values
SELECT rs.OrderID AS MissingOrderID
FROM RecursiveSequence rs
LEFT JOIN Orders o
ON rs.OrderID = o.OrderID
WHERE o.OrderID IS NULL
-- Set MAXRECURSION to 0 to avoid recursion limit
OPTION (MAXRECURSION 0);

The SQL query detects there are 3 missing OrderID values in the sequence.

MissingOrderID
1003
1005
1007

Let's walk through the provided T-SQL code step by step:

  1. Create a recursive common table expression (CTE):

    WITH RecursiveSequence
    AS (SELECT MIN(OrderID) AS OrderID,
    MIN(OrderID) AS StartRange,
    MAX(OrderID) AS EndRange
    FROM Orders
    UNION ALL
    SELECT OrderID + 1,
    StartRange,
    EndRange
    FROM RecursiveSequence
    WHERE OrderID < EndRange)
    • The RecursiveSequence CTE is created using a recursive structure.
    • The first part (before UNION ALL) selects the minimum and maximum order IDs from the Orders table. This serves as the initial range.
    • The second part (after UNION ALL) recursively selects the next order ID (OrderID + 1) while maintaining the same start and end range. The recursion continues until the OrderID reaches the EndRange.

The RecursiveSequence CTE contains all OrderID values between 1001 and 1008.

OrderIDStartRangeEndRange
100110011008
100210011008
100310011008
100410011008
100510011008
100610011008
100710011008
100810011008
  1. Select the missing values:

    SELECT rs.OrderID AS MissingOrderID
    FROM RecursiveSequence rs
    LEFT JOIN Orders o
    ON rs.OrderID = o.OrderID
    WHERE o.OrderID IS NULL
    OPTION (MAXRECURSION 0);
    • The main query selects the OrderID from the RecursiveSequence CTE as MissingOrderID.
    • It performs a LEFT JOIN with the Orders table on the OrderID column. The LEFT JOIN is used to include all values from the RecursiveSequence even if there is no match in the Orders table.
    • The WHERE clause filters out the rows where there is a match (o.OrderID IS NULL), indicating that the order ID is missing in the original Orders table.
  2. OPTION (MAXRECURSION 0):

    • This option is used to set the maximum recursion level to 0, effectively removing any limit on the number of recursions. It is added to avoid reaching the default recursion limit in case the sequence is large. Adjust this value based on your specific needs and the range of order numbers in your dataset.

Conclusion

In summary, the code recursively generates a sequence of order IDs and then identifies the missing order IDs by comparing them with the actual data in the Orders table. The final result includes the order IDs that are in the generated sequence but not in the original Orders table.