Welcome to part nineteen of the plansplaining series, where I will finally wrap up the discussion on temporal tables. In the first three parts, we looked at execution plans for modifying data in a temporal table, and for basic and advanced temporal queries. In this last part, we’ll shift away for execution plans (so the plansplaining tag is perhaps a bit misleading) and look at what happens if you join data from multiple temporal tables.
I once more use the same tables as in all previous post, but this time I want a bit more control over the data. So I use the syntax for converting an existing table with history data to a system-versioned temporal table. This syntax is intended for situations where your application used a homebrew mechanism, such as e.g. triggers, to track history, and you want to change to using temporal tables without losing the history. But for demo scenarios, that same mechanism is easily abused to set up a fake history.
-- Declare the tables and the history tables as regular tables CREATE TABLE dbo.Suppliers (SupplierID int NOT NULL, SupplierName varchar(50) NOT NULL, ValidSince datetime2(7) NOT NULL, ValidUntil datetime2(7) NOT NULL, CONSTRAINT PK_Suppliers PRIMARY KEY (SupplierID)); CREATE TABLE dbo.Products (ProductCode char(10) NOT NULL, ProductName varchar(50) NOT NULL, SupplierID int NULL, ValidSince datetime2(7) NOT NULL, ValidUntil datetime2(7) NOT NULL, CONSTRAINT PK_Products PRIMARY KEY (ProductCode), CONSTRAINT FK_Products_Suppliers FOREIGN KEY (SupplierID) REFERENCES dbo.Suppliers (SupplierID)); CREATE TABLE dbo.SuppliersHistory (SupplierID int NOT NULL, SupplierName varchar(50) NOT NULL, ValidSince datetime2(7) NOT NULL, ValidUntil datetime2(7) NOT NULL); CREATE TABLE dbo.ProductsHistory (ProductCode char(10) NOT NULL, ProductName varchar(50) NOT NULL, SupplierID int NULL, ValidSince datetime2(7) NOT NULL, ValidUntil datetime2(7) NOT NULL); GO -- Insert some sample data (with valid since and valid until timestamps!!! DECLARE @EndOfTime datetime2(7) = '9999-12-31T23:59:59.9999999'; INSERT dbo.Suppliers (SupplierID, SupplierName, ValidSince, ValidUntil) VALUES (1, 'Supplier 1', '20210601', @EndOfTime), (2, 'Supplier 2', '20210602', @EndOfTime), (3, 'Supplier 3', '20210603', @EndOfTime); INSERT dbo.Products (ProductCode, ProductName, SupplierID, ValidSince, ValidUntil) VALUES ('Prod 1', 'Prod 1', 1, '20210601', @EndOfTime), ('Prod 2', 'Prod 2', 2, '20210602', @EndOfTime), ('Prod 3', 'Prod 3', 3, '20210603', @EndOfTime), ('Prod 4', 'Prod 4', 2, '20210604', @EndOfTime), ('Prod 5', 'Prod 5', 3, '20210605', @EndOfTime), ('Prod 6', 'Prod 6', 1, '20210606', @EndOfTime); GO -- Insert sample history data too INSERT dbo.SuppliersHistory (SupplierID, SupplierName, ValidSince, ValidUntil) VALUES (2, 'Sup 2 old', '20210601', '20210602'), (4, 'Sup 4 del', '20210602', '20210605'); INSERT INTO dbo.ProductsHistory (ProductCode, ProductName, SupplierID, ValidSince, ValidUntil) VALUES ('Prod 4', 'Old name 4', 1, '20210601', '20210603'), ('Prod 4', 'Old name 4', 2, '20210603', '20210604'), ('Prod 6', 'Old name 6', 2, '20210603', '20210606'); GO -- Now connect the history table as system-versioned history tables ALTER TABLE dbo.Suppliers ADD PERIOD FOR SYSTEM_TIME(ValidSince, ValidUntil); ALTER TABLE dbo.Suppliers SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SuppliersHistory)); ALTER TABLE dbo.Products ADD PERIOD FOR SYSTEM_TIME(ValidSince, ValidUntil); ALTER TABLE dbo.Products SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductsHistory)); GO
The code is quite long already, and it would have been even longer if I had bothered to add indexes (which of course is incredibly important in real scenarios!!).
Point in time joins
Of course, it will be a quite common requirement to query products and their suppliers. When querying the present, you can just access the Suppliers and Products table without the “FOR SYSTEM_TIME” keyword, and the execution plan will unsurprisingly show that a regular join of the two “current” tables is used, with no reference to the history table. Nothing special. We already saw in the earlier parts that querying the present simply ignores the history table; joining does not make that different.
But what if, for instance, we want to show all data as it was valid on June 3 at noon? Well, that is also simple. We already know that we can use FOR SYSTEM_TIME AS OF” to get the rows from a single temporal table as they were at a specific time. We can use that syntax for both tables, to get the data we need:
SELECT p.ProductCode, p.ProductName, s.SupplierID, s.SupplierName FROM dbo.Products FOR SYSTEM_TIME AS OF '2021-06-03 12:00' AS p INNER JOIN dbo.Suppliers FOR SYSTEM_TIME AS OF '2021-06-03 12:00' AS s ON s.SupplierID = p.SupplierID ORDER BY p.ProductCode;
The result of this query show us the expected point in time snapshot:
If you take the time to manually check the history of all products and suppliers by reading the data I inserted in the history tables, you will see that this does indeed represent all product and matching supplier data as of June 3.
Checking the syntax, you may spot a weak point. I had to supply the as-of date and time twice. And I need to make sure that I specify the exact same date and time in both cases. The next query illustrates this:
SELECT p.ProductCode, p.ProductName, s.SupplierID, s.SupplierName FROM dbo.Products FOR SYSTEM_TIME AS OF '2021-06-03 12:00' AS p INNER JOIN dbo.Suppliers FOR SYSTEM_TIME AS OF '2021-06-01 12:00' AS s ON s.SupplierID = p.SupplierID ORDER BY p.ProductCode;
Note that I “accidentally” specified the wrong date for the Suppliers table. This causes some weird results to be returned. For instance, product Prod 6 was first added Jun 3, after the name change of supplier 2, yet it is listed here as being supplied by “Sup 2 old”, which was never the case. And even worse is that product Prod 3 is completely missing. It was a valid product on June 3, supplied by supplier 3. But supplier 3 did not exist on June 1, so the inner join removes this product completely!
This risk of incorrect data is of course not acceptable. One way to prevent this is to declare a variable for the as of date and time and use that, as shown below:
DECLARE @AsOf datetime2 = '2021-06-03 12:00'; SELECT p.ProductCode, p.ProductName, s.SupplierID, s.SupplierName FROM dbo.Products FOR SYSTEM_TIME AS OF @AsOf AS p INNER JOIN dbo.Suppliers FOR SYSTEM_TIME AS OF @AsOf AS s ON s.SupplierID = p.SupplierID ORDER BY p.ProductCode;
However, there is an even easier option, and that is to use a view.
Temporal support in views
When the product and supplier data is often queried together, it makes sense to create a view for easier access, for instance like this.
CREATE VIEW dbo.ProductAndSuppliers AS SELECT p.ProductCode, p.ProductName, s.SupplierID, s.SupplierName FROM dbo.Products AS p INNER JOIN dbo.Suppliers AS s ON s.SupplierID = p.SupplierID;
As you see, this is completely the normal view syntax. And you can also use this view the normal way to query the present data. But because there are temporal tables used in the view, we can now also apply the FOR SYSTEM_TIME syntax to this view in any query to do our time travel:
SELECT ProductCode, ProductName, SupplierID, SupplierName FROM dbo.ProductAndSuppliers FOR SYSTEM_TIME AS OF '2021-06-03 12:00' ORDER BY ProductCode;
Here we specify the as-of date and time only once, so it’s impossible to accidentally have inconsistency in the query. SQL Server will automatically apply the specification to all tables used by the view, and return the correct results:
Also note that if a query uses a combination of temporal tables and regular tables, SQL Server will automatically apply the FOR SYSTEM_TIME specification to the temporal tables only; you will not get an error due to attempting to apply the specification to the regular tables.
So, all of this works pretty neat. Only good news, right?
Well, let’s just say that if you want to believe that, you better stop reading here. If you want to be aware of how horribly wrong this can go in some other cases, keep reading.
Joins over time periods
Now let’s see what happens if we use this same view to look at our products and suppliers over a period of time. Let’s say, June 1 noon until June 4 noon. And let’s in this case look at a single product only.
SELECT ProductCode, ProductName, SupplierID, SupplierName FROM dbo.ProductAndSuppliers FOR SYSTEM_TIME BETWEEN '2021-06-01 12:00' AND '2021-06-04 12:00' WHERE ProductCode = 'Prod 4';
Here is the output returned by this query (order of rows may differ between executions):
The biggest issue I have with these results is that they are simply, clearly, and totally wrong. We get data back that should never have been returned. This product was first supplied by supplier 1, but we changed to supplier 2 on June 3; at that time that supplier had already changed their name (that happened on June 2). So we never acquired this product from a supplier named “Sup 2 old”. And yet the data returned suggests we did. Rows number 2 and 5 in the screenshot above are simply a falsification of history.
The reason this happens is actually quite mundane. As explained above, SQL Server simply applies the SYSTEM_TIME specification to each temporal table used in the query. So the query on the view is interpreted as if we executed this:
SELECT p.ProductCode, p.ProductName, s.SupplierID, s.SupplierName FROM dbo.Products FOR SYSTEM_TIME BETWEEN '2021-06-01 12:00' AND '2021-06-04 12:00' AS p INNER JOIN dbo.Suppliers FOR SYSTEM_TIME BETWEEN '2021-06-01 12:00' AND '2021-06-04 12:00' AS s ON s.SupplierID = p.SupplierID WHERE p.ProductCode = 'Prod 4';
And SQL Server interprets this quite literally. It takes all rows from Products that were valid in the specified period, all rows from Suppliers that were valid in the specified period, and then simply joins them on equality of the SupplierID. Without looking at the ValidSince and ValidUntil dates. After all, we never asked in our query to take that into account. And it is apparently too much to ask of SQL Server to understand that joining temporal tables should take into account whether rows were valid at the same time.
In the case of the full query without the view, I guess one might be able to argue this as to be expected behavior. In the case of the view that we used before, I think this point falls apart at the seams. Especially because Microsoft’s documentation actively encourages people to use views for AS OF queries, without including a big warning not to use those same views for other types of temporal queries!
Another problem with the data above, less important but still relevant, is that even for the rows that were actually valid at one time, we don’t know when they were valid.
Let’s see if we can fix both issues.
Can we fix it?
As a starting point, let’s add a few columns to the view so we get a bit more insight in what the issue exactly is and how we might fix it. Note that for this blog post, I cast the ValidSince and ValidUntil columns as date, to make the data in the screenshots easier on the eyes. I can do that because in my demo data all changes were at midnight anyway. In real scenarios, you should of course always return the full precision of the valid since and valid until columns!
ALTER VIEW dbo.ProductAndSuppliers AS SELECT p.ProductCode, p.ProductName, s.SupplierID, s.SupplierName, -- Cast as date for blog purposes only!! CAST(p.ValidSince AS date) AS pSince, CAST(p.ValidUntil AS date) AS pUntil, CAST(s.ValidSince AS date) AS sSince, CAST(s.ValidUntil AS date) AS sUntil FROM dbo.Products AS p INNER JOIN dbo.Suppliers AS s ON s.SupplierID = p.SupplierID;
Let’s now query the view again, again for the same period of time, but now with the extra columns returned as well.
SELECT ProductCode, ProductName, SupplierID, SupplierName, pSince, pUntil, sSince, sUntil FROM dbo.ProductAndSuppliers FOR SYSTEM_TIME BETWEEN '2021-06-01 12:00' AND '2021-06-04 12:00' WHERE ProductCode = 'Prod 4';
At this point we could filter out the invalid rows. Every row where there is no overlap in the validity of the product (pSince, pUntil) and that of the supplier (sSince, sUntil) was never actually a valid combination and should be filtered out.
For the rows that remain, we can deduce when that combination was valid by looking at the overlap of the two periods. For example, the fourth row above shows supplier data that is valid from June 2 until eternity, but combined with product data that was only valid from June 3 to June 4. So that combination was also only valid from June 3 to June 4.
Now that we know this, we can of course add extra conditions to the WHERE clause to filter out the invalid rows. And we can add the right expressions in the SELECT list to compute the period a specific combination of supplier and product row was valid. But there is a better way.
Yes, we can!
Whenever we use this view, for whatever reason, I am pretty sure that nobody is ever interested in receiving data that never actually existed. And I am also pretty sure that if we need to know when a row is valid, we need that logic to determine when that exact combination of data was valid. So instead of adding all that logic to every query that uses the view, it makes much more sense to do it in the view itself.
To determine whether two periods overlap, we need to put our thinking cap on. I’ve seen some pretty ugly code used for this purpose. And some of that code was written by myself, in a shady past I prefer not to be reminded of. It turns out, sometimes a double negation works best. The easiest way to check if two periods overlap is to actually check that they do not not overlap.
There is no overlap between two periods if the first one ends before the other starts or starts after the other ends. So for the example in this blog, the rows we do NOT want are those where (pUntil <= sSince OR pSince >= sUntil). Using Boolean negation, this means that the rows we DO want to return are those where (pUntil > sSince AND pSince < sUntil). At first sight this looks weird and wrong to many people, but trust me: it is the correct (and simplest) way to find overlapping intervals.
The period that a specific combination of supplier and product is valid is simpler. It starts at the latest of the two start moments, and ends at the earliest of the two end moments.
All of that combines results in this improved view for combined product and supplier data.
ALTER VIEW dbo.ProductAndSuppliers AS SELECT p.ProductCode, p.ProductName, s.SupplierID, s.SupplierName, CASE WHEN p.ValidSince > s.ValidSince THEN p.ValidSince ELSE s.ValidSince END AS ValidSince, CASE WHEN p.ValidUntil < s.ValidUntil THEN p.ValidUntil ELSE s.ValidUntil END AS ValidUntil FROM dbo.Products AS p INNER JOIN dbo.Suppliers AS s ON s.SupplierID = p.SupplierID WHERE p.ValidUntil > s.ValidSince AND p.ValidSince < s.ValidUntil;
And now we can query this view in any way we want. With any temporal specification we want. The logic of the view itself guarantees that only valid combinations are returned, and with a ValidSince and ValidUntil value that describes when that specific combination was valid.
SELECT ProductCode, ProductName, SupplierID, SupplierName, -- Cast as date for blog purposes only!! CAST(ValidSince AS date) AS ValidSince, CAST(ValidUntil AS date) AS ValidUntil FROM dbo.ProductAndSuppliers FOR SYSTEM_TIME BETWEEN '2021-06-01 12:00' AND '2021-06-04 12:00' WHERE ProductCode = 'Prod 4';
Once more, the cast to date is here simply to keep the results easier to read, and should not be used in real production code!!!
The beauty of this is that with all the extra logic, the view is still perfectly valid for queries on the current table only (without FOR SYSTEM_TIME specification), and for queries with an AS OF specification for one specific point in time. The extra logic may not be needed for those queries, but it does not invalidate the results in any way.
The entire example above is for just a single join of two tables. You can (and probably should!!) apply this pattern to all views you have that join two temporal tables.
But real life tends to get messy and complex. Often you need to join more tables than two. If they are not temporal tables, no problem; but to expand the logic above to three temporal tables gets messy, and if you add even more tables it gets very dicey, very fast!
Also, not exactly the same but at least a similar logic needs to be applied in other cases. What about a query that has a subquery in its SELECT list? What about a query where you use EXISTS? Or APPLY? Or any other query construction that uses a subquery where a temporal table is used. What if your system is very flexible, joins are generated on the fly (or written by data scientists) and you cannot possibly create safe views to use for every possible query that might every be used?
The bottom line is, there are no easy answers. However, you do need to take necessary actions to prevent incorrect data being returned to your users, because SQL Server by itself does not even attempt to prevent errors.
When you submit a query with more than system-versioned temporal table, you can apply a FOR SYSTEM_TIME qualifier to each of them. In realistic scenarios, you want that to be the same, and SQL Server makes it easy for you by allowing you to create a view; if you then apply FOR SYSTEM_TIME to that view, it will be automatically be applied to all temporal tables used in that view.
While this works great for the AS OF version of FOR SYSTEM_TIME, that returns the data for one specific moment in time only, it fails spectacularly with all of the other FOR SYSTEM_TIME qualifications. Each of those describes a period, which means more than one version of a row can be returned. If that happens for multiple tables, that are then combined in the query or view with e.g. JOIN, APPLY, or subqueries, then SQL Server does not use any logic to verify whether the rows it combines are even valid at the same time.
The code in this blog shows how you can create a view for a two-table join that includes the logic to only join rows when they are valid at the same time, and set the correct period of validity for the combined data. The good news is that this fixes the problems for FOR SYSTEM_TIME specifications that span a period without invalidating the results for the AS OF version or a query without any FOR SYSTEM_TIME qualifier.
The same logic can, with determination and care, be used to expand to joins on more tables. Subqueries and APPLY can be a bit tricker to handle. But you will still need to do it, or set and communicate a very clear and hard rule that FOR SYSTEM_TIME can only be used with AS OF and not in any other way. The alternative, incorrect data, is simply not acceptable.
[…] Hugo Kornelis wraps up a discussion of temporal tables with miscellany: […]
We are considering to replace history handling in our DW, with Temporal Tables.
This was very usefull information to take into consideration.