Plansplaining, part 16. Temporal tables (part 1)

3 Comments

Welcome to part sixteen of the plansplaining series. The first of a few posts about how temporal tables affect execution plans. In this post, I’ll build on the last four posts on data modifications, building on the more generic discussion of data modification in the previous four posts. Later posts will look at data retrieval and some specific scenarios.

If you have never heard of temporal tables, I suggest reading Louis Davidson’s five part series about this feature. It seems some of the formatting has been lost in time, but the content is still good and valid. Part 1 Part 2 Part 3 Part 4 Part 5

Sample tables

I decided to stick with the same scenario I used in the posts on data modifications. But of course, I have to add some columns and specify some options to ensure we’re dealing with temporal tables. Also, to not bog you down with too much other stuff, I have removed the self-correcting options on the foreign key.

CREATE TABLE dbo.Suppliers
    (SupplierID   int          NOT NULL,
     SupplierName varchar(50)  NOT NULL,
     ValidSince   datetime2(7) GENERATED ALWAYS AS ROW START NOT NULL,
     ValidUntil   datetime2(7) GENERATED ALWAYS AS ROW END NOT NULL,
     PERIOD FOR SYSTEM_TIME(ValidSince, ValidUntil),
     CONSTRAINT PK_Suppliers
         PRIMARY KEY (SupplierID))
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SuppliersHistory));

CREATE TABLE dbo.Products
    (ProductCode char(10)     NOT NULL,
     ProductName varchar(50)  NOT NULL,
     SupplierID  int          NULL,
     ValidSince  datetime2(7) GENERATED ALWAYS AS ROW START NOT NULL,
     ValidUntil  datetime2(7) GENERATED ALWAYS AS ROW END NOT NULL,
     PERIOD FOR SYSTEM_TIME(ValidSince, ValidUntil),
     CONSTRAINT PK_Products
         PRIMARY KEY (ProductCode),
     CONSTRAINT FK_Products_Suppliers
         FOREIGN KEY (SupplierID)
         REFERENCES dbo.Suppliers (SupplierID),
     INDEX ix_Products_SupplierID NONCLUSTERED (SupplierID))
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductsHistory));

I have not tried to create a migration script, it’s all simple demo data anyway. If you still have the tables from my last blog, just remove them. Or create a small new database for just the purpose of following along.

Inserting data

Time to put some data in these tables, and see how the fact that they are temporal tables affects the execution plan.

INSERT … VALUES

For the Suppliers table, I will use a simple INSERT … VALUES statement:

INSERT dbo.Suppliers (SupplierID,
                      SupplierName)
VALUES (1, 'Supplier 1'),
       (2, 'Supplier 2'),
       (3, 'Supplier 3'),
       (4, 'Supplier 4'),
       (5, 'Supplier 5'),
       (6, 'Supplier 6');

Here is the execution plan plus run-time statistics for this statement:


Based on the screenshot alone, you’ll see no difference if you ran the exact same insert statement against a “normal” (not temporal) table. To see the differences, you’ll have to look in the properties. And even there, you won’t see many surprises.

The Constant Scan on the far right is where the data is produced. If you look at the properties, you will see in the Output List property that it returns rows with two columns, named Union1002 and Union1003. The Values property lists six rows of two values each that will be returned. Because the query qualifies for simple parameterization, these values are not listed as 1, ‘Supplier 1’; 2, ‘Supplier 2’; etc. but as @1, @2; @3, @4; etc., but the effect is still the same. This is expected, and not related to the use of a temporal table.

The Compute Scalar defines three new expressions, and their values can be found in the Defined Values property, as shown here. The first one, Expr1004, simply converts Union1003 (which holds the value to be inserted into the SupplierName column) from whatever data type was inferred from the constant list to the column’s data type in the table. This, too, is perfectly normal. However, the two other columns listed here show the first logic related to the use of a temporal table. Expr1005 is computed using an internal function called systrandatetime, which returns the start time (in UTC and as the datetime2(7) data type) of the current transaction; which is the expected value for the ValidSince column. And Expr1006 uses another internal function, sysmaxdatetime(7), to return the value “9999-12-31 23:59:59.9999999”, which is expected for ValidUntil.

The Clustered Index Insert simply inserts a row with the values in Union1002, Expr1004, Expr1005, and Expr1005. If you look at the properties of the operator, you can see the mapping of these columns to the columns in the table by expanding the Predicate property. Here you will also see that the RaiseIfNullInsert internal function is used to ensure a run-time error if any of these values is NULL. This is of course impossible for the ValidSince and ValidUntil functions, because they have just been computed in the Compute Scalar operator. The use of the RaiseIfNullInsert function is always used when inserting in a column that doesn’t allow NULL values. My guess is that setting up logic in the optimizer to remove this call when possible would be rather expensive and possibly error prone; the function itself is likely incredibly cheap and fast so it’s better to just leave it in.

INSERT … SELECT

I also want to check the execution plan for an insert that is not based on a simple VALUES specification, but on another query. So for the Products table, I insert sample data in a temporary table first, then insert the data from there into the actual Products table:

CREATE TABLE #Products
    (ProductCode char(10)    NOT NULL PRIMARY KEY,
     ProductName varchar(50) NOT NULL,
     SupplierID  int         NOT NULL);

INSERT #Products (ProductCode,
                  ProductName,
                  SupplierID)
VALUES ('Prod 1', 'Product number 1', 1),
       ('Prod 2', 'Product number 2', 2),
       ('Prod 3', 'Product number 3', 3),
       ('Prod 4', 'Product number 4', 4),
       ('Prod 5', 'Product number 5', 5),
       ('Prod 6', 'Product number 6', 1),
       ('Prod 7', 'Product number 7', 2),
       ('Prod 8', 'Product number 8', 3),
       ('Prod 9', 'Product number 9', 4);

INSERT dbo.Products (ProductCode,
                     ProductName,
                     SupplierID)
SELECT ProductCode,
       ProductName,
       SupplierID
FROM   #Products;

This query uses two execution plans: one for the insert into the temporary table, which is quite uninteresting. The second one represents the execution plan for a query that fetches data from some source and inserts that into a temporal table:


This execution plan looks much more complicated than the one shown before. But if you read my earlier post on inserting data in “normal” (non-temporal) tables, you will quickly recognize the standard pattern for verifying the foreign key constraint.

If you compare the execution plan from that earlier post to this one, checking all properties of all operators in detail, you’ll quickly find that there’s only one difference. And that is that in this execution plan, like the one above for inserting in the Suppliers table, the Compute Scalar operator uses the internal functions systrandatetime and sysmaxdatetime to set the correct values for the ValidSince and ValidUntil columns.

Updating data

Temporal tables become more interesting when data changes. That’s when history needs to be tracked, so let’s look at a simple UPDATE to see how exactly this works.

UPDATE dbo.Products
SET    ProductName = 'New name for ' + ProductName
WHERE  ProductCode = 'Prod 4';

We’re going to rename just a single product, based on the primary key. No foreign key checks are needed. This is really the most basic form of update I can think of for my sample setup.


So this is the execution plan plus run-time statistics for the above statement. Let’s look at it in detail, following the flow of data (right to left).

Updating the actual data

The Clustered Index Update on the far right is where the Products table is updated. A Clustered Index Update operator usually takes rows from a child operator. The exception is when the update statement uses a WHERE clause on the (leading) column(s) of the clustered index, and the rest of the statement is simple enough that no other operators are needed to compute the new values. The statement above does meet these rules, which is why in this case the Clustered Index Update operator is on the far right and has no child operator.

If a Clustered Index Update has no child operator, it needs to have a Seek Predicates property. You can see in the screenshot on the right that this is indeed the case. You can also see that this plan used simple parameterization again, that’s why the Seek Predicates property uses @2 instead of ‘Prod 4’.

More interesting is the Predicate property, where the operator specifies what columns to change, and what new values to use for this columns. Based on the query, I expect the ProductName column to be listed, and it is; the operator will change it from whatever value it had to “RaiseIfNullUpdate([Expr1005])”. The RaiseIfNullUpdate is an internal function, very similar to RaiseIfNullInsert but used when updating data. I’ll admit that I have no clue why there are even two separate versions of a function that simply throws an error if its input is NULL and passes all other input unchanged. The input to this function in Expr1005. Normally I would expect a column with that name to be produced in a Compute Scalar operator. Not in this case, though. We’ll see where this comes from shortly.

There’s also a second column listed here. This may be surprising when looking at the query only, after all, the UPDATE specifies one column only. But because this is a temporal table, it does make sense that extra work is needed. And we see that, indeed, one of the temporal-specific columns is affected. ValidSince is updated to Expr1006, also with the standard NULL protection.

Good. Time to look into those two Expr columns. There is no Compute Scalar on the input of the Clustered Index Update. In fact, it has no input at all. The only operator where this data can come from is the operator itself. And indeed, if you go to the full properties list you will see a Defined Values property, shown in the screenshot on the left. The formula used for Expr1005 is simply taken directly from the query, but parameterized and converted to the correct datatype. For Expr1006 we once more see the systrandatetime internal function.

So piecing this all together, we see that the combined effect of all these properties of the Clustered Index Update is to make it find all rows that match the WHERE clause (in this case just one because it’s equality on a unique column, but the same basic plan shape can be used for queries with a WHERE clause that matches more than one row). For these rows, the new value of ProductName is computed and set, and the ValidSince value is also changed to the current date and time. That last part is important for temporal tables, since we need to know that this data was not valid before that moment. But we also need to track the value that was valid at that time.

Tracking history

That’s where one more property of the Clustered Index Update comes in: the Output List. The primary function of a Clustered Index Update operator is updating rows, not returning them. But it’s still an operator. Like all operators, it is bound to the same contract: when called (through the GetNext() method), it returns a row after doing its job. And in this case, we see in the Output List property that the row returned will include four columns: ProductCode, SupplierID, ProductName_OLD, and ValidSince_OLD. The first two are simply taken from the affected row. When a Clustered Index Update has columns from its target table in the Output List, it always returns them with the values after the update. That is not relevant here because the columns mentioned are not affected by the update; but in other cases it may be relevant to know this.

The other two columns are not directly from a table. They have names similar to the columns in the table, but the database, schema, and table are not included in their names. That proves that the columns are internal. And the suffix “_OLD” added to the column names betrays that these columns are intended to hold the original values of these columns, as they were before the update.

So the rows returned by the Clustered Index Update effectively include the (unchanged) ProductCode and SupplierId, plus the (original) ProductName and ValidSince values. That is almost the entire original row; only ValidUntil is missing.

These rows are then passed through a Compute Scalar operator. This operator merely adds one extra column, Expr1009, and sets it equal to the systrandatetime internal function. Not very interesting, until we know what this column is used for.

To figure that out, we move to the next operator, a Clustered Index Insert. The target of this operator, as found in the Object property, is ProductsHistory, the history table for the Products temporal table. To see what exactly gets inserted, we can check the full properties, scroll to the Predicate property, then expand to find a ScalarOperartor sub-property underneath it. None of this is visible in the popup, unfortunately, but I have included a screenshot of this property from the full property list on the right.

As you can see, four of the columns in the history table (ProductCode, ProductName, SupplierID, and ValidSince) are taken from the unchanged or original values of the row we just updated. The last one, ValidUntil, is set to the current date and time just computed in the Compute Scalar.

The net effect of an update in a temporal table should be to invalidate the current row (by setting its ValidUntil to the current date and time and move it to the history table), then insert a new row that is valid since the current date and time and has the new values. If you tracked the steps detailed above, you’ll see that the actual way SQL Server executes this has the same effect, but in a different order. The existing row is modified to become what we think of as the new row, and a new row is inserted in the history table that looks like what we conceptually expect in the original row after the update invalidates it.

Trust, but verify

There is one more operator left in the execution plan. It’s an Assert, which means that if its condition is not met, it aborts the update, rolls back the transaction, and returns an error message. The original query does nothing that immediately explains why an Assert should be needed. So let’s look at its properties to try to understand why it’s here.

As you can see in the screenshot, the Predicate property of this Assert compares the ValidSince and ValidUntil values that it receives. These values were returned by the Clustered Index Insert operator, so they represent the values of these two columns in the row that was just added to the history table.

The Predicate expression makes it clear that this Assert will raise an error condition if, for whatever reason, the row just inserted in the history table has an invalid combination of ValidSince and ValidUntil. It does make sense, of course, that no row can ever have a ValidSince that is later than ValidUntil. If you look in Object Explorer, or query the catalog views, you will not see an explicit constraint to enforce this; it is still there though, as an implied constraint that applies to every table marked as a temporal table. And this Assert operator verifies this constraint. Granted, I would not know any way to get this to trigger, unless the existing data already violates the rules. But SQL Server takes no risks. It checks anyway.

Deleting data

Let’s complete the picture by looking at a removal. It seems Supplier 6 sells none of the products we use, let’s get rid of this row.

DELETE dbo.Suppliers
WHERE  SupplierName = 'Supplier 6';

Again, a very simple example. And yet the execution plan is a lot more complex than most people would expect for such a simple statement.


Note that, as usual, I have opted to run the query and collect the execution plan plus run-time statistics. It’s of course also possible the request the execution plan only if you prefer not to actually execute the statement.

Looks familiar

The execution plan has a lot of operators doing lots of stuff. But most of the things going on are not new; we have seen them already in the general posts about data modification in execution plans. So I’ll just quickly go over these and only pay more attention when operators have actions specifically because this table is a temporal table.

On the far right, we see a Clustered Index Scan that is required to find all rows that match the SupplierName in the WHERE clause, needed because there is no index on this column. It passes just a single column, the SupplierID, because that is the clustered index key value needed for the Clustered Index Delete to work.

Once the row is actually deleted, the same SupplierID value is passed to the Nested Loops operator, along with some other columns (more on that later). The Nested Loops does a Left Semi Join into an Index Seek to verify that there are no products that reference the supplier that has been deleted. The Assert operator to the left of the Nested Loops aborts execution and forces rollback and an error message if this is the case.

All of these actions are basically the same as what we saw when we deleted from similar tables that are not temporal, as described here.

Temporal logic

There’s also a lot of stuff in the execution plan that we did not see before, and that is present in this plan because the table affected is defined as a temporal table. A very subtle but still relevant change is that the Output List property of the Clustered Index Delete specifies that in this case not only the SupplierID of the deleted row needs to be returned, but also the SupplierName and ValidSince values from the same row. The other operators this row flows through all pass that data unchanged to their parent.

When a row is deleted and it does not cause a violation of the foreign key, the Assert passes a row with three values, SupplierID, SupplierName, and ValidSince, to its parent: a Compute Scalar. This Compute Scalar adds one extra column to the row, called Expr1007, and computed using the internal function systrandatetime. We’ve seen this before in the execution plan for the update, as the method used in the execution plan to determine the value for the ValidUntil column in the history table. Here, it is used for that exact same purpose.

The parent of this operator is a Clustered Index Insert, targeting the SuppliersHistory table. Its Predicate property confirms what we suspect by now, that it adds a row with the SupplierID, SupplierName, and ValidSince values taken from the row that was just deleted, and with ValidUntil set to the just computed value in Expr1007. At this point, the entire change is completed. The row to be deleted has indeed been removed from the “real” table, and a copy of it, with ValidUntil set to the date and time of the transaction, has been added to the history table, as required for a temporal table. Our work is done.

But not for SQL Server. The Clustered Index Insert returns a row with only two columns, ValidSince and ValidUntil. Just as for the update scenario, SQL Server wants to make sure it didn’t accidentally add a row in the history with ValidUntil before ValidSince, so it uses an Assert operator to verify this.

Interesting is that in this case the Assert operator does not compare the ValidSince and ValidUntil values directly; there is a Computer Scalar operator in between that does this comparison and returns a single column Expr1012 with a boolean value in it, and the Assert operator simply triggers on this column. Even more interesting is that if you change the query to select the row(s) to be deleted based on SupplierID rather than SupplierName, there is no Compute Scalar between the Clustered Index Insert and the Assert, and the Assert uses a Predicate property that compares ValidSince to ValidUntil directly, similar to how this check is done in the update plan above.

It’s interesting to see the two different methods of verifying the hidden internal constraint. But do keep in mind that the difference is visual only. The execution is still exactly the same, because the Compute Scalar in this execution plan uses deferred evaluation. This means it doesn’t actually run; it simply is a container for expressions that are referenced and evaluated elsewhere. So even though there is an explicit Compute Scalar in the execution plan for this query, the actual execution returns rows from the Clustered Index Insert directly to the Assert operator, and the Assert invokes the expression defined in the Compute Scalar when it needs to use the value of Expr1012.

Conclusion

When you modify data in a temporal table, there’s always at least some extra work necessary to ensure that the history of the data is properly maintained. For insert statements, that work is fairly minimal. An insert statement never affects the history table; the only change is that the execution plan needs to compute the values to be inserted in the columns used for the valid period. These are extremely simple computations, so you’re likely to not find any measurable overhead for this.

For updates and deletes, more overhead is added. Logically speaking, the existing row is in both cases marked with a ValidUntil and then moved to the history table, and in the case of an update then replaced in the current table. Physically the process is a bit different; the update or delete is carried out as normal (after computing the correct ValidSince and ValidUntil values in the case of an update); then the data of the old (updated or deleted) row is passed to other operators that do some checking and then insert a row with that old data in the history table.

The examples here were fairly basic. If you want to see some really interesting plans, then change the foreign keys to be self-correcting, so that an update or delete in the parent table affects the child table too. Since both are temporal tables, this will give you a really lovely execution plan that works out what to change and then makes those changes in the parent table, the child table, as well as the two associated history tables. And if that is not yet complex enough, then try a MERGE statement and try to analyze that execution plan.

But that’s homework for you. Using the basics explained in several previous plansplaining posts, you should be able to work through all elements in that execution plan.

In the meantime, I’ll prepare the next part of this series, where I’ll look at how specific options for retrieval from temporal tables affect the resulting execution plans, and how we can try to optimize for these by using proper indexing.

Do you also have a topic you would like to see me cover? Let me know, and I’ll add it to my list!

T-SQL Tuesday #135 – My tools to stay alive
T-SQL Tuesday #136 – Float does NOT suck!

Related Posts

3 Comments. Leave new

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Menu
%d bloggers like this:

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close