Storage structures 4 – Memory-optimized columnstore

Storage structures 4 – Memory-optimized columnstore

Time for the next part in my series on storage structures. The previous parts covered on-disk rowstore, columnstore indexes, and memory-optimized storage. In this part, I will look at the combination of the latter two: memory-optimized columnstore indexes.

Memory-optimized columnstore indexes were introduced in SQL Server 2016. I’ve seen some slick Microsoft marketing sessions in that time that were big on “real-time operational analytics”. A new trend where analytical processing would no longer be done on a stale copy of the data in a separate data warehouse, but directly on the OLTP database. Reports would always be fully current, there would be no more need for an ETL pipeline, and due to the combination of memory-optimized for OLTP workloads and columnstore for analytical processing, everything would always be fast. In theory.

I have not heard the term “real-time operational analytics” anymore after the initial rollout of SQL Server 2016. And since the introduction of SQL Server 2017, I don’t think I have ever heard any Microsoft employee use the terms “memory-optimized” and “columnstore” in the same session, let alone in a single sentence.

I really don’t know if there are any customers in the world that actually use memory-optimized columnstore indexes in their production systems, and I seriously debated whether I should include this storage structure at all in my series. But I’m a completionist. So I’m writing about it anyway.

Storage structure

If you want to create a columnstore index on a memory-optimized table, you can only choose to create a clustered columnstore index. Nonclustered columnstore is not supported for memory-optimized tables.

Because the columnstore index is clustered, all columns in the table are automatically included. This also means that the restrictions on supported data types can’t be easily circumvented. If the table includes a column that is not supported for the memory-optimized columnstore, then you can’t tell SQL Server to just omit that column. You will have to actually remove it from the table. Which might not sit well with the users of the system who need that column for their OLTP work.

The restrictions I mention above are not well documented. Or at least, I have not been able to find them anywhere, except in this whitepaper, that is almost ten years old already. I have ran some tests on SQL Server 2025 to see if there have been improvements since, but it seems that this is not the case. So my assumption is that memory-optimized columnstore indexes still do not support any LOB or off-row columns.

Normally, when a table has a clustered index, that index comes instead of the heap storage that is used for tables without a clustered index. So you might expect that, in this case, the varheap structure of the memory-optimized table is now replaced by a new, column-oriented storage structure.

That is not the case, though.

Copy of the data

When you have an existing memory-optimized table and then add a clustered columnstore index to it, SQL Server creates a full copy of the data! Granted, this copy is stored as rowgroups and segments, and heavily compressed, just as in regular on-disk columnstore indexes. So your storage needs will not suddenly double. But they will grow. If a table has 50GB of data and the columnstore compression achieves 90% compression rate, you still need 5GB of additional storage space.

And because this is not a regular columnstore index but a memory-optimized columnstore index, that compressed copy of the data will all be kept in memory. Just as with other memory-optimized indexes, there will also be data written to disk, to enable SQL Server to reload the data in memory on startup or recovery. But all regular operations (reads and writes) that affect the memory-optimized columnstore index use this copy of the data that resides in memory.

The structure of that data in memory is exactly the same as the structure of a regular columnstore index. So the data is divided into rowgroups of up to 1,048,576 (2 to the power of 20) rows. Each rowgroup is sorted for optimal compression, then stored as segments that hold the compressed data for the rowgroup one column each. Nothing new, except that this is now all in memory.

Support for modifications

The compression here also has the same effect of making modifications way too expensive. So, just as for regular columnstore indexes, the compressed rowgroups are effectively stored in a read-only structure. Which means that we also see the same workarounds.

The deleted bitmap, or deleted rows table, stores the Row IDs of all rows that were deleted but could not be actually removed because they are in a compressed rowgroup. This structure is, obviously, also stored in memory. Its storage structure is unknown. But since its main use is to quickly find whether a specific Row ID value is in it or not, I would expect this deleted rows table to be stored as a regular memory-optimized table, with one column (the Row ID), and then probably a nonclustered Bw-tree index on that column. It might also be a nonclustered hash index, but that would probably be slightly less efficient.

For inserted rows, again, similar to regular columnstore indexes, a delta store is used. I’ll get to the details of that structure in a bit. Just as with regular columnstore indexes, a background process periodically checks whether there are sufficient new rows in the delta store to warrant converting them to a new rowgroup. The cutoff point is one million rows, but this is based on an estimation of the number of new rows, not an exact count. Also, if your data frequently sees a lot of updates during a short period after being inserted, you can ensure that the new rows remain in the delta store (where updates are cheaper) for a specific period of time by changing the COMPRESSION_DELAY setting. This specifies the number of minutes that must pass after the last change to a row in the delta store before it becomes eligible to be included in a compressed rowgroup.

Updating a row in the delta store is easy. These are not compressed, so SQL Server can just change the data. Of course, it is still memory-optimized data, so that change will be reflected as a delete of the old row (by setting the end timestamp) and then adding a new row with the new values.

When a row that is already in a compressed rowgroup is updated, it is also effectively handled as if you delete the old data and then insert the new data. But now the old version of the row is marked as deleted in the deleted rows table. The new version is still added to the delta store. Once it has been there without further modifications for at least the period that was specified as the COMPRESSION_DELAY, it can be added to a compressed rowgroup if there are enough qualifying rows.

It is really important to realize that modifications on memory-optimized columnstore indexes get expensive. Remember, the compressed columnstore rowgroups are all copies of data that is also stored in the varheap. So when you delete a row, SQL Server has to delete it from the varheap (which, as described in the previous post, is a virtual delete, by setting its end timestamp). And then it also has to delete it from the copy in the columnstore index (which, once again, is a virtual delete – this time by finding its Row ID and adding that value to the deleted rows table).

Second varheap

As mentioned above, inserted (and updated) rows are initially stored in a delta store. This delta store is implemented as a second varheap. However, unlike the compressed data, this second varheap is not a duplication of the data.

You can think of it as if the regular varheap, that already existed to support the memory-optimized table and its nonclustered indexes, is split in two. The largest part holds the same data that is also copied in the columnstore index. The second part holds “recent” data only. This data has not yet been compressed into the columnstore index.

In other words, when the background process compresses rows from the delta store into a new compressed rowgroup, it copies the data into that compressed rowgroup, and them moves the data from the delta store varheap to the regular varheap!

Reading from a memory-optimized columnstore index

Reading data from a memory-optimized columnstore index is really very limited. Just as for a regular columnstore index, seeks are not supported. And for the same reason: the data is not stored in a way that allows for an effective search by value.

However, where a regular clustered columnstore index does allow a (very expensive) Key Lookup to find specific rows that have been identified through a different (nonclustered) index, that is not the case for its memory-optimized counterpart. Not because it would be too hard to build, but because there simply is no need for it.

Every memory-optimized table requires at least one hash or Bw-tree index. Those are always nonclustered indexes. So at first sight, it appears as if there would be cases where the optimizer would choose one of those nonclustered indexes to identify the required rows. But remember, even though these indexes are nonclustered, they are implemented as pointer chains on the varheap, and hence give direct access to all the data. Granted, it comes from the varheap, not from the clustered columnstore index. But those are identical copies of each other. You never see a Key Lookup on any memory-optimized table. Adding a clustered columnstore index will not change that.

So all in all, only a single operator is used to read data from a memory-optimized columnstore index: the Columnstore Index Scan.

Columnstore Index Scan

When a Columnstore Index Scan targets a memory-optimized columnstore index, the operator does effectively the same things as a Columnstore Index Scan on a regular columnstore index. The exception is that the data is now stored in memory, and that the delta store additionally has a different structure.

So, once more, the first step is to use column elimination and (if the Predicate property allows for it) rowgroup elimination, so that only the segments are read that might contain relevant data. These are then read. If the deleted rows table is stored as a Bw-tree index, then this index is scanned at the same time, to quickly identify which rows are logically deleted and hence should be skipped. If it is stored as a hash index, then this would require a seek in that index for each row that the Clustered Index Scan wants to return.

Of course, rows in the delta store also have to be returned. This is considered yet another filegroup, but here, the process is different. I assume that SQL Server uses the same logic as for a Table Scan on a regular memory-optimized table: it simply scans the varheap (but in this case only one of the two varheaps – the one that stores new and recently changed rows) and returns each row it finds, unless the Predicate property disqualifies it.

Conclusion

A memory-optimized columnstore index effectively stores a second copy of the data, in addition to the varheap. Due to its high compression gain, this does not actually double the storage requirement. But it does cause a huge increase. And this is all in memory. So if you want to use this feature on a large table, better make sure that your server has a lot of memory.

New and recently modified data is not duplicated, though. Until these rows are compressed into a columnstore rowgroup, they live in a second varheap structure. This second varheap hence is both part of the regular memory-optimized varheap structure, and also part of the memory-optimized columnstore index.

Only a single execution plan operator can be used to read data from the memory-optimized columnstore index: the Columnstore Index Scan. This operator reads the compressed rowgroups, together with the deleted rows table, and the delta store varheap, in the same way as a Columnstore Index Scan on a disk-based columnstore index.

That concludes the description of the regular, general purpose storage structures. But SQL Server also has various specialized index types that are used for specific data types only. My next post in this series will cover one of those: the XML index.

Parameter Sensitive Plan Optimization

Related Posts

No results found.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close