Index Scan

Introduction

The Index Scan operator is used to read all or most data from a nonclustered index. (When SQL Server needs to read all or more data from a clustered index, it uses a different operator: Clustered Index Scan). In combination with a Top operator, an Index Scan can also be used to read the first few rows according to the innate order of a nonclustered index, or to read just a few rows from a table when data order is irrelevant and the index used is the smallest available index.

The logic of the Index Scan operator itself is fairly simple, but the actual actions carried out can vary hugely depending on the type of index being scanned (as defined in the Storage and IndexKind subproperties of the Object property). Most of this logic is carried out at the level of the storage engine. Since an understanding of this is important to get a proper understanding of the performance of this operator, the actual actions carried out at the level of the storage engine will be described on this page as well.

The current version of SQL Server (2017) supports four types of nonclustered index storage. The Storage property distinguishes between RowStore, ColumnStore, and MemoryOptimized; for the latter type only IndexKind further differentiates this into NonClustered and NonClusteredHash.

An Index Scan operator with Storage equal to “ColumnStore” is represented in SSMS (and many other tools) by a different icon and, with a different name. Only the underlying XML reveals that this is merely a presentation difference, and that the same operator is actually used. For this reason, this page describes the behavior of ALL types of Index Scan, including those represented as Columnstore Index Scan in the graphical execution plan.

Visual appearance in execution plans

Depending on the tool being used, an Index Scan operator with the Storage property not equal to ColumnStore is displayed in a graphical execution plan as shown below:

SSMS and ADS (current versions)	Legacy SSMS	Plan Explorer	Paste The Plan

Algorithm

The basic algorithm for the Index Scan operator is as shown below:

Note that this flowchart is a simplification. It doesn’t show that execution stops whenever a row is returned, and resumes where it was upon next being called.

Also note that, depending on the transaction isolation level that is in effect for the index being read, locks may be taken and released as rows are accessed. This is part of the responsibility of the storage engine, and a detailed discussion of this functionality is beyond the scope of this website.

Predicate matched?

The Index Scan operator supports a Predicate property. When present, the logical test in this property will be applied to each row read, and only rows where the logical test results in True are returned. If no Predicate property is present, then every row is considered a match. The actual method of testing the predicate depends on the type of index as represented in the Storage property.

For a RowStore index, the predicate may be pushed into the storage engine. This means that the test is done by the storage engine, after loading the page into the buffer pool but before returning it to the operator. This optimization is not shown in the flowchart above. Whether or not a predicate is pushed depends on the complexity, and is not visible in the execution plan, except when a Bitmap filter is applied. In that case only, when the “probe” internal function used to test the bitmap is enriched with the “IN ROW” parameter, the bitmap probe is pushed into the storage engine. If not, then the probing is done within the operator.

For a ColumnStore index, the predicate will always be pushed into the storage engine. Even when it is a probe of a bitmap, and even when no “IN ROW” parameter is given. The predicate is not only used to filter rows before returning them to the operator, but also to perform rowgroup elimination (often incorrectly called segment elimination) as detailed below.

For a MemoryOptimized index, the predicate can, as far as currently known, only be pushed down into the storage engine when the Index Scan is initiated by a query in a natively compiled query. In so-called InterOp mode (non-compiled code that allows access to both memory-optimized and disk-based data) pushdown is not supported.

Variations by index type and scan properties

The flowchart above shows the basic outline of Index Scan processing. As mentioned above, this is quite simple and the actual complexity is in how the underlying data is accessed. That depends on several of the properties of the Index Scan. In the paragraphs below the different behavior of the “Read first row” and “Read next row” actions is detailed per type of index.

RowStore

For a RowStore nonclustered index (Storage = “RowStore” and IndexKind = “NonClustered”), three different access methods are supported: Allocation Order Scan, Ordered Forward Scan, and Ordered Backward Scan. The final decision on the access method used is made at execution time, though some of the properties of the Index Scan can force a specific access method.

RowStore – Allocation Order Scan

An Allocation Order Scan returns the data in unspecified order. Therefore this access method is only considered when the Ordered property of the Index Scan operator is set to False. Also, since this method is faster for large tables but has a higher startup cost, an Allocation Order Scan is only used when the size of the index is at least 64 pages. And finally, due to the risk of incorrect data from a concurrent update, an Allocation Order Scan is only allowed when the index is on a read-only filegroup, when a full table lock is taken during the scan, or when dirty reads are allowed.

An Allocation Order Scan uses the so-called IAM pages to drive the scan. A detailed discussion of the contents of an IAM page is beyond the scope of this website; simplified it can be seen as a bitmap that represents a huge amount of pages, and each set bit shows that the represented page is part of the index.

“Read first row” uses the internal data structures (which can be seen through the sys.sysindexes view) to find the location of the first IAM page. It brings that page into the buffer pool, and then scans the bitmap structure of that page to find the first page that is allocated to this index. It then brings that page into the buffer pool, checks the page header, and if it is the root page or an intermediate index page continues to scan the IAM page to find the next page allocated to the index. Once a leaf page is found it uses the row offset table on that page to find the location of the first row, and then returns that row.

“Read next row” in an Allocation Order Scan uses the row offset table on the current page to find the location of the next row. If the last row on the current page was processed, it returns to the bitmap on the IAM page to find the next page allocated to this index, brings it into the buffer pool, and verifies the page type. If it is a leaf page it once again uses the row offset table to find and return the first row on the page. If it is not a leaf page then the process of scanning the IAM bitmap continues. Once it reaches the end of the bitmap without finding the next leaf page, it checks the NextPage pointer on the IAM page to find the next IAM page, brings that into the buffer pool, and then continues the process of scanning the bitmap. If no new leaf page is found and the last examined IAM page has a nil pointer in NextPage, then the end of the index data is reached and no row is returned.

RowStore – Ordered Forward Scan

An Ordered Forward Scan returns the data in the order that was specified when the index was created. In most cases this means ascending order of the index’s key columns, except when the keyword DESC or DESCENDING was used when creating the index. An Ordered Forward Scan is always used if the Ordered property of the Index Scan operator is set to True, and Scan Direction is set to FORWARD. An Ordered Forward Scan is also used if the Ordered property is set to False but the other conditions for an Allocation Order Scan are not met.

An Ordered Forward Scan uses the B-tree structure of the index to drive the scan. A B-tree always has a single so-called root page, with pointers to either one or more levels of intermediate index pages, or leaf pages where the actual index data is stored. A more detailed discussion of this B-tree structure is beyond the scope of this website.

“Read first row” uses the internal data structures (which can be seen through the sys.sysindexes view) to find the location of the root page. It brings that page into the buffer pool and reads its first internal record to find the logical first page of the next lower level. It then brings that page into the buffer pool and checks the page header. If it is an intermediate index level, then it again reads the first internal record to find the logical first page of the next lower level. This repeats until the logical first leaf page is found. It then uses the row offset table on that page to find the location of the first row on that page, and then returns that row.

“Read next row” in an Ordered Forward Scan uses the row offset table on the current page to find the location of the next row. If the last row on the current page was processed, it checks the NextPage pointer on the current leaf page to find the logical next leaf page, brings that into the buffer pool, and then once more uses the row offset table on that page to find the location of the first row, and then returns that row. If the last row of a leaf page was processed and that leaf page has a nil pointer in NextPage, then the end of the index data is reached and no row is returned.

RowStore – Ordered Backward Scan

An Ordered Backward Scan returns the data in the order that was specified when the index was created. In most cases this means descending order of the index’s key columns, except when the keyword DESC or DESCENDING was used when creating the index, in which case a backward scan will return the data in ascending order. An Ordered Backward Scan is only used when the Ordered property of the Index Scan operator is set to True, and Scan Direction is set to BACKWARD.

An Ordered Backward Scan uses the same B-tree structure of the index that is also used by an Ordered Forward Scan.

“Read first row” uses the internal data structures (which can be seen through the sys.sysindexes view) to find the location of the root page. It brings that page into the buffer pool and reads its last internal record to find the logical last page of the next lower level. It then brings that page into the buffer pool and checks the page header. If it is an intermediate index level, then it again reads the last internal record to find the logical last page of the next lower level. This repeats until the logical last leaf page is found. It then uses the row offset table on that page to find the location of the last row on that page, and then returns that row.

“Read next row” in an Ordered Backward Scan uses the row offset table on the current page to find the location of the previous row. If the first row on the current page was processed, it checks the PreviousPage pointer on the current leaf page to find the logical previous leaf page, brings that into the buffer pool, and then once more uses the row offset table on that page to find the location of the last row, and then returns that row. If the first row of a leaf page was processed and that leaf page has a nil pointer in PreviousPage, then the start of the index data is reached and no row is returned.

ColumnStore

For a ColumnStore nonclustered index (Storage = “ColumnStore” and IndexKind = “NonClustered”), only a single access method is supported: unordered. For this index type, the Ordered property of the Index Scan operator will always be set to False.

For a columnstore index, “Read first row” first uses internal data structures (which can be exposed through a variety of views, beyond the scope of this website) to find the first rowgroup of the index. If the Predicate property is used and has a condition that allows for rowgroup elimination, it checks the min_data_id and max_data_id columns in sys.column_store_segments (for each column that qualifies for rowgroup elimination) to see if the current rowgroup can contain any matching data at all. If the rowgroup can not contain any matching data for one or more of the columns that allow for rowgroup elimination, then this rowgroup is skipped and the check is repeated for the next rowgroup. If all columns indicate that matching data could exist, then the segments of all columns that are needed for the scan (typically the columns in the Output List property, though the Predicate property is also examined) are brought into the Columnstore Object Pool. After that the data is decompressed, and then the first row can be returned.

For “Read next row”, the operator continues to go through the data of the current rowgroup in the Columnstore Object Pool. Once the last row has been processed, it will continue on to the next rowgroup, applying the rowgroup elimination logic outlined above. Once the last of all rowgroups has been fully processed, the end of the index is reached and no row is returned.

If the columnstore index currently has any rowgroups that are not yet compressed (i.e. they are open or closed deltastores), then the min_data_id and max_data_id columns in sys.column_store_segments are not set, since these are determined when the rowgroup is compressed. So these rowgroups can never be skipped. Also, since they are still in the internal B-tree structure of the deltastore, the Index Scan switches to the methods used for scanning a RowStore index for these rowgroups.

Note that the above description applies to a Columnstore Index Scan running in Row mode. If the Actual Execution Mode property is Batch, then instead of returning just a single row it returns a batch of rows, in an internal format that is very similar to the storage format of the columnstore index itself. It is currently unknown whether the data is just copied over directly, or decompressed first.

MemoryOptimized (NonClustered)

For a MemoryOptimized nonclustered index (Storage = “MemoryOptimized” and IndexKind = “NonClustered”), only a single access method is supported: ordered forward. The Ordered property of the Index Scan operator may still be set to either false or true depending on whether the optimizer wants the data to be returned in ordered fashion, but the value of this property doesn’t affect the actual processing.

The “Read first row” action uses internal data structures (beyond the scope of this website) to find the Page Mapping Table (PMT) of the index, then uses that to find the location in memory of the current version of the root page. On that page, it reads the first pointer, then goes via the PMT to that non-leaf page. This repeats until a leaf page is found. On that leaf page, the first data value is found and the pointer to its first actual row is followed. The timestamp values of that row are checked to verify whether the row should be visible for the running transaction. If not, then the pointer to the next row is followed, and the process repeats. If a visible row is found, then that row is returned to the operator.

For “Read next row”, the pointer to the next row from the current row is followed, and the timestamp values of that row are checked to determine whether the row is visible for the current transaction. If it is, then that row is returned. Otherwise the next row in the pointer chain is checked. If the end of the pointer chain is reached, then the scan returns to the current leaf page of the index, moves to the next index value, and again starts following the pointer chain. Once the last value on the index page has been processed, the pointer to the logically next leaf page is followed, and the process repeats. This lasts until the last leaf page was fully processed and the pointer to the next page is a nil pointer. At that point the end of the index data is reached and no row is returned.

MemoryOptimized (NonClusteredHash)

For a MemoryOptimized nonclustered hash index (Storage = “MemoryOptimized” and IndexKind = “NonClusteredHash”), only a single access method is supported: unordered. For this index type, the Ordered property of the Index Scan operator will always be set to False.

The “Read first row” action uses internal data structures (beyond the scope of this website) to find the start of the internal hash table for the index. It then iterates through the buckets in order, until it finds the first bucket that holds a pointer to a row. It follows that pointer, and checks the timestamp values of that row to verify whether the row should be visible for the running transaction. If not, then the pointer to the next row is followed, and the process repeats. If a visible row is found, then that row is returned to the operator.

For “Read next row”, the pointer to the next row from the current row is followed, and the timestamp values of that row are checked to determine whether the row is visible for the current transaction. If it is, then that row is returned. Otherwise the next row in the pointer chain is checked. If the end of the pointer chain is reached, then the scan returns to the next bucket in the hash table, and again continues to scan the buckets until a pointer to a row is found. This process continues until the all hash buckets have been processed. At that point the end of the index data is reached and no row is returned.

Parallel Page Supplier

If an Index Scan operator is running in a parallel section of an execution plan, it changes its behavior. The specific behavior of an Index Scan in a parallel execution plan is called the “Parallel Page Supplier”. It listens on all threads for row requests, and serves data, one row at a time, to each thread that requests data and has already processed all data on the last page received.

The result is that each thread will continue to receive data until all of the index has been processed, and that skew is minimized. However, it also means that there is no guarantee at all as to which row ends up on which thread. Very often, the data then immediately flows through a Parallelism (Repartition Streams) operator to reassigned rows to the “correct” thread. Since there are no options to change the behavior of the Index Scan operator in a parallel execution plan, this cannot be avoided.

It is currently unknown whether the Parallel Page Supplier is restricted to rowstore indexes only, or whether it also activates for columnstore and/or memory-optimized indexes.

Operator properties

The properties below are specific to the Index Scan operator, or have a specific meaning when appearing on it. For all other properties, see Common properties. Properties that are included on the Common properties page but are also included below for their specific meaning for the Index Scan operator are marked with a *.

Property name	Description
Actual Partition Count	This is the number of partitions within a partitioned table that were accessed at least once during execution. Only available in execution plan plus run-time statistics and only when the Partitioned property is included and set to True.
Actual Partitions Accessed	This property lists all the partitions within a partitioned table that were accessed at least once during execution. In the execution plan XML, this is actually stored as one or more PartitionRange elements, each with a Start and End property. In tools such as SSMS, this is displayed as a semicolon separated list of ranges, where every single-partition range is displayed as a number instead. So for instance, “2-4;7;10-11” represents that partitions 2, 3, 4, 7, 10, and 11 were accessed at least once. Only available in execution plan plus run-time statistics and only when the Partitioned property is included and set to True.
Defined Values *	For an Index Scan, this property lists the columns read from the index and returned to the calling operator. It is therefore the same as the Output List property.
Estimated Number of Rows to be Read	This is an estimate of the number of rows that will be read by the operator as it is scanning the index. In most cases this will be equal to the estimated total number of rows in the index, except when there are other operators in the execution plan that can result in this operator not being called anymore before the end of the data is reached. The difference between this property and the Estimated Number of Rows property represents the number of rows that is estimated to be read but not returned due to the Predicate property. Note that this property is computed without taking the effect of a row goal (if any) into account. If the EstimateRowsWithoutRowGoal property is present, then you need to compare that, instead of the Estimated Number of Rows, to the Estimated Number of Rows to be Read property for a proper assessment of the estimated selectivity of the Predicate.
Forced Index	This property is set to true if the use of this index was forced through an index hint in the query.
ForceScan	This property is set to true if the query used a FORCESCAN hint to force the use of a scan operator even if the optimizer would rather use a seek operator.
ForceSeek	This property is set to true if the query used a FORCESEEK hint to force the use of a seek operator even if the optimizer would rather use a scan operator. On an Index Scan operator, it is therefore always false.
NoExpandHint	This property is set to true if the NOEXPAND hint was used in the query to force the optimizer to use indexes on an indexed view.
Number of Rows Read	This is the number of rows that was read by the operator as it was scanning the index. In most cases this will be equal to the total number of rows in the index, except when there were other operators in the execution plan that resulted in this operator not being called anymore before the end of the data was reached. In a parallel execution plan, this property shows a breakdown of rows on each individual thread. The difference between this property and the Actual Number of Rows property represents the number of rows that was read but not returned due to the Predicate property. Note that this property may report incorrect values when the Predicate property contains a PROBE. Only available in execution plan plus run-time statistics. When the actual number of rows read is equal to zero, this property is omitted.
Object	This property lists the index that is being scanned by the Index Scan operator, using four part naming (database, schema, table, index) and optionally followed by a table alias. The subproperties of the Object property represent the alias and the four name parts separately, but also include two additional properties: Index Kind: Represents what kind of index is scanned. For a (nonclustered) Index Scan, this can be ViewNonClustered (for nonclustered indexes on an indexed view), NonClusteredHash (memory-optimized hash indexes only), or NonClustered (for all other cases). Storage: Determines the storage type of the index. Possible values for a (nonclustered) Index Scan are RowStore, MemoryOptimized, or ColumnStore (though the latter is represented as a Columnstore Index Scan in most tools).
Ordered	This property is set to True by the optimizer when other operators in the execution plan require that the data is returned in an order that matches the index’s key columns. When set to False, the optimizer doesn’t care about the order of rows returned, and the Index Scan is free to determine, sometimes at run time, whether or not to use an access method that follows this order. See the main text for more details on this.
Partitioned	This property is present and set to True when the index read by the Index Scan is a partitioned index.
Predicate	When present, this property defines a logical test to be applied to all rows that are read by the index scan. Only rows for which the predicate evaluates to True are returned. When possible, the Index Scan operator will push this predicate into the storage engine to prevent extra roundtrips between the operator and the storage engine. Note that, except for columnstore indexes when rowgroup elimination can apply, a Predicate on an Index Scan does not reduce the amount of rows that is touched by the operator. The difference between the Actual Number of Rows and the Number of Rows Read property (or their estimated counterparts) shows how many rows were read but not returned. This can be used to gauge how useful an index would be that supports the predicate.
Scan Direction	If the Ordered property is True, then this property specifies whether the data has to be returned in normal index order (“FORWARD”) or reverse index order (“BACKWARD”). If the Ordered property is False, then this property is not used. The BACKWARD option is not available for memory-optimized indexes.
Seek Predicates	The Seek Predicates property can only appear on an Index Scan operator if the Object scanned is a partitioned index, and it only supports equality or inequality conditions on an internal column called PtnIdnnnn, where nnnn is a 4-digit number that is unique within the execution plan. This implements partition elimination, where only the indicated partition(s) of the index are scanned.
Table Cardinality	This property shows the number of rows in the index’s table when the plan was last compiled. Note that this is true even for a filtered index, so it cannot be used as an indication of how many rows are read if the Index Scan is allowed to finish. For this, use the Estimated Number of Rows to be Read property instead.

Implicit properties

This table below lists the behavior of the implicit properties for the Index Scan operator.

Property name	Description
Batch Mode enabled	The Index Scan operator supports both row mode and batch mode execution for both RowStore and ColumnStore indexes. For MemoryOptimized indexes, only row mode execution is supported.
Blocking	The Index Scan operator is non-blocking.
Memory requirement	The Index Scan operator does not have any special memory requirement.
Order-preserving	The Index Scan operator imposes an order if the Ordered property is set to true. If Ordered is set to false, then the operator is assumed to return data in an unreliable order, even though in some cases the actual access methods happen to always return data ordered.
Parallelism aware	When the Index Scan operator is running in a parallel section of an execution plan, it uses the “Parallel Page Supplier” method described above.
Segment aware	The Index Scan operator is not segment aware.