ColumnStore Indexes were first introduced in SQL Server 2012, and this created a new way to store and retrieve the Index or Table data in an efficient manner.
ColumnStore uses xVelocity technology that was based on Vertipaq engine, this uses a new Columnar storage technique to store data that is highly Compressed and is capable of In-memory Caching and highly parallel data scanning with Aggregation algorithms.
ColumnStore or Columnar data format does not store data in traditional RowStore fashion, instead the data is grouped and stored as one column at a time in Column Segments.
The Traditional RowStore stores data for each row and then joins all the rows and store them in Data Pages, and is still the same storage mechanism for Heap and Clustered Indexes.
Thus, a ColumnStore increases the performance by reading less data that is highly compressed, and in batches from disk to memory by further reducing the I/O.
–> To know more about ColumnStore Indexes check [MSDN BoL].
–> Let’s see the journey of ColumnStore index from SQL Server 2012 to 2016:
✔ We will start with SQL Server 2012 offering for ColumnStore Indexes:
1. A Table (Heap or BTree) can have only one NonClustered ColumnStore Index.
2. A Table with NonClustered ColumnStore Index becomes readonly and cannot be updated.
3. The NonClustered ColumnStore Index uses Segment Compression for high compression of Columnar data.
✔ With SQL Sever 2014 some new features were added, like:
1. You can create one Clustered ColumnStore Index on a Table, and no further Indexes can be created.
2. A Table with Clustered ColumnStore Index can be updated with INSERT/UPDATE/DELETE operations.
3. Both Clustered & NonClustered ColumnStore Index has new Archival Compression options i.e. COLUMNSTORE_ARCHIVE to further compress the data.
✔ Now with SQL Server 2016 new features have been added and some limitations are removed:
1. A Table will still have one NonClustered ColumnStore Index, but this will be updatable and thus the Table also.
2. Now you can create a Filtered NonClustered ColumnStore Index by specifying a WHERE clause to it.
3. A Clustered ColumnStore Index table can have one or more NonClustered RowStore Indexes.
4. Clustered ColumnStore Index can now be Unique indirectly as you can create a Primary Key Constraint on a Heap table (and also Foreign Key constraints).
5. Columnar Support for In-Memory (Hekaton) tables, as you can create one one ColumnStore Index on top of them.
–> Here is the full feature summary of ColumnStore Indexe evolution from SQL Server 2012 to 2016:
–> Check more about this on [MSDN BoL].
Check the above details explained in the video below:
While working on Performance Optimization and Index tuning many Developers ends up in creating some (or even many) unnecessary Indexes based upon various SQL Queries. Some or many of those Indexes might not be used at all by the SQL Query Optimizer. But even if they are used you may have to Trade-off with your ETL performance going down, as the CRUD (CREATE, READ, UPDATE, DELETE) operations are going to taking more time to update those new Indexes.
So, while creating new Indexes you will have to plan very carefully, decide and balance out things so that your Data retrieval is fast and on the same hand your ETLs are also not affected much.
SQL Server provides some DMVs (Dynamic Management Views) and DMFs (Dynamic Management Functions) to get this information from SQL engine.
–> To know how the Indexes are getting used we can use sys.dm_db_index_usage_stats DMV and this will provide us information on how many times the Index was used for SEEK, SCAN & LOOKUP operations. Check the Query and its output below:
SELECT DB_NAME(database_id) AS DATABASE_NAME, OBJECT_NAME(ius.object_id) AS TABLE_NAME, ids.name AS INDEX_NAME, ius.user_seeks AS SEEK_COUNT, ius.user_scans AS SCAN_COUNT, ius.user_lookups AS LOOKUP_COUNT, ius.user_seeks + ius.user_scans + ius.user_lookups AS TOTAL_USAGE, ius.last_user_seek AS LAST_SEEK_COUNT, ius.last_user_scan AS LAST_SCAN_COUNT, ius.last_user_lookup AS LAST_LOOKUP_COUNT FROM sys.dm_db_index_usage_stats AS ius INNER JOIN sys.indexes AS ids ON ids.object_id = ius.object_id AND ids.index_id = ius.index_id WHERE OBJECTPROPERTY(ius.object_id,'IsUserTable') = 1 ORDER BY DATABASE_NAME, TOTAL_USAGE
–> Now if we want to know the maintenance overhead on the new Indexes we created, like every time a related Table is updated the Indexes are also updated. We can check by using sys.dm_db_index_operational_stats DMF and it will show how many INSERT, UPDATE & DELETE operations are happening on particular indexes. Check the Query and its output below:
SELECT DB_NAME(database_id) AS DATABASE_NAME, OBJECT_NAME(ios.object_id) AS TABLE_NAME, idx.name AS INDEX_NAME, ios.leaf_insert_count AS INSERT_COUNT, ios.leaf_update_count AS UPDATE_COUNT, ios.leaf_delete_count AS DELETE_COUNT, ios.leaf_insert_count + ios.leaf_update_count + ios.leaf_delete_count AS TOTAL_COUNT FROM sys.dm_db_index_operational_stats (NULL,NULL,NULL,NULL ) ios INNER JOIN sys.indexes AS idx ON idx.object_id = ios.object_id AND idx.index_id = ios.index_id WHERE OBJECTPROPERTY(ios.object_id,'IsUserTable') = 1 ORDER BY DATABASE_NAME, TOTAL_COUNT
–> So, by using these two Dynamic Management Views/Functions you can know:
1. What is the affect (maintenance overhead) of Indexes you created, and
2. Are the Indexes really used or not so that you can DROP them.
Please Note: that these stats could be wrong at times because of several reasons, like:
1. If SQL Server (MSSQLSERVER) service is re-started these counters are initialized to 0.
2. When a Database is detached or is shut down (for example, because AUTO_CLOSE is set to ON), all rows associated with the database are removed.
3. Index Rebuild resets these counters to 0 (Bug logged in MS Connect, link).
In SQL Server 2014 for In-Memory tables there are lot of changes in DDLs compared with normal Disk Based Tables. In-Memory Tables related changes we’ve seen in previous posts, check [here]. Here we will see Memory Optimized Index related changes and few important things to take care before designing your Tables and Indexes.
–> Some of the main points to note are:
1. Indexes on In-Memory tables must be created inline with CREATE TABLE DDL script only.
2. These Indexes are not persisted on Disk and reside only in memory, thus they are not logged. As these Indexes are not persistent so they are re-created whenever SQL Server is restarted. Thus In-Memory tables DO NOT support Clustered Indexes.
3. Only two types of Indexes can be created on In-Memory tables, i.e. Non Clustered Hash Index and Non Clustered Index (aka Range Index). So there is no bookmark lookup.
4. These Non Clustered Indexes are inherently Covering, and all columns are automatically INCLUDED in the Index.
5. Total there can be MAX 8 Non Clustered Indexes created on an In-Memory table.
–> Here we will see how Query Optimizer uses Hash & Range Indexes to process query and return results:
1. Hash Indexes: are used for Point Lookups or Seeks. Are optimized for index seeks on equality predicates and also support full index scans. Thus these will only perform better when the predicate clause contains only equality predicate (=).
2. Range Indexes: are used for Range Scans and Ordered Scans. Are optimized for index scans on inequality predicates, such as greater than or less than, as well as sort order. Thus these will only preform better when the predicate clause contains only inequality predicates (>, <, =, BETWEEN).
–> Let’s check this by some hands-on code. We will create 2 similar In-Memory tables, one with Range Index and another with Hash Index:
-- Create In-Memory Table with simple NonClustered Index (a.k.a Range Index): CREATE TABLE dbo.MemOptTable_With_NC_Range_Index ( ID INT NOT NULL PRIMARY KEY NONCLUSTERED, VarString VARCHAR(200) NOT NULL, DateAdded DATETIME NOT NULL ) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA ) GO -- Create In-Memory Table with NonClustered Hash Index: CREATE TABLE dbo.MemOptTable_With_NC_Hash_Index ( ID INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 10000), VarString VARCHAR(200) NOT NULL, DateAdded DATETIME NOT NULL ) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA ) GO
–> Now we will Insert about 10k records on both the tables, so that we have good numbers of records to compare:
DECLARE @i INT = 1 WHILE @i <= 10000 BEGIN INSERT INTO dbo.MemOptTable_With_NC_Range_Index VALUES(@i, REPLICATE('a', 200), GETDATE()) INSERT INTO dbo.MemOptTable_With_NC_Hash_Index VALUES(@i, REPLICATE('a', 200), GETDATE()) SET @i = @i+1 END
–> Now check the Execution Plan by using equality Operator (=) on both the tables:
SELECT * FROM MemOptTable_With_NC_Hash_Index WHERE ID = 5000 -- 4% SELECT * FROM MemOptTable_With_NC_Range_Index WHERE ID = 5000 -- 96%
You will see in the Execution Plan image below that Equality Operator with Hash Index Costs you only 4%, but Range Index Costs you 96%.
–> Now check the Execution Plan by using inequality Operator (BETWEEN) on both the tables:
SELECT * FROM MemOptTable_With_NC_Hash_Index WHERE ID BETWEEN 5000 AND 6000 -- 99% SELECT * FROM MemOptTable_With_NC_Range_Index WHERE ID BETWEEN 5000 AND 6000 -- 1%
You will see in the Execution Plan image below that Inequality Operator with Range Index Costs you only 1%, but Hash Index Costs you 99%.
So, while designing In-Memory Tables and Memory Optimized Indexes you will need to see in future that how you will be going to query that table. It also depends upon various scenarios and conditions, so always keep note of these things in advance while designing your In-Memory Tables.
Update: Know more about In-Memory tables:
In my previous post I discussed about the how Clustered Index’s Data-Pages and Rows are allocated in memory (Disk). I tried to prove that that Clustered Indexes do not guarantee Physical Ordering of Rows. But instead they are Logically Ordered and Sorted.
As they are Logically Sorted and when you query a table without an “ORDER BY” clause you get Sorted Rows, but this is not what will happen always. You can also get rows in Unsorted Order, so to get Sorted rows always apply an “ORDER BY” clause. Here in this post we will see under what circumstances a table will not return Sorted rows:
–> Let’s create a simple table with a Clustered Index on it and add some records:
-- Create table with 2 columns, first beign a PK and an IDENTITY column: CREATE TABLE test2 ( i INT IDENTITY(1,1) PRIMARY KEY NOT NULL, j INT ) -- Let's insert some records in random order on second column: INSERT INTO test2 (j) SELECT 500 UNION ALL SELECT 300 UNION ALL SELECT 900 UNION ALL SELECT 100 UNION ALL SELECT 600 UNION ALL SELECT 200 -- Now we will query the table without using ORDER BY clause: SELECT * FROM test2
–> Following is the Output of the first SELECT query above:
You get sorted rows, as the Execution plan shows that Clustered Index was used to fetch the records.
–> Now, what if this table also has a Non Clustered Index, let’s see:
-- We will create a Non Clustered Index on the same table on second column: CREATE INDEX NCI_test ON test2 (j) -- Again query the table without using ORDER BY clause: SELECT * FROM test2
–> Following is the Output of the second SELECT query above:
This time the query returned rows in un-ordered fashion. As you can see in the Execution plan Query optimizer preferred to do a Non Clustered Scan. Thus the records were returned in un-ordered fashion.
–> Now, If you add an ORDER BY Clause to your SELECT Query how does it return rows:
-- Added an ORDER BY Clause: SELECT * FROM test2 ORDER BY i
–> Following is the Output of the third SELECT query above:
After adding an “ORDER BY” clause the query optimizer preferd to use the Clustered Index and returns rows in Ordered fashion again.
-- Final Cleanup DROP TABLE test2
So, it is necessary to provide an “ORDER BY” clause to your Queries when you expect to get sorted results, even if the table has Clustered Index on it.