Advertisements

Archive

Archive for the ‘Indexes’ Category

ColumnStore Indexes evolution from SQL Server 2012, 2014 to 2016

June 8, 2015 2 comments

ColumnStore Indexes were first introduced in SQL Server 2012, and this created a new way to store and retrieve the Index or Table data in an efficient manner.

ColumnStore uses xVelocity technology that was based on Vertipaq engine, this uses a new Columnar storage technique to store data that is highly Compressed and is capable of In-memory Caching and highly parallel data scanning with Aggregation algorithms.

ColumnStore or Columnar data format does not store data in traditional RowStore fashion, instead the data is grouped and stored as one column at a time in Column Segments.

The Traditional RowStore stores data for each row and then joins all the rows and store them in Data Pages, and is still the same storage mechanism for Heap and Clustered Indexes.

Thus, a ColumnStore increases the performance by reading less data that is highly compressed, and in batches from disk to memory by further reducing the I/O.
 

–> To know more about ColumnStore Indexes check [MSDN BoL].
 

–> Let’s see the journey of ColumnStore index from SQL Server 2012 to 2016:
 

We will start with SQL Server 2012 offering for ColumnStore Indexes:

1. A Table (Heap or BTree) can have only one NonClustered ColumnStore Index.

2. A Table with NonClustered ColumnStore Index becomes readonly and cannot be updated.

3. The NonClustered ColumnStore Index uses Segment Compression for high compression of Columnar data.
 

With SQL Sever 2014 some new features were added, like:

1. You can create one Clustered ColumnStore Index on a Table, and no further Indexes can be created.

2. A Table with Clustered ColumnStore Index can be updated with INSERT/UPDATE/DELETE operations.

3. Both Clustered & NonClustered ColumnStore Index has new Archival Compression options i.e. COLUMNSTORE_ARCHIVE to further compress the data.
 

Now with SQL Server 2016 new features have been added and some limitations are removed:

1. A Table will still have one NonClustered ColumnStore Index, but this will be updatable and thus the Table also.

2. Now you can create a Filtered NonClustered ColumnStore Index by specifying a WHERE clause to it.

3. A Clustered ColumnStore Index table can have one or more NonClustered RowStore Indexes.

4. Clustered ColumnStore Index can now be Unique indirectly as you can create a Primary Key Constraint on a Heap table (and also Foreign Key constraints).

5. Columnar Support for In-Memory (Hekaton) tables, as you can create one one ColumnStore Index on top of them.
 

–> Here is the full feature summary of ColumnStore Indexe evolution from SQL Server 2012 to 2016:

SQL Server 2016 - ColumnStoreIndexes

–> Check more about this on [MSDN BoL].
 

Check the above details explained in the video below:


Advertisements

Index Usage Stats – Indexes Used, Unused and Updated

April 2, 2015 1 comment

While working on Performance Optimization and Index tuning many Developers ends up in creating some (or even many) unnecessary Indexes based upon various SQL Queries. Some or many of those Indexes might not be used at all by the SQL Query Optimizer. But even if they are used you may have to Trade-off with your ETL performance going down, as the CRUD (CREATE, READ, UPDATE, DELETE) operations are going to taking more time to update those new Indexes.

So, while creating new Indexes you will have to plan very carefully, decide and balance out things so that your Data retrieval is fast and on the same hand your ETLs are also not affected much.

SQL Server provides some DMVs (Dynamic Management Views) and DMFs (Dynamic Management Functions) to get this information from SQL engine.
 

–> To know how the Indexes are getting used we can use sys.dm_db_index_usage_stats DMV and this will provide us information on how many times the Index was used for SEEK, SCAN & LOOKUP operations. Check the Query and its output below:

SELECT 
	DB_NAME(database_id) AS DATABASE_NAME,
	OBJECT_NAME(ius.object_id) AS TABLE_NAME, 
	ids.name AS INDEX_NAME, 
	ius.user_seeks AS SEEK_COUNT, 
	ius.user_scans AS SCAN_COUNT, 
	ius.user_lookups AS LOOKUP_COUNT, 
	ius.user_seeks + ius.user_scans + ius.user_lookups AS TOTAL_USAGE, 
	ius.last_user_seek AS LAST_SEEK_COUNT, 
	ius.last_user_scan AS LAST_SCAN_COUNT, 
	ius.last_user_lookup AS LAST_LOOKUP_COUNT
FROM sys.dm_db_index_usage_stats AS ius
INNER JOIN sys.indexes AS ids
ON ids.object_id = ius.object_id
AND ids.index_id = ius.index_id 
WHERE OBJECTPROPERTY(ius.object_id,'IsUserTable') = 1 
ORDER BY DATABASE_NAME, TOTAL_USAGE

Index Usage Stats 02
 

–> Now if we want to know the maintenance overhead on the new Indexes we created, like every time a related Table is updated the Indexes are also updated. We can check by using sys.dm_db_index_operational_stats DMF and it will show how many INSERT, UPDATE & DELETE operations are happening on particular indexes. Check the Query and its output below:

SELECT 
	DB_NAME(database_id) AS DATABASE_NAME,
	OBJECT_NAME(ios.object_id) AS TABLE_NAME, 
	idx.name AS INDEX_NAME, 
	ios.leaf_insert_count AS INSERT_COUNT, 
	ios.leaf_update_count AS UPDATE_COUNT, 
	ios.leaf_delete_count AS DELETE_COUNT, 
	ios.leaf_insert_count + ios.leaf_update_count + ios.leaf_delete_count AS TOTAL_COUNT 
FROM sys.dm_db_index_operational_stats (NULL,NULL,NULL,NULL ) ios
INNER JOIN sys.indexes AS idx
ON idx.object_id = ios.object_id
AND idx.index_id = ios.index_id 
WHERE  OBJECTPROPERTY(ios.object_id,'IsUserTable') = 1
ORDER BY DATABASE_NAME, TOTAL_COUNT

Index Usage Stats 01
 

–> So, by using these two Dynamic Management Views/Functions you can know:

1. What is the affect (maintenance overhead) of Indexes you created, and

2. Are the Indexes really used or not so that you can DROP them.
 

Please Note: that these stats could be wrong at times because of several reasons, like:

1. If SQL Server (MSSQLSERVER) service is re-started these counters are initialized to 0.

2. When a Database is detached or is shut down (for example, because AUTO_CLOSE is set to ON), all rows associated with the database are removed.

3. Index Rebuild resets these counters to 0 (Bug logged in MS Connect, link).


Memory Optimized Indexes | Hash vs Range Indexes – SQL Server 2014

December 20, 2013 3 comments

In SQL Server 2014 for In-Memory tables there are lot of changes in DDLs compared with normal Disk Based Tables. In-Memory Tables related changes we’ve seen in previous posts, check [here]. Here we will see Memory Optimized Index related changes and few important things to take care before designing your Tables and Indexes.
 

–> Some of the main points to note are:

1. Indexes on In-Memory tables must be created inline with CREATE TABLE DDL script only.

2. These Indexes are not persisted on Disk and reside only in memory, thus they are not logged. As these Indexes are not persistent so they are re-created whenever SQL Server is restarted. Thus In-Memory tables DO NOT support Clustered Indexes.

3. Only two types of Indexes can be created on In-Memory tables, i.e. Non Clustered Hash Index and Non Clustered Index (aka Range Index). So there is no bookmark lookup.

4. These Non Clustered Indexes are inherently Covering, and all columns are automatically INCLUDED in the Index.

5. Total there can be MAX 8 Non Clustered Indexes created on an In-Memory table.
 

–> Here we will see how Query Optimizer uses Hash & Range Indexes to process query and return results:

1. Hash Indexes: are used for Point Lookups or Seeks. Are optimized for index seeks on equality predicates and also support full index scans. Thus these will only perform better when the predicate clause contains only equality predicate (=).

2. Range Indexes: are used for Range Scans and Ordered Scans. Are optimized for index scans on inequality predicates, such as greater than or less than, as well as sort order. Thus these will only preform better when the predicate clause contains only inequality predicates (>, <, =, BETWEEN).
 

–> Let’s check this by some hands-on code. We will create 2 similar In-Memory tables, one with Range Index and another with Hash Index:

-- Create In-Memory Table with simple NonClustered Index (a.k.a Range Index):
CREATE TABLE dbo.MemOptTable_With_NC_Range_Index
(
    ID INT NOT NULL
        PRIMARY KEY NONCLUSTERED,
    VarString VARCHAR(200) NOT NULL, 
    DateAdded DATETIME NOT NULL
) WITH (
	MEMORY_OPTIMIZED = ON, 
	DURABILITY = SCHEMA_AND_DATA
)
GO

-- Create In-Memory Table with NonClustered Hash Index:
CREATE TABLE dbo.MemOptTable_With_NC_Hash_Index
(
    ID INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 10000),
    VarString VARCHAR(200) NOT NULL, 
    DateAdded DATETIME NOT NULL
) WITH (
	MEMORY_OPTIMIZED = ON, 
	DURABILITY = SCHEMA_AND_DATA
)
GO

 

–> Now we will Insert about 10k records on both the tables, so that we have good numbers of records to compare:

DECLARE @i INT = 1

WHILE @i <= 10000
BEGIN
	INSERT INTO dbo.MemOptTable_With_NC_Range_Index 
		VALUES(@i, REPLICATE('a', 200), GETDATE())

	INSERT INTO dbo.MemOptTable_With_NC_Hash_Index 
		VALUES(@i, REPLICATE('a', 200), GETDATE())

	SET @i = @i+1
END

 

–> Now check the Execution Plan by using equality Operator (=) on both the tables:

SELECT * FROM MemOptTable_With_NC_Hash_Index  WHERE ID = 5000 -- 4%
SELECT * FROM MemOptTable_With_NC_Range_Index WHERE ID = 5000 -- 96%

You will see in the Execution Plan image below that Equality Operator with Hash Index Costs you only 4%, but Range Index Costs you 96%.

SQLServer2014_Hash_vs_Range_1
 

–> Now check the Execution Plan by using inequality Operator (BETWEEN) on both the tables:

SELECT * FROM MemOptTable_With_NC_Hash_Index  WHERE ID BETWEEN 5000 AND 6000 -- 99%
SELECT * FROM MemOptTable_With_NC_Range_Index WHERE ID BETWEEN 5000 AND 6000 -- 1%

You will see in the Execution Plan image below that Inequality Operator with Range Index Costs you only 1%, but Hash Index Costs you 99%.

SQLServer2014_Hash_vs_Range_2
 

So, while designing In-Memory Tables and Memory Optimized Indexes you will need to see in future that how you will be going to query that table. It also depends upon various scenarios and conditions, so always keep note of these things in advance while designing your In-Memory Tables.
 

Update: Know more about In-Memory tables:


 


Clustered Index will not always guarantee Sorted Rows

June 10, 2013 3 comments

In my previous post I discussed about the how Clustered Index’s Data-Pages and Rows are allocated in memory (Disk). I tried to prove that that Clustered Indexes do not guarantee Physical Ordering of Rows. But instead they are Logically Ordered and Sorted.

As they are Logically Sorted and when you query a table without an “ORDER BY” clause you get Sorted Rows, but this is not what will happen always. You can also get rows in Unsorted Order, so to get Sorted rows always apply an “ORDER BY” clause. Here in this post we will see under what circumstances a table will not return Sorted rows:

–> Let’s create a simple table with a Clustered Index on it and add some records:

-- Create  table with 2 columns, first beign a PK and an IDENTITY column:
CREATE TABLE test2
(
	i INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
	j INT
)

-- Let's insert some records in random order on second column:
INSERT INTO test2 (j)
SELECT 500 UNION ALL
SELECT 300 UNION ALL
SELECT 900 UNION ALL
SELECT 100 UNION ALL
SELECT 600 UNION ALL
SELECT 200

-- Now we will query the table without using ORDER BY clause:
SELECT * FROM test2

–> Following is the Output of the first SELECT query above:

ClusteredIndex_Sotring01

You get sorted rows, as the Execution plan shows that Clustered Index was used to fetch the records.


–> Now, what if this table also has a Non Clustered Index, let’s see:

-- We will create a Non Clustered Index on the same table on second column:
CREATE INDEX NCI_test ON test2 (j)

-- Again query the table without using ORDER BY clause:
SELECT * FROM test2

–> Following is the Output of the second SELECT query above:

ClusteredIndex_Sotring02

This time the query returned rows in un-ordered fashion. As you can see in the Execution plan Query optimizer preferred to do a Non Clustered Scan. Thus the records were returned in un-ordered fashion.


–> Now, If you add an ORDER BY Clause to your SELECT Query how does it return rows:

-- Added an ORDER BY Clause:
SELECT * FROM test2 ORDER BY i

–> Following is the Output of the third SELECT query above:

ClusteredIndex_Sotring03

After adding an “ORDER BY” clause the query optimizer preferd to use the Clustered Index and returns rows in Ordered fashion again.


-- Final Cleanup
DROP TABLE test2

So, it is necessary to provide an “ORDER BY” clause to your Queries when you expect to get sorted results, even if the table has Clustered Index on it.

Clustered Index do “NOT” guarantee Physically Ordering or Sorting of Rows

June 2, 2013 9 comments

Lot of SQL Developers I’ve worked with and interviewed have this misconception, that Clustered Indexes are Physically Sorted and Non-Clustered Index are Logically Sorted. When I ask them reasons, all come with their own different versions.

I discussed about this in my previous post long time back, where I mentioned as per the MSDN that “Clustered Indexes do not guarantee Physical Ordering of rows in a Table”. And today in this post I’m going to provide an example to prove it.

Clustered Index is the table itself as its Leaf-Nodes contains the Data-Pages of the table. The Data-Pages are nothing but Doubly Linked-List Data-Structures linked to their leading & preceding Data-Pages. The Data-Pages and rows in them are Ordered by the Clustering Key. Thus there can be only one Clustered Index on a table.

When a new Clustered Index is created on a table, it reorganizes the table’s Data-Pages Physically so that the rows are Logically sorted. But when a tables goes through updates, like INSERT/UPDATE/DELETE operations then the index gets fragmented and thus the Physical Ordering of Data-Pages gets lost. But still the Data-Pages of the Clustered Index or the Table are chained together by Doubly Linked-List in Logical Order. We will see this behavior in the following example:

USE AdventureWorks2012
GO

CREATE TABLE test (i INT PRIMARY KEY not null, j VARCHAR(2000))

INSERT INTO test (i,j)
SELECT 1, REPLICATE('a',100) UNION ALL
SELECT 2, REPLICATE('a',100) UNION ALL
SELECT 3, REPLICATE('a',100) UNION ALL
SELECT 5, REPLICATE('a',100) UNION ALL
SELECT 6, REPLICATE('a',100)

SELECT * FROM test

DBCC IND(AdventureWorks2012, 'test', -1);

DBCC TRACEON(3604);
DBCC PAGE (AdventureWorks2012, 1, 22517, 1);

As you can see above we inserted 5 records in a table having PK as Clustering Key with following values: 1,2,3,5,6. We skipped the value 4.

–> Here is the Output of DBCC PAGE command execute in the end:
The DBCC PAGE provides very descriptive Output, but here I’ve just extracted the useful information we need.

OFFSET TABLE:

Row - Offset                        
4 (0x4) - 556 (0x22c)           --> 1=6
3 (0x3) - 441 (0x1b9)           --> 1=5
2 (0x2) - 326 (0x146)           --> 1=3
1 (0x1) - 211 (0xd3)            --> 1=2    
0 (0x0) - 96  (0x60)		--> 1=1

It show the 5 rows with their memory allocated address in both decimal and hexadecimal formats, which is contiguous. Thus all the rows are Physically Ordered.

Now, let’s insert the value “4” that we skipped initially:

INSERT INTO test (i,j)
SELECT 4, REPLICATE('a',100)

SELECT * FROM test

DBCC IND(AdventureWorks2012, 'test', -1);

DBCC TRACEON(3604);
DBCC PAGE (AdventureWorks2012, 1, 22517, 1);

–> Here is the new Output of DBCC PAGE command execute in the end:

OFFSET TABLE:

Row - Offset                        
5 (0x5) - 556 (0x22c)           --> 1=6
4 (0x4) - 441 (0x1b9)           --> 1=5
3 (0x3) - 671 (0x29f)           --> 1=4           <-- new row added here
2 (0x2) - 326 (0x146)           --> 1=3
1 (0x1) - 211 (0xd3)            --> 1=2    
0 (0x0) - 96  (0x60)		--> 1=1

As you can see very clearly in the output above that the memory allocation address for the new row with PK value = “4” is 671, which is not in between and greater than the address of the last row i.e. 556. The memory address of the last 2 rows is unchanged (441 & 556) and the new row is not accommodated in between as per the sort order but at the end. There is no Physical Ordering and the new row is Logically Chained with other rows and thus is Logically and Ordered, similar to the image below:

Doubly LinkList

Hope the above exercise clears that Clustered Indexes do not guarantee the Physical Ordering of Rows & Data Pages.

-- Final Cleanup
DROP TABLE test