Archive
Clustered Indexes, Non Clustered Indexes & why?
Creating Indexes on tables reduces the query retrieval time and increase the efficiency of SQL queries or statements fired against a database in SQL Server. Indexes are just like a Table of Contents in front side of the book or Index section at the back side of the book.
There are mainly 2 types of Indexes, CLUSTERED & NON-CLUSTERED index which can be created on a table.
– Clustered indexes are similar to a telephone directory where you search a person’s name alphabetically and get his phone number there only.
– Non Clustered indexes are similar to the Index of a book where you get the page number of the item you were searching for. Then turn to that page and read what you were looking for.
According to MS BOL one can create only one Clustered index & as many 249 Non Clustered indexes on a single table.
But why there is a need to create these indexes, what causes the fast retrival of data from the tables.
Let’s check this by creating a large table and creating these Indexes one by one and checking as we go one:
USE [AdventureWorks] GO select * from Sales.SalesOrderDetail -- Total 121317 records select * from Production.Product -- Total 504 records SELECT s.SalesOrderDetailID, s.SalesOrderID, s.ProductID, p.Name as ProductName, s.ModifiedDate INTO IndexTestTable FROM Sales.SalesOrderDetail s JOIN Production.Product p on p.ProductID = s.ProductID GO -- Test the table without any Indexes which is also a HEAP SELECT TOP 10 * FROM IndexTestTable --//////////////////////////////////////////////// --// Scenario 1 : When there is no Clustered Index --//////////////////////////////////////////////// SET STATISTICS PROFILE ON SET STATISTICS TIME ON SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderDetailID = 60000 SET STATISTICS TIME OFF SET STATISTICS PROFILE OFF GO
CPU time = 15 ms, elapsed time = 17 ms. SELECT [SalesOrderDetailID] FROM [IndexTestTable] WHERE [SalesOrderDetailID]=@1 |--Table Scan(OBJECT:([AdventureWorks].[dbo].[IndexTestTable]), WHERE:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderDetailID]=[@1]))
It does a Table SCAN and takes 15ms of CPU time and elapsed time of 17ms. As the table is a HEAP so it will always SCAN the for matching rows in the entire table. To do a SEEK there must be an ordering of rows which can be done by putting a Primary key column which automatically creates a CLUSTERED INDEX OR creating a CLUSTERED INDEX explicitly shown below would do the ordering.
--// Create Clustered Index on SalesOrderDetailID column CREATE UNIQUE CLUSTERED INDEX IDX_UCI_SalesOrderDetailID ON IndexTestTable (SalesOrderDetailID) GO --// Check with Clustered Index SET STATISTICS PROFILE ON SET STATISTICS TIME ON SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderDetailID = 60000 SET STATISTICS TIME OFF SET STATISTICS PROFILE OFF GO
CPU time = 0 ms, elapsed time = 1 ms. SELECT [SalesOrderDetailID] FROM [IndexTestTable] WHERE [SalesOrderDetailID]=@1 |--Clustered Index Seek(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_UCI_SalesOrderDetailID]), SEEK:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderDetailID]=[@1]) ORDERED FORWARD)
Now as shown above this does a Clustered Index SEEK after the creation of CLUSTERED Index. After creating an Index the CPU time is reduced to 0ms from 15ms and Elapsed time to 1ms from 17ms. The following queries will also do the SEEK operation:
SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderDetailID > 60000 SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderDetailID > 60000 AND SalesOrderDetailID < 70000 SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderDetailID BETWEEN 60000 AND 70000 --//////////////////////////////////////////////////// --// Scenario 2 : When there is no Non-Clustered Index --//////////////////////////////////////////////////// SET STATISTICS PROFILE ON SET STATISTICS TIME ON SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderID = 65000 AND ProductID = 711 SET STATISTICS TIME OFF SET STATISTICS PROFILE OFF GO
CPU time = 16 ms, elapsed time = 15 ms. SELECT [SalesOrderDetailID] FROM [IndexTestTable] WHERE [SalesOrderID]=@1 AND [ProductID]>@2 |--Clustered Index Scan(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_UCI_SalesOrderDetailID]), WHERE:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderID]=[@1] AND [AdventureWorks].[dbo].[IndexTestTable].[ProductID]>CONVERT_IMPLICIT(int,[@2],0)))
Now, for every situation or query the query optimizer will not do SEEK. In first query the INDEX was created on SalesOrderDetailID column, so it will not do a SEEK if query is applied on other columns. You would need to create a another INDEX for those columns. But you can create only one CLUSTERED INDEX. But yes you can also create as many as 249 NONCLUSTERED INDEXES on a table, as shown below.
--// Create Non-Clustered Index on SalesOrderID & ProductID columns CREATE NONCLUSTERED INDEX IDX_NCI_SalesOrderID_ProductID ON IndexTestTable (SalesOrderID, ProductID) GO --// Check with Non Clustered Index SET STATISTICS PROFILE ON SET STATISTICS TIME ON SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderID = 65000 AND ProductID = 711 SET STATISTICS TIME OFF SET STATISTICS PROFILE OFF GO
CPU time = 0 ms, elapsed time = 1 ms. SELECT [SalesOrderDetailID] FROM [IndexTestTable] WHERE [SalesOrderID]=@1 AND [ProductID]>@2 |--Index Seek(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_NCI_SalesOrderID_ProductID]), SEEK:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderID]=[@1] AND [AdventureWorks].[dbo].[IndexTestTable].[ProductID] > CONVERT_IMPLICIT(int,[@2],0)) ORDERED FORWARD)
It does an Index SEEK, not Clustered Index SEEK. So this time it uses the NONCLUSTERED Index to SEEK the matching rows.
--////////////////////////////////////////////////////////////////////////// --// Scenario 3 : When there is no Non-Clustered Index with Included Columns --////////////////////////////////////////////////////////////////////////// SET STATISTICS PROFILE ON SET STATISTICS TIME ON SELECT ProductName FROM IndexTestTable WHERE SalesOrderID = 65000 AND ProductID = 711 SET STATISTICS TIME OFF SET STATISTICS PROFILE OFF GO
CPU time = 0 ms, elapsed time = 1 ms.
SELECT [ProductName] FROM [IndexTestTable] WHERE [SalesOrderID]=@1 AND [ProductID]=@2
|--Nested Loops(Inner Join, OUTER REFERENCES:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderDetailID]))
|--Index Seek(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_NCI_SalesOrderID_ProductID]),
SEEK:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderID]=(65000)
AND [AdventureWorks].[dbo].[IndexTestTable].[ProductID]=(711)) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_UCI_SalesOrderDetailID]),
SEEK:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderDetailID]=
[AdventureWorks].[dbo].[IndexTestTable].[SalesOrderDetailID]) LOOKUP ORDERED FORWARD)
It still does an Index SEEK, no problem. Please note that it uses both the Indexes CLUSTERED & NONCLUSTERED.
But when you change the WHERE clause and increase the range of selected items then it does a Index SCAN.
SET STATISTICS PROFILE ON SET STATISTICS TIME ON SELECT ProductName FROM IndexTestTable WHERE SalesOrderID > 57916 AND ProductID > 900 SET STATISTICS TIME OFF SET STATISTICS PROFILE OFF GO
CPU time = 32 ms, elapsed time = 709 ms. SELECT [ProductName] FROM [IndexTestTable] WHERE [SalesOrderID]>@1 AND [ProductID]>@2 |--Clustered Index Scan(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_UCI_SalesOrderDetailID]), WHERE:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderID]>(57916) AND [AdventureWorks].[dbo].[IndexTestTable].[ProductID]>(900)))
--// Create NONCLUSTERED Covering Index on SalesOrderID, ProductID with Included Columns on SalesOrderDetailID CREATE NONCLUSTERED INDEX IDX_NCCI_IndexTestTable_SalesOrderID_ProductID ON IndexTestTable (SalesOrderID, ProductID) INCLUDE (ProductName) GO SET STATISTICS PROFILE ON SET STATISTICS TIME ON SELECT ProductName FROM IndexTestTable WHERE SalesOrderID > 57916 AND ProductID > 900 SET STATISTICS TIME OFF SET STATISTICS PROFILE OFF GO
CPU time = 0 ms, elapsed time = 654 ms. SELECT [ProductName] FROM [IndexTestTable] WHERE [SalesOrderID]>@1 AND [ProductID]>@2 |--Index Seek(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_NCCI_IndexTestTable_SalesOrderID_ProductID]), SEEK:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderID] > (57916)), WHERE:([AdventureWorks].[dbo].[IndexTestTable].[ProductID]>(900)) ORDERED FORWARD)
Now, again the query optimizer uses Index SEEK, but it uses the new NONCLUSTERED INDEX with INCLUDED column.
-- Final Cleanup DROP TABLE IndexTestTable
Related MS BOL links: http://msdn.microsoft.com/en-us/library/aa933131%28v=sql.80%29.aspx
BOL links for CLUSTERED Index vs NONCLUSTERED Index:-
http://www.devtoolshed.com/content/clustered-index-vs-non-clustered-index-sql-server
http://www.mssqlcity.com/FAQ/General/clustered_vs_nonclustered_indexes.htm
http://forums.devx.com/showthread.php?t=19018
Avoid CURSORS? why not use them optimally…
CURSORS or WHILE loops with temp-tables & counter, what do you prefer, personally and perofrmance wise?
This has been discussed in lots of forums, threads, posts and blogs previously. Many people and experts claim to use the either one and most of them are inclined to WHILE loops, and suggest to avoid CURSORS without any proof and logic.
Links: http://blog.sqlauthority.com/2008/05/21/sql-server-2005-twelve-tips-for-optimizing-sql-server-2005-query-performance/
http://www.c-sharpcorner.com/UploadFile/skumaar_mca/CursorsAlternative09012009011823AM/CursorsAlternative.aspx
http://www.sqlbook.com/SQL/Avoiding-using-SQL-Cursors-20.aspx
… and many more.
To make this more clear I tested this scenario myself and also posted this case in MSDN TSQL forum.
use [tempdb]
GO
create table T1 (sn int identity(1,1) primary key, data varchar(1000))
GO
insert into T1 (data)
values (replicate('a',1000))
GO 10000
select * from T1
create table T2 (sn int primary key, data varchar(1000))
create table T3 (sn int primary key, data varchar(1000))
-- Test the CURSOR, pull record from T1 and insert into T2
set nocount on
declare @stDate datetime
set @stDate = getdate()
declare @sn int , @data varchar(1000)
declare cr cursor FORWARD_ONLY FAST_FORWARD READ_ONLY
for select sn, data from T1
open cr
fetch next from cr into @sn, @data
while @@fetch_status=0
begin
insert into T2
select @sn, @data
fetch next from cr into @sn, @data
end
close cr
deallocate cr
select Datediff(ms,@stDate,getdate()) -- Ran 4 times, it gives me 966, 413, 310, 306 ms
GO
-- Test the WHILE loop with counter (NO CURSOR), pull record from T1 and insert into T3
set nocount on
declare @stDate datetime
set @stDate = getdate()
declare @ctr int
set @ctr=0
while @ctr<=10000
begin
insert into T3
select sn, data
from T1
where sn = @ctr
set @ctr = @ctr + 1
end
select Datediff(ms,@stDate,getdate()) -- Ran 4 times, it gives me: 1070, 450, 503, 423 ms
GO
--Final Cleanup
drop table T2
drop table T3
drop table T1
I ran the above code for CURSOR & WHILE loop 4 times and it gave me less execution time for CURSOR, everytime.
While using CURSORS the main thing most people miss out are the options that are available to optimize CURSOR use.
According to MS BOL they are:
DECLARE cursor_name CURSOR [ LOCAL | GLOBAL ]
[ FORWARD_ONLY | SCROLL ]
[ STATIC | KEYSET | DYNAMIC | FAST_FORWARD ]
[ READ_ONLY | SCROLL_LOCKS | OPTIMISTIC ]
To know more check this link: http://msdn.microsoft.com/en-us/library/ms180169.aspx
This prove that CURSORs are more performant than WHILE loops. I’m open for your comments, thanks.
MSDN links:
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/e67105a6-0f4a-4a12-85b9-e7e9855279e7/
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/2978d387-fcd0-45bb-bf69-80139b6dac53
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/ba15132e-c26d-4472-92e9-845ce2cc244d/
UNION vs UNION ALL | which one is faster?
UNION and UNION ALL are used to combine data or record sets from two different tables. One could combine more than 2 tables. Just like JOINS, UNION combines data into a single record-set but vertically by adding rows from another table. JOINS combine data horizontally by adding columns from another table.
UNION insures you get DISTINCT records from both the tables.
UNION ALL pulls out all records from both the tables with duplicates.
Generally “UNION ALL” is considered best in performance when compared with UNION.
But you should only use “UNION ALL” when you are sure that you have distinct or no common records in both the records-sets.
For more information on this check my earlier blog post which shows the main differences and how performance differs in b/w the two: https://sqlwithmanoj.wordpress.com/2010/12/30/why-union-all-is-faster-than-union/
Why UNION ALL is faster than UNION?
UNION and UNION ALL are used to combine data or record sets from two different tables. One could combine more than 2 tables. Just like JOINS, UNION combines data into a single record-set but vertically by adding rows from another table. JOINS combine data horizontally by adding columns from another table.
UNION insures you get DISTINCT records from both the tables.
UNION ALL pulls out all records from both the tables with duplicates.
SQL Scripts below proves why UNION ALL is faster than UNION:
-- Create First table: #tempTable1 select FirstName, LastName into #tempTable1 from Person.Contact where ContactID <= 100 -- Create Second table: #tempTable2 select FirstName, LastName into #tempTable2 from Person.Contact where ContactID > 100 and ContactID <= 200
Comparison:
SET STATISTICS PROFILE ON -- Using UNION - Query Cost: 35% select * from #tempTable1 UNION select * from #tempTable2
Query Plan for UNION:
|--Sort(DISTINCT ORDER BY:([Union1008] ASC, [Union1009] ASC))
|--Concatenation
|--Table Scan(OBJECT:([tempdb].[dbo].[#tempTable1]))
|--Table Scan(OBJECT:([tempdb].[dbo].[#tempTable2]))
-- Using UNION ALL - Query Cost: 13% select * from #tempTable1 UNION ALL select * from #tempTable2
Query Plan for UNION ALL:
|--Concatenation
|--Table Scan(OBJECT:([tempdb].[dbo].[#tempTable1]))
|--Table Scan(OBJECT:([tempdb].[dbo].[#tempTable2]))
-- Using UNION ALL with DISTINCT - Query Cost: 52% select DISTINCT * from #tempTable1 UNION ALL select DISTINCT * from #tempTable2
Query Plan for UNION ALL with DISTINCT:
|--Concatenation
|--Sort(DISTINCT ORDER BY:([tempdb].[dbo].[#tempTable1].[FirstName] ASC, [tempdb].[dbo].[#tempTable1].[LastName] ASC))
| |--Table Scan(OBJECT:([tempdb].[dbo].[#tempTable1]))
|--Sort(DISTINCT ORDER BY:([tempdb].[dbo].[#tempTable2].[FirstName] ASC, [tempdb].[dbo].[#tempTable2].[LastName] ASC))
|--Table Scan(OBJECT:([tempdb].[dbo].[#tempTable2]))
SET STATISTICS PROFILE OFF -- Final Cleanup drop table #tempTable1 drop table #tempTable2
The Query plan for:
– UNION ALL SELECT above shows it just does a concatenation.
– UNION SELECT above shows it does a SORT on final concatenation.
– UNION ALL with DISTINCT SELECT above shows it does a SORT of individual tables than does the final concatenation.
The Query cost with:
– UNION ALL = 13%
– UNION = 35%
– UNION ALL with DISTINCT = 52%
This proves that:
- UNION ALL is faster and more optimized than UNION. But this does not mean you use UNION ALL in every scenario.
- UNION is not equivalent to “UNION ALL with DISTINCT”.




