Archive

Archive for the ‘Differences’ Category

EXECUTE (or EXEC) vs sp_executesql

March 22, 2011 6 comments

–> EXECUTE:
As per MS BOL EXECUTE executes a command string or character string within a TSQL batch, or one of the following modules: system stored procedure, user-defined stored procedure, scalar-valued user-defined function, or extended stored procedure. The TSQL query can be a direct string or a variable of char, varchar, nchar, or nvarchar data type.

–> sp_executesql:
As per MS BOL sp_executesql executes a TSQL statement or batch that can be reused many times, or one that has been built dynamically. The TSQL statement or batch can contain embedded parameters. The SQL query is a Unicode string or a Unicode variable that contains a Transact-SQL statement or batch. Here the variable datatype is restricted to Unicode nchar or nvarchar only. If a Unicode constant (SQL string) is used then the it must be prefixed with N.

–> Main difference performance wise: sp_executesql is generally preferred over EXEC() when executing dynamic T-SQL. sp_executesql works by creating a stored procedure using the specified query, then calling it using the supplied parameters. Unlike EXEC(), sp_executesql provides a mechanism that allows you to parameterize dynamic T-SQL and encourage plan reuse. A dynamic query that is executed using sp_executesql has a much better chance of avoiding unnecessary compilation and resource costs than one ran using EXEC().

Let’s check the difference with a simple example using [AdventureWorks] database’s [Person].[Contact] table:

USE [AdventureWorks]
GO

DECLARE @str NVARCHAR(1000)
DECLARE @fName NVARCHAR(20)
DECLARE @ePromo INT

SET @fName = N'James'
SET @ePromo = 1

--// Using EXECUTE or EXEC
SET @str = N'
SELECT ContactID, FirstName, MiddleName, LastName, EmailAddress, Phone, EmailPromotion
FROM Person.Contact
WHERE FirstName = ''' + @fName + '''
AND EmailPromotion = ' + CAST(@ePromo as NVARCHAR(20))

PRINT @str

EXEC (@str)
GO

--// Using sp_executesql
DECLARE @str NVARCHAR(1000)
DECLARE @fName NVARCHAR(20)
DECLARE @ePromo INT

DECLARE @paramList NVARCHAR(500)
SET @paramList = N'@fNameParam NVARCHAR(20), @ePromoParam INT'

SET @str = N'
SELECT ContactID, FirstName, MiddleName, LastName, EmailAddress, Phone, EmailPromotion
FROM Person.Contact
WHERE FirstName = @fNameParam
AND EmailPromotion = @ePromoParam'

SET @fName = N'James'
SET @ePromo = 1
EXECUTE sp_executesql @str, @paramList, @fNameParam=@fName, @ePromoParam=@ePromo

-- When the match is LIKE
SET @str = N'
SELECT ContactID, FirstName, MiddleName, LastName, EmailAddress, Phone, EmailPromotion
FROM Person.Contact
WHERE FirstName LIKE ''%'' + @fNameParam + ''%''
AND EmailPromotion = @ePromoParam'

SET @fName = N'James'
SET @ePromo = 0
EXECUTE sp_executesql @str, @paramList, @fNameParam=@fName, @ePromoParam=@ePromo
GO

Security Note:
Before you call EXECUTE or sp_executesql with a character string, validate the character string. Never execute a command constructed from user input that has not been validated. For more information, see SQL Injection.

Clustered vs NonClustered Indexes… and data sorting in SQL Server

March 2, 2011 3 comments

This post on difference between Clustered Index & Non Clustered Index is a prequel to my blog post on requirement & use of Clustered index & NonClustered index, [link].

As per MS BOL (MSDN) in SQL Server, indexes are organized as B-trees. Each page in an index B-tree is called an index node. The top node of the B-tree is called the root node. The bottom level of nodes in the index is called the leaf nodes. Any index levels between the root and the leaf nodes are collectively known as intermediate levels.
 

In Clustered index: (MSDN)

1. The Leaf nodes contain the Data pages of the underlying table.

2. The Root and Intermediate level nodes contain Index pages holding Index rows.

3. Each Index row contains a Key value and a pointer to either an Intermediate level page in the B-tree, or a Data row in the Leaf level of the Index. The Pages in each level of the Index are linked in a Doubly-linked list.

4. The Pages in the Data chain and the rows in them are ordered on the value of the Clustered Index key.

5. There can be only one Clustered Index on a table.

6. Does not support Included column, because they already contain all the columns which are not in the index as Included columns.
 

In NonClustered Index: (MSDN)

1. The Leaf layer of a NonClustered index is made up of Index pages instead of Data pages.

2. Each Index row in the NonClustered index contains the NonClustered Key value and a row locator. This locator points to the Data row in the Clustered index or Heap having the Key value.

2.a. If the table is a Heap, which means it does not have a Clustered index, the row locator is a pointer to the row.

2.b. If the table has a Clustered index, or the index is on an Indexed view, the row locator is the Clustered index Key for the row. SQL Server retrieves the data row by searching the Clustered index using the Clustered index Key stored in the Leaf row of the NonClustered index.

3. The Data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.

4. Each table can have up to 249 & 999 nonclustered indexes on SQL Server 2005 & 2008 respectively.
 

Indexing & data Sorting: (MSDN)

As per MS BOL, a Clustered Index only reorganizes the data pages so that the rows are logically sorted in Clustered Index order. The pages are not guaranteed to be ordered physically. SQL Server doesn’t necessarily store the data physically on the disk in clustered-index order, but while creating an index, SQL Server attempts to physically order the data as close to the logical order as possible. Each page in an index’s leaf level has a pointer to the page that logically precedes the current page and to the page that logically follows the current page, thereby creating a doubly linked list. The sysindexes table contains the address of the first leaf-level page. Because the data is guaranteed to be logically in clustered-index order, SQL Server can just start at the first page and follow the index pointers from one page to the next to retrieve the data in order.

So its not guaranteed about the physical ordering of records/rows if a table has Clustered Index on it. It is a common misconsecption among people that Clustered Index sorts data physically & Non Clustered Index sorts data logically.

Also discussed this topic on MSDN’s TSQL forum and got several expert comments & answers:  http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/1c98a7ee-7e60-4730-a38c-f0e3f0deddba

More Info on ordering: https://sqlwithmanoj.com/2013/06/02/clustered-index-do-not-guarantee-physically-ordering-or-sorting-of-rows/


[NOT] IN, [NOT] EXISTS, JOINS with NULL values

February 15, 2011 3 comments

Using NOT IN could be dangerous and may result to incorrect results at times.

Let’s check this by a step by step example:

use [tempdb]
go

-- Create test tables and fill test data:
CREATE TABLE [dbo].TestTable(AID int NOT NULL)

INSERT INTO TestTable(AID) values (1)
INSERT INTO TestTable(AID) values (2)
INSERT INTO TestTable(AID) values (3)
INSERT INTO TestTable(AID) values (4)
go

create TABLE TableWithNull(BID int NULL DEFAULT(NULL))

INSERT INTO TableWithNull(BID) VALUES(NULL)
INSERT INTO TableWithNull(BID) VALUES(0)
INSERT INTO TableWithNull(BID) VALUES(1)
INSERT INTO TableWithNull(BID) VALUES(2)
GO

-- Check inserted records:
SELECT * FROM TableWithNull -- NULL, 0, 1, 2

select * from TestTable -- 1, 2, 3, 4

 

–> Using IN & NOT IN

select * from TestTable 
where AID in (SELECT BID FROM TableWithNull) -- 1, 2

select * from TestTable 
where AID not in (SELECT BID FROM TableWithNull) -- 0 rows (expected 3, 4)

SQL is a set-based language, any set containing NULL value makes it whole as NULL. Because NULL is unknown and anything added to it also becomes unknown. Thus above SQL statement results 0 rows. To make it work you have to treat the NULL and add a WHERE clause or use EXISTS instead of IN shown below:

select * from TestTable 
where AID not in (SELECT BID FROM TableWithNull where BID is not null) -- 3, 4 (expected result)
go

 

–> Using EXISTS & NOT EXISTS

select * from TestTable A 
where exists (SELECT * FROM TableWithNull B where A.AID = B.BID) -- 1, 2

select * from TestTable A 
where not exists (SELECT * FROM TableWithNull B where A.AID = B.BID) -- 3, 4
go

NOTE: EXISTS keyword is a good way to check sub-queries, this makes them co-related sub-queries. With EXISTS you don’t even need to provide any column name, just use “SELECT *”, as it does not use the SELECT list at all.
With IN you can only compare one column, but with EXISTS you can compare multiple columns within outer & inner queries (sub-queries or derived queries).
 

–> Using JOINS in place of IN() & EXISTS()

select A.* -- 1, 2
from TestTable A
JOIN TableWithNull B ON
A.AID = B.BID

select A.* -- 3, 4
from TestTable A
LEFT JOIN TableWithNull B ON
A.AID = B.BID
WHERE B.BID IS NULL
go

 

–> Here’s another catch, weird behavior:

select * from TestTable 
where AID not in (SELECT AID FROM TableWithNull) -- Query works even when AID column is not in [TableWithNULL] table.
-- [AID] column does not belong to [TableWithNull] table, but the query still work and won't give any error.
-- So its always adviced to use table alias, shown below:

select * from TestTable A 
where A.AID not in (SELECT B.AID FROM TableWithNull B) -- Invalid column name 'AID'.

 

NOTE: So always try to provide ALIAS you your tables and use it with COLUMN names in your queries.

-- Final Cleanup
drop table TableWithNull
drop table TestTable
go

Clustered Indexes, Non Clustered Indexes & why?

February 9, 2011 10 comments

Creating Indexes on tables reduces the query retrieval time and increase the efficiency of SQL queries or statements fired against a database in SQL Server. Indexes are just like a Table of Contents in front side of the book or Index section at the back side of the book.

There are mainly 2 types of Indexes, CLUSTERED & NON-CLUSTERED index which can be created on a table.
– Clustered indexes are similar to a telephone directory where you search a person’s name alphabetically and get his phone number there only.
– Non Clustered indexes are similar to the Index of a book where you get the page number of the item you were searching for. Then turn to that page and read what you were looking for.

According to MS BOL one can create only one Clustered index & as many 249 Non Clustered indexes on a single table.

But why there is a need to create these indexes, what causes the fast retrival of data from the tables.
Let’s check this by creating a large table and creating these Indexes one by one and checking as we go one:

USE [AdventureWorks]
GO

select * from Sales.SalesOrderDetail -- Total 121317 records
select * from Production.Product -- Total 504 records

SELECT s.SalesOrderDetailID, s.SalesOrderID, s.ProductID, p.Name as ProductName, s.ModifiedDate
INTO IndexTestTable
FROM Sales.SalesOrderDetail s
JOIN Production.Product p
on p.ProductID = s.ProductID
GO

-- Test the table without any Indexes which is also a HEAP
SELECT TOP 10 * FROM IndexTestTable

--////////////////////////////////////////////////
--// Scenario 1 : When there is no Clustered Index
--////////////////////////////////////////////////
SET STATISTICS PROFILE ON
SET STATISTICS TIME ON
SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderDetailID = 60000
SET STATISTICS TIME OFF
SET STATISTICS PROFILE OFF
GO
CPU time = 15 ms,  elapsed time = 17 ms.
SELECT [SalesOrderDetailID] FROM [IndexTestTable] WHERE [SalesOrderDetailID]=@1
  |--Table Scan(OBJECT:([AdventureWorks].[dbo].[IndexTestTable]),
WHERE:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderDetailID]=[@1]))

It does a Table SCAN and takes 15ms of CPU time and elapsed time of 17ms. As the table is a HEAP so it will always SCAN the for matching rows in the entire table. To do a SEEK there must be an ordering of rows which can be done by putting a Primary key column which automatically creates a CLUSTERED INDEX OR creating a CLUSTERED INDEX explicitly shown below would do the ordering.

--// Create Clustered Index on SalesOrderDetailID column
CREATE UNIQUE CLUSTERED INDEX IDX_UCI_SalesOrderDetailID
	ON IndexTestTable (SalesOrderDetailID)
GO

--// Check with Clustered Index
SET STATISTICS PROFILE ON
SET STATISTICS TIME ON
SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderDetailID = 60000
SET STATISTICS TIME OFF
SET STATISTICS PROFILE OFF
GO
CPU time = 0 ms,  elapsed time = 1 ms.
SELECT [SalesOrderDetailID] FROM [IndexTestTable] WHERE [SalesOrderDetailID]=@1
  |--Clustered Index Seek(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_UCI_SalesOrderDetailID]),
SEEK:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderDetailID]=[@1]) ORDERED FORWARD)

Now as shown above this does a Clustered Index SEEK after the creation of CLUSTERED Index. After creating an Index the CPU time is reduced to 0ms from 15ms and Elapsed time to 1ms from 17ms. The following queries will also do the SEEK operation:

SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderDetailID > 60000
SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderDetailID > 60000 AND SalesOrderDetailID < 70000
SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderDetailID BETWEEN 60000 AND 70000

--////////////////////////////////////////////////////
--// Scenario 2 : When there is no Non-Clustered Index
--////////////////////////////////////////////////////
SET STATISTICS PROFILE ON
SET STATISTICS TIME ON
SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderID = 65000 AND ProductID = 711
SET STATISTICS TIME OFF
SET STATISTICS PROFILE OFF
GO
CPU time = 16 ms,  elapsed time = 15 ms.
SELECT [SalesOrderDetailID] FROM [IndexTestTable] WHERE [SalesOrderID]=@1 AND [ProductID]>@2
  |--Clustered Index Scan(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_UCI_SalesOrderDetailID]),
WHERE:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderID]=[@1]
AND [AdventureWorks].[dbo].[IndexTestTable].[ProductID]>CONVERT_IMPLICIT(int,[@2],0)))

Now, for every situation or query the query optimizer will not do SEEK. In first query the INDEX was created on SalesOrderDetailID column, so it will not do a SEEK if query is applied on other columns. You would need to create a another INDEX for those columns. But you can create only one CLUSTERED INDEX. But yes you can also create as many as 249 NONCLUSTERED INDEXES on a table, as shown below.

--// Create Non-Clustered Index on SalesOrderID & ProductID columns
CREATE NONCLUSTERED INDEX IDX_NCI_SalesOrderID_ProductID
	ON IndexTestTable (SalesOrderID, ProductID)
GO

--// Check with Non Clustered Index
SET STATISTICS PROFILE ON
SET STATISTICS TIME ON
SELECT SalesOrderDetailID FROM IndexTestTable WHERE SalesOrderID = 65000 AND ProductID = 711
SET STATISTICS TIME OFF
SET STATISTICS PROFILE OFF
GO
CPU time = 0 ms,  elapsed time = 1 ms.
SELECT [SalesOrderDetailID] FROM [IndexTestTable] WHERE [SalesOrderID]=@1 AND [ProductID]>@2
  |--Index Seek(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_NCI_SalesOrderID_ProductID]),
SEEK:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderID]=[@1]
AND [AdventureWorks].[dbo].[IndexTestTable].[ProductID] >
CONVERT_IMPLICIT(int,[@2],0)) ORDERED FORWARD)

It does an Index SEEK, not Clustered Index SEEK. So this time it uses the NONCLUSTERED Index to SEEK the matching rows.

--//////////////////////////////////////////////////////////////////////////
--// Scenario 3 : When there is no Non-Clustered Index with Included Columns
--//////////////////////////////////////////////////////////////////////////
SET STATISTICS PROFILE ON
SET STATISTICS TIME ON
SELECT ProductName FROM IndexTestTable WHERE SalesOrderID = 65000 AND ProductID = 711
SET STATISTICS TIME OFF
SET STATISTICS PROFILE OFF
GO
CPU time = 0 ms,  elapsed time = 1 ms.
SELECT [ProductName] FROM [IndexTestTable] WHERE [SalesOrderID]=@1 AND [ProductID]=@2
  |--Nested Loops(Inner Join, OUTER REFERENCES:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderDetailID]))
       |--Index Seek(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_NCI_SalesOrderID_ProductID]),
SEEK:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderID]=(65000)
AND [AdventureWorks].[dbo].[IndexTestTable].[ProductID]=(711)) ORDERED FORWARD)
       |--Clustered Index Seek(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_UCI_SalesOrderDetailID]),
SEEK:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderDetailID]=
[AdventureWorks].[dbo].[IndexTestTable].[SalesOrderDetailID]) LOOKUP ORDERED FORWARD)

It still does an Index SEEK, no problem. Please note that it uses both the Indexes CLUSTERED & NONCLUSTERED.
But when you change the WHERE clause and increase the range of selected items then it does a Index SCAN.

SET STATISTICS PROFILE ON
SET STATISTICS TIME ON
SELECT ProductName FROM IndexTestTable WHERE SalesOrderID > 57916 AND ProductID > 900
SET STATISTICS TIME OFF
SET STATISTICS PROFILE OFF
GO
CPU time = 32 ms,  elapsed time = 709 ms.
SELECT [ProductName] FROM [IndexTestTable] WHERE [SalesOrderID]>@1 AND [ProductID]>@2
  |--Clustered Index Scan(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_UCI_SalesOrderDetailID]),
WHERE:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderID]>(57916)
AND [AdventureWorks].[dbo].[IndexTestTable].[ProductID]>(900)))
--// Create NONCLUSTERED Covering Index on SalesOrderID, ProductID with Included Columns on SalesOrderDetailID
CREATE NONCLUSTERED INDEX IDX_NCCI_IndexTestTable_SalesOrderID_ProductID
	ON IndexTestTable (SalesOrderID, ProductID) INCLUDE (ProductName)
GO

SET STATISTICS PROFILE ON
SET STATISTICS TIME ON
SELECT ProductName FROM IndexTestTable WHERE SalesOrderID > 57916 AND ProductID > 900
SET STATISTICS TIME OFF
SET STATISTICS PROFILE OFF
GO
CPU time = 0 ms,  elapsed time = 654 ms.
SELECT [ProductName] FROM [IndexTestTable] WHERE [SalesOrderID]>@1
AND [ProductID]>@2
  |--Index Seek(OBJECT:([AdventureWorks].[dbo].[IndexTestTable].[IDX_NCCI_IndexTestTable_SalesOrderID_ProductID]),
SEEK:([AdventureWorks].[dbo].[IndexTestTable].[SalesOrderID] > (57916)),
WHERE:([AdventureWorks].[dbo].[IndexTestTable].[ProductID]>(900)) ORDERED FORWARD)

Now, again the query optimizer uses Index SEEK, but it uses the new NONCLUSTERED INDEX with INCLUDED column.

-- Final Cleanup
DROP TABLE IndexTestTable

Related MS BOL links: http://msdn.microsoft.com/en-us/library/aa933131%28v=sql.80%29.aspx

BOL links for CLUSTERED Index vs NONCLUSTERED Index:-
http://www.devtoolshed.com/content/clustered-index-vs-non-clustered-index-sql-server
http://www.mssqlcity.com/FAQ/General/clustered_vs_nonclustered_indexes.htm
http://forums.devx.com/showthread.php?t=19018

Avoid CURSORS? why not use them optimally…

February 7, 2011 5 comments

CURSORS or WHILE loops with temp-tables & counter, what do you prefer, personally and perofrmance wise?

This has been discussed in lots of forums, threads, posts and blogs previously. Many people and experts claim to use the either one and most of them are inclined to WHILE loops, and suggest to avoid CURSORS without any proof and logic.
Links: http://blog.sqlauthority.com/2008/05/21/sql-server-2005-twelve-tips-for-optimizing-sql-server-2005-query-performance/
http://www.c-sharpcorner.com/UploadFile/skumaar_mca/CursorsAlternative09012009011823AM/CursorsAlternative.aspx
http://www.sqlbook.com/SQL/Avoiding-using-SQL-Cursors-20.aspx
… and many more.

To make this more clear I tested this scenario myself and also posted this case in MSDN TSQL forum.

use [tempdb]
GO

create table T1 (sn int identity(1,1) primary key, data varchar(1000))
GO

insert into T1 (data)
values (replicate('a',1000))
GO 10000

select * from T1

create table T2 (sn int primary key, data varchar(1000))
create table T3 (sn int primary key, data varchar(1000))

-- Test the CURSOR, pull record from T1 and insert into T2
set nocount on

declare @stDate datetime
set @stDate = getdate()
declare @sn int , @data varchar(1000)

declare cr cursor FORWARD_ONLY FAST_FORWARD READ_ONLY
for select sn, data from T1

open cr
fetch next from cr into @sn, @data
while @@fetch_status=0
begin
	insert into T2
	select @sn, @data
	fetch next from cr into @sn, @data
end
close cr
deallocate cr

select Datediff(ms,@stDate,getdate()) -- Ran 4 times, it gives me 966, 413, 310, 306 ms
GO

-- Test the WHILE loop with counter (NO CURSOR), pull record from T1 and insert into T3
set nocount on

declare @stDate datetime
set @stDate = getdate()
declare @ctr int
set @ctr=0

while @ctr<=10000
begin
	insert into T3
	select sn, data
	from T1
	where sn = @ctr

	set @ctr = @ctr + 1
end

select Datediff(ms,@stDate,getdate()) -- Ran 4 times, it gives me: 1070, 450, 503, 423 ms
GO

--Final Cleanup
drop table T2
drop table T3
drop table T1

I ran the above code for CURSOR & WHILE loop 4 times and it gave me less execution time for CURSOR, everytime.

While using CURSORS the main thing most people miss out are the options that are available to optimize CURSOR use.
According to MS BOL they are:

DECLARE cursor_name CURSOR [ LOCAL | GLOBAL ]
     [ FORWARD_ONLY | SCROLL ]
     [ STATIC | KEYSET | DYNAMIC | FAST_FORWARD ]
     [ READ_ONLY | SCROLL_LOCKS | OPTIMISTIC ]

To know more check this link: http://msdn.microsoft.com/en-us/library/ms180169.aspx

This prove that CURSORs are more performant than WHILE loops. I’m open for your comments, thanks.

MSDN links:
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/e67105a6-0f4a-4a12-85b9-e7e9855279e7/
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/2978d387-fcd0-45bb-bf69-80139b6dac53
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/ba15132e-c26d-4472-92e9-845ce2cc244d/