Posts Tagged ‘Doubly-linked list’

Clustered vs NonClustered Indexes… and data sorting in SQL Server

March 2, 2011 3 comments

This post on difference between Clustered Index & Non Clustered Index is a prequel to my blog post on requirement & use of Clustered index & NonClustered index, [link].

As per MS BOL (MSDN) in SQL Server, indexes are organized as B-trees. Each page in an index B-tree is called an index node. The top node of the B-tree is called the root node. The bottom level of nodes in the index is called the leaf nodes. Any index levels between the root and the leaf nodes are collectively known as intermediate levels.

In Clustered index: (MSDN)

1. The Leaf nodes contain the Data pages of the underlying table.

2. The Root and Intermediate level nodes contain Index pages holding Index rows.

3. Each Index row contains a Key value and a pointer to either an Intermediate level page in the B-tree, or a Data row in the Leaf level of the Index. The Pages in each level of the Index are linked in a Doubly-linked list.

4. The Pages in the Data chain and the rows in them are ordered on the value of the Clustered Index key.

5. There can be only one Clustered Index on a table.

6. Does not support Included column, because they already contain all the columns which are not in the index as Included columns.

In NonClustered Index: (MSDN)

1. The Leaf layer of a NonClustered index is made up of Index pages instead of Data pages.

2. Each Index row in the NonClustered index contains the NonClustered Key value and a row locator. This locator points to the Data row in the Clustered index or Heap having the Key value.

2.a. If the table is a Heap, which means it does not have a Clustered index, the row locator is a pointer to the row.

2.b. If the table has a Clustered index, or the index is on an Indexed view, the row locator is the Clustered index Key for the row. SQL Server retrieves the data row by searching the Clustered index using the Clustered index Key stored in the Leaf row of the NonClustered index.

3. The Data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.

4. Each table can have up to 249 & 999 nonclustered indexes on SQL Server 2005 & 2008 respectively.

Indexing & data Sorting: (MSDN)

As per MS BOL, a Clustered Index only reorganizes the data pages so that the rows are logically sorted in Clustered Index order. The pages are not guaranteed to be ordered physically. SQL Server doesn’t necessarily store the data physically on the disk in clustered-index order, but while creating an index, SQL Server attempts to physically order the data as close to the logical order as possible. Each page in an index’s leaf level has a pointer to the page that logically precedes the current page and to the page that logically follows the current page, thereby creating a doubly linked list. The sysindexes table contains the address of the first leaf-level page. Because the data is guaranteed to be logically in clustered-index order, SQL Server can just start at the first page and follow the index pointers from one page to the next to retrieve the data in order.

So its not guaranteed about the physical ordering of records/rows if a table has Clustered Index on it. It is a common misconsecption among people that Clustered Index sorts data physically & Non Clustered Index sorts data logically.

Also discussed this topic on MSDN’s TSQL forum and got several expert comments & answers:

More Info on ordering: