Clustered Index do “NOT” guarantee Physically Ordering or Sorting of Rows

Home > Indexes, Optimization Performance, SQL DB Engine > Clustered Index do “NOT” guarantee Physically Ordering or Sorting of Rows

Clustered Index do “NOT” guarantee Physically Ordering or Sorting of Rows

June 2, 2013 Leave a comment Go to comments

Myth: Lot of SQL Developers I’ve worked with and interviewed have this misconception, that Clustered Indexes are Physically Sorted and Non-Clustered Index are Logically Sorted. When I ask them reasons, all come with their own different versions.

I discussed about this in my previous post long time back, where I mentioned as per the MSDN that “Clustered Indexes do not guarantee Physical Ordering of rows in a Table”. And today in this post I’m going to provide an example to prove it.

Clustered Index is the table itself as its Leaf-Nodes contains the Data-Pages of the table. The Data-Pages are nothing but Doubly Linked-List Data-Structures linked to their leading & preceding Data-Pages. The Data-Pages and rows in them are Ordered by the Clustering Key. Thus there can be only one Clustered Index on a table.

When a new Clustered Index is created on a table, it reorganizes the table’s Data-Pages Physically so that the rows are Logically sorted. But when a tables goes through updates, like INSERT/UPDATE/DELETE operations then the index gets fragmented and thus the Physical Ordering of Data-Pages gets lost. But still the Data-Pages of the Clustered Index or the Table are chained together by Doubly Linked-List in Logical Order.

We will see this behavior in the following example:

USE AdventureWorks2012
GO

CREATE TABLE test (i INT PRIMARY KEY not null, j VARCHAR(2000))

INSERT INTO test (i,j)
SELECT 1, REPLICATE('a',100) UNION ALL
SELECT 2, REPLICATE('a',100) UNION ALL
SELECT 3, REPLICATE('a',100) UNION ALL
SELECT 5, REPLICATE('a',100) UNION ALL
SELECT 6, REPLICATE('a',100)

SELECT * FROM test

DBCC IND(AdventureWorks2012, 'test', -1);

DBCC TRACEON(3604);
DBCC PAGE (AdventureWorks2012, 1, 22517, 1);

As you can see above we inserted 5 records in a table having PK as Clustering Key with following values: 1,2,3,5,6. We skipped the value 4.

–> Here is the Output of DBCC PAGE command execute in the end:
The DBCC PAGE provides very descriptive Output, but here I’ve just extracted the useful information we need.

OFFSET TABLE:

Row - Offset                        
4 (0x4) - 556 (0x22c)           --> 1=6
3 (0x3) - 441 (0x1b9)           --> 1=5
2 (0x2) - 326 (0x146)           --> 1=3
1 (0x1) - 211 (0xd3)            --> 1=2    
0 (0x0) - 96  (0x60)		--> 1=1

It show the 5 rows with their memory allocated address in both decimal and hexadecimal formats, which is contiguous. Thus all the rows are Physically Ordered.

Now, let’s insert the value “4” that we skipped initially:

INSERT INTO test (i,j)
SELECT 4, REPLICATE('a',100)

SELECT * FROM test

DBCC IND(AdventureWorks2012, 'test', -1);

DBCC TRACEON(3604);
DBCC PAGE (AdventureWorks2012, 1, 22517, 1);

–> Here is the new Output of DBCC PAGE command execute in the end:

OFFSET TABLE:

Row - Offset                        
5 (0x5) - 556 (0x22c)           --> 1=6
4 (0x4) - 441 (0x1b9)           --> 1=5
3 (0x3) - 671 (0x29f)           --> 1=4           <-- new row added here
2 (0x2) - 326 (0x146)           --> 1=3
1 (0x1) - 211 (0xd3)            --> 1=2    
0 (0x0) - 96  (0x60)		--> 1=1

As you can see very clearly in the output above that the memory allocation address for the new row with PK value = “4” is 671, which is not in between and greater than the address of the last row i.e. 556. The memory address of the last 2 rows is unchanged (441 & 556) and the new row is not accommodated in between as per the sort order but at the end. There is no Physical Ordering and the new row is Logically Chained with other rows and thus is Logically and Ordered, similar to the image below:

Myth Busted: Hope the above exercise clears that Clustered Indexes do not guarantee the Physical Ordering of Rows & Data Pages.

-- Final Cleanup
DROP TABLE test

Categories: Indexes, Optimization Performance, SQL DB Engine Tags: Clustered Index, Data Pages, DBCC PAGE, Doubly LinkList, Logical Order, Logical Sort, Physical Order, Physical Sort

Comments (5) Trackbacks (4) Leave a comment Trackback

Rahul Vaidya

July 2, 2013 at 7:32 am

Reply

I think you’ve confused memory address with physical disk address. With a clustered index, the data is stored on the physical disk in order according to the index, since HDDs have a physical spindle internally and benefit from having data ordered sequentially. This does not guarantee any sort of ordering for the memory address. There is no reason to reorder anything with the memory address (in this current scope) since memory is random-access, and will not gain any performance benefits from reordering.

You do, however get another benefit from clustered indexing, in that if you use a clustered index your related rows will probably end up being in the same data page. This will yield benefits for both RAM as well as disk. A good discussion about this topic is here:

http://dba.stackexchange.com/questions/8496/is-the-concept-of-a-clustered-index-in-a-db-design-sensical-when-using-ssds
- manub22
  
  July 16, 2013 at 11:45 am
  
  Reply
  
  Hi Rahul,
  Thanks for your comments. Agree with your comments and I’m sorry if you see any confusion by reading this post.
  
  I just want to make sure that people in SQL Server community thinks that Clustered indexes are Physically sorted, and I’ve seen this type of expectation in Interview questions.
  
  One thing about Disks, as they are random access, so to locate to a certain address the Spindle head would need to do a seek and thus involve some latency. But this won’t be any issue with SSDs.
Channdeep Singh

February 28, 2016 at 1:15 am

Reply

Great post. Long time doubt cleared now.
- Manoj Pandey
  
  February 29, 2016 at 12:52 pm
  
  Reply
  
  thanks @Channdeep 🙂
Pradeep Nvs

January 6, 2017 at 6:26 pm

Reply

thanks for the post