Archive
Enhanced OVER() Clause in SQL Server 2012 – Part2
First of all I would like to wish my readers a very Happy and Prosperous New Year – 2013!!!
This is my 1st post in this new year and I will try to come up with more informative and interesting topics related to SQL Server.
Coming back to the topic, sometime back in Part-1 [link] I discussed about how to use both PARTITION BY & ORDER BY clause within OVER() clause when using it with AGGREGATE functions, like SUM().
Here in this post we will see other features that are added to the OVER() clause.
ROWS/RANGE are new the 2 new clauses that you can use inside the OVER() clause to limit the records within the Partition by specifying start and end within the partition. They use a new set of keywords to define the boundaries within the partition, i.e. PREECEDING, FOLLOWING mixed with UNBOUNDED or CURRENT, or number of rows. The PRECEDING and FOLLOWING rows are defined based on the ordering in the ORDER BY clause of the query.
#1. Using ROWS/RANGE UNBOUNDED PRECEDING with OVER() clause:
Let’s check with 1st example where I want to calculate Cumulative Totals or Running Totals at each row. This total is calculated by the SUM of current and all previous rows.
We will use UNBOUND PRECEDING option here, which means that the window starts from the 1st row till the current row.
USE [AdventureWorks2012] GO -- To Calculate Cumulative SUM or Running Totals: ;WITH CTE AS ( SELECT BusinessEntityID AS SalesPersonID, CAST([Rate] AS DECIMAL(10,0))AS Salary, [ModifiedDate] AS SalDate FROM [HumanResources].[EmployeePayHistory] WHERE BusinessEntityID <= 10 ) SELECT SalesPersonID, SalDate, Salary ,SUM(Salary) OVER (ORDER BY SalesPersonID ROWS UNBOUNDED PRECEDING) AS CumulativeSumByRows ,SUM(Salary) OVER (ORDER BY SalesPersonID RANGE UNBOUNDED PRECEDING) AS CumulativeSumByRange FROM CTE ORDER BY SalesPersonID, SalDate
In the Output above you can see the was the SUM is calculated, as we go down all previous and current rows are getting summed at all levels.
Here you can also see a difference between the way ROW & RANGE are calculated, for SalesPersonID=4:
– ROWS shows different and continuous calculation at different levels.
– But RANGE shows same calculation of all 3 records at different levels.
#2. Using ROWS/RANGE CURRENT ROW & UNBOUNDED FOLLOWING with OVER() clause:
Now, let’s say you’ve to do the same calculation, but in reverse order. So, here we will use BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING options to define our Window. This means the window starts from current row and ends at the last row.
-- To Calculate Cumulative SUM or Running Totals, but in REVERSE order: ;WITH CTE AS ( SELECT BusinessEntityID AS SalesPersonID, CAST([Rate] AS DECIMAL(10,0))AS Salary, [ModifiedDate] AS SalDate FROM [HumanResources].[EmployeePayHistory] WHERE BusinessEntityID <= 10 ) SELECT SalesPersonID, SalDate, Salary ,SUM(Salary) OVER (ORDER BY SalesPersonID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS ReverseCumulativeSumByRows ,SUM(Salary) OVER (ORDER BY SalesPersonID RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS ReverseCumulativeSumByRange FROM CTE ORDER BY SalesPersonID, SalDate
In the Output above we can see that the Running totals are now in reverse order, compared to the 1st example.
Also, the ROWS & RANGE difference can be seen for the SalesPersonID=4 similar to the 1st example.
#3. Using ROWS/RANGE PRECEDING & FOLLOWING with OVER() clause, without UNBOUNDED:
Now, if you want to calculate the SUM only of those values that are just adjacent to a current row. Here we will use combination of row numbers with CURRENT/PRECEDING & FOLLOWING keywords.
Let’s us check this by following example:
-- To Calculate Moving SUM, by taking 3 moving rows: ;WITH CTE AS ( SELECT BusinessEntityID AS SalesPersonID, CAST([Rate] AS DECIMAL(10,0))AS Salary, [ModifiedDate] AS SalDate FROM [HumanResources].[EmployeePayHistory] WHERE BusinessEntityID <= 10 ) SELECT SalesPersonID, SalDate, Salary ,SUM(Salary) OVER (ORDER BY SalesPersonID ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS MovingSumByRowsPrevCurrentNext ,SUM(Salary) OVER (ORDER BY SalesPersonID ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS MovingSumByRowsCurrentNext2 FROM CTE ORDER BY SalesPersonID, SalDate
In the output above for:
– Column MovingSumByRowsPrevCurrentNext: it calculates the SUM of just one Previous, Current & one Next row.
– Column MovingSumByRowsCurrentNext2: it calculates the SUM of Current and next two rows.
So, by above 3 examples we saw how to use new ROWS & RANGE clauses with a mix of UNBOUNDED PRECEDING, UNBOUNDED FOLLOWING, CURRENT ROW, etc. Similarly there could be many different cases and scenarios where we can use there clauses and options.
I will try to post more on these features as I see anything new and exciting around this.
For more info on OVER() Clause check MS BOL: http://msdn.microsoft.com/en-us/library/ms189461.aspx
Using OVER and PARTION BY clauses with Aggregate Functions in SQL Server
In my [previous post] we learned about MS SQL Server 2005 Ranking functions.
Here we’ll see more scenarios where they can be used. We can use them with CTE i.e. Common Table Expressions, again a new gem with ver. 2005.
with myCTE as ( select row_number() over(partition by class order by marks desc) as [RowNumber], class, marks, stuName from #tempTable) select * from myCTE where RowNumber < 5
This gives following result:
row_num class marks name
1 A 90 pooja
2 A 90 saurabh
3 A 80 kanchan
4 A 80 manoj
1 B 90 paras
2 B 80 dinesh
3 B 70 hema
4 B 70 kapil
A CTE (Common Table Expression) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.
More on CTE: http://msdn.microsoft.com/en-us/library/ms190766.aspx
The above mentioned PARTITION BY clause can also be used with SQL aggregate functions, like: AVG, COUNT, MIN, MAX, etc.
Here is an example:
select class, stuName, marks, AVG(marks) over(partition by class) AS [avg], SUM(marks) over(partition by class) AS [sum], MIN(marks) over(partition by class) AS [max], MAX(marks) over(partition by class) AS [min], COUNT(marks) over(partition by class) AS [count], STDEV(marks) over(partition by class) AS [ST_Dev], VAR(marks) over(partition by class) AS [VAR] from #tempTable
This gives following result:
class name marks AVG SUM MIN MAX COUNT STDEV VAR
A anita 50 76 460 50 90 6 15.0554 226.6667
A harish 70 76 460 50 90 6 15.0554 226.6667
A kanchan 80 76 460 50 90 6 15.0554 226.6667
A manoj 80 76 460 50 90 6 15.0554 226.6667
A pooja 90 76 460 50 90 6 15.0554 226.6667
A saurabh 90 76 460 50 90 6 15.0554 226.6667
B kamar 50 67 470 50 90 7 14.9603 223.8095
B lalit 50 67 470 50 90 7 14.9603 223.8095
B nitin 60 67 470 50 90 7 14.9603 223.8095
B hema 70 67 470 50 90 7 14.9603 223.8095
B kapil 70 67 470 50 90 7 14.9603 223.8095
B dinesh 80 67 470 50 90 7 14.9603 223.8095
B paras 90 67 470 50 90 7 14.9603 223.8095
I’ve rounded the last 2 column values to 4 decimal places.
Ranking Functions | ROW_NUMBER, RANK, DENSE_RANK, NTILE
SQL Server 2005 provides functionality for using Ranking Functions with your result set. One can select a number of Ranking algorithms which are applied to a column of your table that you want to classify in a scope of your executing query. This feature is Dynamic and upon change of data (addition or removal of rows) it gives desired results the next time query is run.
–> Its 4 gems are:
1. ROW_NUMBER: Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
Syntax:
ROW_NUMBER() OVER ( [ < partition_by_clause > ] < order_by_clause > )
2. RANK: Returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question.
Syntax:
RANK() OVER ( [ < partition_by_clause > ] < order_by_clause > )
3. DENSE_RANK: Returns the rank of rows within the partition of a result set, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come before the row in question.
Syntax:
DENSE_RANK() OVER( [ ] < order_by_clause > )
4. NTILE: Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.
Syntax:
NTILE(integer_expression) OVER( [ ] < order_by_clause > )
–> Now lets take an example, simple one of a class of students, their marks & class:
select 'A' [class], 80 [marks], 'manoj' stuName into #tempTable UNION select 'A', 70 [marks],'harish' stuName UNION select 'A', 80 [marks],'kanchan' stuName UNION select 'A', 90 [marks],'pooja' stuName UNION select 'A', 90 [marks],'saurabh' stuName UNION select 'A', 50 [marks],'anita' stuName UNION select 'B', 60 [marks],'nitin' stuName UNION select 'B', 50 [marks],'kamar' stuName UNION select 'B', 80 [marks],'dinesh' stuName UNION select 'B', 90 [marks],'paras' stuName UNION select 'B', 50 [marks],'lalit' stuName UNION select 'B', 70 [marks],'hema' stuName select * from #tempTable
Now on selection this gives you:
class marks name
A 50 anita
A 70 harish
A 80 kanchan
A 80 manoj
A 90 pooja
A 90 saurabh
B 50 kamar
B 50 lalit
B 60 nitin
B 70 hema
B 80 dinesh
B 90 paras
–> The following query shows you how each function works:
select marks, stuName, ROW_NUMBER() over(order by marks desc) as [RowNum], RANK() over(order by marks desc) as [Rank], DENSE_RANK() over(order by marks desc) as [DenseRank], NTILE(3) over(order by marks desc) as [nTile] from #tempTable
Result:
marks stuName RowNum Rank DenseRank nTile
90 pooja 1 1 1 1
90 saurabh 2 1 1 1
90 paras 3 1 1 1
80 dinesh 4 4 2 1
80 kanchan 5 4 2 2
80 manoj 6 4 2 2
70 harish 7 7 3 2
70 hema 8 7 3 2
60 nitin 9 9 4 3
50 anita 10 10 5 3
50 kamar 11 10 5 3
50 lalit 12 10 5 3
–> Here:
– RowNum column lists unique ID’s of students, like Roll Numbers.
– Rank lists student rank with equal ranks those secured equal marks, thus there is no 2nd or 3rd.
– DenseRank lists student ranks with no gaps, so here 3 students came 1st &2nd and only 2 3rd.
– nTile listed students in different but equal groups, can be thought of as different sections.
–> Now, lets use the PARTITION BY option, its same as group by clause. Lets group/partition the students group by their classes A&B:
select class, marks, stuName, ROW_NUMBER() over(partition by class order by marks desc) as [RowNum], RANK() over(partition by class order by marks desc) as [Rank], DENSE_RANK() over(partition by class order by marks desc) as [DenseRank], NTILE(3) over(partition by class order by marks desc) as [nTile] from #tempTable
Result:
class marks stuName RowNum Rank DenseRank nTile
A 90 pooja 1 1 1 1
A 90 saurabh 2 1 1 1
A 80 kanchan 3 3 2 2
A 80 manoj 4 3 2 2
A 70 harish 5 5 3 3
A 50 anita 6 6 4 3
B 90 paras 1 1 1 1
B 80 dinesh 2 2 2 1
B 70 hema 3 3 3 2
B 60 nitin 4 4 4 2
B 50 kamar 5 5 5 3
B 50 lalit 6 5 5 3
You can clearly see that our query has grouped students in 2 Partitions (classes) and then Ranked them.
In my [next post] check how to use OVER Clause & Partition By option with Aggregate functions like, SUM, AVG, MIN, MAX, etc.
–> Check the same demo on YouTube: