Advertisements

Archive

Posts Tagged ‘GROUP BY’

SQL Basics – Difference between WHERE, GROUP BY and HAVING clause

May 23, 2015 6 comments

All these three Clauses are a part/extensions of a SQL Query, are used to Filter, Group & re-Filter rows returned by a Query respectively, and are optional. Being Optional they play very crucial role while Querying a database.

–> Here is the logical sequence of execution of these clauses:

1. WHERE clause specifies search conditions for the rows returned by the Query and limits rows to a meaningful set.

2. GROUP BY clause works on the rows returned by the previous step #1. This clause summaries identical rows into a single/distinct group and returns a single row with the summary for each group, by using appropriate Aggregate function in the SELECT list, like COUNT(), SUM(), MIN(), MAX(), AVG(), etc.

3. HAVING clause works as a Filter on top of the Grouped rows returned by the previous step #2. This clause cannot be replaced by a WHERE clause and vice-versa.

As these clauses are optional thus a minimal SQL Query looks like this:

SELECT *
FROM [Sales].[SalesOrderHeader]

This Query returns around 32k (thousand) rows form SalesOrderHeader table. Thus, if somebody wants to do some analysis on this big row-set it would be very difficult and time consuming for him.

–> Use Case: Let’s say a Sales department wants to get a list of such Customers who bought more number of items last year, so that they can sell more some stuff to them this year. How they will go ahead?

1. Using WHERE clause: First of all they will need to apply filter on above ~32k rows and get list of Orders that were made last year (i.e. in 2014) to limit the row-set, like:

SELECT *
FROM [Sales].[SalesOrderHeader]
WHERE OrderDate >= '2014-01-01 00:00:00.000'
AND OrderDate < '2015-01-01 00:00:00.000'

This Query still gives ~12k records and its still difficult to identify such Customers who have more orders.

2. Using GROUP BY clause: Here we need to group the Customers with their number of Orders, like:

SELECT CustomerID, COUNT(*) AS OrderNos
FROM [Sales].[SalesOrderHeader]
WHERE OrderDate >= '2014-01-01 00:00:00.000'
AND OrderDate < '2015-01-01 00:00:00.000'
GROUP BY CustomerID

GROUP_BY clause
This query still returns ~10k records, and I’ve go through the entire list of records to identify such records. Is there any way where I can still filter out the unwanted records with lesser count?

3. USING HAVING clause: This will works on top of GROUP BY clause to filter the grouped records onCOUNT(*) AS OrderNos column values (like a WHERE clause), like:

SELECT CustomerID, COUNT(*) AS OrderNos
FROM [Sales].[SalesOrderHeader]
WHERE OrderDate >= '2014-01-01 00:00:00.000'
AND OrderDate < '2015-01-01 00:00:00.000'
GROUP BY CustomerID
HAVING COUNT(*) > 10

HAVING clause

Thus, by using all these these clauses we can reduce and narrow down the row-set to do some quick analysis.

Check this video tutorial on WHERE clause and difference with GROUP BY & HAVING clause.


Advertisements

GROUPING SETS equivalent | for SQL Server 2005 and below

March 23, 2015 4 comments

Sometime back I discussed about [GROUPING SETS] in my previous post and today one reader of this blog asked me how we can create the same result-set without using GROUPING SETS if we are on previous versions like SQL Server 2005 or 2000.
 

–> Let’s take the following SQL Query that uses GROUPING SETS operator and let’s see what Output we get:

-- GROUPING SETS
SELECT class, section, rollno, sum(marks) [sum]
FROM #tempTable
GROUP BY GROUPING SETS (
	 (class, section, rollno)
	,(class, section)
	,(class)
	,()
)
Output:
class		section	rollno	sum
HighSchool	a	1	80
HighSchool	a	2	70
HighSchool	a	3	80
HighSchool	a	NULL	230
HighSchool	b	4	90
HighSchool	b	5	90
HighSchool	b	6	50
HighSchool	b	NULL	230
HighSchool	NULL	NULL	460
Intermediate	a	1	60
Intermediate	a	2	50
Intermediate	a	3	80
Intermediate	a	NULL	190
Intermediate	b	4	90
Intermediate	b	5	50
Intermediate	b	6	70
Intermediate	b	NULL	210
Intermediate	NULL	NULL	400
NULL		NULL	NULL	860

You can refer to the DDL and population of this #tempTable in my previous post, [link].
 

–> And now let’s see how can we recreate the same result set without using GROUPING SETS if we are on lower versions of SQL Server (<= 2005):

-- ROLLUP and Grouping Sets Equivalent (pre SQL 2008)
SELECT class, section, rollno, marks as [sum]
FROM #tempTable

UNION ALL

SELECT class, section, null as rollnu, sum(marks) [sum]
FROM #tempTable
GROUP BY class, section

UNION ALL

SELECT class, null, null, sum(marks) [sum]
FROM #tempTable
GROUP BY class

UNION ALL

SELECT null, null, null, sum(marks) [sum]
FROM #tempTable

ORDER BY class DESC, section DESC, rollno DESC

 

Thanks, please let me know your thoughts and comments!!!


Categories: SQL Tips Tags: , , ,

Using GROUPING SETS | SQL Server 2008 and above

November 20, 2014 2 comments

In my previous posts (long back) I discussed about using [CUBE and ROLLUP] operators introduced in SQL Server 2008. I also discussed about COMPUTE & COMPUTE BY operators, but there two were removed from the SQL Server 2012 version.

Here in this post I’ll talk about GROUPING SETS (few days back my friend asked me about this and I thought its better to document this here).

Just like CUBE and ROLLUP, GROUPING SETS are used to GROUP and calculate Sub Totals and Totals within a set of records. Compared to ROLLUP with GROUPING SETS you have more flexibility to show Sub Totals on selected columns.

-- ROLLUP
SELECT class, section, rollno, sum(marks) [sum]
FROM #tempTable
GROUP BY class, section, rollno WITH ROLLUP
Output:
class		section	rollno	sum
HighSchool	a	1	80
HighSchool	a	2	70
HighSchool	a	3	80
HighSchool	a	NULL	230
HighSchool	b	4	90
HighSchool	b	5	90
HighSchool	b	6	50
HighSchool	b	NULL	230
HighSchool	NULL	NULL	460
Intermediate	a	1	60
Intermediate	a	2	50
Intermediate	a	3	80
Intermediate	a	NULL	190
Intermediate	b	4	90
Intermediate	b	5	50
Intermediate	b	6	70
Intermediate	b	NULL	210
Intermediate	NULL	NULL	400
NULL		NULL	NULL	860

 

–> Now, if you want to just Sub Total on Class level, you just need to use GROUPING SET operator like this:

-- GROUPING SETS on selective columns
SELECT class, section, rollno, sum(marks) [sum]
FROM #tempTable
GROUP BY GROUPING SETS (
	 (class, section, rollno)
	,(class)
)

This will give you more controlled Sub Totaling and grouping on selected columns:

Output:
class		section	rollno	sum
HighSchool	a	1	80
HighSchool	a	2	70
HighSchool	a	3	80
HighSchool	b	4	90
HighSchool	b	5	90
HighSchool	b	6	50
HighSchool	NULL	NULL	460
Intermediate	a	1	60
Intermediate	a	2	50
Intermediate	a	3	80
Intermediate	b	4	90
Intermediate	b	5	50
Intermediate	b	6	70
Intermediate	NULL	NULL	400

 

–> You can also get the same output like ROLLUP operator, but you have to provide columns in following groups:

-- GROUPING SETS
SELECT class, section, rollno, sum(marks) [sum]
FROM #tempTable
GROUP BY GROUPING SETS (
	 (class, section, rollno)
	,(class, section)
	,(class)
	,()
)

Note: You can also use GROUPING SETS with combination of ROLUP and/or CUBE operators, check more on regarding this on MS BoL, here.


 

–> Grouping Sets equivalent for SQL Server 2005 and below, check here.