SQL Basics – Difference between WHERE, GROUP BY and HAVING clause

Home > Differences, SQL Basics > SQL Basics – Difference between WHERE, GROUP BY and HAVING clause

SQL Basics – Difference between WHERE, GROUP BY and HAVING clause

May 23, 2015 Leave a comment Go to comments

All these three Clauses are a part/extensions of a SQL Query, are used to Filter, Group & re-Filter rows returned by a Query respectively, and are optional. Being Optional they play very crucial role while Querying a database.

–> Here is the logical sequence of execution of these clauses:

1. WHERE clause specifies search conditions for the rows returned by the Query and limits rows to a meaningful set.

2. GROUP BY clause works on the rows returned by the previous step #1. This clause summaries identical rows into a single/distinct group and returns a single row with the summary for each group, by using appropriate Aggregate function in the SELECT list, like COUNT(), SUM(), MIN(), MAX(), AVG(), etc.

3. HAVING clause works as a Filter on top of the Grouped rows returned by the previous step #2. This clause cannot be replaced by a WHERE clause and vice-versa.

As these clauses are optional thus a minimal SQL Query looks like this:

SELECT *
FROM [Sales].[SalesOrderHeader]

This Query returns around 32k (thousand) rows form SalesOrderHeader table. Thus, if somebody wants to do some analysis on this big row-set it would be very difficult and time consuming for him.

–> Use Case: Let’s say a Sales department wants to get a list of such Customers who bought more number of items last year, so that they can sell more some stuff to them this year. How they will go ahead?

1. Using WHERE clause: First of all they will need to apply filter on above ~32k rows and get list of Orders that were made last year (i.e. in 2014) to limit the row-set, like:

SELECT *
FROM [Sales].[SalesOrderHeader]
WHERE OrderDate >= '2014-01-01 00:00:00.000'
AND OrderDate < '2015-01-01 00:00:00.000'

This Query still gives ~12k records and its still difficult to identify such Customers who have more orders.

2. Using GROUP BY clause: Here we need to group the Customers with their number of Orders, like:

SELECT CustomerID, COUNT(*) AS OrderNos
FROM [Sales].[SalesOrderHeader]
WHERE OrderDate >= '2014-01-01 00:00:00.000'
AND OrderDate < '2015-01-01 00:00:00.000'
GROUP BY CustomerID

This query still returns ~10k records, and I’ve go through the entire list of records to identify such records. Is there any way where I can still filter out the unwanted records with lesser count?

3. USING HAVING clause: This will works on top of GROUP BY clause to filter the grouped records onCOUNT(*) AS OrderNos column values (like a WHERE clause), like:

SELECT CustomerID, COUNT(*) AS OrderNos
FROM [Sales].[SalesOrderHeader]
WHERE OrderDate >= '2014-01-01 00:00:00.000'
AND OrderDate < '2015-01-01 00:00:00.000'
GROUP BY CustomerID
HAVING COUNT(*) > 10

Thus, by using all these these clauses we can reduce and narrow down the row-set to do some quick analysis.

Check this video tutorial on WHERE clause and difference with GROUP BY & HAVING clause.

Categories: Differences, SQL Basics Tags: GROUP BY, GROUP BY vs HAVING, HAVING, WHERE

Comments (4) Trackbacks (2) Leave a comment Trackback

subhransu

December 23, 2016 at 9:18 am

Reply

Nice article.
Imran

January 17, 2017 at 3:37 pm

Reply

Nice blog Manoj.Can you also explain the step by step procedure when a sql statement is executed
- Manoj Pandey
  
  January 17, 2017 at 4:04 pm
  
  Reply
  
  you can check this blog to know internals of SQL Query execution steps: https://www.simple-talk.com/sql/sql-training/the-sql-server-query-optimizer/
Rob

October 10, 2017 at 8:23 pm

Reply

Does the HAVING clause make use of indexes of the underlying table or is it actually performed on the (materialised) data from the GROUP BY output?