Using CROSS APPLY & OUTER APPLY operators with UDFs, Derived tables & XML data
In my previous posts CROSS APPLY vs OUTER APPLY I discussed on the same topic, but didn’t covered much on examples. Here we will see more examples on using APPLY operators, their workarounds and performance.
>> Using APPLY operator with User Define Functions:
USE [AdventureWorks] GO -- Create an "Inline Table-valued Function" that returns Sales Details of a Contact (Customer): CREATE FUNCTION [dbo].[ufnGetSalesInfo](@ContactID int) RETURNS TABLE AS RETURN ( SELECT SalesOrderID, OrderDate, PurchaseOrderNumber, AccountNumber, CustomerID, TotalDue FROM Sales.SalesOrderHeader WHERE ContactID = @ContactID ) GO -- We can use this function with SELECT FROM statement like we do for a table: SELECT * FROM [dbo].[ufnGetSalesInfo](100)
Here, we cannot provide multiple IDs or any variable to make it work for multiple or dynamic Contacts. To return Sales details for multiple Customers we need to JOIN it with Contacts table, but JOINS are now allowed with UFDs, they only work with Tables/Views.
So to JOIN a UDF with a TABLE we need an APPLY operator, check examples below:
-- CROSS APPLY acts as INNER JOIN & will return only matching rows, returns 513 rows: SELECT con.ContactID, con.FirstName, con.LastName, con.EmailAddress, con.Phone, x.* FROM Person.Contact con CROSS APPLY [dbo].[ufnGetSalesInfo](con.ContactID) x WHERE con.ContactID <= 100 -- OUTER APPLY acts as OUTER LEFT JOIN & will return all matching & non-matching rows, returns 529 rows: SELECT con.ContactID, con.FirstName, con.LastName, con.EmailAddress, con.Phone, x.* FROM Person.Contact con OUTER APPLY [dbo].[ufnGetSalesInfo](con.ContactID) x WHERE con.ContactID <= 100 -- Final Cleanup DROP FUNCTION [dbo].[ufnGetSalesInfo] GO
>> Using APPLY operator with Derived tables:
[Scenario]: Pull upto 5 Sales Orders for each Contact (Customer):
This is not possible with JOINS, because the condition to SELECT max 5 Orders is applied after the JOIN is made, we need to add the condition before JOIN. Using APPLY operator we can do this, let’s see how:
-- With Derived Table: SELECT con.ContactID, con.FirstName, con.LastName, con.EmailAddress, con.Phone, x.* FROM Person.Contact con CROSS APPLY (SELECT TOP 5 SalesOrderID, OrderDate, PurchaseOrderNumber, AccountNumber, CustomerID, TotalDue FROM Sales.SalesOrderHeader soh WHERE soh.ContactID = con.ContactID) x WHERE con.ContactID <= 100 -- We can also write the above query by using ROW_NUMBER() function & JOINs instead of the APPLY operator: ;WITH CTE AS ( SELECT SalesOrderID, OrderDate, PurchaseOrderNumber, AccountNumber, ContactID, TotalDue, ROW_NUMBER() OVER(PARTITION BY ContactID ORDER BY ContactID, SalesOrderID) AS ROWNUM FROM Sales.SalesOrderHeader) SELECT con.ContactID, con.FirstName, con.LastName, con.EmailAddress, con.Phone, CTE.* FROM Person.Contact con JOIN CTE cte ON cte.ContactID = con.ContactID WHERE con.ContactID <= 100 AND cte.ROWNUM <= 5 GO
Both of these queries do the same task. So to check which one is more optimized we can check the cost of each query in the batch. So the cost of query using APPLY operator is 84% and query using ROW_NUMBER & JOIN is 16%. Hence the second approach is much more performant & should be considered.
>> Using APPLY operator with XML data:
APPLY operators can also we used with XML data when there is a need to query parent/child nodes, check my previous blog post on this topic.
Is it true that both of the queries that you explained in the end do the exact same task? I was not able to test both of the queries since I do not have the Person.Contact table. It seems to me that, in the first query, “SELECT TOP 5” selects 5 random “SalesOrderID, OrderDate, …” while, in the second query, “SELECT con.ContactID, con.FirstName, …, ROW_NUMBER() OVER(PARTITION BY ContactID ORDER BY ContactID, SalesOrderID)” selects the SalesOrderID’s in ascending order for each ContactID. Then, the next SELECT statement chooses the ContactID’s on the condition that the number of rows for each ContactID must be limited to 5. I believe that the first query does not care if more than 5 rows in the Sales.SalesOrderHeader table matches a row in the Person.Contact table based on their ContactID’s, but the second query does.
“but JOINS are now allowed with UFDs” – should that be “not” instead of “now”? Thanks!