HOW TO: Recursive Queries Using Common Table Expressions (CTE)

CTEs are pretty standard. Recursive CTEs are pretty mainstream too. They’re great for calculations amongst a hierarchy, think manger-employee type relationship; both are employees and as such sit in the same table. To analyse this hierarchy, you would need a recurring self join. This same technique can be used to analyse non-hierarchical problems too. Microsoft have got a pretty comprehensive guide to Recursive CTEs here, if you would like to understand that first.

So, the problem: Charity shops have tins, for donations on the counters. Every day that donations are made, the total donation is emptied from the tin, and the total donation recorded against a store and a day. Looking back, we now want to know how many days it is until the next donation is made, and from then, the next and so on in each store. Then we could do some trend analysis on that data about how often stores are receiving donations.

Let’s have a look at the sort of thing we might have wanted to do before in SQL Server, and then we’ll get into how we’ll go about doing it in Exasol.

This is the setup code for SQL:

CREATE TABLE TBL_STOREDONATIONS (
    STOREID       DECIMAL(18,0),
    DONATIONDATE  DATE,
    DONATIONVALUE DECIMAL(18,2)
);

INSERT INTO CTE_EXAMPLE.DBO.TBL_STOREDONATIONS
SELECT 1, '2016-01-12', '3.00'
UNION ALL SELECT 1, '2016-01-20', '5.00'
UNION ALL SELECT 1, '2016-01-20', '4.20'
UNION ALL SELECT 2, '2016-01-14', '1.00'
UNION ALL SELECT 2, '2016-01-15', '3.40'
UNION ALL SELECT 1, '2016-01-29', '5.50'
UNION ALL SELECT 1, '2016-01-30', '2.10'
UNION ALL SELECT 3, '2016-01-17', '7.80'
UNION ALL SELECT 3, '2016-01-27', '1.00'
UNION ALL SELECT 3, '2016-01-28', '1.20'
UNION ALL SELECT 4, '2016-01-30', '5.44'
UNION ALL SELECT 1, '2016-02-03', '1.23';

Here’s the code for the Recursive CTE:

WITH Donations (StoreID, DonationDate, DonationOrder, DaysToNext)
AS
(
-- Anchor member definition
    SELECT  e.StoreID
		, e.DonationDate 
		, rnk
		, null
    FROM tbl_storedonations AS e
	INNER JOIN(
		SELECT Storeid
		, donationdate
		, DENSE_RANK() OVER (PARTITION BY storeid ORDER BY donationdate DESC) rnk
		FROM TBL_STOREDONATIONS
		)rnkd
	ON e.storeid = rnkd.storeid
	AND e.donationdate = rnkd.donationdate
	WHERE rnk =1 --get last donation    
    UNION ALL
-- Recursive member definition
    SELECT e.StoreID
		, e.DonationDate
		,rnk
		,DATEDIFF(dd, e.donationdate,d.DonationDate) 
    FROM dbo.TBL_STOREDONATIONS AS e
	INNER JOIN(
		SELECT Storeid
		, donationdate
		, DENSE_RANK() OVER (PARTITION BY storeid ORDER BY donationdate DESC) rnk
		FROM TBL_STOREDONATIONS
		)rnkd
	ON e.storeid = rnkd.storeid
	AND e.donationdate = rnkd.donationdate
    INNER JOIN Donations AS d
    ON e.StoreId = d.StoreID
	AND rnk = donationorder + 1
)
-- Statement that executes the CTE
SELECT storeid
, donationdate
, daystonext
FROM Donations
ORDER BY storeid ASC, donationdate DESC
GO

Here’s the output, exactly what we wanted.

SQL_CTE_OUTPUT

Unfortunately, this is not natively supported in Exasol, using recursive CTEs.

There is a light at the end of the tunnel though. We can build something we the same output, by using the Exasol feature, Graph search.

Here’s the Exasol setup code to this demo:

CREATE SCHEMA CTE_EXAMPLE;

CREATE TABLE TBL_STOREDONATIONS (
    STOREID       DECIMAL(18,0),
    DONATIONDATE  DATE,
    DONATIONVALUE DECIMAL(18,2)
);

INSERT INTO CTE_EXAMPLE.TBL_STOREDONATIONS
SELECT 1, '2016-01-12', '3.00'
UNION ALL SELECT 1, '2016-01-20', '5.00'
UNION ALL SELECT 1, '2016-01-20', '4.20'
UNION ALL SELECT 2, '2016-01-14', '1.00'
UNION ALL SELECT 2, '2016-01-15', '3.40'
UNION ALL SELECT 1, '2016-01-29', '5.50'
UNION ALL SELECT 1, '2016-01-30', '2.10'
UNION ALL SELECT 3, '2016-01-17', '7.80'
UNION ALL SELECT 3, '2016-01-27', '1.00'
UNION ALL SELECT 3, '2016-01-28', '1.20'
UNION ALL SELECT 4, '2016-01-30', '5.44'
UNION ALL SELECT 1, '2016-02-03', '1.23';

Here, we’re going to use the Graph Search keyword of CONNECT BY. We’re telling Exasol to join the single result set, back on itself, on the StoreId attribute and the ranking r_instore, by looking for the next instance of a donation in store. r_instore is also used to start the calculation at the last instance of a donation in a store.
This code is undoubtedly tidier and much more readable than the SQL Server equivalent.

SELECT * FROM(
SELECT 
d.StoreId
, d.DonationDate
, d.r_instore
, DAYS_BETWEEN(PRIOR donationdate, donationdate) AS DaysToNext
FROM (
SELECT distinct
StoreId
, DonationDate
, DENSE_RANK() OVER (PARTITION BY StoreId ORDER BY DonationDate desc) r_instore
FROM tbl_StoreDonations
) d

CONNECT BY
StoreId = PRIOR StoreId and
r_instore = PRIOR r_instore + 1

START WITH
r_instore = 1)DTN
ORDER BY STOREID ASC
, DONATIONDATE ASC;

Here’s the output from Exasol, again just what we needed.EXA_CTE_OUTPUT

It is important to note, that to use this functionality, you will need Graph Search as part of your Exasol licence.

SQL Relay 2016

SQLRelay

So if you’re not attending the Exasol breakfast on Friday, and data is your thing, how about SQL Relay?

Admittedly I’m a bit late on the bandwagon here. This years’ SQL Relay is already halfway through, with the Birmingham, Cardiff and Reading events already done.

Tomorrow, 5th October, is Nottingham, with the Relay wrapping up in Leeds on Friday 6th October.

I’m heading to Relay in Leeds. There’s 3 main tracks of talks, with the added bonus of a workshop track. Best of all it’s free!

My itinerary is as follows:

9.30-12.30 – Workshop on Azure Machine Learning with Amy Nicholson and Andrew Fryer from Microsoft

12.30-2.30 – Lunch / Networking (come find me!)

2.30-3.30 – DatOps Journey at Sky Betting with Andy Burgin

3.45-4.40 – Simon Whiteley discusses Diving into Data Lakes

If you know of anymore conferences free or otherwise that you think could be interesting,  get in touch and let me know.