HOW TO: Get your UDFs working with Python libraries after Exasol 6 upgrade

We recently took the plunge and upgraded to Exasol V6. There’s a few changes that you can find about, listed in the upgrade notes here. Like all upgrades, I thoroughly recommend you study these notes in detail – as there can always be something to trip you up.

This happened to us, with the changed implementation of UDFs and the way libraries are stored and used. You can read all about how it used to work here, but for now let’s go through how to make your Python libraries work with your UDFs again! I’ll be splitting up the content of this blog into a couple of future ones too, as there’s quite a lot to go through!

Get Curl!

First off, you’re going to need Curl. Get it here. Extract it on your computer, and remember where you put it!

Setup Bucket Services

Next up, let’s check out the changes in ExaOperation. There’s the addition of this new tab of ExaBuckets. Head on in there, and click Add.AddBucketService

By default the HTTP port for the service bucket is 2580, if you create another service bucket after this, you’ll need to change the port number (which I need to here – so I’m using 2585). Add a Description, this will be how you access the bucket. I’ll use “Demo”.

My ExaBuckets now looks like this. It has bfsdefault – which will be the service bucket you have, by default, when you get Version 6.

bucketservices

Click on the ID of the service you just created. This will take you through to where we can create the actual bucket. Click on Add.

nobuckets

I’ve added the name “libraries”, we’ll need this later. I’ve checked Public readable, and provided read (‘xyz’) and write passwords (123) for the demo.

newbucket

Your bucket will be displayed like this.

bucketcreated

Upload library to Bucket using Curl

Ok, now let’s use the bucket we created. I’m going to be using Windows, so open up a cmd window. Change the directory to your Curl one from earlier. Let’s list out what we find…. The libraries bucket.

c:\curl>curl Your_IP_here:2585

libraries

To put something in the bucket, first download the library locally that you need, from a reputable source, like Python.

Now we’ll get it in our bucket. We use the curl put command, and use the write password we set earlier, along with the bucket name we specified (libraries). Here, I’m uploading the boto3 library.

c:\curl>curl -X PUT -T C:\boto3-1.4.7.tar.gz http://w:123@YOUR_IP_HERE:2585/libraries/boto3-1.4.7.tar.gz

Check your library is there by listing out the bucket.

c:\curl>curl YOUR_IP_HERE:2585/libraries

boto3-1.4.7.tar.gz

Use library in UDF script

So, we’ve uploaded our library to the bucket, but our udfs still don’t work! Your script needs the path specifying before the import statement will work.

udf

Exasol have added a video of how to do some of this here.

Personally, I wish they’d kept the GUI in ExaOperation (or a flavor of it), as it was way quicker to use, and this is such a faff to do. This is another distinction between just writing a Python script with a traditional import, and an Exasol specific thing to remember.

Anyway, it’s all there for you if you read it (from me or Exasol!), and a great reminder to study the upgrade notes and think about their implications.

 

HOW TO: Restart the licence node

Welcome back for the fourth instalment of the AdminSeries. Catch up on the previous posts here:

For this post we’ll be restarting the licence node.

To do this, I recommend you follow all the steps in the above posts beforehand.

That said, head over to the Nodes tab in ExaOperation. Scroll down past the nodes (if you have a few!).

restartlicence

In the License Nodes section, click Reboot. If all goes well, this will take down the page you are viewing, indeed all of ExaOperation (as it primarily resides on the licence node). You should be left with your browser’s page not found or something similar.

Again, I hope you have your patience ready. For me, this step takes around 5 minutes, in which cold sweats ensue (WTF have I done!). But it’s ok.

Refresh the page periodically until ExaOperation comes back.

This concludes Restarting the licence node.

HOW TO: Shutdown the Data Nodes

Time for the third post in the AdminSeries – how to shutdown the data nodes. This post follows the previous two: How to Shutdown the Database(s) and How to Shutdown the Storage. Before you read much further, go check them out first.

With that out the way, let’s get started.

In ExaOperation, head over to the Nodes tab. It should look something like this:nodeson

In this shot, you can see the State of the nodes is Running and they are Powered On.

Next up, select the checkboxes next to the Nodes that you want to shutdown. From the Actions drop down, select Shutdown, and then the button Execute.shutdownnodes

It may take some time for the nodes to shutdown, as with all things ExaOperation, be patient and wait for it. You can refresh the page and watch for the node state to change. Sometimes the Nodes change state, one at a time, so you may be waiting for some nodes.

Eventually your page should look like this. All of the Nodes will have the state of Installed, and Power Off. Your nodes are now Shutdown.

nodesoff

That brings us to the end of How to Shutdown Data Nodes. Next up, we’ll be looking at How to Restart the licence node.

HOW TO: Shutdown the Storage

The second post in the AdminSeries is going to be about how to Shutdown the Storage. Before you read this post, or are trying to do this, go back and make sure you have Shutdown the Database(s) first.

So, assuming that you have already Shutdown the Database(s), head over to the EXAStorage tab in EXAOperation (a lot of EXA’s going on there guys!).

It should look a little something like this:

storageon

I’ve got two databases, with three nodes each and a licence node. This means I have 6 volumes listed up top, and 7 disks from my nodes at the bottom. You may have more or less depending on your setup.

Click Shutdown Storage Service.

The page should then look like this:

exastorageoff

Here you can see the 7 nodes – 6 data nodes and 1 licence server.

Seeing the EXAStorage page look like this, means that the Storage is now stopped.

Having Shutdown the Database(s) and the Storage, we’ll move onto how to Shutdown the Nodes.

HOW TO: Shutdown the database(s)

This post is the first in a series of posts about how to administer Exasol. We’re going to be covering how to Shutdown databases, Shutdown storage, shutdown the nodes, restart the licence server, install CentOS patches and restart the whole lot. For all of the HOW TO: posts regarding administration, you will need access to ExaOperation.

Firstly, a quick disclaimer: This set of posts is based around 5.0.17, and so should be used in that context. If you are unsure about what you are doing, seek professional help. Do some of these things wrong, and you could jeopardise your data.

Ok, so without further ado…

In general, you should always shutdown the database first, before other shutdowns such as the storage etc. Obviously in the case of an unplanned system shutdown, you don’t have much choice.

So first log into ExaOperation, then head over to the EXASolution tab.

In my case, I have two databases running. I can tell that they are running because of the Status, and also that they are Online and behaving well – with the green light. Databases can be Running and Stopped separately. However, if you intend to perform further maintenance, like Shutting down all nodes or Stopping the Storage, you should stop them all.Inkeddatabasesrunning_LI

To Shutdown the databases, select the tick box next to the databases and then select Shutdown.

You can then manually refresh this page. It may take up to a minute to stop the databases.

When they are stopped the page looks like this:

Inkedstartupdbs_LI

We can tell that the databases are stopped, because they have a Status of “Created”, and the light indicating that the database is Online has gone out/grey.

So that’s it, your database has stopped.

The next post is then How to Shutdown the Storage.

HOW TO: Execute multiple commands from a file in EXASOL

In Exaplus we can execute multiple commands, by hitting the Execute All button, or by pressing CTRL + SHIFT + ENTER.

But what if we have a file of multiple commands? Should we copy the contents out and then execute them? Seems a bit of a faff.

Enter the @ command. Or also the START command, but I prefer @. A bit more catchy!

To run commands from a file in Exaplus, run the following:

@C:\folder\myfile.sql;

or if your file resides on FTP:

@ftp://my:ftp@ftp.scripts/myfile.sql;

I’ve found this command to be really handy, particularly in developing my own ‘automated deployment and comparison tool’ for Exasol. But that’s for another day!

What happens to the Identity value when you Truncate a table in Exasol?

Today was challenging at work for many reasons, but it was a good day for learning new things!

I thought I’d share with you a nugget that I learnt today.

When I truncate a table, I expect the all of the data to be permanently deleted, essentially like it was never there. For all indexes to be emptied, and any auto-incrementing constraints to be reset.

Let’s give it a go then… *simples*…no?

CREATE TABLE ID_TESTING(
ID DECIMAL(18,0) IDENTITY NOT NULL ENABLE,
A DECIMAL(18,0),
B DECIMAL(18,0)
);

I then insert 3 rows:

insert into ID_TESTING (A,B)
SELECT 1,2;
insert into ID_TESTING (A,B)
SELECT 2,2;
insert into ID_TESTING (A,B)
SELECT 3,2;

This gives me 3 rows, with an identity value of 3, which we can check by looking in one of the sys tables.

SELECT column_identity
FROM SYS.EXA_ALL_COLUMNS
WHERE column_schema = 'DUMMY'
AND column_table = 'ID_TESTING'
AND column_name = 'ID';

Finally, I truncate the table;

truncate table ID_TESTING;

So if we check the identity value this should be zero right? Using the same query as above, we can see it remains at 3.

This is obviously not ideal, if you’re loading large amounts of data, or have not set your identity column to be DECIMAL(18,0), or you’re just plain expecting a counter to start at 1. Having got in touch with Exasol about this, as it just doesn’t make sense to me, it seems it had been raised as an issue previously, logged as IDEA 177, but has been discounted. As such, it makes this behaviour, accepted.

If, like me, this doesn’t really work for you; we need a workaround…

…I’ll cover it in my next blog post!

 

HOW TO: Find out which features are covered by your Exasol licence?

Following on from the last post about how to build an Exasol equivalent of a Recursive CTE using Graph Search, you probably need to know if you have that feature available to you.

To do this, you’ll need access to ExaOperation.

In ExaOperation, head over to Configuration>Software>Licence.

licence

Here I’ve commented out the General licence info, for obvious reasons. Down the bottom, under Extra Database Features, you can see the Graph search option. Here it says ‘Unlimited’, but also valid is a date. ‘not available’ means that you do not have that functionality as part of your licence. To get features added to your licence, contact your Account Manager.

HOW TO: Recursive Queries Using Common Table Expressions (CTE)

CTEs are pretty standard. Recursive CTEs are pretty mainstream too. They’re great for calculations amongst a hierarchy, think manger-employee type relationship; both are employees and as such sit in the same table. To analyse this hierarchy, you would need a recurring self join. This same technique can be used to analyse non-hierarchical problems too. Microsoft have got a pretty comprehensive guide to Recursive CTEs here, if you would like to understand that first.

So, the problem: Charity shops have tins, for donations on the counters. Every day that donations are made, the total donation is emptied from the tin, and the total donation recorded against a store and a day. Looking back, we now want to know how many days it is until the next donation is made, and from then, the next and so on in each store. Then we could do some trend analysis on that data about how often stores are receiving donations.

Let’s have a look at the sort of thing we might have wanted to do before in SQL Server, and then we’ll get into how we’ll go about doing it in Exasol.

This is the setup code for SQL:

CREATE TABLE TBL_STOREDONATIONS (
    STOREID       DECIMAL(18,0),
    DONATIONDATE  DATE,
    DONATIONVALUE DECIMAL(18,2)
);

INSERT INTO CTE_EXAMPLE.DBO.TBL_STOREDONATIONS
SELECT 1, '2016-01-12', '3.00'
UNION ALL SELECT 1, '2016-01-20', '5.00'
UNION ALL SELECT 1, '2016-01-20', '4.20'
UNION ALL SELECT 2, '2016-01-14', '1.00'
UNION ALL SELECT 2, '2016-01-15', '3.40'
UNION ALL SELECT 1, '2016-01-29', '5.50'
UNION ALL SELECT 1, '2016-01-30', '2.10'
UNION ALL SELECT 3, '2016-01-17', '7.80'
UNION ALL SELECT 3, '2016-01-27', '1.00'
UNION ALL SELECT 3, '2016-01-28', '1.20'
UNION ALL SELECT 4, '2016-01-30', '5.44'
UNION ALL SELECT 1, '2016-02-03', '1.23';

Here’s the code for the Recursive CTE:

WITH Donations (StoreID, DonationDate, DonationOrder, DaysToNext)
AS
(
-- Anchor member definition
    SELECT  e.StoreID
		, e.DonationDate 
		, rnk
		, null
    FROM tbl_storedonations AS e
	INNER JOIN(
		SELECT Storeid
		, donationdate
		, DENSE_RANK() OVER (PARTITION BY storeid ORDER BY donationdate DESC) rnk
		FROM TBL_STOREDONATIONS
		)rnkd
	ON e.storeid = rnkd.storeid
	AND e.donationdate = rnkd.donationdate
	WHERE rnk =1 --get last donation    
    UNION ALL
-- Recursive member definition
    SELECT e.StoreID
		, e.DonationDate
		,rnk
		,DATEDIFF(dd, e.donationdate,d.DonationDate) 
    FROM dbo.TBL_STOREDONATIONS AS e
	INNER JOIN(
		SELECT Storeid
		, donationdate
		, DENSE_RANK() OVER (PARTITION BY storeid ORDER BY donationdate DESC) rnk
		FROM TBL_STOREDONATIONS
		)rnkd
	ON e.storeid = rnkd.storeid
	AND e.donationdate = rnkd.donationdate
    INNER JOIN Donations AS d
    ON e.StoreId = d.StoreID
	AND rnk = donationorder + 1
)
-- Statement that executes the CTE
SELECT storeid
, donationdate
, daystonext
FROM Donations
ORDER BY storeid ASC, donationdate DESC
GO

Here’s the output, exactly what we wanted.

SQL_CTE_OUTPUT

Unfortunately, this is not natively supported in Exasol, using recursive CTEs.

There is a light at the end of the tunnel though. We can build something we the same output, by using the Exasol feature, Graph search.

Here’s the Exasol setup code to this demo:

CREATE SCHEMA CTE_EXAMPLE;

CREATE TABLE TBL_STOREDONATIONS (
    STOREID       DECIMAL(18,0),
    DONATIONDATE  DATE,
    DONATIONVALUE DECIMAL(18,2)
);

INSERT INTO CTE_EXAMPLE.TBL_STOREDONATIONS
SELECT 1, '2016-01-12', '3.00'
UNION ALL SELECT 1, '2016-01-20', '5.00'
UNION ALL SELECT 1, '2016-01-20', '4.20'
UNION ALL SELECT 2, '2016-01-14', '1.00'
UNION ALL SELECT 2, '2016-01-15', '3.40'
UNION ALL SELECT 1, '2016-01-29', '5.50'
UNION ALL SELECT 1, '2016-01-30', '2.10'
UNION ALL SELECT 3, '2016-01-17', '7.80'
UNION ALL SELECT 3, '2016-01-27', '1.00'
UNION ALL SELECT 3, '2016-01-28', '1.20'
UNION ALL SELECT 4, '2016-01-30', '5.44'
UNION ALL SELECT 1, '2016-02-03', '1.23';

Here, we’re going to use the Graph Search keyword of CONNECT BY. We’re telling Exasol to join the single result set, back on itself, on the StoreId attribute and the ranking r_instore, by looking for the next instance of a donation in store. r_instore is also used to start the calculation at the last instance of a donation in a store.
This code is undoubtedly tidier and much more readable than the SQL Server equivalent.

SELECT * FROM(
SELECT 
d.StoreId
, d.DonationDate
, d.r_instore
, DAYS_BETWEEN(PRIOR donationdate, donationdate) AS DaysToNext
FROM (
SELECT distinct
StoreId
, DonationDate
, DENSE_RANK() OVER (PARTITION BY StoreId ORDER BY DonationDate desc) r_instore
FROM tbl_StoreDonations
) d

CONNECT BY
StoreId = PRIOR StoreId and
r_instore = PRIOR r_instore + 1

START WITH
r_instore = 1)DTN
ORDER BY STOREID ASC
, DONATIONDATE ASC;

Here’s the output from Exasol, again just what we needed.EXA_CTE_OUTPUT

It is important to note, that to use this functionality, you will need Graph Search as part of your Exasol licence.