So what is ExaBucketFS?
ExaBucketFS is essentially a replicated file store within the cluster: files you put into it are automatically replicated across all nodes.
There are two main use cases (that I know of) for ExaBucketFS. Firstly, it acts as a repository for our in-house and third-party libraries (e.g. Python, Java, R). Secondly, it allows us to store binary data, such as trained statistical models, which the Exasol database cannot traditionally store.
Previously, before version 6, the Exasol cluster required internet access to fetch the libraries used by UDFs. For custom libraries, you would have had to set up your own repository, and again have it reachable over the internet.
ExaBucketFS does require some administration though.
ExaBucketFS exposes an API that lets you put, get and remove files in the cluster; you can do this using curl. In the last post, I discussed how to use curl to upload files to the bucket. Files added to a bucket are then accessible from within UDFs.
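As a quick sketch of what those operations look like over HTTP: the node address (`10.0.0.10`), port (`2580`), bucket name (`mybucket`), file name and passwords below are all placeholder assumptions for illustration; substitute your own cluster's values.

```shell
# Upload a file into the bucket (user "w" with the bucket's write password)
curl -X PUT -T model.pkl http://w:writepw@10.0.0.10:2580/mybucket/model.pkl

# List the bucket's contents (user "r" with the read password)
curl http://r:readpw@10.0.0.10:2580/mybucket

# Download a file from the bucket
curl -o model.pkl http://r:readpw@10.0.0.10:2580/mybucket/model.pkl

# Remove a file (write password again)
curl -X DELETE http://w:writepw@10.0.0.10:2580/mybucket/model.pkl
```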
Buckets can be password protected for reading or writing, or left public, although securing them seems to me a purely DBA-type task: get a password on them!
There are also a few do's and don'ts for ExaBucketFS:
- Ensure that you don’t write to buckets concurrently – Buckets are non-transactional.
- Buckets and files do not get backed up – so you need to back them up somewhere else (by your own means!)
- Don’t use ExaBucketFS as general storage, as there is 100% replication across all nodes; a file stored once is copied to every node, consuming disk space on each of them.
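To tie the two use cases together: once a file (say, a trained model) has been uploaded, a UDF reads it from a local path on each node rather than over HTTP. A minimal sketch, assuming the default BucketFS service name `bfsdefault` and a bucket called `mybucket` (both placeholders):

```python
# Inside a UDF container, bucket contents appear under a local mount:
#   /buckets/<bucketfs-service>/<bucket>/<file>
# The service, bucket and file names used here are illustrative assumptions.

def bucketfs_path(service, bucket, filename):
    """Build the local path a UDF would use to read a BucketFS file."""
    return "/buckets/{}/{}/{}".format(service, bucket, filename)

# e.g. a trained model uploaded earlier via curl:
model_path = bucketfs_path("bfsdefault", "mybucket", "model.pkl")
# a UDF would then open(model_path, "rb") and deserialise the model
```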
Exasol’s knowledgeable Mathias Brink describes ExaBucketFS in one of the Exasol videos here.