Databricks is one of the modules we offer for Data Studio. Databricks offers us a clean user interface (UI), a managed Python environment, and distributed computation with Spark. This document covers how to find out what is available in the managed Python environment and how PrecisionLender manages upgrades.
Environments
When you use the Databricks module, your code runs in a managed Python environment. Knowing which environment you are using tells you which Python packages are available for analytic tasks. Databricks manages environments at the cluster level, so each cluster can have its own Python environment.
To review the packages in a cluster's Python environment, take the following steps:
- Use the dropdown in your workspace to determine which cluster your notebook is using, then select the Clusters icon in the menu bar to go to the cluster management page.
- Find your cluster by name and look under the Runtime column to identify the cluster's Databricks Runtime version.
- Look up that runtime version in the Databricks documentation, which lists the Python packages, and their versions, included in each runtime.
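You can also check the environment from inside a notebook attached to the cluster. The following Python sketch prints the runtime version and the installed packages using only the standard library. It assumes the cluster sets the `DATABRICKS_RUNTIME_VERSION` environment variable (outside Databricks the function simply returns `None`) and that the runtime uses Python 3.8 or later, where `importlib.metadata` is available. The helper function names are illustrative, not part of Databricks or Data Studio.

```python
import os
from importlib import metadata


def runtime_version():
    """Return the Databricks Runtime version string, or None if it is not set.

    Assumes the cluster exposes the DATABRICKS_RUNTIME_VERSION
    environment variable.
    """
    return os.environ.get("DATABRICKS_RUNTIME_VERSION")


def package_version(name):
    """Return the installed version of package `name`, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def installed_packages():
    """Return a {lowercased package name: version} dict, sorted by name."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with incomplete metadata
            packages[name.lower()] = dist.version
    return dict(sorted(packages.items()))


if __name__ == "__main__":
    print("Databricks Runtime:", runtime_version() or "not detected")
    for pkg, version in installed_packages().items():
        print(f"{pkg}=={version}")
```

For example, `package_version("pandas")` returns the installed pandas version as a string, or `None` if pandas is not installed on the cluster. The Databricks documentation remains the authoritative list for a given runtime.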
Environment Upgrades
Because Databricks manages its runtimes, we depend on the Databricks runtime lifecycle. PrecisionLender periodically upgrades clusters to newer runtimes to ensure the clusters are:
- Secure
- Up to date with new features
- Supported by Databricks
Databricks has its own runtime deprecation process, described in the Databricks documentation referenced in the Environments section above. PrecisionLender also has its own cluster/runtime deprecation process to help meet the goals listed above.
The PrecisionLender cluster/runtime deprecation process is as follows:
- Notification: When PrecisionLender is ready to upgrade clusters, an email will be sent to Data Studio clients informing them that the upgrade process is beginning.
- Upgrade cluster creation: A new cluster with a new Databricks Runtime will be created. The name of this cluster will be given in the notification email.
- Deprecated cluster removal: The deprecated cluster will be removed two months after the initial email announcing the cluster upgrade.
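Because the removal date is counted from the notification email, it can help to note the deadline as soon as the email arrives. The Python sketch below estimates it, assuming "two months" means two calendar months from the email date, with the day clamped to the end of shorter months. The helper names and the example date are hypothetical; the schedule PrecisionLender states in its email always takes precedence.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    If the target month is shorter (for example 31 December + 2 months),
    the day is clamped to the last day of that month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def estimated_removal_date(notification_date, months=2):
    """Estimate when the deprecated cluster is removed (hypothetical helper)."""
    return add_months(notification_date, months)


if __name__ == "__main__":
    notified = date(2024, 1, 15)  # made-up notification date
    removal = estimated_removal_date(notified)
    print("Deprecated cluster removed around:", removal)  # 2024-03-15
    print("Days left to migrate:", (removal - notified).days)  # 60
```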
Reacting to Environment Upgrades
As a Data Studio user of Databricks, take the following actions when an upgrade occurs:
- As soon as possible, move your critical notebooks to the new cluster.
- Identify any compatibility issues between your notebooks and the new cluster.
- Fix compatibility issues by updating the affected Python code to work with the upgraded versions of your Python libraries.
- If an issue is difficult or impossible to resolve, contact PrecisionLender for support.
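One way to spot likely compatibility issues before moving notebooks is to compare installed package versions on the old and new clusters. The Python sketch below assumes you have captured the output of `pip freeze` on each cluster as text; `parse_freeze`, `diff_environments`, and `major_changes` are hypothetical helpers, not part of Databricks or Data Studio, and the version numbers in the example are made up.

```python
def parse_freeze(text):
    """Parse `pip freeze`-style text into a {lowercased name: version} dict.

    Blank lines, comments, and entries without a pinned "name==version"
    (for example URL-based installs) are skipped.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.strip().lower()] = version.strip()
    return packages


def diff_environments(old, new):
    """Report packages removed, added, and changed between two environments."""
    return {
        "removed": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": {
            name: (old[name], new[name])
            for name in sorted(set(old) & set(new))
            if old[name] != new[name]
        },
    }


def major_version(version):
    """Return the leading numeric component of a version string, or None."""
    head = version.split(".", 1)[0]
    return int(head) if head.isdigit() else None


def major_changes(changed):
    """List packages whose leading version number changed."""
    return [
        name
        for name, (old_v, new_v) in changed.items()
        if major_version(old_v) != major_version(new_v)
    ]


if __name__ == "__main__":
    old_env = parse_freeze("pandas==1.3.4\nnumpy==1.20.3\nscipy==1.7.1\n")
    new_env = parse_freeze("pandas==2.0.3\nnumpy==1.20.3\npyarrow==12.0.1\n")
    report = diff_environments(old_env, new_env)
    print(report["removed"])  # ['scipy']
    print(report["added"])  # ['pyarrow']
    print(report["changed"])  # {'pandas': ('1.3.4', '2.0.3')}
    print(major_changes(report["changed"]))  # ['pandas']
```

A change in the leading version number often signals breaking API changes, so packages flagged by `major_changes` are a reasonable place to start reviewing your code. This is a heuristic only; running your notebooks on the new cluster is still the real test.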
React to a cluster upgrade early so that you have plenty of time to identify and fix cluster/notebook compatibility issues before the deprecated cluster is removed.