Databricks - Environments and Upgrades

Databricks is one of the modules we offer in Data Studio. It provides a clean user interface (UI), a managed Python environment, and distributed computation with Apache Spark. This article explains how to find out which packages are available in the managed Python environment and how PrecisionLender manages runtime upgrades.


Environments

When you use the Databricks module, your code runs in a managed Python environment. Identifying that environment tells you which Python packages are available for analytic tasks. Databricks manages environments at the cluster level, so each cluster can have its own Python environment.

To review the packages in a cluster's Python environment, take the following steps:

  1. Determine which cluster your notebook is attached to using the cluster dropdown in your workspace, then select the Clusters icon in the menu bar to open the cluster management page.
  2. Find your cluster by name and note the version listed under the Runtime column.
  3. Look up that runtime version in the Databricks documentation to review the packages and versions it includes.
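As a quick cross-check from inside a notebook, the minimal sketch below prints the runtime version and the Python packages the notebook can see. It assumes the `DATABRICKS_RUNTIME_VERSION` environment variable that Databricks sets on its clusters (the sketch reports "unknown" anywhere else) and uses only the standard-library `importlib.metadata` module; the function names are hypothetical and not part of any Databricks API.

```python
import os
from importlib import metadata


def runtime_version() -> str:
    """Return the Databricks Runtime version of the attached cluster, if known.

    Assumes Databricks sets DATABRICKS_RUNTIME_VERSION on its clusters;
    outside Databricks this falls back to "unknown".
    """
    return os.environ.get("DATABRICKS_RUNTIME_VERSION", "unknown")


def installed_packages() -> dict[str, str]:
    """Map each installed distribution name (lower-cased) to its version."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            packages[name.lower()] = dist.version
    return packages


if __name__ == "__main__":
    print("Runtime:", runtime_version())
    for name, version in sorted(installed_packages().items()):
        print(f"{name}=={version}")
```

Packages installed with a notebook-scoped `%pip install` may also appear in this list, so the output reflects what the notebook can import rather than only the runtime's defaults.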

Environment Upgrades

Because Databricks manages its runtimes, we are bound by the Databricks runtime lifecycle. PrecisionLender periodically upgrades cluster runtimes to ensure the clusters are:

  1. Secure
  2. Up to date with new features
  3. Supported by Databricks

Databricks has its own runtime deprecation process, described in the Databricks documentation referenced in the previous section. PrecisionLender also has its own cluster/runtime deprecation process to help us meet the goals listed above.

The PrecisionLender cluster/runtime deprecation process is as follows:

  • Notification
    • When PrecisionLender is ready to upgrade clusters, an email will be sent to Data Studio clients informing them that the upgrade process is beginning.
  • Upgrade Cluster Creation
    • A new cluster with a new Databricks Runtime will be created. The name of this cluster will be given in the notification email.
  • Deprecated Cluster Removal
    • The deprecated cluster will be removed two months after the initial email detailing the cluster upgrade.
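To plan around the removal date, the minimal sketch below adds two calendar months to the date of the notification email. It assumes "two months" means two calendar months, clamped to the last day of the month when the same day does not exist (for example, a notice sent on December 31 maps to the end of February). The helper names are hypothetical; treat the result as a planning aid, not an official schedule.

```python
from calendar import monthrange
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last valid day of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = monthrange(year, month)[1]  # number of days in the target month
    return date(year, month, min(start.day, last_day))


def removal_date(notification: date) -> date:
    """Expected removal date of the deprecated cluster (two months after notice)."""
    return add_months(notification, 2)


print(removal_date(date(2024, 12, 31)))
```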

Reacting to Environment Upgrades

If you use Databricks in Data Studio, take the following actions when an upgrade occurs:

  1. As soon as possible, move your critical notebooks to the new cluster
  2. Identify any issues with cluster/notebook compatibility
  3. Fix compatibility issues by updating the affected Python code to work with the upgraded versions of any Python libraries
  4. If an issue is difficult or impossible to resolve, contact PrecisionLender for support
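One way to find compatibility issues early is to compare the packages on the deprecated and new clusters. The minimal sketch below assumes you have saved a `{package: version}` snapshot from each cluster (for example, from `pip freeze` output or `importlib.metadata`). `diff_packages` is a hypothetical helper, and the snapshot values shown are made-up examples, not the contents of any real runtime.

```python
def diff_packages(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare {package: version} snapshots from the old and new clusters."""
    removed = sorted(set(old) - set(new))  # present only on the old cluster
    added = sorted(set(new) - set(old))    # present only on the new cluster
    changed = sorted(                      # present on both, different version
        (name, old[name], new[name])
        for name in set(old) & set(new)
        if old[name] != new[name]
    )
    return {"removed": removed, "added": added, "changed": changed}


# Hypothetical snapshots, for illustration only.
old_snapshot = {"pandas": "1.3.4", "numpy": "1.20.3", "xlrd": "1.2.0"}
new_snapshot = {"pandas": "1.4.2", "numpy": "1.20.3", "pyarrow": "7.0.0"}

report = diff_packages(old_snapshot, new_snapshot)
for section in ("removed", "added", "changed"):
    print(section, report[section])
```

Packages that were removed or changed version are the first places to look when a notebook that worked on the old cluster fails on the new one.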

React to a cluster upgrade early so that you have time to identify and fix cluster/notebook compatibility issues before the deprecated cluster is removed.