
Secure your workspaces with new platform security controls for Databricks on Google Cloud


We are excited to announce the general availability (GA) of several key security features for Databricks on Google Cloud:

  • Private connectivity with Private Service Connect (PSC)
  • Customer-managed encryption keys
  • IP access lists for Account console and API access

At Databricks, we recognize that data is your most valuable asset. With the GA of these critical security capabilities, you can protect your data at rest, keep your data private, and mitigate data exfiltration risks on the Databricks Lakehouse Platform.

In this blog, we will address commonly asked security questions and walk you through the new security features and capabilities that are now generally available on Google Cloud.

End-to-end private workspaces with Private Service Connect

Most enterprise customers want to ensure that their users and workloads can process sensitive data in a private and isolated environment. With Databricks, you can secure the network perimeter and configure end-to-end private connectivity using a customer-managed virtual private cloud (VPC) and Private Service Connect (PSC). This includes:

  1. The ability to privately connect to the Databricks web application and APIs from a client. Databricks lets you limit access to a workspace to only authorized VPC endpoints and public IP addresses.
  2. The ability to privately connect the Databricks compute resources in a customer-managed VPC (the data plane) to the Databricks workspace core services (the control plane).

Private Service Connect is now in Limited Availability with GA-level functionality: Google Cloud customers can use it for their Databricks workspaces in production, with full support and SLAs. A PSC-enabled private workspace helps you mitigate several data exfiltration risks, such as access from unauthorized networks using leaked credentials or exposure of data on the internet.

Our recent Databricks on Google Cloud Security Best Practices blog explains how you can isolate your Databricks environment and secure your data using capabilities such as customer-managed VPCs, Private Service Connect and IP ACLs.


Protect your data at rest with customer-managed keys

Databricks encrypts all data at rest by default within our managed services. For added control and visibility, several enterprise customers also need the ability to protect their data with encryption keys managed by them in Cloud KMS.

Now generally available on Google Cloud, the Databricks customer-managed keys feature lets you bring your own encryption keys to protect data at rest in Databricks managed services and workspace storage:

  • Customer-managed keys for managed services: Managed services data in the Databricks control plane is encrypted at rest. You can add a customer-managed key for managed services to help protect and control access to the following types of encrypted data:
    • Notebook source files that are stored in the control plane
    • Notebook results for notebooks that are stored in the control plane
    • Secrets stored by the secret manager APIs
    • Databricks SQL queries and query history
    • Personal access tokens or other credentials used to set up Git integration with Databricks Repos
  • Customer-managed keys for workspace storage: Databricks also supports configuring customer-managed keys for workspace storage to help protect and control access to data. You can configure your own key to encrypt the data on the GCS bucket associated with the Google Cloud project that you specified when you created your workspace. The same key is also used to encrypt your cluster’s GCE persistent disks.
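As a rough illustration of the two use cases above, the sketch below validates a Cloud KMS key resource name and assembles a key-configuration request body. The `projects/.../locations/.../keyRings/.../cryptoKeys/...` format is the standard Cloud KMS key name. The function name `build_cmk_request`, the payload field names (`gcp_key_info`, `kms_key_id`, `use_cases`), and the use-case labels are assumptions for illustration only; check the Databricks CMK documentation for the actual request shape.

```python
import re

# Standard Cloud KMS CryptoKey resource name:
# projects/<project>/locations/<location>/keyRings/<ring>/cryptoKeys/<key>
_KMS_KEY_PATTERN = re.compile(
    r"^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$"
)

# Assumed labels for the two use cases described above (hypothetical).
KNOWN_USE_CASES = {"MANAGED_SERVICES", "STORAGE"}


def build_cmk_request(kms_key_name, use_cases):
    """Return a request body (a plain dict) for registering a customer-managed key.

    The field names are assumptions, not a confirmed API contract.
    """
    if not _KMS_KEY_PATTERN.match(kms_key_name):
        raise ValueError(f"not a Cloud KMS key resource name: {kms_key_name!r}")
    unknown = set(use_cases) - KNOWN_USE_CASES
    if unknown:
        raise ValueError(f"unknown use cases: {sorted(unknown)}")
    return {
        "gcp_key_info": {"kms_key_id": kms_key_name},
        "use_cases": sorted(set(use_cases)),
    }


if __name__ == "__main__":
    # Placeholder project, key ring, and key names.
    body = build_cmk_request(
        "projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key",
        ["STORAGE", "MANAGED_SERVICES"],
    )
    print(body)
```

Validating the key name locally catches copy-paste mistakes, such as pasting a key version path instead of the key itself, before any request is sent.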

Secure your network perimeter with IP access lists

IP access lists (IP ACLs) allow you to control the networks permitted to access your Databricks resources over the internet. IP ACLs help you reduce the risk of unauthorized access using stolen credentials and meet compliance requirements. For example, specific industries and regulatory frameworks require organizations to restrict access to data or applications based on geographical locations or specific IPs.

There are two types of IP ACLs on Databricks now generally available on Google Cloud:

  • IP access lists for workspaces allow you to configure Databricks workspaces so that users and clients only connect to the service from approved corporate networks or a set of approved IP addresses.
  • IP access lists for the account console allow account owners and account admins to connect to the account console UI and account-level REST APIs, such as the Account API, only through existing corporate networks with a secure perimeter and a set of approved IP addresses. Account owners and admins can use the account console UI or a REST API to configure allowed and blocked IP addresses and subnets.
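To make the interplay of allowed and blocked addresses concrete, here is a minimal sketch of one way such lists could be evaluated. It assumes that block entries take precedence over allow entries and that an empty allow list means no allow-based restriction; treat both as assumptions rather than a statement of how Databricks enforces its lists. The function name `is_allowed` and the sample addresses (taken from reserved documentation ranges) are hypothetical.

```python
import ipaddress


def is_allowed(client_ip, allow, block):
    """Decide whether client_ip may connect, given allow and block entries.

    Entries may be single addresses ("203.0.113.7") or CIDR subnets
    ("203.0.113.0/24"). Assumed semantics: block wins over allow, and an
    empty allow list permits any address that is not blocked.
    """
    addr = ipaddress.ip_address(client_ip)

    def matches(entries):
        return any(addr in ipaddress.ip_network(e, strict=False) for e in entries)

    if matches(block):
        return False
    return not allow or matches(allow)


if __name__ == "__main__":
    allow = ["203.0.113.0/24"]   # approved corporate subnet (placeholder)
    block = ["203.0.113.7"]      # one blocked host inside that subnet
    print(is_allowed("203.0.113.10", allow, block))  # True
    print(is_allowed("203.0.113.7", allow, block))   # False
    print(is_allowed("198.51.100.1", allow, block))  # False
```

Checking block entries first lets you carve a single address out of an otherwise approved subnet without splitting the subnet into smaller ranges.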

Getting started with Private Service Connect, CMK, and IP ACLs on Databricks on Google Cloud

Private Service Connect, customer-managed keys, and IP ACLs are available on the Premium tier of Databricks on Google Cloud. For step-by-step instructions on configuring these features for your Databricks workspaces, refer to our documentation (Private Service Connect | CMK | IP ACLs). Note that Databricks support for private connectivity using Private Service Connect (PSC) is in Limited Availability with GA-level functionality. Contact your Databricks representative to request access.

Please visit our Security and Trust Center for more information about Databricks security practices and features available to customers.