Data Analytics Tools Permission Reference

In a previous article I mentioned that you should secure the permissions for your data lake analytics tools in addition to the data lake itself. This is a short reference article highlighting common use cases in two popular tools: Azure Synapse Analytics and Azure Databricks.

(I apologize for the format - LinkedIn today doesn't allow tables. This article is organized by access layer, then by use case, with descriptions of the roles or permission you need to accomplish the use case, plus where you'll find those settings.)

Table of Contents

  • Storage: Read/write from external tool and query data from SQL pool/endpoint
  • Storage: View contents in Azure portal
  • Databricks workspace: Access workspace
  • Databricks cluster: Ability to execute notebooks against a cluster
  • Databricks SQL: SQL workspace
  • Databricks Spark databases: Access hive databases via Databricks SQL only
  • Databricks Spark databases: Access hive databases via Databricks notebooks
  • Synapse Studio: Edit pipelines, notebooks, etc
  • Synapse Studio: Execute Spark notebooks and jobs
  • Synapse Studio: Publish new or updated artifacts to the service
  • Synapse Studio: Commit code to git
  • Azure Synapse SQL pool databases: Use SQL Pool via Azure Synapse Studio, SSMS, SSRS

Storage

Read/write from external tool and query data from SQL pool/endpoint

  • Permission/role: Storage Blob Data Reader. This will allow you to query data using a SQL engine but will NOT let you view the underlying storage account/containers in the Azure portal
  • Permission location: Azure Portal
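If you prefer scripting over the portal, the same role assignment can be made with the Azure CLI. This is a sketch: the subscription ID, resource group, storage account, and container names are placeholders you would replace with your own.

```shell
# Assign Storage Blob Data Reader scoped to a single container
# (all <...> values are placeholders for your environment)
az role assignment create \
  --assignee "test@domain.com" \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/blobServices/default/containers/<container>"
```

The scope can also be the storage account or resource group if the user should be able to read more broadly.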

View contents in Azure portal

  • Permission/role: Reader role
  • Permission location: Azure Portal

Databricks Workspace

Access workspace

  • Permission/role: Workspace access permission
  • Permission location: Databricks workspace

Databricks cluster

Ability to execute notebooks against a cluster

  • Permission/role: At least “can attach to” permission for user
  • Permission location: Databricks workspace -> Compute -> Cluster -> Permissions
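The same cluster permission can also be granted programmatically through the Databricks Permissions API rather than clicking through the workspace UI. A hedged sketch, assuming you have a workspace URL, a personal access token in $DATABRICKS_TOKEN, and the cluster ID; all of those are placeholders here.

```shell
# Grant "Can Attach To" on a cluster via the Databricks Permissions API
# (<workspace-url> and <cluster-id> are placeholders)
curl -X PATCH "https://<workspace-url>/api/2.0/permissions/clusters/<cluster-id>" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{"access_control_list": [{"user_name": "test@domain.com", "permission_level": "CAN_ATTACH_TO"}]}'
```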

Databricks SQL

SQL workspace

  • Permission/role: Databricks SQL access. Note this does not allow you to query databases – separate permission must be given for each database and the storage layer.
  • Permission location: Databricks workspace
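Besides toggling the entitlement in the workspace admin console, Databricks SQL access can be granted via the SCIM API. A sketch under the assumption that you already know the numeric user ID; the workspace URL and user ID are placeholders.

```shell
# Add the Databricks SQL entitlement to a user via the SCIM API
# (<workspace-url> and <user-id> are placeholders)
curl -X PATCH "https://<workspace-url>/api/2.0/preview/scim/v2/Users/<user-id>" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/scim+json" \
  -d '{"schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
       "Operations": [{"op": "add", "path": "entitlements",
                       "value": [{"value": "databricks-sql-access"}]}]}'
```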

Databricks Spark databases

Access hive databases via Databricks SQL only

  • Permission/role: Must grant at least USAGE and SELECT permissions to user or group registered in Databricks:

GRANT USAGE, SELECT ON SCHEMA spark_db TO `test@domain.com`;

  • Permission location: Run grant SQL scripts in a Databricks notebook or Databricks SQL. To view permissions:

SHOW GRANT ON TABLE database_name.table_name;
SHOW GRANT ON SCHEMA database_name;

Access hive databases via Databricks notebooks

  • Permission/role: By default, all users have access to the data (if storage permissions are set) unless table access control is turned on. If table access control is on, then follow the instructions above for accessing Databricks Spark databases via Databricks SQL

Synapse Studio

Edit pipelines, notebooks, etc

  • Permission/role: Any RBAC role (yes, any)
  • Permission location: Azure Portal

Execute Spark notebooks and jobs

  • Permission/role: Synapse Compute Operator – can be done on a Spark pool-by-pool basis
  • Permission location: Synapse Studio
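This Synapse RBAC assignment can also be scripted with the Azure CLI instead of Synapse Studio. A sketch, assuming the workspace and Spark pool names below are placeholders for your own.

```shell
# Assign Synapse Compute Operator scoped to a single Spark pool
# (<workspace-name> and <spark-pool-name> are placeholders)
az synapse role assignment create \
  --workspace-name <workspace-name> \
  --role "Synapse Compute Operator" \
  --assignee "test@domain.com" \
  --scope "workspaces/<workspace-name>/bigDataPools/<spark-pool-name>"
```

Omit --scope to grant the role across the whole workspace.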

Publish new or updated artifacts to the service

  • Permission/role: Synapse Artifact Publisher
  • Permission location: Synapse Studio

Commit code to git

  • Permission/role: At least contribute and read on the repository
  • Permission location: Azure DevOps

Azure Synapse SQL pool databases

Use SQL Pool via Azure Synapse Studio, SSMS, SSRS

  • Permission/role: Need to create a login and associated users/permissions, as with any regular SQL permissions. Can be granted at the group or user level. Individual user:

-- in the master database
CREATE LOGIN [test@domain.com] FROM EXTERNAL PROVIDER;

-- in the target database
CREATE USER [test@domain.com] FROM LOGIN [test@domain.com];
ALTER ROLE db_datareader ADD MEMBER [test@domain.com];

Group:

-- in the master database
CREATE LOGIN [VXC_GroupName] FROM EXTERNAL PROVIDER;

-- in the target database
CREATE USER [VXC_GroupName] FROM LOGIN [VXC_GroupName];
ALTER ROLE db_datareader ADD MEMBER [VXC_GroupName];

  • Won’t be able to read underlying data unless given at least the Storage Blob Data Reader role on the Azure storage blob container or folder. Nested security groups are not supported.
  • Permission location: Synapse SQL pool (e.g. via SSMS). The only way to view existing permissions here is to run queries like these:

-- server logins
SELECT name, CONVERT(uniqueidentifier, sid) AS sid, create_date
FROM sys.server_principals WHERE type IN ('E', 'X');

-- database users
SELECT name, CONVERT(uniqueidentifier, sid) AS sid, create_date
FROM sys.database_principals WHERE type IN ('E', 'X');

-- database roles
SELECT r.name AS role_principal_name, m.name AS member_principal_name
FROM sys.database_role_members rm
JOIN sys.database_principals r ON rm.role_principal_id = r.principal_id
JOIN sys.database_principals m ON rm.member_principal_id = m.principal_id;

  • Traditional SQL permissions still apply, e.g. you may need to check or add roles for a user/group:

ALTER ROLE db_datawriter ADD MEMBER [<My group name>];

ALTER ROLE db_ddladmin ADD MEMBER [<My group name>];


