References
Reference - SQL Language reference (opens in a new tab)
Reference - SQL Data types (opens in a new tab)
Reference - Databricks REST API (opens in a new tab)
Databricks Runtime release notes versions and compatibility (opens in a new tab)
Databricks Architecture (opens in a new tab)
Lakehouse Medallion Architecture (opens in a new tab)

Bronze
- Ingests raw data in its original format
- No data cleanup or validation
- Is appended incrementally and grows over time.
- Serves as the single source of truth, preserving the data's fidelity.
- Enables reprocessing and auditing by retaining all historical data.
- Can be any combination of streaming and batch transactions from sources
- Metadata columns might be added
Silver
- Represents validated, cleaned, and enriched versions of the data.
- Data cleansing, deduplication, and normalization.
- Enhances data quality by correcting errors and inconsistencies.
- Structures data into a more consumable format for downstream processing.
- Start modeling data
Gold
- Business-user oriented; aligns with business logic and requirements.
- Consists of aggregated data tailored for analytics and reporting.
- Optimized for performance in queries and dashboards.
- Contains fewer datasets than silver and bronze.
Databricks Account

Workspace
- A Databricks workspace is a web application that functions as the management console and cloud-integrated development environment.
- One account can have multiple workspaces in one region.
Workspace - Architecture (opens in a new tab)
Workspace - Control Plane
- Backend services that Databricks manages in your Databricks account.
- Web application
Workspace - Compute Plane
Classic compute (opens in a new tab)

- Databricks compute resources run in your AWS account.
- Types
  - jobs
  - Lakeflow SDP
  - SQL warehouses
    - Classic
      - Gist
        - Most basic SQL compute; lower performance than Pro and Serverless compute.
        - In customer cloud
        - Supports Photon engine
    - Pro
      - Gist
        - More advanced features than Classic (Predictive IO)
        - In customer cloud
        - Supports Photon engine
        - Does not support Intelligent Workload Management
        - Less responsive, slower auto scaling than a serverless SQL warehouse
      - Use cases
        - Serverless SQL warehouses are not available in a region.
        - You have custom-defined networking and want to connect to databases in your network in the cloud or on-premises for federation or a hybrid-type architecture.
- Access modes
  - Standard (shared)
  - Dedicated (single user)
- Instance pools
  - Reduce cluster start and scaling times by keeping a set of idle, ready-to-use instances.
  - Prefer serverless compute to instance pools whenever possible; it is more responsive.
Serverless compute

- Managed by Databricks
- Customer data isolated by network and security boundary
- Types
  - Notebooks
  - jobs
  - Lakeflow SDP
  - SQL warehouses
    - Cutting edge features (Predictive IO and Intelligent Workload Management)
    - Most cost performant
- Use cases
  - ETL
  - Business intelligence
  - Exploratory analysis
SQL warehouses
- Optimized compute for SQL queries, analytics, and business intelligence workloads, with serverless or classic options.
- Only support SQL cells in Notebooks.
- Auto Stop: whether the warehouse stops if it is idle for the specified number of minutes.
- Cluster size: the size of the driver node and the number of worker nodes associated with the cluster.
- Scaling: the minimum and maximum number of clusters used for a query. Databricks recommends one cluster for every 10 concurrent queries.
| Warehouse type | Photon Engine | Predictive IO | Intelligent Workload Management |
|---|---|---|---|
| Serverless | X | X | X |
| Pro | X | X | |
| Classic | X | | |
Workspace - Access Control
Workspace - Account admin (opens in a new tab)
- Can create metastores, and by default become the initial metastore admin.
- Can link metastores to workspaces.
- Can assign the metastore admin role.
- Can grant privileges on metastores.
- Can enable Delta Sharing for a metastore.
- Can configure storage credentials.
- Can enable system tables and delegate access to them.
Workspace - Workspace admin (opens in a new tab)
- Can add users, service principals, and groups to a workspace.
- Can delegate other workspace admins.
- Can manage job ownership.
- Can manage the job Run as setting.
- Can view and manage notebooks, dashboards, queries, and other workspace objects.
Workspace - User
- Represented by an email address
Workspace - Service principal
- Represented by an application ID
Workspace - Group
Delta tables (opens in a new tab)
Delta table types
| Table type | Description |
|---|---|
| Unity Catalog managed table | Always backed by Delta Lake. The default and recommended table type on Databricks. Provides many built-in optimizations. |
| Unity Catalog external table | Optionally backed by Delta Lake. Supports some legacy integration patterns with external Delta Lake clients. |
| Unity Catalog foreign table | Might be backed by Delta Lake, depending on the foreign catalog. Foreign tables backed by Delta Lake do not have many optimizations present in Unity Catalog managed tables. |
| Streaming table | A Lakeflow Spark Declarative Pipelines dataset backed by Delta Lake that includes an append or AUTO CDC ... INTO flow definition for incremental processing. |
| Hive metastore table | Foreign tables in an internal or external federated Hive metastore and tables in the legacy workspace Hive metastore. Both managed and external Hive metastore tables can optionally be backed by Delta Lake. |
| Materialized view | A Lakeflow Spark Declarative Pipelines dataset backed by Delta Lake that materializes the results of a query using managed flow logic. |
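A minimal sketch of the managed vs. external distinction from the table above (catalog, schema, table names, and the storage path are hypothetical):
```sql
-- Managed table: Unity Catalog controls the storage location and lifecycle
CREATE TABLE main.examples.managed_demo (id INT, name STRING);

-- External table: data lives at a path you manage (requires an external location)
CREATE TABLE main.examples.external_demo (id INT, name STRING)
LOCATION 's3://my-bucket/tables/external_demo';
```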
Delta tables - feature compatibility and protocols (opens in a new tab)
Interface
REST API
REST - Call API endpoint with CLI (opens in a new tab)
# Example
databricks api get $ENDPOINT_PATH
- Example - GET /api/2.0/serving-endpoints (opens in a new tab)
databricks api get /api/2.0/serving-endpoints
CLI (opens in a new tab)
CLI - Setup - Workspace-level
UI -> Settings -> Identity and access -> Service principals
# $HOME/.databrickscfg
# DEFAULT profile doesn't need to be specified explicitly
[DEFAULT]
host = $workspace_url
client_id = $service_principal-Application_ID
client_secret = $secret
CLI - Show auth info
databricks auth env
CLI - Verify profiles
databricks auth profiles
CLI - Show clusters info
databricks clusters spark-versions
CLI - Troubleshoot CLI command execution
Will display underlying HTTP requests and responses.
databricks $COMMAND --debug
Example:
> databricks workspace list /Workspace/Users/orange.lavender.9@gmail.com --debug
15:00:25 Info: start pid=48008 version=0.272.0 args="databricks, workspace, list, /Workspace/Users/orange.lavender.9@gmail.com, --debug"
15:00:25 Debug: Loading DEFAULT profile from C:\Users\Takechiyo/.databrickscfg pid=48008 sdk=true
15:00:25 Debug: Failed to configure auth: "pat" pid=48008 sdk=true
15:00:25 Debug: Failed to configure auth: "basic" pid=48008 sdk=true
15:00:26 Debug: GET /oidc/.well-known/oauth-authorization-server
< HTTP/2.0 200 OK
< {
< "authorization_endpoint": "https://dbc-80546867-03ea.cloud.databricks.com/oidc/v1/authorize",
< "claims_supported": [
< "iss",
< "sub",
< "aud",
< "iat",
< "exp",
< "jti",
< "name",
< "family_name",
< "given_name",
< "preferred_username"
< ],
< "code_challenge_methods_supported": [
< "S256"
< ],
< "grant_types_supported": [
< "client_credentials",
< "authorization_code",
< "refresh_token"
< ],
< "id_token_signing_alg_values_supported": [
< "RS256"
< ],
< "issuer": "https://dbc-80546867-03ea.cloud.databricks.com/oidc",
< "jwks_uri": "https://ohio.cloud.databricks.com/oidc/jwks.json",
< "request_uri_parameter_supported": false,
< "response_modes_supported": [
< "query",
< "fragment",
< "form_post"
< ],
< "response_types_supported": [
< "code",
< "id_token"
< ],
< "scopes_supported": [
< "all-apis",
< "email",
< "offline_access",
< "openid",
< "profile",
< "sql"
< ],
< "subject_types_supported": [
< "public"
< ],
< "token_endpoint": "https://dbc-80546867-03ea.cloud.databricks.com/oidc/v1/token",
< "token_endpoint_auth_methods_supported": [
< "client_secret_basic",
< "client_secret_post",
< "none"
< ]
< } pid=48008 sdk=true
15:00:26 Debug: Generating Databricks OAuth token for Service Principal (c9fa7fd8-0982-4430-89f3-a8723af7441e) pid=48008 sdk=true
15:00:26 Debug: POST /oidc/v1/token
> <http.RoundTripper>
< HTTP/2.0 200 OK
< {
< "access_token": "**REDACTED**",
< "expires_in": 3600,
< "scope": "all-apis",
< "token_type": "Bearer"
< } pid=48008 sdk=true
15:00:27 Debug: GET /api/2.0/workspace/list?path=/Workspace/Users/orange.lavender.9@gmail.com
< HTTP/2.0 200 OK
< {} pid=48008 sdk=true
ID Type Language Path
15:00:27 Info: completed execution pid=48008 exit_code=0
15:00:27 Debug: no telemetry logs to upload pid=48008
Workspace UI (opens in a new tab)
UI - Get resource URL/path

PySpark
Lists the contents of a directory
dbutils.fs.ls('/databricks-datasets/data.gov/farmers_markets_geographic_data/data-001/')
Use Ctrl + SPACE to trigger autocomplete.
Read a CSV file
SELECT * FROM csv.`dbfs:/databricks-datasets/data.gov/farmers_markets_geographic_data/data-001/market_data.csv`
Read a text file
SELECT * FROM csv.`dbfs:/databricks-datasets/data.gov/farmers_markets_geographic_data/README.md`
Read a file in binary format (HEX encoded)
SELECT * FROM binaryFile.`dbfs:/databricks-datasets/travel_recommendations_realtime/README.txt`
Databricks Connect
- Like a JDBC driver, Databricks Connect is a library that allows you to connect to a Databricks cluster from your own application.
- Built on Spark Connect
Lakeflow Connect (opens in a new tab)
- Gist
  - 40+ GA connectors: Salesforce, Workday, SharePoint, SQL Server, Oracle, Postgres, Redshift, BigQuery, SFTP, and more.
  - Zerobus Ingest connector (opens in a new tab): a write API that streams events straight into Unity Catalog at ~100 MB/s with 5 s end-to-lake latency.
  - Source-aware CDC optimizations so you don't have to babysit log positions.
Lakeflow Connect - Manual File Uploads (opens in a new tab)
Lakeflow Connect - Managed connectors (opens in a new tab)
- Google Analytics
- Salesforce
- Workday
- SharePoint
- SQL Server
- ServiceNow
Lakeflow Connect - Standard connectors (opens in a new tab)
- Sources
  - Cloud object storage
    - Amazon S3 (s3://)
    - Azure Data Lake Storage (ADLS, abfss://)
    - Google Cloud Storage (GCS, gs://)
    - Azure Blob Storage (wasbs://)
    - Databricks File System (DBFS, dbfs:/)
  - SFTP servers
  - Apache Kafka
  - Amazon Kinesis
  - Google Pub/Sub
  - Apache Pulsar
- Ingestion methods
  - Batch
    - Gist
      - Load data as batches of rows into Databricks, often based on a schedule.
      - Traditional batch ingestion processes all records each time it runs.
    - Usage
      - CREATE TABLE AS (CTAS)
      - spark.read.load()
  - Incremental Batch
    - Gist
      - Only new data is ingested; previously loaded records are skipped automatically.
    - Usage
      - COPY INTO
      - spark.readStream (Auto Loader with timed trigger (opens in a new tab))
      - Declarative Pipelines: CREATE OR REFRESH STREAMING TABLE
  - Streaming
    - Gist
      - Continuously load rows or batches of rows as they are generated, so you can query the data as it arrives in near real time. Micro-batch processing handles small batches at very short, frequent intervals.
    - Usage
      - spark.readStream (Auto Loader with continuous trigger (opens in a new tab))
      - Declarative Pipelines (trigger mode continuous)
Standard connectors - Cloud object storage - CREATE TABLE AS (CTAS)
- Batch method
SQL Reference - CREATE TABLE [USING] (opens in a new tab)
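A minimal CTAS sketch for batch ingestion from cloud object storage (catalog, schema, table, and path names are hypothetical); it re-reads the whole source location on every run:
```sql
CREATE OR REPLACE TABLE main.examples.farmers_markets AS
SELECT * FROM read_files(
  's3://my-bucket/landing/farmers_markets/',  -- hypothetical landing path
  format => 'csv',
  header => true
);
```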
Standard connectors - Cloud object storage - COPY INTO (opens in a new tab)
- Incremental batch
- Idempotent
- Easily configurable file or directory filters from cloud storage: S3, ADLS, ABFS, GCS, and Unity Catalog volumes
- Support for multiple source file formats: CSV, JSON, XML, Avro, ORC, Parquet, text, and binary files
- Target table schema inference, mapping, merging, and evolution
- Loads data from a file location into a Delta table
SQL Reference - COPY INTO (opens in a new tab)
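A minimal COPY INTO sketch (table and path names are hypothetical); the target Delta table must already exist, and files that were already loaded are skipped on re-runs:
```sql
COPY INTO main.examples.farmers_markets
FROM 's3://my-bucket/landing/farmers_markets/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')  -- source parsing options
COPY_OPTIONS ('mergeSchema' = 'true');                       -- allow target schema evolution
```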
Standard connectors - Cloud object storage - Auto Loader (opens in a new tab)
- Built on Structured Streaming, providing a Structured Streaming source called cloudFiles.
- Declarative Pipeline
  - Python
    spark.readStream.format("cloudFiles")
  - SQL
    Example:
    CREATE OR REFRESH STREAMING TABLE csv_data (
      id int,
      ts timestamp,
      event string
    ) AS SELECT * FROM STREAM read_files(
      's3://bucket/path',
      format => 'csv',
      schema => 'id int, ts timestamp, event string'
    );
- Incremental batch or streaming; alternative and successor to COPY INTO
Auto Loader - Schema Inference and Evolution
- Specifying a target directory for the option cloudFiles.schemaLocation enables schema inference and evolution.
- If you use Lakeflow SDP, Databricks manages the schema location and other checkpoint information automatically.
- Schema Inference
  - Uses the _schemas directory at the configured cloudFiles.schemaLocation to track schema changes to the input data over time.
  - For formats that don't encode data types (JSON, CSV, and XML), Auto Loader infers all columns as strings (including nested fields in JSON files).
  - For formats with typed schemas (Parquet and Avro), Auto Loader samples a subset of files and merges the schemas of individual files.
- Schema Evolution (see the sketch after this list)
  - cloudFiles.schemaEvolutionMode: addNewColumns and failOnNewColumns require human intervention to confirm.
    - addNewColumns (default)
      - Stream fails.
      - New columns are added to the schema.
      - Existing columns do not evolve data types.
    - rescue
      - Stream doesn't fail.
      - Schema is not evolved.
      - New columns are recorded in the rescued data column.
    - failOnNewColumns
      - Stream fails.
      - Stream does not restart unless the provided schema is updated, or the offending data file is removed.
    - none
      - Stream doesn't fail.
      - Schema is not evolved.
      - New columns are ignored.
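A minimal sketch of choosing a schema evolution mode with the SQL read_files source (path and table name are hypothetical; with Lakeflow SDP the schema location and checkpoint are managed automatically):
```sql
CREATE OR REFRESH STREAMING TABLE events_bronze AS
SELECT * FROM STREAM read_files(
  's3://my-bucket/landing/events/',
  format => 'json',
  schemaEvolutionMode => 'rescue',          -- don't fail the stream; unexpected columns land in _rescued_data
  schemaHints => 'id bigint, ts timestamp'  -- optionally pin types for known columns
);
```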
Lakeflow Spark Declarative Pipelines (SDP, previously DLT)

- Benefits
  - Automatic orchestration
  - Declarative processing
  - Incremental processing
    - The engine will only process new data and changes in the data sources whenever possible.
    - Incremental refresh for materialized views (opens in a new tab)
- DLT
  - DLT does not support shared clusters. You need to configure the cluster (a jobs cluster) via the DLT UI (under Workflows).
  - You would still write your notebook with DLT using Python, but the pipeline itself has to be configured via the UI and run from the UI.
- From DLT to LDP

- Gist
  - Same CREATE LIVE TABLE DSL, now built on Spark Declarative Pipelines, an open standard you can run anywhere Spark runs (opens in a new tab).
  - New IDE for Data Engineering: side-by-side code & DAG, inline data previews, Git integration, and an AI copilot.
  - AUTO CDC ... in SQL / create_auto_cdc_flow() in Python replace manual APPLY CHANGES.
  - Lakeflow SDP provides a declarative approach to defining relationships between datasets and transformations.
- SQL
  - SQL code that creates pipeline datasets uses the CREATE OR REFRESH syntax to define materialized views and streaming tables against query results.
  - The STREAM keyword indicates whether the data source referenced in a SELECT clause should be read with streaming semantics.
  - LIVE
    - Legacy publishing mode uses the LIVE virtual schema (opens in a new tab) to achieve similar behavior. In the default publishing mode (used by all new pipelines), the LIVE keyword is ignored.
    - In legacy publishing mode pipelines, you can use the LIVE keyword to reference other datasets in the current pipeline for reads, e.g. SELECT * FROM LIVE.bronze_table. In the default publishing mode for new Lakeflow Spark Declarative Pipelines, this syntax is silently ignored, meaning that unqualified identifiers use the current schema (see the sketch below).
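A minimal sketch contrasting the two publishing modes (dataset names are hypothetical); in the default mode the LIVE prefix is simply dropped and unqualified names resolve against the pipeline's schema:
```sql
-- Legacy publishing mode: reference another dataset in the same pipeline via LIVE
CREATE OR REFRESH MATERIALIZED VIEW daily_totals AS
SELECT order_date, SUM(amount) AS total
FROM LIVE.bronze_orders
GROUP BY order_date;

-- Default publishing mode (all new pipelines): the LIVE keyword is ignored
CREATE OR REFRESH MATERIALIZED VIEW daily_totals AS
SELECT order_date, SUM(amount) AS total
FROM bronze_orders
GROUP BY order_date;
```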
Resources
Lakeflow SDP - Pipeline Mode (opens in a new tab)
| Key questions | Triggered | Continuous |
|---|---|---|
| When does the update stop? | Automatically once complete. | Runs continuously until manually stopped. |
| What data is processed? | Data available when the update starts. | All data as it arrives at configured sources. |
| What data freshness requirements is this best for? | Data updates run every 10 minutes, hourly, or daily. | Data updates are desired between every 10 seconds and a few minutes. |
- triggered (opens in a new tab)
  - Lakeflow SDP stops processing after successfully refreshing all tables or selected tables, ensuring each table in the update is refreshed based on the data available when the update starts.
  - You can schedule triggered Lakeflow SDP pipelines to run as a task in a job.
- continuous (opens in a new tab)
  - Lakeflow SDP processes new data as it arrives in data sources to keep tables throughout the pipeline fresh.
Lakeflow SDP - Pricing (opens in a new tab)
Lakeflow SDP - Flows (opens in a new tab)
- Lakeflow SDP unique flow types
  - Streaming flow
    - Handles out-of-order CDC events and supports both SCD Type 1 and 2 (Auto CDC is not available in Apache Spark Declarative Pipelines).
    - Example:

      CREATE OR REFRESH STREAMING TABLE atp_silver;             -- Target table to update with SCD Type 1 or 2

      CREATE FLOW atp_cdc_flow AS AUTO CDC INTO atp_silver
      FROM STREAM atp_bronze                                    -- Source records to determine updates, deletes and inserts
      KEYS (player)                                             -- Primary key for identifying records
      APPLY AS DELETE WHEN operation = "DELETE"                 -- Handle deletes from source to the target
      SEQUENCE BY ranking_date                                  -- Defines order of operations for applying changes
      COLUMNS * EXCEPT (ranking_date, operation, sequenceNum)   -- Select columns and exclude metadata fields
      STORED AS SCD TYPE 1                                      -- Use SCD Type 1
  - Materialized View
    - Batch flow
Lakeflow SDP - Streaming Tables (opens in a new tab)
- Each input row is handled only once.
- The STREAM keyword before read_files tells the query to treat the dataset as a stream.
- Trigger Policy
  - Continuous
  - Once
  - Timed
- Example:

  CREATE OR REFRESH STREAMING TABLE atp_bronze AS
  SELECT * FROM STREAM read_files(
    '/Volumes/workspace/experiment/atp/*.csv',
    format => "csv",
    sep => ",",
    header => true
  );

  CREATE OR REFRESH STREAMING TABLE atp_silver AS
  SELECT player, points FROM STREAM atp_bronze;

  CREATE OR REFRESH MATERIALIZED VIEW atp_gold AS
  SELECT player, SUM(points) AS total_points
  FROM atp_silver
  GROUP BY player
  ORDER BY player;
Lakeflow SDP - Materialized Views (opens in a new tab)
Lakeflow SDP - Data Quality
| Recommendation | Impact |
|---|---|
| Store expectation definitions separately from pipeline logic. | Easily apply expectations to multiple datasets or pipelines. Update, audit, and maintain expectations without modifying pipeline source code. |
| Add custom tags to create groups of related expectations. | Filter expectations based on tags. |
| Apply expectations consistently across similar datasets. | Use the same expectations across multiple datasets and pipelines to evaluate identical logic. |
Lakeflow SDP - Expectations - On Violation
- warn
  - Invalid records are written to the target.
  - EXPECT
- drop
  - Invalid records are dropped before data is written to the target.
  - EXPECT ... ON VIOLATION DROP ROW
- fail
  - Invalid records prevent the update from succeeding.
  - EXPECT ... ON VIOLATION FAIL UPDATE

These behaviors map onto constraint clauses as shown in the sketch below.
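A minimal sketch of the three ON VIOLATION behaviors attached to a streaming table definition (table, column, and constraint names are hypothetical):
```sql
CREATE OR REFRESH STREAMING TABLE atp_silver (
  -- warn: violating rows are still written to the target; violations show up in metrics
  CONSTRAINT valid_player    EXPECT (player IS NOT NULL),
  -- drop: violating rows are removed before the write
  CONSTRAINT positive_points EXPECT (points >= 0) ON VIOLATION DROP ROW,
  -- fail: a single violating row fails the update
  CONSTRAINT valid_date      EXPECT (ranking_date IS NOT NULL) ON VIOLATION FAIL UPDATE
)
AS SELECT * FROM STREAM atp_bronze;
```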
Lakeflow SDP - Compute
Lakeflow SDP - Compute - Serverless
Lakeflow SDP - Compute - Classic
Lakeflow SDP - Compute - Classic - Core
- Declarative Pipelines in Python and SQL
Lakeflow SDP - Compute - Classic - Pro
- Declarative Pipelines in Python and SQL
- CDC
Lakeflow SDP - Compute - Classic - Advanced
- Declarative Pipelines in Python and SQL
- CDC
- Data Quality Control
Lakeflow SDP - DLT - SQL
Lakeflow Jobs (previously Workflows)
- Gist
  - Everything Workflows did, plus loops, conditionals, parameter-scoped retries, and table-update triggers.
  - Serverless performance mode GA: 3–5× faster cold starts, or cost-optimised mode if pennies matter.
  - Unified Jobs & Pipelines left-nav entry: no more hunting for two icons.
Lakeflow Jobs - Jobs
- A job is used to schedule and orchestrate tasks on Databricks in a workflow.
Lakeflow Jobs - Tasks (opens in a new tab)
- You can schedule triggered Lakeflow SDP pipelines to run as a task in a job.
- Types of tasks
- Notebook
- Python script
- Python wheel
- SQL
- Pipeline
- Dashboards
- Power BI
- dbt
- dbt platform (Beta)
- JAR
- Spark Submit
- Run Job
- If/else
- For each
Unity Catalog (opens in a new tab)
- Data Governance (opens in a new tab)
  - Data Access Control (opens in a new tab): see the table below.
  - Data Access Audit: capture and record all access to data.
  - Data Lineage (opens in a new tab): tracks and visualizes the origin and usage of data assets, providing transparency and traceability.
  - Data Discovery: ability to search for and discover authorized assets.

| Layer | Purpose | Mechanisms |
|---|---|---|
| Workspace-level restrictions | Limit which workspaces can access specific catalogs, external locations, and storage credentials | Workspace-level bindings |
| Privileges and ownership | Control access to catalogs, schemas, tables, and other objects | Privilege grants to users and groups, object ownership |
| Attribute-based policies (ABAC) | Use tags and policies to dynamically apply filters and masks | ABAC policies and governed tags |
| Table-level filtering and masking | Control what data users can see within tables | Row filters, column masks, dynamic views |

- Gist
  - Each cloud region has its own Unity Catalog instance.
- Managed storage location (opens in a new tab)

| Associated Unity Catalog object | How to set | Relation to external locations |
|---|---|---|
| Metastore | Configured by account admin during metastore creation. | Cannot overlap an external location. |
| Standard catalog | Specified during catalog creation using the MANAGED LOCATION keyword. | Must be contained within an external location. |
| Foreign catalog | Specified after catalog creation using Catalog Explorer. | Must be contained within an external location. |
| Schema | Specified during schema creation using the MANAGED LOCATION keyword. | Must be contained within an external location. |

- You can associate a managed storage location with a metastore, catalog, or schema. Managed storage locations at lower levels in the hierarchy override storage locations defined at higher levels when managed tables or managed volumes are created.
- You can choose to store data at the metastore level, providing a default storage location for catalogs that don't have a managed storage location of their own.
- Databricks recommends that you assign managed storage at the catalog level for logical data isolation, with metastore level and schema level as options.
Unity Catalog - Metastore (opens in a new tab)
- Top-level container for metadata in Unity Catalog
- For a workspace to use Unity Catalog, it must have a Unity Catalog metastore attached.
- You can link a single metastore to multiple workspaces in the same region, giving each workspace the same data view.
- You should have one metastore for each region in which you have workspaces.
- There can be multiple metastores in a single account.
- Ownership
  - Metastore admins are the owners of the metastore.
  - Metastore admins can reassign ownership of the metastore by transferring the metastore admin role.
Unity Catalog - Catalog (opens in a new tab)
- Each catalog typically has its own managed storage location to store managed tables and volumes, providing physical data isolation at the catalog level.
- Your Databricks account has one metastore per region, so catalogs are inherently isolated by region.
- By default, a catalog is shared with all workspaces attached to the current metastore. Workspace-catalog binding (opens in a new tab) can be used to limit catalog access to specific workspaces in your account.
Unity Catalog - Views

Unity Catalog - Volumes (opens in a new tab)
- Use cases
  - Register landing areas for raw data produced by external systems to support its processing in the early stages of ETL pipelines and other data engineering activities.
  - Register staging locations for ingestion, e.g. using Auto Loader, COPY INTO, or CTAS (CREATE TABLE AS) statements.
  - Provide file storage locations for data scientists, data analysts, and machine learning engineers to use as part of their exploratory data analysis and other data science tasks.
  - Give Databricks users access to arbitrary files produced and deposited in cloud storage by other systems. For example, large collections of unstructured data (such as image, audio, video, and PDF files) captured by surveillance systems or IoT devices, or library files (JARs and Python wheel files) exported from local dependency management systems or CI/CD pipelines.
  - Store operational data, such as logging or checkpointing files.
- Recommended for managing all access to non-tabular data in cloud object storage and for storing workload support files.
- Three-level name: catalog.schema.volume
- Access pattern (see the sketch below)
  - /Volumes/<catalog>/<schema>/<volume>/<path>/<file-name> or
  - dbfs:/Volumes/<catalog>/<schema>/<volume>/<path>/<file-name>
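A minimal sketch of working with files in a volume by path (catalog, schema, volume, and file names are hypothetical):
```sql
-- Browse files in a volume
LIST '/Volumes/main/examples/raw_files/';

-- Query a file directly from the volume path
SELECT * FROM csv.`/Volumes/main/examples/raw_files/data.csv`;
```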
Separation of metastore management
-
Databricks workspaces management before and after Unity Catalog

Centralized user management and access control
- Identities
  - Managed at 2 levels: account level and workspace level
    - Users
    - Groups
    - Service principals
- Identity federation
  - Allows identities to be created once at the account level and then assigned to multiple workspaces as needed.
- Each catalog typically has its own managed storage location to store managed tables and volumes, providing physical data isolation at the catalog level.
3-level hierarchy: catalog.schema.table
- Catalog
Object privileges

- Schema (Database)
Object privileges

- Table / View / Function
Object privileges

- Securable objects (opens in a new tab)
Privilege types by securable object

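A minimal sketch of granting privileges down the three-level hierarchy described above (principal and object names are hypothetical); access to a table also requires USE CATALOG and USE SCHEMA on its parents:
```sql
GRANT USE CATALOG ON CATALOG main TO `data_engineers`;
GRANT USE SCHEMA  ON SCHEMA  main.examples TO `data_engineers`;
GRANT SELECT      ON TABLE   main.examples.farmers_markets TO `data_engineers`;
```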
Docs - UI (opens in a new tab)
In the GitHub repo, Unity Catalog UI (opens in a new tab)
Docs - CLI (opens in a new tab)
In the GitHub repo, CLI - uc (opens in a new tab)
Unity Catalog - Integration with Confluent stack

Unity Catalog - Delta Sharing (opens in a new tab)
- Based on Delta Lake - Delta Sharing (opens in a new tab)
- Data from other metastores can be accessed using Delta Sharing if needed to enable cross-regional federation.
Unity Catalog - Predictive Optimization (opens in a new tab)
- Identifies tables that would benefit from ANALYZE, OPTIMIZE, and VACUUM operations and queues them to run using serverless compute for jobs.
- Only applies to managed tables.
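A minimal sketch of controlling predictive optimization per securable, assuming the ALTER ... PREDICTIVE OPTIMIZATION clauses (object names are hypothetical); it can also be enabled account-wide:
```sql
ALTER CATALOG main ENABLE PREDICTIVE OPTIMIZATION;
ALTER TABLE main.examples.farmers_markets INHERIT PREDICTIVE OPTIMIZATION;
```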
Notebooks
Notebooks - Examples - Pivot in SQL (opens in a new tab)
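A minimal PIVOT sketch in Databricks SQL, reusing the hypothetical atp_bronze dataset from the pipeline examples above:
```sql
SELECT * FROM (
  SELECT player, year(ranking_date) AS yr, points
  FROM atp_bronze
)
PIVOT (
  SUM(points) FOR yr IN (2023, 2024)  -- one column of summed points per listed year
);
```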
DevOps
Provision Databricks Workspace
- Create Databricks workspaces with the Databricks Terraform provider (opens in a new tab).
Provision Databricks projects and resources
- Databricks Asset Bundles (opens in a new tab)

- Infrastructure-as-code definition file in YAML format to define resources and configuration
- Use Databricks CLI to operate
- Use another CI/CD runner (e.g. Jenkins) to schedule and manage the pipeline for provisioning jobs
Databricks Git Folders (Repos) (opens in a new tab)
- Gist
  - Clone, push to, and pull from a remote Git repository.
  - Create and manage branches for development work, including merging, rebasing, and resolving conflicts.
  - Create notebooks, including IPYNB notebooks, and edit them and other files.
  - Visually compare differences upon commit and resolve merge conflicts.
Data Access
| Object | Object identifier | File path | Cloud URI |
|---|---|---|---|
| External location | no | no | yes |
| Managed table | yes | no | no |
| External table | yes | no | yes |
| Managed volume | no | yes | no |
| External volume | no | yes | yes |
File access (opens in a new tab)
- Locations
  - Unity Catalog volumes
  - Workspace files
  - Cloud object storage
  - DBFS mounts and DBFS root
    - Databricks Utilities (opens in a new tab)
      - Only work with DBFS
  - Ephemeral storage attached to the driver node of the cluster
Access patterns
URI-style paths

POSIX-style paths

