Databricks Certified - Data Engineer Associate

LEARNING PATHWAY 1: ASSOCIATE DATA ENGINEERING

1. Data Ingestion with Lakeflow Connect

From Cloud Storage

  • CREATE TABLE AS (CTAS)
  • COPY INTO
  • Auto Loader (all three are sketched below)

Data Ingestion from Cloud Storage
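
A minimal sketch of the three ingestion methods listed above, assuming a hypothetical S3 path and Unity Catalog table names that are not from the course; options vary by file format and source:

```python
# Sketch of the three cloud-storage ingestion patterns.
# Paths and table names are illustrative.

# 1. CREATE TABLE AS (CTAS): one-time load from files into a Delta table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.bronze.events AS
    SELECT * FROM read_files('s3://my-bucket/events/', format => 'json')
""")

# 2. COPY INTO: idempotent, incremental batch loads
#    (files already loaded are skipped on re-run).
spark.sql("""
    COPY INTO main.bronze.events
    FROM 's3://my-bucket/events/'
    FILEFORMAT = JSON
""")

# 3. Auto Loader: incremental ingestion with the cloudFiles source.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/events")
    .load("s3://my-bucket/events/")
    .writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/events")
    .trigger(availableNow=True)
    .toTable("main.bronze.events"))
```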

From Databases

Lakeflow Connect Managed Connectors: Database Ingestion

From Enterprise Applications

Lakeflow Connect Managed Connectors: SaaS Ingestion

2. Deploy Workloads with Lakeflow Jobs

3. Build Data Pipelines with Lakeflow Declarative Pipelines

4. Data Management and Governance with Unity Catalog

Exam Guide

Section 1: Databricks Intelligence Platform

  • Enable features that simplify data layout decisions and optimize query performance (see the sketch after this list).
  • Explain the value of the Data Intelligence Platform.
  • Identify the applicable compute to use for a specific use case.
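
The first bullet commonly maps to features such as liquid clustering and predictive optimization; that mapping is an assumption, and the table name below is hypothetical:

```python
# Illustrative only: layout/optimization features on a Delta table.

# Liquid clustering: Delta manages data layout by the chosen keys,
# replacing manual partitioning and ZORDER decisions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id BIGINT,
        order_date DATE,
        region STRING
    )
    CLUSTER BY (order_date, region)
""")

# Predictive optimization (an account/schema-level setting) runs
# maintenance like OPTIMIZE automatically; a manual run still works:
spark.sql("OPTIMIZE main.sales.orders")
```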

Section 2: Development and Ingestion

  • Use Databricks Connect in a data engineering workflow (see the sketch after this list).
  • Describe the capabilities of Databricks Notebooks.
  • Classify valid Auto Loader sources and use cases.
  • Demonstrate knowledge of Auto Loader syntax.
  • Use Databricks' built-in debugging tools to troubleshoot a given issue.
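
A minimal Databricks Connect sketch, assuming the databricks-connect package is installed and workspace authentication is already configured; the Auto Loader bullets are covered by the cloud-storage sketch shown earlier:

```python
# Databricks Connect: run PySpark code from a local IDE against a
# Databricks cluster. Assumes `pip install databricks-connect` and
# credentials configured (e.g., via `databricks auth login`).
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

# From here, standard PySpark executes remotely on the workspace.
df = spark.read.table("samples.nyctaxi.trips")
df.groupBy("pickup_zip").count().show(5)
```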

Section 3: Data Processing & Transformations

  • Describe the three layers of the Medallion Architecture and explain the purpose of each layer in a data processing pipeline.
  • Classify the type of cluster and configuration for optimal performance based on the scenario in which the cluster is used.
  • Explain the advantages of Lakeflow Declarative Pipelines (LDP) for ETL processes in Databricks.
  • Implement data pipelines using LDP (see the sketch after this list).
  • Identify DDL/DML features.
  • Compute complex aggregations and metrics with PySpark DataFrames.
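
A minimal LDP sketch of a bronze-silver-gold medallion flow using the Python pipeline API historically exposed as the dlt module (LDP is the successor to Delta Live Tables); the source path and column names are illustrative:

```python
# Sketch of a medallion pipeline in Lakeflow Declarative Pipelines.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw events ingested with Auto Loader")
def events_bronze():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("s3://my-bucket/events/"))

@dlt.table(comment="Silver: cleaned and validated events")
@dlt.expect_or_drop("valid_user", "user_id IS NOT NULL")
def events_silver():
    return (dlt.read_stream("events_bronze")
            .withColumn("event_date", F.to_date("event_ts")))

@dlt.table(comment="Gold: daily event counts per user")
def events_gold():
    return (dlt.read("events_silver")
            .groupBy("event_date", "user_id")
            .agg(F.count("*").alias("events")))
```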

Section 4: Productionizing Data Pipelines

  • Identify the differences between Databricks Asset Bundles (DABs) and traditional deployment methods.
  • Identify the structure of a DAB (see the databricks.yml sketch after this list).
  • Deploy a workflow, then repair and rerun a task in case of failure.
  • Use serverless for hands-off, auto-optimized compute managed by Databricks.
  • Analyze the Spark UI to optimize queries.
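
A minimal databricks.yml sketch showing how a bundle is structured: bundle metadata, resources (here a job), and deployment targets. All names and paths are illustrative:

```yaml
# databricks.yml — minimal Databricks Asset Bundle sketch.
bundle:
  name: my_etl_bundle

resources:
  jobs:
    daily_etl:
      name: daily_etl
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/ingest.py

targets:
  dev:
    mode: development
    default: true
  prod:
    mode: production
```

Validate with databricks bundle validate, deploy with databricks bundle deploy -t dev, and run with databricks bundle run daily_etl.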

Section 5: Data Governance & Quality

  • Explain the difference between managed and external tables (see the sketch after this list).
  • Identify how permissions are granted to users and groups within Unity Catalog (UC).
  • Identify key roles in UC.
  • Identify how audit logs are stored.
  • Use lineage features in UC.
  • Use the Delta Sharing feature available with UC to share data.
  • Identify the advantages and limitations of Delta Sharing.
  • Identify the types of Delta Sharing: Databricks-to-Databricks vs. sharing with external systems.
  • Analyze the cost considerations of data sharing across clouds.
  • Identify use cases for Lakehouse Federation when connecting to external sources.
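
A minimal sketch of the table-type, permission, and sharing commands above. All catalog, table, group, and recipient names are illustrative, and the external table assumes an external location has already been configured in UC:

```python
# Managed table: Unity Catalog owns both the metadata and the data files.
spark.sql("CREATE TABLE main.sales.orders_managed (id BIGINT, amount DOUBLE)")

# External table: UC owns only the metadata; data stays at the given
# location, so DROP TABLE does not delete the underlying files.
spark.sql("""
    CREATE TABLE main.sales.orders_external (id BIGINT, amount DOUBLE)
    LOCATION 's3://my-bucket/tables/orders'
""")

# Grant privileges to a group.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders_managed TO `data_analysts`")

# Delta Sharing (provider side): create a share, add a table,
# and grant it to a recipient.
spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
spark.sql("ALTER SHARE sales_share ADD TABLE main.sales.orders_managed")
spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_org")
spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_org")
```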