DP-700 - Microsoft Certified - Fabric Data Engineer Associate

Certification - Microsoft Certified: Fabric Data Engineer Associate

Microsoft Learn

Exam Topics

Implement and manage an analytics solution (30–35%)

Configure Microsoft Fabric workspace settings

  • Configure Spark workspace settings
  • Configure domain workspace settings
  • Configure OneLake workspace settings
  • Configure data workflow workspace settings

Implement lifecycle management in Fabric

  • Configure version control
  • Implement database projects
  • Create and configure deployment pipelines

Configure security and governance

  • Implement workspace-level access controls
  • Implement item-level access controls
  • Implement row-level, column-level, object-level, and folder/file-level access controls
  • Implement dynamic data masking
  • Apply sensitivity labels to items
  • Endorse items
  • Implement and use workspace logging

Orchestrate processes

  • Choose between a pipeline and a notebook
  • Design and implement schedules and event-based triggers
  • Implement orchestration patterns with notebooks and pipelines, including parameters and dynamic expressions
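The orchestration pattern in the last bullet — a pipeline invoking parameterized activities and resolving dynamic expressions at run time — can be sketched in plain Python. Everything here is illustrative (the function names and run dictionary are hypothetical, not a Fabric API):

```python
# Hypothetical sketch: a "pipeline" invoking parameterized "notebook"
# activities, with a dynamic expression resolved at run time.
from datetime import date

def ingest_notebook(source_path: str, load_date: str) -> dict:
    # Stand-in for a notebook activity that ingests one day of data.
    return {"rows_ingested": 100, "path": f"{source_path}/{load_date}"}

def transform_notebook(rows: int) -> dict:
    # Stand-in for a downstream transformation activity.
    return {"rows_transformed": rows}

def run_pipeline(parameters: dict) -> list:
    # Dynamic expression: default load_date to "today" when not supplied,
    # analogous to expressions like @utcNow() in pipeline definitions.
    load_date = parameters.get("load_date") or date.today().isoformat()
    results = []
    step1 = ingest_notebook(parameters["source_path"], load_date)
    results.append(step1)
    # Output of one activity feeds the next (activity chaining).
    results.append(transform_notebook(step1["rows_ingested"]))
    return results

runs = run_pipeline({"source_path": "Files/landing", "load_date": "2024-01-31"})
```

The same shape appears in real pipelines: parameters are declared on the pipeline, and activity outputs are referenced by downstream activities through expressions.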

Ingest and transform data (30–35%)

Design and implement loading patterns

  • Design and implement full and incremental data loads
  • Prepare data for loading into a dimensional model
  • Design and implement a loading pattern for streaming data
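The full-versus-incremental distinction above usually comes down to a high-watermark: a full load truncates and reloads, while an incremental load upserts only rows modified since the last run. A minimal plain-Python sketch (all names hypothetical; in Fabric this would typically be a pipeline copy activity or a Delta `MERGE`):

```python
# Full vs. incremental load with a high-watermark (illustrative sketch).

target = {}  # target table keyed by id
last_watermark = "2024-01-01T00:00:00"

source_rows = [
    {"id": 1, "modified": "2023-12-31T09:00:00", "value": "old"},
    {"id": 2, "modified": "2024-01-02T10:00:00", "value": "new"},
    {"id": 3, "modified": "2024-01-03T11:00:00", "value": "new"},
]

def full_load(rows):
    # Full load: discard the target and reload everything.
    return {r["id"]: r for r in rows}

def incremental_load(rows, watermark):
    # Incremental load: upsert only rows modified after the watermark,
    # then advance the watermark to the max modified time seen.
    changed = [r for r in rows if r["modified"] > watermark]
    for r in changed:
        target[r["id"]] = r
    return max((r["modified"] for r in changed), default=watermark)

last_watermark = incremental_load(source_rows, last_watermark)
```

Persisting the watermark between runs (e.g. in a control table) is what makes the pattern restartable.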

Ingest and transform batch data

  • Choose an appropriate data store
  • Choose between dataflows, notebooks, KQL, and T-SQL for data transformation
  • Create and manage shortcuts to data
  • Implement mirroring
  • Ingest data by using pipelines
  • Transform data by using PySpark, SQL, and KQL
  • Denormalize data
  • Group and aggregate data
  • Handle duplicate, missing, and late-arriving data
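The last bullet — duplicates, missing values, and late-arriving rows — can be illustrated with a plain-Python batch sketch (in Fabric this would typically be PySpark, T-SQL, or KQL; the data and thresholds here are made up):

```python
# Handling duplicate, missing, and late-arriving rows in a batch (sketch).

rows = [
    {"id": 1, "ts": "2024-01-01T10:00", "amount": 10.0},
    {"id": 1, "ts": "2024-01-01T10:00", "amount": 10.0},   # exact duplicate
    {"id": 2, "ts": "2024-01-01T11:00", "amount": None},   # missing value
    {"id": 3, "ts": "2023-12-25T09:00", "amount": 5.0},    # late-arriving
]

# 1. Deduplicate on business key + timestamp (keep the first occurrence).
seen, deduped = set(), []
for r in rows:
    key = (r["id"], r["ts"])
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# 2. Impute missing measures with a default (here 0.0).
for r in deduped:
    if r["amount"] is None:
        r["amount"] = 0.0

# 3. Route late-arriving rows (before the batch window) aside for
#    reprocessing against the correct partition.
window_start = "2024-01-01T00:00"
on_time = [r for r in deduped if r["ts"] >= window_start]
late = [r for r in deduped if r["ts"] < window_start]
```

The same three steps map onto PySpark as `dropDuplicates`, `fillna`, and a filter on the partition window.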

Ingest and transform streaming data

  • Choose an appropriate streaming engine
  • Choose between native storage, mirrored storage, or shortcuts in Real-Time Intelligence
  • Process data by using eventstreams
  • Process data by using Spark structured streaming
  • Process data by using KQL
  • Create windowing functions
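The simplest windowing function above is the tumbling window, where each event belongs to exactly one fixed-width bucket. A plain-Python sketch of the idea (engines like Spark Structured Streaming, eventstreams, and KQL provide tumbling, hopping/sliding, and session windows natively; the events below are made up):

```python
# Tumbling-window aggregation sketch: sum values per fixed 60-second window.
from collections import defaultdict

WINDOW_SECONDS = 60

events = [
    {"ts": 5,   "value": 1},
    {"ts": 30,  "value": 2},
    {"ts": 65,  "value": 3},   # falls into the second window
    {"ts": 130, "value": 4},   # third window
]

def tumbling_window_sum(events, width):
    # Each event belongs to exactly one window: [n*width, (n+1)*width).
    windows = defaultdict(int)
    for e in events:
        window_start = (e["ts"] // width) * width
        windows[window_start] += e["value"]
    return dict(windows)

totals = tumbling_window_sum(events, WINDOW_SECONDS)
```

A hopping window differs only in that windows overlap, so one event can land in several buckets.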

Monitor and optimize an analytics solution (30–35%)

Monitor Fabric items

  • Monitor data ingestion
  • Monitor data transformation
  • Monitor semantic model refresh
  • Configure alerts

Identify and resolve errors

  • Identify and resolve pipeline errors
  • Identify and resolve dataflow errors
  • Identify and resolve notebook errors
  • Identify and resolve eventhouse errors
  • Identify and resolve eventstream errors
  • Identify and resolve T-SQL errors

Optimize performance

  • Optimize a lakehouse table
  • Optimize a pipeline
  • Optimize a data warehouse
  • Optimize eventstreams and eventhouses
  • Optimize Spark performance
  • Optimize query performance

Services

Dataflows Gen2

Azure Data Factory

  • Managed, serverless ETL/ELT service
  • Can lift and shift SSIS (SQL Server Integration Services) packages to the cloud


Azure Data Factory - Data Factory Pipelines

Data Factory Pipelines can be used to orchestrate Spark, Dataflow, and other activities, enabling you to implement complex data transformation processes.

Azure Data Lake Storage (ADLS)

  • Hadoop compatible access

    Treat the data as if it were stored in HDFS

  • Security

    ACL and POSIX permissions

  • Hierarchical namespace for easier navigation and better performance

    Azure Data Lake Storage Gen2 builds on blob storage and optimizes I/O of high-volume data by using a hierarchical namespace that organizes blob data into directories, and stores metadata about each directory and the files within it. This structure allows operations, such as directory renames and deletes, to be performed in a single atomic operation. Flat namespaces, by contrast, require several operations proportionate to the number of objects in the structure. Hierarchical namespaces keep the data organized, which yields better storage and retrieval performance for an analytical use case and lowers the cost of analysis.
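The atomic-rename point above can be made concrete with a small plain-Python sketch (the stores below are toy dictionaries, not the Azure SDK): in a flat namespace, "directories" are just key prefixes, so renaming one means rewriting every matching object key, while a hierarchical namespace renames a single metadata entry.

```python
# Why a directory rename is O(number of objects) in a flat namespace
# but a single operation in a hierarchical one (illustrative sketch).

# Flat namespace: directories are only key prefixes.
flat_store = {"raw/a.csv": b"...", "raw/b.csv": b"...", "curated/c.csv": b"..."}

def flat_rename(store, old_prefix, new_prefix):
    ops = 0
    for key in list(store):
        if key.startswith(old_prefix + "/"):
            store[new_prefix + key[len(old_prefix):]] = store.pop(key)
            ops += 1
    return ops  # one operation per object under the prefix

# Hierarchical namespace: a directory is a real metadata entry.
hier_store = {"raw": {"a.csv": b"...", "b.csv": b"..."}, "curated": {"c.csv": b"..."}}

def hier_rename(store, old_name, new_name):
    store[new_name] = store.pop(old_name)
    return 1  # one atomic operation regardless of directory size
```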

  • Data Redundancy

    Supported by underlying Azure Blob storage redundancy options (LRS/GRS)

Microsoft Fabric

Capacity

  • Key points

    • A Microsoft Fabric capacity resides on a tenant.
    • Each capacity that sits under a specific tenant is a distinct pool of resources allocated to Microsoft Fabric.
  • Benefits

    • Centralized management of capacity

      Rather than provisioning and managing separate resources for each workload, Microsoft Fabric bills on two variables:

      • The amount of compute you provision

        • A shared pool of capacity that powers all capabilities in Microsoft Fabric.
        • Pay-as-you-go and 1-year Reservation
      • The amount of storage you use

        • A single place to store all data
        • Pay-as-you-go (billable per GB/month)
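A back-of-the-envelope sketch of those two billing variables — provisioned compute plus consumed storage. The rates below are made-up placeholders for illustration, not Microsoft pricing:

```python
# Hypothetical Fabric bill estimate: compute (capacity) + storage.
COMPUTE_RATE_PER_CU_HOUR = 0.20    # placeholder $/CU/hour
STORAGE_RATE_PER_GB_MONTH = 0.025  # placeholder $/GB/month

def monthly_bill(capacity_units: int, hours_active: float, storage_gb: float) -> float:
    compute = capacity_units * hours_active * COMPUTE_RATE_PER_CU_HOUR
    storage = storage_gb * STORAGE_RATE_PER_GB_MONTH
    return round(compute + storage, 2)

# e.g. a 4-CU capacity running 200 hours in a month with 1 TB stored:
bill = monthly_bill(4, 200, 1024)
```
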

Capacity License SKUs

  • Capacity licenses are split into SKUs. Each SKU provides a set of Fabric resources for your organization. Your organization can have as many capacity licenses as needed.

Capacity Unit

  • Capacity unit (CU) = Compute Power

  • CU Consumption

    Each capability (such as Power BI, Spark, or Data Warehouse) has a unique consumption rate for its queries, jobs, and tasks.

Access Control

  • Tenant

  • Capacity

  • Workspace

  • Item

    Data Warehouse, Data Lakehouse, Dataflow, Semantic Model, etc.

  • Object

    Table, View, Function, Stored Procedure, etc.

Workspace

  • Workspace is created under a capacity.
  • Workspace is a container for Microsoft Fabric items.

Workspace - License mode

Workspace - Roles

  • Workspace roles apply to all items in the workspace

  • Roles in workspaces in Microsoft Fabric

  • Admin

    • Update and delete the workspace
    • Add or remove people, including other admins
  • Member

    Everything an admin can do, except updating or deleting the workspace and managing people. In addition:

    • Add members or others with lower permissions
    • Allow others to reshare items
  • Contributor

    Everything a member can do, except adding members and allowing others to reshare items.

  • Viewer

    Read-only access to the workspace without API access.

Data Store options

  • Eventhouse
  • Cosmos DB
  • SQL database in Fabric
  • Fabric Data Warehouse
  • Lakehouse

Azure Synapse Analytics

  • Analytics pools

    • SQL pool
    • Apache Spark pool
    • Data Explorer pool

Resources