DP-700 - Microsoft Certified - Fabric Data Engineer Associate

Certification - Microsoft Certified: Fabric Data Engineer Associate

Microsoft Learn

Exam Topics

Implement and manage an analytics solution (30–35%)

Configure Microsoft Fabric workspace settings

  • Configure Spark workspace settings
  • Configure domain workspace settings
  • Configure OneLake workspace settings
  • Configure data workflow workspace settings

Implement lifecycle management in Fabric

  • Configure version control
  • Implement database projects
  • Create and configure deployment pipelines

Configure security and governance

  • Implement workspace-level access controls
  • Implement item-level access controls
  • Implement row-level, column-level, object-level, and folder/file-level access controls
  • Implement dynamic data masking
  • Apply sensitivity labels to items
  • Endorse items
  • Implement and use workspace logging

Orchestrate processes

  • Choose between a pipeline and a notebook
  • Design and implement schedules and event-based triggers
  • Implement orchestration patterns with notebooks and pipelines, including parameters and dynamic expressions
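The orchestration pattern in the last bullet — a pipeline invoking parameterized activities and resolving dynamic expressions at run time — can be sketched in plain Python. Everything here is illustrative (the function names and run dictionary are hypothetical, not a Fabric API):

```python
# Hypothetical sketch: a "pipeline" invoking parameterized "notebook"
# activities, with a dynamic expression resolved at run time.
from datetime import date

def ingest_notebook(source_path: str, load_date: str) -> dict:
    # Stand-in for a notebook activity that ingests one day of data.
    return {"rows_ingested": 100, "path": f"{source_path}/{load_date}"}

def transform_notebook(rows: int) -> dict:
    # Stand-in for a downstream transformation activity.
    return {"rows_transformed": rows}

def run_pipeline(parameters: dict) -> list:
    # Dynamic expression: default load_date to "today" when not supplied,
    # analogous to expressions like @utcNow() in pipeline definitions.
    load_date = parameters.get("load_date") or date.today().isoformat()
    results = []
    step1 = ingest_notebook(parameters["source_path"], load_date)
    results.append(step1)
    # Output of one activity feeds the next (activity chaining).
    results.append(transform_notebook(step1["rows_ingested"]))
    return results

runs = run_pipeline({"source_path": "Files/landing", "load_date": "2024-01-31"})
```

The same shape appears in real pipelines: parameters are declared on the pipeline, and activity outputs are referenced by downstream activities through expressions.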

Ingest and transform data (30–35%)

Design and implement loading patterns

  • Design and implement full and incremental data loads
  • Prepare data for loading into a dimensional model
  • Design and implement a loading pattern for streaming data
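The full-versus-incremental distinction above usually comes down to a high-watermark: a full load truncates and reloads, while an incremental load upserts only rows modified since the last run. A minimal plain-Python sketch (all names hypothetical; in Fabric this would typically be a pipeline copy activity or a Delta `MERGE`):

```python
# Full vs. incremental load with a high-watermark (illustrative sketch).

target = {}  # target table keyed by id
last_watermark = "2024-01-01T00:00:00"

source_rows = [
    {"id": 1, "modified": "2023-12-31T09:00:00", "value": "old"},
    {"id": 2, "modified": "2024-01-02T10:00:00", "value": "new"},
    {"id": 3, "modified": "2024-01-03T11:00:00", "value": "new"},
]

def full_load(rows):
    # Full load: discard the target and reload everything.
    return {r["id"]: r for r in rows}

def incremental_load(rows, watermark):
    # Incremental load: upsert only rows modified after the watermark,
    # then advance the watermark to the max modified time seen.
    changed = [r for r in rows if r["modified"] > watermark]
    for r in changed:
        target[r["id"]] = r
    return max((r["modified"] for r in changed), default=watermark)

last_watermark = incremental_load(source_rows, last_watermark)
```

Persisting the watermark between runs (e.g. in a control table) is what makes the pattern restartable.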

Ingest and transform batch data

  • Choose an appropriate data store
  • Choose between dataflows, notebooks, KQL, and T-SQL for data transformation
  • Create and manage shortcuts to data
  • Implement mirroring
  • Ingest data by using pipelines
  • Transform data by using PySpark, SQL, and KQL
  • Denormalize data
  • Group and aggregate data
  • Handle duplicate, missing, and late-arriving data
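The last bullet — duplicates, missing values, and late-arriving rows — can be illustrated with a plain-Python batch sketch (in Fabric this would typically be PySpark, T-SQL, or KQL; the data and thresholds here are made up):

```python
# Handling duplicate, missing, and late-arriving rows in a batch (sketch).

rows = [
    {"id": 1, "ts": "2024-01-01T10:00", "amount": 10.0},
    {"id": 1, "ts": "2024-01-01T10:00", "amount": 10.0},   # exact duplicate
    {"id": 2, "ts": "2024-01-01T11:00", "amount": None},   # missing value
    {"id": 3, "ts": "2023-12-25T09:00", "amount": 5.0},    # late-arriving
]

# 1. Deduplicate on business key + timestamp (keep the first occurrence).
seen, deduped = set(), []
for r in rows:
    key = (r["id"], r["ts"])
    if key not in seen:
        seen.add(key)
        deduped.append(r)

# 2. Impute missing measures with a default (here 0.0).
for r in deduped:
    if r["amount"] is None:
        r["amount"] = 0.0

# 3. Route late-arriving rows (before the batch window) aside for
#    reprocessing against the correct partition.
window_start = "2024-01-01T00:00"
on_time = [r for r in deduped if r["ts"] >= window_start]
late = [r for r in deduped if r["ts"] < window_start]
```

The same three steps map onto PySpark as `dropDuplicates`, `fillna`, and a filter on the partition window.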

Ingest and transform streaming data

  • Choose an appropriate streaming engine
  • Choose between native storage, mirrored storage, or shortcuts in Real-Time Intelligence
  • Process data by using eventstreams
  • Process data by using Spark structured streaming
  • Process data by using KQL
  • Create windowing functions
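The simplest windowing function above is the tumbling window, where each event belongs to exactly one fixed-width bucket. A plain-Python sketch of the idea (engines like Spark Structured Streaming, eventstreams, and KQL provide tumbling, hopping/sliding, and session windows natively; the events below are made up):

```python
# Tumbling-window aggregation sketch: sum values per fixed 60-second window.
from collections import defaultdict

WINDOW_SECONDS = 60

events = [
    {"ts": 5,   "value": 1},
    {"ts": 30,  "value": 2},
    {"ts": 65,  "value": 3},   # falls into the second window
    {"ts": 130, "value": 4},   # third window
]

def tumbling_window_sum(events, width):
    # Each event belongs to exactly one window: [n*width, (n+1)*width).
    windows = defaultdict(int)
    for e in events:
        window_start = (e["ts"] // width) * width
        windows[window_start] += e["value"]
    return dict(windows)

totals = tumbling_window_sum(events, WINDOW_SECONDS)
```

A hopping window differs only in that windows overlap, so one event can land in several buckets.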

Monitor and optimize an analytics solution (30–35%)

Monitor Fabric items

  • Monitor data ingestion
  • Monitor data transformation
  • Monitor semantic model refresh
  • Configure alerts

Identify and resolve errors

  • Identify and resolve pipeline errors
  • Identify and resolve dataflow errors
  • Identify and resolve notebook errors
  • Identify and resolve eventhouse errors
  • Identify and resolve eventstream errors
  • Identify and resolve T-SQL errors

Optimize performance

  • Optimize a lakehouse table
  • Optimize a pipeline
  • Optimize a data warehouse
  • Optimize eventstreams and eventhouses
  • Optimize Spark performance
  • Optimize query performance

Services

Dataflows Gen2

Azure Data Factory

  • Managed, serverless ETL/ELT service
  • Can lift and shift SSIS (SQL Server Integration Services) packages to the cloud


Azure Data Factory - Data Factory Pipelines

Data Factory Pipelines can be used to orchestrate Spark, Dataflow, and other activities, enabling you to implement complex data transformation processes.

Azure Data Lake Storage (ADLS)

  • Hadoop compatible access

    Treat the data as if it were stored in HDFS

  • Security

    ACL and POSIX permissions

  • Hierarchical namespace for easier navigation and better performance

    Azure Data Lake Storage Gen2 builds on blob storage and optimizes I/O of high-volume data by using a hierarchical namespace that organizes blob data into directories, and stores metadata about each directory and the files within it. This structure allows operations, such as directory renames and deletes, to be performed in a single atomic operation. Flat namespaces, by contrast, require several operations proportionate to the number of objects in the structure. Hierarchical namespaces keep the data organized, which yields better storage and retrieval performance for an analytical use case and lowers the cost of analysis.
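The atomic-rename point above can be made concrete with a small plain-Python sketch (the stores below are toy dictionaries, not the Azure SDK): in a flat namespace, "directories" are just key prefixes, so renaming one means rewriting every matching object key, while a hierarchical namespace renames a single metadata entry.

```python
# Why a directory rename is O(number of objects) in a flat namespace
# but a single operation in a hierarchical one (illustrative sketch).

# Flat namespace: directories are only key prefixes.
flat_store = {"raw/a.csv": b"...", "raw/b.csv": b"...", "curated/c.csv": b"..."}

def flat_rename(store, old_prefix, new_prefix):
    ops = 0
    for key in list(store):
        if key.startswith(old_prefix + "/"):
            store[new_prefix + key[len(old_prefix):]] = store.pop(key)
            ops += 1
    return ops  # one operation per object under the prefix

# Hierarchical namespace: a directory is a real metadata entry.
hier_store = {"raw": {"a.csv": b"...", "b.csv": b"..."}, "curated": {"c.csv": b"..."}}

def hier_rename(store, old_name, new_name):
    store[new_name] = store.pop(old_name)
    return 1  # one atomic operation regardless of directory size
```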

  • Data Redundancy

    Supported by underlying Azure Blob storage redundancy options (LRS/GRS)

Microsoft Fabric

Capacity

  • Key points

    • A Microsoft Fabric capacity resides on a tenant.
    • Each capacity that sits under a specific tenant is a distinct pool of resources allocated to Microsoft Fabric.
  • Benefits

    • Centralized management of capacity

      Rather than provisioning and managing separate resources for each workload, Microsoft Fabric bills on two variables:

      • The amount of compute you provision

        • A shared pool of capacity that powers all capabilities in Microsoft Fabric.
        • Pay-as-you-go and 1-year Reservation
      • The amount of storage you use

        • A single place to store all data
        • Pay-as-you-go (billable per GB/month)
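A back-of-the-envelope sketch of those two billing variables — provisioned compute plus consumed storage. The rates below are made-up placeholders for illustration, not Microsoft pricing:

```python
# Hypothetical Fabric bill estimate: compute (capacity) + storage.
COMPUTE_RATE_PER_CU_HOUR = 0.20    # placeholder $/CU/hour
STORAGE_RATE_PER_GB_MONTH = 0.025  # placeholder $/GB/month

def monthly_bill(capacity_units: int, hours_active: float, storage_gb: float) -> float:
    compute = capacity_units * hours_active * COMPUTE_RATE_PER_CU_HOUR
    storage = storage_gb * STORAGE_RATE_PER_GB_MONTH
    return round(compute + storage, 2)

# e.g. a 4-CU capacity running 200 hours in a month with 1 TB stored:
bill = monthly_bill(4, 200, 1024)
```
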

Capacity License SKUs

  • Capacity licenses are split into SKUs. Each SKU provides a set of Fabric resources for your organization. Your organization can have as many capacity licenses as needed.

Capacity Unit

  • Capacity unit (CU) = Compute Power

  • CU Consumption

    Each capability (such as Power BI, Spark, or Data Warehouse) has a unique consumption rate for its queries, jobs, and tasks.

Access Control

  • Tenant

  • Capacity

  • Workspace

  • Item

    Data Warehouse, Data Lakehouse, Dataflow, Semantic Model, etc.

  • Object

    Table, View, Function, Stored Procedure, etc.

Workspace

  • Workspace is created under a capacity.
  • Workspace is a container for Microsoft Fabric items.

Workspace - License mode

Workspace - Roles

  • Workspace roles apply to all items in the workspace

  • Roles in workspaces in Microsoft Fabric

  • Admin

    • Update and delete the workspace
    • Add or remove people, including other admins
  • Member

    Everything an admin can do, except updating or deleting the workspace and managing people. In addition:

    • Add members or others with lower permissions
    • Allow others to reshare items
  • Contributor

    Everything a member can do, except adding members and allowing others to reshare items.

  • Viewer

    Read-only access to the workspace without API access.

Data Store options

  • Eventhouse
  • Cosmos DB
  • SQL database in Fabric
  • Fabric Data Warehouse
  • Lakehouse

Azure Synapse Analytics

  • Analytics pools

    • SQL pool
    • Apache Spark pool
    • Data Explorer pool

Resources