Certification - Microsoft Certified: Fabric Data Engineer Associate
Microsoft Learn
Exam Topics
Implement and manage an analytics solution (30–35%)
Configure Microsoft Fabric workspace settings
- Configure Spark workspace settings
- Configure domain workspace settings
- Configure OneLake workspace settings
- Configure data workflow workspace settings
Implement lifecycle management in Fabric
- Configure version control
- Implement database projects
- Create and configure deployment pipelines
Configure security and governance
- Implement workspace-level access controls
- Implement item-level access controls
- Implement row-level, column-level, object-level, and folder/file-level access controls
- Implement dynamic data masking
- Apply sensitivity labels to items
- Endorse items
- Implement and use workspace logging
Orchestrate processes
- Choose between a pipeline and a notebook
- Design and implement schedules and event-based triggers
- Implement orchestration patterns with notebooks and pipelines, including parameters and dynamic expressions
Ingest and transform data (30–35%)
Design and implement loading patterns
- Design and implement full and incremental data loads
- Prepare data for loading into a dimensional model
- Design and implement a loading pattern for streaming data
Ingest and transform batch data
- Choose an appropriate data store
- Choose between dataflows, notebooks, KQL, and T-SQL for data transformation
- Create and manage shortcuts to data
- Implement mirroring
- Ingest data by using pipelines
- Transform data by using PySpark, SQL, and KQL
- Denormalize data
- Group and aggregate data
- Handle duplicate, missing, and late-arriving data
Ingest and transform streaming data
- Choose an appropriate streaming engine
- Choose between native storage, mirrored storage, or shortcuts in Real-Time Intelligence
- Process data by using eventstreams
- Process data by using Spark structured streaming
- Process data by using KQL
- Create windowing functions
Monitor and optimize an analytics solution (30–35%)
Monitor Fabric items
- Monitor data ingestion
- Monitor data transformation
- Monitor semantic model refresh
- Configure alerts
Identify and resolve errors
- Identify and resolve pipeline errors
- Identify and resolve dataflow errors
- Identify and resolve notebook errors
- Identify and resolve eventhouse errors
- Identify and resolve eventstream errors
- Identify and resolve T-SQL errors
Optimize performance
- Optimize a lakehouse table
- Optimize a pipeline
- Optimize a data warehouse
- Optimize eventstreams and eventhouses
- Optimize Spark performance
- Optimize query performance
Services
Dataflows Gen2
Azure Data Factory
- Managed, serverless ETL/ELT service
- SSIS (SQL Server Integration Services) in the cloud
Azure Data Factory - Data Factory Pipelines
Data Factory Pipelines orchestrate Spark, Dataflow, and other activities, which lets you implement complex data transformation processes.
Azure Data Lake Storage (ADLS)
- Hadoop compatible access
  Treat the data as if it's stored in HDFS.
- Security
  ACL and POSIX permissions
- Hierarchical namespace for easier navigation and better performance
  Azure Data Lake Storage Gen2 builds on blob storage and optimizes I/O of high-volume data by using a hierarchical namespace that organizes blob data into directories, and stores metadata about each directory and the files within it. This structure allows operations, such as directory renames and deletes, to be performed in a single atomic operation. Flat namespaces, by contrast, require several operations proportionate to the number of objects in the structure. Hierarchical namespaces keep the data organized, which yields better storage and retrieval performance for an analytical use case and lowers the cost of analysis.
- Data Redundancy
  Supported by the underlying Azure Blob storage redundancy options (LRS/GRS)
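The difference between the two namespace types described above can be illustrated with a toy model. This is only a sketch: it counts metadata operations in plain Python and does not call any Azure API. The function names, paths, and file counts are hypothetical.

```python
from __future__ import annotations

# Toy model only: counts the operations a directory rename needs.
# It does not talk to Azure Storage.


def rename_flat(keys: set[str], old_prefix: str, new_prefix: str) -> tuple[set[str], int]:
    """Flat namespace: every object under the prefix is rewritten individually."""
    renamed: set[str] = set()
    operations = 0
    for key in keys:
        if key.startswith(old_prefix):
            renamed.add(new_prefix + key[len(old_prefix):])
            operations += 1  # one operation per matching object
        else:
            renamed.add(key)
    return renamed, operations


def rename_hierarchical(tree: dict, old_name: str, new_name: str) -> int:
    """Hierarchical namespace: the directory entry is re-pointed once."""
    tree[new_name] = tree.pop(old_name)
    return 1  # a single operation, regardless of how many files it holds


# 1000 objects that share the hypothetical "raw/" prefix
flat_keys = {f"raw/2024/file{i}.parquet" for i in range(1000)}
flat_result, flat_ops = rename_flat(flat_keys, "raw/", "bronze/")

# The same 1000 files stored under a real directory entry
tree = {"raw": {"2024": [f"file{i}.parquet" for i in range(1000)]}}
hier_ops = rename_hierarchical(tree, "raw", "bronze")

print(flat_ops, hier_ops)
```

In this model the flat rename touches every object, while the hierarchical rename is one step, which is the property the notes attribute to ADLS Gen2.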
Microsoft Fabric
Capacity
- Key points
  - A Microsoft Fabric capacity resides on a tenant.
  - Each capacity that sits under a specific tenant is a distinct pool of resources allocated to Microsoft Fabric.
- Benefits
  - Centralized management of capacity
    Rather than provisioning and managing separate resources for each workload, with Microsoft Fabric, your bill is determined by 2 variables:
    - The amount of compute you provision
      - A shared pool of capacity that powers all capabilities in Microsoft Fabric.
      - Pay-as-you-go and 1-year Reservation
    - The amount of storage you use
      - A single place to store all data
      - Pay-as-you-go (billable per GB/month)
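The two billing variables above can be sketched as simple arithmetic. This is a minimal illustration, not Microsoft's pricing formula: the class name, the per-hour and per-GB rates, and the usage figures are all hypothetical placeholders, and real prices vary by region and purchase option.

```python
from dataclasses import dataclass


@dataclass
class FabricBillEstimate:
    """Rough monthly estimate from the two variables in the notes."""

    capacity_units: int    # compute provisioned, in CUs
    hours: float           # hours the capacity runs in the month
    stored_gb: float       # data stored in the month, in GB
    cu_hour_rate: float    # HYPOTHETICAL price per CU per hour
    gb_month_rate: float   # HYPOTHETICAL price per GB per month

    def compute_cost(self) -> float:
        # Variable 1: the amount of compute you provision
        return self.capacity_units * self.hours * self.cu_hour_rate

    def storage_cost(self) -> float:
        # Variable 2: the amount of storage you use
        return self.stored_gb * self.gb_month_rate

    def total(self) -> float:
        return self.compute_cost() + self.storage_cost()


# Placeholder inputs, chosen only to make the arithmetic easy to follow
estimate = FabricBillEstimate(
    capacity_units=64,
    hours=730,
    stored_gb=2000,
    cu_hour_rate=0.25,
    gb_month_rate=0.125,
)
print(estimate.compute_cost(), estimate.storage_cost(), estimate.total())
```

Keeping compute and storage as separate methods mirrors the notes: compute is provisioned as a shared pool, while storage is billed by what is actually stored.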
Capacity License SKUs
Capacity licenses are split into SKUs. Each SKU provides a set of Fabric resources for your organization. Your organization can have as many capacity licenses as needed.
Capacity Unit
- Capacity unit (CU) = Compute Power
- CU Consumption
  Each capability, such as Power BI, Spark, or Data Warehouse, has a unique consumption rate for its associated queries, jobs, or tasks.
Access Control
- Tenant
- Capacity
- Workspace
- Item
  Data Warehouse, Data Lakehouse, Dataflow, Semantic Model, etc.
- Object
  Table, View, Function, Stored Procedure, etc.
Workspace
A workspace is created under a capacity. It is a container for Microsoft Fabric items.
Workspace - License mode
- Microsoft Learn - Microsoft Fabric concepts and licenses
- The workspace license mode dictates what kind of capacity the workspace can be hosted in and, as a result, the capabilities available.
Workspace - Roles
- Workspace roles apply to all items in the workspace
- Roles in workspaces in Microsoft Fabric
- Admin
  - Update and delete the workspace
  - Add or remove people, including other admins
- Member
  Everything an admin can do, except the above two.
  - Add members or others with lower permissions
  - Allow others to reshare items
- Contributor
  Everything a member can do, except the above two.
- Viewer
  Read-only access to the workspace without API access.
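The role list above is cumulative: each role can do everything the role below it can, plus a few extra actions. A minimal sketch of that ordering follows. The action names are hypothetical labels for the bullets above, and `edit_items` is an assumption, since the notes do not state what separates a Contributor from a Viewer.

```python
from enum import IntEnum


class Role(IntEnum):
    """Workspace roles, ordered from least to most privileged."""

    VIEWER = 1
    CONTRIBUTOR = 2
    MEMBER = 3
    ADMIN = 4


# Lowest role allowed to perform each action (labels are hypothetical)
MIN_ROLE = {
    "read_items": Role.VIEWER,
    "edit_items": Role.CONTRIBUTOR,        # ASSUMPTION: not stated in the notes
    "add_lower_members": Role.MEMBER,
    "allow_reshare": Role.MEMBER,
    "update_or_delete_workspace": Role.ADMIN,
    "add_or_remove_admins": Role.ADMIN,
}


def can(role: Role, action: str) -> bool:
    """Return True if the role is at or above the action's minimum role."""
    return role >= MIN_ROLE[action]


print(can(Role.MEMBER, "allow_reshare"))               # Member-only extra
print(can(Role.CONTRIBUTOR, "allow_reshare"))          # below Member
print(can(Role.MEMBER, "update_or_delete_workspace"))  # Admin-only
```

Because workspace roles apply to all items in the workspace, a check like this would be made once per workspace rather than per item. Finer-grained control sits at the item and object levels.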
Data Store options
- Eventhouse
- Cosmos DB
- SQL database in Fabric
- Fabric Data Warehouse
- Lakehouse
Azure Synapse Analytics
- Analytics pools
  - SQL pool
  - Apache Spark pool
  - Data Explorer pool