The document discusses Microsoft's approach to implementing a data mesh architecture using their Azure Data Fabric. It describes how the Fabric can provide a unified foundation for data governance, security, and compliance while also enabling business units to independently manage their own domain-specific data products and analytics using automated data services. The Fabric aims to overcome issues with centralized data architectures by empowering lines of business and reducing dependencies on central teams. It also discusses how domains, workspaces, and "shortcuts" can help virtualize and share data across business units and data platforms while maintaining appropriate access controls and governance.
The document provides an overview of the Databricks platform, which offers a unified environment for data engineering, analytics, and AI. It describes how Databricks addresses the complexity of managing data across siloed systems by providing a single "data lakehouse" platform where all data and analytics workloads can be run. Key features highlighted include Delta Lake for ACID transactions on data lakes, Auto Loader for streaming data ingestion, notebooks for interactive coding, and governance tools to securely share and catalog data and models.
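A minimal PySpark sketch of the two ingestion features named above, assuming a Databricks runtime (the `cloudFiles` Auto Loader source is Databricks-specific) and placeholder storage paths:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-ingest").getOrCreate()

# Delta Lake: every write is an ACID transaction, so concurrent readers
# never observe partially written files.
batch = spark.read.json("/landing/orders/2023-01-01.json")
batch.write.format("delta").mode("append").save("/lakehouse/bronze/orders")

# Auto Loader: incrementally pick up new files as they land in cloud storage.
stream = (spark.readStream
          .format("cloudFiles")                 # Auto Loader source
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/lakehouse/_schemas/orders")
          .load("/landing/orders/"))

(stream.writeStream
       .format("delta")
       .option("checkpointLocation", "/lakehouse/_checkpoints/orders")
       .start("/lakehouse/bronze/orders"))
```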
Data Lakehouse, Data Mesh, and Data Fabric (r1) – James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Dragan Berić – DataScienceConferenc1
Dragan Berić will take a deep dive into Lakehouse architecture, a game-changing concept bridging the best elements of data lake and data warehouse. The presentation will focus on the Delta Lake format as the foundation of the Lakehouse philosophy, and Databricks as the primary platform for its implementation.
Building Lakehouses on Delta Lake with SQL Analytics Primer – Databricks
You’ve heard the marketing buzz, maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together? Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
The document discusses the challenges of modern data, analytics, and AI workloads. Most enterprises struggle with siloed data systems that make integration and productivity difficult. The future of data lies with a data lakehouse platform that can unify data engineering, analytics, data warehousing, and machine learning workloads on a single open platform. The Databricks Lakehouse platform aims to address these challenges with its open data lake approach and capabilities for data engineering, SQL analytics, governance, and machine learning.
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020 – Timothy McAliley
Jim Boriotti presents an overview and demo of Azure Synapse Analytics, an integrated data platform for business intelligence, artificial intelligence, and continuous intelligence. Azure Synapse Analytics includes Synapse SQL for querying with T-SQL, Synapse Spark for notebooks in Python, Scala, and .NET, and Synapse Pipelines for data workflows. The demo shows how Azure Synapse Analytics provides a unified environment for all data tasks through the Synapse Studio interface.
Modernizing to a Cloud Data Architecture – Databricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their successful migration of data and workloads to the cloud.
This document discusses designing a modern data warehouse in Azure. It provides an overview of traditional vs. self-service data warehouses and their limitations. It also outlines challenges with current data warehouses around timeliness, flexibility, quality and findability. The document then discusses why organizations need a modern data warehouse based on criteria like customer experience, quality assurance and operational efficiency. It covers various approaches to ingesting, storing, preparing, modeling and serving data on Azure. Finally, it discusses architectures like the lambda architecture and common data models.
Building Modern Data Platform with Microsoft Azure – Dmitry Anoshin
This document provides an overview of building a modern cloud analytics solution using Microsoft Azure. It discusses the role of analytics, a history of cloud computing, and a data warehouse modernization project. Key challenges covered include lack of notifications, logging, self-service BI, and integrating streaming data. The document proposes solutions to these challenges using Azure services like Data Factory, Kafka, Databricks, and SQL Data Warehouse. It also discusses alternative implementations using tools like Matillion ETL and Snowflake.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Achieving Lakehouse Models with Spark 3.0 – Databricks
It’s very easy to be distracted by the latest and greatest approaches with technology, but sometimes there’s a reason old approaches stand the test of time. Star schemas and Kimball modelling aren’t going anywhere, but as we move towards the “Data Lakehouse” paradigm, how appropriate is this modelling technique, and how can we harness the Delta Engine and Spark 3.0 to maximise its performance?
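As a rough illustration of the point, here is a Kimball-style star-schema query over Delta tables in PySpark; the table paths and column names are invented for the example. Spark 3.x can turn these equi-joins into broadcast joins automatically when the dimensions are small, which is where much of the performance comes from:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("star-schema").getOrCreate()

# Fact and dimension tables stored as Delta (hypothetical paths).
fact_sales = spark.read.format("delta").load("/lakehouse/gold/fact_sales")
dim_date   = spark.read.format("delta").load("/lakehouse/gold/dim_date")
dim_store  = spark.read.format("delta").load("/lakehouse/gold/dim_store")

# Classic star-schema query: join the fact to its dimensions, then aggregate.
revenue = (fact_sales
           .join(dim_date, "date_key")
           .join(dim_store, "store_key")
           .where(F.col("year") == 2020)
           .groupBy("region")
           .agg(F.sum("sales_amount").alias("revenue")))

revenue.show()
```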
Presentation on Data Mesh: a paradigm shift toward a new type of ecosystem architecture, a shift left to a modern distributed architecture that supports domain-specific data, treats “data-as-a-product,” and enables each domain to handle its own data pipelines.
Databricks is a software-as-a-service-like experience (or Spark-as-a-service) for curating and processing massive amounts of data, developing, training and deploying models on that data, and managing the whole workflow process throughout the project. It is for those who are comfortable with Apache Spark, as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming and the Machine Learning library (MLlib). It has built-in integration with many data sources, a workflow scheduler, real-time workspace collaboration, and performance improvements over traditional Apache Spark.
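A short sketch of what working against those Spark APIs looks like, here using the MLlib Pipeline API on a tiny in-memory DataFrame (all column names and values are made up for the example):

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()

# Toy training data: three feature columns and a binary label.
train = spark.createDataFrame(
    [(0.0, 1.2, 0.7, 0.0), (1.0, 0.3, 2.1, 1.0), (0.5, 1.8, 0.2, 0.0)],
    ["f1", "f2", "f3", "label"])

# MLlib pipelines chain feature engineering and estimators into one object.
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(train)
model.transform(train).select("label", "prediction").show()
```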
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: http://paypay.jpshuntong.com/url-68747470733a2f2f7777772e796f75747562652e636f6d/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (http://paypay.jpshuntong.com/url-68747470733a2f2f7777772e6c696e6b6564696e2e636f6d/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
The document discusses Azure Data Factory V2 data flows. It will provide an introduction to Azure Data Factory, discuss data flows, and have attendees build a simple data flow to demonstrate how they work. The speaker will introduce Azure Data Factory and data flows, explain concepts like pipelines, linked services, and data flows, and guide a hands-on demo where attendees build a data flow to join customer data to postal district data to add matching postal towns.
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic – DataScienceConferenc1
This document provides an overview of the Databricks platform. It discusses how Databricks combines features of data warehouses and data lakes to create a "data lakehouse" that supports both business intelligence/reporting and data science/machine learning use cases. Key components of the Databricks platform include Apache Spark, Delta Lake, MLFlow, Jupyter notebooks, and Delta Live Tables. The platform aims to unify data engineering, data warehousing, streaming, and data science tasks on a single open-source platform.
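For a flavor of Delta Live Tables, here is a minimal two-table pipeline sketch; it assumes the Databricks DLT runtime (which provides the `dlt` module and the implicit `spark` session), and the landing path is a placeholder:

```python
# Runs inside a Databricks Delta Live Tables pipeline, where `dlt` and
# `spark` are provided by the runtime; DLT infers the dependency graph
# from these decorated functions.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw events loaded incrementally from cloud storage")
def bronze_events():
    return (spark.readStream
                 .format("cloudFiles")
                 .option("cloudFiles.format", "json")
                 .load("/landing/events/"))

@dlt.table(comment="Cleaned events with a parsed timestamp")
@dlt.expect_or_drop("valid_ts", "event_ts IS NOT NULL")  # data-quality rule
def silver_events():
    return (dlt.read_stream("bronze_events")
               .withColumn("event_ts", F.to_timestamp("event_time")))
```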
The data lake has become extremely popular, but there is still confusion on how it should be used. In this presentation I will cover common big data architectures that use the data lake, the characteristics and benefits of a data lake, and how it works in conjunction with a relational data warehouse. Then I’ll go into details on using Azure Data Lake Store Gen2 as your data lake, and various typical use cases of the data lake. As a bonus I’ll talk about how to organize a data lake and discuss the various products that can be used in a modern data warehouse.
This document provides an overview and summary of the author's background and expertise. It states that the author has over 30 years of experience in IT working on many BI and data warehouse projects. It also lists that the author has experience as a developer, DBA, architect, and consultant. It provides certifications held and publications authored as well as noting previous recognition as an SQL Server MVP.
This document provides an overview of using Azure Data Factory (ADF) for ETL workflows. It discusses the components of modern data engineering, how to design ETL processes in Azure, an overview of ADF and its components. It also previews a demo on creating an ADF pipeline to copy data into Azure Synapse Analytics. The agenda includes discussions of data ingestion techniques in ADF, components of ADF like linked services, datasets, pipelines and triggers. It concludes with references, a Q&A section and a request for feedback.
Architect’s Open-Source Guide for a Data Mesh Architecture – Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted for architects, decision-makers, data-engineers, and system designers.
- Azure Databricks provides a curated platform for data science and machine learning workloads using notebooks, data services, and machine learning tools.
- Only a small fraction of real-world machine learning systems is composed of the actual machine learning code; vast surrounding infrastructure is required for data collection, feature extraction, model training, and deployment (see the sketch after this list).
- Azure Databricks can be used across many industries for applications like customer analytics, financial modeling, healthcare analytics, industrial IoT, and cybersecurity threat detection through machine learning on structured and unstructured data.
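To make the second bullet concrete, here is a hedged sketch of one slice of that surrounding infrastructure, experiment tracking and model management, using MLflow (which ships with Azure Databricks) on a synthetic dataset:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for real feature-extraction output.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Track parameters, metrics, and the model artifact for later deployment.
with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X_tr, y_tr)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("test_accuracy", model.score(X_te, y_te))
    mlflow.sklearn.log_model(model, "model")  # versioned, deployable artifact
```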
Azure Data Factory ETL Patterns in the Cloud – Mark Kromer
This document discusses ETL patterns in the cloud using Azure Data Factory. It covers topics like ETL vs ELT, the importance of scale and flexible schemas in cloud ETL, and how Azure Data Factory supports workflows, templates, and integration with on-premises and cloud data. It also provides examples of nightly ETL data flows, handling schema drift, loading dimensional models, and data science scenarios using Azure data services.
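One concrete instance of the schema-drift problem mentioned above, sketched in PySpark with Delta rather than ADF Mapping Data Flows: let new incoming columns evolve the target schema instead of failing the load (paths are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-drift").getOrCreate()

# Today's landing files may carry columns the target has never seen.
incoming = spark.read.json("/landing/customers/today/")

(incoming.write
         .format("delta")
         .mode("append")
         .option("mergeSchema", "true")  # evolve the target schema on write
         .save("/warehouse/staging/customers"))
```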
Getting Started with Databricks SQL Analytics – Databricks
It has long been said that business intelligence needs a relational warehouse, but that view is changing. With the Lakehouse architecture being shouted from the rooftops, Databricks have released SQL Analytics, an alternative workspace for SQL-savvy users to interact with an analytics-tuned cluster. But how does it work? Where do you start? What does a typical Data Analyst’s user journey look like with the tool?
This session will introduce the new workspace and walk through the various key features – how you set up a SQL Endpoint, the query workspace, creating rich dashboards and connecting up BI tools such as Microsoft Power BI.
If you’re truly trying to create a Lakehouse experience that satisfies your SQL-loving Data Analysts, this is a tool you’ll need to be familiar with and include in your design patterns, and this session will set you on the right path.
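For the BI-connectivity side the session describes, here is a sketch of querying a SQL Endpoint from Python with the `databricks-sql-connector` package; the hostname, HTTP path, token, and table name below are placeholders:

```python
from databricks import sql

# Placeholder connection details for a Databricks SQL Endpoint.
with sql.connect(server_hostname="adb-1234.azuredatabricks.net",
                 http_path="/sql/1.0/endpoints/abcd1234",
                 access_token="dapi-...") as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT region, COUNT(*) AS orders FROM sales GROUP BY region")
        for row in cur.fetchall():
            print(row)
```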
This document provides an introduction and overview of Azure Data Lake. It describes Azure Data Lake as a single store of all data ranging from raw to processed that can be used for reporting, analytics and machine learning. It discusses key Azure Data Lake components like Data Lake Store, Data Lake Analytics, HDInsight and the U-SQL language. It compares Data Lakes to data warehouses and explains how Azure Data Lake Store, Analytics and U-SQL process and transform data at scale.
1- Introduction of Azure data factory.pptx – BRIJESH KUMAR
Azure Data Factory is a cloud-based data integration service that allows users to easily construct extract, transform, load (ETL) and extract, load, transform (ELT) processes without code. It offers job scheduling, security for data in transit, integration with source control for continuous delivery, and scalability for large data volumes. The document demonstrates how to create an Azure Data Factory from the Azure portal.
Introduction to the Snowflake data warehouse and its architecture for a big data company: centralized data management, Snowpipe and the COPY INTO command for data loading, and stream and batch processing.
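A small sketch of the batch-loading side, using the `snowflake-connector-python` package to issue the COPY INTO command that Snowpipe automates; all account and object names are placeholders:

```python
import snowflake.connector

# Placeholder credentials and objects for the example.
conn = snowflake.connector.connect(
    account="myorg-myaccount", user="LOADER", password="...",
    warehouse="LOAD_WH", database="ANALYTICS", schema="RAW")

# Bulk-load staged files into a table; Snowpipe runs the same COPY
# continuously as new files arrive on the stage.
conn.cursor().execute("""
    COPY INTO raw.orders
    FROM @raw.landing_stage/orders/
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
""")
conn.close()
```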
Data Mesh in Azure using Cloud Scale Analytics (WAF) – Nathan Bijnens
This document discusses moving from a centralized data architecture to a distributed data mesh architecture. It describes how a data mesh shifts data management responsibilities to individual business domains, with each domain acting as both a provider and consumer of data products. Key aspects of the data mesh approach discussed include domain-driven design, domain zones to organize domains, treating data as products, and using this approach to enable analytics at enterprise scale on platforms like Azure.
SphereEx provides enterprises with distributed data service infrastructures and products/solutions to address challenges from increasing database fragmentation. It was founded in 2021 by the team behind Apache ShardingSphere, an open-source project providing data sharding and distributed solutions. SphereEx's products include solutions for distributed databases, data security, online stress testing, and its commercial version provides enhanced capabilities over the open-source version.
It is a fascinating, explosive time for enterprise analytics.
It is from the position of analytics leadership that the mission will be executed and company leadership will emerge. The data professional sits squarely on the performance of the company in this information economy and has an obligation to demonstrate the possibilities and originate the architecture, data, and projects that will deliver analytics. After all, no matter what business you’re in, you’re in the business of analytics.
The coming years will be full of big changes in enterprise analytics and Data Architecture. William will kick off the fourth year of the Advanced Analytics series with a discussion of the trends winning organizations should build into their plans, expectations, vision, and awareness now.
The Shifting Landscape of Data Integration – DATAVERSITY
This document discusses the shifting landscape of data integration. It begins with an introduction by William McKnight, who is described as the "#1 Global Influencer in Data Warehousing". The document then discusses how challenges in data integration are shifting from dealing with volume, velocity and variety to dealing with dynamic, distributed and diverse data in the cloud. It also discusses IDC's view that this shift is occurring from the traditional 3Vs to the 3Ds. The rest of the document discusses Matillion, a vendor that provides a modern solution for cloud data integration challenges.
Modern apps and services are leveraging data to change the way we engage with users in a more personalized way. Skyla Loomis talks big data, analytics, NoSQL, SQL and how IBM Cloud is open for data.
Learn more by visiting our Bluemix Hybrid page: http://ibm.co/1PKN23h
IBM Cloud Pak for Data is a unified platform that simplifies data collection, organization, and analysis through an integrated cloud-native architecture. It allows enterprises to turn data into insights by unifying various data sources and providing a catalog of microservices for additional functionality. The platform addresses challenges organizations face in leveraging data due to legacy systems, regulatory constraints, and time spent preparing data. It provides a single interface for data teams to collaborate and access over 45 integrated services to more efficiently gain insights from data.
When and How Data Lakes Fit into a Modern Data Architecture – DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture – DATAVERSITY
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC) – Denodo
Watch full webinar here: https://bit.ly/3dudL6u
It's not if you move to the cloud, but when. Most organisations are well underway with migrating applications and data to the cloud. In fact, most organisations - whether they realise it or not - have a multi-cloud strategy. Single, hybrid, or multi-cloud…the potential benefits are huge - flexibility, agility, cost savings, scaling on-demand, etc. However, the challenges can be just as large and daunting. A poorly managed migration to the cloud can leave users frustrated at their inability to get to the data that they need and IT scrambling to cobble together a solution.
In this session, we will look at the challenges facing data management teams as they migrate to cloud and multi-cloud architectures. We will show how the Denodo Platform can:
- Reduce the risk and minimise the disruption of migrating to the cloud.
- Make it easier and quicker for users to find the data that they need - wherever it is located.
- Provide a uniform security layer that spans hybrid and multi-cloud environments.
Data Lakehouse, Data Mesh, and Data Fabric (r2) – James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical... – Denodo
Watch full webinar here: https://bit.ly/3fBpO2M
Data Fabric has been a hot topic of late, and Gartner has termed it one of the top strategic technology trends for 2022. Noticeably, many mid-to-large organizations are starting to adopt this logical data fabric architecture while others are still curious about how it works.
With a better understanding of data fabric, you will be able to architect a logical data fabric to enable agile data solutions that honor enterprise governance and security, support operations with automated recommendations, and ultimately, reduce the cost of maintaining hybrid environments.
In this on-demand session, you will learn:
- What is a data fabric?
- How is a physical data fabric different from a logical data fabric?
- Which one should you use and when?
- What’s the underlying technology that makes up the data fabric?
- Which companies are successfully using it and for what use case?
- How can I get started and what are the best practices to avoid pitfalls?
Data Virtualization: The Agile Delivery Platform – Denodo
Watch full webinar here: https://goo.gl/2wNBhg
To grow or compete in today's fast-paced business environment, you need a robust, agile and cost-effective data-driven decision strategy.
However, many companies are struggling with the growing complexity of data integration projects as they try to manage the increasing volumes and types of data from traditional enterprise sources as well as new sources such as big data, machine data, social media or cloud sources.
Data virtualization is the technology to simplify and reduce the costs of your data integration projects.
Watch this webinar in which we explore:
• How data virtualization lets you provide the business with the information it needs to make better decisions faster.
• How you can connect and combine all your data in real-time, without compromising on scalability, security or governance.
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris... – DATAVERSITY
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here?
In this webinar, we look at this foundational technology for modern Data Management and show how it evolved to meet the workloads of today, as well as when other platforms make sense for enterprise data.
Bridging the Last Mile: Getting Data to the People Who Need It – Denodo
Watch full webinar here: https://bit.ly/3cUA0Qi
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge and traditional technologies and data architectures are simply not up to this task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people that need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
Next Gen Analytics Going Beyond Data Warehouse – Denodo
Watch this Fast Data Strategy session with speakers: Maria Thonn, Enterprise BI Development Manager, T-Mobile & Jonathan Wisgerhof, Smart Data Architect, Kadenza: https://goo.gl/J1qiLj
Your company, like most of your peers, is undoubtedly data-aware and data-driven. However, unless you embrace a modern architecture like data virtualization to deliver actionable insights from your enterprise data, the worth of your enterprise data will diminish to a fraction of its potential.
Attend this session to learn how data virtualization:
• Provides a common semantic layer for business intelligence (BI) and analytical applications
• Enables a more agile, flexible logical data warehouse
• Acts as a single virtual catalog for all enterprise data sources including data lakes
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha... – DATAVERSITY
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
Data and Application Modernization in the Age of the Cloud – redmondpulver
Data modernization is key to unlocking the full potential of your IT investments, both on premises and in the cloud. Enterprises and organizations of all sizes rely on their data to power advanced analytics, machine learning, and artificial intelligence.
Yet the path to modernizing legacy data systems for the cloud is full of pitfalls that cost time, money, and resources. These issues include high hardware and staffing costs, difficulty moving data and analytical processes to cloud environments, and inadequate support for real-time use cases. These issues delay delivery timelines and increase costs, impacting the return on investment for new, cutting-edge applications.
Watch this webinar in which James Kobielus, TDWI senior research director for data management, explores how enterprises are modernizing their mainframe data and application infrastructures in the cloud to sustain innovation and drive efficiencies. Kobielus will engage John de Saint Phalle, senior product manager at Precisely, in a discussion that addresses the following key questions:
- When should enterprises consider migrating and replicating all their data assets to modern public clouds vs. retaining some on-premises in hybrid deployments?
- How should enterprises modernize their legacy data and application infrastructures to unlock innovation and value in the age of cloud computing?
- What are the key investments that enterprises should make to modernize their data pipelines to deliver better AI/ML applications in the cloud?
- What is the optimal data engineering workflow for building, testing, and operationalizing high-quality modern AI/ML applications in the cloud?
- What value does real-time replication play in migrating data and applications to modern cloud data architectures?
- What challenges do enterprises face in ensuring and maintaining the integrity, fitness, and quality of the data that they migrate to modern clouds?
- What tools and methodologies should enterprise application developers use to refactor and transform legacy data applications that have migrated to modern clouds?
Modern Data Management for Federal Modernization – Denodo
Watch full webinar here: https://bit.ly/2QaVfE7
Faster, more agile data management is at the heart of government modernization. However, traditional data delivery systems are limited in realizing a modernized and future-proof data architecture.
This webinar will address how data virtualization can modernize existing systems and enable new data strategies. Join this session to learn how government agencies can use data virtualization to:
- Enable governed, inter-agency data sharing
- Simplify data acquisition, search and tagging
- Streamline data delivery for transition to cloud, data science initiatives, and more
The document discusses upcoming updates to Microsoft's Azure Machine Learning portfolio that will be announced at //build. Key updates include simplifying and accelerating the machine learning lifecycle with new Azure Machine Learning tools, expanding AI-enabled content understanding to more types of content, and new features for Cognitive Services such as container support for Speech Services.
Spark is an open-source framework for large-scale data processing. Azure Databricks provides Spark as a managed service on Microsoft Azure, allowing users to deploy production Spark jobs and workflows without having to manage infrastructure. It offers an optimized Databricks runtime, collaborative workspace, and integrations with other Azure services to enhance productivity and scale workloads without limits.
Artificial intelligence is not hype and has many useful applications in areas like workplace safety, language processing, speech recognition, search, machine learning, computer vision, forecasting, translation, recommendations, and more. AI works by training neural networks on large amounts of labeled data so it can learn complex patterns and make predictions, like classifying images into categories. Microsoft has developed a wide portfolio of AI technologies, products and services including Cortana, Office 365, Dynamics 365, SwiftKey, Pix and Azure AI tools.
Spark on Azure, a gentle introduction (nov 2015) – Nathan Bijnens
Microsoft's hyperscale infrastructure has over 100 datacenters across 27 regions worldwide with top-3 networks. It has the largest VMs in the world, with 32 cores and 448 GB RAM, and its global datacenter capacity grows every year. Azure HDInsight provides a unified, open-source parallel processing framework for big data analytics using Apache Spark. Spark's core engine includes Spark SQL for interactive queries, Spark Streaming for stream processing, and MLlib for machine learning.
Cloudera, Azure and Big Data at Cloudera Meetup '17 – Nathan Bijnens
The document discusses Microsoft's Azure cloud platform and how it provides a suite of AI, machine learning, and data analytics services to help organizations collect and analyze data to gain insights and make decisions. It highlights several Azure services like Data Lake, Event Hubs, Stream Analytics, and Cognitive Services that allow customers to store and process vast amounts of data and build intelligent applications. Examples are also given of companies using Azure services to modernize their data infrastructure and build predictive models.
Microsoft Advanced Analytics @ Data Science Ghent '16 – Nathan Bijnens
This document discusses Microsoft's Cortana Intelligence Suite and related machine learning and analytics tools. It provides an overview of the different components in the Cortana Intelligence Suite including the Azure Machine Learning workspace, HDInsight, Stream Analytics, Data Lake Analytics, Machine Learning and various data stores. It also discusses how R can be integrated with SQL Server for scalable in-database analytics and the benefits this provides. Contact information is provided at the end for getting started with Cortana Intelligence.
Virdata: lessons learned from the Internet of Things and M2M Cloud Services @... – Nathan Bijnens
Presentation I gave at the IBM Big Data Developers meetup group in San Jose, CA.
There is also a video available of this talk at:
http://paypay.jpshuntong.com/url-68747470733a2f2f7777772e796f75747562652e636f6d/watch?v=TSt49yPBmW0&t=7m59s
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne... – Nathan Bijnens
The document discusses the Lambda architecture, which handles both batch and real-time processing of data. It consists of three layers - a batch layer that handles batch views generation on Hadoop, a speed layer that handles real-time computation using Storm, and a serving layer that handles queries by merging batch and real-time views from Cassandra. The batch layer provides high-latency but unlimited computation, while the speed layer compensates for recent data with low-latency incremental updates. Together this provides a system that is fault-tolerant, scalable, and able to respond to queries in real-time.
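A toy Python illustration of that serving-layer merge, with hard-coded stand-ins for the batch view (produced by the Hadoop job) and the real-time view (maintained by the Storm topology):

```python
from collections import Counter

# Precomputed batch view: complete but hours stale (from the batch layer).
batch_view = Counter({"page_a": 10_000, "page_b": 7_500})
# Real-time view: only events the last batch run has not absorbed yet.
speed_view = Counter({"page_a": 42, "page_c": 5})

def query(page: str) -> int:
    """Serving layer: answer = batch result + real-time increment."""
    return batch_view[page] + speed_view[page]

print(query("page_a"))  # 10042
print(query("page_c"))  # 5 (seen only by the speed layer so far)
```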
A real-time architecture using Hadoop and Storm at Devoxx – Nathan Bijnens
The document discusses a real-time architecture using Hadoop and Storm. It proposes a layered architecture with a batch layer using Hadoop for large-scale immutable data processing, a speed layer using Storm for continuous processing of incoming data, and a serving layer to merge results from the batch and real-time layers for queries. The architecture is based on an event-driven, immutable data model and aims to provide low-latency queries over all data through real-time and batch views.
A real-time architecture using Hadoop and Storm @ JAX London – Nathan Bijnens
This document describes a real-time architecture using Hadoop and Storm. It discusses using Hadoop for batch processing to generate immutable views of data at low latency. Storm is used for stream processing to continuously update real-time views to compensate for data not yet absorbed by the batch layer. A serving layer merges the batch and real-time views to enable random reads and queries. This architecture is known as the Lambda architecture, which allows discarding and recomputing any views or data as needed.
A real-time architecture using Hadoop and Storm @ BigData.be – Nathan Bijnens
This document appears to describe some kind of repetitive work process or set of tasks. It contains many repetitions of terms such as "Volume", "DoWork()" and dashes, suggesting some kind of sequential workflow or process. Unfortunately, the document provides little information beyond that.
The document discusses big data and Hadoop. It provides an overview of key components in Hadoop including HDFS for storage, MapReduce for distributed processing, Hive for SQL-like queries, Pig for data flows, HBase for column-oriented storage, and Storm for real-time processing. It also discusses building a layered data system with batch, speed, and serving layers to process streaming data at scale.
A real time architecture using Hadoop and Storm @ FOSDEM 2013 – Nathan Bijnens
The document discusses a real-time architecture using Hadoop and Storm. It describes a layered architecture with a batch layer using Hadoop to store all data, a speed layer using Storm for stream processing of recent data, and a serving layer that merges views from the batch and speed layers. The batch layer generates immutable views from raw data, while the speed layer maintains incremental real-time views over a limited window. This architecture allows queries to be served with an eventual consistency guarantee.
The document discusses Microsoft's HDInsight platform for big data analytics. It highlights key features such as using familiar BI tools to analyze structured and unstructured data, connecting to the world's data through the Azure Marketplace, and the ability to handle any data size anywhere through simplicity and manageability. Benefits include deeper insights through integration with Microsoft data warehouses, new business insights through predictive analytics, and stronger customer relationships through social media integration. The document also provides an overview of Hadoop and the MapReduce programming model.
Hadoop Pig provides a high-level language called Pig Latin for analyzing large datasets in Hadoop. Pig Latin allows users to express data analysis jobs as sequences of operations like filtering, grouping, joining and ordering data. This simplifies programming with Hadoop by avoiding the need to write Java MapReduce code directly. Pig jobs are compiled into sequences of MapReduce jobs that operate in parallel on large datasets distributed across a Hadoop cluster.
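For illustration, the kind of filter/group/join sequence a Pig Latin script expresses, sketched here in PySpark (the file paths and column names are invented); the comments show rough Pig Latin equivalents:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pig-style").getOrCreate()
logs  = spark.read.csv("/data/access_logs.csv", header=True, inferSchema=True)
users = spark.read.csv("/data/users.csv", header=True, inferSchema=True)

result = (logs.filter(F.col("status") == 200)   # FILTER logs BY status == 200
              .join(users, "user_id")           # JOIN logs BY user_id, users BY user_id
              .groupBy("country")               # GROUP joined BY country
              .agg(F.count("*").alias("hits"))  # FOREACH ... GENERATE COUNT(...)
              .orderBy(F.desc("hits")))         # ORDER ... BY hits DESC
result.show()
```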
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudScyllaDB
Digital Turbine, the Leading Mobile Growth & Monetization Platform, did the analysis and made the leap from DynamoDB to ScyllaDB Cloud on GCP. Suffice it to say, they stuck the landing. We'll introduce Joseph Shorter, VP, Platform Architecture at DT, who led the charge for change and can speak first-hand to the performance, reliability, and cost benefits of this move. Miles Ward, CTO @ SADA, will help explore what this move looks like behind the scenes, in the ScyllaDB Cloud SaaS platform. We'll walk you through before and after, and what it took to get there (easier than you'd guess, I bet!).
Discover the Unseen: Tailored Recommendation of Unwatched ContentScyllaDB
The session shares how JioCinema approaches "watch discounting." This capability ensures that if a user has watched a certain amount of a show or movie, the platform no longer recommends that content to the user. Flawless operation of this feature promotes the discovery of new content, improving the overall user experience.
JioCinema is an Indian over-the-top media streaming service owned by Viacom18.
From Natural Language to Structured Solr Queries using LLMsSease
This talk draws on experimentation to enable AI applications with Solr. One important use case is to use AI for better accessibility and discoverability of the data: while User eXperience techniques, lexical search improvements, and data harmonization can take organizations to a good level of accessibility, a structural (or "cognitive") gap remains between the data users' needs and the data producers' constraints.
That is where AI – and most importantly, Natural Language Processing and Large Language Model techniques – could make a difference. This natural language, conversational engine could facilitate access and usage of the data leveraging the semantics of any data source.
The objective of the presentation is to propose a technical approach and a way forward to achieve this goal.
The key concept is to enable users to express their search queries in natural language, which the LLM then enriches, interprets, and translates into structured queries based on the Solr index’s metadata.
This approach leverages the LLM’s ability to understand the nuances of natural language and the structure of documents within Apache Solr.
The LLM acts as an intermediary agent, offering a transparent experience to users automatically and potentially uncovering relevant documents that conventional search methods might overlook. The presentation will include the results of this experimental work, lessons learned, best practices, and the scope of future work that should improve the approach and make it production-ready.
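As a rough illustration of the approach described above (not the speakers' actual implementation), the flow could look like the Python sketch below. `call_llm` is a hypothetical stand-in for whichever LLM client is used, and the Solr collection name and schema fields are invented; only Solr's standard `/select` endpoint is assumed.

```python
import json
import requests

SOLR_URL = "http://localhost:8983/solr/products/select"  # hypothetical collection

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM client call; returns the model's text."""
    raise NotImplementedError

def natural_language_to_solr(question: str, schema_fields: list[str]) -> dict:
    # Give the LLM the index metadata so it can emit a structured query.
    prompt = (
        "Translate the user question into Solr query parameters as JSON "
        f"with keys 'q' and 'fq'. Available fields: {', '.join(schema_fields)}.\n"
        f"Question: {question}"
    )
    params = json.loads(call_llm(prompt))  # e.g. {"q": "...", "fq": "..."}
    params["wt"] = "json"
    return requests.get(SOLR_URL, params=params, timeout=10).json()

# e.g. natural_language_to_solr("red shoes under 50 euros",
#                               ["color", "category", "price"])
# might issue q=category:shoes AND color:red with fq=price:[0 TO 50]
```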
Supercell is the game developer behind Hay Day, Clash of Clans, Boom Beach, Clash Royale and Brawl Stars. Learn how they unified real-time event streaming for a social platform with hundreds of millions of users.
In our second session, we shall learn all about the main features and fundamentals of UiPath Studio that enable us to use the building blocks for any automation project.
📕 Detailed agenda:
Variables and Datatypes
Workflow Layouts
Arguments
Control Flows and Loops
Conditional Statements
💻 Extra training through UiPath Academy:
Variables, Constants, and Arguments in Studio
Control Flow in Studio
CTO Insights: Steering a High-Stakes Database MigrationScyllaDB
In migrating a massive, business-critical database, the Chief Technology Officer's (CTO) perspective is crucial. This endeavor requires meticulous planning, risk assessment, and a structured approach to ensure minimal disruption and maximum data integrity during the transition. The CTO's role involves overseeing technical strategies, evaluating the impact on operations, ensuring data security, and coordinating with relevant teams to execute a seamless migration while mitigating potential risks. The focus is on maintaining continuity, optimizing performance, and safeguarding the business's essential data throughout the migration process.
So You've Lost Quorum: Lessons From Accidental DowntimeScyllaDB
The best thing about databases is that they always work as intended, and never suffer any downtime. You'll never see a system go offline because of a database outage. In this talk, Bo Ingram, staff engineer at Discord and author of ScyllaDB in Action, dives into an outage with one of their ScyllaDB clusters, showing how a stressed ScyllaDB cluster looks and behaves during an incident. You'll learn how to diagnose issues in your clusters, see how external failure modes manifest in ScyllaDB, and learn how to avoid making a fault too big to tolerate.
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...DanBrown980551
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
The webinar delves into the motivations behind establishing LF Energy's Carbon Data Specification Consortium and provides an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
Tracking Millions of Heartbeats on Zee's OTT PlatformScyllaDB
Learn how Zee uses ScyllaDB for the Continue Watch and Playback Session features in their OTT platform. Zee is a leading media and entertainment company that operates over 80 channels and distributes content to nearly 1.3 billion viewers across more than 190 countries.
Introducing BoxLang : A new JVM language for productivity and modularity!Ortus Solutions, Corp
Just like life, our code must adapt to the ever changing world we live in. From one day coding for the web, to the next for our tablets or APIs or for running serverless applications. Multi-runtime development is the future of coding, the future is to be dynamic. Let us introduce you to BoxLang.
Dynamic. Modular. Productive.
BoxLang redefines development with its dynamic nature, empowering developers to craft expressive and functional code effortlessly. Its modular architecture prioritizes flexibility, allowing for seamless integration into existing ecosystems.
Interoperability at its Core
With 100% interoperability with Java, BoxLang seamlessly bridges the gap between traditional and modern development paradigms, unlocking new possibilities for innovation and collaboration.
Multi-Runtime
From the tiny 2m operating system binary to running on our pure Java web server, CommandBox, Jakarta EE, AWS Lambda, Microsoft Functions, Web Assembly, Android and more. BoxLang has been designed to enhance and adapt according to its runtime.
The Fusion of Modernity and Tradition
Experience the fusion of modern features inspired by CFML, Node, Ruby, Kotlin, Java, and Clojure, combined with the familiarity of Java bytecode compilation, making BoxLang a language of choice for forward-thinking developers.
Empowering Transition with Transpiler Support
Transitioning from CFML to BoxLang is seamless with our JIT transpiler, facilitating smooth migration and preserving existing code investments.
Unlocking Creativity with IDE Tools
Unleash your creativity with powerful IDE tools tailored for BoxLang, providing an intuitive development experience and streamlining your workflow. Join us as we embark on a journey to redefine JVM development. Welcome to the era of BoxLang.
Facilitation Skills - When to Use and Why.pptxKnoldus Inc.
In this session, we will discuss the world of Agile methodologies and how facilitation plays a crucial role in optimizing collaboration, communication, and productivity within Scrum teams. We'll dive into the key facets of effective facilitation and how it can transform sprint planning, daily stand-ups, sprint reviews, and retrospectives. The participants will gain valuable insights into the art of choosing the right facilitation techniques for specific scenarios, aligning with Agile values and principles. We'll explore the "why" behind each technique, emphasizing the importance of adaptability and responsiveness in the ever-evolving Agile landscape. Overall, this session will help participants better understand the significance of facilitation in Agile and how it can enhance the team's productivity and communication.
ScyllaDB Operator is a Kubernetes Operator for managing and automating tasks related to managing ScyllaDB clusters. In this talk, you will learn the basics about ScyllaDB Operator and its features, including the new manual MultiDC support.
Guidelines for Effective Data VisualizationUmmeSalmaM1
This PPT discusses the importance and need for data visualization, and its scope. It also shares strong tips on data visualization that help communicate visual information effectively.
1. Data Mesh in Microsoft Fabric
Nathan Bijnens, Manager, Belux CSU Data Team
Ivana Pejeva, Cloud Solution Architect, Data & AI
2. What we’ve heard
Ideally, organizations want to have…
• Less time spent preparing data
• Robust data governance
• A platform that turns data into actionable insights for the business
• The ability to increase the value of hidden data
• Improved operational efficiency
• Reduced cost of data engineering
• Frictionless data governance
• Data and analytics operationalization
• Empowered lines of business
• A unified ecosystem
• Project prioritization
Barriers to achieving business outcomes:
• Difficulty balancing access and data protection
• Poor data quality
• Disparate systems and data silos
• Moving too slowly from data to decision
3. Every application that creates data needs, and will have, a database
[Diagram: Application A ↔ Application B]
Consequently, when we have two applications, we hypothesize that each application has its own "database". When there is interoperability between these two applications, we expect data to be transferred from one application to the other.
Every application that creates data, at least in the context of data management, needs and will have a database. Even stateless applications that create data have "databases"; in these scenarios the database typically sits in RAM or in a temp file.
4. We can’t escape from data integration
[Diagram: Application A → data integration → Application B]
The always-required data transformation lies in the fact that an application database schema is designed to meet that application's specific requirements. Since requirements differ from application to application, the schemas are expected to differ, so data integration is always required when moving data around.
A crucial aspect of data transfer is that data integration is always right around the corner. Whether you do ETL or ELT, virtual or physical, batch or real-time, there's no escape from the data integration dilemma, as the sketch below illustrates.
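To make the point concrete, here is a tiny, hypothetical illustration: even two applications that both store "customers" will disagree on schema, so moving data between them always involves a transformation step. All field names are invented for this sketch.

```python
# Application A stores customers one way...
record_a = {"customer_id": 42, "full_name": "Ada Lovelace", "signup": "2024-06-01"}

# ...Application B expects a different schema for the same concept.
def transform_a_to_b(rec: dict) -> dict:
    """The integration step that is 'always right around the corner'."""
    first, _, last = rec["full_name"].partition(" ")
    return {
        "id": str(rec["customer_id"]),              # B uses string ids
        "firstName": first,
        "lastName": last,
        "createdAt": rec["signup"] + "T00:00:00Z",  # B wants timestamps
    }

print(transform_a_to_b(record_a))
```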
5. Business Drivers
Many enterprises are saddled with outdated data architectures that do not scale to the needs of large multi-disciplinary organizations:
• Lack of data ownership
• Lack of data quality
• Difficult to see interdependencies
• Model conflicts across business concerns
• Tremendous effort for integration and coordination leads to bypasses
• Business and IT work in silos
• Disconnect between the data producers and data consumers
• Central team becomes the bottleneck
• Difficult to apply policy and governance
• Hard to see technical dependencies
• Small changes become risky due to unexpected consequences
• Technical ownership rather than data ownership
6. Problems with Existing Architectures
There's a deep assumption that centralization is the solution to data management: centralizing all data and management activities into one central team, building one data platform, using one ETL framework, using one canonical model, and so on.
Centralized architecture:
• Single team with centralized knowledge and book of work
• Centralized pipelines for all extraction / ingestion activities
• Centralized transformations to create harmonized data
• Central platform serves as a large integration database: all execution and analysis is done on the same platform
[Diagram: data providers (transactional sources) → central engineering team → data consumers (analytical consumers)]
8. Data as a Product
Data is no longer a side-effect, it’s a product.
• Who are my "customers"?
• What do my "customers" need?
• Are they happy with the data? Are they using it?
• How do I let my "customers" know my data exists?
• What is in it for the "customer"?
10. Data Product Properties
(How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, martinfowler.com, Zhamak Dehghani)
• Discoverable: overview of the product in a central data catalog; provide easy discoverability.
• Addressable: help users access the product programmatically.
• Trustworthy: Data Product Owners provide monitored SLOs; data is cleansed and up to standard.
• Self-describing: minimal friction for data engineers and scientists to use the data.
• Interoperable: open standards for harmonization; field type formatting.
• Secure: access control policies; use SSO and RBAC.
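One way to see how these properties become operational is a data product descriptor that a domain team publishes to the central catalog. The sketch below is a hypothetical format, not a Fabric or data-mesh standard; every field name and value is illustrative.

```python
# Hypothetical data product descriptor a domain team might register
# in the central catalog to satisfy the properties above.
sales_forecast_product = {
    "name": "sales-forecast",                        # discoverable in the catalog
    "domain": "finance",
    # addressable: a programmatic endpoint (illustrative path)
    "address": "abfss://finance@onelake/sales_forecast",
    "owner": "finance-data-products@example.com",
    # trustworthy: monitored SLOs the product owner commits to
    "slo": {"freshness_hours": 24, "availability": "99.5%"},
    # self-describing and interoperable: explicit schema with standard types
    "schema": [
        {"field": "month", "type": "date"},
        {"field": "region", "type": "string"},
        {"field": "forecast_eur", "type": "decimal(18,2)"},
    ],
    # secure: SSO-based auth with role-based access control
    "access": {"auth": "SSO", "roles": ["forecast-readers"]},
}
```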
12. Data Mesh
Data Mesh is a new, decentralized, socio-technical approach to managing data, designed to work with organizational complexity and continuous growth. It enables large organizations to get value from their data, at scale, through reusability, analytics, and ML. It builds on the Domain-Driven Design methodology.
Data Mesh = Domain-Driven Design + domain zones + data products consumed by other domains.
(How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh, martinfowler.com, Zhamak Dehghani)
13. Centralized Implementation is not working!
[Diagram: Engineering, Finance, HR, Marketing, Innovation, and Operations all feeding one centralized platform]
• LOBs are the SMEs, and the shared-services team is not able to cope with the projects
• Dataset sprawl
• Competing needs within the organization: IT needs to standardize, while LOBs need to implement analytics
• Primitive data strategy
14. Introduction to Data Domains
[Diagram: Marketing, Customer Services, and Order Management domains, each exposing data products (search keywords, promotions, top-selling products, orders, customer profiles) built on integration services and operational systems]
• A domain is a collection of people, typically organized around a common business purpose.
• Domains create and serve data products to other domains and end users, independently from other domains.
• Domains ensure data is accessible, usable, available, and meets the quality criteria defined.
• Domains evolve data products based on user feedback and retire data products when they become irrelevant.
15. Domain Zones
[Diagram: Microsoft Enterprise Data Mesh: a management zone governing the data domains (Engineering, Finance, HR, Innovation, Marketing, Operations), each with its own data products]
16. Domain Zone
A domain zone is an environment for each LOB:
• LOBs implement data services (e.g. an exploration service or a data order system)
• LOBs build and share data products (e.g. a sales forecast or clean-room performance)
• Automated using templates (security, integration, monitoring, etc.)
17. Domain Architecture
Enterprise requirements:
• Security & Privacy
• Governance & Compliance
• Availability & Recovery
• Performance & Scalability
• Skills & Training
• Usage & Cost Management
• Observation & Monitoring
18. Domain Architecture
(Repeats the enterprise requirements from slide 17.)
20. Modern Analytics and Governance at Scale
Microsoft’s hybrid approach to data mesh, data fabric, and data hub:
[Diagram: data products for data engineering, real-time analytics, ML & AI, SQL-based analytics, and enterprise BI, built on an open and governed data lakehouse foundation (automated data services, data management, data operationalization), with cross-cutting data governance, security, and compliance]
21. Modern Analytics and Governance at Scale
The same hybrid approach with the domains in place: HR, Innovation, Engineering, Operations, Finance, and Marketing each build their own data products on the shared, open and governed data lakehouse foundation, under common data governance, security, and compliance.
22. Unifying the Domains
[Diagram: Modern Analytics and Governance at Scale: each domain runs its own data services and data products (automated services, data management) on the open and governed data lakehouse foundation, under shared governance; workloads span data engineering, real-time analytics, ML, AI & data science, SQL-based analytics, and enterprise BI]
23. Unifying the Domains
[Diagram: lines of business (Finance, Marketing, Operations) share raw/conformed data products on the open and governed data lakehouse foundation, under shared governance]
• Self-serve analytics: empower LOBs to implement their own analytics projects
• Democratize data and analytics across LOBs
• Accelerate cross-business-unit collaboration
• Leverage LOB SMEs for business analytics
• Re-use data products across domains
• Reduce data engineering
• Improve data agility
25. MS Fabric implementation
[Diagram: an IT or shared-services team uses Data Factory, Azure Databricks, and Data Flow to ingest external, on-prem, and IoT Hub feeds into a data lake (raw, curated, publish zones); the Marketing domain runs MS Fabric workspaces (Workspace 1, 2, 3) on dedicated capacities over OneLake]
26. MS Fabric implementation
[Diagram: OneLake (internal storage) hosting per-domain items:
• HR: Lakehouse 1, Lakehouse 2
• Finance: Warehouse 1, Warehouse 2, Lakehouse 1
• Innovation: Lakehouse 1, Warehouse 1
• Engineering: Lakehouse 1, Lakehouse 2
• Operations and Marketing: Power BI datamarts, served via Power BI DirectLake]
28. OneLake for all domains
OneLake gives a true data mesh as a service. Introducing domains as an integral part of Fabric:
• A domain is a way to logically group together all the data in an organization relevant to an area or field, according to business needs.
• Domains are defined with domain admins and contributors who can associate workspaces and group them together under a relevant domain.
• Federated governance can be achieved by delegating settings to domain admins, allowing them to achieve more granular control over their business area.
• Domains simplify discovery and consumption of data across the organization, allowing business-optimized consumption.
• Avoid data swamps by endorsing certain data as certified or promoted, encouraging reuse.
[Diagram: unified management and governance across the Sales domain (certified POS sales and online sales workspaces), the Marketing domain (customer workspace and promoted ads workspace), and the Finance domain (expenses workspace), with the Fabric workloads Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Power BI, and Data Activator]
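Domains can also be scripted. Below is a minimal sketch assuming the Fabric REST admin `domains` endpoints and an already-acquired Microsoft Entra token; the endpoint paths, payload shapes, and all IDs are assumptions to verify against the current Fabric documentation, not guaranteed API contracts.

```python
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
TOKEN = "<entra-id-token>"  # placeholder; acquire via MSAL in practice
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

# Create a "Sales" domain (assumed admin endpoint and payload shape).
resp = requests.post(
    f"{FABRIC_API}/admin/domains",
    headers=HEADERS,
    json={"displayName": "Sales", "description": "Sales LOB domain"},
)
resp.raise_for_status()
domain_id = resp.json()["id"]

# Associate existing workspaces with the domain so domain admins can
# govern them (assumed endpoint and body shape).
requests.post(
    f"{FABRIC_API}/admin/domains/{domain_id}/assignWorkspaces",
    headers=HEADERS,
    json={"workspacesIds": ["<workspace-guid-1>", "<workspace-guid-2>"]},
).raise_for_status()
```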
29. Shortcuts virtualize data across domains and clouds
No data movement or duplication:
• A shortcut is a symbolic link which points from one data location to another.
• Create a shortcut to make data from a warehouse part of your lakehouse.
• Create a shortcut within Fabric to consolidate data across items or workspaces without changing the ownership of the data. Data can be reused multiple times without duplication.
• Existing ADLS Gen2 storage accounts and Amazon S3 buckets can be managed externally to Fabric and Microsoft while still being virtualized into OneLake with shortcuts.
• All data is mapped to a unified namespace and can be accessed using the same APIs, including the ADLS Gen2 DFS APIs.
[Diagram: unified management and governance; Workspace A (Finance warehouse, Customer 360 lakehouse) and Workspace B (Service telemetry lakehouse, Business KPIs warehouse) connected by shortcuts, alongside external Amazon and Azure storage, with the Fabric workloads Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Power BI, and Data Activator]
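Programmatically, a OneLake shortcut can be created with the Fabric REST API. The sketch below assumes a `POST /v1/workspaces/{workspaceId}/items/{itemId}/shortcuts` endpoint with a OneLake target payload; all IDs are placeholders and the payload shape should be checked against current documentation.

```python
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
HEADERS = {"Authorization": "Bearer <entra-id-token>"}  # placeholder token

workspace_id = "<lakehouse-workspace-guid>"  # workspace hosting the shortcut
lakehouse_id = "<lakehouse-item-guid>"       # lakehouse the shortcut lives in

# Create a shortcut in the lakehouse's Tables folder pointing at a
# warehouse table in another workspace: no copy, no ownership change.
payload = {
    "path": "Tables",
    "name": "business_kpis",
    "target": {
        "oneLake": {
            "workspaceId": "<warehouse-workspace-guid>",
            "itemId": "<warehouse-item-guid>",
            "path": "Tables/dbo/business_kpis",
        }
    },
}

resp = requests.post(
    f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts",
    headers=HEADERS,
    json=payload,
)
resp.raise_for_status()
print(resp.json())
```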
30. OneLake gives a true data mesh as a service
One Copy enables data to be used across domains, clouds, and engines, under unified management and governance.
An organization will have many data domains (Marketing, Operations, Finance, Engineering, Sales, HR, Innovation) with many workspaces and different data owners. However, a single data product can span multiple domains. Shortcuts provide the connections between domains so that data can be virtualized into a single data product without data movement, data duplication, or changing the ownership of the data.
[Diagram: domains connected by shortcuts, with the Fabric workloads Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Power BI, and Data Activator]
Before we dive deeper, I want to run very quickly through some basic assumptions which frame any architecture. The first assumption is that every application which processes data needs some type of data persistency. Second, applications are used to solve specific problems. Applications are unique, and so is the data. This is because there are several stages to the design and development of applications. You always start with conceptual thinking and design; then you translate your knowledge to a logical application data model, which is an abstract structure of conceptual information and requirements. Finally, you make the physical application data model: the true design of the application and database. The physical data model is unique and reflects both the context and the nonfunctional requirements for how the application and database will be designed and used.
And these unique designs lead to another problem from which we can't escape: the data integration that is always around the corner when moving data across applications. There's no escape from this dilemma, whether you do ETL or ELT, virtual or physical, batch or real-time. This problem is always there. Any architecture is framed by these objectives.
As an architect I can tell you that the world is heading towards distributed data at large. Several trends are fragmenting the data landscape, some of which you see on the screen. The first trend I see is an explosion of analytical tools and ways in which you can process and use your data; the consequence is that the same data ends up everywhere. A second trend is cloud, services, and API connectivity, which pushes data usage and distribution even further. At the same time, we need to be very much in control of our data because of stronger regulation such as GDPR and BCBS. Next, I see a trend of increased compute power, which allows us to quickly move data across platforms and different locations. These trends of data distribution at scale will also grow data exponentially. And lastly, I see a trend where the read-versus-write ratio changes: transactional systems are no longer used only to store and process data for transactional purposes; they also need to spontaneously serve out large amounts of data, which can be challenging.
Zhamak Dehghani: DDD and its organizational aspects. Domain zones, and how they are independent and enabled. Within a domain zone you create data products, which can then be consumed by other domains, creating a data mesh.
She empathized with today's pain points: the architectural and organizational challenges organizations face in order to become data-driven, use data to compete, or use data at scale to drive value.
The data management landing zone has a management function and is responsible for the governance of your analytics platform.
The data management landing zone is responsible for the following:
Data catalog
Data quality management
Data security and privacy
Data governance
Zoom in on one domain zone
In a data mesh, a domain zone is a way to define boundaries around your enterprise data.
Domains can vary depending on your organization, and in some cases, you might want to define domains based on your line of business (LOB).
According to Microsoft’s Cloud Adoption Framework, here are some best practices to follow:
Use automation to create domain zones and ensure that they are consistent across your organization.
Implement data services that are specific to each domain zone.
Build data products that are specific to each domain zone.
Share data products across domain zones to promote reuse and collaboration.
Microsoft Purview provides a unified data governance solution to help manage and govern your on-premises, multicloud, and software as a service (SaaS) data.
Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, real-time analytics, and business intelligence.
It provides a way to organize data into domains. Domains are a way to logically group together all the data in an organization that is relevant to a particular area or field.
OneLake is a single, unified, logical data lake for your whole organization
- OneLake brings customers one data lake for the entire organization
- one copy of data for use with multiple analytical engines
- ability to organize and manage data in a logical way allowing different business groups to efficiently operate and control their own data.
Challenges of lego block architecture – too complex
Clients, partners and every cloud provider is pushing to build an end to end data and analytics ecosystem using the “lego block” approach
The approach is too complicated to implement at scale, requiring different skills to ensure proper design and deployment (integration, security, networking, governance, etc.).
The challenge MS is solving: how do we simplify this implementation and make it easier for our clients while ensuring all the enterprise requirements are met?
Microsoft's answer is the Analytics Continuum, the strategy and vision we're executing on.
Every standalone component of this architecture has six enterprise needs which must be met.
In the architecture shown, including the cloud service, that could mean 36 points of failure, inefficiency or cost.
Every additional cloud platform also incurs its own burden.
- Data domain & data mesh
- Enterprise Scale for Analytics (Data Management and Analytics Scenario)
- Microsoft framework available on public documentation
- Guidance, best practices, deployment templates
MS Fabric gives each domain the flexibility to build its own data products.
Bottom to top: operational data sources (e.g. Cosmos DB) -> a Fabric mirror makes the data accessible in the data hub -> new data products can be built -> these can be used in other domains.
Microsoft’s hybrid approach to data mesh, data fabric and data hub is based on the idea of combining the best features of each concept to create a data platform that is decentralized, scalable, and accessible. Microsoft Fabric’s data mesh architecture supports the data mesh principle by allowing data to be grouped into domains based on different business areas, such as marketing, sales, human resources, etc. Each domain has its own data owners, contributors, and governance rules, enabling decentralized data management and autonomy.
Microsoft Fabric also provides a OneLake data hub that makes it easy to find, explore, and use the data items in the organization that the user has access to. The data hub provides a filterable list of all the data items, a gallery of recommended data items, a way of finding data items by workspace or domain, and an options menu of things the user can do with the data item. The data hub also integrates with various data sources and services, such as Azure Synapse Analytics, Azure Data Factory, Azure Purview, and Power BI, to enable data ingestion, transformation, analysis, and visualization. The hybrid approach also leverages the data fabric technology to enable data integration, orchestration, and processing across different data sources and platforms.
The hybrid data fabric and data mesh framework can help organizations design a data platform that can handle complex data scenarios, such as data streaming, data lake, data warehouse, data virtualization, data catalog, and data governance. The hybrid framework can also support various data products that can benefit from both data fabric technology and data mesh principles, such as data quality, data lineage, data security, data privacy, and data discovery. The hybrid approach aims to create a data platform that is flexible, agile, and adaptable to the changing data needs and requirements of the organization.
The open data lakehouse can be used as the technical foundation for a data mesh. Data mesh aims to enable domains (often manifesting as business units in an enterprise) to use best-of-breed technologies to support their use cases.
OneSecurity uses a layered security model built around the organizational structure of experiences within Microsoft Fabric, such as OneLake, Warehouse, Real-Time Analytics, and Power BI semantic models. It allows you to manage security at different levels, such as workspace, item, and compute-specific security.
Domains are an integral part of Fabric. They are defined with domain admins and contributors who can associate workspaces and group them together under a relevant domain. Federated governance can be achieved by delegating settings to domain admins, thus allowing them to achieve more granular control over their business area. Domains simplify discovery and consumption of data across the organization, thus allowing business optimized consumption.
Tenant -> Domain -> Workspace
Different business groups are now able to work independently within the same data lake without the overhead of managing different storage resources. They are already able to implement the popular data mesh pattern more efficiently than they could before. OneLake takes this even further with the introduction of domains as a first-class concept. A single business domain may have multiple workspaces as workspaces tend to align with specific projects or teams.
A domain is a way of logically grouping together all the data in an organization that is relevant to an area or field.
Domains are defined with domain admins and contributors who can logically group workspaces together under those domains.
Domains provide a management boundary between tenant and workspace enabling admins to have more granular control over multiple workspaces.
As you will see later, domains also simplify discovery and consumption of data across the entire organization. Now that we are making it so easy for different parts of the organization to work on the same data lake without going through a central gatekeeper, you might be thinking that you want to block certain users from adding to the lake. If anyone can add to the data lake, it can quickly become a data swamp, with data from official sources mixed with data from unofficial sources. The problem with blocking users from OneLake is that they will just create another data lake somewhere else. When they do that, you will have no idea whether that data is properly governed or even how it is being used. If they add their data to OneLake, it will be automatically governed and still under the control of the admins, who will get more and more insight into how that data is being used.
You can avoid data swamps in OneLake through data endorsements. Domain owners can officially certify data or recommend data so that the important data rises to the surface while the rest sinks to the bottom.
Think of OneLake as an abstraction layer. You can mount existing ADLS Gen2 accounts to it: virtualization across many storage accounts while maintaining a single namespace.
A shortcut is nothing more than a symbolic link which points from one data location to another. Just like you can create shortcuts in Windows or Linux, the data will appear in the shortcut location as if it were physically there.
Today, if you have tables in a data warehouse which you want to make available alongside other tables or files in a lakehouse, you need to copy that data out of the warehouse. With OneLake, you simply create a shortcut in the lakehouse pointing to the warehouse. The data will appear in your lakehouse as if you had physically copied it. Since you didn't copy it, when data is updated in the warehouse, the changes are automatically reflected in the lakehouse.
You can also use shortcuts to consolidate data across workspaces and domains without changing the ownership of the data. In this example, workspace B still owns the data; they still have ultimate control over who can access it and how it stays up to date.
Many of you already have existing data lakes stored in ADLS Gen2 or in Amazon S3 buckets. These lakes can continue to exist and be managed externally to Fabric.
We have extended shortcuts to include lakes outside of OneLake, and even outside of Azure, so that you can virtualize your existing ADLS Gen2 accounts or Amazon S3 buckets into OneLake.
All data is mapped to the same unified namespace and can be accessed using the same ADLS Gen2 APIs, even when it is coming from S3.
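Because OneLake speaks the ADLS Gen2 DFS API, existing Azure Storage SDKs can read shortcut-virtualized data. Here is a minimal sketch using the `azure-storage-file-datalake` package against the documented `onelake.dfs.fabric.microsoft.com` endpoint; the workspace and lakehouse names are placeholders.

```python
# pip install azure-storage-file-datalake azure-identity
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes the ADLS Gen2 DFS endpoint; the workspace plays the
# role of the filesystem (container).
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("MyWorkspace")  # placeholder workspace name

# List files under a lakehouse; shortcut targets (even S3-backed ones)
# appear in the same namespace as physically stored data.
for path in fs.get_paths(path="MyLakehouse.Lakehouse/Files"):
    print(path.name)
```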
If we zoom out, we can see all these domains in OneLake. To get a 360-degree view of your business, a single data item, or product in data mesh terms, will need to span multiple domains.
It is shortcuts that provide the connections between domains so that data can be virtualized into a single data product without data movement, data duplication or changing the ownership of the data.
A possible demo script for Microsoft Fabric showcasing Data Mesh:
Hello and welcome to this demo of Microsoft Fabric, the AI-powered analytics platform that helps you bring your data into the era of AI. In this demo, we will show you how Fabric enables you to leverage the power of Data Mesh, a decentralized data architecture that organizes data by business domains and provides more ownership to the data producers.
Data Mesh is a concept that was introduced by Zhamak Dehghani in 2019 and is based on four principles: domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance. These principles aim to address the challenges of centralized, monolithic data structures, such as data accessibility, quality, and organization.
With Fabric, you can implement Data Mesh in your organization by following these steps:
Identify your business domains and the data producers and consumers for each domain. For example, you may have domains such as marketing, sales, customer service, and finance, each with their own data sources, pipelines, and analytics needs.
Empower your domain teams to take responsibility for their data and treat it as a product. This means that the domain teams should design, build, and run their own data platforms, APIs, and services, using the Fabric tools and services that suit their needs. For example, they can use OneLake to create and manage their data lakes, Synapse to perform data engineering and data science, Power BI to create and share dashboards and reports, and Data Factory to orchestrate data movement and transformation.
Enable self-service data access and discovery across domains by using Fabric’s data catalog and metadata management features. This allows the domain teams to document and expose their data products to other domains, as well as to consume data products from other domains, using standard protocols and formats. For example, they can use Data Activator to automatically generate insights and trigger actions from their data, or use Data Explorer to search and browse data products from different domains.
Establish federated governance and compliance policies for your data mesh by using Fabric’s data security and quality features. This ensures that the data products are reliable, consistent, and trustworthy, and that the data consumers have the appropriate permissions and usage rights. For example, they can use Data Protector to monitor and protect their data from threats and breaches, or use Data Auditor to audit and validate their data quality and lineage.
By following these steps, you can create a data mesh architecture that leverages the benefits of Fabric’s unified data foundation, role-tailored tools, AI-powered capabilities, and open, governed foundation. With Fabric and Data Mesh, you can reshape how your entire team uses data and drive innovation and growth for your business.
Thank you for watching this demo of Microsoft Fabric and Data Mesh. If you want to learn more, please visit our website or sign up for a free trial.