Data Mesh in Microsoft Fabric
Nathan Bijnens, Manager, Belux CSU Data Team
Ivana Pejeva, Cloud Solution Architect, Data & AI
What we’ve heard
• Spend less time preparing data
• Robust data governance
• A platform that delivers actionable insights to the business
• Ability to increase the value of hidden data
• Improve operational efficiency
Ideally, organizations want to have…
• Reduce the cost of data engineering
• Need for frictionless data governance
• Difficult to balance access and data protection
• Data and analytics operationalization
• Enable lines of business
• Poor data quality
• Disparate systems and data silos
• Too slow moving from data to decision
Barriers to achieving business outcomes
• Unified ecosystem
• Project prioritization
Every application that creates data needs, and will have, a database
[Diagram: Application A and Application B]
Consequently, when we have two applications, we assume that each application has its own ‘database’. When the two applications interoperate, we expect data to be transferred from one application to the other.
Every application that creates data, at least in the context of data management, needs and will have a database. Even stateless applications that create data have “databases”; in these scenarios the database typically sits in RAM or in a temporary file.
We can’t escape from data integration
[Diagram: data integration between Application A and Application B]
Data transformation is always required because an application’s database schema is designed to meet that application’s specific requirements. Since requirements differ from application to application, the schemas are expected to differ, and data integration is always required when moving data around.
A crucial aspect of data transfer is that data integration is always right around the corner. Whether you do ETL or ELT, virtual or physical, batch or real-time, there’s no escape from the data integration dilemma.
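The claim that integration is unavoidable can be made concrete with a small sketch. Everything in it is hypothetical (the two record types, their field names and the `to_app_b` function); it only shows that because each application's schema was designed for its own needs, moving a record from Application A to Application B requires splitting, renaming, converting units and parsing types. The same mapping would be needed whether it runs as ETL or ELT, in batch or in real time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AppAOrder:
    """Hypothetical record as Application A stores it (order-entry view)."""
    order_no: str
    customer_full_name: str
    amount_cents: int
    order_date: str  # ISO 8601 text, e.g. "2023-06-01"


@dataclass
class AppBInvoice:
    """Hypothetical record as Application B expects it (billing view)."""
    invoice_ref: str
    first_name: str
    last_name: str
    amount: float    # major currency units instead of cents
    issued_on: date  # a real date instead of text


def to_app_b(order: AppAOrder) -> AppBInvoice:
    """The integration step: reshape one App A record into App B's schema."""
    first, _, last = order.customer_full_name.partition(" ")
    return AppBInvoice(
        invoice_ref=f"INV-{order.order_no}",
        first_name=first,
        last_name=last,
        amount=order.amount_cents / 100,
        issued_on=date.fromisoformat(order.order_date),
    )


if __name__ == "__main__":
    print(to_app_b(AppAOrder("1042", "Ada Lovelace", 12550, "2023-06-01")))
```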
Business Drivers
• Lack of data ownership
• Lack of data quality
• Difficult to see interdependencies
• Model conflicts across business concerns
• Tremendous effort for integration and coordination leads to bypasses
• Business and IT work in silos
• Disconnect between the data producers and data consumers
• Central team becomes the bottleneck
• Difficult to apply policy and governance
• Hard to see technical dependencies
• Small changes become risky due to unexpected consequences
• Technical ownership rather than data ownership
Many enterprises are saddled with outdated data architectures that do not scale to the needs of large, multidisciplinary organizations.
Problems with Existing Architectures
There’s a deep assumption that centralization is the solution to data management. This includes
centralizing all data and management activities into one central team, building one data platform,
using one ETL framework, using one canonical model, etc.
Centralized Architecture
[Diagram: transactional sources (data providers) feed a central engineering team, which serves analytical consumers (data consumers)]
• Single team with centralized knowledge and book of work
• Centralized pipelines for all extraction/ingestion activities
• Centralized transformations to create harmonized data
• Central platform serves as a large integration database: all execution and analysis is done on the same platform
Data as a Product
Data is no longer a side effect; it’s a product.
• Who are my "customers"?
• What do my "customers" need?
• Are they happy with the data? Are they using it?
• How do I let my "customers" know my data exists?
• What is in it for the "customer"?
Data Product Owner
[Diagram: a domain team made up of a Data Product Owner, Data Engineer, Software Developer and Infra Engineer]
Source: Zhamak Dehghani, "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh" (martinfowler.com)
Data Product Properties
Source: Zhamak Dehghani, "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh" (martinfowler.com)
• Discoverable: overview of the product in a central data catalog; easy discoverability
• Addressable: help users access the product programmatically
• Trustworthy: Data Product Owners provide monitored SLOs; data is cleansed and up to standard
• Self-describing: minimal friction for data engineers and scientists to use the data
• Interoperable: open standards for harmonization; field type formatting
• Secure: access control policies; use SSO and RBAC
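One way to make these six properties checkable is to have each domain publish a small descriptor next to its data product. The sketch below is hypothetical: the `DataProduct` fields, the `STANDARD_TYPES` vocabulary and the `missing_properties` check are illustrative assumptions, not a Fabric or data catalog API. It maps each property to one simple, testable condition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Interoperable: an assumed, organization-wide vocabulary of field types.
STANDARD_TYPES = {"string", "int", "decimal", "date", "timestamp", "bool"}


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes alongside a data product."""
    name: str
    owner: str                                  # accountable Data Product Owner
    catalog_entry: Optional[str] = None         # Discoverable
    address: Optional[str] = None               # Addressable
    freshness_slo_hours: Optional[int] = None   # Trustworthy (monitored SLO)
    schema: Dict[str, str] = field(default_factory=dict)   # Self-describing
    access_roles: List[str] = field(default_factory=list)  # Secure (RBAC)


def missing_properties(p: DataProduct) -> List[str]:
    """List the data product properties this descriptor does not yet meet."""
    gaps = []
    if not p.catalog_entry:
        gaps.append("discoverable")
    if not p.address:
        gaps.append("addressable")
    if p.freshness_slo_hours is None or p.freshness_slo_hours <= 0:
        gaps.append("trustworthy")
    if not p.schema:
        gaps.append("self-describing")
    if any(t not in STANDARD_TYPES for t in p.schema.values()):
        gaps.append("interoperable")
    if not p.access_roles:
        gaps.append("secure")
    return gaps


if __name__ == "__main__":
    forecast = DataProduct(
        name="sales_forecast",
        owner="sales-domain",
        catalog_entry="Sales / Sales Forecast",
        address="sales_ws/forecast.Lakehouse/Tables/sales_forecast",
        freshness_slo_hours=24,
        schema={"region": "string", "week": "date", "forecast": "decimal"},
        access_roles=["sales-readers"],
    )
    print(missing_properties(forecast))  # []
    draft = DataProduct(name="ads_spend", owner="marketing-domain",
                        schema={"spend": "money"})
    print(missing_properties(draft))
    # ['discoverable', 'addressable', 'trustworthy', 'interoperable', 'secure']
```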
Data Mesh
Data Mesh is a new decentralized socio-technical approach to managing data, designed to work with organizational complexity and continuous growth. It enables large organizations to get value from their data at scale through reusability, analytics and ML. It builds on the Domain-Driven Design methodology.
[Diagram: Data Mesh builds on Domain-Driven Design; domain zones publish data products that are consumed by other domains]
Source: Zhamak Dehghani, "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh" (martinfowler.com)
Centralized Implementation is not working!
[Diagram: Engineering, Finance, HR, Marketing, Innovation and Operations all depend on a single centralized platform]
• LOBs are the SMEs, and the shared services team cannot keep up with the projects
• Dataset sprawl
• Competing needs within the organization:
  • IT needs to standardize
  • LOBs need to implement analytics
• Primitive data strategy
Introduction to Data Domains
[Diagram: Marketing, Customer Services and Order Management domains, each with operational systems and integration services, publishing data products such as Search Keywords, Promotions, Top Selling Products, Orders and Customer Profiles]
• A domain is a collection of people, typically organized around a common business purpose.
• Create and serve data products to other domains and end users, independently from other domains.
• Ensure data is accessible, usable, available, and meets the quality criteria defined.
• Evolve data products based on user feedback and retire data products when they become irrelevant.
Domain Zones
[Diagram: Microsoft Enterprise Data Mesh, with a management zone and data domains (Engineering, Finance, HR, Innovation, Marketing, Operations) exchanging data products]
Domain Zone
• Environment for each LOB
• LOBs implement data services (e.g., Exploration Service, Data Order System)
• LOBs build and share data products (e.g., Sales Forecast, Clean Room Performance)
• Automated using templates (security, integration, monitoring, etc.)
Domain Architecture
Enterprise requirements:
• Security & Privacy
• Governance & Compliance
• Availability & Recovery
• Performance & Scalability
• Skills & Training
• Usage & Cost Management
• Observation & Monitoring
Data Mesh in Microsoft Fabric

Modern Analytics and Governance at Scale
[Diagram: Microsoft’s hybrid approach to data mesh, data fabric and data hub. Data products built with data engineering, real-time, ML & AI, SQL-based analytics and enterprise BI sit on an open and governed data lakehouse foundation (automated data services, data management, data operationalization), underpinned by data governance, security and compliance]
[Diagram: the same foundation with each domain (HR, Innovation, Engineering, Operations, Finance, Marketing) running its own data services and publishing data products, under shared governance, automated services and data management]
Unifying the Domains
[Diagram: lines of business (Finance, Marketing, Operations) build data products with data engineering, real-time analytics, ML, AI & data science, SQL-based analytics and enterprise BI, on a shared, open and governed data lakehouse foundation holding raw/conformed data]
• Self-serve analytics
• Empower LOBs to implement their own analytics projects
• Democratize data and analytics across LOBs
• Accelerate cross-business-unit collaboration
• Leverage LOB SMEs for business analytics
• Re-use data products across domains
• Reduce data engineering effort
• Improve data agility
Unifying the Domains: Modern Analytics and Governance
[Diagram: each domain runs Azure Databricks and/or MS Fabric; OneLake combines shortcuts to existing data lakes with its own internal (SaaS) storage]
[Diagram: an IT or shared services team ingests external, on-premises and IoT Hub feeds with Data Factory, Azure Databricks and data flows into a data lake with raw, curated and publish zones; Fabric workspaces 1 to 3 are assigned to capacities]
MS Fabric implementation
[Diagram: domains on OneLake internal storage: HR (Lakehouse 1, Lakehouse 2), Finance (Warehouse 1, Warehouse 2, Lakehouse 1), Innovation (Lakehouse 1, Warehouse 1) and Engineering (Lakehouse 1, Lakehouse 2); Marketing and Operations are shown with Power BI datamarts and Power BI Direct Lake]
Demo: Fabric Domains
OneLake for all domains
OneLake gives a true data mesh as a service
Introducing domains as an integral part of Fabric:
• A domain is a way to logically group together all the data in an organization that is relevant to a particular area or field, according to business needs.
• Domains are defined with domain admins and contributors, who can associate workspaces with a relevant domain and group them under it.
• Federated governance can be achieved by delegating settings to domain admins, giving them more granular control over their business area.
• Domains simplify discovery and consumption of data across the organization, enabling business-optimized consumption.
• Avoid data swamps by endorsing certain data as certified or promoted, which encourages reuse.
[Diagram: unified management and governance over workspaces for POS sales and online sales (certified), customer, ads (promoted) and expenses, grouped into Sales, Marketing and Finance domains and served by Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real Time Analytics, Power BI and Data Activator]
Shortcuts virtualize data across domains and clouds
No data movement or duplication
• A shortcut is a symbolic link that points from one data location to another.
• Create a shortcut to make data from a warehouse part of your lakehouse.
• Create a shortcut within Fabric to consolidate data across items or workspaces without changing the ownership of the data. Data can be reused multiple times without duplication.
• Existing ADLS Gen2 storage accounts and Amazon S3 buckets can be managed externally to Fabric and Microsoft while still being virtualized into OneLake with shortcuts.
• All data is mapped to a unified namespace and can be accessed using the same APIs, including the ADLS Gen2 DFS APIs.
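Because everything lives in one namespace, a file or table can be addressed with the same URI pattern regardless of which domain’s workspace holds it. The helper below assumes the URI forms described in Microsoft’s OneLake documentation (`abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>` and the equivalent `https://` DFS form); the function name and the example workspace and item names are hypothetical, and the pattern should be checked against the current documentation. Names are assumed to contain no characters that need escaping in the `abfss` form.

```python
from urllib.parse import quote

# OneLake's DFS endpoint; all workspaces and items share this one namespace.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, item_type: str, path: str = "",
                scheme: str = "abfss") -> str:
    """Build a OneLake URI for an item, or a file/table inside it.

    scheme="abfss" gives the form ABFS drivers (e.g. Spark) use;
    scheme="https" gives the ADLS Gen2 DFS REST form.
    """
    rel = f"{item}.{item_type}"
    if path:
        rel = f"{rel}/{path.strip('/')}"
    if scheme == "abfss":
        return f"abfss://{workspace}@{ONELAKE_HOST}/{rel}"
    if scheme == "https":
        # Percent-encode names for the URL path, keeping "/" separators.
        return f"https://{ONELAKE_HOST}/{quote(workspace)}/{quote(rel)}"
    raise ValueError(f"unsupported scheme: {scheme}")


if __name__ == "__main__":
    print(onelake_uri("WorkspaceA", "Customer360", "Lakehouse", "Tables/customers"))
    print(onelake_uri("WorkspaceB", "BusinessKPIs", "Warehouse", scheme="https"))
```

If, as the slide states, shortcut data is mapped into the same namespace, a shortcut under a lakehouse’s `Tables` or `Files` folder would be addressed with this same pattern, whether its target is in another workspace, ADLS Gen2 or Amazon S3.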
[Diagram: unified management and governance; Workspace A (Finance warehouse, Customer 360 lakehouse) and Workspace B (Service telemetry lakehouse, Business KPIs warehouse) connected by shortcuts, including to Amazon and Azure storage, and served by Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real Time Analytics, Power BI and Data Activator]
OneLake gives a true data mesh as a service
One copy enables data to be used across domains, clouds and engines
[Diagram: unified management and governance across the Marketing, Operations, Finance, Engineering, Sales, HR and Innovation domains, served by Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real Time Analytics, Power BI and Data Activator]
An organization will have many data domains, with many workspaces and different data owners. However, a single data product can span multiple domains. Shortcuts provide the connections between domains, so that data can be virtualized into a single data product without data movement, data duplication or changing the ownership of the data.
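The idea of one data product spanning several domains can be sketched as a table of shortcuts: logical names in the consuming product that point at locations still owned by other domains. The sketch is hypothetical (`Location`, `CompositeDataProduct` and the example workspaces are illustrative, with names borrowed from the slides) and models only the bookkeeping; it does not call any Fabric API. Resolving a name returns the original location and its owning domain, so nothing is copied and ownership does not change.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where a table physically lives, and which domain owns it."""
    domain: str
    workspace: str
    item: str
    table: str


@dataclass
class CompositeDataProduct:
    """A data product assembled from shortcuts (references), not copies."""
    name: str
    owning_domain: str
    shortcuts: Dict[str, Location]  # logical table name -> original location

    def resolve(self, logical_table: str) -> Location:
        """Follow a shortcut back to its source; the data itself never moves."""
        return self.shortcuts[logical_table]

    def contributing_domains(self) -> List[str]:
        """Every domain whose data this product exposes, sorted by name."""
        return sorted({loc.domain for loc in self.shortcuts.values()})


if __name__ == "__main__":
    customer_360 = CompositeDataProduct(
        name="Customer 360",
        owning_domain="Sales",
        shortcuts={
            "customers": Location("Sales", "Workspace A", "Customer360", "customers"),
            "invoices": Location("Finance", "Workspace A", "Finance", "invoices"),
            "telemetry": Location("Engineering", "Workspace B", "ServiceTelemetry", "events"),
        },
    )
    print(customer_360.resolve("invoices"))     # still owned by Finance
    print(customer_360.contributing_domains())  # ['Engineering', 'Finance', 'Sales']
```

The product’s own domain (Sales) curates the set of shortcuts, while each referenced table keeps its original owner, which is the separation the slide describes.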
Interactive Whiteboarding
Interested in learning more?
Reach out to Nathan.Bijnens@microsoft.com or ivanapejeva@microsoft.com

A real time architecture using Hadoop and Storm @ FOSDEM 2013A real time architecture using Hadoop and Storm @ FOSDEM 2013
A real time architecture using Hadoop and Storm @ FOSDEM 2013
 
Getting more out of your big data
Getting more out of your big dataGetting more out of your big data
Getting more out of your big data
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 

Recently uploaded

Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudRadically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
ScyllaDB
 
Discover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched ContentDiscover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched Content
ScyllaDB
 
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
anilsa9823
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
Sease
 
Real-Time Persisted Events at Supercell
Real-Time Persisted Events at  SupercellReal-Time Persisted Events at  Supercell
Real-Time Persisted Events at Supercell
ScyllaDB
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
CTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database MigrationCTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database Migration
ScyllaDB
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
ThousandEyes
 
So You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental DowntimeSo You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental Downtime
ScyllaDB
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
Tracking Millions of Heartbeats on Zee's OTT Platform
Tracking Millions of Heartbeats on Zee's OTT PlatformTracking Millions of Heartbeats on Zee's OTT Platform
Tracking Millions of Heartbeats on Zee's OTT Platform
ScyllaDB
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
Ortus Solutions, Corp
 
Cost-Efficient Stream Processing with RisingWave and ScyllaDB
Cost-Efficient Stream Processing with RisingWave and ScyllaDBCost-Efficient Stream Processing with RisingWave and ScyllaDB
Cost-Efficient Stream Processing with RisingWave and ScyllaDB
ScyllaDB
 
Facilitation Skills - When to Use and Why.pptx
Facilitation Skills - When to Use and Why.pptxFacilitation Skills - When to Use and Why.pptx
Facilitation Skills - When to Use and Why.pptx
Knoldus Inc.
 
ScyllaDB Kubernetes Operator Goes Global
ScyllaDB Kubernetes Operator Goes GlobalScyllaDB Kubernetes Operator Goes Global
ScyllaDB Kubernetes Operator Goes Global
ScyllaDB
 
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
dipikamodels1
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
FilipTomaszewski5
 
Guidelines for Effective Data Visualization
Guidelines for Effective Data VisualizationGuidelines for Effective Data Visualization
Guidelines for Effective Data Visualization
UmmeSalmaM1
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 

Recently uploaded (20)

Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudRadically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
 
Discover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched ContentDiscover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched Content
 
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 
Real-Time Persisted Events at Supercell
Real-Time Persisted Events at  SupercellReal-Time Persisted Events at  Supercell
Real-Time Persisted Events at Supercell
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
CTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database MigrationCTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database Migration
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
 
So You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental DowntimeSo You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental Downtime
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
Tracking Millions of Heartbeats on Zee's OTT Platform
Tracking Millions of Heartbeats on Zee's OTT PlatformTracking Millions of Heartbeats on Zee's OTT Platform
Tracking Millions of Heartbeats on Zee's OTT Platform
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
 
Cost-Efficient Stream Processing with RisingWave and ScyllaDB
Cost-Efficient Stream Processing with RisingWave and ScyllaDBCost-Efficient Stream Processing with RisingWave and ScyllaDB
Cost-Efficient Stream Processing with RisingWave and ScyllaDB
 
Facilitation Skills - When to Use and Why.pptx
Facilitation Skills - When to Use and Why.pptxFacilitation Skills - When to Use and Why.pptx
Facilitation Skills - When to Use and Why.pptx
 
ScyllaDB Kubernetes Operator Goes Global
ScyllaDB Kubernetes Operator Goes GlobalScyllaDB Kubernetes Operator Goes Global
ScyllaDB Kubernetes Operator Goes Global
 
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
Call Girls Kochi 💯Call Us 🔝 7426014248 🔝 Independent Kochi Escorts Service Av...
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
Guidelines for Effective Data Visualization
Guidelines for Effective Data VisualizationGuidelines for Effective Data Visualization
Guidelines for Effective Data Visualization
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 

Data Mesh using Microsoft Fabric

  • 1. Data Mesh in Microsoft Fabric. Nathan Bijnens, Manager, Belux CSU Data Team; Ivana Pejeva, Cloud Solution Architect, Data & AI
  • 2. What we’ve heard. Ideally, organizations want to: spend less time preparing data; have robust data governance; have a platform that turns data into actionable insights for the business; increase the value of hidden data; improve operational efficiency; reduce the cost of data engineering; operationalize data and analytics; enable lines of business; work in a unified ecosystem. Barriers to achieving business outcomes: the need for frictionless data governance; difficulty balancing access and data protection; poor data quality; disparate systems and data silos; moving too slowly from data to decision; project prioritization.
  • 3. Every application that creates data needs, and will have, a database. (Diagram: Application A and Application B.) Consequently, when we have two applications, we assume that each application has its own ‘database’. When the two applications interoperate, we expect data to be transferred from one application to the other. At least in the context of data management, every application that creates data needs, and will have, a database. Even stateless applications that create data have “databases”; in these scenarios the database typically sits in RAM or in a temporary file.
  • 4. We can’t escape from data integration. (Diagram: data integration between Application A and Application B.) Data transformation is always required because an application’s database schema is designed to meet that application’s specific requirements. Since requirements differ from application to application, the schemas are expected to differ, so data integration is always required when moving data between them. Data integration is always right around the corner: whether you do ETL or ELT, virtual or physical, batch or real-time, there’s no escape from the data integration dilemma.
  • 5. Business Drivers: lack of data ownership; lack of data quality; interdependencies that are difficult to see; model conflicts across business concerns; tremendous effort for integration and coordination, which leads to bypasses; business and IT working in silos; a disconnect between data producers and data consumers; the central team becoming the bottleneck; difficulty applying policy and governance; technical dependencies that are hard to see; small changes becoming risky due to unexpected consequences; technical ownership rather than data ownership. Many enterprises are saddled with outdated data architectures that do not scale to the needs of large, multidisciplinary organizations.
  • 6. Problems with Existing Architectures. There’s a deep assumption that centralization is the solution to data management. This includes centralizing all data and management activities in one central team, building one data platform, using one ETL framework, using one canonical model, etc. Centralized architecture: a single team with centralized knowledge and book of work; centralized pipelines for all extraction and ingestion activities; centralized transformations to create harmonized data; a central platform that serves as a large integration database, where all execution and analysis is done on the same platform. (Diagram: transactional sources (data providers) feed a central engineering team, which serves analytical consumers (data consumers).)
  • 7. Data as a Product
  • 8. Data as a Product. Data is no longer a side effect; it’s a product. Who are my "customers"? What do my "customers" need? Are they happy with the data? Are they using it? How do I let my "customers" know my data exists? What is in it for the "customer"?
  • 9. Data Product Owner. (Diagram: a domain team made up of a Data Product Owner, Data Engineer, Software Developer and Infra Engineer.) Source: Zhamak Dehghani, “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” (martinfowler.com).
  • 10. Data Product Properties. Discoverable: overview of the product in a central data catalog; easy discoverability. Addressable: help users access the product programmatically. Trustworthy: Data Product Owners provide monitored SLOs; data is cleansed and up to standard. Self-describing: minimal friction for data engineers and scientists to use the data. Interoperable: open standards for harmonization; field type formatting. Secure: access control policies; use of SSO and RBAC. Source: Zhamak Dehghani, “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” (martinfowler.com).
  • 12. Data Mesh. Data Mesh is a new, decentralized socio-technical approach to managing data, designed to work with organizational complexity and continuous growth. It enables large organizations to get value from their data at scale through reusability, analytics and ML. It builds on the Domain-Driven Design methodology. (Diagram: Domain-Driven Design underpins domain zones, which produce data products that are consumed by other domains.) Source: Zhamak Dehghani, “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” (martinfowler.com).
  • 13. Centralized implementation is not working! (Diagram: Engineering, Finance, HR, Marketing, Innovation and Operations all served by one centralized platform.) The LOBs are the SMEs, and the shared services team is not able to cope with the projects. Datasets sprawl. There are competing needs within the organization: IT needs to standardize, while LOBs need to implement analytics. Primitive data strategy.
  • 14. Introduction to Data Domains. (Diagram: the Marketing, Customer Services and Order Management domains, each with operational systems, integration services and data products such as Search Keywords, Promotions, Top Selling Products, Orders and Customer Profiles.) • A domain is a collection of people, typically organized around a common business purpose. • Domains create and serve data products to other domains and end users, independently of other domains. • They ensure data is accessible, usable and available, and that it meets the defined quality criteria. • They evolve data products based on user feedback and retire data products when they become irrelevant.
  • 15. Domain Zones. (Diagram, Microsoft Enterprise Data Mesh: a management zone alongside the Engineering, Finance, HR, Innovation, Marketing and Operations data domains, each producing data products.)
  • 16. Domain Zone. A domain zone is an environment for each LOB. LOBs implement data services (e.g. Exploration Service, Data Order System). LOBs build and share data products (e.g. Sales Forecast, Clean Room Performance). Domain zones are automated using templates (security, integration, monitoring, etc.).
  • 17. Domain Architecture. Enterprise requirements: Security & Privacy; Governance & Compliance; Availability & Recovery; Performance & Scalability; Skills & Training; Usage & Cost Management; Observation & Monitoring.
  • 20. Modern Analytics and Governance at Scale: Microsoft’s hybrid approach to data mesh, data fabric and data hub. (Diagram: an open and governed data lakehouse foundation providing automated data services, data management and data operationalization; data governance, security and compliance span the data engineering, real-time, ML & AI, SQL-based analytics and enterprise BI workloads, which produce data products.)
  • 21. Modern Analytics and Governance at Scale: Microsoft’s hybrid approach to data mesh, data fabric and data hub. (Diagram: an open and governed data lakehouse foundation with automated data services, data management and data operationalization, plus data governance, security and compliance, serving the HR, Innovation, Engineering, Operations, Finance and Marketing domains and their data products.)
  • 22. Modern Analytics and Governance at Scale: Unifying the Domains. (Diagram: domains, each with its own data services and data products, on an open and governed data lakehouse foundation with shared governance, automated services and data management; workloads include data engineering, real-time analytics, ML, AI & data science, SQL-based analytics and enterprise BI.)
  • 23. Modern Analytics and Governance: Unifying the Domains. (Diagram: the Finance, Marketing and Operations lines of business and a shared layer on an open and governed data lakehouse foundation, with governance, raw/conformed data and data products.) • Self-serve analytics • Empower LOBs to implement their own analytics projects • Democratize data and analytics across LOBs • Accelerate cross-business-unit collaboration • Leverage LOB SMEs for business analytics • Reuse data products across domains • Reduce data engineering • Improve data agility
  • 24. (Diagram: domains built on Azure Databricks, MS Fabric, or both, all on OneLake, which combines internal storage (SaaS) with shortcuts to existing data lakes.)
  • 25. MS Fabric implementation. (Diagram: an IT or shared services team with Data Factory, Azure Databricks and Data Flow; external, on-premises and IoT Hub data feeds; a data lake with raw, curated and publish layers; MS Fabric workspaces 1 to 3 on capacities; the Marketing domain; OneLake.)
  • 26. MS Fabric implementation. (Diagram: OneLake internal storage holding HR (Lakehouse 1, Lakehouse 2), Finance (Warehouse 1, Warehouse 2, Lakehouse 1), Innovation (Lakehouse 1, Warehouse 1) and Engineering (Lakehouse 1, Lakehouse 2); Operations and Marketing consume data through Power BI Datamart and Power BI Direct Lake.)
  • 28. OneLake for all domains. OneLake gives a true data mesh as a service. Domains are an integral part of Fabric: a domain is a way to logically group together all the data in an organization relevant to an area or field, according to business needs. Domains are defined with domain admins and contributors, who can associate workspaces with a relevant domain and group them together. Federated governance can be achieved by delegating settings to domain admins, giving them more granular control over their business area. Domains simplify discovery and consumption of data across the organization, enabling business-optimized consumption. Avoid data swamps by endorsing certain data as certified or promoted, which encourages reuse. (Diagram: unified management and governance across the Sales, Marketing and Finance domains, grouping the workspaces POS sales (certified), online sales (certified), customer, ads (promoted) and expenses; Fabric workloads: Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real Time Analytics, Power BI, Data Activator.)
  • 29. Shortcuts virtualize data across domains and clouds, with no data movement or duplication. A shortcut is a symbolic link that points from one data location to another. Create a shortcut to make data from a warehouse part of your lakehouse. Create a shortcut within Fabric to consolidate data across items or workspaces without changing the ownership of the data; data can be reused multiple times without duplication. Existing ADLS Gen2 storage accounts and Amazon S3 buckets can be managed externally to Fabric and Microsoft while still being virtualized into OneLake with shortcuts. All data is mapped to a unified namespace and can be accessed using the same APIs, including the ADLS Gen2 DFS APIs. (Diagram: unified management and governance; Workspace A with a Finance warehouse and a Customer 360 lakehouse; Workspace B with a Service telemetry lakehouse and a Business KPIs warehouse; Amazon and Azure storage; Fabric workloads: Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real Time Analytics, Power BI, Data Activator.)
  • 30. OneLake gives a true data mesh as a service. One Copy enables data to be used across domains, clouds and engines. An organization will have many data domains, with many workspaces and different data owners. However, a single data product can span multiple domains. Shortcuts provide the connections between domains, so data can be virtualized into a single data product without data movement, data duplication or changing the ownership of the data. (Diagram: unified management and governance across the Marketing, Operations, Finance, Engineering, Sales, HR and Innovation domains; Fabric workloads: Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real Time Analytics, Power BI, Data Activator.)
  • 32. Interested in learning more? Reach out to Nathan.Bijnens@microsoft.com or ivanapejeva@microsoft.com.
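Slide 28 describes Fabric domains in functional terms. Admins and contributors associate workspaces with a domain, domain admins receive delegated settings, and endorsement (certified or promoted) steers consumers towards reusable data. The Python sketch below models those rules in memory, under stated assumptions. Every class and function name in it (Domain, Workspace, discover, set_setting) is hypothetical and is not the Fabric admin API. The slide draws endorsement at the workspace level, whereas Fabric itself endorses individual items, so treat the model as a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Endorsement(IntEnum):
    """Endorsement levels named on slide 28; integer order allows 'at least' filters."""
    NONE = 0
    PROMOTED = 1
    CERTIFIED = 2


@dataclass
class Workspace:
    name: str
    endorsement: Endorsement = Endorsement.NONE  # simplification: per workspace, not per item


@dataclass
class Domain:
    """Toy model of a domain; hypothetical, not the Fabric admin API."""
    name: str
    admins: set[str]
    contributors: set[str] = field(default_factory=set)
    workspaces: list[Workspace] = field(default_factory=list)
    settings: dict[str, str] = field(default_factory=dict)  # settings delegated to this domain

    def can_associate(self, user: str) -> bool:
        # Per the slide, both domain admins and contributors can associate workspaces.
        return user in self.admins or user in self.contributors

    def associate(self, user: str, ws: Workspace) -> None:
        if not self.can_associate(user):
            raise PermissionError(f"{user} cannot associate workspaces with {self.name}")
        self.workspaces.append(ws)

    def set_setting(self, user: str, key: str, value: str) -> None:
        # Federated governance: only domain admins change delegated settings.
        if user not in self.admins:
            raise PermissionError(f"{user} is not an admin of {self.name}")
        self.settings[key] = value


def discover(domains: list[Domain], domain: str | None = None,
             at_least: Endorsement = Endorsement.NONE) -> list[tuple[str, str]]:
    """List (domain, workspace) pairs, filtered by domain and minimum endorsement."""
    return [
        (d.name, ws.name)
        for d in domains
        if domain is None or d.name == domain
        for ws in d.workspaces
        if ws.endorsement >= at_least
    ]
```

Filtering with `at_least=Endorsement.PROMOTED` mirrors the idea of steering consumers towards curated data. The `domain=` filter corresponds to finding data items by domain, as described for the data hub in the editor's notes.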

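The shortcut mechanism on slides 29 and 30 behaves like a symbolic link in a unified namespace. Reading through a shortcut resolves to the data where its owner keeps it, and nothing is copied. The Python sketch below models that behaviour with a hypothetical `Namespace` class (not a Fabric API) and adds a small helper for OneLake DFS URLs. The URL shape `https://onelake.dfs.fabric.microsoft.com/<workspace>/<item>.<ItemType>/<path>` is an assumption based on OneLake's ADLS Gen2-compatible addressing and should be checked against the OneLake documentation. The workspace and item names are made up.

```python
from __future__ import annotations

# Assumed OneLake DFS endpoint; path layout assumed to be <workspace>/<item>.<ItemType>/<path>.
ONELAKE_DFS = "https://onelake.dfs.fabric.microsoft.com"


def onelake_url(workspace: str, item: str, item_type: str, rel_path: str) -> str:
    """Build a OneLake DFS URL for a path inside a Fabric item (no URL-encoding done)."""
    return f"{ONELAKE_DFS}/{workspace}/{item}.{item_type}/{rel_path}"


class Namespace:
    """Toy unified namespace: a shortcut is a pointer to a target path, never a copy."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}    # physical path -> bytes, stored once by the owner
        self.shortcuts: dict[str, str] = {}  # shortcut path -> target path

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def create_shortcut(self, path: str, target: str) -> None:
        # Only the pointer is stored; the data stays where its owner wrote it.
        self.shortcuts[path] = target

    def resolve(self, path: str, max_hops: int = 8) -> str:
        """Rewrite shortcut prefixes until the path no longer goes through a shortcut."""
        for _ in range(max_hops):
            for link, target in self.shortcuts.items():
                if path == link or path.startswith(link + "/"):
                    path = target + path[len(link):]
                    break
            else:
                return path  # no shortcut matched: this is a physical path
        raise RuntimeError("too many shortcut hops; is there a cycle?")

    def read(self, path: str) -> bytes:
        return self.files[self.resolve(path)]
```

Because a shortcut stores only a target path, `ns.files` keeps a single entry no matter how many domains link to the same table. In miniature, that is the "one copy" property the slides describe.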
Editor's Notes

  1. Before we dive deeper, I want to run very quickly through some basic assumptions that frame any architecture. The first assumption is that every application that processes data needs some type of data persistence. Second, applications are used to solve specific problems. Applications are unique, and so is their data. This is because there are several stages to the design and development of applications. You always start with conceptual thinking and design; then you translate your knowledge into a logical application data model, which is an abstract structure of conceptual information and requirements. Finally, you create the physical application data model: the actual design of the application and database. The physical data model is unique and incorporates both the context and the non-functional requirements for how the application and database will be designed and used.
  2. These unique designs lead to another problem from which we can’t escape: the data integration that is always around the corner when moving data across applications. There’s no escape from this dilemma, and it doesn’t matter whether you do ETL or ELT, virtual or physical, batch or real-time. This problem is always there. Any architecture is framed by these constraints.
  3. As an architect, I can tell you that the world is heading towards distributed data at large scale. Several trends are fragmenting the data landscape, some of which you see on the screen. The first trend is an explosion of analytical tools and ways in which you can process and use your data; as a consequence, the same data ends up everywhere. A second trend is cloud, services and API connectivity, which pushes data usage and distribution even further. At the same time, we need to be very much in control of our data because of stronger regulation such as GDPR and BCBS. Next, there is a trend of increased compute power, which allows us to quickly move data across platforms and locations. These trends of data distribution at scale will make data volumes grow exponentially. Lastly, the read/write ratio is changing. We no longer use transactional systems only to store and process data for transactional purposes; they also need to serve large volumes of data on demand, which can be challenging.
  4. Zhamak: DDD and its organizational aspects. Domain zones, and how they are independent and enabled. Within a domain zone you create data products, which can then be consumed by other domains. Together, this creates a data mesh.
  5. Empathize with today’s pain points: the architectural and organizational challenges of becoming data-driven, using data to compete, or using data at scale to drive value.
  6. The data management landing zone has a management function and is responsible for the governance of your analytics platform. It is responsible for the following: data catalog, data quality management, data security and privacy, and data governance.
  7. Zoom in on one domain zone. In a data mesh, a domain zone is a way to define boundaries around your enterprise data. Domains can vary depending on your organization; in some cases, you might want to define domains based on your line of business (LOB). According to Microsoft’s Cloud Adoption Framework, these are some best practices to follow: use automation to create domain zones and ensure that they are consistent across your organization; implement data services that are specific to each domain zone; build data products that are specific to each domain zone; and share data products across domain zones to promote reuse and collaboration.
  8. Microsoft Purview provides a unified data governance solution to help manage and govern your on-premises, multicloud, and software as a service (SaaS) data. Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, real-time analytics, and business intelligence. It provides a way to organize data into domains: a domain is a way to logically group together all the data in an organization that is relevant to a particular area or field. OneLake is a single, unified, logical data lake for your whole organization. It brings customers one data lake for the entire organization, one copy of data for use with multiple analytical engines, and the ability to organize and manage data in a logical way, allowing different business groups to efficiently operate and control their own data.
  9. Challenges of the lego block architecture: it is too complex. Clients, partners and every cloud provider are pushing to build an end-to-end data and analytics ecosystem using the “lego block” approach. The approach is too complicated to implement at scale, requiring different skills to ensure proper design and deployment (integration, security, networking, governance, etc.). The challenge Microsoft is solving is how to simplify this implementation and make it easier for our clients while ensuring all the enterprise requirements are met. Microsoft’s answer is the Analytics Continuum, the strategy and vision we’re executing on. Every standalone component of this architecture has six enterprise needs that must be met. In the architecture shown, including the cloud service, that could mean 36 points of failure, inefficiency or cost. Every additional cloud platform also incurs its own burden.
  10. Data domain & data mesh. Enterprise Scale for Analytics (Data Management and Analytics Scenario): a Microsoft framework available in the public documentation, with guidance, best practices and deployment templates.
  11. MS Fabric gives each domain the flexibility to build its own data products. Bottom to top: operational data sources (e.g. Cosmos DB) -> Fabric mirroring, which makes them accessible in the data hub -> new data products can be built -> which can be used in other domains. Microsoft’s hybrid approach to data mesh, data fabric and data hub is based on the idea of combining the best features of each concept to create a data platform that is decentralized, scalable, and accessible. Microsoft Fabric’s data mesh architecture supports the data mesh principle by allowing data to be grouped into domains based on different business areas, such as marketing, sales, human resources, etc. Each domain has its own data owners, contributors, and governance rules, enabling decentralized data management and autonomy. Microsoft Fabric also provides a OneLake data hub that makes it easy to find, explore, and use the data items in the organization that the user has access to. The data hub provides a filterable list of all the data items, a gallery of recommended data items, a way of finding data items by workspace or domain, and an options menu of things the user can do with a data item. The data hub also integrates with various data sources and services, such as Azure Synapse Analytics, Azure Data Factory, Azure Purview, and Power BI, to enable data ingestion, transformation, analysis, and visualization. The hybrid approach also leverages data fabric technology to enable data integration, orchestration, and processing across different data sources and platforms. The hybrid data fabric and data mesh framework can help organizations design a data platform that can handle complex data scenarios, such as data streaming, data lake, data warehouse, data virtualization, data catalog, and data governance. The hybrid framework can also support various data products that can benefit from both data fabric technology and data mesh principles, such as data quality, data lineage, data security, data privacy, and data discovery. The hybrid approach aims to create a data platform that is flexible, agile, and adaptable to the changing data needs and requirements of the organization.
  13. The open data lakehouse can be used as the technical foundation for data mesh. Data mesh aims to enable domains (often manifesting as business units in an enterprise) to use best-of-breed technologies to support their use cases.
  14. OneSecurity uses a layered security model built around the organizational structure of experiences within Microsoft Fabric, such as OneLake, Warehouse, Real-Time Analytics, and Power BI semantic models. OneSecurity lets you manage security at different levels: workspace, item, and compute-specific security.
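To make the three levels concrete, here is a simplified sketch of how workspace, item and compute-specific checks can stack. It is an assumption-laden illustration, not how OneSecurity is implemented: `LayeredSecurity`, the user names and the row-filter mechanism are hypothetical. A workspace role opens every item in the workspace, an item grant opens one item, and a compute-level rule (here a row filter) further narrows what a query returns.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, str]


@dataclass
class LayeredSecurity:
    """Simplified, hypothetical model of workspace, item and compute-level security."""

    workspace_roles: Dict[Tuple[str, str], str] = field(default_factory=dict)  # (user, workspace) -> role
    item_grants: Set[Tuple[str, str]] = field(default_factory=set)  # (user, item)
    row_filters: Dict[Tuple[str, str], Callable[[Row], bool]] = field(default_factory=dict)

    def can_read(self, user: str, workspace: str, item: str) -> bool:
        # Layer 1: a workspace role gives access to the items in that workspace.
        if (user, workspace) in self.workspace_roles:
            return True
        # Layer 2: an item-level grant gives access to that single item only.
        return (user, item) in self.item_grants

    def query(self, user: str, workspace: str, item: str, rows: List[Row]) -> List[Row]:
        if not self.can_read(user, workspace, item):
            raise PermissionError(f"{user} cannot read {item}")
        # Layer 3: compute-specific security, e.g. a row filter applied by a SQL engine.
        keep = self.row_filters.get((user, item), lambda row: True)
        return [row for row in rows if keep(row)]


sales = [{"region": "BE", "amount": "100"}, {"region": "NL", "amount": "250"}]
sec = LayeredSecurity()
sec.workspace_roles[("analyst", "sales-ws")] = "Viewer"
sec.item_grants.add(("partner", "sales_wh"))
sec.row_filters[("partner", "sales_wh")] = lambda row: row["region"] == "BE"
```

With this setup the analyst (workspace role) sees both rows, the partner (item grant plus row filter) sees only the Belgian row, and anyone else is refused.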
  15. Domains are an integral part of Fabric. They are defined with domain admins and contributors, who can associate workspaces with a relevant domain and group them under it. Federated governance is achieved by delegating settings to domain admins, giving them more granular control over their business area. Domains also simplify discovery and consumption of data across the organization, allowing business-optimized consumption.
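Delegated settings are the mechanism behind federated governance here: the tenant sets defaults, and domain admins may override only what has been delegated to them. The sketch below models that rule in plain Python; `Domain`, `TENANT_SETTINGS`, `DELEGATED` and the setting names are hypothetical, and the permission rules are deliberately simplified compared with the real admin portal.

```python
from typing import Dict, Set

# Hypothetical tenant-wide defaults and the subset of settings delegated to domain admins.
TENANT_SETTINGS: Dict[str, bool] = {"allow_export": True, "certification_enabled": False}
DELEGATED: Set[str] = {"certification_enabled"}


class Domain:
    def __init__(self, name: str, admins: Set[str], contributors: Set[str]) -> None:
        self.name = name
        self.admins = admins
        self.contributors = contributors
        self.workspaces: Set[str] = set()
        self.overrides: Dict[str, bool] = {}

    def assign_workspace(self, user: str, workspace: str) -> None:
        # Domain admins and contributors can group workspaces under the domain.
        if user not in self.admins | self.contributors:
            raise PermissionError(f"{user} cannot assign workspaces to {self.name}")
        self.workspaces.add(workspace)

    def set_setting(self, user: str, key: str, value: bool) -> None:
        # Only domain admins may change a setting, and only if the tenant delegated it.
        if user not in self.admins:
            raise PermissionError(f"{user} is not an admin of {self.name}")
        if key not in DELEGATED:
            raise PermissionError(f"{key} is not delegated to domains")
        self.overrides[key] = value

    def effective(self, key: str) -> bool:
        # A domain override wins; otherwise the tenant default applies.
        return self.overrides.get(key, TENANT_SETTINGS[key])


finance = Domain("Finance", admins={"fin_admin"}, contributors={"fin_lead"})
finance.assign_workspace("fin_lead", "finance-reporting")
finance.set_setting("fin_admin", "certification_enabled", True)
```

Keeping the override lookup in one `effective` method means a domain without overrides automatically inherits every tenant default.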
  16. Tenant -> Domain -> Workspace. Different business groups can now work independently within the same data lake without the overhead of managing separate storage resources, which already lets them implement the popular data mesh pattern more efficiently than before. OneLake takes this further by introducing domains as a first-class concept. A single business domain may have multiple workspaces, since workspaces tend to align with specific projects or teams. A domain is a way of logically grouping together all the data in an organization that is relevant to a particular area or field. Domains are defined with domain admins and contributors, who can logically group workspaces under those domains. Domains provide a management boundary between the tenant and the workspace, giving admins more granular control over multiple workspaces. As you will see later, domains also simplify discovery and consumption of data across the entire organization. Now that different parts of the organization can easily work on the same data lake without going through a central gatekeeper, you might want to block certain users from adding to the lake. If anyone can add to the data lake, it can quickly become a data swamp, with data from official sources mixed with data from unofficial ones. The problem with blocking users from OneLake is that they will simply create another data lake somewhere else. When they do, you will have no idea whether that data is properly governed or even how it is being used. If they add their data to OneLake instead, it is automatically governed and stays under the control of the admins, who gain more and more insight into how that data is used. You can avoid data swamps in OneLake through data endorsements: domain owners can officially certify or recommend data so that the important data rises to the surface while the rest sinks to the bottom.
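The Tenant -> Domain -> Workspace hierarchy and the "important data rises to the surface" idea can be shown together in a few lines. This is an illustrative sketch only: the nested `TENANT` dictionary, the item names and the ordering rule (certified first, then promoted, then everything else) are assumptions chosen for the example, not the data hub's actual ranking logic.

```python
from typing import Dict, List, Optional

# Hypothetical Tenant -> Domain -> Workspace -> items hierarchy; each item carries an
# optional endorsement ("Certified", "Promoted" or None).
TENANT: Dict[str, Dict[str, Dict[str, Optional[str]]]] = {
    "Sales": {
        "sales-forecasting": {"forecast_v2": "Promoted", "scratch_copy": None},
        "sales-core": {"orders": "Certified"},
    },
    "HR": {
        "hr-people": {"headcount": "Certified"},
    },
}

# Lower rank surfaces first.
RANK = {"Certified": 0, "Promoted": 1, None: 2}


def surface(domain: str) -> List[str]:
    """List a domain's items across all of its workspaces, endorsed items first."""
    items = [
        (endorsement, name)
        for workspace in TENANT[domain].values()
        for name, endorsement in workspace.items()
    ]
    ordered = sorted(items, key=lambda entry: (RANK[entry[0]], entry[1]))
    return [name for _, name in ordered]
```

For the Sales domain this returns the certified `orders` table first and the unendorsed `scratch_copy` last, even though they live in different workspaces.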
  17. Think of OneLake as an abstraction layer. You can mount existing ADLS Gen2 storage to it, virtualizing data across many storage accounts while maintaining a single namespace. A shortcut is nothing more than a symbolic link that points from one data location to another. Just like shortcuts in Windows or symbolic links in Linux, the data appears in the shortcut location as if it were physically there. Today, if you have tables in a data warehouse that you want to make available alongside other tables or files in a lakehouse, you need to copy that data out of the warehouse. With OneLake, you simply create a shortcut in the lakehouse pointing to the warehouse. The data appears in your lakehouse as if you had physically copied it. Because you didn’t copy it, when data is updated in the warehouse, the changes are automatically reflected in the lakehouse. You can also use shortcuts to consolidate data across workspaces and domains without changing the ownership of the data. In this example, workspace B still owns the data: it still has ultimate control over who can access it and how it is kept up to date. Many of you already have existing data lakes stored in ADLS Gen2 or in Amazon S3 buckets. These lakes can continue to exist and be managed outside Fabric. We have extended shortcuts to include lakes outside OneLake, and even outside Azure, so that you can virtualize your existing ADLS Gen2 accounts or Amazon S3 buckets into OneLake. All data is mapped to the same unified namespace and can be accessed using the same ADLS Gen2 APIs, even when it comes from S3.
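The symbolic-link analogy can be made concrete with a toy namespace in which a shortcut is just a pointer that is followed at read time. This is a minimal sketch, not OneLake's implementation: `Namespace`, the path strings and the hop limit are hypothetical, and writes are restricted to the owning location as a simplifying assumption. It shows the two properties the notes describe: no copy is made, and updates at the source are visible through the shortcut.

```python
from typing import Dict, List


class Namespace:
    """Toy single namespace: physical data plus shortcuts (symbolic links) to it."""

    def __init__(self) -> None:
        self.storage: Dict[str, List[dict]] = {}  # physical path -> rows
        self.shortcuts: Dict[str, str] = {}  # shortcut path -> target path

    def write(self, path: str, rows: List[dict]) -> None:
        # Simplification: data is written only at its owning (physical) location.
        if path in self.shortcuts:
            raise ValueError(f"{path} is a shortcut; write to the owning location instead")
        self.storage[path] = rows

    def create_shortcut(self, path: str, target: str) -> None:
        self.shortcuts[path] = target

    def resolve(self, path: str, max_hops: int = 10) -> str:
        # Follow shortcuts until a physical location is reached; guard against cycles.
        hops = 0
        while path in self.shortcuts:
            if hops == max_hops:
                raise RuntimeError("too many shortcut hops (cycle?)")
            path = self.shortcuts[path]
            hops += 1
        return path

    def read(self, path: str) -> List[dict]:
        return self.storage[self.resolve(path)]


ns = Namespace()
warehouse_table = "workspaceB/SalesDW.Warehouse/Tables/orders"  # owned by workspace B
lakehouse_table = "workspaceA/Analytics.Lakehouse/Tables/orders"  # shortcut in workspace A
ns.write(warehouse_table, [{"order_id": 1}])
ns.create_shortcut(lakehouse_table, warehouse_table)
# Workspace B updates its table; the lakehouse sees the change without any copy.
ns.write(warehouse_table, [{"order_id": 1}, {"order_id": 2}])
```

After the update, reading `lakehouse_table` returns both orders, and `ns.storage` still holds a single physical copy owned by workspace B.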
  18. If we zoom out, we can see all these domains in OneLake. To get a 360-degree view of your business, a single data item, or data product in data mesh terms, will need to span multiple domains. Shortcuts provide the connections between domains, so that data can be virtualized into a single data product without data movement, data duplication, or changes to data ownership.
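A cross-domain data product built from shortcuts can be sketched as a read-time combination of tables that stay where their owning domains keep them. Everything here is hypothetical (`DOMAIN_TABLES`, `Shortcut`, `customer_360`, the sample rows); the example only illustrates the claim that the 360-degree view is assembled without copying or re-owning the source data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical per-domain tables; each domain keeps ownership of what it stores here.
DOMAIN_TABLES: Dict[Tuple[str, str], List[dict]] = {
    ("Sales", "customers"): [
        {"customer_id": 1, "name": "Contoso"},
        {"customer_id": 2, "name": "Fabrikam"},
    ],
    ("CustomerService", "tickets"): [
        {"customer_id": 1, "status": "open"},
        {"customer_id": 1, "status": "open"},
        {"customer_id": 2, "status": "closed"},
    ],
}


@dataclass(frozen=True)
class Shortcut:
    """Reference to another domain's table: reading it neither copies nor re-owns the data."""

    domain: str
    table: str

    def read(self) -> List[dict]:
        return DOMAIN_TABLES[(self.domain, self.table)]


def customer_360(customers: Shortcut, tickets: Shortcut) -> List[dict]:
    # Combine data from two domains at read time into one data product.
    open_tickets = Counter(t["customer_id"] for t in tickets.read() if t["status"] == "open")
    # Build new dicts so the source rows in each domain stay untouched.
    return [{**c, "open_tickets": open_tickets[c["customer_id"]]} for c in customers.read()]


view = customer_360(Shortcut("Sales", "customers"), Shortcut("CustomerService", "tickets"))
```

The resulting view reports two open tickets for Contoso and none for Fabrikam, while the Sales and CustomerService tables themselves are unchanged.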
  19. Hello and welcome to this demo of Microsoft Fabric, the AI-powered analytics platform that helps you bring your data into the era of AI. In this demo, we will show you how Fabric lets you leverage the power of Data Mesh, a decentralized data architecture that organizes data by business domain and gives more ownership to the data producers. Data Mesh is a concept introduced by Zhamak Dehghani in 2019[1] and is based on four principles: domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance[2][3]. These principles aim to address the challenges of centralized, monolithic data structures, such as data accessibility, quality, and organization. With Fabric, you can implement Data Mesh in your organization by following these steps: Identify your business domains and the data producers and consumers for each domain. For example, you may have domains such as marketing, sales, customer service, and finance, each with its own data sources, pipelines, and analytics needs. Empower your domain teams to take responsibility for their data and treat it as a product. This means that the domain teams design, build, and run their own data products, APIs, and services, using the Fabric tools and services that suit their needs. For example, they can use OneLake to create and manage their lakehouses, Synapse to perform data engineering and data science, Power BI to create and share dashboards and reports, and Data Factory to orchestrate data movement and transformation. Enable self-service data access and discovery across domains by using Fabric’s discovery and metadata features, such as the OneLake data hub.
This allows the domain teams to document and expose their data products to other domains, and to consume data products from other domains, using standard protocols and formats. For example, they can use Data Activator to detect conditions in their data and trigger actions automatically, or use the OneLake data hub to search and browse data products from different domains. Establish federated governance and compliance policies for your data mesh by using Fabric’s data security and governance features. This ensures that the data products are reliable, consistent, and trustworthy, and that data consumers have the appropriate permissions and usage rights. For example, they can apply Microsoft Purview sensitivity labels to protect their data, use lineage view to trace where data comes from and where it is used, and use endorsement to promote or certify trusted data products. By following these steps, you can create a data mesh architecture that leverages the benefits of Fabric’s unified data foundation, role-tailored tools, AI-powered capabilities, and open, governed foundation. With Fabric and Data Mesh, you can reshape how your entire team uses data and drive innovation and growth for your business. Thank you for watching this demo of Microsoft Fabric and Data Mesh. If you want to learn more, please visit our website[4] or sign up for a free trial[5].
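Federated computational governance means the same organization-wide rules are applied automatically to every domain’s data products, while the domains still own the products themselves. The sketch below expresses such rules as a plain function over a product descriptor. The `DataProduct` fields, the specific rules and the label names are hypothetical choices for illustration; they are not a Fabric or Purview API.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_ENDORSEMENTS = {None, "Promoted", "Certified"}


@dataclass
class DataProduct:
    name: str
    domain: str
    owner: Optional[str]
    description: str
    sensitivity_label: Optional[str]  # e.g. the name of a sensitivity label
    endorsement: Optional[str] = None


def check_policies(product: DataProduct) -> List[str]:
    """Hypothetical organization-wide rules, run identically for every domain's products."""
    violations = []
    if not product.owner:
        violations.append("missing owner")
    if not product.description.strip():
        violations.append("missing description")
    if product.sensitivity_label is None:
        violations.append("missing sensitivity label")
    if product.endorsement not in ALLOWED_ENDORSEMENTS:
        violations.append(f"unknown endorsement {product.endorsement!r}")
    return violations


good = DataProduct("customer_orders", "Sales", "sales-team", "Orders per customer", "Confidential", "Certified")
bad = DataProduct("tmp_extract", "Marketing", None, "Ad-hoc extract", None)
```

A domain can publish `good` as is, while `bad` would be flagged for a missing owner and a missing sensitivity label before it is exposed to other domains.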