FREE TO SHARE
Sheetal Pratik - Saxo Bank
August 2020
Data Workbench
About Saxo
We are leading fintech and regtech specialists, connecting traders, investors and partners to more than 35,000 instruments, across all asset classes, from a single account.
What we do
We build digital platforms to facilitate multi-asset market access and provide clients of all sizes with professional-grade tools, industry-leading prices and best-in-class service.
Data For Scale: Transforming Data Access, Data Governance and Data Quality
DATA GOVERNANCE FOR A DIGITAL NATIVE
• A data-driven organization needs multi-level Data Governance. Most tools are designed to fix data after the fact, e.g. before a data warehouse load. What is needed is to ensure data integrity at the origin, to prevent a “butterfly effect” in the downstream systems.
• The article “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” emphasizes how a data platform with a centralized architecture can fail by becoming a bottleneck, with an impact on stability. Also, when ownership of data sits at the domain level, managing the data dictionary centrally tends to fail or to duplicate the effort of creating and maintaining such data assets.
• Considering this, the solution has to be forward-looking, and a straight implementation of any of the COTS products for Data Cataloguing might not be the right answer to Saxo’s Data Governance implementation.
• The preferred strategy for tooling is to fix forward, rather than attempting to fix the past by using some kind of crawler and ML to extract the metadata from various data sources.
VISION
For Domain Teams
Who need visibility on the availability, meaning, usage, ownership and quality of data
The Data Workbench (Owner's pride, Neighbor's envy)
Is a one-stop data shop
That provides transparency of Saxo’s data ecosystem
Unlike our current state, which is becoming increasingly complex as we grow
Our product will help Saxo to improve time to market and unlock new insights.
The Data Workbench is designed to be part of the new data architecture. It consists of two main components: a Data Catalogue and a Data Quality Solution.
1. The Data Catalogue captures and exposes metadata. This provides transparency into the meaning and ownership of our data. The Data Catalogue is built on DataHub, a data catalogue open-sourced by LinkedIn. LinkedIn is very supportive and is working closely with us, helping with the adoption of the tool.
2. The Data Quality Solution is built on the open-source solution Great Expectations, supported by Superconductive. Great Expectations is a declarative, flexible and extensible data quality solution. It allows teams to define data quality rules and actively monitor the quality of their data.
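To make "declarative data quality rules" concrete, here is a minimal sketch in plain Python. It does not use the Great Expectations API. The `Rule` type, the `not_null` and `between` helpers, and the column names are all hypothetical and only illustrate the idea of a domain team declaring rules once and checking its data against them.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A declarative data quality rule: a name, a column and a per-value check."""
    name: str
    column: str
    check: Callable[[Any], bool]


def not_null(column: str) -> Rule:
    # Hypothetical helper: the value must be present.
    return Rule(f"{column} is not null", column, lambda v: v is not None)


def between(column: str, low: float, high: float) -> Rule:
    # Hypothetical helper: the value must be present and within [low, high].
    return Rule(
        f"{column} between {low} and {high}",
        column,
        lambda v: v is not None and low <= v <= high,
    )


def validate(rows: list[Row], rules: list[Rule]) -> dict[str, int]:
    """Return the number of failing rows per rule."""
    return {
        rule.name: sum(1 for row in rows if not rule.check(row.get(rule.column)))
        for rule in rules
    }


# Illustrative rules a prices domain team might declare for its own dataset.
PRICE_RULES = [not_null("instrument_id"), between("price", 0.0, 1_000_000.0)]

sample = [
    {"instrument_id": "EURUSD", "price": 1.18},
    {"instrument_id": None, "price": 1.19},     # fails the not-null rule
    {"instrument_id": "USDJPY", "price": -5.0},  # fails the range rule
]
report = validate(sample, PRICE_RULES)
```

Keeping the rules as data, separate from the code that evaluates them, is what lets a team monitor the same rules continuously as new data arrives.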
MOTIVATION FOR THE SOLUTION
• The federated Data Governance model is an industry trend in which the enterprise governance team facilitates the monitoring and management of the quality of enterprise-critical data, with assistance from the business units.
• LinkedIn’s shift from its initial Data Governance solution, WhereHows, to DataHub is a typical example of the paradigm shift from “a central metadata repository” to a more decentralized architecture that puts domains first, to support a self-service data platform.
• We realized that a practical way to implement this would be to stay lean and agile and work iteratively with data domains while establishing the Data Governance framework, and thus create a platform that is self-service, scalable and more relevant to stakeholders.
• We had a discussion with LinkedIn to understand their journey and the lessons learnt that motivated them to evolve from WhereHows to DataHub. We recognized that Saxo Bank is on a similar journey and that we can fast-forward the implementation by adopting the open-source DataHub, which relates best to the ecosystem of Saxo Bank.
• The LinkedIn DataHub team has been extremely responsive.
• Other digital natives have also recognized that the incumbent solutions are not fit for the modern age and have built their own specific solutions.
PERSONAS: GOALS AND PAIN POINTS
Personas
● Data Asset Owner
● Data Steward
● Data Governance Committee Member
● Data Scientist
● Data Consumer (Reporting)
Goals
● To find data, its owner and anything else that helps compliance
● To solve business problems based on data
● To get an overview of all data in the organization
● To define and document data standards
● To be responsible and accountable for data
Pain Points
● Lack of clarity on ownership of data
● New role in Saxo Bank, so yet to own full responsibility
● Lack of clarity on what to mandate at what level (federated or data domain level)
● Data missing/incomplete
● See who can explain data elements
● Not sure if the data can be trusted for making the right decisions
DATA MESH APPROACH - PRODUCT THINKING
[Diagram of a domain data product; labels as extracted]
Ports: Polyglot Input Data Ports, Polyglot Output Data Ports, Control Ports
Other labels: Domain, Logs, metrics, Self-serve description, DIRECT AGENT/FLEET ENGAGEMENT, SPEED OF DELIVERY
Data product qualities: Discoverable, Addressable, Self-describing, Trustworthy, Interoperable
TARGET METADATA MODEL
DATA PLATFORM - HIGH LEVEL OVERVIEW
[Diagram; labels as extracted]
Domain areas: Data Source, Consumption
Domain labels: Business Capabilities, Reports, Trading, Instruments, Prices, Customers/Parties, Product, RX
Confluent Kafka Platform
BUSINESS EVENTS
STREAM PROCESSING (Enrich/Transform/Aggregate)
DATA STORAGE & MANAGEMENT: DW, DL Storage
DATA PRODUCTS: Native Data Products, Domain Data Products, Aggregated Data Products, Fit for Purpose Data Products
Data Workbench: Data Catalog (DATAHUB), DATA QUALITY (Great Expectations)
OTHER PROCESSING FRAMEWORKS
TOOLS EVALUATION PROCESS
Evaluation Criteria Definition
● SAXO features list
● SAXO initial evaluation params
● TW extended feature list
Shortlisted tools
● Includes initial & shortlisted list
● Data Catalog
● Data Quality
Evaluation Process
● Product documentation
● Software: Local installations
● Vendor questionnaire
DATA CATALOG TOOL EVALUATION
Prioritized Feature List
Metadata Search
● Full-text search on dataset name, attributes and tags
● Extensible search model
Metadata UI
● Web-based UI to show metadata, governance attributes, tags and lineage
● Ability to edit and enrich attributes
Metadata Ingestion
● Push-based REST API
● Pull-based adapters for Snowflake and CRM Dynamics
● Extensibility
Metadata Modelling
● Metadata entity for datasets, their users and attributes
● Business glossary & documentation
● Extensibility
Data Lineage
● Dataset lineage with upstream and downstream provenance
● Integration with data processing/orchestration tools
Data Stewardship
● Support for metadata enrichment and tagging
● Ability to flag a dataset
Data Quality Integration
● Shows related Quality Attributes in UI
● Extensible to integrate with any DQ tool
Architecture
● Cloud-native (Scalable & High Availability)
● Configurable
● Extensible
Alignment with Data Mesh
● Data as a Product
● Distributed Domain Driven Architecture
● Self-service platform
Security
● Authentication / LDAP
● Authorization / RBAC
Metadata Export
● Export API
Total Cost of Ownership
● Licensing Cost
● Customization / Development cost
Support
● Release cycle
● Community support
● Commercial Support
● Documentation
Deprioritized
● Metadata Versioning
● Data Virtualization
● ML/AI capabilities
TOOLS LANDSCAPE
Data Catalog
Collibra
Informatica EDC
Alation
data.world
Azure Data Catalog - Prev2
Zeenea
Apache Atlas
LinkedIn DataHub
Amundsen
Marquez
Legend (colour-coded on the slide): Commercial, Open Source, In House
DATA CATALOG TOOL EVALUATION
Deep-dive analysis of the capabilities of the shortlisted tools, purely as per the team's understanding in Saxo’s context.
Tools compared: DataHub, Marquez, Amundsen, Collibra, Zeenea
Criteria (cell annotations as extracted; per-tool ratings were colour-coded on the slide):
Metadata Search *
Metadata UI (Editable)
Metadata Ingestion *
Metadata Modelling *
Data Lineage *
Metadata Export *
Data Stewardship
Data Quality Integration
Architecture * (Only supports AWS)
Security *
Alignment with Data Mesh *
Support
Total Cost of Ownership *
Rating legend: Completely suitable, Partially suitable, No/Minimal suitability
Tool type legend: Commercial, Open Source
• Push-based approach that supports an event-driven architecture. The solution is built on the principle of self-service: producers know their data best and can provide rich metadata, which helps others discover the data assets and encourages consumption.
• Detailed evaluation was carried out from Saxo's perspective.
• Possibility of evolution
• Extensibility with open source, evolving the tool as per needs. Right fit from a feature perspective for the Data Governance maturity of the organization.
• Leverage the needs of the larger community, and also influence internal processes when needed.
• Reputation of LinkedIn, their success with Kafka, and DataHub scaling internally to LinkedIn volumes
• Promising roadmap
**Disclaimer**: Based on Saxo evaluation criteria and interpretation of product capabilities
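The push-based idea can be sketched as follows: the producer sends its own metadata to the catalogue whenever the dataset changes, and consumers then discover it by searching on name, attributes and tags. This is an in-memory illustration only, not the DataHub API. The `Catalogue` class, its `push` and `search` methods, and the dataset names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    name: str
    owner: str
    attributes: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


class Catalogue:
    """In-memory stand-in for a metadata catalogue (illustrative only)."""

    def __init__(self) -> None:
        self._datasets: dict[str, DatasetMetadata] = {}

    def push(self, metadata: DatasetMetadata) -> None:
        """Called by the producer whenever its dataset changes; the latest push wins."""
        self._datasets[metadata.name] = metadata

    def search(self, term: str) -> list[str]:
        """Case-insensitive substring match on dataset name, attributes and tags."""
        needle = term.lower()
        hits = []
        for ds in self._datasets.values():
            haystack = [ds.name, *ds.attributes, *ds.tags]
            if any(needle in text.lower() for text in haystack):
                hits.append(ds.name)
        return sorted(hits)


catalogue = Catalogue()
# Each producing domain pushes its own metadata; no crawler is involved.
catalogue.push(DatasetMetadata(
    "trading.orders", "trading-team", ["order_id", "instrument_id"], ["pii-free"]))
catalogue.push(DatasetMetadata(
    "prices.quotes", "prices-team", ["instrument_id", "bid", "ask"], ["real-time"]))

matches = catalogue.search("instrument")
```

In a pull-based design the catalogue would instead crawl each source on a schedule, which is the "fix the past" approach the deck argues against.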
DATASET ONBOARDING - RESPONSIBILITIES
PRODUCERS
Metadata
● Dataset/Data Product Metadata
○ Ownership Information
○ Reader Information
○ Topic Configuration Details
○ Dataset Structure (Avro Schema)
○ Business Term mapping
○ Source Dataset Definition (Optional)
● Quality Rules
Data Engineering
● Domain Transformations (Kafka Streams)
CONSUMERS
Metadata
● Consumer Details
● Usage Details
● Target Dataset details
Kafka PLATFORM - Lean Team
Engineering Capabilities
● Supporting New Domains
● Metadata Integration
● DQ Integration
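The producer responsibilities listed above could be captured as a single onboarding record that is checked before a dataset is accepted. The sketch below is an assumption about how such a record might look. The `OnboardingRequest` class, its field names and the specific checks are hypothetical and are not taken from Saxo's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OnboardingRequest:
    dataset_name: str
    owners: list[str]                      # ownership information
    readers: list[str]                     # reader information
    topic_config: dict[str, str]           # topic configuration details
    avro_schema: dict                      # dataset structure as a parsed Avro schema
    business_terms: dict[str, str]         # field name -> business glossary term
    quality_rules: list[str] = field(default_factory=list)
    source_dataset: Optional[str] = None   # optional, as on the slide

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the request is complete."""
        issues = []
        if not self.owners:
            issues.append("ownership information is missing")
        if not self.quality_rules:
            issues.append("no quality rules declared")
        schema_fields = {f["name"] for f in self.avro_schema.get("fields", [])}
        for name in self.business_terms:
            if name not in schema_fields:
                issues.append(f"business term mapped to unknown field: {name}")
        return issues


request = OnboardingRequest(
    dataset_name="prices.quotes",
    owners=["prices-team"],
    readers=["reporting-team"],
    topic_config={"retention.ms": "604800000"},
    avro_schema={
        "type": "record",
        "name": "Quote",
        "fields": [
            {"name": "instrument_id", "type": "string"},
            {"name": "bid", "type": "double"},
        ],
    },
    business_terms={"instrument_id": "Instrument", "mid": "Mid Price"},
)
issues = request.problems()
```

Here the request is flagged twice: no quality rules were declared, and the business term for `mid` points at a field that is not in the schema.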
Thank You
Q&A?
 
Health care analysis using sentimental analysis
Health care analysis using sentimental analysisHealth care analysis using sentimental analysis
Health care analysis using sentimental analysis
 
Fabric Engineering Deep Dive Keynote from Fabric Engineering Roadshow
Fabric Engineering Deep Dive Keynote from Fabric Engineering RoadshowFabric Engineering Deep Dive Keynote from Fabric Engineering Roadshow
Fabric Engineering Deep Dive Keynote from Fabric Engineering Roadshow
 
SAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content DocumentSAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content Document
 
Bangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts ServiceBangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts Service
 

LinkedIn Saxo Bank Data Workbench

  • 1. FREE TO SHARE Sheetal Pratik - Saxo Bank August 2020 Data Workbench
  • 2. About Saxo: We are leading fintech and regtech specialists, connecting traders, investors and partners to more than 35,000 instruments – across all asset classes – from a single account. What we do: We build digital platforms to facilitate multi-asset market access and provide clients of all sizes with professional-grade tools, industry-leading prices and best-in-class service. Data For Scale: Transforming Data Access, Data Governance and Data Quality
  • 3. DATA GOVERNANCE FOR A DIGITAL NATIVE • A data-driven organization needs multi-level Data Governance. Most tools are designed to fix data after the fact, e.g. before a data warehouse load. What is needed is to ensure data integrity at the origin, to prevent the “butterfly effect” in downstream systems. • The article “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” emphasizes how a data platform with a centralized architecture can become a bottleneck, leading to failures and affecting stability. Also, with ownership of data at the domain level, managing the data dictionary centrally is bound to fail or to duplicate the effort of creating and maintaining such data assets. • Considering this, the solution has to be more forward-looking, and a straight implementation of a COTS Data Cataloguing product might not be the right answer for Saxo’s Data Governance implementation. • The preferred tooling strategy is to fix forward rather than attempting to fix the past by using some kind of crawler and ML to extract metadata from various data sources.
  • 4. VISION For Domain Teams who need visibility on the availability, meaning, usage, ownership and quality of data, the Data Workbench (Owner's pride, Neighbor's envy) is a one-stop data shop that provides transparency of Saxo’s data ecosystem, unlike our current state, which is becoming increasingly complex as we grow. Our product will help Saxo to improve time to market and unlock new insights. The Data Workbench is designed to be part of the new data architecture. It consists of two main components: a Data Catalogue and a Data Quality Solution. 1. The Data Catalogue captures and exposes metadata. This provides transparency into the meaning and ownership of our data. The Data Catalogue is built on DataHub, a data catalogue open-sourced by LinkedIn. LinkedIn is very supportive and is working closely with us, helping with the adoption of the tool. 2. The Data Quality Solution is built on the open-source solution Great Expectations, supported by SuperConductive. Great Expectations is a declarative, flexible and extensible data quality solution. It allows teams to define data quality rules and actively monitor the quality of their data.
  • 5. MOTIVATION FOR THE SOLUTION • The Federated Data Governance model is an industry trend in which the enterprise governance team facilitates the monitoring and management of the quality of enterprise-critical data, with assistance from the business units. • LinkedIn’s shift from its initial Data Governance solution, WhereHows, to DataHub is a typical example of the paradigm shift from “a central metadata repository” to a more decentralized architecture that puts domains first in order to support a self-service data platform. • We realized that a practical way to implement this would be to stay lean and agile and work iteratively with data domains while establishing the Data Governance framework, and thus create a platform that is self-service, scalable and more relevant to stakeholders. • We had a discussion with LinkedIn to understand their journey and the lessons learnt that motivated them to evolve from WhereHows to DataHub. We acknowledged that Saxo Bank is on a similar journey and that we can fast-forward the implementation by adopting the open-source DataHub, which relates best to Saxo Bank's ecosystem. • The LinkedIn DataHub team has been extremely responsive. • Other digital natives have also recognized that the incumbent solutions are not fit for the modern age and have built their own solutions.
  • 6. PERSONAS: GOALS AND PAIN POINTS Personas: Data Asset Owner; Data Steward; Data Governance Committee Member; Data Scientist; Data Consumer (Reporting). Goals: to find data, its owner and anything else that helps compliance; to solve business problems based on data; to get an overview of all data in the organization; to define and document data standards; to be responsible and accountable for data. Pain points: lack of clarity on ownership of data; new role in Saxo Bank, so yet to own full responsibility; lack of clarity on what to mandate at what level (federated or data domain level); data missing or incomplete; see who can explain data elements; not sure if the data can be trusted for making the right decisions.
  • 7. DATA MESH APPROACH - PRODUCT THINKING [Diagram labels: Domain; Polyglot Data Input Ports; Polyglot Output Data Ports; Control Ports (logs, metrics, self-serve description); Discoverable; Addressable; Self-describing; Trustworthy; Interoperable; Direct agent/fleet engagement; Speed of delivery]
  • 9. STREAM PROCESSING DATA PLATFORM - HIGH LEVEL OVERVIEW [Diagram labels: Consumption; Domain; Data Source; Business Capabilities; Reports; Trading; Instruments; Prices; Customers/Parties; Product; RX; Data Workbench; BUSINESS EVENTS; STREAM PROCESSING (Enrich/Transform/Aggregate); DATA STORAGE & MANAGEMENT (DW, DL Storage); DATA PRODUCTS (Native, Domain, Aggregated and Fit for Purpose Data Products); Confluent Kafka Platform; Data Catalog: DATAHUB; DATA QUALITY: Great Expectations; OTHER PROCESSING FRAMEWORKS]
  • 10. TOOLS EVALUATION PROCESS Evaluation Criteria Definition: ● SAXO features list ● SAXO initial evaluation params ● TW extended feature list. Evaluation Process: ● Product documentation ● Software: local installations ● Vendor questionnaire. Shortlisted tools: ● Includes initial & shortlisted list ● Data Catalog ● Data Quality
  • 11. DATA CATALOG TOOL EVALUATION: Prioritized Feature List
      ● Metadata Search: full-text search on dataset name, attributes and tags; extensible search model
      ● Metadata UI: web-based UI to show metadata, governance attributes, tags and lineage; ability to edit and enrich attributes
      ● Metadata Ingestion: push-based REST API; pull-based adapters for Snowflake and CRM Dynamics; extensibility
      ● Metadata Modelling: metadata entity for datasets, their users and attributes; business glossary & documentation; extensibility
      ● Data Lineage: dataset lineage with upstream and downstream provenance; integration with data processing/orchestration tools
      ● Data Stewardship: support for metadata enrichment and tagging; ability to flag a dataset
      ● Data Quality Integration: shows related quality attributes in the UI; extensible to integrate with any DQ tool
      ● Architecture: cloud-native (scalable & high availability); configurable; extensible
      ● Alignment with Data Mesh: data as a product; distributed domain-driven architecture; self-service platform
      ● Security: authentication / LDAP; authorization / RBAC
      ● Deprioritized: metadata versioning; data virtualization; ML/AI capabilities
      ● Metadata Export: export API
      ● Total Cost of Ownership: licensing cost; customization / development cost
      ● Support: release cycle; community support; commercial support; documentation
  • 12. TOOLS LANDSCAPE Data Catalog: Collibra; Informatica EDC; Alation; Data.World; Azure Data Catalog - Prev2; Zeenea; Apache Atlas; LinkedIn DataHub; Amundsen; Marquez. Legend: Commercial; Open Source; In House
  • 13. TOOLS LANDSCAPE Data Catalog: Collibra; Informatica EDC; Alation; Data.World; Azure Data Catalog - Prev2; Zeenea; Apache Atlas; LinkedIn DataHub; Amundsen; Marquez. Legend: Commercial; Open Source; In House
  • 14. DATA CATALOG TOOL EVALUATION Deep-dive analysis of the capabilities of the shortlisted tools, purely as per the team's understanding in Saxo’s context. Tools compared: DataHub, Marquez, Amundsen (Open Source); Collibra, Zeenea (Commercial). Criteria: Metadata Search *; Metadata UI; Metadata Ingestion *; Metadata Modelling *; Data Lineage *; Metadata Export *; Data Stewardship; Data Quality Integration; Architecture *; Security *; Alignment with Data Mesh *; Support; Total Cost of Ownership *. Cell notes: "Editable" (Metadata UI); "Only supports AWS" (Architecture). Rating legend: completely suitable; partially suitable; no/minimal suitability. • Push-based approach that supports an event-driven architecture. The solution is built on the principle of self-service: producers know their data best and can provide rich metadata, which helps others discover the data assets and encourages consumption. • A detailed evaluation was carried out from Saxo's perspective. • Possibility of evolution. • Extensibility: being open source, the tool can evolve as per our needs. Right fit from a feature perspective given the Data Governance maturity of the organization. • Leverage the needs of the larger community and also influence the internal process when needed. • Reputation of LinkedIn, their success with Kafka, and DataHub scaling internally to LinkedIn's volume. • Promising roadmap. **Disclaimer**: Based on Saxo evaluation criteria and interpretation of product capabilities
  • 15. DATASET ONBOARDING - RESPONSIBILITIES
      ● PRODUCERS, Metadata: Dataset/Data Product Metadata (ownership information; reader information; topic configuration details; dataset structure (AVRO schema); business term mapping; source dataset definition (optional)); quality rules
      ● PRODUCERS, Data Engineering: domain transformations (Kafka Stream)
      ● CONSUMERS, Metadata: consumer details; usage details; target dataset details
      ● PLATFORM (Lean Team), Engineering Capabilities: supporting new domains; metadata integration; DQ integration; Kafka
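
The producer responsibilities on slide 15 (ownership, readers, topic configuration, AVRO schema, business terms, optional source dataset, quality rules) can be pictured as a single onboarding record. The following is a minimal sketch in plain Python, not the DataHub or Great Expectations API: the class names `DatasetMetadata` and `QualityRule`, the helper `evaluate`, and the sample dataset are all hypothetical and only illustrate how declarative quality rules might travel with the metadata a producer pushes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class QualityRule:
    """Hypothetical declarative rule: a name, a column, and a predicate."""
    name: str
    column: str
    check: Callable[[Any], bool]


@dataclass
class DatasetMetadata:
    """Hypothetical onboarding record mirroring the slide's producer list."""
    name: str
    owners: List[str]
    readers: List[str]
    topic_config: Dict[str, Any]
    avro_schema: Dict[str, Any]
    business_terms: Dict[str, str]          # column -> glossary term
    source_dataset: Optional[str] = None    # optional on the slide
    quality_rules: List[QualityRule] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the required onboarding fields that are still empty."""
        required = {
            "owners": self.owners,
            "readers": self.readers,
            "topic_config": self.topic_config,
            "avro_schema": self.avro_schema,
            "business_terms": self.business_terms,
        }
        return [key for key, value in required.items() if not value]


def evaluate(meta: DatasetMetadata, records: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count, per rule, how many records fail the rule's predicate."""
    return {
        rule.name: sum(1 for r in records if not rule.check(r.get(rule.column)))
        for rule in meta.quality_rules
    }


# Made-up example dataset; none of these values come from the slides.
prices = DatasetMetadata(
    name="instrument-prices",
    owners=["pricing-domain-team"],
    readers=["reporting"],
    topic_config={"partitions": 3},
    avro_schema={
        "type": "record",
        "name": "Price",
        "fields": [
            {"name": "instrument_id", "type": "string"},
            {"name": "price", "type": "double"},
        ],
    },
    business_terms={},  # not yet mapped, so onboarding is incomplete
    quality_rules=[
        QualityRule("price_not_null", "price", lambda v: v is not None),
        QualityRule("price_positive", "price", lambda v: v is not None and v > 0),
    ],
)

records = [
    {"instrument_id": "A", "price": 1.5},
    {"instrument_id": "B", "price": None},
    {"instrument_id": "C", "price": -2.0},
]

print(prices.missing_fields())     # ['business_terms']
print(evaluate(prices, records))   # {'price_not_null': 1, 'price_positive': 2}
```

Keeping the rules next to the ownership and schema details reflects the deck's point that producers know their data best; in a real setup the rules would presumably be expressed in the chosen data quality tool rather than as Python lambdas.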