Royal Caribbean Cruises, Ltd.
• Founded in 1968
• Six companies employing over 65,000 people from 120 countries who have served over 50 million guests
• Fleet of over 55 ships and growing
• Countless industry “firsts”, such as a rock climbing wall, ice skating, and surfing at sea
• Each brand delivers a unique guest experience
• www.rclcorporate.com
What Is Cerebro™?
Cerebro™ is a project under Excalibur’s data program focused on delivering a next-generation data management platform.
Design Drivers and Architecture Principles
Cerebro™ Is Cloud Native
Cloud-native data lake architecture built on vendor-managed services
[Diagram labels: Managed Services, Container Based, Azure Data Lake Store, Azure Data Factory]
Cerebro™ Leverages Different Storage Engines
Why There Is a Need for a Heterogeneous Data Lake

Storage Type       | Object Store                            | Document Store                          | Graph Store
Which Data?        | Sensor data; financial data             | Reference data; dynamic schema          | Relationships
Which Queries?     | Data science; BI; large analytical jobs | Single record; small batches; mutations | Relationship analysis; mutations
Key Considerations | Parquet and Arrow accelerate queries    | Ability to handle streaming workloads   | Flexibility and ability to handle complexity

[Diagram label: Azure Data Lake Store (ADLS)]
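The table's decision logic can be read as a simple routing rule. The sketch below is a hypothetical illustration, not part of Cerebro: the `Workload` flags and the `choose_store` function are invented names, and a real placement decision would also weigh cost, latency targets, and governance.

```python
from dataclasses import dataclass
from enum import Enum


class Store(Enum):
    OBJECT = "object store (ADLS + Parquet)"
    DOCUMENT = "document store (MongoDB)"
    GRAPH = "graph store (Neo4j)"


@dataclass(frozen=True)
class Workload:
    # Hypothetical workload descriptors; not taken from the Cerebro deck.
    relationship_centric: bool = False  # traversals, patterns, recommendations
    record_level_access: bool = False   # single records, small batches, mutations
    dynamic_schema: bool = False        # documents whose shape varies


def choose_store(w: Workload) -> Store:
    """Map a workload to a storage engine, following the table's rows."""
    if w.relationship_centric:
        return Store.GRAPH
    if w.record_level_access or w.dynamic_schema:
        return Store.DOCUMENT
    # Default: large analytical scans (data science, BI) over columnar files.
    return Store.OBJECT


if __name__ == "__main__":
    print(choose_store(Workload()).value)
    print(choose_store(Workload(record_level_access=True)).value)
    print(choose_store(Workload(relationship_centric=True)).value)
```

Checking relationship-centric workloads first mirrors the idea that graph queries are the hardest to serve from the other two engines.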
Cerebro™ Leverages In-Memory Architecture
• Scalability via a distributed in-memory compute layer and object storage
• Dremio and Spark anchor the in-memory compute layer
• Parquet on an object store (ADLS) forms the storage layer, plus MongoDB and Neo4j
• Dremio and Arrow Flight further accelerate access and in-memory processing
[Diagram: Compute Layer and Storage Layer, Today vs. Future (with Arrow Flight)]
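To illustrate why a columnar in-memory layout speeds up analytical queries, here is a minimal pure-Python sketch over hypothetical reservation records. It does not use Arrow, Dremio, or Spark APIs; it only shows that an aggregate over a column-oriented layout reads one column instead of every row. Apache Arrow builds on this idea with contiguous typed buffers and vectorized execution.

```python
from typing import Any

# Hypothetical reservation records in row-oriented form (one dict per row).
rows: list[dict[str, Any]] = [
    {"reservation_id": 1, "ship": "A", "nights": 7, "fare": 1200.0},
    {"reservation_id": 2, "ship": "B", "nights": 4, "fare": 650.0},
    {"reservation_id": 3, "ship": "A", "nights": 5, "fare": 900.0},
]


def to_columnar(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot rows into one list per column (a simplified Arrow-like layout)."""
    if not records:
        return {}
    columns: dict[str, list[Any]] = {name: [] for name in records[0]}
    for record in records:
        for name in columns:
            columns[name].append(record.get(name))
    return columns


def total_fare_row_wise(records: list[dict[str, Any]]) -> float:
    # Visits every row object even though only one field is needed.
    return sum(r["fare"] for r in records)


def total_fare_columnar(columns: dict[str, list[Any]]) -> float:
    # Reads a single column; the other columns are never touched.
    return sum(columns["fare"])


cols = to_columnar(rows)
print(total_fare_row_wise(rows), total_fare_columnar(cols))  # 2750.0 2750.0
```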
Cerebro™ - Phase 1
• Initial release focused on ingesting sources that span current data silos
• Establishment of a Raw Zone with Landing and Staging Areas
• Physical storage is file-based (CSV, Parquet) on Azure Data Lake Store (ADLS) to support the variety and variability of data
• The Staging Area requires users to be familiar with low-level data structures in order to execute queries joining disparate source systems (e.g., multiple PMS and Casino sources)
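The Staging Area limitation in the last bullet can be made concrete. In this hypothetical sketch, two PMS sources and a casino source each keep their own column names (every name here is invented for illustration), so whoever writes the join must know and maintain the per-source mapping in `PMS_KEYS`.

```python
from collections import defaultdict

# Hypothetical staging tables: each source system keeps its own column names.
pms_a = [{"GUEST_NO": "G1", "SHIP_CD": "A", "NTS": 7}]
pms_b = [{"guestId": "G2", "vessel": "B", "nightCount": 4}]
casino = [
    {"player_ref": "G1", "spend_usd": 300.0},
    {"player_ref": "G2", "spend_usd": 120.0},
    {"player_ref": "G1", "spend_usd": 50.0},
]

# Knowledge the query author needs: where each concept lives in each source.
PMS_KEYS = {
    "pms_a": {"guest": "GUEST_NO", "ship": "SHIP_CD", "nights": "NTS"},
    "pms_b": {"guest": "guestId", "ship": "vessel", "nights": "nightCount"},
}


def casino_spend_by_stay(pms_sources: dict[str, list[dict]], casino_rows: list[dict]) -> list[dict]:
    """Join every PMS stay to the total casino spend of the same guest."""
    spend: dict[str, float] = defaultdict(float)
    for row in casino_rows:
        spend[row["player_ref"]] += row["spend_usd"]

    result = []
    for source_name, stays in pms_sources.items():
        keys = PMS_KEYS[source_name]  # per-source mapping, maintained by hand
        for stay in stays:
            guest = stay[keys["guest"]]
            result.append({
                "source": source_name,
                "guest": guest,
                "ship": stay[keys["ship"]],
                "nights": stay[keys["nights"]],
                "casino_spend": spend.get(guest, 0.0),
            })
    return result


for row in casino_spend_by_stay({"pms_a": pms_a, "pms_b": pms_b}, casino):
    print(row)
```

Every new source adds another entry to `PMS_KEYS`, which is the kind of low-level knowledge a semantic entity layer is meant to hide from casual users.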
[Diagram: Zones: Raw Zone (Landing Area, Staging Area), Standardized Zone, Enriched Zone; storage: Cloud Object Store, Document Store, Graph; stages: Ingest (Batch, CDC, SFTP), Transform, Consume. Sources: File, RDBMS, Reservations, Customer Master, Property Management, Casino, Clickstream, Marketing. Capabilities: Metadata Management, Data Catalog, Data Ingestion, Data Integration, Data Virtualization, Self-service BI, Advanced Analytics. Consumers: Data Engineers (Operational Analytics), BI Analysts (Self-Service Dashboards), Data Scientists (Advanced Analytics), Data Stewards (Compliance Analytics).]
Data Pipeline – Phase 1
Roles: Data Engineers, Data Scientists
• Talend is used to ingest data from multiple sources (RDBMS, file-based, API) into CSV files stored in the Landing Area (ADLS)
• Talend and Spark are used to create Parquet files in the Staging Area (ADLS)
• In-memory columnar processing (Arrow) in Dremio accelerates SQL-based query access for data engineering and data science use cases
• Data virtualization within Dremio supports simple ad hoc integration and agile exploration
• Azure Databricks supports data science and advanced analytics (AI/ML) in Python, Scala, Java, and R
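The Landing-to-Staging step (CSV in, typed columnar data out) can be sketched with the standard library alone. This is a stand-in for what Talend or Spark does in the pipeline, assuming a hypothetical extract and schema; it only parses and casts values into typed columns. Writing actual Parquet would require a library such as pyarrow (for example `pyarrow.parquet.write_table`).

```python
import csv
import io
from typing import Callable, Optional

# Hypothetical Landing Area extract, as an ingest job might write it (CSV).
LANDING_CSV = """reservation_id,ship,nights,fare
1,A,7,1200.00
2,B,4,650.50
3,A,,900.00
"""

# Hypothetical Staging Area schema: column name -> cast function.
SCHEMA: dict[str, Callable[[str], object]] = {
    "reservation_id": int,
    "ship": str,
    "nights": int,
    "fare": float,
}


def stage(csv_text: str, schema: dict[str, Callable[[str], object]]) -> dict[str, list[Optional[object]]]:
    """Parse CSV rows and cast each field into a typed column (empty -> None)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns: dict[str, list[Optional[object]]] = {name: [] for name in schema}
    for row in reader:
        for name, cast in schema.items():
            raw = row.get(name) or ""  # treat missing and empty fields alike
            columns[name].append(cast(raw) if raw != "" else None)
    return columns


staged = stage(LANDING_CSV, SCHEMA)
print(staged["nights"])  # [7, 4, None]
```

Keeping the schema explicit means type errors surface at staging time rather than in downstream queries.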
[Diagram stages: Ingest (Talend, Azure HDInsight); Persist (Azure Data Lake Store); Explore (Dremio, Azure Data Catalog); Model/Predict (Azure Databricks: Python, Scala, Java, R)]
Cerebro™ - Phase 2
• Implementation of a Standardized Zone based on a semantic view of entities that will be easier for casual users to query
• Introducing MongoDB (Document) will allow the platform to support low-latency ingestion and consumption of the customer data required by downstream applications (Call Center)
• Dremio is still used to support analytical use cases involving customer data stored in MongoDB (Marketing)
• Introducing Neo4j (Graph) will increase overall agility in handling relationships and provide insights through advanced functionality (patterns, recommendations)
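The "recommendations" capability mentioned for Neo4j typically comes down to a multi-hop traversal. In Cypher this might be expressed with a pattern such as `(g)-[:BOOKED]->(a)<-[:BOOKED]-(peer)-[:BOOKED]->(rec)`; the sketch below computes the same kind of co-occurrence score in plain Python so the logic is visible. The `BOOKED` relationship, guest IDs, and activities are hypothetical, and a production graph query would also filter and weight results.

```python
from collections import Counter, defaultdict

# Hypothetical guest -> activity edges (BOOKED relationships).
EDGES = [
    ("g1", "rock_climbing"), ("g1", "ice_skating"),
    ("g2", "rock_climbing"), ("g2", "surfing"),
    ("g3", "rock_climbing"), ("g3", "surfing"), ("g3", "ice_skating"),
]


def build_graph(edges: list[tuple[str, str]]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Build adjacency sets in both directions (guest->activity, activity->guest)."""
    guest_to_acts: dict[str, set[str]] = defaultdict(set)
    act_to_guests: dict[str, set[str]] = defaultdict(set)
    for guest, act in edges:
        guest_to_acts[guest].add(act)
        act_to_guests[act].add(guest)
    return guest_to_acts, act_to_guests


def recommend(guest: str, edges: list[tuple[str, str]], limit: int = 3) -> list[str]:
    """Three hops: guest -> shared activity -> peer guest -> peer's other activities."""
    guest_to_acts, act_to_guests = build_graph(edges)
    mine = guest_to_acts[guest]
    scores: Counter[str] = Counter()
    for act in mine:
        for peer in act_to_guests[act] - {guest}:
            for candidate in guest_to_acts[peer] - mine:
                scores[candidate] += 1  # one point per shared-activity path
    # Highest score first, ties broken by name for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]


print(recommend("g1", EDGES))  # ['surfing']
```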
[Diagram: Zones: Raw Zone (Landing Area, Staging Area), Standardized Zone, Enriched Zone; storage: Cloud Object Store, Document Store, Graph; stages: Ingest (Batch, CDC, SFTP), Transform, Consume. Sources: File, RDBMS, Reservations, Customer Master, Property Management, Casino, Clickstream, Marketing. Capabilities: Metadata Management, Data Catalog, Data Ingestion, Data Integration, Data Virtualization, Self-service BI, Advanced Analytics. Consumers: Data Engineers (Operational Analytics), BI Analysts (Self-Service Dashboards), Data Scientists (Advanced Analytics), Data Stewards (Compliance Analytics), Developers (Downstream Applications).]
Data Pipeline – Phase 2
Roles: Data Engineers, Data Scientists
[Diagram stages: Ingest/Process (Talend, Azure HDInsight, Azure Databricks, Azure Data Factory); Persist (Azure Data Lake Store, MongoDB Atlas, Neo4j); Explore/Visualize (Dremio, Azure Data Catalog, Power BI); Model/Predict (Azure Databricks: Python, Scala, Java, R)]
• Talend is used to develop pipelines that process (cleanse, integrate, harmonize) data sourced from the Raw Zone
• Pipeline output is persisted in the appropriate store(s) (ADLS, Neo4j, and MongoDB) to support both analytical and operational requirements
• Services are developed for consumption by customer-facing applications and other downstream processes via managed APIs
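The processing bullet above (cleanse, integrate, harmonize) can be sketched as three small composable steps. This is a hypothetical illustration, not Talend's API: the records, the `CANONICAL` field mapping, and the merge policy (later records overwrite earlier values) are all assumptions. Harmonization runs before integration here because records can only be merged once they share canonical keys.

```python
from typing import Iterable

# Hypothetical Raw Zone customer records from two source systems.
RAW = [
    {"src": "crm", "id": " C1 ", "email": "ANA@EXAMPLE.COM", "loyalty": "gold"},
    {"src": "pms", "id": "C1", "email": "ana@example.com", "cabin": "8123"},
    {"src": "pms", "id": "C2", "email": "", "cabin": "5010"},
]

# Hypothetical mapping from source field names to canonical entity attributes.
CANONICAL = {
    "id": "customer_id",
    "email": "email",
    "loyalty": "loyalty_tier",
    "cabin": "cabin_number",
}


def cleanse(records: Iterable[dict]) -> list[dict]:
    """Trim whitespace, lower-case emails, and drop empty values."""
    cleaned = []
    for record in records:
        r = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        if r.get("email"):
            r["email"] = r["email"].lower()
        cleaned.append({k: v for k, v in r.items() if v != ""})
    return cleaned


def harmonize(records: Iterable[dict]) -> list[dict]:
    """Rename fields to canonical names; fields without a mapping are dropped."""
    return [{CANONICAL[k]: v for k, v in r.items() if k in CANONICAL} for r in records]


def integrate(records: Iterable[dict]) -> dict[str, dict]:
    """Merge records per customer; later records overwrite earlier values."""
    merged: dict[str, dict] = {}
    for r in records:
        merged.setdefault(r["customer_id"], {}).update(r)
    return merged


profiles = integrate(harmonize(cleanse(RAW)))
print(profiles["C1"])
```

The merged profiles are the kind of entity that could then be persisted to a document store for low-latency lookups or written out as files for analytics.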
Additional roles: BI Analysts, Data Stewards
[Diagram: Services (Azure Functions, Apigee, Azure Kubernetes Service)]
[Architecture diagram. Layers: Data Sources, Ingest, Process, User Experience. Data Sources: On-Premises (Property Management, Customer Master, Reservations, Casino) and External (Clickstream, Customer Feedback, Campaign Management). Ingest: Batch Integration and Streaming Integration (Talend Big Data, Azure Data Factory, Kafka on HDInsight, Azure Event Hubs). Modern Data Platform: Spark on HDInsight, Azure Data Lake Store, MongoDB Atlas, Neo4j Causal Cluster, Azure Data Catalog. Modern Analytics: Self-Service Data Analytics, Advanced Analytics. Data Services: Azure Functions, Azure Kubernetes Service. Consumers: Business Analysts, Data Scientists, Applications. Tools: DBeaver EE.]


Cerebro: Bringing together data scientists and BI users - Royal Caribbean - Strata - London 2019

  • 2. Royal Caribbean Cruises, Ltd.
      • Founded in 1968
      • Six companies employing over 65,000 people from 120 countries who have served over 50 million guests
      • Fleet of over 55 ships and growing
      • Countless industry “firsts”, such as a rock climbing wall, ice skating, and surfing at sea
      • Each brand delivering a unique Guest experience
      • www.rclcorporate.com
  • 12. What is Cerebro™?
      Cerebro™ is a project under Excalibur’s data program focused on delivering a next-generation data management platform.
      Design Drivers and Architecture Principles
  • 13. Cerebro™ is Cloud Native
      Cloud-native data lake architecture leveraging vendor-managed services
      Diagram: Managed Services and Container Based; components shown include Azure Data Lake Store and Azure Data Factory
  • 14. Cerebro™ Leverages Different Storage Engines
      Why there is a need for a heterogeneous data lake
      • Object Store (Azure Data Lake Store, ADLS)
          Which data? Sensor data; financial data
          Which queries? Data science; BI; large analytical jobs
          Key considerations: Parquet and Arrow accelerate queries
      • Document Store
          Which data? Reference data; dynamic schema
          Which queries? Single record; small batches; mutations
          Key considerations: Ability to handle streaming workloads
      • Graph Store
          Which data? Relationships
          Which queries? Relationship analysis; mutations
          Key considerations: Flexibility and ability to handle complexity
  • 15. Cerebro™ Leverages In-Memory Architecture
      • Scalability via a distributed in-memory compute layer and object storage
      • Dremio and Spark anchor the in-memory compute layer
      • Parquet and object store (ADLS) for the storage layer, plus MongoDB and Neo4j
      • Dremio and Arrow Flight further accelerate access and in-memory processing
      Diagram: Compute Layer and Storage Layer, Today vs. Future (with Arrow Flight)
  • 16. Cerebro™ - Phase 1
      • Initial release focused on ingestion of sources spanning current data silos
      • Establishment of a Raw Zone with Landing and Staging Areas
      • Physical storage is file based (CSV, Parquet) on Azure Data Lake Store (ADLS) to support the variety and variability of data
      • The Staging Area requires users to be familiar with low-level data structures in order to execute queries joining disparate source systems (e.g., multiple PMS and Casino sources)
      Diagram:
          Sources: Reservations, Customer Master, Property Management, Casino, Clickstream, Marketing
          Ingest: Batch, CDC, SFTP (from File and RDBMS sources)
          Zones (Cloud Object Store, Document Store, Graph): Raw Zone (Landing Area, Staging Area), Standardized Zone, Enriched Zone
          Flow: Ingest, Transform, Consume
          Capabilities: Metadata Management, Data Catalog, Data Ingestion, Data Integration; Data Virtualization, Self-service BI, Advanced Analytics
          Roles: Data Engineers (Operational Analytics), BI Analysts (Self-Service Dashboards), Data Scientists (Advanced Analytics), Data Stewards (Compliance Analytics)
  • 17. Data Pipeline - Phase 1
      • Talend utilized to ingest data from a number of sources (RDBMS, file-based, API) into CSV files stored in the Landing Area (ADLS)
      • Talend / Spark leveraged to create Parquet files in the Staging Area (ADLS)
      • In-memory columnar format (Arrow) via Dremio accelerates SQL-based query access for data engineering and data science use cases
      • Data virtualization within Dremio supports simple ad-hoc integration and agile exploration
      • Supports data science and advanced analytics (AI/ML) via Azure Databricks (Python, Scala, Java, R)
      Diagram: Ingest (Talend, Azure HDInsight) → Persist (Azure Data Lake Store) → Explore (Dremio, Azure Data Catalog) → Model/Predict (Azure Databricks; Python, Scala, Java, R); roles: Data Engineers, Data Scientists
  • 18. Cerebro™ - Phase 2
      • Implementation of a Standardized Zone based on a semantic view of entities that will be easier for casual users to query
      • The introduction of MongoDB (document store) will allow the platform to support low-latency ingestion and consumption of the customer data required by downstream applications (Call Center)
      • Dremio is still leveraged to support analytical use cases involving customer data stored in MongoDB (Marketing)
      • The introduction of Neo4j (graph store) will increase overall agility (relationships) and provide insights by leveraging advanced functionality (patterns, recommendations)
      Diagram: same zone layout as Phase 1, with Developers and Downstream Applications added as consumers
  • 19. Data Pipeline - Phase 2
      • Talend used to develop pipelines that process (cleanse, integrate, harmonize) data sourced from the Raw Zone
      • Data resulting from pipeline executions is persisted in the appropriate store(s) (ADLS, Neo4j, MongoDB) to support both analytical and operational requirements
      • Services developed for consumption by customer-facing applications and other downstream processes via managed APIs
      Diagram: Ingest/Process (Talend, Azure HDInsight, Azure Databricks, Azure Data Factory) → Persist (Azure Data Lake Store, MongoDB Atlas, Neo4j) → Explore/Visualize (Dremio, Azure Data Catalog, Power BI) → Model/Predict (Azure Databricks; Python, Scala, Java, R); Services (Azure Functions, Apigee, Azure Kubernetes Service); roles: Data Engineers, Data Scientists, BI Analysts, Data Stewards
  • 20. Modern Data Platform architecture (diagram)
      Columns: Data Sources, Ingest, Process, User Experience, Consumers
      • Data Sources: On-Premises (Property Management, Customer Master, Reservations, Casino); External (Clickstream, Customer Feedback, Campaign Management)
      • Ingest: Batch Integration; Streaming Integration (Kafka on HDInsight, Azure Event Hubs)
      • Components: Spark on HDInsight, Talend Big Data, Azure Data Factory, Azure Data Lake Store, MongoDB Atlas, Neo4j Causal Cluster, Azure Data Catalog, Azure Functions, Azure Kubernetes Service, DBeaver EE
      • Capabilities: Modern Analytics (Self-Service Data Analytics, Advanced Analytics), Data Services
      • Consumers: Business Analysts, Data Scientists, Applications
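Slides 16 and 17 describe CSV files landing in the Raw Zone's Landing Area and being rewritten as Parquet in the Staging Area. The deck uses Talend and Spark for that step (in PySpark it would roughly be `spark.read.csv(...)` followed by `.write.parquet(...)`). The sketch below is only a standard-library illustration of the idea behind it, assuming a hypothetical reservations extract and schema: parse the landed CSV, apply declared types, and pivot rows into per-column arrays, the column-oriented layout Parquet and Arrow use so analytical queries read only the columns they need.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical schema for a landed reservations extract: column name -> type converter.
SCHEMA: Dict[str, Callable[[str], object]] = {
    "reservation_id": int,
    "ship_code": str,
    "fare_usd": float,
}


def csv_to_columns(landed_csv: str, schema: Dict[str, Callable[[str], object]]) -> Dict[str, List[object]]:
    """Parse CSV text from the Landing Area into a columnar (column -> values) layout.

    Only columns named in the schema are kept, and each value is converted to its
    declared type, mimicking the typed, column-oriented files written to Staging.
    """
    columns: Dict[str, List[object]] = {name: [] for name in schema}
    reader = csv.DictReader(io.StringIO(landed_csv))
    for row in reader:
        for name, convert in schema.items():
            columns[name].append(convert(row[name]))
    return columns


# Hypothetical landed file; the "notes" column is not in the schema and is dropped.
landed = "reservation_id,ship_code,fare_usd,notes\n101,SHIP_A,1299.50,window\n102,SHIP_B,899.00,\n"
staged = csv_to_columns(landed, SCHEMA)

# A columnar aggregate touches one array instead of every full row.
total_fare = sum(staged["fare_usd"])
print(staged)
print(total_fare)
```

Dropping unlisted columns and typing each value mirror what a Staging job does; the actual Parquet encoding (row groups, compression, statistics) is deliberately left out of this sketch.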
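Slide 17 credits Dremio's data virtualization with simple ad-hoc integration, and slide 16 notes that Staging Area users must join disparate sources such as PMS and Casino data. The core idea of a virtual dataset is a stored query that joins sources at read time instead of copying them into a new table. The sketch below shows that idea with a SQLite view from Python's `sqlite3` module as a stand-in; it is not Dremio's API, and every table and column name is hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for two disparate sources (hypothetical tables).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pms_stays (guest_id TEXT, nights INTEGER);
    CREATE TABLE casino_play (guest_id TEXT, amount REAL);
    INSERT INTO pms_stays VALUES ('G1', 7), ('G2', 4);
    INSERT INTO casino_play VALUES ('G1', 120.0), ('G1', 30.0);
    -- A view is a stored query: no rows are copied; the join runs when the view is read.
    CREATE VIEW guest_activity AS
        SELECT s.guest_id, s.nights, COALESCE(SUM(c.amount), 0) AS casino_total
        FROM pms_stays AS s
        LEFT JOIN casino_play AS c ON c.guest_id = s.guest_id
        GROUP BY s.guest_id, s.nights;
    """
)

# Consumers query the virtual dataset as if it were a table.
rows = conn.execute(
    "SELECT guest_id, nights, casino_total FROM guest_activity ORDER BY guest_id"
).fetchall()
conn.close()
print(rows)
```

Because the view holds no data, changes in either source table show up the next time it is queried, which is what makes this style of integration cheap for exploration.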
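Slides 18 and 19 describe Talend pipelines that cleanse, integrate, and harmonize Raw Zone data into a Standardized Zone built on a semantic view of entities. The deck gives no source schemas or matching rules, so the following is a minimal sketch under stated assumptions: two hypothetical PMS extracts with different field names are mapped onto one guest entity, values are trimmed and emails lowercased, and records are merged on a `guest_id` assumed to be shared across sources.

```python
from typing import Dict, List

# Hypothetical field mappings from two PMS extracts onto one standardized "guest" entity.
# The real source schemas are not given in the deck.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "pms_a": {"GuestNo": "guest_id", "EMail": "email", "LastNm": "last_name"},
    "pms_b": {"guest_ref": "guest_id", "email_address": "email", "surname": "last_name"},
}


def harmonize(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename source fields to the standard names and cleanse their values."""
    mapping = FIELD_MAP[source]
    out = {std: record[src].strip() for src, std in mapping.items()}
    out["email"] = out["email"].lower()  # cleanse: normalize case so emails compare equal
    return out


def integrate(extracts: Dict[str, List[Dict[str, str]]]) -> Dict[str, Dict[str, str]]:
    """Merge harmonized records from all sources into one view keyed by guest_id.

    Assumption: guest_id is shared across sources. Later sources only fill fields
    that are still empty, so the first non-empty value wins.
    """
    view: Dict[str, Dict[str, str]] = {}
    for source, records in extracts.items():
        for record in records:
            std = harmonize(source, record)
            merged = view.setdefault(std["guest_id"], {})
            for field, value in std.items():
                if value and not merged.get(field):
                    merged[field] = value
    return view


extracts = {
    "pms_a": [{"GuestNo": "G1", "EMail": " Ann@Example.com ", "LastNm": "Smith"}],
    "pms_b": [
        {"guest_ref": "G1", "email_address": "ann@example.com", "surname": ""},
        {"guest_ref": "G2", "email_address": "BOB@example.com", "surname": "Lee"},
    ],
}
guests = integrate(extracts)
print(guests)
```

The first-non-empty-wins rule is a placeholder; real survivorship rules, and matching when no shared key exists, would have to come from the data owners.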
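Slide 14 matches workloads to storage engines, and slide 19 says pipeline output is persisted in "the appropriate store(s)". The deck does not say how that choice is made, so the sketch below is a hypothetical rule-based router that simply encodes the slide 14 criteria, using MongoDB and Neo4j as the document and graph stores named on slides 15 and 18. The `DatasetProfile` fields are illustrative, and falling back to the object store when no criterion matches is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical description of a pipeline output, using the slide 14 criteria."""
    name: str
    large_analytical_scans: bool  # data science, BI, large analytical jobs
    single_record_access: bool    # single-record lookups, small batches, mutations
    relationship_queries: bool    # relationship analysis, patterns, recommendations


def choose_stores(profile: DatasetProfile) -> List[str]:
    """Return every store type whose workload matches the dataset, in a fixed order."""
    stores: List[str] = []
    if profile.large_analytical_scans:
        stores.append("object store (Parquet on ADLS)")
    if profile.single_record_access:
        stores.append("document store (MongoDB)")
    if profile.relationship_queries:
        stores.append("graph store (Neo4j)")
    if not stores:
        # Assumption: default to the analytical object store when nothing else applies.
        stores.append("object store (Parquet on ADLS)")
    return stores


customer = DatasetProfile("customer_360", True, True, True)
clickstream = DatasetProfile("clickstream", True, False, False)
print(choose_stores(customer))
print(choose_stores(clickstream))
```

Returning a list rather than a single store follows the slide's "store(s)" wording: one dataset, such as customer data, can be persisted in several stores to serve both analytical and operational requirements.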