Using Data
Platforms that are
Fit-For-Purpose
Presented by: William McKnight
“#1 Global Influencer in Data Warehousing” Onalytica
President, McKnight Consulting Group
A 2-time Inc. 5000 Company
@williammcknight
www.mcknightcg.com
(214) 514-1444
© All rights reserved. Matillion 2021
A vendor perspective from Matillion
Modern Data Storage
Evolutions
Paul Lacey
Sr. Director Product Marketing
Matillion
Speaker
History of Data Warehousing
• 1960: DBMS
• 1970: RDBMS
• 1980: SQL
• 1988: Data Warehouse
• 1992: Published: Building the Data Warehouse
• 1996: Published: The Data Warehouse Toolkit
• 2005: Hadoop, Big Data
• 2013: Cloud Data Warehouse
• 2015: Cloud ETL
• 2017: Lakehouse
History of Data Warehousing
[Timeline figure. Recoverable labels: Hadoop ‘Big Data’ (2005), Cloud ETL, Spectrum; year markers 2011, 2013, 2014, 2015, 2017, 2019]
Architectural Paradigms
[Chart axes: Effort to Reward, Innovation]
• Original Big Data Stack (2005)
• Pipeline 2.0 (2013)
• Hybrid Storage (2015)
• Lakehouse (2017)
The Original Big Data Stack
Pipeline 2.0
Hybrid Storage
ML & BI Separated and Siloed
                 Data Engineering    Data Science
Wrangling        ETL                 Data Prep
Storage          Data Warehouse      Data Lake
Processing       Scala               Pandas
Orchestration    Airflow             Jupyter Notebooks
Visualization    Tableau             Matplotlib
The Lakehouse Approach
1. Combine data science and data engineering workflows
Lakehouse – best of both worlds
Data Warehouse, Data Lake
Streaming Analytics, BI, Data Science, Machine Learning
Structured, Semi-Structured and Unstructured Data
Lakehouse
Familiar Interfaces
One Dataset
Analysts
Matillion ELT Logic and Orchestration
SQL
ACID
PySpark
Spark
Data
Scientists
Data
Engineers
Data Eng Data Science
Integrators Business Users
Accessible Integration Brings Unified Analytics
More Info: matillion.com
Matillion #TeamGreen
matillion.com
Thank You!
Performance
• Performance is a critical point of interest when it
comes to selecting an analytics platform.
• To measure data warehouse performance, we use
similarly priced configurations across competing
data warehouses.
• When people say they care about performance,
the metric they ultimately mean is
price/performance.
• Creating fair tests can overwhelm many shops, and
the task is usually underestimated.
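One way to make the price/performance point concrete is to multiply the elapsed time of a benchmark run by the hourly price of the tested configuration. The sketch below assumes that definition. The platform names, runtimes, and hourly rates are made-up illustrations, not measurements.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    platform: str            # hypothetical platform label
    elapsed_seconds: float   # wall-clock time to finish the whole query set
    hourly_rate_usd: float   # assumed price of the tested configuration


def price_performance(run: BenchmarkRun) -> float:
    """Cost in USD to complete the workload once (lower is better)."""
    return run.elapsed_seconds / 3600.0 * run.hourly_rate_usd


runs = [
    BenchmarkRun("platform_a", elapsed_seconds=1800.0, hourly_rate_usd=32.0),
    BenchmarkRun("platform_b", elapsed_seconds=1200.0, hourly_rate_usd=60.0),
]

# Rank by cost per completed workload rather than by raw speed.
ranked = sorted(runs, key=price_performance)
for run in ranked:
    print(f"{run.platform}: ${price_performance(run):.2f} per workload run")
```

In this illustration the slower platform ranks first because its lower hourly rate more than offsets the longer runtime.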
The Perils of Performance Alone
• A modern workload is less often a fixed set of
queries and more often an interactive, variable
number of queries.
• A lack of certain key features and functions in
the chosen platform leads to increased time
spent.
• Some data warehouse platforms have features
that appear beneficial and desirable but carry
hidden downsides.
Enterprise Analytic Platforms
Data is fit-for-purpose (FFP) when it is…
• In a leverageable platform
• In an appropriate platform for its profile and
usage
• Supported by strong non-functionals (availability,
performance, scalability, stability, durability,
security)
• Captured at the most granular level
• At a data quality standard (as defined by
Data Governance)
Product Setup
Cost Predictability and Transparency
• The cost profile options for cloud databases
are straightforward if you accept the defaults
for simple workloads or proof-of-concept (POC)
environments.
• Initial entry costs and inadequately scoped
environments can artificially lower
expectations of the true costs of jumping into
a cloud data warehouse environment.
• For some platforms, you pay for compute
resources as a function of time, but you also
choose the hourly rate based on the enterprise
features you need.
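The time-based pricing described in the last bullet can be sketched as a simple product of rate, cluster size, and running hours. The tier names and hourly rates below are hypothetical placeholders, not any vendor's price list. The example also shows why a lightly used POC can understate production cost.

```python
# Hypothetical hourly rates per feature tier; real vendors publish their own.
HOURLY_RATE_USD = {"standard": 2.0, "enterprise": 3.0, "business_critical": 4.0}


def monthly_compute_cost(tier: str, nodes: int, hours_running: float) -> float:
    """Compute cost = hourly rate for the chosen tier x nodes x hours running."""
    return HOURLY_RATE_USD[tier] * nodes * hours_running


# A small, lightly used POC versus a larger, always-on production environment.
poc = monthly_compute_cost("standard", nodes=2, hours_running=40)
production = monthly_compute_cost("enterprise", nodes=8, hours_running=720)
print(f"POC: ${poc:,.2f}  production: ${production:,.2f}")
```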
Cost Consciousness and Licensing Structure
• Look for cost optimizations such as not paying
when the system is idle, compression to save
storage costs, and moving or isolating
workloads to avoid contention.
• Look for the ability to operate directly on compact
open file formats such as Parquet and ORC.
• Costs can also spin out of control if you have to
pay a separate license for each deployment option
or each machine learning algorithm.
Easy Administration
• Overall cost, time, and storage and compute
resources are all affected by how simple the platform
is to configure and use.
• The platform should have embraced a self-sufficiency
model for its customers and be well into the process of
automating repetitive tasks.
• Easy administration starts with a simple setup
process that asks for basic information and offers
helpful guidance for selecting the storage and node
configurations.
Optimizer Robustness
• The data warehouse should be designed for
complex decision support and machine learning
activity in a multi-user, mixed workload, highly
concurrent environment.
• Check for conditional parallelism and what
causes variations in the parallelism
deployed.
• Check on dynamic and controllable
prioritization of resources for queries.
Dedicated Compute
• The dedicated compute category represents the heart of
the analytics stack—the data warehouse itself.
• A modern cloud data warehouse must have an
architecture that separates compute from storage.
• The power to scale compute and storage independently of
one another has transitioned from an industry trend to an
industry standard.
Dedicated Storage
• The dedicated storage category represents
storage of the enterprise data.
• Formerly, this data was tightly coupled to
the data warehouse itself, but modern cloud
architecture allows the data to be stored
separately (and priced separately).
Data Integration
• The data integration category represents the
movement of enterprise data from source to
the target data warehouse through
conventional ETL (extract-transform-load) and
ELT (extract-load-transform) methods.
Data Access
• Azure Synapse and Google BigQuery have a “serverless” pricing model that allows
users to run queries and only pay for the data they scan and not an hourly rate for
compute.
• Redshift has the Spectrum service to scan data in S3 without loading it into the data
warehouse; however, you pay for the data scanned, plus you need a running Redshift
cluster at an additional charge.
• For Snowflake, you pay for the compute, but not for data scanned. For all these
scenarios (except Snowflake), we assumed 500TB scanned per month for the
Medium-tier enterprise and 2,000TB scanned for Large organizations.
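The scan-based pricing above can be sketched as volume scanned times a price per terabyte, plus a cluster charge where one is required. The 500TB and 2,000TB volumes come from the slide. The price per TB and the cluster charge are assumptions for illustration only.

```python
# Monthly scan volumes stated in the slide.
TB_SCANNED_PER_MONTH = {"medium": 500, "large": 2000}

# Assumption for illustration only; check each vendor's current price list.
ASSUMED_USD_PER_TB_SCANNED = 5.0


def monthly_scan_cost(tier: str,
                      usd_per_tb: float = ASSUMED_USD_PER_TB_SCANNED,
                      cluster_usd_per_month: float = 0.0) -> float:
    """Pay-per-scan cost, plus an optional always-on cluster charge
    (the Redshift Spectrum case described above)."""
    return TB_SCANNED_PER_MONTH[tier] * usd_per_tb + cluster_usd_per_month


serverless_medium = monthly_scan_cost("medium")
serverless_large = monthly_scan_cost("large")
# Hypothetical cluster charge added on top of the scan charge.
with_cluster_medium = monthly_scan_cost("medium", cluster_usd_per_month=1000.0)
print(serverless_medium, serverless_large, with_cluster_medium)
```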
Data Lake
• The data lake category represents the use of a
data lake that is separate from the data warehouse.
This is common in many modern data-driven
organizations as a way to store and analyze
massive data sets of “colder” data that don’t
necessarily belong in the data warehouse.
Sample Breakout (AWS)
• Dedicated Compute: 43%
• Storage: 0%
• Data Integration: 14%
• Streaming: 4%
• Spark Analytics: 3%
• Data Exploration: 6%
• Data Lake: 20%
• BI: 5%
• Machine Learning: 5%
• Identity Management: 0%
• Data Catalog: 0%
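The shares in the sample breakout above can be checked and applied to a budget with a few lines of code. The percentages are taken from the slide. The total spend figure is a made-up assumption used only to show the arithmetic.

```python
# Category shares from the AWS sample breakout (percent of total spend).
BREAKOUT_PCT = {
    "Dedicated Compute": 43, "Storage": 0, "Data Integration": 14,
    "Streaming": 4, "Spark Analytics": 3, "Data Exploration": 6,
    "Data Lake": 20, "BI": 5, "Machine Learning": 5,
    "Identity Management": 0, "Data Catalog": 0,
}

# The rounded shares should account for the whole stack.
assert sum(BREAKOUT_PCT.values()) == 100


def allocate(total_usd: float) -> dict:
    """Split a total spend across categories according to the shares."""
    return {name: total_usd * pct / 100 for name, pct in BREAKOUT_PCT.items()}


ASSUMED_TOTAL_USD = 1_000_000.0  # hypothetical annual spend
allocation = allocate(ASSUMED_TOTAL_USD)
for name, usd in sorted(allocation.items(), key=lambda kv: -kv[1]):
    print(f"{name:<20} ${usd:>12,.2f}")
```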
Product Utilization
Concurrency Scaling
• If the database has concurrency limitations, designing
around them is difficult at best and limits effective
data usage.
• If the data warehouse automatically scales up to
overcome concurrency limitations, this may be costly if
the data warehouse charges by compute node.
• If the data warehouse charges per user, costs will also
increase as the data warehouse is put to more use in the
company.
• Look for a data warehouse to provide linear scaling in
overall query workload performance as concurrent users
are added.
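A rough way to test the linear-scaling criterion is to compare measured throughput at each concurrency level with what perfectly linear scaling from the smallest level would give. The sketch below assumes throughput is recorded as queries per hour. The measurements shown are invented for illustration.

```python
from typing import Dict


def scaling_efficiency(throughput_by_users: Dict[int, float]) -> Dict[int, float]:
    """Ratio of observed throughput to the throughput perfect linear scaling
    would give, using the smallest user count as the baseline."""
    base_users = min(throughput_by_users)
    per_user = throughput_by_users[base_users] / base_users
    return {
        users: qph / (per_user * users)
        for users, qph in sorted(throughput_by_users.items())
    }


# Made-up measurements: concurrent users -> queries completed per hour.
measured = {1: 100.0, 5: 480.0, 20: 1400.0}
efficiency = scaling_efficiency(measured)
for users, ratio in efficiency.items():
    print(f"{users:>3} users: {ratio:.0%} of linear")
```

A ratio near 1.0 at every level indicates close to linear scaling. A falling ratio shows where concurrency starts to cost performance.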
Resource Elasticity
• A data warehouse needs to be able to scale up and down and
take advantage of the elastic compute and storage
capabilities in the cloud, public or private, without disruption
or delay.
• The more the customer needs to be involved in resource
determination and provisioning, the less elastic, and less
modern, the solution is.
• One thing to watch for in elasticity scaling is keeping the
amount of money spent by the customer under the
customer’s control.
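The last point, keeping spend under the customer's control while scaling, can be sketched as a scaling decision clamped by a customer-set hourly budget. The function name, parameters, and prices below are hypothetical and do not correspond to any platform's actual controls.

```python
import math


def nodes_to_provision(requested_nodes: int,
                       node_usd_per_hour: float,
                       max_usd_per_hour: float,
                       min_nodes: int = 1) -> int:
    """Scale to the requested size, but not beyond what the customer-set
    hourly budget allows. The cluster never drops below min_nodes, even if
    that minimum alone exceeds the budget."""
    affordable = math.floor(max_usd_per_hour / node_usd_per_hour)
    return max(min_nodes, min(requested_nodes, affordable))


# The workload asks for 12 nodes, but the budget covers only 8 at $4/hour each.
print(nodes_to_provision(12, node_usd_per_hour=4.0, max_usd_per_hour=32.0))
# A smaller request fits within the budget and is granted as asked.
print(nodes_to_provision(3, node_usd_per_hour=4.0, max_usd_per_hour=32.0))
```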
Machine Learning
• Today, data warehouse query languages need to be extended to include machine
learning, or firms may find the programming required will be too challenging to keep
pace.
• Data warehouses today need to weave machine learning into their data processing
workflows.
• Vendors must accommodate and extend SQL to include machine learning functions
and algorithms to expand the capabilities of those tools and users.
• If your database does not include machine learning, there are many extra things to
be concerned with.
• Other components will be needed to complete the toolbox and get the job done.
• Ideally, security for machine learning will be the same as database security.
• The data warehouse also needs to be able to operate at scale, beyond sampling.
Data Storage Format Alternatives
• Cloud object storage is relatively inexpensive, making data storage at high
scale affordable.
• On-premises, specialized private cloud storage options such as Pure Storage
FlashBlade tend to offer similar data type storage flexibility.
• To take full advantage of the elasticity of the cloud without driving up costs,
data warehouse compute and storage need to be scaled separately.
• To take full advantage of the many data formats available, such as Apache
ORC, Apache Parquet, JSON, Apache Avro, etc., modern data warehouses
need to be able to analyze that data without moving or altering it.
• A unified analytics warehouse that supports these various data formats
means you have the benefit of querying them directly, without greatly
expanding the hierarchical complex data types to a standard tabular data
structure for analysis.
• You should also be able to import data directly from these formats.
• The ability to join data for analysis between the various internal and external
data formats provides the highest level of analytic flexibility.
Hadoop Sequence File and Parquet File
Graph Databases
[Figure: graph diagram with two vertices labeled “Bridge vertex”]
• Subject: John R Peterson Predicate: Knows Object: Frank T Smith
• Subject: Triple #1 Predicate: Confidence Percent Object: 70
• Subject: Triple #1 Predicate: Provenance Object: Mary L Jones
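The three statements above form a small example of triples in which later triples describe an earlier one. A minimal sketch of that structure follows. The `Triple` type and the `about` helper are hypothetical names chosen for illustration, not the API of any graph database.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: object   # a name, or another Triple when annotating a statement
    predicate: str
    obj: object


# Triple #1 from the slide.
knows = Triple("John R Peterson", "Knows", "Frank T Smith")

# Statements whose subject is Triple #1 itself.
metadata = [
    Triple(knows, "Confidence Percent", 70),
    Triple(knows, "Provenance", "Mary L Jones"),
]


def about(statement: Triple, triples: list) -> dict:
    """Collect every predicate/object pair asserted about a given statement."""
    return {t.predicate: t.obj for t in triples if t.subject == statement}


print(about(knows, metadata))
```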
[Figure: Data Warehouse and Data Lake positioned by usage understanding by the builders and data cultivation]
Balance of Analytics
[Figure: repeated panels showing the balance among Analytic Applications, DW, and Data Lake]
Design Your Test
• What are you benchmarking?
– Query performance
– Load performance
– Query performance with concurrency
– Ease of use
• Competition
• Queries, Schema, Data
• Scale
• Cost
• Query Cut-Off
• Number of runs/cache
• Number of nodes
• Tuning allowed
• Vendor Involvement
• Any free third party, SaaS, or on-demand software (e.g., Apigee or SQL Server)
• Any not-free third party, SaaS, or on-demand software
• Instance type of nodes
• Measure Price/Performance!
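Two items in the checklist above, the query cut-off and the number of runs/cache, determine how raw timings become a reported number. The sketch below shows one possible policy, assumed here for illustration: discard the first run as a cold-cache run, cap any run at the cut-off, and report the median. Other policies are equally defensible as long as they are applied to every competitor.

```python
from statistics import median


def summarize(timings_s: list, cutoff_s: float, discard_first: bool = True) -> dict:
    """Summarize repeated runs of one query.

    - The first run is optionally dropped as a cold-cache run.
    - Runs longer than the cut-off are capped at the cut-off and counted.
    """
    runs = timings_s[1:] if discard_first and len(timings_s) > 1 else timings_s
    capped = [min(t, cutoff_s) for t in runs]
    return {
        "median_s": median(capped),
        "cut_off_hits": sum(t > cutoff_s for t in runs),
    }


# Made-up timings in seconds for four runs of the same query.
result = summarize([12.0, 4.0, 5.0, 400.0], cutoff_s=300.0)
print(result)
```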

More Related Content

What's hot

Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
DATAVERSITY
 
ADV Slides: 2021 Trends in Enterprise Analytics
ADV Slides: 2021 Trends in Enterprise AnalyticsADV Slides: 2021 Trends in Enterprise Analytics
ADV Slides: 2021 Trends in Enterprise Analytics
DATAVERSITY
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
DATAVERSITY
 
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Mark Hewitt
 
Data-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDMData-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDM
DATAVERSITY
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling Fundamentals
DATAVERSITY
 
IDERA Slides: Managing Complex Data Environments
IDERA Slides: Managing Complex Data EnvironmentsIDERA Slides: Managing Complex Data Environments
IDERA Slides: Managing Complex Data Environments
DATAVERSITY
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
DATAVERSITY
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
DATAVERSITY
 
Slides: Enterprise Architecture vs. Data Architecture
Slides: Enterprise Architecture vs. Data ArchitectureSlides: Enterprise Architecture vs. Data Architecture
Slides: Enterprise Architecture vs. Data Architecture
DATAVERSITY
 
DAS Slides: Data Virtualization – Separating Myth from Reality
DAS Slides: Data Virtualization – Separating Myth from RealityDAS Slides: Data Virtualization – Separating Myth from Reality
DAS Slides: Data Virtualization – Separating Myth from Reality
DATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
DATAVERSITY
 
Big Data and MDM altogether: the winning association
Big Data and MDM altogether: the winning associationBig Data and MDM altogether: the winning association
Big Data and MDM altogether: the winning association
Jean-Michel Franco
 
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
Christopher Bradley
 
10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Management10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Management
ibi
 
Modern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleModern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | Qubole
Vasu S
 
Measuring Data Quality Return on Investment
Measuring Data Quality Return on InvestmentMeasuring Data Quality Return on Investment
Measuring Data Quality Return on Investment
DATAVERSITY
 
Virtual Governance in a Time of Crisis Workshop
Virtual Governance in a Time of Crisis WorkshopVirtual Governance in a Time of Crisis Workshop
Virtual Governance in a Time of Crisis Workshop
CCG
 
Slides: Beyond Metadata — Enrich Your Metadata Management with Deep-Level Dat...
Slides: Beyond Metadata — Enrich Your Metadata Management with Deep-Level Dat...Slides: Beyond Metadata — Enrich Your Metadata Management with Deep-Level Dat...
Slides: Beyond Metadata — Enrich Your Metadata Management with Deep-Level Dat...
DATAVERSITY
 
Platforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern EngineeringPlatforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern Engineering
DATAVERSITY
 

What's hot (20)

Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
 
ADV Slides: 2021 Trends in Enterprise Analytics
ADV Slides: 2021 Trends in Enterprise AnalyticsADV Slides: 2021 Trends in Enterprise Analytics
ADV Slides: 2021 Trends in Enterprise Analytics
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...
 
Data-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDMData-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDM
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling Fundamentals
 
IDERA Slides: Managing Complex Data Environments
IDERA Slides: Managing Complex Data EnvironmentsIDERA Slides: Managing Complex Data Environments
IDERA Slides: Managing Complex Data Environments
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Slides: Enterprise Architecture vs. Data Architecture
Slides: Enterprise Architecture vs. Data ArchitectureSlides: Enterprise Architecture vs. Data Architecture
Slides: Enterprise Architecture vs. Data Architecture
 
DAS Slides: Data Virtualization – Separating Myth from Reality
DAS Slides: Data Virtualization – Separating Myth from RealityDAS Slides: Data Virtualization – Separating Myth from Reality
DAS Slides: Data Virtualization – Separating Myth from Reality
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Big Data and MDM altogether: the winning association
Big Data and MDM altogether: the winning associationBig Data and MDM altogether: the winning association
Big Data and MDM altogether: the winning association
 
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
DMBOK 2.0 and other frameworks including TOGAF & COBIT - keynote from DAMA Au...
 
10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Management10 Worst Practices in Master Data Management
10 Worst Practices in Master Data Management
 
Modern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleModern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | Qubole
 
Measuring Data Quality Return on Investment
Measuring Data Quality Return on InvestmentMeasuring Data Quality Return on Investment
Measuring Data Quality Return on Investment
 
Virtual Governance in a Time of Crisis Workshop
Virtual Governance in a Time of Crisis WorkshopVirtual Governance in a Time of Crisis Workshop
Virtual Governance in a Time of Crisis Workshop
 
Slides: Beyond Metadata — Enrich Your Metadata Management with Deep-Level Dat...
Slides: Beyond Metadata — Enrich Your Metadata Management with Deep-Level Dat...Slides: Beyond Metadata — Enrich Your Metadata Management with Deep-Level Dat...
Slides: Beyond Metadata — Enrich Your Metadata Management with Deep-Level Dat...
 
Platforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern EngineeringPlatforming the Major Analytic Use Cases for Modern Engineering
Platforming the Major Analytic Use Cases for Modern Engineering
 

Similar to Using Data Platforms That Are Fit-For-Purpose

ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
DATAVERSITY
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data Lake
Denodo
 
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Denodo
 
Benefits of a data lake
Benefits of a data lake Benefits of a data lake
Benefits of a data lake
Sun Technologies
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
Denodo
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
Adaryl "Bob" Wakefield, MBA
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
Denodo
 
Achieve data democracy in data lake with data integration
Achieve data democracy in data lake with data integration Achieve data democracy in data lake with data integration
Achieve data democracy in data lake with data integration
Saurabh K. Gupta
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Denodo
 
How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?
Slim Baltagi
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Denodo
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
single store faster analytics for warehousing
single store faster analytics for warehousingsingle store faster analytics for warehousing
single store faster analytics for warehousing
ballsmcballsack
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Denodo
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
A P
 
An Overview of Data Lake
An Overview of Data LakeAn Overview of Data Lake
An Overview of Data Lake
IRJET Journal
 
Data Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified InsightsData Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified Insights
Denodo
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
Denodo
 
Big Data Case study - caixa bank
Big Data Case study - caixa bankBig Data Case study - caixa bank
Big Data Case study - caixa bank
Chungsik Yun
 

Similar to Using Data Platforms That Are Fit-For-Purpose (20)

ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data Lake
 
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
 
Benefits of a data lake
Benefits of a data lake Benefits of a data lake
Benefits of a data lake
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Achieve data democracy in data lake with data integration
Achieve data democracy in data lake with data integration Achieve data democracy in data lake with data integration
Achieve data democracy in data lake with data integration
 
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC)
 
How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
single store faster analytics for warehousing
single store faster analytics for warehousingsingle store faster analytics for warehousing
single store faster analytics for warehousing
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
 
An Overview of Data Lake
An Overview of Data LakeAn Overview of Data Lake
An Overview of Data Lake
 
Data Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified InsightsData Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified Insights
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
 
Big Data Case study - caixa bank
Big Data Case study - caixa bankBig Data Case study - caixa bank
Big Data Case study - caixa bank
 


Using Data Platforms That Are Fit-For-Purpose

  • 1. Using Data Platforms that are Fit-For-Purpose Presented by: William McKnight “#1 Global Influencer in Data Warehousing” Onalytica President, McKnight Consulting Group A 2-time Inc. 5000 Company @williammcknight www.mcknightcg.com (214) 514-1444
  • 2. © All rights reserved. Matillion 2021. A vendor perspective from Matillion: Modern Data Storage Evolutions
  • 3. Speaker: Paul Lacey, Sr. Director Product Marketing, Matillion
  • 4. History of Data Warehousing
      • 1960: DBMS
      • 1970: RDBMS
      • 1980: SQL
      • 1988: Data Warehouse
      • 1992: Published: Building the Data Warehouse
      • 1996: Published: The Data Warehouse Toolkit
      • 2005: Hadoop, Big Data
      • 2013: Cloud Data Warehouse
      • 2015: Cloud ETL
      • 2017: Lakehouse
  • 5. History of Data Warehousing (timeline figure; labels: Hadoop ‘Big Data’ (2005), Cloud ETL, Spectrum; year markers: 2011, 2013, 2014, 2015, 2017, 2019)
  • 6. Architectural Paradigms (chart; axes: Effort to Reward, Innovation)
      • 2005: Original Big Data Stack
      • 2013: Pipeline 2.0
      • 2015: Hybrid Storage
      • 2017: Lakehouse
  • 7. The Original Big Data Stack
  • 8. Pipeline 2.0
  • 9. Hybrid Storage
  • 10. ML & BI Separated and Siloed (Data Engineering vs. Data Science)
      • Wrangling: ETL vs. Data Prep
      • Storage: Data Warehouse vs. Data Lake
      • Processing: Scala vs. Pandas
      • Orchestration: Airflow vs. Jupyter Notebooks
      • Visualization: Tableau vs. Matplotlib
  • 11. The Lakehouse Approach: Combine data science and data engineering workflows
  • 12. Lakehouse: best of both worlds
      • Data Warehouse and Data Lake
      • Streaming Analytics, BI, Data Science, Machine Learning
      • Structured, Semi-Structured and Unstructured Data
  • 13. Lakehouse
  • 14. Familiar Interfaces, One Dataset (diagram labels: Analysts; Matillion ELT Logic and Orchestration; SQL; ACID; PySpark; Spark; Data Scientists; Data Engineers; Data Eng; Data Science; Integrators; Business Users)
  • 15. Accessible Integration Brings Unified Analytics
  • 16. More Info: matillion.com
  • 17. Thank You! Matillion #TeamGreen, matillion.com
  • 18. Performance
      • Performance is a critical point of interest when it comes to selecting an analytics platform.
      • To measure data warehouse performance, we use similarly priced specifications across data warehouse competitors.
      • When people say they care about performance, they usually mean the ultimate metric: price/performance.
      • Creating fair tests can overwhelm many shops, and the task is usually underestimated.
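The price/performance idea on this slide can be made concrete with a small sketch. It assumes one common reading of the metric, namely the cost to complete a benchmark workload once (hourly rate times elapsed hours); the function name, platform names, rates, and timings are all hypothetical and not taken from the deck.

```python
def price_performance(hourly_rate_usd: float, total_query_seconds: float) -> float:
    """Cost in USD to run the workload once: hourly rate x elapsed hours.

    Assumption: price/performance is read here as "cost to complete the
    benchmark"; lower is better.
    """
    if hourly_rate_usd < 0 or total_query_seconds < 0:
        raise ValueError("rate and elapsed time must be non-negative")
    return hourly_rate_usd * total_query_seconds / 3600.0


# Hypothetical platforms: (hourly rate in USD, total benchmark time in seconds).
platforms = {
    "platform_a": (32.0, 1800.0),  # pricier per hour, but faster
    "platform_b": (16.0, 4500.0),  # cheaper per hour, but slower
}

costs = {name: price_performance(rate, secs) for name, (rate, secs) in platforms.items()}
best = min(costs, key=costs.get)
```

With these made-up numbers the platform with the higher hourly rate still completes the workload at lower total cost, which is why raw speed or raw price alone can mislead.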
  • 19. The Perils of Performance Alone
      • A modern workload is less often a fixed set of queries and more often an interactive, variable number of queries.
      • A lack of certain key features and functions in the chosen platform leads to increased time spent.
      • Some data warehouse platforms have features that appear beneficial and desirable but carry hidden downsides.
  • 21. Data is fit-for-purpose (FFP) when it is…
      • In a leverageable platform
      • In an appropriate platform for its profile and usage
      • Backed by strong non-functionals (availability, performance, scalability, stability, durability, security)
      • Captured at the most granular level
      • At a data quality standard (as defined by Data Governance)
  • 23. Cost Predictability and Transparency
      • The cost profile options for cloud databases are straightforward if you accept the defaults for simple workloads or proof-of-concept (POC) environments.
      • Initial entry costs and inadequately scoped environments can artificially lower expectations of the true costs of jumping into a cloud data warehouse environment.
      • For some platforms, you pay for compute resources as a function of time, but the hourly rate also depends on the enterprise features you need.
  • 24. Cost Consciousness and Licensing Structure
      • Be on the lookout for cost optimizations like not paying when the system is idle, compression to save storage costs, and moving or isolating workloads to avoid contention.
      • Look for the ability to operate directly on the compact open file formats Parquet and ORC.
      • Costs can also spin out of control if you have to pay a separate license for each deployment option or each machine learning algorithm.
  • 25. Easy Administration
      • Overall costs, time, and storage and compute resources are all affected by how simple the platform is to configure and use.
      • The platform should have embraced a self-sufficiency model for its customers and be well into the process of automating repetitive tasks.
      • Easy administration starts with a setup process that asks for basic information and offers helpful guidance for selecting the storage and node configurations.
  • 26. Optimizer Robustness
      • The data warehouse should be designed for complex decision support and machine learning activity in a multi-user, mixed workload, highly concurrent environment.
      • Check on conditional parallelism and what causes variations in the parallelism deployed.
      • Check on dynamic and controllable prioritization of resources for queries.
  • 27. Dedicated Compute
      • The dedicated compute category represents the heart of the analytics stack: the data warehouse itself.
      • A modern cloud data warehouse must have an architecture that separates compute and storage.
      • The power to scale compute and storage independently of one another has transitioned from an industry trend to an industry standard.
  • 28. Dedicated Storage
      • The dedicated storage category represents storage of the enterprise data.
      • In former days, this data was tightly coupled to the data warehouse itself, but modern cloud architecture allows for the data to be stored separately (and priced separately).
  • 29. Data Integration
      • The data integration category represents the movement of enterprise data from source to the target data warehouse through conventional ETL (extract-transform-load) and ELT (extract-load-transform) methods.
  • 30. Data Access
      • Azure Synapse and Google BigQuery have a “serverless” pricing model that allows users to run queries and pay only for the data they scan, not an hourly rate for compute.
      • Redshift has the Spectrum service to scan data in S3 without loading it into the data warehouse; however, you pay for the data scanned, plus you need a running Redshift cluster at an additional charge.
      • For Snowflake, you pay for the compute, but not for data scanned.
      • For all these scenarios (except Snowflake), we assumed 500 TB scanned per month for the Medium-tier enterprise and 2,000 TB scanned per month for Large organizations.
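The three pricing shapes described on this slide can be compared with a short sketch. Only the 500 TB and 2,000 TB monthly scan volumes come from the slide; the per-TB price, the cluster hourly rate, the 730 hours-per-month figure, and the function name are placeholder assumptions, not vendor list prices.

```python
def monthly_access_cost(tb_scanned: float,
                        price_per_tb_usd: float,
                        cluster_hourly_usd: float = 0.0,
                        hours_per_month: float = 730.0) -> float:
    """Monthly data-access cost in USD.

    - Pay-per-scan ("serverless"): price_per_tb_usd > 0, cluster_hourly_usd == 0
    - Scan charge plus a running cluster: both rates > 0
    - Compute-only pricing: price_per_tb_usd == 0, cluster_hourly_usd > 0
    All rates passed in are hypothetical placeholders.
    """
    return tb_scanned * price_per_tb_usd + cluster_hourly_usd * hours_per_month


MEDIUM_TB, LARGE_TB = 500.0, 2000.0   # scan volumes assumed on the slide
PRICE_PER_TB = 5.0                    # placeholder, not a real list price
CLUSTER_HOURLY = 10.0                 # placeholder, not a real list price

serverless_medium = monthly_access_cost(MEDIUM_TB, PRICE_PER_TB)
serverless_large = monthly_access_cost(LARGE_TB, PRICE_PER_TB)
scan_plus_cluster_medium = monthly_access_cost(MEDIUM_TB, PRICE_PER_TB, CLUSTER_HOURLY)
compute_only = monthly_access_cost(MEDIUM_TB, 0.0, CLUSTER_HOURLY)
```

Plugging in a vendor's actual published rates in place of the placeholders would show how the scan-based component grows with volume while the cluster component stays flat.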
  • 31. Data Lake
      • The data lake category represents the use of a data lake that is separate from the data warehouse. This is common in many modern data-driven organizations as a way to store and analyze massive data sets of “colder” data that don’t necessarily belong in the data warehouse.
  • 32. Sample Breakout (AWS)
      • Dedicated Compute: 43%
      • Storage: 0%
      • Data Integration: 14%
      • Streaming: 4%
      • Spark Analytics: 3%
      • Data Exploration: 6%
      • Data Lake: 20%
      • BI: 5%
      • Machine Learning: 5%
      • Identity Management: 0%
      • Data Catalog: 0%
  • 34. Concurrency Scaling
      • If the database has concurrency limitations, designing around them is difficult at best and limits effective data usage.
      • If the data warehouse automatically scales up to overcome concurrency limitations, this may be costly if the data warehouse charges by compute node.
      • If the data warehouse charges per user, costs will also increase as the data warehouse is put to more use in the company.
      • Look for a data warehouse that provides linear scaling in overall query workload performance as concurrent users are added.
  • 35. Resource Elasticity
      • A data warehouse needs to be able to scale up and down and take advantage of the elastic compute and storage capabilities in the cloud, public or private, without disruption or delay.
      • The more the customer needs to be involved in resource determination and provisioning, the less elastic, and less modern, the solution is.
      • One thing to watch for in elastic scaling is keeping the amount of money spent under the customer’s control.
  • 36. Machine Learning
      • Today, data warehouse query languages need to be extended to include machine learning, or firms may find the programming required too challenging to keep pace.
      • Data warehouses today need to weave machine learning into their data processing workflows.
      • Vendors must accommodate and extend SQL to include machine learning functions and algorithms to expand the capabilities of those tools and users.
      • If your database does not include machine learning, there are many additional concerns.
      • Other components will be needed to complete the toolbox and get the job done.
      • Ideally, security for machine learning will be the same as database security.
      • The data warehouse also needs to be able to operate at scale, beyond sampling.
  • 37. Data Storage Format Alternatives
      • Cloud object storage is relatively inexpensive, making data storage at high scale affordable.
      • On-premises, specialized private cloud storage options such as Pure Storage FlashBlade tend to offer similar data type storage flexibility.
      • To take full advantage of the elasticity of the cloud without driving up costs, data warehouse compute and storage need to be scaled separately.
      • To take full advantage of the many data formats available, such as Apache ORC, Apache Parquet, JSON, and Apache Avro, modern data warehouses need to be able to analyze that data without moving or altering it.
      • A unified analytics warehouse that supports these various data formats lets you query them directly, without greatly expanding the hierarchical complex data types into a standard tabular data structure for analysis.
      • You should also be able to import data directly from these formats.
      • The ability to join data for analysis between the various internal and external data formats provides the highest level of analytic flexibility.
  • 38. Hadoop Sequence File and Parquet File
  • 39. Graph Databases (diagram labels: Bridge vertex, Bridge vertex)
      • Triple #1: Subject: John R Peterson; Predicate: Knows; Object: Frank T Smith
      • Subject: Triple #1; Predicate: Confidence Percent; Object: 70
      • Subject: Triple #1; Predicate: Provenance; Object: Mary L Jones
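The slide's example, where later triples use "Triple #1" as their subject, can be sketched as statements about a statement (often called reification in RDF terms). The `Triple` type and the `about` helper are hypothetical names for illustration, not part of any graph database API.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement; the subject may itself be a Triple."""
    subject: object
    predicate: str
    obj: object


# Triple #1 from the slide.
triple_1 = Triple("John R Peterson", "Knows", "Frank T Smith")

# Two further statements whose subject is Triple #1 itself.
statements: List[Triple] = [
    triple_1,
    Triple(triple_1, "Confidence Percent", 70),
    Triple(triple_1, "Provenance", "Mary L Jones"),
]


def about(all_statements: List[Triple], target: object) -> Dict[str, object]:
    """Collect predicate -> object for every statement whose subject is `target`."""
    return {s.predicate: s.obj for s in all_statements if s.subject == target}


metadata = about(statements, triple_1)
```

Treating a triple as a value that other triples can point to is what lets confidence and provenance be attached to the "Knows" relationship rather than to either person.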
  • 40. Data Warehouse and Data Lake (diagram labels: Usage Understanding by the Builders; Data Cultivation)
  • 41. Balance of Analytics (diagram: repeated panels of Analytic Applications, DW, and Data Lake)
  • 42. Design Your Test
      • What are you benchmarking?
          • Query performance
          • Load performance
          • Query performance with concurrency
          • Ease of use
      • Competition
      • Queries, Schema, Data
      • Scale
      • Cost
      • Query Cut-Off
      • Number of runs/cache
      • Number of nodes
      • Tuning allowed
      • Vendor Involvement
      • Any free third-party, SaaS, or on-demand software (e.g., Apigee or SQL Server)
      • Any not-free third-party, SaaS, or on-demand software
      • Instance type of nodes
      • Measure Price/Performance!
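Two of the checklist items above, the query cut-off and the number of runs/cache, can be illustrated with a minimal timing harness. This is a sketch under stated assumptions: `run_query` stands in for whatever callable executes a query against the platform under test, the cut-off is applied after the fact (a run that exceeds it is counted and capped, not interrupted), and dropping the first run is one possible way to handle a cold cache.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_query: Callable[[], object],
              runs: int = 3,
              cutoff_seconds: float = 60.0,
              discard_first: bool = True,
              timer: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Time `run_query` several times and summarize the results.

    Runs longer than `cutoff_seconds` are counted as cut off and capped at the
    cut-off value so a single runaway query cannot dominate the summary.
    """
    elapsed = []
    for _ in range(runs):
        start = timer()
        run_query()
        elapsed.append(timer() - start)

    # Optionally drop the first run, which may reflect a cold cache.
    kept = elapsed[1:] if discard_first and len(elapsed) > 1 else elapsed
    cut_off = sum(1 for t in kept if t > cutoff_seconds)
    capped = [min(t, cutoff_seconds) for t in kept]
    return {
        "runs_kept": len(kept),
        "cut_off": cut_off,
        "median_seconds": statistics.median(capped),
    }
```

The `timer` parameter exists only so the harness can be exercised with a fake clock; in a real test the default `time.perf_counter` would be used, and the same settings should be applied to every platform being compared.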
  • 43. Using Data Platforms that are Fit-For-Purpose Presented by: William McKnight “#1 Global Influencer in Data Warehousing” Onalytica President, McKnight Consulting Group A 2-time Inc. 5000 Company @williammcknight www.mcknightcg.com (214) 514-1444