Continuous Intelligence: Keeping your AI Application in Production
Arif Wider
Emily Gorcenski
NDC Porto, April 23-24, 2020
7000+ technologists with 43 offices in 14 countries
Partner for technology driven business transformation
Barcelona - Madrid - London - Manchester - Berlin - Hamburg - Munich - Cologne
#1 in Agile and Continuous Delivery
100+ books written
©ThoughtWorks 2020
TECHNIQUES
Continuous delivery for machine learning (CD4ML) models
#8, TRIAL
WHERE IT ALL STARTED
Data Scientists
Developers
prediction model
Feedback, requests
SOUNDS LIKE DEV/OPS?
LET’S DO CONTINUOUS DELIVERY!
CONTINUOUS DELIVERY PIPELINES
Prediction Model Pipeline
Web Application Pipeline
NEW CHALLENGES
→ How to version control training data?
→ Training data and prediction models don’t fit into Git :-(
→ Model re-training slows down the entire continuous delivery server!
→ Data scientists want to evaluate several solutions at the same time...
→ ...and they use analytics notebooks which are hard to version control!
→ How to unit test data science code that is tied to changing data?
→ How to prevent behaviour changes of the model from breaking the application?
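One common way to approach the first two challenges is to keep large data and model files out of Git and commit only a small pointer file that records a content hash. The following is a minimal sketch of that idea, not the tooling used in the talk; the function names `fingerprint` and `write_pointer` and the pointer file layout are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path, pointer_path: Path) -> dict:
    """Write a small JSON pointer file that is cheap to commit to Git."""
    pointer = {
        "file": data_path.name,
        "sha256": fingerprint(data_path),
        "size_bytes": data_path.stat().st_size,
    }
    pointer_path.write_text(json.dumps(pointer, indent=2, sort_keys=True))
    return pointer
```

The data file itself would live in some external storage; only the pointer is versioned, so any change to the training data shows up as a changed hash in the commit history.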
WHAT ARE THE REASONS?
Continuous Delivery is the ability to get
changes of all types — including new features,
configuration changes, bug fixes and
experiments — into production, or into the
hands of users, safely and quickly in a
sustainable way.
Jez Humble & Dave Farley
MANY SOURCES OF CHANGE
Data + Model + Code
Data: Schema, Sampling over Time, Volume, ...
Model: Research, Experiments; Training on New Data; Performance; ...
Code: New Features, Bug Fixes, Dependencies, ...
Icons created by Noura Mbarki and I Putu Kharismayadi from Noun Project
NDC Porto (Virtual!), April 23, 2020
Code
Code is a broad concept. It can refer to
the source code of our services and
systems. It can represent our
infrastructure. It can refer to the code
used to access, transform, or prepare
data. It can refer to the code used to
train, validate, and assess models.
HOW DOES CODE
CHANGE VERSION?
Bug fixes
New features
Dependencies
Operating systems
Scope change
Etc.
This axis is the best-understood axis of
version change for machine learning
systems.
Model
Model here refers to both the modeling
approach (e.g. random forest
classification) and the actual artifact
produced (e.g. a pkl file). A model is a
product of hyperparameters (not varied
during training) and parameters
(computed during training).
HOW DOES A MODEL
CHANGE VERSION?
Research — trying new approaches
Dependencies
New data
Performance improvements
Etc.
This axis is not as well understood, and
machine learning as a field is changing so
rapidly that many models can be expected
to be obsolete in short order.
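The distinction between hyperparameters, parameters, and the serialized artifact can be sketched with a toy model. This is an illustrative example only: the "mean-threshold" approach, the `train` and `predict` helpers, and the dict layout are assumptions, not anything shown in the talk.

```python
import pickle


def train(xs, scale=1.5):
    """Fit a toy threshold model and return it as a plain, picklable dict."""
    mean = sum(xs) / len(xs)
    return {
        "approach": "mean-threshold",         # the modeling approach
        "hyperparameters": {"scale": scale},  # fixed before training
        "parameters": {"mean": mean},         # computed from the training data
    }


def predict(model, x):
    """Predict 1 when x exceeds scale * learned mean, else 0."""
    threshold = model["hyperparameters"]["scale"] * model["parameters"]["mean"]
    return int(x > threshold)


model = train([1.0, 2.0, 3.0])
artifact = pickle.dumps(model)     # bytes one might write to a .pkl file
restored = pickle.loads(artifact)  # the artifact round-trips unchanged
```

Retraining on new data changes the parameters (and therefore the artifact) even when the approach and hyperparameters stay the same, which is one way a model changes version.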
Data
DATA IS MORE COMPLICATED
Random Sample by Becris from the Noun Project, Puzzle by
shashank singh from the Noun Project
Sampling Schema
Data is an assessment of the state of the universe at a point in time. It has a measurement and a shape.
Sampling
Sampling refers to the values in the data
themselves. We can’t measure everything,
but what we do measure is a sample of
the universe. These values have
properties like support, distribution,
cardinality, etc.
HOW DOES SAMPLING
CHANGE VERSION?
Time—it marches ever on
Extrinsic changes in the universe
Inserts, Deletions, Updates
Etc.
It is very difficult to specify or detect shifts
in “versions” here except by example. We
often think of this as moving forward with
time, but versions are not necessarily
time-dependent.
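Detecting a sampling shift "by example" can be sketched as comparing a reference sample against a current one. The sketch below is a rough illustration under stated assumptions: the `describe` and `drifted` helpers are hypothetical, the summary is limited to support, cardinality, and mean, and the tolerance of 0.25 is an arbitrary choice.

```python
def describe(values):
    """Summarize a numeric sample: support (min, max), cardinality, mean."""
    return {
        "support": (min(values), max(values)),
        "cardinality": len(set(values)),
        "mean": sum(values) / len(values),
    }


def drifted(reference, current, tolerance=0.25):
    """Flag a shift relative to the reference sample.

    A shift is reported when the mean moves by more than `tolerance`
    times the reference range, or when values fall outside the
    reference support.
    """
    ref, cur = describe(reference), describe(current)
    low, high = ref["support"]
    spread = (high - low) or 1.0  # avoid dividing by zero for constant samples
    mean_shift = abs(cur["mean"] - ref["mean"]) / spread
    out_of_support = cur["support"][0] < low or cur["support"][1] > high
    return mean_shift > tolerance or out_of_support
```

A real system would likely compare full distributions with a statistical test; the point here is only that a "version" of the sampling is defined relative to an example, not to a timestamp.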
Schema
Schema refers to the shape of our data.
This can be a database schema, e.g. a
table/column/constraint definition, or it
could just refer to the shape of the incoming
data (e.g. JSON fields, XML schema, etc).
HOW DOES SCHEMA
CHANGE VERSION?
Software updates
Requirements changes
Migrations
Etc.
It is very difficult to specify or detect shifts
in “versions” here except by example. We
often think of this as moving forward with
time, but versions are not necessarily
time-dependent.
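For incoming JSON data, a schema change can be made visible by comparing the shape of two example records. This is a minimal sketch that only looks at top-level fields; the helper names `shape_of` and `schema_changes` and the example fields (`price`, `make`, `year`, `mileage`) are hypothetical.

```python
import json


def shape_of(record):
    """Map each top-level field of a JSON object to its Python type name."""
    return {key: type(value).__name__ for key, value in sorted(record.items())}


def schema_changes(old, new):
    """Return added, removed, and retyped fields between two shapes."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }


v1 = shape_of(json.loads('{"price": 9500, "make": "VW", "year": 2012}'))
v2 = shape_of(json.loads('{"price": "9500", "make": "VW", "mileage": 81000}'))
changes = schema_changes(v1, v2)
```

Here `changes` reports `mileage` as added, `year` as removed, and `price` as retyped (integer to string), the kind of drift a software update or migration upstream can introduce.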
ML SYSTEMS HAVE FOUR “VERSION AXES”
Any data-driven service can experience version drift along any of these axes independently or jointly.
Schema
Sampling
Model
Code
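One way to make the four axes explicit is to record a version identifier for each of them alongside every deployment and compare records to see where drift occurred. The sketch below assumes a hypothetical `SystemVersion` record and `drifted_axes` helper; the identifier formats are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SystemVersion:
    """One version identifier per axis; the formats are assumptions."""
    schema: str
    sampling: str
    model: str
    code: str


def drifted_axes(deployed: SystemVersion, candidate: SystemVersion) -> list:
    """List the axes on which two version records differ."""
    return [
        f.name
        for f in fields(deployed)
        if getattr(deployed, f.name) != getattr(candidate, f.name)
    ]
```

Comparing two records then shows whether a change happened along one axis alone or along several jointly, for example new data and a retrained model with unchanged schema and code.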
DIFFERENT WORKFLOWS
DEVELOPERS: master
DATA SCIENTISTS: master, change-max-depth, try-random-forest
MORE TYPES OF PIPELINES
DATA PIPELINE
ML PIPELINE
DEPLOYMENT PIPELINE
PUTTING EVERYTHING TOGETHER
Stages: Data Science, Model Building; Model Evaluation; Productionize Model; Integration Testing; Deployment; Monitoring
Artifacts: Training Data; Source Code + Executables; Test Data; Model + parameters; Production Data
Underlying: CD Tools and Repositories; Discoverable and Accessible Data
WHAT DO WE NEED IN OUR STACK?
Doing CD with Machine Learning is still a hard problem
MODEL PERFORMANCE ASSESSMENT
VERSION CONTROL AND ARTIFACT REPOSITORIES
MONITORING AND OBSERVABILITY
DISCOVERABLE AND ACCESSIBLE DATA
CONTINUOUS DELIVERY ORCHESTRATION TO COMBINE PIPELINES
INFRASTRUCTURE FOR MULTIPLE ENVIRONMENTS AND EXPERIMENTS
DEMO
Data Scientist
Develops ML Model
Test and productionize the model
Deploy to production servers
Application
YOUR OBJECTIVE FUNCTION MUST LINK TO
BUSINESS VALUE
THANK YOU!
Arif Wider
awider@thoughtworks.com
Emily Gorcenski
egorcens@thoughtworks.com
join.thoughtworks.com

More Related Content

What's hot

The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsThe Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
Denodo
 
Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7
Denodo
 
Minimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationMinimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data Virtualization
Denodo
 
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
Data Con LA
 
Data Virtualization: From Zero to Hero (Middle East)
Data Virtualization: From Zero to Hero (Middle East)Data Virtualization: From Zero to Hero (Middle East)
Data Virtualization: From Zero to Hero (Middle East)
Denodo
 
Multi-Cloud-Datenintegration mit Datenvirtualisierung
Multi-Cloud-Datenintegration mit DatenvirtualisierungMulti-Cloud-Datenintegration mit Datenvirtualisierung
Multi-Cloud-Datenintegration mit Datenvirtualisierung
Denodo
 
Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to Hero
Denodo
 
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo
 
The Virtualization of Clouds - The New Enterprise Data Architecture Opportunity
The Virtualization of Clouds - The New Enterprise Data Architecture OpportunityThe Virtualization of Clouds - The New Enterprise Data Architecture Opportunity
The Virtualization of Clouds - The New Enterprise Data Architecture Opportunity
Denodo
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An Introduction
Denodo
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Denodo
 
Datamesh community meetup 28th jan 2021
Datamesh community meetup 28th jan 2021Datamesh community meetup 28th jan 2021
Datamesh community meetup 28th jan 2021
Prasad Prabhakaran
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
Denodo
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Databricks
 
Why Data Virtualization? An Introduction.
Why Data Virtualization? An Introduction.Why Data Virtualization? An Introduction.
Why Data Virtualization? An Introduction.
Denodo
 
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dataconomy Media
 
Denodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data VirtualizationDenodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo
 
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT IntegrationDenodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
HostedbyConfluent
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Denodo
 

What's hot (20)

The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsThe Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
 
Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7
 
Minimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationMinimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data Virtualization
 
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 
Data Virtualization: From Zero to Hero (Middle East)
Data Virtualization: From Zero to Hero (Middle East)Data Virtualization: From Zero to Hero (Middle East)
Data Virtualization: From Zero to Hero (Middle East)
 
Multi-Cloud-Datenintegration mit Datenvirtualisierung
Multi-Cloud-Datenintegration mit DatenvirtualisierungMulti-Cloud-Datenintegration mit Datenvirtualisierung
Multi-Cloud-Datenintegration mit Datenvirtualisierung
 
Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to Hero
 
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
 
The Virtualization of Clouds - The New Enterprise Data Architecture Opportunity
The Virtualization of Clouds - The New Enterprise Data Architecture OpportunityThe Virtualization of Clouds - The New Enterprise Data Architecture Opportunity
The Virtualization of Clouds - The New Enterprise Data Architecture Opportunity
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An Introduction
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
 
Datamesh community meetup 28th jan 2021
Datamesh community meetup 28th jan 2021Datamesh community meetup 28th jan 2021
Datamesh community meetup 28th jan 2021
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
 
Why Data Virtualization? An Introduction.
Why Data Virtualization? An Introduction.Why Data Virtualization? An Introduction.
Why Data Virtualization? An Introduction.
 
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
 
Denodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data VirtualizationDenodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data Virtualization
 
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT IntegrationDenodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 

Similar to Continuous Intelligence: Keeping your AI Application in Production

Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
IRJET Journal
 
CD4ML and the challenges of testing and quality in ML systems
CD4ML and the challenges of testing and quality in ML systemsCD4ML and the challenges of testing and quality in ML systems
CD4ML and the challenges of testing and quality in ML systems
Seldon
 
CAST Imaging: Map & Master Your Software
CAST Imaging: Map & Master Your SoftwareCAST Imaging: Map & Master Your Software
CAST Imaging: Map & Master Your Software
Neo4j
 
Production machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scaleProduction machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scale
Alex Housley
 
Legacy Migration Overview
Legacy Migration OverviewLegacy Migration Overview
Legacy Migration Overview
Bambordé Baldé
 
Legacy Migration
Legacy MigrationLegacy Migration
Legacy Migration
WORPCLOUD LTD
 
IBM Think Milano
IBM Think MilanoIBM Think Milano
IBM Think Milano
ATMOSPHERE .
 
The REMICS model-driven process for migrating legacy applications to the cloud
The REMICS model-driven process for migrating legacy applications to the cloudThe REMICS model-driven process for migrating legacy applications to the cloud
The REMICS model-driven process for migrating legacy applications to the cloud
Marcos Almeida
 
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
John Archer
 
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdfDagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
Hong Ong
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
Denodo
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
Massimiliano Di Penta
 
DataOps: Control-M's role in data pipeline orchestration
DataOps: Control-M's role in data pipeline orchestrationDataOps: Control-M's role in data pipeline orchestration
DataOps: Control-M's role in data pipeline orchestration
pzjnjr6rsg
 
From Model-based to Model and Simulation-based Systems Architectures
From Model-based to Model and Simulation-based Systems ArchitecturesFrom Model-based to Model and Simulation-based Systems Architectures
From Model-based to Model and Simulation-based Systems Architectures
Obeo
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
Pouria Amirian
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
Pouria Amirian
 
Multi datastores - CLOSER'14
Multi datastores - CLOSER'14Multi datastores - CLOSER'14
Multi datastores - CLOSER'14
Marcos Almeida
 
Notes on Deploying Machine-learning Models at Scale
Notes on Deploying Machine-learning Models at ScaleNotes on Deploying Machine-learning Models at Scale
Notes on Deploying Machine-learning Models at Scale
Deep Kayal
 
Engineering 4.0: Digitization through task automation and reuse
Engineering 4.0:  Digitization through task automation and reuseEngineering 4.0:  Digitization through task automation and reuse
Engineering 4.0: Digitization through task automation and reuse
CARLOS III UNIVERSITY OF MADRID
 
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
DataScienceConferenc1
 

Similar to Continuous Intelligence: Keeping your AI Application in Production (20)

Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
 
CD4ML and the challenges of testing and quality in ML systems
CD4ML and the challenges of testing and quality in ML systemsCD4ML and the challenges of testing and quality in ML systems
CD4ML and the challenges of testing and quality in ML systems
 
CAST Imaging: Map & Master Your Software
CAST Imaging: Map & Master Your SoftwareCAST Imaging: Map & Master Your Software
CAST Imaging: Map & Master Your Software
 
Production machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scaleProduction machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scale
 
Legacy Migration Overview
Legacy Migration OverviewLegacy Migration Overview
Legacy Migration Overview
 
Legacy Migration
Legacy MigrationLegacy Migration
Legacy Migration
 
IBM Think Milano
IBM Think MilanoIBM Think Milano
IBM Think Milano
 
The REMICS model-driven process for migrating legacy applications to the cloud
The REMICS model-driven process for migrating legacy applications to the cloudThe REMICS model-driven process for migrating legacy applications to the cloud
The REMICS model-driven process for migrating legacy applications to the cloud
 
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
 
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdfDagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
DataOps: Control-M's role in data pipeline orchestration
DataOps: Control-M's role in data pipeline orchestrationDataOps: Control-M's role in data pipeline orchestration
DataOps: Control-M's role in data pipeline orchestration
 
From Model-based to Model and Simulation-based Systems Architectures
From Model-based to Model and Simulation-based Systems ArchitecturesFrom Model-based to Model and Simulation-based Systems Architectures
From Model-based to Model and Simulation-based Systems Architectures
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
 
Multi datastores - CLOSER'14
Multi datastores - CLOSER'14Multi datastores - CLOSER'14
Multi datastores - CLOSER'14
 
Notes on Deploying Machine-learning Models at Scale
Notes on Deploying Machine-learning Models at ScaleNotes on Deploying Machine-learning Models at Scale
Notes on Deploying Machine-learning Models at Scale
 
Engineering 4.0: Digitization through task automation and reuse
Engineering 4.0:  Digitization through task automation and reuseEngineering 4.0:  Digitization through task automation and reuse
Engineering 4.0: Digitization through task automation and reuse
 
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
 

More from Dr. Arif Wider

Data Mesh - It's not about technology, it's about people
Data Mesh - It's not about technology, it's about peopleData Mesh - It's not about technology, it's about people
Data Mesh - It's not about technology, it's about people
Dr. Arif Wider
 
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...
Dr. Arif Wider
 
Continuous Intelligence: Moving Machine Learning into Production Reliably
Continuous Intelligence: Moving Machine Learning into Production ReliablyContinuous Intelligence: Moving Machine Learning into Production Reliably
Continuous Intelligence: Moving Machine Learning into Production Reliably
Dr. Arif Wider
 
Continuous Intelligence: Keeping your AI Application in Production
Continuous Intelligence: Keeping your AI Application in ProductionContinuous Intelligence: Keeping your AI Application in Production
Continuous Intelligence: Keeping your AI Application in Production
Dr. Arif Wider
 
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & AnalyticsDataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
Dr. Arif Wider
 
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & AnalyticsDataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
Dr. Arif Wider
 
DataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices
DataDevOps - A Manifesto on Shared Data Responsibility in Times of MicroservicesDataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices
DataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices
Dr. Arif Wider
 
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...
Dr. Arif Wider
 
A High-Performance Solution to Microservice UI Composition @ XConf Hamburg
A High-Performance Solution to Microservice UI Composition @ XConf HamburgA High-Performance Solution to Microservice UI Composition @ XConf Hamburg
A High-Performance Solution to Microservice UI Composition @ XConf Hamburg
Dr. Arif Wider
 
An Unexpected Solution to Microservices UI Composition
An Unexpected Solution to Microservices UI CompositionAn Unexpected Solution to Microservices UI Composition
An Unexpected Solution to Microservices UI Composition
Dr. Arif Wider
 

Continuous Intelligence: Keeping your AI Application in Production

  • 1. Continuous Intelligence: Keeping your AI Application in Production. Arif Wider, Emily Gorcenski. NDC Porto, April 23-24, 2020
  • 2. 7000+ technologists with 43 offices in 14 countries. Partner for technology-driven business transformation. Barcelona - Madrid - London - Manchester - Berlin - Hamburg - Munich - Cologne
  • 3. #1 in Agile and Continuous Delivery. 100+ books written. ©ThoughtWorks 2020
  • 5. TECHNIQUES: Continuous delivery for machine learning (CD4ML) models. #8, TRIAL
  • 13. CONTINUOUS DELIVERY PIPELINES: Prediction Model Pipeline; Web Application Pipeline
  • 15. NEW CHALLENGES
      → How to version control training data?
      → Training data and prediction models don’t fit into Git :-(
      → Model re-training slows down the entire continuous delivery server!
      → Data scientists want to evaluate several solutions at the same time...
      → ...and they use analytics notebooks, which are hard to version control!
      → How to unit test data science code that is tied to changing data?
      → How to prevent behaviour changes of the model from breaking the application?
  • 16. WHAT ARE THE REASONS?
  • 17. “Continuous Delivery is the ability to get changes of all types, including new features, configuration changes, bug fixes and experiments, into production, or into the hands of users, safely and quickly in a sustainable way.” (Jez Humble & Dave Farley)
  • 18. MANY SOURCES OF CHANGE: Data + Model + Code. Data: Schema; Sampling over Time; Volume; ... Model: Research, Experiments; Training on New Data; Performance; ... Code: New Features; Bug Fixes; Dependencies; ... Icons created by Noura Mbarki and I Putu Kharismayadi from Noun Project. ©ThoughtWorks 2020, NDC Porto (Virtual!), April 23, 2020
  • 19. Code: Code is a broad concept. It can refer to the source code of our services and systems. It can represent our infrastructure. It can refer to the code used to access, transform, or prepare data. It can refer to the code used to train, validate, and assess models. HOW DOES CODE CHANGE VERSION? Bug fixes; new features; dependencies; operating systems; scope change; etc. This axis is the best-understood axis of version change for machine learning systems.
  • 20. Model: Model here refers to both the modeling approach (e.g. random forest classification) and the actual artifact produced (e.g. a pkl file). A model is a product of hyperparameters (not varied during training) and parameters (computed during training). HOW DOES A MODEL CHANGE VERSION? Research (trying new approaches); dependencies; new data; performance improvements; etc. This axis is not as well understood, and machine learning as a field is changing so rapidly that many models can be expected to be obsolete in short order.
  • 21. Data: DATA IS MORE COMPLICATED
  • 22. Data: DATA IS MORE COMPLICATED. Two aspects: Sampling and Schema. Data is an assessment of the state of the universe at a point in time. It has a measurement and a shape. Icons: Random Sample by Becris from the Noun Project; Puzzle by shashank singh from the Noun Project.
  • 23. Sampling: Sampling refers to the values in the data themselves. We can’t measure everything, but what we do measure is a sample of the universe. These values have properties like support, distribution, cardinality, etc. HOW DOES SAMPLING CHANGE VERSION? Time (it marches ever on); extrinsic changes in the universe; inserts, deletions, updates; etc. It is very difficult to specify or detect shifts in “versions” here except by example. We often think of this as moving forward with time, but versions are not necessarily time-dependent.
  • 24. Schema: Schema refers to the shape of our data. This can be a database schema, e.g. a table/column/constraint definition, or it could just refer to the shape of the incoming data (e.g. JSON fields, XML schema, etc.). HOW DOES SCHEMA CHANGE VERSION? Software updates; requirements changes; migrations; etc. It is very difficult to specify or detect shifts in “versions” here except by example. We often think of this as moving forward with time, but versions are not necessarily time-dependent.
  • 25. ML SYSTEMS HAVE FOUR “VERSION AXES”: Schema, Sampling, Model, Code. Any data-driven service can experience version drift along any of these axes independently or jointly.
  • 28. MORE TYPES OF PIPELINES: DATA PIPELINE; ML PIPELINE; DEPLOYMENT PIPELINE
  • 29. PUTTING EVERYTHING TOGETHER. Diagram labels: Data Science, Model Building; Training Data; Source Code + Executables; Model Evaluation; Productionize Model; Integration Testing; Deployment; Test Data; Model + parameters; CD Tools and Repositories; Discoverable and Accessible Data; Monitoring; Production Data
  • 30. WHAT DO WE NEED IN OUR STACK? Doing CD with Machine Learning is still a hard problem. MODEL PERFORMANCE ASSESSMENT; VERSION CONTROL AND ARTIFACT REPOSITORIES; MONITORING AND OBSERVABILITY; DISCOVERABLE AND ACCESSIBLE DATA; CONTINUOUS DELIVERY ORCHESTRATION TO COMBINE PIPELINES; INFRASTRUCTURE FOR MULTIPLE ENVIRONMENTS AND EXPERIMENTS
  • 33. DEMO: Data Scientist Develops ML Model; Test and productionize the model; Deploy to production servers; Application
  • 35. YOUR OBJECTIVE FUNCTION MUST LINK TO BUSINESS VALUE
  • 36. THANK YOU! Arif Wider, awider@thoughtworks.com. Emily Gorcenski, egorcens@thoughtworks.com. join.thoughtworks.com
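
Slides 23 and 24 note that shifts along the sampling and schema axes are hard to specify except by example. The following is a minimal sketch of such an example-based check, not the speakers' tooling: it infers a schema from sample records, reports fields that were added, removed, or changed type, and summarises a numeric column by support, cardinality, and mean. The function names (`schema_of`, `schema_drift`, `sampling_summary`) and the vehicle-price records are hypothetical and chosen only for illustration.

```python
from statistics import mean


def schema_of(records):
    """Map each field name to the set of Python type names observed for it."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


def schema_drift(old, new):
    """Report fields that were added, removed, or changed type between two schemas."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(f for f in shared if old[f] != new[f]),
    }


def sampling_summary(values):
    """Summarise a numeric column: support (min, max), cardinality, and mean."""
    return {
        "support": (min(values), max(values)),
        "cardinality": len(set(values)),
        "mean": mean(values),
    }


# Hypothetical batches of incoming records (illustrative data only).
last_week = [{"price": 100, "make": "vw"}, {"price": 120, "make": "bmw"}]
this_week = [{"price": "130", "make": "vw", "year": 2019}]

drift = schema_drift(schema_of(last_week), schema_of(this_week))
summary = sampling_summary([r["price"] for r in last_week])
print(drift)
print(summary)
```

A check like this could run as a step in a data pipeline and fail the build when `drift` is non-empty; comparing two `sampling_summary` results is a much cruder signal and would normally need a tolerance chosen for the data at hand.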
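
The four version axes on slide 25 (schema, sampling, model, code) can be made concrete by recording one fingerprint per axis and comparing two snapshots. This is a sketch under stated assumptions rather than an established scheme: the `SystemVersion` record, the `fingerprint` helper, and the byte strings standing in for real schema definitions, data files, model artifacts, and source trees are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, fields, replace


def fingerprint(payload: bytes) -> str:
    """Short content hash used as a version identifier for one axis."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class SystemVersion:
    """One fingerprint per version axis (hypothetical record, for illustration)."""
    schema: str
    sampling: str
    model: str
    code: str


def changed_axes(before: SystemVersion, after: SystemVersion):
    """List the axes whose fingerprints differ, in declaration order."""
    return [
        f.name
        for f in fields(SystemVersion)
        if getattr(before, f.name) != getattr(after, f.name)
    ]


# Placeholder byte strings stand in for real artifacts.
v1 = SystemVersion(
    schema=fingerprint(b"price:int,make:str"),
    sampling=fingerprint(b"100,vw\n120,bmw\n"),
    model=fingerprint(b"random-forest:n_estimators=50"),
    code=fingerprint(b"def train(): ..."),
)

# New rows arrived and the model was retrained; schema and code are unchanged.
v2 = replace(
    v1,
    sampling=fingerprint(b"100,vw\n120,bmw\n95,audi\n"),
    model=fingerprint(b"random-forest:n_estimators=100"),
)

print(changed_axes(v1, v2))
```

Because each axis is hashed separately, drift along one axis (for example new data without a code change) shows up on its own, which matches the slide's point that the axes can move independently or jointly.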