尊敬的 微信汇率:1円 ≈ 0.046239 元 支付宝汇率:1円 ≈ 0.04633元 [退出登录]
SlideShare a Scribd company logo
Containers & AI
👸Beauty and the 👾Beast!?!
Tobias Schneck
@ tobi@kubermatic.com
@toschneck
Principal Architect
@toschneck
As a Container & Kubernes guy,
why should we care about AI?
🤨
… will it be the next big thing?!
😨
By 2028, the adoption of AI will culminate in
over 50% of cloud compute
resources devoted to AI workload, up from
less than 10% in 2023.
Gartner® states, 2023
OK … so what’s about this AI thingy?
🤔󰞵
AI Technology Layers
AI Technology Layers
Data Gathering Collect and prepare data for training AI models
Data Processing Clean and structure data for effective learning
AI Technology Layers
Data Gathering Collect and prepare data for training AI models
Data Processing Clean and structure data for effective learning
Machine Learning Algorithms learn from data patterns
Deep Learning Complex patterns learned using neural networks
Language Models Large models that understand and generate language
AI Technology Layers
Data Gathering Collect and prepare data for training AI models
Data Processing Clean and structure data for effective learning
Machine Learning Algorithms learn from data patterns
Deep Learning Complex patterns learned using neural networks
Language Models Large models that understand and generate language
Chatbot Applications Interactive systems using language models
Chat GPT A conversational agent powered by language models
AI Technology Layers
Data Gathering Collect and prepare data for training AI models
Data Processing Clean and structure data for effective learning
Machine Learning Algorithms learn from data patterns
Deep Learning Complex patterns learned using neural networks
Language Models Large models that understand and generate language
Chatbot Applications Interactive systems using language models
Chat GPT A conversational agent powered by language models
API / UI
AI Technology Layers
Data Gathering Collect and prepare data for training AI models
Data Processing Clean and structure data for effective learning
Machine Learning Algorithms learn from data patterns
Deep Learning Complex patterns learned using neural networks
Language Models Large models that understand and generate language
Chatbot Applications Interactive systems using language models
Chat GPT A conversational agent powered by language models
API / UI
How to
manage?
��
… a lot of Data and Math for an
Infrastructure guy 🧐
… how does such data get compute?
🖥 💽
Credits to Andrej Karpathy 👉 Awesome Intro to LLMs
[1hr Talk] Intro to Large Language Models
Large Language Model (LLM)
Credits to Andrej Karpathy 👏
Training them is more involved.
Think of it like compressing the internet.
Credits to Andrej Karpathy 👏
How does it work?
Credits to Andrej Karpathy 👏
Little is known in full detail….
● Billions of parameters are dispersed through the network
● We know how to iteratively adjust them to make it better at prediction
● We can measure that this works, but we don’t really know how the billions of
parameters collaborate to do it.
They build and maintain some kind of knowledge database,
but it is a bit strange and imperfect:
Recent viral example: “reversal curse”
Q: “Who is Tom Cruise’s mother”?
A: Mary Lee Pfeiffer ✅
Q: “Who is Mary Lee Pfeiffer’s son?”
A: I don’t know ❌
⇒ Think of LLMs as mostly inscrutable artifacts,
develop correspondingly sophisticated evaluations
Summary: how to train your ChatGPT
Credits to Andrej Karpathy 👏
Stage 1: Pretraining
1. Download ∼10TB of text.
2. Get a cluster of ∼6,000 GPUs.
3. Compress the text into a neutral network, pay ∼$2M, wait ∼12 days
4. Obtain a base model.
Stage 2: Finetuning
1. Write labeling instructions
2. Hire people (our use scale.ai!), collect 100K high quality ideal
Q&A responses, and/or comparisons.
3. Finetune base model on this data, wait ∼1 day.
4. Obtain assistant model.
5. Run a lot of evaluations.
6. Deploy.
7. Monitor, collect misbehaviors, go to step 1.
Credits to Andrej Karpathy 👏
Every
∼year
Every
∼week
Summary: how to train your ChatGPT
Credits to Andrej Karpathy 👏
Stage 1: Pretraining
1. Download ∼10TB of text.
2. Get a cluster of ∼6,000 GPUs.
3. Compress the text into a neutral network, pay ∼$2M, wait ∼12 days
4. Obtain a base model.
Stage 2: Finetuning
1. Write labeling instructions
2. Hire people (our use scale.ai!), collect 100K high quality ideal
Q&A responses, and/or comparisons.
3. Finetune base model on this data, wait ∼1 day.
4. Obtain assistant model.
5. Run a lot of evaluations.
6. Deploy.
7. Monitor, collect misbehaviors, go to step 1.
Credits to Andrej Karpathy 👏
Every
∼year
Every
∼week
How does our normal Job look like?
󰻶
Platform
Orchestration / Scheduling
Infrastructure
Cloud or On-prem
Hardware
Accelerators
Platform Engineering
CPU Network Storage
Hardware Architect
SRE / Operations
Platform Engineer
Based on Adel Zaalouk (@ZaNetworker) drawings from the CNCF Cloud Native AI white paper 🙏
What will change in our Infra?
🏗
Based on Adel Zaalouk (@ZaNetworker) drawings from the CNCF Cloud Native AI white paper 🙏
Artificial
Intelligence
(AI)
Machine
Learning
(ML)
Deep
Learning
(DL)
Math & Statistics
Exploratory Data
Analysis (EDA)
Visualization
Data
Science
Artificial
Intelligence
(AI)
Machine
Learning
(ML)
Deep
Learning
(DL)
Math & Statistics
Exploratory Data
Analysis (EDA)
Visualization
Data
Science
Cloud Native?
Based on Adel Zaalouk (@ZaNetworker) drawings from the CNCF Cloud Native AI white paper 🙏
Artificial
Intelligence
(AI)
Machine
Learning
(ML)
Deep
Learning
(DL)
Math & Statistics
Exploratory Data
Analysis (EDA)
Visualization
Data
Science
Cloud Native?
Platform
Orchestration / Scheduling
Platform Engineering
Platform Engineer
AI Platform 🤔
So, Why Kube ?
Flexibility & Standardization
Standard
Container
High-Cube
Container
Hardtop
Container
Open Top
Container
Flat Platform (Plat) Ventilated
Container
Cooling Container Bulk Container
Tank
Container
Container Types
Data Center I
Infrastructure Layer
Standardization with Kubernetes
App Services
IT Space I IT Space II IT Space III
Backend Services
DB Services Analytics Observability
Data Center II
Data Center III
Edge
Cloud Providers
Caches AI / ML DDoS
Protect
Managed
Services
Real Time
Analysis
Intelligent
Edge Devices
Smart
Automation
Data
Processing
Kube for AI ⇔ Kube for Applications
● Kube is a de facto standard as “cloud operating systems”
● API abstraction layer for multiple types of network, storage, and compute resources
● Standard interfaces for support of DevOps best practices like GitOps
● Variation of Cloud Providers and Services are consumable via standard APIs
… and OpenAI uses Kube already 2017 🤯
So …. 😅
How to build an AI Platform?
🏗 🤔
Cloud Native!
Data Preparation Feature Store
Model Development
ML Train / Tune
Model Storage Model Serving
ML Engineer
Cloud Native!
Data Preparation Feature Store
Model Development
ML Train / Tune
Model Storage Model Serving
Repeat the
Process
ML Engineer
Cloud Native!
Data
Preparation
Feature
Store
Model
Development ML
Train / Tune
Model
Storage
Model
Serving
Repeat the
Process
Platform
Orchestration / Scheduling
Platform Engineering
Platform Engineer
AI Platform 🤔
ML Engineer
Support during Orchestration and easy to use for Data!
Cloud Native!
Data
Preparation
Feature
Store
Model
Development ML
Train / Tune
Model
Storage
Model
Serving
Repeat the
Process
Platform
Orchestration / Scheduling
Platform Engineering
Platform Engineer
AI Platform 🤔
ML Engineer
Cloud Native!
Data
Preparation
Feature
Store
Model Development
ML Train / Tune
Model
Storage
Model
Serving
Repeat the
Process
Platform
Orchestration / Scheduling
Platform Engineering
Platform Engineer
AI Platform 🤔
ML
Engineer
Application
Development
Application
QA
Application
Rollout
CI / CD
Cloud Native!
Data
Preparation
Feature
Store
Model Development
ML Train / Tune
Model
Storage
Model
Serving
Repeat the
Process
Platform
Orchestration / Scheduling
Platform Engineering
Platform Engineer
AI Platform 🤔
ML
Engineer
Application
Development
Application
QA
Application
Rollout
CI / CD
Cloud Native AI Apps
Consume
Get
managed by
Platform
Orchestration / Scheduling
Infrastructure
Cloud or On-prem
Hardware
Accelerators Hardware Architect
SRE / Operations
Platform Engineer
Data/ML/AI
Engineer
Data-Scientist
/ Developer
Workloads
Models, applications, ….
ML Lifecycle
(AI/ML/LLM Ops)
CPU GPU NPU TPU DPU
Data Prep Model Training Model Serving Perf / Scale Observe
CI CD
Classification Object Detection
Clustering Forecasting ….
RAGs LLMs
Vector DBs LVMs ….
Predictive Generative
Cloud Native AI - ©CNCF White Paper
Artificial
Intelligence
(AI)
Machine
Learning
(ML)
Deep
Learning
(DL)
Math &
Statistics
Exploratory
Data Analysis
(EDA)
Visualization
Data
Science
How does the ecosystem look
like!?
🛒 🛍
AI Frameworks
Popular Frameworks by GitHub Stars
TensorFlow: 181.3K
PyTorch: 99.2K
Keras: 54.3K
Scikit-learn: 49.1K
Caffe: 33.7K
Options to Adapt the Frameworks 󰞵
Feature
Focused Tools
(local/on-prem)
Examples
MLflow, Backyard AI, Ollama,
Hugging Face TGI
Scope
Specific functionalities
within ML lifecycle
Open Source Yes
Scalability & Portability Moderate
Setup & Management Simpler
Portability Mostly Machine based
Vendor Lock-in No
Options to Adapt the Frameworks 󰞵
Feature
Focused Tools
(local/on-prem)
Managed Platforms
Examples
MLflow, Backyard AI, Ollama,
Hugging Face TGI
AWS SageMaker, scale.ai
Scope
Specific functionalities
within ML lifecycle
Managed MLOps service
Open Source Yes No
Scalability & Portability Moderate Depends on cloud provider
Setup & Management Simpler Simpler
Portability Mostly Machine based Mostly Cloud
Vendor Lock-in No Yes (to specific cloud provider)
Options to Adapt the Frameworks 󰞵
Feature
Focused Tools
(local/on-prem)
Managed Platforms “Kube Native”
Examples
MLflow, Backyard AI, Ollama,
Hugging Face TGI
AWS SageMaker, scale.ai
Kubeflow / KServer
(Hugging Face TGI / LocalAI)
Scope
Specific functionalities
within ML lifecycle
Managed MLOps service End-to-end MLOps platform
Open Source Yes No Yes
Scalability & Portability Moderate Depends on cloud provider High
Setup & Management Simpler Simpler Complex
Portability Mostly Machine based Mostly Cloud Everywhere
Vendor Lock-in No Yes (to specific cloud provider) No
AI Frameworks
Popular Frameworks by GitHub Stars
TensorFlow: 181.3K
PyTorch: 99.2K
Keras: 54.3K
Scikit-learn: 49.1K
Caffe: 33.7K
Could use
KubeFlow ⁉
👉 Currently the most feature
complete choice for Kube
🥴 But Setup is complex!
KubeFlow ⁉
The Beauty 👸:
● Incubating CNCF Project
● Serving AI Platform in Multi-Tenancy
● Popularity 13.7k ⭐ ~ long-term Maintenance Chance
● Alternatives like MLflow / KServe are integrated
The Beast 👾
● Mostly vendor specific installer instructions 🥴
○ No maintained automated installer for generic Kubernetes
○ Helm chart issue #3173
● Dependency “hell”
○ A lot of different 3rd party dependencies constraints
○ Hard to adapt again to existing company defaults
● Only support EOL Kubernetes <= 1.26❗
○ Usability is then questionable in production
Sounds good, but what about
on-prem / offline cases?
🤔
[Cloud] Data Center I
GPU / TPU Powered Services
based on Argo CD
AI Model Serving
[AI] Application Service
Application
⚙ Separate Model Training / Model Usage Example
Infrastructure Layer
Data Center II
Data Center III
Edge
Cloud Providers
Real Time
Analysis
Intelligent
Edge Devices
Smart
Automation
Data
Processing
Data Delivery
Model
Export
Local AI
Consume
Scale for
Training
Vanilla Setup
Starting a POC
󰳘
github.com/toschneck/kubernetes-and-ai
Kubeflow | Katib Architecture for Hyperparameter
Tuning (aka optimization run)
KubeFlow Management UI
Serve trained Models “local” with LocalAI
Ask localAI about CloudLand
Ask localAI about KCD Istanbul
Any Questions?
THANKS FOR JOINING!
kubermatic
@toschneck
tobi@kubermatic.com

More Related Content

Similar to Containers & AI - Beauty and the Beast!?!

GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
James Anderson
 
Datapalooza: A Music Festival Themed ML & IoT Workshop
Datapalooza: A Music Festival Themed ML & IoT WorkshopDatapalooza: A Music Festival Themed ML & IoT Workshop
Datapalooza: A Music Festival Themed ML & IoT Workshop
Amazon Web Services
 
Microsoft AI Platform Overview
Microsoft AI Platform OverviewMicrosoft AI Platform Overview
Microsoft AI Platform Overview
David Chou
 
Dato Keynote
Dato KeynoteDato Keynote
Dato Keynote
Turi, Inc.
 
Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016
MLconf
 
DataPalooza - A Music Festival themed ML + IoT Workshop
DataPalooza - A Music Festival themed ML + IoT WorkshopDataPalooza - A Music Festival themed ML + IoT Workshop
DataPalooza - A Music Festival themed ML + IoT Workshop
Amazon Web Services
 
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Andrew Ly
 
Democratize ai with google cloud
Democratize ai with google cloudDemocratize ai with google cloud
Democratize ai with google cloud
Henrik Hammer Eliassen
 
DataPalooza: ML & IoT Workshop
DataPalooza: ML & IoT WorkshopDataPalooza: ML & IoT Workshop
DataPalooza: ML & IoT Workshop
Amazon Web Services
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
Databricks
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Demi Ben-Ari
 
AIoT: Intelligence on Microcontroller
AIoT: Intelligence on MicrocontrollerAIoT: Intelligence on Microcontroller
AIoT: Intelligence on Microcontroller
Andri Yadi
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
zekeLabs Technologies
 
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
SeokJin Han
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
Databricks
 
Intel 20180608 v2
Intel 20180608 v2Intel 20180608 v2
Intel 20180608 v2
ISSIP
 
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Codemotion
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
HostedbyConfluent
 
Cloud Study Jam[1st OCT] gdscgtbit.pptx
Cloud Study Jam[1st OCT] gdscgtbit.pptxCloud Study Jam[1st OCT] gdscgtbit.pptx
Cloud Study Jam[1st OCT] gdscgtbit.pptx
GDSCGTBIT
 
Microsoft AI Platform - AETHER Introduction
Microsoft AI Platform - AETHER IntroductionMicrosoft AI Platform - AETHER Introduction
Microsoft AI Platform - AETHER Introduction
Karthik Murugesan
 

Similar to Containers & AI - Beauty and the Beast!?! (20)

GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Datapalooza: A Music Festival Themed ML & IoT Workshop
Datapalooza: A Music Festival Themed ML & IoT WorkshopDatapalooza: A Music Festival Themed ML & IoT Workshop
Datapalooza: A Music Festival Themed ML & IoT Workshop
 
Microsoft AI Platform Overview
Microsoft AI Platform OverviewMicrosoft AI Platform Overview
Microsoft AI Platform Overview
 
Dato Keynote
Dato KeynoteDato Keynote
Dato Keynote
 
Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016
 
DataPalooza - A Music Festival themed ML + IoT Workshop
DataPalooza - A Music Festival themed ML + IoT WorkshopDataPalooza - A Music Festival themed ML + IoT Workshop
DataPalooza - A Music Festival themed ML + IoT Workshop
 
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
 
Democratize ai with google cloud
Democratize ai with google cloudDemocratize ai with google cloud
Democratize ai with google cloud
 
DataPalooza: ML & IoT Workshop
DataPalooza: ML & IoT WorkshopDataPalooza: ML & IoT Workshop
DataPalooza: ML & IoT Workshop
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
 
AIoT: Intelligence on Microcontroller
AIoT: Intelligence on MicrocontrollerAIoT: Intelligence on Microcontroller
AIoT: Intelligence on Microcontroller
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
A dive into Microsoft Strategy on Machine Learning, Chat Bot, and Artificial ...
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 
Intel 20180608 v2
Intel 20180608 v2Intel 20180608 v2
Intel 20180608 v2
 
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
 
Cloud Study Jam[1st OCT] gdscgtbit.pptx
Cloud Study Jam[1st OCT] gdscgtbit.pptxCloud Study Jam[1st OCT] gdscgtbit.pptx
Cloud Study Jam[1st OCT] gdscgtbit.pptx
 
Microsoft AI Platform - AETHER Introduction
Microsoft AI Platform - AETHER IntroductionMicrosoft AI Platform - AETHER Introduction
Microsoft AI Platform - AETHER Introduction
 

More from Tobias Schneck

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Kubernetes in the Manufacturing Line @KubeCon EU Valencia 2022
Kubernetes in the Manufacturing Line  @KubeCon EU Valencia 2022 Kubernetes in the Manufacturing Line  @KubeCon EU Valencia 2022
Kubernetes in the Manufacturing Line @KubeCon EU Valencia 2022
Tobias Schneck
 
$ kubectl hacking @DevOpsCon Berlin 2019
$ kubectl hacking @DevOpsCon Berlin 2019$ kubectl hacking @DevOpsCon Berlin 2019
$ kubectl hacking @DevOpsCon Berlin 2019
Tobias Schneck
 
Will ARM be the new Mainstream in our Data Centers? @Rejekts Paris 2024
 Will ARM be the new Mainstream in our Data Centers? @Rejekts Paris 2024 Will ARM be the new Mainstream in our Data Centers? @Rejekts Paris 2024
Will ARM be the new Mainstream in our Data Centers? @Rejekts Paris 2024
Tobias Schneck
 
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...
Tobias Schneck
 
ClusterAPI Overview - Managing multi-cloud Kubernetes Clusters - k8s Meetup@v...
ClusterAPI Overview - Managing multi-cloud Kubernetes Clusters - k8s Meetup@v...ClusterAPI Overview - Managing multi-cloud Kubernetes Clusters - k8s Meetup@v...
ClusterAPI Overview - Managing multi-cloud Kubernetes Clusters - k8s Meetup@v...
Tobias Schneck
 
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes Meetup
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes MeetupCreating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes Meetup
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes Meetup
Tobias Schneck
 
KubeCI - Cloud Native Continuous Delivery for Kubernetes
KubeCI - Cloud Native Continuous Delivery for KubernetesKubeCI - Cloud Native Continuous Delivery for Kubernetes
KubeCI - Cloud Native Continuous Delivery for Kubernetes
Tobias Schneck
 
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...
Tobias Schneck
 
UI Testing - Selenium? Rich-Clients? Containers? (SwanseaCon 2018)
UI Testing - Selenium? Rich-Clients? Containers? (SwanseaCon 2018)UI Testing - Selenium? Rich-Clients? Containers? (SwanseaCon 2018)
UI Testing - Selenium? Rich-Clients? Containers? (SwanseaCon 2018)
Tobias Schneck
 
Creating Kubernetes multi clusters with ClusterAPI in the Hetzner Cloud
Creating Kubernetes multi clusters with ClusterAPI in the Hetzner CloudCreating Kubernetes multi clusters with ClusterAPI in the Hetzner Cloud
Creating Kubernetes multi clusters with ClusterAPI in the Hetzner Cloud
Tobias Schneck
 
OpenShift Build Pipelines @ Lightweight Java User Group Meetup
OpenShift Build Pipelines @ Lightweight Java User Group MeetupOpenShift Build Pipelines @ Lightweight Java User Group Meetup
OpenShift Build Pipelines @ Lightweight Java User Group Meetup
Tobias Schneck
 
OpenShift-Build-Pipelines: Build -> Test -> Run! @JavaForumStuttgart
OpenShift-Build-Pipelines: Build -> Test -> Run! @JavaForumStuttgartOpenShift-Build-Pipelines: Build -> Test -> Run! @JavaForumStuttgart
OpenShift-Build-Pipelines: Build -> Test -> Run! @JavaForumStuttgart
Tobias Schneck
 
OpenShift-Build-Pipelines: Build ► Test ► Run!
OpenShift-Build-Pipelines: Build ► Test ► Run!OpenShift-Build-Pipelines: Build ► Test ► Run!
OpenShift-Build-Pipelines: Build ► Test ► Run!
Tobias Schneck
 
Kotlin for backend development (Hackaburg 2018 Regensburg)
Kotlin for backend development (Hackaburg 2018 Regensburg)Kotlin for backend development (Hackaburg 2018 Regensburg)
Kotlin for backend development (Hackaburg 2018 Regensburg)
Tobias Schneck
 
UI-Testing - Selenium? Rich-Clients? Containers? @APEX connect 2018
UI-Testing - Selenium? Rich-Clients? Containers? @APEX connect 2018UI-Testing - Selenium? Rich-Clients? Containers? @APEX connect 2018
UI-Testing - Selenium? Rich-Clients? Containers? @APEX connect 2018
Tobias Schneck
 
Continuous Testing: Integration- und UI-Testing mit OpenShift-Build-Pipelines
Continuous Testing: Integration- und UI-Testing mit OpenShift-Build-PipelinesContinuous Testing: Integration- und UI-Testing mit OpenShift-Build-Pipelines
Continuous Testing: Integration- und UI-Testing mit OpenShift-Build-Pipelines
Tobias Schneck
 
Testing - Selenium? Rich-Clients? Containers?
Testing - Selenium? Rich-Clients? Containers?Testing - Selenium? Rich-Clients? Containers?
Testing - Selenium? Rich-Clients? Containers?
Tobias Schneck
 
OOP2017: Containerized End-2-End Testing – automate it!
OOP2017: Containerized End-2-End Testing – automate it!OOP2017: Containerized End-2-End Testing – automate it!
OOP2017: Containerized End-2-End Testing – automate it!
Tobias Schneck
 
Containerized End-2-End Testing - Agile Testing Meetup at Süddeutsche Zeitung...
Containerized End-2-End Testing - Agile Testing Meetup at Süddeutsche Zeitung...Containerized End-2-End Testing - Agile Testing Meetup at Süddeutsche Zeitung...
Containerized End-2-End Testing - Agile Testing Meetup at Süddeutsche Zeitung...
Tobias Schneck
 

More from Tobias Schneck (20)

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Kubernetes in the Manufacturing Line @KubeCon EU Valencia 2022
Kubernetes in the Manufacturing Line  @KubeCon EU Valencia 2022 Kubernetes in the Manufacturing Line  @KubeCon EU Valencia 2022
Kubernetes in the Manufacturing Line @KubeCon EU Valencia 2022
 
$ kubectl hacking @DevOpsCon Berlin 2019
$ kubectl hacking @DevOpsCon Berlin 2019$ kubectl hacking @DevOpsCon Berlin 2019
$ kubectl hacking @DevOpsCon Berlin 2019
 
Will ARM be the new Mainstream in our Data Centers? @Rejekts Paris 2024
 Will ARM be the new Mainstream in our Data Centers? @Rejekts Paris 2024 Will ARM be the new Mainstream in our Data Centers? @Rejekts Paris 2024
Will ARM be the new Mainstream in our Data Centers? @Rejekts Paris 2024
 
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...
Kubermatic How to Migrate 100 Clusters from On-Prem to Google Cloud Without D...
 
ClusterAPI Overview - Managing multi-cloud Kubernetes Clusters - k8s Meetup@v...
ClusterAPI Overview - Managing multi-cloud Kubernetes Clusters - k8s Meetup@v...ClusterAPI Overview - Managing multi-cloud Kubernetes Clusters - k8s Meetup@v...
ClusterAPI Overview - Managing multi-cloud Kubernetes Clusters - k8s Meetup@v...
 
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes Meetup
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes MeetupCreating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes Meetup
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes Meetup
 
KubeCI - Cloud Native Continuous Delivery for Kubernetes
KubeCI - Cloud Native Continuous Delivery for KubernetesKubeCI - Cloud Native Continuous Delivery for Kubernetes
KubeCI - Cloud Native Continuous Delivery for Kubernetes
 
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...
 
UI Testing - Selenium? Rich-Clients? Containers? (SwanseaCon 2018)
UI Testing - Selenium? Rich-Clients? Containers? (SwanseaCon 2018)UI Testing - Selenium? Rich-Clients? Containers? (SwanseaCon 2018)
UI Testing - Selenium? Rich-Clients? Containers? (SwanseaCon 2018)
 
Creating Kubernetes multi clusters with ClusterAPI in the Hetzner Cloud
Creating Kubernetes multi clusters with ClusterAPI in the Hetzner CloudCreating Kubernetes multi clusters with ClusterAPI in the Hetzner Cloud
Creating Kubernetes multi clusters with ClusterAPI in the Hetzner Cloud
 
OpenShift Build Pipelines @ Lightweight Java User Group Meetup
OpenShift Build Pipelines @ Lightweight Java User Group MeetupOpenShift Build Pipelines @ Lightweight Java User Group Meetup
OpenShift Build Pipelines @ Lightweight Java User Group Meetup
 
OpenShift-Build-Pipelines: Build -> Test -> Run! @JavaForumStuttgart
OpenShift-Build-Pipelines: Build -> Test -> Run! @JavaForumStuttgartOpenShift-Build-Pipelines: Build -> Test -> Run! @JavaForumStuttgart
OpenShift-Build-Pipelines: Build -> Test -> Run! @JavaForumStuttgart
 
OpenShift-Build-Pipelines: Build ► Test ► Run!
OpenShift-Build-Pipelines: Build ► Test ► Run!OpenShift-Build-Pipelines: Build ► Test ► Run!
OpenShift-Build-Pipelines: Build ► Test ► Run!
 
Kotlin for backend development (Hackaburg 2018 Regensburg)
Kotlin for backend development (Hackaburg 2018 Regensburg)Kotlin for backend development (Hackaburg 2018 Regensburg)
Kotlin for backend development (Hackaburg 2018 Regensburg)
 
UI-Testing - Selenium? Rich-Clients? Containers? @APEX connect 2018
UI-Testing - Selenium? Rich-Clients? Containers? @APEX connect 2018UI-Testing - Selenium? Rich-Clients? Containers? @APEX connect 2018
UI-Testing - Selenium? Rich-Clients? Containers? @APEX connect 2018
 
Continuous Testing: Integration- und UI-Testing mit OpenShift-Build-Pipelines
Continuous Testing: Integration- und UI-Testing mit OpenShift-Build-PipelinesContinuous Testing: Integration- und UI-Testing mit OpenShift-Build-Pipelines
Continuous Testing: Integration- und UI-Testing mit OpenShift-Build-Pipelines
 
Testing - Selenium? Rich-Clients? Containers?
Testing - Selenium? Rich-Clients? Containers?Testing - Selenium? Rich-Clients? Containers?
Testing - Selenium? Rich-Clients? Containers?
 
OOP2017: Containerized End-2-End Testing – automate it!
OOP2017: Containerized End-2-End Testing – automate it!OOP2017: Containerized End-2-End Testing – automate it!
OOP2017: Containerized End-2-End Testing – automate it!
 
Containerized End-2-End Testing - Agile Testing Meetup at Süddeutsche Zeitung...
Containerized End-2-End Testing - Agile Testing Meetup at Süddeutsche Zeitung...Containerized End-2-End Testing - Agile Testing Meetup at Süddeutsche Zeitung...
Containerized End-2-End Testing - Agile Testing Meetup at Süddeutsche Zeitung...
 

Recently uploaded

A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
Cyber Recovery Wargame
Cyber Recovery WargameCyber Recovery Wargame
Cyber Recovery Wargame
Databarracks
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudRadically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
ScyllaDB
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
AlexanderRichford
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
CTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database MigrationCTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database Migration
ScyllaDB
 
Day 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data ManipulationDay 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data Manipulation
UiPathCommunity
 
So You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental DowntimeSo You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental Downtime
ScyllaDB
 
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc
 
An Introduction to All Data Enterprise Integration
An Introduction to All Data Enterprise IntegrationAn Introduction to All Data Enterprise Integration
An Introduction to All Data Enterprise Integration
Safe Software
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
FilipTomaszewski5
 
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
anilsa9823
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB
 

Recently uploaded (20)

A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
Cyber Recovery Wargame
Cyber Recovery WargameCyber Recovery Wargame
Cyber Recovery Wargame
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudRadically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
CTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database MigrationCTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database Migration
 
Day 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data ManipulationDay 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data Manipulation
 
So You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental DowntimeSo You've Lost Quorum: Lessons From Accidental Downtime
So You've Lost Quorum: Lessons From Accidental Downtime
 
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
 
An Introduction to All Data Enterprise Integration
An Introduction to All Data Enterprise IntegrationAn Introduction to All Data Enterprise Integration
An Introduction to All Data Enterprise Integration
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
 

Containers & AI - Beauty and the Beast!?!

  • 1. Containers & AI 👸Beauty and the 👾Beast!?!
  • 3. As a Container & Kubernes guy, why should we care about AI? 🤨
  • 4. … will it be the next big thing?! 😨
  • 5. By 2028, the adoption of AI will culminate in over 50% of cloud compute resources devoted to AI workload, up from less than 10% in 2023. Gartner® states, 2023
  • 6. OK … so what’s about this AI thingy? 🤔󰞵
  • 8. AI Technology Layers Data Gathering Collect and prepare data for training AI models Data Processing Clean and structure data for effective learning
  • 9. AI Technology Layers Data Gathering Collect and prepare data for training AI models Data Processing Clean and structure data for effective learning Machine Learning Algorithms learn from data patterns Deep Learning Complex patterns learned using neural networks Language Models Large models that understand and generate language
  • 10. AI Technology Layers Data Gathering Collect and prepare data for training AI models Data Processing Clean and structure data for effective learning Machine Learning Algorithms learn from data patterns Deep Learning Complex patterns learned using neural networks Language Models Large models that understand and generate language Chatbot Applications Interactive systems using language models Chat GPT A conversational agent powered by language models
  • 11. AI Technology Layers Data Gathering Collect and prepare data for training AI models Data Processing Clean and structure data for effective learning Machine Learning Algorithms learn from data patterns Deep Learning Complex patterns learned using neural networks Language Models Large models that understand and generate language Chatbot Applications Interactive systems using language models Chat GPT A conversational agent powered by language models API / UI
  • 12. AI Technology Layers Data Gathering Collect and prepare data for training AI models Data Processing Clean and structure data for effective learning Machine Learning Algorithms learn from data patterns Deep Learning Complex patterns learned using neural networks Language Models Large models that understand and generate language Chatbot Applications Interactive systems using language models Chat GPT A conversational agent powered by language models API / UI How to manage? ��
  • 13. … a lot of Data and Math for an Infrastructure guy 🧐 … how does such data get compute? 🖥 💽
  • 14. Credits to Andrej Karpathy 👉 Awesome Intro to LLMs [1hr Talk] Intro to Large Language Models
  • 15. Large Language Model (LLM) Credits to Andrej Karpathy 👏
  • 16. Training them is more involved. Think of it like compressing the internet. Credits to Andrej Karpathy 👏
  • 17. How does it work? Credits to Andrej Karpathy 👏 Little is known in full detail…. ● Billions of parameters are dispersed through the network ● We know how to iteratively adjust them to make it better at prediction ● We can measure that this works, but we don’t really know how the billions of parameters collaborate to do it. They build and maintain some kind of knowledge database, but it is a bit strange and imperfect: Recent viral example: “reversal curse” Q: “Who is Tom Cruise’s mother”? A: Mary Lee Pfeiffer ✅ Q: “Who is Mary Lee Pfeiffer’s son?” A: I don’t know ❌ ⇒ Think of LLMs as mostly inscrutable artifacts, develop correspondingly sophisticated evaluations
  • 18. Summary: how to train your ChatGPT Credits to Andrej Karpathy 👏 Stage 1: Pretraining 1. Download ∼10TB of text. 2. Get a cluster of ∼6,000 GPUs. 3. Compress the text into a neutral network, pay ∼$2M, wait ∼12 days 4. Obtain a base model. Stage 2: Finetuning 1. Write labeling instructions 2. Hire people (our use scale.ai!), collect 100K high quality ideal Q&A responses, and/or comparisons. 3. Finetune base model on this data, wait ∼1 day. 4. Obtain assistant model. 5. Run a lot of evaluations. 6. Deploy. 7. Monitor, collect misbehaviors, go to step 1. Credits to Andrej Karpathy 👏 Every ∼year Every ∼week
  • 19. Summary: how to train your ChatGPT Credits to Andrej Karpathy 👏 Stage 1: Pretraining 1. Download ∼10TB of text. 2. Get a cluster of ∼6,000 GPUs. 3. Compress the text into a neutral network, pay ∼$2M, wait ∼12 days 4. Obtain a base model. Stage 2: Finetuning 1. Write labeling instructions 2. Hire people (our use scale.ai!), collect 100K high quality ideal Q&A responses, and/or comparisons. 3. Finetune base model on this data, wait ∼1 day. 4. Obtain assistant model. 5. Run a lot of evaluations. 6. Deploy. 7. Monitor, collect misbehaviors, go to step 1. Credits to Andrej Karpathy 👏 Every ∼year Every ∼week
  • 20. How does our normal Job look like? 󰻶
  • 21. Platform Orchestration / Scheduling Infrastructure Cloud or On-prem Hardware Accelerators Platform Engineering CPU Network Storage Hardware Architect SRE / Operations Platform Engineer Based on Adel Zaalouk (@ZaNetworker) drawings from the CNCF Cloud Native AI white paper 🙏
  • 22. What will change in our Infra? 🏗
  • 23. Based on Adel Zaalouk (@ZaNetworker) drawings from the CNCF Cloud Native AI white paper 🙏 Artificial Intelligence (AI) Machine Learning (ML) Deep Learning (DL) Math & Statistics Exploratory Data Analysis (EDA) Visualization Data Science
  • 24. Artificial Intelligence (AI) Machine Learning (ML) Deep Learning (DL) Math & Statistics Exploratory Data Analysis (EDA) Visualization Data Science Cloud Native? Based on Adel Zaalouk (@ZaNetworker) drawings from the CNCF Cloud Native AI white paper 🙏
  • 25. Artificial Intelligence (AI) Machine Learning (ML) Deep Learning (DL) Math & Statistics Exploratory Data Analysis (EDA) Visualization Data Science Cloud Native? Platform Orchestration / Scheduling Platform Engineering Platform Engineer AI Platform 🤔
  • 27. Flexibility & Standardization Standard Container High-Cube Container Hardtop Container Open Top Container Flat Platform (Plat) Ventilated Container Cooling Container Bulk Container Tank Container Container Types
  • 28. Data Center I Infrastructure Layer Standardization with Kubernetes App Services IT Space I IT Space II IT Space III Backend Services DB Services Analytics Observability Data Center II Data Center III Edge Cloud Providers Caches AI / ML DDoS Protect Managed Services Real Time Analysis Intelligent Edge Devices Smart Automation Data Processing
  • 29. Kube for AI ⇔ Kube for Applications ● Kube is a de facto standard as “cloud operating systems” ● API abstraction layer for multiple types of network, storage, and compute resources ● Standard interfaces for support of DevOps best practices like GitOps ● Variation of Cloud Providers and Services are consumable via standard APIs
  • 30. … and OpenAI uses Kube already 2017 🤯
  • 31. So …. 😅 How to build an AI Platform? 🏗 🤔
  • 32. Cloud Native! Data Preparation Feature Store Model Development ML Train / Tune Model Storage Model Serving ML Engineer
  • 33. Cloud Native! Data Preparation Feature Store Model Development ML Train / Tune Model Storage Model Serving Repeat the Process ML Engineer
  • 34. Cloud Native! Data Preparation Feature Store Model Development ML Train / Tune Model Storage Model Serving Repeat the Process Platform Orchestration / Scheduling Platform Engineering Platform Engineer AI Platform 🤔 ML Engineer Support during Orchestration and easy to use for Data!
  • 35. Cloud Native! Data Preparation Feature Store Model Development ML Train / Tune Model Storage Model Serving Repeat the Process Platform Orchestration / Scheduling Platform Engineering Platform Engineer AI Platform 🤔 ML Engineer
  • 36. Cloud Native! Data Preparation Feature Store Model Development ML Train / Tune Model Storage Model Serving Repeat the Process Platform Orchestration / Scheduling Platform Engineering Platform Engineer AI Platform 🤔 ML Engineer Application Development Application QA Application Rollout CI / CD
  • 37. Cloud Native! Data Preparation Feature Store Model Development ML Train / Tune Model Storage Model Serving Repeat the Process Platform Orchestration / Scheduling Platform Engineering Platform Engineer AI Platform 🤔 ML Engineer Application Development Application QA Application Rollout CI / CD Cloud Native AI Apps Consume Get managed by
  • 38. Platform Orchestration / Scheduling Infrastructure Cloud or On-prem Hardware Accelerators Hardware Architect SRE / Operations Platform Engineer Data/ML/AI Engineer Data-Scientist / Developer Workloads Models, applications, …. ML Lifecycle (AI/ML/LLM Ops) CPU GPU NPU TPU DPU Data Prep Model Training Model Serving Perf / Scale Observe CI CD Classification Object Detection Clustering Forecasting …. RAGs LLMs Vector DBs LVMs …. Predictive Generative Cloud Native AI - ©CNCF White Paper Artificial Intelligence (AI) Machine Learning (ML) Deep Learning (DL) Math & Statistics Exploratory Data Analysis (EDA) Visualization Data Science
  • 39. How does the ecosystem look like!? 🛒 🛍
  • 40. AI Frameworks Popular Frameworks by GitHub Stars TensorFlow: 181.3K PyTorch: 99.2K Keras: 54.3K Scikit-learn: 49.1K Caffe: 33.7K
  • 41. Options to Adapt the Frameworks 󰞵 Feature Focused Tools (local/on-prem) Examples MLflow, Backyard AI, Ollama, Hugging Face TGI Scope Specific functionalities within ML lifecycle Open Source Yes Scalability & Portability Moderate Setup & Management Simpler Portability Mostly Machine based Vendor Lock-in No
  • 42. Options to Adapt the Frameworks 󰞵 Feature Focused Tools (local/on-prem) Managed Platforms Examples MLflow, Backyard AI, Ollama, Hugging Face TGI AWS SageMaker, scale.ai Scope Specific functionalities within ML lifecycle Managed MLOps service Open Source Yes No Scalability & Portability Moderate Depends on cloud provider Setup & Management Simpler Simpler Portability Mostly Machine based Mostly Cloud Vendor Lock-in No Yes (to specific cloud provider)
  • 43. Options to Adapt the Frameworks 󰞵 Feature Focused Tools (local/on-prem) Managed Platforms “Kube Native” Examples MLflow, Backyard AI, Ollama, Hugging Face TGI AWS SageMaker, scale.ai Kubeflow / KServer (Hugging Face TGI / LocalAI) Scope Specific functionalities within ML lifecycle Managed MLOps service End-to-end MLOps platform Open Source Yes No Yes Scalability & Portability Moderate Depends on cloud provider High Setup & Management Simpler Simpler Complex Portability Mostly Machine based Mostly Cloud Everywhere Vendor Lock-in No Yes (to specific cloud provider) No
  • 44. AI Frameworks Popular Frameworks by GitHub Stars TensorFlow: 181.3K PyTorch: 99.2K Keras: 54.3K Scikit-learn: 49.1K Caffe: 33.7K Could use
  • 45. KubeFlow ⁉ 👉 Currently the most feature complete choice for Kube 🥴 But Setup is complex!
  • 46. KubeFlow ⁉ The Beauty 👸: ● Incubating CNCF Project ● Serving AI Platform in Multi-Tenancy ● Popularity 13.7k ⭐ ~ long-term Maintenance Chance ● Alternatives like MLflow / KServe are integrated The Beast 👾 ● Mostly vendor specific installer instructions 🥴 ○ No maintained automated installer for generic Kubernetes ○ Helm chart issue #3173 ● Dependency “hell” ○ A lot of different 3rd party dependencies constraints ○ Hard to adapt again to existing company defaults ● Only support EOL Kubernetes <= 1.26❗ ○ Usability is then questionable in production
  • 47. Sounds good, but what about on-prem / offline cases? 🤔
  • 48. [Cloud] Data Center I GPU / TPU Powered Services based on Argo CD AI Model Serving [AI] Application Service Application ⚙ Separate Model Training / Model Usage Example Infrastructure Layer Data Center II Data Center III Edge Cloud Providers Real Time Analysis Intelligent Edge Devices Smart Automation Data Processing Data Delivery Model Export Local AI Consume Scale for Training Vanilla Setup
  • 50. Kubeflow | Katib Architecture for Hyperparameter Tuning (aka optimization run)
  • 52. Serve trained Models “local” with LocalAI
  • 53. Ask localAI about CloudLand
  • 54. Ask localAI about KCD Istanbul
  翻译: