What’s coming in Apache Airflow 2.0
Polidea
Apache Airflow
Airflow is a platform to programmatically author,
schedule and monitor workflows.
Dynamic/Elegant
Extensible
Scalable
What’s on today?
What is the presentation about?
● The team @ Polidea
● What the Airflow?
● Where is Apache Airflow now?
● What’s coming in Apache Airflow 2.0.
Team @ Polidea
Hi!
Jarek Potiuk
Principal Software Engineer @Polidea
Apache Airflow PMC member
Certified GCP Architect
ex-Googler, ex-CTO, ex-choir member
@higrys
Apache Airflow Development team @ Polidea
Jarek Potiuk Kamil Breguła Tomasz Urbaszek Karolina Rosół
Dariusz Aniszewski Szymon Przedwojski Antoni Smoliński
Tobiasz Kędzierski Michał Słowikowski
PMC
Past:
Apache Airflow Website team @ Polidea
Kamil Breguła Zuzanna Rykowska Kamil Gabryjelski
Magdalena Węgrzyńska Marta Strzałkowska Tomasz Urbaszek
Team @Polidea
● 70+ talents
● 100+ projects delivered
● 3m users of our apps
● 75% of business through referrals
Polidea & Apache Airflow
Timeline
● August 2018: 2 people
● December 2019: 6 (9) people
Our tasks
● 130+ operators
● 18+ GCP services
● Oozie-To-Airflow
● New Apache Airflow Website
What we delivered extra
● Documentation improvements
● Breeze - improved dev environment
● Py2 -> Py3
● Pylint compatibility
● Pre-commit framework introduction
● CI environment reimplemented
● Operator scaffolding
● Convert tests to pytest
2 Apache Airflow Committers
Apache Airflow PMC member
Open-source friendly company
Apache Airflow
Why Apache Airflow and not one of these?
And many, many, many more...
Airflow is an Orchestrator
● Tells others what/when to do
● Synchronizes work between others
● Monitors what’s going on
● Intervenes if needed
● Mostly does not do much
Airflow is Python
Arbitrarily complex workflows as a program
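To make the "workflows as a program" idea concrete, here is a minimal sketch in plain Python. It is not Airflow's DAG API: the task names, the `build_workflow` helper and the table list are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the scheduler's dependency resolution. The point is only that a loop can generate the tasks and their dependencies.

```python
from graphlib import TopologicalSorter


def build_workflow(tables):
    """Return a mapping of task name -> set of upstream task names.

    The tasks are generated in a loop, which is what "workflow as a
    program" buys you over a static configuration file.
    """
    deps = {"start": set()}
    for table in tables:
        deps[f"extract_{table}"] = {"start"}
        deps[f"load_{table}"] = {f"extract_{table}"}
    # The final task waits for every load task.
    deps["report"] = {f"load_{table}" for table in tables}
    return deps


def run_order(deps):
    """Return one valid execution order that respects all dependencies."""
    return list(TopologicalSorter(deps).static_order())


workflow = build_workflow(["users", "orders"])
order = run_order(workflow)
```

Adding a third table to the list adds two more tasks and their edges without touching the rest of the definition.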
Airflow has a usable UI
Airflow CLI
What does Airflow shine at?
● Regular batch ETL jobs (think CRON)
● Processing fixed intervals of data
● Managing complex dependencies
● Backfilling data
● Interfacing to hundreds of different systems
● Platform for others to generate DAG files
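The "fixed intervals" and "backfilling" points above can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Airflow's scheduler logic: the helper names `daily_intervals` and `missing_intervals` are hypothetical, and a day is used as the interval length purely for the example.

```python
from datetime import date, timedelta


def daily_intervals(start, end):
    """Yield (interval_start, interval_end) pairs covering [start, end)."""
    current = start
    while current < end:
        following = current + timedelta(days=1)
        yield current, following
        current = following


def missing_intervals(start, end, already_done):
    """Return the intervals whose start date has not been processed yet.

    `already_done` is a set of interval start dates; a backfill would run
    the workflow once for each interval returned here.
    """
    return [iv for iv in daily_intervals(start, end) if iv[0] not in already_done]


to_backfill = missing_intervals(
    date(2019, 12, 1), date(2019, 12, 5), already_done={date(2019, 12, 2)}
)
```

Each run processes exactly one interval, so re-running a past interval touches only that slice of data.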
Apache Airflow 1.10
state of the pinwheel
Current versions
● 1.10.2, 1.10.3, 1.10.4, 1.10.5, 1.10.6...
● 1.10.7 in the making
● Deployed in thousands of companies
● Usage on the rise
● 2.0 - in master
How to stay relevant?
● Cloud Native is coming
● APIs are the backbone of modern software
● User Interface matters
● Performance and reliability matter
● Many services, many changes
● Community over code
End of 2019 survey: 300 responses (!)
● Started by Tomasz Urbaszek
● Ran for the last 2 weeks
● Fresh off the press
● Some surprises found
● Going in the right direction
What do you use Airflow for?
Data processing (ETL) 97%
Artificial Intelligence and Machine Learning Pipelines 29%
Automating DevOps operations 21%
What can be improved?
Scheduler performance 61%
Web UI 58%
Logging, monitoring and alerting 47%
Examples, howtos, onboarding documentation 46%
Technical documentation 44%
Reliability 36%
REST API 31%
Authentication and authorization 29%
What would be the most interesting feature for you?
Production-ready docker image 56%
Declarative way of writing DAGs 50%
Horizontal autoscaling 40%
Examples, howtos, onboarding documentation 46%
Asynchronous Operators 31%
Stateless web server 26%
Knative Executor 16%
I already have all I need 4%
Apache Airflow 2.0
Cloud Native is coming
Do you use Kubernetes-based deployments for Airflow?
No - we do not plan to use Kubernetes near term 29%
Yes - setup on our own via Helm Chart or similar 21%
Not yet - but we use Kubernetes in our organization and we could move 20%
Yes - via managed service in the cloud (Composer/Astronomer etc.) 15%
Not yet - but we plan to deploy Kubernetes in our organization soon 14%
Other 2%
Either use or can use Kubernetes in foreseeable future 69%
Do not have plans to use Kubernetes 29%
Cloud Native is coming: Scalability
● Knative Executor
● SIG-Knative => SIG Scalability
● Native Airflow Executor (WIP)
● Pub/Sub communication
● Horizontally auto-scalable
Cloud Native is coming: Deployability
● Native worker deployable at different providers
● “As a service” and “on-premises” friendly
● Generic Pub/Sub architecture for communication
● No DB communication between components
● Production-optimised docker image
Cloud Native is coming: Monitoring
● Integrate with standard monitoring tools
● More metrics exported using stats
● Integration with Prometheus on Kubernetes
● Horizontal Scalability approach based on metrics
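As a rough illustration of exporting metrics, the sketch below formats and sends a counter in the StatsD plain-text wire format (`<name>:<value>|c` over UDP). It assumes the "stats" bullet refers to StatsD-style metrics, which Airflow 1.10 can already emit; the metric name, the helper names and the host/port defaults are hypothetical, and this is not Airflow's internal metrics code.

```python
import socket


def format_counter(name, value=1):
    """Build a StatsD counter datagram, e.g. b"demo.task_finished:1|c"."""
    return f"{name}:{value}|c".encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget a metric datagram over UDP (no reply is expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric name, used only for this example.
payload = format_counter("demo.task_finished")
```

Calling `send_metric(payload)` would deliver the counter to a StatsD-compatible agent listening on the given host and port; an exporter in front of Prometheus can then scrape the aggregated values.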
APIs are the backbone of modern software
APIs are taking over the world
● Modern API
● HTTP-based API used by CLI, webserver
● Pub/Sub API for communication Scheduler <> Workers
● Generic APIs - not tied to Kubernetes/other deployment options
● Better Authentication/Authorization
● Opens up multi-tenancy capabilities
User Interface matters
Which interface(s) of Airflow do you use as part of your current role?
Original Airflow Graphical User Interface 97%
CLI 40%
API (experimental) 20%
Custom Own Created UI 8%
UIs are getting better
● Make UI refresh like it’s 2020
● Modern design (possibly)
● Use APIs for communication not DB/file access
● Better authentication and authorisation
● Stateless web-server
● Better responsiveness
Performance and reliability matter
Performance and reliability are important
● Automated performance testing (CI - targeted)
● Monitoring performance characteristics
● Improve Webserver/Scheduler Performance
● Internal instrumentation and optimisations
Many services, fast changes
Fast evolving services
● Currently operators are bound to releases of Airflow
● Migration to 2.0 will take time
● Introducing a new approach
○ move operators to new path/namespaces
○ change import paths
○ backporting to 1.10
○ backportable to 1.10 (!)
○ future: per-provider packaging
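The import-path change described above can be sketched with a small rewrite helper. The slides only say that paths change; the specific old-to-new pairs below are an illustrative subset taken from how the modules ended up in the 1.10 and 2.0 releases, and the `rewrite_imports` helper itself is hypothetical, not a tool shipped with Airflow. Some operator class names also changed, which this sketch does not handle.

```python
import re

# Illustrative subset of module renames (1.10 path -> 2.0 path).
RENAMED_MODULES = {
    "airflow.operators.bash_operator": "airflow.operators.bash",
    "airflow.operators.python_operator": "airflow.operators.python",
    "airflow.contrib.operators.bigquery_operator":
        "airflow.providers.google.cloud.operators.bigquery",
}


def rewrite_imports(source):
    """Replace known old module paths in DAG source code with new ones.

    Dotted paths that are not in the mapping are left untouched.
    """
    def replace(match):
        path = match.group(0)
        return RENAMED_MODULES.get(path, path)

    # Match a full dotted module path that starts with "airflow.".
    return re.sub(r"airflow(?:\.\w+)+", replace, source)


old_line = "from airflow.operators.bash_operator import BashOperator\n"
new_line = rewrite_imports(old_line)
```

Keeping the mapping as data means the same table could drive both a migration script and deprecation warnings for the old paths.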
Community over code
Community over code: Documentation
● Google Season of Docs - great programme!
● Onboarding, best practices, architecture, deployment options
● Better, clearer structure
● Both user and developer documentation improved
● Worked with technical writers from India and Russia
Community over code: New website: airflow.apache.org
Work sponsored by Google Cloud
Community over code: Development environment
● It’s a Breeze to develop Apache Airflow
● Get your environment up in 10 minutes
● Integration with IDE
● Well documented
● Team-work enabler
● Allows running and debugging DAGs
● Fully debuggable: DebugExecutor - cooperation with Databand.ai
First Warsaw Apache Airflow Workshop
Friday, December 13, 2019
5:30 PM to 9:30 PM
Polidea Sp. z o.o.
Przeskok 2 IV p. · Warsaw
https://t.co/TmWdWwfemI
Thanks!
hello@polidea.com

More Related Content

What's hot

Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Kaxil Naik
 
Airflow 101
Airflow 101Airflow 101
Airflow 101
SaarBergerbest
 
Devops Porto - CI/CD at Gitlab
Devops Porto - CI/CD at GitlabDevops Porto - CI/CD at Gitlab
Devops Porto - CI/CD at Gitlab
Filipa Lacerda
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
Chris Riccomini
 
GitLab for CI/CD process
GitLab for CI/CD processGitLab for CI/CD process
GitLab for CI/CD process
HYS Enterprise
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
Anant Corporation
 
Kube Your Enthusiasm - Paul Czarkowski
Kube Your Enthusiasm - Paul CzarkowskiKube Your Enthusiasm - Paul Czarkowski
Kube Your Enthusiasm - Paul Czarkowski
VMware Tanzu
 
Fyber - airflow best practices in production
Fyber - airflow best practices in productionFyber - airflow best practices in production
Fyber - airflow best practices in production
Itai Yaffe
 
GitOps Toolkit (Cloud Native Nordics Tech Talk)
GitOps Toolkit (Cloud Native Nordics Tech Talk)GitOps Toolkit (Cloud Native Nordics Tech Talk)
GitOps Toolkit (Cloud Native Nordics Tech Talk)
Weaveworks
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow Introduction
Liangjun Jiang
 
Importance of GCP: 30 Days of GCP
Importance of GCP: 30 Days of GCPImportance of GCP: 30 Days of GCP
Importance of GCP: 30 Days of GCP
AnshTyagi27
 
Java 8
Java 8Java 8
Introduction of cloud native CI/CD on kubernetes
Introduction of cloud native CI/CD on kubernetesIntroduction of cloud native CI/CD on kubernetes
Introduction of cloud native CI/CD on kubernetes
Kyohei Mizumoto
 
Gradle
GradleGradle
Gradle build capabilities
Gradle build capabilities Gradle build capabilities
Gradle build capabilities
Zeinab Mohamed Abdelmawla
 
Load impact insights webinar
Load impact insights webinarLoad impact insights webinar
Load impact insights webinar
John Emmitt
 
Argocd up and running
Argocd up and runningArgocd up and running
Argocd up and running
Raphaël PINSON
 
Docker New York City: From GitOps to a scalable CI/CD Pattern for Kubernetes
Docker New York City: From GitOps to a scalable CI/CD Pattern for KubernetesDocker New York City: From GitOps to a scalable CI/CD Pattern for Kubernetes
Docker New York City: From GitOps to a scalable CI/CD Pattern for Kubernetes
Andrew Phillips
 
5 Habits of High-Velocity Teams Using Kubernetes
5 Habits of High-Velocity Teams Using Kubernetes5 Habits of High-Velocity Teams Using Kubernetes
5 Habits of High-Velocity Teams Using Kubernetes
Codefresh
 
So you want to write a cloud function
So you want to write a cloud functionSo you want to write a cloud function
So you want to write a cloud function
Elad Hirsch
 

What's hot (20)

Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
 
Airflow 101
Airflow 101Airflow 101
Airflow 101
 
Devops Porto - CI/CD at Gitlab
Devops Porto - CI/CD at GitlabDevops Porto - CI/CD at Gitlab
Devops Porto - CI/CD at Gitlab
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
 
GitLab for CI/CD process
GitLab for CI/CD processGitLab for CI/CD process
GitLab for CI/CD process
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Kube Your Enthusiasm - Paul Czarkowski
Kube Your Enthusiasm - Paul CzarkowskiKube Your Enthusiasm - Paul Czarkowski
Kube Your Enthusiasm - Paul Czarkowski
 
Fyber - airflow best practices in production
Fyber - airflow best practices in productionFyber - airflow best practices in production
Fyber - airflow best practices in production
 
GitOps Toolkit (Cloud Native Nordics Tech Talk)
GitOps Toolkit (Cloud Native Nordics Tech Talk)GitOps Toolkit (Cloud Native Nordics Tech Talk)
GitOps Toolkit (Cloud Native Nordics Tech Talk)
 
Apache Airflow Introduction
Apache Airflow IntroductionApache Airflow Introduction
Apache Airflow Introduction
 
Importance of GCP: 30 Days of GCP
Importance of GCP: 30 Days of GCPImportance of GCP: 30 Days of GCP
Importance of GCP: 30 Days of GCP
 
Java 8
Java 8Java 8
Java 8
 
Introduction of cloud native CI/CD on kubernetes
Introduction of cloud native CI/CD on kubernetesIntroduction of cloud native CI/CD on kubernetes
Introduction of cloud native CI/CD on kubernetes
 
Gradle
GradleGradle
Gradle
 
Gradle build capabilities
Gradle build capabilities Gradle build capabilities
Gradle build capabilities
 
Load impact insights webinar
Load impact insights webinarLoad impact insights webinar
Load impact insights webinar
 
Argocd up and running
Argocd up and runningArgocd up and running
Argocd up and running
 
Docker New York City: From GitOps to a scalable CI/CD Pattern for Kubernetes
Docker New York City: From GitOps to a scalable CI/CD Pattern for KubernetesDocker New York City: From GitOps to a scalable CI/CD Pattern for Kubernetes
Docker New York City: From GitOps to a scalable CI/CD Pattern for Kubernetes
 
5 Habits of High-Velocity Teams Using Kubernetes
5 Habits of High-Velocity Teams Using Kubernetes5 Habits of High-Velocity Teams Using Kubernetes
5 Habits of High-Velocity Teams Using Kubernetes
 
So you want to write a cloud function
So you want to write a cloud functionSo you want to write a cloud function
So you want to write a cloud function
 

Similar to What's Coming in Apache Airflow 2.0 - PyDataWarsaw 2019

PCF Cloud-Native Workshop Slides
PCF Cloud-Native Workshop SlidesPCF Cloud-Native Workshop Slides
PCF Cloud-Native Workshop Slides
VMware Tanzu
 
Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ...
 Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ... Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ...
Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ...
Weaveworks
 
It's a Breeze to develop Airflow (Cloud Native Warsaw)
It's a Breeze to develop Airflow (Cloud Native Warsaw)It's a Breeze to develop Airflow (Cloud Native Warsaw)
It's a Breeze to develop Airflow (Cloud Native Warsaw)
Jarek Potiuk
 
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
BrianFraser29
 
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for YouWebinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
Weaveworks
 
London-MuleSoft-Meetup-April-19-2023
London-MuleSoft-Meetup-April-19-2023London-MuleSoft-Meetup-April-19-2023
London-MuleSoft-Meetup-April-19-2023
AnuragSharma900
 
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...
OpenNebula Project
 
Pivotal + Apigee Workshop (June 4th, 2019)
Pivotal + Apigee Workshop (June 4th, 2019)Pivotal + Apigee Workshop (June 4th, 2019)
Pivotal + Apigee Workshop (June 4th, 2019)
Alexandre Roman
 
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
apidays
 
London MuleSoft Meetup
London MuleSoft Meetup London MuleSoft Meetup
London MuleSoft Meetup
Akshata Sawant
 
Continuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event KeynoteContinuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event Keynote
Weaveworks
 
Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
Gibran Badrulzaman
 
How to Scale Operations for a Multi-Cloud Platform using PCF
How to Scale Operations for a Multi-Cloud Platform using PCFHow to Scale Operations for a Multi-Cloud Platform using PCF
How to Scale Operations for a Multi-Cloud Platform using PCF
VMware Tanzu
 
Mule soft meetup__jaipur_december_2020_final
Mule soft meetup__jaipur_december_2020_finalMule soft meetup__jaipur_december_2020_final
Mule soft meetup__jaipur_december_2020_final
Lalit Panwar
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
VMware Tanzu
 
Introduction to DevOps and the Practical Use Cases at Credit OK
Introduction to DevOps and the Practical Use Cases at Credit OKIntroduction to DevOps and the Practical Use Cases at Credit OK
Introduction to DevOps and the Practical Use Cases at Credit OK
Kriangkrai Chaonithi
 
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
OW2
 
DocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winnerDocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDoku
 
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
WSO2
 

Similar to What's Coming in Apache Airflow 2.0 - PyDataWarsaw 2019 (20)

PCF Cloud-Native Workshop Slides
PCF Cloud-Native Workshop SlidesPCF Cloud-Native Workshop Slides
PCF Cloud-Native Workshop Slides
 
Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ...
 Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ... Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ...
Cloud Native Transformation (Alexis Richardson) - Continuous Lifecycle 2018 ...
 
It's a Breeze to develop Airflow (Cloud Native Warsaw)
It's a Breeze to develop Airflow (Cloud Native Warsaw)It's a Breeze to develop Airflow (Cloud Native Warsaw)
It's a Breeze to develop Airflow (Cloud Native Warsaw)
 
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
Brisbane MuleSoft Meetup 2023-03-22 - Anypoint Code Builder and Splunk Loggin...
 
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for YouWebinar: Capabilities, Confidence and Community – What Flux GA Means for You
Webinar: Capabilities, Confidence and Community – What Flux GA Means for You
 
London-MuleSoft-Meetup-April-19-2023
London-MuleSoft-Meetup-April-19-2023London-MuleSoft-Meetup-April-19-2023
London-MuleSoft-Meetup-April-19-2023
 
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...
OpenNebulaConf2019 - Welcome and Project Update - Ignacio M. Llorente, Rubén ...
 
Pivotal + Apigee Workshop (June 4th, 2019)
Pivotal + Apigee Workshop (June 4th, 2019)Pivotal + Apigee Workshop (June 4th, 2019)
Pivotal + Apigee Workshop (June 4th, 2019)
 
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
APIdays Paris 2019 - Delivering Exceptional User Experience with REST and Gra...
 
London MuleSoft Meetup
London MuleSoft Meetup London MuleSoft Meetup
London MuleSoft Meetup
 
Continuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event KeynoteContinuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event Keynote
 
Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
Understanding the GitOps Workflow and CICD Pipeline - What It Is, Why It Matt...
 
How to Scale Operations for a Multi-Cloud Platform using PCF
How to Scale Operations for a Multi-Cloud Platform using PCFHow to Scale Operations for a Multi-Cloud Platform using PCF
How to Scale Operations for a Multi-Cloud Platform using PCF
 
Mule soft meetup__jaipur_december_2020_final
Mule soft meetup__jaipur_december_2020_finalMule soft meetup__jaipur_december_2020_final
Mule soft meetup__jaipur_december_2020_final
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
 
Introduction to DevOps and the Practical Use Cases at Credit OK
Introduction to DevOps and the Practical Use Cases at Credit OKIntroduction to DevOps and the Practical Use Cases at Credit OK
Introduction to DevOps and the Practical Use Cases at Credit OK
 
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...DocDoku: Using web technologies in a desktop application. OW2con'15, November...
DocDoku: Using web technologies in a desktop application. OW2con'15, November...
 
DocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winnerDocDokuPLM presentation - OW2Con 2015 Community Award winner
DocDokuPLM presentation - OW2Con 2015 Community Award winner
 
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
The Best of Both Worlds: Introducing WSO2 API Manager 4.0.0
 

More from Jarek Potiuk

Subtle Differences between Python versions
Subtle Differences between Python versionsSubtle Differences between Python versions
Subtle Differences between Python versions
Jarek Potiuk
 
Caching in Docker - the hardest thing in computer science
Caching in Docker - the hardest thing in computer scienceCaching in Docker - the hardest thing in computer science
Caching in Docker - the hardest thing in computer science
Jarek Potiuk
 
Off time - how to use social media to be more out of social media
Off time - how to use social media to be more out of social mediaOff time - how to use social media to be more out of social media
Off time - how to use social media to be more out of social media
Jarek Potiuk
 
Berlin Apache Con EU Airflow Workshops
Berlin Apache Con EU Airflow WorkshopsBerlin Apache Con EU Airflow Workshops
Berlin Apache Con EU Airflow Workshops
Jarek Potiuk
 
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Jarek Potiuk
 
Ci for android OS
Ci for android OSCi for android OS
Ci for android OS
Jarek Potiuk
 
It's a Breeze to develop Apache Airflow (Apache Con Berlin)
It's a Breeze to develop Apache Airflow (Apache Con Berlin)It's a Breeze to develop Apache Airflow (Apache Con Berlin)
It's a Breeze to develop Apache Airflow (Apache Con Berlin)
Jarek Potiuk
 
React native introduction (Mobile Warsaw)
React native introduction (Mobile Warsaw)React native introduction (Mobile Warsaw)
React native introduction (Mobile Warsaw)
Jarek Potiuk
 

More from Jarek Potiuk (8)

Subtle Differences between Python versions
Subtle Differences between Python versionsSubtle Differences between Python versions
Subtle Differences between Python versions
 
Caching in Docker - the hardest thing in computer science
Caching in Docker - the hardest thing in computer scienceCaching in Docker - the hardest thing in computer science
Caching in Docker - the hardest thing in computer science
 
Off time - how to use social media to be more out of social media
Off time - how to use social media to be more out of social mediaOff time - how to use social media to be more out of social media
Off time - how to use social media to be more out of social media
 
Berlin Apache Con EU Airflow Workshops
Berlin Apache Con EU Airflow WorkshopsBerlin Apache Con EU Airflow Workshops
Berlin Apache Con EU Airflow Workshops
 
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...Manageable data pipelines with airflow (and kubernetes)   november 27, 11 45 ...
Manageable data pipelines with airflow (and kubernetes) november 27, 11 45 ...
 
Ci for android OS
Ci for android OSCi for android OS
Ci for android OS
 
It's a Breeze to develop Apache Airflow (Apache Con Berlin)
It's a Breeze to develop Apache Airflow (Apache Con Berlin)It's a Breeze to develop Apache Airflow (Apache Con Berlin)
It's a Breeze to develop Apache Airflow (Apache Con Berlin)
 
React native introduction (Mobile Warsaw)
React native introduction (Mobile Warsaw)React native introduction (Mobile Warsaw)
React native introduction (Mobile Warsaw)
 

Recently uploaded

Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
How to Optimize Call Monitoring: Automate QA and Elevate Customer Experience
How to Optimize Call Monitoring: Automate QA and Elevate Customer ExperienceHow to Optimize Call Monitoring: Automate QA and Elevate Customer Experience
How to Optimize Call Monitoring: Automate QA and Elevate Customer Experience
Aggregage
 
Product Listing Optimization Presentation - Gay De La Cruz.pdf
Product Listing Optimization Presentation - Gay De La Cruz.pdfProduct Listing Optimization Presentation - Gay De La Cruz.pdf
Product Listing Optimization Presentation - Gay De La Cruz.pdf
gaydlc2513
 
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My IdentityCNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
Cynthia Thomas
 
Guidelines for Effective Data Visualization
Guidelines for Effective Data VisualizationGuidelines for Effective Data Visualization
Guidelines for Effective Data Visualization
UmmeSalmaM1
 
intra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_Enintra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_En
NTTDATA INTRAMART
 
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc
 
New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024
ThousandEyes
 
CTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database MigrationCTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database Migration
ScyllaDB
 
Kubernetes Cloud Native Indonesia Meetup - June 2024
Kubernetes Cloud Native Indonesia Meetup - June 2024Kubernetes Cloud Native Indonesia Meetup - June 2024
Kubernetes Cloud Native Indonesia Meetup - June 2024
Prasta Maha
 
Corporate Open Source Anti-Patterns: A Decade Later
Corporate Open Source Anti-Patterns: A Decade LaterCorporate Open Source Anti-Patterns: A Decade Later
Corporate Open Source Anti-Patterns: A Decade Later
ScyllaDB
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
Leveraging AI for Software Developer Productivity.pptx
Leveraging AI for Software Developer Productivity.pptxLeveraging AI for Software Developer Productivity.pptx
Leveraging AI for Software Developer Productivity.pptx
petabridge
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
ThousandEyes
 
Chapter 6 - Test Tools Considerations V4.0
Chapter 6 - Test Tools Considerations V4.0Chapter 6 - Test Tools Considerations V4.0
Chapter 6 - Test Tools Considerations V4.0
Neeraj Kumar Singh
 
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
Call Girls Chennai ☎️ +91-7426014248 😍 Chennai Call Girl Beauty Girls Chennai...
anilsa9823
 
Brightwell ILC Futures workshop David Sinclair presentation
Brightwell ILC Futures workshop David Sinclair presentationBrightwell ILC Futures workshop David Sinclair presentation
Brightwell ILC Futures workshop David Sinclair presentation
ILC- UK
 
Supplier Sourcing Presentation - Gay De La Cruz.pdf
Supplier Sourcing Presentation - Gay De La Cruz.pdfSupplier Sourcing Presentation - Gay De La Cruz.pdf
Supplier Sourcing Presentation - Gay De La Cruz.pdf
gaydlc2513
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
Ortus Solutions, Corp
 
From NCSA to the National Research Platform
From NCSA to the National Research PlatformFrom NCSA to the National Research Platform
From NCSA to the National Research Platform
Larry Smarr
 

What's Coming in Apache Airflow 2.0 - PyDataWarsaw 2019

  • 4. Polidea Apache Airflow Airflow is a platform to programmatically author, schedule and monitor workflows. Dynamic/Elegant Extensible Scalable
  • 6. Polidea What is the presentation about ? ● The team @ Polidea ● What the Airflow ? ● Where Apache Airflow is now? ● What’s coming in Apache Airflow 2.0.
  • 8. Polidea Logo or mockup Hi! Jarek Potiuk Principal Software Engineer @Polidea Apache Airflow PMC member Certified GCP Architect ex-Googler, ex-CTO, ex-choir member @higrys
  • 9. Polidea Apache Airflow Development team @ Polidea Jarek Potiuk Kamil Breguła Tomasz Urbaszek Karolina Rosół Dariusz Aniszewski Szymon Przedwojski Antoni Smoliński Tobiasz Kędzierski Michał Słowikowski PMC Past:
  • 10. Polidea Apache Airflow Website team @ Polidea Kamil Breguła Zuzanna Rykowska Kamil Gabryjelski Magdalena Węgrzyńska Marta Strzałkowska Tomasz Urbaszek
  • 13. Polidea August 2018 2 people Timeline December 2019 6 (9) people
  • 14. Polidea Our tasks ● 130+ operators ● 18+ GCP services ● Oozie-To-Airflow ● New Apache Airflow Website
  • 15. Polidea What we delivered extra ● Documentation improvements ● Breeze - improved dev environment ● Py2 -> Py3 ● Pylint compatibility ● Pre-commit framework introduction ● CI environment reimplemented ● Operator scaffolding ● Convert tests to pytests 2 Apache Airflow Committers Apache Airflow PMC member
  • 18. Polidea Why Apache Airflow and not one of these? And many, many, many more ....
  • 19. Polidea Airflow is an Orchestrator ● Tells others what/when to do ● Synchronizes work between others ● Monitors what’s going on ● Intervenes if needed ● Mostly does not do much
  • 24. Polidea What Airflow shines at ? ● Regular batch ETL jobs (think CRON) ● Processing fixed intervals of data ● Managing complex dependencies ● Backfilling data ● Interfacing to hundreds of different systems ● Platform for others to generate DAG files
  • 26. Polidea Current versions ● 1.10.2, 1.10.3, 1.10.4, 1.10.5, 1.10.6 … ● 1.10.7 in the making ● Deployed in thousands of companies ● Usage on the rise ● 2.0 - in master
  • 27. Polidea How to stay relevant ? ● Cloud Native is coming ● APIs are backbone of modern software ● User Interface matters ● Performance and reliability matter ● Many services, many changes ● Community over code
  • 28. Polidea End of 2019 survey: 300 responses(!) ● Started by Tomasz Urbaszek ● Ran for the last 2 weeks ● Fresh off the press ● Some surprises found ● Going in the right direction
  • 30. Polidea What do you use Airflow for? Data processing (ETL) 97% Artificial Intelligence and Machine Learning Pipelines 29% Automating DevOps operations 21%
  • 31. Polidea What can be improved ? Scheduler performance 61% Web UI 58% Logging, monitoring and alerting 47% Examples, howtos, onboarding documentation 46% Technical documentation 44% Reliability 36% REST API 31% Authentication and authorization 29%
  • 32. Polidea What would be the most interesting feature for you ? Production-ready docker image 56% Declarative way of writing DAGs 50% Horizontal autoscaling 40% Examples, howtos, onboarding documentation 46% Asynchronous Operators 31% Stateless web server 26% Knative Executor 16% I already have all I need 4%
  • 36. Polidea Do you use Kubernetes-based deployments for Airflow? No - we do not plan to use Kubernetes near term 29% Yes - setup on our own via Helm Chart or similar 21% Not yet - but we use Kubernetes in our organization and we could move 20% Yes - via managed service in the cloud (Composer/Astronomer etc.) 15% Not yet - but we plan to deploy Kubernetes in our organization soon 14% Other 2% Either use or can use Kubernetes in foreseeable future 69% Do not have plans to use Kubernetes 29%
  • 37. Polidea Cloud Native is coming: Scalability ● Knative Executor ● SIG-Knative => SIG Scalability ● Native Airflow Executor (WIP) ● Pub/Sub communication ● Horizontally auto-scalable
  • 38. Polidea Cloud Native is coming: Deployability ● Native worker deployable at different providers ● “As a service” and “on-premises” friendly ● Generic Pub/Sub architecture for communication ● No DB communication between components ● Production-optimised docker image
  • 39. Polidea Cloud Native is coming: Monitoring ● Integrate with standard monitoring tools ● More metrics exported using stats ● Integration with Prometheus on Kubernetes ● Horizontal Scalability approach based on metrics
  • 40. Polidea APIs are backbone of modern software
  • 41. Polidea APIs are taking over the world ● Modern API ● HTTP-based API used by CLI, webserver ● Pub/Sub API for communication Scheduler <> Workers ● Generic APIs - not tied to Kubernetes/other deployment options ● Better Authentication/Authorization ● Opens up multi-tenancy capabilities
  • 43. Polidea Which interface(s) of Airflow do you use as part of your current role? Original Airflow Graphical User Interface 97% CLI 40% API (experimental) 20% Custom Own Created UI 8%
  • 44. Polidea UIs are getting better ● Make UI refresh like it’s 2020 ● Modern design (possibly) ● Use APIs for communication not DB/file access ● Better authentication and authorisation ● Stateless web-server ● Better responsiveness
  • 46. Polidea Performance and reliability are important ● Automated performance testing (CI - targeted) ● Monitoring performance characteristics ● Improve Webserver/Scheduler Performance ● Internal instrumentation and optimisations
  • 48. Polidea Fast evolving services ● Currently operators bound to releases of Airflow ● Migration to 2.0 will take time ● Introducing new approach ○ move operators to new path/namespaces ○ change import paths ○ backporting to 1.10 ○ backportable to 1.10 (!) ○ future: per-provider packaging
  • 51. Polidea Community over code: Documentation ● Google Season of Docs - great programme! ● Onboarding, best practices, architecture, deployment options ● Better, clearer structure ● Both user and developer documentation improved ● Worked with technical writers from India and Russia
  • 52. Polidea Community over code: New website: airflow.apache.org Work sponsored by Google Cloud
  • 53. Polidea Community over code: Development environment ● It’s a Breeze to develop Apache Airflow ● Get your environment up in 10 minutes ● Integration with IDE ● Well documented ● Team-work enabler ● Allows running and debugging DAGs ● Fully debuggable: DebugExecutor - cooperation with Databand.ai
  • 55. First Warsaw Apache Airflow Workshop Friday, December 13, 2019, 5:30 PM to 9:30 PM Polidea Sp. z o.o., Przeskok 2, IV p. · Warsaw https://t.co/TmWdWwfemI
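
The "What Airflow shines at" slide lists processing fixed intervals of data and backfilling. As a rough illustration of that idea only, the sketch below splits a date range into daily intervals and runs a callback for every interval that has not been processed yet. It is plain Python and not Airflow's scheduler code; the names `daily_intervals`, `backfill`, `process` and `already_done` are hypothetical and do not come from the slides.

```python
from datetime import date, timedelta


def daily_intervals(start, end):
    """Yield (interval_start, interval_end) pairs covering [start, end) day by day."""
    current = start
    while current < end:
        yield current, current + timedelta(days=1)
        current += timedelta(days=1)


def backfill(start, end, process, already_done=()):
    """Call `process` for each daily interval that is not in `already_done`.

    Returns the list of interval start dates that were actually processed.
    """
    done = set(already_done)
    ran = []
    for interval_start, interval_end in daily_intervals(start, end):
        if interval_start in done:
            continue  # this interval was handled earlier, skip it
        process(interval_start, interval_end)
        ran.append(interval_start)
    return ran


if __name__ == "__main__":
    # Hypothetical usage: December 2 was already processed, so only
    # the other two days in the range are run.
    processed = backfill(
        date(2019, 12, 1),
        date(2019, 12, 4),
        process=lambda s, e: print(f"processing {s} .. {e}"),
        already_done=[date(2019, 12, 2)],
    )
    print(processed)
```

Because every run is tied to one fixed interval, re-running a missed day touches only that day's data, which is what makes backfilling cheap to reason about.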
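
The "Fast evolving services" slide mentions moving operators to new namespaces and changing import paths while 1.10 is still in use. One possible way for DAG code to tolerate both layouts during a migration is to try the new path first and fall back to the old one. The sketch below is an assumption about how that could look, not something shown in the talk: `import_first` is a hypothetical helper, and the two module paths are the `BashOperator` locations in the released 2.0 and 1.10 versions.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    errors = []
    for module_path, attr in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{module_path}.{attr}: {exc}")
    raise ImportError(
        "none of the candidates could be imported:\n" + "\n".join(errors)
    )


# New path first, 1.10 path as the fallback.
BASH_OPERATOR_PATHS = [
    ("airflow.operators.bash", "BashOperator"),           # Airflow 2.0 layout
    ("airflow.operators.bash_operator", "BashOperator"),  # Airflow 1.10 layout
]

# In a DAG file (requires Airflow to be installed):
# BashOperator = import_first(BASH_OPERATOR_PATHS)
```

A plain `try: ... except ImportError: ...` around two `import` statements does the same job for a single operator; the helper only pays off when many operators need the same treatment.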