尊敬的 微信汇率:1円 ≈ 0.046215 元 支付宝汇率:1円 ≈ 0.046306元 [退出登录]
SlideShare a Scribd company logo
Zeppelin at Twitter
Prasad Wagle
Technical Lead, Data Platform
twitter.com/prasadwagle
July 13, 2016
SF Data Science Meetup
Galvanize, San Francisco, CA
Twitter Data Pipeline Overview
Production
systems
Presto
Vertica
MySQL
Scalding
Spark
R
Custom Dashboards
Tableau
Zeppelin
Command line tools
HDFS
Analytics Tools
Analytics Front-ends
One company-wide server
860 notes
4000 paragraphs
1500 Vertica, 1500 Presto, 300 MySQL
400 Markdown
300 Scalding, Spark, etc
850 users
Zeppelin Usage Metrics
Field of (Data) Dreams
Started as hackweek project in Dec 2015, Beta: Jan 27
Number of notes created
Feb - 270, Mar - 350, Apr - 470, May - 560, Jun - 750
Report Creators
Product managers (dashboards, product analytics)
Data scientists
Sales analysts
Engineers and SREs
Report Viewers
Anyone in the company
Zeppelin Users
My mind is absolutely blown away by the ease of use, speed, and power of
Zeppelin. I've been wanting a tool like this at Twitter my entire time
working here.
started playing with @ApacheZeppelin. amazingly addictive!
Thanks for all the updates to Zeppelin - Fabric has fallen in love with it fast
(and we're even using it for daily tracking of our OKRs amongst all the
other metrics)
Zeppelin Testimonials
Very easy to create and share reports
Web based
Works seamlessly with analytics engines
JDBC
Non-JDBC - Scalding, Spark
Open source (easy to add features)
Reasons for adoption
Drag and drop report builder
Can create complex queries without SQL knowledge (e.g. Top N)
Polished UI
Filters and other transformations work on extracts
no new database queries (fast)
Row level permissions (for sales reports)
Tableau
Areas:
Security
Stability
Operations
Interpreter
Work Done Before Production
Authentication
Integrated with Twitter’s homegrown single sign-on system
SSL
Integrated with Twitter’s homegrown key distribution system
Notebook authorization
Data source authorization
Work Done (Security)
Websocket deadlock issue with Jetty 8
reduce communication
remove synchronized block (risky, will move to Jetty 9)
Monitoring
Standby server
Backups
Work Done (Stability, Operations)
Scalding
JDBC
Create JDBC connection for every query to avoid Vertica closed
connection issue
Work Done (Interpreter)
Notebook authorization
Data source authorization
Run scheduled notes with a user
Scalding interpreter
Reduce websocket communication
Paragraph footer
Row level permissions
Work Contributed to Apache Project
Stability, Scalability
Jetty9
Interpreters
Scalding, Spark, R
Multiuser scalability, authentication, integration with Twitter
sources
Use JDBC interpreter instead of Hive
Future / Work in Progress
Features and UX
Notebook organization (folders)
Email reports and alerts
Row level permissions like tableau
Operations
Monitoring (end-to-end query)
Admin (view/stop running jobs, resource usage)
Failover
Continuous Integration
Future / Work in Progress
There is a real need for Notebook style interface for data analysis
Zeppelin is enterprise ready, flexible and easy to use
Zeppelin user and dev community is awesome
Takeaways

More Related Content

What's hot

Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and Future
DataWorks Summit
 
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
Spark Summit
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
Databricks
 
Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph BradleyDeploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Databricks
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
Databricks
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
Hakka Labs
 
Structured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at AppleStructured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at Apple
Databricks
 
Apache Spark & Scala
Apache Spark & ScalaApache Spark & Scala
Apache Spark & Scala
Edureka!
 
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
Luke Han
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
kbajda
 
Building your bi system-HadoopCon Taiwan 2015
Building your bi system-HadoopCon Taiwan 2015Building your bi system-HadoopCon Taiwan 2015
Building your bi system-HadoopCon Taiwan 2015
Bryan Yang
 
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Lightbend
 
The Revolution Will be Streamed
The Revolution Will be StreamedThe Revolution Will be Streamed
The Revolution Will be Streamed
Databricks
 
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin... Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...
Jacek Laskowski
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
Databricks
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
Spark Summit
 
JEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkJEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache Spark
Taras Matyashovsky
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Databricks
 
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit
 
Keynote at spark summit east anjul
Keynote at spark summit east anjulKeynote at spark summit east anjul
Keynote at spark summit east anjul
Anjul Bhambhri
 

What's hot (20)

Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and Future
 
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Mo...
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
 
Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph BradleyDeploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
Structured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at AppleStructured Streaming Use-Cases at Apple
Structured Streaming Use-Cases at Apple
 
Apache Spark & Scala
Apache Spark & ScalaApache Spark & Scala
Apache Spark & Scala
 
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
 
Building your bi system-HadoopCon Taiwan 2015
Building your bi system-HadoopCon Taiwan 2015Building your bi system-HadoopCon Taiwan 2015
Building your bi system-HadoopCon Taiwan 2015
 
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
 
The Revolution Will be Streamed
The Revolution Will be StreamedThe Revolution Will be Streamed
The Revolution Will be Streamed
 
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin... Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...
 Kafka Streams VS Spark Structured Streaming - Modern Stream Processing Engin...
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
 
JEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache SparkJEEConf 2015 - Introduction to real-time big data with Apache Spark
JEEConf 2015 - Introduction to real-time big data with Apache Spark
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
 
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
 
Keynote at spark summit east anjul
Keynote at spark summit east anjulKeynote at spark summit east anjul
Keynote at spark summit east anjul
 

Viewers also liked

Explorez vos données avec apache zeppelin
Explorez vos données avec apache zeppelinExplorez vos données avec apache zeppelin
Explorez vos données avec apache zeppelin
Bruno Bonnin
 
사물인터넷 보안 사례 및 대응 방안 2016.11.09
사물인터넷 보안 사례 및 대응 방안   2016.11.09사물인터넷 보안 사례 및 대응 방안   2016.11.09
사물인터넷 보안 사례 및 대응 방안 2016.11.09
Hakyong Kim
 
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095
CBIZ, Inc.
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
Universitat Politècnica de Catalunya
 
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...
Amazon Web Services Korea
 
Top 10 retail tech trends 2017
Top 10 retail tech trends   2017Top 10 retail tech trends   2017
Top 10 retail tech trends 2017
ALTEN Calsoft Labs
 
애자일 스크럼과 JIRA
애자일 스크럼과 JIRA 애자일 스크럼과 JIRA
애자일 스크럼과 JIRA
Terry Cho
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural Networks
Taegyun Jeon
 
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAn Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
Amazon Web Services
 
Apache flink - prise en main rapide
Apache flink - prise en main rapideApache flink - prise en main rapide
Apache flink - prise en main rapide
Bilal Baltagi
 
Apache flink - retour d'expérience sur la conférence flink forward 2015
Apache flink - retour d'expérience sur la conférence flink forward 2015Apache flink - retour d'expérience sur la conférence flink forward 2015
Apache flink - retour d'expérience sur la conférence flink forward 2015
Bilal Baltagi
 
Explorez vos données présentes dans MongoDB avec Apache Zeppelin
Explorez vos données présentes dans MongoDB avec Apache ZeppelinExplorez vos données présentes dans MongoDB avec Apache Zeppelin
Explorez vos données présentes dans MongoDB avec Apache Zeppelin
Bruno Bonnin
 

Viewers also liked (12)

Explorez vos données avec apache zeppelin
Explorez vos données avec apache zeppelinExplorez vos données avec apache zeppelin
Explorez vos données avec apache zeppelin
 
사물인터넷 보안 사례 및 대응 방안 2016.11.09
사물인터넷 보안 사례 및 대응 방안   2016.11.09사물인터넷 보안 사례 및 대응 방안   2016.11.09
사물인터넷 보안 사례 및 대응 방안 2016.11.09
 
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095
Health Reform Bulletin 123 | IRS Delays Disclosure Date for 2016 Form 1095
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
 
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...
Amazon kinesis와 elasticsearch service로 만드는 실시간 데이터 분석 플랫폼 :: 박철수 :: AWS Summi...
 
Top 10 retail tech trends 2017
Top 10 retail tech trends   2017Top 10 retail tech trends   2017
Top 10 retail tech trends 2017
 
애자일 스크럼과 JIRA
애자일 스크럼과 JIRA 애자일 스크럼과 JIRA
애자일 스크럼과 JIRA
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural Networks
 
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech TalksAn Overview of AI on the AWS Platform - February 2017 Online Tech Talks
An Overview of AI on the AWS Platform - February 2017 Online Tech Talks
 
Apache flink - prise en main rapide
Apache flink - prise en main rapideApache flink - prise en main rapide
Apache flink - prise en main rapide
 
Apache flink - retour d'expérience sur la conférence flink forward 2015
Apache flink - retour d'expérience sur la conférence flink forward 2015Apache flink - retour d'expérience sur la conférence flink forward 2015
Apache flink - retour d'expérience sur la conférence flink forward 2015
 
Explorez vos données présentes dans MongoDB avec Apache Zeppelin
Explorez vos données présentes dans MongoDB avec Apache ZeppelinExplorez vos données présentes dans MongoDB avec Apache Zeppelin
Explorez vos données présentes dans MongoDB avec Apache Zeppelin
 

Similar to Zeppelin at twitter (sf data science meetup, july 2016)

Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
Paulo Gutierrez
 
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdfDustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
Dustin Vannoy
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
sparkflows
 
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Erwin de Kreuk
 
Overview SQL Server 2019
Overview SQL Server 2019Overview SQL Server 2019
Overview SQL Server 2019
Juan Fabian
 
IT Summit - Modernizing Enterprise Analytics: the IT Story
IT Summit - Modernizing Enterprise Analytics: the IT StoryIT Summit - Modernizing Enterprise Analytics: the IT Story
IT Summit - Modernizing Enterprise Analytics: the IT Story
Tableau Software
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
Walter Liu
 
Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019
Zhenxiao Luo
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
Michael Rys
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
FLiP Into Trino
FLiP Into TrinoFLiP Into Trino
FLiP Into Trino
Timothy Spann
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Databricks
 
Running High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclioRunning High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclio
iguazio
 
An Architect's guide to real time big data systems
An Architect's guide to real time big data systemsAn Architect's guide to real time big data systems
An Architect's guide to real time big data systems
Raja SP
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 
Web Development In Oracle APEX
Web Development In Oracle APEXWeb Development In Oracle APEX
Web Development In Oracle APEX
iWare Logic Technologies Pvt. Ltd.
 
PostgreSQL 10; Long Awaited Enterprise Solutions
PostgreSQL 10; Long Awaited Enterprise SolutionsPostgreSQL 10; Long Awaited Enterprise Solutions
PostgreSQL 10; Long Awaited Enterprise Solutions
Julyanto SUTANDANG
 
Airavata_Architecture_xsede16
Airavata_Architecture_xsede16Airavata_Architecture_xsede16
Airavata_Architecture_xsede16
Shameera Rathnayaka
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in Realtime
DataWorks Summit
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
Slim Baltagi
 

Similar to Zeppelin at twitter (sf data science meetup, july 2016) (20)

Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
 
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdfDustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
 
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...Is there a way that we can build our Azure Synapse Pipelines all with paramet...
Is there a way that we can build our Azure Synapse Pipelines all with paramet...
 
Overview SQL Server 2019
Overview SQL Server 2019Overview SQL Server 2019
Overview SQL Server 2019
 
IT Summit - Modernizing Enterprise Analytics: the IT Story
IT Summit - Modernizing Enterprise Analytics: the IT StoryIT Summit - Modernizing Enterprise Analytics: the IT Story
IT Summit - Modernizing Enterprise Analytics: the IT Story
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
FLiP Into Trino
FLiP Into TrinoFLiP Into Trino
FLiP Into Trino
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Running High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclioRunning High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclio
 
An Architect's guide to real time big data systems
An Architect's guide to real time big data systemsAn Architect's guide to real time big data systems
An Architect's guide to real time big data systems
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Web Development In Oracle APEX
Web Development In Oracle APEXWeb Development In Oracle APEX
Web Development In Oracle APEX
 
PostgreSQL 10; Long Awaited Enterprise Solutions
PostgreSQL 10; Long Awaited Enterprise SolutionsPostgreSQL 10; Long Awaited Enterprise Solutions
PostgreSQL 10; Long Awaited Enterprise Solutions
 
Airavata_Architecture_xsede16
Airavata_Architecture_xsede16Airavata_Architecture_xsede16
Airavata_Architecture_xsede16
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in Realtime
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 

Recently uploaded

ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
AlexanderRichford
 
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to SuccessMongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
ScyllaDB
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
Fwdays
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
Sease
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB
 
Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
ThousandEyes
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Cyber Recovery Wargame
Cyber Recovery WargameCyber Recovery Wargame
Cyber Recovery Wargame
Databarracks
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
Fwdays
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
Ortus Solutions, Corp
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
From NCSA to the National Research Platform
From NCSA to the National Research PlatformFrom NCSA to the National Research Platform
From NCSA to the National Research Platform
Larry Smarr
 
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Ukraine
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
leebarnesutopia
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 

Recently uploaded (20)

ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
 
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to SuccessMongoDB to ScyllaDB: Technical Comparison and the Path to Success
MongoDB to ScyllaDB: Technical Comparison and the Path to Success
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 
ScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDCScyllaDB Real-Time Event Processing with CDC
ScyllaDB Real-Time Event Processing with CDC
 
Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Cyber Recovery Wargame
Cyber Recovery WargameCyber Recovery Wargame
Cyber Recovery Wargame
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
From NCSA to the National Research Platform
From NCSA to the National Research PlatformFrom NCSA to the National Research Platform
From NCSA to the National Research Platform
 
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 

Zeppelin at twitter (sf data science meetup, july 2016)

  • 1. Zeppelin at Twitter Prasad Wagle Technical Lead, Data Platform twitter.com/prasadwagle July 13, 2016 SF Data Science Meetup Galvanize, San Francisco, CA
  • 2. Twitter Data Pipeline Overview Production systems Presto Vertica MySQL Scalding Spark R Custom Dashboards Tableau Zeppelin Command line tools HDFS Analytics Tools Analytics Front-ends
  • 3. One company-wide server 860 notes 4000 paragraphs 1500 Vertica, 1500 Presto, 300 MySQL 400 Markdown 300 Scalding, Spark, etc 850 users Zeppelin Usage Metrics
  • 4. Field of (Data) Dreams Started as hackweek project in Dec 2015, Beta: Jan 27 Number of notes created Feb - 270, Mar - 350, Apr - 470, May - 560, Jun - 750
  • 5. Report Creators Product managers (dashboards, product analytics) Data scientists Sales analysts Engineers and SREs Report Viewers Anyone in the company Zeppelin Users
  • 6. My mind is absolutely blown away by the ease of use, speed, and power of Zeppelin. I've been wanting a tool like this at Twitter my entire time working here. started playing with @ApacheZeppelin. amazingly addictive! Thanks for all the updates to Zeppelin - Fabric has fallen in love with it fast (and we're even using it for daily tracking of our OKRs amongst all the other metrics) Zeppelin Testimonials
  • 7. Very easy to create and share reports Web based Works seamlessly with analytics engines JDBC Non-JDBC - Scalding, Spark Open source (easy to add features) Reasons for adoption
  • 8. Drag and drop report builder Can create complex queries without SQL knowledge (e.g. Top N) Polished UI Filters and other transformations work on extracts no new database queries (fast) Row level permissions (for sales reports) Tableau
  • 10. Authentication Integrated with Twitter’s homegrown single sign-on system SSL Integrated with Twitter’s homegrown key distribution system Notebook authorization Data source authorization Work Done (Security)
  • 11. Websocket deadlock issue with Jetty 8 reduce communication remove synchronized block (risky, will move to Jetty 9) Monitoring Standby server Backups Work Done (Stability, Operations)
  • 12. Scalding JDBC Create JDBC connection for every query to avoid Vertica closed connection issue Work Done (Interpreter)
  • 13. Notebook authorization Data source authorization Run scheduled notes with a user Scalding interpreter Reduce websocket communication Paragraph footer Row level permissions Work Contributed to Apache Project
  • 14. Stability, Scalability Jetty9 Interpreters Scalding, Spark, R Multiuser scalability, authentication, integration with Twitter sources Use JDBC interpreter instead of Hive Future / Work in Progress
  • 15. Features and UX Notebook organization (folders) Email reports and alerts Row level permissions like tableau Operations Monitoring (end-to-end query) Admin (view/stop running jobs, resource usage) Failover Continuous Integration Future / Work in Progress
  • 16. There is a real need for Notebook style interface for data analysis Zeppelin is enterprise ready, flexible and easy to use Zeppelin user and dev community is awesome Takeaways
  翻译: