© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Zeppelin + Livy:
Bringing Multi Tenancy
to Interactive Data Analysis
Rohit Choudhary & Jeff Zhang
June 28, 2016
What's Apache Zeppelin?
A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
Interactive Analysis 1.0 (Spark-shell)
Interactive Analysis 2.0 (Zeppelin)
Spark Interpreter
Interactive Analysis 3.0 (Zeppelin + Livy)
Livy Interpreter
Open Source Activity
Quick Stats: Zeppelin
• Zeppelin graduated in May 2016 and is now a TLP (Top-Level Project)
• Incubated at the Apache Software Foundation since December 2014
• 9 committers, 120+ contributors, and growing
• 1000+ JIRAs filed
• 900 PRs from the community
• Zeppelin just got a new friend: "R"
Recent Updates
• Multi-tenancy with Livy
• Generic JDBC interpreter
  – Hive, Phoenix, Redshift
  – Postgres, MySQL
  – Several others
• Notebook authentication and authorization
• UI automation through Selenium
• Security for other interpreters (on its way)
Usage Patterns & Feedback
• Cluster monitoring, memory analysis
• Telecom data usage, concert attendees' travel patterns
Upcoming
• GA with HDP 2.5 & Ambari 2.4.0; ETA: end of July
Architecture & Usage
Zeppelin Architecture
Current interpreter support:
• HDFS
• PySpark, SparkR, Spark
• Hive, Phoenix, SQL
• Shell
• …
Zeppelin Features
• Collate/Load Data – Collate/load data from existing data sources, or load it from external CSVs (e.g. Eureka, SmartSense)
• Visualize – Robust visualization mechanism to visualize data and enable insights
• Collaborate – Notebook-based collaboration and Notebook export; tagging of Notebook-generated data to be added soon
Popular Usage Scenarios
• Customized Dashboards – Customized dashboards for Big Data clusters
• Security Analytics – Understanding the nature of data arriving from multiple sources and analyzing its effects
• Bio-sciences – Medical research companies are interested in using Zeppelin for their research
Bringing Multi-tenancy to Zeppelin
Multi-Tenancy: Motivation
Objectives:
• Support workloads of multiple customers
• Support multiple LOBs (lines of business) on a single data system
• Support fine-grained audits
Requirements (gaps today):
• Inability to provision capacity for multiple user groups
• Inability to audit user actions, as all jobs run via the 'zeppelin' proxy user
• Inability to share state/data with other users
Zeppelin Livy Interaction
[Diagram: security across Zeppelin–Livy–Spark. Users authenticate to Zeppelin (Shiro) against LDAP; Zeppelin's Livy interpreter group calls the Livy APIs over SPNEGO/Kerberos; Livy launches Spark on YARN using Kerberos.]
Deep dive on Livy
What is Livy
Livy is an open source REST interface for interacting with Spark from anywhere.
[Diagram: a Livy client talks to the Livy server over HTTP; the server manages Spark interactive sessions and Spark batch sessions, each with its own SparkContext, over HTTP (RPC).]
Why we need Livy with Zeppelin
• Reduce the pressure on the client machine
• Make job submission/monitoring easy
• Customize the job schedule
Interactive Session – Create Session
Request:
curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" localhost:8998/sessions
Response:
{"state":"starting","proxyUser":null,"id":1,"kind":"spark","log":[]}
[Diagram: the Livy client POSTs to the Livy server, which launches a Spark interactive session with its own SparkContext.]
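The create-session call above is easy to script. A minimal sketch (assuming a Livy server at localhost:8998 as in the slide; the helper name `create_session_request` is illustrative, not part of Livy):

```python
import json

LIVY_URL = "http://localhost:8998"  # assumed Livy endpoint

def create_session_request(kind="spark", proxy_user=None):
    """Build the URL and JSON body for Livy's POST /sessions call."""
    body = {"kind": kind}
    if proxy_user is not None:
        body["proxyUser"] = proxy_user  # run the session as this user
    return f"{LIVY_URL}/sessions", json.dumps(body)

url, payload = create_session_request(kind="spark", proxy_user="alice")
# POST `payload` to `url` with Content-Type: application/json,
# e.g. requests.post(url, data=payload, headers={"Content-Type": "application/json"})
```

Setting `proxyUser` is what makes the session multi-tenant: Livy impersonates that user when launching the Spark session.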
Interactive Session – Execute Code
Request:
curl http://localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sc.parallelize(0 to 100).sum()"}'
Response:
{"id":0,"state":"running","output":null}
[Diagram: the Livy client POSTs the statement to the Livy server, which runs it on the Spark interactive session's SparkContext.]
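The statement is asynchronous: `output` is null while the code runs, and the client polls `GET /sessions/{id}/statements/{stmtId}` until the state is `available`. A small parser sketch (the completed-response shape here follows Livy's statement JSON with the result under `output.data["text/plain"]`; treat the exact example payload as an assumption):

```python
import json

def parse_statement(resp_text):
    """Pull (state, plain-text result) out of a Livy statement response."""
    stmt = json.loads(resp_text)
    output = stmt.get("output") or {}  # null while the statement is running
    data = (output.get("data") or {}).get("text/plain")
    return stmt["state"], data

# While still running (the response shown above): result is not there yet.
print(parse_statement('{"id":0,"state":"running","output":null}'))

# Once finished, a poll returns something like:
done = ('{"id":0,"state":"available","output":{"status":"ok",'
        '"data":{"text/plain":"res0: Double = 5050.0"}}}')
print(parse_statement(done))
```

A real client would loop with a short sleep between polls until the state leaves `running`.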
SparkContext Sharing
[Diagram: several clients connect to the Livy server; clients attached to the same session (e.g. Session-1, backed by SparkSession-1 and its SparkContext) share that SparkContext, while Session-2 runs on its own SparkSession-2 and SparkContext.]
Livy Security
[Diagram: the client authenticates to the Livy server via SPNEGO; the server (with impersonation) communicates with each SparkSession using a shared secret.]
• Only authorized users can launch a Spark session / submit code
• Each user can access only their own session
• Only the Livy server can submit jobs securely to a Spark session
SPNEGO
Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), often pronounced "spen-go", is a GSSAPI "pseudo mechanism" used by client-server software to negotiate the choice of security technology.
[Diagram: a client holding a Kerberos TGT talks to the SPNEGO-enabled Livy server:]
1. HTTP GET http://site/a.html
2. 401 Unauthorized
3. HTTP GET with Authorization: Negotiate <token>
Impersonation
[Diagram: users Alice and Bob, each with a Kerberos TGT, authenticate to the Livy server (super user: livy) via SPNEGO; the server launches a separate Spark session per user and talks to each over a shared secret.]
Shared Secret
1. The Livy server generates a secret key
2. The Livy server passes the secret key to the Spark session when launching it
3. The two sides use the secret key to communicate with each other
[Diagram: Livy server ↔ shared secret ↔ Spark session]
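The three steps can be illustrated with a generic HMAC scheme. This is a sketch of the shared-secret pattern the slide describes, not Livy's actual wire protocol:

```python
import hashlib
import hmac
import os

# 1. The Livy server generates a secret key...
secret = os.urandom(32)
# 2. ...and hands it to the Spark session at launch time.
# 3. Both sides then authenticate every message with it:

def sign(message: bytes, key: bytes) -> str:
    """Tag a message with HMAC-SHA256 under the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str, key: bytes) -> bool:
    """Constant-time check that the tag matches the message."""
    return hmac.compare_digest(sign(message, key), tag)

tag = sign(b"execute statement 0", secret)
print(verify(b"execute statement 0", tag, secret))  # genuine request
print(verify(b"execute statement 1", tag, secret))  # tampered request
```

Because only the Livy server and the Spark session it launched hold the key, a message with a valid tag can only have come from one of them.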
Multi Tenant: Zeppelin Demo
Zeppelin Direction
• Workspaces and Collaboration
• Customizable Visualization
  – Helium
  – Custom, data-type-based visualization (Geolocation/Maps)
• Enterprise Readiness
  – Bring security to all interpreters
  – Performance improvements
• Collaboration
• Data Lineage
Q & A
Thank You
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit/Hadoop Summit
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
 

Recently uploaded

Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudRadically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
ScyllaDB
 
The Strategy Behind ReversingLabs’ Massive Key-Value Migration
The Strategy Behind ReversingLabs’ Massive Key-Value MigrationThe Strategy Behind ReversingLabs’ Massive Key-Value Migration
The Strategy Behind ReversingLabs’ Massive Key-Value Migration
ScyllaDB
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
Brightwell ILC Futures workshop David Sinclair presentation
Brightwell ILC Futures workshop David Sinclair presentationBrightwell ILC Futures workshop David Sinclair presentation
Brightwell ILC Futures workshop David Sinclair presentation
ILC- UK
 
Kubernetes Cloud Native Indonesia Meetup - June 2024
Kubernetes Cloud Native Indonesia Meetup - June 2024Kubernetes Cloud Native Indonesia Meetup - June 2024
Kubernetes Cloud Native Indonesia Meetup - June 2024
Prasta Maha
 
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc
 
New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024
ThousandEyes
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
ThousandEyes
 
Multivendor cloud production with VSF TR-11 - there and back again
Multivendor cloud production with VSF TR-11 - there and back againMultivendor cloud production with VSF TR-11 - there and back again
Multivendor cloud production with VSF TR-11 - there and back again
Kieran Kunhya
 
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State StoreElasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
ScyllaDB
 
Database Management Myths for Developers
Database Management Myths for DevelopersDatabase Management Myths for Developers
Database Management Myths for Developers
John Sterrett
 
The "Zen" of Python Exemplars - OTel Community Day
The "Zen" of Python Exemplars - OTel Community DayThe "Zen" of Python Exemplars - OTel Community Day
The "Zen" of Python Exemplars - OTel Community Day
Paige Cruz
 
Day 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data ManipulationDay 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data Manipulation
UiPathCommunity
 
Guidelines for Effective Data Visualization
Guidelines for Effective Data VisualizationGuidelines for Effective Data Visualization
Guidelines for Effective Data Visualization
UmmeSalmaM1
 
Automation Student Developers Session 3: Introduction to UI Automation
Automation Student Developers Session 3: Introduction to UI AutomationAutomation Student Developers Session 3: Introduction to UI Automation
Automation Student Developers Session 3: Introduction to UI Automation
UiPathCommunity
 
Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
ThousandEyes
 
Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0
Neeraj Kumar Singh
 
Getting Started Using the National Research Platform
Getting Started Using the National Research PlatformGetting Started Using the National Research Platform
Getting Started Using the National Research Platform
Larry Smarr
 
Ubuntu Server CLI cheat sheet 2024 v6.pdf
Ubuntu Server CLI cheat sheet 2024 v6.pdfUbuntu Server CLI cheat sheet 2024 v6.pdf
Ubuntu Server CLI cheat sheet 2024 v6.pdf
TechOnDemandSolution
 
CTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database MigrationCTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database Migration
ScyllaDB
 

Recently uploaded (20)

Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudRadically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud
 
The Strategy Behind ReversingLabs’ Massive Key-Value Migration
The Strategy Behind ReversingLabs’ Massive Key-Value MigrationThe Strategy Behind ReversingLabs’ Massive Key-Value Migration
The Strategy Behind ReversingLabs’ Massive Key-Value Migration
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Brightwell ILC Futures workshop David Sinclair presentation
Brightwell ILC Futures workshop David Sinclair presentationBrightwell ILC Futures workshop David Sinclair presentation
Brightwell ILC Futures workshop David Sinclair presentation
 
Kubernetes Cloud Native Indonesia Meetup - June 2024
Kubernetes Cloud Native Indonesia Meetup - June 2024Kubernetes Cloud Native Indonesia Meetup - June 2024
Kubernetes Cloud Native Indonesia Meetup - June 2024
 
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
 
New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024New ThousandEyes Product Features and Release Highlights: June 2024
New ThousandEyes Product Features and Release Highlights: June 2024
 
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
 
Multivendor cloud production with VSF TR-11 - there and back again
Multivendor cloud production with VSF TR-11 - there and back againMultivendor cloud production with VSF TR-11 - there and back again
Multivendor cloud production with VSF TR-11 - there and back again
 
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State StoreElasticity vs. State? Exploring Kafka Streams Cassandra State Store
Elasticity vs. State? Exploring Kafka Streams Cassandra State Store
 
Database Management Myths for Developers
Database Management Myths for DevelopersDatabase Management Myths for Developers
Database Management Myths for Developers
 
The "Zen" of Python Exemplars - OTel Community Day
The "Zen" of Python Exemplars - OTel Community DayThe "Zen" of Python Exemplars - OTel Community Day
The "Zen" of Python Exemplars - OTel Community Day
 
Day 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data ManipulationDay 4 - Excel Automation and Data Manipulation
Day 4 - Excel Automation and Data Manipulation
 
Guidelines for Effective Data Visualization
Guidelines for Effective Data VisualizationGuidelines for Effective Data Visualization
Guidelines for Effective Data Visualization
 
Automation Student Developers Session 3: Introduction to UI Automation
Automation Student Developers Session 3: Introduction to UI AutomationAutomation Student Developers Session 3: Introduction to UI Automation
Automation Student Developers Session 3: Introduction to UI Automation
 
Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
 
Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0Chapter 5 - Managing Test Activities V4.0
Chapter 5 - Managing Test Activities V4.0
 
Getting Started Using the National Research Platform
Getting Started Using the National Research PlatformGetting Started Using the National Research Platform
Getting Started Using the National Research Platform
 
Ubuntu Server CLI cheat sheet 2024 v6.pdf
Ubuntu Server CLI cheat sheet 2024 v6.pdfUbuntu Server CLI cheat sheet 2024 v6.pdf
Ubuntu Server CLI cheat sheet 2024 v6.pdf
 
CTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database MigrationCTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database Migration
 

Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis

  • 1. Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis. Rohit Choudhary & Jeff Zhang, June 28, 2016. © Hortonworks Inc. 2011 – 2016. All Rights Reserved.
  • 2. What’s Apache Zeppelin? A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
  • 3. Interactive Analysis 1.0 (Spark-shell)
  • 4. Interactive Analysis 2.0 (Zeppelin) – Spark Interpreter
  • 5. Interactive Analysis 3.0 (Zeppelin + Livy) – Livy Interpreter
  • 6. Open Source Activity
  • 7. Quick Stats: Zeppelin –  Zeppelin graduated in May 2016 and is now a TLP  Incubated by the Apache Foundation since Dec 2014  9 committers, 120+ contributors, a growing list  1000+ JIRAs filed  900 PRs via the community  Zeppelin just got a new friend, “R”
  • 8. Recent Updates –  Multi-tenancy with Livy  Generic JDBC interpreter: Hive, Phoenix, Redshift, Postgres, MySQL and several others  Notebook authentication and authorization  UI automation through Selenium  Security for other interpreters (on its way)
  • 9. Usage Patterns & Feedback –  Cluster monitoring, memory analysis  Telecom data usage, concert attendees’ travel patterns
  • 10. Upcoming –  GA with HDP 2.5 & Ambari 2.4.0, ETA end of July
  • 11. Architecture & Usage
  • 12. Zeppelin Architecture – current interpreter support:  HDFS  PySpark, SparkR, Spark  Hive, Phoenix, SQL  Shell  …
  • 13. Zeppelin Features – Collate/Load Data: collate/load data from existing data sources, load from external CSVs (e.g. Eureka, SmartSense). Visualize: a robust visualization mechanism to visualize data and enable insights. Collaborate: notebook-based collaboration and notebook export; tagging of notebook-generated data is soon to be added.
  • 14. Popular Usage Scenarios – Customized Dashboards: intended for customized dashboards for big data clusters. Security Analytics: understanding the nature of data arriving from multiple sources and analyzing its effects. Bio-sciences: medical research companies are interested in using Zeppelin for their research.
  • 15. Bringing Multi-tenancy to Zeppelin
  • 16. Multi-Tenancy: Motivation – Objectives:  Supporting the workloads of multiple customers  Supporting multiple LOBs (lines of business) on a single data system  Supporting fine-grained audits. Current gaps:  Inability to provision capacity for multiple user groups  Inability to audit user actions, as all jobs run via the ‘zeppelin’ proxy user  Inability to share state/data with other users
  • 17. Zeppelin–Livy Interaction – LDAP, Zeppelin (Shiro), Livy (Spark group interpreter), Spark on YARN; SPNEGO and Kerberos security across Zeppelin–Livy–Spark via the Livy APIs
  • 18. Deep Dive on Livy
  • 19. What is Livy? Livy is an open source REST interface for interacting with Spark from anywhere. Architecture: the Livy client talks to the Livy server over HTTP; the server talks to a Spark interactive session or a Spark batch session (each owning a SparkContext) over HTTP/RPC.
  • 20. Why we need Livy with Zeppelin –  Reduces the pressure on the client machine  Makes job submission/monitoring easy  Allows customizing the job schedule
  • 21. Interactive Session – Create Session. Request: curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" localhost:8998/sessions. Response: {"state":"starting","proxyUser":null,"id":1,"kind":"spark","log":[]}
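The create-session call on slide 21 is a plain JSON-over-HTTP exchange, so it is easy to script. The sketch below is illustrative only: it builds the same POST body the curl example sends and parses the response shown on the slide, assuming Livy's default endpoint `localhost:8998` (the helper names are my own, not part of Livy).

```python
import json

LIVY_URL = "http://localhost:8998"  # Livy's default port; adjust per deployment

def create_session_request(kind="spark"):
    """Build the POST /sessions request the slide shows with curl.
    'kind' may also be 'pyspark' or 'sparkr', per the speaker notes."""
    return {
        "url": f"{LIVY_URL}/sessions",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"kind": kind}),
    }

def parse_session_response(raw):
    """Pull the session id and state out of Livy's JSON response."""
    resp = json.loads(raw)
    return resp["id"], resp["state"]

# The exact response shown on the slide:
sid, state = parse_session_response(
    '{"state":"starting","proxyUser":null,"id":1,"kind":"spark","log":[]}'
)
print(sid, state)  # 1 starting
```

Any HTTP client can then send the built request; the returned session id is what subsequent statement submissions reference.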
  • 22. Interactive Session – Execute Code. Request: curl http://localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sc.parallelize(0 to 100).sum()"}'. Response: {"id":0,"state":"running","output":null}
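Because the slide's response comes back with `"state":"running"` and a null output, a client has to poll the statement until it completes. A minimal sketch of that pattern (helper names are my own; the `available` terminal state matches Livy's statement lifecycle, but verify against your Livy version):

```python
import json

def submit_statement_request(session_id, code):
    """Build POST /sessions/{id}/statements, as on the slide."""
    return {
        "url": f"http://localhost:8998/sessions/{session_id}/statements",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"code": code}),
    }

def statement_finished(raw):
    """A statement's output is ready once its state is 'available';
    'running' means the client should poll again."""
    return json.loads(raw)["state"] == "available"

req = submit_statement_request(0, "sc.parallelize(0 to 100).sum()")
# The response shown on the slide is still running, so keep polling:
print(statement_finished('{"id":0,"state":"running","output":null}'))  # False
```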
  • 23. SparkContext Sharing – Clients 1 and 2 attach to session-1 (SparkSession-1 with its SparkContext) while client 3 uses session-2 (SparkSession-2 with its SparkContext); sessions live in the Livy server, so a SparkContext can be shared across clients.
  • 24. Livy Security – Client → SPNEGO → Livy server (impersonation) → shared secret → Spark session.  Only authorized users can launch a Spark session or submit code  Each user can access only his own session  Only the Livy server can submit jobs securely to a Spark session
  • 25. SPNEGO – Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), often pronounced “spen-go”. It is a GSSAPI “pseudo mechanism” used by client-server software to negotiate the choice of security technology. Flow: the client (holding a Kerberos TGT) sends HTTP GET http://site/a.html; the SPNEGO-enabled Livy server answers 401 Unauthorized; the client retries with an Authorization: Negotiate header and the request succeeds.
  • 26. Impersonation – Alice and Bob (each with a Kerberos TGT) authenticate via SPNEGO to the Livy server (super user: livy), which launches a separate Spark session per user, each protected by the shared secret.
  • 27. Shared Secret – 1. The Livy server generates a secret key. 2. The Livy server passes the secret key to the Spark session when launching it. 3. The two use the secret key to communicate with each other.
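The three steps above are, in essence, message authentication with a pre-shared key: whoever holds the key can prove to the peer that a message came from them. The following is a generic HMAC illustration of that idea, not Livy's actual wire protocol:

```python
import hashlib
import hmac
import secrets

# Step 1: the launcher (Livy server) generates a secret key,
# which it hands to the Spark session at launch time (step 2).
secret_key = secrets.token_bytes(32)

def sign(message: bytes) -> str:
    """Attach an HMAC tag proving the sender holds the shared key."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str) -> bool:
    """Only a holder of the same key produces a matching tag (step 3)."""
    return hmac.compare_digest(sign(message), tag)

# Server and session exchange signed messages; tampering is detected.
msg = b'{"code":"sc.parallelize(0 to 100).sum()"}'
tag = sign(msg)
print(verify(msg, tag), verify(b"tampered", tag))  # True False
```

The design point is that the key never travels over an untrusted channel: it is injected at session launch, so only the Livy server and its Spark session can produce valid tags.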
  • 28. Multi-Tenant Zeppelin Demo
  • 29. Zeppelin Direction –  Workspaces and collaboration  Customizable visualization: Helium; custom, data-type-based visualization (geolocation/maps)  Enterprise readiness: bring security to all interpreters; performance improvements  Collaboration  Data lineage
  • 30. Q & A
  • 31. Thank You

Editor's Notes

  1. Missing paradigm
  2. So this is interactive analysis 1.0: the spark-shell. Most of you are probably familiar with it if you have some experience with Spark. Spark-shell provides a nice interactive environment for writing Spark programs, but it lacks many features needed for interactive analysis, such as visualization and code management.
  3. This is why Zeppelin came along. Zeppelin brings many nice features to interactive analysis, such as visualization and collaboration; we can call it interactive analysis 2.0. By default Zeppelin uses the native Spark interpreter, which has some limitations, such as not being able to run Spark in yarn-cluster mode. That means your driver runs on your client machine, which may put heavy pressure on it. Besides that, you can’t share a SparkContext across multiple Zeppelin instances.
  4. So now we use Livy as the Spark interpreter for Zeppelin. With Livy we can run the Spark interpreter in the YARN cluster, and we can also share a SparkContext across multiple Zeppelin instances. We call Zeppelin + Livy interactive analysis 3.0.
  5. So what is Livy? Livy is an open source REST interface for interacting with Spark from anywhere. Here is a diagram of the overall architecture. There are three layers: on the far left is the Livy client, and in the middle is the Livy server. The client communicates with the server through a REST API, i.e. over HTTP. The client can ask the server to do many things, such as launching a Spark application, submitting a Spark job, polling job status, and even submitting a single piece of Spark code. Livy currently supports two kinds of Spark sessions: interactive sessions and batch sessions. Since today’s talk is about interactive analysis, we will focus on the interactive session. In Livy 0.1 the communication between the Livy server and the Spark session was HTTP; in the latest code it has been changed to RPC.
  6. Overall, Livy is a central place for launching Spark jobs, which brings several benefits. First, it reduces the pressure on the client machine: nothing runs there except REST API calls. Second, it makes job submission and monitoring easy: without Livy you have to install Spark on your client machine and use spark-submit to submit jobs, whereas with Livy you just call the REST API. Third, you can customize job scheduling: since all submissions go through the Livy server, the server can do the scheduling (this feature is not implemented yet, but it is possible).
  7. Now let’s talk about how Livy works for the interactive session, starting with session creation. Before you submit any code, you need to create a session. Here we use the curl command to invoke the REST API: a POST request that specifies the kind as spark (it can also be pyspark or sparkr) along with the URL of the REST endpoint. The response contains the state of the session (here “starting”) and the proxyUser (here null). The request is routed as follows: the Livy client sends the request to the Livy server; the server launches the session; once the Spark session is created, it sends its address back to the server so a connection can be established between them; and finally the server sends the session status back to the client.
  8. Now let’s see how Livy executes code. The request contains the code we want to execute, again sent to the REST endpoint. The response contains the statement id, state, and output. Notice that the output is null, because this piece of code won’t finish in a short time; we can get the output later with a poll-status request. The routing: the Livy client sends the request to the Livy server; the server forwards it to its Spark session; the session executes the code and sends the output back to the server; and the server returns the output to the client.
  9. Now let’s talk about SparkContext sharing. Clients don’t own the Spark session; all sessions are launched by the Livy server, which is what makes SparkContext sharing possible. Here client-1 and client-2 use the same Spark session (session-1), while client-3 uses its own session (session-2). When a client talks to the Livy server it specifies a session id, so as long as two clients specify the same session id they are using the same SparkContext. This is for non-secure mode; it is more complicated in secure mode.
  10. Now let’s talk about security. There are mainly three problems to solve. First, we need to make sure that only authorized users can launch a Spark session; we don’t want everyone launching sessions through the Livy server. Second, each user should be able to access only his own session. Third, only the Livy server should be able to submit jobs securely to a Spark session. To solve these three problems we use several techniques: SPNEGO, impersonation, and a shared secret. SPNEGO is used between the Livy client and the Livy server; it ensures that only authorized users can launch sessions or submit code. Impersonation ensures each user can access only his own session: without it, every Spark session runs as the user who launched the Livy server process, but with it, the session is launched as the client’s user. The shared secret protects the communication between the Livy server and the Spark session; only those two know it.
  11. First, SPNEGO. SPNEGO can make sure that only authorized users launch Spark sessions or submit code to the Livy server. Its full name is Simple and Protected GSSAPI Negotiation Mechanism. It is a GSSAPI “pseudo mechanism” used by client-server software to negotiate the choice of security technology, so it is pluggable with the underlying security technology, though most often it is used with Kerberos. The flow: the client sends a request to the server; the server responds with status code 401, which means unauthorized; the client then resends the request, this time including the Kerberos service-ticket information; finally the server authorizes the user with the ticket info and responds with the content of the page.
  12. The next thing is impersonation. We want to protect each user’s session: for security reasons, user Alice must not be able to access user Bob’s session. The Livy server process is launched by the super user livy. Without impersonation every Spark session runs as user livy, but with impersonation the Spark session can be launched as the client’s user. This is very similar to impersonation in HiveServer2. To enable impersonation, we need to make the corresponding proxy-user configuration changes in core-site.xml.
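The note above does not include the actual core-site.xml entries. For a superuser named livy, the typical Hadoop proxy-user properties look like the following sketch; the wildcard values are placeholders for illustration and should be narrowed to specific groups and hosts in production:

```xml
<!-- core-site.xml: allow the 'livy' superuser to impersonate other users -->
<property>
  <name>hadoop.proxyuser.livy.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.livy.hosts</name>
  <value>*</value>
</property>
```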
  13. The last thing we will talk about is the shared secret. Once the Spark session has started it can accept requests from outside, but we don’t want anyone other than the Livy server connecting to it. So we use the shared secret to protect the communication between the Livy server and the Spark session; only those two know it. How it works: the Livy server generates a secret key; it passes the key to the Spark session when launching it; then the two use the key to communicate with each other.