© comScore, Inc. Proprietary.
Syncsort & MapR @ comScore
Michael Brown, CTO | July 9th, 2014
The comScore Story
Analytics for a Digital World™
The Digital World is Complex
comScore’s Mission
Be the Leader in
Digital Media Analytics.
Measure all forms of
media—content and
advertising—at scale,
across all platforms, in
real-time, globally.
comScore Brings it Together
Tablet, PC/Mac, TV, Smartphone, Gaming
comScore is a leading internet technology company that
provides Analytics for a Digital World™
NASDAQ SCOR
Clients 2,400+ Worldwide
Employees 1,200+
Headquarters Reston, Virginia, USA
Global Coverage Measurement from 172 Countries; 44 Markets Reported
Local Presence 32 Locations in 23 Countries
Providing Analytics for More Than 2,400 Clients Globally
Media Agencies Telecom/Mobile Financial Retail Travel CPG Health Technology
Turning Big Data into Powerful Insight
Collection
  Census: Tags & Data Feeds
  Panels: PC, iOS, Android
  Survey: Non-behavioral elements
Calibration
  Methods: Aggregation, Dictionaries, Taxonomies
  Models: Weighting, Projection, De-Duplication, Attribution
Delivery
  Syndicated Data Platform: Media Metrix, vCE
  Consulting Analysis
  Client Analytics Platform: Digital Analytix
Panel Heat Map
Average Records Captured per Day (2005-2009)
[Chart: average records captured per day; y-axis 0 to 1,800,000,000 records; x-axis 9/26/2005 to 3/26/2009]
CENSUS
Unified Digital Measurement™ (UDM) Establishes Platform For
Panel + Census Data Integration
Adopted by 90% of Top 100 U.S. Media Properties
PANEL
Unified Digital Measurement (UDM)
Patent-Pending Methodology
Global PERSON
Measurement
Global DEVICE
Measurement
Beacon Heat Map
Monthly Records Collection
[Chart: monthly records collected, split into Beacon Records and Panel Records; y-axis: number of records, 0 to 2,000 billion]
Total records collected in June 2014 = 1,726,563,202,649
Total records collected YTD 2014 = 10,037,131,368,475
DMX @ comScore
DMX use at comScore
Purchased our first 4 licenses in 2000!
We use DMX from Syncsort across hundreds of servers for efficient data
processing and aggregation.
We currently run more than 100 unique jobs every day.
With these jobs we process over 150 billion rows of data through DMX!
Connect
Design
Process Accelerate
Compression w/Sorting
Compress Log Files when processing large volumes of log data
Several advantages to Sorting Data First:
• Reduces the size of the data
• Improves application performance
Examples:
• One hour of one of our data sources is 2,315 GB raw (2.9 billion rows)
• Standard compression of the time-ordered data yields 509 GB (22% of original)
• Standard compression of the sorted data yields 324 GB (14% of original)
When applied to all our sources we save
• 5.0 TB per day
• 155 TB per month
• 460 TB per quarter
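The effect described on this slide can be reproduced at toy scale. The following is a minimal sketch, not comScore's pipeline: the record layout (`cookie,site`), the synthetic data, and the use of `zlib` as the "standard compression" are all assumptions made for illustration. It also re-derives the two percentages quoted above from the slide's own GB figures.

```python
import random
import zlib


def compressed_size(lines):
    """Return the zlib-compressed size, in bytes, of newline-joined lines."""
    return len(zlib.compress("\n".join(lines).encode("utf-8"), 6))


# Hypothetical log records in arrival (time) order: "<cookie_id>,<site>".
rng = random.Random(42)
cookies = [f"cookie{n:06d}" for n in range(500)]
sites = ["news.example", "video.example", "shop.example"]
time_ordered = [f"{rng.choice(cookies)},{rng.choice(sites)}" for _ in range(50_000)]

# Sorting places identical and near-identical records next to each other,
# which gives the compressor long repeated runs to exploit.
sorted_rows = sorted(time_ordered)

raw_bytes = len("\n".join(time_ordered).encode("utf-8"))
time_ordered_bytes = compressed_size(time_ordered)
sorted_bytes = compressed_size(sorted_rows)
print(f"raw={raw_bytes} time-ordered={time_ordered_bytes} sorted={sorted_bytes}")

# Percentages from the slide, recomputed from its GB figures.
slide_time_ordered_pct = round(100 * 509 / 2315)  # 22
slide_sorted_pct = round(100 * 324 / 2315)        # 14
```

The exact byte counts depend on the synthetic data and the compressor; only the ordering (sorted smaller than time-ordered, both smaller than raw) is the point of the sketch.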
Hadoop @ comScore
Why Hadoop?
• comScore built our own distributed computing stack in 2002.
• In 2009 we decided it was better to leverage the efforts of the Hadoop community instead of building our own stack.
• We recognized the benefit of switching to Hadoop, which would allow for seamless scaling of our infrastructure to meet the needs of the business.
• Hadoop allows us to add compute, storage and memory linearly and to process data at tremendous scale.
• Partnered with Syncsort on their Hadoop efforts from Oct 2010
• Evaluated the beta of MapR in the fall of 2011
90 Days of Data
[Chart: 90 days of data by year; data labels 1,148, 1,919, 3,049, 4,862, 5,084; y-axis 0 to 6,000 trillion; x-axis 2009, 2010, 2011, 2012, 2013, 2014, 2016]
High Level Data Flow
Panel
Census
Custom Code +
ADW
EDW
Delivery
Our Cluster
Production Hadoop Cluster
• 400+ nodes: Mix of Dell R720xd, R710 and R510 servers
• Each R720xd has 24 x 1.2 TB drives, 128 GB RAM, and 32 cores
• 13,800+ total CPUs
• 31.6 TB total memory
• 8.2 PB total disk space
• Our distro is MapR M5 2.1.3
Leveraging Partitions from MapR
Validation Funnel & Target Effectiveness
Our growth
As our volume has grown we have the following stats:
• Over 683 billion events per month
• Daily Aggregate 1.8 billion
• 160 billion aggregate records for 92 days
• 146K Campaigns
• Over 50 countries
• We see 15 billion distinct cookies in a month
• We only need to output 26 million rows
Solution to reduce the shuffle
The Problem:
• Most aggregations within comScore cannot take advantage of combiners, leading to large shuffles and job performance issues
The Idea:
• Partition and sort the data by cookie on a daily basis
• Create a custom InputFormat to merge daily partitions for monthly aggregations
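The idea of merging daily partitions that are already sorted by cookie can be sketched in a few lines. This is an illustration only, written in Python rather than as a Hadoop InputFormat; the row layout `(cookie, count)` and the function name `merge_daily_partitions` are hypothetical. Because every daily input is sorted by the same key, a streaming k-way merge brings all rows for one cookie together without a global re-sort.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_daily_partitions(daily_partitions):
    """Merge per-day lists of (cookie, count) rows, each already sorted by
    cookie, and sum the counts per cookie in one streaming pass."""
    merged = heapq.merge(*daily_partitions, key=itemgetter(0))
    for cookie, rows in groupby(merged, key=itemgetter(0)):
        yield cookie, sum(count for _, count in rows)


# Three hypothetical daily partitions for the same cookie range.
day1 = [("a", 2), ("b", 1), ("d", 4)]
day2 = [("a", 1), ("c", 5)]
day3 = [("b", 3), ("d", 1)]

monthly = list(merge_daily_partitions([day1, day2, day3]))
print(monthly)  # [('a', 3), ('b', 4), ('c', 5), ('d', 5)]
```

Only one row per input needs to be held in memory at a time, which is why the daily sort makes the monthly aggregation cheap.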
Custom Input Format with Map Side Aggregation
[Diagram: Mapper inputs with mixed keys (CB, BA, AC) shuffled to Reduce (A, B, C), compared with Map inputs partitioned by key (A, B, C), each followed by a Combiner]
Risks for Partitioning
Data locality
• Custom InputFormat requires reading blocks of the partitioned data over the network
• This was solved using a feature of the MapR file system. We created volumes and set the chunk size to zero, which guarantees that the data written to a volume will stay on one node
Map failures might result in long run times
• Size of the map inputs is no longer set by block size
• This was solved by creating a large number (10K) of volumes to limit the size of data processed by each mapper
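The slide does not say how cookies are assigned to the roughly 10K volumes. One common approach, shown here purely as an assumption, is a stable hash of the cookie ID modulo the number of volumes, so that the same cookie lands in the same volume every day and each mapper sees a bounded slice of the data. The names `NUM_VOLUMES` and `volume_for_cookie` are hypothetical.

```python
import hashlib
from collections import Counter

NUM_VOLUMES = 10_000  # the slide mentions a large number (10K) of volumes


def volume_for_cookie(cookie_id, num_volumes=NUM_VOLUMES):
    """Map a cookie ID to a volume index with a hash that is stable across
    processes and days (unlike Python's built-in hash() for strings)."""
    digest = hashlib.sha256(cookie_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_volumes


# Hypothetical cookie IDs, used only to look at how the load spreads out.
cookies = [f"cookie-{n}" for n in range(100_000)]
load = Counter(volume_for_cookie(c) for c in cookies)
print(f"volumes used: {len(load)}, largest volume holds {max(load.values())} cookies")
```

With more volumes, each one holds less data, so a failed map task has less work to redo; that is the trade-off the slide describes.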
Partitioning Summary
Benefits:
• A large portion of the aggregation can be completed in the map phase
• Applications can now take advantage of combiners
• Shuffle sizes are minimal
Results:
• Took a job from 35 hours to 3 hours with no hardware changes
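To make the combiner benefit concrete, here is a small simulation, offered as a sketch under simplifying assumptions rather than actual MapReduce code. It counts events per cookie and compares how many records would cross the shuffle with and without map-side combining. The function names and sample splits are hypothetical; the splits are assumed to be partitioned by cookie, as in the approach summarized above.

```python
from collections import Counter


def shuffle_without_combiner(map_inputs):
    """Every input event becomes one (cookie, 1) record sent to the reducers."""
    return [(cookie, 1) for split in map_inputs for cookie in split]


def shuffle_with_combiner(map_inputs):
    """Each map task pre-aggregates its own split, so it emits at most one
    (cookie, count) record per distinct cookie it saw."""
    records = []
    for split in map_inputs:
        records.extend(Counter(split).items())
    return records


def reduce_counts(records):
    """Sum the counts per cookie, as the reduce phase would."""
    totals = Counter()
    for cookie, n in records:
        totals[cookie] += n
    return dict(totals)


# Two hypothetical map splits, partitioned so that no cookie spans splits.
splits = [["a", "a", "a", "b", "b"], ["c", "c", "c", "c"]]

plain = shuffle_without_combiner(splits)
combined = shuffle_with_combiner(splits)
print(len(plain), len(combined))  # 9 3
```

Both paths produce the same totals; only the number of shuffled records changes.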
DMX-h @ comScore
Reasons for comScore selecting DMX-h
Performance
• DMX-h as the pluggable sort in Hadoop allows us to increase throughput on
our existing platform; this reduces capital and ongoing operational
expenses
• The increase in throughput also allows us to deliver our data more quickly
to our customers. Both make the data more valuable to our clients.
Speed of Development
• The ability to quickly build out applications in the DMX-h GUI allows us to
iterate and respond more quickly to the needs of the business.
• The ease of development also allows us to democratize access to the
Hadoop platform by leveraging a point-and-click GUI.
Performance - DMX-h Pluggable Sort Testing Results
First Comparison Run on our Dev Cluster
Pig scripts called with the Syncsort plug-in
GroupBy / Distinct Operations
• Counting uniques
• These have large shuffle steps, which lead to more data to sort.
• Observed up to a 20% decrease in job runtime
Filter Operations
• Searching for a specific value
• Observed a 5% – 10% decrease in job runtime
• Dependent on type of filter and size of job output
40GB compressed data, base run is 86 min, test run is 68 min; Savings of 20%
Results from 7 Nodes; 56 cores; 433 GB RAM; 28 TB disk; MapR M5 3.0.2; DMX-h 7.12
Speed of Development - POC
We took an existing process that runs in our Hadoop cluster and converted
that to DMX-h to validate the new capabilities.
The existing process:
• Written in 75 lines of Pig with 3 Java UDFs
• Developed in about 25 hours
• Processes 3.5 billion input rows per day
• Takes 35 minutes to run on a daily basis
DMX-h Process
Speed of Development - POC
The new process in DMX-h:
• Developed a new job with 13 tasks
• No Java UDF required
• Runs on the same data and in the same environment.
• Developed in 12 hours.
• Runs in 11 minutes! About 1/3 of the time of the Pig & Java code.
Useful Factoids
Visit www.comscoredatamine.com or follow @datagems for the latest gems.
Colorful, bite-sized graphical representations of the best discoveries we unearth.
Thank You!
Michael Brown
CTO
comScore, Inc.
mbrown@comscore.com
© 2014 MapR Technologies
Today’s Presenters
Steve Wooledge
VP - Product Marketing
@swooledge
Jorge Lopez
Director - Product Marketing
@zanilli
Mike Brown
CTO
Leveraging MapR and Syncsort
Big Data is Overwhelming Traditional Systems
• Mission-critical reliability
• Transaction guarantees
• Deep security
• Real-time performance
• Backup and recovery
• Interactive SQL
• Rich analytics
• Workload management
• Data governance
• Backup and recovery
Enterprise
Data
Architecture
TREND 1
ENTERPRISE
USERS
OPERATIONAL
SYSTEMS
ANALYTICAL
SYSTEMS
PRODUCTION
REQUIREMENTS
PRODUCTION
REQUIREMENTS
OUTSIDE SOURCES
Hadoop: The Disruptive Technology at the Core of Big Data
TREND 2
[Chart: job trends from Indeed.com, Jan '06 to Jan '14]
Hadoop Relieves the Pressure from Enterprise Systems
REALITY 1
OPERATIONAL SYSTEMS
ANALYTICAL SYSTEMS
ENTERPRISE USERS
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions
Keys for Production Success
1 Reliability and DR
2 Interoperability
3 High performance
4 Supports operations and analytics
Architecture Matters for Success
REALITY 2
FOUNDATION
Data protection & security
High performance
Multi-tenancy
Operational & Analytical Workloads
Open standards for integration
NEW APPLICATIONS, SLAs, TRUSTED INFORMATION, LOWER TCO
The Power of the Open Source Community
Management
MapR Data Platform
APACHE HADOOP AND OSS ECOSYSTEM
Security
YARN
Pig
Cascading
Spark
Batch
Spark
Streaming
Storm*
Streaming
HBase
Solr
NoSQL &
Search
Juju
Provisioning
&
coordination
Savannah*
Mahout
MLLib
ML, Graph
GraphX
MapReduce
v1 & v2
EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS
Workflow
& Data
Governance
Tez*
Accumulo*
Hive
Impala
Shark
Drill*
SQL
Sentry* Oozie ZooKeeper Sqoop
Knox* Whirr Falcon* Flume
Data
Integration
& Access
HttpFS
Hue
* Certification/support planned for 2014
MapR Distribution for Hadoop
• High availability
• Data protection
• Disaster recovery
• Standard file access
• Standard database access
• Pluggable services
• Broad developer support
• Enterprise security authorization
• Wire-level authentication
• Data governance
• Ability to support predictive analytics, real-time database operations, and high arrival rate data
• Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators
• 2X to 7X higher performance
• Consistent, low latency
Enterprise-grade, Security, Operational, Performance, Multi-tenancy, Interoperability
MapR: Best Solution for Customer Success
Top Ranked
Exponential
Growth
500+
Customers
Premier
Investors
3X bookings Q1 '13 – Q1 '14
80% of accounts expand 3X
90% software licenses
<1% lifetime churn
>$1B in incremental revenue generated by 1 customer
MapR and Syncsort Reference Architecture
Sources
RELATIONAL,
SAAS,
MAINFRAME
DOCUMENTS,
EMAILS
LOG FILES,
CLICKSTREAMS
BLOGS,
TWEETS,
LINK DATA
DATA MARTS DATA WAREHOUSE
MapR Data Platform
Business
Intelligence /
Visualization
MapR-DB MapR-FS
Batch
(MR, Spark, Hive, Pig,
…)
Interactive
(Impala, Drill, …)
Streaming
(Spark Streaming,
Storm…)
MAPR DISTRIBUTION FOR HADOOP
Do You Know Syncsort?
• Syncsort provides fast, secure, enterprise-grade software spanning "Big Iron to Big Data"
• Fastest sort technology in the market
• Powering 50% of mainframes' sort
• A history of innovation
• 25+ issued & pending patents
• Large global customer base
• 12,000+ deployments in 80 countries and serving 87 of the Fortune 100
• First-to-market, fully integrated approach to Hadoop ETL
• Top 7 contributors to Hadoop, based on number of lines of code changed in 2013
Our customers are achieving the impossible, every day!
Key Partners
The Hadoop Challenge
COLLECT
PROCESS: Sort, Join, Aggregate, Copy, Merge
DISTRIBUTE
Most organizations use Hadoop to…
Extract
Transform
Load
Turning Hadoop into a Feature-rich ETL Solution
Collect
• Broad based connectivity with automated parallelism 
• Best in class mainframe data access & translation
Process & Distribute
• No manual coding. GUI for developing & maintaining MR jobs
• No code generation. Engine runs natively on each node
• Develop & test locally in Windows; run natively on Hadoop
Optimize & Secure
• Faster throughput per node
• Full support for Kerberos & LDAP
• Web-based monitoring console
• Sort-work compression for storage savings
DMX-h ETL: Collect, Process & Distribute, Optimize & Secure
A Roadmap to Hadoop Success
Agile Data Exploration & Visualization
Next-gen Analytics
Cheap Storage
Offload Data Warehouse
Enabling The Data-driven Organization
Solving The Intractable IT Problem
MapR + Syncsort Solutions
Data Warehouse Optimization: Shift ELT Workloads to Hadoop
Mainframe Offload: Access, Translate & Analyze Mainframe Data with Hadoop
Click-stream Analysis: Collect, Process & Analyze More Data from Your Website
Q&A
Engage with us!
1. Download the MapR Sandbox for Hadoop: www.mapr.com/sandbox
2. Try Syncsort’s Hadoop ETL in the MapR Sandbox: www.syncsort.com/mapr
3. Learn best practices for Hadoop ETL: www.mapr.com/EDH

More Related Content

What's hot

Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Ontico
 
Spark meetup - Zoomdata Streaming
Spark meetup  - Zoomdata StreamingSpark meetup  - Zoomdata Streaming
Spark meetup - Zoomdata Streaming
Zoomdata
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
MSAdvAnalytics
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
Data Con LA
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
DataWorks Summit/Hadoop Summit
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoop
gregchanan
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azure
Utkarsh Pandey
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
DataWorks Summit
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a Glance
Neev Technologies
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Big Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 TelcoBig Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 Telco
BlueData, Inc.
 
Seamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with ConnectSeamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with Connect
Precisely
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine Overview
Kunal Gupta
 
Scaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedInScaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedIn
DataWorks Summit
 
Big Data at your Desk with KNIME
Big Data at your Desk with KNIMEBig Data at your Desk with KNIME
Big Data at your Desk with KNIME
DataWorks Summit/Hadoop Summit
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
Shweta Patnaik
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
DataWorks Summit/Hadoop Summit
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
hadooparchbook
 

What's hot (20)

Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
 
Spark meetup - Zoomdata Streaming
Spark meetup  - Zoomdata StreamingSpark meetup  - Zoomdata Streaming
Spark meetup - Zoomdata Streaming
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
 
Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoop
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azure
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a Glance
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Big Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 TelcoBig Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 Telco
 
Seamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with ConnectSeamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with Connect
 
Splice Machine Overview
Splice Machine OverviewSplice Machine Overview
Splice Machine Overview
 
Scaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedInScaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedIn
 
Big Data at your Desk with KNIME
Big Data at your Desk with KNIMEBig Data at your Desk with KNIME
Big Data at your Desk with KNIME
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
 

Similar to How to Suceed in Hadoop

Syncsort & comScore Big Data Warehouse Meetup Sept 2013
Syncsort & comScore Big Data Warehouse Meetup Sept 2013Syncsort & comScore Big Data Warehouse Meetup Sept 2013
Syncsort & comScore Big Data Warehouse Meetup Sept 2013
Steven Totman
 
BigData @ comScore
BigData @ comScoreBigData @ comScore
BigData @ comScore
eaiti
 
Using Hadoop
Using HadoopUsing Hadoop
Using Hadoop
eaiti
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory grid
Yahoo Developer Network
 
Control m customers using big data
Control m customers using big dataControl m customers using big data
Control m customers using big data
Juliette Smit
 
Initiative Based Technology Consulting Case Studies
Initiative Based Technology Consulting Case StudiesInitiative Based Technology Consulting Case Studies
Initiative Based Technology Consulting Case Studies
chanderdw
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Yong Feng
 
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Precisely
 
IMS01 IMS Keynote
IMS01   IMS KeynoteIMS01   IMS Keynote
IMS01 IMS Keynote
Robert Hain
 
Mainframe Optimization in 2017
Mainframe Optimization in 2017Mainframe Optimization in 2017
Mainframe Optimization in 2017
Precisely
 
Solving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalSolving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute final
Avere Systems
 
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale
 
Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013
ScaleOut Software
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data Analytics
Amazon Web Services
 
Mainframe Optimization in 2017
Mainframe Optimization in 2017Mainframe Optimization in 2017
Mainframe Optimization in 2017
Precisely
 
From Disaster to Recovery: Preparing Your IT for the Unexpected
From Disaster to Recovery: Preparing Your IT for the UnexpectedFrom Disaster to Recovery: Preparing Your IT for the Unexpected
From Disaster to Recovery: Preparing Your IT for the Unexpected
DataCore Software
 
MongoDB World 2018: Breaking the Mold - Redesigning Dell's E-Commerce Platform
MongoDB World 2018: Breaking the Mold - Redesigning Dell's E-Commerce PlatformMongoDB World 2018: Breaking the Mold - Redesigning Dell's E-Commerce Platform
MongoDB World 2018: Breaking the Mold - Redesigning Dell's E-Commerce Platform
MongoDB
 
Learn the new rules of cloud storage
Learn the new rules of cloud storageLearn the new rules of cloud storage
Learn the new rules of cloud storage
Buurst
 
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo AquinoFInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
Hugo Aquino
 
Going Remote: Running VFX Virtual Workstations
Going Remote: Running VFX Virtual WorkstationsGoing Remote: Running VFX Virtual Workstations
Going Remote: Running VFX Virtual Workstations
Amazon Web Services
 

Similar to How to Suceed in Hadoop (20)

Syncsort & comScore Big Data Warehouse Meetup Sept 2013
Syncsort & comScore Big Data Warehouse Meetup Sept 2013Syncsort & comScore Big Data Warehouse Meetup Sept 2013
Syncsort & comScore Big Data Warehouse Meetup Sept 2013
 
BigData @ comScore
BigData @ comScoreBigData @ comScore
BigData @ comScore
 
Using Hadoop
Using HadoopUsing Hadoop
Using Hadoop
 
November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory grid
 
Control m customers using big data
Control m customers using big dataControl m customers using big data
Control m customers using big data
 
Initiative Based Technology Consulting Case Studies
Initiative Based Technology Consulting Case StudiesInitiative Based Technology Consulting Case Studies
Initiative Based Technology Consulting Case Studies
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
 
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
 
IMS01 IMS Keynote
IMS01   IMS KeynoteIMS01   IMS Keynote
IMS01 IMS Keynote
 

How to Succeed in Hadoop

  • 1. © comScore, Inc. Proprietary. Syncsort & MapR @ comScore Michael Brown, CTO | July 9th, 2014
  • 2. The comScore Story: Analytics for a Digital World™
  • 3. The Digital World is Complex
  • 4. comScore’s Mission: Be the leader in digital media analytics. Measure all forms of media (content and advertising) at scale, across all platforms, in real time, globally.
  • 5. comScore Brings it Together: Tablet, PC/Mac, TV, Smartphone, Gaming
  • 6. comScore is a leading internet technology company that provides Analytics for a Digital World™. NASDAQ: SCOR. Clients: 2,400+ worldwide. Employees: 1,200+. Headquarters: Reston, Virginia, USA. Global coverage: measurement from 172 countries; 44 markets reported. Local presence: 32 locations in 23 countries.
  • 7. Providing Analytics for More Than 2,400 Clients Globally: Media, Agencies, Telecom/Mobile, Financial, Retail, Travel, CPG, Health, Technology
  • 8. Turning Big Data into Powerful Insight. Stages: Collection, Calibration, Delivery. Census (tags & data feeds); Panels (PC, iOS, Android); Survey (non-behavioral elements); Methods (aggregation, dictionaries, taxonomies); Models (weighting, projection, de-duplication, attribution); Syndicated Data Platform (Media Metrix, vCE); Client Analytics Platform (Digital Analytix); Consulting (analysis).
  • 10. Panel Heat Map
  • 11. Average Records Captured per Day (2005-2009) [chart: y-axis 0 to 1,800,000,000 records; x-axis monthly from 9/26/2005 to 3/26/2009]
  • 12. Unified Digital Measurement™ (UDM): Establishes platform for Panel + Census data integration. Patent-pending methodology. Adopted by 90% of top 100 U.S. media properties. Global PERSON measurement; global DEVICE measurement.
  • 13. Beacon Heat Map
  • 14. Monthly Records Collection [chart: number of records per month, beacon records and panel records; y-axis 0 to 2,000 billion]. Total records collected in June 2014 = 1,726,563,202,649. Total records collected YTD 2014 = 10,037,131,368,475.
  • 15. DMX @ comScore
  • 16. DMX use at comScore: We purchased our first 4 licenses in 2000. We use DMX from Syncsort across hundreds of servers for efficient data processing and aggregation. We currently run more than 100 unique jobs every day. With these jobs we process over 150 billion rows of data through DMX. Connect, Design, Process, Accelerate.
  • 17. Compression with Sorting: Compress log files when processing large volumes of log data. Sorting the data first has several advantages: it reduces the size of the compressed data and improves application performance. Example: 1 hour of one source of our data is 2,315 GB raw (2.9 billion rows); standard compression of the time-ordered data yields 509 GB (22% of original); standard compression of the sorted set yields 324 GB (14% of original). Applied to all our sources, we save 5.0 TB per day, 155 TB per month, and 460 TB per quarter.
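The effect described on slide 17 can be reproduced at small scale. The following is a minimal sketch, not comScore's pipeline: it assumes synthetic `cookie,url` log lines and uses Python's `zlib` as a stand-in for "standard compression" (the slides do not name the codec). The function names, row counts, and record layout are hypothetical and chosen only for illustration.

```python
import random
import zlib


def make_log_lines(n_rows, n_cookies=2000, n_urls=20, seed=7):
    """Build synthetic 'cookie,url' log lines in arrival (time) order."""
    rng = random.Random(seed)
    cookies = [f"{rng.getrandbits(64):016x}" for _ in range(n_cookies)]
    urls = [f"/section/{i}/index.html" for i in range(n_urls)]
    return [f"{rng.choice(cookies)},{rng.choice(urls)}\n" for _ in range(n_rows)]


def compressed_size(lines):
    """Size in bytes after zlib (DEFLATE) compression at the default level."""
    return len(zlib.compress("".join(lines).encode("ascii")))


time_ordered = make_log_lines(100_000)
sorted_lines = sorted(time_ordered)  # same rows, grouped by cookie

raw = sum(len(line) for line in time_ordered)
size_time = compressed_size(time_ordered)
size_sorted = compressed_size(sorted_lines)

print(f"raw bytes:          {raw}")
print(f"time-ordered, zlib: {size_time} ({size_time / raw:.0%} of raw)")
print(f"sorted, zlib:       {size_sorted} ({size_sorted / raw:.0%} of raw)")

# The slide's own figures, restated as fractions of the 2,315 GB raw hour.
slide_time_ratio = 509 / 2315    # rounds to 0.22
slide_sorted_ratio = 324 / 2315  # rounds to 0.14
```

Sorting places rows with the same cookie next to each other, so the compressor finds long repeats inside its window. The exact ratios depend on the data and the codec, so the synthetic numbers will not match the slide's.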
  • 18. Hadoop @ comScore
  • 19. Why Hadoop? comScore built its own distributed computing stack in 2002. In 2009 we decided it was better to leverage the efforts of the Hadoop community than to keep building our own stack. We recognized that switching to Hadoop would allow seamless scaling of our infrastructure to meet the needs of the business. Hadoop lets us add compute, storage, and memory linearly and process data at tremendous scale. We have partnered with Syncsort on their Hadoop efforts since October 2010, and we evaluated the beta of MapR in the fall of 2011.
  • 20. 90 Days of Data [chart: y-axis 0 to 6,000 trillion; data labels 1,148; 1,919; 3,049; 4,862; 5,084; x-axis labels 2009, 2010, 2011, 2012, 2013, 2014, 2016]
  • 21. High Level Data Flow [diagram labels: Panel, Census, Custom Code + ADW, EDW, Delivery]
  • 22. Our Cluster: Production Hadoop cluster with 400+ nodes, a mix of Dell R720xd, R710, and R510 servers. Each R720xd has 24 x 1.2 TB drives, 128 GB RAM, and 32 cores. 13,800+ total CPUs; 31.6 TB total memory; 8.2 PB total disk space. Our distro is MapR M5 2.1.3.
  • 23. Leveraging Partitions from MapR
  • 25. Validation Funnel & Target Effectiveness
  • 26. Our growth: As our volume has grown, we have the following stats: over 683 billion events per month; daily aggregate of 1.8 billion; 160 billion aggregate records for 92 days; 146K campaigns; over 50 countries; 15 billion distinct cookies seen in a month; only 26 million rows need to be output.
  • 27. Solution to reduce the shuffle. The problem: most aggregations within comScore cannot take advantage of combiners, leading to large shuffles and job performance issues. The idea: partition and sort the data by cookie on a daily basis, then create a custom InputFormat to merge daily partitions for monthly aggregations.
  • 28. Custom Input Format with Map Side Aggregation [diagram: mappers, combiners, and reducers operating on keys A, B, and C]
  • 29. Risks for Partitioning. Data locality: a custom InputFormat requires reading blocks of the partitioned data over the network. This was solved using a feature of the MapR file system: we created volumes and set the chunk size to zero, which guarantees that the data written to a volume will stay on one node. Map failures might result in long run times: the size of the map inputs is no longer set by block size. This was solved by creating a large number (10K) of volumes to limit the size of data processed by each mapper.
  • 30. Partitioning Summary. Benefits: a large portion of the aggregation can be completed in the map phase; applications can now take advantage of combiners; shuffle sizes are minimal. Results: took a job from 35 hours to 3 hours with no hardware changes.
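The idea on slides 27 to 30 (daily partitions sorted by cookie, merged for a monthly aggregation) can be sketched in a few lines. This is an illustration in plain Python, not the custom Hadoop InputFormat the slides refer to: the function names, the `(cookie, count)` record shape, and the CRC32-based bucket assignment are all assumptions made here, since the slides do not say how cookies are mapped to the 10K volumes.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter


def bucket_for(cookie, n_buckets=10_000):
    """Assumed scheme: a stable hash sends each cookie to one of n_buckets,
    so every day's records for that cookie land in the same bucket."""
    return zlib.crc32(cookie.encode("utf-8")) % n_buckets


def merge_daily_partitions(daily_partitions):
    """Stream records from several cookie-sorted daily partitions in
    global cookie order, without re-sorting them."""
    return heapq.merge(*daily_partitions, key=itemgetter(0))


def aggregate_by_cookie(merged_records):
    """Sum counts per cookie. The input is cookie-ordered, so each cookie's
    records are contiguous and can be reduced in a single pass."""
    for cookie, group in groupby(merged_records, key=itemgetter(0)):
        yield cookie, sum(count for _, count in group)


# Hypothetical data: one bucket, three daily partitions, each sorted by cookie.
day1 = [("A", 2), ("B", 1), ("C", 4)]
day2 = [("A", 1), ("C", 3)]
day3 = [("B", 5), ("C", 1)]

monthly = list(aggregate_by_cookie(merge_daily_partitions([day1, day2, day3])))
print(monthly)  # [('A', 3), ('B', 6), ('C', 8)]
```

Because each bucket is merged and reduced where it is read, only one row per cookie per bucket would need to leave the map side, which is the mechanism behind the smaller shuffle the slides describe.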
  • 31. DMX-h @ comScore
  • 32. Reasons for comScore selecting DMX-h. Performance: DMX-h as the pluggable sort in Hadoop allows us to increase throughput on our existing platform, which reduces capital and ongoing operational expenses. The increase in throughput also allows us to deliver our data more quickly to our customers, which makes the data more valuable to them. Speed of development: the ability to quickly build out applications in the DMX-h GUI allows us to iterate and respond more quickly to the needs of the business. The ease of development also allows us to democratize access to the Hadoop platform through a point-and-click GUI.
  • 33. Performance: DMX Pluggable Sort Testing Results. First comparison run on our dev cluster, using Pig scripts called with the Syncsort plug-in. GroupBy / Distinct operations (counting uniques): these have large shuffle steps, which leads to more data to sort; observed up to a 20% decrease in job runtime. Filter operations (searching for a specific value): observed a 5% to 10% decrease in job runtime, dependent on type of filter and size of job output. With 40 GB of compressed data, the base run is 86 min and the test run is 68 min, a savings of 20%. Results from 7 nodes; 56 cores; 433 GB RAM; 28 TB disk; MapR M5 3.0.2; DMX-h 7.12.
  • 34. Speed of Development: POC. We took an existing process that runs in our Hadoop cluster and converted it to DMX-h to validate the new capabilities. The existing process: written in 75 lines of Pig with 3 Java UDFs; developed in about 25 hours; processes 3.5 billion input rows per day; takes 35 minutes to run on a daily basis.
  • 35. DMX-h Process
  • 36. Speed of Development: POC. The new process in DMX-h: developed a new job with 13 tasks; no Java UDF required; runs on the same data and in the same environment; developed in 12 hours; runs in 11 minutes, about 1/3 of the time of the Pig and Java code.
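The runtime and development-time figures quoted on slides 30, 33, 34, and 36 can be checked with simple arithmetic. The short sketch below only restates numbers from the slides; the helper name `runtime_reduction` is hypothetical.

```python
def runtime_reduction(before, after):
    """Fraction of the original duration saved when going from before to after."""
    return (before - after) / before


# Slide 33: pluggable sort test, 86 min base run vs. 68 min test run.
sort_savings = runtime_reduction(86, 68)

# Slides 34 and 36: Pig + Java process (35 min, about 25 h to develop)
# vs. the DMX-h process (11 min, 12 h to develop).
poc_runtime_fraction = 11 / 35
poc_dev_fraction = 12 / 25

# Slide 30: partitioned monthly aggregation, 35 hours down to 3 hours.
partitioning_speedup = 35 / 3

print(f"pluggable sort savings:  {sort_savings:.1%}")
print(f"POC runtime vs. Pig:     {poc_runtime_fraction:.0%} of original")
print(f"POC dev time vs. Pig:    {poc_dev_fraction:.0%} of original")
print(f"partitioning speedup:    {partitioning_speedup:.1f}x")
```

The results agree with the rounded claims on the slides: about 20% savings for the pluggable sort and roughly one third of the runtime for the DMX-h proof of concept.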
  • 37. Useful Factoids: Visit www.comscoredatamine.com or follow @datagems for the latest gems. Colorful, bite-sized graphical representations of the best discoveries we unearth.
  • 38. Thank You! Michael Brown, CTO, comScore, Inc., mbrown@comscore.com
  • 39. © 2014 MapR Technologies
  • 40. Today’s Presenters: Steve Wooledge, VP - Product Marketing (@swooledge); Jorge Lopez, Director - Product Marketing (@zanilli); Mike Brown, CTO
  • 41. comScore
  • 42. Syncsort & MapR @ comScore: Michael Brown, CTO | July 9th, 2014
  • 43. Leveraging MapR and Syncsort
  • 44. Trend 1: Big Data is Overwhelming Traditional Systems. Enterprise Data Architecture: enterprise users, operational systems, analytical systems, outside sources. Production requirements (operational systems): • Mission-critical reliability • Transaction guarantees • Deep security • Real-time performance • Backup and recovery. Production requirements (analytical systems): • Interactive SQL • Rich analytics • Workload management • Data governance • Backup and recovery
  • 45. Trend 2: Hadoop: The Disruptive Technology at the Core of Big Data. [Chart: job trends from Indeed.com, Jan ‘06 to Jan ‘14]
  • 46. Reality 1: Hadoop Relieves the Pressure from Enterprise Systems (operational systems, analytical systems, enterprise users) • Data staging • Archive • Data transformation • Data exploration • Streaming, interactions. Keys for Production Success: 1. Reliability and DR 2. Interoperability 3. High performance 4. Supports operations and analytics
  • 47. Reality 2: Architecture Matters for Success. Foundation: • Data protection & security • High performance • Multi-tenancy • Operational & analytical workloads • Open standards for integration. Also shown: new applications; SLAs; trusted information; lower TCO
  • 48. The Power of the Open Source Community. [Diagram: Apache Hadoop and OSS ecosystem on the MapR Data Platform, with Management and Security layers.] Categories shown: Execution Engines; Batch; Streaming; NoSQL & Search; ML, Graph; SQL; Data Integration & Access; Provisioning & coordination; Workflow & Data Governance; Data Governance and Operations. Components shown: YARN, Pig, Cascading, Spark, Spark Streaming, Storm*, HBase, Solr, Juju, Savannah*, Mahout, MLLib, GraphX, MapReduce v1 & v2, Tez*, Accumulo*, Hive, Impala, Shark, Drill*, Sentry*, Oozie, ZooKeeper, Sqoop, Knox*, Whirr, Falcon*, Flume, HttpFS, Hue. (* Certification/support planned for 2014)
  • 49. MapR Distribution for Hadoop. [Diagram: the Apache Hadoop and OSS ecosystem from slide 48, on the MapR Data Platform with Management and Security layers; * Certification/support planned for 2014.] Enterprise-grade: • High availability • Data protection • Disaster recovery. Interoperability: • Standard file access • Standard database access • Pluggable services • Broad developer support. Security: • Enterprise security authorization • Wire-level authentication • Data governance. Operational: • Ability to support predictive analytics, real-time database operations, and high arrival rate data. Multi-tenancy: • Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators. Performance: • 2X to 7X higher performance • Consistent, low latency
  • 50. MapR: Best Solution for Customer Success. Top Ranked; Exponential Growth; 500+ Customers; Premier Investors • 3X bookings Q1 ‘13 to Q1 ‘14 • 80% of accounts expand 3X • 90% software licenses • <1% lifetime churn • >$1B in incremental revenue generated by 1 customer
  • 51. MapR and Syncsort Reference Architecture. Sources: • Relational, SaaS, mainframe • Documents, emails • Log files, clickstreams • Blogs, tweets, link data. MapR Distribution for Hadoop: • Batch (MR, Spark, Hive, Pig, …) • Interactive (Impala, Drill, …) • Streaming (Spark Streaming, Storm, …) • MapR Data Platform (MapR-DB, MapR-FS). Also shown: data marts; data warehouse; business intelligence / visualization
  • 52. Do You Know Syncsort? • Syncsort provides fast, secure, enterprise-grade software spanning “Big Iron to Big Data” • Fastest sort technology in the market: powering 50% of mainframes’ sort • A history of innovation: 25+ issued & pending patents • Large global customer base: 12,000+ deployments in 80 countries, serving 87 of the Fortune 100 • First-to-market, fully integrated approach to Hadoop ETL • Top 7 contributor to Hadoop, based on number of lines of code changed in 2013. Our customers are achieving the impossible, every day! Key Partners
  • 53. The Hadoop Challenge. Most organizations use Hadoop to Extract, Transform, Load (ETL): • Collect • Process (Sort, Join, Aggregate, Copy, Merge) • Distribute
  • 54. Turning Hadoop into a Feature-rich ETL Solution (DMX-h ETL). Collect: • Broad-based connectivity with automated parallelism • Best-in-class mainframe data access & translation. Process & Distribute: • No manual coding. GUI for developing & maintaining MR jobs • No code generation. Engine runs natively on each node • Develop & test locally in Windows; run natively on Hadoop. Optimize & Secure: • Faster throughput per node • Full support for Kerberos & LDAP • Web-based monitoring console • Sort-work compression for storage savings
  • 55. A Roadmap to Hadoop Success: Agile Data Exploration & Visualization; Next-gen Analytics; Cheap Storage; Offload Data Warehouse; Enabling the Data-driven Organization; Solving the Intractable IT Problem
  • 56. MapR + Syncsort Solutions • Data Warehouse Optimization: Shift ELT Workloads to Hadoop • Mainframe Offload: Access, Translate & Analyze Mainframe Data with Hadoop • Click-stream Analysis: Collect, Process & Analyze More Data from Your Website
  • 57. Q&A. Engage with us! 1. Download the MapR Sandbox for Hadoop: www.mapr.com/sandbox 2. Try Syncsort’s Hadoop ETL in the MapR Sandbox: www.syncsort.com/mapr 3. Learn best practices for Hadoop ETL: www.mapr.com/EDH