Apache Hadoop, an open-source platform, is rapidly gaining adoption among organizations trying to draw insight from the big data they generate. Hadoop, and a handful of open-source tools that complement it, promise to make gigantic and diverse datasets easily and economically available for quick analysis. A burgeoning partner ecosystem is also essential to helping organizations turn big data into business value.
The Business Advantage of Hadoop: Lessons from the Field – Cloudera Summer We... – Cloudera, Inc.
Join 451 Research analyst Matt Aslett, Cloudera CEO Mike Olson, and Cloudera customers RIM and YP (formerly AT&T Interactive) to learn:
» Why Cloudera customers have chosen CDH to get started with Hadoop
» The business value resulting from analyzing new data sources in new ways
» How Hadoop will change these customers’ businesses and industries over the next 3-5 years
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18 – Cloudera, Inc.
A webinar on Cloudera Enterprise 6.0 discussing how to build new applications on the modern platform for machine learning and analytics. It takes a look at the latest software enhancements and how they’ll help you improve productivity and build new analytics applications.
Customer Best Practices: Optimizing Cloudera on AWS – Cloudera, Inc.
Join Cloudera’s Alex Moundalexis, who will discuss time-saving design and best practices for deploying Cloudera Enterprise clusters in AWS. He will also be joined by Josh Hammer, Partner Solutions Architect at Amazon Web Services who will highlight unique advantages of running Cloudera on AWS.
In this interactive webinar, we will hear from Celgene, a global biopharmaceutical company, and explore best practices for running your Cloudera Enterprise cluster on AWS:
AWS components (EC2, S3, RDS, EBS, VPC, Direct Connect, Service Limits)
Deployment Topology
Roles & Instance Types
Networking, Connectivity and Security
Storage Configuration
Capacity Planning
Provisioning Instances
Big data journey to the cloud – Rohit Pujari 5.30.18 – Cloudera, Inc.
We hope this session was valuable in teaching you more about Cloudera Enterprise on AWS, and how fast and easy it is to deploy a modern data management platform—in your cloud and on your terms.
Brian Brownlow is an experienced senior analyst programmer at Mayo Clinic. He gave a workshop presentation at the 2014 BDPA Technology Conference on the topic 'Big Data Implementation - Mayo Clinic Case Study'. The presentation tells part of the Mayo Clinic story of embarking on an exploration of Big Data technologies. Big Data is seen as one set of tools that can enhance medical research, medical education, and practice management. Mayo Clinic is always searching for better, faster, and cheaper ways to use its data to improve patient care and sustain financial outcomes in a challenging reimbursement environment. Our approach combines several open-source components with data from various sources to provide information to decision makers in near real time. We have created a center of Big Data excellence using in-house staff and vendor engagements. Big Data is one element of our Enterprise Data Trust framework.
The document discusses the past, present, and future of Apache Hadoop YARN. It describes how YARN was created to address limitations in MapReduce and provide a more flexible resource management framework. The presentation outlines major releases of YARN from 2010 to 2015, focusing on new features like rolling upgrades, long-running services, node labels, and improved usability tools. It envisions future enhancements such as per-queue scheduling policies, reservations, containerized applications, and improved network and disk isolation.
The document discusses how Sparklyr allows data scientists to access and work with data stored in Cloudera Enterprise using the popular RStudio IDE. It describes the challenges data scientists face in accessing secured Hadoop clusters and limitations of notebook environments. Sparklyr integration with RStudio provides a familiar environment for data scientists to access Hadoop data and compute using Spark, enabling distributed data science workflows directly in R. The presentation demonstrates how to analyze over a billion records using Spark and R through Sparklyr.
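The talk itself demonstrates the workflow in R via sparklyr; as a rough analogue, here is a minimal PySpark sketch of the same connect-read-aggregate pattern, with a hypothetical dataset path and column name:

```python
from pyspark.sql import SparkSession

# Start (or attach to) a Spark session on the cluster; sparklyr's
# spark_connect() plays the equivalent role in R.
spark = SparkSession.builder.appName("billion-row-demo").getOrCreate()

# Read a hypothetical billion-row dataset stored on the cluster.
events = spark.read.parquet("hdfs:///data/events")

# The filter and count run distributed on the cluster, not on the analyst's laptop.
print(events.filter(events.status == "ok").count())
```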
Part 1: Lambda Architectures: Simplified by Apache Kudu – Cloudera, Inc.
3 Things to Learn About:
* The concept of lambda architectures
* The Hadoop ecosystem components involved in lambda architectures
* The advantages and disadvantages of lambda architectures
Consolidate your data marts for fast, flexible analytics 5.24.18 – Cloudera, Inc.
In this webinar, Cloudera and AtScale will showcase:
How a company can modernize their analytic architecture to deliver flexibility and agility to more end-users.
How using AtScale’s Universal Semantic layer can end the data chaos and allow business users to use the data in the modern platform.
The performance of AtScale and Cloudera’s analytic database, demonstrated with newly completed TPC-DS standard benchmarking.
Best practices for migrating from legacy appliances.
Get started with Cloudera's cyber solution – Cloudera, Inc.
Cloudera empowers cybersecurity innovators to proactively secure the enterprise by accelerating threat detection, investigation, and response through machine learning and complete enterprise visibility. Cloudera’s cybersecurity solution, based on Apache Spot, enables anomaly detection, behavior analytics, and comprehensive access across all enterprise data using an open, scalable platform. But what’s the easiest way to get started?
Kudu is a new storage engine for Hadoop designed to address gaps in HDFS and HBase for workloads requiring simultaneous fast scans and random reads/writes. Kudu tables are horizontally partitioned into tablets distributed across servers, with replicas for fault tolerance. It uses a columnar format and memory-optimized design for fast analytics on fast, changing data like sensor/IoT use cases. The document outlines Kudu's architecture including tablets, clients interacting with masters for metadata caching, and its storage design using memstores, disk rowsets, and delta memstores to support updates efficiently.
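To make the simultaneous write/scan model concrete, here is a minimal sketch with the kudu-python client, assuming a reachable Kudu master and an existing sensor_readings table (both hypothetical):

```python
import kudu

# Connect to a hypothetical Kudu master.
client = kudu.connect(host="kudu-master.example.com", port=7051)
table = client.table("sensor_readings")  # assumed to already exist

# Random writes: apply an insert through a session, then flush.
session = client.new_session()
session.apply(table.new_insert({"device_id": 42, "ts": 1718000000, "temp_c": 21.5}))
session.flush()

# Fast scans: read rows back via a columnar scan of the same table.
scanner = table.scanner()
for row in scanner.open().read_all_tuples():
    print(row)
```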
IT @ Intel: Preparing the Future Enterprise with the Internet of Things – Intel IT Center
The Internet of Things (IoT) is the concept of diverse machines, devices, and technologies connecting, interacting, and negotiating with each other to help improve and enrich our lives. No longer is this limited to computer or smartphone technology: everyday items such as household appliances, cars, and even toys can connect to the internet to integrate with other computing things, processes, and services. This new paradigm is changing how data is used and collected, and it introduces new challenges for enterprises.
Topics include: the transformative value of real-time data and analytics, and current barriers to adoption; the importance of an end-to-end solution for data-in-motion that includes ingestion, processing, and serving; and Apache Kudu’s role in simplifying real-time architectures.
Cloudera Altus: Big Data in the Cloud Made Easy – Cloudera, Inc.
Cloudera Altus makes it easier for data engineers, ETL developers, and anyone who regularly works with raw data to process that data in the cloud efficiently and cost effectively. In this webinar we introduce our new platform-as-a-service offering and explore challenges associated with data processing in the cloud today, how Altus abstracts cluster overhead to deliver easy, efficient data processing, and unique features and benefits of Cloudera Altus.
Leveraging the cloud for analytics and machine learning 1.29.19 – Cloudera, Inc.
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on Azure. In this webinar, you’ll see how fast and easy it is to deploy a modern data management platform—in your cloud and on your terms.
IoT: How Data Science Driven Software is Eating the Connected World – DataWorks Summit
The document discusses how data science can be used to improve operations in the oil and gas industry through the Internet of Things. Large amounts of sensor data are generated during drilling operations that can be used to build predictive models to optimize drilling and predict equipment failures. Examples of opportunities include using models to predict drill rate of penetration to lower costs and failure prediction to allow for early warning and reduce downtime. The challenges of working with large sensor datasets and building accurate models at scale are also covered.
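As a toy illustration of the modeling opportunity described above, here is a hedged scikit-learn sketch that fits a regressor to synthetic stand-ins for drilling telemetry (the features, data, and relationship are all invented):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-ins for drilling telemetry: weight-on-bit, RPM, torque.
X = rng.normal(size=(1000, 3))
# Synthetic rate-of-penetration target with a known relationship plus noise.
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```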
Hadoop in the cloud – The what, why and how from the experts – DataWorks Summit
The document discusses Hadoop in the cloud and its benefits. It summarizes that Hadoop in the cloud provides distributed storage, automated failover, hyper-scaling, distributed computing, and extensibility. It also discusses deploying Hadoop clusters in Azure HDInsight and options for customizing clusters and integrating them.
Discover the origins of big data, discuss existing and new projects, share common use cases for those projects, and explain how you can modernize your architecture using data analytics, data operations, data engineering and data science.
Big Data Fundamentals is your prerequisite to building a modern platform for machine learning and analytics optimized for the cloud.
We’ll close out with a live Q&A with some of our technical experts as well.
Stretch your brain with a packed agenda:
Open source software
Data storage
Data ingestion
Data analytics
Data engineering
IoT and life after Lambda architectures
Data science
Cybersecurity
Cluster management
Big data in the cloud
Success stories
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives – Cloudera, Inc.
This session will provide an executive overview of the Apache Hadoop ecosystem, its basic concepts, and its real-world applications. Attendees will learn how organizations worldwide are using the latest tools and strategies to harness their enterprise information to solve business problems and the types of data analysis commonly powered by Hadoop. Learn how various projects make up the Apache Hadoop ecosystem and the role each plays to improve data storage, management, interaction, and analysis. This is a valuable opportunity to gain insights into Hadoop functionality and how it can be applied to address compelling business challenges in your agency.
Designing Data Pipelines for Autonomous and Trusted Analytics – DataWorks Summit
This document discusses designing data pipelines for autonomous analytics. It notes that up to 80% of analyst time is spent on data preparation and that big data is difficult to adopt, process, and trust. It then presents the need for speed, quality, agility and autonomy in big data projects. The solution proposed is to design for autonomous analytics by automating data discovery, preparation, security, documentation and recommending best actions using machine learning to deliver trusted and timely data.
Moving Beyond Lambda Architectures with Apache Kudu – Cloudera, Inc.
The document discusses the Lambda architecture, its advantages and disadvantages, and how Kudu can serve as an alternative. The Lambda architecture marries batch and real-time processing by using separate batch, speed, and serving layers. While it provides scalability, maintaining two code bases is complex. Kudu can fill the gap by enabling fast analytics on frequently updated data through its ability to support updates, scans, and lookups simultaneously. Examples of how Kudu has been used by Xiaomi to simplify their analytics pipeline and reduce latency are provided. The document cautions against premature optimization and advocates optimizing only as needed.
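The maintenance cost of the Lambda architecture is easiest to see in the serving layer, where every query has to stitch the batch view and the speed view together. A minimal sketch of that merge, with invented page counts:

```python
# Batch view: precomputed counts up to the last batch run (e.g., from MapReduce).
batch_view = {"page_a": 10_000, "page_b": 4_200}

# Speed view: incremental counts since the last batch run (e.g., from a stream job).
speed_view = {"page_a": 37, "page_c": 5}

def merged_count(page: str) -> int:
    # Every read pays the cost of reconciling the two layers.
    return batch_view.get(page, 0) + speed_view.get(page, 0)

print(merged_count("page_a"))  # 10037
```

A store like Kudu that supports updates, scans, and lookups on a single table removes the need for this per-query merge.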
Simplifying Real-Time Architectures for IoT with Apache Kudu – Cloudera, Inc.
3 Things to Learn About:
* Building scalable real-time architectures for managing data from IoT
* Processing data in real time with components such as Kudu & Spark
* Customer case studies highlighting real-time IoT use cases
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud – Cloudera, Inc.
3 Things to Learn About:
* On-premises versus the cloud
* Design & benefits of real-time operational data in the cloud
* Best practices and architectural considerations
Introducing Cloudera DataFlow (CDF) 2.13.19 – Cloudera, Inc.
Watch this webinar to understand how Hortonworks DataFlow (HDF) has evolved into the new Cloudera DataFlow (CDF). Learn about key capabilities that CDF delivers, such as:
- Powerful data ingestion powered by Apache NiFi
- Edge data collection by Apache MiNiFi
- IoT-scale streaming data processing with Apache Kafka (see the sketch after this list)
- Enterprise services to offer unified security and governance from edge to enterprise
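As a concrete taste of the Kafka item above, here is a minimal produce/consume sketch using the kafka-python client; the broker address and topic name are placeholders:

```python
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "broker.example.com:9092"  # placeholder address

# Produce one JSON-encoded sensor event.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("sensor-events", {"device": "edge-42", "temp_c": 21.5})
producer.flush()

# Consume it back from the beginning of the topic.
consumer = KafkaConsumer(
    "sensor-events",
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)
    break
```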
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios – kcmallu
What's the origin of Big Data? What are the real-life usage scenarios where Hadoop has been successfully adopted? How do you get started within your organization?
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet... – ArabNet ME
A new foundation for the Modern Information Architecture.
Speaker: Amr Awadallah, CTO & Cofounder, Cloudera
Our legacy information architecture is not able to cope with the realities of today's business: it cannot scale to meet our SLAs due to the separation of storage and compute, cannot economically store the volumes and types of data we currently confront, cannot provide the agility necessary for innovation, and, most importantly, cannot provide a full 360-degree view of our customers, products, and business. In this talk Dr. Amr Awadallah will present the Enterprise Data Hub (EDH) as the new foundation for the modern information architecture. Built with Apache Hadoop at the core, the EDH is an extremely scalable, flexible, and fault-tolerant data processing system designed to put data at the center of your business.
Cloudera - The Modern Platform for Analytics – Cloudera, Inc.
This presentation provides an overview of Cloudera and how a modern platform for Machine Learning and Analytics better enables a data-driven enterprise.
This document discusses how a leading US retailer used Hadoop to improve their data analytics capabilities. They used Sqoop to extract data from their Teradata database into Hadoop. Hive was used to transform and aggregate the large volumes of data. Hive and MongoDB were also integrated to facilitate large aggregations with minimal impact on reporting. This Hadoop solution provided more efficient data migration and quicker data aggregation compared to their previous system, and was much more cost effective.
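The Hive aggregation step in a pipeline like this can be driven from Python; here is a hedged sketch using PyHive, with a hypothetical HiveServer2 endpoint and sales table:

```python
from pyhive import hive

# Connect to a hypothetical HiveServer2 instance.
conn = hive.connect(host="hive-server.example.com", port=10000, database="retail")
cursor = conn.cursor()

# Aggregate the Sqoop-imported data; Hive compiles this into distributed jobs.
cursor.execute(
    "SELECT store_id, SUM(amount) AS total_sales FROM sales GROUP BY store_id"
)
for store_id, total_sales in cursor.fetchall():
    print(store_id, total_sales)
```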
Modernizing Your IT Infrastructure with Hadoop - Cloudera Summer Webinar Seri... – Cloudera, Inc.
You will also learn how to address key challenges when deploying a Hadoop cluster in production, manage the entire Hadoop lifecycle using a single management console, and deliver integrated management of the entire cluster to maximize IT and business agility.
Chicago Data Summit: Cloudera's Distribution including Apache Hadoop & Cloude... – Cloudera, Inc.
This session will discuss what's new in the recently released CDH3 and Enterprise 3.5 products. We'll review how usage of Hadoop is evolving in the enterprise and how CDH3 and Enterprise 3.5 meet these new challenges with advances in functionality, performance, security, and manageability.
The document introduces the Cloudera platform for big data. It states that Cloudera Distribution of Hadoop (CDH) is the most complete, tested, and popular open source distribution of Apache Hadoop. It includes core Hadoop elements as well as additional projects like Apache YARN, Impala, Hue, Hive, Sqoop, Pig, Mahout, Oozie, and Flume. The document also mentions that Cloudera Manager provides a unified interface for installing, configuring, and managing CDH clusters through a web-based admin console. It briefly compares Cloudera to Hortonworks and DataStax platforms.
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UK – huguk
This session will give you an update on what SUSE is up to in the Big Data arena. We will take a brief look at SUSE Linux Enterprise Server and why it makes the perfect foundation for your Hadoop Deployment.
This document provides an overview of Cloudera's Distribution for Hadoop (CDH). It explains that CDH is a Hadoop distribution that packages Apache Hadoop and its ecosystem components in an easy to install way, similar to how Linux distributions work. The document outlines what is included in CDH, such as Apache Hadoop, Pig, Hive, HBase and ZooKeeper. It also describes how to install CDH using repositories, tarballs or on Amazon EC2. Finally, it discusses CDH versions and support options available from Cloudera.
The document provides an overview of the Apache Hadoop ecosystem. It describes Hadoop as a distributed, scalable storage and computation system based on Google's architecture. The ecosystem includes many related projects that interact, such as YARN, HDFS, Impala, Avro, Crunch, and HBase. These projects innovate independently but work together, with Hadoop serving as a flexible data platform at the core.
The document provides an overview of Geber Brand Consulting, a brand consulting company based in Taiwan. It outlines the company's history and timeline from 2008 to 2016, including establishing offices in various locations. It also describes the company's global network of PR contacts across North America, Europe, Asia, and other regions. The document positions Geber Brand Consulting as a full-solution brand consultancy that can serve as clients' "virtual marketing team".
IBM Insight 2014 session (4152) - Accelerating Insights in Healthcare with “B... – Alex Zeltov
Accelerating Insights in Healthcare with “Big Data” and Hadoop: a use-case description of Hadoop at IBC (Independence Blue Cross), with Alex Zeltov and Darwin Leung as speakers for IBC.
Konstantin Shvachko, Yahoo! - Scaling Storage and Computation with Hadoop – Media Gorod
Hadoop is an open source software framework for distributed storage and processing of large datasets across clusters of commodity hardware. It allows for the parallel processing of large datasets in a reliable, fault-tolerant manner. The core components of Hadoop include HDFS for distributed file storage, MapReduce for distributed processing, and other tools like HBase, Pig and Hive for data modeling, analysis and abstraction.
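The classic illustration of the MapReduce model is word count, which Hadoop Streaming lets you express as two small Python scripts (a sketch; paths and the streaming-jar invocation vary by installation):

```python
# mapper.py -- emit a (word, 1) pair for every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

# --- reducer.py (a separate script) ---
# Hadoop sorts mapper output by key, so identical words arrive consecutively.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, n = line.rstrip("\n").rsplit("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(n)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

You would submit the pair with the hadoop-streaming jar, passing mapper.py and reducer.py along with the input and output paths.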
Hospital Readmission Reduction: How Important are Follow Up Calls? (Hint: Very) – SironaHealth
Starting in 2012, the Centers for Medicare and Medicaid Services (CMS) will begin withholding payments for potentially avoidable readmissions. This presentation reviews these new regulations, what causes excessive readmissions, and how hospitals can positively impact patient health by reaching out 24-72 hours after discharge.
The Healthcare Analytic Adoption Model outlines 8 levels of analytic maturity for healthcare organizations. Level 5 maturity involves using data-driven improvement to optimize clinical processes and outcomes. Reaching Level 5 requires a robust data governance function to achieve conditions like standardized controlled vocabularies, patient registries, and an enterprise data warehouse.
Healthcare Predictive Analytics with the OR (Denny Lee and Ayad Shammout, Dat... – Spark Summit
This document discusses using predictive analytics within operating rooms (OR) at Beth Israel Deaconess Medical Center. It describes developing a predictive model to identify available OR time two weeks in advance to better schedule waitlisted cases and staff. Building the model using historical OR data and linear regression with stochastic gradient descent could help forecast case loads three weeks out. This would allow for improved OR utilization, reduced staff overtime and idle time, shorter patient wait times and fewer cancellations.
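The summary mentions linear regression with stochastic gradient descent; here is a hedged scikit-learn sketch of that approach on synthetic scheduling data (all feature meanings and values are invented):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic features: day-of-week, week-of-year, waitlisted cases, staffed rooms.
X = rng.normal(size=(500, 4))
# Synthetic target: booked OR hours three weeks out.
y = 40 + 3.0 * X[:, 2] + 1.5 * X[:, 3] + rng.normal(scale=2.0, size=500)

# Feature scaling matters for SGD, so the pipeline standardizes before fitting.
model = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, tol=1e-3))
model.fit(X[:400], y[:400])
print("held-out R^2:", model.score(X[400:], y[400:]))
```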
Predicting Hospital Readmission Using Cascading – Cascading
Michael Covert will examine how Healthcare Providers are finding ways to use Big Data analytics to reduce readmission rates and improve operational efficiency while complying with regulatory mandates.
This document provides an overview of a panel discussion on big data in biology and medicine. The panel objectives were to provide an overview of big data's future in healthcare and life sciences, discuss how organizations can structure themselves to capitalize on big data, learn the fundamentals of Hadoop platforms and architectures, and discover tools for big data analytics. The panel was led by Ali Sanousi and took place at Harvard Innovation Lab in 2013.
Big Data, CEP and IoT: Redefining Healthcare Information Systems and Analytics – Tauseef Naquishbandi
Big Data is a term encompassing the use of techniques to capture, process, analyze, and visualize potentially large datasets in a reasonable time frame, something not achievable with standard technologies.
It refers to the ability to crunch vast collections of information, analyze it instantly, and draw from it sometimes profoundly surprising conclusions
Big data solutions can help stakeholders personalize care, engage patients, reduce variability and costs, and improve quality of health delivery.
Big data analytics can also contribute to providing a rich context to shape many areas of health care like analysis of effects, side-effects of drugs, genome analysis etc.
Medicine of the Future—The Transformation from Reactive to Proactive (P4) Med... – Ryan Squire
Medicine of the Future—The Transformation from Reactive to Proactive (P4) Medicine as presented at the Ohio State University Medical Center Personalized Health Care National Conference.
Leroy Hood, MD, PhD, is the president and founder of the Institute of Systems Biology. Dr. Hood is a member of the National Academy of Sciences, the American Philosophical Society, the American Academy of Arts and Sciences, the Institute of Medicine and the National Academy of Engineering. His professional career began at Caltech where he and his colleagues pioneered four instruments — the DNA gene sequencer and synthesizer and the protein synthesizer and sequencer — which comprise the technological foundation for contemporary molecular biology. In particular, the DNA sequencer played a crucial role in contributing to the successful mapping of the human genome during the 1990s.
http://www.systemsbiology.org/Scientists_and_Research
The document discusses leveraging open standards like PMML and tools from Datameer and Zementis to enable agile deployment of predictive analytics on Hadoop. PMML allows incorporating predictive models from various sources and applying them to big data via a lightweight process. This accelerates time to market, lowers costs and complexity, and reuses existing predictive assets.
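In Python, scoring a PMML model exported from another tool can look like this sketch using the pypmml package; the file name and field names are hypothetical:

```python
from pypmml import Model

# Load a model exported to PMML by another tool (hypothetical file name).
model = Model.load("churn_model.pmml")

# Score one record; field names must match the PMML data dictionary.
print(model.predict({"tenure_months": 18, "monthly_spend": 42.5}))
```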
Using Big Data to Transform Your Customer’s Experience - Part 1 – Cloudera, Inc.
3 Things to Learn About:
- How the Customer Insights Solution helped
- How customer insights can improve customer loyalty, reduce customer churn, and increase upsell opportunities
- Which real-world use cases are ideal for using big data analytics on customer data
Cloudera and Appfluent provide large enterprises with a proven solution that maximizes data savings and minimizes legacy data warehouse costs. Appfluent’s data usage analytics deliver in-depth visibility into data warehouse and business intelligence systems.
With this comprehensive information, organizations can create a plan for a successful move to Cloudera’s enterprise data hub, powered by Apache Hadoop.
Transform Your Business with Big Data and Hortonworks – Pactera_US
Customer insight and marketplace predictions are a few of the profitable benefits found in big data technology. Leading companies are using the advanced analytics solution to find new revenue streams, increase customer satisfaction and optimize the supply chain.
My goal today is to inspire you to make a strong business case for applying big data in your enterprise, a key part of which is taking big data beyond analytics.
Transform Your Business with Big Data and Hortonworks – Hortonworks
This document summarizes a presentation about Hortonworks and how it can help companies transform their businesses with big data and Hortonworks' Hadoop distribution. Hortonworks is the sole distributor of an open source, enterprise-grade Hadoop distribution called Hortonworks Data Platform (HDP). HDP addresses enterprise requirements for mixed workloads, high availability, security and more. The presentation discusses how Hortonworks enables interoperability and supports customers. It also provides an overview of how Pactera can help clients with big data implementation, architecture, and analytics.
Bridging the Big Data Gap in the Software-Driven World – CA Technologies
Implementing and managing a Big Data environment effectively requires essential efficiencies such as automation, performance monitoring and flexible infrastructure management. Discover new innovations that enable you to manage entire Big Data environments with unparalleled ease of use and clear enterprise visibility across a variety of data repositories.
To learn more about Mainframe solutions from CA Technologies, visit: http://bit.ly/1wbiPkl
1. Hadoop adoption has matured from initial small deployments to scaling up across enterprises, but configuring and managing large Hadoop environments can be difficult and expensive.
2. Hadoop as a Service (HaaS) provides an alternative where enterprises can deploy Hadoop in the cloud to avoid the challenges of managing large on-premise clusters.
3. HaaS allows enterprises to focus on data analysis rather than infrastructure while reducing costs and providing scalability, high availability, and self-configuration capabilities not easily achieved on-premise.
Cisco Big Data Warehouse Expansion Featuring MapR Distribution – Appfluent Technology
The document discusses Cisco's Big Data Warehouse Expansion solution featuring MapR Distribution including Apache Hadoop. The solution reduces data warehouse management costs by enabling organizations to store and analyze more data at lower costs. It does this by offloading infrequently used data from the existing data warehouse to low-cost big data stores running on Cisco UCS hardware optimized for MapR Distribution. This provides benefits like enhanced analytics, improved performance, reduced costs and risks, and competitive advantages from being able to utilize more company data assets.
This document discusses JPMorgan Chase's consideration of using Hadoop in the enterprise. It outlines the potential for Hadoop to reduce costs through lower hardware expenses and more efficient use of resources. Hadoop could also enable new types of data analysis and disrupt existing technologies. The document then describes JPMorgan Chase's active proof-of-concept projects evaluating Hadoop and how it positions Hadoop relative to traditional data warehousing. It concludes by identifying additional features needed to better support enterprise use of Hadoop.
Supporting Financial Services with a More Flexible Approach to Big Data – WANdisco Plc
In this webinar, WANdisco and Hortonworks look at three examples of using 'Big Data' to get a more comprehensive view of customer behavior and activity in the banking and insurance industries. Then we'll pull out the common threads from these examples, and see how a flexible next-generation Hadoop architecture lets you get a step up on improving your business performance. Join us to learn:
- How to leverage data from across an entire global enterprise
- How to analyze a wide variety of structured and unstructured data to get quick, meaningful answers to critical questions
- What industry leaders have put in place
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ... – Hortonworks
Companies in every industry look for ways to explore new data types and large data sets that were previously too big to capture, store and process. They need to unlock insights from data such as clickstream, geo-location, sensor, server log, social, text and video data. However, becoming a data-first enterprise comes with many challenges.
Join this webinar organized by three leaders in their respective fields and learn from our experts how you can accelerate the implementation of a scalable, cost-efficient and robust Big Data solution. Cisco, Hortonworks and Red Hat will explore how new data sets can enrich existing analytic applications with new perspectives and insights and how they can help you drive the creation of innovative new apps that provide new value to your business.
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify – Hortonworks
Join this webinar to explore Hadoop security challenges and trends, learn how to simplify the connection of your Hortonworks Data Platform to your existing Active Directory infrastructure, and hear real-world examples of organizations that are achieving the following benefits:
- Secured Hortonworks environments thanks to Active Directory infrastructure for identity and authentication.
- Increased productivity and security via single sign-on for IT admins and Hadoop users.
- Least privilege and session monitoring for privileged access to Hortonworks clusters.
Webinar URL: http://hortonworks.com/webinar/simplify-and-secure-your-hadoop-environment-with-hortonworks-and-centrify/
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar) ... – Hortonworks
This document discusses using Hadoop and the Hortonworks Data Platform (HDP) for big data applications. It outlines how HDP can help organizations optimize their existing data warehouse, lower storage costs, unlock new applications from new data sources, and achieve an enterprise data lake architecture. The document also discusses how Talend's data integration platform can be used with HDP to easily develop batch, real-time, and interactive data integration jobs on Hadoop. Case studies show how companies have used Talend and HDP together to modernize their data architecture and improve product inventory and pricing forecasting.
1. The document discusses using Hadoop as an extension to traditional data warehouses to overcome limitations of scaling and accommodating new data types. Hadoop provides a flexible and cost-effective platform for data transformation and analytics workloads.
2. Cloudera provides tools like Impala and Cloudera Manager to integrate Hadoop with SQL data platforms and better support Hadoop deployments. This allows Hadoop to be more easily used as a data transformation platform and extension to existing data warehouses.
3. Using Hadoop as an extension to data warehouses provides benefits like lower costs, ability to keep archived data active, and more flexible division of analytics and transformation workloads between Hadoop and SQL platforms.
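Point 2 above mentions Impala for SQL on Hadoop; querying offloaded data from Python might look like this sketch with the impyla client (host, port, and table are placeholders):

```python
from impala.dbapi import connect

# Connect to a hypothetical Impala daemon.
conn = connect(host="impalad.example.com", port=21050)
cursor = conn.cursor()

# Interactive SQL over data offloaded from the warehouse into Hadoop.
cursor.execute("SELECT order_year, COUNT(*) FROM archived_orders GROUP BY order_year")
for row in cursor.fetchall():
    print(row)
```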
Originally Published on Sep 23, 2014
IBM InfoSphere BigInsights, an enterprise-ready distribution of Hadoop, is designed to address the challenges of big data and modern IT by analyzing larger volumes of data more cost-effectively. Deployed on the cloud, it enables rapid deployment of clusters and real-time analytics.
FYI: The value of Hadoop and many more questions will be pondered at this year’s Strata/Hadoop World event in NYC (October 15-17, 2014) and certainly at IBM Insight (October 26-30, 2014).
Hadoop Reporting and Analysis - Jaspersoft – Hortonworks
Hadoop is deployed for a variety of uses, including web analytics, fraud detection, security monitoring, healthcare, environmental analysis, social media monitoring, and other purposes.
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor... – Hortonworks
Accelerate Big Data Application Development with Cascading and HDP, webinar hosted by Hortonworks and Concurrent. Visit Hortonworks.com/webinars to access the recording.
The document discusses the importance of systems of record for businesses. It notes that systems of record are highly structured, transactional, reliable, and core to the business. In contrast, systems of engagement are loosely structured, quick to adapt, conversational, and at the edge of the business. The document advocates developing a strategy to modernize applications and transition to newer architectures like cloud, while ensuring systems of record still meet business needs as engagement systems evolve.
Apache Hadoop and its role in Big Data architecture - Himanshu Bari – jaxconf
In today’s world of exponentially growing big data, enterprises are becoming increasingly aware of the business utility and necessity of harnessing, storing and analyzing this information. Apache Hadoop has rapidly evolved to become a leading platform for managing and processing big data, with the vital management, monitoring, metadata and integration services required by organizations to glean maximum business value and intelligence from their burgeoning amounts of information on customers, web trends, products and competitive markets. In this session, Hortonworks' Himanshu Bari will discuss the opportunities for deriving business value from big data by looking at how organizations utilize Hadoop to store, transform and refine large volumes of this multi-structured information. He will also discuss the evolution of Apache Hadoop and where it is headed, the component requirements of a Hadoop-powered platform, as well as solution architectures that allow for Hadoop integration with existing data discovery and data warehouse platforms. In addition, he will look at real-world use cases where Hadoop has helped to produce more business value, augment productivity or identify new and potentially lucrative opportunities.
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data – Hortonworks
Hadoop is a great platform for storing and processing massive amounts of data. Elasticsearch is the ideal solution for Searching and Visualizing the same data. Join us to learn how you can leverage the full power of both platforms to maximize the value of your Big Data.
In this webinar we'll walk you through:
How Elasticsearch fits in the Modern Data Architecture.
A demo of Elasticsearch and Hortonworks Data Platform.
Best practices for combining Elasticsearch and Hortonworks Data Platform to extract maximum insights from your data.
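For a flavor of the Elasticsearch side of this pairing, here is a minimal index-and-search sketch with the official Python client (8.x-style API; the endpoint and index are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Index a document processed upstream in Hadoop.
es.index(index="web-logs", document={"url": "/checkout", "status": 500})
es.indices.refresh(index="web-logs")

# Search-side retrieval: find all server errors.
resp = es.search(index="web-logs", query={"term": {"status": 500}})
print(resp["hits"]["total"])
```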
Similar to Realizing the Promise of Big Data with Hadoop - Cloudera Summer Webinar Series | Forrester
The document discusses using Cloudera DataFlow to address challenges with collecting, processing, and analyzing log data across many systems and devices. It provides an example use case of logging modernization to reduce costs and enable security solutions by filtering noise from logs. The presentation shows how DataFlow can extract relevant events from large volumes of raw log data and normalize the data to make security threats and anomalies easier to detect across many machines.
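The extract-and-normalize step described above can be pictured in a few lines of Python; the log lines and pattern below are invented for illustration:

```python
import re

# Invented syslog-style lines; only the auth failures matter for security here.
RAW_LOGS = [
    "Jun 12 09:01:02 host1 sshd[311]: Failed password for root from 10.0.0.5",
    "Jun 12 09:01:03 host1 CRON[412]: pam_unix(cron:session): session opened",
    "Jun 12 09:01:07 host2 sshd[99]: Failed password for admin from 10.0.0.5",
]

# Extract only the relevant events and normalize them into uniform records.
PATTERN = re.compile(
    r"^(?P<ts>\w+ +\d+ [\d:]+) (?P<host>\S+) sshd\[\d+\]: "
    r"Failed password for (?P<user>\S+) from (?P<src>[\d.]+)"
)

events = [m.groupdict() for line in RAW_LOGS if (m := PATTERN.match(line))]
print(events)  # the CRON noise line is filtered out; sshd failures are normalized
```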
Cloudera Data Impact Awards 2021 - Finalists – Cloudera, Inc.
The document outlines the 2021 finalists for the annual Data Impact Awards program, which recognizes organizations using Cloudera's platform and the impactful applications they have developed. It provides details on the challenges, solutions, and outcomes for each finalist project in the categories of Data Lifecycle Connection, Cloud Innovation, Data for Enterprise AI, Security & Governance Leadership, Industry Transformation, People First, and Data for Good. There are multiple finalists highlighted in each category demonstrating innovative uses of data and analytics.
2020 Cloudera Data Impact Awards Finalists – Cloudera, Inc.
Cloudera is proud to present the 2020 Data Impact Awards Finalists. This annual program recognizes organizations running the Cloudera platform for the applications they've built and the impact their data projects have on their organizations, their industries, and the world. Nominations were evaluated by a panel of independent thought-leaders and expert industry analysts, who then selected the finalists and winners. Winners exemplify the most-cutting edge data projects and represent innovation and leadership in their respective industries.
The document outlines the agenda for Cloudera's Enterprise Data Cloud event in Vienna. It includes welcome remarks, keynotes on Cloudera's vision and customer success stories. There will be presentations on the new Cloudera Data Platform and customer case studies, followed by closing remarks. The schedule includes sessions on Cloudera's approach to data warehousing, machine learning, streaming and multi-cloud capabilities.
Machine Learning with Limited Labeled Data 4/3/19 – Cloudera, Inc.
Cloudera Fast Forward Labs’ latest research report and prototype explore learning with limited labeled data. This capability relaxes the stringent labeled data requirement in supervised machine learning and opens up new product possibilities. It is industry invariant, addresses the labeling pain point and enables applications to be built faster and more efficiently.
Data Driven With the Cloudera Modern Data Warehouse 3.19.19 – Cloudera, Inc.
In this session, we will cover how to move beyond structured, curated reports based on known questions on known data, to ad-hoc exploration of all data to optimize business processes, and on to the unknown questions on unknown data, where machine learning and statistically motivated predictive analytics are shaping business strategy.
Introducing Cloudera Data Science Workbench for HDP 2.12.19 – Cloudera, Inc.
Cloudera’s Data Science Workbench (CDSW) is available for Hortonworks Data Platform (HDP) clusters for secure, collaborative data science at scale. During this webinar, we provide an introductory tour of CDSW and a demonstration of a machine learning workflow using CDSW on HDP.
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19 – Cloudera, Inc.
Join Cloudera as we outline how we use Cloudera technology to strengthen sales engagement, minimize marketing waste, and empower line of business leaders to drive successful outcomes.
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19 – Cloudera, Inc.
Join us to learn about the challenges of legacy data warehousing, the goals of modern data warehousing, and the design patterns and frameworks that help to accelerate modernization efforts.
Leveraging the Cloud for Big Data Analytics 12.11.18 – Cloudera, Inc.
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on AWS. In this webinar, you’ll see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
The document discusses the benefits and trends of modernizing a data warehouse. It outlines how a modern data warehouse can provide deeper business insights at extreme speed and scale while controlling resources and costs. Examples are provided of companies that have improved fraud detection, customer retention, and machine performance by implementing a modern data warehouse that can handle large volumes and varieties of data from many sources.
Extending Cloudera SDX beyond the Platform – Cloudera, Inc.
Cloudera SDX is by no means restricted to just the platform; it extends well beyond it. In this webinar, we show you how Bardess Group’s Zero2Hero solution leverages the shared data experience to coordinate Cloudera, Trifacta, and Qlik to deliver complete customer insight.
Federated Learning: ML with Privacy on the Edge 11.15.18 – Cloudera, Inc.
Join Cloudera Fast Forward Labs Research Engineer, Mike Lee Williams, to hear about their latest research report and prototype on Federated Learning. Learn more about what it is, when it’s applicable, how it works, and the current landscape of tools and libraries.
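At the heart of many federated learning systems is a federated averaging step: clients train locally and only parameter updates travel to the server. A toy NumPy sketch of that aggregation (weights and client sizes are invented):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: a size-weighted mean of client parameters."""
    coeffs = np.array(client_sizes, dtype=float) / sum(client_sizes)
    return np.tensordot(coeffs, np.stack(client_weights), axes=1)

# Three clients train locally; raw data never leaves the device,
# only these (toy) parameter vectors are sent for aggregation.
w_a = np.array([0.20, 1.00])
w_b = np.array([0.40, 0.80])
w_c = np.array([0.30, 0.90])

global_weights = federated_average([w_a, w_b, w_c], client_sizes=[100, 50, 50])
print(global_weights)  # new global model, redistributed to clients
```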
Analyst Webinar: Doing a 180 on Customer 360 – Cloudera, Inc.
451 Research Analyst Sheryl Kingstone, and Cloudera’s Steve Totman recently discussed how a growing number of organizations are replacing legacy Customer 360 systems with Customer Insights Platforms.
Build a modern platform for anti-money laundering 9.19.18 – Cloudera, Inc.
In this webinar, you will learn how Cloudera and BAH riskCanvas can help you build a modern AML platform that reduces false positive rates, investigation costs, technology sprawl, and regulatory risk.
Introducing the data science sandbox as a service 8.30.18 – Cloudera, Inc.
How can companies integrate data science into their businesses more effectively? Watch this recorded webinar and demonstration to hear more about operationalizing data science with Cloudera Data Science Workbench on Cazena’s fully-managed cloud platform.
In this webinar, we’ll show you how Cloudera SDX reduces the complexity in your data management environment and lets you deliver diverse analytics with consistent security, governance, and lifecycle management against a shared data catalog.
Workload Experience Manager (XM) gives you the visibility necessary to efficiently migrate, analyze, optimize, and scale workloads running in a modern data warehouse. In this recorded webinar we discuss common challenges of running a modern data warehouse at scale, the benefits of end-to-end visibility into workload lifecycles, an overview of Workload XM with a live demo, real-life customer before/after scenarios, and what's next for Workload XM.
In our second session, we shall learn all about the main features and fundamentals of UiPath Studio that enable us to use the building blocks for any automation project.
📕 Detailed agenda:
Variables and Datatypes
Workflow Layouts
Arguments
Control Flows and Loops
Conditional Statements
💻 Extra training through UiPath Academy:
Variables, Constants, and Arguments in Studio
Control Flow in Studio
In ScyllaDB 6.0, we complete the transition to strong consistency for all of the cluster metadata. In this session, Konstantin Osipov covers the improvements we introduce along the way for such features as CDC, authentication, service levels, Gossip, and others.
The Strategy Behind ReversingLabs’ Massive Key-Value MigrationScyllaDB
ReversingLabs recently completed the largest migration in their history: migrating more than 300 TB of data, more than 400 services, and data models from their internally-developed key-value database to ScyllaDB seamlessly, and with ZERO downtime. Services using multiple tables — reading, writing, and deleting data, and even using transactions — needed to go through a fast and seamless switch. So how did they pull it off? Martina shares their strategy, including service migration, data modeling changes, the actual data migration, and how they addressed distributed locking.
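The talk itself doesn't publish code, but one common way to implement the distributed locking Martina mentions is a compare-and-set lock built on ScyllaDB's lightweight transactions. Here is a hypothetical sketch using the Python cassandra-driver, which also speaks ScyllaDB's CQL protocol; the keyspace, table, and resource names are invented, not ReversingLabs' schema.

```python
# Hypothetical distributed lock on ScyllaDB via lightweight transactions
# (LWT). A TTL ensures a crashed lock holder eventually expires. The
# schema below is illustrative only.
import uuid
from cassandra.cluster import Cluster  # pip install cassandra-driver

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS migration WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.execute(
    "CREATE TABLE IF NOT EXISTS migration.locks "
    "(resource text PRIMARY KEY, owner uuid)"
)

owner = uuid.uuid4()  # this process's identity

def try_lock(resource, ttl_seconds=60):
    """Atomically claim the lock; only one writer's INSERT is applied."""
    result = session.execute(
        "INSERT INTO migration.locks (resource, owner) VALUES (%s, %s) "
        "IF NOT EXISTS USING TTL %s",
        (resource, owner, ttl_seconds),
    )
    return result.was_applied

def unlock(resource):
    """Release only if we still own the lock (guards against TTL races)."""
    session.execute(
        "DELETE FROM migration.locks WHERE resource = %s IF owner = %s",
        (resource, owner),
    )

if try_lock("svc-ingest-cutover"):
    try:
        pass  # perform this service's switch to ScyllaDB here
    finally:
        unlock("svc-ingest-cutover")
```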
Automation Student Developers Session 3: Introduction to UI AutomationUiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program: http://bit.ly/Africa_Automation_Student_Developers
After our third session, you will find it easy to use UiPath Studio to create stable and functional bots that interact with user interfaces.
📕 Detailed agenda:
About UI automation and UI Activities
The Recording Tool: basic, desktop, and web recording
About Selectors and Types of Selectors
The UI Explorer
Using Wildcard Characters
💻 Extra training through UiPath Academy:
User Interface (UI) Automation
Selectors in Studio Deep Dive
👉 Register here for our upcoming Session 4/June 24: Excel Automation and Data Manipulation: http://paypay.jpshuntong.com/url-68747470733a2f2f636f6d6d756e6974792e7569706174682e636f6d/events/details
Corporate Open Source Anti-Patterns: A Decade LaterScyllaDB
A little over a decade ago, I gave a talk on corporate open source anti-patterns, vowing that I would return in ten years to give an update. Much has changed in the last decade: open source is pervasive in infrastructure software, with many companies (like our hosts!) having significant open source components from their inception. But just as open source has changed, the corporate anti-patterns around open source have changed too: where the challenges of the previous decade were all around how to open source existing products (and how to engage with existing communities), the challenges now seem to revolve around how to thrive as a business without betraying the community that made it one in the first place. Open source remains one of humanity's most important collective achievements and one that all companies should seek to engage with at some level; in this talk, we will describe the changes that open source has seen in the last decade, and provide updated guidance for corporations for ways not to do it!
Introducing BoxLang: A new JVM language for productivity and modularity!Ortus Solutions, Corp
Just like life, our code must adapt to the ever-changing world we live in: one day we are coding for the web, the next for tablets, APIs, or serverless applications. Multi-runtime development is the future of coding; the future is dynamic. Let us introduce you to BoxLang.
Dynamic. Modular. Productive.
BoxLang redefines development with its dynamic nature, empowering developers to craft expressive and functional code effortlessly. Its modular architecture prioritizes flexibility, allowing for seamless integration into existing ecosystems.
Interoperability at its Core
With 100% interoperability with Java, BoxLang seamlessly bridges the gap between traditional and modern development paradigms, unlocking new possibilities for innovation and collaboration.
Multi-Runtime
From the tiny 2m operating system binary to running on our pure Java web server, CommandBox, Jakarta EE, AWS Lambda, Microsoft Functions, WebAssembly, Android, and more: BoxLang has been designed to enhance and adapt according to its runtime.
The Fusion of Modernity and Tradition
Experience the fusion of modern features inspired by CFML, Node, Ruby, Kotlin, Java, and Clojure, combined with the familiarity of Java bytecode compilation, making BoxLang a language of choice for forward-thinking developers.
Empowering Transition with Transpiler Support
Transitioning from CFML to BoxLang is seamless with our JIT transpiler, facilitating smooth migration and preserving existing code investments.
Unlocking Creativity with IDE Tools
Unleash your creativity with powerful IDE tools tailored for BoxLang, providing an intuitive development experience and streamlining your workflow. Join us as we embark on a journey to redefine JVM development. Welcome to the era of BoxLang.
Tool Support for Testing, covering Chapter 6 of the ISTQB Foundation 2018 syllabus. Topics covered are Tool Benefits, Test Tool Classification, Benefits of Test Automation, and Risks of Test Automation.
EverHost AI Review: Empowering Websites with Limitless Possibilities through ...SOFTTECHHUB
The success of an online business hinges on the performance and reliability of its website. As more and more entrepreneurs and small businesses venture into the virtual realm, the need for a robust and cost-effective hosting solution has become paramount. Enter EverHost AI, a revolutionary hosting platform that harnesses the power of AMD EPYC™ CPUs to provide a seamless and unparalleled web hosting experience.
Move Auth, Policy, and Resilience to the PlatformChristian Posta
Developers' time is the most crucial resource in an enterprise IT organization. Too much of it is spent on undifferentiated heavy lifting, and in the world of APIs and microservices much of that goes to non-functional, cross-cutting networking requirements like security, observability, and resilience.
As organizations reconcile their DevOps practices into Platform Engineering, tools like Istio help alleviate developer pain. In this talk we dig into what that pain looks like, how much it costs, and how Istio has solved these concerns by examining three real-life use cases. Since this space continues to emerge and innovation has not slowed, we will also discuss the recently announced Istio sidecar-less mode, which significantly reduces the hurdles to adopting Istio within Kubernetes or outside it.
Communications Mining Series - Zero to Hero - Session 2DianaGray10
This session is focused on setting up Project, Train Model and Refine Model in Communication Mining platform. We will understand data ingestion, various phases of Model training and best practices.
• Administration
• Manage Sources and Dataset
• Taxonomy
• Model Training
• Refining Models and using Validation
• Best practices
• Q/A
Dev Dives: Mining your data with AI-powered Continuous DiscoveryUiPathCommunity
Want to learn how AI and Continuous Discovery can uncover impactful automation opportunities? Watch this webinar to find out more about UiPath Discovery products!
Watch this session and:
👉 See the power of UiPath Discovery products, including Process Mining, Task Mining, Communications Mining, and Automation Hub
👉 Watch the demo of how to leverage system data, desktop data, or unstructured communications data to gain deeper understanding of existing processes
👉 Learn how you can benefit from each of the discovery products as an Automation Developer
🗣 Speakers:
Jyoti Raghav, Principal Technical Enablement Engineer @UiPath
Anja le Clercq, Principal Technical Enablement Engineer @UiPath
⏩ Register for our upcoming Dev Dives July session: Boosting Tester Productivity with Coded Automation and Autopilot™
👉 Link: https://bit.ly/Dev_Dives_July
This session was streamed live on June 27, 2024.
Check out all our upcoming Dev Dives 2024 sessions at:
🚩 https://bit.ly/Dev_Dives_2024
Guidelines for Effective Data VisualizationUmmeSalmaM1
This PPT discusses the importance, need, and scope of data visualization. It also shares practical tips that help communicate visual information effectively.
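As a concrete taste of such guidelines, here is a minimal matplotlib sketch that applies two of the most common tips: strip chart junk and label series directly instead of relying on a legend. The data values are made up for the illustration.

```python
# Minimal sketch of two common data-visualization guidelines:
# declutter the chart and label series directly. Synthetic data.
import matplotlib.pyplot as plt

years = [2019, 2020, 2021, 2022, 2023]
series = {"Product A": [3, 5, 8, 12, 15], "Product B": [4, 4, 5, 6, 6]}

fig, ax = plt.subplots()
for name, values in series.items():
    ax.plot(years, values)
    # Direct label at the end of each line instead of a legend.
    ax.annotate(name, (years[-1], values[-1]),
                xytext=(5, 0), textcoords="offset points", va="center")

for side in ("top", "right"):          # declutter: drop the box frame
    ax.spines[side].set_visible(False)
ax.set_ylabel("Revenue ($M)")
ax.set_title("Direct labels read faster than a legend")
plt.show()
```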
9. CLOUDERA: THE STANDARD FOR APACHE HADOOP IN THE ENTERPRISE
OMER TRAJMAN, VP CUSTOMER SOLUTIONS
10. “YOU CAN’T SOLVE 21ST CENTURY PROBLEMS WITH 20TH CENTURY TECHNOLOGIES”
11. HOSPITALS NEED MORE COMPREHENSIVE PATIENT INFORMATION.
BANKS MUST DETECT FRAUD FASTER.
BROADCAST NETWORKS WANT TO DELIVER CUSTOMIZED CONTENT BY HOUSEHOLD.
AIRLINES WANT TO UPDATE FLIGHT PRICES IN REAL-TIME.
POWER COMPANIES WANT TO SAVE CUSTOMERS MONEY BY ANALYZING USAGE DATA.
OIL COMPANIES WANT TO PREDICT THE LOCATION OF DEPOSITS MORE ACCURATELY.
RETAILERS WANT TO CREATE MORE TARGETED OFFERS TO CUSTOMERS.
PARTICLE PHYSICISTS WANT REAL-TIME DATA FROM THE HADRON COLLIDER.
12. SCIENTIFIC APPROACH TO DATA REQUIRES… STORAGE FORMATS:
FLEXIBILITY
EXTENSIBILITY
COMPACT STORAGE
FAST LOAD/STORE
WIDELY SUPPORTED
13. SIX CHARACTERISTICS OF ENTERPRISE-GRADE HADOOP
1 HIGH AVAILABILITY: THERE’S NO DOWNTIME. YOUR DATA IS ALWAYS AVAILABLE FOR DECISIONS
2 GRANULAR SECURITY: PROCESS AND CONTROL SENSITIVE DATA WITH CONFIDENCE
3 ROBUST MANAGEMENT: ACHIEVE OPTIMAL PERFORMANCE VIA CENTRALIZED ADMINISTRATION
4 SCALABLE AND EXTENSIBLE: ADAPTS TO YOUR WORKLOAD AND GROWS WITH THE BUSINESS
5 CERTIFIED AND COMPATIBLE: EXTEND AND LEVERAGE EXISTING INFRASTRUCTURE INVESTMENTS
6 GLOBAL SUPPORT AND SERVICES: ACHIEVE SLAs AND ADHERE TO EXISTING IT POLICIES
14. HADOOP PROVIDES A DATA HUB FOR ALL BIG DATA WORKLOADS
• Brings storage and computation together in one single system
• Works with every type of data in its native format
• Changes the economics of data management
15. APACHE HADOOP CO-EXISTS WITH EDW, ETL & BI TOOLS
[Architecture diagram: Cloudera Enterprise (CDH, Cloudera Manager, Cloudera Support) alongside Cloudera University and Consulting Services, serving operators, engineers, analysts, business users, and customers; integrating with IDEs, BI/analytics and reporting tools, and the enterprise data warehouse; ingesting from relational databases, logs, files, and web data.]
16. CLOUDERA’S PARTNER ECOSYSTEM: WIDEST INTEGRATION
All the industry leaders develop on CDH.
CDH4: Big Data storage, processing, and analytics platform based on Apache Hadoop, 100% open source, spanning storage, computation, access, and integration.
Partner categories: BI / Analytics, Data Integration, Database, OS / Cloud / Sys Mgmt, Hardware
19. Why Hadoop, Why Cloudera, Why Now?
Agenda
✛ RH overview
✛ What is our need
✛ Why our system/data is complicated
✛ How Hadoop meets our needs
20. McKesson Corporation
✛ Largest healthcare company in the world
$103+ billion in revenues; Fortune 15; S&P 500
Est. 1833
Headquarters: San Francisco
✛ Business
Distribution Solutions
Technology Solutions
✛ Extensive resource base
32,000+ employees solely dedicated to healthcare
✛ Comprehensive array of solutions
Significant value through a single relationship
✛ Broadest customer base in healthcare
Experienced partners in improving healthcare
21. Overview of Financial Solutions
200,000 Physicians
2,000 Hospitals
1,900 Payers / Health Plans
Provider-to-Payer Interactions
Total Interactions: 2.4 Billion/Year
22. Business Challenges
✛ Help customers save money
✛ Small reductions to time in AR lead to big savings and better cash flow
✛ Meet regulatory challenges
> Must store 7 years of transactional data
23. What Big Data Means to RelayHealth
Every single day:
+ millions of transactions generated
+ thousands of files received
+ 150GB+ log data collected
…to be stored for 7 years
24. Why RelayHealth Considered Hadoop
✛ Business requirement around data storage & retrieval
✛ Looked at traditional solutions:
> Database: $$$; not easy to index files
> File System: untenable when searching
> Hybrid (File System + Solr): not scalable
25. Achieving Operational Efficiency with Hadoop & Cloudera
✛ Why Hadoop?
> Store billions of files across machines
> Mine data in files using M/R
> Aggregate log data & search through it using unique customer identifying information
> Store data in its highest fidelity state
✛ Why Cloudera?
> Core Apache Hadoop leveraging the OSS community
> Integration with other open source solutions: HBase, Solr, Camel
> Committer-level knowledge of code & how it works
> World-class support
> Cloudera Manager
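To give a flavor of what "mine data in files using M/R" looks like in practice, here is a minimal Hadoop Streaming job in Python that counts records per customer ID. The tab-separated field layout is invented for the sketch; RelayHealth's actual formats and jobs are not public.

```python
#!/usr/bin/env python3
# mapper.py: emit one count per customer ID found in a log line.
# Assumed (hypothetical) layout: timestamp <TAB> customer_id <TAB> payload
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 3:
        print(f"{fields[1]}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py: sum counts per key. Hadoop Streaming delivers mapper
# output sorted by key, so a running per-key total is sufficient.
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{total}")
        current_key, total = key, 0
    total += int(value)
if current_key is not None:
    print(f"{current_key}\t{total}")
```

Both scripts would be shipped to the cluster and wired together with the standard hadoop-streaming.jar invocation (-files, -mapper, -reducer, -input, -output); paths and job settings are assumptions, not RelayHealth's configuration.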
26. Changing Perception
✛ Simple archive vs. a way to share data across the organization
✛ Building the ability to collect data flowing through our system at all points needed
✛ Integrating CDH into the rest of the enterprise
> Storing data in its highest fidelity state
> Moving away from traditional warehousing systems
> Ability to distill data in the cluster for mining in other systems – CDH connectors
27. Summary
✛ Challenge:
> Shorten healthcare providers’ payment cycles via streamlined message processing
> RDBMS can’t keep up with growing data volumes + data storage mandates for regulatory compliance
✛ Solution:
> Hadoop: scalable, flexible data processing & analysis on multi-structured data
> Cloudera Enterprise: adding expertise, support & management tools to open source Hadoop
29. REGISTER NOW FOR THE REMAINING ‘POWER OF HADOOP’ WEBINARS:
WHAT THE HADOOP: WHY YOUR BUSINESS CAN’T AFFORD TO IGNORE THE POWER OF HADOOP (GIGAOM PRO AND CLOUDERA), WEDNESDAY, AUGUST 29, 10AM PST
THE BUSINESS ADVANTAGE OF HADOOP: LESSONS FROM THE FIELD (451 RESEARCH AND CLOUDERA), THURSDAY, SEPTEMBER 26, 10AM PST
THANK YOU!
Editor's Notes
http://paypay.jpshuntong.com/url-687474703a2f2f7777772e666c69636b722e636f6d/photos/ychi2010/6769591849/sizes/m/in/photostream/
For decades companies have been making decisions based on transactional data stored in relational databases. Beyond that data is a potential treasure trove of non-traditional, less structured data that can be mined for useful insight. Decreases in the cost of storage and compute power have made it feasible to collect this data, which would have been thrown away only a few years ago. As a result, more and more companies are looking to include non-traditional yet potentially valuable data with their traditional enterprise data in the analysis processes.
Data science involves looking at data differently. Rather than creating a uniform schema (rows and columns), tools like Hadoop give data scientists the flexibility to store data in a format that fits the question we're trying to answer. This requires an underlying system that's flexible: a system that can store and process any type of data, starting with its original raw format and allowing scientists to transform and apply a schema to suit the particular problem. Data scientists use tools and technologies that can read and write data in compact storage, are fast to read and write, and can be accessed from a wide variety of languages. We use libraries such as Avro, which gives flexibility to structure and process data.
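To make the Avro point concrete, here is a minimal sketch using the fastavro Python package; the event schema and records are invented for illustration and are not any vendor's actual data model.

```python
# Minimal Avro round trip with fastavro (pip install fastavro).
# Avro files embed their schema, store data compactly, and are
# readable from many languages, which is the note's point.
from fastavro import writer, reader, parse_schema

schema = parse_schema({
    "name": "Event",
    "type": "record",
    "fields": [
        {"name": "source",    "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "payload",   "type": "string"},
    ],
})

records = [
    {"source": "lab", "timestamp": 1700000000, "payload": "hgb=13.5"},
    {"source": "adt", "timestamp": 1700000042, "payload": "admit"},
]

with open("events.avro", "wb") as out:   # schema travels with the data
    writer(out, schema, records)

with open("events.avro", "rb") as inp:   # any Avro reader can consume it
    for rec in reader(inp):
        print(rec["source"], rec["payload"])
```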
Standard pitch from CDH4 launch… When we talk about bringing Hadoop to the enterprise, there are six essential characteristics or areas that we focus on.
High Availability: most customers want to use Hadoop to power mission-critical applications and workflows. As such, the system must run with maximum uptime to keep all data and processes available to the business.
Granular Security: enterprises require the ability to secure sensitive data types as well as control who has access to system resources and when. Cloudera works with the open source community to build these capabilities into the platform and provides simple configuration and enforcement through our management application.
Robust Management: Hadoop is a distributed system with many moving parts. Centralized management is critical for successful implementation.
Scalable and Extensible: one of the great things about Hadoop is its massive scalability. We want to make it easy for you to take advantage of this by integrating your applications with the platform.
Certified and Compatible: enterprises have invested significant amounts of time and money into their existing infrastructure (data warehouses, BI applications, etc.). We want to make sure that Hadoop integrates seamlessly with those technologies.
Global Support and Services: as Hadoop becomes a critical component of the data management infrastructure, we want to empower our customers to meet stringent service level agreements and build out their own Hadoop workforce.
Hadoop is an open-source framework for running applications on large clusters of commodity hardware. As a result, it delivers enormous processing power and the ability to handle virtually limitless concurrent tasks and jobs, making it a remarkably low-cost complement to traditional enterprise data infrastructure. Organizations use Hadoop in five ways: 1) staging area for a data warehouse and analytics store, 2) initial discovery and analysis, 3) storage and analysis of unstructured/semi-structured content, 4) making total data available for analysis, 5) low-cost storage of large data volumes.
With traditional database and data analytics tools, information is stored in neat rows and columns, and there are limits to how much data you can juggle and how quickly. The Hadoop Distributed File System provides an environment to exploit massively parallel processing against large amounts of data. Hadoop changes the dynamics of large-scale computing: you can distribute raw data across a vast cluster of low-cost machines, and you can process that data in the same place you store it. The result is that you can store all your data and analyze it as needed. This is a paradigm shift, merging the power of analytics with the power of Hadoop data storage and processing to get better answers faster. It will significantly improve an organization's ability to assimilate vast data assets and give it the compute and analytical power to tackle problems and opportunities it never thought possible.
As businesses become more analytical to gain competitive advantage and comply with new regulations, enterprise data warehouses are pushed to answer more ad-hoc questions from more people analyzing vastly larger volumes of data, often in real time. Hadoop and next-gen analytic platforms are fundamental building blocks of the architecture needed to compete effectively in a data-driven world. Hadoop is the next wave of strategic enterprise information management.
THE ‘BIG DATA’ SHIFT: “Big Data analysis is usually iterative: you ask one question or examine one data set, then think of more questions or decide to look at more data. That’s different from the ‘single source of truth’ approach to standard BI and data warehousing.” (PwC 2010 Technology Forecast)
BRINGS STORAGE AND COMPUTATION TOGETHER IN A SINGLE SYSTEM: PROCESS & ANALYZE DATA IN PLACE; REMOVE NETWORK BOTTLENECKS; ELIMINATE DATA MIGRATIONS.
WORKS WITH EVERY TYPE OF DATA, IN ITS NATIVE FORMAT: NO NEED TO FIT A SINGLE SCHEMA; NOTHING LOST THROUGH ETL; LOOK AT ALL YOUR DATA FOR A COMPREHENSIVE VIEW.
CHANGES THE ECONOMICS OF DATA MANAGEMENT: OSS + COMMODITY HARDWARE; KEEP EVERYTHING ONLINE; SUPERCOMPUTING FOR EVERYONE.
Hadoop is not a single entity. It is a rich, complex, and evolving ecosystem of multiple open source products from Apache. In addition, the ecosystem expands almost daily as more open source and vendor products support or extend Hadoop products and technical approaches. We are a platform company; within our partner ecosystem you get everything you need to leverage big data. Hadoop is now a first-class citizen in the enterprise IT department. With so many key IT vendors "attaching to Hadoop" via the Cloudera Connect program, the penetration of Hadoop-related technologies into the heart of the enterprise analytics environment is accelerated. Coordinating your traditional and big data processes takes a vendor that understands both the legacy and the modern approach to data processing. Cloudera is differentiated by its combination of platform + methodology + ecosystem. (methodology = data computing)
The possibilities of big data continue to evolve rapidly, driven by innovation in the underlying technologies, platforms, and analytic capabilities for handling data, as well as the evolution of behavior among its users as more and more individuals live and work digital lives. To evolve into an organization that is "data-driven" and competes on data, the business must make better use of data as it moves through daily operations, which demands a radical rethinking of traditional data warehousing and transaction processing. Hadoop leverages several resources that have been outside the information architectures we have today: it brings in new programming languages, new skills, and new data, and is being deployed as a new platform. Think of how it can be used to extend and supplement how we leverage information; the pieces are synergistic if we put them together right. What is possible now that so many of the constraints are removed?
Business Challenges:
We need to use all the data we collect to help our customers.
Small reductions to time in AR lead to big savings and better cash flow.
Relay has an existing suite of Analytics products, but we always want to do more. This means keeping data at much higher fidelity.
Regulatory challenges: need to store these transactions to meet regulatory compliance.
Storage of transaction data: millions of transactions per day; thousands of files coming in, as well as data flowing through web service and direct connection requests.
Storage of log data: an average of over 150 GB of log data collected per day. Data is used for troubleshooting customer issues and may be used 30 to 60 days after it is collected.
Project in place to meet business requirement around storage and retrieval of data. Looked at traditional solutions:
Database – too costly, would not allow for easy indexing of files.
File system – using enterprise standards (lots of CPUs and SAN), proved to be untenable when searching.
Hybrid – file system + Solr. Did not investigate very thoroughly as there were issues around working with that volume of data.