This document provides an overview of IBM's BigInsights product for analyzing big data. It discusses how BigInsights uses the open source Apache Hadoop and Spark platforms at its core, with additional IBM technologies and features added on. BigInsights allows users to analyze both structured and unstructured data at large volumes and in real time. It also integrates with other IBM analytics and data management products to provide a full big data analytics solution.
The document provides an overview of IBM's big data and analytics capabilities. It discusses what big data is, the characteristics of big data including volume, velocity, variety and veracity. It then covers IBM's big data platform which includes products like InfoSphere Data Explorer, InfoSphere BigInsights, IBM PureData Systems and InfoSphere Streams. Example use cases of big data are also presented.
This document provides an overview of getting started with data science using Python. It discusses what data science is, why it is in high demand, and the typical skills and backgrounds of data scientists. It then covers popular Python libraries for data science like NumPy, Pandas, Scikit-Learn, TensorFlow, and Keras. Common data science steps are outlined, including data gathering, preparation, exploration, model building, validation, and deployment. Example applications and case studies are discussed, along with resources for learning, including podcasts, websites, communities, books, and TV shows.
A non-technical overview of Large Language Models, exploring their potential, limitations, and customization for specific challenges. While this deck was created with an audience from the financial industry in mind, its content remains broadly applicable.
(Note: Discover a slightly updated version of this deck at slideshare.net/LoicMerckel/introduction-to-llms.)
INTRODUCTION TO BIG DATA AND HADOOP
Introduction to Big Data, Types of Digital Data, Challenges of Conventional Systems – Web Data, Evolution of Analytic Processes and Tools, Analysis vs. Reporting – Big Data Analytics, Introduction to Hadoop – Distributed Computing Challenges – History of Hadoop, Hadoop Ecosystem – Use Case of Hadoop – Hadoop Distributors – HDFS – Processing Data with Hadoop – MapReduce.
This document provides an overview of Hadoop and its uses. It defines Hadoop as a distributed processing framework for large datasets across clusters of commodity hardware. It describes HDFS for distributed storage and MapReduce as a programming model for distributed computations. Several examples of Hadoop applications are given like log analysis, web indexing, and machine learning. In summary, Hadoop is a scalable platform for distributed storage and processing of big data across clusters of servers.
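To make the MapReduce model concrete, here is a minimal word-count sketch in the Hadoop Streaming style, where the map and reduce steps are plain Python scripts reading stdin; the file names and paths are illustrative, not taken from the document.

```python
# mapper.py -- map step: emit "word<TAB>1" for every word on stdin
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py -- reduce step: stdin arrives sorted by key, so counts for
# the same word are adjacent and can be summed in a single pass
import sys

current_word, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").rsplit("\t", 1)
    if word == current_word:
        total += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{total}")
        current_word, total = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{total}")
```

Such a job is typically submitted with the hadoop-streaming jar, e.g. hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out; the jar's exact location varies by distribution.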
This document provides an introduction to data science. It discusses what data science is, the data life cycle, key domains that benefit from data science and why Python is well-suited for data science. It also summarizes several important Python libraries for data science - Pandas for data analysis, NumPy for scientific computing, Matplotlib and Seaborn for data visualization, and introduces machine learning concepts like supervised and unsupervised learning. Example algorithms like linear regression and K-means clustering are also covered.
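As a taste of the two example algorithms mentioned, here is a small self-contained scikit-learn sketch on synthetic data; the data and parameter choices are illustrative, not taken from the document.

```python
# Illustrative sketch: linear regression (supervised) and K-means
# clustering (unsupervised) on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Supervised: recover y = 3x + 2 from noisy samples
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 1, size=100)
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)          # roughly [3.] and 2.0

# Unsupervised: recover three blobs of unlabeled 2-D points
points = rng.normal(size=(150, 2))
points[50:100] += [5, 5]
points[100:] += [0, 8]
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
print(km.cluster_centers_)                # one center per blob
```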
This document provides an introduction to big data. It defines big data as large and complex data sets that are difficult to process using traditional data management tools. It discusses the three V's of big data - volume, variety and velocity. Volume refers to the large scale of data. Variety means different data types. Velocity means the speed at which data is generated and processed. The document outlines topics that will be covered, including Hadoop, MapReduce, data mining techniques and graph databases. It provides examples of big data sources and challenges in capturing, analyzing and visualizing large and diverse data sets.
Loan approval prediction based on machine learning approach – Eslam Nader
This document discusses using machine learning models to predict loan approvals. It introduces the motivation, problem statement, and objectives of building a loan prediction system. The document describes the dataset used, which contains information about previous loan applicants. It then explains three machine learning models tested for the predictions: decision tree classifier, logistic regression, and naive Bayesian classifier. The document concludes by reporting the accuracy scores from experimenting with each model, with decision tree performing best.
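A hedged sketch of the model comparison described above, using scikit-learn on a synthetic stand-in for the loan dataset; the real applicant data and its columns are not reproduced here.

```python
# Compare the three classifiers the summary mentions on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 600 "applicants", 8 numeric features, approve/deny label
X, y = make_classification(n_samples=600, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: {accuracy_score(y_te, model.predict(X_te)):.3f}")
```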
Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity hardware. It was created in 2005 and is designed to reliably handle large volumes of data and complex computations in a distributed fashion. The core of Hadoop consists of Hadoop Distributed File System (HDFS) for storage and Hadoop MapReduce for processing data in parallel across large clusters of computers. It is widely adopted by companies handling big data like Yahoo, Facebook, Amazon and Netflix.
This document discusses different types of databases that can be mined for data including relational databases, data warehouses, transactional databases, and more advanced databases like object relational databases, temporal databases, spatial databases, text databases, multimedia databases, heterogeneous databases, legacy databases, data streams, and the World Wide Web. For each database type, it provides a brief definition and discusses how data mining can be applied to uncover patterns, trends, or other useful information from the data stored within.
The document discusses the 5 V's of big data: Volume, Velocity, Variety, Veracity, and Value. Volume refers to the vast amounts of data generated every second from sources like social media and sensors. Velocity is the speed at which new data is created, such as credit card transactions. Variety means the different types of data including structured, unstructured, and semi-structured. Veracity addresses the uncertainty in data quality. Value ensures the large amounts of data can be analyzed and applied to business cases.
Deep learning's practical applicability was constrained by two key factors. One was the availability of big data; the explosion of data that came with the growth of the Internet solved that problem. The second was that, even with big data available, one still needed the compute power required to harvest valuable knowledge from it.
Here is my perspective.
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ... – Simplilearn
This presentation about Apache Spark covers all the basics a beginner needs to get started with Spark. It covers the history of Apache Spark, what Spark is, and the difference between Hadoop and Spark. You will learn the different components in Spark and how Spark works with the help of its architecture. You will understand the different cluster managers on which Spark can run. Finally, you will see the various applications of Spark and a use case on Conviva. Now, let's get started with Apache Spark.
Below topics are explained in this Spark presentation:
1. History of Spark
2. What is Spark
3. Hadoop vs Spark
4. Components of Apache Spark
5. Spark architecture
6. Applications of Spark
7. Spark usecase
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
Simplilearn’s Apache Spark and Scala certification training is designed to:
1. Advance your expertise in the Big Data Hadoop Ecosystem
2. Help you master essential Apache Spark skills, such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and Spark shell scripting
3. Help you land a Hadoop developer job requiring Apache Spark expertise by giving you a real-life industry project coupled with 30 demos
What skills will you learn?
By completing this Apache Spark and Scala course, you will be able to:
1. Understand the limitations of MapReduce and the role of Spark in overcoming these limitations
2. Understand the fundamentals of the Scala programming language and its features
3. Explain and master the process of installing Spark as a standalone cluster
4. Develop expertise in using Resilient Distributed Datasets (RDD) for creating applications in Spark
5. Master Structured Query Language (SQL) using SparkSQL
6. Gain a thorough understanding of Spark streaming features
7. Master and describe the features of Spark ML programming and GraphX programming
Who should take this Scala course?
1. Professionals aspiring for a career in the field of real-time big data analytics
2. Analytics professionals
3. Research professionals
4. IT developers and testers
5. Data scientists
6. BI and reporting professionals
7. Students who wish to gain a thorough understanding of Apache Spark
Learn more at https://www.simplilearn.com/big-data-and-analytics/apache-spark-scala-certification-training
This document describes a final year project by four students at Himalaya College of Engineering in Nepal to analyze and predict stock market prices using artificial neural networks. The project aims to develop a neural network model to forecast stock prices on the Nepal Stock Exchange. Various technical, fundamental, and statistical analysis methods are currently used to predict stock prices but with limited success due to the complex nature of financial markets. The project outlines the design of the neural network, selection of input parameters, data collection, model training and testing. The goal is to apply neural networks to help forecast stock prices in Nepal's stock market.
The document discusses data science, defining it as a field that employs techniques from many areas like statistics, computer science, and mathematics to understand and analyze real-world phenomena. It explains that data science involves collecting, processing, and analyzing large amounts of data to discover patterns and make predictions. The document also notes that data science is an in-demand field that is expected to continue growing significantly in the coming years.
This document presents an overview of named entity recognition (NER) and the conditional random field (CRF) algorithm for NER. It defines NER as the identification and classification of named entities like people, organizations, locations, etc. in unstructured text. The document discusses the types of named entities, common NER techniques including rule-based and supervised methods, and explains the CRF algorithm and its mathematical model. It also covers the advantages of CRF for NER and examples of its applications in areas like information extraction.
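For reference, the linear-chain CRF behind such systems models the conditional probability of a tag sequence y = (y_1, ..., y_T) given tokens x through feature functions f_k with learned weights lambda_k; this is the standard textbook formulation, not a formula taken from the slides.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

Training maximizes the log-likelihood of labeled sequences, and the partition function Z(x) is computed efficiently with the forward algorithm.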
A Comprehensive Review of Large Language Models for Code Generation (.pptx) – Sai Pragna Kancheti
The document presents a review of large language models (LLMs) for code generation. It discusses different types of LLMs including left-to-right, masked, and encoder-decoder models. Existing models for code generation like Codex, GPT-Neo, GPT-J, and CodeParrot are compared. A new model called PolyCoder with 2.7 billion parameters trained on 12 programming languages is introduced. Evaluation results show PolyCoder performs less well than comparably sized models but outperforms others on C language tasks. In general, performance improves with larger models and longer training, but training solely on code can be sufficient or advantageous for some languages.
"Machine Learning and its Applications" was a gentle introduction to machine learning presented by Dr. Ganesh Neelakanta Iyer. The presentation covered an introduction to machine learning and the different types of machine learning problems, including classification, regression, and clustering. It also provided examples of applications of machine learning at companies like Facebook, Google, and McDonald's. The presentation concluded by discussing the general machine learning framework and the steps involved in working with machine learning problems.
Spark is an open source cluster computing framework for large-scale data processing. It provides high-level APIs and runs on Hadoop clusters. Spark components include Spark Core for execution, Spark SQL for SQL queries, Spark Streaming for real-time data, and MLlib for machine learning. The core abstraction in Spark is the resilient distributed dataset (RDD), which allows data to be partitioned across nodes for parallel processing. A word count example demonstrates how to use transformations like flatMap and reduceByKey to count word frequencies from an input file in Spark.
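As an illustration, here is a minimal PySpark version of the word count pipeline described above; the input and output paths are placeholders.

```python
# RDD word count using the flatMap/reduceByKey transformations.
from pyspark import SparkContext

sc = SparkContext("local[*]", "wordcount")
counts = (sc.textFile("input.txt")               # one record per line
            .flatMap(lambda line: line.split())  # split lines into words
            .map(lambda word: (word, 1))         # pair each word with 1
            .reduceByKey(lambda a, b: a + b))    # sum counts per word
counts.saveAsTextFile("counts_out")
sc.stop()
```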
This document discusses sentiment analysis, which is the computational study of opinions expressed in text. It defines sentiment analysis as identifying the positive, negative, or neutral orientation of opinions expressed in documents, sentences, or features of an object. The document outlines that sentiment analysis can be performed at the word, sentence, or document level. It also explains that sentiment analysis aims to structure unstructured text by discovering quintuples that represent opinions in terms of the object, feature, sentiment, opinion holder, and time. The document provides examples of applications of sentiment analysis like review classification and product feature analysis.
The document discusses sentiment analysis and opinion mining. It describes opinion mining as the process of analyzing text written in a natural language to classify it as positive, negative, or neutral based on the expressed sentiments. It outlines different levels of opinion mining including document, sentence, and aspect levels. It provides details on the typical architecture of an opinion mining system, including modules for preprocessing, part-of-speech tagging, aspect extraction, opinion identification, and orientation.
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ... – Edureka!
This Edureka Spark Tutorial will help you understand all the basics of Apache Spark. This Spark tutorial is ideal for beginners as well as professionals who want to learn or brush up on Apache Spark concepts. Below are the topics covered in this tutorial:
1) Big Data Introduction
2) Batch vs Real Time Analytics
3) Why Apache Spark?
4) What is Apache Spark?
5) Using Spark with Hadoop
6) Apache Spark Features
7) Apache Spark Ecosystem
8) Demo: Earthquake Detection Using Apache Spark
Big data analytics (BDA) involves examining large, diverse datasets to uncover hidden patterns, correlations, trends, and insights. BDA helps organizations gain a competitive advantage by extracting insights from data to make faster, more informed decisions. It supports a 360-degree view of customers by analyzing both structured and unstructured data sources like clickstream data. Businesses can leverage techniques like machine learning, predictive analytics, and natural language processing on existing and new data sources. BDA requires close collaboration between IT, business users, and data scientists to process and analyze large datasets beyond typical storage and processing capabilities.
The breadth and depth of Azure products that fall under the AI and ML umbrella can be difficult to follow. In this presentation, I'll first define exactly what AI, ML, and deep learning are, and then go over the various Microsoft AI and ML products and their use cases.
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Python – Edureka!
(Machine Learning Training with Python: https://www.edureka.co/python)
This Edureka Machine Learning tutorial (Machine Learning Tutorial with Python Blog: https://goo.gl/fe7ykh ) on "AI vs Machine Learning vs Deep Learning" talks about the differences and the relationship between AI, Machine Learning, and Deep Learning. Below are the topics covered in this tutorial:
1. AI vs Machine Learning vs Deep Learning
2. What is Artificial Intelligence?
3. Example of Artificial Intelligence
4. What is Machine Learning?
5. Example of Machine Learning
6. What is Deep Learning?
7. Example of Deep Learning
8. Machine Learning vs Deep Learning
Machine Learning Tutorial Playlist: https://goo.gl/UxjTxm
This document provides an overview of NoSQL databases. It defines NoSQL and discusses the motivations behind it, including the scalability challenges of SQL databases. It covers key NoSQL concepts like the CAP theorem and a taxonomy of NoSQL databases. Implementation concepts like consistent hashing, Bloom filters, and quorums are explained, and user-facing patterns like MapReduce and inverted indexes are also surveyed. Popular existing NoSQL systems and real-world examples of NoSQL usage are briefly mentioned. The conclusion states that NoSQL is not a general-purpose replacement for SQL and that the two have complementary uses.
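To make one of those implementation concepts concrete, here is a toy consistent-hash ring in Python; the node names are invented, and production systems add virtual nodes and replication on top of this idea.

```python
# Toy consistent hashing: each node owns the arc of the hash ring
# ending at its own hash; keys map to the next node clockwise.
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))  # stable: adding a node only remaps
                                 # the keys that fall on its new arc
```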
Watson is an AI system created by IBM to answer questions. It uses natural language processing and is capable of answering questions posed in natural language, by analyzing large amounts of data. Watson is made up of a cluster of IBM servers and can process 500 gigabytes of data per second. While Watson has advantages over humans in processing speed and memory, it still lacks full understanding of context. Future applications of Watson's question answering abilities include use in healthcare for clinical decision support.
IBM Watson is a question answering computer system created by IBM to apply natural language processing, information retrieval, knowledge representation and automated reasoning to answer questions posed in natural language. It consists of 90 IBM Power 750 servers with 2880 POWER7 cores and can operate at 80 teraflops. Watson was initially created to compete on the game show Jeopardy! where it won against human champions in 2011. It has since been applied to answering customer service questions, analyzing news and text, and providing medical diagnoses.
Watson Analytics is a cloud-based analytics tool from IBM that leverages Watson technology to accelerate data discovery for business users. It provides semantic recognition of data concepts, identifies analysis starting points, and allows natural language interaction. The tool automates tasks like data preparation, generates insights and visualizations, and enables predictive analytics. It aims to make analytics more self-service, collaborative, and accessible to non-experts.
IBM's Watson is a question answering computer system developed to answer questions posed in natural language. It was named after IBM's founder, Thomas J. Watson, and was initially created to compete on the game show Jeopardy!, where it defeated human champions in 2011. Watson uses advanced natural language processing, semantic analysis, and machine learning; it is capable of answering complex questions posed in nuanced language and is being developed by IBM for commercial applications in fields like healthcare, finance, and education.
IBM Watson: How it Works, and What it means for Society beyond winning Jeopardy! – Tony Pearson
The document discusses machine learning, artificial intelligence, and IBM Watson. It provides an agenda that includes what IBM Watson is, the benefits for business, and how to get started. It then discusses how IBM Watson is used in different industries and technologies like cloud computing, analytics, and cognitive systems. The document outlines when cognitive computing should be used and not used. It also provides examples of how organizations have used IBM Watson and the benefits they achieved. Finally, it provides recommendations on how to get started with cognitive technologies and resources for learning more.
This deck covers the new IBM Voice Gateway product, which introduces a next-generation IVR system that is conversational and built on IBM Watson cognitive services. It can also transcribe calls in real time to enable agent-assist applications. Built on cloud-native principles, the IBM Voice Gateway uses capabilities like the Watson Conversation service, Speech to Text, and Text to Speech.
IBM Watson overview presented by Mike Pointer, Watson Sr. Solution Architect, at Penn State's Nittany Watson Challenge Immersion event on January 19-20, 2017.
IBM is committed to big data and analytics. It has made large acquisitions and investments in this area, with over 1000 developers focused on big data technology. IBM views open source technologies like Hadoop, Spark, and the Open Data Platform initiative as the base for its software and solutions. It is also investing in making big data more accessible through familiar tools, technical standards, new analytics capabilities, and open source innovation.
Learn about IBM's Hadoop offering called BigInsights. We will look at the new features in version 4 (including a discussion on the Open Data Platform), review a couple of customer examples, talk about the overall offering and differentiators, and then provide a brief demonstration on how to get started quickly by creating a new cloud instance, uploading data, and generating a visualization using the built-in spreadsheet tooling called BigSheets.
Open source Apache Hadoop is a great framework for distributed processing of large data sets. But there’s a difference between “playing” with big data versus solving real problems. The reality is that Hadoop alone is not enough. In fact, almost every organization that plans to use Hadoop for production use quickly discovers that it lacks the required features for enterprise use. And, fewer still have the Hadoop specialists on hand to navigate through the complexity to build reliable, robust applications. As a result, many Hadoop projects never make it to production as executives say, “we just don’t have the skills.” In this session, we will discuss these enterprise capabilities and why they’re important: analytics, visualization, security, enterprise integration, developer/admin tools, and more. Additionally, we will share several real-world client examples who have found it necessary to use an enterprise-grade Hadoop platform to tackle some of the most interesting and challenging business problems.
The document discusses IBM's cloud data services for analytics. It introduces IBM's mission to provide integrated cloud data services covering content, data, and analytics. It then describes various IBM cloud services for structured, unstructured, analytical, and transactional workloads. These include Cloudant, dashDB, BigInsights on Cloud, Spark as a Service, and DB2 on Cloud. Use cases are provided for various industries including pharmaceutical, research, and marketing analytics firms.
The document provides an overview of IBM's BigInsights product. It discusses how BigInsights can help businesses gain insights from large, complex datasets through features like built-in text analytics, SQL support, spreadsheet-style analysis, and accelerators for domain-specific analytics like social media. The document also summarizes capabilities of BigInsights like Big SQL, Big Sheets, Big R, and its text analytics engine that allow businesses to explore, analyze, and model large datasets.
The document provides an overview of analyzing big data using IBM technologies. It discusses how big data is growing rapidly from various sources and the challenges of handling large volumes, varieties, velocities, and veracities of data. It then summarizes IBM's approach to big data analytics using their software stack and platforms like Hadoop and Power Systems. The future of analytics is discussed with the OpenPOWER Foundation and POWER8's Coherent Accelerator Processor Interface (CAPI) which allows custom hardware to participate directly in application memory spaces.
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics – Cynthia Saracco
Learn how to get started with Big Data using a platform based on Apache Hadoop, Apache Spark, and IBM BigInsights technologies. The emphasis here is on free or low-cost options that require modest technical skills.
This document discusses big data trends and challenges. It begins by defining big data as data that requires a cluster of computers to process due to infrastructure limitations. It then discusses improvements in cluster computing techniques and exponential growth in compute capability, storage density, and data volume. The document notes that while data and compute capabilities are growing exponentially, only a small percentage of available data is actually analyzed. It provides examples of data sources and tools for structured, unstructured, and semi-structured data. Finally, it discusses the evolution of processing structured data on Hadoop from MapReduce to SQL and Spark and IBM's leadership in these areas.
This document discusses IBM's industry data models and how they can be used with IBM's data lake architecture. It provides an overview of the data lake components and how the models integrate by being deployed to the data lake catalog and repositories. The models include predefined business vocabularies, data warehouse designs, and other reference materials that can accelerate analytics projects and provide governance.
The document discusses using IBM Flash and solutions to gain enhanced business insights from data. It describes how unstructured data is growing exponentially and how analytics is critical for businesses to gain insights. It then outlines IBM's flash storage portfolio, including all-flash arrays like FlashSystem and DeepFlash, a new class of flash optimized for big data workloads. It also discusses data protection schemes, shared storage versus shared-nothing architectures, and IBM tools for analytics, data management and security like Spectrum Scale, Spectrum Control and the Security Key Lifecycle Manager.
IBM Cloud Object Storage provides flexible, scalable, and simple storage designed for today's data challenges. It offers hybrid cloud storage options that can be deployed both on-premise and off-premise. Key benefits include lower total cost of ownership compared to traditional storage, massive scalability across IBM's global network, and unified management. IBM Cloud Object Storage is used by organizations across industries for various use cases including backup, archive, content management, and more.
Using real time big data analytics for competitive advantage – Amazon Web Services
Many organisations find it challenging to successfully perform real-time data analytics using their own on premise IT infrastructure. Building a system that can adapt and scale rapidly to handle dramatic increases in transaction loads can potentially be quite a costly and time consuming exercise.
Most of the time, infrastructure is under-utilised and it’s near impossible for organisations to forecast the amount of computing power they will need in the future to serve their customers and suppliers.
To overcome these challenges, organisations can instead utilise the cloud to support their real-time data analytics activities. Scalable, agile and secure, cloud-based infrastructure enables organisations to quickly spin up infrastructure to support their data analytics projects exactly when it is needed. Importantly, they can ‘switch off’ infrastructure when it is not.
BluePi Consulting and Amazon Web Services (AWS) are giving you the opportunity to discover how organisations are using real time data analytics to gain new insights from their information to improve the customer experience and drive competitive advantage.
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole – Amazon Web Services
The document discusses using Presto and Qubole to scale analytics workloads on AWS for TiVo's targeted audience delivery. It describes how Presto works by streaming data back to workers. It also discusses lessons learned around choosing optimal instance types for Presto clusters based on memory usage patterns and enabling elastic scaling to optimize query performance and costs.
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole – Amazon Web Services
In our webinar, representatives from TiVo, creator of a digital recording platform for television content, will explain how they implemented a new big data and analytics platform that dynamically scales in response to changing demand. You’ll learn how the solution enables TiVo to easily orchestrate big data clusters using Amazon Elastic Cloud Compute (Amazon EC2) and Amazon EC2 Spot instances that read data from a data lake on Amazon Simple Storage Service (Amazon S3) and how this reduces the development cost and effort needed to support its network and advertiser users. TiVo will share lessons learned and best practices for quickly and affordably ingesting, processing, and making available for analysis terabytes of streaming and batch viewership data from millions of households.
Getting started with Hadoop on the Cloud with Bluemix – Nicolas Morales
Silicon Valley Code Camp -- October 11, 2014.
Session: Getting started with Hadoop on the Cloud.
Hadoop and the cloud are an almost perfect marriage. Hadoop is a distributed computing framework that leverages a cluster built on commodity hardware. The cloud simplifies provisioning of machines and software. Getting started with Hadoop on the cloud makes it simple to provision your environment quickly and actually get started using Hadoop. IBM Bluemix has democratized Hadoop for the masses! This session will provide a brief introduction to what Hadoop is and how the cloud works, and will then focus on how to get started via a series of demos. We will conclude with a discussion of the tutorials and public datasets - all of the tools needed to get you started quickly.
Learn more about BigInsights for Hadoop: https://developer.ibm.com/hadoop/
Insights into Real World Data Management Challenges – DataWorks Summit
Data is your most valuable business asset, and it's also your biggest challenge. This challenge and opportunity means we continually face significant roadblocks on the way to becoming a data-driven organisation. From the management of data, to the bubbling open source frameworks, to the limited industry skills, to surmounting time and cost pressures, our challenge in data is big.
We all want and need a "fit for purpose" approach to the management of data, especially Big Data, and overcoming the ongoing challenges around the '3Vs' means we get to focus on the most important V - 'Value'. Come along and join the discussion on how Oracle Big Data Cloud provides Value in the management of data and supports your move toward becoming a data-driven organisation.
Speaker
Noble Raveendran, Principal Consultant, Oracle
Smarter Analytics and Big Data
Building the Next Generation of Analytical Insights
Joel Waterman, Regional Director of Business Analytics for the Middle East and Africa, discusses how IBM is making significant investments in smarter analytics and big data through acquisitions, technical expertise, and research. IBM's big data platform moves analytics closer to data through technologies like Hadoop, stream computing, and data warehousing. The platform is designed for analytic application development and integration using accelerators, user interfaces, and IBM's ecosystem of business partners.
This document provides an overview and strategy for big and fast data initiatives in 2017. It discusses the data landscape including volume, velocity, variety and validity. It evaluates different data platform technologies and outlines requirements. The vision is described as "Business Insights at the Speed of Light". The strategy focuses on speed and leveraging key technologies like Spark. A roadmap with initiatives around insights, infrastructure, ingestion and big BI is presented. High level architectures for streaming and data flow are shown. Finally, data preparation vendors are compared.
Similar to Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical platform
Using your DB2 SQL Skills with Hadoop and Spark – Cynthia Saracco
Learn about Big SQL, IBM's SQL interface for Apache Hadoop based on DB2's query engine. We'll walk through some code examples and discuss Spark integration for JDBC data sources (DB2 and Big SQL) using examples from a hands-on lab. Explore benchmark results comparing Big SQL and Spark SQL at 100TB. This presentation was created for the DB2 LUW TRIDEX Users Group meeting in NYC in June 2017.
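For flavor, here is a hedged PySpark sketch of reading a Big SQL (or DB2) table over JDBC in the spirit of that lab; the host, port, database, schema, table, and credentials are placeholders, and the DB2 JDBC driver jar must be on Spark's classpath.

```python
# Read a table from Big SQL/DB2 over JDBC, then query it with Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bigsql-jdbc").getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:db2://bigsql-host.example.com:32051/BIGSQL")  # placeholder host/port
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("dbtable", "MYSCHEMA.SALES")                               # placeholder table
      .option("user", "bigsql")
      .option("password", "********")
      .load())

df.createOrReplaceTempView("sales")
spark.sql("SELECT COUNT(*) AS n FROM sales").show()
```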
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 session) – Cynthia Saracco
Got Big Data? Then check out what Big SQL can do for you. Learn how IBM's industry-standard SQL interface enables you to leverage your existing SQL skills to query, analyze, and manipulate data managed in an Apache Hadoop environment on cloud or on premise. This quick technical tour is filled with practical examples designed to get you started working with Big SQL in no time. Specifically, you'll learn how to create Big SQL tables over Hadoop data in HDFS, Hive, or HBase; populate Big SQL tables with data from HDFS, a remote file system, or a remote RDBMS; execute simple and complex Big SQL queries; work with non-traditional data formats; and more. These charts are for session ALB-3663 at the IBM World of Watson 2016 conference.
Big Data: SQL query federation for Hadoop and RDBMS data – Cynthia Saracco
Explore query federation capabilities in IBM Big SQL, which enables programmers to transparently join Hadoop data with relational database management (RDBMS) data.
Big Data: Querying complex JSON data with BigInsights and Hadoop – Cynthia Saracco
Explore how you can query complex JSON data using Big SQL, Hive, and BigInsights, IBM's Hadoop-based platform. Collect sample data from The Weather Company's service on Bluemix (a cloud platform) and learn different approaches for modeling and analyzing the data in a Hadoop environment.
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL – Cynthia Saracco
Explains how to access free public data sets from IBM Analytics Exchange on the Bluemix cloud environment, transfer the data to BigInsights (a Hadoop-based platform), layer a Big SQL schema over the data, and query the data.
Big Data: Big SQL web tooling (Data Server Manager) self-study lab – Cynthia Saracco
This hands-on lab introduces you to Data Server Manager, a Web tool for querying and monitoring your Big SQL database. Data Server Manager (DSM) and Big SQL support select Apache Hadoop platforms.
Big Data: Working with Big SQL data from Spark – Cynthia Saracco
Follow this hands-on lab to discover how Spark programmers can work with data managed by Big SQL, IBM's SQL interface for Hadoop. Examples use Scala and the Spark shell in a BigInsights 4.3 technical preview 2 environment.
Big SQL provides an SQL interface for querying data stored in Hadoop. It uses a new query engine derived from IBM's database technology to optimize queries. Big SQL allows SQL users easy access to Hadoop data through familiar SQL tools and syntax. It supports creating and loading tables, standard SQL queries including joins and subqueries, and integrating Hadoop data with external databases in a single query.
Big Data: Explore Hadoop and BigInsights self-study lab – Cynthia Saracco
Want a quick tour of Apache Hadoop and InfoSphere BigInsights (IBM's Hadoop distribution)? Follow this self-study lab to get hands-on experience with HDFS, MapReduce jobs, BigSheets, Big SQL, and more. This lab was tested against the free BigInsights Quick Start Edition 3.0 VMware image.
Big Data: Get started with SQL on Hadoop self-study lab – Cynthia Saracco
Learn how to use SQL on Hadoop to query and analyze Big Data following this hands-on lab guide. Links in the lab explain where you can download a free VMware image of InfoSphere BigInsights 3.0 (IBM's Hadoop distribution) and sample data required for the lab. This lab focuses on Big SQL 3.0 technology released in June 2014.
Big Data: Technical Introduction to BigSheets for InfoSphere BigInsights – Cynthia Saracco
Introduces BigSheets, a spreadsheet-style tool for business users working with Big Data. BigSheets is part of IBM's InfoSphere BigInsights platform, which is based on open source technologies (e.g., Apache Hadoop) and IBM-specific technologies.
MongoDB vs ScyllaDB: Tractian's Experience with Real-Time ML – ScyllaDB
Tractian, an AI-driven industrial monitoring company, recently discovered that their real-time ML environment needed to handle a tenfold increase in data throughput. In this session, JP Voltani (Head of Engineering at Tractian) details why and how they moved to ScyllaDB to scale their data pipeline for this challenge. JP compares ScyllaDB, MongoDB, and PostgreSQL, evaluating their data models, query languages, sharding and replication, and benchmark results. Attendees will gain practical insights into the MongoDB to ScyllaDB migration process, including challenges, lessons learned, and the impact on product performance.
Introducing BoxLang: A new JVM language for productivity and modularity! – Ortus Solutions, Corp
Just like life, our code must adapt to the ever-changing world we live in: one day coding for the web, the next for our tablets, APIs, or serverless applications. Multi-runtime development is the future of coding, and the future is to be dynamic. Let us introduce you to BoxLang.
Dynamic. Modular. Productive.
BoxLang redefines development with its dynamic nature, empowering developers to craft expressive and functional code effortlessly. Its modular architecture prioritizes flexibility, allowing for seamless integration into existing ecosystems.
Interoperability at its Core
With 100% interoperability with Java, BoxLang seamlessly bridges the gap between traditional and modern development paradigms, unlocking new possibilities for innovation and collaboration.
Multi-Runtime
From the tiny 2MB operating system binary to running on our pure Java web server, CommandBox, Jakarta EE, AWS Lambda, Microsoft Functions, WebAssembly, Android, and more, BoxLang has been designed to enhance and adapt according to its runnable runtime.
The Fusion of Modernity and Tradition
Experience the fusion of modern features inspired by CFML, Node, Ruby, Kotlin, Java, and Clojure, combined with the familiarity of Java bytecode compilation, making BoxLang a language of choice for forward-thinking developers.
Empowering Transition with Transpiler Support
Transitioning from CFML to BoxLang is seamless with our JIT transpiler, facilitating smooth migration and preserving existing code investments.
Unlocking Creativity with IDE Tools
Unleash your creativity with powerful IDE tools tailored for BoxLang, providing an intuitive development experience and streamlining your workflow. Join us as we embark on a journey to redefine JVM development. Welcome to the era of BoxLang.
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F... - AlexanderRichford
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation Functions to Prevent Interaction with Malicious QR Codes.
Aim of the Study: The goal of this research was to develop a robust hybrid approach for identifying malicious and insecure URLs derived from QR codes, ensuring safe interactions.
This is achieved through:
Machine Learning Model: Predicts the likelihood of a URL being malicious.
Security Validation Functions: Ensures the derived URL has a valid certificate and proper URL format.
This innovative blend of technology aims to enhance cybersecurity measures and protect users from potential threats hidden within QR codes 🖥 🔒 (a minimal sketch of the hybrid check follows below).
This study was my first introduction to using ML, which has shown me the immense potential of ML in creating more secure digital environments!
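As a rough illustration of the hybrid idea described above (not the authors' actual implementation), the sketch below combines two security validation functions with a classifier score; `model` is assumed to be a hypothetical pre-trained scikit-learn-style pipeline that accepts raw URLs.

```python
import socket
import ssl
from urllib.parse import urlparse

def has_valid_format(url: str) -> bool:
    # Structural check: require an http(s) scheme and a host component.
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def has_valid_certificate(url: str, timeout: float = 5.0) -> bool:
    # Attempt a TLS handshake; the default context verifies the
    # certificate chain and the hostname.
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        with socket.create_connection((host, 443), timeout=timeout) as sock:
            with ssl.create_default_context().wrap_socket(sock, server_hostname=host):
                return True
    except (ssl.SSLError, OSError):
        return False

def qr_url_is_safe(url: str, model, threshold: float = 0.5) -> bool:
    # Hybrid gate: the URL must pass both validation functions AND
    # score below the malicious-probability threshold.
    if not (has_valid_format(url) and has_valid_certificate(url)):
        return False
    # `model` is a hypothetical pipeline that vectorizes URLs internally.
    return model.predict_proba([url])[0][1] < threshold
```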
DynamoDB to ScyllaDB: Technical Comparison and the Path to Success - ScyllaDB
What can you expect when migrating from DynamoDB to ScyllaDB? This session provides a jumpstart based on what we’ve learned from working with your peers across hundreds of use cases. Discover how ScyllaDB’s architecture, capabilities, and performance compares to DynamoDB’s. Then, hear about your DynamoDB to ScyllaDB migration options and practical strategies for success, including our top do’s and don’ts.
Leveraging AI for Software Developer Productivity.pptx - petabridge
Supercharge your software development productivity with our latest webinar! Discover the powerful capabilities of AI tools like GitHub Copilot and ChatGPT 4.X. We'll show you how these tools can automate tedious tasks, generate complete syntax, and enhance code documentation and debugging.
In this talk, you'll learn how to:
- Efficiently create GitHub Actions scripts
- Convert shell scripts
- Develop Roslyn Analyzers
- Visualize code with Mermaid diagrams
And these are just a few examples from a vast universe of possibilities!
Packed with practical examples and demos, this presentation offers invaluable insights into optimizing your development process. Don't miss the opportunity to improve your coding efficiency and productivity with AI-driven solutions.
The "Zen" of Python Exemplars - OTel Community DayPaige Cruz
The Zen of Python states "There should be one-- and preferably only one --obvious way to do it." OpenTelemetry is the obvious choice for traces but bad news for Pythonistas when it comes to metrics because both Prometheus and OpenTelemetry offer compelling choices. Let's look at all of the ways you can tie metrics and traces together with exemplars whether you're working with OTel metrics, Prom metrics, Prom-turned-OTel metrics, or OTel-turned-Prom metrics!
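For instance, with the prometheus_client library in Python, an exemplar carrying a trace ID can be attached when a metric is updated; the trace ID below is a hardcoded placeholder (in practice you would read it from the active span context), and exemplars only appear in the OpenMetrics exposition format.

```python
from prometheus_client import REGISTRY, Counter
from prometheus_client.openmetrics.exposition import generate_latest

requests_total = Counter("http_requests", "Total HTTP requests handled")

# Placeholder trace ID; normally taken from the current active span.
trace_id = "0af7651916cd43dd8448eb211c80319c"

# Attach the trace ID as an exemplar on this increment,
# tying this metric sample to a specific trace.
requests_total.inc(1, exemplar={"trace_id": trace_id})

# Exemplars are only emitted in the OpenMetrics format,
# not in the classic Prometheus text format.
print(generate_latest(REGISTRY).decode())
```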
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob... - TrustArc
Global data transfers can be tricky due to different regulations and individual protections in each country. Sharing data with vendors has become such a normal part of business operations that some may not even realize they’re conducting a cross-border data transfer!
The Global CBPR Forum launched the new Global Cross-Border Privacy Rules framework in May 2024 to ensure that privacy compliance and regulatory differences across participating jurisdictions do not block a business's ability to deliver its products and services worldwide.
To benefit consumers and businesses, Global CBPRs promote trust and accountability while moving toward a future where consumer privacy is honored and data can be transferred responsibly across borders.
This webinar will review:
- What a data transfer is, and its related risks
- How to manage and mitigate your data transfer risks
- How different data transfer mechanisms like the EU-US DPF and Global CBPRs benefit your business globally
- Which cross-border data transfer regulations and guidelines apply around the globe
In our second session, we shall learn all about the main features and fundamentals of UiPath Studio that provide the building blocks for any automation project.
📕 Detailed agenda:
Variables and Datatypes
Workflow Layouts
Arguments
Control Flows and Loops
Conditional Statements
💻 Extra training through UiPath Academy:
Variables, Constants, and Arguments in Studio
Control Flow in Studio
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf - leebarnesutopia
So… you want to become a Test Automation Engineer (or hire and develop one)? While there’s quite a bit of information available about important technical and tool skills to master, there’s not enough discussion around the path to becoming an effective Test Automation Engineer who knows how to add VALUE. In my experience this has led to a proliferation of engineers who are proficient with tools and building frameworks but have skill and knowledge gaps, especially in software testing, that reduce the value they deliver with test automation.
In this talk, Lee will share his lessons learned from over 30 years of working with, and mentoring, hundreds of Test Automation Engineers. Whether you’re looking to get started in test automation or just want to improve your trade, this talk will give you a solid foundation and roadmap for ensuring your test automation efforts continuously add value. This talk is equally valuable for both aspiring Test Automation Engineers and those managing them! All attendees will take away a set of key foundational knowledge and a high-level learning path for leveling up test automation skills and ensuring they add value to their organizations.
MySQL InnoDB Storage Engine: Deep Dive - Mydbops
This presentation, titled "MySQL - InnoDB" and delivered by Mayank Prasad at the Mydbops Open Source Database Meetup 16 on June 8th, 2024, covers dynamic configuration of REDO logs and instant ADD/DROP columns in InnoDB.
This presentation dives deep into the world of InnoDB, exploring two ground-breaking features introduced in MySQL 8.0:
• Dynamic Configuration of REDO Logs: Enhance your database's performance and flexibility with on-the-fly adjustments to REDO log capacity. Unleash the power of the snake metaphor to visualize how InnoDB manages REDO log files.
• Instant ADD/DROP Columns: Say goodbye to costly table rebuilds! This presentation unveils how InnoDB now enables seamless addition and removal of columns without compromising data integrity or incurring downtime (a brief sketch of both features follows after the key learnings below).
Key Learnings:
• Grasp the concept of REDO logs and their significance in InnoDB's transaction management.
• Discover the advantages of dynamic REDO log configuration and how to leverage it for optimal performance.
• Understand the inner workings of instant ADD/DROP columns and their impact on database operations.
• Gain valuable insights into the row versioning mechanism that empowers instant column modifications.
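To give a feel for the two features, here is a minimal sketch issuing the relevant statements from Python via the pymysql driver. The connection details and table name are hypothetical placeholders, and it assumes a MySQL 8.0.30+ server plus the privileges required for SET GLOBAL.

```python
import pymysql  # assumes a reachable MySQL 8.0.30+ server

# Hypothetical connection details.
conn = pymysql.connect(host="localhost", user="admin",
                       password="secret", database="test")
with conn.cursor() as cur:
    # Dynamic REDO log configuration: resize the capacity (in bytes)
    # on the fly, no server restart needed (dynamic since MySQL 8.0.30).
    cur.execute("SET GLOBAL innodb_redo_log_capacity = 8589934592")  # 8 GiB

    # Instant ADD/DROP column: metadata-only changes, no table rebuild.
    cur.execute("ALTER TABLE orders ADD COLUMN note VARCHAR(64), ALGORITHM=INSTANT")
    cur.execute("ALTER TABLE orders DROP COLUMN note, ALGORITHM=INSTANT")
conn.close()
```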
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google Cloud - ScyllaDB
Digital Turbine, the Leading Mobile Growth & Monetization Platform, did the analysis and made the leap from DynamoDB to ScyllaDB Cloud on GCP. Suffice it to say, they stuck the landing. We'll introduce Joseph Shorter, VP, Platform Architecture at DT, who led the charge for change and can speak first-hand to the performance, reliability, and cost benefits of this move. Miles Ward, CTO @ SADA, will help explore what this move looks like behind the scenes, in the ScyllaDB Cloud SaaS platform. We'll walk you through before and after, and what it took to get there (easier than you'd guess, I bet!).
The document discusses fundamentals of software testing including definitions of testing, why testing is necessary, seven testing principles, and the test process. It describes the test process as consisting of test planning, monitoring and control, analysis, design, implementation, execution, and completion. It also outlines the typical work products created during each phase of the test process.
Brightwell ILC Futures workshop David Sinclair presentation - ILC-UK
As part of our futures-focused project with Brightwell, we organised a workshop involving thought leaders and experts, held in April 2024. Introducing the session, David Sinclair gave the attached presentation.
For the project we want to:
- explore how technology and innovation will drive the way we live
- look at how we ourselves will change, e.g. families; digital exclusion
What we then want to do is use this to highlight how services in the future may need to adapt.
e.g. If we are all online in 20 years, will we still need to offer telephone-based services? And if we aren’t offering telephone services, what will the alternative be?
In ScyllaDB 6.0, we complete the transition to strong consistency for all of the cluster metadata. In this session, Konstantin Osipov covers the improvements we introduce along the way for such features as CDC, authentication, service levels, Gossip, and others.
Day 4 - Excel Automation and Data Manipulation - UiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program: https://bit.ly/Africa_Automation_Student_Developers
In this fourth session, we shall learn how to automate Excel-related tasks and manipulate data using UiPath Studio.
📕 Detailed agenda:
About Excel Automation and Excel Activities
About Data Manipulation and Data Conversion
About Strings and String Manipulation
💻 Extra training through UiPath Academy:
Excel Automation with the Modern Experience in Studio
Data Manipulation with Strings in Studio
👉 Register here for our upcoming Session 5/ June 25: Making Your RPA Journey Continuous and Beneficial: https://community.uipath.com/events/details/uipath-lagos-presents-session-5-making-your-automation-journey-continuous-and-beneficial/
Enterprise Knowledge’s Joe Hilger, COO, and Sara Nash, Principal Consultant, presented “Building a Semantic Layer of your Data Platform” at Data Summit Workshop on May 7th, 2024 in Boston, Massachusetts.
This presentation delved into the importance of the semantic layer and detailed four real-world applications. Hilger and Nash explored how a robust semantic layer architecture optimizes user journeys across diverse organizational needs, including data consistency and usability, search and discovery, reporting and insights, and data modernization. Practical use cases explored a variety of industries, such as biotechnology, financial services, and global retail.
This time, we're diving into the murky waters of the Fuxnet malware, a brainchild of the illustrious Blackjack hacking group.
Let's set the scene: Moscow, a city unsuspectingly going about its business, unaware that it's about to be the star of Blackjack's latest production. The method? Oh, nothing too fancy, just the classic "let's potentially disable sensor-gateways" move.
In a move of unparalleled transparency, Blackjack decides to broadcast their cyber conquests on ruexfil.com. Because nothing screams "covert operation" like a public display of your hacking prowess, complete with screenshots for the visually inclined.
Ah, but here's where the plot thickens: the initial claim of 2,659 sensor-gateways laid to waste? A slight exaggeration, it seems. The actual tally? A little over 500. It's akin to declaring world domination and then barely managing to annex your backyard.
Blackjack, ever the dramatists, hint at a sequel, suggesting the JSON files were merely a teaser of the chaos yet to come. Because what's a cyberattack without a hint of sequel bait, teasing audiences with the promise of more digital destruction?
-------
This document presents a comprehensive analysis of the Fuxnet malware, attributed to the Blackjack hacking group, which has reportedly targeted infrastructure. The analysis delves into various aspects of the malware, including its technical specifications, impact on systems, defense mechanisms, propagation methods, targets, and the motivations behind its deployment. By examining these facets, the document aims to provide a detailed overview of Fuxnet's capabilities and its implications for cybersecurity.
The document offers a qualitative summary of the Fuxnet malware, based on the information publicly shared by the attackers and analyzed by cybersecurity experts. This analysis is invaluable for security professionals, IT specialists, and stakeholders in various industries, as it not only sheds light on the technical intricacies of a sophisticated cyber threat but also emphasizes the importance of robust cybersecurity measures in safeguarding critical infrastructure against emerging threats. Through this detailed examination, the document contributes to the broader understanding of cyber warfare tactics and enhances the preparedness of organizations to defend against similar attacks in the future.