Hadoop hive presentation

Agenda
• Problems with traditional large-scale systems
• Requirements for new approaches
• What is Hadoop?
• Why Hadoop?
• Overview of Hadoop
• HDFS
• MapReduce
• Applications
• Conclusion

Problems with traditional large-scale systems

Data volumes grow day by day

Network failures

Server failures

Loss of data

High cost

Distributed computing requires manual processing

Requirements for new approaches

Data should be stored in a distributed manner
and processed in parallel

High performance at lower cost

Should be scalable

Should be simple to access and process

Fault tolerance

What is Hadoop?

Open Source Framework

Processes large amounts of data

Why Hadoop?
• Accessible
• Scalable
• Robust
• Simple

Overview of Hadoop

It handles 3 types of data
Structured
Semi-structured
Unstructured

Analyzes and processes large amounts of data (petabyte scale)
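
The three data types listed above can be illustrated with a small sketch. The sample values, field names, and variable names below are made up for illustration and are not part of Hadoop.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = "id,name,age\n1,Asha,31\n2,Ravi,28\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, but fields can differ per record.
semi_structured = '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "email": "r@example.com"}]'
records = json.loads(semi_structured)

# Unstructured: free text with no predefined fields.
unstructured = "Server restarted at noon. Disk usage looks normal."
words = unstructured.split()

print(rows[0]["name"], sorted(records[1].keys()), len(words))
```

Only the structured sample fits a fixed table layout directly; the other two need their structure interpreted when they are read.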

Comparison with traditional databases
RDBMS
• Typically stores gigabytes of data
• Supports batch and interactive processing
• Allows updates
• Schemas must be defined
• Structured data only
Hadoop
• Stores petabytes of data
• Batch processing only
• Does not allow updates; it follows write once, read many (WORM)
• Schemas not required
• Supports all three types of data
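
The write once, read many rule in the comparison above can be sketched as a toy in-memory store. `WriteOnceStore` and the paths used here are hypothetical and only mimic the behaviour; this is not the HDFS API.

```python
class WriteOnceStore:
    """Toy in-memory store: a path can be written once, then only read."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        # Refuse to overwrite an existing path (the "write once" part).
        if path in self._files:
            raise PermissionError(f"{path} already exists; overwrite not allowed")
        self._files[path] = bytes(data)

    def read(self, path):
        # Reads are allowed any number of times (the "read many" part).
        return self._files[path]


store = WriteOnceStore()
store.write("/logs/day1", b"first version")
try:
    store.write("/logs/day1", b"second version")
    overwritten = True
except PermissionError:
    overwritten = False

print(overwritten, store.read("/logs/day1"))
```

An RDBMS, by contrast, would accept the second write as an update to the existing record.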

Components

Hadoop can be divided into 2 parts
1. HDFS – Hadoop Distributed File System
2. MapReduce Programming model

Hadoop Distributed File System

It is a distributed file system

Runs on commodity hardware

Provides high throughput access to application data

Suitable for applications that have large data sets

Designed to store very large amounts of data (terabytes or
petabytes)

Core Architectural Goal of HDFS

An HDFS instance may consist of thousands of server machines.

Faults must be detected and recovered from quickly and
automatically
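
One common way to automate fault detection is a heartbeat timeout: each machine reports in periodically, and a machine that stays silent too long is treated as failed. The sketch below assumes that approach. The node names, block names, timings, and both function names are illustrative assumptions, not HDFS code.

```python
def find_dead_nodes(last_heartbeat, now, timeout):
    """Return the nodes whose last heartbeat is older than `timeout` seconds."""
    return sorted(node for node, seen in last_heartbeat.items() if now - seen > timeout)


def plan_recovery(block_locations, dead_nodes):
    """List the blocks that had a copy on a dead node and need copying again."""
    dead = set(dead_nodes)
    return sorted(block for block, nodes in block_locations.items() if dead & set(nodes))


# Hypothetical cluster state: time of last heartbeat per node (in seconds),
# and which nodes hold a copy of each block.
last_heartbeat = {"node-a": 100.0, "node-b": 70.0, "node-c": 98.0}
block_locations = {"blk_1": ["node-a", "node-b"], "blk_2": ["node-a", "node-c"]}

dead = find_dead_nodes(last_heartbeat, now=105.0, timeout=30.0)
to_copy = plan_recovery(block_locations, dead)
print(dead, to_copy)
```

With thousands of machines some failure is always likely, which is why detection and recovery run without an operator in the loop.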

MapReduce Programming Model

MapReduce applies a divide-and-conquer approach to the data.

Schedules execution across a set of machines

Manages inter-process communication

The reducers process the output of all mappers and produce
the final output
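
The divide-and-conquer idea above can be sketched on a single machine: split the input, process the pieces in parallel, then combine the partial results. In this sketch, threads stand in for the machines of a cluster, and `split`, `process_chunk`, and `combine` are hypothetical helper names, not part of Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Divide the input into roughly equal chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_chunk(chunk):
    """The per-chunk work; here, a partial sum."""
    return sum(chunk)


def combine(partials):
    """Merge the partial results into the final answer."""
    return sum(partials)


data = list(range(1, 101))
chunks = split(data, parts=4)

# Each chunk is handled independently, so the work can run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))

total = combine(partials)
print(total)  # 5050
```

In a real cluster the framework, not the programmer, does the scheduling and moves the partial results between machines.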

MapReduce Programming Model
– MAP
• A map() function processes a key/value pair and
generates a set of intermediate key/value pairs
– REDUCE
• A reduce() function merges all intermediate values
associated with the same intermediate key
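
The map() and reduce() roles described above can be illustrated with a word count, simulated in plain Python on one machine. This is a minimal sketch and not the Hadoop API: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, and the grouping step is written out by hand here, whereas a framework would perform it between the two phases.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Emit an intermediate (word, 1) pair for each word in the line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_job(lines):
    # Map phase: each input record is a (line number, line text) pair.
    intermediate = []
    for offset, line in enumerate(lines):
        intermediate.extend(map_fn(offset, line))

    # Group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one call per distinct intermediate key.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


result = run_job(["Hadoop stores data", "Hadoop processes data"])
print(result)
```

Because each map call looks at one line and each reduce call at one key, both phases can be spread across many machines.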
