GOOGLE FILE SYSTEM 
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 
Presented By – Ankit Thiranh
OVERVIEW 
• Introduction 
• Architecture 
• Characteristics 
• System Interaction 
• Master Operation
• Fault Tolerance and Diagnosis
• Measurements
• Real-World Clusters and Their Performance
INTRODUCTION 
• Google processes very large amounts of data
• It needs a scalable distributed file system to store and process that data
• Solution: the Google File System (GFS)
• GFS is:
  • Large-scale
  • Distributed
  • Highly fault tolerant
ASSUMPTIONS 
• The system is built from many inexpensive commodity components that often fail. 
• The system stores a modest number of large files. 
• Reads are primarily of two kinds: large streaming reads and small random reads.
• Many large sequential writes append data to files. 
• The system must efficiently implement well-defined semantics for multiple clients that 
concurrently append to the same file. 
• High sustained bandwidth is more important than low latency.
ARCHITECTURE
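In the GFS design, a cluster consists of a single master and many chunkservers, accessed by many clients. Files are divided into fixed-size chunks (64 MB in the paper), each named by a chunk handle. To read, a client converts a byte offset into a chunk index, asks the master for the chunk handle and replica locations, and then fetches the bytes directly from a chunkserver, so file data never flows through the master. The Python sketch below models only that lookup step; `Master`, `ChunkInfo`, and `locate` are hypothetical names rather than a real GFS API, and the client-side caching of the master's reply is omitted.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as described in the GFS paper


@dataclass(frozen=True)
class ChunkInfo:
    handle: int        # chunk handle assigned by the master
    locations: tuple   # chunkservers holding a replica of this chunk


@dataclass
class Master:
    # Hypothetical in-memory metadata: file name -> list of ChunkInfo, indexed by chunk index.
    files: dict = field(default_factory=dict)

    def lookup(self, filename: str, chunk_index: int) -> ChunkInfo:
        return self.files[filename][chunk_index]


def locate(master: Master, filename: str, offset: int) -> tuple:
    """Translate a byte offset into (chunk index, chunk info, offset within the chunk)."""
    chunk_index = offset // CHUNK_SIZE
    info = master.lookup(filename, chunk_index)  # control request only; data is read from a chunkserver
    return chunk_index, info, offset % CHUNK_SIZE


master = Master(files={
    "/logs/web.0": [
        ChunkInfo(handle=1001, locations=("cs1", "cs4", "cs7")),
        ChunkInfo(handle=1002, locations=("cs2", "cs5", "cs8")),
    ]
})
idx, info, within = locate(master, "/logs/web.0", 70 * 1024 * 1024)
print(idx, info.handle, info.locations[0], within)  # prints: 1 1002 cs2 6291456
```

Keeping the master on this small, cacheable lookup path is what lets a single master serve many clients without becoming a data bottleneck.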
CHARACTERISTICS 
• Single master 
• Chunk size 
• Metadata
  • In-Memory Data Structures
  • Chunk Locations
  • Operation Log
• Consistency Model (table below)
  • Guarantees by GFS
  • Implications for Applications
                      Write                      Record Append
Serial success        defined                    defined interspersed with inconsistent
Concurrent successes  consistent but undefined   defined interspersed with inconsistent
Failure               inconsistent               inconsistent
File Region State After Mutation
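The consistency table can be read as a small decision function. This is a sketch under the assumption that mutation type, concurrency, and outcome are the only inputs, which matches how the table is organized; `Mutation` and `region_state` are illustrative names, not part of GFS.

```python
from enum import Enum


class Mutation(Enum):
    WRITE = "write"
    RECORD_APPEND = "record append"


def region_state(mutation: Mutation, concurrent: bool, succeeded: bool) -> str:
    """Return the file-region state after a mutation, following the GFS consistency table."""
    if not succeeded:
        # A failed mutation can leave replicas holding different bytes.
        return "inconsistent"
    if mutation is Mutation.RECORD_APPEND:
        # The record lands atomically at least once at an offset GFS chooses;
        # retries can leave padding or duplicates around it.
        return "defined interspersed with inconsistent"
    if concurrent:
        # All replicas agree, but the region may mix fragments from several writes.
        return "consistent but undefined"
    return "defined"


print(region_state(Mutation.WRITE, concurrent=True, succeeded=True))  # consistent but undefined
```

For applications, the practical consequence is that concurrent writers should prefer record append and tolerate padding or duplicate records when reading.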
SYSTEM INTERACTION 
• Leases and Mutation Order 
• Data Flow
• Atomic Record Appends
• Snapshot 
Figure 2: Write Control and Data Flow
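Under the lease mechanism, the master grants a chunk lease to one replica, the primary, which picks a serial order for all mutations to that chunk; every replica applies mutations in that order. For record append, the paper adds one check: if the record would overflow the current chunk, the primary pads the chunk to its full size on all replicas and tells the client to retry on the next chunk, and records are limited to one quarter of the chunk size to bound that padding. The sketch below models only these ordering and padding rules for a single chunk. Lease expiry, the separate data push along the replica chain, and replica failures are left out, and `Replica` and `Primary` are hypothetical classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CHUNK_SIZE = 64 * 1024 * 1024  # maximum chunk size used in the GFS paper


@dataclass
class Replica:
    """One copy of a chunk held by a chunkserver (hypothetical model)."""
    data: bytearray = field(default_factory=bytearray)
    applied: List[int] = field(default_factory=list)  # serial numbers, in apply order

    def apply(self, serial: int, offset: int, payload: bytes) -> None:
        # Every replica writes at the exact offset chosen by the primary.
        end = offset + len(payload)
        if len(self.data) < end:
            self.data.extend(bytes(end - len(self.data)))
        self.data[offset:end] = payload
        self.applied.append(serial)


@dataclass
class Primary:
    """The replica holding the chunk lease; it alone assigns the mutation order."""
    local: Replica
    secondaries: List[Replica]
    chunk_size: int = CHUNK_SIZE
    next_serial: int = 0

    def _mutate(self, offset: int, payload: bytes) -> int:
        serial = self.next_serial
        self.next_serial += 1
        # Apply locally first, then on each secondary with the same serial number.
        for replica in [self.local, *self.secondaries]:
            replica.apply(serial, offset, payload)
        return serial

    def record_append(self, record: bytes) -> Optional[int]:
        """Return the offset chosen for the record, or None if the client should retry on the next chunk."""
        if len(record) > self.chunk_size // 4:
            raise ValueError("record exceeds one quarter of the chunk size")
        offset = len(self.local.data)
        if offset + len(record) > self.chunk_size:
            # Pad the rest of the chunk on every replica so all copies stay identical.
            self._mutate(offset, bytes(self.chunk_size - offset))
            return None
        self._mutate(offset, record)
        return offset


primary = Primary(local=Replica(), secondaries=[Replica(), Replica()], chunk_size=1024)
offsets = [primary.record_append(b"r" * 200) for _ in range(6)]
print(offsets)  # [0, 200, 400, 600, 800, None]
```

The tiny `chunk_size=1024` in the usage line only keeps the example cheap to run; the ordering and padding logic is the same at 64 MB.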
MASTER OPERATION 
• Namespace Management and Locking 
• Replica Placement 
• Creation, Re-replication, Rebalancing 
• Garbage Collection
  • Mechanism
  • Discussion
• Stale Replica Detection
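For namespace management, the paper has each master operation take read locks on every ancestor directory name and a read or write lock on the full pathname, acquiring them in one fixed total order (by level in the namespace tree, then lexicographically) to avoid deadlock. The sketch below only computes these lock sets and checks two operations for conflicts, using the paper's example of snapshotting /home/user to /save/user while /home/user/foo is being created. The function names are hypothetical, and actual lock objects are not modeled.

```python
from typing import Dict, List

LockPlan = Dict[str, str]  # pathname -> "read" or "write"


def lock_plan(path: str, write_leaf: bool) -> LockPlan:
    """Read locks on every ancestor directory, plus a read or write lock on the full path."""
    parts = [p for p in path.split("/") if p]
    plan: LockPlan = {}
    for depth in range(1, len(parts)):
        plan["/" + "/".join(parts[:depth])] = "read"
    plan["/" + "/".join(parts)] = "write" if write_leaf else "read"
    return plan


def merge(*plans: LockPlan) -> LockPlan:
    """Combine lock sets for one operation that touches several paths; write wins over read."""
    merged: LockPlan = {}
    for plan in plans:
        for name, mode in plan.items():
            if merged.get(name) != "write":
                merged[name] = mode
    return merged


def acquisition_order(plan: LockPlan) -> List[str]:
    # Shallower names first, then lexicographic within a level: one total order for every operation.
    return sorted(plan, key=lambda name: (name.count("/"), name))


def conflicts(a: LockPlan, b: LockPlan) -> bool:
    # Two operations conflict when they lock the same name and at least one lock is a write lock.
    return any(name in b and "write" in (mode, b[name]) for name, mode in a.items())


snapshot = merge(lock_plan("/home/user", True), lock_plan("/save/user", True))
create = lock_plan("/home/user/foo", True)
print(acquisition_order(snapshot))  # ['/home', '/save', '/home/user', '/save/user']
print(conflicts(snapshot, create))  # True: write lock vs. read lock on /home/user
```

Creating two different files in the same directory does not conflict, because both operations take only read locks on the shared directory names; this is what allows concurrent mutations within one directory.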
FAULT TOLERANCE AND DIAGNOSIS 
• High Availability
  • Fast Recovery
  • Chunk Replication
  • Master Replication
• Data Integrity
• Diagnostic Tools
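For data integrity, the paper has each chunkserver split a chunk into 64 KB blocks, keep a 32-bit checksum per block, and verify every block that overlaps a read before returning any data. A minimal sketch follows. Using CRC-32 via Python's `zlib.crc32` is an assumption (the slides do not name a checksum function), and `block_checksums` and `verified_read` are hypothetical helpers.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # checksum granularity described in the GFS paper


def block_checksums(chunk: bytes) -> list:
    """One 32-bit checksum per 64 KB block (CRC-32 here is an assumption)."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE]) for i in range(0, len(chunk), BLOCK_SIZE)]


def verified_read(chunk: bytes, checksums: list, offset: int, length: int) -> bytes:
    """Verify every block overlapping [offset, offset + length) before returning the data."""
    if offset < 0 or length <= 0 or offset + length > len(chunk):
        raise ValueError("read range outside the chunk")
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            # A real chunkserver would return an error and report the mismatch to the master.
            raise OSError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]


sample = bytes(range(256)) * 1024  # 256 KB of sample data, i.e. four 64 KB blocks
sample_sums = block_checksums(sample)
print(len(sample_sums), verified_read(sample, sample_sums, 70000, 4) == sample[70000:70004])  # 4 True
```

Checking only the blocks that overlap the read keeps verification cost proportional to the bytes requested rather than to the whole chunk.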
MEASUREMENTS 
Aggregate Throughputs. Top curves show theoretical limits imposed by the network topology. Bottom curves 
show measured throughputs. They have error bars that show 95% confidence intervals, which are illegible in 
some cases because of low variance in measurements.
REAL WORLD CLUSTERS 
• Two clusters were examined:
  • Cluster A is used for research and development by over a hundred users.
  • Cluster B is used for production data processing with occasional human intervention.
• Storage 
• Metadata 
Cluster                     A        B
Chunkservers                342      227
Available disk space        72 TB    180 TB
Used disk space             55 TB    155 TB
Number of files             735 k    737 k
Number of dead files        22 k     232 k
Number of chunks            992 k    1550 k
Metadata at chunkservers    13 GB    21 GB
Metadata at master          48 MB    60 MB
Characteristics of Two GFS Clusters
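These figures can be checked against the paper's design claim that the master keeps less than 64 bytes of metadata per 64 MB chunk. Dividing master metadata by chunk count gives a rough per-chunk average. This is only an estimate: it assumes decimal units (1 MB = 10^6 bytes, 1 k = 1000), and because the master's total also includes the file namespace, it overstates the per-chunk share.

```python
def master_bytes_per_chunk(metadata_mb: float, chunks_thousands: float) -> float:
    """Average master metadata per chunk, assuming decimal units (1 MB = 10**6 bytes, 1 k = 1000)."""
    return (metadata_mb * 10**6) / (chunks_thousands * 1000)


# (metadata at master in MB, number of chunks in thousands), taken from the cluster table
clusters = {"A": (48, 992), "B": (60, 1550)}

for name, (metadata_mb, chunks_k) in clusters.items():
    print(f"Cluster {name}: ~{master_bytes_per_chunk(metadata_mb, chunks_k):.1f} bytes per chunk")
# Cluster A: ~48.4 bytes per chunk
# Cluster B: ~38.7 bytes per chunk
```

Both averages stay under 64 bytes even with namespace data included, which is consistent with the master holding all metadata in memory.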
PERFORMANCE EVALUATION OF TWO CLUSTERS
• Read and write rates and Master load 
Cluster A B 
Read Rate (last minute) 583 MB/s 380 MB/s 
Read Rate (last hour) 562 MB/s 384 MB/s 
Read Rate (since start) 589 MB/s 49 MB/s 
Write Rate (last minute) 1 MB/s 101 MB/s 
Write Rate (last hour) 2 MB/s 117 MB/s 
Write Rate (since start) 25 MB/s 13 MB/s 
Master ops (last minute) 325 Ops/s 533 Ops/s 
Master ops (last hour) 381 Ops/s 518 Ops/s 
Master ops (since start) 202 Ops/s 347 Ops/s 
Performance Metrics for Two GFS Clusters
WORKLOAD BREAKDOWN 
• Chunkserver Workload
Operation     Read          Write         Record Append
Cluster       X      Y      X      Y      X      Y
0K            0.4    2.6    0      0      0      0
1B..1K        0.1    4.1    6.6    4.9    0.2    9.2
1K..8K        65.2   38.5   0.4    1.0    18.9   15.2
8K..64K       29.9   45.1   17.8   43.0   78.0   2.8
64K..128K     0.1    0.7    2.3    1.9    <0.1   4.3
128K..256K    0.2    0.3    31.6   0.4    <0.1   10.6
256K..512K    0.1    0.1    4.2    7.7    <0.1   31.2
512K..1M      3.9    6.9    35.5   28.7   2.2    25.5
1M..inf       0.1    1.8    1.5    12.3   0.7    2.2
Operations Breakdown by Size (%)

Operation     Read          Write         Record Append
Cluster       X      Y      X      Y      X      Y
1B..1K        <0.1   <0.1   <0.1   <0.1   <0.1   <0.1
1K..8K        13.8   3.9    <0.1   <0.1   <0.1   0.1
8K..64K       11.4   9.3    2.4    5.9    78.0   0.3
64K..128K     0.3    0.7    0.3    0.3    <0.1   1.2
128K..256K    0.8    0.6    16.5   0.2    <0.1   5.8
256K..512K    1.4    0.3    3.4    7.7    <0.1   38.4
512K..1M      65.9   55.1   74.1   58.0   0.1    46.8
1M..inf       6.4    28.0   3.3    28.0   53.9   7.4
Bytes Transferred Breakdown by Operation Size (%)
WORKLOAD BREAKDOWN 
• Master Workload 
Cluster X Y 
Open 26.1 16.3 
Delete 0.7 1.5 
FindLocation 64.3 65.8 
FindLeaseHolder 7.8 13.4 
FindMatchingFiles 0.6 2.2 
All other combined 0.5 0.8 
Master Requests Breakdown by Type (%)
Google file system

More Related Content

What's hot

Google File System
Google File SystemGoogle File System
Google File System
nadikari123
 
Google File System
Google File SystemGoogle File System
Google File System
Junyoung Jung
 
LAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEELAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEE
Linaro
 
Google File System
Google File SystemGoogle File System
Google File System
guest2cb4689
 
Google file system
Google file systemGoogle file system
Google file system
Lalit Rastogi
 
Process management in linux
Process management in linuxProcess management in linux
Process management in linux
Mazenetsolution
 
Peer to peer Paradigms
Peer to peer ParadigmsPeer to peer Paradigms
Peer to peer Paradigms
hassan ahmed
 
Unix files
Unix filesUnix files
Unix files
Sunil Rm
 
Cryptography - Block cipher & stream cipher
Cryptography - Block cipher & stream cipherCryptography - Block cipher & stream cipher
Cryptography - Block cipher & stream cipher
Niloy Biswas
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed Computing
Sayed Chhattan Shah
 
RPC: Remote procedure call
RPC: Remote procedure callRPC: Remote procedure call
RPC: Remote procedure call
Sunita Sahu
 
Loader and Its types
Loader and Its typesLoader and Its types
Loader and Its types
Parth Dodiya
 
Seminar Report on Google File System
Seminar Report on Google File SystemSeminar Report on Google File System
Seminar Report on Google File System
Vishal Polley
 
Heterogeneous computing
Heterogeneous computingHeterogeneous computing
Heterogeneous computing
Rashid Ansari
 
Linkers
LinkersLinkers
Linkers
Rahul Dhiman
 
Unit 1 architecture of distributed systems
Unit 1 architecture of distributed systemsUnit 1 architecture of distributed systems
Unit 1 architecture of distributed systems
karan2190
 
Processors and its Types
Processors and its TypesProcessors and its Types
Processors and its Types
Nimrah Shahbaz
 
Shared memory
Shared memoryShared memory
Shared memory
Abhishek Khune
 
Cache coherence problem and its solutions
Cache coherence problem and its solutionsCache coherence problem and its solutions
Cache coherence problem and its solutions
Majid Saleem
 
Basics of boot-loader
Basics of boot-loaderBasics of boot-loader
Basics of boot-loader
iamumr
 

What's hot (20)

Google File System
Google File SystemGoogle File System
Google File System
 
Google File System
Google File SystemGoogle File System
Google File System
 
LAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEELAS16-504: Secure Storage updates in OP-TEE
LAS16-504: Secure Storage updates in OP-TEE
 
Google File System
Google File SystemGoogle File System
Google File System
 
Google file system
Google file systemGoogle file system
Google file system
 
Process management in linux
Process management in linuxProcess management in linux
Process management in linux
 
Peer to peer Paradigms
Peer to peer ParadigmsPeer to peer Paradigms
Peer to peer Paradigms
 
Unix files
Unix filesUnix files
Unix files
 
Cryptography - Block cipher & stream cipher
Cryptography - Block cipher & stream cipherCryptography - Block cipher & stream cipher
Cryptography - Block cipher & stream cipher
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed Computing
 
RPC: Remote procedure call
RPC: Remote procedure callRPC: Remote procedure call
RPC: Remote procedure call
 
Loader and Its types
Loader and Its typesLoader and Its types
Loader and Its types
 
Seminar Report on Google File System
Seminar Report on Google File SystemSeminar Report on Google File System
Seminar Report on Google File System
 
Heterogeneous computing
Heterogeneous computingHeterogeneous computing
Heterogeneous computing
 
Linkers
LinkersLinkers
Linkers
 
Unit 1 architecture of distributed systems
Unit 1 architecture of distributed systemsUnit 1 architecture of distributed systems
Unit 1 architecture of distributed systems
 
Processors and its Types
Processors and its TypesProcessors and its Types
Processors and its Types
 
Shared memory
Shared memoryShared memory
Shared memory
 
Cache coherence problem and its solutions
Cache coherence problem and its solutionsCache coherence problem and its solutions
Cache coherence problem and its solutions
 
Basics of boot-loader
Basics of boot-loaderBasics of boot-loader
Basics of boot-loader
 

Viewers also liked

Google file system
Google file systemGoogle file system
Google file system
Dhan V Sagar
 
The google file system
The google file systemThe google file system
The google file system
Daniel Checchia
 
Google File Systems
Google File SystemsGoogle File Systems
Google File Systems
Azeem Mumtaz
 
GFS
GFSGFS
GFS - Google File System
GFS - Google File SystemGFS - Google File System
GFS - Google File System
tutchiio
 
google file system
google file systemgoogle file system
google file system
diptipan
 
Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...
Antonio Cesarano
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Andrii Vozniuk
 
10 Tips for Making Beautiful Slideshow Presentations by www.visuali.se
10 Tips for Making Beautiful Slideshow Presentations by www.visuali.se10 Tips for Making Beautiful Slideshow Presentations by www.visuali.se
10 Tips for Making Beautiful Slideshow Presentations by www.visuali.se
Edahn Small
 

Viewers also liked (9)

Google file system
Google file systemGoogle file system
Google file system
 
The google file system
The google file systemThe google file system
The google file system
 
Google File Systems
Google File SystemsGoogle File Systems
Google File Systems
 
GFS
GFSGFS
GFS
 
GFS - Google File System
GFS - Google File SystemGFS - Google File System
GFS - Google File System
 
google file system
google file systemgoogle file system
google file system
 
Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
10 Tips for Making Beautiful Slideshow Presentations by www.visuali.se
10 Tips for Making Beautiful Slideshow Presentations by www.visuali.se10 Tips for Making Beautiful Slideshow Presentations by www.visuali.se
10 Tips for Making Beautiful Slideshow Presentations by www.visuali.se
 

Similar to Google file system

Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
DataStax Academy
 
Cassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionCassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in Production
DataStax Academy
 
Cassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionCassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in Production
DataStax Academy
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
Splunk
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)
Jon Haddad
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
DataStax Academy
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
thelabdude
 
What's new in JBoss ON 3.2
What's new in JBoss ON 3.2What's new in JBoss ON 3.2
What's new in JBoss ON 3.2
Thomas Segismont
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
Jon Haddad
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
Joe Alex
 
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
DATAVERSITY
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
C4Media
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
acelyc1112009
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
Yahoo Developer Network
 
Toronto High Scalability meetup - Scaling ELK
Toronto High Scalability meetup - Scaling ELKToronto High Scalability meetup - Scaling ELK
Toronto High Scalability meetup - Scaling ELK
Andrew Trossman
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014
marvin herrera
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
John Adams
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
Chester Chen
 
Introduction to STINGER
Introduction to STINGERIntroduction to STINGER
Introduction to STINGER
robertmccoll
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
DataStax Academy
 

Similar to Google file system (20)

Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
 
Cassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionCassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in Production
 
Cassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionCassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in Production
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
What's new in JBoss ON 3.2
What's new in JBoss ON 3.2What's new in JBoss ON 3.2
What's new in JBoss ON 3.2
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
 
Toronto High Scalability meetup - Scaling ELK
Toronto High Scalability meetup - Scaling ELKToronto High Scalability meetup - Scaling ELK
Toronto High Scalability meetup - Scaling ELK
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
Introduction to STINGER
Introduction to STINGERIntroduction to STINGER
Introduction to STINGER
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 

Recently uploaded

Interprofessional Education Platform Introduction.pdf
Interprofessional Education Platform Introduction.pdfInterprofessional Education Platform Introduction.pdf
Interprofessional Education Platform Introduction.pdf
Ben Aldrich
 
Keynote given on June 24 for MASSP at Grand Traverse City
Keynote given on June 24 for MASSP at Grand Traverse CityKeynote given on June 24 for MASSP at Grand Traverse City
Keynote given on June 24 for MASSP at Grand Traverse City
PJ Caposey
 
What are the new features in the Fleet Odoo 17
What are the new features in the Fleet Odoo 17What are the new features in the Fleet Odoo 17
What are the new features in the Fleet Odoo 17
Celine George
 
Contiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptxContiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptx
Kalna College
 
A Quiz on Drug Abuse Awareness by Quizzito
A Quiz on Drug Abuse Awareness by QuizzitoA Quiz on Drug Abuse Awareness by Quizzito
A Quiz on Drug Abuse Awareness by Quizzito
Quizzito The Quiz Society of Gargi College
 
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
Nguyen Thanh Tu Collection
 
220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology
Kalna College
 
The Science of Learning: implications for modern teaching
The Science of Learning: implications for modern teachingThe Science of Learning: implications for modern teaching
The Science of Learning: implications for modern teaching
Derek Wenmoth
 
Erasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES CroatiaErasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES Croatia
whatchangedhowreflec
 
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT KanpurDiversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
Quiz Club IIT Kanpur
 
The Rise of the Digital Telecommunication Marketplace.pptx
The Rise of the Digital Telecommunication Marketplace.pptxThe Rise of the Digital Telecommunication Marketplace.pptx
The Rise of the Digital Telecommunication Marketplace.pptx
PriyaKumari928991
 
How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...
Infosec
 
How to Setup Default Value for a Field in Odoo 17
How to Setup Default Value for a Field in Odoo 17How to Setup Default Value for a Field in Odoo 17
How to Setup Default Value for a Field in Odoo 17
Celine George
 
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
220711130083 SUBHASHREE RAKSHIT  Internet resources for social science220711130083 SUBHASHREE RAKSHIT  Internet resources for social science
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
Kalna College
 
The basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptxThe basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptx
heathfieldcps1
 
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
andagarcia212
 
Dreamin in Color '24 - (Workshop) Design an API Specification with MuleSoft's...
Dreamin in Color '24 - (Workshop) Design an API Specification with MuleSoft's...Dreamin in Color '24 - (Workshop) Design an API Specification with MuleSoft's...
Dreamin in Color '24 - (Workshop) Design an API Specification with MuleSoft's...
Alexandra N. Martinez
 
Post init hook in the odoo 17 ERP Module
Post init hook in the  odoo 17 ERP ModulePost init hook in the  odoo 17 ERP Module
Post init hook in the odoo 17 ERP Module
Celine George
 
bryophytes.pptx bsc botany honours second semester
bryophytes.pptx bsc botany honours  second semesterbryophytes.pptx bsc botany honours  second semester
bryophytes.pptx bsc botany honours second semester
Sarojini38
 
Opportunity scholarships and the schools that receive them
Opportunity scholarships and the schools that receive themOpportunity scholarships and the schools that receive them
Opportunity scholarships and the schools that receive them
EducationNC
 

Recently uploaded (20)

Interprofessional Education Platform Introduction.pdf
Interprofessional Education Platform Introduction.pdfInterprofessional Education Platform Introduction.pdf
Interprofessional Education Platform Introduction.pdf
 
Keynote given on June 24 for MASSP at Grand Traverse City
Keynote given on June 24 for MASSP at Grand Traverse CityKeynote given on June 24 for MASSP at Grand Traverse City
Keynote given on June 24 for MASSP at Grand Traverse City
 
What are the new features in the Fleet Odoo 17
What are the new features in the Fleet Odoo 17What are the new features in the Fleet Odoo 17
What are the new features in the Fleet Odoo 17
 
Contiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptxContiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptx
 
A Quiz on Drug Abuse Awareness by Quizzito
A Quiz on Drug Abuse Awareness by QuizzitoA Quiz on Drug Abuse Awareness by Quizzito
A Quiz on Drug Abuse Awareness by Quizzito
 
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
 
220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology220711130097 Tulip Samanta Concept of Information and Communication Technology
220711130097 Tulip Samanta Concept of Information and Communication Technology
 
The Science of Learning: implications for modern teaching
The Science of Learning: implications for modern teachingThe Science of Learning: implications for modern teaching
The Science of Learning: implications for modern teaching
 
Erasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES CroatiaErasmus + DISSEMINATION ACTIVITIES Croatia
Erasmus + DISSEMINATION ACTIVITIES Croatia
 
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT KanpurDiversity Quiz Prelims by Quiz Club, IIT Kanpur
Diversity Quiz Prelims by Quiz Club, IIT Kanpur
 
The Rise of the Digital Telecommunication Marketplace.pptx
The Rise of the Digital Telecommunication Marketplace.pptxThe Rise of the Digital Telecommunication Marketplace.pptx
The Rise of the Digital Telecommunication Marketplace.pptx
 
How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...How to stay relevant as a cyber professional: Skills, trends and career paths...
How to stay relevant as a cyber professional: Skills, trends and career paths...
 
How to Setup Default Value for a Field in Odoo 17
How to Setup Default Value for a Field in Odoo 17How to Setup Default Value for a Field in Odoo 17
How to Setup Default Value for a Field in Odoo 17
 
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
220711130083 SUBHASHREE RAKSHIT  Internet resources for social science220711130083 SUBHASHREE RAKSHIT  Internet resources for social science
220711130083 SUBHASHREE RAKSHIT Internet resources for social science
 
The basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptxThe basics of sentences session 8pptx.pptx
The basics of sentences session 8pptx.pptx
 
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
欧洲杯下注-欧洲杯下注押注官网-欧洲杯下注押注网站|【​网址​🎉ac44.net🎉​】
 
Dreamin in Color '24 - (Workshop) Design an API Specification with MuleSoft's...
Dreamin in Color '24 - (Workshop) Design an API Specification with MuleSoft's...Dreamin in Color '24 - (Workshop) Design an API Specification with MuleSoft's...
Dreamin in Color '24 - (Workshop) Design an API Specification with MuleSoft's...
 
Post init hook in the odoo 17 ERP Module
Post init hook in the  odoo 17 ERP ModulePost init hook in the  odoo 17 ERP Module
Post init hook in the odoo 17 ERP Module
 
bryophytes.pptx bsc botany honours second semester
bryophytes.pptx bsc botany honours  second semesterbryophytes.pptx bsc botany honours  second semester
bryophytes.pptx bsc botany honours second semester
 
Opportunity scholarships and the schools that receive them
Opportunity scholarships and the schools that receive themOpportunity scholarships and the schools that receive them
Opportunity scholarships and the schools that receive them
 

Google file system

  • 1. GOOGLE FILE SYSTEM Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Presented By – Ankit Thiranh
  • 2. OVERVIEW • Introduction • Architecture • Characteristics • System Interaction • Master Operation and Fault tolerance and diagnosis • Measurements • Some Real world clusters and their performance
  • 3. INTRODUCTION • Google – large amount of data • Need a good file distribution system to process its data • Solution: Google File System • GFS is : • Large • Distributed • Highly fault tolerant system
  • 4. ASSUMPTIONS • The system is built from many inexpensive commodity components that often fail. • The system stores a modest number of large files. • Primarily two kind of reads: large streaming reads and small random needs. • Many large sequential writes append data to files. • The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. • High sustained bandwidth is more important than low latency.
  • 6. CHARACTERISTICS • Single master • Chunk size • Metadata • In-Memory Data structures • Chunk Locations • Operational Log • Consistency Model (figure) • Guarantees by GFS • Implications for Applications Write Record Append Serial Success defined Defined interspersed with inconsistent Concurrent successes Consistent but undefined Failure inconsistent File Region State After Mutation
  • 7. SYSTEM INTERACTION • Leases and Mutation Order • Data flow • Atomic Record appends • Snapshot Figure 2: Write Control and Data Flow
  • 8. MASTER OPERATION • Namespace Management and Locking • Replica Placement • Creation, Re-replication, Rebalancing • Garbage Collection • Mechanism • Discussion • State Replica Detection
  • 9. FAULT TOLERANCE AND DIAGNOSIS • High Availability • Fast Recovery • Chunk Replication • Master Replication • Data Integrity • Diagnostics tools
  • 10. MEASUREMENTS Aggregate Throughputs. Top curves show theoretical limits imposed by the network topology. Bottom curves show measured throughputs. They have error bars that show 95% confidence intervals, which are illegible in some cases because of low variance in measurements.
  • 11. REAL WORLD CLUSTERS • Two clusters were examined: • Cluster A is used for research and development by over a hundred users. • Cluster B is used for production data processing with only occasional human intervention. • Storage • Metadata
    Characteristics of two GFS clusters:
    Cluster                   A         B
    Chunkservers              342       227
    Available disk size       72 TB     180 TB
    Used disk space           55 TB     155 TB
    Number of files           735 k     737 k
    Number of dead files      22 k      232 k
    Number of chunks          992 k     1550 k
    Metadata at chunkservers  13 GB     21 GB
    Metadata at master        48 MB     60 MB
  • 12. PERFORMANCE EVALUATION OF TWO CLUSTERS • Read and write rates and Master load
    Performance Metrics for Two GFS Clusters:
    Cluster                     A           B
    Read rate (last minute)     583 MB/s    380 MB/s
    Read rate (last hour)       562 MB/s    384 MB/s
    Read rate (since start)     589 MB/s    49 MB/s
    Write rate (last minute)    1 MB/s      101 MB/s
    Write rate (last hour)      2 MB/s      117 MB/s
    Write rate (since start)    25 MB/s     13 MB/s
    Master ops (last minute)    325 Ops/s   533 Ops/s
    Master ops (last hour)      381 Ops/s   518 Ops/s
    Master ops (since start)    202 Ops/s   347 Ops/s
  • 13. WORKLOAD BREAKDOWN • Chunkserver Workload
    Operations Breakdown by Size (%):
    Operation    Read           Write          Record Append
    Cluster      X      Y      X      Y      X      Y
    0K           0.4    2.6    0      0      0      0
    1B..1K       0.1    4.1    6.6    4.9    0.2    9.2
    1K..8K       65.2   38.5   0.4    1.0    18.9   15.2
    8K..64K      29.9   45.1   17.8   43.0   78.0   2.8
    64K..128K    0.1    0.7    2.3    1.9    <0.1   4.3
    128K..256K   0.2    0.3    31.6   0.4    <0.1   10.6
    256K..512K   0.1    0.1    4.2    7.7    <0.1   31.2
    512K..1M     3.9    6.9    35.5   28.7   2.2    25.5
    1M..inf      0.1    1.8    1.5    12.3   0.7    2.2

    Bytes Transferred Breakdown by Operation Size (%):
    Operation    Read           Write          Record Append
    Cluster      X      Y      X      Y      X      Y
    1B..1K       <0.1   <0.1   <0.1   <0.1   <0.1   <0.1
    1K..8K       13.8   3.9    <0.1   <0.1   <0.1   0.1
    8K..64K      11.4   9.3    2.4    5.9    78.0   0.3
    64K..128K    0.3    0.7    0.3    0.3    <0.1   1.2
    128K..256K   0.8    0.6    16.5   0.2    <0.1   5.8
    256K..512K   1.4    0.3    3.4    7.7    <0.1   38.4
    512K..1M     65.9   55.1   74.1   58.0   0.1    46.8
    1M..inf      6.4    28.0   3.3    28.0   53.9   7.4
  • 14. WORKLOAD BREAKDOWN • Master Workload
    Master Requests Breakdown by Type (%):
    Cluster              X      Y
    Open                 26.1   16.3
    Delete               0.7    1.5
    FindLocation         64.3   65.8
    FindLeaseHolder      7.8    13.4
    FindMatchingFiles    0.6    2.2
    All other combined   0.5    0.8

Editor's Notes

  1. GFS has a single master, multiple chunkservers, and multiple clients. Files are divided into fixed-size chunks, and each chunk is identified by an immutable, globally unique 64-bit chunk handle. Chunks are stored on multiple chunkservers. The master holds all metadata, which includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks.
  2. Single master: the master can make sophisticated chunk placement and replication decisions using global knowledge. Then walk through the read example. Chunk size: 64 MB. Advantages: it reduces client-master interaction, a client is more likely to perform many operations on a given chunk, and it reduces the size of the metadata. Metadata: the master stores the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas; metadata is kept in memory so that master operations are fast. Chunk locations: the master does not keep a persistent record of them; it polls chunkservers at startup and then keeps the information up to date by monitoring chunkservers with heartbeat messages. Operation log: contains a historical record of critical metadata changes. Guarantees: mutations are applied to a chunk in the same order on all its replicas, and chunk version numbers are used to detect any replica that has become stale. A region is consistent if all clients see the same data, regardless of which replica they read from; it is defined if it is consistent and clients see what the mutation has written in its entirety.
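  3. Mutation: an operation that changes the contents or metadata of a chunk, such as a write or a record append. Data flow: to make full use of each machine's network bandwidth, data is pushed linearly along a chain of chunkservers; to avoid network bottlenecks and high-latency links, each machine forwards the data to the closest machine that has not yet received it; to minimize latency, the data transfer is pipelined over TCP connections. Record append: the client specifies only the data, and GFS appends it atomically (at least once) at an offset of its own choosing; it follows the same control flow as a write. Snapshot: makes a copy of a file or a directory tree almost instantaneously while minimizing any interruption of ongoing mutations.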
  4. Master: executes all namespace operations and manages chunk replicas. Namespace: GFS logically represents its namespace as a lookup table mapping full pathnames to metadata. Replica placement has two goals: 1) maximize data reliability and availability, and 2) maximize network bandwidth utilization. Creation and re-replication: place new replicas on servers with below-average disk utilization, limit the number of recent creations on each chunkserver, and spread the replicas of a chunk across racks. Garbage collection: after deletion, a file is renamed to a hidden name and is removed after three days; orphaned chunks (chunks no longer reachable from any file) are reclaimed as well. Stale replica detection: a chunkserver can miss mutations while it is down, leaving its replicas stale; the master maintains a chunk version number for each chunk to distinguish up-to-date replicas from stale ones.
  5. Fast recovery: the master and chunkservers are designed to restore their state and start in seconds, no matter how they terminated. Chunk replication: discussed earlier. Master replication: the operation log and checkpoints are replicated on multiple machines, and shadow masters provide read-only access to the file system even when the primary master is down. Data integrity: each chunkserver uses checksumming to detect corruption of stored data. Corrupted data can be recovered from other replicas, but detecting corruption by comparing replicas across chunkservers would be impractical, so each chunkserver verifies its own copy. Diagnostic tools: GFS servers generate diagnostic logs that record many significant events. The RPC logs include the exact requests and responses sent on the wire, except for the file data being read or written.
  6. The two clusters have similar numbers of files, though B has a larger proportion of dead files, namely files that were deleted or replaced by a new version but whose storage has not yet been reclaimed. B also has more chunks because its files tend to be larger.
  7. Some reads in cluster Y return no data (the 0K row) because applications in the production systems use files as producer-consumer queues. Cluster Y sees a much higher percentage of large record appends than cluster X does because the production systems, which use cluster Y, are more aggressively tuned for GFS.
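Notes 1 and 2 together describe how a read is resolved: files are split into fixed 64 MB chunks, the master maps each file to chunk handles, and each handle maps to the chunkservers holding replicas. The Python sketch below assumes a toy in-memory master; `ToyMaster`, `find_location`, and `chunk_index_for` are hypothetical names chosen for illustration, not GFS's real interfaces, and the path, handles, and server names in the usage lines are made up.

```python
from dataclasses import dataclass, field

# Fixed chunk size from note 2 (64 MB).
CHUNK_SIZE = 64 * 1024 * 1024


@dataclass
class ToyMaster:
    """Hypothetical master metadata held entirely in memory."""
    # Full pathname -> ordered list of 64-bit chunk handles.
    file_chunks: dict[str, list[int]] = field(default_factory=dict)
    # Chunk handle -> addresses of chunkservers holding a replica.
    chunk_locations: dict[int, list[str]] = field(default_factory=dict)

    def find_location(self, path: str, chunk_index: int) -> tuple[int, list[str]]:
        """Return the chunk handle and replica locations for one chunk of a file."""
        handle = self.file_chunks[path][chunk_index]
        return handle, self.chunk_locations.get(handle, [])


def chunk_index_for(offset: int) -> tuple[int, int]:
    """Client side: turn a byte offset into (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


# Usage: start reading 100 MB into a two-chunk file.
master = ToyMaster(
    file_chunks={"/logs/web.0": [0x1A2B, 0x1A2C]},
    chunk_locations={0x1A2C: ["cs-07", "cs-19", "cs-33"]},
)
index, inner = chunk_index_for(100 * 1024 * 1024)
handle, replicas = master.find_location("/logs/web.0", index)
print(index, inner // (1024 * 1024), hex(handle), replicas)
# 1 36 0x1a2c ['cs-07', 'cs-19', 'cs-33']
```

Because the chunk size is fixed, the client can compute the chunk index locally and only needs the master for the handle and replica locations, which it can then cache; this is one way a large chunk size cuts down client-master interaction.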
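The consistent/defined distinction in note 2 can be made concrete with a toy classifier. It assumes we can inspect the bytes every replica holds for one file region, along with the bytes each mutation tried to write there. GFS itself does not compare replicas this way, and `region_state` is a hypothetical helper that only mirrors the definitions.

```python
def region_state(replicas: list[bytes], writes: list[bytes]) -> str:
    """Classify a file region after one or more mutations (toy model).

    replicas: the region's bytes as stored on each replica.
    writes:   the bytes each mutation tried to write into this region.
    """
    if not replicas:
        raise ValueError("need at least one replica")
    if any(r != replicas[0] for r in replicas[1:]):
        # Different clients may see different data depending on the replica they read.
        return "inconsistent"
    if replicas[0] in writes:
        # Consistent, and the region shows one mutation's data in its entirety.
        return "defined"
    # Same data everywhere, but it mingles fragments from several mutations.
    return "consistent but undefined"


# Serial success: one write, applied by every replica.
print(region_state([b"AAAA"] * 3, [b"AAAA"]))
# Concurrent successes: two writes interleaved identically on every replica.
print(region_state([b"AABB"] * 3, [b"AAAA", b"BBBB"]))
# Failure: one replica did not apply the write.
print(region_state([b"AAAA", b"AAAA", b"0000"], [b"AAAA"]))
```

The three calls correspond to the Write column of the consistency table on slide 6: defined, consistent but undefined, and inconsistent.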
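Note 3 says record append follows the same control flow as a write. The GFS paper adds one piece of extra logic at the primary: if the record would not fit in the current chunk, the primary pads the chunk to its maximum size on all replicas and tells the client to retry on the next chunk; otherwise it appends at an offset it chooses and has the secondaries write at that same offset. The sketch below models only that decision. It assumes the replicas are already in sync and uses a tiny 10-byte chunk so the padding case is easy to see; `ToyChunk` and `primary_record_append` are hypothetical.

```python
from typing import Optional

CHUNK_SIZE = 64 * 1024 * 1024  # maximum chunk size from note 2


class ToyChunk:
    """Hypothetical replica of one chunk: just the bytes written so far."""

    def __init__(self, capacity: int = CHUNK_SIZE) -> None:
        self.capacity = capacity
        self.data = bytearray()


def primary_record_append(replicas: list[ToyChunk], record: bytes) -> tuple[str, Optional[int]]:
    """Primary-side record append: GFS, not the client, picks the offset.

    replicas[0] plays the primary; the rest are secondaries.
    Returns ("ok", offset) on success or ("retry_next_chunk", None) after padding.
    """
    primary = replicas[0]
    if len(record) > primary.capacity:
        raise ValueError("record is larger than a chunk")
    offset = len(primary.data)
    if offset + len(record) > primary.capacity:
        # The record would straddle the chunk boundary: pad every replica
        # to the full chunk size, then ask the client to retry on the next chunk.
        for r in replicas:
            r.data.extend(b"\0" * (r.capacity - len(r.data)))
        return "retry_next_chunk", None
    for r in replicas:
        # Toy assumption: replicas are in sync, so each one writes the
        # record at exactly the offset the primary chose.
        assert len(r.data) == offset, "replicas out of sync"
        r.data.extend(record)
    return "ok", offset


chunks = [ToyChunk(capacity=10) for _ in range(3)]
print(primary_record_append(chunks, b"abcd"))  # ('ok', 0)
print(primary_record_append(chunks, b"efgh"))  # ('ok', 4)
print(primary_record_append(chunks, b"ijkl"))  # ('retry_next_chunk', None)
print(bytes(chunks[2].data))                   # b'abcdefgh\x00\x00'
```

Padding like this, together with duplicates left behind when a client retries a failed append, is what the consistency table on slide 6 calls "defined interspersed with inconsistent": the appended records themselves are defined, while the filler between them is not.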
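Stale replica detection in note 4 relies on chunk version numbers. Following the GFS paper (these details are not on the slides), the master increases a chunk's version number whenever it grants a new lease and passes the new number to the replicas it can reach. A chunkserver that was down keeps the older number and is recognized as stale when it reports its chunks. If a chunkserver reports a number higher than the master's, the master assumes it failed while granting a lease and adopts the higher number. This is a minimal sketch under those assumptions. `ToyVersionTracker` and its methods are hypothetical, and persistence and lease expiry are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionTracker:
    """Hypothetical master-side table of chunk version numbers."""
    versions: dict[int, int] = field(default_factory=dict)  # chunk handle -> current version

    def grant_lease(self, handle: int) -> int:
        """Increase the chunk's version when a new lease is granted.

        The caller passes the returned number to the replicas that are reachable.
        """
        self.versions[handle] = self.versions.get(handle, 0) + 1
        return self.versions[handle]

    def check_report(self, handle: int, reported: int) -> str:
        """Classify the version a chunkserver reports, e.g. when it restarts."""
        current = self.versions.get(handle, 0)
        if reported < current:
            return "stale"  # missed mutations while down; removed later by garbage collection
        if reported > current:
            # The master assumes it failed while granting a lease and adopts the higher number.
            self.versions[handle] = reported
        return "up-to-date"


tracker = ToyVersionTracker({42: 1})
stored = {"cs-1": 1, "cs-2": 1, "cs-3": 1}  # version of chunk 42 held by each chunkserver
new_version = tracker.grant_lease(42)       # cs-3 is down, so it never learns version 2
for server in ("cs-1", "cs-2"):
    stored[server] = new_version
print({s: tracker.check_report(42, v) for s, v in stored.items()})
# {'cs-1': 'up-to-date', 'cs-2': 'up-to-date', 'cs-3': 'stale'}
```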
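Note 5 says each chunkserver uses checksumming to detect corruption of its own data. The GFS paper describes 64 KB blocks, each with a 32-bit checksum that the chunkserver verifies before returning data to a reader. The sketch below follows that layout but uses CRC-32 from Python's `zlib` as the 32-bit checksum, which is an assumption. `block_checksums` and `verified_read` are hypothetical helpers.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks (layout from the GFS paper)


def block_checksums(chunk: bytes) -> list[int]:
    """Compute one 32-bit CRC per 64 KB block of a chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE]) for i in range(0, len(chunk), BLOCK_SIZE)]


def verified_read(chunk: bytes, checksums: list[int], offset: int, length: int) -> bytes:
    """Check every block overlapping [offset, offset + length) before returning data."""
    if offset < 0 or offset + length > len(chunk):
        raise ValueError("read range outside the chunk")
    if length <= 0:
        return b""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            # In GFS the chunkserver returns an error and reports the mismatch to
            # the master; the reader then reads from another replica.
            raise OSError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]


good = bytes(range(256)) * 1024          # 256 KB chunk, i.e. four 64 KB blocks
sums = block_checksums(good)
bad = bytearray(good)
bad[200_000] ^= 0xFF                     # flip bits inside block 3
print(verified_read(bytes(bad), sums, 0, 16) == good[:16])  # True: block 0 is intact
try:
    verified_read(bytes(bad), sums, 196_608, 100)
except OSError as err:
    print(err)                           # checksum mismatch in block 3
```

Only the blocks that overlap a read are checked, so a small read does not pay for verifying the whole 64 MB chunk.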
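Note 6 compares the two clusters qualitatively. A quick calculation from the "Characteristics of two GFS clusters" table on slide 11 makes the comparison concrete. The sketch uses the listed number of files as the denominator for both ratios, which is an assumption, since the table does not say whether dead files and their chunks are included in the totals.

```python
def cluster_ratios(files: int, dead_files: int, chunks: int) -> tuple[float, float]:
    """Return (dead files / files, chunks / files) for one cluster."""
    return dead_files / files, chunks / files


# Values from the "Characteristics of two GFS clusters" table (k = thousand).
clusters = {
    "A": (735_000, 22_000, 992_000),
    "B": (737_000, 232_000, 1_550_000),
}
for name, (files, dead, chunks) in clusters.items():
    dead_share, chunks_per_file = cluster_ratios(files, dead, chunks)
    print(f"Cluster {name}: {dead_share:.1%} dead files, {chunks_per_file:.2f} chunks per file")
# Cluster A: 3.0% dead files, 1.35 chunks per file
# Cluster B: 31.5% dead files, 2.10 chunks per file
```

Roughly 3% versus 31% dead files and about 1.35 versus 2.1 chunks per file line up with the note: B keeps far more unreclaimed files, and its files span more 64 MB chunks.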