Avinash Kumar
BE Computer-2
  Roll No-40
Contents
• Introduction to GFS
• System Architecture
• System Features
• Working of GFS
• Latest Advancement
• Conclusion
• Questions
Introduction
• More than 15,000 commodity-class PCs.
• Multiple clusters distributed worldwide.
• Thousands of queries served per second.
• One query reads hundreds of MB of data.
• One query consumes tens of billions of CPU cycles.
• Google stores dozens of copies of the entire Web!

Conclusion: Google needs a large, distributed, highly fault-tolerant file system.
System Architecture
 A GFS cluster consists of a single master and multiple
 chunkservers, and is accessed by multiple clients.
Large Chunk
• GFS uses a large chunk size: 64 MB (1 GB = 1024 MB = 16 chunks)
   • Each chunk is stored as a plain Linux file, which is lazily extended up to 64 MB.
• Optimized for many reads and writes on a given chunk
   • Reduces network overhead by keeping a persistent connection to the
     chunkserver.
   • See also MapReduce, Bigtable.
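The chunk arithmetic above can be sketched in a few lines. This is only an illustration, assuming byte offsets that start at 0; the function names (`chunk_index`, `chunks_needed`, `split_range`) are hypothetical and not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated on the slide


def chunk_index(byte_offset: int) -> int:
    """Index of the chunk that holds the given byte offset within a file."""
    return byte_offset // CHUNK_SIZE


def chunks_needed(file_size: int) -> int:
    """Number of chunks needed to hold file_size bytes (ceiling division)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


def split_range(offset: int, length: int):
    """Split a byte range into (chunk index, offset in chunk, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        within = offset % CHUNK_SIZE
        n = min(CHUNK_SIZE - within, end - offset)  # stay inside one chunk
        pieces.append((offset // CHUNK_SIZE, within, n))
        offset += n
    return pieces
```

For example, `chunks_needed(1024 ** 3)` gives 16, matching the 1 GB = 16 chunks figure, and a range that straddles a chunk boundary is split into one piece per chunk.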
Architecture (cont'd)
 Chunkserver
   Files are divided into fixed-size chunks (64 MB).
   Each chunk is identified by an immutable and globally
   unique 64-bit chunk handle assigned by the master at the
   time of chunk creation.
   Chunkservers store chunks on local disks as Linux files
   and read or write chunk data specified by a chunk handle.
   For reliability, each chunk is replicated on multiple
   chunkservers (3 replicas by default).

 GFS Client
   GFS client code linked into each application implements
   the file system API and communicates with the master
   and chunkservers to read or write data on behalf of the
   application.
System Metadata
 The master stores three major types of metadata:
     The file and chunk namespaces
     The mapping from files to chunks
     The locations of each chunk's replicas
 All metadata is kept in the master's memory.
 The first two types are also kept persistent by logging
 mutations to an operation log stored on the master's local disk
 and replicated on remote machines.
 The master does not store the third type persistently. Instead, it
 asks each chunkserver about its chunks at master startup.
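A minimal in-memory sketch of these three metadata tables may help. It assumes a Python list as a stand-in for the on-disk operation log and a made-up JSON record format; the class and method names (`MasterMetadata`, `mutate`, `report`, `recover`) are hypothetical, not GFS interfaces.

```python
import json


class MasterMetadata:
    def __init__(self):
        self.namespace = set()   # type 1: file names
        self.file_chunks = {}    # type 2: file name -> list of chunk handles
        self.locations = {}      # type 3: chunk handle -> chunkservers (memory only)
        self.op_log = []         # stand-in for the replicated operation log

    def _apply(self, rec):
        if rec["op"] == "create":
            self.namespace.add(rec["file"])
            self.file_chunks.setdefault(rec["file"], [])
        elif rec["op"] == "add_chunk":
            self.file_chunks[rec["file"]].append(rec["handle"])

    def mutate(self, rec):
        """Log the mutation first, then apply it to the in-memory state."""
        self.op_log.append(json.dumps(rec))
        self._apply(rec)

    def report(self, server, handles):
        """A chunkserver reports which chunks it holds (never logged)."""
        for handle in handles:
            self.locations.setdefault(handle, set()).add(server)

    @classmethod
    def recover(cls, op_log):
        """Rebuild types 1 and 2 by replaying the log; type 3 starts empty."""
        master = cls()
        for line in op_log:
            master._apply(json.loads(line))
        master.op_log = list(op_log)
        return master
```

After `recover`, the namespace and file-to-chunk mapping are back, while `locations` stays empty until chunkservers call `report` again.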
System Features
• PageRank: the probability that a random surfer visits the page
     • Based on citations (backlinks)
     • How is PageRank calculated?

     PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
     where,
     PR(A)    -> PageRank of page A
     T1...Tn  -> Pages that point to page A (citations)
     d        -> Damping factor (0 < d < 1)
     C(T)     -> Number of links going out from page T
     The PageRank of a page depends on:
        • The number of pages pointing to it.
        • The PageRank of the pages that point to it.
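The formula can be evaluated by simple repeated substitution. The sketch below assumes a small link graph held in a dictionary, a damping factor of 0.85 (a commonly quoted value, not stated on the slide), and a fixed number of iterations; the function name `pagerank` is illustrative.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    out_count = {p: len(links.get(p, [])) for p in pages}   # C(T)
    incoming = {p: [] for p in pages}                        # T1..Tn for each page
    for src, targets in links.items():
        for t in targets:
            incoming[t].append(src)

    pr = {p: 1.0 for p in pages}  # assumed starting value
    for _ in range(iterations):
        # PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T that link to A
        pr = {
            p: (1 - d) + d * sum(pr[t] / out_count[t] for t in incoming[p])
            for p in pages
        }
    return pr
```

With `links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}`, page C ends up ranked highest because it is cited by both A and B, which illustrates the two dependencies listed above.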
System Features
• Anchor text: the text associated with a link
    • Associated with the page the link is on
    • Also associated with the page the link points to (unique
      to Google)
    Advantages:
    • Anchors can contain more information about a page than the page
      itself
    • Documents that cannot be indexed can still be displayed

• Other features:
    • Proximity: location information for all hits is used in search
    • Visual presentation details are tracked
System Anatomy
Working Of GFS
Google Query Evaluation
1. Parse the query.
2. Convert words into wordIDs.
3. Seek to the start of the doclist in the short barrel for every word.
4. Scan through the doclists until there is a document that matches all the
   search terms.
5. Compute the rank of that document for the query.
6. If in the short barrels and at the end of any doclist, seek to the start of
   the doclist in the full barrel for every word and go to step 4.
7. If we are not at the end of any doclist, go to step 4.
8. Sort the documents that have matched by rank and return the top k.
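The evaluation loop can be sketched as follows. This is a simplified illustration, not Google's implementation: barrels are assumed to be dictionaries mapping a wordID to a sorted list of document IDs, the ranking function is passed in by the caller, and the sketch always scans the short barrel and then the full barrel. All names (`intersect`, `evaluate`, `lexicon`) are hypothetical.

```python
def intersect(doclists):
    """Document IDs present in every sorted doclist (steps 4 and 7)."""
    if not doclists:
        return []
    pos = [0] * len(doclists)
    matches = []
    while all(p < len(dl) for p, dl in zip(pos, doclists)):
        current = [dl[p] for p, dl in zip(pos, doclists)]
        target = max(current)
        if all(c == target for c in current):
            matches.append(target)                 # matches all search terms
            pos = [p + 1 for p in pos]
        else:
            # advance only the cursors that are behind the largest doc ID
            pos = [p + (1 if dl[p] < target else 0) for p, dl in zip(pos, doclists)]
    return matches


def evaluate(query, lexicon, short_barrel, full_barrel, rank, k=10):
    words = query.lower().split()                      # step 1: parse
    if any(w not in lexicon for w in words):
        return []                                      # some term matches nothing
    word_ids = [lexicon[w] for w in words]             # step 2: wordIDs
    scores = {}
    for barrel in (short_barrel, full_barrel):         # steps 3 and 6
        doclists = [barrel.get(w, []) for w in word_ids]
        for doc in intersect(doclists):
            if doc not in scores:
                scores[doc] = rank(doc, word_ids)      # step 5
    # step 8: sort by rank and return the top k
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)[:k]
```

Keeping each doclist sorted by document ID is what lets the scan in step 4 advance several cursors in one pass instead of comparing every pair of documents.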
Client Read
• Client sends master:
  • read(file name, chunk index)
• Master's reply:
  • chunk ID, chunk version number, locations of
    replicas
• Client sends "closest" chunkserver with a replica:
  • read(chunk ID, byte range)
  • "Closest" is determined by IP address on a simple rack-based
    network topology
• Chunkserver replies with data
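The read path above can be traced with a toy in-memory model. Everything here is an assumption made for illustration: the `Master` and `Chunkserver` classes, the use of a shared IPv4 prefix as the "closest" measure, and the restriction to reads that stay inside one chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


class Master:
    def __init__(self):
        # (file name, chunk index) -> (chunk ID, version, replica addresses)
        self.table = {}

    def lookup(self, file_name, chunk_index):
        return self.table[(file_name, chunk_index)]


class Chunkserver:
    def __init__(self):
        self.chunks = {}   # chunk ID -> bytes
        self.reads = 0     # counts requests, for demonstration only

    def read(self, chunk_id, start, length):
        self.reads += 1
        return self.chunks[chunk_id][start:start + length]


def shared_prefix(a, b):
    """Number of leading dotted components two IPv4 addresses share."""
    n = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        n += 1
    return n


def client_read(master, servers, client_ip, file_name, offset, length):
    index = offset // CHUNK_SIZE                         # client computes chunk index
    chunk_id, _version, replicas = master.lookup(file_name, index)
    closest = max(replicas, key=lambda ip: shared_prefix(client_ip, ip))
    return servers[closest].read(chunk_id, offset % CHUNK_SIZE, length)
```

Note that the master only answers the lookup; the file data itself flows directly between the client and the chosen chunkserver.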
Client Write
• Some chunkserver is primary for each chunk
  • Master grants a lease to the primary (typically for 60 sec.)
  • Leases are renewed using periodic heartbeat messages
    between master and chunkservers
• Client asks the master for the primary and secondary replicas of
  each chunk
• Client sends data to replicas in a daisy chain
  • Pipelined: each replica forwards data as it receives it
  • Takes advantage of full-duplex Ethernet links
Client Write (2)
• All replicas acknowledge the data write to the client
• Client sends the write request to the primary
• Primary assigns a serial number to the write
  request, providing ordering
• Primary forwards the write request with the same serial
  number to the secondaries
• Secondaries all reply to the primary after completing the
  write
• Primary replies to the client
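The two phases of a write (pushing data along the chain, then ordering the write through the primary) can be sketched as below. This is a single-process illustration with hypothetical classes `Replica` and `Primary`; forwarding is done with plain function calls, so it shows the ordering logic but not real network pipelining or lease handling.

```python
class Replica:
    def __init__(self):
        self.buffer = {}          # data ID -> bytes pushed but not yet written
        self.chunk = bytearray()  # the chunk contents
        self.applied = []         # serial numbers in the order they were applied

    def push(self, data_id, data, chain):
        """Store pushed data, forward it down the chain, then acknowledge."""
        self.buffer[data_id] = data
        if chain:
            chain[0].push(data_id, data, chain[1:])
        return True

    def apply(self, serial, data_id, offset):
        """Write previously pushed data at the given offset."""
        data = self.buffer.pop(data_id)
        end = offset + len(data)
        if len(self.chunk) < end:
            self.chunk.extend(b"\0" * (end - len(self.chunk)))
        self.chunk[offset:end] = data
        self.applied.append(serial)
        return True


class Primary(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id, offset):
        serial = self.next_serial          # the serial number fixes the order
        self.next_serial += 1
        self.apply(serial, data_id, offset)
        acks = [s.apply(serial, data_id, offset) for s in self.secondaries]
        return all(acks)                   # reply to the client


def client_write(primary, data_id, data, offset):
    chain = [primary] + primary.secondaries
    chain[0].push(data_id, data, chain[1:])   # phase 1: data flow
    return primary.write(data_id, offset)     # phase 2: control flow
```

Because every replica applies writes in the primary's serial order, two overlapping writes leave all replicas with identical bytes.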
Client Write (3)
What Happens If the Master Reboots?
• Replays log from disk
   • Recovers namespace (directory) information
   • Recovers file-to-chunk-ID mapping
• Asks chunkservers which chunks they hold
  • Recovers chunk-ID-to-chunkserver mapping
• If a chunkserver has an older chunk version, the replica is stale
   • The chunkserver was down at lease renewal
• If a chunkserver has a newer chunk version, the master adopts its
  version number
   • The master may have failed while granting the lease
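The version comparison in the last two bullets can be sketched as a small reconciliation step. The function `reconcile` and its data shapes are assumptions made for illustration: the master's versions come from the replayed log, and each chunkserver reports a dictionary of chunk ID to version.

```python
def reconcile(master_versions, reports):
    """Return (versions, locations, stale) after comparing chunk versions.

    master_versions: chunk ID -> version known from the replayed log
    reports:         chunkserver -> {chunk ID: version it holds}
    """
    versions = dict(master_versions)

    # A newer version on any chunkserver wins: the master adopts it.
    for held in reports.values():
        for chunk_id, version in held.items():
            if chunk_id in versions and version > versions[chunk_id]:
                versions[chunk_id] = version

    locations = {chunk_id: set() for chunk_id in versions}
    stale = set()
    for server, held in reports.items():
        for chunk_id, version in held.items():
            if chunk_id not in versions:
                continue                       # unknown chunk: out of scope here
            if version < versions[chunk_id]:
                stale.add((server, chunk_id))  # older replica is stale
            else:
                locations[chunk_id].add(server)
    return versions, locations, stale
```

Only up-to-date replicas enter the chunk-ID-to-chunkserver mapping; stale ones are tracked separately so they are never handed to clients.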
What Happens If a Chunkserver Fails?

• Master notices missing heartbeats
• Master decrements the replica count for all chunks on the
  dead chunkserver
• Master re-replicates chunks that are missing replicas in the
  background
  • Highest priority goes to chunks missing the greatest
    number of replicas
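The prioritization rule can be sketched with a heap. The function name `rereplication_order` and the target of 3 replicas per chunk (the default mentioned earlier in the deck) are assumptions of this sketch; note that it updates the `locations` mapping in place.

```python
import heapq


def rereplication_order(locations, dead_server, target=3):
    """Return [(chunk ID, replicas missing)], most-depleted chunks first.

    locations: chunk ID -> set of chunkservers holding a replica.
    The dead server is removed from every set (the "decrement" step).
    """
    heap = []
    for chunk_id, servers in locations.items():
        servers.discard(dead_server)
        missing = target - len(servers)
        if missing > 0:
            # heapq is a min-heap, so negate to pop the largest deficit first
            heapq.heappush(heap, (-missing, chunk_id))

    order = []
    while heap:
        negative_missing, chunk_id = heapq.heappop(heap)
        order.append((chunk_id, -negative_missing))
    return order
```

A chunk that has lost every replica comes out first, ahead of chunks that are only one replica short.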
Latest Advancement
• Gmail: an easily configurable email
  service with 1 GB of web space.
• Blogger: a free web-based service that
  helps consumers publish on the
  web without writing code or installing
  software.
• Google "next-generation corporate software":
  a smaller version of the Google
  software, modified for private use.
Conclusion
• Success: used actively by Google to support its search service and
  other applications
   • Availability and recoverability on cheap hardware
   • High throughput by decoupling control and data
   • Supports massive data sets and concurrent appends
• Semantics not transparent to apps
   • Apps must verify file contents to avoid inconsistent
     regions and repeated appends (at-least-once semantics)
• Performance not good for all apps
   • Assumes a read-once, write-once workload (no client
     caching!)
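One way an application might cope with at-least-once semantics is to give every record a checksum and a unique ID, then filter on read. The sketch below is an illustration under those assumptions; the record layout (`id|payload` plus a CRC-32) and the function names are made up for this example.

```python
import zlib


def make_record(record_id: str, payload: bytes):
    """Build a (checksum, body) pair; body is 'id|payload'."""
    body = record_id.encode() + b"|" + payload
    return zlib.crc32(body), body


def read_valid(records):
    """Yield each valid payload once, skipping bad and repeated records."""
    seen = set()
    payloads = []
    for checksum, body in records:
        if zlib.crc32(body) != checksum:
            continue                      # inconsistent region or padding
        record_id, _, payload = body.partition(b"|")
        if record_id in seen:
            continue                      # repeated append of the same record
        seen.add(record_id)
        payloads.append(payload)
    return payloads
```

The checksum handles inconsistent regions and the ID handles duplicates, which are the two cases the slide says applications must verify for themselves.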
Any questions?

Google File System
