The document discusses various techniques for processing queries and optimizing database performance. It covers topics like data organization on disk using buffer pools, B-tree indexes for efficient searching, different join algorithms like nested loops and sort-merge joins, and how external sorting and aggregation work. The goal is to evaluate queries efficiently by minimizing disk I/O and utilizing available memory.
2. Data Organization on Secondary Storage
Architecture of a database storage manager (diagram): a buffer pool of frames in
main memory, a mapping table that maps page# to frame# and records which frames
are free, and disk pages on secondary storage.
3. Buffer Management
When a page is requested, the storage manager looks at the
mapping table to see if this page is in the pool. If not:
1. If there is a free frame, it allocates the frame, reads the disk
page into the frame, and inserts the mapping from page# to frame#
into the mapping table.
2. Otherwise, it selects an allocated frame and replaces it with the
new page. If the victim frame is dirty (has been updated), it
writes it back to disk. It also updates the mapping table.
Then it pins the page.
4. Pinning/Unpinning Data
The page pin is an in-memory counter that indicates how many requestors are
currently using this page (requests minus releases). The pin counter and the
dirty bit of each frame can be stored in the mapping table.
When a program requests a page, it must pin the page, which is
done by incrementing the pin counter. When a program doesn’t
need the page anymore, it unpins the page, which is done by
decrementing the pin counter.
When the pin is zero, the frame associated with the page is a good
candidate for replacement when the buffer pool is full.
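A minimal sketch of this request/pin/unpin protocol in Python; the disk_read and
disk_write arguments are hypothetical stand-ins for the page I/O layer, and the
victim search simply picks a free or unpinned frame (no LRU or clock policy):

    class BufferPool:
        def __init__(self, num_frames, disk_read, disk_write):
            self.frames = [None] * num_frames       # cached page contents
            self.page_of = [None] * num_frames      # frame# -> page#
            self.pin = [0] * num_frames             # pin counters
            self.dirty = [False] * num_frames
            self.table = {}                         # mapping table: page# -> frame#
            self.disk_read, self.disk_write = disk_read, disk_write

        def request(self, page_no):
            """Return the contents of page_no, pinning its frame first."""
            if page_no not in self.table:           # page is not in the pool
                f = self._pick_victim()
                if self.dirty[f]:                   # write back a dirty victim
                    self.disk_write(self.page_of[f], self.frames[f])
                if self.page_of[f] is not None:
                    del self.table[self.page_of[f]]
                self.frames[f] = self.disk_read(page_no)
                self.page_of[f], self.dirty[f] = page_no, False
                self.table[page_no] = f
            f = self.table[page_no]
            self.pin[f] += 1                        # pin the page
            return self.frames[f]

        def unpin(self, page_no, dirtied=False):
            f = self.table[page_no]
            self.pin[f] -= 1                        # release one use of the page
            self.dirty[f] = self.dirty[f] or dirtied

        def _pick_victim(self):
            for f, p in enumerate(self.page_of):
                if p is None:                       # a free frame, if any
                    return f
            for f, p in enumerate(self.pin):
                if p == 0:                          # otherwise any unpinned frame
                    return f
            raise RuntimeError("buffer pool is full and every frame is pinned")

A real buffer manager would choose victims with a replacement policy such as LRU
or clock; the point here is only the pin-count and dirty-bit bookkeeping.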
5. A Higher-Level Interface
Page I/O is very low level. You can abstract it using records, files,
and indexes.
A file is a sequence of records, while each record is an aggregation
of data values.
Records are stored on disk blocks. Small records do not cross page
boundaries. The blocking factor for a file is the average number
of file records stored on a disk block. Large records may span
multiple pages. Fixed-size records are typically unspanned.
The disk pages of a file that contain the file records may be
contiguous, linked, or indexed. A file may be organized as a
heap (unordered) or a sequential (ordered) file. Ordered files
can be searched using binary search but they are hard to update.
A file may also have a file descriptor (or header) that contains
info about the file and the record fields.
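As a small worked example of the blocking factor for fixed-length, unspanned
records (the block and record sizes below are made-up numbers, not taken from
the slides):

    block_size  = 1024        # bytes per disk block (assumed)
    record_size = 100         # bytes per fixed-length record (assumed)
    num_records = 30000

    bfr    = block_size // record_size        # blocking factor: records per block
    blocks = -(-num_records // bfr)           # ceiling division: blocks needed
    unused = block_size - bfr * record_size   # wasted bytes per block (unspanned)

    print(bfr, blocks, unused)                # 10 records/block, 3000 blocks, 24 bytes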
6. Indexes
The most common form of an index is a mapping of one or more
fields of a file (this is the search key) to record ids (rids). It
speeds up selections on the search key. An index gives an
alternative access path to the file records on the index key.
Types of indexes:
1. Primary index: if the search key contains the primary key. If
the search key contains a candidate key, it is a unique index.
Otherwise, it is a secondary index.
2. Clustered index: when the order of the data records is the same as
the order of the index entries.
3. Dense index: when, for each data record, there is an entry in
the index with search key equal to the associated record fields.
Otherwise, it is sparse. A sparse index must be clustered, so that
records without their own index entry can still be located.
7. B+-trees
B+-trees are variations of search trees that are suitable for block-based
secondary storage. They allow efficient search (both
equality & range search), insertion, and deletion of search keys.
Characteristics:
• Each tree node corresponds to a disk block. The tree is kept
height-balanced.
• Each node is kept between half-full and completely full, except
for the root, which can have one search key minimum.
• An insertion into a node that is not full is very efficient. If the
node is full, we have an overflow, and the insertion causes a
split into two nodes. Splitting may propagate to other tree levels
and may reach the root.
• A deletion is very efficient if the node is kept above half full.
Otherwise, there is an underflow, and the node must be merged
with neighboring nodes.
8. B+-tree Node Structure
Internal node layout: P1, K1, P2, K2, …, Pq−1, Kq−1, Pq
The subtree under P1 holds keys K ≤ K1; the subtree under Pi (1 < i < q)
holds keys Ki−1 < K ≤ Ki; the subtree under Pq holds keys K > Kq−1.
Order p of a B+-tree is the max number of pointers in a node:
p = ⌊(b+k)/(k+d)⌋
where b is the block size, k is the key size, and d is the pointer size
(a node with p pointers and p−1 keys must fit in a block: p*d + (p−1)*k ≤ b).
Number of pointers, q, in an internal node/leaf: ⌈p/2⌉ ≤ q ≤ p
Search, insert, and delete cost O(log_q N) node accesses (for N records).
9. Example
For a 1024B page, 9B key, and 7B pointer size, the order is:
p = ⌊(1024+9)/(9+7)⌋ = 64
In practice, typical order is p=200 and a typical fill factor is 67%.
That is, the average fanout is 133 (=200*0.67).
Typical capacities:
• Height 3: 2,352,637 records (= 133³)
• Height 4: 312,900,721 records (= 133⁴)
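These figures can be checked with a few lines of Python (the sizes are the ones
used above; the fanout of 133 is the assumed 67% fill of an order-200 node):

    b, k, d = 1024, 9, 7              # block, key, and pointer sizes in bytes
    p = (b + k) // (k + d)            # order: max pointers per node
    print(p)                          # 64

    fanout = 133                      # typical order 200 at a 67% fill factor
    print(fanout ** 3, fanout ** 4)   # 2352637 and 312900721 records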
10. Query Processing
Need to evaluate database queries efficiently. Assumptions:
• You cannot load the entire database into memory
• You want to utilize the memory as much as possible
• The cost will heavily depend on how many pages you read
from (and write to) disk.
Steps:
1. Translation: translate the query into an algebraic form
2. Algebraic optimization: improve the algebraic form using
heuristics
3. Plan selection: consider available alternative implementation
algorithms for each algebraic operation in the query and
choose the best one (or avoid the worst) based on statistics
4. Evaluation: evaluate the best plan against the database.
11. Evaluation Algorithms: Sorting
Sorting is central to query processing. It is used for:
1. SQL order-by clause
2. Sort-merge join
3. Duplicate elimination (you sort the file by all record values so
that duplicates will be moved next to each other)
4. Bulk loading of B+-trees
5. Group-by with aggregations (you sort the file by the group-by
attributes and then you aggregate on subsequent records that
belong to the same group, since, after sorting, records with the
same group-by values will be moved next to each other)
12. External Sorting
[Diagram: the file on disk is read and sorted into initial runs; the runs are
then merged, possibly over several passes, using the nB blocks of available
memory.]
Available buffer space: nB ≥ 3
Number of blocks in the file: b
Number of initial runs: nR = ⌈b/nB⌉
Degree of merging: dM = min(nB−1, nR)
Number of merging passes: ⌈log_dM(nR)⌉
Sorting cost: 2*b
Merging cost: 2*b*⌈log_dM(nR)⌉
Total cost: O(b*log b)
13. Example
With 5 buffer pages, to sort 108 page file:
1. Sorting: nR = ⌈108/5⌉ = 22 sorted initial runs of 5 pages each
   (the last run has only 3 pages)
2. Merging: dM = 4 run files are merged at a time, so we get
   ⌈22/4⌉ = 6 sorted runs (of up to 20 pages each) that need to be merged
3. Merging: we get ⌈6/4⌉ = 2 sorted runs to be merged
4. Merging: we get the final sorted file of 108 pages.
Total number of pages read/written: 2*108*4 = 864 pages
(since we have 1 sorting + 3 merging steps).
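A small helper that reproduces these numbers (pure arithmetic, no actual I/O;
the 2*b term charges one read and one write of the whole file per pass):

    import math

    def external_sort_cost(b, n_buf):
        """Initial runs, merge passes, and total page I/Os for a b-page file
        sorted with n_buf buffer pages."""
        n_runs = math.ceil(b / n_buf)            # initial sorted runs
        d_merge = min(n_buf - 1, n_runs)         # merge fan-in
        passes, runs = 0, n_runs
        while runs > 1:                          # count merge passes
            runs = math.ceil(runs / d_merge)
            passes += 1
        return n_runs, passes, 2 * b * (1 + passes)

    print(external_sort_cost(108, 5))            # (22, 3, 864)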
14. Replacement Selection
Question: What’s the best way to create each initial run?
With QuickSort: you load nB pages from the file into the memory
buffer, you sort using QuickSort, and you write the result to a
runfile. So the runfile is always at most nB pages.
With HeapSort with replacement selection: you load nB pages from the file into
the memory buffer and perform BuildHeap to create the initial heap in memory.
Then you repeatedly remove the smallest element of the heap (the heap root) and
write it to the output runfile, while you keep reading records from the input
file: each new record whose sort key is not smaller than the last key written
is Heapified into the current heap, whereas a record with a smaller key is held
back for the next run. When no record in memory can extend the current run any
more, you complete the HeapSort on what is left and dump it, sorted, to the
output runfile.
Result: the average size of the runfile is now 2*nB. So even
though HeapSort is slower than QuickSort for in-memory
sorting, it is better for external sorting, since it creates larger
runfiles.
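A minimal sketch of run generation with replacement selection, using Python's
heapq and a plain iterable standing in for the input file (illustrative only;
real run files would be written to disk):

    import heapq

    def initial_runs(records, memory):
        """Yield sorted runs, keeping at most `memory` records in the heap."""
        it = iter(records)
        heap = [r for _, r in zip(range(memory), it)]   # fill the memory buffer
        heapq.heapify(heap)
        run, held = [], []               # current run; records held for the next run
        for r in it:
            smallest = heapq.heappop(heap)
            run.append(smallest)         # write the heap root to the current run
            if r >= smallest:
                heapq.heappush(heap, r)  # r can still extend the current run
            else:
                held.append(r)           # r must wait for the next run
            if not heap:                 # nothing left that fits the current run
                yield run
                run, heap, held = [], held, []
                heapq.heapify(heap)
        run.extend(sorted(heap))         # drain the rest of the current run
        if run:
            yield run
        if held:
            yield sorted(held)

    print(list(initial_runs([5, 1, 9, 2, 8, 3, 7, 4, 6], memory=3)))
    # -> [[1, 2, 5, 8, 9], [3, 4, 6, 7]]: the first run is longer than the 3-record memory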
15. Join Evaluation Algorithms
Join is the most expensive operation. Many algorithms exist.
Join: R ⋈(R.A=S.B) S
Nested loops join (naïve evaluation):
res ← ∅
for each r∈R do
for each s∈S do
if r.A=s.B
then insert the concatenation of r and s into res
Improvement: use 3 memory blocks, one for R, one for S, and one
for the output. Cost = bR+bR*bS (plus the cost for writing the
output), where bR and bS are the numbers of blocks in R and S.
16. Block Nested Loops Join
Try to utilize all nB blocks in memory. Use one memory block for
the inner relation, S, one block for the output, and the rest (nB-2)
for the outer relation, R:
while not eof(R)
{ read the next nB-2 blocks of R into memory;
start scanning S from the start, one block at a time;
while not eof(S)
{ read the next block of S;
perform the join R ⋈(R.A=S.B) S between the memory
blocks of R and S and write the result to the output;
}
}
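A runnable Python sketch of the same algorithm, treating each relation as a list of blocks (each block a list of dict records); the function name, attribute names, and test data are made up for illustration.

def block_nested_loops_join(R_blocks, S_blocks, nB, join_attr_r, join_attr_s):
    # nB buffer blocks: nB-2 for the outer R, 1 for the inner S, 1 for the output
    result = []
    chunk = nB - 2
    for i in range(0, len(R_blocks), chunk):
        # "read the next nB-2 blocks of R into memory"
        r_tuples = [r for block in R_blocks[i:i + chunk] for r in block]
        # "start scanning S from the start, one block at a time"
        for s_block in S_blocks:
            for s in s_block:
                for r in r_tuples:
                    if r[join_attr_r] == s[join_attr_s]:
                        result.append({**r, **s})
    return result

R = [[{"A": 1, "x": "r1"}, {"A": 2, "x": "r2"}], [{"A": 3, "x": "r3"}]]
S = [[{"B": 2, "y": "s1"}], [{"B": 3, "y": "s2"}, {"B": 5, "y": "s3"}]]
print(block_nested_loops_join(R, S, nB=4, join_attr_r="A", join_attr_s="B"))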
17. Block Nested Loops Join (cont.)
Cost: bR + ⌈bR/(nB-2)⌉*bS
But, if either R or S can fit entirely in memory (i.e., bR ≤ nB-2 or
bS ≤ nB-2), then the cost is bR+bS.
You always use the smaller relation (in number of blocks) as the outer
and the larger as the inner. Why? Because the cost of S ⋈(R.A=S.B) R is
bS + ⌈bS/(nB-2)⌉*bR. So if bR > bS, then the latter cost is smaller.
Rocking the inner relation: instead of always scanning the inner
relation from the beginning, we can scan it top-down first,
then bottom-up, then top-down again, etc. That way, you don’t
have to read the first or last block twice. In that case, the cost
formula will have bS-1 instead of bS.
18. Index Nested Loops Join
If there is an index I (say a B+-tree) on S over the search key S.B,
we can use an index nested loops join:
for each tuple r∈R do
{ retrieve all tuples s∈S using the value r.A as the search key
for the index I;
perform the join between r and the retrieved tuples from S
}
Cost: bR+|R|*(#of-levels-in-the-index), where |R| is the number
of tuples in R. The number of levels in the B+-tree index is
typically smaller than 4.
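A minimal Python sketch of index nested loops join, using a Python dict from S.B values to lists of S tuples as a stand-in for the B+-tree index I (an illustrative assumption; a real index would be probed with r.A as the search key, as described above).

from collections import defaultdict

def build_index(S, attr):
    # stand-in for an index on S over the search key S.B
    index = defaultdict(list)
    for s in S:
        index[s[attr]].append(s)
    return index

def index_nested_loops_join(R, S_index, join_attr_r):
    result = []
    for r in R:                                  # one index probe per R tuple
        for s in S_index.get(r[join_attr_r], []):
            result.append({**r, **s})
    return result

S = [{"B": 1, "y": "s1"}, {"B": 2, "y": "s2"}, {"B": 2, "y": "s3"}]
R = [{"A": 2, "x": "r1"}, {"A": 3, "x": "r2"}]
print(index_nested_loops_join(R, build_index(S, "B"), "A"))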
19. Sort-Merge Join
Applies to equijoins only. Assume R.A is a candidate key of R.
Steps:
1. Sort R on R.A
2. Sort S on S.B
3. Merge the two sorted files as follows:
r ← first(R)
s ← first(S)
repeat
{ while s.B≤r.A
{ if r.A=s.B then write <r,s> to the output
s ← next(S)
}
r ← next(R)
} until eof(R) //or eof(S)
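The merge step above as runnable Python, assuming (as stated) that R.A is a candidate key and that both inputs are already sorted; names and data are illustrative.

def merge_join(R, S, key_r, key_s):
    # R sorted on key_r (a candidate key of R); S sorted on key_s
    result = []
    j = 0
    for r in R:                                   # r ← first(R), then next(R)
        while j < len(S) and S[j][key_s] <= r[key_r]:
            if S[j][key_s] == r[key_r]:
                result.append({**r, **S[j]})
            j += 1                                # s ← next(S)
    return result

R = sorted([{"A": 2, "x": "r1"}, {"A": 1, "x": "r2"}], key=lambda t: t["A"])
S = sorted([{"B": 2, "y": "s1"}, {"B": 2, "y": "s2"}, {"B": 3, "y": "s3"}],
           key=lambda t: t["B"])
print(merge_join(R, S, "A", "B"))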
20. Sort-Merge Join (cont.)
Cost of merging: bR+bS
If R and/or S need to be sorted, then the sorting cost must be added
too. You don’t need to sort a relation if there is a B+-tree whose
search key is equal to the sort key (or, more generally, if the
sort key is a prefix of the search key).
Note that if R.A is not a candidate key, then switch R and S,
provided that S.B is a candidate key for S. If neither is true,
then a nested loops join must be performed between the equal
values of R and S. In the worst case, if all tuples in R have the
same value for r.A and all tuples of S have the same value for
s.B, equal to r.A, then the cost will be bR*bS.
Sorting and merging can be combined into one phase. In practice,
since sorting typically needs fewer than 4 passes, the sort-merge
join is close to linear.
21. Hash Join
Works on equijoins only. R is called the build table and S is called
the probe table. We assume that R can fit in memory.
Steps:
1. Build phase: read the build table, R, into memory as a hash table
with hash key R.A. For example, if H is a memory hash table
with n buckets, then tuple r of R goes to bucket H(h(r.A) mod n),
where h maps r.A to an integer.
2. Probe phase: scan S, one block at a time:
res ← ∅
for each s∈S
for each r∈H(h(s.B) mod n)
if r.A=s.B
then insert the concatenation of r and s into res
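The two phases as runnable Python; Python's dict plays the role of the in-memory hash table H, so the explicit "mod n" bucketing of the slide is handled implicitly. Names and data are illustrative.

from collections import defaultdict

def hash_join(R, S, key_r, key_s):
    # Build phase: hash the build table R on R.A
    H = defaultdict(list)
    for r in R:
        H[r[key_r]].append(r)
    # Probe phase: scan S once and probe the hash table with s.B
    result = []
    for s in S:
        for r in H.get(s[key_s], []):
            result.append({**r, **s})
    return result

R = [{"A": 1, "x": "r1"}, {"A": 2, "x": "r2"}]
S = [{"B": 2, "y": "s1"}, {"B": 2, "y": "s2"}, {"B": 9, "y": "s3"}]
print(hash_join(R, S, "A", "B"))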
22. Partitioned Hash Join
If neither R nor S can fit in memory, you partition both R and S
into m = min(bR,bS)/(k*nB) partitions, where k is a number
larger than 1, e.g., 2. Steps:
1. Partition R:
create m partition files for R: R1,…Rm
for each r∈R
put r in partition Rj, where j = h(r.A) mod m
2. Partition S:
create m partition files for S: S1,…Sm
for each s∈S
put s in partition Sj, where j = h(s.B) mod m
3. for i=1 to m
perform in-memory hash join between Ri and Si
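A sketch of these three steps in Python, under the simplifying assumption that partitions are kept as in-memory lists rather than the on-disk partition files of the slides; the function name and the test data are made up for illustration.

from collections import defaultdict

def partitioned_hash_join(R, S, key_r, key_s, m):
    R_parts = [[] for _ in range(m)]
    S_parts = [[] for _ in range(m)]
    for r in R:                                # 1. partition R on h(r.A) mod m
        R_parts[hash(r[key_r]) % m].append(r)
    for s in S:                                # 2. partition S on h(s.B) mod m
        S_parts[hash(s[key_s]) % m].append(s)
    result = []
    for Ri, Si in zip(R_parts, S_parts):       # 3. in-memory hash join per pair
        H = defaultdict(list)
        for r in Ri:
            H[r[key_r]].append(r)
        for s in Si:
            for r in H.get(s[key_s], []):
                result.append({**r, **s})
    return result

R = [{"A": i, "x": f"r{i}"} for i in range(6)]
S = [{"B": i % 4, "y": f"s{i}"} for i in range(8)]
print(partitioned_hash_join(R, S, "A", "B", m=3))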
23. Partitioned Hash Join (cont.)
If the hash function does not partition uniformly (this is called
data skew), one or more Ri/Si partitions may not fit in memory.
We can apply the partitioning technique recursively until all
partitions can fit in memory.
Cost: 3*(bR+bS) since each block is read twice and written once.
In most cases, it’s better than sort-merge join, and is highly
parallelizable. But sensitive to data skew. Sort-merge is better
when one or both inputs are sorted. Also sort-merge join
delivers the output sorted.
24. Aggregation with Group-by
If it is aggregation without group-by, then simply scan the input and aggregate
using one or more accumulators.
For aggregations with a group-by, sort the input on the group-by attributes and
scan the result. Example:
select dno, count(*), avg(salary)
from employee group by dno
Algorithm:
sort employee by dno into E;
e←first(E)
count←0; sum←0; d←e.dno;
while not eof(E)
{ if e.dno<>d
then { output d, count, sum/count;
count←0; sum←0; d←e.dno }
count ←count+1; sum←sum+e.salary
e←next(E);
}
output d, count, sum/count;
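The algorithm above as runnable Python for the same query (count and average salary per dno); attribute names follow the example, and the sample data is made up.

def group_by_dno(employee):
    # select dno, count(*), avg(salary) from employee group by dno,
    # computed by sorting on the group-by attribute and scanning once
    E = sorted(employee, key=lambda e: e["dno"])     # sort employee by dno
    results = []
    count, total, d = 0, 0, None
    for e in E:
        if e["dno"] != d and count > 0:              # a new group starts
            results.append((d, count, total / count))
            count, total = 0, 0
        d = e["dno"]
        count += 1
        total += e["salary"]
    if count > 0:                                    # emit the last group
        results.append((d, count, total / count))
    return results

employee = [{"dno": 5, "salary": 30000}, {"dno": 4, "salary": 25000},
            {"dno": 5, "salary": 40000}, {"dno": 4, "salary": 43000}]
print(group_by_dno(employee))   # [(4, 2, 34000.0), (5, 2, 35000.0)]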
25. Other Operators
Aggregation with group-by (cont.):
Sorting can be combined with scanning/aggregation.
You can also group-by using partitioned hashing, with the group-by
attributes as the hash/partition key.
Other operations:
• Selection can be done by scanning the input and testing each
tuple against the selection condition, or by using an index.
• Intersection is a special case of join (the predicate is an equality
over all attribute values).
• Projection requires duplicate elimination, which can be done
with sorting or hashing. You can eliminate duplicates during
sorting or hashing.
• Union requires duplicate elimination too.
26. Combining Operators
Each relational algebraic operator reads one or two relations as
input and returns one relation as output. It can be very expensive
if the evaluation algorithms that implement these operators had to
materialize the output relations into temporary files on disk.
Solution: stream-based processing (pipelining).
Iterators: open, next, close
Operators now work on streams of tuples. The operation next
returns one tuple only, which is sent to the output stream. It is
‘pull’ based: to create one tuple, the operator calls the next
operation over its input streams as many times as necessary to
generate the tuple.
[Figure: a query plan tree of operators (π, σ, joins) over base tables A, B, C, D.]
27. Stream-Based Processing
Selection without streams:
table selection ( table x, bool (*pred)(record) )
{ table result = empty
  for each e in x
    if pred(e) then insert e into result
  return result
}
[Figure: example plan — join x ⋈(x.A=y.B) y over selection σ(x.C>10) on x.]
Stream-based selection:
record selection ( stream s, bool (*pred)(record) )
{ while not eof(s)
  { r = next(s)
    if pred(r) then return r
  }
  return empty_record
}
struct stream { record (*next_fnc)(…); stream x; stream y; args; }
record next ( stream s )
{ if (s.y = empty) return (*s.next_fnc)(s.x, s.args)
  else return (*s.next_fnc)(s.x, s.y, s.args)
}
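Python generators give a compact way to express the same pull-based model; this is a sketch of the idea, not the exact open/next/close iterator interface of the slide, and the operator and attribute names are illustrative.

def scan(table):
    # leaf operator: streams the tuples of a stored table
    for t in table:
        yield t

def select(stream, pred):
    # pull tuples from the input stream, pass on those satisfying pred
    for t in stream:
        if pred(t):
            yield t

def project(stream, attrs):
    for t in stream:
        yield {a: t[a] for a in attrs}

emp = [{"name": "a", "dno": 4}, {"name": "b", "dno": 5}, {"name": "c", "dno": 4}]
plan = project(select(scan(emp), lambda t: t["dno"] == 4), ["name"])
for row in plan:            # tuples are produced one next() call at a time
    print(row)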
28. But …
Streamed-based nested loops join:
record nested-loops ( stream left, stream right, bool (*pred)(record,record) )
{ while not eof(left)
{ x = next(left)
while not eof(right)
{ y = next(right)
if pred(x,y) then return <x,y>
}
open(right)
}
return empty_record
}
• If the inner stream is the result of another operator (such as a join), it is
better to materialize it into a file. So this works great for left-deep trees.
• But this doesn’t work well for sorting (a blocking operator).
29. Query Optimization
A query of the form:
select A1,…,An
from R1,…,Rm
where pred
can be evaluated by the following algebraic expression:
πA1,…An(σpred(R1×… × Rm))
Algebraic optimization:
Find a more efficient algebraic expression using heuristic rules.
30. Heuristics
• If pred in σpred is a conjunction, break σpred into a cascade of σ:
σ(p1 and … and pn)(R) = σp1(… σpn(R))
• Move σ as far down the query tree as possible:
σp(R × S) = σp(R) × S if p refers to R only
• Convert cartesian products into joins:
σ(R.A=S.B)(R × S) = R ⋈(R.A=S.B) S
• Rearrange joins so that there are no cartesian products
• Move π as far down the query tree as possible (but retain
attributes needed in joins/selections).
31. Example
select e.fname, e.lname
from project p, works_on w, employee e
where p.plocation='Stafford' and e.bdate>'Dec-31-1957'
and p.pnumber=4 and p.pnumber=w.pno and w.essn=e.ssn
[Figure: optimized query tree — push σ(plocation='Stafford' and pnumber=4) onto project and σ(bdate>'Dec-31-1957') onto employee, push projections (π p.pnumber; π e.ssn, e.fname, e.lname; π w.essn, w.pno) below the joins, join works_on with employee on w.essn=e.ssn, join the result with project on p.pnumber=w.pno, and apply the final projection π e.fname, e.lname.]
32. Plan Selection
It has 3 components:
1. Enumeration of the plan space.
– Only the space of left-deep plans is typically considered.
– Cartesian products are avoided.
– It’s NP-hard problem.
– Some exponential algorithms (O(2N) for N joins) are still practical for
everyday queries (<10 joins). We will study the system R dynamic
programming algorithm.
– Many polynomial-time heuristics exist.
2. Cost estimation
– Based on statistics, maintained in system catalog.
– Very rough approximations; still black magic.
– More accurate estimations exist based on histograms.
3. Plan selection.
– Ideally, want to find the best plan. Practically, want to avoid the worst
plans.
33. Cost Estimation
The cost of a plan is the sum of the cost of all the plan operators.
• If the intermediate result between plan operators is
materialized, we need to consider the cost of reading/writing the
result.
• To estimate the cost of a plan operator (such as block-nested
loops join), we need to estimate the size (in blocks) of the
inputs.
– Based on predicate selectivity. Assumed independence of predicates.
– For each operator, both the cost and the output size need to be
estimated (since the size may be needed for the next operator).
– The sizes of leaves are retrieved from the catalog (statistics on
table/index cardinalities).
• We will study System R.
– Very inexact, but works well in practice; it used to be widely used.
– More sophisticated techniques are known now.
34. Statistics
Statistics stored in system catalogs typically contain
– Number of tuples (cardinality) and number of blocks for each table and
index.
– Number of distinct key values for each index.
– Index height, low and high key values for each index.
Catalogs are updated periodically, but not every time data change
(too expensive). This may introduce slight inconsistency
between data and statistics, but usually the choice of plans is
resilient to slight changes in statistics.
Histograms are better approximations:
[Figure: histogram of the number of tuples per day of week (M, Tu, W, Th, F, Sa, Su).]