This document discusses decision support systems (DSS) and data warehousing. It provides definitions of DSS as interactive computer-based systems that help decision makers use data and models to identify and solve problems. It also defines data warehousing as a subject-oriented, integrated, nonvolatile, and time-variant collection of data used to support management decisions. The document outlines the concepts of operational databases, data warehousing architectures, and multidimensional database structures.
This document provides an overview of key concepts related to decision support systems (DSS) and data warehousing. It defines DSS as interactive computer systems that help decision makers use data, documents, models and communication technologies to identify and solve problems. It then discusses operational databases and how they differ from data warehouses in areas like data type, focus, users and more. Finally, it defines key characteristics of a data warehouse as being subject-oriented, integrated, time-variant and non-volatile to support management decision making.
What is Data Mining? Data mining is defined as extracting useful information from very large sets of data; in other words, it is the procedure of mining knowledge from data.
What is a Data Warehouse? OLTP vs. OLAP, Conceptual Modeling of Data Warehouses, Data Warehousing Components, Building a Data Warehouse, Mapping the Data Warehouse to a Multiprocessor Architecture, Database Architectures for Parallel Processing
The document outlines the syllabus for a course on data mining and data warehousing from Maulana Abul Kalam Azad University of Technology, West Bengal. It covers 7 units that discuss topics like introduction to data mining, data warehousing concepts, data mining techniques like decision trees and neural networks, mining association rules using various algorithms, clustering techniques, classification techniques, and applications of data mining. It also provides details on some core concepts like the stages of the knowledge discovery process, data mining functionalities, and classification of data mining systems.
Data warehousing combines data from multiple sources into a single database to provide businesses with analytics results from data mining, OLAP, scorecarding and reporting. It extracts, transforms and loads data from operational data stores and data marts into a data warehouse and staging area to integrate and store large amounts of corporate data. Data mining analyzes large databases to extract previously unknown and potentially useful patterns and relationships to improve business processes.
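The extract, transform, load (ETL) flow described above can be sketched in a few lines. This is a minimal illustration in plain Python; the table layout, field names, and values are all invented for the example, not taken from any particular system.

```python
# Minimal ETL sketch: extract rows from an "operational" source, transform
# them into a consistent shape, and load them into an in-memory "warehouse"
# table. All names and data here are illustrative assumptions.

operational_orders = [
    {"order_id": 1, "amount": "100.50", "region": "north"},
    {"order_id": 2, "amount": "75.00",  "region": "NORTH"},
    {"order_id": 3, "amount": "220.10", "region": "South"},
]

def extract(source):
    """Extract: read raw records from the operational store."""
    return list(source)

def transform(rows):
    """Transform: normalise types and codes so data from many sources integrates."""
    return [
        {"order_id": r["order_id"],
         "amount": float(r["amount"]),       # string -> numeric
         "region": r["region"].lower()}      # inconsistent casing -> one code
        for r in rows
    ]

def load(warehouse, rows):
    """Load: append the cleaned rows to the warehouse table."""
    warehouse.extend(rows)

warehouse_sales = []
load(warehouse_sales, transform(extract(operational_orders)))
print(sum(r["amount"] for r in warehouse_sales if r["region"] == "north"))  # 175.5
```

The point of the transform step is visible in the output: two differently-cased region codes become one, so the aggregate over "north" covers both source rows.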
This lecture gives various definitions of data mining and explains why data mining is required. Examples of classification, clustering, and association rules are provided.
This document provides an introduction to database management systems and Microsoft Access. It discusses who needs a database and the differences between data and information. Key database concepts are explained, including data processing activities, database terminology, database management systems, relational database management systems and their applications. Different types of databases are also outlined.
meaning of data warehousing
needs of data warehousing
applications of data warehousing
architecture of data warehousing
advantages of data warehousing
disadvantages of data warehousing
meaning of data mining
needs of data mining
applications of data mining
architecture of data mining
advantages of data mining
disadvantages of data mining
Metadata contains answers to questions about the data in a data warehouse. It is stored in a metadata repository and describes pertinent details about the data to users, developers, and the project team. Metadata is necessary for using, building, and administering the data warehouse as it provides information about data extraction, transformations, structure, refreshment, and more. It serves important roles for both business users and IT staff across the data acquisition, storage, and delivery processes.
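The kind of questions a metadata repository answers can be made concrete with a toy sketch. The table name, source description, and fields below are illustrative assumptions, not the structure of any real repository.

```python
# A toy metadata repository entry: for each warehouse table it records where
# the data came from, how it was transformed, and when it is refreshed --
# the questions metadata answers for users and administrators.
# All names and descriptions are invented for illustration.

metadata_repository = {
    "sales_fact": {
        "source": "orders table in the operational OLTP system",
        "transformations": ["amount cast to decimal", "region codes unified"],
        "refresh": "nightly batch load",
        "columns": {"order_id": "int", "amount": "decimal", "region": "char(5)"},
    }
}

def describe(table):
    """Answer: where did this data come from, and how fresh is it?"""
    entry = metadata_repository[table]
    return f"{table}: from {entry['source']}; refreshed by {entry['refresh']}"

print(describe("sales_fact"))
```

A business user reads the `source` and `refresh` entries to judge trustworthiness; an administrator reads `transformations` and `columns` to maintain the load process.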
A data warehouse is a database that collects and manages data from various sources to provide business insights. It contains consolidated historical data kept separately from operational databases. A data warehouse helps executives analyze data to make strategic decisions. Data mining extracts valuable patterns and knowledge from large amounts of data through techniques like classification, clustering, and neural networks. It is used along with data warehouses for applications like churn analysis, fraud detection, and market segmentation.
This document provides an overview of data mining and data warehousing. It discusses the history and evolution of databases from the 1960s to today. Data mining is defined as using automated tools to extract hidden patterns from large databases to address the problem of data explosion. Descriptive and predictive models are used in data mining. Data warehousing involves integrating data from multiple sources into a centralized database to support analysis and decision making.
This document provides an overview of data warehousing, OLAP, data mining, and big data. It discusses how data warehouses integrate data from different sources to create a consistent view for analysis. OLAP enables interactive analysis of aggregated data through multidimensional views and calculations. Data mining finds hidden patterns in large datasets through techniques like predictive modeling, segmentation, link analysis and deviation detection. The document provides examples of how these technologies are used in industries like retail, banking and insurance.
The document discusses key concepts related to data warehousing including:
1) What data warehousing is, its main components, and differences from OLTP systems.
2) The typical architecture of a data warehouse including operational data sources, storage, and end-user access tools.
3) Important considerations like data flows, integration, management of metadata, and tools/technologies used.
4) Additional topics such as benefits, challenges, administration, and data marts.
This document provides an introduction to data mining. It defines data mining as extracting useful information from large datasets. Key domains that benefit include market analysis, risk management, and fraud detection. Common data mining techniques are discussed, such as association, classification, clustering, prediction, and decision trees. Both open source tools like RapidMiner, WEKA, and R, as well as commercial tools like SQL Server, IBM Cognos, and Dundas BI, are introduced for performing data mining.
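One of the techniques listed above, association rule mining, rests on two simple measures: support and confidence. A short sketch over made-up toy transactions shows both; the item names and data are invented for illustration.

```python
# Support and confidence for association rules, computed over toy
# market-basket transactions (all data here is invented).

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in the set."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent): how often the rule holds when it applies."""
    return support(antecedent | consequent) / support(antecedent)

# Rule "bread -> milk": bread appears in 3 of 4 baskets, bread+milk in 2 of 4.
print(support({"bread", "milk"}))       # 0.5
print(confidence({"bread"}, {"milk"}))  # 2/3
```

Algorithms such as Apriori search for all rules whose support and confidence exceed user-set thresholds, pruning candidate itemsets using these same two quantities.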
Data Mining With Excel 2007 And SQL Server 2008 (Mark Tabladillo)
Introduction to Excel 2007 Data Mining Plug-In using SQL Server 2008. The presentation starts with definitions and statistical theory (without equations). Then, the audience interactively participates in four demos showing the power and possibilities of the Microsoft Data Mining Algorithms.
Topics in data management include data analysis, database management systems, data modeling, database administration, data warehousing, data mining, data quality assurance, data security, and data architecture. Data analysis involves examining and summarizing data to extract useful information and develop conclusions. Database management systems manage databases and are used by over 90% of people who use computers. Data modeling is the process of structuring and organizing data to be implemented in a database. Database administrators are responsible for ensuring the security, performance, and availability of organizational data.
The document provides an overview of data, information, knowledge, and data mining. It defines data as facts/observations/measurements, information as processed data that is useful (e.g. for decision making), and knowledge as patterns in data/information with a high degree of certainty. Data mining is described as the process of extracting useful but non-obvious information from large databases through an interactive and iterative process. Common business applications and technologies involved in data mining are also discussed.
A data warehouse is a central repository of historical data from an organization's various sources designed for analysis and reporting. It contains integrated data from multiple systems optimized for querying and analysis rather than transactions. Data is extracted, cleaned, and loaded from operational sources into the data warehouse periodically. The data warehouse uses a dimensional model to organize data into facts and dimensions for intuitive analysis and is optimized for reporting rather than transaction processing like operational databases. Data warehousing emerged to meet the growing demand for analysis that operational systems could not support due to impacts on performance and limitations in reporting capabilities.
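The dimensional model mentioned above, a fact table of measures keyed into dimension tables (a star schema), can be sketched with plain dictionaries. The table contents below are invented for illustration.

```python
# Star-schema sketch: a fact table holds numeric measures plus foreign keys
# into small descriptive dimension tables. All data here is invented.

dim_product = {1: {"name": "laptop", "category": "electronics"},
               2: {"name": "desk",   "category": "furniture"}}
dim_date    = {10: {"year": 2024, "quarter": "Q1"},
               11: {"year": 2024, "quarter": "Q2"}}

fact_sales = [  # one row per sale: dimension keys + a measure
    {"product_key": 1, "date_key": 10, "revenue": 1200.0},
    {"product_key": 1, "date_key": 11, "revenue": 1500.0},
    {"product_key": 2, "date_key": 10, "revenue": 300.0},
]

def revenue_by_category():
    """Typical warehouse query: join fact rows to a dimension, then aggregate."""
    totals = {}
    for row in fact_sales:
        cat = dim_product[row["product_key"]]["category"]
        totals[cat] = totals.get(cat, 0.0) + row["revenue"]
    return totals

print(revenue_by_category())  # {'electronics': 2700.0, 'furniture': 300.0}
```

This shape is what makes warehouse queries intuitive: analysts slice the same fact table by any dimension (product, date, region) without restructuring the data.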
Implementation of Multi-node Clusters in Column Oriented Database using HDFS (IJEACS)
HBase is a NoSQL database that runs in the Hadoop environment, so it can be called the Hadoop database. It combines the Hadoop Distributed File System (HDFS) and MapReduce with a key/value store for real-time data access, drawing on the depth and efficiency of MapReduce. Earlier testing used single-node clustering, which improved query performance compared to SQL; even so, data retrieval remained complicated because there were no multi-node clusters and everything depended on SQL queries. In this paper, we use HBase, a column-oriented database that sits on top of HDFS, together with multi-node clustering to increase performance. HBase is a key/value store that behaves as a consistent, distributed, multidimensional, sorted map. Data in HBase is stored as cells, and those cells are grouped by a row key. Our proposal therefore yields better query performance and data retrieval than existing approaches.
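The "sorted, multidimensional key/value map" in the abstract has a concrete shape: row key, then column, then timestamp, mapping to a value. The sketch below models that data model in plain Python; it is not the real HBase API, and all class, method, and data names are illustrative assumptions.

```python
# Toy model of the HBase data model: a sorted map of
# row key -> column -> timestamp -> value. Not the real HBase client API.

class ToyHBaseTable:
    def __init__(self):
        self.rows = {}  # row key -> {column -> {timestamp -> value}}

    def put(self, row_key, column, value, ts):
        """Write one versioned cell."""
        self.rows.setdefault(row_key, {}).setdefault(column, {})[ts] = value

    def get(self, row_key, column):
        """Return the newest version of a cell (HBase's default behaviour)."""
        versions = self.rows[row_key][column]
        return versions[max(versions)]

    def scan(self):
        """Rows come back ordered by row key -- the 'sorted map' property."""
        return sorted(self.rows)

t = ToyHBaseTable()
t.put("user#002", "info:name", "bob", ts=1)
t.put("user#001", "info:name", "ann", ts=1)
t.put("user#001", "info:name", "anna", ts=2)  # newer version of the same cell
print(t.get("user#001", "info:name"))  # anna
print(t.scan())                        # ['user#001', 'user#002']
```

Because rows are kept sorted by key, range scans over adjacent row keys are cheap, which is why row-key design matters so much in HBase schemas.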
Data Mining (Lectures 1 & 2): Concepts and Techniques (Saif Ullah)
This document provides an overview of data mining concepts from Chapter 1 of the textbook "Data Mining: Concepts and Techniques". It discusses the motivation for data mining due to increasing data collection, defines data mining as the extraction of useful patterns from large datasets, and outlines some common applications like market analysis, risk management, and fraud detection. It also introduces the key steps in a typical data mining process including data selection, cleaning, mining, and evaluation.
William Inmon is considered the father of data warehousing. He has over 35 years of experience in database technology and data warehouse design. Inmon has written over 650 articles and published 45 books on topics related to building, using, and maintaining data warehouses and information factories. A data warehouse is a collection of integrated, subject-oriented databases designed to support decision-making. It contains data that is non-volatile, time-variant, integrated, and summarized for analysis. Key components of a data warehouse environment include the data store, data marts, and metadata.
A Practical Approach To Data Mining Presentation (millerca2)
This document provides an overview of data mining, including common uses, tools, and challenges related to system performance, security, privacy, and ethics. It discusses how data mining involves extracting patterns from data using techniques like classification, clustering, and association rule learning. Maintaining privacy and anonymity while aggregating data from multiple sources for analysis poses ethical issues. The document also offers tips for gaining access to data and navigating performance concerns when conducting data mining projects.
This document provides an overview of data warehousing concepts including definitions of data warehousing, the components of a data warehouse architecture, characteristics of data, and the process of data modeling. It describes what a data warehouse is and some key elements like the data sources, data integration, business intelligence tools, and different types of databases. It also discusses data attributes, metadata, and the three levels of data modeling.
Big data refers to large volumes of structured, unstructured, and semi-structured data that are difficult to manage and costly to store. Using exploratory analysis techniques to understand such raw data, while carefully balancing the benefits of different storage and retrieval techniques, is an essential part of big data. The research discusses MapReduce issues, the framework for the MapReduce programming model, and its implementation. The paper includes the analysis of big data using MapReduce techniques and the identification of a required document from a stream of documents. Identifying a required document is part of securing a stream of documents in the cyber world; such a document may be significant in business, medical, social, or terrorism contexts.
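The MapReduce model the abstract refers to can be sketched in-process: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase aggregates each group. Here it counts term frequency across a small document stream, the kind of building block behind finding a required document; the documents are toy data.

```python
# In-process MapReduce sketch: map -> shuffle (group by key) -> reduce.
# Counts word occurrences across a stream of toy documents.

from collections import defaultdict

documents = ["big data storage", "map reduce model", "big data analysis"]

def map_phase(doc):
    """Map: emit (word, 1) for every word in one document."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate each key's values independently."""
    return {key: sum(values) for key, values in grouped.items()}

pairs = [p for doc in documents for p in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["data"])  # 2
print(counts["big"])   # 2
```

In a real Hadoop job the map and reduce functions run in parallel on different nodes and the shuffle moves data across the network; the logical contract is the same as this single-process version.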
This document provides a project report on data warehousing. It includes an abstract describing data warehousing and how it transforms operational databases into informational warehouses for analysis. It also describes the introduction, background, architecture, advantages, and conclusion of data warehousing. The report is submitted by Sana Alvi and includes references.
PwC is a global network of firms providing professional services including assurance, tax, and advisory services. This training module provides an introduction to metadata management, including defining metadata, the metadata lifecycle, ensuring metadata quality, and using controlled vocabularies. Metadata exchanges and aggregation are important for interoperability.
This document provides an overview of data mining techniques and concepts. It defines data mining as the process of discovering interesting patterns and knowledge from large amounts of data. The key steps involved are data cleaning, integration, selection, transformation, mining, evaluation, and presentation. Common data mining techniques include classification, clustering, association rule mining, and anomaly detection. The document also discusses data sources, major applications of data mining, and challenges.
The document provides an overview of the key components and considerations for building a data warehouse. It discusses 7 main components: 1) the data warehouse database, 2) sourcing, acquisition, cleanup and transformation tools, 3) metadata, 4) access (query) tools, 5) data marts, 6) data warehouse administration and management, and 7) information delivery systems. It also outlines important design considerations, technical considerations, and implementation considerations that must be addressed when building a data warehouse environment.
Data Warehouse, under Data Mining (AyushMeraki1)
Data mining involves analyzing large amounts of data to discover patterns. A database is a structured collection of related data that can be accessed electronically. There are different types of databases like relational, distributed, and cloud databases. Data warehouses store historical data from multiple sources to support analysis and decision making. They use dimensional modeling with facts and dimensions organized in star schemas. OLAP systems analyze aggregated data in data warehouses for reporting and analytics, while OLTP systems handle transactional data updates and queries.
This document provides information about a course on data warehousing and data mining, including:
1. It outlines the course syllabus which covers the basics of data warehousing, data preprocessing, association rules, classification and clustering, and recent trends in data mining.
2. It describes the 5 units that make up the course, including an overview of the topics covered in each unit such as data warehouse architecture, data integration, decision trees, and applications of data mining.
3. It lists two textbooks and four references that will be used for the course.
The document discusses key concepts related to databases including:
- A database is an organized collection of data stored electronically and accessed via a DBMS.
- Data is logically organized into records, tables, and databases for meaningful representation to users.
- Databases offer advantages like reduced data redundancy, improved data integrity, and easier data sharing.
- Database subsystems include the database engine, data definition language, and data administration.
The document then covers database types, uses, issues, and security concepts.
This document contains 26 questions and their answers related to management information systems. The questions cover topics such as data resource management, databases, data warehousing, transaction processing, decision support systems, end user computing, information systems in various business functions like marketing, manufacturing, human resources, accounting, and financial management. Other topics include information resource management, file organization techniques, and humans as information processors.
The document discusses data warehousing and OLAP technology. It defines a data warehouse as a subject-oriented, integrated, time-variant, and nonvolatile collection of data used for decision making. It describes the architecture of a data warehouse including extraction, transformation, loading, and refreshing of data. It also discusses how data warehousing has evolved to online analytical mining (OLAM) which integrates OLAP and data mining capabilities.
Data mining involves discovering hidden patterns in data, while data warehousing involves integrating data from multiple sources and storing it in a centralized location to support analysis. Some key differences are:
- Data mining uses techniques like classification, clustering, and association to discover insights from data, while data warehousing focuses on data integration and OLAP tools.
- Data mining looks for unknown relationships and makes predictions, while data warehousing provides a way to extract and analyze historical data.
- Data warehousing involves extracting, cleaning, and transforming data during an ETL process before loading it into a separate database optimized for analysis. Data mining builds on the outputs of data warehousing.
Analytic Platforms in the Real World with 451Research and Calpont_July 2012Calpont Corporation
Matt Aslett of 451 Research discussed the rise of analytic platforms and their role in enabling exploratory analytics on large datasets. Bob Wilkinson from Calpont then presented on InfiniDB, Calpont's columnar analytic platform that provides scalable and fast performance for complex queries. InfiniDB was shown to accelerate analytics for telecommunications customer experience data and online advertising attribution. The discussion highlighted how InfiniDB supports flexible schemas and a spectrum of analytic approaches to enable exploratory analysis on structured data.
Data Warehouse – Introduction, characteristics, architecture, scheme and modelling, Differences between operational database systems and data warehouse.
The document discusses a workshop on designing information systems for business organizations. It covers topics like the $10 billion industry shift towards information management, motivation for next generation databases, challenges of database technology, scenarios involving instant virtual enterprises and personalized information systems, and the aims and objectives of familiarizing participants with database development techniques.
This document discusses data warehousing and OLAP technology. It defines key concepts like data warehouses, OLTP vs OLAP, and multidimensional data models. It also explains data warehouse architectures like star schemas and snowflake schemas, and how dimensions and measures are modeled in a data cube.
The document discusses a seminar on data warehousing presented by Sangram Keshari Swain. It defines a data warehouse as a subject-oriented, integrated, non-volatile collection of data used to support management decision making. The primary concept is separating nonvolatile data for analysis from operational systems. A data warehouse provides a single view of enterprise data optimized for reporting and analysis through extracting and integrating data from different sources.
The document provides an overview of key data warehousing concepts. It defines a data warehouse as a single, consistent store of data obtained from various sources and made available to users in a format they can understand for business decision making. The document outlines some common questions end users may have that a data warehouse can help answer. It also discusses the differences between online transaction processing (OLTP) systems and data warehouses, including that data warehouses integrate historical data from various sources and are optimized for analysis rather than transactions.
This document provides an overview of database concepts and information management systems. It discusses topics such as database definition, data warehousing, data mining, centralized vs distributed processing, security issues, and technical solutions for privacy protection. Databases are organized collections of data that allow for storage, retrieval and use of related information. Data warehousing involves integrating data from multiple sources to support decision making. Data mining is the process of extracting patterns and useful information from large datasets. Security measures like access control, encryption and backups are important for protecting information.
This document provides an overview of data warehousing and OLAP concepts. It defines a data warehouse as a subject-oriented database used for analysis and decision making rather than transactions. Key concepts discussed include dimensional modeling using star schemas, snowflake schemas, and fact constellations. It also describes data cubes and cuboids as multidimensional views of data and how OLAP enables interactive analysis of consolidated data.
The document provides an overview of data warehousing and OLAP technology. It defines a data warehouse as a subject-oriented, integrated collection of historical data used for analysis and decision making. It describes key properties of data warehouses including being subject-oriented, integrated, time-variant, and non-volatile. It also discusses dimensional modeling, data cubes, and OLAP for analyzing aggregated data.
This document provides an overview of data science and key concepts in data. It defines data science and describes the data value chain, which identifies the main activities in generating value from data: data acquisition, analysis, curation, storage, and usage. It also defines different data types such as structured, unstructured, and semi-structured data. The document discusses characteristics of big data, including the 3Vs of volume, velocity, and variety as well as other characteristics like veracity and variability. Finally, it outlines the typical big data lifecycle of ingesting, persisting, computing/analyzing, and visualizing data.
The document discusses key concepts in data warehousing including:
1) The distinction between data and information, with data becoming valuable when organized and presented as information for decision making.
2) Characteristics of a data warehouse including being subject-oriented, integrated, non-volatile, time-variant, and accessible to end-users.
3) Differences between operational data and data warehouse data including the data warehouse being subject-oriented, summarized over time, and serving managerial communities rather than transactional needs.
This document provides an overview of data mining and data warehousing concepts. It defines data mining as the process of identifying patterns in data. The data mining process involves tasks like classification, clustering, and association rule mining. It also discusses data warehousing concepts like dimensional modeling using star schemas and snowflake schemas to organize data for analysis. Common data mining techniques like decision trees, neural networks, and association rule mining are also summarized.
Security in Clouds: Cloud security challenges – Software as a
Service Security, Common Standards: The Open Cloud Consortium – The Distributed management Task Force – Standards for application Developers – Standards for Messaging – Standards for Security, End user access to cloud computing, Mobile Internet devices and the cloud. Hadoop – MapReduce – Virtual Box — Google App Engine – Programming Environment for Google App Engine.
Need for Virtualization – Pros and cons of Virtualization – Types of Virtualization –System VM, Process VM, Virtual Machine monitor – Virtual machine properties - Interpretation and binary translation, HLL VM - supervisors – Xen, KVM, VMware, Virtual Box, Hyper-V.
This Presentation provides a detailed insight about Collaborating Using Cloud Services Email Communication over the Cloud - CRM Management – Project Management-Event
Management - Task Management – Calendar - Schedules - Word Processing –
Presentation – Spreadsheet - Databases – Desktop - Social Networks and Groupware.
This presentation provides a detailed coverage on Cloud services: Software as a Service, Platform as a Service, Infrastructure as a Service, Database as a Service, Monitoring as a Service, Communication as Services. Service providers- Google, Amazon, Microsoft Azure, IBM, Sales force.
The document provides recommendations for books on cloud computing concepts and technologies. It then discusses the history and drivers of the Fourth Industrial Revolution powered by cloud, social, mobile, IoT, and AI technologies. The document defines cloud computing and discusses characteristics such as on-demand access to computing resources, utility computing models, and service delivery of infrastructure, platforms, and applications. It also outlines some major cloud platform providers including Eucalyptus, Nimbus, OpenNebula, and the CloudSim simulation framework.
This Presentation is an abstract of discussion I had during my Session with Participants of a Webinar at Regional Center of IGNOU, Patna on Future Skills & Career Opportunities in POST COVID-19
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
This is my presentation on the Topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled and collected different statistics and data from different sources. This may be useful for students and those who might be interested in this field of Study.
Delivered Key Note Address in National Seminar on
"Digital India: Use of Technology For Transforming Society" organized at Gaya College, Gaya on 28th & 29th January, 2017.
Gaya college-gaya-28-29.01.2017-presentation
Paradigm Shift in
Computing Technology, ICT & its Applications: Technical, Social, Economic and Environmental Perspective
Mobile Technology – Historical Evolution, Present Status & Future DirectionsDr. Sunil Kr. Pandey
The document discusses the history and development of mobile technology. It describes how technology has shifted from mainframes to tablets and personal computing to mobile computing and cloud computing. It outlines several generations of mobile technology including early analog cellular services in the 1940s-1970s with large transmitters and limited coverage and capacity. It also discusses the development of digital cellular services in the 1980s enabled by microprocessors and digital control links between base stations and mobile units.
Mobile Technology – Historical Evolution, Present Status & Future DirectionsDr. Sunil Kr. Pandey
I made this Presentation as a Resource Person in a Faculty Development Programme organized at Central University of Himachal Pradesh, Dharmshala, HP during 13th & 14th June, 2016.
Green Commputing - Paradigm Shift in Computing Technology, ICT & its Applicat...Dr. Sunil Kr. Pandey
I was invited as Key Note Speaker in a National Event organized at Gajadhar Bhagat College, Naugachia, (TM Bhagalpur University). I took session on "Paradigm Shift in Computing Technology, ICT & its Applications - Socioeconomic and Environmental Perspective". It was a wonderful learning experience to meet, interact and experience sharing with delegates, faculty and students there.
This presentation is an attempt to create awareness about Digital India Mission Program - its Projects preservative, Policies and various initiatives. Over all this presents a brief on the Digital India Mission Program by Govt. of India which was launched by Honorable Prime Minister of India, Sri. Narendra Modiji!
The document discusses business analysis and data warehousing. It covers the syllabus for Unit III which includes topics like business analysis, reporting and query tools, OLAP, patterns and models, statistics, and artificial intelligence. It then discusses business analysis in more detail including defining it, the business analysis process, ensuring goals are oriented, and roles of business analysts like strategist, architect and systems analyst. Finally, it covers business process improvement and different reporting and query tools.
1. Prof. S. K. Pandey, I.T.S, Ghaziabad
Data Warehousing & Mining
UNIT – I
2. Syllabus of Unit - I
DSS - Uses, definition, Operational Database.
Introduction to Data Warehousing. Data-Mart,
Concept of Data-Warehousing,
Multi Dimensional Database Structures.
Client/Server Computing Model & Data Warehousing.
Parallel Processors & Cluster Systems. Distributed DBMS implementations.
3. Introduction – Decision Support System (DSS)
A Decision Support System (DSS) is an interactive computer-
based system or subsystem intended to help decision makers use
communications technologies, data, documents, knowledge
and/or models to identify and solve problems, complete decision
process tasks, and make decisions.
It is clear that DSS belong to an environment with
multidisciplinary foundations, including (but not exclusively):
– Database research,
– Artificial intelligence,
– Human-computer interaction,
– Simulation methods,
– Software engineering, and
– Telecommunications.
4. DSS
• A Decision Support System (DSS) is a computer-
based information system that supports business or
organizational decision-making activities.
• DSSs serve the management, operations, and planning
levels of an organization (usually mid and higher
management) and help to make decisions, which may
be rapidly changing and not easily specified in advance
(Unstructured and Semi-Structured decision problems).
• Decision support systems can be fully computerized, fully human-driven, or a combination of both.
6. Typical DSS Architecture
TPS: transaction processing system
MODEL: representation of a problem
OLAP: on-line analytical processing
USER INTERFACE: how the user enters problems and receives answers
DSS DATABASE: current data from applications or groups
DATA MINING: technology for finding relationships in large databases for prediction

[Diagram: the TPS and external data feed the DSS database; the DSS software system (models, OLAP tools, data mining tools) connects the database to the user interface, which serves the user.]
7. Why DSS?
Increasing complexity of decisions
– Technology
– Information:
“Data, data everywhere, and not the time to think!”
– Number and complexity of options
– Pace of change
Increasing availability of computerized support
– Inexpensive high-powered computing
– Better software
– More efficient software development process
Increasing usability of computers
8. Operational Databases
Operational database management systems (also referred to as OLTP databases) are used to manage dynamic data in real time.
These databases allow you to do more than simply view archived data: they let you modify that data (add, change, or delete it) in real time.
Since the early 1990s, the operational database software market has been largely taken over by SQL engines.
Today, the operational DBMS market (formerly OLTP) is evolving dramatically, with new, innovative entrants and incumbents supporting the growing use of unstructured data and NoSQL DBMS engines, as well as XML databases and NewSQL databases.
Operational databases increasingly support distributed database architectures that provide high availability and fault tolerance through replication and scale-out capability.
10. Differences between Databases and Data Warehouses

FEATURE | DATABASE | DATA WAREHOUSE
Characteristic | Based on operational processing. | Based on informational processing.
Data | Mainly stores current data, guaranteed to be up to date. | Usually stores historical data whose accuracy is maintained over time.
Function | Used for day-to-day operations. | Used for long-term informational requirements and decision support.
User | Clerk, DBA, database professional. | Knowledge worker (e.g., manager, executive, analyst).
Unit of work | Short, simple transactions. | Complex queries.
Focus | "Data in". | "Information out".
Orientation | Transaction. | Analysis.
DB design | ER-based and application-oriented. | Star/snowflake schema and subject-oriented.
Summarization | Primitive and highly detailed. | Summarized and consolidated.
View | Flat relational. | Multidimensional.
11. Differences between Databases and Data Warehouses (contd.)

FEATURE | DATABASE | DATA WAREHOUSE
Function | Used for day-to-day operations. | Used for long-term informational requirements and decision support.
User | Clerk, DBA, database professional. | Knowledge worker (e.g., manager, executive, analyst).
Access | Mostly read/write. | Mostly read access to stored data.
Operations | Index/hash on primary key. | Operations need many scans.
Records accessed | A few tens of records. | Millions of records.
Number of users | In the order of thousands. | In the order of hundreds only.
DB size | 100 MB to GB. | 100 GB to TB.
Priority | High performance, high availability. | High flexibility, end-user autonomy.
Metric | Efficiency measured by transaction throughput. | Efficiency measured by query throughput and response time.
12. Concept of Data Warehousing
13. Why Separate Data Warehouse?
High performance for both systems
– DBMS— tuned for OLTP: access methods, indexing,
concurrency control, recovery
– Warehouse—tuned for OLAP: complex OLAP queries,
multidimensional view, consolidation.
Different functions and different data:
– missing data: Decision support requires historical data
which operational DBs do not typically maintain
– data consolidation: DS requires consolidation
(aggregation, summarization) of data from heterogeneous
sources
– data quality: different sources typically use inconsistent
data representations, codes and formats which have to be
reconciled
14. Data Warehousing - Introduction
A data warehouse is a subject-oriented,
integrated, nonvolatile, time-variant collection
of data in support of management's decisions.
- W. H. Inmon
16. Data Warehouse Usage
Three kinds of data warehouse applications
– Information processing
supports querying, basic statistical analysis, and reporting using
crosstabs, tables, charts and graphs
– Analytical processing
multidimensional analysis of data warehouse data
supports basic OLAP operations, slice-dice, drilling, pivoting
– Data mining
knowledge discovery from hidden patterns
supports associations, constructing analytical models, performing
classification and prediction, and presenting the mining results
using visualization tools.
Differences among the three tasks
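The analytical-processing operations named above (slice, dice, roll-up) can be sketched over a tiny in-memory cube. This is a minimal illustration with made-up figures, not part of the original slides:

```python
# A 3-D sales cube stored as a dict keyed by (product, region, month),
# with slice, dice, and roll-up implemented as filters/aggregations.
from collections import defaultdict

cube = {
    ("TV", "East", "Jan"): 120, ("TV", "East", "Feb"): 90,
    ("TV", "West", "Jan"): 75,  ("PC", "East", "Jan"): 200,
    ("PC", "West", "Feb"): 130, ("PC", "West", "Jan"): 110,
}

def slice_cube(cube, month):
    """Slice: fix one dimension (month), return the remaining 2-D view."""
    return {(p, r): v for (p, r, m), v in cube.items() if m == month}

def dice_cube(cube, products, regions):
    """Dice: restrict several dimensions to subsets of their values."""
    return {k: v for k, v in cube.items()
            if k[0] in products and k[1] in regions}

def rollup_by(cube, axis):
    """Roll up: aggregate away all dimensions except the chosen axis."""
    totals = defaultdict(int)
    for key, v in cube.items():
        totals[key[axis]] += v
    return dict(totals)

jan = slice_cube(cube, "Jan")            # 2-D product x region view for Jan
east_tv = dice_cube(cube, {"TV"}, {"East"})
by_region = rollup_by(cube, axis=1)      # totals per region
```

A real OLAP engine materializes such views from the warehouse; the dict here only mimics the cell-addressing idea.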
17. Data Warehouse: Subject-Oriented
Organized around major subjects, such as customer, product,
sales.
Focusing on the modeling and analysis of data for decision
makers, not on daily operations or transaction processing.
Provide a simple and concise view around particular
subject issues by excluding data that are not useful in the
decision support process.
18. Subject-Oriented

[Diagram: operational applications (quotes, orders, prospects, leads) feed the data warehouse, which is organized around subject areas (customers, products, regions, time).]

Focus is on subject areas rather than applications.
19. Data Warehouse: Integrated
Constructed by integrating multiple, heterogeneous data
sources
– relational databases, flat files, on-line transaction records
Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
E.g., Hotel price: currency, tax, breakfast covered, etc.
– When data is moved to the warehouse, it is converted.
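The hotel-price example can be made concrete with a small cleaning step during load. The sources, conversion rate, tax rate, and record layouts below are assumptions for illustration only:

```python
# Two hypothetical sources quote hotel prices inconsistently:
# source A in USD excluding tax, source B in EUR including tax.
# Cleaning converts both to one representation before loading.
EUR_TO_USD = 1.10   # assumed conversion rate
TAX_RATE = 0.08     # assumed tax rate for source A

def normalize_a(rec):
    # Source A record: {"hotel": ..., "price_usd_pre_tax": ...}
    return {"hotel": rec["hotel"],
            "price_usd": round(rec["price_usd_pre_tax"] * (1 + TAX_RATE), 2)}

def normalize_b(rec):
    # Source B record: {"hotel": ..., "price_eur_incl_tax": ...}
    return {"hotel": rec["hotel"],
            "price_usd": round(rec["price_eur_incl_tax"] * EUR_TO_USD, 2)}

# Both records now carry a comparable tax-inclusive USD price.
warehouse = [normalize_a({"hotel": "Alpha", "price_usd_pre_tax": 100.0}),
             normalize_b({"hotel": "Beta", "price_eur_incl_tax": 90.0})]
```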
20. Data Warehouse: Time-Variant
The time horizon for the data warehouse is significantly longer
than that of operational systems.
– Operational database: current value data.
– Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not contain
“time element”.
21. Time-Variant

[Diagram: operational data holds current-value data with a time horizon of 60-90 days; the data warehouse stores historical snapshot data with a time horizon of 5-10 years.]

A data warehouse typically spans across time.
22. Data Warehouse: Non-Volatile
A physically separate store of data transformed from the
operational environment.
Operational update of data does not occur in the data
warehouse environment.
– Does not require transaction processing, recovery, and
concurrency control mechanisms
– Requires only two operations in data accessing:
initial loading of data and access of data.
23. Non-Volatile

[Diagram: operational data is continually changed by insert, change, delete, and replace operations; the data warehouse receives only periodic loads and read-only access.]

A data warehouse is relatively static in nature.
24. Data Warehouse vs. Heterogeneous DBMS
Traditional heterogeneous DB integration:
– Build wrappers/mediators on top of heterogeneous databases
– Query driven approach
When a query is posed to a client site, a meta-dictionary is used to
translate the query into queries appropriate for individual
heterogeneous sites involved, and the results are integrated into a
global answer set
Complex information filtering, compete for resources
Data warehouse: update-driven, high performance
– Information from heterogeneous sources is integrated in advance and
stored in warehouses for direct query and analysis
25. Data Warehouse vs. Operational DBMS
OLTP (on-line transaction processing)
– Major task of traditional relational DBMS
– Day-to-day operations: purchasing, inventory, banking, manufacturing,
payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
– Major task of data warehouse system
– Data analysis and decision making
Distinct features (OLTP vs. OLAP):
– User and system orientation: customer vs. market
– Data contents: current, detailed vs. historical, consolidated
– Database design: ER + application vs. star + subject
– View: current, local vs. evolutionary, integrated
– Access patterns: update vs. read-only but complex queries
26. OLTP vs. OLAP

FEATURE | OLTP | OLAP
users | clerk, IT professional | knowledge worker
function | day-to-day operations | decision support
DB design | application-oriented | subject-oriented
data | current, up-to-date; detailed, flat relational; isolated | historical; summarized, multidimensional; integrated, consolidated
usage | repetitive | ad hoc
access | read/write; index/hash on primary key | lots of scans
unit of work | short, simple transaction | complex query
# records accessed | tens | millions
# users | thousands | hundreds
DB size | 100 MB-GB | 100 GB-TB
metric | transaction throughput | query throughput, response time
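The access-pattern contrast in the table can be demonstrated with SQLite; the schema and rows are illustrative, not from the slides:

```python
# OLTP touches one row keyed on the primary key inside a short
# transaction; OLAP performs a read-only scan aggregating many rows.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
con.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                [("East", 10.0), ("East", 20.0), ("West", 5.0)])

# OLTP-style: short read/write transaction on one record.
with con:
    con.execute("UPDATE sales SET amount = amount + 1 WHERE id = 1")

# OLAP-style: read-only scan aggregating over many records.
totals = dict(con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))
# totals -> {"East": 31.0, "West": 5.0}
```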
27. Characteristics of Data Warehouses
Multidimensional conceptual view
Generic dimensionality
Unlimited dimensions and aggregation levels
Unrestricted cross-dimensional operations
Dynamic sparse matrix handling
Client-server architecture
Multi-user support
Accessibility
Transparency
Intuitive data manipulation
Consistent reporting performance
Flexible reporting
28. Multi-Tiered Architecture

Components & Framework
Data Integration Stage
29. Data Mart
The data mart is a subset of the data warehouse that is
usually oriented to a specific business line or team. Data
marts are small slices of the data warehouse.
Whereas data warehouses have an enterprise-wide
depth, the information in data marts pertains to a single
department.
Data marts improve end-user response time by allowing
users to have access to the specific type of data they
need to view most often by providing the data in a way
that supports the collective view of a group of users.
30. Contd.
A data mart is basically a condensed and more focused
version of a data warehouse that reflects the regulations
and process specifications of each business unit within
an organization.
Each data mart is dedicated to a specific business
function or region.
This subset of data may span across many or all of an
enterprise’s functional subject areas.
It is common for multiple data marts to be used in order
to serve the needs of each individual business unit (different
data marts can be used to obtain specific information for various enterprise
departments, such as accounting, marketing, sales, etc.).
31. Reasons for creating a data mart
Easy access to frequently needed data
Creates collective view by a group of users
Improves end-user response time
Ease of creation
Lower cost than implementing a full data warehouse
Potential users are more clearly defined than in a full
data warehouse
Contains only business essential data and is less
cluttered.
32. Types of Data Marts
Dependent Data Mart: A dependent data mart is one
whose source is another data warehouse, and all
dependent data marts within an organization are
typically fed by the same source — the enterprise data
warehouse.
33. Contd.
Independent Data Mart: An independent data mart
is one whose source is directly from transactional
systems, legacy applications, or external data feeds.
34. Data Mart vs. Data Warehouse

Data warehouse:
i. Holds multiple subject areas
ii. Holds very detailed information
iii. Works to integrate all data sources
iv. Does not necessarily use a dimensional model, but feeds dimensional models.

Data mart:
i. Often holds only one subject area, for example Finance or Sales
ii. May hold more summarized data (although many hold full detail)
iii. Concentrates on integrating information from a given subject area or set of source systems
iv. Is built around a dimensional model using a star schema.
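A dependent data mart can be sketched as a derivation from the warehouse: select one subject area and summarize the detail. Rows and subject names below are illustrative only:

```python
# The enterprise warehouse holds multiple subject areas in full detail;
# the mart keeps one subject area, summarized for its user group.
from collections import defaultdict

warehouse = [
    {"subject": "Sales",   "region": "East", "amount": 100},
    {"subject": "Sales",   "region": "West", "amount": 60},
    {"subject": "Finance", "region": "East", "amount": 999},
    {"subject": "Sales",   "region": "East", "amount": 40},
]

def build_sales_mart(rows):
    """Dependent data mart: only the Sales subject, totaled by region."""
    totals = defaultdict(int)
    for r in rows:
        if r["subject"] == "Sales":
            totals[r["region"]] += r["amount"]
    return dict(totals)

sales_mart = build_sales_mart(warehouse)   # {"East": 140, "West": 60}
```

An independent data mart would instead run this kind of extraction directly against transactional systems or external feeds.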
36. Multi-Dimensional Database Structures

Example: sales volume as a function of product, month, and region.

Dimensions: Product, Location, Time

Hierarchical summarization paths:
- Product: Industry -> Category -> Product
- Location: Region -> Country -> City -> Office
- Time: Year -> Quarter -> Month/Week -> Day
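Summarizing along one of these hierarchical paths can be sketched in a few lines; the location mappings and sales figures are invented for illustration:

```python
# Roll up a measure one level at a time along the Location hierarchy
# (City -> Country -> Region).
from collections import defaultdict

city_to_country = {"Delhi": "India", "Mumbai": "India", "Paris": "France"}
country_to_region = {"India": "Asia", "France": "Europe"}

sales_by_city = {"Delhi": 50, "Mumbai": 30, "Paris": 20}

def roll_up(values, mapping):
    """Aggregate measures one level up a dimension hierarchy."""
    out = defaultdict(int)
    for level_value, amount in values.items():
        out[mapping[level_value]] += amount
    return dict(out)

by_country = roll_up(sales_by_city, city_to_country)   # {"India": 80, "France": 20}
by_region = roll_up(by_country, country_to_region)     # {"Asia": 80, "Europe": 20}
```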
37. Data Modeling for Data Warehouses

Example of two-dimensional vs. multi-dimensional data:

[Figure: a three-dimensional data cube with dimensions Product (P123-P126), Fiscal Quarter (Qtr 1-Qtr 4), and Region (Reg 1-Reg 3).]
38. From Tables and Spreadsheets to Data Cubes
A data warehouse is based on a multidimensional data model
which views data in the form of a data cube
A data cube, such as sales, allows data to be modeled and viewed in
multiple dimensions
– Dimension tables, such as item (item_name, brand, type), or
time(day, week, month, quarter, year)
– Fact table contains measures (such as dollars_sold) and keys to
each of the related dimension tables
In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
39. Cube: A Lattice of Cuboids

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)
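The lattice structure can be enumerated directly: with n dimensions there are 2**n cuboids, one per subset of dimensions. The fact rows below are invented for illustration:

```python
# Enumerate every cuboid of the (time, item, location, supplier) cube
# and compute each one as a group-by over a tiny fact list.
from itertools import combinations
from collections import defaultdict

dims = ("time", "item", "location", "supplier")
facts = [
    {"time": "Q1", "item": "TV", "location": "East", "supplier": "S1", "sales": 10},
    {"time": "Q1", "item": "PC", "location": "East", "supplier": "S2", "sales": 20},
    {"time": "Q2", "item": "TV", "location": "West", "supplier": "S1", "sales": 5},
]

def cuboid(facts, group_dims):
    """One cuboid: total sales grouped by the chosen dimensions."""
    agg = defaultdict(int)
    for f in facts:
        agg[tuple(f[d] for d in group_dims)] += f["sales"]
    return dict(agg)

# One entry per subset of dims: 2**4 = 16 cuboids in the lattice.
lattice = {gd: cuboid(facts, gd)
           for k in range(len(dims) + 1)
           for gd in combinations(dims, k)}

apex = lattice[()]      # 0-D apex cuboid: the grand total {(): 35}
base = lattice[dims]    # 4-D base cuboid: one cell per distinct key
```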
41. Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
– Star schema: A fact table in the middle connected to a set of
dimension tables
– Snowflake schema: A refinement of star schema where some
dimensional hierarchy is normalized into a set of smaller
dimension tables, forming a shape similar to snowflake
– Fact constellations: Multiple fact tables share dimension
tables, viewed as a collection of stars, therefore called galaxy
schema or fact constellation
42. Example of Star Schema

time dimension: time_key, day, day_of_the_week, month, quarter, year
item dimension: item_key, item_name, brand, type, supplier_type
branch dimension: branch_key, branch_name, branch_type
location dimension: location_key, street, city, province_or_state, country
Sales fact table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
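A runnable sketch of this star schema in SQLite, trimmed to two dimensions for brevity; table name `time_dim` (instead of `time`) and all rows are assumptions for illustration:

```python
# Star schema: a central fact table holding keys and measures, joined
# to denormalized dimension tables; column names follow the slide.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item(item_key),
    units_sold INTEGER, dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO item VALUES (10, 'TV', 'BrandA'), (11, 'PC', 'BrandB');
INSERT INTO sales_fact VALUES (1, 10, 3, 900.0), (1, 11, 2, 1600.0), (2, 10, 1, 300.0);
""")

# A typical star join: facts joined to a dimension, grouped by one of
# its attributes.
rows = con.execute("""
    SELECT t.quarter, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON t.time_key = f.time_key
    GROUP BY t.quarter
    ORDER BY t.quarter
""").fetchall()
# rows -> [('Q1', 2500.0), ('Q2', 300.0)]
```

The snowflake variant on the next slide would further normalize the dimensions (e.g., splitting supplier out of item) at the cost of extra joins.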
43. Example of Snowflake Schema

time dimension: time_key, day, day_of_the_week, month, quarter, year
item dimension: item_key, item_name, brand, type, supplier_key -> supplier (supplier_key, supplier_type)
branch dimension: branch_key, branch_name, branch_type
location dimension: location_key, street, city_key -> city (city_key, city, province_or_state, country)
Sales fact table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
44. Example of Fact Constellation

time dimension: time_key, day, day_of_the_week, month, quarter, year
item dimension: item_key, item_name, brand, type, supplier_type
branch dimension: branch_key, branch_name, branch_type
location dimension: location_key, street, city, province_or_state, country
shipper dimension: shipper_key, shipper_name, location_key, shipper_type
Sales fact table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
Shipping fact table: time_key, item_key, shipper_key, from_location, to_location
Measures: dollars_cost, units_shipped
The two fact tables share the time, item, and location dimensions.
45. Client/Server Computing Model & Data Warehousing
The fundamental characteristic of client/server computing is the
distribution of computing resources (e.g. data, compute power)
across different computers.
The idea is to divide applications into logical segments (tasks) so
that each is performed on the most appropriate platform.
A client/server database system increases processing power by
separating the database management system from the application:
the client, as the front-end system, handles the user interface,
while the server, as the back-end system, accesses the database;
the two cooperate to run an application.
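The front-end/back-end split described above can be sketched in a few lines. This is a toy in-memory stand-in for a DBMS, not a real client/server protocol; all class names and data are hypothetical.

```python
# Minimal sketch of the client/server split (hypothetical in-memory "DBMS"):
# the server owns data storage and query processing; the client owns the
# user-facing presentation; together they run the application.

class DatabaseServer:
    """Back-end: manages storage and answers data requests."""
    def __init__(self):
        self._sales = {"Q1": 2400.0, "Q2": 3100.0}

    def query(self, quarter):
        return self._sales.get(quarter)

class Client:
    """Front-end: handles the user interface, delegates data access."""
    def __init__(self, server):
        self.server = server

    def show_sales(self, quarter):
        return f"Sales for {quarter}: {self.server.query(quarter)}"

client = Client(DatabaseServer())
print(client.show_sales("Q1"))
```

In a real deployment the method call would be a network request, but the division of responsibilities is the same.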
46. Contd….
Data warehousing is a continual process that
enables a corporation to assemble operational and
other data from a variety of internal and external
sources, transform that data into consistent,
high-quality business information, distribute that
information to the points of maximum value within
the organization, and provide easy, flexible, and fast
access for busy non-technical users.
47. Reasons for using client/server
Exploitation of centralized computing power / data
capacity
Scalability
Performance
Flexibility (in order to adjust to changing demands)
GUI on desktop
Protection of investment, strategic software,
strategic data
Client/server provides an integrated solution.
49. Loosely Coupled - Clusters
Collection of independent whole uni-processors or SMPs
– Usually called nodes
Interconnected to form a cluster
Working together as unified resource
– Illusion of being one machine
Communication via fixed path or network connections
Cluster Benefits
Absolute scalability
Incremental scalability
High availability
Superior price/performance
50. Distributed DBMS Implementations
What Is a Distributed DBMS?
Decentralization of business operations and globalization of
businesses created a demand for distributing the data and processes
across multiple locations.
Distributed database management systems (DDBMS) are designed
to meet the information requirements of such multi-location
organizations.
A DDBMS manages the storage and processing of logically
related data over interconnected computer systems in which
both data and processing functions are distributed among several
sites.
Distributed processing shares the database’s logical processing
among two or more physically independent sites that are
connected through a network.
51. DDBMS Advantages
Data located near site with greatest demand
Faster data access
Faster data processing
Growth facilitation
Improved communications
Reduced operating costs
User-friendly interface
Less danger of single-point failure
Processor independence
52. Distributed Processing
Shares the database’s logical processing among physically independent, networked sites
53. DDBMS Components
Computer workstations that form the network
system.
Network hardware and software components that
reside in each workstation.
Communications media that carry the data from one
workstation to another.
Transaction processor (TP) receives and processes
the application’s data requests.
Data processor (DP) stores and retrieves data
located at the site. Also known as data manager
(DM).
54. Distributed DB Transparency
A DDBMS ensures that the database operations are
transparent to the end user.
Different types of transparencies are:
– Distribution transparency
– Transaction transparency
– Failure transparency
– Performance transparency
– Heterogeneity transparency
55. Distributed Database Design
All design principles and concepts discussed in the
context of a centralized database also apply to a
distributed database.
Three additional issues are relevant to the design
of a distributed database:
– data fragmentation
– data replication
– data allocation
56. Data Fragmentation
Data fragmentation allows us to break a single object (a
database or a table) into two or more fragments.
Three types of fragmentation strategies are available
to distribute a table: horizontal, vertical, and mixed.
Horizontal fragmentation divides a table into
fragments consisting of sets of tuples:
– Each fragment has unique rows and is stored at a
different node
– Example: A bank may distribute its customer table
by location
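Horizontal fragmentation, as described above, can be sketched as grouping rows into disjoint sets by the distribution attribute (the bank-by-location example). The customer data below is hypothetical.

```python
# Horizontal fragmentation sketch (hypothetical customer rows): the table
# is split into disjoint row sets by location, one fragment per node.

customers = [
    {"cust_id": 1, "name": "Asha", "location": "Delhi"},
    {"cust_id": 2, "name": "Ravi", "location": "Mumbai"},
    {"cust_id": 3, "name": "Meera", "location": "Delhi"},
]

def horizontal_fragments(rows, key):
    """Group rows into disjoint fragments by the value of `key`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments

frags = horizontal_fragments(customers, "location")
print({site: len(rows) for site, rows in frags.items()})
```

Each fragment would then be stored at the node nearest the customers it contains.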
57. Contd……
Vertical fragmentation divides a table into fragments
consisting of sets of columns
– Each fragment is located at a different node and
consists of unique columns - with the exception of
the primary key column, which is common to all
fragments
– Example: The Customer table may be divided into
two fragments, one fragment consisting of Cust ID,
name, and address may be located in the Service
building and the other fragment with Cust ID, credit
limit, balance, dues may be located in the Collection
building.
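Vertical fragmentation, per the Service/Collection example above, can be sketched as projecting each fragment onto its own column subset while repeating the primary key in every fragment so rows can be rejoined. The data below is hypothetical.

```python
# Vertical fragmentation sketch (hypothetical columns): each fragment keeps
# a subset of columns plus the primary key, which is common to all fragments
# so the original rows can be reconstructed by joining on it.

customers = [
    {"cust_id": 1, "name": "Asha", "address": "MG Road",
     "credit_limit": 5000, "balance": 1200},
]

def vertical_fragment(rows, columns, pk="cust_id"):
    """Project rows onto `columns`, always carrying the primary key."""
    return [{c: row[c] for c in [pk] + columns} for row in rows]

service_frag = vertical_fragment(customers, ["name", "address"])
collection_frag = vertical_fragment(customers, ["credit_limit", "balance"])
print(service_frag[0], collection_frag[0])
```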
58. Data Fragmentation
Mixed fragmentation combines the horizontal and
vertical strategies.
A fragment may consist of a subset of rows and a
subset of columns of the original table.
Example: the Customer table may be divided by state
and grouped by columns; the service building in
Texas stores customer-service-related
information for customers from Texas.
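Mixed fragmentation composes the two previous operations: first select a row subset, then project it onto a column subset. A compact sketch of the Texas service-building example (hypothetical data):

```python
# Mixed fragmentation sketch (hypothetical data): first split rows by state
# (horizontal), then project the Texas fragment onto its service columns
# (vertical). The result is a subset of rows AND a subset of columns.

customers = [
    {"cust_id": 1, "name": "Asha", "state": "Texas", "balance": 1200},
    {"cust_id": 2, "name": "Ravi", "state": "Ohio", "balance": 300},
]

texas_rows = [r for r in customers if r["state"] == "Texas"]   # horizontal
texas_service = [{"cust_id": r["cust_id"], "name": r["name"]}  # vertical
                 for r in texas_rows]
print(texas_service)
```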
59. Data Replication
Data replication involves storing multiple copies of a
fragment in different locations. For example, a copy
may be stored in New Delhi and another in Mumbai.
It improves response time and data availability.
Data replication requires the DDBMS to maintain data
consistency among the replicas.
A fully replicated database stores multiple copies of each
database fragment.
A partially replicated database stores multiple copies of
some database fragments at multiple sites.
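The consistency duty mentioned above can be illustrated with a toy write-through scheme: every update is applied to all replicas so a read at any site returns the same value. This is only a sketch of the idea (real DDBMSs use commit protocols, not a loop); sites and data are hypothetical.

```python
# Write-through replication sketch (hypothetical in-memory replicas):
# every update is applied at all sites, so reads at New Delhi and Mumbai
# always agree -- the consistency the DDBMS must maintain among replicas.

replicas = {"New Delhi": {}, "Mumbai": {}}

def replicated_write(key, value):
    """Apply the write at every site to keep the replicas consistent."""
    for site_store in replicas.values():
        site_store[key] = value

replicated_write("cust:1", {"name": "Asha"})
print(replicas["Mumbai"]["cust:1"])
```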
60. Data Allocation
The data allocation decision involves determining the
location of the fragments so as to achieve the design
goals of cost, response time, and availability.
Three data allocation strategies are: centralized,
partitioned and replicated.
A centralized allocation strategy stores the entire
database in a single location.
A partitioned strategy divides the database into
disjoint parts (fragments) and allocates the fragments
to different locations.
In a replicated strategy copies of one or more database
fragments are stored at several sites.
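The three allocation strategies map directly to three site-to-fragment assignments. A one-screen sketch with hypothetical fragments and sites:

```python
# Allocation strategies sketch (hypothetical fragments and sites):
# centralized = everything at one site; partitioned = one disjoint fragment
# per site; replicated = every fragment at every site.

fragments = ["F1", "F2", "F3"]
sites = ["Delhi", "Mumbai", "Chennai"]

centralized = {"Delhi": fragments}                        # single location
partitioned = {s: [f] for s, f in zip(sites, fragments)}  # disjoint split
replicated = {s: list(fragments) for s in sites}          # full copies

print(partitioned["Mumbai"], len(replicated["Chennai"]))
```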