The document discusses classification algorithms in machine learning. It provides an overview of various classification algorithms including decision tree classifiers, rule-based classifiers, nearest neighbor classifiers, Bayesian classifiers, and artificial neural network classifiers. It then describes the supervised learning process for classification, which involves using a training set to construct a classification model and then applying the model to a test set to classify new data. Finally, it provides a detailed example of how a decision tree classifier is constructed from a training dataset and how it can be used to classify data in the test set.
2. Outline Of The Chapter
• Basics
• Decision Tree Classifier
• Rule Based Classifier
• Nearest Neighbor Classifier
• Bayesian Classifier
• Artificial Neural Network Classifier
• Issues: Over-fitting, Validation, Model Comparison
Compiled By: Kamal Acharya
3. Supervised Learning
• Supervised learning (classification)
– Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
– New data is classified based on the training set

[Figure: the general classification framework. A learning algorithm performs induction on the Training Set to learn a Model; the Model is then applied (deduction) to classify the Test Set.]

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
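The deck shows this loop only as a diagram. As a concrete companion, here is a minimal sketch of the same induction/deduction cycle using pandas and scikit-learn (my choice of tools; the slides name no library):

```python
# Minimal sketch of the induction/deduction loop above (assumes scikit-learn).
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

train = pd.DataFrame({
    "Attrib1": ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"],
    "Attrib2": ["Large", "Medium", "Small", "Medium", "Large",
                "Medium", "Large", "Small", "Medium", "Small"],
    "Attrib3": [125, 100, 70, 120, 95, 60, 220, 85, 75, 90],  # in K
    "Class":   ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"],
})
test = pd.DataFrame({
    "Attrib1": ["No", "Yes", "Yes", "No", "No"],
    "Attrib2": ["Small", "Medium", "Large", "Small", "Large"],
    "Attrib3": [55, 80, 110, 95, 67],
})

# One-hot encode the categorical attributes so the tree can split on them.
X_train = pd.get_dummies(train.drop(columns="Class"))
X_test = pd.get_dummies(test).reindex(columns=X_train.columns, fill_value=0)

model = DecisionTreeClassifier(criterion="entropy")
model.fit(X_train, train["Class"])   # induction: learn the model
print(model.predict(X_test))         # deduction: label Tid 11-15
```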
4. Classification vs. Prediction
• Classification:
– predicts categorical class labels
– classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses it to classify new data
• Regression:
– models continuous-valued functions, i.e., predicts unknown or missing values
• Typical applications:
– credit approval
– target marketing
– medical diagnosis
– treatment effectiveness analysis
5. Why Classification? A Motivating Application
• Credit approval
– A bank wants to classify its customers based on whether they are expected to pay back their approved loans
– The history of past customers is used to train the classifier
– The classifier provides rules which identify potentially reliable future customers
– Classification rule:
• If age = “31...40” and income = high then credit_rating = excellent
– Future customers:
• Paul: age = 35, income = high → excellent credit rating
• John: age = 20, income = medium → fair credit rating
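A hypothetical encoding of this single rule as a function (the "fair" fallback is my assumption; the slide states only the one rule):

```python
def credit_rating(age: int, income: str) -> str:
    # The slide's rule; anything that does not match defaults to "fair".
    if 31 <= age <= 40 and income == "high":
        return "excellent"
    return "fair"

print(credit_rating(35, "high"))    # Paul -> excellent
print(credit_rating(20, "medium"))  # John -> fair
```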
6. Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
– Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
– The set of tuples used for model construction: training set
– The model is represented as classification rules, decision trees, or mathematical formulae
7. Classification Process (1): Model Construction (Example)

Training Data:
NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

[Figure: the training data is fed to a classification algorithm, which outputs the classifier (model):]

IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
8. Classification—A Two-Step Process
• Model usage: for classifying future or unknown objects
– Estimate accuracy of the model
• The known label of test samples is compared with the classified result from the model
• Accuracy rate is the percentage of test set samples that are correctly classified by the model
• Test set is independent of training set, otherwise over-fitting will occur

\text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of test cases}}
9. Classification Process (2): Use the Model in Prediction

Testing Data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Mellisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

[Figure: the classifier from step 1 is checked against the testing data, then applied to unseen data.]

Unseen Data: (Jeff, Professor, 4) → Tenured?
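Putting slides 7–9 together, a sketch (mine, not the deck's) that scores the learned rule on the testing data and then classifies Jeff:

```python
def tenured(rank: str, years: int) -> str:
    # The rule learned on slide 7.
    return "yes" if rank == "Professor" or years > 6 else "no"

testing = [  # (name, rank, years, known label)
    ("Tom", "Assistant Prof", 2, "no"),
    ("Mellisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]
correct = sum(tenured(r, y) == label for _, r, y, label in testing)
print(f"accuracy = {correct}/{len(testing)}")  # 3/4: Mellisa is misclassified
print("Jeff ->", tenured("Professor", 4))      # -> yes
```

Note that the rule misclassifies Mellisa, which is exactly why accuracy must be estimated on an independent test set.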
10. Classification by Decision Tree Induction
• Decision tree
– A flow-chart-like tree structure
– Internal node denotes a test on an attribute
– Branch represents an outcome of the test
– Leaf nodes represent class labels or class distribution
11. Classification by Decision Tree Induction
• Decision tree generation consists of two phases
– Tree construction
• At start, all the training examples are at the root
• Partition examples recursively based on selected attributes
– Tree pruning
• Identify and remove branches that reflect noise or outliers
• Use of decision tree: Classifying an unknown sample
– Test the attribute values of the sample against the decision tree
12. Example of a Decision Tree

Training Data:
ID  Home Owner  Marital Status  Annual Income  Defaulted Borrower
1   Yes         Single          125K           No
2   No          Married         100K           No
3   No          Single          70K            No
4   Yes         Married         120K           No
5   No          Divorced        95K            Yes
6   No          Married         60K            No
7   Yes         Divorced        220K           No
8   No          Single          85K            Yes
9   No          Married         75K            No
10  No          Single          90K            Yes

Model: Decision Tree (Home Owner is the first splitting attribute):

Home Owner?
├─ Yes → NO
└─ No → Marital Status?
        ├─ Married → NO
        └─ Single, Divorced → Annual Income?
                ├─ < 80K → NO
                └─ > 80K → YES
13. Apply Model to Test Data

Test Data:
Home Owner  Marital Status  Annual Income  Defaulted Borrower
No          Married         80K            ?

Start from the root of the tree.

[Figure: the decision tree from slide 12, with the root node (Home Owner) highlighted.]
14.–18. Apply Model to Test Data (continued)

[Figures: the same tree and test record, with the matching path highlighted one step at a time: Home Owner = No → Marital Status = Married → leaf NO.]

Assign Defaulted Borrower to “No”.
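The walk the slides animate reads directly as nested conditionals; a sketch (function and argument names are mine):

```python
def defaulted(home_owner: str, marital_status: str, income_k: float) -> str:
    # The tree from slide 12. The slide labels the income branches
    # "< 80K" and "> 80K"; sending exactly 80K right is my assumption.
    if home_owner == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    return "No" if income_k < 80 else "Yes"

print(defaulted("No", "Married", 80))  # -> No, as on slide 18
```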
19. Decision Tree Classification Task
[Figure: the classification framework of slide 3, specialized to decision trees: a Tree Induction algorithm learns a Decision Tree from the Training Set (Tid 1–10), which is then applied to classify the Test Set (Tid 11–15).]
20. Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
– Tree is constructed in a top-down recursive divide-and-conquer manner
– At start, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they are discretized in
advance)
– Samples are partitioned recursively based on selected attributes
– Test attributes are selected on the basis of a heuristic or statistical measure
(e.g., information gain)
• Conditions for stopping partitioning
– All samples for a given node belong to the same class
– There are no remaining attributes for further partitioning – majority voting is
employed for classifying the leaf
– There are no samples left
21. Algorithm for Decision Tree Induction (pseudocode)
Algorithm GenDecTree(Sample S, Attlist A)
1. Create a node N.
2. If all samples in S are of the same class C, then label N with C; terminate.
3. If A is empty, then label N with the most common class C in S (majority voting); terminate.
4. Select a ∈ A with the highest information gain; label N with a.
5. For each value v of a:
   a. Grow a branch from N with condition a = v;
   b. Let Sv be the subset of samples in S with a = v;
   c. If Sv is empty, then attach a leaf labeled with the most common class in S;
   d. Else attach the node returned by GenDecTree(Sv, A − {a}).
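This pseudocode maps almost line-for-line onto Python. The sketch below is one possible rendering (function names such as gen_dec_tree are mine, and information gain is defined on the next slide); unlike step 5c, the loop only grows branches for attribute values that actually occur in S:

from collections import Counter
from math import log2

def entropy(labels):
    """Info(D): expected bits needed to classify a tuple with these labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Gain(attr) = Info(D) - Info_attr(D)."""
    n = len(labels)
    split = {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attr], []).append(y)
    info_attr = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - info_attr

def gen_dec_tree(rows, labels, attrs):
    """Steps 1-5 of the pseudocode; a leaf is a label, a node is a dict."""
    if len(set(labels)) == 1:                       # step 2: all samples same class
        return labels[0]
    if not attrs:                                   # step 3: majority voting
        return Counter(labels).most_common(1)[0][0]
    a = max(attrs, key=lambda x: info_gain(rows, labels, x))   # step 4
    node = {}
    for v in sorted({row[a] for row in rows}):      # step 5: one branch per value
        pairs = [(r, y) for r, y in zip(rows, labels) if r[a] == v]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [y for _, y in pairs]
        node[(a, v)] = gen_dec_tree(sub_rows, sub_labels,
                                    [x for x in attrs if x != a])
    return node

rows = [{"student": "no", "income": "high"}, {"student": "no", "income": "low"},
        {"student": "yes", "income": "high"}, {"student": "yes", "income": "low"}]
print(gen_dec_tree(rows, ["no", "no", "yes", "yes"], ["student", "income"]))
# -> {('student', 'no'): 'no', ('student', 'yes'): 'yes'}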
22. Attribute Selection Measure: Information Gain (ID3)
Select the attribute with the highest information gain
Let pi be the probability that an arbitrary tuple in D (data set)
belongs to class Ci, estimated by |Ci, D|/|D|
Expected information (entropy) needed to classify a tuple in D:

  Info(D) = − Σ_{i=1..m} p_i · log2(p_i)

Information needed (after using A to split D into v partitions) to classify D:

  Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) · Info(D_j)

Information gained by branching on attribute A:

  Gain(A) = Info(D) − Info_A(D)
23. Input: Training Dataset
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
24. Output: A Decision Tree for “buys_computer”
age?
├─ <=30 → student?
│    ├─ no → no
│    └─ yes → yes
├─ 31..40 → yes
└─ >40 → credit rating?
     ├─ excellent → no
     └─ fair → yes
25. Attribute Selection: Information Gain
Class P: buys_computer = “yes”
Class N: buys_computer = “no”
age pi ni I(pi, ni)
<=30 2 3 0.971
31…40 4 0 0
>40 3 2 0.971
Info(D) = I(9, 5) = − (9/14) · log2(9/14) − (5/14) · log2(5/14) = 0.940

Info_age(D) = (5/14) · I(2, 3) + (4/14) · I(4, 0) + (5/14) · I(3, 2) = 0.694

Gain(age) = Info(D) − Info_age(D) = 0.246

Similarly: Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048
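These numbers are easy to verify; a few lines of Python reproduce them from the class counts alone (a quick check, not part of the original slides):

from math import log2

def I(*counts):
    """Entropy of a node with the given class counts (zero counts contribute 0)."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

info_D = I(9, 5)                                              # 0.940
info_age = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2)   # 0.694
print(round(info_D, 3), round(info_age, 3), round(info_D - info_age, 3))
# -> 0.94 0.694 0.247 (the slide truncates the gain to 0.246)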
26. Splitting the samples using age
• Because age has the highest information gain among the attributes, it is selected as the splitting attribute

age <= 30 branch:

income | student | credit_rating | buys_computer
high | no | fair | no
high | no | excellent | no
medium | no | fair | no
low | yes | fair | yes
medium | yes | excellent | yes

age 31…40 branch (pure: every tuple is yes, so it becomes a leaf labeled yes):

income | student | credit_rating | buys_computer
high | no | fair | yes
low | yes | excellent | yes
medium | no | excellent | yes
high | yes | fair | yes

age > 40 branch:

income | student | credit_rating | buys_computer
medium | no | fair | yes
low | yes | fair | yes
low | yes | excellent | no
medium | yes | fair | yes
medium | no | excellent | no
27. Over-fitting and Tree Pruning
• Over-fitting: An induced tree may over-fit the training data
– Good accuracy on training data but poor on test data
– Symptoms: Too many branches, some may reflect anomalies due to noise or
outliers
– Results in poor accuracy for unseen samples
• Two approaches to avoid over-fitting
– Pre-pruning: Halt tree construction early, i.e., do not split a node if this would result in the goodness measure (e.g., information gain) falling below a threshold
• Difficult to choose an appropriate threshold
– Post-pruning: Remove branches from a “fully grown” tree—get a sequence of
progressively pruned trees
• Use a set of data different from the training data to decide which is the
“best pruned tree”
28. Decision Tree Based Classification
• Advantages:
– Inexpensive to construct
– Extremely fast at classifying unknown records
– Easy to interpret for small-sized trees
– Robust to noise (especially when methods to avoid over-fitting are
employed)
– Can easily handle redundant or irrelevant attributes (unless the
attributes are interacting)
29. Decision Tree Based Classification
• Disadvantages:
– Space of possible decision trees is exponentially large. Greedy
approaches are often unable to find the best tree.
– Does not take into account interactions between attributes
– Each decision boundary involves only a single attribute
30. Class work
• Study the table given below and construct a decision tree
based on the greedy algorithm using information gain.
32. Home work
• Study the table given below and construct a decision tree
based on the greedy algorithm using information gain.
Day Outlook Temperature Humidity Wind Play Tennis
D1 SUNNY HOT HIGH WEAK NO
D2 SUNNY HOT HIGH STRONG NO
D3 OVERCAST HOT HIGH WEAK YES
D4 RAIN MILD HIGH WEAK YES
D5 RAIN COOL NORMAL WEAK YES
D6 RAIN COOL NORMAL STRONG NO
D7 OVERCAST COOL NORMAL STRONG YES
D8 SUNNY MILD HIGH WEAK NO
D9 SUNNY COOL NORMAL WEAK YES
D10 RAIN MILD NORMAL WEAK YES
D11 SUNNY MILD NORMAL STRONG YES
D12 OVERCAST MILD HIGH STRONG YES
D13 OVERCAST HOT NORMAL WEAK YES
D14 RAIN MILD HIGH STRONG NO
33. Rule Based Classifier
• In rule-based classifiers, the learned model is represented as a set of IF-THEN rules that is used for classification.
• An IF-THEN rule is an expression of the form
– IF condition THEN conclusion.
– An example is: IF (Give Birth = no) ∧ (Can Fly = yes) THEN Birds
• The “IF” part (or left side) of a rule is known as the rule antecedent or precondition.
• The “THEN” part (or right side) is the rule consequent. In the rule antecedent, the condition consists of one or more attribute tests (e.g., (Give Birth = no) and (Can Fly = yes)) that are logically ANDed. The rule’s consequent contains a class prediction.
34. Contd..
• Rule-based Classifier (Example)
Name Blood Type Give Birth Can Fly Live in Water Class
human warm yes no no mammals
python cold no no no reptiles
salmon cold no no yes fishes
whale warm yes no yes mammals
frog cold no no sometimes amphibians
komodo cold no no no reptiles
bat warm yes yes no mammals
pigeon warm no yes no birds
cat warm yes no no mammals
leopard shark cold yes no yes fishes
turtle cold no no sometimes reptiles
penguin warm no no sometimes birds
porcupine warm yes no no mammals
eel cold no no yes fishes
salamander cold no no sometimes amphibians
gila monster cold no no no reptiles
platypus warm no no no mammals
owl warm no yes no birds
dolphin warm yes no yes mammals
eagle warm no yes no birds
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
35. How does Rule-based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
A lemur triggers rule R3, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules (a code sketch of this matching follows the table below)
Name Blood Type Give Birth Can Fly Live in Water Class
lemur warm yes no no ?
turtle cold no no sometimes ?
dogfish shark cold yes no yes ?
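A minimal sketch of this matching step (rules and records as plain Python data; the structure is mine):

# Each rule: (consequent, antecedent as attribute -> value tests, ANDed together).
RULES = [
    ("Birds",      {"Give Birth": "no",  "Can Fly": "yes"}),        # R1
    ("Fishes",     {"Give Birth": "no",  "Live in Water": "yes"}),  # R2
    ("Mammals",    {"Give Birth": "yes", "Blood Type": "warm"}),    # R3
    ("Reptiles",   {"Give Birth": "no",  "Can Fly": "no"}),         # R4
    ("Amphibians", {"Live in Water": "sometimes"}),                 # R5
]

def triggered(record):
    """Consequents of all rules whose antecedent the record satisfies."""
    return [cls for cls, cond in RULES
            if all(record.get(a) == v for a, v in cond.items())]

lemur   = {"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "no"}
turtle  = {"Blood Type": "cold", "Give Birth": "no",  "Can Fly": "no", "Live in Water": "sometimes"}
dogfish = {"Blood Type": "cold", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "yes"}
print(triggered(lemur))    # ['Mammals']: exactly one rule fires
print(triggered(turtle))   # ['Reptiles', 'Amphibians']: R4 and R5 both fire
print(triggered(dogfish))  # []: no rule covers it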
36. Rule Extraction from a Decision Tree
• Rules are easier to understand than large trees
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction; the leaf holds the class prediction
• Example: rule extraction from our buys_computer decision tree (age splits into young (<=30), mid-age (31..40), and old (>40); the young branch tests student, the old branch tests credit_rating):

IF age = young AND student = no THEN buys_computer = no
IF age = young AND student = yes THEN buys_computer = yes
IF age = mid-age THEN buys_computer = yes
IF age = old AND credit_rating = excellent THEN buys_computer = yes
IF age = old AND credit_rating = fair THEN buys_computer = no
37. Rule Coverage and Accuracy
• A Rule R can be assessed by
its coverage and accuracy
• Coverage of a rule:
– Fraction of records that satisfy
the antecedent of a rule
• Accuracy of a rule:
– Fraction of records that satisfy
the antecedent that also satisfy
the consequent of a rule
Tid | Refund | Marital Status | Taxable Income | Class
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | No | Single | 90K | Yes

Example rule: (Status = Single) → No
Coverage = 40% (4 of the 10 records are Single), Accuracy = 50% (2 of those 4 have class No)
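Both measures are simple ratios, so the sketch below recomputes them for (Status = Single) → No from the ten records above:

# Records as (Refund, Marital Status, Taxable Income in K, Class).
records = [
    ("Yes", "Single", 125, "No"),   ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),     ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),  ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),    ("No", "Single", 90, "Yes"),
]
covered = [r for r in records if r[1] == "Single"]   # records satisfying the antecedent
correct = [r for r in covered if r[3] == "No"]       # ...that also satisfy the consequent
print(f"coverage = {len(covered) / len(records):.0%}")   # 40%
print(f"accuracy = {len(correct) / len(covered):.0%}")   # 50%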
38. Contd..
• Advantages of Rule-Based Classifiers:
– As highly expressive as decision trees
– Easy to interpret
– Easy to generate
– Can classify new instances rapidly
– Performance comparable to decision trees
39. Bayesian Classification: Why?
• A statistical classifier: performs probabilistic prediction, i.e., predicts
class membership probabilities
• Foundation: Based on Bayes’ Theorem.
• Performance: A simple Bayesian classifier, naïve Bayesian classifier,
has comparable performance with decision tree and selected neural
network classifiers
• Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct.
• Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision making
against which other methods can be measured
40. Bayesian Theorem: Basics
• Let X be a data sample (“evidence”): class label is unknown
• Let H be a hypothesis that X belongs to class C
• Classification is to determine P(H|X), the probability that the hypothesis
holds given the observed data sample X
• P(H) (prior probability), the initial probability
– E.g., X will buy computer, regardless of age, income, …
• P(X): probability that sample data is observed
• P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds
– E.g., Given that X will buy computer, the prob. that X is 31..40,
medium income
41. Bayesian Theorem
• Given training data X, posteriori probability of a hypothesis H,
P(H|X), follows the Bayes theorem
• Predicts X belongs to Ci iff the probability P(Ci|X) is the highest
among all the P(Ck|X) for all the k classes
  P(H|X) = P(X|H) · P(H) / P(X)
42. Towards Naïve Bayesian Classifier
• Let D be a training set of tuples and their associated class labels,
and each tuple is represented by an n-D attribute vector X = (x1,
x2, …, xn)
• Suppose there are m classes C1, C2, …, Cm. Classification is to
derive the maximum posteriori, i.e., the maximal P(Ci|X)
• This can be derived from Bayes’ theorem:

  P(Ci|X) = P(X|Ci) · P(Ci) / P(X)

• Since P(X) is constant for all classes, only P(X|Ci) · P(Ci) needs to be maximized
43. Derivation of Naïve Bayes Classifier
• A simplified assumption:
– attributes are conditionally independent (i.e., no dependence
relation between attributes):
– This greatly reduces the computation cost: Only counts the
class distribution
  P(X|Ci) = Π_{k=1..n} P(x_k|Ci) = P(x1|Ci) · P(x2|Ci) · … · P(xn|Ci)
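Under this assumption, training reduces to counting and prediction to a product of relative frequencies. A minimal sketch (function names are mine; no smoothing, so the zero-probability issue addressed on slide 46 still applies):

from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate P(Ci) and P(xk|Ci) by relative frequencies (counting)."""
    prior = Counter(labels)
    cond = defaultdict(Counter)            # (class, attribute) -> value counts
    for row, c in zip(rows, labels):
        for attr, val in row.items():
            cond[(c, attr)][val] += 1
    return prior, cond

def predict(prior, cond, x):
    """Pick the class Ci maximizing P(Ci) * product over k of P(xk|Ci)."""
    n = sum(prior.values())
    best, best_p = None, -1.0
    for c, nc in prior.items():
        p = nc / n                          # P(Ci)
        for attr, val in x.items():
            p *= cond[(c, attr)][val] / nc  # P(xk|Ci); zero if value never seen
        if p > best_p:
            best, best_p = c, p
    return best

rows = [{"student": "yes"}, {"student": "no"}, {"student": "yes"}]
prior, cond = train_nb(rows, ["yes", "yes", "no"])
print(predict(prior, cond, {"student": "no"}))   # -> 'yes'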
44. Naïve Bayesian Classifier: Training Dataset
Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’
Data sample
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
46. Avoiding the 0-Probability Problem
• Naïve Bayesian prediction requires each conditional prob. be non-zero.
Otherwise, the predicted prob. will be zero
• Ex. Suppose a dataset with 1,000 tuples: income = low (0 tuples), income = medium (990), and income = high (10)
• Use Laplacian correction
– Adding 1 to each case
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
– The “corrected” prob. estimates are close to their “uncorrected” counterparts
  P(X|Ci) = Π_{k=1..n} P(x_k|Ci)
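In code, the correction just adds 1 to each value count and adds the number of distinct values to the denominator. A quick check against the numbers above:

counts = {"low": 0, "medium": 990, "high": 10}
n, v = sum(counts.values()), len(counts)     # 1000 tuples, 3 distinct income values
for value, c in counts.items():
    print(value, f"{c + 1}/{n + v}")         # low 1/1003, medium 991/1003, high 11/1003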
47. Naïve Bayesian Classifier: Comments
• Advantages
– Easy to implement
– Good results obtained in most of the cases
• Disadvantages
– Assumption: class conditional independence, therefore loss of
accuracy
– Practically, dependencies exist among variables
• E.g., hospitals: patients: Profile: age, family history, etc.
Symptoms: fever, cough etc., Disease: lung cancer, diabetes, etc.
• Dependencies among these cannot be modeled by Naïve Bayesian
Classifier
48. Naive Bayesian Classifier Example
Outlook Temperature Humidity Windy Class
sunny hot high false N
sunny hot high true N
overcast hot high false P
rain mild high false P
rain cool normal false P
rain cool normal true N
overcast cool normal true P
sunny mild high false N
sunny cool normal false P
rain mild normal false P
sunny mild normal true P
overcast mild high true P
overcast hot normal false P
rain mild high true N
play tennis?
49. Naive Bayesian Classifier Example
Tuples with Class N (5 of 14):

Outlook Temperature Humidity Windy Class
sunny hot high false N
sunny hot high true N
rain cool normal true N
sunny mild high false N
rain mild high true N

Tuples with Class P (9 of 14):

Outlook Temperature Humidity Windy Class
overcast hot high false P
rain mild high false P
rain cool normal false P
overcast cool normal true P
sunny cool normal false P
rain mild normal false P
sunny mild normal true P
overcast mild high true P
overcast hot normal false P
50. Naive Bayesian Classifier Example
• Given the training set, we compute the probabilities:
• We also have the prior probabilities
– P(P) = 9/14
– P(N) = 5/14
Attribute | Value | P(value | P) | P(value | N)
Outlook | sunny | 2/9 | 3/5
Outlook | overcast | 4/9 | 0
Outlook | rain | 3/9 | 2/5
Temperature | hot | 2/9 | 2/5
Temperature | mild | 4/9 | 2/5
Temperature | cool | 3/9 | 1/5
Humidity | high | 3/9 | 4/5
Humidity | normal | 6/9 | 1/5
Windy | true | 3/9 | 3/5
Windy | false | 6/9 | 2/5
51. Naive Bayesian Classifier Example
• To classify a new sample X:
– outlook = sunny
– temperature = cool
– humidity = high
– windy = false
• Prob(P|X) = Prob(P)*Prob(sunny|P)*Prob(cool|P)*
Prob(high|P)*Prob(false|P) = 9/14*2/9*3/9*3/9*6/9 = 0.01
• Prob(N|X) = Prob(N)*Prob(sunny|N)*Prob(cool|N)*
Prob(high|N)*Prob(false|N) = 5/14*3/5*1/5*4/5*2/5 = 0.013
• Therefore X takes class label N
52. Naive Bayesian Classifier Example
• Second example X = <rain, hot, high, false>
• P(X|p)·P(p) =
P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9·2/9·3/9·6/9·9/14 = 0.010582
• P(X|n)·P(n) =
P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5·2/5·4/5·2/5·5/14 = 0.018286
• Sample X is classified in class N (don’t play)
53. Lazy vs. Eager Learning
• Lazy vs. eager learning
– Lazy learning (e.g., instance-based learning): Simply stores training data
(or only minor processing) and waits until it is given a test tuple
– Eager learning: Given a training set, constructs a classification model before receiving new (e.g., test) data to classify
• Lazy: less time in training but more time in predicting
• Accuracy
– Lazy method effectively uses a richer hypothesis space since it uses many
local linear functions to form its implicit global approximation to the
target function
– Eager: must commit to a single hypothesis that covers the entire instance
space
54. Lazy Learner: Instance-Based Methods
• Instance-based learning:
– Store training examples and delay the processing (“lazy
evaluation”) until a new instance must be classified
– Typical approach: k-nearest neighbor approach
– Instances represented as points in a Euclidean space.
55. The k-Nearest Neighbor Algorithm
• All instances correspond to points in the n-D space
• The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2)
• Target function could be discrete- or real- valued
[Figure: a query point xq in a two-dimensional space of training examples labeled + and −; its class is decided by its k nearest neighbors.]
56. Contd…
• For discrete-valued, k-NN returns the most common value among the k
training examples nearest to xq
• k-NN for real-valued prediction for a given unknown tuple
– Returns the mean values of the k nearest neighbors
• Distance-weighted nearest neighbor algorithm
– Weight the contribution of each of the k neighbors according to their
distance to the query xq
• Give greater weight to closer neighbors, e.g., w = 1 / d(xq, xi)² (see the sketch below)
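A sketch of both voting schemes for a discrete-valued target (pure Python; the small constant guarding against a zero distance is my addition):

from collections import Counter
from math import dist            # Euclidean distance (Python 3.8+)

def knn_predict(train, query, k=3, weighted=False):
    """train: list of (point, label) pairs; query: a tuple of coordinates."""
    neighbors = sorted(train, key=lambda p: dist(p[0], query))[:k]
    votes = Counter()
    for point, label in neighbors:
        d = dist(point, query)
        votes[label] += 1 / (d * d + 1e-12) if weighted else 1   # w = 1 / d(xq, xi)^2
    return votes.most_common(1)[0][0]

train = [((0, 0), "-"), ((1, 0), "-"), ((4, 4), "+"), ((5, 4), "+"), ((4, 5), "+")]
print(knn_predict(train, (4.5, 4.5), k=3))                 # -> '+'
print(knn_predict(train, (0.5, 0.0), k=3, weighted=True))  # -> '-'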
57. Contd…
• Advantages:
– Robust to noisy data by averaging k-nearest neighbors
• Disadvantages:
– Curse of dimensionality:
• distance between neighbors could be dominated by irrelevant
attributes
• To overcome it, eliminate the least relevant attributes
58. Classification by Backpropagation
• Backpropagation: A neural network learning algorithm
• Started by psychologists and neurobiologists to develop and test
computational analogues of neurons
• A neural network: A set of connected input/output units where each
connection has a weight associated with it
• During the learning phase, the network learns by adjusting the
weights so as to be able to predict the correct class label of the input
tuples
• Also referred to as connectionist learning due to the connections
between units
59. Neural Network as a Classifier
• Weakness
– Long training time
– Require a number of parameters typically best determined empirically, e.g., the network topology or “structure”
– Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of “hidden units” in the network
• Strength
– High tolerance to noisy data
– Ability to classify untrained patterns
– Well-suited for continuous-valued inputs and outputs
– Successful on a wide array of real-world data
– Algorithms are inherently parallel
– Techniques have recently been developed for the extraction of rules from trained
neural networks
60. A Neuron (= a perceptron)
• The n-dimensional input vector x is mapped into variable y by means of the scalar
product and a nonlinear function mapping
[Figure: a perceptron. The input vector x = (x0, x1, …, xn) is combined with the weight vector w = (w0, w1, …, wn) in a weighted sum, and a nonlinear activation function f maps the result to the output y.]

For example: y = sign(Σ_{i=0..n} w_i · x_i − b_j), where b_j is the bias (threshold).
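In code, the unit above is a dot product, a bias subtraction, and an activation (a minimal sketch using sign as the activation f):

def perceptron(x, w, b):
    """y = sign(sum over i of w_i * x_i - b)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if s >= 0 else -1

print(perceptron([1.0, 0.5], [0.4, 0.6], b=0.5))   # 0.4 + 0.3 - 0.5 = 0.2 -> 1
print(perceptron([0.2, 0.1], [0.4, 0.6], b=0.5))   # 0.08 + 0.06 - 0.5 < 0 -> -1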
61. A Multi-Layer Feed-Forward Neural Network
[Figure: a multi-layer feed-forward network. The input vector X enters the input layer; weighted connections w_ij feed the hidden layer, whose weighted outputs feed the output layer; the output layer emits the output vector.]
62. How Does a Multi-Layer Neural Network Work?
• The inputs to the network correspond to the attributes measured for each
training tuple
• Inputs are fed simultaneously into the units making up the input layer
• They are then weighted and fed simultaneously to a hidden layer
• The number of hidden layers is arbitrary, although usually only one
• The weighted outputs of the last hidden layer are input to units making up
the output layer, which emits the network's prediction
• The network is feed-forward in that none of the weights cycles back to an
input unit or to an output unit of a previous layer
• From a statistical point of view, networks perform nonlinear regression:
Given enough hidden units and enough training samples, they can closely
approximate any function
63. Defining a Network Topology
• First decide the network topology: # of units in the input layer, # of
hidden layers (if > 1), # of units in each hidden layer, and # of units in
the output layer
• Normalizing the input values for each attribute measured in the
training tuples to [0.0—1.0]
• One input unit per domain value, each initialized to 0
• For classification with more than two classes, one output unit per class is used
• If the trained network's accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights
64. Backpropagation
• Iteratively process a set of training tuples & compare the network's prediction
with the actual known target value
• For each training tuple, the weights are modified to minimize the mean
squared error between the network's prediction and the actual target value
• Modifications are made in the “backwards” direction: from the output layer,
through each hidden layer down to the first hidden layer, hence
“backpropagation”
• Steps (see the sketch below):
– Initialize weights (to small random #s) and biases in the network
– Propagate the inputs forward (by applying activation function)
– Backpropagate the error (by updating weights and biases)
– Terminating condition (when error is very small, etc.)
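The four steps map onto a short numpy sketch. Everything concrete below (one hidden layer of four sigmoid units, XOR as toy data, learning rate 0.5) is an illustrative assumption rather than part of the slide, and with an unlucky initialization plain gradient descent can stall:

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # toy inputs (XOR)
t = np.array([[0], [1], [1], [0]], dtype=float)               # known target values

# Step 1: initialize weights to small random numbers, biases to zero.
W1, b1 = rng.normal(0.0, 0.5, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0.0, 0.5, (4, 1)), np.zeros(1)
lr = 0.5

for epoch in range(20_000):
    # Step 2: propagate the inputs forward (apply the activation function).
    h = sigmoid(X @ W1 + b1)             # hidden-layer outputs
    y = sigmoid(h @ W2 + b2)             # network's prediction
    # Step 3: backpropagate the error, updating weights and biases
    # (gradients of the squared error, output layer first).
    d_y = (y - t) * y * (1 - y)
    d_h = (d_y @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_y;  b2 -= lr * d_y.sum(axis=0)
    W1 -= lr * X.T @ d_h;  b1 -= lr * d_h.sum(axis=0)
    # Step 4: terminating condition, i.e. the error is very small.
    if np.mean((y - t) ** 2) < 1e-3:
        break

print(np.round(y.ravel(), 2))            # close to [0, 1, 1, 0] once converged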
65. Issues regarding classification and prediction (1):
Data Preparation
• Data cleaning
– Preprocess data in order to reduce noise and handle missing values
• Relevance analysis (feature selection)
– Remove the irrelevant or redundant attributes
• Data transformation
– Generalize and/or normalize data
• numerical attribute income → categorical {low, medium, high} (both transformations are sketched below)
• normalize all numerical attributes to [0,1)
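Both transformations fit in a few lines (a sketch; the 80K and 200K bin edges for income are illustrative assumptions, not from the slides):

incomes = [125, 100, 70, 120, 95]                 # annual income, in thousands

# Generalize: numerical income -> categorical {low, medium, high} (assumed bins).
categories = ["low" if x < 80 else "medium" if x < 200 else "high" for x in incomes]

# Normalize: min-max scale a numerical attribute into [0, 1).
lo, hi = min(incomes), max(incomes)
scaled = [(x - lo) / (hi - lo + 1e-9) for x in incomes]

print(categories)                       # ['medium', 'medium', 'low', 'medium', 'medium']
print([round(s, 2) for s in scaled])    # [1.0, 0.55, 0.0, 0.91, 0.45]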
66. Issues regarding classification and prediction (2):
Evaluating Classification Methods
• Accuracy
• Speed
– time to construct the model
– time to use the model
• Robustness
– handling noise and missing values
• Scalability
– efficiency in disk-resident databases
• Interpretability:
– understanding and insight provided by the model
• Goodness of rules (quality)
– decision tree size
– compactness of classification rules
67. Evaluation methods
• Holdout Method: The available data set D is divided into two
disjoint subsets,
– the training set Dtrain (for learning a model)
– the test set Dtest (for testing the model)
• Important: training set should not be used in testing and the
test set should not be used in learning.
– An unseen test set provides an unbiased estimate of accuracy.
• The test set is also called the holdout set. (the examples in the
original data set D are all labeled with classes.)
• This method is mainly used when the data set D is large.
68. Evaluation methods (cont…)
• k-fold cross-validation: The available data is partitioned
into k equal-size disjoint subsets.
• Use each subset as the test set and combine the remaining k−1 subsets as the training set to learn a classifier (a code sketch follows this list).
• The procedure is run k times, which give k accuracies.
• The final estimated accuracy of learning is the average of
the k accuracies.
• 10-fold and 5-fold cross-validations are commonly used.
• This method is used when the available data is not large.
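A sketch of the partition-and-rotate loop described above (pure Python; learn and accuracy are hypothetical stand-ins for any classifier's training and evaluation functions):

def k_fold_accuracy(data, k, learn, accuracy):
    """Average accuracy over k rotations: each fold is the test set exactly once."""
    folds = [data[i::k] for i in range(k)]   # k roughly equal disjoint subsets
    scores = []                              # (assumes data is already shuffled)
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(accuracy(learn(train), test))
    return sum(scores) / k

With k = len(data), each fold holds a single example and the procedure becomes the leave-one-out method of the next slide.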
69. Evaluation methods (cont…)
• Leave-one-out cross-validation: This method is used when
the data set is very small.
• It is a special case of cross-validation
• Each fold of the cross validation has only a single test
example and all the rest of the data is used in training.
• If the original data has m examples, this is m-fold cross-
validation
70. Evaluation methods (cont…)
• Validation set: the available data is divided into three subsets,
– a training set,
– a test set and
– a validation set
• A validation set is used frequently for estimating parameters in
learning algorithms.
• In such cases, the values that give the best accuracy on the
validation set are used as the final parameter values.
• Cross-validation can be used for parameter estimating as well.
71. Home Work
• What is supervised classification? In what situations can this technique be
useful?
• Briefly outline the major steps of decision tree classification.
• Why is naïve Bayesian classification called naïve? Briefly outline the major idea of naïve Bayesian classification.
• Compare the advantages and disadvantages of eager classification versus lazy
classification.
• Write an algorithm for k-nearest-neighbor classification given k, the number of nearest neighbors, and n, the number of attributes describing each tuple.