Introduction to Text Mining & Support Vector Machines (SVM)



Dr. Anton Heijs, CEO
Treparel
Delftechpark 26
2628 XH Delft
The Netherlands
www.treparel.com
July 2012
KMX enables information and knowledge professionals
to gain faster, reliable, more precise insights in large
complex unstructured data sets allowing them to make
better informed decisions.




                   Treparel is a leading technology solution provider in
                         Big Data Text Analytics & Visualization

Treparel KMX – All rights reserved 2012   www.treparel.com                 2
Topics covered in this presentation


         • Who is Treparel?
         • Introduction to Text Mining
         • What is Automated Classification & Clustering?
         • Introducing Support Vector Machines




Nexus of Forces: Social, Cloud, Mobile, Information
         IT Market shift driving Big Data challenges
                                                                                 Copyright: Gartner, 2011




                 80% of data is Unstructured (Documents, Text, Images, Graphs)



About Treparel

         • Delft, The Netherlands, 2006.
         • Treparel is an innovative technology solution provider in Big Data
           Analytics, Text Mining and Visualization.
         • KMX is an integrated data analysis toolset which provides faster,
           reliable, intelligent insights into large, complex unstructured data sets to
           allow companies to make better informed decisions.
         • Clients: Philips, Bayer, Abbott, European Patent Office, European
           Commission
         • Part of Research Centers and University ecosystem; TU Delft,
           Universities of Paris and Sao Paulo
         • More info: www.treparel.com




Positioning of Treparel’s KMX technology

Text Acquisition & Preparation (‘Seek’)
         • External sources: Patents, Legal, Research, Media / Publishers
         • Other sources: Documents, Websites, Blogs, Newsfeeds, Email, Application notes, Search results, Social networks

Analysis and processing (‘Model’)
         • Text preprocessing, Indexing, Clustering, Classification, Semantic Analysis, Visualization

Output and display (‘Adapt’)
         • Reporting & Presentation, Media and publishing databases, Content management systems, Line-of-business applications, Research applications, Search engines

Spanning the stages
         • Information extraction (entities, facts, relationships, concepts, patents)
         • Management, Development and Configuration

Copyright: Gartner, J. Popkin 2010
Getting to know the basics

        PART A: Intro to Text Mining
        • The Data (text & image) Mining evolution
        • What is Data Mining: inside or outside the database
        • The Data Mining process
        • Two types of Data Mining tasks: Predictive and Descriptive
        • Two modes of Data Mining tasks: Supervised and Unsupervised
        • The most important algorithms per category


        PART B: SVM
        • Machine Learning & Support Vector Machines (SVM)
        • What makes SVM unique
        • When and How to deploy SVM
        • Case Studies & Examples


The Data/Text/Image mining evolution
         The Road ahead
         [Figure: Application Value (Low to High) plotted against Usability (Hard to use to Easy to Use). Labels on the chart: Traditional Data Mining, OLAP, Query and Reporting, “Easy-to-Use” Data Mining Tools, SVM Predictive Modeling, Analytical Modeling, Enterprise Text Analytics. Time markers: 1980’s, 1990’s, 1995 - 2000, Today, Future.]

Knowledge Mining
         Different levels of depth in knowledge discovery

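         [Figure: levels of knowledge discovery plotted over Time, running from Data Collection (Seek) to Visualization (Adapt). Levels shown: Filtered data, Meta Data, Models of meta data, Models of data, Models of semantic data. Other labels: Data Mining, Text Mining, Graph Mining, Knowledge Discovery.]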
What is Data Mining?
           Getting to know the basics
        • Most businesses have an enormous amount of data, with a great deal of
          information hidden within it. The data is also growing faster than the
          knowledge extracted from it, which leads to a growing gap between
          data and knowledge.
        • Data mining provides a way to automatically extract information buried in the
          data.
        • Data mining creates mathematical models which describe patterns in large,
          complex collections of data.
        • These patterns elude traditional statistical approaches to analysis because of the large
          number of attributes, the complexity of the patterns, or the difficulty of performing
          the analysis.
        • Mining the data directly in the database has advantages:
          less data movement, more data security, one source of the
          data.
        • Basically, two types of data exist:
              – Structured (tables & numbers): 20% of data volume
              – Unstructured (text, images): 80% of data volume




The Data & Text Mining process
            Automating the mining steps; adding new features

                    Understanding the knowledge mining value chain

         [Figure: process chain] Data Collection & Understanding -> Data Preparation & Cleansing -> Algorithm Selection -> Model Building & Testing -> Model Deployment -> Model generation (All models) & coordination -> Visualization

         Markers below the chain: "Traditional Players" (under the earlier steps) and "Treparel's Focus & Core competence" (under the later steps).


2 types of Data Mining Functions
         Predictive Data Mining (supervised):
         •    Used to predict a value; they require the specification of a
              target (known outcome)
         •    Targets are either binary attributes (indicating yes/no decisions) or
              multi-class targets indicating a preferred alternative (color of
              sweater, salary range).
         •    Constructs one or more models; these models are used to predict
              outcomes for data sets
         Descriptive Data Mining (Unsupervised):
         •    Are used to find the intrinsic structure, relations, or affinities in
              data.
         •    Describes a data set in a concise way and presents interesting
              characteristics of the data
         •    The functions are: clustering, association models, and feature
              extraction

How does Automated Classification & Clustering work?
         • Consists of dividing the items that make up a collection into
           categories or classes.
         • The goal is to accurately predict the target class for each
           record in new data.
         • Algorithms for classification: different algorithms for
           different problems
                  Naïve Bayes
                  Adaptive Bayes Network
                  Support Vector Machine
                  Decision Tree


            Classification is used in: customer segmentation, sentiment
                analysis, competitive analysis, business modeling, credit
                 analysis, Smart content, Fraud and terrorist detection,
                        Diagnosis support, Patent & Drug discovery
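
As a rough illustration of the classification task described on this slide, the sketch below trains a tiny Naive Bayes text classifier in plain Python and predicts the target class of new text. The training documents, the class labels "energy" and "pharma", and the helper names tokenize, train_naive_bayes and predict are all hypothetical; none of them come from KMX or from the presentation.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()

def train_naive_bayes(documents, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary

def predict(model, text):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total_docs)          # class prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue                              # skip words never seen in training
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set (assumption, not from the presentation).
train_docs = [
    "patent claims a new battery electrode",
    "battery cell with improved electrode coating",
    "clinical trial of a new drug compound",
    "drug dosage study in clinical patients",
]
train_labels = ["energy", "energy", "pharma", "pharma"]
model = train_naive_bayes(train_docs, train_labels)
```

Calling predict(model, "new drug trial") assigns the new record to one of the two known classes, which is the "predict the target class for each record in new data" step in miniature. A real system would use far more training data and a proper tokenizer.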
Text Mining algorithms and features

         Feature                        Naive Bayes            Adaptive Bayes Network   Support Vector Machine      Decision Tree
         Speed                          Very fast              Fast                     Fast with active learning   Fast
         Accuracy                       Good in many domains   Good in many domains     Significant                 Good in many domains
         Transparency                   No rules (black box)   Rules for                No rules (black box)        Rules
         Missing value interpretation   Missing value          Missing value            Sparse Data                 Missing value




What is Support Vector Machine Learning?
        State of the Art algorithm
        • SVM is a state of the art classification and regression algorithm
        • The SVM optimization procedure maximizes predictive accuracy
          while automatically avoiding over-fitting the training data
        • SVM projects the input data into a kernel space. Then it builds a
          linear model in this kernel space
        • SVM performs well with real world applications such as
          classifying text, recognizing hand-written characters, classifying
          images, as well as bioinformatics and bio sequence analysis.
        • SVMs are a standard tool for machine learning and data mining
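
The kernel-space idea in the bullets above can be sketched with a small example. The code below shows, under stated assumptions, that a degree-2 polynomial kernel equals an ordinary dot product in an explicit feature space, and that XOR-style data becomes linearly separable there. The data points, the hand-picked weights and the function names are illustrative assumptions; the weights are chosen by hand and are not the result of SVM training.

```python
import math

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit feature space in which poly_kernel is a plain dot product."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# XOR-style data: no straight line in the input plane separates the classes.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]

# A linear model in the feature space (hand-picked, not trained):
# it only weights the x1*x2 coordinate.
weights, bias = (0.0, 1.0, 0.0), 0.0

def predict(x):
    return 1 if dot(weights, feature_map(x)) + bias > 0 else -1

predictions = [predict(p) for p in points]
```

Here predictions matches labels, so a model that is linear in the feature space separates data that is not linearly separable in the input space. In practice the kernel function is evaluated directly and the feature map is never built explicitly.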




What is Support Vector Machine Learning?
                 Classical Data Mining vs SVM

         • Classical Statistics: Hypothesis on data distribution
           SVM: Study of the model family: the VC dimension
         • Classical Statistics: Large number of dimensions implies large number of model parameters, which leads to generalization problems
           SVM: Number of dimensions can be very high because generalization is controlled
         • Classical Statistics: Modeling seeks to get the best Fit
           SVM: Modeling seeks to get the best compromise between Fit and Robustness
         • Classical Statistics: Manual iterations and time are necessary
           SVM: Automation is possible
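
The "compromise between Fit and Robustness" can be made concrete with the standard soft-margin objective for a linear SVM: a regularization term that keeps the weights small (robustness) plus a hinge-loss term that penalizes training points inside the margin (fit). The sketch below minimizes that objective with plain full-batch subgradient descent. The toy data, the parameter names reg, learning_rate and epochs, and the choice of optimizer are assumptions made for illustration; production SVM solvers typically use more specialized optimization methods.

```python
def train_linear_svm(points, labels, reg=0.01, learning_rate=0.1, epochs=100):
    """Subgradient descent on  reg/2 * |w|^2 + mean(hinge loss)."""
    dim = len(points[0])
    n = len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [reg * wj for wj in w]      # robustness term: keep weights small
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:                   # fit term: point is inside the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

def classify(w, b, x):
    """Sign of the linear decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data (assumption).
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
```

Raising reg puts more weight on robustness and lowering it puts more weight on fitting the training data. That single parameter is one way to read the fit-versus-robustness trade-off named in the comparison above.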



What makes SVM such a unique technology?
         • Strong theoretical foundation (Vapnik-Chervonenkis theory)
         • There is no upper limit on the number of attributes; the only constraint is
           the hardware
         • Good generalization to novel data
         • SVM is the preferred algorithm for sparse data
         • Algorithm of choice for challenging high-dimensional data
         • SVM supports active learning.
               – SVM models grow as the size of the training set increases, so big data
                 sets would otherwise be difficult to handle.
               – Active learning forces the SVM algorithm to restrict learning to the
                 most informative training examples.
         • SVM automatically selects a kernel
         • You can control both the model quality (accuracy) and the performance
           (build time)
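
The slide does not say how the "most informative training examples" are chosen. One common heuristic, assumed here purely for illustration, is to pick the candidates closest to the current decision boundary, since the model is least certain about them. The weights, the candidate pool and the function names below are hypothetical.

```python
def decision_value(w, b, x):
    """Signed value of a linear decision function w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def most_informative(w, b, candidates, k):
    """Return the k candidates closest to the decision boundary."""
    return sorted(candidates, key=lambda x: abs(decision_value(w, b, x)))[:k]

# Hypothetical current model and unlabeled candidate pool (assumptions).
w, b = [1.0, -1.0], 0.0
pool = [(5.0, 0.0), (1.0, 0.9), (0.0, 4.0), (2.0, 2.5), (-3.0, 1.0)]
selected = most_informative(w, b, pool, k=2)
```

In an active-learning loop, the selected examples would be labeled and added to the training set before the model is rebuilt, so the training set stays small while still covering the hard cases.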

What makes SVM unique?
         SVM gives you control over the models
         [Figure: Robustness (Low to High) plotted against Quality of fit (Low accuracy to High accuracy)]
         • Under Fit Model: High Robustness, Training Error = Test Error
         • Robust Model: Low Training Error, Low Test Error
         • Over Fit Model: Low Robustness, No Training Error, High Test Error
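
The three regions of this chart can be read as a simple rule on training and test error. The sketch below encodes that reading. The thresholds max_error and max_gap and the function name diagnose_fit are assumptions chosen for illustration, not values from the presentation.

```python
def diagnose_fit(train_error, test_error, max_error=0.1, max_gap=0.05):
    """Label a model using the three regions of the robustness chart."""
    gap = test_error - train_error
    if gap > max_gap:
        return "over fit"     # low robustness: test error far above training error
    if train_error > max_error:
        return "under fit"    # robust but inaccurate: errors similar and high
    return "robust"           # low training error and low test error
```

For example, a model with no training error but 30% test error falls in the "over fit" region, while one with 25% training error and 27% test error is "under fit".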
What makes SVM unique?
         SVM gives you control over the models




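         [Figure: Robustness (Low to High) plotted against Quality (Low to High)]
         • High Robustness, Low Quality: Need more training data (rows)
         • High Robustness, High Quality: Safe to Deploy
         • Low Robustness, Low Quality: Need more data (rows/columns) or different model type
         • Low Robustness, High Quality: Need more variables (columns) or different model type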

Treparel is a leading technology solution provider
       in Big Data Text Analytics & Visualization


                                              Treparel
                                           Delftechpark 26
                                            2628 XH Delft
                                          The Netherlands
                                          www.treparel.com



More Related Content

What's hot

[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...
[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...
[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...
Insight Technology, Inc.
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Rio Info
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
Khalid Salama
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
Dony Riyanto
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
neelamoberoi1030
 
II-SDV 2017: The Next Era: Deep Learning for Biomedical Research
II-SDV 2017: The Next Era: Deep Learning for Biomedical ResearchII-SDV 2017: The Next Era: Deep Learning for Biomedical Research
II-SDV 2017: The Next Era: Deep Learning for Biomedical Research
Dr. Haxel Consult
 
Big Data Analytics(Intro,Hadoop Map Reduce,Mahout,K-means clustering,H-base)
Big Data Analytics(Intro,Hadoop Map Reduce,Mahout,K-means clustering,H-base)Big Data Analytics(Intro,Hadoop Map Reduce,Mahout,K-means clustering,H-base)
Big Data Analytics(Intro,Hadoop Map Reduce,Mahout,K-means clustering,H-base)
MIT College Of Engineering,Pune
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
Devakumar Jain
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
Venkata Reddy Konasani
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
Machine Learning - Intro
Machine Learning - IntroMachine Learning - Intro
Machine Learning - Intro
Giorgio Alfredo Spedicato
 

What's hot (11)

[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...
[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...
[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Da...
 
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...Big data: Descoberta de conhecimento em ambientes de big data e computação na...
Big data: Descoberta de conhecimento em ambientes de big data e computação na...
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
II-SDV 2017: The Next Era: Deep Learning for Biomedical Research
II-SDV 2017: The Next Era: Deep Learning for Biomedical ResearchII-SDV 2017: The Next Era: Deep Learning for Biomedical Research
II-SDV 2017: The Next Era: Deep Learning for Biomedical Research
 
Big Data Analytics(Intro,Hadoop Map Reduce,Mahout,K-means clustering,H-base)
Big Data Analytics(Intro,Hadoop Map Reduce,Mahout,K-means clustering,H-base)Big Data Analytics(Intro,Hadoop Map Reduce,Mahout,K-means clustering,H-base)
Big Data Analytics(Intro,Hadoop Map Reduce,Mahout,K-means clustering,H-base)
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 
Machine Learning - Intro
Machine Learning - IntroMachine Learning - Intro
Machine Learning - Intro
 

Viewers also liked

Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
Prakash Pimpale
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
nextlib
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
Albert Orriols-Puig
 
Support Vector machine
Support Vector machineSupport Vector machine
Support Vector machine
Anandha L Ranganathan
 
Support Vector Machine
Support Vector MachineSupport Vector Machine
Support Vector Machine
Shao-Chuan Wang
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
Musa Hawamdah
 
Support Vector Machine without tears
Support Vector Machine without tearsSupport Vector Machine without tears
Support Vector Machine without tears
Ankit Sharma
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector Machine
Shao-Chuan Wang
 
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...
osify
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier
Dev Sahu
 
Linear regression without tears
Linear regression without tearsLinear regression without tears
Linear regression without tears
Ankit Sharma
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
Data Science Society
 
Support Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom DatasetSupport Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom Dataset
Pawandeep Kaur
 
Sentiment Analysis Using Machine Learning
Sentiment Analysis Using Machine LearningSentiment Analysis Using Machine Learning
Sentiment Analysis Using Machine Learning
Nihar Suryawanshi
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
Ankur Tyagi
 
09 Machine Learning - Introduction Support Vector Machines
09 Machine Learning - Introduction Support Vector Machines09 Machine Learning - Introduction Support Vector Machines
09 Machine Learning - Introduction Support Vector Machines
Andres Mendez-Vazquez
 
k Nearest Neighbor
k Nearest Neighbork Nearest Neighbor
k Nearest Neighbor
butest
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
CJ Jenkins
 
Decision Trees
Decision TreesDecision Trees
Backpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural NetworkBackpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural Network
Hiroshi Kuwajima
 

Viewers also liked (20)

Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
 
Support Vector machine
Support Vector machineSupport Vector machine
Support Vector machine
 
Support Vector Machine
Support Vector MachineSupport Vector Machine
Support Vector Machine
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Support Vector Machine without tears
Support Vector Machine without tearsSupport Vector Machine without tears
Support Vector Machine without tears
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector Machine
 
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...
Support Vector Machine (SVM) Based Classifier For Khmer Printed Character-set...
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier
 
Linear regression without tears
Linear regression without tearsLinear regression without tears
Linear regression without tears
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Support Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom DatasetSupport Vector Machine(SVM) with Iris and Mushroom Dataset
Support Vector Machine(SVM) with Iris and Mushroom Dataset
 
Sentiment Analysis Using Machine Learning
Sentiment Analysis Using Machine LearningSentiment Analysis Using Machine Learning
Sentiment Analysis Using Machine Learning
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
09 Machine Learning - Introduction Support Vector Machines
09 Machine Learning - Introduction Support Vector Machines09 Machine Learning - Introduction Support Vector Machines
09 Machine Learning - Introduction Support Vector Machines
 
k Nearest Neighbor
k Nearest Neighbork Nearest Neighbor
k Nearest Neighbor
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Backpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural NetworkBackpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural Network
 

Similar to Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012

Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London Seminar
Hortonworks
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the Future
Odinot Stanislas
 
Crowd-Sourced Intelligence Built into Search over Hadoop
Crowd-Sourced Intelligence Built into Search over HadoopCrowd-Sourced Intelligence Built into Search over Hadoop
Crowd-Sourced Intelligence Built into Search over Hadoop
DataWorks Summit
 
Getting Cloud Architecture Right the First Time Ver 2
Getting Cloud Architecture Right the First Time Ver 2Getting Cloud Architecture Right the First Time Ver 2
Getting Cloud Architecture Right the First Time Ver 2
David Linthicum
 
Hadoop summit EU - Crowd Sourcing Reflected Intelligence
Hadoop summit EU - Crowd Sourcing Reflected IntelligenceHadoop summit EU - Crowd Sourcing Reflected Intelligence
Hadoop summit EU - Crowd Sourcing Reflected Intelligence
Ted Dunning
 
Best practices for building and deploying predictive models over big data pre...

Support Vector Machines (SVM) - Text Analytics algorithm introduction 2012

  • 1. Introduction to Text Mining & Support Vector Machines (SVM). Dr. Anton Heijs, CEO, Treparel. July 2012. Treparel, Delftechpark 26, 2628 XH Delft, The Netherlands. www.treparel.com
  • 2. KMX enables information and knowledge professionals to gain faster, reliable, more precise insights in large complex unstructured data sets allowing them to make better informed decisions. Treparel is a leading technology solution provider in Big Data Text Analytics & Visualization Treparel KMX – All rights reserved 2012 www.treparel.com 2
  • 3. Topics covered in this presentation
      • Who is Treparel?
      • Introduction to Text Mining
      • What is Automated Classification & Clustering?
      • Introducing Support Vector Machines
  • 4. Nexus of Forces: Social, Cloud, Mobile, Information. IT market shift driving Big Data challenges (Copyright: Gartner, 2011). 80% of data is unstructured (documents, text, images, graphs).
  • 5. About Treparel
      • Delft, The Netherlands, 2006.
      • Treparel is an innovative technology solution provider in Big Data Analytics, Text Mining and Visualization.
      • KMX is an integrated data analysis toolset which provides faster, reliable, intelligent insights in large complex unstructured data sets to allow companies to make better informed decisions.
      • Clients: Philips, Bayer, Abbott, European Patent Office, European Commission
      • Part of Research Centers and University ecosystem: TU Delft, Universities of Paris and Sao Paulo
      • More info: www.treparel.com
  • 6. Positioning of Treparel’s KMX technology
      [Diagram, Copyright: Gartner, J. Popkin 2010. Three stages sit on a layer labelled Management, Development and Configuration.]
      • Text Acquisition & Preparation (‘Seek’), external sources: Patents, Legal, Research, Media / Publishers, Other sources, Documents, Websites, Blogs, Newsfeeds, Email, Application notes, Search results, Social networks
      • Analysis and processing (‘Model’): Text preprocessing, Indexing, Clustering, Classification, Semantic Analysis, Visualization, Information extraction (entities, facts, relationships, concepts, patents)
      • Output and display (‘Adapt’): Reporting & Presentation, Media and publishing databases, Content management systems, Line-of-business applications, Research applications, Search engines
  • 7. Getting to know the basics
      PART A: Intro to Text Mining
      • The Data (text & image) Mining evolution
      • What is Data Mining: inside or outside the database
      • The Data Mining process
      • Two types of Data Mining tasks: Predictive and Descriptive
      • Two modes of Data Mining tasks: Supervised and Unsupervised
      • The most important algorithms per category
      PART B: SVM
      • Machine Learning & Support Vector Machines (SVM)
      • What makes SVM unique
      • When and How to deploy SVM
      • Case Studies & Examples
  • 8. The Data/Text/Image mining evolution: The Road ahead
      [Chart: Application Value (low to high) against Usability (hard to use to easy to use). Plotted stages, running from the 1980's through today to the future, include Query and Reporting, OLAP, Traditional Data Mining, “Easy-to-Use” Data Mining Tools, SVM Predictive Modeling, Text Analytics and Enterprise Analytical Modeling.]
  • 9. Knowledge Mining: Different levels of depth in knowledge discovery
      [Diagram: levels plotted over Time, from Data Collection (Seek) up to Visualization (Adapt). Labels: Meta Data, Filtered data, Models of meta data, Models of data, Models of semantic data. Knowledge Discovery is shown as Data Mining, Text Mining and Graph Mining.]
  • 10. What is Data Mining? Getting to know the basics
      • Most businesses have an enormous amount of data, with a great deal of information hiding within it. The data is also growing faster than the knowledge now extracted from it, which leads to a growing gap between data and knowledge.
      • Data mining provides a way to automatically extract information buried in the data.
      • Data mining creates mathematical models which describe patterns in large, complex collections of data.
      • Patterns elude traditional statistical approaches to analysis because of the large number of attributes, the complexity of the patterns, or the difficulty of performing the analysis.
      • Mining the data directly in the database has advantages: less data movement, more data security, one source of the data.
      • Basically 2 types of data exist:
          – Structured (tables & numbers): 20% of data volume
          – Unstructured (text, images): 80% of data volume
  • 11. The Data & Text Mining process: Automating the mining steps; adding new features. Understanding the knowledge mining value chain
      [Diagram: value chain whose step labels include Data Collection, Data Preparation & Cleansing, Algorithm Selection, Model Building & Testing, Model Understanding, Model generation & coordination (all models), Deployment and Visualization. Legend: Treparel's Focus & Core competence; Traditional Players.]
  • 12. 2 types of Data Mining Functions
      Predictive Data Mining (supervised):
      • Predictive functions are used to predict a value; they require the specification of a target (known outcome).
      • Targets are either binary attributes (indicating yes/no decisions) or multi-class targets indicating a preferred alternative (color of sweater, salary range).
      • They construct one or more models; these models are used to predict outcomes for data sets.
      Descriptive Data Mining (unsupervised):
      • Descriptive functions are used to find the intrinsic structure, relations, or affinities in data.
      • They describe a data set in a concise way and present interesting characteristics of the data.
      • The functions are: clustering, association models, and feature extraction.
  • 13. How does Automated Classification & Clustering work?
      • Classification consists of dividing the items that make up a collection into categories or classes.
      • The goal is to accurately predict the target class for each record in new data.
      • Algorithms for classification (different algorithms for different problems):
          – Naïve Bayes
          – Adaptive Bayes Network
          – Support Vector Machine
          – Decision Tree
      Classification is used in: customer segmentation, sentiment analysis, competitive analysis, business modeling, credit analysis, smart content, fraud and terrorist detection, diagnosis support, patent & drug discovery.
  • 14. Text Mining algorithms and features
      Feature                      | Naive Bayes          | Adaptive Bayes Network | Support Vector Machine    | Decision Tree
      Speed                        | Very fast            | Fast                   | Fast with active learning | Fast
      Accuracy                     | Good in many domains | Good in many domains   | Significant               | Good in many domains
      Transparency                 | No rules (black box) | Rules for              | No rules (black box)      | Rules
      Missing value interpretation | Missing value        | Missing value          | Sparse data               | Missing value
  • 15. What is Support Vector Machine Learning? State of the Art algorithm
      • SVM is a state-of-the-art classification and regression algorithm.
      • The SVM optimization procedure maximizes predictive accuracy while automatically avoiding over-fitting the training data.
      • SVM projects the input data into a kernel space, then builds a linear model in this kernel space.
      • SVM performs well in real-world applications such as classifying text, recognizing hand-written characters, classifying images, as well as bioinformatics and bio sequence analysis.
      • SVMs are standard tools for machine learning and data mining.
  • 16. What is Support Vector Machine Learning? Classical Data Mining vs SVM
      Classical Statistics:
      • Hypothesis on data distribution
      • Large number of dimensions implies large number of model parameters, which leads to generalization problems
      • Modeling seeks to get the best fit
      • Manual iterations and time are necessary
      SVM (Support Vector Machines):
      • Study of the model family: the VC dimension
      • Number of dimensions can be very high because generalization is controlled
      • Modeling seeks to get the best compromise between fit and robustness
      • Automation is possible
  • 17. What makes SVM such a unique technology?
      • Strong theoretical foundation (Vapnik-Chervonenkis theory)
      • There is no upper limit on the number of attributes; the only constraint is the hardware
      • Good generalization to novel data
      • SVM is the preferred algorithm for sparse data
      • Algorithm of choice for challenging high-dimensional data
      • SVM supports active learning.
          – Because SVM models grow as the size of the training set increases, big data sets would otherwise be difficult to handle.
          – Active learning forces the SVM algorithm to restrict learning to the most informative training examples.
      • SVM automatically selects a kernel
      • You can control both the model quality (accuracy) and the performance (build time)
  • 18. What makes SVM unique? SVM gives you control over the models
      [Chart: Robustness (low to high) against Quality of fit (low accuracy to high accuracy).]
      • Under Fit Model: high robustness, training error = test error
      • Robust Model: high robustness, low training error, low test error
      • Over Fit Model: low robustness, no training error, high test error
  • 19. What makes SVM unique? SVM gives you control over the models
      [Chart: Robustness (low to high) against Quality (low to high). Quadrant labels: Safe to Deploy; Need more training data (rows); Need more variables (columns) or different model type; Need more data (rows/columns) or different model type.]
  • 20. Treparel is a leading technology solution provider in Big Data Text Analytics & Visualization. Treparel, Delftechpark 26, 2628 XH Delft, The Netherlands. www.treparel.com
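
The slides describe SVM as building a linear model for tasks such as text classification on sparse data. As a rough illustration only, the sketch below trains a linear SVM on bag-of-words counts using plain subgradient descent on the regularised hinge loss. It is not Treparel's KMX implementation: the function names, the four toy documents, the two labels (+1 for patent-like text, -1 for sports-like text) and the hyperparameters are all assumptions made for the example, and production SVM packages typically use a dedicated solver rather than this simple loop.

```python
from collections import Counter


def bag_of_words(text):
    """Map a document to sparse term counts (lower-cased, whitespace-split)."""
    return Counter(text.lower().split())


def score(weights, bias, features):
    """Linear decision value w . x + b over a sparse feature dict."""
    return sum(weights.get(t, 0.0) * c for t, c in features.items()) + bias


def train_linear_svm(docs, labels, epochs=20, lam=0.01, lr=0.1):
    """Subgradient descent on the L2-regularised hinge loss (hypothetical settings)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, y in zip(docs, labels):
            margin = y * score(weights, bias, features)
            # Regularisation step: shrink every weight slightly.
            for term in weights:
                weights[term] *= 1.0 - lr * lam
            # Hinge-loss step: only examples inside the margin change the model.
            if margin < 1.0:
                for term, count in features.items():
                    weights[term] = weights.get(term, 0.0) + lr * y * count
                bias += lr * y
    return weights, bias


def predict(weights, bias, text):
    """Return +1 or -1 depending on which side of the hyperplane the text falls."""
    return 1 if score(weights, bias, bag_of_words(text)) >= 0.0 else -1


# Toy training set, invented for illustration: +1 = patent-like, -1 = sports-like.
TRAIN = [
    ("patent claim invention filing", 1),
    ("invention patent prior art", 1),
    ("football match goal score", -1),
    ("team score match win", -1),
]
docs = [bag_of_words(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
weights, bias = train_linear_svm(docs, labels)

print(predict(weights, bias, "patent invention"))
print(predict(weights, bias, "football team"))
```

Only examples whose margin is below 1 trigger an update, which loosely mirrors the slide's point that learning can concentrate on the most informative training examples. A kernel SVM would replace the dot product in `score` with a kernel function; that step is left out here to keep the sketch short.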