Possibilities and limitations of AI-boosted multi-categorization for patents, scientific literature, and web
AI
7 Years of Business Intelligence Development (2015 to 2021)
Methodology, Optimisation, Automation, Analysis & Synthesis
Climbing on the Matterhorn
The everyday use of AI-driven algorithms for data search, analysis and synthesis brings significant time savings, but also reveals the need to understand and accept the limitations of the technology.
A workshop report
Image: geniusgadget.com
HUMAN INTELLIGENCE + ARTIFICIAL INTELLIGENCE = AUGMENTED INTELLIGENCE
Prepare the case studies by exposing the possibilities and limits of the AI-assisted automatic categorization process.
Discuss the challenges faced in setting up this process:
• Definition of the training set (type of data to be processed: patents, non-patent literature (NPL), or both)
• Development of classifiers (single vs multi-label, selected fields, margin of error to be defined)
• Volume handled: > 300,000
Process advantages:
• Collaboration with experts in the field
• Multi-categorization
• Ability to select the fields to analyze
• Combine an AI classification tool with a collaborative monitoring tool: the best of both worlds
Restitution of results in various forms with possible developments on demand
Monitor
o Different types of data to process (patent, NPL, web, internal documents)
o Increasing volume of information to monitor
o Multiple data sources to consult
o Limited time and resources
How to
o Process this ever-increasing flow of data without devoting too much time and resources?
o Boost customer efficiency and bring customer expertise where it is most valuable?
Automate
o Automate the monitoring process from end to end
o Optimize the data classification process by integrating AI
Automate
o Provide data selection and classification accuracy close to that of an expert, with more consistency than human reviewers
o Save time and resources
o Process large volumes of data quickly and efficiently on a regular basis
Import → AI classification → Result
Input: Patent, NPL, Web, internal documents
Output: RAPID, export, synchronisation
Free yourself from doing repetitive tasks
Focus on what matters most: the result
SmartCat
Powered by
• Averbis
Integrated in
• RAPID
Designed to
• Process all types of data
• Handle large volumes of data
Empower you to
• Detect relevant documents
• Apply single or multi-label classifications
1. Provide a training set
2. Set the AI classifier
3. Run the learning process
4. Validate the prediction model
5. Run the classification process
6. Validate the AI classification
Expert contribution: key during the definition and validation steps
Training set
• Balanced set
• Unambiguous classification
• Distinctive categories
Classifier
• Field selection
• Classification mode: single vs multi
Prediction model
• Metrics validation
o Precision
o Recall
o F1 score
Classification
• Classification assessment
• Relevance labels assigned
Case  Precision  Recall  F1-Score
1     1          1       1.00
2     0.5        0.5     0.50
3     0.9        0.5     0.64
4     0.9        0.9     0.90
5     0.8        0.8     0.80
6     0.7        0.9     0.79
7     0.1        0.9     0.18
8     0.2        0.9     0.33
9     0.3        0.8     0.44
10    0.4        0.8     0.53
11    0.5        0.8     0.62
12    0.6        0.9     0.72
13    0.7        0.9     0.79
14    0.8        0.9     0.85
15    0.9        0.9     0.90
16    1          1       1.00
17    1          1       1.00
18    1          1       1.00
19    1          1       1.00
20    1          1       1.00
[Chart: Precision, Recall and F1-Score plotted for cases 1 to 20]
Precision of a classifier: share of the documents assigned to a category that truly belong to it
Recall of a classifier: share of the documents truly belonging to a category that the classifier assigned to it
F1-Score of a classifier: harmonic mean of Precision and Recall
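The three metrics can be checked with a few lines of Python. This is a minimal sketch: the function names and the document IDs are illustrative assumptions, not part of SmartCat or RAPID. The last line recomputes the "0.9 / 0.5" row of the table.

```python
def precision_recall(predicted: set, relevant: set) -> tuple:
    """Precision and recall for one category, from sets of document IDs."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical document IDs, for illustration only.
predicted = {"D1", "D2", "D3", "D4"}   # what the classifier put in the category
relevant = {"D1", "D2", "D5"}          # what an expert says belongs there

p, r = precision_recall(predicted, relevant)
print(round(p, 2), round(r, 2), round(f1_score(p, r), 2))

# Recompute one row of the table: precision 0.9, recall 0.5
print(round(f1_score(0.9, 0.5), 2))
```

Because F1 is a harmonic mean, it stays close to the weaker of the two values, which is why the row with precision 0.1 and recall 0.9 only scores 0.18.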
Depends on
o Thematic
o Data quality
o Classification uncertainties and complexity
Contributes to
o Subject matter expert(s)
o Unambiguous and distinctive classification
o Delimited search scope
What we intended to do (and sometimes managed to do)
Raw data → One classifier → Final result
What we finally did
Raw data → Binary classifier (Good / Bad)
Good → Classifier #1, Classifier #2, Classifier #3, …
→ Result #1, Result #2, Result #3, …
→ Final result
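The two-stage layout can be sketched as follows. This is an illustration under stated assumptions, not the SmartCat implementation: the keyword rules stand in for trained classifiers, the function and category names are hypothetical, and dropping the "Bad" branch before categorization is our reading of the diagram.

```python
from typing import Callable, Dict, List

Document = str
BinaryClassifier = Callable[[Document], bool]


def cascade(
    documents: List[Document],
    is_relevant: BinaryClassifier,
    category_classifiers: Dict[str, BinaryClassifier],
) -> Dict[Document, List[str]]:
    """Filter with a binary classifier, then multi-label the survivors."""
    results: Dict[Document, List[str]] = {}
    for doc in documents:
        if not is_relevant(doc):
            continue  # "Bad" branch: assumed to be dropped here
        # "Good" branch: each category classifier decides independently,
        # so a document can receive zero, one, or several labels.
        results[doc] = [
            name for name, clf in category_classifiers.items() if clf(doc)
        ]
    return results


def contains(word: str) -> BinaryClassifier:
    """Hypothetical keyword rule standing in for a trained classifier."""
    return lambda doc: word in doc.lower()


docs = [
    "Battery electrode coating process",
    "Quarterly cafeteria menu",
    "Battery recycling and coating removal",
]
labels = cascade(
    docs,
    is_relevant=contains("battery"),
    category_classifiers={
        "coating": contains("coating"),
        "recycling": contains("recycling"),
    },
)
for doc, cats in labels.items():
    print(doc, "->", cats)
```

Splitting the work this way lets each category classifier be trained and validated on its own, at the cost of one extra model to maintain.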
Relevance rate estimated for each of the 3 monitoring processes implemented: >80%
Number of iterations done before reaching a suitable relevance rate: ~3
Time to multi-classify 1000 documents: 4 min
Fully automated process
hosted in one place
Experts focus on the result
Patent, NPL, Web, internal documents
Import
Classification
Restitution
SmartCat
We did it!
Automated data upload Classification result
SmartCat
AI classification
Expert reviews
Weekly updates Expert evaluation
User communication
AI training based on
expert feedback
Case Study No 1: «enough time, no focus»
Major hurdles | Overcome by
Implement a flexible and easy-to-use process | Developing RAPID in collaboration with renowned experts in the field
Ambiguities or uncertainties when defining the classification and the training set | Providing reliable definition and selection
Assess the classification quality | Involving motivated experts
Shift noted from the initial request | Redefining the classification in agreement with the experts involved
Synchronise data between RAPID and PS | Setting an automated workflow compatible with RAPID and PS
Reliability control | Real-time monitoring of every step of the automated process
Case Study No 1: «enough time, no focus»
Set-up
o Choose a sufficiently broad monitoring strategy for the alert
(Criterion: find all the existing documents under observation or with oppositions)
o Train a classifier with all observation and opposition cases and the same number of clearly non-relevant documents
o Take two months of monitoring data → 4,600 newly published documents
o Configure SmartCat: 5 certainly relevant documents, 6 probably relevant documents and 62 potentially relevant documents
o Check the 11 certainly or probably relevant documents with Central IP → Yes, they are relevant.
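The training-set rule above (all positive cases plus the same number of clearly non-relevant documents) could be assembled along these lines. This is a sketch only: the function name, the fixed seed and the document IDs are assumptions made for illustration, and how the "clearly non-relevant" pool is selected is left to the experts.

```python
import random
from typing import List, Tuple


def balanced_training_set(
    positives: List[str],
    negative_pool: List[str],
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Return (document, label) pairs with as many negatives as positives."""
    if len(negative_pool) < len(positives):
        raise ValueError("not enough non-relevant documents to balance the set")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    negatives = rng.sample(negative_pool, k=len(positives))
    labelled = [(doc, 1) for doc in positives] + [(doc, 0) for doc in negatives]
    rng.shuffle(labelled)  # avoid feeding all positives first
    return labelled


# Hypothetical document IDs, for illustration only.
observed_or_opposed = ["EP-A", "EP-B", "EP-C"]
clearly_non_relevant = ["X1", "X2", "X3", "X4", "X5", "X6"]

training_set = balanced_training_set(observed_or_opposed, clearly_non_relevant)
print(training_set)
```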
Case Study No 2: «no time, no monitoring»
Set-up
Classification confidence    Documents
Relevant – very sure         5
Relevant – sure              6
Relevant – not sure          62
Non relevant – not sure      601
Non relevant – sure          909
Non relevant – very sure     2823
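One way to obtain six confidence levels like those in the chart is to cut a relevance score into bands. The sketch below assumes the classifier returns a score between 0 and 1, and the thresholds are invented for illustration; the slides do not say how SmartCat derives its levels.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical cut-offs on a relevance score in [0, 1], highest first.
THRESHOLDS: List[Tuple[float, str]] = [
    (0.95, "Relevant – very sure"),
    (0.80, "Relevant – sure"),
    (0.50, "Relevant – not sure"),
    (0.20, "Non relevant – not sure"),
    (0.05, "Non relevant – sure"),
]


def confidence_level(score: float) -> str:
    """Map a relevance score to one of the six confidence levels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    for threshold, label in THRESHOLDS:
        if score >= threshold:
            return label
    return "Non relevant – very sure"


# Hypothetical scores for a handful of newly published documents.
scores = [0.99, 0.85, 0.60, 0.30, 0.10, 0.01, 0.02]
tally = Counter(confidence_level(s) for s in scores)
for label, count in tally.items():
    print(label, count)
```

Experts would then review the top bands first, as in the case study where the "very sure" and "sure" documents were checked with Central IP.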
Effect of additional training cycles
Case Study No 2: «no time, no monitoring»
Climbing on the Matterhorn
1. Establish a good training set
2. Configure the classifier system carefully
3. Don’t despair when your first attempt(s) fail(s)
4. Take a good guide
5. Study the AI system carefully, identify the gradients of convergence
6. Repeat steps 1-5 in cycles until you…
7. Reach the summit
8. Enjoy the view!
9. Be aware that every mountain is different
From the data lake to the key document
The Project Team
Jean-Baptiste Porier, Senior Data Analyst
David Borel, Head of Foresight Team
Harald Jenny, CEO
The time for AI implementation is now.
JACQUET DROZ 1
2002 NEUCHÂTEL
WWW.CENTREDOC.SWISS
INFO@CENTREDOC.CH
+41 32 720 51 31

AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization for patents, scientific literature, and web. Harald Jenny (CENTREDOC, CH)

Full Night Fun With Call Girls Lucknow📞7737669865 At Very Cheap Rates Doorste...
 
Bangalore Call Girls 9079923931 With -Cuties' Hot Call Girls
Bangalore Call Girls 9079923931 With -Cuties' Hot Call GirlsBangalore Call Girls 9079923931 With -Cuties' Hot Call Girls
Bangalore Call Girls 9079923931 With -Cuties' Hot Call Girls
 
❣Ramp Model Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Es...
❣Ramp Model Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Es...❣Ramp Model Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Es...
❣Ramp Model Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai Es...
 
Karol Bagh Call Girls Delhi 🔥 9711199012 ❄- Pick Your Dream Call Girls with 1...
Karol Bagh Call Girls Delhi 🔥 9711199012 ❄- Pick Your Dream Call Girls with 1...Karol Bagh Call Girls Delhi 🔥 9711199012 ❄- Pick Your Dream Call Girls with 1...
Karol Bagh Call Girls Delhi 🔥 9711199012 ❄- Pick Your Dream Call Girls with 1...
 
Nashik Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort Service
Nashik Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort ServiceNashik Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort Service
Nashik Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort Service
 
Unlimited Short Call Girls Navi Mumbai ✅ 9967824496 FULL CASH PAYMENT
Unlimited Short Call Girls Navi Mumbai ✅ 9967824496 FULL CASH PAYMENTUnlimited Short Call Girls Navi Mumbai ✅ 9967824496 FULL CASH PAYMENT
Unlimited Short Call Girls Navi Mumbai ✅ 9967824496 FULL CASH PAYMENT
 
Network Security and Cyber Laws (Complete Notes) for B.Tech/BCA/BSc. IT
Network Security and Cyber Laws (Complete Notes) for B.Tech/BCA/BSc. ITNetwork Security and Cyber Laws (Complete Notes) for B.Tech/BCA/BSc. IT
Network Security and Cyber Laws (Complete Notes) for B.Tech/BCA/BSc. IT
 
🔥Call Girls In Chandigarh 💯Call Us 🔝 6350257716 🔝💃Top Class Call Girl Service...
🔥Call Girls In Chandigarh 💯Call Us 🔝 6350257716 🔝💃Top Class Call Girl Service...🔥Call Girls In Chandigarh 💯Call Us 🔝 6350257716 🔝💃Top Class Call Girl Service...
🔥Call Girls In Chandigarh 💯Call Us 🔝 6350257716 🔝💃Top Class Call Girl Service...
 
一比一原版圣托马斯大学毕业证(UST毕业证书)学历如何办理
一比一原版圣托马斯大学毕业证(UST毕业证书)学历如何办理一比一原版圣托马斯大学毕业证(UST毕业证书)学历如何办理
一比一原版圣托马斯大学毕业证(UST毕业证书)学历如何办理
 

AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization for patents, scientific literature, and web Harald Jenny (CENTREDOC, CH)

  • 1. Possibilities and limitations of AI-boosted multi-categorization for patents, scientific literature, and web
  • 2. 7 Years of Business Intelligence Development (2015 to 2021): AI, Methodology, Optimisation, Automation, Analysis & Synthesis
  • 3. Climbing on the Matterhorn: The everyday use of AI-driven algorithms for data search, analysis and synthesis comes with important time savings but also reveals the need to understand and accept the limitations of the technology. A workshop report.
  • 5. Prepare the case studies by exposing the possibilities and limits of the AI-assisted automatic categorization process.
       Discuss the challenges faced in setting up this process:
       • Definition of the training set (type of data to be processed: patent, NPL, or both)
       • Development of classifiers (single vs multi, selected fields, margin of error to be defined)
       • Volume handled: > 300,000
       Process advantages:
       • Collaboration with experts in the field
       • Multi-categorization
       • Ability to select the fields to analyze
       • Combining the AI classification tool with a collaborative monitoring tool to take the best of two worlds
       Restitution of results in various forms, with possible developments on demand
  • 6. Monitor
       o Different types of data to process (patent, NPL, web, internal documents)
       o Increasing volume of information to monitor
       o Multiple data sources to consult
       o Limited time and resources
       How to
       o Process this ever-increasing flow of data without devoting too much time and resources?
       o Boost customer efficiency and bring customer expertise where it is most valuable?
       Automate
       o Automate the monitoring process from end to end
       o Optimize the data classification process by integrating AI
  • 7. Automate
       o Provide data selection and classification accuracy close to that of an expert, with higher stability than humans
       o Save time and resources
       o Process large volumes of data quickly and efficiently on a regular basis
  • 8. SmartCat: Import → AI classification → Result
       Input: patent, NPL, web, internal documents
       Output: RAPID, export, synchronisation
       Free yourself from repetitive tasks. Focus on what matters most: the result.
  • 9. SmartCat
       Powered by
       • Averbis
       Integrated in
       • RAPID
       Designed to
       • Process all types of data
       • Handle large volumes of data
       Empowers you to
       • Detect relevant documents
       • Apply single or multi-label classifications
  • 10. 1. Provide a training set 2. Set the AI classifier 3. Run the learning process 4. Validate the prediction model 5. Run the classification process 6. Validate the AI classification
  • 11. Expert contribution: key during the definition and validation steps. Classification • Balanced set • Unambiguous classification • Distinctive categories Training set • Field selection • Classification mode: single vs multi Classifier • Metrics validation Prediction model • Classification assessment • Relevance labels assigned o Precision o Recall o F1 score
  • 12. Precision, Recall and F1-Score for 20 example cases:
       Case  Precision  Recall  F1-Score
       1     1          1       1.00
       2     0.5        0.5     0.50
       3     0.9        0.5     0.64
       4     0.9        0.9     0.90
       5     0.8        0.8     0.80
       6     0.7        0.9     0.79
       7     0.1        0.9     0.18
       8     0.2        0.9     0.33
       9     0.3        0.8     0.44
       10    0.4        0.8     0.53
       11    0.5        0.8     0.62
       12    0.6        0.9     0.72
       13    0.7        0.9     0.79
       14    0.8        0.9     0.85
       15    0.9        0.9     0.90
       16    1          1       1.00
       17    1          1       1.00
       18    1          1       1.00
       19    1          1       1.00
       20    1          1       1.00
       [Chart: Precision, Recall and F1-Score plotted for cases 1 to 20]
       Precision of a classifier: share of the documents assigned to a category that actually belong to it
       Recall of a classifier: share of the documents actually belonging to a category that the classifier assigned to it
       F1-Score of a classifier: harmonic mean of Precision and Recall, 2 × Precision × Recall / (Precision + Recall)
  • 13. Depends on
       o Thematic
       o Data quality
       o Classification uncertainties and complexity
       Contributes to
       o Subject matter expert(s)
       o Unambiguous and distinctive classification
       o Delimited search scope
  • 14. What we intended to do (and sometimes managed to do): Raw data → One classifier → Final result
  • 15. What we finally did: Raw data → Binary classifier (Good / Bad) → the Good documents go to Classifier #1, Classifier #2, Classifier #3, … → Result #1, Result #2, Result #3, … → Final result
  • 16. Relevance rate estimated for each of the 3 monitoring processes implemented: >80%
       Number of iterations done before reaching a suitable relevance rate: ~3
       Time to multi-classify 1000 documents: 4 min
  • 17. We did it! Fully automated process hosted in one place; experts focus on the result. Patent, NPL, web, internal documents → SmartCat: Import → Classification → Restitution
  • 18. Case Study No 1: «enough time, no focus». Automated data upload; Classification result; SmartCat AI classification; Expert reviews; Weekly updates; Expert evaluation; User communication; AI training based on expert feedback
  • 19. Case Study No 1: «enough time, no focus». Major hurdles and how they were overcome:
       • Implement a flexible and easy-to-use process. Overcome by: developing RAPID in collaboration with renowned experts in the field
       • Ambiguities or uncertainties when defining the classification and the training set. Overcome by: providing reliable definition and selection
       • Assess the classification quality. Overcome by: involving motivated experts
       • Shift noted from the initial request. Overcome by: redefining the classification in agreement with the experts involved
       • Synchronise data between RAPID and PS. Overcome by: setting up an automated workflow compatible with RAPID and PS
       • Reliability control. Overcome by: real-time monitoring of every step of the automated process
  • 20. Case Study No 2: «no time, no monitoring». Set-up
       o Choose a sufficiently broad monitoring strategy for the alert (criterion: find all the existing documents under observation or with oppositions)
       o Train a classifier with all observation and opposition cases and the same quantity of clearly non-relevant documents
       o Take two months of monitoring data → 4,600 newly published documents
       o Configure SmartCat: 5 certainly relevant documents, 6 probably relevant documents and 62 potentially relevant documents
       o Check the 11 certainly or probably relevant documents with Central IP → Yes, they are relevant.
  • 21. Case Study No 2: «no time, no monitoring». Set-up: effect of additional training cycles
       [Bar chart: number of documents per relevance class]
       Relevant (very sure): 5
       Relevant (sure): 6
       Relevant (not sure): 62
       Non relevant (not sure): 601
       Non relevant (sure): 909
       Non relevant (very sure): 2823
  • 22. Climbing on the Matterhorn
       1. Establish a good training set
       2. Configure the classifier system carefully
       3. Don't despair when your first attempt(s) fail(s)
       4. Take a good guide
       5. Study the AI system carefully, identify the gradients of convergence
       6. Repeat steps 1-5 in cycles until you…
       7. Reach the summit
       8. Enjoy the view!
       9. Be aware that every mountain is different
  • 23. From the data lake to the key document. The Project Team: Jean-Baptiste Porier (Senior Data Analyst), David Borel (Head of Foresight Team), Harald Jenny (CEO)
  • 24. The time for AI implementation is now. JACQUET DROZ 1 2002 NEUCHÂTEL WWW.CENTREDOC.SWISS INFO@CENTREDOC.CH +41 32 720 51 31
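
The F1 values tabulated on slide 12 follow from the usual definition of F1 as the harmonic mean of precision and recall. The sketch below is only an illustration and is not taken from the deck: the function names and the document-id sets are hypothetical, and nothing here reflects how SmartCat computes its metrics internally.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(predicted: set, relevant: set) -> tuple:
    """Precision and recall for one category, from sets of document ids.

    predicted: ids the classifier assigned to the category (hypothetical input)
    relevant:  ids that an expert says belong to the category
    """
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Three (precision, recall) pairs copied from the slide 12 table.
for p, r in [(0.9, 0.5), (0.7, 0.9), (0.1, 0.9)]:
    print(f"P={p:.1f} R={r:.1f} F1={f1_score(p, r):.2f}")
```

Run on these three pairs, the function reproduces the slide values 0.64, 0.79 and 0.18. The last pair shows why F1 is a stricter summary than a simple average: a recall of 0.9 cannot compensate for a precision of 0.1.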
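
The two-stage layout described on slide 15 (a binary relevance filter followed by several category classifiers) can be sketched as follows. This is a minimal illustration under loud assumptions: `keyword_classifier` is a hypothetical stand-in that matches keywords, whereas the deck's classifiers are trained models; the category names and sample documents are invented for the example.

```python
from typing import Callable, Dict, List

Document = str
BinaryClassifier = Callable[[Document], bool]


def keyword_classifier(keywords: List[str]) -> BinaryClassifier:
    """Hypothetical placeholder for a trained model: True if any keyword occurs."""
    lowered = [k.lower() for k in keywords]
    return lambda doc: any(k in doc.lower() for k in lowered)


def cascade(
    docs: List[Document],
    relevance_filter: BinaryClassifier,
    category_classifiers: Dict[str, BinaryClassifier],
) -> Dict[Document, List[str]]:
    """Stage 1 drops 'bad' documents; stage 2 assigns zero or more labels."""
    results: Dict[Document, List[str]] = {}
    for doc in docs:
        if not relevance_filter(doc):
            continue  # rejected by the binary classifier, never reaches stage 2
        # Each category classifier votes independently, which gives multi-label output.
        results[doc] = [
            name for name, clf in category_classifiers.items() if clf(doc)
        ]
    return results


# Invented sample data, for illustration only.
docs = [
    "Patent on a lithium battery electrode",
    "Cooking recipe for pasta",
    "Battery sensor for a watch movement",
]
relevance = keyword_classifier(["battery", "sensor"])
categories = {
    "energy storage": keyword_classifier(["battery"]),
    "sensing": keyword_classifier(["sensor"]),
}
labelled = cascade(docs, relevance, categories)
for doc, labels in labelled.items():
    print(doc, "->", labels)
```

Splitting the work this way keeps each category classifier focused on documents that are already known to be relevant, so the noise in the raw data is handled once, by the binary stage.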