Developing a Recommendation System to Provide a Personalized Learning Experience at Chegg

Confidential Material / © 2020 Chegg, Inc. / All Rights Reserved
Sanghamitra Deb
Staff Data Scientist
Chegg Inc

Outline
• Recommendations at Chegg
• Organizing Content: Knowledge Graph
• Deep Dive: Content Classification
• Cross-Product Recommendations
• Takeaways

Recommendations at Chegg
The goal of recommendations at Chegg is to provide the best possible
learning experience to students, fueled by high-quality content.
Recommender systems are the backbone for surfacing the most relevant
content to each student. Organizing content into a knowledge graph and
detecting patterns in student behavior help us personalize the student
experience.

Recommendations at Chegg
Chegg Study Home Page
Multiple services: textbook rentals, question answering, online tutoring,
flashcards, writing, math solver, etc.

Knowledge Graph
Hierarchy: Subject → Course → Concept → Sub-concepts
Example: Physics (subject); Electricity and Magnetism, Mechanics, Quantum Physics (courses); Velocity (concept)
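The hierarchy above can be sketched as a small tree. This is only an illustration, not Chegg's implementation: the `Node` class, the `path_to` helper, and the placement of Velocity under Mechanics are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of the graph; `level` is subject, course, concept or sub-concept."""
    name: str
    level: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child


def path_to(root: Node, name: str) -> Optional[List[str]]:
    """Depth-first search returning the names from the root down to `name`."""
    if root.name == name:
        return [root.name]
    for child in root.children:
        sub = path_to(child, name)
        if sub is not None:
            return [root.name] + sub
    return None


physics = Node("Physics", "subject")
physics.add(Node("Electricity and Magnetism", "course"))
mechanics = physics.add(Node("Mechanics", "course"))
physics.add(Node("Quantum Physics", "course"))
# Assumption: the slide does not say which course Velocity belongs to.
mechanics.add(Node("Velocity", "concept"))

print(path_to(physics, "Velocity"))
```

Looking up a node returns its full path through the hierarchy, which is the information a recommender would need to move from a concept to its course or subject.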

Connecting content to the Knowledge Graph
Machine learning classifiers attach each piece of content to nodes in the graph (Subject → Course → Concept → Sub-concepts). Examples of content:
• "A rightward-moving bicycle increases its speed from 2.0 m/s to 12.0 m/s. Is the bicycle accelerating?"
• "Mitosis: a type of cell division that results in two daughter cells each having the same number and kind of chromosomes as the parent nucleus, typical of ordinary tissue growth."
• Writing tools: "Get your physics paper checked by an expert"

Connecting users to the Knowledge Graph
Machine learning classifiers map the content a student engages with to nodes in the graph (Subject → Course → Concept → Sub-concepts).
Example content:
• "A rightward-moving bicycle increases its speed from 2.0 m/s to 12.0 m/s. Is the bicycle accelerating?"
• Writing tools: "Do you need help writing a physics paper?"
Example nodes: Physics 101, Acceleration, Biology, Mitosis
Edges are created between users and …
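One way to picture the user edges: once classifiers have tagged a piece of content with graph nodes, each interaction with that content adds weight to an edge between the user and those nodes. The sketch below assumes hypothetical content ids, tags, and a plain interaction count as the edge weight; none of these details come from the slides.

```python
from collections import Counter, defaultdict

# Hypothetical classifier output: content id -> knowledge-graph nodes.
content_tags = {
    "q1": ["Physics 101", "Acceleration"],
    "card7": ["Biology", "Mitosis"],
}

# Hypothetical interaction log: (user id, content id).
interactions = [("u1", "q1"), ("u1", "q1"), ("u1", "card7"), ("u2", "card7")]


def build_user_edges(interactions, content_tags):
    """Return user -> Counter of node -> number of interactions (edge weight)."""
    edges = defaultdict(Counter)
    for user, content_id in interactions:
        for node in content_tags.get(content_id, []):
            edges[user][node] += 1
    return edges


edges = build_user_edges(interactions, content_tags)
print(dict(edges["u1"]))
```

A count is the simplest possible weight; recency or interaction type could be folded in the same way.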

Content Classification Pipeline
Text Pre-processing
• Reduces noise
• Ensures quality
• Improves overall performance
Collecting Training Data
• Examples of the classes that we are trying to model
• Model performance is directly correlated with the quality of the training data
Model Building
• Model selection
• Architecture
• Parameter tuning
Model Evaluation
• Offline: SME
• Online: Student

Classification Problem
Assigning decks to courses
• Decks are lists of cards grouped together by students for studying.
• There are several thousand courses. A course is typically more granular
than a subject but less granular than a concept.

Modeling Approaches
• TF-IDF features with an SVM classifier
Pros:
• Gives decent performance on small training data.
• Straightforward training pipeline.
Cons:
• Does not do well for subjects dominated by symbols.
• Including both word- and character-based features makes the token space and the model extremely large.
• Character-based CNN
• Handles out-of-vocabulary words, which makes it particularly suitable for raw user-generated text.
• Works for multiple languages.
• Model size is small since the tokens are limited to the number of characters (~70). This makes real-life deployments easier and faster.
• Networks with convolutional and pooling layers are useful for classification tasks in which we expect to find strong local clues regarding class membership.
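The character-level input can be sketched as follows. The slides only say the token set is about 70 characters; the exact alphabet below (letters, digits, ASCII punctuation, space, newline), the `encode` helper, and the use of id 0 for padding and unknown symbols are assumptions for illustration.

```python
import string

# Assumed 70-symbol alphabet: 26 letters + 10 digits + 32 punctuation marks
# + space + newline. The slides only say "~70" characters.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " \n"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}  # 0 = padding/unknown


def encode(text, max_len=32):
    """Lowercase `text`, map each character to an id, then pad or truncate."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in text.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


print(len(ALPHABET))
print(encode("F = ma", max_len=8))
```

Because every word is spelled from the same small alphabet, a misspelled or never-before-seen word still gets a usable encoding, and the embedding table stays tiny compared with a word or n-gram vocabulary.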

CNN Model Architecture
Diagram components: convolutions over the input (axis label: feature length), 2 layers of convolution & pooling, GlobalMaxPool1D, dense layer, norm, PReLU, dropout.
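To make the role of convolution and global max pooling concrete, here is a toy example in plain Python. It is not the model from the slides: the filter is set by hand rather than learned, and the sequences are made up. It only shows that a convolution followed by global max pooling responds to a local pattern wherever it appears in the input.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation of a list of numbers with a kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def global_max_pool(features):
    """Collapse a feature map to its single strongest activation."""
    return max(features)


# Hand-set filter that fires on the local pattern [1, 2, 1].
kernel = [1, 2, 1]
early = [1, 2, 1, 0, 0, 0, 0]   # pattern at the start
late = [0, 0, 0, 0, 1, 2, 1]    # same pattern at the end
absent = [1, 0, 0, 1, 0, 0, 1]  # pattern missing

for name, seq in [("early", early), ("late", late), ("absent", absent)]:
    print(name, global_max_pool(conv1d(seq, kernel)))
```

The pooled value is the same for `early` and `late` and lower for `absent`, which is the position-independent behavior that makes this kind of layer useful when a few characteristic substrings identify a class.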

Multi-task Modeling
Two tasks
• Similarity between card front and back: the card front and the card back each pass through a CNN model, a similarity function compares the two outputs, and the result is trained with a cross-entropy loss.
• Classification of courses: the card passes through a CNN model followed by a softmax over the number of courses, trained with a cross-entropy loss.
Adding another task improves the accuracy by a few percent.
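A common way to train two tasks together is to add their losses. The sketch below assumes a weighted sum, cosine similarity as the similarity function, and a sigmoid to turn the similarity into a probability; the slides name only "similarity function" and "cross entropy loss", so these specifics and the `weight` parameter are illustrative choices.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of a softmax over course logits against the true course index."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_cross_entropy(front_vec, back_vec, is_pair):
    """Binary cross-entropy on sigmoid(cosine similarity) of the two encodings."""
    p = 1.0 / (1.0 + math.exp(-cosine(front_vec, back_vec)))
    return -math.log(p) if is_pair else -math.log(1.0 - p)


def multitask_loss(course_logits, course, front_vec, back_vec, is_pair, weight=0.5):
    """Course loss plus a weighted similarity loss (the weight is an assumption)."""
    course_loss = softmax_cross_entropy(course_logits, course)
    sim_loss = similarity_cross_entropy(front_vec, back_vec, is_pair)
    return course_loss + weight * sim_loss


# Made-up logits over three courses and made-up 2-d encodings of a card.
print(multitask_loss([2.0, 0.5, 0.1], 0, [1.0, 0.0], [0.9, 0.1], True))
```

Setting `weight` to 0 recovers the single-task course classifier, which makes it easy to compare the two setups.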

Model Performance
Top-3 accuracy: 73% on offline test data.
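Top-3 accuracy counts a prediction as correct when the true course is among the three highest-scoring courses. A minimal sketch of the metric, using made-up scores rather than Chegg data:

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


# Hypothetical scores over four courses for three decks.
scores = [
    [0.1, 0.6, 0.2, 0.1],    # true course 1: ranked first
    [0.4, 0.3, 0.2, 0.1],    # true course 2: ranked third
    [0.5, 0.3, 0.15, 0.05],  # true course 3: ranked last
]
labels = [1, 2, 3]
print(top_k_accuracy(scores, labels, k=3))
```

With several thousand courses, a top-3 metric fits a product that can show a short list of candidate courses instead of a single guess.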
Challenges
• Imbalanced training data
• Some classes have too few training examples
Solutions
• Collect more training data.
• Use rule-based techniques to augment training data.
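The slides do not describe the rules used. As one possible illustration of rule-based augmentation, the sketch below labels unlabeled text with a course only when exactly one keyword rule matches. The course names, keywords, and `rule_label` helper are all hypothetical.

```python
import re

# Hypothetical keyword rules: course -> pattern that strongly suggests it.
RULES = {
    "Cell Biology": re.compile(r"\b(mitosis|chromosome|nucleus)\b", re.IGNORECASE),
    "Mechanics": re.compile(r"\b(velocity|acceleration|momentum)\b", re.IGNORECASE),
}


def rule_label(text):
    """Return a course only when exactly one rule matches; otherwise None."""
    matches = [course for course, pattern in RULES.items() if pattern.search(text)]
    return matches[0] if len(matches) == 1 else None


unlabeled = [
    "Mitosis results in two daughter cells",
    "Is the bicycle accelerating?",
    "Velocity of a chromosome during mitosis",
]
augmented = [(text, rule_label(text)) for text in unlabeled if rule_label(text)]
print(augmented)
```

Skipping texts that match no rule or more than one rule trades coverage for cleaner labels, which matters because noisy labels would undercut the point of adding data for rare classes.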

Cross-Product Recommendations
Cold Start Problem: Users often use one product, such as Chegg Study, and may
only browse other products such as Chegg Practice or Flash Cards.
Personalized solutions:
• Content Filtering: Use the KG to determine the courses, concepts and sub-concepts
that users are currently studying and recommend trending content in that
category.
• Text Similarity: Match content to what users have engaged with, using in-house
language models optimized for Chegg content.
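The content filtering idea can be sketched in a few lines: take the graph nodes a user engages with most in one product, then surface the most viewed items under those nodes in another product. The data, item names, and `recommend` helper below are hypothetical, and "trending" is simplified to a raw view count.

```python
from collections import Counter

# Hypothetical inputs: knowledge-graph nodes a user engaged with in one product,
# and view counts for content in another product, keyed by node.
user_nodes = Counter({"Mechanics": 5, "Acceleration": 3, "Mitosis": 1})
trending = {
    "Mechanics": [("deck_kinematics", 120), ("deck_forces", 300)],
    "Acceleration": [("deck_acceleration", 80)],
    "Organic Chemistry": [("deck_alkanes", 500)],
}


def recommend(user_nodes, trending, top_nodes=2, per_node=1):
    """For the user's most-engaged nodes, return the most viewed items in each."""
    picks = []
    for node, _ in user_nodes.most_common(top_nodes):
        items = sorted(trending.get(node, []), key=lambda item: item[1], reverse=True)
        picks.extend(name for name, _ in items[:per_node])
    return picks


print(recommend(user_nodes, trending))
```

The user needs no history in the second product: the shared graph nodes carry the signal across, which is how the cold start is eased. The globally most viewed deck is skipped because it sits under a node the user has not touched.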

Takeaways
• Content drives recommendations
  • High quality
  • Relevant
• Organizing the content into a Knowledge Graph (KG) facilitates content-based
recommendations.
• Accuracy of classifiers is important: models are constantly iterated on, even
for a gain of a few percent.
• The KG helps connect students with courses and concepts, which helps with
personalized recommendations.
• Cross-product recommendations are possible through the KG.
• Cold-start problems are made easier.

Questions
