Recursive Neural Networks
Sangwoo Mo
Recursive Neural Network (RNN) - Motivation
• Motivation: Many real-world objects have a recursive structure,
e.g. an image is composed of segments, and a sentence is composed of words
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Stanford CS224N Lecture Note 14.
Recursive Neural Network (RNN) - Motivation
• Motivation: Can we learn a good representation for recursive structures?
• Recursive structures (phrases) and their components (words) should lie in the same space,
e.g. "the country of my birth" ≃ Germany, France, etc.
Recursive Neural Network (RNN) - Model
• Goal: Design a neural network whose features are constructed recursively
• Each module maps two children to one parent, all lying in the same vector space
• To decide the order of merging, each candidate node is given a score (plausibility)
• Hence, the neural network module outputs a (representation, score) pair
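The module described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a single tanh layer over the concatenated children and a linear scoring vector, and the names `compose`, `W`, `b` and `u` are hypothetical.

```python
import math
import random

def compose(c1, c2, W, b, u):
    """Map two n-dim children to an n-dim parent and a scalar score."""
    x = c1 + c2                                   # concatenation [c1; c2], length 2n
    p = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
         for row, bi in zip(W, b)]                # p = tanh(W [c1; c2] + b)
    score = sum(ui * pi for ui, pi in zip(u, p))  # score = u^T p
    return p, score

n = 3
rng = random.Random(0)                            # random weights stand in for trained ones
W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * n)] for _ in range(n)]
b = [0.0] * n
u = [rng.uniform(-0.5, 0.5) for _ in range(n)]
c1 = [0.1, -0.2, 0.3]
c2 = [0.0, 0.5, -0.1]
parent, score = compose(c1, c2, W, b, u)
```

Because the parent has the same dimension as its children, `compose(parent, c2, W, b, u)` is also valid, which is what makes the construction recursive.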
Recursive Neural Network (RNN) - Model
• Note: a recurrent neural network is a special case of a recursive neural network, in which the tree is a chain over the sequence
Image from Ratsgo’s blog for text mining.
Recursive Neural Network (RNN) - Inference
• At each step, merge the two adjacent nodes with the highest score
• With the greedy algorithm, inference requires only $N - 1$ merges, i.e. $O(N)$ merge steps
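The greedy procedure can be sketched as follows. The `toy_compose` function is a hypothetical stand-in for a trained module (it averages the children and scores a pair by negative squared distance); only the merging loop is the point of the example.

```python
def greedy_parse(leaves, compose):
    """Repeatedly merge the best-scoring adjacent pair until one root remains.

    leaves: list of (label, vector); compose(v1, v2) -> (parent_vector, score).
    Returns (tree, root_vector, total_score), where tree is a nested tuple.
    """
    nodes = [(label, vec) for label, vec in leaves]
    total = 0.0
    while len(nodes) > 1:
        # For simplicity every adjacent pair is rescored at each step; a cached
        # version would rescore only the pairs that touch the newly merged node.
        candidates = [compose(nodes[i][1], nodes[i + 1][1])
                      for i in range(len(nodes) - 1)]
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        parent_vec, score = candidates[best]
        merged = ((nodes[best][0], nodes[best + 1][0]), parent_vec)
        nodes[best:best + 2] = [merged]
        total += score
    return nodes[0][0], nodes[0][1], total

def toy_compose(v1, v2):
    # Hypothetical module: closer children are considered more plausible.
    parent = [(a + b) / 2 for a, b in zip(v1, v2)]
    score = -sum((a - b) ** 2 for a, b in zip(v1, v2))
    return parent, score

leaves = [("the", [0.0, 0.0]), ("cat", [0.1, 0.0]), ("sat", [1.0, 1.0])]
tree, root, total = greedy_parse(leaves, toy_compose)
print(tree)
```

With these toy vectors, "the" and "cat" are merged first because they are closest, and the result is then merged with "sat".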
Recursive Neural Network (RNN) - Inference
• We can apply beam search to improve performance
• Beam search: keep the $k$ best partial trees at each step (greedy search is beam search with $k = 1$)
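A minimal beam-search sketch over merge sequences is shown below. The scoring module `toy_compose` is again a hypothetical placeholder, and the hypothesis layout (a running score plus the current node list) is one possible design, not necessarily the one used in the paper.

```python
def beam_parse(leaves, compose, k=2):
    """Beam search over merge sequences; k=1 reduces to greedy merging."""
    # Each hypothesis is (total_score, nodes), with nodes a list of (tree, vector).
    beam = [(0.0, [(label, vec) for label, vec in leaves])]
    for _ in range(len(leaves) - 1):
        expanded = []
        for total, nodes in beam:
            for i in range(len(nodes) - 1):
                parent, score = compose(nodes[i][1], nodes[i + 1][1])
                merged = ((nodes[i][0], nodes[i + 1][0]), parent)
                expanded.append((total + score,
                                 nodes[:i] + [merged] + nodes[i + 2:]))
        # Keep only the k highest-scoring partial parses.
        expanded.sort(key=lambda h: h[0], reverse=True)
        beam = expanded[:k]
    return [(total, nodes[0][0]) for total, nodes in beam]

def toy_compose(v1, v2):
    # Hypothetical module: average the children, prefer close pairs.
    parent = [(a + b) / 2 for a, b in zip(v1, v2)]
    score = -sum((a - b) ** 2 for a, b in zip(v1, v2))
    return parent, score

leaves = [("the", [0.0, 0.0]), ("cat", [0.1, 0.0]), ("sat", [1.0, 1.0])]
greedy = beam_parse(leaves, toy_compose, k=1)
wide = beam_parse(leaves, toy_compose, k=2)
```

With `k=2` the search returns both possible binary trees over the three words, ranked by total score; with `k=1` it returns only the greedy one.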
Recursive Neural Network (RNN) - Training
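• Suppose (sentence, tree) pairs $(x_i, y_i)$ are given
• Let $s(x_i, y)$ be the score of tree $y$, i.e. the sum of the scores of all its non-leaf nodes
• Let $A(x_i)$ be the set of candidate trees (approximated by beam search)
• Then the max-margin objective (to be maximized) is
$J = \sum_i \left[ s(x_i, y_i) - \max_{y \in A(x_i)} \left( s(x_i, y) + \Delta(y, y_i) \right) \right]$
where $\Delta(y, y_i)$ is the number of wrong subtrees in $y$
• We can also add a classification loss for each node
(using the node's feature as input to the classifier)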
Training increases $s(x_i, y_i)$ and decreases $s(x_i, y)$ whenever $s(x_i, y) + \Delta(y, y_i) > s(x_i, y_i)$
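As a small worked example of the margin term for one training pair, the sketch below takes precomputed tree scores and subtree-error counts. The function name `margin_objective` and the numbers are illustrative assumptions; in practice the candidates would come from beam search.

```python
def margin_objective(gold_score, candidates):
    """Per-example objective: s(x, y_gold) - max_y [ s(x, y) + delta(y, y_gold) ].

    candidates: list of (score, delta) pairs for the trees in A(x),
    where delta counts the subtrees that disagree with the gold tree.
    """
    worst_violator = max(score + delta for score, delta in candidates)
    return gold_score - worst_violator

# The gold tree scores 2.0 and appears among the candidates with delta = 0.
candidates = [(2.0, 0), (1.5, 1), (0.2, 2)]
objective = margin_objective(2.0, candidates)
print(objective)  # -0.5
```

Here the second candidate has a lower score than the gold tree but still violates the margin (1.5 + 1 > 2.0), so the objective is negative and training would push the two scores apart. Once every wrong tree is outscored by at least its error count, the objective reaches its maximum of 0.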
Recursive Neural Network (RNN) - Experiments
• After training, both leaf nodes and higher-level nodes have learned useful representations
• Image segmentation: infer a class for each segment (the feature extractor is trained jointly)
• Phrase clustering: nearest-neighbor search over phrase features
Recursive Neural Network (RNN) - Appendix
• Preprocessing: how to map segments/words into the representation space $\mathbb{R}^n$
• Word: use a pretrained word2vec model ($V \to \mathbb{R}^n$)
• Image: extract hand-crafted features in $\mathbb{R}^m$, and jointly train a network $F: \mathbb{R}^m \to \mathbb{R}^n$
• Extension to image segmentation
• A segment can have multiple adjacent segments
• Hence, there are multiple correct tree structures
• Hence, $\Delta(y, y_i)$ checks whether each subtree is included in the set of correct tree structures
Recursive Autoencoder (RAE) - Motivation & Idea
• Motivation: The recursive neural network (RNN) requires ground-truth tree structures for training
• The recursive autoencoder (RAE) extends the RNN to the unsupervised and semi-supervised settings
• If the tree structure $y$ is given, we can train a local autoencoder $(c_1, c_2) \to p \to (c_1', c_2')$
on each node, with reconstruction loss $L(y) = \sum_{(c_1, c_2, p) \in y} \| [c_1; c_2] - [c_1'; c_2'] \|^2$
(where $[c_1; c_2]$ denotes concatenation)
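The reconstruction loss at a single node can be sketched as below. This assumes a tanh encoder and a linear decoder without bias terms; the names `node_reconstruction_loss`, `W_enc` and `W_dec` are hypothetical.

```python
import math

def matvec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def node_reconstruction_loss(c1, c2, W_enc, W_dec):
    """Encode (c1, c2) -> p, decode p -> (c1', c2'), return (p, squared error)."""
    x = c1 + c2                                    # [c1; c2], length 2n
    p = [math.tanh(v) for v in matvec(W_enc, x)]   # parent, length n
    x_rec = matvec(W_dec, p)                       # [c1'; c2'], length 2n
    loss = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return p, loss

W_enc = [[0.5, 0.0, 0.5, 0.0],
         [0.0, 0.5, 0.0, 0.5]]                     # n x 2n
W_dec = [[0.0, 0.0] for _ in range(4)]             # 2n x n, an "untrained" all-zero decoder
p, loss = node_reconstruction_loss([1.0, 0.0], [0.0, 2.0], W_enc, W_dec)
print(loss)  # 5.0
```

With the all-zero decoder the reconstruction is the zero vector, so the loss equals the squared norm of the concatenated children. The loss of a whole tree is the sum of this quantity over its non-leaf nodes.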
Socher et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP 2011.
Recursive Autoencoder (RAE) - Model
• If the tree structure $y$ is given, we can train a local autoencoder $(c_1, c_2) \to p \to (c_1', c_2')$
on each node, with reconstruction loss $L(y) = \sum_{(c_1, c_2, p) \in y} \| [c_1; c_2] - [c_1'; c_2'] \|^2$
• If the tree structure is not given, we take the minimum over all candidate trees $A(x_i)$:
$\operatorname{argmin}_{y \in A(x_i)} L(y) = \operatorname{argmin}_{y \in A(x_i)} \sum_{(c_1, c_2, p) \in y} \| [c_1; c_2] - [c_1'; c_2'] \|^2$
• Here, the minimization over $A(x_i)$ is approximated by greedy search, using the reconstruction loss as the score
• Length normalization: minimizing the reconstruction loss alone drives the scale of the hidden nodes toward 0.
To prevent this, normalize hidden nodes to unit length: $p / \|p\|$
• The resulting tree captures the information in the words, but does not follow the syntax
• However, the learned representation is still useful
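The greedy tree construction with length normalization can be sketched as follows. The hand-picked weights (an encoder that sums the children and a decoder that copies the parent to both children) are assumptions chosen only to make the behavior easy to follow; they are not trained parameters.

```python
import math

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def encode_decode(c1, c2, W_enc, W_dec):
    """Return the unit-length parent and the reconstruction loss of one merge."""
    x = c1 + c2
    p = [math.tanh(v) for v in matvec(W_enc, x)]
    norm = math.sqrt(sum(v * v for v in p)) or 1.0   # guard against a zero vector
    p = [v / norm for v in p]                         # length normalization p / ||p||
    x_rec = matvec(W_dec, p)
    loss = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return p, loss

def greedy_rae_tree(leaves, W_enc, W_dec):
    """Merge the adjacent pair with the lowest reconstruction loss until one node is left."""
    nodes = list(leaves)                              # each node is (tree, vector)
    total_loss = 0.0
    while len(nodes) > 1:
        results = [encode_decode(nodes[i][1], nodes[i + 1][1], W_enc, W_dec)
                   for i in range(len(nodes) - 1)]
        best = min(range(len(results)), key=lambda i: results[i][1])
        p, loss = results[best]
        nodes[best:best + 2] = [((nodes[best][0], nodes[best + 1][0]), p)]
        total_loss += loss
    return nodes[0][0], nodes[0][1], total_loss

W_enc = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]      # sums the two children
W_dec = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # copies the parent twice
leaves = [("a", [1.0, 0.0]), ("b", [1.0, 0.0]), ("c", [0.0, 1.0])]
tree, root, total_loss = greedy_rae_tree(leaves, W_enc, W_dec)
```

The identical leaves "a" and "b" are merged first because they can be reconstructed almost perfectly, and every internal node has unit length regardless of the scale of its pre-activation.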
Recursive Autoencoder (RAE) - Experiments
• Each paragraph is labeled with votes over 5 sentiments (a paragraph can receive multiple votes)
• Train a logistic regression model on the learned representation
• The learned representation performed better than the baseline models,
e.g. binary bag-of-words, hand-crafted features, and the average of word vectors
Unfolding RAE & Dynamic Pooling - Model
• Unfolding RAE is the global autoencoder version of RAE (more expensive, but may perform better)
• In some tasks, e.g. paraphrase detection, we need to compare the features of two sentences
• Comparing the features of all nodes would be better than comparing only root features, but the sizes do not match
• Dynamic pooling converts the variable-sized similarity matrix into a fixed-size matrix
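One way to realize dynamic pooling is to split the rows and columns into a fixed number of roughly equal contiguous chunks and pool each resulting region. The sketch below follows that idea; the function name `dynamic_pool`, the chunking rule, and the default of min-pooling (which suits a matrix of distances, where smaller means more similar) are assumptions for illustration.

```python
def dynamic_pool(matrix, out_size, pool=min):
    """Pool a rows x cols matrix down (or up) to out_size x out_size."""
    rows, cols = len(matrix), len(matrix[0])

    def bounds(k, length):
        # Chunk k of `length` items split into out_size contiguous parts.
        start = k * length // out_size
        end = max((k + 1) * length // out_size, start + 1)  # at least one element
        return start, end

    pooled = []
    for r in range(out_size):
        r0, r1 = bounds(r, rows)
        row = []
        for c in range(out_size):
            c0, c1 = bounds(c, cols)
            row.append(pool(matrix[i][j]
                            for i in range(r0, r1)
                            for j in range(c0, c1)))
        pooled.append(row)
    return pooled

# A 4 x 6 matrix with entries 0..23, pooled to 2 x 2.
m = [[i * 6 + j for j in range(6)] for i in range(4)]
print(dynamic_pool(m, 2))  # [[0, 3], [12, 15]]
```

Sentence pairs of any length then produce a matrix of the same shape, which can be fed to a fixed-size classifier. Passing `pool=max` would be the natural choice if the entries were similarities rather than distances.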
Socher et al. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS 2011.
Unfolding RAE & Dynamic Pooling - Experiments
• Unfolding RAE learns a better representation than RAE
• Unfolding RAE + dynamic pooling gives the best representation for similarity
[Figures: Nearest Neighbors; Similarity Classification]
Matrix-Vector RNN (MV-RNN)
• Motivation: Different word pairs have different composition rules
• Idea: Represent the composition rule of each word (a vector in $\mathbb{R}^n$) by a matrix in $\mathbb{R}^{n \times n}$
• Hence, each word is represented by a vector-matrix pair $(a, A) \in \mathbb{R}^n \times \mathbb{R}^{n \times n}$
• For two words $(a, A)$ and $(b, B)$, the parent node $(p, P)$ is given by
$p = f_V(a, b, A, B) = \tilde{f}_V(Ba, Ab)$
$P = f_M(A, B) = W_M \cdot [A; B]$
• We must store $|V|$ matrices of size $n \times n$ ($n \times n \times |V|$ parameters), hence the authors use
a low-rank approximation to reduce the number of parameters
• MV-RNN shows better performance than the vanilla RNN
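The composition of two vector-matrix pairs can be sketched as below. The slide leaves $\tilde{f}_V$ unspecified, so a tanh layer over the stacked vector $[Ba; Ab]$ is assumed here; the names `mv_compose`, `W` and `W_M`, and the toy weights, are illustrative.

```python
import math

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def mv_compose(a, A, b, B, W, W_M):
    """(a, A), (b, B) -> (p, P) with p = tanh(W [Ba; Ab]) and P = W_M [A; B]."""
    x = matvec(B, a) + matvec(A, b)            # [Ba; Ab], length 2n
    p = [math.tanh(v) for v in matvec(W, x)]   # W is n x 2n
    P = matmul(W_M, A + B)                     # [A; B] is 2n x n, W_M is n x 2n
    return p, P

identity = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
W_M = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]   # averages A and B
p, P = mv_compose([0.5, 0.0], identity, [0.0, 0.25], identity, W, W_M)
```

Each word's matrix modifies the other word's vector before the two are combined, and the parent again carries both a vector and a matrix, so it can be composed further up the tree.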
Socher et al. Semantic Compositionality through Recursive Matrix-Vector Spaces. EMNLP 2012.
[Figure: Semantic Classification]
Recursive Neural Tensor Network (RNTN)
• Motivation: Modeling composition is useful, but MV-RNN uses too many parameters
• Instead of using one matrix per word, use a single tensor to represent composition
• Formally, let $V^{[1:n]} \in \mathbb{R}^{2n \times 2n \times n}$, where each $V^{[i]} \in \mathbb{R}^{2n \times 2n}$ is a tensor slice
• Then the composition term $h \in \mathbb{R}^n$ for children $(a, b)$ is given by
$h_i = [a; b]^T \cdot V^{[i]} \cdot [a; b]$
and the parent $p \in \mathbb{R}^n$ by
$p = f(a, b, h) = \tilde{f}(h + W \cdot [a; b])$
• This reduces the number of composition parameters from $n \times n \times |V|$ (with $|V|$ the vocabulary size) to $2n \times 2n \times n$
• RNTN also shows better performance than MV-RNN
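A minimal sketch of the tensor composition is given below, again assuming tanh for $\tilde{f}$. The tensor is stored as a list of $n$ slices of size $2n \times 2n$; the name `rntn_compose` and the toy weights are hypothetical.

```python
import math

def rntn_compose(a, b, V, W):
    """h_i = [a; b]^T V[i] [a; b];  p = tanh(h + W [a; b])."""
    x = a + b                                            # [a; b], length 2n
    size = len(x)
    h = [sum(x[r] * V_i[r][c] * x[c]
             for r in range(size) for c in range(size))
         for V_i in V]                                   # one bilinear form per slice
    Wx = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return [math.tanh(hi + wi) for hi, wi in zip(h, Wx)]

n = 2
eye = [[1.0 if r == c else 0.0 for c in range(2 * n)] for r in range(2 * n)]
zeros = [[0.0] * (2 * n) for _ in range(2 * n)]
V = [eye, zeros]                                         # n slices of size 2n x 2n
W = [[0.0] * (2 * n) for _ in range(n)]
p = rntn_compose([1.0, 0.0], [0.0, 2.0], V, W)
```

With an identity slice the first bilinear form is the squared norm of $[a; b]$, here 5, so the first output is tanh(5). Setting every slice to zero recovers the plain recursive network $p = \tanh(W [a; b])$, which shows that the tensor term is an additive interaction on top of the standard layer.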
Socher et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013.
• Recursive Neural Network (RNN): Socher et al. Parsing Natural Scenes and Natural
Language with Recursive Neural Networks. ICML 2011.
• Recursive Autoencoder (RAE): Socher et al. Semi-Supervised Recursive Autoencoders for
Predicting Sentiment Distributions. EMNLP 2011.
• Unfolding RAE & Dynamic Pooling: Socher et al. Dynamic Pooling and Unfolding Recursive
Autoencoders for Paraphrase Detection. NIPS 2011.
• Matrix-Vector RNN (MV-RNN): Socher et al. Semantic Compositionality through Recursive
Matrix-Vector Spaces. EMNLP 2012.
• Recursive Neural Tensor Network (RNTN): Socher et al. Recursive Deep Models for
Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013.

Cynthia Thomas
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
Ortus Solutions, Corp

Recently uploaded (20)

DynamoDB to ScyllaDB: Technical Comparison and the Path to Success
DynamoDB to ScyllaDB: Technical Comparison and the Path to SuccessDynamoDB to ScyllaDB: Technical Comparison and the Path to Success
DynamoDB to ScyllaDB: Technical Comparison and the Path to Success
The "Zen" of Python Exemplars - OTel Community Day
The "Zen" of Python Exemplars - OTel Community DayThe "Zen" of Python Exemplars - OTel Community Day
The "Zen" of Python Exemplars - OTel Community Day
CTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database MigrationCTO Insights: Steering a High-Stakes Database Migration
CTO Insights: Steering a High-Stakes Database Migration
Communications Mining Series - Zero to Hero - Session 2
Communications Mining Series - Zero to Hero - Session 2Communications Mining Series - Zero to Hero - Session 2
Communications Mining Series - Zero to Hero - Session 2
Move Auth, Policy, and Resilience to the Platform
Move Auth, Policy, and Resilience to the PlatformMove Auth, Policy, and Resilience to the Platform
Move Auth, Policy, and Resilience to the Platform
Fuxnet [EN] .pdf
Fuxnet [EN]                                   .pdfFuxnet [EN]                                   .pdf
Fuxnet [EN] .pdf
ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes New Product Features and Release Highlights: June 2024ThousandEyes New Product Features and Release Highlights: June 2024
ThousandEyes New Product Features and Release Highlights: June 2024
Chapter 1 - Fundamentals of Testing V4.0
Chapter 1 - Fundamentals of Testing V4.0Chapter 1 - Fundamentals of Testing V4.0
Chapter 1 - Fundamentals of Testing V4.0
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
TrustArc Webinar - Your Guide for Smooth Cross-Border Data Transfers and Glob...
EverHost AI Review: Empowering Websites with Limitless Possibilities through ...
EverHost AI Review: Empowering Websites with Limitless Possibilities through ...EverHost AI Review: Empowering Websites with Limitless Possibilities through ...
EverHost AI Review: Empowering Websites with Limitless Possibilities through ...
Introduction to ThousandEyes AMER Webinar
Introduction  to ThousandEyes AMER WebinarIntroduction  to ThousandEyes AMER Webinar
Introduction to ThousandEyes AMER Webinar
APJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes WebinarAPJC Introduction to ThousandEyes Webinar
APJC Introduction to ThousandEyes Webinar
Product Listing Optimization Presentation - Gay De La Cruz.pdf
Product Listing Optimization Presentation - Gay De La Cruz.pdfProduct Listing Optimization Presentation - Gay De La Cruz.pdf
Product Listing Optimization Presentation - Gay De La Cruz.pdf
ScyllaDB Topology on Raft: An Inside Look
ScyllaDB Topology on Raft: An Inside LookScyllaDB Topology on Raft: An Inside Look
ScyllaDB Topology on Raft: An Inside Look
An Introduction to All Data Enterprise Integration
An Introduction to All Data Enterprise IntegrationAn Introduction to All Data Enterprise Integration
An Introduction to All Data Enterprise Integration
intra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_Enintra-mart Accel series 2024 Spring updates_En
intra-mart Accel series 2024 Spring updates_En
The Strategy Behind ReversingLabs’ Massive Key-Value Migration
The Strategy Behind ReversingLabs’ Massive Key-Value MigrationThe Strategy Behind ReversingLabs’ Massive Key-Value Migration
The Strategy Behind ReversingLabs’ Massive Key-Value Migration
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My IdentityCNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
CNSCon 2024 Lightning Talk: Don’t Make Me Impersonate My Identity
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!

Recursive Neural Networks

  • 2. Recursive Neural Network (RNN) - Motivation
      • Motivation: Many real objects have a recursive structure, e.g. images are composed of segments, and sentences are composed of words.
      Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011. Image from Stanford CS224N Lecture Note 14.
  • 3. Recursive Neural Network (RNN) - Motivation
      • Motivation: Can we learn a good representation for recursive structures?
      • Recursive structures (phrases) and their components (words) should lie in the same space, e.g. the phrase "the country of my birth" should lie near the words "Germany", "France", etc.
  • 4. Recursive Neural Network (RNN) - Model
      • Goal: Design a neural network whose features are constructed recursively.
      • Each module maps two children to one parent, all lying in the same vector space.
      • To decide the order of recursion, we assign a score (plausibility) to each node.
      • Hence, the neural network module outputs (representation, score) pairs.
  • 5. Recursive Neural Network (RNN) - Model
      • cf. A recurrent neural network is a special case of a recursive neural network, in which the tree is a linear chain.
      Image from Ratsgo’s blog for text mining.
  • 6. Recursive Neural Network (RNN) - Inference
      • At each step, merge the pair of adjacent nodes with the highest score.
      • With the greedy algorithm, inference requires only $O(N)$ evaluations of the network module: there are $N - 1$ merges, and each merge creates at most two new candidate pairs to score.
  • 10. Recursive Neural Network (RNN) - Inference
      • We can apply beam search to improve the performance.
      • Beam search: keep the $k$ best partial trees at each step (greedy search is beam search with $k = 1$).
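The greedy inference described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the single-layer `tanh` module, the linear scoring vector `w_score`, and all function names are hypothetical. For brevity the sketch rescores every adjacent pair at each step; caching the scores of unchanged pairs is what keeps the number of module evaluations linear in the sentence length.

```python
import math

def compose(left, right, W, b, w_score):
    """Map two child vectors to a (parent vector, plausibility score) pair."""
    children = left + right  # concatenation [c1; c2]
    parent = [math.tanh(sum(w * c for w, c in zip(row, children)) + bias)
              for row, bias in zip(W, b)]
    score = sum(w * p for w, p in zip(w_score, parent))
    return parent, score

def greedy_parse(leaves, W, b, w_score):
    """Merge the best-scoring adjacent pair until a single root remains."""
    # Each node is (vector, tree); a leaf's tree is just its index.
    nodes = [(vec, i) for i, vec in enumerate(leaves)]
    total_score = 0.0
    while len(nodes) > 1:
        # Assumption: rescore all adjacent pairs (simple, but not O(N) overall).
        candidates = [compose(nodes[i][0], nodes[i + 1][0], W, b, w_score)
                      for i in range(len(nodes) - 1)]
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        parent, score = candidates[best]
        tree = (nodes[best][1], nodes[best + 1][1])
        nodes[best:best + 2] = [(parent, tree)]
        total_score += score  # tree score = sum of non-leaf node scores
    root_vector, root_tree = nodes[0]
    return root_tree, root_vector, total_score

# Toy usage with 2-dimensional vectors and hand-picked (hypothetical) weights.
W = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
b = [0.0, 0.0]
w_score = [1.0, 1.0]
leaves = [[0.1, 0.2], [0.9, 0.8], [0.7, 0.9], [-0.5, -0.4]]
tree, root, total = greedy_parse(leaves, W, b, w_score)
print(tree)
```

Beam search would replace the single `nodes` list with the `k` highest-scoring partial parses at each step.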
  • 11. Recursive Neural Network (RNN) - Training
      • Suppose (sentence, tree) pairs $(x_i, y_i)$ are given.
      • Let $s(x_i, y)$ be the score of tree $y$, i.e. the sum of the scores of all its non-leaf nodes.
      • Let $A(x_i)$ be the set of candidate trees (approximated by beam search).
      • Then the max-margin objective (to be maximized) is
        $$J = \sum_i \Big[ s(x_i, y_i) - \max_{y \in A(x_i)} \big( s(x_i, y) + \Delta(y, y_i) \big) \Big],$$
        where $\Delta(y, y_i)$ is the number of wrong subtrees in $y$.
      • Maximizing $J$ increases $s(x_i, y_i)$, and decreases $s(x_i, y)$ for the highest-scoring candidate $y$ if $s(x_i, y) + \Delta(y, y_i) > s(x_i, y_i)$.
      • We can also add a classification loss at each node (using the node’s feature vector as input to the classifier).
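To make the objective concrete, here is a small numeric sketch. It assumes the usual structured max-margin form for this model, the gold tree score minus the best margin-augmented candidate score, and that the candidate set includes the gold tree itself (with penalty 0). The function name and the toy numbers are illustrative only.

```python
def max_margin_objective(gold_score, candidates):
    """Per-example objective: s(x, y_gold) - max over candidates of (s(x, y) + delta).

    `candidates` is a list of (score, delta) pairs for the trees in A(x),
    where delta counts the wrong subtrees of that candidate.
    """
    best_augmented = max(score + delta for score, delta in candidates)
    return gold_score - best_augmented

# Before training: a wrong tree (score 1.5, one wrong subtree) violates the margin.
before = max_margin_objective(2.0, [(2.0, 0), (1.5, 1), (0.5, 2)])
# After raising the gold score, no candidate beats it even with its penalty added.
after = max_margin_objective(3.0, [(3.0, 0), (1.5, 1), (0.5, 2)])
print(before, after)  # -0.5 0.0
```

When the gold tree is among the candidates, the value is at most 0, and it reaches 0 exactly when every wrong tree scores below the gold tree by at least its penalty.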
  • 12. Recursive Neural Network (RNN) - Experiments
      • After training, both leaf nodes and higher (internal) nodes have meaningful representations.
      • Image segmentation: infer classes for segments (the feature extractor is jointly trained).
      • Phrase clustering: nearest-neighbor search over phrase features.
  • 13. Recursive Neural Network (RNN) - Appendix
      • Preprocessing: how do we map segments/words into the representation space $\mathbb{R}^n$?
          • Word: use a pretrained word embedding model, e.g. word2vec ($V \to \mathbb{R}^n$).
          • Image: extract hand-crafted features in $\mathbb{R}^m$, and jointly train a network $F: \mathbb{R}^m \to \mathbb{R}^n$.
      • Extension to image segmentation
          • Each segment can have multiple adjacent segments.
          • Hence, there are multiple true tree structures.
          • Hence, $\Delta(y, y_i)$ checks whether each subtree is included in the set of true tree structures.
  • 14. Recursive Autoencoder (RAE) - Motivation & Idea
      • Motivation: The recursive neural network (RNN) requires true tree structures for training.
      • The recursive autoencoder (RAE) extends the RNN to the unsupervised and semi-supervised learning settings.
      • If the tree structure $y$ is given, we can train a local autoencoder $(c_1, c_2) \to p \to (c_1', c_2')$ on each node, with reconstruction loss
        $$L(y) = \sum_{(c_1, c_2, p) \in y} \big\| [c_1; c_2] - [c_1'; c_2'] \big\|^2$$
      Socher et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP 2011.
  • 15. Recursive Autoencoder (RAE) - Model
      • If the tree structure $y$ is given, we can train a local autoencoder $(c_1, c_2) \to p \to (c_1', c_2')$ on each node, with reconstruction loss
        $$L(y) = \sum_{(c_1, c_2, p) \in y} \big\| [c_1; c_2] - [c_1'; c_2'] \big\|^2$$
      • If the tree structure is not given, we take the minimum over all candidate trees $A(x_i)$:
        $$\arg\min_{y \in A(x_i)} L(y) = \arg\min_{y \in A(x_i)} \sum_{(c_1, c_2, p) \in y} \big\| [c_1; c_2] - [c_1'; c_2'] \big\|^2$$
      • Here, the minimization over $A(x_i)$ is approximated by greedy search, using the reconstruction loss as the score.
      • Length normalization: minimizing the reconstruction loss alone drives the scale of the hidden nodes toward 0. To prevent this, normalize each hidden node by its length: $p / \|p\|$.
      • The resulting tree captures the information of the words, but does not necessarily follow the syntactic structure.
      • However, the learned representation was still useful.
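The local autoencoder, length normalization, and greedy tree search can be sketched together as below. This is an illustrative sketch under assumptions: the encoder is a single `tanh` layer, the decoder is linear without bias, and the names `W_enc`, `W_dec`, `encode`, `recon_loss`, and `greedy_rae_tree` are hypothetical rather than taken from the cited paper.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def encode(c1, c2, W_enc):
    """Encode two children into a parent and normalize it to unit length."""
    p = [math.tanh(z) for z in matvec(W_enc, c1 + c2)]
    norm = math.sqrt(sum(x * x for x in p)) or 1.0  # guard against a zero vector
    return [x / norm for x in p]  # length normalization p / ||p||

def recon_loss(c1, c2, W_enc, W_dec):
    """Return (parent, squared error between [c1; c2] and its reconstruction)."""
    p = encode(c1, c2, W_enc)
    reconstructed = matvec(W_dec, p)  # [c1'; c2'], assumed linear decoder
    loss = sum((x - y) ** 2 for x, y in zip(c1 + c2, reconstructed))
    return p, loss

def greedy_rae_tree(leaves, W_enc, W_dec):
    """Greedily merge the adjacent pair with the lowest reconstruction loss."""
    nodes = [(vec, i) for i, vec in enumerate(leaves)]  # (vector, tree)
    total_loss = 0.0
    while len(nodes) > 1:
        candidates = [recon_loss(nodes[i][0], nodes[i + 1][0], W_enc, W_dec)
                      for i in range(len(nodes) - 1)]
        best = min(range(len(candidates)), key=lambda i: candidates[i][1])
        parent, loss = candidates[best]
        tree = (nodes[best][1], nodes[best + 1][1])
        nodes[best:best + 2] = [(parent, tree)]
        total_loss += loss  # L(y): sum of losses over all merged nodes
    return nodes[0][1], total_loss

# Toy usage with 2-dimensional vectors and hypothetical weights.
W_enc = [[0.5, 0.1, 0.2, 0.3], [0.1, 0.4, 0.3, 0.2]]
W_dec = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
leaves = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
tree, loss = greedy_rae_tree(leaves, W_enc, W_dec)
print(tree, loss)
```

Without the normalization step, the encoder could shrink every parent toward zero to make later reconstruction errors small, which is the degenerate solution the slide warns about.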
  • 16. Recursive Autoencoder (RAE) - Experiments
      • Each paragraph is labeled with votes over 5 sentiments (one paragraph can receive multiple votes).
      • Train a logistic regression model on the learned representation.
      • The learned representation performed better than baseline models, e.g. binary bag-of-words, hand-crafted features, and the average of word vectors.
  • 17. Unfolding RAE & Dynamic Pooling - Model
      • The unfolding RAE is a global autoencoder version of the RAE (more expensive, but may work better).
      • In some tasks, e.g. paraphrase detection, we need to compare the features of two sentences.
      • Comparing the features of all nodes would be better than comparing only the root features, but the sizes do not match across sentences.
      • Dynamic pooling converts the variable-sized similarity matrix into a fixed-sized matrix.
      Socher et al. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS 2011.
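The pooling step can be illustrated with a short sketch. The slides do not specify the pooling layout, so this assumes min-pooling over a roughly even grid and assumes the input matrix is at least as large as the output in both dimensions; `dynamic_pool` and its parameters are hypothetical names.

```python
def dynamic_pool(matrix, size, reduce=min):
    """Pool a rows x cols matrix down to a fixed size x size matrix.

    The rows and columns are split into `size` contiguous, roughly equal
    chunks, and each grid cell is reduced to a single value.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows < size or cols < size:
        raise ValueError("this sketch assumes the input is at least size x size")
    row_edges = [i * rows // size for i in range(size + 1)]
    col_edges = [j * cols // size for j in range(size + 1)]
    return [[reduce(matrix[r][c]
                    for r in range(row_edges[i], row_edges[i + 1])
                    for c in range(col_edges[j], col_edges[j + 1]))
             for j in range(size)]
            for i in range(size)]

# A 3 x 5 matrix of pairwise node distances pooled to a fixed 2 x 2 matrix.
distances = [[1, 2, 3, 4, 5],
             [6, 7, 8, 9, 10],
             [11, 12, 13, 14, 15]]
print(dynamic_pool(distances, 2))  # [[1, 3], [6, 8]]
```

Because the output shape no longer depends on the sentence lengths, the pooled matrix can be fed to an ordinary fixed-input classifier.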
  • 18. Unfolding RAE & Dynamic Pooling - Experiments
      • The unfolding RAE learns a better representation than the RAE.
      • Unfolding RAE + dynamic pooling gives the best representation for similarity.
      (Figure panels: Nearest Neighbors; Similarity Classification.)
  • 19. Matrix-Vector RNN (MV-RNN)
      • Motivation: Different word pairs have different composition rules.
      • Idea: Represent the composition rule of each word (a vector in $\mathbb{R}^n$) by a matrix in $\mathbb{R}^{n \times n}$.
      • Hence, each word is represented by a vector-matrix pair $(a, A) \in \mathbb{R}^n \times \mathbb{R}^{n \times n}$.
      • For two words $(a, A)$ and $(b, B)$, the parent node $(p, P)$ is given by
        $$p = f_V(a, b, A, B) = \tilde{f}_V(Ba, Ab) \quad \text{and} \quad P = f_M(A, B) = W_M \begin{bmatrix} A \\ B \end{bmatrix}$$
      • We would have to store one $n \times n$ matrix per word, i.e. $n \times n \times |V|$ parameters; hence the authors use a low-rank approximation to reduce the number of parameters.
      • MV-RNN shows better performance than the vanilla RNN.
      Socher et al. Semantic Compositionality through Recursive Matrix-Vector Spaces. EMNLP 2012.
      (Figure panel: Semantic Classification.)
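A small sketch of the MV-RNN composition follows. It assumes that $\tilde{f}_V$ is a single `tanh` layer with weight matrix `W` applied to the concatenation of $Ba$ and $Ab$, and that `W_M` has shape $n \times 2n$ so that the parent matrix is again $n \times n$. Function names and the toy weights are hypothetical.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Multiply matrices A and B, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def mv_compose(a, A, b, B, W, W_M):
    """Compose (a, A) and (b, B) into the parent (p, P)."""
    Ba = matvec(B, a)  # the right word's matrix modifies the left word's vector
    Ab = matvec(A, b)  # the left word's matrix modifies the right word's vector
    p = [math.tanh(z) for z in matvec(W, Ba + Ab)]  # assumed form of f_V
    stacked = A + B  # 2n x n: rows of A followed by rows of B
    P = matmul(W_M, stacked)  # (n x 2n) times (2n x n) gives n x n
    return p, P

# Toy usage with n = 2 and identity word matrices (hypothetical values).
identity = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
W_M = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
p, P = mv_compose([0.5, 0.1], identity, [0.2, -0.3], identity, W, W_M)
print(p, P)
```

Since the parent is again a vector-matrix pair of the same shapes, the same function can be applied recursively up the tree.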
  • 20. Recursive Neural Tensor Network (RNTN)
      • Motivation: Modeling composition explicitly is useful, but MV-RNN uses too many parameters.
      • Instead of using one matrix for each word, use a single tensor to represent composition.
      • Formally, let $V^{[1:n]} \in \mathbb{R}^{2n \times 2n \times n}$, where each $V^{[i]} \in \mathbb{R}^{2n \times 2n}$ is one slice of the tensor.
      • Then the composition term $h \in \mathbb{R}^n$ for children $(a, b)$ is given by
        $$h_i = \begin{bmatrix} a \\ b \end{bmatrix}^T V^{[i]} \begin{bmatrix} a \\ b \end{bmatrix}$$
        and the parent $p \in \mathbb{R}^n$ is
        $$p = f(a, b, h) = \tilde{f}\left( h + W \begin{bmatrix} a \\ b \end{bmatrix} \right)$$
      • This reduces the number of composition parameters from $n \times n \times |V|$ to $2n \times 2n \times n$.
      • RNTN also shows better performance than MV-RNN.
      Socher et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013.
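The tensor composition and the parameter counts can be checked with a short sketch. It assumes $\tilde{f}$ is an elementwise `tanh`; the function names and the example sizes ($n = 25$, $|V| = 10000$) are illustrative and not taken from the cited paper.

```python
import math

def rntn_compose(a, b, V, W):
    """Compute p = tanh(h + W [a; b]) with h_i = [a; b]^T V[i] [a; b]."""
    x = a + b  # concatenation [a; b], length 2n
    size = len(x)
    # One bilinear form per tensor slice V[i] (each slice is 2n x 2n).
    h = [sum(x[j] * slice_i[j][k] * x[k]
             for j in range(size) for k in range(size))
         for slice_i in V]
    Wx = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return [math.tanh(hi + wi) for hi, wi in zip(h, Wx)]

def mv_rnn_matrix_params(n, vocab_size):
    """One n x n matrix per word (before any low-rank approximation)."""
    return n * n * vocab_size

def rntn_tensor_params(n):
    """A single 2n x 2n x n tensor shared by all words."""
    return 2 * n * 2 * n * n

# With n = 1 and an identity slice, h = a^2 + b^2.
p = rntn_compose([1.0], [2.0], [[[1.0, 0.0], [0.0, 1.0]]], [[0.0, 0.0]])
print(p)
print(mv_rnn_matrix_params(25, 10000), rntn_tensor_params(25))  # 6250000 62500
```

Setting the tensor to zero recovers the vanilla recursive module, so the tensor can be read as an added multiplicative interaction between the two children.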
  • 21. Reference
      • Recursive Neural Network (RNN): Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
      • Recursive Autoencoder (RAE): Socher et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP 2011.
      • Unfolding RAE & Dynamic Pooling: Socher et al. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS 2011.
      • Matrix-Vector RNN (MV-RNN): Socher et al. Semantic Compositionality through Recursive Matrix-Vector Spaces. EMNLP 2012.
      • Recursive Neural Tensor Network (RNTN): Socher et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013.