Recursive Neural Networks
2018.06.27.
Sangwoo Mo
Recursive Neural Network (RNN) - Motivation
• Motivation: Many real-world objects have a recursive structure,
e.g. images are composed of segments, and sentences are composed of words
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Stanford CS224N Lecture Note 14.
Recursive Neural Network (RNN) - Motivation
• Motivation: Can we learn a good representation for such recursive structures?
• Recursive structures (phrases) and their components (words) should lie in the same space,
e.g. the country of my birth ≃ Germany, France, etc.
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Stanford CS224N Lecture Note 14.
Recursive Neural Network (RNN) - Model
• Goal: Design a neural network whose features are recursively constructed
• Each module maps two children to one parent, all lying in the same vector space
• To determine the order of recursion, each node is assigned a score (plausibility)
• Hence, the neural network module outputs (representation, score) pairs
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Stanford CS224N Lecture Note 14.
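A minimal NumPy sketch of such a module (parameter names W, b, v are illustrative, not from the paper's code): children and parent share ℝ^𝑛, and the unit returns a (representation, score) pair.

```python
import numpy as np

class RecursiveUnit:
    """Compose two child vectors in R^n into a parent in R^n plus a score."""
    def __init__(self, n, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, (n, 2 * n))  # composition weights
        self.b = np.zeros(n)
        self.v = rng.normal(0.0, 0.01, n)           # scoring weights

    def __call__(self, c1, c2):
        p = np.tanh(self.W @ np.concatenate([c1, c2]) + self.b)  # parent vector
        score = float(self.v @ p)                                # plausibility
        return p, score
```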
Recursive Neural Network (RNN) - Model
• cf. A recurrent neural network is a special case of a recursive neural network (the tree is a chain)
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Ratsgo’s blog for text mining.
Recursive Neural Network (RNN) - Inference
• At each step, merge the two adjacent nodes whose merge has the highest score
• With the greedy algorithm, inference requires only 𝑂(𝑁) time
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Stanford CS224N Lecture Note 14.
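A minimal sketch of the greedy procedure, assuming the RecursiveUnit sketch above: score every adjacent pair, merge the best one, and repeat until a single root remains (𝑁 leaves need 𝑁 − 1 merges).

```python
def greedy_parse(unit, leaves):
    """Greedily merge adjacent nodes; returns (root, total score)."""
    nodes = list(leaves)
    total = 0.0
    while len(nodes) > 1:
        # score every adjacent pair under the current module
        pairs = [unit(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        i = max(range(len(pairs)), key=lambda j: pairs[j][1])
        parent, score = pairs[i]
        nodes[i:i + 2] = [parent]  # replace the winning pair with its parent
        total += score
    return nodes[0], total
```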
Recursive Neural Network (RNN) - Inference
• We can apply beam search to improve performance
• Beam search: keep the top-𝑘 partial parses at each step (greedy = beam search with 𝑘 = 1)
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Ratsgo’s blog for text mining.
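A sketch of the beam variant under the same assumptions: keep the 𝑘 best partial parses after every merge step.

```python
import heapq

def beam_parse(unit, leaves, k=4):
    """k-beam search over merge orders; greedy_parse is the k=1 case."""
    beams = [(0.0, list(leaves))]  # (cumulative score, current node list)
    while len(beams[0][1]) > 1:
        expanded = []
        for total, nodes in beams:
            for i in range(len(nodes) - 1):
                parent, score = unit(nodes[i], nodes[i + 1])
                expanded.append(
                    (total + score, nodes[:i] + [parent] + nodes[i + 2:]))
        beams = heapq.nlargest(k, expanded, key=lambda b: b[0])  # keep top-k
    return beams[0]  # best (score, [root])
```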
Recursive Neural Network (RNN) - Training
• Suppose (sentence, tree) pairs (𝑥𝑖, 𝑦𝑖) are given
• Let 𝑠(𝑥𝑖, 𝑦) be the score of tree 𝑦, i.e. the sum of the scores of all non-leaf nodes
• Let 𝐴(𝑥𝑖) be the set of candidate trees (approximated by beam search)
• Then the max-margin objective (to maximize) is
𝐽 = Σ𝑖 [ 𝑠(𝑥𝑖, 𝑦𝑖) − max𝑦∈𝐴(𝑥𝑖) ( 𝑠(𝑥𝑖, 𝑦) + Δ(𝑦, 𝑦𝑖) ) ]
where Δ(𝑦, 𝑦𝑖) is the number of wrong subtrees in 𝑦
• This increases 𝑠(𝑥𝑖, 𝑦𝑖) and decreases 𝑠(𝑥𝑖, 𝑦) whenever 𝑠(𝑥𝑖, 𝑦) + Δ(𝑦, 𝑦𝑖) > 𝑠(𝑥𝑖, 𝑦𝑖)
• We can also add a classification loss at each node
(using the node's feature as input to the classifier)
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Image from Stanford CS224N Lecture Note 14.
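A sketch of this loss for a single example; `score_tree` and `candidate_trees` are assumed helpers that score a tree under the current parameters and return the beam-searched candidate set 𝐴(𝑥𝑖).

```python
def margin_loss(score_tree, candidate_trees, x_i, y_i, delta):
    """Structured hinge loss: positive iff some candidate's margin-augmented
    score s(x_i, y) + delta(y, y_i) exceeds the gold tree's score."""
    gold = score_tree(x_i, y_i)
    worst = max(score_tree(x_i, y) + delta(y, y_i)
                for y in candidate_trees(x_i))
    return max(0.0, worst - gold)
```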
Recursive Neural Network (RNN) - Experiments
• After training, both leaf and internal nodes learn valid representations
• Image segmentation: infer classes for segments (the feature extractor is jointly trained)
• Phrase clustering: nearest neighbors on phrase features
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
Recursive Neural Network (RNN) - Appendix
• Preprocessing: How do we map segments/words into the representation space ℝ^𝑛? (a sketch follows below)
• Word: use a pretrained word2vec model (𝑉 → ℝ^𝑛)
• Image: extract hand-crafted features in ℝ^𝑚, and jointly train a network 𝐹: ℝ^𝑚 → ℝ^𝑛
• Extension to image segmentation
• A segment can have multiple adjacent segments,
hence there are multiple true tree structures
• Hence, Δ(𝑦, 𝑦𝑖) checks whether each subtree is
included in the set of true tree structures
Socher et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. ICML 2011.
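A minimal sketch of this preprocessing (all names are illustrative; `embeddings` stands in for a pretrained word2vec table):

```python
import numpy as np

def embed_word(embeddings, vocab, word):
    """Lookup V -> R^n from a pretrained embedding table."""
    return embeddings[vocab[word]]

class SegmentEncoder:
    """Jointly trained F: R^m -> R^n over hand-crafted segment features."""
    def __init__(self, m, n, seed=0):
        self.W = np.random.default_rng(seed).normal(0.0, 0.01, (n, m))

    def __call__(self, features):
        return np.tanh(self.W @ features)
```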
Recursive Autoencoder (RAE) - Motivation & Idea
• Motivation: The recursive neural network (RNN) requires true tree structures for training
• The recursive autoencoder (RAE) extends the RNN to the un-/semi-supervised learning setting
• If a tree structure 𝑦 is given, we can train a local autoencoder (𝑐1, 𝑐2) → 𝑝 → (𝑐1′, 𝑐2′)
on each node, with reconstruction loss 𝐿(𝑦) = Σ(𝑐1,𝑐2,𝑝)∈𝑦 ‖(𝑐1, 𝑐2) − (𝑐1′, 𝑐2′)‖²
Socher et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP 2011.
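A minimal sketch of one such local autoencoder (encoder/decoder names are illustrative):

```python
import numpy as np

class RAEUnit:
    """Encode two children into a parent, decode them back, and return the
    squared reconstruction error as the local loss."""
    def __init__(self, n, seed=0):
        rng = np.random.default_rng(seed)
        self.We = rng.normal(0.0, 0.01, (n, 2 * n))  # encoder
        self.Wd = rng.normal(0.0, 0.01, (2 * n, n))  # decoder

    def __call__(self, c1, c2):
        c = np.concatenate([c1, c2])
        p = np.tanh(self.We @ c)   # parent representation
        recon = self.Wd @ p        # reconstructed (c1', c2')
        return p, float(np.sum((c - recon) ** 2))
```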
Recursive Autoencoder (RAE) - Model
• If a tree structure 𝑦 is given, we can train a local autoencoder (𝑐1, 𝑐2) → 𝑝 → (𝑐1′, 𝑐2′)
on each node, with reconstruction loss 𝐿(𝑦) = Σ(𝑐1,𝑐2,𝑝)∈𝑦 ‖(𝑐1, 𝑐2) − (𝑐1′, 𝑐2′)‖²
• If the tree structure is not given, we take the minimum over all candidate trees 𝐴(𝑥𝑖):
argmin𝑦∈𝐴(𝑥𝑖) 𝐿(𝑦) = argmin𝑦∈𝐴(𝑥𝑖) Σ(𝑐1,𝑐2,𝑝)∈𝑦 ‖(𝑐1, 𝑐2) − (𝑐1′, 𝑐2′)‖²
• Here, 𝐴(𝑥𝑖) is approximated by greedy search, using the reconstruction loss as the (negative) score
• Length normalization: minimizing the reconstruction loss pushes the scale of hidden nodes toward 0;
to prevent this, normalize hidden nodes to unit length: 𝑝/‖𝑝‖
• The resulting tree captures the information of the words, but does not follow the syntax
• However, the learnt representation was still useful
Socher et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP 2011.
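A sketch of the unsupervised case, assuming the RAEUnit sketch above: greedily merge the adjacent pair with the lowest reconstruction loss, normalizing each parent to unit length to block the degenerate shrinking solution.

```python
import numpy as np

def rae_greedy_parse(unit, leaves):
    """Greedy tree construction with reconstruction loss as the score."""
    nodes, total = list(leaves), 0.0
    while len(nodes) > 1:
        pairs = [unit(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        i = min(range(len(pairs)), key=lambda j: pairs[j][1])  # lowest loss
        parent, loss = pairs[i]
        nodes[i:i + 2] = [parent / np.linalg.norm(parent)]  # length-normalize
        total += loss
    return nodes[0], total
```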
Recursive Autoencoder (RAE) - Experiments
• For each paragraph, votes over 5 sentiment categories are labeled (a single paragraph can receive multiple votes)
• Train a logistic regression model using the learnt representation
• The learnt representation was better than baseline models,
e.g. binary bag-of-words, hand-crafted features, and average of word vectors
Socher et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP 2011.
Unfolding RAE & Dynamic Pooling - Model
• Unfolding RAE is a global autoencoder version of RAE: each node reconstructs its entire subtree (more expensive, but can work better)
• In some tasks, e.g. paraphrase detection, we must compare features of two sentences
• Comparing all node features works better than comparing only root features, but the sizes do not match across sentences
• Dynamic pooling converts the variable-sized similarity matrix into a fixed-size matrix
Socher et al. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS 2011.
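A minimal sketch of dynamic pooling; it assumes the pairwise matrix has at least 𝑘 rows and columns, and uses min-pooling since the entries of the paraphrase-detection matrix are distances.

```python
import numpy as np

def dynamic_pool(S, k):
    """Pool a variable-sized pairwise matrix S into a fixed k x k grid."""
    rows = np.array_split(np.arange(S.shape[0]), k)
    cols = np.array_split(np.arange(S.shape[1]), k)
    return np.array([[S[np.ix_(r, c)].min() for c in cols] for r in rows])
```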
Unfolding RAE & Dynamic Pooling - Experiments
• Unfolding RAE learns better representations than RAE
• Unfolding RAE + dynamic pooling gives the best representation for similarity tasks
Socher et al. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS 2011.
[Figures: nearest-neighbor phrases and similarity-classification results]
Matrix-Vector RNN (MV-RNN)
• Motivation: Different word pairs follow different composition rules
• Idea: Represent the composition rule of a word ∈ ℝ^𝑛 by a matrix ∈ ℝ^(𝑛×𝑛)
• Hence, each word is represented by a matrix-vector pair (𝑎, 𝐴) ∈ ℝ^𝑛 × ℝ^(𝑛×𝑛)
• For two words (𝑎, 𝐴) and (𝑏, 𝐵), the parent node (𝑝, 𝑃) is given by
𝑝 = 𝑓𝑉(𝑎, 𝑏, 𝐴, 𝐵) = 𝑓̃𝑉(𝐵𝑎, 𝐴𝑏)
and
𝑃 = 𝑓𝑀(𝐴, 𝐵) = 𝑊𝑀 ⋅ [𝐴; 𝐵]
• We would need to store 𝑛 × 𝑛 × |𝑉| matrix entries, hence the authors use a
low-rank approximation to reduce the number of parameters
• MV-RNN shows better performance than vanilla RNN
Socher et al. Semantic Compositionality through Recursive Matrix-Vector Spaces. EMNLP 2012.
[Table: semantic classification results]
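A minimal sketch of this composition (parameter names are illustrative; 𝑓̃𝑉 is taken to be tanh):

```python
import numpy as np

class MVUnit:
    """Compose two (vector, matrix) word pairs into a parent pair."""
    def __init__(self, n, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, (n, 2 * n))    # vector composition
        self.W_M = rng.normal(0.0, 0.01, (n, 2 * n))  # matrix composition

    def __call__(self, a, A, b, B):
        # children modify each other via their operator matrices
        p = np.tanh(self.W @ np.concatenate([B @ a, A @ b]))  # parent vector
        P = self.W_M @ np.vstack([A, B])                      # parent matrix, n x n
        return p, P
```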
Recursive Neural Tensor Network (RNTN)
• Motivation: Modeling composition is appealing, but MV-RNN uses too many parameters
• Instead of using one matrix per word, use a single shared tensor to represent composition
• Formally, let 𝑉[1:𝑛] ∈ ℝ^(2𝑛×2𝑛×𝑛), where 𝑉[𝑖] ∈ ℝ^(2𝑛×2𝑛) denotes the 𝑖-th tensor slice
• Then the composition term ℎ ∈ ℝ^𝑛 for children (𝑎, 𝑏) is given by
ℎ𝑖 = [𝑎; 𝑏]ᵀ ⋅ 𝑉[𝑖] ⋅ [𝑎; 𝑏]
and the parent 𝑝 ∈ ℝ^𝑛 is
𝑝 = 𝑓(𝑎, 𝑏, ℎ) = 𝑓̃(ℎ + 𝑊 ⋅ [𝑎; 𝑏])
• This reduces the number of parameters from 𝑛 × 𝑛 × |𝑉| to 2𝑛 × 2𝑛 × 𝑛
• RNTN also shows better performance than MV-RNN
Socher et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013.
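A minimal sketch of the tensor composition (illustrative names; 𝑓̃ is taken to be tanh):

```python
import numpy as np

class RNTNUnit:
    """One shared tensor (n slices of 2n x 2n) replaces per-word matrices."""
    def __init__(self, n, seed=0):
        rng = np.random.default_rng(seed)
        self.V = rng.normal(0.0, 0.01, (n, 2 * n, 2 * n))  # tensor slices
        self.W = rng.normal(0.0, 0.01, (n, 2 * n))         # standard RNN term

    def __call__(self, a, b):
        c = np.concatenate([a, b])                      # stacked children in R^{2n}
        h = np.array([c @ V_i @ c for V_i in self.V])   # bilinear terms h_i
        return np.tanh(h + self.W @ c)                  # parent in R^n
```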
Reference
• Recursive Neural Network (RNN): Socher et al. Parsing Natural Scenes and Natural
Language with Recursive Neural Networks. ICML 2011.
• Recursive Autoencoder (RAE): Socher et al. Semi-Supervised Recursive Autoencoders for
Predicting Sentiment Distributions. EMNLP 2011.
• Unfolding RAE & Dynamic Pooling: Socher et al. Dynamic Pooling and Unfolding Recursive
Autoencoders for Paraphrase Detection. NIPS 2011.
• Matrix-Vector RNN (MV-RNN): Socher et al. Semantic Compositionality through Recursive
Matrix-Vector Spaces. EMNLP 2012.
• Recursive Neural Tensor Network (RNTN): Socher et al. Recursive Deep Models for
Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013.