This document discusses evaluation in information retrieval. It describes standard test collections, each of which consists of a document collection, a set of queries on that collection, and relevance judgments. It also covers evaluation measures used in information retrieval, such as precision, recall, the F-measure, and mean average precision, along with the kappa statistic, which measures the reliability of relevance judgments rather than the quality of a ranking. R-precision and normalized discounted cumulative gain are also summarized as important single-number evaluation measures.
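To make the measures named above concrete, here is a minimal Python sketch using their common textbook definitions. The function names, the toy document IDs, and the specific conventions chosen (balanced F1, a log2(rank + 1) discount for NDCG, and Cohen's kappa with per-judge marginals) are assumptions made for illustration; the document being summarized may define some of these slightly differently.

```python
import math

def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and balanced F-measure (F1)."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def average_precision(ranking, relevant):
    """Mean of precision values at the rank of each relevant document.
    Relevant documents that are never retrieved contribute zero."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: average of per-query AP over (ranking, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r

def ndcg(ranking, gains, k):
    """NDCG@k with graded gains; assumes a log2(rank + 1) discount."""
    def dcg(values):
        return sum(g / math.log2(i + 1) for i, g in enumerate(values, start=1))
    actual = dcg([gains.get(doc, 0) for doc in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal else 0.0

def cohen_kappa(judge_a, judge_b):
    """Chance-corrected agreement between two judges' label lists.
    Assumes per-judge marginals (Cohen's variant)."""
    n = len(judge_a)
    p_agree = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    labels = set(judge_a) | set(judge_b)
    p_chance = sum((judge_a.count(l) / n) * (judge_b.count(l) / n) for l in labels)
    return (p_agree - p_chance) / (1 - p_chance) if p_chance != 1 else 1.0

# Toy data (hypothetical): one ranked result list and its relevance judgments.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
gains = {"d1": 3, "d3": 1, "d6": 2}  # graded relevance, used by NDCG only

print(precision_recall_f1(ranking, relevant))
print(average_precision(ranking, relevant))
print(r_precision(ranking, relevant))
print(ndcg(ranking, gains, k=5))
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

Precision, recall, and the F-measure ignore rank order, whereas average precision, R-precision, and NDCG reward placing relevant documents early. Kappa is different in kind: it takes two judges' labels as input, not a system's ranking.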