The hippocampus has become the focus of research in several neurodegenerative disorders. Automatic segmentation of this structure from magnetic resonance (MR) imaging scans of the brain facilitates this work. Segmentation techniques must be evaluated using a dataset of MR images with accurate, manually generated hippocampal outlines. Manual segmentation is not a trivial task. The lack of a single segmentation protocol and poor image quality are only two of the factors that have confounded the consistency required for comparative studies. We have developed a publicly available dataset of T1-weighted (T1W) MR images of epileptic and nonepileptic subjects, along with their hippocampal outlines, to provide a means of evaluating segmentation techniques. The dataset contains 50 T1W MR images, 40 from epileptic and 10 from nonepileptic subjects. All images were manually segmented following a widely used protocol. Twenty-five images were selected for training and are provided with hippocampal labels. The other 25 images are provided without labels for testing algorithms. Users can evaluate the labels they generate for the test images using 11 segmentation similarity metrics. Using this dataset, we evaluated two segmentation algorithms, Brain Parser and Classifier Fusion and Labeling (CFL), both trained on the training set. On the testing set, Brain Parser obtained an average Dice coefficient of 0.64 and CFL obtained 0.75. These findings indicate a need for further improvement of segmentation algorithms in order to enhance reliability.
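For readers unfamiliar with the Dice coefficient reported above, it measures the overlap between an automatic label A and a manual label B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical labels). The following is a minimal sketch only, not the evaluation code distributed with the dataset: the function name `dice_coefficient`, the flat 0/1 mask representation, the toy masks, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
from typing import Iterable


def dice_coefficient(predicted: Iterable[int], reference: Iterable[int]) -> float:
    """Dice overlap between two binary masks given as flat sequences of 0/1 voxels."""
    pred = [bool(v) for v in predicted]
    ref = [bool(v) for v in reference]
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")

    # Voxels labeled as hippocampus in both masks.
    overlap = sum(p and r for p, r in zip(pred, ref))
    # Sum of the two label sizes, |A| + |B|.
    total = sum(pred) + sum(ref)

    # Assumed convention: two empty masks are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Hypothetical toy masks, not taken from the dataset.
manual = [0, 1, 1, 1, 0, 0]
automatic = [0, 0, 1, 1, 1, 0]
score = dice_coefficient(automatic, manual)
print(f"Dice = {score:.2f}")
```

In practice a 3-D label volume would be flattened before being passed to such a function, and the per-image scores would be averaged over the testing set to give a summary figure of the kind quoted above.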