Mobile ocular biometrics in visible spectrum using local image descriptors: A preliminary study
Introduction. A recent comparative study has shown the superior performance of local features for face recognition in unconstrained environments. Due to the global integration of Speeded Up Robust Features (SURF), the authors claim that it is more robust to various image perturbations than the more locally operating SIFT descriptor. However, no detailed analysis of SURF-based face recognition has been presented so far. We provide a detailed analysis of the SURF descriptors for face recognition, and investigate whether rotation invariant descriptors are helpful for face recognition. The SURF descriptors are compared to SIFT descriptors, and different matching and viewpoint consistency constraints are benchmarked on the AR-Face and CMU-PIE databases. Additionally, a RANSAC-based outlier removal and system combination approach is presented.
Interest Point Based Feature Extraction. Interest points need to be found at different scales, where scale spaces are usually implemented as an image pyramid. The pyramid levels are obtained by Gaussian smoothing and sub-sampling. SIFT iteratively reduces the image size and applies a Difference of Gaussians (DoG) and Hessian detector, where the DoG is computed by subtracting adjacent pyramid layers. In SURF, by contrast, the scale space is analyzed by up-scaling the integral image based filter sizes in combination with a fast Hessian matrix based approach.
Grid-Based Feature Extraction. A main drawback of interest point based feature extraction is usually the large number of false positive detections. This drawback can be overcome by the use of hypothesis rejection methods, such as RANSAC. However, in face recognition an interest point detection based feature extraction often fails due to missing texture or poorly illuminated faces, so that only a few descriptors per face are extracted.
Instead of extracting descriptors around interest points only, local feature descriptors are extracted at regular image grid points, which gives a dense description of the image content.
Local Feature Descriptors. The SIFT descriptor is a 128-dimensional vector which stores the gradients of 4 × 4 locations around a pixel in a histogram of 8 main orientations. Conceptually similar to the SIFT descriptor, the 64-dimensional SURF descriptor also focuses on the spatial distribution of gradient information within the interest point neighborhood. The SURF descriptor is invariant to rotation, scale, brightness and, after reduction to unit length, contrast. In certain applications such as face recognition, rotation invariant descriptors can lead to false matching correspondences. The impact of using an upright version of the SURF and SIFT descriptors (i.e. USURF, USIFT) is therefore investigated.
Recognition by Matching. The matching is carried out by a nearest neighbor matching strategy. Additionally, a ratio constraint is applied: a matching pair is accepted only if the distance to the nearest neighbor descriptor is less than 0.5 times the distance to the second nearest neighbor descriptor. Finally, the classification is carried out by assigning the class of the nearest neighbor image, i.e. the image which achieves the highest number of matching correspondences to the test image. Different viewpoint consistency constraints can be considered during matching, accounting for different transformation and registration errors, and resulting in different matching time complexities:
• Maximum Matching: No viewpoint consistency constraints are considered during the matching, i.e. each keypoint in an image is compared to all keypoints in the target image.
• Grid-Based Matching: A regular grid is overlaid and descriptors are compared blockwise, so that outliers are removed by enforcing viewpoint consistency constraints.
• Grid-Based Best Matching: Similar to Grid-Based Matching, but overlapping blocks are additionally allowed.
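The matching strategies above can be illustrated with a minimal NumPy sketch. It is not the implementation evaluated here: the function names (`ratio_test_matches`, `grid_matches`, `classify`), the block size of 16 pixels, and the assignment of keypoints to blocks by integer division of their coordinates are assumptions made for illustration. Only the ratio threshold of 0.5 and the "highest number of matching correspondences" decision rule are taken from the text. Overlapping blocks (Grid-Based Best Matching) are not covered.

```python
import numpy as np


def ratio_test_matches(desc_a, desc_b, ratio=0.5):
    """Maximum matching: compare every descriptor in desc_a to all of desc_b.

    A pair (i, j) is kept only if the nearest neighbour distance is less
    than `ratio` times the second nearest neighbour distance.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    if len(desc_a) == 0 or len(desc_b) < 2:
        return []  # the ratio constraint needs a second nearest neighbour
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    rows = np.arange(len(desc_a))
    nearest, second = order[:, 0], order[:, 1]
    keep = dists[rows, nearest] < ratio * dists[rows, second]
    return [(int(i), int(nearest[i])) for i in rows[keep]]


def grid_matches(desc_a, pos_a, desc_b, pos_b, block=16, ratio=0.5):
    """Grid-based matching (assumed form): only descriptors whose keypoints
    fall into the same grid block are compared."""
    desc_a, desc_b = np.asarray(desc_a, float), np.asarray(desc_b, float)
    cell_a = np.asarray(pos_a) // block  # block index of each keypoint
    cell_b = np.asarray(pos_b) // block
    matches = []
    for cell in np.unique(cell_a, axis=0):
        idx_a = np.flatnonzero((cell_a == cell).all(axis=1))
        idx_b = np.flatnonzero((cell_b == cell).all(axis=1))
        for i, j in ratio_test_matches(desc_a[idx_a], desc_b[idx_b], ratio):
            matches.append((int(idx_a[i]), int(idx_b[j])))
    return matches


def classify(test_desc, gallery, ratio=0.5):
    """Return the label of the gallery image with the most correspondences.

    `gallery` is a list of (label, descriptors) pairs.
    """
    counts = [len(ratio_test_matches(test_desc, d, ratio)) for _, d in gallery]
    return gallery[int(np.argmax(counts))][0]
```

In this sketch the blockwise variant compares far fewer descriptor pairs than maximum matching and discards correspondences between spatially distant keypoints, at the cost of being sensitive to registration errors larger than the block size.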