Gaussian process classification (GPC) allows accurate and reliable detection of objects. However, the high computational load of squared-exponential (radial basis function) kernels limits the applications in which GPC can be used, since both memory requirements and computation time are limiting factors. We describe an accelerated implementation of GPC on a graphics processing unit (GPU). Because GPUs have limited memory, any GPC implementation must be memory-efficient as well as computationally efficient. Taking a high-performance pedestrian detector as a starting point, we use its packed or block-based feature descriptor and demonstrate a fast matrix-multiplication implementation of GPC that is also extremely memory-efficient. We demonstrate a speed-up of 3.7 times over a multicore, BLAS-optimised CPU implementation. Results show that the GPC detector is more accurate and reliable than a comparable support vector machine algorithm.
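As a rough illustration of why a radial basis function kernel can be cast as a matrix multiplication, the sketch below uses the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, so that the dominant cost is a single matrix product. This is not the implementation described here: it is a plain CPU sketch, the names `matmul_transposed` and `rbf_kernel` are hypothetical, and the `length_scale` parametrisation is an assumption. It also does not model the packed or block-based descriptor.

```python
import math


def matmul_transposed(a, b):
    """Return the product a * b^T for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def rbf_kernel(a, b, length_scale=1.0):
    """Kernel matrix K[i][j] = exp(-||a_i - b_j||^2 / (2 * length_scale^2)).

    The squared distances are expanded as
    ||x||^2 + ||y||^2 - 2 x.y, so the pairwise term is one matrix product.
    """
    sq_a = [sum(x * x for x in row) for row in a]  # squared row norms of a
    sq_b = [sum(y * y for y in row) for row in b]  # squared row norms of b
    cross = matmul_transposed(a, b)                # all dot products a_i . b_j
    scale = -0.5 / (length_scale ** 2)
    return [
        # Clamp at zero: rounding can make the expanded distance slightly negative.
        [math.exp(scale * max(sq_a[i] + sq_b[j] - 2.0 * cross[i][j], 0.0))
         for j in range(len(b))]
        for i in range(len(a))
    ]


if __name__ == "__main__":
    train = [[0.0, 0.0], [3.0, 4.0]]
    test = [[0.0, 0.0]]
    print(rbf_kernel(train, test, length_scale=5.0))
```

In an accelerated setting the role of `matmul_transposed` would typically be played by an optimised matrix-multiply routine, which is where most of the arithmetic is concentrated.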