In this paper, we introduce a new approach to finger key point detection. For RGB images captured from an egocentric view with a mobile camera, fingertip detection remains a challenging problem due to factors such as background complexity, illumination variation, hand shape diversity, and image blur caused by camera movement. To address these issues, we propose a bi-level cascade convolutional neural network (CNN). The first-level CNN generates a bounding box of the hand region, filtering out a large proportion of the complicated background. Taking the bounding-box area as input, the second-level CNN, which includes an extra branch, returns an accurate fingertip location using a multi-channel dataset. Our approach is the first attempt at finger key point detection from an egocentric view with a mobile camera. The proposed method achieves significantly better results than previous fingertip detection methods based on handcrafted features.
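The abstract does not describe the internals of either network, so the sketch below covers only the data flow of a two-level cascade. Level one proposes a hand box. Level two sees only the cropped box, and its prediction is mapped back to full-image coordinates. Several details are assumptions for illustration and are not taken from the paper: the names `hand_detector` and `fingertip_regressor` are hypothetical stand-ins for the two CNNs, the image is a plain row-major list of rows, boxes are `(x0, y0, x1, y1)` with exclusive upper bounds, and the second level is assumed to output coordinates normalized to `[0, 1]` within the crop.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]          # H x W grayscale image, row-major (assumption)
Box = Tuple[int, int, int, int]    # (x0, y0, x1, y1), x1/y1 exclusive (assumption)
Point = Tuple[float, float]        # (x, y) in pixel coordinates


def clamp_box(box: Box, width: int, height: int) -> Box:
    """Clip a proposed box to the image bounds so the crop is always valid."""
    x0, y0, x1, y1 = box
    return (max(0, x0), max(0, y0), min(width, x1), min(height, y1))


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by box (rows y0..y1-1, cols x0..x1-1)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def cascade_detect(
    image: Image,
    hand_detector: Callable[[Image], Box],                 # hypothetical level-1 model
    fingertip_regressor: Callable[[Image], Point],         # hypothetical level-2 model
) -> Point:
    """Two-level cascade: detect the hand box, then localize the fingertip inside it."""
    height, width = len(image), len(image[0])
    # Level 1: propose a hand bounding box, discarding most of the background.
    box = clamp_box(hand_detector(image), width, height)
    # Level 2: run the fingertip model on the hand region only.
    patch = crop(image, box)
    u, v = fingertip_regressor(patch)  # assumed normalized to [0, 1] within the patch
    # Map the patch-relative prediction back to full-image pixel coordinates.
    x0, y0, x1, y1 = box
    return (x0 + u * (x1 - x0), y0 + v * (y1 - y0))


if __name__ == "__main__":
    img = [[0.0] * 10 for _ in range(10)]
    tip = cascade_detect(img, lambda im: (2, 4, 6, 8), lambda p: (0.5, 0.25))
    print(tip)  # (4.0, 5.0)
```

One consequence of this design is that the second level only ever sees the hand crop. The coordinate mapping in the last step is therefore what ties its output back to the original frame, and clamping keeps a box that spills past the image edge from producing an empty or misaligned crop.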