Our previous findings suggest that audio-visual synchrony perception is based on the matching of salient temporal features selected from each sensory modality through bottom-up segregation or by top-down attention to a specific spatial position. This study examined whether top-down attention to a specific feature value is also effective in selecting the features used for cross-modal matching. In the first experiment, the visual stimulus was a pulse train in which a flash randomly appeared with a probability of 6.25, 12.5 or 25% in each 6.25-ms interval. Four flash colors randomly appeared with equal probability, and one of them was selected as the target color on each trial. The paired auditory stimulus was a single-pitch pip sequence that had the same temporal structure as the target-color flashes, presented either in synchrony with the target flashes (synchronous stimulus) or with a 250-ms relative shift (asynchronous stimulus). The participants' task was synchrony-asynchrony discrimination, with the target color either indicated to the participant by a probe (with-probe condition) or not indicated (without-probe condition). In another control condition, there was no correlation between color and auditory signals (color-shuffled condition). In the second experiment, the roles of the visual and auditory stimuli were exchanged. Synchrony-asynchrony discrimination performance was worst in the color/pitch-shuffled condition and best in the with-probe condition, in which the observer knew beforehand which color/pitch should be matched with the signal of the other modality. This suggests that top-down, feature-based attention can aid feature selection for audio-visual synchrony discrimination even when bottom-up segmentation processes cannot uniquely determine the salient features. The observed feature-based selection, however, is not as effective as position-based selection.
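The first-experiment stimulus described above can be illustrated with a minimal sketch. This is not the authors' stimulus code. The function name `make_trial`, the trial length, the reading of the flash probability as a per-interval probability that any flash occurs (with the color then drawn uniformly), the direction of the 250-ms shift (audio delayed), and the wrap-around at the sequence ends are all assumptions made only for illustration.

```python
import random

FRAME_MS = 6.25   # interval duration given in the text
N_COLORS = 4      # four flash colors with equal probability
SHIFT_MS = 250.0  # relative shift of the asynchronous stimulus


def make_trial(n_frames, flash_prob, target_color, synchronous, rng):
    """Return (visual, audio), two lists of length n_frames.

    visual[i] is None (no flash) or a color index in 0..N_COLORS-1.
    audio[i] is True where a pip is played.
    """
    # Assumption: flash_prob is the chance of any flash in an interval;
    # the flash color is then drawn uniformly from the four colors.
    visual = [
        rng.randrange(N_COLORS) if rng.random() < flash_prob else None
        for _ in range(n_frames)
    ]
    # The pip sequence copies the temporal structure of the target-color flashes.
    audio = [color == target_color for color in visual]
    if not synchronous:
        # Assumption: audio is delayed by 250 ms (40 intervals) and the
        # sequence wraps around, so the number of pips is preserved.
        k = int(round(SHIFT_MS / FRAME_MS)) % n_frames
        audio = audio[-k:] + audio[:-k]
    return visual, audio


if __name__ == "__main__":
    rng = random.Random(1)
    visual, audio = make_trial(800, 0.25, target_color=2, synchronous=False, rng=rng)
    print(sum(c is not None for c in visual), "flashes,", sum(audio), "pips")
```

In this sketch, a color-shuffled control could be approximated by generating the pip sequence from an independent pulse train rather than from any one color, although the abstract does not specify how the shuffling was implemented.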