In this paper, we propose an approach that fuses information from a network of visual sensors to analyze human social behavior. A discriminative interaction classifier is trained on the relative head orientation and distance between a pair of people. Specifically, we explore human interaction detection at different fusion levels, covering both feature fusion and decision fusion. While feature fusion mitigates local errors and improves feature accuracy, decision fusion at higher levels significantly reduces the amount of information to be shared among cameras. Experimental results show that our proposed method achieves promising performance on a challenging dataset. By distributing the computation over multiple smart cameras, our approach is not only robust but also scalable.
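
To make the two fusion levels concrete, the following minimal Python sketch contrasts them for one pair of people. It is an illustration under stated assumptions, not the method evaluated in this paper: the function names, the ground-plane (x, y, yaw) representation, the averaging and majority-vote rules, and the fixed thresholds in `is_interacting` are all hypothetical. In particular, the threshold rule is only a placeholder for the trained discriminative classifier.

```python
import math


def wrap_abs(angle):
    """Absolute value of an angle after wrapping it into (-pi, pi]."""
    return abs(math.atan2(math.sin(angle), math.cos(angle)))


def pair_features(pos_a, yaw_a, pos_b, yaw_b):
    """Return (distance, dev_a, dev_b) for a pair of people.

    dev_a is how far A's head yaw deviates from the direction toward B,
    and dev_b is the same for B toward A (radians, in [0, pi]).
    """
    dx, dy = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    bearing_ab = math.atan2(dy, dx)       # direction from A to B
    bearing_ba = bearing_ab + math.pi     # opposite direction, from B to A
    return (math.hypot(dx, dy),
            wrap_abs(yaw_a - bearing_ab),
            wrap_abs(yaw_b - bearing_ba))


def fuse_estimates(estimates):
    """Feature-level fusion (assumed rule): average per-camera (x, y, yaw).

    Yaw is averaged on the unit circle to avoid wrap-around errors.
    """
    n = len(estimates)
    x = sum(e[0] for e in estimates) / n
    y = sum(e[1] for e in estimates) / n
    yaw = math.atan2(sum(math.sin(e[2]) for e in estimates),
                     sum(math.cos(e[2]) for e in estimates))
    return (x, y), yaw


def is_interacting(features, max_dist=2.0, max_dev=math.pi / 4):
    """Hypothetical placeholder for the trained classifier: a threshold rule."""
    dist, dev_a, dev_b = features
    return dist <= max_dist and dev_a <= max_dev and dev_b <= max_dev


def fuse_decisions(votes):
    """Decision-level fusion (assumed rule): strict majority of camera votes."""
    return 2 * sum(votes) > len(votes)


# Feature fusion: cameras share (x, y, yaw) estimates; classify once.
pos_a, yaw_a = fuse_estimates([(0.0, 0.0, 0.1), (0.2, 0.0, -0.1)])
pos_b, yaw_b = fuse_estimates([(1.0, 0.0, math.pi), (1.2, 0.0, math.pi)])
fused_label = is_interacting(pair_features(pos_a, yaw_a, pos_b, yaw_b))

# Decision fusion: each camera classifies locally and shares one bit.
voted_label = fuse_decisions([True, True, False])
```

In this sketch, feature fusion requires every camera to transmit its pose estimates for each person, whereas decision fusion transmits a single boolean per camera and pair, which is the bandwidth trade-off described above.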