By combining multiple social media datasets, it is possible to gain insight into each dataset that goes beyond what could be obtained with either individually. In this paper we combine user-centric data from Twitter with video-centric data from YouTube to build a rich picture of who watches and shares what on YouTube. We study 87K Twitter users, 5.6 million YouTube videos and 15 million video sharing events from user-, video- and sharing-event-centric perspectives. We show that features of Twitter users correlate with YouTube features and sharing-related features. For example, urban users are quicker to share than rural users. We find a superlinear relationship between initial Twitter shares and the final amounts of views. We discover that Twitter activity metrics play more role in video popularity than mere amount of followers. We also reveal the existence of correlated behavior concerning the time between video creation and sharing within certain timescales, showing the time onset for a coherent response, and the time limit after which collective responses are extremely unlikely. Response times depend on the category of the video, suggesting Twitter video sharing is highly dependent on the video content. To the best of our knowledge, this is the first large-scale study combining YouTube and Twitter data, and it reveals novel, detailed insights into who watches (and shares) what on YouTube, and when.