Distilling Task Knowledge from How-To Communities
Despite the success of large knowledge bases, one kind of knowledge has so far received little attention: knowledge about human activities. An example of such an activity is proposing marriage to someone. Knowing that this activity involves two adults (often, but not necessarily, a woman and a man), that it often takes place in a romantic location, that it typically involves flowers or jewelry, and that it is usually followed by kissing is a valuable asset for computers performing tasks such as natural language dialog, scene understanding, or video search. Acquiring this knowledge amounts to the challenging task of building semantic frames that capture human activities, their participating agents, and their typical spatio-temporal contexts. This paper presents a novel approach that taps into movie scripts and other narrative texts. We develop a pipeline for semantic parsing and knowledge distillation that systematically compiles semantically refined activity frames. The resulting knowledge base contains hundreds of thousands of activity frames, mined from about two million scenes of movies, TV series, and novels. A manual assessment study, with extensive sampling and statistical significance tests, shows that the frames and their attribute values have an accuracy of at least 80 percent. We also demonstrate the usefulness of activity knowledge through an extrinsic use case: movie scene search.