Data Set Used
PAISÀ is a Creative Commons licensed, large web corpus of contemporary Italian. We describe the design, harvesting, and processing steps involved in its creation.
This work introduces SYMPAThy, a data representation model in which the combinatorial properties of a lexical item are described by merging surface and deeper linguistic information. The proposed approach is then evaluated by comparing, for a sample list of verbal idioms, a set of SYMPAThy-based fixedness indexes against the relevant speaker-elicited…
An established method for MWE extraction is the combined use of previously identified POS-patterns and association measures. However, the selection of such POS-patterns is rarely debated. Focusing on Italian MWEs containing at least one adjective, we set out to explore how candidate POS-patterns listed in relevant literature and lexicographic sources…
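The extraction strategy named in the last abstract works in two steps. First, candidates are filtered by POS-pattern. Second, they are ranked with an association measure. The following is a minimal sketch of that strategy, not the method of the cited work. The toy tagged corpus, the two patterns (NOUN ADJ, ADJ NOUN), the Universal POS tags, and the choice of pointwise mutual information (PMI) as the association measure are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy corpus: sentences as (token, Universal POS tag) pairs.
TAGGED_CORPUS = [
    [("il", "DET"), ("mercato", "NOUN"), ("nero", "ADJ"), ("cresce", "VERB")],
    [("un", "DET"), ("mercato", "NOUN"), ("nero", "ADJ"), ("ampio", "ADJ")],
    [("il", "DET"), ("gatto", "NOUN"), ("nero", "ADJ"), ("dorme", "VERB")],
    [("il", "DET"), ("mercato", "NOUN"), ("locale", "ADJ"), ("apre", "VERB")],
]

# Hypothetical candidate POS-patterns for two-word MWEs with an adjective.
PATTERNS = {("NOUN", "ADJ"), ("ADJ", "NOUN")}


def extract_candidates(corpus, patterns):
    """Count bigrams whose tag sequence matches a POS-pattern, along with
    the unigram counts and total bigram count that PMI needs."""
    unigrams = Counter()
    candidates = Counter()
    n_bigrams = 0
    for sentence in corpus:
        words = [w for w, _ in sentence]
        tags = [t for _, t in sentence]
        unigrams.update(words)
        for i in range(len(sentence) - 1):
            n_bigrams += 1
            if (tags[i], tags[i + 1]) in patterns:
                candidates[(words[i], words[i + 1])] += 1
    return candidates, unigrams, n_bigrams


def pmi(pair_count, c1, c2, n_unigrams, n_bigrams):
    """Pointwise mutual information: log2 of P(w1, w2) / (P(w1) * P(w2))."""
    p_pair = pair_count / n_bigrams
    return math.log2(p_pair / ((c1 / n_unigrams) * (c2 / n_unigrams)))


def rank_candidates(corpus, patterns, min_freq=1):
    """Return (pair, PMI) tuples for pattern-matching bigrams seen at least
    min_freq times, sorted from highest to lowest PMI."""
    candidates, unigrams, n_bigrams = extract_candidates(corpus, patterns)
    n_unigrams = sum(unigrams.values())
    scored = [
        (pair, pmi(count, unigrams[pair[0]], unigrams[pair[1]],
                   n_unigrams, n_bigrams))
        for pair, count in candidates.items()
        if count >= min_freq
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (w1, w2), score in rank_candidates(TAGGED_CORPUS, PATTERNS):
        print(f"{w1} {w2}\t{score:.3f}")
```

On this toy data, the single-occurrence pairs *gatto nero* and *mercato locale* outscore the repeated *mercato nero*. This reflects PMI's known bias toward low-frequency pairs. A minimum-frequency threshold such as `min_freq=2` is one common way to counter it.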