Ashwini Vaidya

Learn More
In this paper, we have addressed the task of PropBank annotation of light verb constructions, which like multi-word expressions pose special problems. To arrive at a solution, we have evaluated 3 different possible methods of annotation. The final method involves three passes: (1) manual identification of a light verb construction, (2) annotation based on(More)
This paper, submitted as an entry for the NERSSEAL-2008 shared task, describes a system build for Named Entity Recognition for South and South East Asian Languages. Our paper combines machine learning techniques with language specific heuris-tics to model the problem of NER for In-dian languages. The system has been tested on five languages: Telugu, Hindi,(More)
We are in the process of creating a multi-representational and multi-layered treebank for Hindi/Urdu (Palmer et al., 2009), which has three main layers: dependency structure, predicate-argument structure (PropBank), and phrase structure. This paper discusses an important issue in treebank design which is often neglected: the use of empty categories (ECs).(More)
This paper makes two contributions. First, we describe the Hindi Proposition Bank that contains annotations of predicate argument structures of verb predicates. Unlike PropBanks in most other languages, the Hind PropBank is annotated on top of dependency structure, the Hindi Dependency Treebank. We explore the similarities between dependency and predicate(More)
The linguistic annotation of noun-verb complex predicates (also termed as light verb constructions) is challenging as these predicates are highly productive in Hindi. For semantic role labelling, each argument of the noun-verb complex predicate must be given a role label. For complex predicates, frame files need to be created specifying the role labels for(More)
This paper examines both linguistic behavior and practical implications of empty argument insertion in the Hindi PropBank. The Hindi PropBank is annotated on the Hindi Dependency Treebank, which contains some empty categories but rarely the empty arguments of verbs. In this paper, we analyze four kinds of empty arguments, * PRO * , * REL * , * GAP * , * pro(More)
Complex predicates (CPs) are a highly productive predicational phenomenon in Hindi and Urdu and present a challenge for deep syntactic parsing. For CPs, a combination of a noun and light verb express a single event. The combinatorial preferences of nouns with one (or more) light verb is useful for predicting an instance of a CP. In this paper, we present a(More)
Hindi and Urdu are two standardized registers of what has been called the Hindus-tani language, which belongs to the Indo-Aryan language family. Although, both the varieties share a common grammar, they differ significantly in their vocabulary to an extent where both become mutually incomprehensible (Masica, 1993). Hindi draws its vocabulary from Sanskrit(More)
Converting a treebank from one representation type to another poses several challenges [4] [3]. These challenges are contingent on (amongst other things) the information encoded in source representation and the information required in target representation. In this paper, we propose a conversion algorithm that converts the Hindi-Urdu Dependency Treebank(More)