In this paper, we consider sentence simplification as a special form of translation with the complex sentence as the source and the simple sentence as the target. We propose a Tree-based Simplification Model (TSM), which, to our knowledge, is the first statistical simplification model covering splitting, dropping, reordering and substitution integrally. We …
Twitter messages are a potentially rich source of continuously and instantly updated information. The shortness and informality of such messages pose challenges for Natural Language Processing tasks. In this paper we present a hybrid approach for Named Entity Extraction (NEE) and Classification (NEC) for tweets. The system uses the power of the Conditional …
Dedicated to the soul of my father. Acknowledgments. "I can do all things through Christ who strengthens me." (Philippians 4:13) I always say that I am lucky. I am lucky because I always get wonderful and kind people surrounding me. I am lucky to have Peter Apers as my promoter. He supported my research directions and gave me freedom and independence. His …
The quality of user-generated content in Web 2.0 varies dramatically, from professional to abusive. Quality assessment is therefore a critical problem in producing, managing and retrieving information in Web 2.0. In this paper, we develop a multi-dimensional model for assessing the quality of answers in social Q&A (Question & Answer) sites.
Over the past decade, community structure, a statistical property of networked systems such as social networks and the World Wide Web, has attracted considerable attention in the data mining field because it enables the description and prediction of complex networks. Many highly sensitive graph clustering algorithms were developed for the identification of communities …
Sequence labeling has wide applications in natural language processing and speech processing. Popular sequence labeling models suffer from some known problems. Hidden Markov models (HMMs) are generative models and they cannot encode transition features; Conditional Markov models (CMMs) suffer from the label bias problem; and training of conditional random …
The dispersive characteristics of higher-order-mode Lamb waves (HOMLW) excited by interdigital transducers (IDT), which are needed for designing micro-sensors in the ultrahigh frequency (UHF) range, are measured and analyzed. A measurement system is set up, in which the dispersive characteristics of HOMLW are obtained by the method of transform between frequency and …
Structured prediction has wide applications in many areas. Powerful and popular models for structured prediction have been developed. Despite their successes, they suffer from some known problems: (i) Hidden Markov models are generative models which suffer from the mismatch problem. It is also difficult to incorporate overlapping, non-independent features …
The phenomenon of image distortions caused by the multiple scattering (MS) effects of encapsulated microbubbles in ultrasonic imaging was experimentally found in previous studies (Soetanto and Chan 2000a), but its mechanism has not been fully understood. To study the MS effects of microbubbles in contrast imaging, two approaches are employed in this …
The standard training method of Conditional Random Fields (CRFs) is very slow for large-scale applications. As an alternative, piecewise training divides the full graph into pieces, trains them independently, and combines the learned weights at test time. In this paper, we present separate training for undirected models based on the novel Cooccurrence Rate …
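The piecewise scheme described in the last abstract (split the graph into pieces, fit each piece on its own, combine the weights at test time) can be illustrated with a toy linear-chain tagger. This is only a sketch under loud assumptions: it is not the Cooccurrence Rate method of the paper, the function names `train_pieces` and `decode` are hypothetical, and each piece uses a plain table factor whose locally normalised fit is taken to be an add-one-smoothed relative frequency instead of a feature-based model.

```python
import math
from collections import Counter


def train_pieces(sequences, smoothing=1.0):
    """Fit node pieces and edge pieces independently (hypothetical helper).

    `sequences` is a list of [(word, label), ...] lists. Each piece is a
    table factor, so its locally normalised estimate is a smoothed
    relative frequency; no global normalisation is ever computed.
    """
    labels = sorted({y for seq in sequences for _, y in seq})
    words = sorted({x for seq in sequences for x, _ in seq})
    node_counts = Counter((x, y) for seq in sequences for x, y in seq)
    edge_counts = Counter(
        (a[1], b[1]) for seq in sequences for a, b in zip(seq, seq[1:])
    )

    # Node piece: log p(label | word), normalised per word.
    node_w = {}
    for x in words:
        total = sum(node_counts[(x, y)] for y in labels) + smoothing * len(labels)
        for y in labels:
            node_w[(x, y)] = math.log((node_counts[(x, y)] + smoothing) / total)

    # Edge piece: log p(label, next_label), normalised over all label pairs.
    total = sum(edge_counts.values()) + smoothing * len(labels) ** 2
    edge_w = {
        (a, b): math.log((edge_counts[(a, b)] + smoothing) / total)
        for a in labels
        for b in labels
    }
    return labels, node_w, edge_w


def decode(words, labels, node_w, edge_w):
    """Combine the separately trained weights and run Viterbi decoding."""
    if not words:
        return []
    unk = -math.log(len(labels))  # assumption: uniform score for unseen words
    score = {y: node_w.get((words[0], y), unk) for y in labels}
    back = []
    for x in words[1:]:
        new_score, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: score[p] + edge_w[(p, y)])
            new_score[y] = score[prev] + edge_w[(prev, y)] + node_w.get((x, y), unk)
            pointers[y] = prev
        score = new_score
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: score[y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    train = [
        [("the", "D"), ("dog", "N"), ("barks", "V")],
        [("a", "D"), ("cat", "N"), ("sleeps", "V")],
    ]
    labels, node_w, edge_w = train_pieces(train)
    print(decode(["the", "cat", "barks"], labels, node_w, edge_w))
```

Because every piece is normalised locally, training here reduces to counting, which is the speed advantage the abstract alludes to; the cost is that evidence shared between pieces can be counted more than once when the weights are summed at test time.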