Bayu Distiawan Trisedya

The popularity of user-generated content, such as that found on Twitter, has made it a rich source for sentiment analysis and opinion mining tasks. This paper presents our study on automatically building a training corpus for sentiment analysis of Indonesian tweets. We start with a seed sentiment corpus and subsequently expand it using a classifier…
Stock price prediction is a difficult task, since it depends heavily on the demand for the stock, and no single variable can precisely predict the demand for a stock each day. However, the Efficient Market Hypothesis (EMH) states that stock prices also depend significantly on new information. One of many information sources is people's opinion in…
This paper describes the development of an Indonesian speech recognition web service which complies with two standards: it operates on the Language Grid, ensuring process interoperability, and its output uses the LAF/GrAF format, ensuring data interoperability. It is part of a larger system, currently in development, that aims to collect speech…
This paper describes efforts to develop an online repository of Indonesian corpora, along with its associated functions and services, that has been designed to support a wide variety of use cases and applications. Two design considerations are ensuring the sustainability and accessibility of the corpora, and enabling open enrichment through annotation. The presented…
In this paper we describe our submission to the TREC 2011 Microblog Track. Our run combines different methods, namely a customized scoring function, query reformulation, and query expansion. We apply query expansion from the dataset with different weighting schemes. Furthermore, we conduct an initial experiment to incorporate the timestamp of the tweet document in order to…
Parallel corpora are necessary for multilingual research, especially in information retrieval (IR) and natural language processing (NLP). However, such corpora are hard to find, particularly for low-resource languages such as ethnic languages. Parallel corpora of ethnic languages have usually been collected manually. On the other hand, Wikipedia as a free online…
This paper describes the development of an Indonesian NER system using online data such as Wikipedia and DBPedia. The system is based on the Stanford NER system [8] and utilizes training documents constructed automatically from Wikipedia. Each entity in the Wikipedia documents, i.e. a word or phrase that has a hyperlink, is tagged according to…