Cost-sensitive label embedding for multi-label classification
The proposed algorithm, cost-sensitive label embedding with multidimensional scaling (CLEMS), approximates the cost information with the distances between the embedded vectors by using the classic multidimensional scaling approach for manifold learning.
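To illustrate the idea of approximating costs by embedded distances, here is a minimal sketch of classical multidimensional scaling in NumPy. It is not the CLEMS algorithm itself: the function name `classical_mds`, the toy label vectors, the choice of Hamming cost, and the use of its square root as the dissimilarity are all assumptions made for this example.

```python
import numpy as np

def classical_mds(dissim, k):
    """Embed n items in k dimensions so that Euclidean distances
    approximate the given symmetric dissimilarity matrix."""
    n = dissim.shape[0]
    # Double-center the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dissim ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the k largest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:k]
    top_vals = np.clip(eigvals[order], 0.0, None)  # drop negative noise
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical toy data: four label vectors over three labels.
labels = np.array([[0, 0, 0],
                   [1, 0, 0],
                   [1, 1, 0],
                   [1, 1, 1]])

# Assumed cost: number of differing labels (Hamming cost).
cost = (labels[:, None, :] != labels[None, :, :]).sum(axis=2).astype(float)

# Assumption for this sketch: use sqrt(cost) as the target distance.
embedded = classical_mds(np.sqrt(cost), k=3)
embedded_dist = np.linalg.norm(
    embedded[:, None, :] - embedded[None, :, :], axis=2
)
```

For binary label vectors, the square root of the Hamming cost equals their Euclidean distance, so in this toy case the squared embedded distances reproduce the cost matrix up to floating-point error. For general cost functions the embedding is only an approximation.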
Examining Gender Bias in Languages with Grammatical Gender
- Pei Zhou, Weijia Shi, Kai-Wei Chang
- Linguistics, Computer Science
- Conference on Empirical Methods in Natural…
- 1 September 2019
Experiments on a modified Word Embedding Association Test and on word similarity, word translation, and word pair translation tasks show that the proposed approaches can effectively reduce gender bias while preserving the utility of the original embeddings.
Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs
- Kuan-Hao Huang, Kai-Wei Chang
- Computer Science
- Conference of the European Chapter of the…
- 26 January 2021
This paper proposes the Syntactically controlled Paraphrase Generator (SynPG), an encoder-decoder model that learns to disentangle the semantics and the syntax of a sentence from a collection of unannotated texts. SynPG achieves better syntactic control than unsupervised baselines, while the quality of the generated paraphrases remains competitive.
DEGREE: A Data-Efficient Generation-Based Event Extraction Model
- I-Hung Hsu, Kuan-Hao Huang, Nanyun Peng
- Computer Science
- North American Chapter of the Association for…
- 29 August 2021
This paper proposes DEGREE, a data-efficient model that formulates event extraction as a conditional generation problem and learns triggers and arguments jointly in an end-to-end manner, which encourages the model to better utilize the shared knowledge and dependencies among them.
DEGREE: A Data-Efficient Generative Event Extraction Model
This work presents a data-efficient event extraction (EE) method that formulates event extraction as a natural language generation problem. The method outperforms strong baselines on EE tasks in the low-data regime and achieves results competitive with the current state of the art when more data becomes available.
Generating Sports News from Live Commentary: A Chinese Dataset for Sports Game Summarization
A two-step summarization model, consisting of a selector and a rewriter, is presented for SportsSum, a Chinese sports game summarization dataset that contains live commentaries and the corresponding news articles for 5,428 soccer games.
DeepAL: Deep Active Learning in Python
- Kuan-Hao Huang
- Computer Science
- 30 November 2021
DeepAL provides a simple and unified framework based on PyTorch that allows users to easily load custom datasets, build custom data handlers, and design custom strategies without much modification of the code.
Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction
- Kuan-Hao Huang, I-Hung Hsu, P. Natarajan, Kai-Wei Chang, Nanyun Peng
- Computer Science, Linguistics
- Annual Meeting of the Association for…
- 15 March 2022
By formulating event argument extraction (EAE) as a language generation task, the method effectively encodes event structures and captures the dependencies between arguments. The authors design language-agnostic templates to represent the event argument structures; these templates are compatible with any language and hence facilitate cross-lingual transfer.
Combination of feature engineering and ranking models for paper-author identification in KDD Cup 2013
The winning solution of team National Taiwan University for track 1 of KDD Cup 2013, which discriminates papers confirmed by the given authors from the other, deleted papers, achieves a MAP score of 0.98259 and ranks first on the private leaderboard of the test set.
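For readers unfamiliar with the MAP metric, the following is a minimal sketch of the standard mean average precision computation over ranked lists with binary relevance. The exact evaluation protocol of KDD Cup 2013 is not given here, so the function names and the toy rankings are illustrative assumptions only.

```python
def average_precision(ranked_relevance):
    """Average precision for one ranked list of 0/1 relevance flags."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of the per-list average precisions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Hypothetical example: two authors, each with a ranked list of papers
# (1 = confirmed paper, 0 = deleted paper).
toy_rankings = [[1, 0, 1], [0, 1]]
toy_map = mean_average_precision(toy_rankings)
```

In the toy example the first list has average precision (1/1 + 2/3) / 2 = 5/6 and the second has 1/2, so `toy_map` is 2/3.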
A Comparative Survey of Deep Active Learning
- Xueying Zhan, Qingzhong Wang, Kuan-Hao Huang, H. Xiong, D. Dou, Antoni B. Chan
- Computer Science
- ArXiv
- 25 March 2022
A deep active learning (DAL) toolkit, DeepAL+, is constructed by re-implementing many highly cited DAL-related methods, and it will be released to the public.