VisualBERT: A Simple and Performant Baseline for Vision and Language
- Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang
- Computer Science · ArXiv
- 9 August 2019
Analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.
Grounded Language-Image Pre-training
- Liunian Harold Li, Pengchuan Zhang, Jianfeng Gao
- Computer Science · Computer Vision and Pattern Recognition
- 7 December 2021
A grounded language-image pretraining model for learning object-level, language-aware, and semantic-rich visual representations that unifies object detection and phrase grounding for pre-training and can leverage massive image-text pairs by generating grounding boxes in a self-training fashion.
How Much Can CLIP Benefit Vision-and-Language Tasks?
- Sheng Shen, Liunian Harold Li, K. Keutzer
- Computer Science · International Conference on Learning…
- 13 July 2021
It is shown that CLIP significantly outperforms widely-used visual encoders trained with in-domain annotated data, such as BottomUp-TopDown, and also establishes new state-of-the-art results on Visual Question Answering, Visual Entailment, and V&L Navigation tasks.
What Does BERT with Vision Look At?
- Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang
- Computer Science · Annual Meeting of the Association for…
- 1 July 2020
It is demonstrated that certain attention heads of a visually grounded language model actively ground elements of language to image regions, performing the task known as entity grounding.
On the Paradox of Learning to Reason from Data
- Honghua Zhang, Liunian Harold Li, Tao Meng, Kai-Wei Chang, Guy Van den Broeck
- Computer Science · ArXiv
- 23 May 2022
This study provides an explanation for this paradox: instead of learning to emulate the correct reasoning function, BERT has in fact learned statistical features that inherently exist in logical reasoning problems.
Point Precisely: Towards Ensuring the Precision of Data in Generated Texts Using Delayed Copy Mechanism
- Liunian Harold Li, Xiaojun Wan
- Computer Science · International Conference on Computational…
- 1 August 2018
This paper proposes a two-stage approach with a delayed copy mechanism to improve the precision of data records in generated texts, and verifies that the proposed approach generates better templates and copies data records more precisely.
Efficient Contextual Representation Learning With Continuous Outputs
- Liunian Harold Li, Patrick H. Chen, Cho-Jui Hsieh, Kai-Wei Chang
- Computer Science · Transactions of the Association for Computational…
- 1 September 2019
This work revisits the design of the output layer and considers directly predicting the pre-trained embedding of the target word for a given context, achieving a 4-fold speedup and eliminating 80% of trainable parameters while maintaining competitive performance on downstream tasks.
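The output-layer change described above can be illustrated with a minimal sketch: instead of a softmax over the full vocabulary, the model regresses the context vector toward the target word's frozen pre-trained embedding. This is an illustrative assumption, not the paper's implementation; the `pretrained` matrix, dimensions, and the cosine-distance loss are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 64

# Hypothetical frozen pre-trained word embeddings, one unit-norm row per word.
pretrained = rng.normal(size=(vocab_size, dim))
pretrained /= np.linalg.norm(pretrained, axis=1, keepdims=True)

def softmax_loss(context_vec, target_id):
    """Standard output layer: score every vocabulary word, O(|V|*d) per token."""
    logits = pretrained @ context_vec
    logits -= logits.max()  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[target_id]

def continuous_output_loss(context_vec, target_id):
    """Continuous-output objective: regress toward the target's pre-trained
    embedding with a cosine distance, O(d) per token -- no vocabulary softmax
    and no trainable output-embedding matrix."""
    c = context_vec / np.linalg.norm(context_vec)
    return 1.0 - float(c @ pretrained[target_id])

ctx = rng.normal(size=dim)
print(softmax_loss(ctx, 42), continuous_output_loss(ctx, 42))
```

The speedup in the abstract comes from the per-token cost dropping from O(|V|·d) to O(d), and the parameter reduction from freezing the output embeddings rather than training them.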
Weakly-supervised VisualBERT: Pre-training without Parallel Images and Captions
- Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, Kai-Wei Chang
- Computer Science · ArXiv
- 24 October 2020
This work proposes Weakly-supervised VisualBERT with the key idea of conducting "mask-and-predict" pre-training on language-only and image-only corpora, and introduces the object tags detected by an object recognition model as anchor points to bridge two modalities.
Efficient Contextual Representation Learning Without Softmax Layer
- Liunian Harold Li, Patrick H. Chen, Cho-Jui Hsieh, Kai-Wei Chang
- Computer Science · ArXiv
- 28 February 2019
This work redesigns the learning objective and proposes an efficient framework for training contextual representation models that bypasses the softmax layer by performing language modeling with dimension reduction, and allows the models to leverage pre-trained word embeddings.
Leveraging Diverse Lexical Chains to Construct Essays for Chinese College Entrance Examination
- Liunian Harold Li, Xiaojun Wan, Jin-ge Yao, Siming Yan
- Education · International Joint Conference on Natural…
- 1 November 2017
This work explores a sentence extraction framework based on diversified lexical chains to capture coherence and richness, and reveals the importance of information richness in essay writing.
...