• Corpus ID: 235727690

Transformer-F: A Transformer network with effective methods for learning universal sentence representation

  • Yu Shi
  • Published 2 July 2021
  • Computer Science
  • ArXiv
The Transformer model is widely used in natural language processing for sentence representation. However, previous Transformer-based models attend to function words, which carry limited meaning in most cases, and can only extract high-level semantic abstraction features. In this paper, two approaches are introduced to improve the performance of Transformers. We calculate the attention score by multiplying the part-of-speech weight vector with the correlation coefficient, which helps extract… 
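The abstract only sketches the method, so as a rough illustration here is one plausible reading of part-of-speech-weighted attention. Everything below is an assumption, not the paper's actual formulation: the function name `pos_weighted_attention`, the tensor shapes, and the choice to scale the score matrix by per-token POS weights before the softmax are all hypothetical.

```python
import numpy as np

def pos_weighted_attention(Q, K, V, pos_weights):
    """Scaled dot-product attention with a per-token part-of-speech weight
    applied to the score matrix (a sketch, not the paper's exact method).

    Q, K, V      : (seq_len, d_k) query/key/value matrices.
    pos_weights  : (seq_len,) one weight per key token, e.g. larger for
                   content words (nouns, verbs) and smaller for function
                   words, so attention is steered away from the latter.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (seq_len, seq_len) correlations
    scores = scores * pos_weights[None, :]    # re-weight each key column by POS
    # numerically stable softmax over the key dimension
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ V                           # (seq_len, d_k) weighted values

# usage: down-weight tokens 1 and 3 as if they were function words
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = pos_weighted_attention(Q, K, V, np.array([1.0, 0.3, 1.0, 0.3]))
```

Multiplying scores before the softmax (rather than masking after it) keeps the attention rows normalized to 1 while still suppressing low-weight tokens; whether Transformer-F applies the weights at this point is an assumption.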

Analyzing the Structure of Attention in a Transformer Language Model
It is found that attention targets different parts of speech at different layer depths within the model, aligns with dependency relations most strongly in the middle layers, and captures the most distant relationships in the deepest layers.
An Efficient Character-Level and Word-Level Feature Fusion Method for Chinese Text Classification
The results show that the proposed model C_BiGRU_ATT can extract text features more effectively and reduce the effect of text representation on classification results.
Rethinking the Value of Transformer Components
This work evaluates the impact of individual components (sub-layers) in trained Transformer models from different perspectives and proposes a new training strategy that improves translation performance by distinguishing the unimportant components during training.
Attention is All you Need
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
BiG-Transformer: Integrating Hierarchical Features for Transformer via Bipartite Graph
BiG-Transformer is presented, which employs attention with bipartite-graph structure to replace the fully-connected self-attention mechanism in Transformer.
Chinese Text Classification Based on Neural Networks and Word2vec
This study trained two neural network models, TextCNN and TextRNN, on the THUCNews dataset and compared their performance with the methods of THUCTC, showing that the accuracy of Chinese text classification can be improved from 88.60% to 96.36%.
Recurrent Convolutional Neural Networks for Text Classification
A recurrent convolutional neural network is introduced for text classification without human-designed features to capture contextual information as far as possible when learning word representations, which may introduce considerably less noise compared to traditional window-based neural networks.
Attention-based LSTM for Aspect-level Sentiment Classification
This paper reveals that the sentiment polarity of a sentence is determined not only by its content but also by the aspect concerned, and proposes an attention-based Long Short-Term Memory network for aspect-level sentiment classification.
Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification
The experimental results on the SemEval-2010 relation classification task show that the AttBLSTM method outperforms most of the existing methods, with only word vectors.
Deep Pyramid Convolutional Neural Networks for Text Categorization
A low-complexity word-level deep convolutional neural network architecture for text categorization that can efficiently represent long-range associations in text and outperforms the previous best models on six benchmark datasets for sentiment classification and topic categorization.