Corpus ID: 231740620

MUSE: Multi-Scale Temporal Features Evolution for Knowledge Tracing

@article{Zhang2021MUSEMT,
  title={MUSE: Multi-Scale Temporal Features Evolution for Knowledge Tracing},
  author={Chengwei Zhang and Yangzhou Jiang and Wei Zhang and Chengyu Gu},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.00228}
}
Transformer-based knowledge tracing models are extensively studied in the field of computer-aided education. By integrating temporal features into the encoder-decoder structure, transformers can process exercise information and student response information in a natural way. However, current state-of-the-art transformer-based variants still share two limitations. First, extremely long temporal features cannot be handled well, as the complexity of the self-attention mechanism is O(n…
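The truncated limitation refers to the quadratic cost of vanilla self-attention over long interaction histories. As a rough illustration (not taken from the paper), the NumPy sketch below computes plain scaled dot-product self-attention and shows where the (n × n) score matrix arises; the function name, sequence length, and embedding dimension are illustrative assumptions.

```python
# Minimal sketch (not the MUSE architecture): plain scaled dot-product
# self-attention in NumPy, illustrating the O(n^2) cost the abstract refers to.
import numpy as np

def self_attention(x):
    """x: (n, d) sequence of exercise/response embeddings (illustrative)."""
    n, d = x.shape
    q, k, v = x, x, x                                  # identity projections for brevity
    scores = q @ k.T / np.sqrt(d)                      # (n, n) matrix -> O(n^2) time and memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # (n, d) contextualized sequence

x = np.random.randn(512, 64)                           # n = 512 interactions, d = 64
out = self_attention(x)
print(out.shape)                                       # (512, 64); the score matrix alone is 512 x 512
```

Doubling the interaction history length quadruples the size of the score matrix, which is why long temporal features are hard to handle with unmodified self-attention.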
1 Citation

Assessing the Knowledge State of Online Students - New Data, New Approaches, Improved Accuracy
TLDR
This study is the first to use four very large sets of student data made available recently from four distinct intelligent tutoring systems and achieves improved accuracy of student modeling by introducing new features that can be easily computed from conventional question-response logs.

References

Showing 1-10 of 18 references
Towards an Appropriate Query, Key, and Value Computation for Knowledge Tracing
TLDR
This is the first work to suggest an encoder-decoder model for knowledge tracing that applies deep self-attentive layers to exercises and responses separately, and it achieves state-of-the-art performance with an improvement in area under the receiver operating characteristic curve (AUC).
A Self Attentive model for Knowledge Tracing
TLDR
This work develops an approach that identifies the KCs from the student's past activities that are relevant to the given KC and predicts mastery based on the relatively few KCs it picks; it handles the data sparsity problem better than RNN-based methods.
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Deep Interest Evolution Network for Click-Through Rate Prediction
TLDR
This paper proposes a novel model, named Deep Interest Evolution Network (DIEN), for CTR prediction, which significantly outperforms state-of-the-art solutions and designs an interest extractor layer to capture temporal interests from the history behavior sequence.
MRIF: Multi-resolution Interest Fusion for Recommendation
TLDR
A multi-resolution interest fusion model (MRIF) that is capable of capturing the dynamic changes in users' interests at different temporal ranges, and provides an effective way to combine a group of multi-resolution user interests to make predictions.
Multi-Scale Self-Attention for Text Classification
TLDR
A Multi-Scale Transformer is designed which uses multi-scale multi-head self-attention to capture features at different scales, together with a strategy to control the scale distribution for each layer.
Deep Interest Network for Click-Through Rate Prediction
TLDR
A novel model: Deep Interest Network (DIN) is proposed which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad.
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
TLDR
A novel adversarial training algorithm, FreeLB, is proposed that promotes higher invariance in the embedding space by adding adversarial perturbations to word embeddings and minimizing the resultant adversarial risk inside different regions around input samples.
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.