AraDIC: Arabic Document Classification using Image-Based Character Embeddings and Class-Balanced Loss

@article{Daif2020AraDICAD,
  title={AraDIC: Arabic Document Classification using Image-Based Character Embeddings and Class-Balanced Loss},
  author={Mahmoud Daif and Shunsuke Kitada and Hitoshi Iyatomi},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.11586}
}
  • Mahmoud Daif, Shunsuke Kitada, Hitoshi Iyatomi
  • Published 2020
  • Computer Science
  • ArXiv
  • Classical and some deep learning techniques for Arabic text classification often depend on complex morphological analysis, word segmentation, and hand-crafted feature engineering. These could be eliminated by using character-level features. We propose a novel end-to-end Arabic document classification framework, Arabic document image-based classifier (AraDIC), inspired by the work on image-based character embeddings. AraDIC consists of an image-based character encoder and a classifier. They are…
