Corpus ID: 50787139

Pythia v0.1: the Winning Entry to the VQA Challenge 2018

@article{Jiang2018PythiaVT,
  title={Pythia v0.1: the Winning Entry to the VQA Challenge 2018},
  author={Yu Jiang and Vivek Natarajan and Xinlei Chen and Marcus Rohrbach and Dhruv Batra and Devi Parikh},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.09956}
}
  • Yu Jiang, Vivek Natarajan, Xinlei Chen, Marcus Rohrbach, Dhruv Batra, Devi Parikh
  • Published 2018
  • Computer Science
  • ArXiv
  • This document describes Pythia v0.1, the winning entry from Facebook AI Research (FAIR)'s A-STAR team to the VQA Challenge 2018. Our starting point is a modular re-implementation of the bottom-up top-down (up-down) model. We demonstrate that by making subtle but important changes to the model architecture and the learning rate schedule, fine-tuning image features, and adding data augmentation, we can significantly improve the performance of the up-down model on the VQA v2.0 dataset -- from 65.67…

    Citations

    Publications citing this paper.
    SHOWING 1-10 OF 61 CITATIONS

    Cycle-Consistency for Robust Visual Question Answering

    CITES METHODS & BACKGROUND

    DRAU: Dual Recurrent Attention Units for Visual Question Answering

    CITES BACKGROUND

    CITATION STATISTICS

    • 8 Highly Influenced Citations

    • Averaged 20 Citations per year from 2018 through 2020

    References

    Publications referenced by this paper.
    SHOWING 1-10 OF 17 REFERENCES

    Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

    HIGHLY INFLUENTIAL

    Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

    HIGHLY INFLUENTIAL

    Aggregated Residual Transformations for Deep Neural Networks

    Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering

    Adam: A Method for Stochastic Optimization

    Deep Residual Learning for Image Recognition

    VQA: Visual Question Answering
