IITP@COLIEE 2019: Legal Information Retrieval using BM25 and BERT

Baban Gain, Dibyanayan Bandyopadhyay, Tanik Saikh, A. Ekbal
Natural Language Processing (NLP) and Information Retrieval (IR) in the judicial domain are essential tasks. With the growing availability of domain-specific data in electronic form and the aid of various artificial intelligence (AI) technologies, automated language processing has become easier, making it feasible for researchers and developers to provide automated tools that reduce the human burden on the legal community. The Competition on Legal Information Extraction…

IITP at AILA 2019: System Report for Artificial Intelligence for Legal Assistance Shared Task
This report describes the systems produced for the Artificial Intelligence for Legal Assistance (AILA 2019) shared task, which opens a path for Natural Language Processing (NLP) research in the judicial domain.
A Summary of the COLIEE 2019 Competition
The evaluation of the 6th Competition on Legal Information Extraction/Entailment (COLIEE 2019) consists of four tasks: two on case law and two on statute law, including one that attempts to confirm whether a particular statute applies to a yes/no question.


BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
XGBoost: A Scalable Tree Boosting System
This paper proposes a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.
Distributed Representations of Sentences and Documents
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
Scikit-learn: Machine Learning in Python
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing…
The Probabilistic Relevance Framework: BM25 and Beyond
This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F.
Learning representations by back-propagating errors
Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.