An Evaluation Framework for Legal Document Summarization

Ankan Mullick, Abhilash Nandy, Manav Nitin Kapadnis, Sohan Patnaik, R. Raghav, Roshni Kar
A law practitioner has to go through numerous lengthy legal case proceedings across categories such as land disputes and corruption. It is therefore important to summarize these documents, and to ensure that the summaries contain phrases whose intent matches the category of the case. To the best of our knowledge, no existing evaluation metric evaluates a summary based on its intent. We propose an automated intent-based summarization metric, which shows a better agreement… 

A Comparative Study of Summarization Algorithms Applied to Legal Case Judgments
This paper assesses how well domain-independent summarization approaches perform on legal case judgments, and how approaches specifically designed for legal case documents of other countries generalize to Indian Supreme Court documents.
Improving Legal Document Summarization Using Graphical Models
Proposes a novel application of probabilistic graphical models to the automatic text summarization task in the legal domain; the resulting structured summaries have been observed to reach roughly 80% agreement with the ideal summaries generated by experts in the area.
CaseSummarizer: A System for Automated Summarization of Legal Texts
CaseSummarizer is presented, a tool for automated text summarization of legal documents which uses standard summary methods based on word frequency augmented with additional domain-specific knowledge.
ROUGE: A Package for Automatic Evaluation of Summaries
Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, along with evaluations of each.
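The clipped n-gram overlap at the core of ROUGE-N can be sketched as follows. This is a minimal illustration, not the official ROUGE package (which adds stemming, stopword handling, and more); `rouge_n` is a hypothetical helper name.

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N precision, recall, and F1 from clipped n-gram overlap."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection clips each n-gram's match count at the minimum
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1
```

For example, `rouge_n("the cat sat", "the cat sat on the mat", 1)` gives perfect unigram precision but recall of only 0.5, since half of the reference's unigrams are never matched.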
A Survey of Evaluation Metrics Used for NLG Systems
This survey of automatic evaluation metrics for evaluating Natural Language Generation (NLG) systems highlights the challenges, proposes a coherent taxonomy for organising existing evaluation metrics, and briefly describes different existing metrics.
Evaluating Question Answering Evaluation
This work studies the suitability of existing metrics for QA and explores BERTScore, a recently proposed metric for evaluating translation; although BERTScore fails to provide stronger correlation with human judgements, future work on tailoring a BERT-based metric to QA evaluation may prove fruitful.
Leveraging BERT for Extractive Text Summarization on Lectures
This paper reports on the Lecture Summarization Service, a Python-based RESTful service that uses the BERT model for text embeddings and K-Means clustering to identify the sentences closest to the centroids for summary selection.
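That centroid-selection step can be sketched as below, assuming sentence embeddings have already been computed (any vectors stand in for BERT output here; `select_summary_sentences` is a hypothetical name, and the plain k-means loop is a stand-in for a library implementation).

```python
import numpy as np

def select_summary_sentences(embeddings, k, iters=10, seed=0):
    """Cluster sentence embeddings with k-means; for each cluster, return
    the index of the sentence closest to the centroid (in document order)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(embeddings, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each sentence to its nearest centroid, then recompute means
        d = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    d = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
    picked = sorted({int(d[:, j].argmin()) for j in range(k)})
    return picked
```

With two well-separated groups of embeddings and k=2, the function returns one representative sentence index per group.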
Extractive Summarization using Deep Learning
This paper proposes a text summarization approach for factual reports using a deep learning model, consisting of three phases: feature extraction, feature enhancement, and summary generation.
Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation
This work improves important aspects of abstractive summarization via multi-task learning with auxiliary question-generation and entailment-generation tasks, achieving statistically significant improvements over the state of the art on the CNN/DailyMail and Gigaword datasets, as well as on the DUC-2002 transfer setup.
Bleu: a Method for Automatic Evaluation of Machine Translation
This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
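The core of BLEU, modified (clipped) n-gram precision combined with a brevity penalty, can be sketched for the single-reference case as follows. This is a minimal illustration, not a reference implementation (real BLEU supports multiple references and smoothing); `bleu` is a hypothetical helper name.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU sketch: geometric mean of clipped n-gram
    precisions for n = 1..max_n, scaled by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_ngr = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        r_ngr = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((c_ngr & r_ngr).values())  # clipped match counts
        if overlap == 0:
            return 0.0  # any zero precision collapses the geometric mean
        log_prec += math.log(overlap / max(sum(c_ngr.values()), 1))
    # Brevity penalty discourages overly short candidates
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec / max_n)
```

A candidate identical to its reference scores 1.0, and a candidate sharing no n-grams with the reference scores 0.0.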