• Corpus ID: 145819564

Sentence Boundary Detection in Adjudicatory Decisions in the United States

@inproceedings{avelka2017SentenceBD,
  title={Sentence Boundary Detection in Adjudicatory Decisions in the United States},
  author={Jarom{\'i}r {\v{S}}avelka and Vern R. Walker and Matthias Grabmair and Kevin D. Ashley},
  year={2017}
}
We report results of an effort to enable computers to segment US adjudicatory decisions into sentences. We created a data set of 80 court decisions from four different domains. We show that legal decisions are more challenging for existing sentence boundary detection systems than for non-legal texts. Existing sentence boundary detection systems are based on a number of assumptions that do not hold for legal texts, hence their performance is impaired. We show that a general statistical sequence… 

Sentence Boundary Detection in Legal Texts Grading: Option 3
TLDR
An effort to detect sentence boundaries in complex legal text (which does not conform to standard English syntax) using Transformer-based neural networks demonstrates superior results over baseline studies that utilized such features.
Sentence Boundary Detection in German Legal Documents
TLDR
An annotated dataset with over 50,000 sentences consisting of various German legal documents is created and neural networks and conditional random fields models show significantly higher performances on this data than the tested, already existing systems.
Automatic Classification of Rhetorical Roles for Sentences: Comparing Rule-Based Scripts with Machine Learning
TLDR
The paper reports promising results from using a qualitative methodology to analyze a small sample of classified sentences to develop rule-based scripts that can classify sentences that state findings of fact, and suggests that some access-to-justice use cases can be adequately addressed at much lower cost than previously believed.
Discovering Explanatory Sentences in Legal Case Decisions Using Pre-trained Language Models
TLDR
This work assembled a data set of 26,959 sentences, coming from legal case decisions, and labeled them in terms of their usefulness for explaining selected legal concepts, showing that the transformer-based models are capable of learning surprisingly sophisticated features and outperform the prior approaches to the task.
Automatic Summarization of Legal Decisions using Iterative Masking of Predictive Sentences
TLDR
It is shown that sentence predictiveness does not reliably cover all decision-relevant aspects of a case, illustrate that lexical overlap metrics are not well suited for evaluating legal summaries, and suggest that future work should focus on case-aspect coverage.
Hybrid Ensemble-Rule Algorithm for Improved MEDLINE® Sentence Boundary Detection
TLDR
This manuscript presents an algorithm to address challenges for SBD based on majority voting among three SBD engines followed by custom post-processing algorithms that rely on NLP spaCy part-of-speech, abbreviation and capital letter detection, and computing general sentence statistics.
Improving Sentence Retrieval from Case Law for Statutory Interpretation
TLDR
A specialized sentence retrieval framework is proposed that mitigates the challenges of retrieving case law sentences for interpreting statutory terms and is based on a detailed error analysis.
Legal information retrieval for understanding statutory terms
TLDR
This work proposes a novel task of discovering sentences for argumentation about the meaning of statutory terms and investigates the feasibility of developing a system that responds to a query with a list of sentences that mention the term in a way that is useful for understanding and elaborating its meaning.
Segmentation of Rulemaking Documents for Public Notice-and-Comment Process Analysis
TLDR
This work has annotated a dataset of final rule documents to identify all spans in which EPA discusses and evaluates the merits of public comments received on its proposed rules, and presents lessons learned from the annotation process.
Sentence Boundary Detection in Legal Text
  • George Sanchez
  • Law, Computer Science
    Proceedings of the Natural Legal Language Processing Workshop 2019
  • 2019
TLDR
This paper examined several algorithms to detect sentence boundaries in legal text and found that out-of-the-box algorithms perform poorly on legal text affecting further analysis of the text.

References

SHOWING 1-10 OF 17 REFERENCES
Sentence Boundary Detection: A Long Solved Problem?
TLDR
A generalized definition of SBD is proposed, eliminating text- or language-specific assumptions about candidate boundary points and degrees of variation across ‘standard’ corpora of edited, relatively formal language, as well as performance degradation when moving to less formal language.
Unsupervised Multilingual Sentence Boundary Detection
TLDR
A language-independent, unsupervised approach to sentence boundary detection based on the assumption that a large number of ambiguities in the determination of sentence boundaries can be eliminated once abbreviations have been identified, which is able to detect abbreviations with high accuracy.
Adaptive Multilingual Sentence Boundary Disambiguation
TLDR
This article presents an efficient, trainable system for sentence boundary disambiguation, called Satz, which makes simple estimates of the parts of speech of the tokens immediately preceding and following each punctuation mark, and uses these estimates as input to a machine learning algorithm that then classifies the punctuation mark.
Using Conditional Random Fields for Sentence Boundary Detection in Speech
TLDR
The authors' CRF model yields a lower error rate than the HMM and Maxent models on the NIST sentence boundary detection task in speech, although it is interesting to note that the best results are achieved by three-way voting among the classifiers.
A Maximum Entropy Approach to Identifying Sentence Boundaries
TLDR
A trainable model for identifying sentence boundaries in raw text that can be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language.
Maximum entropy models for natural language ambiguity resolution
This thesis demonstrates that several important kinds of natural language ambiguities can be resolved to state-of-the-art accuracies using a single statistical modeling technique based on the
A next step towards automated modelling of sources of law
TLDR
This paper has defined fourteen different categories of provisions, and compiled a list of 88 sentence structures for those categories from twenty Dutch laws, and used a parser to classify the sentences in fifteen different Dutch laws.
Periods, Capitalized Words, etc.
TLDR
This approach proved to be robust to domain shifts and new lexica and produced performance on a par with the highest reported results; when incorporated into a part-of-speech tagger, it helped reduce the error rate significantly on capitalized words and sentence boundaries.
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
TLDR
This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
MITRE: description of the Alembic system used for MUC-6
As with several other veteran MUC participants, MITRE's Alembic system has undergone a major transformation in the past two years. The genesis of this transformation occurred during a dinner