Coffea - Columnar Object Framework For Effective Analysis

Authors: Nicholas Smith, Lindsey Gray, Matteo Cremonesi, Bo Jayatilaka, Oliver Gutsche, Allison Reinsvold Hall, Kevin Pedro, Maria Acosta Flechas, Andrew Melo, Stefano Belforte, James Pivarski

The coffea framework provides a new approach to High-Energy Physics analysis, via columnar operations, that improves time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language, the scientific Python package ecosystem, and commodity big data technologies. To achieve this suite of improvements across many use cases, coffea takes a factorized approach, separating the analysis implementation and data delivery scheme. All…
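
The abstract names two ideas: columnar operations, and a factorized split between the analysis implementation and the data delivery scheme. The following is a minimal sketch of both, assuming only NumPy and the Python standard library. It does not use coffea's API: every name in it (`process_chunk`, `run_serial`, `run_threaded`, `make_chunks`, the `mu1_pt`-style field names, the toy data, and the binning) is a hypothetical illustration. The analysis step operates on whole arrays of events at once, and the execution step only decides how chunks of events are fed to it.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Hypothetical binning for a dimuon mass histogram, in GeV (12 bins of 5 GeV)
BINS = np.linspace(60.0, 120.0, 13)


def process_chunk(chunk):
    """Analysis step (hypothetical): columnar operations on one chunk of events.

    `chunk` is assumed to be a dict of equal-length NumPy arrays, one entry
    per event, holding the kinematics of each event's two leading muons.
    """
    pt1, eta1, phi1 = chunk["mu1_pt"], chunk["mu1_eta"], chunk["mu1_phi"]
    pt2, eta2, phi2 = chunk["mu2_pt"], chunk["mu2_eta"], chunk["mu2_phi"]
    # Invariant mass of two massless particles, evaluated for every event at once
    mass = np.sqrt(2.0 * pt1 * pt2 * (np.cosh(eta1 - eta2) - np.cos(phi1 - phi2)))
    # An event selection is a boolean mask over whole columns, not a per-event branch
    selected = (pt1 > 20.0) & (pt2 > 20.0)
    counts, _ = np.histogram(mass[selected], bins=BINS)
    return counts


# Execution side (hypothetical): interchangeable without touching process_chunk.
def run_serial(chunks, fn):
    """Apply fn to each chunk in turn and add up the partial histograms."""
    return sum(fn(chunk) for chunk in chunks)


def run_threaded(chunks, fn, workers=4):
    """Same contract as run_serial, but chunks are processed concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(fn, chunks))


def make_chunks(n_events, chunk_size, seed=0):
    """Generate toy events and split them into fixed-size chunks."""
    rng = np.random.default_rng(seed)
    cols = {
        "mu1_pt": rng.uniform(10.0, 60.0, n_events),
        "mu2_pt": rng.uniform(10.0, 60.0, n_events),
        "mu1_eta": rng.uniform(-2.4, 2.4, n_events),
        "mu2_eta": rng.uniform(-2.4, 2.4, n_events),
        "mu1_phi": rng.uniform(-np.pi, np.pi, n_events),
        "mu2_phi": rng.uniform(-np.pi, np.pi, n_events),
    }
    return [
        {name: arr[start:start + chunk_size] for name, arr in cols.items()}
        for start in range(0, n_events, chunk_size)
    ]


chunks = make_chunks(n_events=10_000, chunk_size=2_500)
hist_serial = run_serial(chunks, process_chunk)
hist_threaded = run_threaded(chunks, process_chunk)
```

Because `process_chunk` sees only one chunk of arrays and returns a result that can be summed, the executor can be swapped (a serial loop, a thread pool, or in principle a distributed backend) without editing the analysis code, and splitting the same toy events into one chunk or four gives the same histogram.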
5 Citations

CutLang v2: Advances in a Runtime-Interpreted Analysis Description Language for HEP Data
CutLang has been enhanced to handle object combinatorics, to include tables and weights, to save events at any analysis stage, to benefit from multi-core/multi-CPU hardware among other improvements.
Lightweight Function Monitors for Fine-Grained Management in Large Scale Python Applications
This work reviews the challenges involved in running native Python functions at scale, and presents techniques for dynamically determining a minimal set of dependencies and for assembling a lightweight function monitor (LFM) that captures the software environment and manages resources at the granularity of single functions.
Real-time HEP analysis with funcX, a high-performance platform for function as a service
We explore how the function as a service paradigm can be used to address the computing challenges in experimental high-energy physics at CERN. As a case study, we use funcX, a high-performance…
funcX: A Federated Function Serving Fabric for Science
Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable…
Coffea-casa: an analysis facility prototype
The “Coffea-casa” prototype analysis facility is an effort to provide users with alternate mechanisms to access computing resources and enable new programming paradigms, instead of the command-line interface and asynchronous batch access.