# Estimating the number of species to attain sufficient representation in a random sample

@article{Deng2016EstimatingTN, title={Estimating the number of species to attain sufficient representation in a random sample}, author={C. Deng and Timothy P Daley and Peter P. Calabrese and Jie Ren and Andrew D. Smith}, journal={arXiv: Methodology}, year={2016} }

The statistical problem of using an initial sample to estimate the number of species in a larger sample has found important applications in fields far removed from ecology. Here we address the general problem of estimating the number of species that will be represented by at least a number r of observations in a future sample. The number r indicates species with sufficient observations, which are commonly used as a necessary condition for any robust statistical inference. We derive a procedure…
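To make the target quantity concrete, the following minimal sketch counts the species represented by at least r observations in an observed sample. It is not the authors' estimation procedure, which the truncated abstract does not detail; the estimation problem is to predict this count for a larger future sample. The function name `species_with_at_least` and the toy sample are hypothetical and chosen only for illustration.

```python
from collections import Counter


def species_with_at_least(sample, r):
    """Count the species that appear at least r times in the sample."""
    counts = Counter(sample)  # species label -> number of observations
    return sum(1 for c in counts.values() if c >= r)


# Hypothetical toy sample: each entry is the species label of one observation.
sample = ["a", "a", "a", "b", "b", "c", "d", "d", "d", "d"]

for r in (1, 2, 3):
    print(r, species_with_at_least(sample, r))
```

With r = 1 this reduces to the familiar count of distinct species observed; larger r restricts the count to species with enough observations to support further inference.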

## 6 Citations

Recent increases in assemblage rarity are linked to increasing local immigration

- Medicine, Geography
- Royal Society Open Science
- 2020

It is shown that the number of rare species within assemblages is increasing, on average, across systems, and the positive relationship between change in rarity and change in species richness provides evidence for the first explanation.

Molecular Heterogeneity in Large-Scale Biological Data: Techniques and Applications

- Biology
- Annual Review of Biomedical Data Science
- 2019

This research presents novel, scalable approaches that can be integrated into the manufacturing process and provide real-time information about the response of the immune system to high-throughput sequencing technologies.

Addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq

- Biology
- 2020

The incorporation of unique molecular identifiers in single-cell RNA-seq assays allows for the removal of amplification bias in the estimation of gene abundances and can invert the relative abundance of certain genes in cases of a pooled amplification paradox.

BUTTERFLY: addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq

- Biology, Medicine
- Genome Biology
- 2021

It is shown that the naïve removal of duplicates can lead to a bias due to a “pooled amplification paradox,” and an improved quantification method based on unseen species modeling is proposed, which uses a zero truncated negative binomial estimator implemented in the kallisto bustools workflow.

Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq

- Medicine
- eLife
- 2021

A trimodal assay that simultaneously measures transcriptomics, epitopes, and chromatin accessibility from thousands of single cells is developed, which is termed TEA-seq and provides a novel toolkit to identify type-specific gene regulation and expression grounded in phenotypically defined cell types.

Analysis of human brain tissue derived from DBS surgery

- Biology
- 2021

A novel approach to collecting brain tissue from DBS surgery-guiding instruments for liquid chromatography-mass spectrometry and RNA sequencing analyses is described, and the approach is shown to be useful for obtaining disease-specific expression data.

## References

THE NUMBER OF NEW SPECIES, AND THE INCREASE IN POPULATION COVERAGE, WHEN A SAMPLE IS INCREASED

- Mathematics
- 1956

A sample of size N is drawn at random from a population of animals of various species. Methods are given for estimating, knowing only the contents of this sample, the number of species which will be…

Estimating the Prediction Function and the Number of Unseen Species in Sampling with Replacement

- Mathematics
- 1998

Abstract A sample of N units is taken from a population consisting of an unknown number of species. We are interested in estimating the number of species and the prediction function for future…

The Relation Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population

- Biology
- 1943

Part 1. It is shown that in a large collection of Lepidoptera captured in Malaya the frequency of the number of species represented by different numbers of individuals fitted somewhat closely to a…

Nonparametric prediction in species sampling

- Biology
- 2004

A simple prediction method is proposed for predicting the number of new species that would be discovered by additional sampling in a continuous-time stochastic model in which species arrive in the sample according to independent Poisson processes and where the species discovery rates are heterogeneous.

Applications of species accumulation curves in large-scale biological data analysis

- Biology, Computer Science
- Quantitative Biology
- 2015

A method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries, which uses rational function approximations to a classical nonparametric empirical Bayes estimator due to Good and Toulmin, is developed.

The Commonness, And Rarity, of Species

- Biology
- 1948

The purpose of this paper is to deduce, from a number of examples and from theoretical considerations, some plausible general law as to how abundance or commonness is distributed among species.…

Estimating terrestrial biodiversity through extrapolation.

- Biology, Medicine
- Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences
- 1994

The importance of using 'reference' sites to assess the true richness and composition of species assemblages, to measure ecologically significant ratios between unrelated taxa, to measure taxon/sub-taxon (hierarchical) ratios, and to 'calibrate' standardized sampling methods is discussed.

INTERPOLATING, EXTRAPOLATING, AND COMPARING INCIDENCE-BASED SPECIES ACCUMULATION CURVES

- Mathematics
- 2004

A general binomial mixture model is proposed for the species accumulation function based on presence-absence (incidence) of species in a sample of quadrats or other sampling units. The model covers…

Estimating the population size for capture-recapture data with unequal catchability.

- Mathematics, Medicine
- Biometrics
- 1987

A point estimator and its associated confidence interval for the size of a closed population are proposed under models that incorporate heterogeneity of capture probability, and numerical results show that the proposed confidence interval performs satisfactorily in maintaining the nominal levels.

Estimating the Number of Classes via Sample Coverage

- Mathematics
- 1992

Abstract Assume that a random sample is drawn from a population with unknown number of classes and possibly unequal class probabilities. A nonparametric estimation technique is proposed to estimate…