• Corpus ID: 88514659

Estimating the number of species to attain sufficient representation in a random sample

@article{Deng2016EstimatingTN,
  title={Estimating the number of species to attain sufficient representation in a random sample},
  author={C. Deng and Timothy P Daley and Peter P. Calabrese and Jie Ren and Andrew D. Smith},
  journal={arXiv: Methodology},
  year={2016}
}
The statistical problem of using an initial sample to estimate the number of species in a larger sample has found important applications in fields far removed from ecology. Here we address the general problem of estimating the number of species that will be represented by at least r observations in a future sample. Species with at least r observations are considered sufficiently represented, and sufficient representation is commonly a necessary condition for robust statistical inference. We derive a procedure… 
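To make the target quantity concrete, here is a minimal sketch under an assumed Poisson sampling model: each species i contributes Poisson(λ_i) observations to the initial sample and Poisson(λ_i(1+t)) observations to a sample (1+t) times larger. Expanding e^{-λ_i t} as a power series and plugging the observed frequency counts n_j (the number of species seen exactly j times) in place of their expectations gives an estimate of how many species will have at least r observations. For r = 1 this reduces to the classical Good-Toulmin estimator. This is an illustrative derivation, not necessarily the procedure derived in the paper, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def freq_of_freqs(counts):
    """Return n_j: how many species were observed exactly j times (j >= 1)."""
    return Counter(c for c in counts if c > 0)


def good_toulmin(n, t):
    """Classical Good-Toulmin estimate of the number of species not yet seen
    that are expected to appear when the sample grows by a factor (1 + t).
    The alternating series is reliable only for roughly t <= 1."""
    return sum((-1) ** (j + 1) * t ** j * nj for j, nj in n.items())


def expected_nk(n, k, t):
    """Estimate how many species will be observed exactly k times (k >= 1)
    in the (1 + t)-scaled sample. Derived by expanding exp(-lambda * t) under
    the assumed Poisson model and substituting observed n_j for E[n_j]."""
    return (1 + t) ** k * sum(
        (-t) ** m * comb(k + m, m) * n.get(k + m, 0)
        for m in range(max(n) - k + 1)
    )


def species_at_least_r(counts, r, t):
    """Hypothetical helper: estimated number of species with at least r
    observations in a sample (1 + t) times the size of the initial one.
    `counts` holds the observation count of each species in the initial sample."""
    n = freq_of_freqs(counts)
    if not n:
        return 0.0
    # Species observed at least once in the larger sample (Good-Toulmin) ...
    s_observed_future = sum(n.values()) + good_toulmin(n, t)
    # ... minus those expected to be observed only 1, ..., r - 1 times.
    return s_observed_future - sum(expected_nk(n, k, t) for k in range(1, r))
```

For example, with per-species counts [1, 1, 2, 3], `species_at_least_r([1, 1, 2, 3], r=2, t=0.5)` returns 2.25, and with t = 0 it returns the observed number of species with at least two observations (2). Because the series alternates, it can behave erratically for large t or sparse data; rational function approximations to the Good-Toulmin series are one known way to stabilize such extrapolations.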


Recent increases in assemblage rarity are linked to increasing local immigration
TLDR
It is shown that the number of rare species within assemblages is increasing, on average, across systems, and the positive relationship between change in rarity and change in species richness provides evidence for the first explanation.
Molecular Heterogeneity in Large-Scale Biological Data: Techniques and Applications
TLDR
This research presents novel, scalable approaches that can be integrated into the manufacturing process and provide real-time information about the response of the immune system to high-throughput sequencing technologies.
Addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq
TLDR
The incorporation of unique molecular identifiers in single-cell RNA-seq assays allows for the removal of amplification bias in the estimation of gene abundances and can invert the relative abundance of certain genes in cases of a pooled amplification paradox.
BUTTERFLY: addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq
TLDR
It is shown that the naïve removal of duplicates can lead to a bias due to a “pooled amplification paradox,” and an improved quantification method based on unseen species modeling is proposed, which uses a zero-truncated negative binomial estimator implemented in the kallisto bustools workflow.
Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq
TLDR
A trimodal assay that simultaneously measures transcriptomics, epitopes, and chromatin accessibility from thousands of single cells is developed, which is termed TEA-seq and provides a novel toolkit to identify type-specific gene regulation and expression grounded in phenotypically defined cell types.
Analysis of human brain tissue derived from DBS surgery
TLDR
A novel approach to collecting brain tissue from DBS surgery-guiding instruments for liquid chromatography-mass spectrometry and RNA sequencing analyses is described and it was shown that the approach is useful for obtaining disease-specific expression data.

References

Showing 1-10 of 51 references
The Number of New Species, and the Increase in Population Coverage, When a Sample Is Increased
A sample of size N is drawn at random from a population of animals of various species. Methods are given for estimating, knowing only the contents of this sample, the number of species which will be…
Estimating the Prediction Function and the Number of Unseen Species in Sampling with Replacement
A sample of N units is taken from a population consisting of an unknown number of species. We are interested in estimating the number of species and the prediction function for future…
The Relation Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population
Part 1. It is shown that in a large collection of Lepidoptera captured in Malaya the frequency of the number of species represented by different numbers of individuals fitted somewhat closely to a…
Nonparametric prediction in species sampling
TLDR
A simple prediction method is proposed for predicting the number of new species that would be discovered by additional sampling in a continuous-time stochastic model in which species arrive in the sample according to independent Poisson processes and where the species discovery rates are heterogeneous.
Applications of species accumulation curves in large-scale biological data analysis
TLDR
A method is developed to estimate accumulation curves for predicting the complexity of DNA sequencing libraries; it uses rational function approximations to a classical nonparametric empirical Bayes estimator due to Good and Toulmin.
The Commonness, and Rarity, of Species
The purpose of this paper is to deduce, from a number of examples and from theoretical considerations, some plausible general law as to how abundance or commonness is distributed among species.
Estimating terrestrial biodiversity through extrapolation.
R. K. Colwell, J. Coddington. Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 1994.
TLDR
The importance of using 'reference' sites to assess the true richness and composition of species assemblages, to measure ecologically significant ratios between unrelated taxa, to measure taxon/sub-taxon (hierarchical) ratios, and to 'calibrate' standardized sampling methods is discussed.
Interpolating, Extrapolating, and Comparing Incidence-Based Species Accumulation Curves
A general binomial mixture model is proposed for the species accumulation function based on presence-absence (incidence) of species in a sample of quadrats or other sampling units. The model covers…
Estimating the population size for capture-recapture data with unequal catchability.
A. Chao. Biometrics, 1987.
TLDR
A point estimator and its associated confidence interval for the size of a closed population are proposed under models that incorporate heterogeneity of capture probability, and numerical results show that the proposed confidence interval performs satisfactorily in maintaining the nominal levels.
Estimating the Number of Classes via Sample Coverage
Assume that a random sample is drawn from a population with an unknown number of classes and possibly unequal class probabilities. A nonparametric estimation technique is proposed to estimate…