CpG island libraries from human Chromosomes 18 and 22: landmarks for novel genes
DNA libraries often contain very large numbers of clones (from 1,000 up to 700,000). Because it is not currently feasible to analyze all of these clones, a statistical sample of fewer than 100 clones is usually tested, and the quality of the library is then assessed by linear extrapolation. Occasionally, full coverage of a chromosomal region by DNA probes is inferred from such a sample. This inference may not be accurate, because linear extrapolation is misleading and the samples are generally too small to characterize the library. A quantitative model of the distribution of clone frequencies in a library is therefore mandatory for any useful assessment of its quality; without one, it is very difficult to draw useful conclusions from moderately sized samples. Examples from everyday life and formulas are given for determining the quality of a library and useful sample sizes.
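
The point about small samples can be illustrated with a minimal sketch. It is not the model proposed in this work, which the abstract does not specify. It assumes that clones are drawn independently and with replacement, and it uses two hypothetical abundance profiles (`uniform_weights` and `rank_weights`) chosen purely for illustration. Under those assumptions, the expected number of distinct clones in a sample of size n is the sum over clones of 1 - (1 - p_i)^n, where p_i is the relative frequency of clone i.

```python
def expected_distinct(weights, sample_size):
    """Expected number of distinct clones among `sample_size` independent draws.

    `weights` are relative clone abundances; each draw picks clone i with
    probability weights[i] / sum(weights) (sampling with replacement).
    """
    total = sum(weights)
    return sum(1.0 - (1.0 - w / total) ** sample_size for w in weights)


def uniform_weights(num_clones):
    # Assumption: every clone is equally abundant.
    return [1.0] * num_clones


def rank_weights(num_clones):
    # Assumption: abundance falls off as 1/rank (an illustrative skew only).
    return [1.0 / rank for rank in range(1, num_clones + 1)]


if __name__ == "__main__":
    sample_size = 100
    cases = [
        ("uniform, 1,000 clones", uniform_weights(1_000)),
        ("uniform, 700,000 clones", uniform_weights(700_000)),
        ("skewed (1/rank), 1,000 clones", rank_weights(1_000)),
    ]
    for label, weights in cases:
        distinct = expected_distinct(weights, sample_size)
        print(f"{label}: about {distinct:.1f} distinct clones expected")
```

Under the uniform assumption, a sample of 100 is expected to contain about 95 distinct clones from a 1,000-clone library and just under 100 from a 700,000-clone library. The two libraries are therefore nearly indistinguishable at this sample size. A skewed abundance profile lowers the expected count, so the same observation can also be produced by quite different libraries.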