MOTIVATION The extraordinary genetic and antigenic variability of RNA viruses is arguably the greatest challenge to the development of broadly effective vaccines. No single viral variant can induce sufficiently broad immunity, and incorporating all known naturally circulating variants into one multivalent vaccine is not feasible. Furthermore, no objective strategies currently exist to select actual viral variants that should be included or excluded in polyvalent vaccines. RESULTS To address this problem, we demonstrate a method based on graph theory that quantifies the relative importance of viral variants. We demonstrate our method through application to the envelope glycoprotein gene of a particularly diverse RNA virus of pigs: porcine reproductive and respiratory syndrome virus (PRRSV). Using distance matrices derived from sequence nucleotide difference, amino acid difference and evolutionary distance, we constructed viral networks and used common network statistics to assign each sequence an objective ranking of relative 'importance'. To validate our approach, we use an independent published algorithm to score our top-ranked wild-type variants for coverage of putative T-cell epitopes across the 9383 sequences in our dataset. Top-ranked viruses achieve significantly higher coverage than low-ranked viruses, and top-ranked viruses achieve nearly equal coverage as a synthetic mosaic protein constructed in silico from the same set of 9383 sequences. CONCLUSION Our approach relies on the network structure of PRRSV but applies to any diverse RNA virus because it identifies subsets of viral variants that are most important to overall viral diversity. We suggest that this method, through the objective quantification of variant importance, provides criteria for choosing viral variants for further characterization, diagnostics, surveillance and ultimately polyvalent vaccine development.