In silico comparison of the Iranian HIV-1 envelope glycoprotein with those of five neighboring countries
To investigate the genomic properties of HIV-1, we collected 3,081 sequences from the HIV Sequence Database. The sequences were categorized by sampling region, country, year, subtype, gene name, and sequence, and stored in a database constructed for this study. Relative synonymous codon usage (RSCU) values were calculated for the matrix, capsid, gp120, and gp41 genes and then analyzed by correspondence analysis. Grouped by geographical region, the synonymous codon usage patterns of African countries showed broad distributions; when all other regions, including Asia, Europe, and the Americas, were considered, the Asian countries tended to split into two groups. The sequences were clustered into nine non-CRF subtypes, of which subtype C showed the most distinct codon usage pattern. To determine why the codon usage patterns of Asian countries split into two groups for all four target genes, the sequences of isolates from Asian countries were analyzed. This analysis divided the Asian countries into two groups, the southern Asian countries and the other Asian countries, with CRF01_AE being the dominant subtype in southern Asia. In summary, the synonymous codon usage patterns of the individual HIV-1 subtypes reflect their genetic variation, and this bioinformatics approach may be useful, in conjunction with phylogenetic methods, for predicting the evolutionary patterns of pandemic viruses.
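The RSCU step mentioned above can be illustrated with a short sketch. RSCU is conventionally defined as RSCU_ij = X_ij / ((1/n_i) * sum_j X_ij), where X_ij is the count of codon j for amino acid i and n_i is the number of synonymous codons for that amino acid; a value of 1.0 means the codon is used exactly as often as uniform usage would predict. The code below is a minimal sketch under stated assumptions, not the study's pipeline: it assumes in-frame coding sequences and the standard genetic code (NCBI table 1), and the function name `rscu` and the rule of assigning 0.0 to codons of absent amino acids are hypothetical choices made here for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode.
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# Stop codons and single-codon amino acids (Met, Trp) carry no synonymous choice.
EXCLUDED = {"*", "M", "W"}


def rscu(sequence: str) -> dict[str, float]:
    """Hypothetical helper: RSCU value for each of the 59 informative sense codons."""
    seq = sequence.upper().replace("U", "T")
    # Count complete, non-overlapping codons in reading frame 1.
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values: dict[str, float] = {}
    for aa, codons in SYNONYMS.items():
        if aa in EXCLUDED:
            continue
        total = sum(counts[c] for c in codons)
        expected = total / len(codons)  # per-codon count under uniform usage
        for codon in codons:
            # Assumed convention: codons of amino acids absent from the sequence get 0.0.
            values[codon] = counts[codon] / expected if expected else 0.0
    return values
```

For example, `rscu("GCTGCTGCAGCC")` gives 2.0 for GCT, 1.0 for GCA and GCC, and 0.0 for GCG, since alanine's four codons would each be expected once. Per-sequence vectors of this kind (59 values each) are the sort of input that a correspondence analysis would then reduce to a few axes for comparing regions and subtypes.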