Unsupervised machine learning framework for discriminating major variants of concern during COVID-19

M. Kang, Seshadri Vasan, Laurence O. W. Wilson, Rohitash Chandra
Due to the rapid evolution of the SARS-CoV-2 (COVID-19) virus, a number of mutations emerged, giving rise to variants such as Alpha, Gamma, Delta and Omicron, which had a massive impact on the world economy. Unsupervised machine learning methods can compress, characterize and visualize unlabelled data. In this paper, we present a framework that utilizes unsupervised machine learning methods, including a combination of selected dimensionality reduction and clustering methods, to…
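As a rough illustration of what "a combination of dimensionality reduction and clustering" can look like, the sketch below chains PCA (computed via SVD) with a plain k-means loop using only NumPy. It is not the authors' framework: the abstract above is truncated and does not name the specific methods or features used, so the synthetic feature matrix, the helper names `pca_reduce` and `kmeans`, and every parameter value here are assumptions made for the example.

```python
import numpy as np


def pca_reduce(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # centre each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one integer cluster label per row."""
    rng = np.random.default_rng(seed)
    # Initialise the centres from k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centre; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-in for per-genome feature vectors (an assumption:
# two well-separated groups of 30 samples with 16 features each).
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=0.5, size=(30, 16))
group_b = rng.normal(loc=5.0, scale=0.5, size=(30, 16))
features = np.vstack([group_a, group_b])

embedding = pca_reduce(features, n_components=2)  # compress to 2-D
labels = kmeans(embedding, k=2)                   # cluster in the 2-D space
```

Clustering in the reduced space rather than on the raw features is one common design choice; whether the paper clusters before or after reduction is not stated in the visible text.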


Deep learning for COVID-19 topic modelling via Twitter: Alpha, Delta and Omicron

This paper uses prominent deep learning-based language models for COVID-19 topic modelling, taking into account data from the emergence of the Alpha variant through to the Omicron variant, to review public behaviour across the first, second and third waves based on a Twitter dataset from India.



The art of using t-SNE for single-cell transcriptomics

A protocol is introduced to help avoid common shortcomings of t-SNE, for example, enabling preservation of the global structure of the data.

Visualizing Data using t-SNE

A new technique called t-SNE is presented that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

Initialization is critical for preserving global data structure in both t-SNE and UMAP.

It is argued that there is currently no evidence that the UMAP algorithm per se has any advantage over t-SNE in terms of preserving global structure, and it is contended that these algorithms should always use informative initialization by default.
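One way to follow the advice about informative initialization is to start t-SNE from a PCA projection instead of random positions. The sketch below does this with scikit-learn's `TSNE` and its `init="pca"` option on synthetic data. The data, the perplexity value, and the choice of scikit-learn are assumptions for illustration and are not taken from the paper summarized above.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic data (an assumption): three Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(40, 10))
    for center in (0.0, 3.0, 6.0)
])

# init="pca" seeds the embedding with the first two principal components,
# so the starting layout already reflects the large-scale arrangement of
# the data rather than random noise.
tsne = TSNE(n_components=2, init="pca", perplexity=15, random_state=0)
embedding = tsne.fit_transform(X)
```

Fixing `random_state` makes the run repeatable, which helps when comparing a PCA-initialized embedding against a randomly initialized one (`init="random"`).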

The global transmission of new coronavirus variants

Quantitative Comparison of Conventional and t-SNE-guided Gating Analyses

These studies highlight the consistency between t-SNE and conventional hand-gating in stratifying general immune cell lineages while demonstrating that particular cell subsets defined by conventional manual gating may be intermingled in t-SNE space.

Torus principal component analysis with applications to RNA structure

Several cutting-edge applications need PCA methods for data on tori; this work proposes a novel torus-PCA method with important properties that can be generally applied, and illustrates it with two recently studied RNA structure (tori) data sets.

t-Distributed Stochastic Neighbor Embedding Method with the Least Information Loss for Macromolecular Simulations.

It is demonstrated that both one-dimensional (1D) and two-dimensional (2D) models of the t-SNE method are superior at distinguishing important functional states of a model allosteric protein system for free energy and mechanistic analysis.

Genomic DNA k-mer spectra: models and modalities

Multimodal spectra are characterized by specific ranges of values of C+G content and of CpG dinucleotide suppression, a range that encompasses all tetrapods analyzed, and are found to capture low-order Markov models fairly well.
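For readers unfamiliar with the term, a k-mer spectrum is usually taken to be the histogram of k-mer multiplicities: for each count m, how many distinct k-mers occur exactly m times in a sequence. The following minimal sketch computes one with the Python standard library. The function names and the toy sequence are made up for the example, and the code makes no attempt to reproduce the modelling in the paper summarized above.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in the sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_spectrum(sequence, k):
    """Map each multiplicity m to the number of distinct k-mers seen m times."""
    return Counter(kmer_counts(sequence, k).values())


# Toy sequence (an assumption; real spectra are computed over whole genomes).
toy = "ACGTACGTAC"
counts = kmer_counts(toy, 2)      # AC occurs 3 times; CG, GT, TA twice each
spectrum = kmer_spectrum(toy, 2)  # three 2-mers occur twice, one occurs 3 times
```

This sketch counts a single strand and does not filter ambiguous bases such as `N`; a real analysis would need to decide how to handle both.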