Dhammika Amaratunga

Although the random forest classification procedure works well in datasets with many features, when the number of features is huge and the percentage of truly informative features is small, such as with DNA microarray data, its performance tends to decline significantly. In such instances, the procedure can be improved by reducing the contribution of trees...
We describe, for the first time, the generation of a viral DNA chip for simultaneous expression measurements of nearly all known open reading frames (ORFs) in the largest member of the herpesvirus family, human cytomegalovirus (HCMV). In this study, an HCMV chip was fabricated and used to characterize the temporal class of viral gene expression. The viral...
MOTIVATION: DNA microarray technology typically generates many measurements, of which only a relatively small subset is informative for the interpretation of the experiment. To avoid false positive results, it is therefore critical to select the informative genes from the large, noisy dataset before the actual analysis. Most currently available filtering...
Probe-level data from Affymetrix GeneChips can be summarized in many ways to produce probe-set level gene expression measures (GEMs). Disturbingly, the different approaches not only generate quite different measures but they could also yield very different analysis results. Here, we explore the question of how much the analysis results really do differ,...
Both the strength and the weakness of microarray technology can be attributed to the enormous amount of information it generates. To fully realize the benefit of microarray technology for testing differentially expressed genes and for classification, there is a need to minimize the number of irrelevant genes present in microarray data. A major interest is to use...
As gene annotation databases continue to evolve and improve, it has become feasible to incorporate the functional and pathway information about genes available in these databases into the analysis of gene expression data, for a better understanding of the underlying mechanisms. A few methods have been proposed in the literature to formally convert...
MOTIVATION: DNA microarrays are a well-known and established technology in biological and pharmaceutical research, providing a wealth of information essential for understanding biological processes and aiding drug development. Protein microarrays are quickly emerging as a follow-up technology, which will also begin to experience rapid growth as the challenges...
An important issue in classification is the assessment of sample similarity. This is nontrivial in high-dimensional or megavariate datasets--datasets that are comprised of simultaneous measurements on thousands of features, many of which carry little or no information regarding consistent sample differences. Conventional similarity measures do not work...
Benchmark datasets are important for the validation and optimization of analysis routes. Lately, a new benchmark dataset, 'Platinum Spike', for Affymetrix GeneChip experiments has been introduced. We performed a quality check of the Platinum Spike dataset by using probe-level linear mixed models. The results have shown that there are 'empty' probe...