Yoojin Hong

Both multiple sequence alignment and phylogenetic analysis are problematic in the "twilight zone" of sequence similarity (≤ 25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, …
In maintaining Digital Libraries, having bibliographic data up-to-date is critical, yet often minor irregularities may cause information isolation. Unlike documents for which various kinds of unique ID systems exist (e.g., DOI, ISBN), other bibliographic entities such as author and publication venue do not have unique IDs. Therefore, in current Digital …
As the usage of Web Services proliferates dramatically, new tools to help quickly generate web services are needed. In this paper, we propose a methodology that helps to automatically generate Web Services from the FORM-based query interfaces of a web site. Since the majority of web data are rather "hidden" behind such a FORM interface, we …
Inferring evolutionary relationships among highly divergent protein sequences is a daunting task. In particular, when pairwise sequence alignments between protein sequences fall below 25% identity, the phylogenetic relationships among sequences cannot be estimated with statistical certainty. Here, we show that phylogenetic profiles generated with the Gestalt …
The sequence of amino acids in a protein is believed to determine its native state structure, which in turn is related to the functionality of the protein. In addition, information pertaining to evolutionary relationships is contained in homologous sequences. One powerful method for inferring these sequence attributes is through comparison of a query …
Just as physicists strive to develop a TOE (theory of everything), which explains and unifies the physical laws of the universe, the life-scientist wishes to uncover the TOE as it relates to cellular systems. This can only be achieved with a quantitative platform that can comprehensively deduce and relate protein structure, function, and evolution of …
Since modern database applications increasingly need to deal with dirty data due to a variety of reasons (e.g., data entry errors, heterogeneous formats, and ambiguous terms), considerable recent efforts have focused on the (record) linkage problem to determine if two entities represented as relational records are approximately the same or not. In this …
A major computational challenge in the genomic era is annotating structure/function to the vast quantities of sequence information now available. This problem is illustrated by the fact that most proteins lack comprehensive annotation, even when experimental evidence exists. We theorized that phylogenetic profiles provide a quantitative method that can …
The inability to resolve deep node relationships of highly divergent/rapidly evolving protein families is a major factor that stymies evolutionary studies. In this manuscript, we propose a Multiple Sequence Alignment (MSA) independent method to infer evolutionary relationships. We previously demonstrated that phylogenetic profiles built using position …
The recA/RAD51 gene family encodes a diverse set of recombinase proteins that affect homologous recombination, DNA repair, and genome stability. The recA gene family is expressed across all three domains of life (Eubacteria, Archaea, and Eukaryotes) and even in some viruses. To date, efforts to resolve the deep evolutionary origins of this ancient protein …