Corpus ID: 219176565

CLARITY -- Comparing heterogeneous data using dissimiLARITY

Authors: Daniel John Lawson, Vinesh Solanki, Igor Yanovich, Johannes Dellert, Damian J. Ruck, Phillip Endicott
Venue: arXiv: Methodology

Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale, and reliability. When two datasets describe the same entities, many scientific questions can be phrased around whether the similarities between entities are conserved. Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise, and aids in their interpretation. We explore three diverse comparisons: Gene Methylation vs Gene…

Combining Information-Weighted Sequence Alignment and Sound Correspondence Models for Improved Cognate Detection
The approach presented in this paper improves on this core component of cognate detection systems by a novel combination of information weighting, a technique for putting less weight on reoccurring morphological material, with sound correspondence modeling by means of pointwise mutual information.
Genomic and phenomic insights from an atlas of genetic effects on DNA methylation
Results of DNA methylation-quantitative trait loci (mQTL) analyses on 32,851 participants reveal that the genetic architecture of DNAm levels is highly polygenic and DNAm exhibits signatures of negative and positive natural selection.
Acts of God? Religiosity and Natural Disasters Across Subnational World Districts
Religiosity affects everything from fertility and health to labor force participation and productivity. But why are some societies more religious than others? To answer this question, I rely on the…
A new approach to concept basicness and stability as a window to the robustness of concept list rankings
A comparison with and among existing rankings suggests that concept rankings are highly data-dependent and therefore less well-grounded than previously assumed, and the robustness of the ranking against language pair resampling is evaluated.
A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots
An approach is implemented to assess the goodness of fit of the model using the ancestry “palettes” estimated by CHROMOPAINTER and apply it to both simulated data and real case studies, allowing a richer and more robust analysis of recent demographic history.
Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic Reconstruction in Historical Linguistics?
It is concluded that future work on phylogenetic reconstruction can profit much from automatic cognate detection, and algorithms for automatic cognate detection are a useful complement for current research on language phylogenies.
Complete mitochondrial and rDNA complex sequences of important vector species of Biomphalaria, obligatory hosts of the human-infecting blood fluke, Schistosoma mansoni
The authors' analyses reveal that the two taxa inhabiting Lake Victoria, B. sudanica and B. choanomphala, are very similar to one another relative to the similarity either shows to B. pfeifferi or B. glabrata.
European values survey
Having its origins in the 1970s and being fielded on a large scale for the first time in 1981, the European Values Study is one of the longest existing ongoing survey data collection projects. Its…