Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata


The ability to efficiently search and filter datasets depends on access to high-quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some, such as the Gene Expression Omnibus (GEO), allow users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, since…
DOI: 10.1186/s12859-017-1832-4


11 Figures and Tables