In this paper, we propose a new clustering algorithm called <i>Fast Genetic K-means Algorithm (FGKA)</i>. FGKA is inspired by the Genetic K-means Algorithm (GKA) proposed by Krishna and Murty in 1999 but features several improvements over GKA. Our experiments indicate that, while the K-means algorithm may converge to a local optimum, both FGKA and GKA always…
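The local-optimum behavior of K-means mentioned above can be sketched with plain Lloyd iterations. This is standard K-means, not FGKA or GKA. The four-point dataset and the function names are hypothetical, chosen only to show that the starting centroids decide which optimum is reached.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: Sequence[Point], init: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], float]:
    """Lloyd's K-means from given starting centroids; returns (centroids, SSE)."""
    centroids = [tuple(map(float, c)) for c in init]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    # Sum of squared errors, measured against each point's nearest centroid.
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse
```

With `points = [(0, 0), (0, 1), (4, 0), (4, 1)]`, starting from `[(2, 0), (2, 1)]` leaves K-means stuck at an SSE of 16.0, while starting from `[(0, 0), (4, 0)]` reaches the better SSE of 1.0.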
In this paper, we present a novel graph-theoretic approach to the problem of document-word co-clustering. In our approach, documents and words are modeled as the two vertex sets of a bipartite graph. We then propose the Isoperimetric Co-clustering Algorithm (ICA), a new method for partitioning the document-word bipartite graph. ICA requires a simple solution to a…
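A minimal sketch of the bipartite modeling, assuming a generic isoperimetric graph partitioning step in the style of Grady and Schwartz rather than the authors' ICA: documents and words become the two vertex sets, a high-degree "ground" vertex is fixed, the reduced Laplacian system L0 x = d0 is solved, and x is thresholded. The function names, the median threshold (the usual approach searches for the threshold with the best isoperimetric ratio), the pure-Python solver, and the requirement that the graph be connected are all assumptions of this sketch.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(m: Matrix, b: List[float]) -> List[float]:
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [list(row) + [b[i]] for i, row in enumerate(m)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def isoperimetric_cocluster(a: Matrix) -> Tuple[List[int], List[int]]:
    """Split a (connected) document-word matrix into two co-clusters."""
    m, n = len(a), len(a[0])
    size = m + n
    # Bipartite adjacency: documents are vertices 0..m-1, words are m..m+n-1.
    w = [[0.0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            w[i][m + j] = w[m + j][i] = float(a[i][j])
    deg = [sum(row) for row in w]
    # Graph Laplacian L = D - W.
    lap = [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(size)]
           for i in range(size)]
    # Ground the highest-degree vertex (x = 0 there) and drop its row/column.
    ground = max(range(size), key=lambda i: deg[i])
    keep = [i for i in range(size) if i != ground]
    x0 = solve([[lap[i][j] for j in keep] for i in keep], [deg[i] for i in keep])
    x = [0.0] * size
    for idx, node in enumerate(keep):
        x[node] = x0[idx]
    # Simplified median threshold: the lower-potential half joins the ground side.
    side = [1] * size
    for node in sorted(range(size), key=lambda i: x[i])[: size // 2]:
        side[node] = 0
    return side[:m], side[m:]
```

On a hypothetical matrix with two dense blocks joined by one weak entry, `[[5, 5, 0, 0], [5, 5, 1, 0], [0, 0, 5, 5], [0, 0, 5, 5]]`, the sketch returns `([0, 0, 1, 1], [0, 0, 1, 1])`, so documents and words of each block land in the same co-cluster.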
Discovery of the protein interactions that take place within a cell can provide a starting point for understanding biological regulatory pathways. Global interaction patterns among proteins, for example, can suggest new drug targets and aid the design of new drugs by providing a clearer picture of the biological pathways in the neighborhoods of the drug…
Storing and querying XML documents using an RDBMS is challenging because one must resolve the conflict between the hierarchical, ordered nature of the XML data model and the flat, unordered nature of the relational data model. This conflict can be resolved by the following XML-to-relational mappings: schema mapping, data mapping and query…
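To make the three mappings concrete, here is a minimal sketch using a generic "edge"-style table with one row per element, its parent, and its position among siblings. The table layout and function names are hypothetical and are not the mapping proposed in the paper. Pre-order ids and the `ordinal` column keep the order and hierarchy that flat relations otherwise lose, and a child-only path is translated into SQL self-joins.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[int], int, str, Optional[str]]

# Schema mapping (hypothetical): a single table for all elements.
SCHEMA = ("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
          "ordinal INTEGER, tag TEXT, text TEXT)")


def shred(xml_text: str) -> List[Row]:
    """Data mapping: one row (id, parent_id, ordinal, tag, text) per element.

    Attributes and tail text are ignored in this sketch.
    """
    rows: List[Row] = []

    def visit(elem: ET.Element, parent: Optional[int], ordinal: int) -> None:
        node_id = len(rows) + 1  # pre-order numbering preserves document order
        text = elem.text.strip() if elem.text and elem.text.strip() else None
        rows.append((node_id, parent, ordinal, elem.tag, text))
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)
    return rows


def path_to_sql(steps: List[str]) -> str:
    """Query mapping: translate a child-only path such as /a/b/c into self-joins."""
    joins = " ".join(f"JOIN node n{i} ON n{i}.parent_id = n{i - 1}.id"
                     for i in range(1, len(steps)))
    conds = " AND ".join(f"n{i}.tag = ?" for i in range(len(steps)))
    last = len(steps) - 1
    return (f"SELECT n{last}.text FROM node n0 {joins} "
            f"WHERE n0.parent_id IS NULL AND {conds} ORDER BY n{last}.id")


def query(xml_text: str, path: str) -> List[Optional[str]]:
    """Shred a document into an in-memory SQLite table and evaluate a path."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", shred(xml_text))
    steps = [s for s in path.split("/") if s]
    result = [r[0] for r in conn.execute(path_to_sql(steps), steps)]
    conn.close()
    return result
```

For `<library><book><title>A</title></book><book><title>B</title></book></library>`, `query(doc, "/library/book/title")` returns `["A", "B"]` in document order.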
Keywords: Provenance; Scientific workflow; Metadata management; OPM; OPM-compliant provenance storage; RDBMS. Abstract: Provenance, the metadata that records the derivation history of scientific results, is essential in scientific workflows to support the reproducibility of scientific discovery, result interpretation, and problem diagnosis. To promote and…
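As an illustration of OPM-style provenance stored in an RDBMS, the sketch below keeps two OPM edge types, `used` (a process used an artifact) and `wasGeneratedBy` (an artifact was generated by a process), in hypothetical SQLite tables and retrieves an artifact's full derivation history with a recursive query. The table names, column names, and the `lineage` helper are assumptions; this is not the storage schema from the paper.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical tables for two OPM edge types.
SCHEMA = """
CREATE TABLE used (process TEXT, artifact TEXT);              -- process used artifact
CREATE TABLE was_generated_by (artifact TEXT, process TEXT);  -- artifact generated by process
"""

# Walk backwards: artifact <- generating process <- artifacts that process used.
LINEAGE_SQL = """
WITH RECURSIVE lineage(artifact) AS (
    SELECT u.artifact
    FROM was_generated_by g JOIN used u ON u.process = g.process
    WHERE g.artifact = ?
    UNION
    SELECT u.artifact
    FROM lineage l
    JOIN was_generated_by g ON g.artifact = l.artifact
    JOIN used u ON u.process = g.process
)
SELECT artifact FROM lineage ORDER BY artifact
"""


def lineage(used: List[Tuple[str, str]], generated: List[Tuple[str, str]],
            artifact: str) -> List[str]:
    """Return every artifact the given artifact was (transitively) derived from."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO used VALUES (?, ?)", used)
    conn.executemany("INSERT INTO was_generated_by VALUES (?, ?)", generated)
    rows = conn.execute(LINEAGE_SQL, (artifact,)).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

For a hypothetical workflow where process `clean` turns `raw.csv` into `clean.csv` and process `fit` uses `clean.csv` and `params.json` to produce `model.bin`, the lineage of `model.bin` is `["clean.csv", "params.json", "raw.csv"]`. `UNION` (rather than `UNION ALL`) removes duplicates and stops the recursion once no new ancestors appear.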
The development of the Semantic Web, the next-generation Web, relies heavily on the availability of ontologies and powerful annotation tools. However, there is a lack of ontology-based annotation tools for linguistic multimedia data. Existing tools either lack ontology support or provide limited support for multimedia. To fill this gap, we present an…
In this paper, we introduce three microdata disclosure risk measures (minimal, maximal and weighted) for the sampling disclosure control method. The minimal disclosure risk measure represents the percentage of records that can be correctly identified by an intruder based on prior knowledge of key attribute values. The maximal disclosure risk measure considers…
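A rough sketch of how a minimal-risk style measure might be computed for sampling, under loudly stated assumptions that are not taken from the paper: a released record counts as correctly identified when its combination of key attribute values occurs exactly once in the original microdata (so an intruder who knows those values gets a single, correct match), and the result is a fraction over all original records rather than a percentage. The function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def key_of(record: Record, keys: Sequence[str]) -> Tuple[object, ...]:
    """Project a record onto the key attributes the intruder is assumed to know."""
    return tuple(record[k] for k in keys)


def minimal_disclosure_risk(original: List[Record], released: List[Record],
                            keys: Sequence[str]) -> float:
    """Hypothetical minimal risk: fraction of original records that are
    released AND unique on the key attributes in the original microdata."""
    counts = Counter(key_of(r, keys) for r in original)
    identified = sum(1 for r in released if counts[key_of(r, keys)] == 1)
    return identified / len(original)
```

With five original records keyed on `age` and `zip`, where two share `(30, "A")` and the other three are unique, releasing a sample of one shared record and two unique ones gives `2 / 5 = 0.4` under these assumptions.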
In this paper, we introduce a general framework for microdata and three disclosure risk measures (minimal, maximal and weighted). We classify the attributes of a given microdata set in two ways: by their potential identification utility and by the order relation that exists in their domain of values. We define inversion and change factors…
Hierarchical multi-label classification is a variant of traditional classification in which instances can belong to several labels, which are in turn organized in a hierarchy. Existing hierarchical multi-label classification algorithms ignore possible correlations between the labels. Moreover, most current methods predict instance labels in a "…
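The hierarchy constraint in this setting can be sketched as follows, assuming the usual formulation in which an instance that carries a label also carries all of that label's ancestors. The helper names and the toy hierarchy are hypothetical. This post-processing step only enforces consistency; it is not the method from the abstract and does not model label correlations.

```python
from typing import Dict, Optional, Set

# Hypothetical label hierarchy: each label maps to its parent (None for roots).
Hierarchy = Dict[str, Optional[str]]


def ancestors(label: str, parent: Hierarchy) -> Set[str]:
    """Collect every ancestor of `label` by following parent links to a root."""
    found: Set[str] = set()
    node = parent.get(label)
    while node is not None:
        found.add(node)
        node = parent.get(node)
    return found


def close_under_hierarchy(labels: Set[str], parent: Hierarchy) -> Set[str]:
    """Add all ancestors so that the predicted label set respects the hierarchy."""
    closed = set(labels)
    for label in labels:
        closed |= ancestors(label, parent)
    return closed


def is_consistent(labels: Set[str], parent: Hierarchy) -> bool:
    """True if the parent of every predicted non-root label is also predicted."""
    return all(parent.get(lab) is None or parent[lab] in labels for lab in labels)
```

With `science -> biology -> genetics` and a separate root `arts`, the prediction `{"genetics", "arts"}` is inconsistent on its own and closes to `{"genetics", "biology", "science", "arts"}`.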