Solving the Order-Preserving Submatrix Problem via Integer Programming

@article{Trapp2010SolvingTO,
  title={Solving the Order-Preserving Submatrix Problem via Integer Programming},
  author={Andrew C. Trapp and Oleg A. Prokopyev},
  journal={INFORMS J. Comput.},
  year={2010},
  volume={22},
  pages={387--400}
}
In this paper we consider the order-preserving submatrix (OPSM) problem. This problem is known to be NP-hard. Although in recent years some heuristic methods have been presented to find OPSMs, they lack the guarantee of optimality. We present exact solution approaches based on linear mixed 0–1 programming formulations and develop algorithmic enhancements to aid in solvability. Encouraging computational results are reported both for synthetic and real biological data. In addition, we discuss…
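To make the object of the abstract concrete, here is a minimal sketch of the standard OPSM condition: a set of rows and columns forms an OPSM if some single ordering of the selected columns makes every selected row strictly increasing. The function name `is_order_preserving` and the toy matrix are assumptions made for illustration; this only verifies a candidate submatrix and is not the integer programming approach of the paper.

```python
from typing import Sequence


def is_order_preserving(
    matrix: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
) -> bool:
    """Check whether the submatrix given by rows x cols is order-preserving.

    Hypothetical helper: returns True if one permutation of `cols` makes
    every selected row strictly increasing.
    """
    if not rows or not cols:
        return True  # an empty selection is trivially order-preserving
    # The only candidate ordering is the one induced by the first selected row.
    first = matrix[rows[0]]
    order = sorted(cols, key=lambda c: first[c])
    # Every selected row (including the first, to rule out ties) must be
    # strictly increasing along that ordering.
    return all(
        all(matrix[r][a] < matrix[r][b] for a, b in zip(order, order[1:]))
        for r in rows
    )


# Toy data, invented for illustration.
data = [
    [1, 5, 3],
    [2, 9, 4],
    [7, 1, 8],
]
print(is_order_preserving(data, [0, 1], [0, 1, 2]))
print(is_order_preserving(data, [0, 1, 2], [0, 1, 2]))
print(is_order_preserving(data, [0, 1, 2], [0, 2]))
```

Verifying a given submatrix is easy; the hardness mentioned in the abstract comes from choosing the rows and columns so that the submatrix is as large as possible.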

A Fixed Parameter Tractable Integer Program for Finding the Maximum Order Preserving Submatrix

This paper proposes a novel exact algorithm for finding maximum order-preserving submatrices. The algorithm is fixed-parameter tractable with respect to the number of columns of the provided gene expression data, and it exhibits better guarantees and better runtime performance than state-of-the-art exact algorithms.

An apriori-based algorithm for mining semi-order-preserving submatrix

Order-preserving submatrices (OPSMs) capture objects that exhibit a coherent pattern with the same linear ordering in a subspace. In general, this problem can be reduced to a special case of the

On solving selected nonlinear integer programming problems in data mining, computational biology, and sustainability

This thesis consists of three essays concerning the use of optimization techniques to solve four problems in the fields of data mining, computational biology, and sustainable energy devices, and demonstrates that each problem can be modeled as a nonlinear (mixed) integer program.

Recovering all generalized order-preserving submatrices: new exact formulations and algorithms

Two exact mathematical programming formulations are provided that generalize the OPSM formulation by allowing for the reverse linear ordering, known as the generalized OPSM (GOPSM) pattern, along with two novel algorithms that recover, for any given level of significance, all GOPSMs from a given data matrix by iteratively solving mathematical programming formulations to global optimality.

Towards Order-Preserving SubMatrix Search and Indexing

This paper investigates the issues of indexing the two datasets above, presents a naive solution, pfTree, based on a prefix tree, and gives an optimized indexing method, pIndex, which employs row and column header tables to traverse related branches in a bottom-up manner.

Mining order-preserving submatrices from probabilistic matrices

This article defines new probabilistic matrix representations to model uncertain data with continuous distributions and uses two biological datasets to illustrate that the POPSM model better captures the characteristics of the expression levels of biologically correlated genes and greatly promotes the discovery of patterns with high biological significance.

A new approach for the deep order preserving submatrix problem based on sequential pattern mining

This paper proposes a new exact algorithm capable of mining all the deep OPSMs over a small support, and shows that it performs better than traditional sequential pattern mining algorithms.

A common-subsequence-based approach for mining deep order preserving submatrix

Yun Xue, TieChen Li, Xiaohui Hu · Computer Science · 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD) · 2014
A new exact algorithm is proposed that obtains all the deep OPSMs by finding the common subsequences shared by every two rows. It is suitable for the full mining of deep OPSMs with a small support, and can even find all the deep OPSMs with a minimum support threshold of 2.
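The common-subsequence idea can be sketched for a single pair of rows: rank the columns of each row by value, then take the longest common subsequence of the two rankings; the columns in that subsequence are ordered identically in both rows. The helper names `column_order` and `longest_common_subsequence` and the toy rows are assumptions for illustration, and rows are assumed to contain distinct values. This is a textbook dynamic program, not the algorithm of the cited paper.

```python
from typing import List, Sequence


def column_order(row: Sequence[float]) -> List[int]:
    """Column indices of `row`, sorted by increasing value (hypothetical helper)."""
    return sorted(range(len(row)), key=lambda c: row[c])


def longest_common_subsequence(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Return one longest common subsequence of a and b via dynamic programming."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to rebuild one optimal subsequence.
    out: List[int] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


# Toy rows, invented for illustration.
row_a = [1, 5, 3, 8]
row_b = [2, 1, 6, 9]
shared = longest_common_subsequence(column_order(row_a), column_order(row_b))
print(shared)
```

Both rows are strictly increasing along the returned column sequence, so the two rows together with those columns form an OPSM with support 2.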

References

Quadratic Binary Programming Models in Computational Biology

Using test problems from scientific databases, the question, “Can a general-purpose solver obtain good answers in reasonable time?” is addressed, and the latest heuristics as incumbent solutions are used to address the question.

A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems

This paper presents RLT-Based Global Optimization Algorithms for Nonconvex Polynomial Programming Problems and a Reformulation-Convexification Technique for Polynomial Programs: Design and Implementation, and some special applications to Discrete and Continuous Nonconvex Programs.

On the facial structure of set packing polyhedra

This paper shows that the cliques of the intersection graph provide a first set of facets for the polyhedron in question, and that the odd-length cycles without chords of the intersection graph give rise to a further set of facets.

Finding checkerboard patterns via fractional 0–1 programming

A new mathematical programming formulation for unsupervised biclustering is provided, which involves the solution of a fractional 0–1 programming problem; a linear mixed 0–1 reformulation and two heuristic-based approaches are also developed.

Computational Comparison Studies of Quadratic Assignment Like Formulations for the In Silico Sequence Selection Problem in De Novo Protein Design

The current best O(n²) formulation, which is the original formulation from Klepeis et al. (2003, 2004) plus DEE-type preprocessing, is proposed for in silico sequence search and is able to reduce the required CPU time by 67%.

Parameterized Complexity

An approach to complexity theory which offers a means of analysing algorithms in terms of their tractability, and introduces readers to new classes of algorithms which may be analysed more precisely than was the case until now.

Facets for node packing

Discovering local structure in gene expression data: the order-preserving submatrix problem

A probabilistic model in which an OPSM is hidden within an otherwise random matrix is defined and an efficient algorithm is developed for finding the hidden OPSM in the random matrix.

Novel Approaches for Analyzing Biological Networks

The maximum n-clique and maximum n-club problems on an arbitrary graph are introduced and their recognition versions are shown to be NP-complete.

Network flows - theory, algorithms and applications

In-depth, self-contained treatments of shortest path, maximum flow, and minimum cost flow problems, including descriptions of polynomial-time algorithms for these core models are presented.