Categorical Data Analysis
A massive amount of conditional independence tests on data must be performed in the problem of learning the structure of probabilistic graphical models when using the independence-based approach. An intermediate step in the computation of independence tests is the construction of contingency tables from the data. In this work we present an intelligent cache of contingency tables that allows the tables stored to be reused not only for the same test, in the not uncommon case that the test must be performed again, but for an exponential number of other tests, all those involving a subset of the variables of the test stored. In practice, however, not so many tests actually reuse the tables stored. We show results when testing the cache with IBMAP-HC, a recently proposed algorithm for learning the structure of Markov networks, a.k.a. undirected graphical models. The experiments show that in all cases, above 95% of the running time spent by IBMAP-HC in reading data is saved by the cache. The savings in running time for IBMAP-HC were up to 80% for datasets above 40,000 datapoints.