OLAP textual aggregation approach using the Google similarity distance

@article{Bouakkaz2016OLAPTA,
  title={OLAP textual aggregation approach using the Google similarity distance},
  author={Mustapha Bouakkaz and Sabine Loudcher and Youcef Ouinten},
  journal={Int. J. Bus. Intell. Data Min.},
  year={2016},
  volume={11},
  pages={31-48}
}
Data warehousing and online analytical processing (OLAP) are essential elements of decision support. In the case of textual data, decision support requires new tools, mainly textual aggregation functions, for better and faster high-level analysis and decision making. Such tools will provide textual measures to users who wish to analyse documents online. In this paper, we propose a new aggregation function for textual data in an OLAP context based on the K-means method. This approach will… 
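The abstract names K-means and the title names the Google similarity distance, but the truncated text does not say how the two are combined. As a minimal sketch of the distance alone, the following computes the standard normalized Google distance between two keywords. It assumes document counts from a small local corpus stand in for search-engine hit counts; the function names and the toy corpus are hypothetical and are not taken from the paper.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google distance from document counts.

    fx, fy: number of documents containing each term
    fxy:    number of documents containing both terms
    n:      total number of documents (stand-in for the index size)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never co-occur are treated as infinitely distant.
        return float("inf")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    if denominator == 0:
        # Both terms occur in every document, so they are indistinguishable.
        return 0.0
    return (max(log_fx, log_fy) - log_fxy) / denominator


def keyword_ngd(docs, x, y):
    """Distance between keywords x and y over a list of keyword sets."""
    fx = sum(1 for d in docs if x in d)
    fy = sum(1 for d in docs if y in d)
    fxy = sum(1 for d in docs if x in d and y in d)
    return normalized_google_distance(fx, fy, fxy, len(docs))


# Hypothetical toy corpus: each document is reduced to its keyword set.
docs = [
    {"olap", "cube"},
    {"olap", "cube"},
    {"olap", "text"},
    {"mining", "text"},
]

print(keyword_ngd(docs, "olap", "cube"))
print(keyword_ngd(docs, "olap", "mining"))
```

Smaller values indicate keywords that tend to appear in the same documents. A pairwise distance of this kind could feed a clustering step, though whether the paper uses it that way is not stated in the text above.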
Textual aggregation approaches in OLAP context: A survey
Efficiently mining frequent itemsets applied for textual aggregation
TLDR
This work proposes a novel aggregation function for textual data based on the discovery of frequent closed patterns in a generated documents/keywords matrix that largely outperforms four state-of-the-art textual aggregation methods in terms of recall, precision, F-measure and runtime.
Textual data Benchmark for Distributed Systems
TLDR
This work proposes a generic document-oriented benchmark for storing textual data and constructing weighting schemes (TextBenDS), which offers a generic data model designed with a multidimensional approach for storing text documents and evaluates the computing performance of the queries on several distributed environments set within the Apache Hadoop ecosystem.
TextBenDS: a Generic Textual Data Benchmark for Distributed Systems
TLDR
A generic document-oriented benchmark for storing textual data and constructing weighting schemes, designed with a multidimensional approach for storing text documents, is presented, and the computing performance of the queries is evaluated on several distributed environments set within the Apache Hadoop ecosystem.
SURVEY ON AUTOMATIC DETECTION OF SENSITIVE ATTRIBUTE IN PRIVACY PRESERVED HADOOP ENVIRONMENT USING DATA MINING TECHNIQUES
TLDR
This study establishes a Privacy Preserved Hadoop Environment (PPHE) that automatically detects sensitive attributes using data mining techniques and pursues the data suppression technique to compress private tweets.
Semantics-based sensitive topic diffusion detection framework towards privacy aware online social networks
TLDR
This paper presents a three-fold sanitization framework that precisely detects sensitive topics semantically, using a statistical topic model scheme that incorporates standard knowledge bases for tagging the sensitive topics discovered.
Technology in its context - a literature review of the macro and micro levels of business intelligence
  • T. Fischer
  • Business
    Int. J. Bus. Intell. Data Min.
  • 2018
TLDR
The literature review contributes to the characterisation and theorisation of BI and shows that a company depends on both characteristics and the purpose for which BI is used, which implies that BI is in a phase of maturity.
OLAP Textual com Múltiplas Hierarquias de Tópicos e Rankings Segmentados
TLDR
This article presents an approach to textual OLAP, called DTCubing, that builds multiple topic hierarchies for each cube cell and aims to contribute to the presentation of multidimensional query results.
Adomian decomposition method for solving the population dynamics model of two species
Adomian decomposition method has been a powerful method to solve differential equations. In this paper, we propose the method to solve the population dynamics model of two species for mutualism,
Technology in its context - a literature review of the macro and micro levels of business intelligence
TLDR
The literature review shows that BI is used as a monolithic concept and static tool with technical control mechanisms and shows that a company depends on both characteristics and the purpose for...

References

Automatic textual aggregation approach of scientific articles in OLAP context
TLDR
This paper presents a new aggregation function for textual data based on the affinity between keywords and uses the search of cycles in a graph to find the aggregated keywords.
CXT-cube: contextual text cube model and aggregation operator for text OLAP
TLDR
A contextual text cube model denoted CXT-Cube is proposed, which considers several contextual factors during the OLAP analysis in order to better account for the contextual information associated with textual data.
Olap aggregation function for textual data warehouse
TLDR
A new aggregation function for keywords is presented allowing the aggregation of textual data in OLAP environments as traditional arithmetic functions would do on numeric data.
Towards a Data Warehouse Contextualized with Web Opinions
TLDR
The contextualized warehouse infrastructure is proposed to be extended with new opinion retrieval techniques conceived to classify and search for opinions in document collections with these characteristics.
Text Cube: Computing IR Measures for Multidimensional Text Database Analysis
TLDR
This paper proposes a text-cube model on a multidimensional text database, conducts systematic studies on efficient text-cube implementation, OLAP execution and query processing, and shows the high promise of the methods.
Top_Keyword: An Aggregation Function for Textual Document OLAP
TLDR
A new aggregation function for textual data in an OLAP environment is presented; it represents a set of documents by their most significant terms using a weighting function from information retrieval: tf.idf.
The concept of document warehousing for multi-dimensional modeling of textual-based business intelligence
Topic Cube: Topic Modeling for OLAP on Multidimensional Text Databases
TLDR
A new data model called topic cube is proposed to combine OLAP with probabilistic topic modeling and enable OLAP on the dimension of text data in a multidimensional text database, and a heuristic method to speed up the iterative EM algorithm for estimating topic models is also proposed.
R-Cubes: OLAP Cubes Contextualized with Documents
TLDR
The proposed architecture for the integration of a corporate warehouse of structured data with a warehouse of text-rich XML documents is called a contextualized warehouse, and a prototype R-cube system is presented along with an explanation of how to use it.
Contextualizing data warehouses with documents