Non-Parametric Message Importance Measure: Storage Code Design and Transmission Planning for Big Data

  • Shanyun Liu, Rui She, Pingyi Fan, Khaled Ben Letaief
  • IEEE Transactions on Communications
The storage and transmission of messages in big data are discussed in this paper, with message importance taken into account. To this end, we propose the non-parametric message importance measure (NMIM) as a measure of message importance, which can characterize the uncertainty of random events in the same way that Shannon entropy and Rényi entropy do. We prove that the NMIM sufficiently describes the two key characteristics of big data, i.e., rare-event detection and the large diversity of events. Based on… 
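As a rough illustration of how an importance measure of this family differs from Shannon entropy, the sketch below computes the parametric message importance measure (MIM) L(p, ϖ) = log Σ_i p_i e^{ϖ(1−p_i)} used in the related work listed further down; the choice ϖ = 10 is an arbitrary illustration, not a value from the paper, and the NMIM itself is defined differently (non-parametrically) in the abstract above.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mim(p, varpi=10.0):
    """Parametric message importance measure:
    L(p, varpi) = log sum_i p_i * exp(varpi * (1 - p_i)).
    A large varpi amplifies the contribution of rare events."""
    return math.log(sum(pi * math.exp(varpi * (1.0 - pi)) for pi in p))

uniform = [0.5, 0.5]      # no rare events
skewed  = [0.99, 0.01]    # contains a rare (minority) event

# Shannon entropy ranks the uniform distribution as more uncertain,
# while the MIM assigns a larger value to the distribution containing
# the rare event, reflecting its emphasis on minority subsets.
print(shannon_entropy(uniform), shannon_entropy(skewed))  # 1.0  ~0.081
print(mim(uniform), mim(skewed))                          # 5.0  ~5.30
```

This contrast is exactly the "rare events finding" property claimed in the abstract: the measure grows when probability mass sits on low-probability events, where entropy shrinks.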


Storage Space Allocation Strategy for Digital Data with Message Importance

This paper presents an optimal allocation strategy for the storage of digital data based on an exponential distortion measurement, which makes rational use of all the storage space and characterizes the trade-off between the relative weighted reconstruction error and the available storage size.

Attention to the Variation of Probabilistic Events: Information Processing with Message Importance Measure

This paper first constructs a system model with the message importance measure and proposes the message importance loss to enrich information processing strategies; the bitrate transmission constrained by the message importance loss is investigated to broaden the scope of Shannon information theory.

State Variation Mining: On Information Divergence with Message Importance in Big Data

The message importance transfer capacity based on the MITM is presented to offer an upper bound for the information transfer process with disturbance; the MITM is also extended to the continuous case, and its robustness is discussed by using it to measure information distance.

Differential Message Importance Measure: A New Approach to the Required Sampling Number in Big Data Structure Characterization

A new approach to the required sampling number is proposed, in which the DMIM deviation is constructed to characterize the process of collecting message importance, and the connection between message importance and distribution goodness-of-fit is established, verifying that analyzing data collection while taking message importance into account is feasible.

How Many Samples Required in Big Data Collection: A Differential Message Importance Measure

It is proved that the change of the DMIM can describe the gap between the distribution of a set of sample values and a theoretical distribution, and that the empirical distribution approaches the real distribution as the DMIM deviation decreases.

Recognizing Information Feature Variation: Message Importance Transfer Measure and Its Applications in Big Data

This paper presents the message importance transfer measure (MITM) and analyzes its performance and applications; it discusses the robustness of the MITM by using it to measure information distance and gives an upper bound for the information transfer process with disturbance.

Information Measure Similarity Theory: Message Importance Measure via Shannon Entropy

The message importance distortion function is presented to give an upper bound on information compression based on the message importance measure, and the bitrate transmission constrained by the message importance loss is investigated to broaden the scope of Shannon information theory.

Amplifying Inter-Message Distance: On Information Divergence Measures in Big Data

This paper defines a parametric M-I divergence from the viewpoint of information theory and presents its major properties, and designs an M-I divergence estimation algorithm by means of an ensemble estimator built from the proposed weight kernel estimators, which can improve the convergence of the mean squared error.

An Importance Aware Weighted Coding Theorem Using Message Importance Measure

This novel information-theoretic measure generalizes the average codeword length by assigning importance weights to each symbol according to users' concerns, focusing on users' selections.
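A minimal sketch of the idea in this snippet, assuming a simple normalized weighted mean (an illustrative form, not necessarily the paper's exact definition): each symbol's probability is re-weighted by a user-chosen importance weight before averaging the codeword lengths.

```python
def weighted_avg_codeword_length(p, lengths, w):
    """Importance-weighted average codeword length (illustrative):
    a weighted mean of codeword lengths l_i in which each symbol's
    probability p_i is scaled by a user importance weight w_i and
    the result is normalized by sum_j w_j * p_j."""
    z = sum(wi * pi for wi, pi in zip(w, p))
    return sum(wi * pi * li for wi, pi, li in zip(w, p, lengths)) / z

p = [0.5, 0.25, 0.25]
lengths = [1, 2, 2]              # e.g. a Huffman code for p
uniform_w = [1.0, 1.0, 1.0]      # reduces to the ordinary average length
rare_w    = [1.0, 1.0, 10.0]     # the user cares most about the last symbol

print(weighted_avg_codeword_length(p, lengths, uniform_w))  # 1.5
print(weighted_avg_codeword_length(p, lengths, rare_w))     # > 1.5
```

With uniform weights the measure collapses to the classical average codeword length, so any coding theorem stated for the weighted quantity generalizes the unweighted one.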

Importance of Small Probability Events in Big Data: Information Measures, Applications, and Challenges

This paper first surveys theories and models of importance measures and then investigates the relationship between subjective or semantic importance and rare events in big data.

Non-parametric message importance measure: Compressed storage design for big data in wireless communication systems

A non-parametric message importance measure (NMIM) is defined as a measure of message importance that can characterize the uncertainty of random events and sufficiently describes the two key characteristics of big data: rare-event detection and the large diversity of events.

Message Importance Measure and Its Application to Minority Subset Detection in Big Data

A parametric MIM is defined from the viewpoint of information theory, its properties are investigated, and a parameter selection principle is presented that provides answers to the minority subset detection problem in the statistical processing of big data.

Focusing on a probability element: Parameter selection of message importance measure in big data

This paper proposes a parameter selection method for the MIM that focuses on a probability element, presents its major properties, discusses parameter selection with prior probability, and investigates its applicability in a statistical processing model of big data for the anomaly detection problem.

Distributed Binary Detection With Lossy Data Compression

Improvement of performance in the general case is shown to be possible when the requirement of source reconstruction is relaxed, which stands in contrast to the case of general hypotheses.

Block and Sliding-Block Lossy Compression via MCMC

An approach to lossy compression of finite-alphabet sources that utilizes Markov chain Monte Carlo and simulated annealing methods, achieving optimum rate-distortion performance in the limit of a large number of iterations and sequence length when employed on any stationary ergodic source.

Preconditioned Data Sparsification for Big Data With Applications to PCA and K-Means

A compression scheme for large data sets that randomly keeps a small percentage of the components of each data sample, so that subsequent processing such as principal component analysis (PCA) or K-means is significantly faster, especially in a distributed-data setting.
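The random component-keeping idea in this snippet can be sketched as follows; the function name and keep-fraction are illustrative, not the paper's actual scheme. Keeping each entry independently with probability gamma and rescaling survivors by 1/gamma makes the sparsified matrix an unbiased surrogate for the original.

```python
import random

def sparsify(matrix, gamma=0.1, seed=0):
    """Keep each entry independently with probability gamma and rescale
    the survivors by 1/gamma, so the expectation of every entry equals
    the original value (a sparse, unbiased surrogate for downstream
    PCA / K-means style processing)."""
    rng = random.Random(seed)
    return [[x / gamma if rng.random() < gamma else 0.0 for x in row]
            for row in matrix]

data = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
sparse = sparsify(data, gamma=0.5)
# Most entries become zero; each kept entry is scaled by 1/gamma = 2.
```

The unbiasedness is what lets downstream algorithms run on the sparse matrix with controlled error while touching far fewer nonzero components.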

Joint Source–Channel Coding for Broadcasting Correlated Sources

This paper studies lossy transmission of a memoryless bivariate Gaussian source over a bandwidth-mismatched memoryless Gaussian broadcast channel with two receivers, where each receiver is interested… 

Lossy Compression for Compute-and-Forward in Limited Backhaul Uplink Multicell Processing

Two lattice-based coding schemes are proposed that can outperform standard CoF and successive Wyner-Ziv schemes in certain regimes, and are illustrated through some numerical examples.

Unequal error protection for video streaming using delay-aware fountain codes

  • Kairan Sun, D. Wu
  • Computer Science
    2017 IEEE International Conference on Communications (ICC)
  • 2017
This work proposes a method to integrate unequal error protection (UEP) into delay-aware fountain codes (DAF) to provide additional protection for important bits, and shows that the proposed system achieves higher decoding ratios and PSNR compared to equal error protection (EEP) under the same network conditions.

On the Rate-Distortion Function for Binary Source Coding With Side Information

This paper provides an in-depth analysis of the problem of lossy compression of binary sources in the presence of correlated side information, where the correlation is given by a generic binary asymmetric channel and the Hamming distance is the distortion metric, and derives for the first time the rate-distortion function for conventional predictive coding in the binary-asymmetric-correlation-channel scenario.