“Influence sketching”: Finding influential samples in large-scale regressions

  • Authors: M. Wojnowicz, B. Cruz, X. Zhao, B. Wallace, M. Wolff, Jay Luan, Caleb Crable
  • Published 2016 in 2016 IEEE International Conference on Big Data (Big Data)
  • Computer Science, Mathematics
  • There is an especially strong need in modern large-scale data analysis to prioritize samples for manual inspection. For example, the inspection could target important mislabeled samples or key vulnerabilities exploitable by an adversarial attack. To solve the “needle in the haystack” problem of which samples to inspect, we develop a new scalable version of Cook's distance, a classical statistical technique for identifying samples that unusually strongly impact the fit of a regression…
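The abstract builds on Cook's distance. As a point of reference, the sketch below computes the classical, non-scalable statistic for a simple linear regression with one predictor, using the standard closed form D_i = e_i² h_i / (p s² (1 − h_i)²), where e_i is the residual of sample i, h_i its leverage, p the number of fitted parameters and s² the residual variance estimate. This equals the summed squared change in all fitted values when sample i is deleted, divided by p s², so no refitting is needed. It is not the paper's influence-sketching method: the function name cooks_distance, the example data, and the pure-Python single-predictor setting are assumptions chosen to keep the example self-contained.

```python
from collections.abc import Sequence
from statistics import fmean


def cooks_distance(x: Sequence[float], y: Sequence[float]) -> list[float]:
    """Classical Cook's distance for the simple linear regression y = b0 + b1 * x.

    Hypothetical helper: D_i = e_i**2 * h_i / (p * s2 * (1 - h_i)**2), with p = 2.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    p = 2  # fitted parameters: intercept and slope
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")

    # Ordinary least-squares fit in closed form
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

    # Unbiased estimate of the residual variance
    s2 = sum(e * e for e in residuals) / (n - p)
    if s2 == 0:
        raise ValueError("perfect fit: Cook's distance is undefined")

    # Leverage h_i (diagonal of the hat matrix); closed form for one predictor
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    if any(h >= 1.0 for h in leverage):
        raise ValueError("a sample has leverage 1: Cook's distance is undefined")

    return [e * e * h / (p * s2 * (1.0 - h) ** 2) for e, h in zip(residuals, leverage)]


if __name__ == "__main__":
    # Hypothetical data: the last sample sits far from the others in x
    xs = [1, 2, 3, 4, 5, 10]
    ys = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
    for i, d in enumerate(cooks_distance(xs, ys)):
        print(f"sample {i}: D = {d:.3f}")
```

On this example data the last sample (x = 10), which combines high leverage with a sizeable residual, gets by far the largest distance. With several predictors the leverages become the diagonal of X(XᵀX)⁻¹Xᵀ, which is the part that grows expensive as the number of samples and features increases.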
    10 Citations
    • Generative adversarial networks for increasing the veracity of big data
    • Understanding Black-box Predictions via Influence Functions
    • Scalable Explanation of Inferences on Large Graphs
    • Less is More: Culling the Training Set to Improve Robustness of Deep Neural Networks
    • Lazy Stochastic Principal Component Analysis
    • Exact and Consistent Interpretation of Piecewise Linear Models Hidden behind APIs: A Closed Form Solution
    • Towards Generic Deobfuscation of Windows API Calls

