Ron Begleiter

This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) …
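The variable-order idea can be sketched with a toy back-off predictor: count next-symbol frequencies for every context up to a maximum order, then predict using the longest context seen in training. This is an illustrative simplification, not the CTW or PPM algorithms compared in the paper, and all names below are made up:

```python
from collections import defaultdict

class VMMPredictor:
    """Toy variable-order Markov predictor. For each context of length
    0..max_order, count next-symbol frequencies; at prediction time,
    back off from the longest matching context to the empty one.
    (A crude stand-in for the smoothing/mixing done by CTW or PPM.)"""

    def __init__(self, max_order=3):
        self.max_order = max_order
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, seq):
        for i in range(len(seq)):
            for k in range(self.max_order + 1):
                if i - k < 0:
                    break
                ctx = seq[i - k:i]           # the k symbols before position i
                self.counts[ctx][seq[i]] += 1

    def predict(self, history):
        # Back off: try the longest suffix of the history first.
        for k in range(min(self.max_order, len(history)), -1, -1):
            ctx = history[len(history) - k:]
            if ctx in self.counts:
                dist = self.counts[ctx]
                return max(dist, key=dist.get)
        return None

model = VMMPredictor(max_order=2)
model.train("abracadabra")
pred = model.predict("abr")  # longest matching context is "br"
```

Real VMM predictors replace the hard argmax with a smoothed probability estimate, which is where the six algorithms in the paper differ.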
The disagreement coefficient of Hanneke has become a central concept in proving active learning rates. It has been shown in various ways that a concept class with low complexity together with a bound on the disagreement coefficient at an optimal solution allows active learning rates that are superior to passive learning ones. We present a different tool for …
We consider an active learning game within a transductive learning model. A major problem with many active learning algorithms is that an unreliable current hypothesis can mislead the querying component to query “uninformative” points. In this work we propose a remedy to this problem. Our solution can be viewed as a “patch” for fixing this deficiency and …
We present worst case bounds for the learning rate of a known prediction method that is based on hierarchical applications of binary context tree weighting (CTW) predictors. A heuristic application of this approach that relies on Huffman’s alphabet decomposition is known to achieve state-of-the-art performance in prediction and lossless compression …
We consider the statistical learning setting of active learning in which the learner chooses which examples to obtain labels for. We identify a useful general purpose structural property of such learning problems, giving rise to a query-efficient iterative procedure achieving approximately optimal loss at an exponentially fast rate, where the rate is …
This paper presents a fast and scalable method for detecting threats in large-scale DNS logs. In such logs, queries about “abnormal” domain strings are often correlated with malicious behavior. With our method, a language model algorithm learns “normal” domain-names from a large dataset to rate the extent of domain-name …
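The scoring idea can be sketched with a character-bigram language model: train on “normal” names, then rate a new name by its average per-character surprise. This is a hedged toy stand-in for the approach described above, not the paper's actual algorithm, and the training list and smoothing constants are invented for illustration:

```python
import math
from collections import Counter

def train_bigram_lm(domains):
    """Character-bigram counts over 'normal' domain names,
    with ^ and $ as start/end markers."""
    bigrams, unigrams = Counter(), Counter()
    for d in domains:
        s = "^" + d + "$"
        for a, b in zip(s, s[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams

def abnormality(domain, bigrams, unigrams, alpha=1.0, vocab=40):
    """Average negative log-likelihood per character under the bigram
    model (add-alpha smoothed); higher means more 'abnormal'."""
    s = "^" + domain + "$"
    nll = 0.0
    for a, b in zip(s, s[1:]):
        p = (bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab)
        nll -= math.log(p)
    return nll / (len(s) - 1)

normal = ["google", "amazon", "wikipedia", "facebook", "youtube"]
bg, ug = train_bigram_lm(normal)
typo_score = abnormality("gogle", bg, ug)       # shares bigrams with training data
random_score = abnormality("xq7zkv9p", bg, ug)  # algorithmically-generated look
```

A name built from never-seen character transitions (as in DGA-generated domains) scores higher than a name that resembles the training distribution.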
Clustering is an unsupervised learning setting in which the goal is to partition a collection of data points into disjoint clusters. Often a bound k on the number of clusters is given or assumed by the practitioner. Many versions of this problem have been defined, most notably k-means and k-median. An underlying problem with the unsupervised …
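The k-means version mentioned above can be sketched with plain Lloyd iterations: alternate between assigning each point to its nearest center and recomputing each center as its cluster's mean. An illustrative 2-D sketch only, with made-up data:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on 2-D points: assign to nearest center,
    then move each center to the mean of its cluster."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centers[j]  # keep an empty cluster's old center
            for j, cl in enumerate(clusters)
        ]
    return centers, clusters

# Two well-separated blobs of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, k=2)
```

On well-separated data like this, Lloyd's algorithm recovers the two blobs from any initialization; in general it only finds a local optimum of the k-means objective.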
We provide a general purpose MATLAB tool for performance evaluation of supervised learning algorithms. Given any existing implementation of a classifier learning algorithm, our Generic-CV tool operates the algorithm, tunes its hyper-parameters and evaluates its performance on any given dataset. The tool is generic in the sense that it easily accommodates …
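The generic evaluation loop such a tool automates can be sketched as k-fold scoring around an arbitrary fit/predict pair. This Python sketch is illustrative only; the `fit`/`predict` interface is a hypothetical placeholder, not the Generic-CV API:

```python
def cross_val_accuracy(fit, predict, X, y, folds=5):
    """Toy k-fold evaluation: hold out every folds-th example, train
    on the rest, and average the held-out accuracy across folds."""
    n = len(X)
    scores = []
    for f in range(folds):
        test_idx = set(range(f, n, folds))
        Xtr = [x for i, x in enumerate(X) if i not in test_idx]
        ytr = [v for i, v in enumerate(y) if i not in test_idx]
        Xte = [x for i, x in enumerate(X) if i in test_idx]
        yte = [v for i, v in enumerate(y) if i in test_idx]
        model = fit(Xtr, ytr)
        preds = [predict(model, x) for x in Xte]
        scores.append(sum(p == t for p, t in zip(preds, yte)) / len(yte))
    return sum(scores) / folds

# A trivial majority-class 'learner' just to exercise the loop.
majority_fit = lambda X, y: max(set(y), key=y.count)
majority_predict = lambda model, x: model
acc = cross_val_accuracy(majority_fit, majority_predict,
                         X=list(range(10)), y=[0] * 8 + [1] * 2)
```

Because the loop only touches the learner through `fit` and `predict`, any classifier implementation can be dropped in; wrapping an inner loop of this kind over a hyper-parameter grid gives tuned, unbiased estimates.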
One of the practical obstacles to learning to rank from pairwise preference labels is its (apparent) quadratic sample complexity. Several heuristics have been tested for overcoming this obstacle. In this workshop we will present a new provable method for reducing this sample complexity, almost reaching the information-theoretic lower bound, while suffering only …
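Why sub-quadratic label complexity is plausible can be seen from a sorting-style reduction: ranking n items needs only the O(n log n) comparisons a merge sort makes, not all O(n²) pairs. This is an illustrative sketch under a noise-free pairwise oracle, not the method presented in the workshop:

```python
def rank_with_pairwise_queries(items, prefer):
    """Merge-sort ranking that counts calls to the pairwise oracle.
    'prefer(a, b)' is a hypothetical oracle returning True when a
    should be ranked before b (assumed consistent and noise-free)."""
    queries = [0]

    def merge_sort(xs):
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            queries[0] += 1            # one pairwise label queried
            if prefer(left[i], right[j]):
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        out.extend(left[i:]); out.extend(right[j:])
        return out

    return merge_sort(items), queries[0]

items = [3, 1, 4, 1, 5, 9, 2, 6]
ranking, q = rank_with_pairwise_queries(items, lambda a, b: a <= b)
# q is at most n*log2(n) queries, versus n*(n-1)/2 = 28 for all pairs.
```

The hard part, which this sketch ignores, is tolerating noisy or inconsistent preference labels while keeping the query count near the information-theoretic lower bound.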