Head/Tail Breaks: A New Classification Scheme for Data with a Heavy-Tailed Distribution

  • Bin Jiang
  • Published 13 September 2012
  • Environmental Science
  • The Professional Geographer, pp. 482-494
This article introduces a new classification scheme, head/tail breaks, to find groupings or hierarchy for data with a heavy-tailed distribution. Heavy-tailed distributions are heavily right skewed, with a minority of large values in the head and a majority of small values in the tail, and are commonly characterized by a power law, a lognormal, or an exponential function. For example, a country's population is often distributed in such a heavy-tailed manner, with a minority of people (e.g., 20 percent…
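The head/tail split described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's reference implementation: the function name `head_tail_breaks`, the parameter `head_limit`, and its default of 0.4 (the share above which the head no longer counts as a minority) are choices made here for the example. The data are split at the arithmetic mean, the values above the mean form the head, and the split is repeated on the head for as long as the head remains a minority.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Return the class breaks found by repeatedly splitting data at its mean.

    head_limit is an assumed cutoff: splitting stops once the head
    (values above the mean) holds more than this share of the data.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        m = sum(data) / len(data)          # arithmetic mean of the current data
        head = [v for v in data if v > m]  # the "head": values above the mean
        # Stop if nothing lies above the mean or the head is not a minority.
        if not head or len(head) / len(data) > head_limit:
            break
        breaks.append(m)
        data = head                        # repeat the split on the head only
    return breaks


# Hypothetical right-skewed sample: many small values, a few large ones.
sample = [1] * 10 + [20, 20, 20] + [200]
breaks = head_tail_breaks(sample)
num_classes = len(breaks) + 1
```

For the hypothetical sample, the first split leaves four values in the head and the second leaves one, so two breaks and three classes result. Evenly spread data such as `[1, 2, 3, 4]` produce no breaks, because half of the values lie above the mean.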

Head/tail Breaks for Visualization of City Structure and Dynamics

Scaling of Geographic Space as a Universal Rule for Map Generalization

Map generalization is a process of producing maps at different levels of detail by retaining essential properties of the underlying geographic space. In this article, we explore how the map

Chapter 13: Head/tail Breaks for Visualization of City Structure and Dynamics

The things surrounding us vary dramatically, which implies that there are far more small things than large ones, e.g., far more small cities than large ones in the world. This dramatic variation is

A Head/Tail Breaks-Based Method for Efficiently Estimating the Absolute Boltzmann Entropy of Numerical Raster Data

The condition of head/tail breaks was relaxed and used to classify data with a heavy-tailed distribution. The average of the data values in a given class was taken as its representative value and substituted into a linear function to obtain the full expression of the relationship between classification level and Boltzmann entropy.

Characterizing the Heterogeneity of the OpenStreetMap Data and Community

The heterogeneity of the entire OSM database and historical archive is characterized in the context of big data, finding that there are far more small elements than large ones, far more inactive users than active ones, and far more lightly edited elements than heavily edited ones.

A multi-scale representation model of polyline based on head/tail breaks

A model that quantifies the multiscale representation of a polyline is introduced. It builds on iterative head/tail breaks, Shannon's information theory, and the radical law, and is applied to multiscale polyline representation by quantifying the scale of each simplified polyline.

A Comparison Study on Natural and Head/tail Breaks Involving Digital Elevation Models

The most widely used classification method for statistical mapping is Jenks’s natural breaks. However, it has been found that natural breaks is not good at classifying data which have scaling

Defining Least Community as a Homogeneous Group in Complex Networks

Equal Area Breaks: A Classification Scheme for Data to Obtain an Evenly-colored Choropleth Map

An efficient algorithm is introduced for computing the choropleth map classification scheme known as equal area breaks or geographical quantiles. It is compared with the quantiles and Jenks natural breaks algorithms, and a user study finds it superior from a visual standpoint.

Wholeness as a hierarchical graph to capture the nature of space

This paper defines wholeness as a hierarchical graph, in which individual centers are represented as the nodes and their relationships as the directed links, and suggests that the hierarchical levels, or the ht-index of the PR scores induced by the head/tail breaks, can characterize the degree of wholeness for the whole.



Power-Law Distributions in Empirical Data

This work proposes a principled statistical framework for discerning and quantifying power-law behavior in empirical data by combining maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios.

The selection of class intervals

The selection of class intervals, which can strongly affect the visual impression given by a map, is currently a totally anarchic branch of cartography. While practising cartographers have barely


A general, objective method is presented for the calculation of class intervals for statistical maps. The arithmetic mean divides a numerical array into two classes and the means of each of

A Universal Rule for the Distribution of Sizes

Human artifacts, ranging from small objects all the way up to large buildings and cities, display a variety and range of subdivisions. Repeating structural and design elements of the same size will

Evaluation of Methods for Classifying Epidemiological Data on Choropleth Maps in Series

Our research goal was to determine which choropleth classification methods are most suitable for epidemiological rate maps. We compared seven methods using responses by fifty-six subjects in a

Scaling of geographic space from the perspective of city and field blocks and using volunteered geographic information

An analogy is drawn between a country (or a city, or geographic space in general) and a complex organism such as the human body or the human brain to further elaborate on the power of this block perspective in reflecting the structure or patterns of geographic space.

Street hierarchies: a minority of streets account for a majority of traffic flow

  • B. Jiang
  • Computer Science
  • Int. J. Geogr. Inf. Sci.
  • 2009
This study provides new evidence as to how a city is (self‐)organized, contributing to the understanding of cities and their evolution using increasingly available mobility geographic information.

Population-Density Maps of the United States: Techniques and Patterns

The construction of an isarithmic map of population density involves a number of problems. In the solution of these problems several techniques were applied, the most important of which was the use

On Grouping for Maximum Homogeneity

Given a set of arbitrary numbers, what is a practical procedure for grouping them so that the variance within groups is minimized? An answer to this question, including a description of an

Self-organized natural roads for predicting traffic flow: a sensitivity study

A tipping point was found from segment-based to road-based network topology in terms of the correlation between ranking metrics and their traffic. Surprisingly, this correlation improves significantly if a selfish rather than utopian strategy is adopted in forming the self-organized natural roads.