Preliminary work. Under review by the International Conference on Machine Learning (ICML). Do not distribute.

We developed a flexible framework for modeling the annotation and judgment processes of humans, which we called “normalized gamma construction of a confusion matrix.” This framework enabled us to model three properties: (1) the abilities of humans, (2) a confusion matrix for labeling, and (3) the difficulty with which items are correctly annotated. We also introduced the concept of “latent confusion analysis (LCA),” whose main purpose was to analyze the principal confusions behind human annotations and judgments. LCA assumes that confusion matrices are shared among persons; we call these shared matrices “latent confusions,” in tribute to the “latent topics” of topic modeling. We aim to summarize the workers’ confusion matrices with a small number of latent principal confusion matrices, because a large number of personal confusion matrices is difficult to analyze. We used LCA to analyze latent confusions regarding the effects of radioactivity on fish and shellfish following the Fukushima Daiichi nuclear disaster in 2011.

An important theme in collective intelligence is modeling the annotation and judgment processes of humans. We focus on modeling a confusion matrix for labeling. Extracting a confusion matrix is useful not only for obtaining a better aggregation of labels (closer to the ground truth) but also for obtaining diagnostic information on human annotations and judgments. Dawid and Skene (1979) proposed a probabilistic generative model for subjective labeling. Their model can estimate individual confusion matrices even when the true label is not available. Each worker in this model has a confusion matrix: if an item (e.g., an image) has true label $u$, worker $j$ can assign a label $l$ with probability $\pi_{u,l}$. Smyth et al. (1994) applied the Dawid and Skene (DS) model to the image labeling problem. Snow et al.
(2008) applied the DS model to the analysis of opinions in natural language processing. Liu and Wang (2012) applied the DS model to judge the quality of (query, URL) pairs. Whitehill et al. (2009) proposed the Generative model of Labels, Abilities, and Difficulties (GLAD), which simultaneously estimates the expertise of each worker and the difficulty of each task. Unlike the DS model, GLAD has the advantage of modeling the difficulty with which items are correctly annotated. However, it suffers from a critical issue: when GLAD is applied to a task with multiple labels, the confusion matrix of a worker cannot be constructed (see Sec. 2.2 for details).
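
To make the DS worker model summarized above concrete, the following is a minimal sketch in Python, not code from this paper or from Dawid and Skene (1979). It simulates workers who report label l for an item with true label u with probability given by row u of their confusion matrix, and then recovers per-worker confusion matrices without access to the true labels using a simple EM loop. The function names `simulate` and `dawid_skene`, the vote-fraction initialization, the smoothing constant, and the iteration count are all assumptions made for illustration.

```python
import random


def simulate(confusions, true_labels, rng):
    """Draw one label per (item, worker) pair.

    confusions[j][u][l] is the probability that worker j reports label l
    for an item whose true label is u (each row sums to 1).
    """
    k = len(confusions[0])
    return {
        (i, j): rng.choices(range(k), weights=pi[u])[0]
        for i, u in enumerate(true_labels)
        for j, pi in enumerate(confusions)
    }


def dawid_skene(labels, n_items, n_workers, k, n_iter=20, smooth=0.01):
    """Estimate per-worker confusion matrices and item posteriors by EM.

    labels maps (item, worker) -> observed label. Every item is assumed
    to have at least one label. `smooth` is an illustrative pseudo-count.
    """
    # Initialise item posteriors from vote fractions (soft majority vote).
    post = [[0.0] * k for _ in range(n_items)]
    for (i, _j), l in labels.items():
        post[i][l] += 1.0
    post = [[v / sum(row) for v in row] for row in post]

    for _ in range(n_iter):
        # M-step: class prior and smoothed, row-normalised confusion matrices.
        prior = [sum(row[u] for row in post) / n_items for u in range(k)]
        pi = [[[smooth] * k for _ in range(k)] for _ in range(n_workers)]
        for (i, j), l in labels.items():
            for u in range(k):
                pi[j][u][l] += post[i][u]
        pi = [[[v / sum(row) for v in row] for row in mat] for mat in pi]

        # E-step: posterior over each item's true label given all its labels.
        new_post = [list(prior) for _ in range(n_items)]
        for (i, j), l in labels.items():
            for u in range(k):
                new_post[i][u] *= pi[j][u][l]
        post = [[v / sum(row) for v in row] for row in new_post]

    return pi, post


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical worker: correct 80% of the time, errors spread evenly.
    pi_true = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    truth = [rng.randrange(3) for _ in range(200)]
    observed = simulate([pi_true] * 5, truth, rng)
    pi_hat, _ = dawid_skene(observed, n_items=200, n_workers=5, k=3)
    for row in pi_hat[0]:
        print([round(v, 2) for v in row])
```

The sketch multiplies raw probabilities in the E-step, which is adequate for a handful of workers per item; with many labels per item, working in log space would be the safer choice.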