# The Softmax Nonlinearity: Derivation Using Statistical Mechanics and Useful Properties as a Multiterminal Analog Circuit Element

@inproceedings{Elfadel1993TheSN, title={The Softmax Nonlinearity: Derivation Using Statistical Mechanics and Useful Properties as a Multiterminal Analog Circuit Element}, author={Ibrahim Abe M. Elfadel and John L. Wyatt}, booktitle={NIPS}, year={1993} }

We use mean-field theory methods from Statistical Mechanics to derive the "softmax" nonlinearity from the discontinuous winner-take-all (WTA) mapping. We give two simple ways of implementing "softmax" as a multiterminal network element. One of these has a number of important network-theoretic properties. It is a reciprocal, passive, incrementally passive, nonlinear, resistive multiterminal element with a content function having the form of information-theoretic entropy. These properties should…
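The relationship the abstract describes between softmax and the winner-take-all mapping can be illustrated numerically. The sketch below is not the authors' derivation or circuit: it uses the standard temperature-scaled softmax definition, and the helper names `softmax` and `winner_take_all` and the `temperature` parameter are assumptions introduced here for illustration. It shows the softmax output approaching the one-hot WTA output as the temperature is lowered, and approaching a uniform output as it is raised.

```python
import math


def softmax(inputs, temperature=1.0):
    """Temperature-scaled softmax (hypothetical helper, standard definition)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in inputs]
    peak = max(scaled)  # subtract the maximum so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def winner_take_all(inputs):
    """Discontinuous WTA: 1.0 at the first largest input, 0.0 elsewhere."""
    winner = max(range(len(inputs)), key=lambda i: inputs[i])
    return [1.0 if i == winner else 0.0 for i in range(len(inputs))]


if __name__ == "__main__":
    u = [1.0, 2.0, 3.0]  # illustrative inputs, not taken from the paper
    for t in (10.0, 1.0, 0.1, 0.01):
        print(f"T={t}:", [round(y, 4) for y in softmax(u, t)])
    print("WTA:", winner_take_all(u))
```

Subtracting the largest scaled input before exponentiating leaves the result unchanged mathematically but keeps the computation stable at small temperatures, which is where the comparison with WTA is most interesting.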

## 26 Citations

### A Low-Voltage, Low-Power Reconfigurable Current-Mode Softmax Circuit for Analog Neural Networks

- Engineering
- Electronics
- 2021

A novel low-power low-voltage analog implementation of the softmax function, with electrically adjustable amplitude and slope parameters, is presented, which can be scaled by the number of inputs (and of corresponding outputs).

### Convex Potentials and their Conjugates in Analog Mean-Field Optimization

- Mathematics
- Neural Computation
- 1995

The saddle-point paradigm of mean-field methods in statistical physics provides a systematic procedure for finding a mapping of constrained optimization problems onto analog networks via the notion of effective energy, and it is shown that within this paradigm, to each closed bounded constraint set is associated a smooth convex potential function.

### Bio-inspired feedback-circuit implementation of discrete, free energy optimizing, winner-take-all computations

- Computer Science
- Biological Cybernetics
- 2016

This work investigates simple analog electric circuits that implement the underlying differential equation under the constraint that they only permit a limited set of building blocks that are biologically interpretable, such as capacitors, resistors, voltage-dependent conductances and voltage- or current-controlled current and voltage sources.

### Competitive learning with floating-gate circuits

- Computer Science
- IEEE Trans. Neural Networks
- 2002

This work has developed an 11-transistor silicon circuit that uses silicon physics to naturally implement a similarity computation, local adaptation, simultaneous adaptation and computation and nonvolatile storage, and is an ideal building block for constructing competitive-learning networks.

### Adaptive CMOS: from biological inspiration to systems-on-a-chip

- Biology, Computer Science
- Proc. IEEE
- 2002

Local long-term adaptation is a well-known feature of the synaptic junctions in nerve tissue. Neuroscientists have demonstrated that biology uses local adaptation both to tune the performance of…

### Contrasting Advantages of Learning With Random Weights and Backpropagation in Non-Volatile Memory Neural Networks

- Computer Science
- IEEE Access
- 2019

It is shown that ELM/NoProp systems can achieve better generalization abilities than nanosynaptic MLP systems when paired with pre-processing layers (which do not require backpropagated error), and make such systems worthy of consideration in future accelerators or embedded hardware.

### Equilibria of Iterative Softmax and Critical Temperatures for Intermittent Search in Self-Organizing Neural Networks

- Computer Science
- Neural Computation
- 2007

This work rigorously analyzes the equilibria of ISM by determining their number, position, and stability types, and offers analytical approximations to the critical symmetry-breaking bifurcation temperatures that are in good agreement with those found by numerical investigations.

### Softmax Is Not an Artificial Trick: An Information-Theoretic View of Softmax in Neural Networks

- Computer Science
- ArXiv
- 2019

It is shown that training deterministic neural networks by maximising log-softmax is equivalent to enlarging the conditional mutual information, i.e., feeding label information into network outputs. The information-theoretic perspective is also generalised to neural networks with stochasticity, and information upper and lower bounds of log-softmax are derived.

### Optimization via Intermittency with a Self-Organizing Neural Network

- Computer Science
- Neural Computation
- 2005

This letter proposes a technique that enables a self-organizing neural network to escape from local minima by virtue of the intermittency phenomenon, which gives rise to novel search dynamics that allow the system to visit multiple global minima as meta-stable states.

### Feedforward Inhibition and Synaptic Scaling – Two Sides of the Same Coin?

- Biology, Computer Science
- PLoS Comput. Biol.
- 2012

It is shown that, beyond its conventional use as a mechanism to remove undesired pattern variations, input normalization can make typical neural interaction and learning rules optimal on the stimulus subspace defined through feedforward inhibition.

## References

### Neurons with graded response have collective computational properties like those of two-state neurons.

- Biology
- Proceedings of the National Academy of Sciences of the United States of America
- 1984

A model for a large network of "neurons" with a graded response (or sigmoid input-output relation) is studied, and it is shown to have collective properties in very close correspondence with the earlier stochastic model based on McCulloch-Pitts neurons.

### Analog VLSI and neural systems

- Mathematics
- 1989

This chapter discusses a simple circuit that can generate a sinusoidal response and calls this circuit the second-order section, which can be used to generate any response that can be represented by two poles in the complex plane, where the two poles have both real and imaginary parts.

### A New Method for Mapping Optimization Problems Onto Neural Networks

- Computer Science
- Int. J. Neural Syst.
- 1989

A novel modified method for obtaining approximate solutions to difficult optimization problems within the neural network paradigm is presented, which considers the graph partition and the travelling salesman problems and exhibits an impressive level of parameter insensitivity.

### Constrained Nets for Graph Matching and Other Quadratic Assignment Problems

- Computer Science
- Neural Comput.
- 1991

The main point of the elastic net algorithm is seen to be in the way one deals with the constraints when evaluating the effective cost function (free energy in the thermodynamic analogy), and not in its geometric foundation emphasized originally by Durbin and Willshaw.

### Winner-Take-All Networks of O(N) Complexity

- Computer Science
- NIPS
- 1988

A series of compact CMOS integrated circuits is presented that realize the winner-take-all function using only O(n) of interconnect, and the winner-take-all circuit is modified to obtain a circuit that computes local nonlinear inhibition.

### Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations

- Computer Science
- 1986

The fundamental principles, basic mechanisms, and formal analyses involved in the development of parallel distributed processing (PDP) systems are presented in individual chapters contributed by…

### Global dynamics of winner-take-all networks

- Mathematics
- Optics & Photonics
- 1993

In this paper, we study the global dynamics of winner-take-all (WTA) networks. These networks generalize Hopfield's networks to the case where competitive behavior is enforced within clusters of…

### Resistive Fuses: Analog Hardware for Detecting Discontinuities in Early Vision

- Geology
- Analog VLSI Implementation of Neural Systems
- 1989

### Analog neural networks with local competition. I. Dynamics and stability.

- Mathematics, Medicine
- Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics
- 1993