Abstract

Spiking Neural Networks (SNNs) offer energy-efficient, biologically plausible computation but suffer from non-differentiable spike generation, necessitating reliance on heuristic surrogate gradients. This paper introduces UltraLIF, a principled framework that replaces surrogate gradients with ultradiscretization, a mathematical formalism from tropical geometry providing continuous relaxations of discrete dynamics. The central insight is that the max-plus semiring underlying ultradiscretization naturally models neural threshold dynamics: the log-sum-exp function serves as a differentiable soft-maximum that converges to hard thresholding as a learnable temperature parameter ε → 0. Two neuron models are derived from distinct dynamical systems: UltraLIF from the LIF ordinary differential equation (temporal dynamics) and UltraDLIF from the diffusion equation modeling gap junction coupling across neuronal populations (spatial dynamics). Both yield fully differentiable SNNs trainable via standard backpropagation with no forward-backward mismatch. Theoretical analysis establishes pointwise convergence to tropical LIF dynamics with quantitative error bounds and bounded non-vanishing gradients. Experiments on six benchmarks spanning static images, neuromorphic vision, and audio demonstrate improvements over surrogate gradient baselines, with gains most pronounced in the ultra-low latency regime (T=1) on neuromorphic and temporal datasets. An optional sparsity penalty enables significant energy reduction while maintaining competitive accuracy.

Spike Mechanisms
Comparison of spike activation functions

Figure 1. Spike activation functions and their gradients. (a) The Heaviside step (hard threshold) has zero gradient almost everywhere. Surrogate-gradient methods keep it in the forward pass but substitute a smooth proxy in the backward pass during training, creating a forward-backward mismatch. UltraLIF uses the same smooth function in both the forward and backward passes, so there is no mismatch by construction. (b) Gradients: UltraLIF provides bounded, non-vanishing gradients for all membrane potentials.

Lay Summary

Spiking Neural Networks are brain-inspired AI models that communicate through brief electrical pulses, which can make them far more energy-efficient than conventional AI. However, a fundamental training challenge has remained open: a “spike” is like an on/off switch, and an on/off switch has no smooth middle ground for standard learning algorithms to work with. Current methods work around this by running the network with real on/off spikes but, when working out how to adjust it during training, pretending each spike was a smooth curve. This creates a mismatch between how the model learns and how it actually runs, a known limitation of virtually all existing SNN training methods.

This work introduces UltraLIF, a spiking neuron derived from first principles using a mathematical technique from physics called ultradiscretization. Rather than approximating the spike, the smooth trainable operation emerges directly from the neuron’s governing equation, making training and inference identical by construction. A temperature parameter, learned automatically during training, controls how sharp each spike is and adapts to the complexity of the task.

At a single timestep, UltraLIF outperforms the best competing method tested on all six benchmarks, which span spoken-digit recognition, hand-gesture recognition from event-driven cameras, and standard image classification. It also scales to larger, deeper architectures (such as ResNet50) where conventional spiking neurons permanently switch off and stop learning, a failure mode UltraLIF resolves through its self-adjusting firing sensitivity.

Temperature Parameter ε

In UltraLIF, a single learnable parameter ε controls both the softness of the spike function s = σ(V / ε), where σ is the logistic sigmoid and the threshold sits at V = 0, and the shape of its gradient. At small ε the neuron fires sharply (near-binary), but the gradient, which peaks at 1/(4ε) at the threshold, becomes concentrated in a narrow band around it. At large ε the response is smooth and the gradient is wide, but spikes lose discriminability.

Figure. Left: spike function s = σ(V / ε); small ε gives a near-Heaviside step, large ε a gentle sigmoid. Right: gradient ∂s/∂V = s(1−s) / ε for the same ε, which is bounded and strictly positive for every ε > 0, so no neuron is left with an identically zero gradient (no dead neurons).

The key advantage: a surrogate gradient is the derivative of a different function from the forward spike, whereas UltraLIF’s gradient is the exact derivative of the same spike function used in the forward pass, so there is no forward-backward mismatch.
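The "no mismatch" property can be checked numerically. The sketch below (plain Python, function names illustrative) implements s = σ(V/ε) and its analytic gradient s(1−s)/ε, and compares that gradient with a central finite difference of the forward function itself. For contrast it includes a generic surrogate-gradient pair, a Heaviside forward with the derivative of σ(kV) as backward; the surrogate's form and k = 5 are assumptions for illustration, not the paper's baselines.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def ultra_spike(v: float, eps: float) -> float:
    """Forward pass: soft spike s = sigmoid(V / eps), threshold at V = 0."""
    return sigmoid(v / eps)

def ultra_spike_grad(v: float, eps: float) -> float:
    """Backward pass: exact derivative ds/dV = s(1 - s) / eps of the same function."""
    s = ultra_spike(v, eps)
    return s * (1.0 - s) / eps

def heaviside(v: float) -> float:
    """Forward pass of a surrogate-gradient neuron: hard threshold."""
    return 1.0 if v >= 0 else 0.0

def surrogate_grad(v: float, k: float = 5.0) -> float:
    """Illustrative surrogate: derivative of sigmoid(k V), used only in the backward pass."""
    s = sigmoid(k * v)
    return k * s * (1.0 - s)

def finite_diff(f, v: float, h: float = 1e-5) -> float:
    """Central difference: the true slope of the forward function f at v."""
    return (f(v + h) - f(v - h)) / (2.0 * h)

eps = 0.5
for v in (-1.0, -0.2, 0.3, 1.5):
    exact = ultra_spike_grad(v, eps)
    numeric = finite_diff(lambda x: ultra_spike(x, eps), v)
    # UltraLIF: backward value equals the forward slope.
    # Surrogate: forward slope is 0 away from threshold, backward value is not.
    print(f"V={v:+.1f}  ultra exact={exact:.5f} numeric={numeric:.5f}  "
          f"surrogate backward={surrogate_grad(v):.5f} "
          f"heaviside slope={finite_diff(heaviside, v):.1f}")

# The UltraLIF gradient peaks at the threshold with value 1 / (4 * eps).
print(ultra_spike_grad(0.0, eps), 1.0 / (4.0 * eps))
```

The surrogate columns make the mismatch concrete: the backward value is nonzero at every sampled V, while the Heaviside forward has zero slope there.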
Membrane Update & Spike Scale

In standard UltraLIF, a single ε does two jobs at once: it sets the softness of the max-plus (LSE) membrane update and the spike sharpness (1/ε), so the two are coupled. The decoupled variant (UltraLIF-DS) adds a separate spike scale β so that each parameter has one role: ε sets membrane smoothness and β sets spike sharpness, independently of each other.

Figure. Four panels: (1) membrane update LSEε(a, 0) = ε·log(1 + exp(a/ε)), the soft max-plus, against the hard max(a, 0) reached as ε → 0; (2) spike sharpness, comparing the standard σ(V/ε), the independent σ(β·V), and the Heaviside step; (3) gradient of the LSE, ∂/∂a LSEε(a, 0) = σ(a/ε), against the step-shaped hard-max gradient; (4) spike gradients (1/ε) σ(V/ε)(1−σ(V/ε)) and β σ(βV)(1−σ(βV)).

Because standard UltraLIF locks spike sharpness to 1/ε, the spikes cannot be sharpened without also hardening the membrane update. Decoupling removes this constraint: ε keeps the membrane smooth and trainable while β independently pushes the spikes toward binary.
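The identities behind these curves can be checked numerically. This sketch assumes the standard temperature-scaled log-sum-exp, LSEε(a, b) = ε·log(exp(a/ε) + exp(b/ε)). It verifies that LSEε(a, 0) sits between max(a, 0) and max(a, 0) + ε·log 2, that its derivative is σ(a/ε), and that the decoupled spike σ(β·V) has gradient β·σ(βV)(1−σ(βV)) with no dependence on ε. How LSEε enters the full UltraLIF membrane recursion is not shown here, and all function names are illustrative.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def lse(a: float, b: float, eps: float) -> float:
    """Soft maximum eps * log(exp(a/eps) + exp(b/eps)), shifted by max(a, b) for stability."""
    m = max(a, b)
    return m + eps * math.log(math.exp((a - m) / eps) + math.exp((b - m) / eps))

def lse_grad_a(a: float, eps: float) -> float:
    """d/da LSE_eps(a, 0) = sigmoid(a / eps): a smooth version of the hard-max step."""
    return sigmoid(a / eps)

def decoupled_spike(v: float, beta: float) -> float:
    """UltraLIF-DS style spike: sharpness set by beta alone, independent of eps."""
    return sigmoid(beta * v)

def decoupled_spike_grad(v: float, beta: float) -> float:
    """Exact derivative beta * s(1 - s) of the decoupled spike."""
    s = decoupled_spike(v, beta)
    return beta * s * (1.0 - s)

# Convergence to the hard max: the gap lies in [0, eps * log 2] and is largest at a = 0.
for eps in (1.0, 0.1, 0.01):
    gap = max(lse(a, 0.0, eps) - max(a, 0.0) for a in (-2.0, -0.5, 0.0, 0.5, 2.0))
    print(f"eps={eps:<5} max gap={gap:.5f}  bound={eps * math.log(2):.5f}")

# Finite-difference check of the LSE gradient.
h, eps, a = 1e-6, 0.3, 0.4
numeric = (lse(a + h, 0.0, eps) - lse(a - h, 0.0, eps)) / (2 * h)
print(numeric, lse_grad_a(a, eps))
```

The ε·log 2 gap shows concretely why shrinking ε recovers the max-plus (tropical) update, while β can be tuned without touching that gap.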
Results: Ultra-Low Latency (T = 1)

The primary advantage of UltraLIF appears at a single timestep (T=1), where surrogate-gradient methods suffer most from forward-backward mismatch. The table below compares the best UltraLIF variant with the best surrogate-gradient baseline on each dataset (1-layer FC, hidden size 64, 100 epochs). Gains are largest on temporally structured data: +11.22pp on spoken digits (SHD), +7.96pp on event-camera gestures (DVS-Gesture), and +3.91pp on neuromorphic MNIST (N-MNIST).

| Dataset | Type | Best Ultra (T=1) | Ultra model | Best baseline | Baseline model | Gain (pp) | Relative gain |
|---|---|---|---|---|---|---|---|
| MNIST | Static | 95.67% | UltraDLIF | 95.58% | DSpike+ | +0.09 | +0.1% |
| Fashion-MNIST | Static | 83.02% | UltraPLIF | 82.67% | DSpike+ | +0.35 | +0.4% |
| CIFAR-10 | Static | 43.27% | UltraPLIF | 40.26% | DSpike+ | +3.01 | +7.5% |
| N-MNIST | Neuromorphic | 94.14% | UltraDLIF | 90.23% | DSpike | +3.91 | +4.3% |
| DVS-Gesture | Neuromorphic | 60.23% | UltraPLIF | 52.27% | PLIF | +7.96 | +15.2% |
| SHD | Audio spike | 51.24% | UltraDLIF | 40.02% | FullPLIF | +11.22 | +28.0% |
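The two gain columns are defined differently: the gain in percentage points (pp) is the absolute accuracy difference, while the relative gain divides that difference by the baseline accuracy. This short sketch recomputes both from the T=1 table values (no new data); the function name is illustrative.

```python
# Best Ultra vs best baseline at T=1, copied from the table above (accuracy in %).
results = {
    "MNIST": (95.67, 95.58),
    "Fashion-MNIST": (83.02, 82.67),
    "CIFAR-10": (43.27, 40.26),
    "N-MNIST": (94.14, 90.23),
    "DVS-Gesture": (60.23, 52.27),
    "SHD": (51.24, 40.02),
}

def gains(ultra: float, baseline: float):
    """Return (absolute gain in pp, relative gain as % of the baseline accuracy)."""
    pp = ultra - baseline
    return pp, 100.0 * pp / baseline

for name, (ultra, base) in results.items():
    pp, rel = gains(ultra, base)
    print(f"{name:<14} {pp:+.2f} pp  {rel:+.1f}%")
```

The relative column explains why SHD (+28.0%) stands out more than its +11.22pp alone suggests: its baseline accuracy is the lowest of the six.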
Results: Depth Robustness (1L / 2L / 3L)

Standard LIF suffers a dramatic accuracy collapse as depth increases at T=1, because forward-backward mismatch compounds across layers, while UltraDLIF degrades far less. The most striking example is SHD: adding a second hidden layer causes LIF to collapse from 37.9% to 19.5% (−18.4pp), while UltraDLIF drops only from 53.8% to 50.8% (−2.9pp). At 3 layers and T=1, an Ultra variant wins on all six datasets, and UltraDLIF and UltraDPLIF each beat LIF on every one.

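2-Layer FC, T=1

| Dataset | LIF | UltraDLIF | UltraDPLIF | UltraLIF | UltraPLIF |
|---|---|---|---|---|---|
| MNIST | 95.97% | 96.10% | 96.10% | 95.90% | 96.22% |
| Fashion-MNIST | 82.65% | 83.43% | 83.43% | 83.07% | 83.45% |
| CIFAR-10 | 40.89% | 44.20% | 44.20% | 43.44% | 44.63% |
| N-MNIST | 90.01% | 94.94% | 94.94% | 90.74% | 94.10% |
| DVS-Gesture | 52.27% | 51.52% | 51.52% | 53.41% | 56.44% |
| SHD | 19.48% (−18.4pp) | 50.84% | 50.84% | 36.09% | 42.01% |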

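3-Layer FC, T=1

| Dataset | LIF | UltraDLIF | UltraDPLIF | UltraLIF | UltraPLIF |
|---|---|---|---|---|---|
| MNIST | 95.90% | 96.22% | 96.22% | 95.63% | 96.35% |
| Fashion-MNIST | 82.90% | 83.55% | 83.55% | 82.23% | 83.42% |
| CIFAR-10 | 40.35% | 43.64% | 43.64% | 43.01% | 44.15% |
| N-MNIST | 87.82% | 94.87% | 94.87% | 93.45% | 93.68% |
| DVS-Gesture | 50.38% | 51.14% | 51.14% | 39.39% | 43.94% |
| SHD | 21.73% | 45.67% | 45.67% | 24.25% | 30.70% |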
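The depth claims can be checked directly against the 3-layer, T=1 table. This sketch uses only those values: it finds the best Ultra variant per dataset, reports its margin over LIF, and lists which variants beat LIF on every dataset. The data layout and variable names are illustrative choices.

```python
# 3-layer FC, T=1 accuracies (%) from the table above.
# Column order: LIF, UltraDLIF, UltraDPLIF, UltraLIF, UltraPLIF
three_layer = {
    "MNIST": (95.90, 96.22, 96.22, 95.63, 96.35),
    "Fashion-MNIST": (82.90, 83.55, 83.55, 82.23, 83.42),
    "CIFAR-10": (40.35, 43.64, 43.64, 43.01, 44.15),
    "N-MNIST": (87.82, 94.87, 94.87, 93.45, 93.68),
    "DVS-Gesture": (50.38, 51.14, 51.14, 39.39, 43.94),
    "SHD": (21.73, 45.67, 45.67, 24.25, 30.70),
}
variants = ("UltraDLIF", "UltraDPLIF", "UltraLIF", "UltraPLIF")

for name, (lif, *ultra) in three_layer.items():
    best = max(ultra)
    best_name = variants[ultra.index(best)]  # first variant on ties
    print(f"{name:<14} LIF {lif:.2f}  best Ultra {best:.2f} ({best_name})  margin {best - lif:+.2f} pp")

# Variants that beat LIF on every dataset at this depth.
always_wins = [
    v for i, v in enumerate(variants)
    if all(row[i + 1] > row[0] for row in three_layer.values())
]
print(always_wins)
```

Running it lists UltraDLIF and UltraDPLIF as the variants that beat LIF everywhere; plain UltraLIF trails LIF on Fashion-MNIST and DVS-Gesture at this depth.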
Results: Energy Efficiency & Architecture Scalability

A sparsity penalty on the mean spike rate s̄, weighted by λ, is added to the cross-entropy loss: Loss = CE + λ · s̄. This directly reduces synaptic operations, and hence energy (energy ∝ spike rate × T), at minimal accuracy cost, since UltraLIF’s learnable ε can adapt to compensate.
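A minimal sketch of the penalized objective for one example, assuming s̄ is the plain mean of the soft spikes σ(V/ε) over all neurons and timesteps (the exact averaging is not specified here). Because the spikes are smooth, the penalty has a nonzero gradient λ·s(1−s)/(ε·N) with respect to each membrane potential. The toy logits and membrane values, and the names `sparse_loss` and `cross_entropy`, are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def cross_entropy(logits, target: int) -> float:
    """Softmax cross-entropy for one example, via a max-shifted log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]

def sparse_loss(logits, target, membrane, eps, lam):
    """Return (CE + lam * mean spike rate, mean spike rate) with spikes s = sigmoid(V / eps).

    membrane: membrane potentials (threshold at 0), flattened over neurons and timesteps.
    """
    spikes = [sigmoid(v / eps) for v in membrane]
    rate = sum(spikes) / len(spikes)
    return cross_entropy(logits, target) + lam * rate, rate

logits = [2.0, 0.5, -1.0]
membrane = [0.8, -0.3, 0.1, -1.2]
loss0, rate = sparse_loss(logits, 0, membrane, eps=0.5, lam=0.0)
loss1, _ = sparse_loss(logits, 0, membrane, eps=0.5, lam=0.1)
print(f"spike rate={rate:.3f}  loss(lam=0)={loss0:.4f}  loss(lam=0.1)={loss1:.4f}")

# Penalty gradient w.r.t. one membrane potential: lam * s(1 - s) / (eps * N), never exactly 0.
s = sigmoid(membrane[0] / 0.5)
print(0.1 * s * (1 - s) / (0.5 * len(membrane)))
```

With a hard Heaviside spike the same penalty would have zero gradient almost everywhere, so the sparsity pressure could only reach the network through a surrogate.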

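Energy efficiency with sparsity penalty (λ = 0.1), UltraPLIF

| Dataset | T | Acc. (λ=0) | Acc. (λ=0.1) | Spike rate (λ=0) | Spike rate (λ=0.1) | Reduction |
|---|---|---|---|---|---|---|
| MNIST | 1 | 95.67% | 95.71% ↑ | 0.445 | 0.268 | 40% |
| MNIST | 10 | 97.35% | 97.35% | 0.473 | 0.239 | 50% |
| Fashion-MNIST | 1 | 82.79% | 83.05% ↑ | 0.428 | 0.266 | 38% |
| Fashion-MNIST | 10 | 85.69% | 85.74% ↑ | 0.456 | 0.279 | 39% |
| CIFAR-10 | 1 | 43.11% | 43.04% | 0.480 | 0.339 | 29% |
| CIFAR-10 | 10 | 45.75% | 45.32% | 0.469 | 0.340 | 28% |
| N-MNIST | 10 | 97.38% | 96.93% | 0.475 | 0.292 | 39% |
| SHD | 10 | 68.90% | 70.27% ↑ | 0.469 | 0.383 | 18% |

↑ marks an accuracy increase under the penalty; Reduction is the relative drop in spike rate.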
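The Reduction column is a relative change in spike rate. This sketch recomputes it for a few rows from the rounded rates in the table (function name illustrative); other rows may differ by a point from the table because the published rates are rounded to three decimals.

```python
# UltraPLIF spike rates without and with the penalty (lambda 0 -> 0.1), from the table above.
rates = {
    ("MNIST", 1): (0.445, 0.268),
    ("Fashion-MNIST", 1): (0.428, 0.266),
    ("CIFAR-10", 1): (0.480, 0.339),
    ("SHD", 10): (0.469, 0.383),
}

def relative_reduction(before: float, after: float) -> float:
    """Relative drop in spike rate, in percent."""
    return 100.0 * (before - after) / before

for (name, t), (r0, r1) in rates.items():
    print(f"{name:<14} T={t:<2} reduction={relative_reduction(r0, r1):.0f}%")
```

Under the energy model energy ∝ spike rate × T, T is the same in both columns of a row, so the relative energy saving equals this spike-rate reduction.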

Architecture scalability across fully connected (FC), convolutional (Conv), and ResNet backbones. On ResNet50, standard LIF produces dead neurons at all timesteps while every UltraLIF variant stays above 90% accuracy, indicating that the self-adjusting ε resolves the dead-neuron failure mode even at large depth. The one setting where LIF leads is the 2-layer convolutional network (−3.83pp).

Architecture scalability, CIFAR-10

| Architecture | T | LIF | Best Ultra | Model | Gain |
|---|---|---|---|---|---|
| FC 1-layer | 1 | 41.79% | 46.40% | UltraDPLIF | +4.61pp |
| FC 2-layer | 1 | 40.89% | 44.63% | UltraPLIF | +3.74pp |
| FC 3-layer | 1 | 40.35% | 44.15% | UltraPLIF | +3.80pp |
| Conv 2-layer | 1 | 74.37% | 70.54% | UltraDLIF | −3.83pp |
| ResNet18 + spiking FC | 1 | 93.12% | 93.37% | UltraLIF / UltraDPLIF | +0.25pp |
| ResNet18 + spiking FC | 10 | 93.10% | 93.50% | UltraLIF | +0.40pp |
| ResNet50 + spiking FC | 1 | 31.83% (dead) | 92.78% | UltraPLIF | +60.95pp |
| ResNet50 + spiking FC | 5 | 35.09% (dead) | 92.88% | UltraPLIF | +57.79pp |

ResNet18 backbone, CIFAR-10 / Fashion-MNIST / N-MNIST (T=1 / T=5 / T=10)

| Dataset | LIF T=1 | Best Ultra T=1 | LIF T=5 | Best Ultra T=5 | LIF T=10 | Best Ultra T=10 |
|---|---|---|---|---|---|---|
| CIFAR-10 | 93.12% | 93.37% (UltraLIF) | 93.01% | 93.39% (UltraDLIF) | 93.10% | 93.50% (UltraLIF) |
| Fashion-MNIST | 93.81% | 93.95% (UltraPLIF) | 93.65% | 93.89% (UltraDPLIF) | 93.86% | 94.24% (UltraDLIF) |
| N-MNIST | 99.20% | 99.23% (UltraDLIF) | 99.13% | 99.23% (UltraDLIF) | 99.13% | 99.23% (UltraDLIF) |

ResNet50 backbone, CIFAR-10 (T=1 / T=5 / T=10)

| Model | T=1 | T=5 | T=10 |
|---|---|---|---|
| LIF | 31.83% (dead) | 35.09% (dead) | 23.23% (dead) |
| UltraDLIF | 92.23% | 92.41% | 92.54% |
| UltraDPLIF | 92.22% | 92.73% | 90.42% |
| UltraLIF | 91.92% | 91.85% | 91.77% |
| UltraPLIF | 92.78% | 92.88% | 92.82% |
Epsilon ablation

Figure. Epsilon ablation on MNIST (T=1, 100 epochs). The learned ε follows a characteristic trajectory: it first sharpens (decreases), then recovers to a model-specific optimum. For all four UltraLIF variants, learned ε matches or exceeds the accuracy of every fixed-ε configuration.
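Learning ε requires a gradient with respect to ε itself. Differentiating s = σ(V/ε) gives ∂s/∂ε = −s(1−s)·V/ε². The sketch below assumes ε is stored as ε = exp(ρ) so that it stays positive (how positivity is enforced is not stated here); the chain rule then gives ∂s/∂ρ = −s(1−s)·V/ε, which is checked against a finite difference. The parameter ρ and the helper names are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def spike_from_rho(v: float, rho: float) -> float:
    """Soft spike with temperature eps = exp(rho), which is always > 0."""
    return sigmoid(v / math.exp(rho))

def dspike_drho(v: float, rho: float) -> float:
    """Analytic ds/drho = -s(1 - s) V / eps, by the chain rule through eps = exp(rho)."""
    eps = math.exp(rho)
    s = sigmoid(v / eps)
    return -s * (1.0 - s) * v / eps

rho, h = math.log(0.5), 1e-6  # start at eps = 0.5
for v in (-1.0, 0.25, 2.0):
    numeric = (spike_from_rho(v, rho + h) - spike_from_rho(v, rho - h)) / (2 * h)
    print(f"V={v:+.2f}  analytic={dspike_drho(v, rho):+.6f}  numeric={numeric:+.6f}")
```

The sign pattern shows how ε moves during training: raising ε pulls every spike toward 0.5, lowering it pushes spikes toward 0 or 1, and gradient descent picks whichever direction reduces the loss.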

Citation
@inproceedings{minoza2026ultralif,
  title     = {UltraLIF: Fully Differentiable Spiking Neural Networks
               via Ultradiscretization and Max-Plus Algebra},
  author    = {Mi{\~n}oza, Jose Marie Antonio},
  booktitle = {International Conference on Machine Learning},
  year      = {2026},
}