Review Article Volume 9 Issue 1
1Research Ordinary Professor, Biomedical Engineering, Visiting Scholar at the Catholic University of America (CUA), USA
2Research Director, RFNav Inc, Rockville, USA
Correspondence: Harold H Szu, Research Ordinary Professor, Biomedical Engineering Department, CUA, Washington DC, USA; Academician RUS; Fellow of SPIE, Optica, INNS, and AIMBE; Life Fellow of IEEE
Received: June 21, 2025 | Published: July 23, 2025
Citation: Szu HH, Willey J. The glial-neural ensemble as a free energy-minimizing system for affective computation. MOJ App Bio Biomech. 2025;9(1):79-85. DOI: 10.15406/mojabb.2025.09.00228
Abstract

Recent successes of Artificial Intelligence in mimicking human emotional intelligence highlight a critical distinction between pattern recognition and genuine affective understanding. This paper argues that true affective computation requires a shift from neuron-centric, data-driven models to a unified neuro-glial framework governed by first principles of biophysics and information theory. We propose a comprehensive model grounded in the Free Energy Principle (FEP), which posits that any self-organizing system, including the brain, must act to minimize prediction error or 'surprise'. Within this framework, we derive novel mathematical models for the distinct computational roles of glial cells, which constitute over half the cells in the human brain. We formalize astrocytes as regulators of synaptic precision, dynamically tuning the confidence in sensory evidence through calcium-dependent signaling. Microglia are modeled as agents of Bayesian model selection, optimizing the network's structure by pruning synapses that fail to contribute to long-term free energy minimization. Oligodendrocytes are shown to optimize the temporal priors of the brain's generative model by adaptively tuning axonal conduction delays through myelin plasticity. These functions are integrated into a unified, multi-timescale architecture based on a dynamic, heterogeneous graph neural network. In this model, affect is not a programmed output but an emergent property of the system's ongoing inference about the causes of salient interoceptive and exteroceptive states. Affective valence is cast as the rate of change of free energy, while arousal corresponds to the system's overall uncertainty. This work provides a biophysically grounded and mathematically explicit blueprint for a new generation of AI capable of a deeper, more generative form of emotional intelligence.
The paradox of AI emotion
The field of artificial intelligence (AI) has recently achieved remarkable feats in the domain of emotional intelligence (EI). Studies evaluating modern Large Language Models (LLMs) on standardized EI assessments have revealed a surprising and paradoxical outcome: these systems consistently outperform human participants by a significant margin. For instance, a comprehensive study involving six generative AIs, including variants of ChatGPT, Gemini, and Claude, reported an average accuracy of 82% on five standard EI tests, compared to a human average of just 56% derived from original validation studies.1 These AI systems not only excelled at selecting the most emotionally intelligent responses from a set of options but also demonstrated the capacity to generate novel, contextually appropriate EI test scenarios that were judged to be as reliable and realistic as those developed by human experts over many years.1
This performance has led to suggestions that AI could soon play a significant role in emotionally sensitive domains such as coaching, education, and conflict resolution. However, this apparent super-human emotional quotient masks a fundamental limitation.2 Critics and users alike note that while the output is convincing, the underlying process is one of sophisticated pattern matching rather than genuine understanding. LLMs are described as "word calculators" that "spit out text that fits a pattern it has seen before" without grasping the meaning of the words or the nature of the prompt.3 Their success stems from being trained on vast corpora of human text, allowing them to learn the statistical correlations that define an emotionally intelligent response. This capability represents a mastery of knowledge about emotion, not a generative capacity for emotion. This distinction highlights a crucial gap between mimicking the products of emotional intelligence and possessing the underlying generative mechanism that produces them.4
To move beyond mere mimicry, the field of affective computing—which aims to develop systems that can recognize, interpret, process, and simulate human affects5—requires a paradigm shift. Instead of building discriminative models that learn to classify or replicate emotional expressions from data,6 we must develop generative models that produce affect from a core, goal-directed process.7 Such a model would not be trained to answer "what is the most compassionate response?" but would instead generate a compassionate response as a consequence of its own internal principles.8 This necessitates a move from purely data-driven engineering to architectures inspired by the biophysical first principles of the only system known to possess genuine emotion: the brain.9
For over a century, the neuron has been the undisputed protagonist in our theories of computation and cognition. However, this neuron-centric view is fundamentally incomplete.10,11 Glial cells, once dismissed as passive "nerve glue," are now understood to be active and essential participants in brain function, outnumbering neurons and exhibiting immense diversity.12 Multiple lines of evidence link glial abnormalities directly to mood disorders and other psychiatric conditions,13 underscoring their importance in emotional regulation.14 The major glial cell types—astrocytes, microglia, and oligodendrocytes—perform distinct but complementary computational roles. Astrocytes modulate synaptic transmission and plasticity; microglia sculpt neural circuits by pruning synapses; and oligodendrocytes tune the timing of neural communication by adjusting axonal myelination. A complete model of brain computation, particularly one that aims to capture the nuances of emotion,15–18 must therefore be a model of the integrated glial-neural ensemble.17
This paper proposes that the Free Energy Principle (FEP) provides the requisite unifying framework to integrate these disparate biological functions into a single, coherent computational objective.19,20 The FEP, emerging from theoretical neuroscience and statistical physics, posits that any self-organizing system that maintains its integrity in a changing world must act to minimize its variational free energy, which serves as a computable proxy for surprise or prediction error.21 This principle offers a mathematical language to describe how a biological agent, including its neuronal and glial components, perceives, learns, and acts to maintain a model of its environment and itself. By framing the actions of glial cells as distinct contributions to this single imperative of free energy minimization, we can construct a normative, first-principles model of the generative processes that underlie affective states, providing a robust foundation for the future of emotional AI.22
A first-principles formulation of the glial-neural system

To build a model of the glial-neural ensemble, we must first establish the mathematical language that governs its dynamics. This section derives the concept of variational free energy from its roots in statistical mechanics and information theory, and then situates this principle within the context of a glial-inclusive biological system.
From statistical mechanics to variational free energy

The concept of free energy originates in thermodynamics, where the Helmholtz free energy, F, is defined for a system at constant temperature T, volume, and particle number.23 It is given by the equation:

F = U - TS

where U is the system's internal energy and S is its thermodynamic entropy. A fundamental principle of statistical mechanics is that a system will spontaneously evolve towards states that minimize its Helmholtz free energy, reaching equilibrium at the minimum.23
This thermodynamic concept finds a powerful analogue in information theory, which provides the tools to quantify uncertainty and information.24,25 The information-theoretic equivalent of entropy was defined by Claude Shannon as the average "surprise" of a set of outcomes.26 For a discrete random variable X with probability distribution p(x), the Shannon entropy H(X) is:

H(X) = -\sum_x p(x) \log p(x)

An outcome with low probability is highly "surprising" (it carries more information), and a distribution with low entropy is one whose outcomes are highly predictable.26
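To make these quantities concrete, the following minimal Python sketch (illustrative only; the function names are our own) computes the surprise of an individual outcome and the Shannon entropy of a discrete distribution:

```python
import numpy as np

def surprise(p):
    """Surprise (self-information) of an outcome with probability p, in nats."""
    return -np.log(p)

def shannon_entropy(dist):
    """Average surprise: H(X) = -sum_x p(x) log p(x)."""
    dist = np.asarray(dist, dtype=float)
    return -np.sum(dist * np.log(dist))

# A peaked (predictable) distribution has low entropy; a uniform one, high entropy.
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.17 nats
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.39 nats
print(surprise(0.01))                             # a rare outcome: ~4.61 nats
```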
The Free Energy Principle leverages this information-theoretic perspective to describe the behavior of adaptive systems like the brain.20 Any such system must minimize the surprise associated with its sensory states to maintain its physiological integrity (e.g., a fish out of water is in a very surprising, and fatal, state).21 However, a system cannot directly compute or minimize this surprise, -\ln p(s \mid m), because it does not have access to the true data-generating process of the world; here s represents sensory states and m represents the agent's model of the world.
Instead, the system can minimize an upper bound on surprise, known as variational free energy, F. This quantity is computable because it is a function of the agent's own internal states and sensory inputs. Let \vartheta be the hidden causes in the environment that generate the sensory states s. The agent possesses a generative model, p(s, \vartheta \mid m), which is its internal hypothesis about how causes and sensations are related. To infer the hidden causes from its sensations, the agent uses a recognition density, q(\vartheta; \mu), an approximate probability distribution over the hidden causes parameterized by its internal physical states \mu (e.g., neuronal activity).21
The variational free energy is formally defined as the Kullback-Leibler (KL) divergence between the recognition density and the true posterior distribution over hidden causes, plus the surprise:

F = D_{KL}[q(\vartheta; \mu) \,\|\, p(\vartheta \mid s, m)] - \ln p(s \mid m)

Since the KL divergence is always non-negative, F is always greater than or equal to the surprise:20

F \geq -\ln p(s \mid m)
Therefore, minimizing free energy implicitly minimizes surprise. By rearranging terms (using the factorization p(\vartheta \mid s, m)\, p(s \mid m) = p(s \mid \vartheta, m)\, p(\vartheta \mid m)), we arrive at the more common and computationally instructive form of variational free energy:

F = D_{KL}[q(\vartheta; \mu) \,\|\, p(\vartheta \mid m)] - E_q[\ln p(s \mid \vartheta, m)] = \text{complexity} - \text{accuracy}

This formulation decomposes free energy into two terms. The accuracy term, E_q[\ln p(s \mid \vartheta, m)], measures how well the generative model explains the sensory data. The complexity term, D_{KL}[q(\vartheta; \mu) \,\|\, p(\vartheta \mid m)], measures the divergence between the recognition density and the prior beliefs about the hidden causes. This term acts as a form of Occam's razor, penalizing overly complex explanations that deviate from prior expectations. An agent can thus minimize free energy either by changing its internal states \mu to better explain its sensations (perception) or by acting on the world to change its sensations s to better fit its predictions (action).19 This dual minimization process is known as active inference.
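The equivalence of the two forms of F, and the bound F \geq -\ln p(s \mid m), can be checked numerically. The sketch below uses a hypothetical two-cause generative model of our own construction; it is a toy illustration of the decomposition, not a model from the cited literature:

```python
import numpy as np

# Toy generative model: two hidden causes, one observed sensory outcome s.
prior = np.array([0.7, 0.3])          # p(theta | m)
lik_s = np.array([0.2, 0.9])          # p(s | theta, m) for the observed s

evidence = np.sum(lik_s * prior)      # p(s | m)
surprise = -np.log(evidence)
posterior = lik_s * prior / evidence  # p(theta | s, m)

q = np.array([0.5, 0.5])              # an (imperfect) recognition density

def kl(a, b):
    """KL divergence D_KL[a || b] for discrete distributions."""
    return np.sum(a * np.log(a / b))

# Form 1: F = KL[q || posterior] + surprise
F1 = kl(q, posterior) + surprise
# Form 2: F = complexity - accuracy = KL[q || prior] - E_q[ln p(s | theta)]
F2 = kl(q, prior) - np.sum(q * np.log(lik_s))

assert np.isclose(F1, F2)   # the two decompositions agree (~0.94 nats here)
assert F1 >= surprise       # free energy upper-bounds surprise (~0.89 nats)
```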
The glial-neural Markov blanket

For any system to exist as a distinct entity, separate from its environment, it must possess a statistical boundary known as a Markov blanket.19 Formally, a Markov blanket is a set of nodes in a probabilistic model that, when known, renders a set of internal states statistically independent of a set of external states. The blanket itself consists of sensory states (which influence internal states but are not influenced by them) and active states (which are influenced by internal states but do not influence them). This formalism provides a principled way to define the boundary of any self-organizing system, from a single cell to a human being.27–32
In the context of the brain, it is tempting to identify the neuronal membrane as the Markov blanket. However, this view is incomplete. The intricate web of glial cells forms a critical part of this boundary at multiple spatial and temporal scales.16 Astrocytes directly enwrap synapses, controlling the local concentration of neurotransmitters and providing metabolic support, thereby mediating the very sensory signals (postsynaptic potentials) and active states (presynaptic release) that constitute the blanket for a neuron.13 Microglia physically remodel the blanket by adding or removing synaptic connections, altering its fundamental structure over slower timescales.31 Oligodendrocytes tune the conduction delays of axons, which affects the timing of information flow across the blanket.33
Therefore, glial cells are not external modulators of a purely neuronal inference process. They are constitutive elements of the physical machinery that implements the Markov blanket. Their functions are integral to the brain's ability to perform active inference and minimize free energy. A complete computational model of the brain's inferential processes must therefore be a glial-neural model, accounting for the distinct contributions of each cell type to the collective goal of surprise minimization.17 The following section develops a mathematical model of how each major glial cell type achieves this.
Glial computation as active inference: a mathematical model

This section presents the core theoretical contribution of this paper: a set of mathematical models that cast the specific functions of astrocytes, microglia, and oligodendrocytes as distinct forms of active inference. Each model connects the known biophysics of the cell type to a specific parameter within the free energy formulation, demonstrating how the glial-neural ensemble collaborates to minimize prediction error across multiple timescales.34,35
Astrocytes: optimizing the precision of beliefs

Biological basis: Astrocytes are intimately associated with synapses, forming a "tripartite synapse" with the pre- and post-synaptic neurons.17 They respond to neuronal activity, particularly glutamate release, with elevations in their internal calcium concentration ([Ca2+]).36 These calcium signals can propagate as waves and trigger the release of various "gliotransmitters" (e.g., glutamate, D-serine, ATP), which in turn modulate synaptic transmission, plasticity, and neuronal excitability.29 This dynamic feedback loop is crucial for synaptic plasticity phenomena like long-term potentiation (LTP) and is thus essential for learning and memory.30 Mathematical models, such as those based on the Li-Rinzel equations for calcium dynamics, have been developed to capture the oscillatory and signaling behavior of astrocytes in response to synaptic input.36
FEP interpretation: In the active inference framework, the reliability or confidence assigned to sensory evidence is formalized by the precision parameter \Pi, which is the inverse of the variance (\Pi = \sigma^{-2}). Precision weights prediction errors; a high-precision prediction error signals that the sensory data is reliable and should strongly drive updates to beliefs (i.e., learning), whereas a low-precision error indicates noisy or unreliable data, meaning the system should rely more on its prior beliefs. We propose that the primary computational role of astrocytes is to dynamically regulate the precision of synaptic prediction errors. An active astrocyte, signaled by high internal calcium, indicates a context of high metabolic support and reliable signaling, thus increasing the precision of the associated synaptic inputs.
Mathematical formulation: Under the FEP, the dynamics of the internal states \mu (representing neuronal activity) that encode beliefs about hidden causes follow a gradient descent on free energy. For a hierarchical model, this can be expressed as:

\dot{\mu} = -\partial F / \partial \mu

where the gradients can be resolved into a set of precision-weighted prediction errors. For a simplified neuron, the update rule for its belief \mu (e.g., membrane potential) is:

\dot{\mu} = \Pi_s \varepsilon_s - \Pi_p \varepsilon_p

Here, \varepsilon_s is the ascending prediction error from a lower level in the hierarchy (the "bottom-up" sensory evidence, related to the dendritic sum of inputs), and \varepsilon_p is the descending prediction error from a higher level (the "top-down" expectation). The terms \Pi_s and \Pi_p are the precisions associated with the sensory evidence and the prior beliefs, respectively.
The core of our astrocytic model is the proposal that the sensory precision, \Pi_s, is a direct function of the local astrocyte's state. Specifically, we model it as a sigmoidal function of the astrocyte's intracellular calcium concentration:

\Pi_s(t) = \Pi_{max} / (1 + \exp(-k([Ca2+](t) - \theta)))

where [Ca2+](t) is governed by its own set of differential equations (e.g., a modified Li-Rinzel model sensitive to synaptic glutamate36), \Pi_{max} is the maximal precision, k is a gain parameter, and \theta is a threshold. When synaptic activity is high and sustained, glutamate spillover activates the astrocyte, raising its [Ca2+]. This, in turn, increases \Pi_s, amplifying the influence of the bottom-up prediction error on the neuron's belief updating. This mechanism allows the network to dynamically and locally adjust learning rates and the balance between sensory evidence and prior beliefs, a critical feature for adaptive behavior in a fluctuating environment.37–40
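A minimal sketch of this mechanism, assuming illustrative values for \Pi_{max}, k, and \theta (none of which are fixed by the text above), shows how astrocytic calcium gates the size of a single belief update:

```python
import numpy as np

def astro_precision(ca, pi_max=4.0, k=8.0, theta=0.5):
    """Sensory precision as a sigmoidal function of astrocytic [Ca2+].
    pi_max, k (gain), and theta (threshold) are illustrative values."""
    return pi_max / (1.0 + np.exp(-k * (ca - theta)))

def belief_update(mu, eps_sensory, eps_prior, ca, pi_prior=1.0, lr=0.1):
    """One gradient step on free energy: the bottom-up error is weighted by the
    astrocyte-dependent precision, the top-down error by the prior precision."""
    pi_s = astro_precision(ca)
    return mu + lr * (pi_s * eps_sensory - pi_prior * eps_prior)

# With a quiescent astrocyte (low Ca2+), sensory evidence barely moves the belief;
# with an active astrocyte, the same prediction error drives a large update.
mu = 0.0
print(belief_update(mu, eps_sensory=1.0, eps_prior=0.2, ca=0.1))  # ~ -0.004
print(belief_update(mu, eps_sensory=1.0, eps_prior=0.2, ca=0.9))  # ~ +0.36
```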
Microglia: Bayesian model selection through synaptic pruning

Biological basis: Microglia are the resident immune cells of the central nervous system, constantly surveying the brain parenchyma.12 A key function discovered in recent years is their role in synaptic pruning—the active, targeted removal of synapses.40 This process is fundamental to the developmental refinement of neural circuits and is implicated in learning and memory.40–46 Pruning is not random; it is an activity-dependent process in which less active or "weaker" synapses are tagged for elimination, often via the classical complement cascade (e.g., C1q and C3 proteins).31 Microglia then recognize these tags via receptors (e.g., CR3) and phagocytose the synaptic material.44 Dysregulation of this process, leading to excessive or insufficient pruning, has been linked to neurodevelopmental and psychiatric disorders like schizophrenia and autism.41
FEP interpretation: The "use-it-or-lose-it" principle provides a simple heuristic for pruning, but the FEP allows for a more profound, normative interpretation. The brain's synaptic architecture—its connectome—can be seen as the physical instantiation of its generative model of the world. Each possible wiring diagram is, in effect, a different hypothesis or model, m, about the causal structure of the environment. From this perspective, synaptic pruning is not merely about removing unused connections; it is a process of Bayesian model selection or model reduction.47 The brain is actively optimizing its own structure to find the most efficient generative model—the one that can explain sensory inputs with the minimum complexity. This process minimizes the long-term, time-averaged free energy.
Mathematical formulation: According to Bayesian principles, the optimal model is the one with the highest model evidence, which is precisely the quantity maximized by minimizing free energy. Therefore, the brain should prune synapses that belong to models with low evidence, or more specifically, synapses whose presence increases the complexity of the model without a commensurate increase in accuracy.
We can formalize this by considering the contribution of an individual synapse, w_{ij}, to the reduction of free energy over time. The Fisher information (FI) of a synaptic weight provides a local, activity-dependent measure of its importance to the network's encoding of sensory inputs.47 Synapses with low FI are computationally redundant. We propose a mechanism that connects this information-theoretic importance to the biological process of pruning, expressing the probability of elimination as a decreasing function of the synapse's long-term contribution to free energy reduction:

P_{prune}(w_{ij}) = \sigma(-\beta \langle \Delta F_{ij} \rangle_t)

where \langle \Delta F_{ij} \rangle_t is the time-averaged contribution of synapse w_{ij} to minimizing F (indexed locally by its Fisher information), \sigma is a sigmoid function, and \beta is a sensitivity parameter. This formulation provides a normative basis for computational models of pruning rates (e.g., the observation that decreasing pruning rates are more efficient48) by linking the pruning decision to the global objective of optimizing the brain's generative model.49
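As a toy illustration of this pruning rule (the parameter \beta and the per-synapse contribution values are hypothetical), the sketch below maps a synapse's long-term free energy contribution to a pruning probability:

```python
import numpy as np

def prune_probability(delta_F, beta=5.0):
    """Probability that microglia tag a synapse for removal, as a sigmoid of its
    (negated) time-averaged contribution to free energy reduction.
    beta is an illustrative sensitivity parameter."""
    return 1.0 / (1.0 + np.exp(beta * delta_F))

# delta_F >> 0: the synapse reduces long-term free energy (informative) -> spared.
# delta_F ~ 0:  redundant (low Fisher information) -> pruned with high probability.
contributions = np.array([0.8, 0.1, -0.2])   # hypothetical per-synapse values
print(prune_probability(contributions))      # -> [~0.02, ~0.38, ~0.73]
```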
Oligodendrocytes: tuning temporal priors through myelin plasticity

Biological basis: Oligodendrocytes are the glial cells responsible for producing the myelin sheath that insulates axons in the CNS.13 Myelination enables saltatory conduction, dramatically increasing the conduction velocity (CV) of action potentials.50 Crucially, myelination is not a static, one-time event. It is a dynamic and plastic process, responsive to neuronal activity, that continues throughout life.33 This adaptive myelination allows the brain to fine-tune signal propagation delays along different axonal pathways, which is critical for functions that depend on precise temporal synchrony, such as motor learning and sensory processing.33
FEP interpretation: Predictive coding, a key process theory for active inference, relies on the precise temporal alignment of top-down predictions and bottom-up sensory signals. A prediction must arrive at the right time to effectively cancel out, or "explain away," the corresponding sensory evidence. A temporal mismatch between prediction and evidence is itself a form of prediction error, contributing to free energy. We propose that the computational role of oligodendrocyte-mediated myelin plasticity (OMP) is to optimize the temporal priors of the brain's generative model. By adjusting axonal conduction delays, the oligodendrocyte network ensures that predictions generated by the model arrive at the appropriate lower-level cortical areas at the correct time to minimize temporal prediction errors.
Mathematical formulation: We can directly incorporate the mathematical model of OMP developed by Pajevic and colleagues into the FEP framework.33 In their model, the conduction delay \tau_a of an axon a is a dynamic variable that evolves over time. The rate of change of the delay is governed by the interplay between a constant rate of myelin removal, which lengthens the delay, and a dynamic rate of myelin addition, which shortens it and is driven by a local myelin-promoting factor, M_a(t):

\dot{\tau}_a(t) = r_{rem} - r_{add} M_a(t)
The factor M_a(t) is produced in response to spiking activity on axon a, but its production is catalyzed by a global signaling factor G(t) within the oligodendrocyte. This global factor integrates spiking activity from all axons myelinated by that single oligodendrocyte, allowing the cell to compare timing across different pathways and adjust delays accordingly to promote synchrony.33
Within the FEP, we interpret this entire dynamic as a gradient descent on free energy with respect to the temporal parameters \tau = \{\tau_a\} of the generative model:

\dot{\tau}_a \propto -\partial F / \partial \tau_a

When signals arriving at a target neuron are consistently desynchronized, the resulting temporal prediction errors drive changes in neuronal activity. This activity, in turn, influences the OMP dynamics via the global signal G(t), leading to adjustments in the conduction delays until the temporal mismatch is resolved and the corresponding component of free energy is minimized. This casts myelin plasticity not just as a mechanism for synchrony, but as a crucial process for learning the temporal structure of the world.
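The sketch below integrates a simplified version of these delay dynamics (our Euler-integration reconstruction, with illustrative rate constants; the myelin-promoting factor M is supplied directly rather than derived from per-axon spiking and G(t)):

```python
import numpy as np

def simulate_delays(M, tau0, r_rem=0.01, r_add=0.02, dt=1.0):
    """Euler integration of the delay dynamics sketched above:
    d(tau_a)/dt = r_rem - r_add * M_a(t).
    M is a (steps, n_axons) array of myelin-promoting factor values, which in a
    Pajevic-style model would be driven by spiking gated by the oligodendrocyte's
    global signal G(t); here it is supplied directly for illustration."""
    tau = np.array(tau0, dtype=float)
    for m_t in M:
        tau += dt * (r_rem - r_add * m_t)
        tau = np.clip(tau, 0.1, None)   # delays stay positive
    return tau

# Two axons converging on one target: the axon with the stronger myelin-promoting
# drive speeds up (its delay shrinks) until arrival times can synchronize.
rng = np.random.default_rng(0)
M = np.stack([rng.uniform(0.4, 0.6, 500),     # weakly driven axon
              rng.uniform(0.8, 1.0, 500)]).T  # strongly driven axon
print(simulate_delays(M, tau0=[5.0, 5.0]))    # ~[5.0, 1.0]
```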
Tables 1 & 2 below summarize the conceptual and mathematical synthesis of this section.
| Glial cell type | Key biological function | Computational role in FEP |
| --- | --- | --- |
| Astrocytes | Modulation of synaptic transmission, plasticity, and neurovascular coupling via Ca2+ signaling and gliotransmitter release.13 | Precision regulation: dynamically setting the confidence (precision, \Pi) of prediction errors based on local synaptic and metabolic context. |
| Microglia | Activity-dependent removal (phagocytosis) of synapses (synaptic pruning).31 | Bayesian model selection: optimizing the network's structural priors (topology) by eliminating connections that do not contribute to minimizing long-term free energy. |
| Oligodendrocytes | Activity-dependent myelination of axons, tuning action potential conduction velocity and timing.33 | Temporal prior optimization: adjusting the temporal parameters of the generative model so that predictions and sensory evidence are integrated on the correct timescales. |

Table 1 Glial functions and their computational analogs in the FEP framework
| Glial cell type | Key state variable(s) | Mathematical influence on free energy (F) |
| --- | --- | --- |
| Astrocytes | Intracellular calcium, [Ca2+](t) | Modulates the precision term in the gradient descent \dot{\mu} = \Pi_s \varepsilon_s - \Pi_p \varepsilon_p, where \Pi_s(t) = \Pi_{max} / (1 + \exp(-k([Ca2+](t) - \theta))). |
| Microglia | Synaptic tag status (e.g., complement C1q/C3) | Determines the pruning probability P_{prune}(w_{ij}) = \sigma(-\beta \langle \Delta F_{ij} \rangle_t), where \langle \Delta F_{ij} \rangle_t is the long-term contribution of synapse w_{ij} to minimizing F. |
| Oligodendrocytes | Conduction delay, \tau_a(t) | Acts as a parameter of the generative model, optimized via gradient descent, \dot{\tau}_a \propto -\partial F / \partial \tau_a, implemented by the OMP equations. |

Table 2 Key mathematical formulations for glial modulation of free energy
The individual models of glial function, each operating as a distinct form of active inference, can be synthesized into a cohesive, unified architecture. This architecture provides a blueprint for an AI system that moves beyond the limitations of current neuron-centric designs, grounding its computations in the integrated, multi-timescale dynamics of the complete glial-neural ensemble.51
Architectural blueprint: a dynamic glial-neural graph network

A static, feed-forward neural network is insufficient to capture the complexity of the proposed system. The architecture is more accurately described as a dynamic, heterogeneous Graph Neural Network (GNN), a class of models designed to operate on graph-structured data where nodes and edges can have different types and can change over time.52
This structure formalizes the concept of a neuron-glia network, or Υ-graph, where interactions are not limited to pairwise connections but can involve tripartite or higher-order relationships.17 The necessity of such a complex architecture is a direct consequence of modeling the brain's biological reality. Standard deep learning frameworks, with their fixed, homogeneous layers, cannot represent the multiscale, adaptive, and heterogeneous nature of the glial-neural system. Implementing this model requires a new class of AI architecture that integrates the temporal precision of spiking neural networks (SNNs) with the structural flexibility of dynamic GNNs, as sketched below.
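As a deliberately minimal illustration of such a heterogeneous Υ-graph, the sketch below defines node and edge types for neurons and glia (all field names are ours, not drawn from the cited GNN literature):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str   # "neuron", "astrocyte", "microglia", or "oligodendrocyte"
    state: dict = field(default_factory=dict)    # e.g., {"mu": 0.0} or {"Ca": 0.1}

@dataclass
class Edge:
    src: int
    dst: int
    kind: str   # "synapse", "tripartite", "myelination", ...
    params: dict = field(default_factory=dict)   # e.g., {"w": 0.5, "tau": 3.0}

@dataclass
class GlialNeuralGraph:
    nodes: list[Node] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)

# A tripartite synapse: neuron -> neuron, with an astrocyte coupled to both
# partners; a microglial update could later delete the "synapse" edge, and an
# oligodendrocyte edge would carry and adapt the delay parameter tau.
g = GlialNeuralGraph()
g.nodes += [Node("neuron"), Node("neuron"), Node("astrocyte", {"Ca": 0.1})]
g.edges += [Edge(0, 1, "synapse", {"w": 0.5, "tau": 3.0}),
            Edge(2, 0, "tripartite"), Edge(2, 1, "tripartite")]
```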
In this glial-neural architecture, emotion is not a pre-programmed category or a label to be classified.54 Instead, affective states are emergent properties of the system's continuous, high-level inference about the causes of salient prediction errors, particularly those related to maintaining its own homeostatic integrity (interoception) and navigating its environment successfully (exteroception).
Glial cells are central to this generative process of affect. Astrocytes, by regulating local precision and metabolic support, directly shape the free energy landscape and thus the moment-to-moment valence of the system.12
Microglia and oligodendrocytes, by optimizing the model's structure and temporal parameters over longer timescales, determine the agent's long-term affective dispositions and its capacity to efficiently minimize free energy. This provides a compelling computational explanation for the observed link between glial abnormalities and mood disorders:14 these conditions can be understood as a failure of the glial-neural system's ability to effectively regulate its free energy, leading to chronic states of high surprise (negative valence) or high uncertainty (anxious arousal).
A practical implementation of this architecture would likely involve a multi-level, hybrid approach. The fast dynamics of neuronal inference could be simulated using established, energy-efficient SNN frameworks.55 The entire system, including all cell types and their interactions, would be represented as a GNN. The update rules for all nodes and edges would be derived from the gradient descent on the system's total free energy, with different parameters being updated on their respective, biologically-plausible timescales. This constitutes a form of multiscale variational inference. Training such a system would not involve standard backpropagation on a labeled dataset. Instead, the system would learn in an unsupervised or self-supervised manner, driven solely by the imperative to minimize the free energy of its "sensory" inputs over time, thereby learning a generative model of its environment through its embodied interactions.
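The runnable toy sketch below illustrates only the nesting of timescales described above; every update rule in it is a stand-in stub of our own devising, not the actual free energy gradients:

```python
import numpy as np

# Fast inference (neuronal), medium precision control (astrocytic), and slow
# structural change (microglial, oligodendroglial) nested in one unsupervised loop.
rng = np.random.default_rng(1)
mu, ca = 0.0, 0.1                     # fast belief state; astrocytic calcium
weights = rng.uniform(0.2, 1.0, 8)    # synaptic weights (slow)
delays = np.full(8, 5.0)              # conduction delays (slowest)

for t in range(100_000):
    s = rng.normal(1.0, 0.3)                           # "sensory" sample
    pi_s = 4.0 / (1.0 + np.exp(-8.0 * (ca - 0.5)))     # astrocyte-set precision
    mu += 0.01 * pi_s * (s - mu)                       # fast: perceptual inference
    ca += 0.001 * (abs(s - mu) - ca)                   # slower: precision control
    if t % 1_000 == 0:
        weights += 0.01 * (rng.uniform(size=8) - 0.5)  # stub synaptic learning
    if t % 10_000 == 0:
        weights[weights < 0.3] = 0.0                   # stub microglial pruning
        delays *= 0.99                                 # stub oligodendrocyte tuning

print(mu, ca, (weights > 0).sum(), delays[0])
```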
The theoretical framework presented in this paper offers significant implications for both neuroscience and artificial intelligence, charting a course for future research in both fields.
For neuroscience: This model provides a unified, normative theory of glial function, moving beyond a descriptive list of roles to a single, overarching computational purpose. It reframes the question from "what do glial cells do?" to "how do glial cells contribute to the brain's imperative to minimize free energy?". This perspective generates a host of specific, testable hypotheses. For example, it predicts that selectively inhibiting astrocytic calcium signaling in a learning paradigm should not necessarily block learning but should impair the animal's ability to flexibly adapt its learning rate to changes in environmental volatility (a deficit in precision regulation). It predicts that inhibiting microglial pruning during the consolidation of a new skill should result in a less efficient, "over-parameterized" neural representation, potentially leading to poorer generalization. Furthermore, it provides a formal, computational basis for linking the pathophysiology of glial cells to the cognitive and affective symptoms of complex disorders like depression, schizophrenia, and autism.14 These conditions can be re-conceptualized as disorders of inference, where specific deficits in glial-mediated precision, model selection, or temporal optimization lead to chronic, maladaptive states of high free energy.
For artificial intelligence: This work argues for a fundamental shift in the pursuit of artificial emotional intelligence. The path forward lies not in scaling up current architectures with more data, but in developing new architectures grounded in the biophysical principles of the glial-neural ensemble. It suggests that robust, flexible, and general intelligence may require systems that incorporate analogues of glial function: mechanisms for dynamic resource and confidence allocation (astrocyte-like), structural optimization and regularization (microglia-like), and temporal coordination (oligodendrocyte-like). Building such systems forces a confrontation with the deep ethical questions surrounding the creation of entities with genuine affective states and goal-directed behavior.5 An AI that minimizes its own free energy based on an internal model of the world would have its own intrinsic goals, a significant departure from the passive, objective-function-optimizing systems of today.
This paper has proposed a comprehensive, first-principles model of the glial-neural ensemble as a unified system dedicated to minimizing variational free energy. By deriving specific mathematical formulations for the computational roles of astrocytes, microglia, and oligodendrocytes within the active inference framework, we have laid the theoretical and architectural groundwork for a new generation of artificial intelligence. We have argued that astrocytes are not mere support cells but active regulators of inferential precision; that microglia are not simply cleaners but agents of Bayesian model selection; and that oligodendrocytes are not passive insulators but optimizers of the brain's temporal priors.
The synthesis of these roles into a dynamic, heterogeneous graph network provides a concrete blueprint for building AI systems capable of a deeper, more biophysically plausible form of emotional intelligence. In this model, affect is not an illusion created by clever pattern matching but an emergent consequence of a system's fundamental drive to make sense of its world and maintain its own existence. The future of truly affective AI lies not in the size of its datasets, but in the elegance and biological fidelity of its underlying generative model.55–60
Acknowledgments

None.
Funding

None.
Author declares that there are no conflicts of interest.
©2025 Szu, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and building upon the work non-commercially.