Posts on Nam Le

Navier–Stokes Existence and Smoothness

Fri, 29 May 2026 00:00:00 +0000

The motion of a viscous incompressible fluid is described by the Navier–Stokes equations, first written down by Claude-Louis Navier in 1822 and given their modern form by George Gabriel Stokes. Whether smooth solutions to these equations can always be continued for all time (or whether they can spontaneously develop a singularity at some finite time) is one of the deepest open problems in mathematics, and one of the seven Clay Millennium Prize Problems, carrying a one-million-dollar prize for a solution.

Problem (Clay Millennium Prize, Fefferman 2000)

Let $u_0 : \mathbb{R}^3 \to \mathbb{R}^3$ be a smooth divergence-free vector field. Does there exist a smooth solution $u(x,t)$, $p(x,t)$ to the 3D incompressible Navier–Stokes equations $$\partial_t u + (u \cdot \nabla)u - \nu\Delta u + \nabla p = 0, \qquad \nabla \cdot u = 0, \qquad u(\cdot,0) = u_0$$ defined for all $t > 0$ and satisfying $\int_{\mathbb{R}^3}|u(x,t)|^2\,dx < C$ for all $t \geq 0$? Either a proof of existence or a counterexample (a smooth $u_0$ for which no such smooth solution exists) qualifies for the prize.

The Equations and Their Scaling #

Compared to the Euler equations (which describe inviscid flow), the Navier–Stokes equations add the viscous term $\nu\Delta u$, where $\nu > 0$ is the kinematic viscosity. This term dissipates energy and regularises the flow locally. The central tension is that the nonlinear term $(u\cdot\nabla)u$ can concentrate energy at small spatial scales faster than viscosity can diffuse it away.

Scaling symmetry. The Navier–Stokes equations are invariant under the rescaling $$u(x,t) \mapsto \lambda u(\lambda x,\, \lambda^2 t), \qquad p(x,t) \mapsto \lambda^2 p(\lambda x,\, \lambda^2 t).$$ A norm is critical (or scale-invariant) if it is preserved by this rescaling. The critical norm among the spaces $L^p(\mathbb{R}^3)$ is $L^3$, since $\|\lambda u(\lambda\,\cdot)\|_{L^3} = \|u\|_{L^3}$. The energy norm $\|u\|_{L^2}$ is supercritical: it scales as $\lambda^{-1/2}\|u\|_{L^2}$, which shrinks as $\lambda \to \infty$ (i.e., as one zooms into small scales). This mismatch is the core of the difficulty: global energy control does not prevent concentration at arbitrarily small scales.
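The exponent bookkeeping behind the scaling symmetry can be checked mechanically. The sketch below is illustrative only (the function names are invented for this example): it computes the power of $\lambda$ picked up by an $L^p(\mathbb{R}^d)$ norm under $u \mapsto \lambda u(\lambda\,\cdot)$ and cross-checks it on a Gaussian profile, whose norms are known in closed form.

```python
import math
from fractions import Fraction


def lebesgue_scaling_exponent(p, d=3):
    """Return a such that ||lam * u(lam x)||_{L^p(R^d)} = lam**a * ||u||_{L^p(R^d)}.

    The change of variables y = lam * x gives a = 1 - d/p (p and d are integers here).
    """
    return 1 - Fraction(d, p)


def rescaled_gaussian_norm(lam, p, d=3):
    """L^p(R^d) norm of x -> lam * exp(-|lam x|^2), from the Gaussian integral."""
    return lam * (math.pi / (p * lam**2)) ** (d / (2 * p))


if __name__ == "__main__":
    lam = 4.0
    for p in (2, 3, 6):
        a = lebesgue_scaling_exponent(p)
        measured = rescaled_gaussian_norm(lam, p) / rescaled_gaussian_norm(1.0, p)
        print(f"L^{p}: exponent {a}, measured ratio {measured:.4f}, predicted {lam ** float(a):.4f}")
```

In three dimensions the exponent is zero exactly at $p = 3$ and negative at $p = 2$, while calling the first function with `d=2` shows that the energy norm becomes scale-invariant in the plane.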

2D global regularity. In two dimensions the scaling is different: the energy $\|u\|_{L^2}^2$ is itself scale-invariant, and the enstrophy $\|\nabla u\|_{L^2}^2$ is non-increasing in time because there is no vortex stretching. Global regularity in 2D follows from these two estimates, a fact known since the 1960s. In 3D no critical quantity is known to be controlled globally, and the problem is open.

The Hierarchy of Known Results #

Leray–Hopf Weak Solutions (1934) #

Theorem (Leray 1934, Hopf 1951)

For any $u_0 \in L^2(\mathbb{R}^3)$ divergence-free, there exists a global weak solution $u \in L^\infty(0,\infty;\, L^2) \cap L^2(0,\infty;\, \dot{H}^1)$ satisfying the energy inequality $$\|u(t)\|_{L^2}^2 + 2\nu\int_0^t \|\nabla u\|_{L^2}^2\, ds \leq \|u_0\|_{L^2}^2.$$

Leray’s construction, via a compactness argument on regularised equations, produces a solution that is globally defined but potentially not smooth. The term “weak” refers to the fact that the equations are satisfied only in an integral (distributional) sense, not pointwise. The energy inequality is the only bound available globally. Whether Leray–Hopf solutions are unique, and whether they stay smooth for all time when the initial data is smooth, is unknown; what is known (weak-strong uniqueness) is that a Leray–Hopf solution coincides with the smooth solution for as long as the latter exists.
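The energy inequality is the rigorous remnant of a formal identity. As a sketch, assume $u$ is smooth and decays fast enough at infinity to justify the integrations by parts (which is exactly what is not known for a weak solution), and pair the momentum equation with $u$:

$$\begin{aligned} \frac{1}{2}\frac{d}{dt}\|u\|_{L^2}^2 &= -\int_{\mathbb{R}^3} u\cdot(u\cdot\nabla)u\,dx + \nu\int_{\mathbb{R}^3} u\cdot\Delta u\,dx - \int_{\mathbb{R}^3} u\cdot\nabla p\,dx \\ &= \frac{1}{2}\int_{\mathbb{R}^3} (\nabla\cdot u)\,|u|^2\,dx - \nu\|\nabla u\|_{L^2}^2 + \int_{\mathbb{R}^3} p\,(\nabla\cdot u)\,dx = -\nu\|\nabla u\|_{L^2}^2. \end{aligned}$$

The first and last integrals vanish because $\nabla\cdot u = 0$. Integrating in time gives equality for smooth solutions; in the passage to the limit of regularised solutions only weak lower semicontinuity of the norms is available, which is why the statement for weak solutions is an inequality.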

Partial Regularity: The CKN Theorem #

The best known result limiting the size of potential singularities is the following.

Theorem (Caffarelli–Kohn–Nirenberg, 1982)

For any suitable weak solution to the 3D Navier–Stokes equations, the set of space-time singular points has one-dimensional parabolic Hausdorff measure zero, and hence parabolic Hausdorff dimension at most 1. In particular, at any given time the spatial singular set has Hausdorff dimension at most $1$.

A “suitable weak solution” is a weak solution satisfying a local energy inequality. The CKN theorem proves that singularities, if they exist, cannot fill a curve or surface: they can occupy at most a set of dimension one in space-time. This is the most quantitative partial regularity result available and was simplified by Lin (1998). Scheffer (1977) had earlier shown singular times have Hausdorff dimension at most $\dfrac{1}{2}$.

Conditional Regularity: Ladyzhenskaya–Prodi–Serrin #

Theorem (Ladyzhenskaya 1967, Prodi 1959, Serrin 1962)

If a weak solution additionally satisfies $u \in L^r(0,T;\, L^s(\mathbb{R}^3))$ with $\dfrac{2}{r} + \dfrac{3}{s} = 1$ and $3 < s \leq \infty$, then $u$ is smooth on $(0,T]$.

The condition $\dfrac{2}{r} + \dfrac{3}{s} = 1$ is precisely the scale-invariant line in the $(r,s)$ plane, and membership in any of these spaces implies regularity. The family runs from $(r,s)=(2,\infty)$ ($L^\infty$ control in space, square-integrable in time) towards the endpoint $(r,s)=(\infty, 3)$ (critical $L^3$ control in space, uniform in time), which the condition $s > 3$ excludes and which is treated separately below. These are conditional results: they do not prove that a weak solution lies in such a space, only that if it does, it must be smooth.
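To see why this line is the scale-invariant one, note that under $u_\lambda(x,t) = \lambda u(\lambda x, \lambda^2 t)$ the mixed norm $L^r_t L^s_x$ picks up the factor $\lambda^{1 - 2/r - 3/s}$. The following sketch (helper names are hypothetical, chosen for this illustration) tabulates that exponent exactly and solves for the time exponent paired with a given $s$.

```python
import math
from fractions import Fraction


def _inverse(q):
    """Return 1/q as an exact fraction, with the convention 1/inf = 0."""
    return Fraction(0) if q == math.inf else 1 / Fraction(q)


def mixed_norm_scaling_exponent(r, s):
    """Return a with ||u_lam||_{L^r_t L^s_x} = lam**a * ||u||_{L^r_t L^s_x},
    where u_lam(x, t) = lam * u(lam x, lam^2 t) on R^3 x (0, inf)."""
    return 1 - 2 * _inverse(r) - 3 * _inverse(s)


def serrin_time_exponent(s):
    """Return the r solving 2/r + 3/s = 1 for a spatial exponent s > 3."""
    if not s > 3:
        raise ValueError("the Ladyzhenskaya-Prodi-Serrin range requires s > 3")
    return 2 / (1 - 3 * _inverse(s))


if __name__ == "__main__":
    for s in (4, 6, 12, math.inf):
        r = serrin_time_exponent(s)
        print(f"s = {s}: r = {r}, exponent = {mixed_norm_scaling_exponent(r, s)}")
    # The two bounds supplied by the energy inequality sit strictly below the line:
    print(mixed_norm_scaling_exponent(math.inf, 2))  # L^inf_t L^2_x
    print(mixed_norm_scaling_exponent(2, 6))         # L^2_t L^6_x, from L^2_t H^1_x by Sobolev embedding
```

Both energy-level spaces come out with exponent $-1/2$, the same deficit as the energy norm itself.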

The Critical Endpoint: Escauriaza–Seregin–Šverák #

Theorem (Escauriaza–Seregin–Šverák, 2003)

If $u$ is a Leray–Hopf weak solution with $\sup_{t \in [0,T^*)} \|u(\cdot,t)\|_{L^3(\mathbb{R}^3)} < \infty$, then $u$ can be extended as a smooth solution past $T^*$.

The endpoint case $s=3$ of the LPS family is the hardest one: the norm of $L^\infty_t L^3_x$ is scale-invariant like the rest of the family, but it cannot be made small by shrinking the time interval. The ESS proof is substantially harder than the cases $s > 3$; it argues by contradiction, using rescaling and a compactness argument to produce a limiting solution that vanishes at the final time, and then invokes unique continuation and backward uniqueness theorems for parabolic equations to show that this limit is trivial.

Tao’s Quantitative Criterion #

Theorem (Tao, 2019)

If a smooth finite-energy solution first becomes singular at time $T^*$, then $$\limsup_{t \uparrow T^*} \dfrac{\|u(\cdot,t)\|_{L^3(\mathbb{R}^3)}}{\bigl(\log\log\log\tfrac{1}{T^*-t}\bigr)^c} = \infty$$ for some absolute constant $c>0$. In particular, the critical $L^3$ norm must blow up at least as fast as a power of a triple logarithm in $(T^*-t)^{-1}$.

Tao’s result is a slightly supercritical regularity criterion for Navier–Stokes: it gives quantitative information about the blowup rate that goes (by a triple logarithm) beyond what scaling alone can detect. The proof makes the ESS argument quantitative, replacing each use of compactness by an explicit quantitative substitute and each use of unique continuation or backward uniqueness by the corresponding Carleman inequality, and propagates lower bounds for the vorticity across scales. The triple-exponential dependence in Tao’s bound has since been localised and sharpened by Barker–Prange (2021) and others.
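To get a feeling for how weak a triple-logarithmic lower bound is, one can simply evaluate it. This is a minimal numerical sketch, assuming the gap to the blowup time is written as $T^*-t = 10^{-k}$ (the parametrisation and the function name are choices made for this example, and the unknown constant $c$ is left out).

```python
import math


def triple_log_of_inverse_gap(k):
    """Return log log log(1/(T* - t)) when T* - t = 10**(-k), for k >= 1.

    Uses log(1/(T* - t)) = k * ln(10) so the tiny number 10**(-k) is never formed.
    """
    first = k * math.log(10.0)      # log(1/(T* - t))
    return math.log(math.log(first))


if __name__ == "__main__":
    for k in (10, 100, 10**6, 10**100):
        print(f"T* - t = 1e-{k}: triple log = {triple_log_of_inverse_gap(k):.3f}")
```

Even at a distance of $10^{-10^{100}}$ from the blowup time the triple logarithm is still below 6, which is why improving the rate is a natural target.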

The Supercriticality Problem #

The fundamental analytical obstruction is that Navier–Stokes is supercritical with respect to the only globally controlled quantity, the energy (the $L^2$ norm).

Define the critical regularity index as the Sobolev exponent $s$ such that $\dot{H}^s(\mathbb{R}^3)$ is scale-invariant. For Navier–Stokes, $s = 1/2$. The energy controls only $\dot{H}^0 = L^2$, which lies below the critical index (supercritical), whereas regularity theory requires control of a critical norm such as $\dot{H}^{1/2}$ or $L^3$, or of a subcritical one such as $\dot{H}^1$. There is a regularity gap between what is globally available ($L^2$) and what is needed ($L^3$ or $\dot{H}^{1/2}$). Every known approach to closing this gap runs into the same obstruction: the nonlinearity can create structure at arbitrarily small scales that the supercritical $L^2$ bound cannot see.
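The critical index can be read off from a one-line Fourier computation: for $u_\lambda = \lambda u(\lambda\,\cdot)$ on $\mathbb{R}^d$ one finds $\|u_\lambda\|_{\dot{H}^s} = \lambda^{1+s-d/2}\|u\|_{\dot{H}^s}$. A small sketch of that bookkeeping follows (function names are made up for illustration).

```python
from fractions import Fraction


def sobolev_scaling_exponent(s, d=3):
    """Return a with ||lam * u(lam x)||_{Hdot^s(R^d)} = lam**a * ||u||_{Hdot^s(R^d)}."""
    return 1 + Fraction(s) - Fraction(d, 2)


def critical_sobolev_index(d=3):
    """Return the s for which the homogeneous Sobolev norm is scale-invariant."""
    return Fraction(d, 2) - 1


if __name__ == "__main__":
    print("critical index in 3D:", critical_sobolev_index(3))
    for s in (0, Fraction(1, 2), 1):
        print(f"Hdot^{s}: exponent {sobolev_scaling_exponent(s)}")
```

A negative exponent means the norm loses strength at small scales (as for the energy, $s = 0$), zero means critical, and a positive exponent means the norm controls fine scales.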

Tao (2016) made this gap precise by constructing an averaged Navier–Stokes system, where the bilinear nonlinearity $(u\cdot\nabla)u$ is replaced by a carefully designed average of rotated and Fourier-multiplied copies of itself that still obeys the energy identity, for which finite-time blowup can be rigorously proved. This construction does not produce a counterexample to the true Navier–Stokes equations, but it demonstrates that the specific algebraic structure of the nonlinearity is load-bearing: any proof of global regularity must use something specific about $(u\cdot\nabla)u$ that is not shared by its averages.

Research Directions #

1. Improving the Quantitative Blowup Rate #

Tao’s triple-logarithmic rate is the sharpest known lower bound on blowup of the critical $L^3$ norm. Scaling considerations suggest that the true rate, if blowup occurs, should be much faster; conjecturally $\|u\|_{L^3} \sim (T^*-t)^{-\delta}$ for some $\delta > 0$, analogous to Type I blowup in nonlinear heat equations. The gap between the triple-logarithmic lower bound and the conjectured power-law rate represents the frontier of quantitative regularity theory. Closing even part of this gap, for instance establishing a single-logarithmic or power-of-log lower bound, would require new ideas beyond Carleman estimates.

2. Type I vs. Type II Blowup #

A blowup is called Type I if $\|u(\cdot,t)\|_{L^\infty}$ grows no faster than $O((T^*-t)^{-1/2})$ near $T^*$, the rate dictated by the scaling symmetry. It is Type II otherwise. For the Navier–Stokes equations, ruling out Type I blowup would be a significant advance: all self-similar singularities (where $u(x,t) = (T^*-t)^{-1/2}U(x/(T^*-t)^{1/2})$) are of Type I, and several results (including work of Růžička and Seregin) already rule them out under mild additional assumptions. Whether all Type I blowup can be excluded, leaving only the less structured Type II, is open.

3. Uniqueness of Weak Solutions #

Leray–Hopf weak solutions exist globally, but they may not be unique. This is a separate, equally deep question: even if all smooth solutions extend globally, one must still ask whether rough initial data in $L^2$ determines its Leray–Hopf solution uniquely (for smooth data, weak-strong uniqueness already identifies every Leray–Hopf solution with the smooth one for as long as the latter exists). Recent work of Buckmaster and Vicol (2019) showed that weak solutions below the Ladyzhenskaya–Prodi–Serrin threshold, and without the energy inequality, are indeed non-unique, using convex integration techniques developed for the Euler equations (De Lellis–Székelyhidi). Whether Leray–Hopf solutions with the energy inequality are unique is still open and is perhaps the central problem in the weak solution theory.

4. Self-Similar and Discretely Self-Similar Solutions #

Self-similar solutions of the form $u(x,t) = (T^*-t)^{-1/2} U(x/(T^*-t)^{1/2})$ satisfy a nonlinear elliptic system for the profile $U$. Several non-existence theorems show that backward self-similar solutions with certain integrability must be trivial (Nečas–Růžička–Šverák, 1996). The case of discretely self-similar solutions, where $u(x,t) = \lambda u(\lambda x, \lambda^2 t)$ for a fixed $\lambda \neq 1$ (with the blowup point placed at the space-time origin), is less understood and was recently revisited. Whether the set of self-similar profiles that could appear as blowup limits is empty is not known.
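To make the profile system concrete, here is the computation under the ansatz above, written with $\tau = T^*-t$ and $y = x/\sqrt{\tau}$. The pressure ansatz $p = \tau^{-1}P(y)$ is an assumption added here; it is the one compatible with the scaling $p \mapsto \lambda^2 p$. Term by term,

$$\partial_t u = \tau^{-3/2}\Bigl(\tfrac{1}{2}U + \tfrac{1}{2}(y\cdot\nabla)U\Bigr), \qquad (u\cdot\nabla)u = \tau^{-3/2}(U\cdot\nabla)U, \qquad \Delta u = \tau^{-3/2}\Delta U, \qquad \nabla p = \tau^{-3/2}\nabla P,$$

so the common factor $\tau^{-3/2}$ drops out and the profile must solve

$$-\nu\Delta U + \tfrac{1}{2}U + \tfrac{1}{2}(y\cdot\nabla)U + (U\cdot\nabla)U + \nabla P = 0, \qquad \nabla\cdot U = 0.$$

The same change of variables gives $\|u(\cdot,t)\|_{L^\infty} = \tau^{-1/2}\|U\|_{L^\infty}$ and $\|u(\cdot,t)\|_{L^3} = \|U\|_{L^3}$: a self-similar blowup would be of Type I and would keep the critical $L^3$ norm constant in time.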

5. Computer-Assisted Proofs via Rigorous Numerics #

The Chen–Hou approach to Euler singularities (2025) used a computer-assisted proof framework: construct a numerical approximate profile, then verify its stability rigorously using interval arithmetic. For Navier–Stokes the presence of viscosity complicates such an approach (the profile is dissipated rather than transported), but the same framework (dynamical rescaling plus nonlinear stability verification) might in principle detect or rule out singularities in specific axi-symmetric geometries. Applying and adapting the Hou group’s methods to the viscous problem is an active direction.

6. The Zero-Viscosity Limit and Euler–Navier–Stokes Connection #

As $\nu \to 0$, Navier–Stokes formally converges to Euler. The precise relationship is subtle: in the presence of boundaries (Prandtl layers) or after a potential Euler singularity, the zero-viscosity limit can fail to hold in strong norms. If Euler develops a finite-time singularity at time $T^*_E$ from smooth data (as Chen–Hou suggest for bounded domains), then for small $\nu$ the Navier–Stokes solution must either also develop a near-singularity or be regularised by viscosity before $T^*_E$. Whether viscosity is always sufficient to regularise an Euler singularity, or whether a Navier–Stokes singularity can arise from a nearby Euler one, is entirely open.

References #

Fefferman, C. L. (2000). Existence and smoothness of the Navier–Stokes equation. Clay Mathematics Institute Millennium Prize Problems. https://www.claymath.org/wp-content/uploads/2022/06/navierstokes.pdf
Leray, J. (1934). Sur le mouvement d’un liquide visqueux emplissant l’espace. Acta Mathematica, 63, 193–248.
Hopf, E. (1951). Über die Anfangswertaufgabe für die hydrodynamischen Grundgleichungen. Mathematische Nachrichten, 4(1–6), 213–231.
Caffarelli, L., Kohn, R., & Nirenberg, L. (1982). Partial regularity of suitable weak solutions of the Navier–Stokes equations. Communications on Pure and Applied Mathematics, 35(6), 771–831.
Ladyzhenskaya, O. A. (1967). On uniqueness and smoothness of generalized solutions to the Navier–Stokes equations. Zapiski Nauchnykh Seminarov LOMI, 5, 169–185.
Escauriaza, L., Seregin, G. A., & Šverák, V. (2003). $L_{3,\infty}$-solutions of the Navier–Stokes equations and backward uniqueness. Russian Mathematical Surveys, 58(2), 211–250.
Tao, T. (2019). Quantitative bounds for critically bounded solutions to the Navier–Stokes equations. arXiv:1908.04958. Published in Nine Mathematical Challenges, AMS, 2021, pp. 149–193.
Tao, T. (2016). Finite time blowup for an averaged three-dimensional Navier–Stokes equation. Journal of the American Mathematical Society, 29(3), 601–674.
Buckmaster, T. & Vicol, V. (2019). Nonuniqueness of weak solutions to the Navier–Stokes equation. Annals of Mathematics, 189(1), 101–144.
Barker, T. & Prange, C. (2021). Localized quantitative estimates and potential blow-up rates for the Navier–Stokes equations. Communications in Mathematical Physics, 385, 717–792.

Navier–Stokes Regularity: The Uniqueness of Weak Solutions

Fri, 29 May 2026 00:00:00 +0000

The companion post on Navier–Stokes existence and smoothness asked whether smooth solutions can break down in finite time. This post asks the opposite question: when a solution is only weakly defined, satisfying the equations in an integral sense rather than pointwise, is it uniquely determined by its initial data? The answer, developed over the last two decades through a dramatic series of results, is a resounding no in many regimes. The frontier is now whether the physically natural class of Leray–Hopf weak solutions retains uniqueness.

Question (Weak Uniqueness)

Are Leray–Hopf weak solutions of the 3D incompressible Navier–Stokes equations $$\partial_t u + (u\cdot\nabla)u - \nu\Delta u + \nabla p = 0, \qquad \nabla\cdot u = 0$$ uniquely determined by their initial data $u_0 \in L^2(\mathbb{R}^3)$?

The question is one of the most urgent open problems in the PDE theory of fluid dynamics. It is logically independent of the blowup question: Leray–Hopf solutions exist globally for all time regardless of whether smooth solutions break down. What is not known is whether two Leray–Hopf solutions started from the same data must coincide.

Nash’s h-Principle: The Conceptual Ancestor #

The story begins not in fluid mechanics but in differential geometry. In 1954, John Nash proved that any Riemannian manifold admits a $C^1$ isometric embedding into Euclidean space, a result that contradicted the expectation, based on the rigid behaviour of $C^2$ embeddings (Cauchy), that the metric should impose strong constraints. The key insight is that $C^1$ embeddings are flexible: one can deform them by adding high-frequency oscillations that are invisible at the large scale but locally produce any prescribed metric tensor.

Gromov formulated this phenomenon as the h-principle: for certain underdetermined differential relations, the topological (homotopy-theoretic) obstructions are the only ones, and any formal solution can be deformed into an actual solution. The h-principle is a flexibility result: it says geometry is surprisingly unconstrained below a critical regularity threshold.

De Lellis and Székelyhidi recognised in the mid-2000s that the incompressible Euler equations are formally analogous to Nash’s embedding problem. The Euler system is underdetermined (more unknowns than equations), and one can attempt to construct wild solutions by adding high-frequency oscillations. The crucial observation is that the nonlinearity $u\otimes u$ in the Reynolds stress tensor plays the role of the metric tensor in Nash’s problem.

Wild Euler Solutions #

The first step was to show that the Euler equations possess infinitely many weak solutions for given initial data.

Theorem (De Lellis–Székelyhidi, 2009–2013)

For any divergence-free $u_0 \in L^2(\mathbb{T}^3)$ and any prescribed energy profile $e(t) \in C^\infty([0,T])$ with $e(t) > \|u_0\|_{L^2}^2$ for all $t > 0$, there exist infinitely many weak solutions $u \in C_t^0 L_x^2$ of the 3D Euler equations with $u(\cdot,0) = u_0$ and $\|u(\cdot,t)\|_{L^2}^2 = e(t)$.

In particular, the Euler equations admit weak solutions that spontaneously gain or lose kinetic energy for no reason: wild solutions. The construction proceeds by convex integration: one builds the solution iteratively, at each stage adding a high-frequency perturbation (a Beltrami wave) that corrects the error in the momentum equation while staying nearly invisible in the velocity field.
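The reason a fast oscillation can correct an error while staying nearly invisible is already present in one dimension. The following is a toy illustration only, not the actual convex integration scheme: for $w(x) = a\cos(kx)$ the mean of $w$ vanishes and its antiderivative is of size $a/k$, while the quadratic quantity $w^2$ has mean $a^2/2$ independently of $k$. All names below are invented for the example.

```python
import math


def oscillation_means(amplitude, k, n_samples=4096):
    """Return (mean of w, mean of w**2) over one period for w(x) = amplitude*cos(k*x).

    Uses n_samples equispaced points on [0, 2*pi); k is assumed to be a positive
    integer with 2*k not a multiple of n_samples, so the discrete averages are exact
    up to rounding.
    """
    xs = [2.0 * math.pi * j / n_samples for j in range(n_samples)]
    w = [amplitude * math.cos(k * x) for x in xs]
    mean_w = sum(w) / n_samples
    mean_w_squared = sum(v * v for v in w) / n_samples
    return mean_w, mean_w_squared


def antiderivative_size(amplitude, k):
    """Sup norm of the mean-zero antiderivative amplitude*sin(k*x)/k."""
    return amplitude / k


if __name__ == "__main__":
    for k in (1, 16, 256):
        mean_w, mean_w2 = oscillation_means(0.5, k)
        print(f"k={k}: mean(w)={mean_w:.1e}, mean(w^2)={mean_w2:.4f}, "
              f"antiderivative size={antiderivative_size(0.5, k):.4f}")
```

In the real scheme the low-frequency part of the quadratic term $w\otimes w$ is what cancels the accumulated stress error, while the smallness of $w$ in weak norms keeps the perturbation from disturbing what was built at earlier stages.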

Earlier, Scheffer (1993) and Shnirelman (1997) had shown the existence of weak Euler solutions with compact support in space-time (the fluid is at rest, then spontaneously moves, then returns to rest), but their constructions were indirect. De Lellis and Székelyhidi’s convex integration scheme gave the first systematic and quantitative approach.

Onsager’s Conjecture #

The De Lellis–Székelyhidi results raise an immediate question: at what regularity does the fluid behaviour transition from flexible (wild, non-unique) to rigid (energy-conserving, unique)? This is precisely what Lars Onsager conjectured in 1949.

Onsager's Conjecture (1949)

For the 3D incompressible Euler equations, the threshold regularity for energy conservation is the Hölder exponent $1/3$:

(a) Every weak solution $u \in C^{0,\alpha}$ with $\alpha > 1/3$ conserves kinetic energy.
(b) For every $\alpha < 1/3$, there exist weak solutions in $C^{0,\alpha}$ that dissipate energy.

The positive direction (conservation above $1/3$) was proved by Constantin–E–Titi (1994). The negative direction (dissipation possible below $1/3$) required much more work and was fully resolved only recently.
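A short scaling sketch shows where the exponent $1/3$ comes from. It follows the spirit of the commutator estimate behind the positive direction rather than reproducing the proof, and the notation $u_\ell$ for a mollification of $u$ at length scale $\ell$ is introduced here. Mollifying the Euler equations gives

$$\partial_t u_\ell + \nabla\cdot(u_\ell\otimes u_\ell) + \nabla p_\ell = -\nabla\cdot R_\ell, \qquad R_\ell = (u\otimes u)_\ell - u_\ell\otimes u_\ell,$$

and pairing with $u_\ell$ yields the energy balance at scale $\ell$:

$$\frac{d}{dt}\,\frac{1}{2}\int |u_\ell|^2\,dx = \int \nabla u_\ell : R_\ell\,dx.$$

If $u \in C^{0,\alpha}$ then $|\nabla u_\ell| \lesssim \ell^{\alpha-1}$ and $|R_\ell| \lesssim \ell^{2\alpha}$, so the right-hand side is $O(\ell^{3\alpha-1})$. It tends to zero as $\ell \to 0$ exactly when $\alpha > 1/3$, which forces energy conservation; for $\alpha < 1/3$ the estimate no longer excludes a nonzero energy flux.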

Theorem (Isett, 2018)

For every $\alpha < 1/3$ there exist weak solutions $u \in C^{0,\alpha}(\mathbb{T}^3\times[0,T])$ of the 3D Euler equations that fail to conserve kinetic energy.

Isett’s proof, published in the Annals of Mathematics in 2018, was the culmination of a decade of refinements of the De Lellis–Székelyhidi scheme. The key difficulty in reaching every exponent below $1/3$ is that the high-frequency perturbations must be sized to cancel the Reynolds stress error while staying in $C^{1/3-}$; Isett achieved this using the Mikado flows of Daneri–Székelyhidi together with a new gluing approximation technique. Buckmaster, De Lellis, Székelyhidi, and Vicol subsequently obtained solutions in $C^{1/3-}$ attaining any prescribed energy profile, including strictly dissipative ones. Onsager’s conjecture is now a theorem.

Viscous Non-Uniqueness: Buckmaster–Vicol #

Adapting the convex integration scheme from Euler to Navier–Stokes requires overcoming the viscous term $\nu\Delta u$, which smooths out high-frequency oscillations. Buckmaster and Vicol introduced intermittent Beltrami waves for this purpose: building blocks that concentrate their energy on sparse spatial sets, which reduces their interaction with the Laplacian. This intermittency is what brings convex integration into the viscous setting.

Theorem (Buckmaster–Vicol, 2019)

There exist infinitely many weak solutions $u \in C_t^0 L_x^2(\mathbb{T}^3)$ of the 3D Navier–Stokes equations that do not satisfy the global energy inequality and are not known to have the $L^2_t H^1_x$ regularity of Leray–Hopf solutions. In particular, weak solutions of 3D Navier–Stokes are not unique in the class $C_t^0 L_x^2$.

The Buckmaster–Vicol solutions, published in the Annals of Mathematics 189 (2019), 101–144, are weak in both the PDE sense and the energy sense: they satisfy the equations distributionally and have finite kinetic energy, but they can gain energy spontaneously, violating the natural dissipation law $\partial_t\|u\|_{L^2}^2 \leq -2\nu\|\nabla u\|_{L^2}^2$.

This non-uniqueness is striking but also limited: the Buckmaster–Vicol solutions are not Leray–Hopf solutions, because Leray–Hopf solutions are required to satisfy the energy inequality $\|u(t)\|_{L^2}^2 + 2\nu\int_0^t \|\nabla u\|_{L^2}^2\,ds \leq \|u_0\|_{L^2}^2$. Whether this single additional constraint, that energy does not increase, suffices to restore uniqueness is the open question.

Crossing the Energy Barrier: Albritton–Brué–Colombo #

The energy inequality distinguishing Leray–Hopf solutions from Buckmaster–Vicol wild solutions seemed for a long time to be a genuine barrier to non-uniqueness. The following result crossed this barrier, but required introducing an external force.

Theorem (Albritton–Brué–Colombo, 2022)

There exists a body force $f \in L^1(0,T;\, L^2(\mathbb{R}^3))$ and two distinct Leray–Hopf weak solutions of the forced 3D Navier–Stokes equations $\partial_t u + (u\cdot\nabla)u - \nu\Delta u + \nabla p = f$ with the same initial data $u_0 \equiv 0$ and the same force $f$.

Published in the Annals of Mathematics 196 (2022), 415–455, the proof uses a completely different mechanism from convex integration. The key ingredient is an unstable background solution: using Vishik’s construction of spectrally unstable steady states of the 2D Euler equations, Albritton–Brué–Colombo lift a 2D unstable vortex to an axisymmetric 3D vortex ring and embed it into the Navier–Stokes flow via a self-similar change of variables. The force $f$ is chosen precisely to make this background exactly solve the forced equations; the instability then allows two different solutions to branch from the same initial data.

The force is singular; it belongs to $L^1_t L^2_x$ but is not smooth, and is concentrated near the initial time $t=0$. Whether the same non-uniqueness can be achieved with a smooth or zero force is the remaining open problem.

The Unforced Case: Current Frontier #

Non-uniqueness of Leray–Hopf solutions for the unforced Navier–Stokes equations remains open. The route to the unforced case requires finding a self-similar background profile that solves the unforced equations exactly and has an unstable eigenvalue, a far more demanding task than the forced case, where the profile can be any divergence-free function.

Open Problem (Jia–Šverák Programme)

Do there exist two distinct Leray–Hopf solutions of the 3D Navier–Stokes equations with the same initial data and no external force?

Jia and Šverák (2013–2014) showed that non-uniqueness would follow from a spectral assumption: if there exists a forward self-similar Navier–Stokes solution whose linearised operator has an eigenvalue with positive real part, then Leray–Hopf solutions are non-unique. Guillod and Šverák (2017) provided compelling numerical evidence that such an unstable self-similar profile exists.
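The mechanism can be sketched in similarity variables. This is a heuristic outline under simplifying assumptions, not the rigorous argument; the symbols $\bar U$, $\phi$, $\psi$ and $\mathcal{L}$ are notation introduced for this sketch. Writing $u(x,t) = t^{-1/2}U(y,\tau)$ with $y = x/\sqrt{t}$ and $\tau = \log t$, the Navier–Stokes equations become

$$\partial_\tau U - \tfrac{1}{2}U - \tfrac{1}{2}(y\cdot\nabla)U + (U\cdot\nabla)U - \nu\Delta U + \nabla P = 0, \qquad \nabla\cdot U = 0,$$

and a forward self-similar solution is a steady state $\bar U(y)$ of this system. Linearising around it, a perturbation $\phi$ evolves by $\partial_\tau\phi = \mathcal{L}\phi$, and an eigenfunction $\psi$ with eigenvalue $\lambda$, $\operatorname{Re}\lambda > 0$, produces

$$\phi(y,\tau) = e^{\lambda\tau}\psi(y) = t^{\lambda}\psi(y) \longrightarrow 0 \quad \text{as } t \to 0^+ .$$

The initial time $t = 0$ corresponds to $\tau = -\infty$, so a trajectory leaving $\bar U$ along the unstable direction has the same initial data as $\bar U$ itself but differs from it for every $t > 0$. The work in the rigorous theory lies in upgrading this linear picture to a genuine nonlinear solution with finite energy.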

In September 2025, Giri and Kwon posted a preprint (arXiv:2509.25116) claiming a computer-assisted proof of the existence of an unstable self-similar profile for the unforced equations, which, via the Jia–Šverák mechanism, would establish non-uniqueness of Leray–Hopf solutions. The proof uses rigorous interval arithmetic to verify the existence of an unstable eigenvalue. As of this writing the preprint is under review by the community.

The Regularity Threshold #

The accumulated results suggest the following picture of the flexibility-rigidity dichotomy for the Euler and Navier–Stokes equations.

| Regularity class | Euler | Navier–Stokes |
| --- | --- | --- |
| $C^{0,\alpha}$, $\alpha < 1/3$ | non-unique, dissipative (Isett 2018) | n/a |
| $C^{0,\alpha}$, $\alpha > 1/3$ | energy-conserving (Constantin–E–Titi 1994) | n/a |
| $L^2$ (global energy inequality) | non-unique | open (unforced); non-unique with forcing (ABC 2022) |
| $L^\infty_t L^3_x$ (LPS regularity) | n/a | unique and smooth (ESS 2003) |

The Leray–Hopf class sits precisely at the boundary where uniqueness is expected to break down but has not yet been proved to do so in the unforced case.

Research Directions #

1. Resolving the Jia–Šverák Spectral Condition #

The most direct path to unforced Leray–Hopf non-uniqueness is to rigorously confirm or refute the spectral condition of Jia–Šverák: find (or prove the nonexistence of) a forward self-similar Navier–Stokes profile with an unstable linearised eigenvalue. The 2025 Giri–Kwon computer-assisted preprint claims this is now done. If confirmed, the consequence is striking: Leray’s 1934 existence theorem cannot be supplemented by uniqueness, and the Navier–Stokes Cauchy problem is ill-posed in the Leray–Hopf class.

2. Selection Principles and Physical Solutions #

If Leray–Hopf solutions are indeed non-unique, a fundamental question becomes which solution is the physically correct one, the one observed in experiments and computed in simulations. Several selection criteria have been proposed: the vanishing viscosity limit of the Navier–Stokes solution as $\nu\to 0$ from above, entropy conditions analogous to those for hyperbolic conservation laws, and renormalisation group or statistical ensemble approaches motivated by turbulence theory. None of these has been rigorously validated as a selection criterion that distinguishes a unique Leray–Hopf solution from the others.

3. Sharp Regularity Thresholds for Navier–Stokes #

For Euler, Onsager’s conjecture identifies $C^{1/3}$ as the sharp regularity threshold for energy conservation. What is the analogous threshold for Navier–Stokes? The Buckmaster–Vicol solutions are in $C_t^0 L_x^2$ (very rough), while the Ladyzhenskaya–Prodi–Serrin class gives uniqueness. The precise exponent at which uniqueness breaks down, if it does, is not known. Determining the sharp Sobolev or Hölder regularity threshold for Navier–Stokes uniqueness, analogous to Onsager’s $1/3$, is a central open problem.

4. Uniqueness for Axisymmetric Initial Data #

A natural restricted problem is whether Leray–Hopf solutions with axisymmetric, swirl-free initial data are unique. Such data imposes a strong geometric constraint that eliminates most of the degrees of freedom available to convex integration. Partial results are known (e.g., global regularity for smooth axisymmetric data without swirl was proved by Ladyzhenskaya and by Ukhovskii–Yudovich in 1968), but uniqueness of Leray–Hopf solutions for rough data in this class has not been established. If the Giri–Kwon instability is confirmed, understanding whether the instability mechanism survives axisymmetric perturbations is an immediate question.

5. Stochastic Regularisation #

There is a well-studied phenomenon, regularisation by noise, in which adding a stochastic forcing term to an ill-posed deterministic PDE restores well-posedness. For the Navier–Stokes equations, Hofmanová–Zhu–Zhu (2023) showed non-uniqueness persists even under multiplicative noise for certain body forces, by adapting the Albritton–Brué–Colombo construction. Whether a generic stochastic perturbation can restore uniqueness of Leray–Hopf solutions, and what the appropriate notion of “generic” should be, is a rich open direction combining convex integration with stochastic analysis.

References #

Nash, J. (1954). $C^1$ isometric imbeddings. Annals of Mathematics, 60(3), 383–396.
De Lellis, C. & Székelyhidi, L. (2009). The Euler equations as a differential inclusion. Annals of Mathematics, 170(3), 1417–1436.
De Lellis, C. & Székelyhidi, L. (2013). Dissipative continuous Euler flows. Inventiones Mathematicae, 193(2), 377–407.
Constantin, P., E, W., & Titi, E. S. (1994). Onsager’s conjecture on the energy conservation for solutions of Euler’s equation. Communications in Mathematical Physics, 165(1), 207–209.
Isett, P. (2018). A proof of Onsager’s conjecture. Annals of Mathematics, 188(3), 871–963.
Buckmaster, T. & Vicol, V. (2019). Nonuniqueness of weak solutions to the Navier–Stokes equation. Annals of Mathematics, 189(1), 101–144.
Buckmaster, T. & Vicol, V. (2019). Convex integration and phenomenologies in turbulence. EMS Surveys in Mathematical Sciences, 6(1–2), 1–88.
Albritton, D., Brué, E., & Colombo, M. (2022). Non-uniqueness of Leray solutions of the forced Navier–Stokes equations. Annals of Mathematics, 196(1), 415–455.
Jia, H. & Šverák, V. (2014). Local-in-space estimates near initial time for weak solutions of the Navier–Stokes equations and forward self-similar solutions. Inventiones Mathematicae, 196(1), 233–265.
Giri, V. & Kwon, H. (2025). Nonuniqueness of Leray–Hopf solutions to the unforced incompressible 3D Navier–Stokes equation. arXiv:2509.25116.

The Regularity Problem for the 3D Euler Equations

Fri, 29 May 2026 00:00:00 +0000

Leonhard Euler wrote down the equations governing the motion of an ideal incompressible fluid in 1757. Whether smooth solutions to these equations can develop a singularity in finite time, a point at which derivatives of the velocity blow up, has been an open problem ever since, and remains one of the central questions in mathematical fluid dynamics.

Problem (Euler Regularity)

Let $u_0 : \mathbb{R}^3 \to \mathbb{R}^3$ be a smooth, divergence-free initial velocity field with sufficient decay at infinity. Does the unique local smooth solution $u(x,t)$ to the 3D incompressible Euler equations $$\partial_t u + (u \cdot \nabla)u + \nabla p = 0, \qquad \nabla \cdot u = 0, \qquad u(\cdot,0)=u_0$$ remain smooth for all time $t > 0$?

The problem is rated L4 on UnsolvedMath, reflecting its depth, and is closely related to the Clay Millennium Prize Problem on the Navier–Stokes equations. The two questions are linked through the zero-viscosity limit, but neither implies the other.

The Equations and What Regularity Means #

The Euler equations express conservation of momentum (first equation) and incompressibility (second equation) for an inviscid fluid. The unknowns are the velocity field $u(x,t) \in \mathbb{R}^3$ and pressure $p(x,t) \in \mathbb{R}$; the pressure is determined implicitly by incompressibility via an elliptic equation.

Vorticity. The central quantity for singularity analysis is the vorticity $\omega = \nabla \times u$, which satisfies the vorticity equation $$\partial_t \omega + (u \cdot \nabla)\omega = (\omega \cdot \nabla)u.$$ The right-hand side, the vortex stretching term, is the essential source of difficulty. It creates a quadratic feedback: large $\omega$ produces large $(\omega \cdot \nabla)u$, which can further amplify $\omega$.
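The quadratic feedback is often illustrated by the scalar model $\dot\omega = \omega^2$, obtained by pretending that the stretching rate is exactly $\omega$ itself. This is a caricature and not a consequence of the Euler equations, since it ignores the nonlocal and geometric structure of $\nabla u$, but it shows what a worst-case feedback would produce. The function names below are chosen for this example.

```python
def riccati_blowup_time(omega0):
    """Blowup time of d(omega)/dt = omega**2 with omega(0) = omega0 > 0."""
    if omega0 <= 0:
        raise ValueError("the model only blows up for positive initial data")
    return 1.0 / omega0


def riccati_solution(omega0, t):
    """Exact solution omega0 / (1 - omega0*t), valid for 0 <= t < 1/omega0."""
    if not 0 <= t < riccati_blowup_time(omega0):
        raise ValueError("t must lie in [0, blowup time)")
    return omega0 / (1.0 - omega0 * t)


if __name__ == "__main__":
    omega0 = 2.0
    T = riccati_blowup_time(omega0)
    for fraction in (0.0, 0.5, 0.9, 0.99):
        t = fraction * T
        print(f"t = {t:.3f}: omega = {riccati_solution(omega0, t):.1f}")
```

The model solution grows like $(T^*-t)^{-1}$ with $T^* = 1/\omega_0$. Whether the true vortex stretching term can sustain such a feedback is precisely the open question.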

Local well-posedness. For $u_0 \in H^s(\mathbb{R}^3)$ with $s > 5/2$, there exists a unique smooth solution on a time interval $[0, T^*)$ for some $T^* > 0$ depending on $\|u_0\|_{H^s}$ (Kato, 1972). The question is whether $T^*$ can be taken equal to $+\infty$.

Why 2D is easy, 3D is not. In two dimensions the vortex stretching term $(\omega \cdot \nabla)u$ vanishes identically, because the vorticity points out of the plane of motion while $u$ does not depend on the out-of-plane coordinate. The scalar vorticity $\omega = \partial_1 u_2 - \partial_2 u_1$ is then simply transported along fluid particle paths without amplification, and $\|\omega\|_{L^\infty}$ is conserved. Global regularity in 2D follows. In 3D no such conservation holds, and the problem is genuinely open.

The Beale–Kato–Majda Criterion #

The first major structural result reduces the regularity problem to a single quantity.

Theorem (Beale–Kato–Majda, 1984)

A smooth solution $u$ of the 3D Euler equations loses regularity at time $T^*$ if and only if $$\int_0^{T^*} \|\omega(\cdot,t)\|_{L^\infty(\mathbb{R}^3)}\, dt = +\infty.$$ In particular, if the vorticity remains bounded in $L^\infty$ on $[0,T]$ for every finite $T$, the solution remains smooth globally.

The BKM criterion redirects the problem: one must show that the vorticity magnitude $\|\omega\|_{L^\infty}$ cannot accumulate to infinity in finite time. Since $\omega$ satisfies a transport-stretching equation, this requires understanding the geometric structure of the vorticity field under its own evolution.
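One concrete consequence of the criterion is a minimal blowup rate. The sketch below assumes a hypothetical power-law model $\|\omega(\cdot,s)\|_{L^\infty} = (T-s)^{-\gamma}$ (not derived from the equations) and evaluates the BKM integral in closed form; the function name is made up for the example.

```python
import math


def bkm_integral(gamma, T, t):
    """Integral over [0, t] of (T - s)**(-gamma) ds, for 0 <= t <= T and gamma >= 0.

    Returns math.inf when the integral diverges, which happens at t = T
    exactly when gamma >= 1.
    """
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    if t == T and gamma >= 1:
        return math.inf
    if gamma == 1:
        return math.log(T / (T - t))
    return (T ** (1 - gamma) - (T - t) ** (1 - gamma)) / (1 - gamma)


if __name__ == "__main__":
    for gamma in (0.5, 0.9, 1.0, 1.5):
        print(f"gamma = {gamma}: integral up to T = {bkm_integral(gamma, 1.0, 1.0)}")
```

For $\gamma < 1$ the integral up to $T$ is finite, so by Beale–Kato–Majda a bound of the form $\|\omega\|_{L^\infty} \leq C(T-t)^{-\gamma}$ with $\gamma < 1$ is incompatible with a singularity at time $T$.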

Geometric Conditions and Depletion of Stretching #

The vortex stretching term $(\omega \cdot \nabla)u$ can be written as $$(\omega \cdot \nabla)u = |\omega|\,(\hat\omega \cdot \nabla)u,$$ where $\hat\omega = \omega/|\omega|$ is the unit vorticity direction, so it measures how the velocity varies along vortex lines. The key observation is that stretching is governed not only by the magnitude of $\omega$ but also by the geometry of the vorticity field.
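The role of the direction field becomes visible in the evolution of the vorticity magnitude. As a short derivation (the symbols $S$, $A$, $D_t$ and $\alpha$ are notation introduced here), split $\nabla u = S + A$ into its symmetric and antisymmetric parts and write $D_t = \partial_t + u\cdot\nabla$ for the material derivative. Since $\omega\cdot A\omega = 0$,

$$\frac{1}{2}D_t|\omega|^2 = \omega\cdot D_t\omega = \omega\cdot(\omega\cdot\nabla)u = \omega\cdot S\omega, \qquad\text{hence}\qquad D_t|\omega| = \alpha\,|\omega|, \quad \alpha = \hat\omega\cdot S\hat\omega .$$

The stretching rate $\alpha$ depends on $\omega$ only through the strain $S$ and the direction $\hat\omega$. Because $S$ is itself a singular integral of $\omega$, cancellations in that integral when $\hat\omega$ varies regularly are what is meant by depletion of stretching.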

Theorem (Constantin–Fefferman–Majda, 1996)

If the unit vorticity direction $\hat\omega = \omega/|\omega|$ is uniformly Lipschitz in a neighbourhood of the set $\{|\omega| > \lambda\}$ for all $t \in [0, T]$ and some $\lambda > 0$, then the solution remains smooth on $[0,T]$.

This result says that blowup, if it occurs, requires more than large vorticity magnitude: it must be accompanied by violent geometric irregularity of the vortex lines, in the form of a loss of Lipschitz regularity of the vorticity direction. It has motivated a line of research on the geometric structure of vortex tubes near potential singularities.

Blowup for Less Regular Data #

Recent years have seen dramatic progress on singularity formation for initial data that is smooth except at isolated points.

Theorem (Elgindi, 2021)

There exist axisymmetric, swirl-free initial velocity fields $u_0 \in C^{1,\alpha}(\mathbb{R}^3)$ for sufficiently small $\alpha > 0$ such that the corresponding solution to the 3D Euler equations develops a finite-time singularity.

Elgindi’s proof, published in the Annals of Mathematics 194 (2021), 647–727, constructs a self-similar blowup profile and establishes its nonlinear stability using a dynamical rescaling formulation. The initial data is not smooth: it belongs to $C^{1,\alpha}$ but not to $C^2$. The singularity forms at the axis of symmetry $r=0$.

This was a breakthrough, but it left open the smooth case. Elgindi himself noted the next target: constructing blowup from initial data that is non-smooth only at a single point, or eventually from fully smooth data.

Extending Elgindi’s construction. Chen and Hou (2021) proved the same type of $C^{1,\alpha}$ blowup for the 3D axisymmetric Euler equations with boundary (inside a periodic cylinder), rigorously realising, for $C^{1,\alpha}$ data, the blowup scenario that Luo and Hou proposed on numerical grounds in 2014. Subsequent work by Córdoba, Martínez-Zoroa, and Zheng (2025, Annals of PDE) showed that a singularity can form from initial data in $C^\infty(\mathbb{R}^3 \setminus \{0\}) \cap C^{1,\alpha}$, which is non-smooth only at a single point, a further step toward the smooth case.

The 2025 Breakthrough: Smooth Blowup with Boundary #

The most significant recent development is the following result, which provides a rigorous proof of finite-time singularity from smooth initial data.

Theorem (Chen–Hou, PNAS 2025)

There exists a family of smooth, finite-energy initial data for the 3D axisymmetric Euler equations in a smooth bounded domain (periodic cylinder) such that the corresponding solutions develop a finite-time singularity. The blowup is nearly self-similar and occurs at the intersection of the boundary $r=1$ and the symmetry plane $z=0$.

The paper, contributed by Thomas Hou and published in PNAS in June 2025 (reviewed by Caflisch, Gómez-Serrano, Sverak, and Tao), provides a computer-assisted proof. The strategy is to:

construct a numerical approximate self-similar blowup profile via the dynamical rescaling formulation,
prove rigorously that the true solution remains close to this profile using energy estimates with carefully verified error bounds (computed with interval arithmetic), and
conclude nonlinear stability of the blowup via a bootstrap argument.

This settles the blowup question affirmatively for the 3D Euler equations with smooth data in a smooth bounded domain. The boundary plays a crucial role: it creates an antisymmetric flow pattern driving azimuthal vorticity toward a critical ring, generating intense vortex stretching at a hyperbolic saddle point on the wall.

The remaining open case. The problem in $\mathbb{R}^3$ (or on the periodic torus $\mathbb{T}^3$) without boundary remains open. It is not known whether smooth initial data in free space can produce a singularity, or whether the absence of a boundary provides a genuine stabilising mechanism.

Research Directions #

1. Removing the Boundary #

The most pressing open question is whether the Chen–Hou construction can be extended to $\mathbb{R}^3$ or $\mathbb{T}^3$. The boundary in the 2025 result acts as a geometric catalyst: it enforces a no-flow condition that concentrates vorticity at a specific ring on the wall. Without a boundary, the antisymmetric flow structure that drives the singularity must be sustained entirely by the initial data and the nonlinear dynamics. Whether a comparable mechanism can persist in free space, without the reflective constraint of the wall, is the central open question.

2. Self-Similar Blowup in Full 3D #

All current singularity results are for axisymmetric flows, which reduce the problem from 3 spatial dimensions to 2 (the $rz$-plane). In full 3D, the angular variable $\theta$ is active, and perturbations in the azimuthal direction can either stabilise or destabilise the singularity. Elgindi, Ghoul, and Masmoudi (2021) proved stability of the $C^{1,\alpha}$ blowup under axisymmetric perturbations. Whether the singularity survives fully 3D (non-axisymmetric) perturbations, a question Elgindi posed as open, is crucial: a blowup that is destroyed by any non-symmetric perturbation has limited physical relevance.

3. Quantitative Vortex Stretching and the Role of Geometry #

The BKM criterion and the Constantin–Fefferman–Majda theorem both express the same idea from opposite directions: blowup is controlled by the magnitude and geometry of the vorticity. Current research asks whether a quantitative version can be made sharp. Specifically: if the vorticity direction $\hat\omega$ becomes Hölder-continuous but not Lipschitz, does blowup necessarily follow? Or is there a finer scale-invariant quantity, perhaps involving the Hessian of the velocity or the curvature of vortex lines, that governs the problem?

4. Weak Solutions and Non-Uniqueness #

Separate from the question of whether smooth solutions blow up is the question of what happens after a potential singularity. De Lellis and Székelyhidi (2009–2013) proved that the Euler equations have infinitely many weak $L^\infty$ solutions for generic initial data, via convex integration. Isett (2018) proved that weak solutions can dissipate energy, confirming Onsager’s 1949 conjecture. These results show that the solution concept must be carefully chosen. After a smooth blowup, the system likely enters a regime of non-unique weak solutions, and identifying the physically relevant selection criterion (entropy conditions, vanishing viscosity, $h$-principle) is a major open problem.

5. Vanishing Viscosity and the Navier–Stokes Connection #

The Navier–Stokes equations add a viscous term $\nu \Delta u$ to the right-hand side. For any $\nu > 0$, global regularity of Navier–Stokes in 3D is itself open (the Clay Millennium Problem). For the zero-viscosity limit $\nu \to 0$, the central question is whether Navier–Stokes solutions converge to Euler solutions uniformly in time, a question tied to boundary layer behaviour (the Prandtl conjecture) and to the regularity of the Euler solution. If Euler develops a singularity at time $T^*$, the behaviour of Navier–Stokes solutions near $T^*$ as $\nu \to 0$ is completely unknown.

References #

Euler, L. (1757). Principes généraux du mouvement des fluides. Mémoires de l’Académie des Sciences de Berlin, 11, 274–315.
Beale, J. T., Kato, T., & Majda, A. (1984). Remarks on the breakdown of smooth solutions for the 3-D Euler equations. Communications in Mathematical Physics, 94(1), 61–66.
Constantin, P., Fefferman, C., & Majda, A. J. (1996). Geometric constraints on potentially singular solutions for the 3-D Euler equations. Communications in Partial Differential Equations, 21(3–4), 559–571.
Elgindi, T. M. (2021). Finite-time singularity formation for $C^{1,\alpha}$ solutions to the incompressible Euler equations on $\mathbb{R}^3$. Annals of Mathematics, 194(3), 647–727.
Elgindi, T. M., Ghoul, T.-E., & Masmoudi, N. (2021). On the stability of self-similar blow-up for $C^{1,\alpha}$ solutions to the incompressible Euler equations. Cambridge Journal of Mathematics, 9(4), 1035–1075.
Chen, J. & Hou, T. Y. (2021). Finite time blowup of 2D Boussinesq and 3D Euler equations with $C^{1,\alpha}$ velocity and boundary. Communications in Mathematical Physics, 383(3), 1559–1667.
Chen, J. & Hou, T. Y. (2025). Singularity formation in 3D Euler equations with smooth initial data and boundary. Proceedings of the National Academy of Sciences, 122(27). https://doi.org/10.1073/pnas.2500940122
Córdoba, D., Martínez-Zoroa, L., & Zheng, F. (2025). Finite time singularities to the 3D incompressible Euler equations for solutions in $C^\infty(\mathbb{R}^3\setminus\{0\})\cap C^{1,\alpha}\cap L^2$. Annals of PDE. https://doi.org/10.1007/s40818-025-00214-2
Isett, P. (2018). A proof of Onsager’s conjecture. Annals of Mathematics, 188(3), 871–963.
Majda, A. J. & Bertozzi, A. L. (2002). Vorticity and Incompressible Flow. Cambridge University Press.

$C^r$ Stability Conjecture

Thu, 28 May 2026 00:00:00 +0000

Structural stability is a global topological property: a dynamical system is structurally stable if all nearby systems have the same orbit structure, up to continuous reparametrisation. Hyperbolicity is a local differential property: the tangent bundle over the recurrent set splits into uniformly contracting and expanding directions. That these two conditions should be equivalent is one of the deepest principles in smooth dynamics.

Conjecture ($C^r$ Stability Conjecture, Palis–Smale, ~1970)

Let $M$ be a closed smooth manifold and $r \geq 1$. If $f \in \mathrm{Diff}^r(M)$ is $C^r$-structurally stable, then $f$ is hyperbolic, i.e., it satisfies Axiom A and the Strong Transversality Condition.

The problem is rated L3 on UnsolvedMath and sits at the heart of the global theory of smooth dynamical systems. The case $r = 1$ is resolved. The case $r \geq 2$ is open, and even basic consequences of structural stability that are elementary for $r = 1$ remain unknown for $r = 2$.

Key Definitions #

Structural stability. A diffeomorphism $f \in \mathrm{Diff}^r(M)$ is $C^r$-structurally stable if there exists a $C^r$-neighborhood $\mathcal{U}$ of $f$ such that every $g \in \mathcal{U}$ is topologically conjugate to $f$: there is a homeomorphism $h : M \to M$ with $h \circ f = g \circ h$. The system is therefore robust under $C^r$-small perturbations in the strongest possible sense: topology, not just orbit counts, is preserved.
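
As a concrete instance of the conjugacy equation $h \circ f = g \circ h$, the sketch below uses two linear contractions of the real line, $f(x) = x/2$ and $g(x) = x/3$. This is only an illustration: $\mathbb{R}$ is not a closed manifold, and the maps and function names are chosen here for simplicity. The conjugacy $h(x) = \operatorname{sign}(x)\,|x|^{\log 3/\log 2}$ is a homeomorphism whose inverse is not differentiable at $0$. That is why the definition asks only for a homeomorphism: a diffeomorphic conjugacy would force the derivatives $1/2$ and $1/3$ at the fixed point to coincide.

```python
import math

ALPHA = math.log(3.0) / math.log(2.0)  # chosen so that 2**ALPHA == 3


def f(x: float) -> float:
    return x / 2.0


def g(x: float) -> float:
    return x / 3.0


def h(x: float) -> float:
    """Homeomorphism of the real line conjugating f to g."""
    return math.copysign(abs(x) ** ALPHA, x)


def conjugacy_defect(x: float) -> float:
    """|h(f(x)) - g(h(x))|, which vanishes up to rounding for a true conjugacy."""
    return abs(h(f(x)) - g(h(x)))


samples = [-5.0, -1.0, -0.1, 0.0, 0.3, 2.0, 10.0]
print(max(conjugacy_defect(x) for x in samples))
```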

Axiom A. The diffeomorphism $f$ satisfies Axiom A if:

the non-wandering set $\Omega(f)$ is hyperbolic: there is a $Df$-invariant splitting $T_x M = E^s_x \oplus E^u_x$ over $\Omega(f)$ with uniform exponential contraction on $E^s$ and expansion on $E^u$;
the periodic points of $f$ are dense in $\Omega(f)$.
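
A standard example satisfying both conditions is Arnold's cat map on the torus $\mathbb{T}^2$, induced by the integer matrix with rows $(2, 1)$ and $(1, 1)$; it is not discussed in the text and is used here purely as an illustration. Its derivative is the same matrix at every point, so the splitting $E^s \oplus E^u$ is given by the two eigendirections. The sketch below computes them with the standard library only; the helper name is hypothetical.

```python
import math

# Derivative of the cat map (x, y) -> (2x + y, x + y) mod 1, constant on the torus.
A = ((2.0, 1.0), (1.0, 1.0))


def eigen_2x2_symmetric(m):
    """Eigenvalues and unit eigenvectors of a symmetric 2x2 matrix with m[0][1] != 0."""
    (a, b), (_, d) = m
    mean, radius = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        # (b, lam - a) solves (m - lam * I) v = 0 because b != 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


(lam_s, e_s), (lam_u, e_u) = eigen_2x2_symmetric(A)
print(f"contracting rate {lam_s:.4f} along E^s = {e_s}")
print(f"expanding rate   {lam_u:.4f} along E^u = {e_u}")
```

The two rates multiply to $1$ because the matrix has determinant $1$, and the contraction and expansion are uniform because the same matrix acts at every point.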

Strong Transversality Condition (STC). For every $x, y \in \Omega(f)$, the stable manifold $W^s(x)$ and the unstable manifold $W^u(y)$ intersect transversally. Tangential intersections, namely homoclinic or heteroclinic tangencies, are forbidden.

Together, Axiom A and the STC constitute what is usually meant by saying $f$ is hyperbolic in the sense of the stability conjecture.

The Two Directions #

The conjecture, as an equivalence, has an easy direction and a hard direction.

Structural stability follows from hyperbolicity (the easy direction). Robbin (1971) proved this for $C^2$ diffeomorphisms; Robinson (1976) extended it to $C^1$. Both proofs use the implicit function theorem on an appropriate space of conjugacies. The $C^1$ statement covers every $r \geq 1$, because a $C^r$ diffeomorphism is in particular $C^1$ and every $C^r$-small perturbation is also $C^1$-small.

Theorem (Robbin 1971, Robinson 1976)

For every $r \geq 1$, if $f \in \mathrm{Diff}^r(M)$ satisfies Axiom A and the Strong Transversality Condition, then $f$ is $C^r$-structurally stable.

The converse, that hyperbolicity follows from structural stability (the hard direction), is the conjecture itself. It requires understanding what structural stability forces on the dynamics, ruling out every non-hyperbolic mechanism compatible with stability. This is where the difficulty lies, and where the gap between $r = 1$ and $r \geq 2$ opens.

The $C^1$ Case: Mañé’s Theorem #

The $C^1$ stability conjecture was fully proved by Mañé in 1987.

Theorem (Mañé, 1987)

Every $C^1$-structurally stable diffeomorphism of a closed manifold satisfies Axiom A and the Strong Transversality Condition.

The proof, published in Publ. Math. IHÉS 66 (1987), 161–210, is a tour de force of $C^1$ perturbation theory. It rests on several tools that are available only in the $C^1$ topology:

Pugh’s $C^1$ closing lemma (1967): Given a non-wandering point $x$ of $f$, one can make an arbitrarily small $C^1$ perturbation of $f$ to create a periodic orbit passing near $x$. This is the essential mechanism for showing that periodic points are dense in $\Omega(f)$.
Mañé’s ergodic closing lemma (1982): A more refined version that controls the Lyapunov exponents of the created periodic orbit, allowing the construction of hyperbolic periodic points that shadow the orbit of an ergodic measure.
Franks’ lemma (1971): Any sufficiently small perturbation of the derivative $Df$ along a periodic orbit can be realised by a $C^1$-small perturbation of $f$ that leaves the orbit itself unchanged. This lets one test whether a given splitting is genuinely hyperbolic or can be destroyed by a small $C^1$ perturbation.

The strategy is to assume structural stability and use these tools to show, step by step, that the non-wandering set must be hyperbolic and that tangencies cannot persist. Mañé had proved the surface case ($\dim M = 2$, $r = 1$) earlier, with the full higher-dimensional result completed in the 1987 paper. Aoki (1992) and Hayashi (1992) subsequently settled the closely related Mañé conjecture on the $C^1$ interior of the set of diffeomorphisms with all hyperbolic periodic points.

The Wall at $r \geq 2$ #

The $C^r$ case for $r \geq 2$ is not merely an incremental extension. The tools that power Mañé’s proof are fundamentally $C^1$ phenomena.

The $C^r$ closing lemma is open for $r \geq 2$. Pugh’s proof does not extend to $r \geq 2$: Gutierrez showed that the local perturbation argument used for $C^1$ does not work in the $C^2$ topology. A $C^r$ closing lemma is currently available only for specific classes of diffeomorphisms:

Conservative (volume-preserving) diffeomorphisms on surfaces: Asaoka–Irie ($C^\infty$, 2015), Cristofaro-Gardiner–Prasad–Zhang (2023).
Partially hyperbolic diffeomorphisms with one-dimensional center bundle (all $r \geq 2$ including $r = \infty$): Gan–Shi (2022) and the follow-up $C^r$-chain closing lemma of Shi–Wang (Ergodic Theory Dynam. Syst. 44, 2024).

In the absence of a general $C^r$ closing lemma, the first step of Mañé’s proof, showing that periodic points are dense in $\Omega(f)$ under $C^r$ structural stability, is not known for $r \geq 2$.

Mañé himself underscored this gap. In the 1987 paper, immediately after the proof of Theorem A, he writes that for $r > 1$ “not even [being] known whether a $C^2$ structurally stable diffeomorphism has at least one periodic point, it seems, to say the least, difficult to prove that they are dense.”

Franks’ lemma also fails for $r \geq 2$. Controlling linear maps along periodic orbits requires $C^1$ perturbations; in higher regularity the ambient perturbation must be smooth and the constraints on higher derivatives can prevent the desired linear behaviour from being achieved.

Research Directions #

1. The $C^r$ Closing Lemma for General Diffeomorphisms #

The most direct path to the $C^r$ stability conjecture passes through a general $C^r$ closing lemma. For $r \geq 2$ this asks: given any non-wandering point of a $C^r$ diffeomorphism, can one make an arbitrarily small $C^r$ perturbation to close the orbit? Answering this in the affirmative for all closed manifolds and all $r \geq 2$ would be a landmark result, and would immediately advance the stability conjecture. The recent progress in conservative surface dynamics (Cristofaro-Gardiner et al., 2023) and partially hyperbolic settings shows the question is not hopeless, but the general dissipative case remains untouched.

2. The Surface Case $\dim M = 2$, $r \geq 2$ #

On surfaces the dynamics is simpler: the non-wandering set has lower-dimensional structure, and the absence of a center bundle means “partially hyperbolic” reduces to “hyperbolic.” Mañé settled the surface case for $r = 1$. The $C^r$ stability conjecture for surfaces and $r \geq 2$ is already an important open target and may be the most accessible subcase. Recent $C^\infty$ closing lemmas for conservative surface diffeomorphisms (Asaoka–Irie) suggest that the conservative surface case may be reachable.

3. Partially Hyperbolic Diffeomorphisms #

A diffeomorphism is partially hyperbolic if the tangent bundle splits as $TM = E^{ss} \oplus E^c \oplus E^{uu}$ with uniform contraction on $E^{ss}$, uniform expansion on $E^{uu}$, and an intermediate “center” bundle $E^c$. For these systems, Gan–Shi (2022) and Shi–Wang (2024) have established $C^r$ closing and chain-closing lemmas when $\dim E^c = 1$. The question is whether $C^r$-structural stability of a partially hyperbolic diffeomorphism forces the center bundle to also become hyperbolic, that is, whether partial hyperbolicity implies full hyperbolicity under stability.

4. The Palis Global Conjecture #

Palis proposed that the complement of the hyperbolic diffeomorphisms is exactly the closure of systems exhibiting homoclinic tangencies or heteroclinic cycles. This is a positive description of non-hyperbolic dynamics, and is a strengthening of the $C^r$ stability conjecture (it would also characterise what structural stability forbids). In $C^1$ topology this programme is largely complete through Bonatti–Crovisier’s connecting lemma (2004) and related results. For $r \geq 2$ it is wide open, and progress on the Palis conjecture in $C^r$ would likely resolve the stability conjecture as a corollary.

5. Flows and the Vector Field Analogue #

The stability conjecture has a natural analogue for $C^r$ vector fields: a $C^r$-structurally stable flow should satisfy Axiom A and the strong transversality condition. For $r = 1$ this is also proved. For $r \geq 2$ it is open. The vector field setting introduces additional complications from singular points (zeros of the vector field): Labarca–Pacifico showed that on manifolds with boundary, stable flows can fail Axiom A, so the correct formulation may need adaptation. Progress on the diffeomorphism case would likely shed light on the flow case as well.

References #

Palis, J. & Smale, S. (1970). Structural stability theorems. Proc. Sympos. Pure Math., 14, 223–231.
Robbin, J. W. (1971). A structural stability theorem. Annals of Mathematics, 94(2), 447–493.
Robinson, C. (1976). Structural stability of $C^1$ diffeomorphisms. Journal of Differential Equations, 22(1), 28–73.
Mañé, R. (1987). A proof of the $C^1$ stability conjecture. Publications Mathématiques de l’IHÉS, 66, 161–210.
Aoki, N. (1992). The set of Axiom A diffeomorphisms with no cycles. Bol. Soc. Brasil. Mat., 23(1–2), 21–65.
Hayashi, S. (1992). Diffeomorphisms in $\mathcal{F}^1(M)$ satisfy Axiom A. Ergodic Theory Dynam. Systems, 12(2), 233–253.
Gan, S. & Shi, Y. (2022). $C^r$-closing lemma for partially hyperbolic diffeomorphisms with 1D-center bundle. Journal of Differential Equations, 334, 337–363.
Shi, Y. & Wang, X. (2024). $C^r$-chain closing lemma for certain partially hyperbolic diffeomorphisms. Ergodic Theory Dynam. Systems, 44(7), 1923–1944.
Bonatti, C. & Crovisier, S. (2004). Récurrence et généricité. Inventiones Mathematicae, 158(1), 33–104.
Berger, P. (2017). Lectures on structural stability in dynamics. arXiv:1703.00092.

Inequality for Square-Summable Complex Series

Thu, 28 May 2026 00:00:00 +0000

Some inequalities look formidable until the right decomposition makes them transparent. The conjecture below, posed by Zoltan Retkes on the Open Problem Garden in 2012 with a £10 prize attached, is one such case: once the dyadic structure of the positive integers is made explicit, the proof reduces to two classical facts.

Conjecture (Retkes, 2012), now proved

For all $\alpha = (\alpha_1, \alpha_2, \ldots) \in \ell^2(\mathbb{C})$, $$\sum_{n \geq 1} |\alpha_n|^2 \geq \frac{6}{\pi^2} \sum_{k \geq 0} \left|\, \sum_{l \geq 0} \frac{\alpha_{2^k(2l+1)}}{l+1} \,\right|^2.$$

The conjecture was confirmed by an anonymous comment on the problem page in November 2013. A self-contained proof and an extension to $\ell^p$ were subsequently published by Ibragimov and Salimova in Elemente der Mathematik 70 (2015), 79–81.

The Dyadic Decomposition #

The index $2^k(2l+1)$ running over $k \geq 0$ and $l \geq 0$ is not arbitrary: it encodes a canonical partition of the positive integers. Every $n \in \mathbb{N}^+$ factors uniquely as $$n = 2^k \cdot r, \qquad k \geq 0,\quad r \text{ odd positive},$$ where $k = v_2(n)$ is the 2-adic valuation of $n$ and $r = n/2^k$ is its odd part. Writing $r = 2l+1$ gives the bijection $\mathbb{N}_0 \times \mathbb{N}_0 \to \mathbb{N}^+$, $(k, l) \mapsto 2^k(2l+1)$. In particular the sets $$A_k = \{2^k(2l+1) : l \geq 0\} = \{2^k,\, 3 \cdot 2^k,\, 5 \cdot 2^k,\, \ldots\}$$ form a partition of $\mathbb{N}^+$. Explicitly: $A_0 = \{1, 3, 5, 7, \ldots\}$ (odd numbers), $A_1 = \{2, 6, 10, 14, \ldots\}$ (twice an odd number), and so on. This partition is the key structural fact behind the proof.
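
The bijection $(k, l) \mapsto 2^k(2l+1)$ is easy to make concrete. The following is a minimal sketch with illustrative function names: it computes the pair $(k, l)$ for a given $n$, inverts the map, and groups the first twenty integers into their layers $A_k$.

```python
def dyadic_index(n: int) -> tuple[int, int]:
    """Return (k, l) with n == 2**k * (2*l + 1) for a positive integer n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    k = 0
    while n % 2 == 0:  # strip factors of two; what remains is the odd part
        n //= 2
        k += 1
    return k, (n - 1) // 2


def from_dyadic_index(k: int, l: int) -> int:
    """Inverse map (k, l) -> 2**k * (2*l + 1)."""
    return 2 ** k * (2 * l + 1)


# Group 1..20 into the layers A_k.
layers: dict[int, list[int]] = {}
for n in range(1, 21):
    k, _ = dyadic_index(n)
    layers.setdefault(k, []).append(n)
print(layers)
```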

Proof #

The argument has two ingredients: the Basel sum $\sum_{l \geq 0}(l+1)^{-2} = \pi^2/6$, and the Cauchy–Schwarz inequality in $\ell^2(\mathbb{C})$.

Define two sequences in $\ell^2(\mathbb{C})$: $$x = \left(1,\, \tfrac{1}{2},\, \tfrac{1}{3},\, \ldots\right), \qquad y_k = \left(\alpha_{2^k},\, \alpha_{3 \cdot 2^k},\, \alpha_{5 \cdot 2^k},\, \ldots\right) \quad (k \geq 0).$$

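The inner sum in the conjecture is exactly the $\ell^2$ inner product $\langle x, y_k \rangle$, taken linear in the second argument (since $x$ is real, the other convention only conjugates the value and leaves its modulus unchanged): $$\sum_{l \geq 0} \frac{\alpha_{2^k(2l+1)}}{l+1} = \langle x, y_k \rangle.$$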

Step 1: Apply Cauchy–Schwarz. For each $k$,

$$|\langle x, y_k \rangle|^2 \leq |x|_2^2 \cdot |y_k|_2^2.$$

Summing over $k \geq 0$,

$$\sum _{k \geq 0} |\langle x, y _k \rangle|^2 \leq |x| _2^2 \sum _{k \geq 0} |y _k| _2^2.$$

Step 2: Evaluate using the Basel problem and the partition. The Basel problem gives $$|x| _2^2 = \sum _{l \geq 0} \frac{1}{(l+1)^2} = \frac{\pi^2}{6}.$$

Since the sets $A_k$ partition $\mathbb{N}^+$, $$\sum _{k \geq 0} |y_k|_2^2 = \sum _{k \geq 0} \sum _{l \geq 0} |\alpha _{2^k(2l+1)}|^2 = \sum _{n \geq 1} |\alpha_n|^2.$$

Combining both steps, $$\sum_{k \geq 0} \left|\sum_{l \geq 0} \frac{\alpha_{2^k(2l+1)}}{l+1}\right|^2 \leq \frac{\pi^2}{6} \sum_{n \geq 1} |\alpha_n|^2,$$ which is the inequality with the $\frac{6}{\pi^2}$ factor moved to the other side.
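
The inequality can also be checked numerically. The sketch below evaluates both sides for a finitely supported random complex sequence, using only the standard library; the function name and the choice of Gaussian entries are illustrative, and a passing check is a sanity test, not a proof.

```python
import math
import random


def retkes_sides(alpha: list[complex]) -> tuple[float, float]:
    """Return (lhs, rhs) of the inequality for a finitely supported sequence.

    alpha[n - 1] holds alpha_n; entries beyond len(alpha) are taken to be zero.
    """
    n_max = len(alpha)
    lhs = sum(abs(a) ** 2 for a in alpha)
    total = 0.0
    k = 0
    while 2 ** k <= n_max:
        inner = 0j
        l = 0
        while 2 ** k * (2 * l + 1) <= n_max:
            inner += alpha[2 ** k * (2 * l + 1) - 1] / (l + 1)
            l += 1
        total += abs(inner) ** 2  # contribution of the layer A_k
        k += 1
    return lhs, 6.0 / math.pi ** 2 * total


rng = random.Random(0)  # fixed seed so the run is reproducible
alpha = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
lhs, rhs = retkes_sides(alpha)
print(lhs >= rhs, lhs, rhs)
```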

Sharpness of the Constant #

The constant $6/\pi^2$ is the best possible. To see this, consider the truncated sequence $\alpha^{(N)}$ defined by $\alpha^{(N)}_{2l+1} = 1/(l+1)$ for $l = 0, 1, \ldots, N-1$ and $\alpha^{(N)}_n = 0$ otherwise. Then:

The left-hand side equals $\displaystyle\sum_{l=0}^{N-1} \frac{1}{(l+1)^2} \to \frac{\pi^2}{6}$.
The only non-zero contribution to the right-hand side comes from $k = 0$ (since all non-zero indices are odd, i.e. in $A_0$), giving $\displaystyle\frac{6}{\pi^2}\left(\sum_{l=0}^{N-1} \frac{1}{(l+1)^2}\right)^2 \to \frac{6}{\pi^2} \cdot \frac{\pi^4}{36} = \frac{\pi^2}{6}$.

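The ratio of the right-hand side to the left-hand side therefore tends to $1$ as $N \to \infty$, so no larger constant than $6/\pi^2$ can hold universally. Equality is in fact attained: the limiting sequence $\alpha_{2l+1} = 1/(l+1)$, $\alpha_n = 0$ for even $n$, has $\sum_n |\alpha_n|^2 = \pi^2/6 < \infty$, so it belongs to $\ell^2(\mathbb{C})$, and both sides equal $\pi^2/6$. More generally, by the equality case of Cauchy–Schwarz, equality holds exactly when every $y_k$ is a scalar multiple of $x$.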
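
The convergence of the ratio can be tabulated directly. In the sketch below (illustrative names, standard library only) the ratio of the right-hand side to the left-hand side for the truncated sequence $\alpha^{(N)}$ reduces to $\tfrac{6}{\pi^2}\sum_{l<N}(l+1)^{-2}$, which increases to $1$.

```python
import math


def sharpness_ratio(N: int) -> float:
    """Ratio rhs/lhs for alpha_{2l+1} = 1/(l+1), l < N, and zero elsewhere."""
    partial = sum(1.0 / (l + 1) ** 2 for l in range(N))  # partial Basel sum
    lhs = partial                            # sum of |alpha_n|^2
    rhs = 6.0 / math.pi ** 2 * partial ** 2  # only the k = 0 layer contributes
    return rhs / lhs


for N in (1, 10, 100, 10_000):
    print(N, sharpness_ratio(N))
```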

Extension to $\ell^p$ #

The Cauchy–Schwarz inequality used above is a special case of Hölder’s inequality, and the proof generalises immediately.

Theorem (Ibragimov–Salimova, 2015)

Let $p, q \in (1,\infty)$ with $\tfrac{1}{p} + \tfrac{1}{q} = 1$. For all $\alpha = (\alpha_1, \alpha_2, \ldots) \in \ell^p(\mathbb{C})$ and $x = (x_0, x_1, \ldots) \in \ell^q(\mathbb{C})$, $$\sum_{n \geq 1} |\alpha_n|^p \geq \left(\sum_{l \geq 0} |x_l|^q\right)^{-p/q} \sum_{k \geq 0} \left|\sum_{l \geq 0} x_l\, \alpha_{2^k(2l+1)}\right|^p.$$

Retkes’s original inequality is the case $p = q = 2$ and $x_l = 1/(l+1)$, where $(\sum_{l\geq 0}|x_l|^2)^{-1} = 6/\pi^2$ by the Basel problem.
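
A numerical sanity check of the $\ell^p$ version follows the same pattern. This is a sketch under the assumption that both $\alpha$ and $x$ are finitely supported; the function name, the random data, and the truncated harmonic weights are all chosen here for illustration.

```python
import random


def lp_sides(alpha, x, p):
    """Return (lhs, rhs) of the l^p inequality for finitely supported alpha and x.

    alpha[n - 1] holds alpha_n and x[l] holds x_l; missing entries count as zero.
    """
    q = p / (p - 1.0)  # conjugate exponent, 1/p + 1/q = 1
    lhs = sum(abs(a) ** p for a in alpha)
    x_norm_q = sum(abs(v) ** q for v in x)  # sum of |x_l|^q, assumed non-zero
    total, k = 0.0, 0
    while 2 ** k <= len(alpha):
        inner, l = 0j, 0
        while l < len(x) and 2 ** k * (2 * l + 1) <= len(alpha):
            inner += x[l] * alpha[2 ** k * (2 * l + 1) - 1]
            l += 1
        total += abs(inner) ** p
        k += 1
    return lhs, x_norm_q ** (-p / q) * total


rng = random.Random(1)  # fixed seed so the run is reproducible
alpha = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
x = [1.0 / (l + 1) for l in range(150)]
for p in (1.5, 2.0, 3.0):
    lhs, rhs = lp_sides(alpha, x, p)
    print(p, lhs >= rhs)
```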

Remarks on Structure #

The role of the dyadic partition. The sets $A_k$ are the dyadic layers of $\mathbb{N}^+$: each integer sits in exactly one layer determined by its 2-adic valuation. This structure also appears in the theory of Hardy spaces, where the dyadic martingale decomposition underpins the $H^1$–BMO duality, and in wavelets, where the dyadic scaling of the real line organises the multiresolution analysis. The inequality can be read as a norm comparison between the $\ell^2$ norm and a weighted sum over dyadic layers.

Relation to the Basel problem. The constant $6/\pi^2$, the reciprocal of $\zeta(2)$, appears here because the weight sequence $1/(l+1)$ used in the inner sum is precisely the harmonic sequence, whose $\ell^2$ norm squared is $\zeta(2)$. Any other weight sequence $x \in \ell^2(\mathbb{C})$ would produce the analogous inequality with $|x|_2^{-2}$ in place of $6/\pi^2$.

The inequality as a rearrangement estimate. The right-hand side reorganises the entries of $\alpha$ by their dyadic layer and applies a weighted average within each layer. The inequality says the total $\ell^2$ energy cannot be less than $6/\pi^2$ times the energy of this rearranged, averaged version of the sequence, a quantitative statement about how averaging destroys energy.

Further Questions #

While the original conjecture is settled, several natural variants remain.

Question 1

What is the sharp constant in the inequality if the dyadic partition is replaced by the partition induced by a prime $p \neq 2$, i.e. by the sets $A_k^{(p)} = \{p^k m : \gcd(m, p) = 1\}$? The same argument applies with $x_l = w_l$ for any weight sequence $w \in \ell^2(\mathbb{C})$, but the resulting constant depends on $|w|_2$ and the choice of weight, not on $\pi$.

Question 2

The inner sum $\sum_{l \geq 0} \alpha_{2^k(2l+1)}/(l+1)$ averages the entries in layer $A_k$ with the harmonic weights. What happens if the harmonic weight $1/(l+1)$ is replaced by a weight $w(l)$ depending on the position $l$ within the layer in a more general way, for instance $w(l) = (l+1)^{-s}$ for $s > 1/2$? The sharp constant would then involve $\zeta(2s)$ instead of $\zeta(2) = \pi^2/6$.

Question 3

For $p = 1$ the Ibragimov–Salimova theorem requires $q = \infty$, and the Hölder inequality takes a different form. Does an analogue of Retkes’s inequality hold for $\alpha \in \ell^1(\mathbb{C})$, and if so, what is the sharp constant?

References #

Ibragimov, Z. O. & Salimova, D. F. (2015). On an inequality in $\ell_p(\mathbb{C})$ involving Basel problem. Elemente der Mathematik, 70(2), 79–81. https://ems.press/content/serial-article-files/45532
Retkes, Z. (2012). Inequality for square summable complex series. Open Problem Garden. http://www.openproblemgarden.org/op/inequality_for_square_summable_complex_series
Benko, D. & Molokach, J. (2013). The Basel problem as a rearrangement of series. College Mathematics Journal, 44(3), 171–176.
Ritelli, D. (2013). Another proof of $\zeta(2) = \pi^2/6$ using double integrals. American Mathematical Monthly, 120(7), 642–645.

Recent Advances in Neural Network Optimization for LLM Training

Thu, 28 May 2026 00:00:00 +0000

The optimization landscape for LLM training looks very different from two years ago. AdamW still dominates production runs, but a wave of research is eroding that dominance from multiple angles simultaneously: matrix-aware optimizers, horizon-free schedulers, a sharply revised understanding of µP, and communication-efficient distributed methods. This post synthesizes 18 recent papers across five interconnected fronts.

The unifying thread is an active re-examination of long-held assumptions, from whether gradient geometry matters, to what µP is actually doing, to whether weight decay is a regularizer at all.

1. Muon and Non-Euclidean Optimizers #

Background #

Muon (MomentUm Orthogonalized by Newton-Schulz) applies a gradient orthogonalization step via a Newton-Schulz iteration before each weight update. Rather than treating each parameter as an independent scalar (as Adam does), Muon recognizes that weight matrices have geometric structure and optimizes them accordingly, performing steepest descent under the spectral norm.

The core Newton-Schulz iteration, which runs stably in bfloat16 on tensor cores, is:

$$ X \leftarrow aX + b(XX^\top)X + c(XX^\top)^2 X $$

with coefficients $a = 3.4445$, $b = -4.7750$, $c = 2.0315$. In PyTorch:

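```python
def newtonschulz5(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2D torch.Tensor G with a quintic Newton-Schulz iteration."""
    assert G.ndim == 2
    a, b, c = (3.4445, -4.7750, 2.0315)
    X = G.bfloat16()
    # Normalize so every singular value is at most 1 (out-of-place, so G is never modified).
    X = X / (X.norm() + eps)
    # Work in the wide orientation so that X @ X.T is the smaller Gram matrix.
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X  # X <- aX + b(XX^T)X + c(XX^T)^2 X
    if transposed:
        X = X.T
    return X
```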
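
Because the update is an odd polynomial in $X$, it acts on each singular value $s$ of $X$ independently through the scalar map $\varphi(s) = as + bs^3 + cs^5$. The sketch below iterates this map using only the standard library; the function names are illustrative and no tensors are involved. It suggests why five steps suffice in practice: singular values starting anywhere in $[0.01, 1]$ end up in a band around $1$ rather than converging exactly to $1$.

```python
A_COEF, B_COEF, C_COEF = 3.4445, -4.7750, 2.0315  # coefficients from the text


def phi(s: float) -> float:
    """Effect of one Newton-Schulz step on a single singular value s."""
    return A_COEF * s + B_COEF * s ** 3 + C_COEF * s ** 5


def iterate(s: float, steps: int = 5) -> float:
    """Apply phi repeatedly, mirroring the loop over `steps` in the matrix version."""
    for _ in range(steps):
        s = phi(s)
    return s


# After dividing by the Frobenius norm, all singular values lie in (0, 1].
starts = [0.01 * i for i in range(1, 101)]
finals = [iterate(s) for s in starts]
print(f"after 5 steps: min = {min(finals):.3f}, max = {max(finals):.3f}")
```

The large linear coefficient $a$ is what inflates small singular values quickly, at the price of the iteration oscillating around $1$ instead of settling on it.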

A ready-to-use implementation lives at KellerJordan/Muon. Install via:

```bash
pip install git+https://github.com/KellerJordan/Muon
```

Muon is intended for hidden-layer matrix weights only. Embeddings, the output head, and scalar/vector parameters should still use AdamW:

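```python
from muon import MuonWithAuxAdam

# `model` is assumed to expose `.blocks` (hidden layers) and `.lm_head`.
# Muon handles the 2D hidden-layer weights; everything else goes to AdamW.
hidden_matrix_params = [
    p for n, p in model.blocks.named_parameters()
    if p.ndim >= 2 and "embed" not in n
]
embed_params = [p for n, p in model.named_parameters() if "embed" in n]
scalar_params = [p for p in model.parameters() if p.ndim < 2]
head_params = [model.lm_head.weight]

# Parameter-group interface as described in the KellerJordan/Muon README;
# check the installed version, since the constructor has changed over time.
param_groups = [
    dict(params=hidden_matrix_params, use_muon=True, lr=0.02),
    dict(params=embed_params + scalar_params + head_params, use_muon=False,
         lr=3e-4, weight_decay=0.1),
]
optimizer = MuonWithAuxAdam(param_groups)
# Per the Muon README, the LR has built-in muP scaling, so it should not need retuning as you scale up
```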

Scaling Muon: the Moonlight result #

MoonshotAI’s Moonlight (3B/16B-parameter MoE, trained on 5.7T tokens) provides the strongest evidence yet that Muon scales to real LLM training (arXiv:2502.16982, GitHub). Two fixes are needed to make Muon work beyond small scale:

Weight decay: without it, weight and output RMS norms grow until they overflow bfloat16.
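Per-parameter update scale adjustment: each matrix update is rescaled by $0.2\sqrt{\max(A, B)}$ for an $A \times B$ weight, so that its RMS matches the typical update RMS of AdamW and both optimizers can share one learning rate and weight decay.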

With these in place, scaling-law experiments indicate roughly 2× computational efficiency compared to AdamW at compute-optimal settings.

```bash
# Train a Qwen-like dense model with Muon (from Moonlight repo)
python3 examples/toy_train.py \
    --model qwen --optimizer muon \
    --dataset openwebtext-100k \
    --hidden_size 896 --lr 1e-3
```

A further efficiency variant is Flash-Muon, which reimplements the Newton-Schulz inner loop using a custom Triton kernel that exploits the symmetry of the $XX^\top$ computation, halving the effective FLOP count.

Theoretical foundations #

Kovalev (2025) shows in Understanding Gradient Orthogonalization via Non-Euclidean Trust-Region Optimization that the orthogonalized gradient update can be interpreted as a first-order trust-region method where the trust-region is defined in terms of the matrix spectral norm. This framework unifies Muon with normalized SGD and signSGD with momentum.

Pethick et al. (2025) propose Scion, a family of LMO-based algorithms that subsumes Muon, AdamW, and normalized SGD under a single framework (arXiv:2502.07529). By choosing an explicit norm for deep architectures, Scion also achieves hyperparameter transferability across model widths.

The Polar Express (Amsel et al., 2025) replaces the fixed Newton-Schulz polynomial with a sequence of polynomials, each chosen by solving a minimax problem that minimizes the worst-case error of the polar-factor approximation at that iteration. It converges faster than Newton-Schulz in both early and asymptotic stages, while remaining numerically stable in bfloat16.

Challenging the geometric narrative #

Despite the theoretical appeal, Shumaylov et al. (2026) mount a systematic challenge in Muon is Not That Special: Random or Inverted Spectra Work Just as Well. They introduce:

Freon: a family of optimizers based on Schatten (quasi-)norms, interpolating between SGD and Muon. The best-performing Schatten parameter for GPT-2 lies in the quasi-norm regime, which no LMO-based optimizer can represent.
Kaon: replaces Muon’s singular values with random noise, yet still matches Muon’s validation loss on GPT-2.

Their key insight: performance is primarily controlled by two local quantities, alignment (how well the update direction aligns with the gradient) and descent potential (step-size optimality). Muon succeeds by guaranteeing step-size optimality, not by tracking an ideal geometry.

Optimizer	Core mechanism	Key claim
Muon	Newton-Schulz orthogonalization	~2× efficiency over AdamW at compute-optimal
Scion	LMO over norm-ball	Unifies Muon/Adam; HP transferable across widths
Polar Express	Minimax polar decomposition	Faster convergence; bfloat16-safe
Freon / Kaon	Schatten quasi-norms / random SVs	Geometry is irrelevant; alignment drives performance

2. Learning Rate Scheduling #

Linear decay is provably optimal #

Defazio et al. (2023/2024) close a long-standing gap between theory and practice in Optimal Linear Decay Learning Rate Schedules and Further Refinements (arXiv:2310.07831). Under worst-case analysis, linear decay, setting $\eta_t \propto (1 - t/T)$, is the theoretically optimal schedule for a broad class of optimizers including SGD. Across 10 diverse benchmarks, it consistently outperforms cosine annealing.

$$ \eta_t = \eta_{\max} \cdot \left(1 - \frac{t}{T}\right) $$

# PyTorch built-in, the optimal default
scheduler = torch.optim.lr_scheduler.LinearLR(
 optimizer, start_factor=1.0, end_factor=0.0, total_iters=total_steps
)

The WSD cooldown phase #

The Warmup-Stable-Decay (WSD) scheduler separates training into distinct phases ending in a sharp LR drop. Dremov et al. (2025) analyse the cooldown phase specifically in Training Dynamics of the Cooldown Stage in WSD, finding:

Cooldown shapes that balance exploration and exploitation consistently outperform purely exploratory or exploitative alternatives.
There is substantial sensitivity to AdamW’s $\beta_2$ parameter during cooldown, and higher $\beta_2$ values yield consistent improvements.
Loss-landscape visualisations support the “river valley” perspective: the cooldown follows a narrow valley in parameter space.
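A WSD schedule is simple to write down. The sketch below assumes a linear warmup and a linear cooldown; the cooldown shape is exactly what the paper varies, so the linear choice is an assumption for illustration, and the function and parameter names are hypothetical.

```python
def wsd_lr(step, total_steps, peak_lr, warmup_steps, cooldown_steps):
    """Warmup-Stable-Decay learning rate at a 0-indexed step."""
    cooldown_start = total_steps - cooldown_steps
    if step < warmup_steps:
        # Linear warmup up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    if step < cooldown_start:
        # Stable phase at constant peak_lr
        return peak_lr
    # Linear cooldown that reaches zero at total_steps
    return peak_lr * (total_steps - step) / cooldown_steps

schedule = [wsd_lr(s, 1000, 1e-3, 100, 200) for s in range(1000)]
```

Swapping the last branch for another decreasing function of `(total_steps - step) / cooldown_steps` gives other cooldown shapes to compare.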

Convex theory meets LLM practice #

Schaipp et al. (2025) show in The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training that schedules for large model training obey performance bounds from non-smooth convex optimisation. For the constant schedule with linear cooldown, the bound is:

$$ \bar{f}_T - f^* \leq \frac{\|x_0 - x^*\|^2}{2\eta T} + \frac{\eta}{2} \sum_{t=0}^{T-1} \sigma_t^2 $$

where the cooldown benefit appears explicitly through the absence of logarithmic terms. This enables principled LR transfer: exploiting the theory yields noticeable validation loss improvements for 124M and 210M Llama-type models when extending schedules for continued training.

Anytime schedules and weight averaging #

Meterez et al. (2026) prove in Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging (arXiv:2602.03702) that horizon-free (anytime) schedules exist for overparameterised linear regression, with weight averaging central to achieving minimax-optimal convergence. At 150M–300M params trained at 1–32× Chinchilla scale, a constant LR with weight averaging matches well-tuned cosine decay across the full training duration.

Weight averaging is a largely underutilised practical lever. It should be a default, not an afterthought.
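As a rough illustration of how cheap this is to add, the following sketch keeps a running uniform average of the weights seen so far. Uniform averaging is an assumption made here for simplicity (the paper's averaging scheme may weight iterates differently), and the class name is hypothetical.

```python
class RunningWeightAverage:
    """Keeps the uniform average of every weight vector passed to update()."""

    def __init__(self):
        self.count = 0
        self.average = None

    def update(self, weights):
        self.count += 1
        if self.average is None:
            self.average = list(weights)
        else:
            # Incremental mean: avg <- avg + (w - avg) / k
            k = self.count
            self.average = [a + (w - a) / k for a, w in zip(self.average, weights)]
        return self.average

averager = RunningWeightAverage()
for weights in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0]):
    averaged = averager.update(weights)
```

Evaluation would then use `averaged` in place of the latest iterate, while training continues from the raw weights.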

ScheduleFree+ at LLM scale #

Defazio (2026) extends schedule-free learning to full LLM pretraining in ScheduleFree+: Scaling Learning-Rate-Free and Schedule-Free Learning to Large Language Models (arXiv:2605.19095). Practical fixes for large batch and model sizes enable ScheduleFree+ to achieve a 31% improvement over WSD schedules at 1000 tokens per parameter, while also providing a theoretical foundation for checkpoint merging during pretraining.

pip install schedulefree

from schedulefree import AdamWScheduleFree


optimizer = AdamWScheduleFree(
 model.parameters(), lr=1e-3, warmup_steps=1000
)


# Must switch to eval mode before evaluation
optimizer.eval()
val_loss = evaluate(model)
optimizer.train()

GitHub: facebookresearch/schedule_free

3. Hyperparameter Transfer and Scaling Laws (µP) #

Weight decay as the true driver of LR transfer #

The Maximal Update Parameterisation (µP) is widely used to transfer optimal learning rates from proxy models to large ones without re-tuning. Kosson et al. (2025/2026), accepted to ICLR 2026, provide a large-scale empirical refutation of the standard µP narrative in Weight Decay May Matter More than µP for Learning Rate Transfer in Practice.

Their finding: µP’s geometric alignment assumptions, which require alignment between a layer’s inputs, weights, and gradient updates, hold only briefly at the start of training. For the remainder, it is weight decay that stabilises update dynamics across widths and facilitates LR transfer. This implies µP’s scaling primarily acts as an implicit warmup, and can be largely replaced by modified warmup schedules.

Embedding layer LR as the key factor #

Kalra & Barkeshli (2026) provide complementary evidence in Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate, tracing µP’s advantage over standard parameterisation (SP) to a single factor: the embedding layer learning rate.

In SP, the embedding LR acts as a training bottleneck. Simply increasing it by a factor of model width, matching µP, eliminates most of the gap. Three quantitative metrics are used: quality of scaling law fit, robustness to extrapolation errors, and asymptotic loss penalty.

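# Simple fix that captures most of µP's benefit in SP
import torch

# Ratio of target width to proxy width (d_model / d_model_proxy)
embed_lr_multiplier = model_width / base_width

# Split parameters into embedding and non-embedding groups
embed_params = list(model.embed.parameters())
embed_ids = {id(p) for p in embed_params}
non_embed_params = [p for p in model.parameters() if id(p) not in embed_ids]

param_groups = [
    {"params": embed_params, "lr": base_lr * embed_lr_multiplier},
    {"params": non_embed_params, "lr": base_lr},
]
optimizer = torch.optim.AdamW(param_groups, weight_decay=0.1)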

Open question: Kosson et al. argue µP acts as an implicit warmup; Kalra & Barkeshli argue it is about the embedding LR. Both contradict µP’s original geometric motivation. No consensus has emerged, and the practical implications differ significantly.

4. Normalization, Weight Decay, and Variance Reduction #

The end-of-training gradient spike #

Defazio (2025) identifies a subtle pathology in Why Gradients Rapidly Increase Near the End of Training: gradient norms spike sharply near the end of long LLM runs. The diagnosis is a three-way interaction between weight decay, normalisation layers, and the LR schedule.

When a layer is followed by normalisation, its scale becomes irrelevant to the forward pass, but weight decay continues shrinking the parameters. This creates an implicit competition between the optimizer’s effective update size and normalisation rescaling, causing gradient norms to grow unchecked as the LR decays.
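The scale-invariance argument can be checked numerically. In the toy sketch below, a linear map is followed by a normalisation that removes scale; shrinking the weights by a factor of 10 leaves the loss unchanged but makes the gradient 10 times larger. The model, the loss, and all names are assumptions chosen for illustration, and the gradient is estimated by finite differences rather than autograd.

```python
import math

def normalized_output(w, x):
    """y = Wx / ||Wx||: a linear layer followed by a scale-removing normalisation."""
    z = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]

def loss(w, x, target):
    y = normalized_output(w, x)
    return sum((a - b) ** 2 for a, b in zip(y, target))

def grad_norm(w, x, target, eps=1e-6):
    """Central finite-difference estimate of the norm of dL/dW."""
    total = 0.0
    for i in range(len(w)):
        for j in range(len(w[0])):
            w_plus = [row[:] for row in w]
            w_minus = [row[:] for row in w]
            w_plus[i][j] += eps
            w_minus[i][j] -= eps
            g = (loss(w_plus, x, target) - loss(w_minus, x, target)) / (2 * eps)
            total += g * g
    return math.sqrt(total)

w = [[0.5, -1.0], [2.0, 0.3]]
x = [1.0, 2.0]
target = [0.0, 1.0]
w_small = [[0.1 * v for v in row] for row in w]  # weights shrunk, as by weight decay

ratio = grad_norm(w_small, x, target) / grad_norm(w, x, target)  # about 10
```

Since the gradient of a scale-invariant loss grows as the inverse of the weight norm, anything that keeps shrinking the weights keeps inflating the gradient.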

Fix: disable weight decay for AdamW-updated layers in architectures where those layers are directly followed by normalisation (e.g. every transformer block):

import torch

# Exclude normalisation, embedding, and 1-D parameters (biases, gains) from weight decay
no_wd, wd = [], []
for name, param in model.named_parameters():
    if "norm" in name or "embed" in name or param.ndim < 2:
        no_wd.append(param)
    else:
        wd.append(param)

optimizer = torch.optim.AdamW([
    {"params": wd, "weight_decay": 0.1},
    {"params": no_wd, "weight_decay": 0.0},
], lr=3e-4)

This simultaneously eliminates the spike and reduces loss throughout training. The analysis explains why weight decay should be disabled for AdamW-updated layers in architectures like modded-nanoGPT.

Weight normalisation as an alternative #

Nemotron-Flash (Fu et al., 2025, NeurIPS 2025) investigates weight normalisation as a practical mechanism in small language models, finding that it enables more effective weight updates and improves final convergence. Weight normalisation sidesteps the weight-decay/normalisation interaction described above, though at the cost of slightly worse final loss compared to a well-tuned baseline.

MARS: variance reduction meets preconditioned gradients #

Despite decades of theoretical work, variance reduction has largely failed to yield practical gains in deep learning. Yuan et al. (2024/2025) attempt to change this in MARS: Unleashing the Power of Variance Reduction for Training Large Models, proposing a unified framework that reconciles AdamW, Lion, and Shampoo with variance reduction via a scaled stochastic recursive momentum technique.

GPT-2 training results look strong. However, the comprehensive benchmark by Semenov et al. (2025), Benchmarking Optimizers for Large Language Model Pretraining, a 73-page study covering 44 figures and 48 tables across standardised scenarios, reveals that MARS does not work well with small batch sizes, limiting its practical applicability in memory-constrained settings.

This underscores the danger of evaluating optimizers on a single benchmark setup: MARS looks excellent at the batch sizes used in the original paper and brittle elsewhere.

5. Distributed Training: DiLoCo and Its Descendants #

DiLoCo (Distributed Low-Communication training) uses AdamW as an inner optimizer for $H$ local steps on each worker (typically $H = 500$), then synchronises by applying Nesterov momentum to the pseudo-gradient: the net parameter change accumulated over those inner steps, averaged across workers. With $H = 500$, workers communicate 500× less often than in fully synchronous data-parallel training.
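The inner/outer structure can be sketched in a few lines of plain Python. This is a toy under stated assumptions: plain SGD stands in for AdamW as the inner optimizer, the two workers each minimise a simple quadratic, the outer step follows the usual SGD-with-Nesterov-momentum update, and every name and hyperparameter is illustrative.

```python
def inner_sgd(theta, grad_fn, lr, num_steps):
    """Run num_steps of plain SGD from theta (stand-in for the AdamW inner loop)."""
    theta = list(theta)
    for _ in range(num_steps):
        g = grad_fn(theta)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

def diloco_outer_step(theta, worker_results, momentum_buf, outer_lr=0.7, mu=0.9):
    """Apply Nesterov momentum to the pseudo-gradient (old params minus new params)."""
    n = len(worker_results)
    pseudo = [sum(theta[i] - w[i] for w in worker_results) / n
              for i in range(len(theta))]
    momentum_buf = [mu * b + p for b, p in zip(momentum_buf, pseudo)]
    update = [p + mu * b for p, b in zip(pseudo, momentum_buf)]
    new_theta = [t - outer_lr * u for t, u in zip(theta, update)]
    return new_theta, momentum_buf

# Toy problem: worker k minimises 0.5 * ||theta - centre_k||^2
centres = [[1.0, 2.0], [3.0, 0.0]]
grad_fns = [lambda th, c=c: [t - ci for t, ci in zip(th, c)] for c in centres]

theta, buf = [0.0, 0.0], [0.0, 0.0]
for _ in range(100):  # outer rounds; workers only communicate here
    results = [inner_sgd(theta, g, lr=0.05, num_steps=5) for g in grad_fns]
    theta, buf = diloco_outer_step(theta, results, buf)
```

On this toy problem `theta` approaches the mean of the two centres, the minimiser of the summed objective, even though the workers never exchange gradients during their inner steps.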

OpenDiLoCo: the open-source foundation #

PrimeIntellect’s OpenDiLoCo provides a reproducible drop-in implementation, demonstrated training across two continents and three countries with 90–95% compute utilisation. It later served as the foundation for INTELLECT-1, a 10B-parameter model trained globally.

from functools import partial

import torch
from open_diloco.hivemind_diloco import DiLoCoOptimizer

# `dht` and `model` are assumed to be defined earlier in the training script
inner_optimizer = partial(torch.optim.AdamW, lr=4e-4)
outer_optimizer = partial(
    torch.optim.SGD, lr=0.7, momentum=0.9, nesterov=True
)

optimizer = DiLoCoOptimizer(
    dht=dht,
    params=model.parameters(),
    batch_size=512,
    num_inner_steps=500,  # sync every 500 steps, 500× fewer communications
    inner_optimizer=inner_optimizer,
    outer_optimizer=outer_optimizer,
)

Why DiLoCo works on a single node: SNOO #

Kallusky et al. (2025) show in SNOO: Step-K Nesterov Outer Optimizer that DiLoCo’s effectiveness, even on a single node, stems from applying Nesterov momentum to the pseudo-gradient. Their method isolates this as a standalone Lookahead variant. Results:

1.5–2.5× FLOPs efficiency gains up to $10^{23}$ training FLOPs.
Improvements increase with model size.
Compatible with both AdamW and Muon as inner optimizers.
Minimal memory overhead.

The single-worker DiLoCo achieves speedups of up to 6.32% in steps-to-loss over AdamW on a 160M Llama model.

Smoothing DiLoCo: Generalized Primal Averaging (GPA) #

Defazio et al. (2025/2026) propose GPA in Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs (arXiv:2512.17131), which decouples DiLoCo’s interpolation constants to enable smooth iterate averaging at every step, replacing uniform averaging with exponential moving averaging.

GPA unifies single-worker DiLoCo and ScheduleFree within a single non-distributed framework. Speedups over AdamW in steps-to-target-loss:

Model	Speedup
Llama-160M	8.71%
Llama-1B	10.13%
Llama-8B	9.58%

Streaming DiLoCo: towards free distributed training #

Douillard et al. (2025) address the remaining bottleneck in Streaming DiLoCo with Overlapping Communication: Towards a Distributed Free Lunch (arXiv:2501.18512): even with infrequent synchronisation, each sync exchanges all parameters simultaneously. Three fixes:

Streaming sync: synchronise only subsets of parameters at a time.
Overlapping communication: continue training during synchronisation.
Quantisation: reduce cross-worker data to fewer bits.

Together, required bandwidth drops by two orders of magnitude while maintaining comparable quality at billion-parameter scale.
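Two of these ideas are easy to sketch in isolation. The code below shows a staggered schedule in which each parameter fragment synchronises once per period at a different offset, and a simple uniform quantiser for the values being exchanged. Both are assumptions made for illustration rather than the paper's exact scheme, overlapping communication is not modelled, and all names are hypothetical.

```python
def fragments_due(step, num_fragments, sync_every):
    """Indices of parameter fragments that synchronise at this step.

    Fragment k syncs once every sync_every steps, offset by k * (sync_every // num_fragments),
    so at most one fragment is on the wire at a time."""
    offset = sync_every // num_fragments
    return [k for k in range(num_fragments)
            if step > 0 and step % sync_every == (k * offset) % sync_every]

def quantize(values, bits=4):
    """Uniform quantisation to 2**bits levels over the value range; returns dequantised floats."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values]

values = [-1.0, -0.3, 0.0, 0.42, 1.0]
dequantized = quantize(values, bits=4)
max_error = max(abs(a - b) for a, b in zip(values, dequantized))
```

With 4 fragments and a period of 100 steps, one fragment is exchanged every 25 steps, so peak bandwidth is spread out rather than concentrated in a single full-model sync.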

Method	Setting	Key contribution	Gain
SNOO	Single-node	Nesterov momentum on pseudo-gradient	1.5–2.5× FLOP efficiency
GPA	Single-node	Smooth iterate averaging; unifies DiLoCo + SF	~9% steps-to-loss
Streaming DiLoCo	Distributed	Streaming sync + quantisation	~100× bandwidth reduction

6. Cross-Cutting Themes and Open Questions #

Several recurrent tensions emerge from reading these papers together.

Geometry vs. step-size calibration in Muon #

Kovalev, Pethick et al., and Amsel et al. offer geometric explanations for Muon’s success. Shumaylov et al. argue that geometry is practically irrelevant and step-size optimality is the true driver. Which narrative guides future research matters: geometry points toward more sophisticated matrix norms; the step-size interpretation suggests much simpler paths to similar gains.

What µP is actually doing #

Kosson et al. argue µP is primarily an implicit warmup mechanism. Kalra & Barkeshli argue it is essentially about the embedding layer LR. Both stand in contrast to µP’s original geometric motivation. The practical stakes are high: the warmup interpretation suggests µP can be discarded with a schedule change; the embedding LR interpretation suggests a single-line fix.

Weight decay as a multi-role hyperparameter #

Weight decay appears as a protagonist in three independent stories in this survey:

Defazio: source of end-of-training gradient spikes via interaction with normalisation.
Kosson et al.: the true driver of LR transfer, not µP geometry.
Kalra & Barkeshli: improves scaling law fits but hurts extrapolation robustness.

It is no longer tenable to treat weight decay as a simple regulariser with a sensible default. It must be understood per-layer and in interaction with your normalisation strategy.

DiLoCo as the practical distributed optimizer #

Despite a large body of research on distributed optimizers, DiLoCo and its derivatives appear to be the only methods that consistently add value beyond simply scaling the batch size. The finding that its benefits carry over to single-node settings (via SNOO and GPA) makes it a particularly important line of work for practitioners at all scales.

Practical Recommendations for 2026 #

Based on the convergence of evidence across these papers, for a new large training run consider:

Optimizer: Muon for hidden-layer matrix weights + AdamW for embeddings/head. The Moonlight scaling fixes (weight decay + update scale adjustment) are necessary above ~1B parameters.
Schedule: ScheduleFree+ or linear decay instead of cosine. If you need a fixed-horizon schedule, WSD with higher $\beta_2$ during cooldown.
Weight decay: Disable it for layers directly followed by normalisation to avoid end-of-training gradient spikes.
Outer optimizer: Wrap your training loop with single-worker DiLoCo (SNOO or GPA) for a ~9% efficiency gain with no architectural changes.
µP alternatives: Before adopting full µP overhead, try increasing the embedding layer LR by a factor of $d_{\text{model}} / d_{\text{proxy}}$. This may reproduce most of the benefit.

None of these require fundamental architectural changes.

References #

#	Paper	Venue	Links
1	Jordan et al. (2024): Muon: An optimizer for hidden layers	n/a	blog · GitHub
2	Liu et al. (2025): Muon is Scalable for LLM Training (Moonlight)	n/a	arXiv:2502.16982 · GitHub
3	Kovalev (2025): Understanding Gradient Orthogonalization	n/a	n/a
4	Pethick et al. (2025): Training Deep Learning Models with Norm-Constrained LMOs (Scion)	n/a	arXiv:2502.07529
5	Amsel et al. (2025): The Polar Express	n/a	n/a
6	Shumaylov et al. (2026): Muon is Not That Special (Freon/Kaon)	n/a	n/a
7	Defazio et al. (2023): Optimal Linear Decay Learning Rate Schedules	n/a	arXiv:2310.07831
8	Dremov et al. (2025): Training Dynamics of the Cooldown Stage in WSD	n/a	n/a
9	Schaipp et al. (2025): Surprising Agreement Between Convex Theory and LR Scheduling	n/a	n/a
10	Meterez et al. (2026): Anytime Pretraining	n/a	arXiv:2602.03702
11	Defazio (2026): ScheduleFree+	n/a	arXiv:2605.19095 · GitHub
12	Kosson et al. (2026): Weight Decay May Matter More than µP	ICLR 2026	n/a
13	Kalra & Barkeshli (2026): Quantifying HP Transfer and Embedding LR	n/a	n/a
14	Defazio (2025): Why Gradients Rapidly Increase Near End of Training	n/a	n/a
15	Fu et al. (2025): Nemotron-Flash	NeurIPS 2025	n/a
16	Yuan et al. (2025): MARS	n/a	n/a
17	Semenov et al. (2025): Benchmarking Optimizers for LLM Pretraining	n/a	n/a
18	Kallusky et al. (2025): SNOO	n/a	n/a
19	Defazio et al. (2026): Smoothing DiLoCo with Primal Averaging (GPA)	n/a	arXiv:2512.17131
20	Douillard et al. (2025): Streaming DiLoCo	n/a	arXiv:2501.18512
21	Douillard et al. (2023/2024): DiLoCo (original)	n/a	arXiv:2311.08105
22	PrimeIntellect AI (2024): OpenDiLoCo	n/a	GitHub · blog

The Invariant Subspace Problem

Thu, 28 May 2026 00:00:00 +0000

Few questions in functional analysis have attracted sustained attention across as many decades as this one. It sits at the confluence of operator theory, spectral theory, and complex analysis, and every partial result has opened new territory rather than narrowing the problem to a routine case.

Problem (Invariant Subspace Problem)

Does every bounded linear operator $T$ on an infinite-dimensional separable complex Hilbert space $\mathcal{H}$ have a non-trivial closed invariant subspace?

That is, does there always exist a closed subspace $\mathcal{M} \subsetneq \mathcal{H}$ with $\mathcal{M} \neq \{0\}$ such that $T\mathcal{M} \subseteq \mathcal{M}$?

The problem is rated medium importance on the Open Problem Garden. It is old enough to have accumulated a rich history of partial results, yet still open in the Hilbert space setting after more than seventy years.

Trivial Observations and Why They Run Out #

Two subspaces are always invariant: $\{0\}$ and $\mathcal{H}$ itself. These are the trivial invariant subspaces; the problem asks whether anything else must exist.

On finite-dimensional spaces the answer is immediate: every operator on $\mathbb{C}^n$ has an eigenvector (by the fundamental theorem of algebra applied to the characteristic polynomial), and the span of any eigenvector is a one-dimensional invariant subspace. This argument fails completely in infinite dimensions, where the spectrum can be continuous and eigenvectors need not exist.
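The finite-dimensional argument is concrete enough to run. The sketch below handles only the $2 \times 2$ case via the quadratic formula, and the function names are made up for this example. It uses a rotation matrix on purpose: it has no real eigenvector, which is why the argument needs the complex field.

```python
import cmath

def eigenpair_2x2(t):
    """Return (eigenvalue, eigenvector) of a 2x2 complex matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = t
    if b == 0 and c == 0:
        return a, (1, 0)  # diagonal matrix: the first basis vector is an eigenvector
    trace, det = a + d, a * d - b * c
    # A root of the characteristic polynomial always exists over the complex numbers
    lam = (trace + cmath.sqrt(trace * trace - 4 * det)) / 2
    vec = (b, lam - a) if b != 0 else (lam - d, c)
    return lam, vec

def apply(t, v):
    return tuple(sum(tij * vj for tij, vj in zip(row, v)) for row in t)

rotation = [[0, -1], [1, 0]]  # rotation by 90 degrees
lam, v = eigenpair_2x2(rotation)
tv = apply(rotation, v)  # equals lam * v, so span{v} is a one-dimensional invariant subspace
```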

On non-separable Hilbert spaces the problem is also trivial but for a different reason: for any non-zero vector $x \in \mathcal{H}$, the closed linear span $\overline{\operatorname{span}\{T^n x : n \geq 0\}}$ is a closed invariant subspace, and since it is separable it cannot equal all of $\mathcal{H}$ when $\mathcal{H}$ is non-separable. So the problem is genuinely about separable spaces.

Landscape of Known Results #

Positive Results: Classes with Invariant Subspaces #

Theorem (Aronszajn–Smith, 1954)

Every compact operator on a Banach space of dimension greater than one has a non-trivial closed invariant subspace.

The compact case was already known to von Neumann in the 1930s for Hilbert spaces, but was never published; Aronszajn and Smith gave the first published proof, extended to Banach spaces. The key idea is that a compact operator can be approximated by finite-rank operators, each of which has invariant subspaces, and a limiting argument produces an invariant subspace for the compact operator.

Theorem (Lomonosov, 1973)

If a bounded operator $T$ on a Banach space commutes with a non-zero compact operator, then $T$ has a non-trivial hyperinvariant subspace (a subspace invariant under every operator that commutes with $T$).

Lomonosov’s proof is strikingly short, less than a page, and uses the Schauder fixed-point theorem in an unexpected way. It subsumes both the compact case (a compact operator commutes with itself) and the polynomially compact case ($T$ commutes with $p(T)$, which is compact by assumption). For several years it seemed that Lomonosov’s theorem might resolve the problem entirely, until Hadwin, Nordgren, Radjavi, and Rosenthal (1980) exhibited an operator that does not commute with any non-zero compact operator yet still has invariant subspaces.

Theorem (Brown, 1978)

Every subnormal operator on a Hilbert space has a non-trivial invariant subspace.

An operator $T$ is subnormal if it is the restriction of a normal operator to an invariant subspace, that is, if it has a normal extension on a larger Hilbert space. Normal operators are handled by the spectral theorem, which produces a rich lattice of invariant subspaces, but subnormal operators do not simply inherit these by restriction: an invariant subspace of the normal extension may meet the smaller space trivially. Brown’s proof, which settled a problem posed by Halmos, instead uses techniques from rational approximation theory, and the method was later extended to hyponormal operators with thick spectra (Brown, 1987).

Beyond these landmark theorems, invariant subspaces are also known for: hyponormal operators with some additional conditions, operators whose spectrum has interior points, operators satisfying growth conditions on the resolvent, and polynomially bounded operators with spectrum containing the unit circle under further constraints (Liu, 2017; Réjasse, 2023).

Beurling’s Theorem: A Complete Classification #

Theorem (Beurling, 1949)

The closed invariant subspaces of the unilateral shift $S : H^2(\mathbb{D}) \to H^2(\mathbb{D})$, $(Sf)(z) = zf(z)$, are exactly the subspaces of the form $\varphi H^2(\mathbb{D})$ where $\varphi$ is an inner function (i.e. $|\varphi(e^{i\theta})| = 1$ a.e.).

Beurling’s theorem is a landmark because it gives not merely existence but a full classification of all invariant subspaces for a single operator. The shift on $H^2$ is in many senses the canonical operator for the Hilbert space invariant subspace problem: a counterexample to the full problem would be an operator with no non-trivial closed invariant subspaces at all, and the shift shows how rich the lattice of invariant subspaces can be even for a single, very simple operator.
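The simplest instances of Beurling's theorem are the inner functions $\varphi(z) = z^k$, for which $\varphi H^2$ consists of the power series whose first $k$ coefficients vanish. The sketch below works with polynomials as finite coefficient lists, a truncation assumed here for illustration, and checks that the shift maps $z^k H^2$ into itself. The function names are hypothetical.

```python
def shift(coeffs):
    """Unilateral shift on power-series coefficients: (a0, a1, ...) -> (0, a0, a1, ...)."""
    return [0] + list(coeffs)

def in_zk_h2(coeffs, k):
    """True if the series is divisible by z**k, i.e. its first k coefficients vanish."""
    return all(c == 0 for c in coeffs[:k])

f = [0, 0, 3, 1]   # 3 z^2 + z^3, an element of z^2 H^2
g = [1, 2]         # 1 + 2 z, not in z H^2

shifted_f = shift(f)  # 3 z^3 + z^4, still in z^2 H^2
shifted_g = shift(g)  # z + 2 z^2, which now lies in z H^2
```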

Negative Results: Counterexamples on Banach Spaces #

Theorem (Enflo, 1975/1987; Read, 1984)

There exist separable Banach spaces and bounded linear operators on them with no non-trivial closed invariant subspace. In particular, Read constructed such an operator on $\ell^1$.

Enflo’s counterexample was the first, constructed in 1975 though not published until 1987 due to its length and complexity. Read’s construction (1984) arrived independently and somewhat earlier in print; a further, more explicit example by Read (1985) lives on the classical space $\ell^1$. These results make clear that the answer to the invariant subspace problem is negative for general Banach spaces. The Hilbert space case remains the central open question precisely because no counterexample on any reflexive Banach space, much less a Hilbert space, has been found.

The Hilbert–Banach Gap #

The separation between Hilbert space and general Banach space behaviour is a recurring theme. Several features of Hilbert spaces that Banach spaces lack suggest why counterexamples might not exist in the Hilbert setting:

The inner product gives every operator an adjoint $T^*$, and the lattices of invariant subspaces of $T$ and of $T^*$ are related by orthogonal complementation: $\mathcal{M}$ is invariant under $T$ exactly when $\mathcal{M}^\perp$ is invariant under $T^*$.
The spectral theorem for normal operators provides a complete invariant subspace theory for that class, anchoring intuition.
Reflexivity and the existence of orthonormal bases constrain operator behaviour more than the structure of $\ell^1$ does.

None of these features has yet been converted into a proof for the general case.

Recent Proof Attempts #

The problem has attracted renewed attention in recent years.

In May 2023, Per Enflo, the same mathematician who produced the first Banach space counterexample, posted a preprint to arXiv (2305.15442) claiming a positive resolution for all separable Hilbert spaces. The original preprint was 13 pages; a substantially expanded version (52 KB) appeared in April 2024. Enflo himself has been cautious about the result, noting that expert review is ongoing. As of this writing the preprint has not received a definitive verdict from the community.

In July 2023 an independent preprint by Neville (arXiv:2307.08176) also claimed a positive solution for separable Hilbert spaces.

In September 2024 a peer-reviewed article in Axioms by Khalil, Yousef, Alshanti, and Abu Hammad announced a proof, but basic errors were identified shortly after publication (Ghatasheh, arXiv:2411.19409, November 2024).

The problem therefore remains officially open. The cluster of recent attempts reflects both its difficulty and its continued centrality in functional analysis.

Research Directions #

1. Cyclic Vectors and the Spectral Radius Formula #

A vector $x \in \mathcal{H}$ is cyclic for $T$ if $\mathcal{H} = \overline{\operatorname{span}\{T^n x : n \geq 0\}}$. An operator with a non-trivial invariant subspace cannot have every non-zero vector be cyclic. The contrapositive is: if every non-zero vector is cyclic, then $T$ is a counterexample.
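In finite dimensions cyclicity can be tested directly: by the Cayley–Hamilton theorem, $x$ is cyclic for an $n \times n$ matrix $T$ exactly when $x, Tx, \dots, T^{n-1}x$ are linearly independent. The sketch below checks this with exact rational arithmetic; it illustrates the definition only and says nothing about the infinite-dimensional problem. All names are assumptions made for this example.

```python
from fractions import Fraction

def apply(t, x):
    return [sum(tij * xj for tij, xj in zip(row, x)) for row in t]

def rank(vectors):
    """Rank of a list of vectors via Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_cyclic(t, x):
    """True if x, Tx, ..., T^(n-1) x span the whole n-dimensional space."""
    orbit = [list(x)]
    for _ in range(len(x) - 1):
        orbit.append(apply(t, orbit[-1]))
    return rank(orbit) == len(x)

t = [[1, 0], [0, 2]]
cyclic_vector = is_cyclic(t, [1, 1])   # orbit (1,1), (1,2) spans the plane
eigen_vector = is_cyclic(t, [1, 0])    # orbit stays on a line, an invariant subspace
```

The eigenvector fails to be cyclic precisely because its orbit is trapped in a proper invariant subspace, which is the finite-dimensional shadow of the statement above.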

Read’s Banach-space constructions proceed by building hypercyclic operators whose orbits are dense. On Hilbert spaces, the geometry severely constrains the density of orbits. Making this constraint quantitative, via growth estimates on $\|T^n x\|$ or on the resolvent $\|(T-\lambda)^{-1}\|$, might close the gap between known positive results and the general case.

2. Dual Algebra Techniques #

A powerful modern approach studies the dual algebra $\mathcal{A}_T$, the weak-$*$ closure of the polynomials in $T$ as a subalgebra of $\mathcal{B}(\mathcal{H})$. If $T$ is reflexive, meaning that every operator leaving all invariant subspaces of $T$ invariant already lies in the weakly closed algebra generated by $T$, one can sometimes extract invariant subspaces from the structure of the algebra. Results along these lines have been obtained for $C_{00}$ contractions (Bercovici, Foiaş, Pearcy) and for polynomially bounded operators under spectral conditions (Liu, 2017). Not every contraction is reflexive (a $2 \times 2$ nilpotent Jordan block already fails), so the key open question is whether the dual algebra approach can be made to work for all contractions via Sz.-Nagy–Foiaş theory.

3. Contractions and the Sz.-Nagy–Foiaş Calculus #

Every contraction ($\|T\| \leq 1$) on a Hilbert space admits a minimal unitary dilation (Sz.-Nagy’s dilation theorem), and Foiaş developed a functional calculus for contractions based on $H^\infty(\mathbb{D})$. The rich structure of this calculus has produced invariant subspace theorems for $C_{11}$ contractions and for contractions whose spectrum is rich enough. The question is whether the calculus can be pushed to all contractions; the general invariant subspace problem for contractions is equivalent to the full problem (by rescaling), so this is not a simplification but a different vantage point that has been productive.

4. Almost Invariant Half-Spaces #

A weaker notion, studied by Androulakis, Popov, Tcaciuc, and Troitsky, asks for almost invariant half-spaces: closed subspaces $\mathcal{M}$ of infinite dimension and infinite codimension such that $T\mathcal{M} \subseteq \mathcal{M} + \mathcal{F}$ for some finite-dimensional subspace $\mathcal{F}$. These exist for every operator on any infinite-dimensional Banach space. Whether every operator on a Hilbert space has a genuinely invariant (not just almost invariant) infinite-dimensional subspace of infinite codimension remains open and is a concrete intermediate target.

5. Hyperinvariant Subspaces #

A subspace is hyperinvariant for $T$ if it is invariant under every operator that commutes with $T$. Every hyperinvariant subspace is invariant, so a non-trivial hyperinvariant subspace for $T$ is in particular a non-trivial invariant subspace. Lomonosov’s 1973 theorem gives hyperinvariant subspaces when $T$ commutes with a non-zero compact operator. The hyperinvariant subspace problem asks whether every operator on a Hilbert space, other than a scalar multiple of the identity, has a non-trivial hyperinvariant subspace. It is also open, and it is at least as hard as the invariant subspace problem itself, since a positive answer would settle both.

References #

Aronszajn, N. & Smith, K. T. (1954). Invariant subspaces of completely continuous operators. Annals of Mathematics, 60(2), 345–350.
Beurling, A. (1949). On two problems concerning linear transformations in Hilbert space. Acta Mathematica, 81, 239–255.
Brown, S. (1987). Hyponormal operators with thick spectra have invariant subspaces. Annals of Mathematics, 125(1), 93–103.
Enflo, P. H. (1987). On the invariant subspace problem for Banach spaces. Acta Mathematica, 158, 213–313.
Enflo, P. H. (2023). On the invariant subspace problem in Hilbert spaces. arXiv:2305.15442.
Lomonosov, V. I. (1973). Invariant subspaces of operators commuting with compact operators. Functional Analysis and Its Applications, 7(3), 213–214.
Read, C. J. (1984). A solution to the invariant subspace problem. Bulletin of the London Mathematical Society, 16(4), 337–401.
Read, C. J. (1985). A solution to the invariant subspace problem on the space $\ell^1$. Bulletin of the London Mathematical Society, 17(4), 305–317.
Radjavi, H. & Rosenthal, P. (2003). Invariant Subspaces (2nd ed.). Dover.
Bercovici, H., Foiaş, C., & Pearcy, C. (1985). Dual Algebras with Applications to Invariant Subspaces and Dilation Theory. AMS.

Something Like Picard for 1-Forms

Wed, 27 May 2026 00:00:00 +0000

Picard’s great theorem is a statement about how wildly a holomorphic function can behave near an essential singularity. The conjecture below asks whether injectivity of local primitives of a 1-form is enough to rule out such wild behaviour at the origin, forcing the 1-form to extend meromorphically across the puncture.

Conjecture (Elsner, 2010)

Let $D$ be the open unit disk and let $U_1,\dots,U_n$ be open sets with $\bigcup_{j=1}^n U_j = D\setminus\{0\}$. Suppose there are injective holomorphic functions $f_j : U_j \to \mathbb{C}$ such that $$\mathrm{d}f_j = \mathrm{d}f_k \quad \text{on every connected component of } U_j \cap U_k.$$ Then the $\mathrm{d}f_j$ glue together to a meromorphic 1-form on $D$.

The problem is rated medium importance on the Open Problem Garden and is not recommended for undergraduates, reflecting the depth of the tools involved. It arises from Elsner’s study of hyperelliptic action integrals in the context of the exact WKB method for Schrödinger equations with polynomial potential (Elsner, Ann. Inst. Fourier 49(1), 1999).

Setup and Interpretation #

The compatibility condition $\mathrm{d}f_j = \mathrm{d}f_k$ on each connected component of $U_j \cap U_k$ is equivalent to saying $f_j - f_k$ is locally constant there. The local differentials therefore glue together unambiguously to a global holomorphic 1-form $$\omega \in \Omega^1(D\setminus\{0\})$$ whose restriction to each $U_j$ equals $\mathrm{d}f_j$. The conjecture asserts that $\omega$ does not have an essential singularity at the origin: it extends to a meromorphic 1-form on all of $D$, meaning near $0$ it looks like $$\omega = \left(\frac{c_{-m}}{z^m} + \cdots + \frac{c_{-1}}{z} + c_0 + c_1 z + \cdots\right)\mathrm{d}z$$ for some $m \ge 0$.

The injectivity of each $f_j$ is the crucial hypothesis. Without it the statement is false: any holomorphic 1-form $\omega$ on $D\setminus\{0\}$ with an essential singularity at $0$ is locally $\mathrm{d}f_j$ for some holomorphic $f_j$, and these $f_j$ can be chosen on contractible pieces of the cover; injectivity is what prohibits essential singularities from arising.

What Is Already Known #

Partial Result

Under the hypotheses of the conjecture:

(1) The 1-form $\omega$ is holomorphic on $D\setminus\{0\}$.
(2) If the residue of $\omega$ at the origin vanishes, Picard’s big theorem can be applied to conclude that $\omega$ extends meromorphically across $0$.

Point (1) is straightforward: each $\mathrm{d}f_j$ is holomorphic on $U_j$ and the local forms agree on overlaps, so $\omega$ is holomorphic wherever it is defined, i.e. on $D\setminus\{0\}$.

Point (2) is the key partial result recorded by Elsner. If $\operatorname{Res}_0\omega = 0$, then $\omega$ has trivial monodromy around the origin and admits a single-valued holomorphic primitive $F$ on the punctured disk: $\omega = \mathrm{d}F$. The injectivity of each local branch $f_j$ then forces $F$ itself to be injective on some punctured neighbourhood of $0$ (since $f_j = F + c$ locally). An injective holomorphic function on a punctured disk cannot have an essential singularity there, and this is where Picard enters: at an essential singularity, by Picard’s big theorem, every complex value, with at most one exception, is taken infinitely often in any punctured neighbourhood, contradicting injectivity. Hence $F$ has at most a pole at $0$, and $\omega = \mathrm{d}F$ is meromorphic.

The open case is when $\operatorname{Res}_0\omega \ne 0$, so that $\omega$ has non-trivial monodromy and no single-valued global primitive exists. The local primitives $f_j$ then experience monodromy as one loops around the origin, and the injectivity constraint must be leveraged in this more delicate multi-valued setting.

Connection to Picard’s Theorem #

The title of the conjecture reflects a precise structural analogy.

Theorem (Picard's Great Theorem)

If $f$ has an essential singularity at $z_0$, then in every punctured neighbourhood of $z_0$ the function $f$ takes every value in $\mathbb{C}$, with at most one exception, infinitely many times.

In particular, a function with an essential singularity is far from injective near that point. The conjecture elevates this observation to the level of 1-forms: an injective holomorphic primitive should preclude essential singularities in the 1-form itself, even when the primitive is only locally and multi-valuedly defined.

Standard Picard covers the zero-residue case by reducing to a single-valued primitive. The conjecture asks for an analogue that works when the monodromy is non-trivial, a genuinely new statement about multi-valued functions and their differential geometry.

Origin: Hyperelliptic Action Integrals #

The problem arises from the exact WKB method applied to the stationary Schrödinger equation $-\psi'' + V(x)\psi = E\psi$ with polynomial potential $V$. The formal WKB ansatz $\psi \sim e^{S/\hbar}$ produces a multivalued action integral $$\mathcal{I}(E) = \int_\gamma \sqrt{V(x) - E}\,\mathrm{d}x$$ defined on a hyperelliptic Riemann surface whose branch structure depends on the energy parameter $E$. Elsner’s 1999 paper constructs the Riemann surface of $\mathcal{I}$ explicitly and shows its branch points accumulate densely in the value plane, a phenomenon that obstructs Borel–Laplace resummation of the WKB symbols.

In this setting the local inverses of $\mathcal{I}$ play the role of the $f_j$: they are locally injective holomorphic functions whose differentials agree on overlaps. The conjecture asks whether the obstruction to global meromorphic extension can arise only from a pole, a controlled singularity, rather than an essential one.

Research Directions #

1. The Non-Zero Residue Case #

The open heart of the problem is the case $\operatorname{Res}_0\omega \ne 0$. Here $\omega$ is not exact near $0$, the monodromy of the primitive is a non-trivial translation $f_j \mapsto f_j + 2\pi i\,\operatorname{Res}_0\omega$, and no single injective function encompasses the full behaviour near the singularity.
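
The translation monodromy can be checked numerically. The sketch below is a minimal illustration, assuming a hypothetical model form $\omega = (c/z + 1 + z)\,\mathrm{d}z$ with an arbitrarily chosen residue $c$; the helper name `loop_integral` is not from the source. Integrating $\omega$ once around the origin gives the amount by which any local primitive shifts after one loop.

```python
import cmath
import math

def loop_integral(g, radius=0.5, n=4096):
    """Approximate the integral of g(z) dz over the circle |z| = radius,
    traversed once counter-clockwise, with an equally spaced Riemann sum."""
    total = 0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        z = radius * cmath.exp(1j * theta)
        dz = 1j * z * (2 * math.pi / n)  # dz = i z dtheta
        total += g(z) * dz
    return total

c = 0.3 - 0.2j                      # hypothetical residue of omega at 0
g = lambda z: c / z + 1 + z         # omega = g(z) dz, an assumed model form
shift = loop_integral(g)            # monodromy shift of a local primitive
expected = 2j * math.pi * c         # 2*pi*i times the residue
```

Only the $c/z$ term contributes to `shift`; the holomorphic part $1 + z$ integrates to zero around the loop, which is why the monodromy depends on the residue alone.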

A natural approach is to pass to a cyclic cover $\tilde D \to D$ that trivialises the monodromy, construct a single-valued primitive on $\tilde D\setminus\{0\}$, and then appeal to the zero-residue argument there. The key difficulty is that the injectivity of each $f_j$ on $U_j$ does not immediately imply injectivity of the lifted primitive on $\tilde D$, since different sheets can collide. Making this argument precise, or finding a counterexample, is the main open problem.

2. Quantitative Control via Nevanlinna Theory #

An alternative strategy replaces Picard’s theorem by its quantitative form. If $F$ is a meromorphic function on the punctured disk with an essential singularity, the Nevanlinna characteristic $T(r,F)$ grows faster than any constant multiple of $\log(1/r)$ as $r\to 0$, that is, $T(r,F)/\log(1/r)\to\infty$. For an injective function the counting functions $N(r,a,F)$, recording how often $F = a$ in the punctured disk, satisfy strong constraints.

Nevanlinna-theoretic methods might give a direct bound on $T(r,f_j)$ in terms of the geometry of the cover $\{U_j\}$ and the injectivity of $f_j$, ruling out essential singularities of $\omega$ without passing through the monodromy argument. This would require adapting the standard Nevanlinna machinery to functions that are only locally defined on an open cover.

3. Replacing Injectivity by Finite Valence #

One can ask whether the conjecture remains true if “injective” is weakened to “at most $d$-to-one” for some fixed integer $d$. Finite-valence holomorphic functions cannot have essential singularities either, by the same Picard argument: a function of valence at most $d$ takes each value at most $d$ times, whereas near an essential singularity every value with at most one exception is taken infinitely often.

If the conjecture extends to finite valence, the proof strategy will likely yield a valence-independent argument that illuminates the zero-residue case more transparently. If it fails for finite valence, the counterexample geometry would clarify what role injectivity plays beyond the mere avoidance of essential singularities.

4. Several Complex Variables #

In $\mathbb{C}^n$ for $n \ge 2$ the theory of isolated singularities of holomorphic functions changes dramatically: by Hartogs’ extension theorem, isolated singularities of holomorphic functions are always removable. One would expect the analogous conjecture for holomorphic 1-forms in $\mathbb{C}^n$ to be more tractable, or even to follow from known extension results.

Three steps would clarify which features of the problem are genuinely one-dimensional: formulating the precise analogue with the punctured disk replaced by a domain $\Omega\setminus\{0\}$ in $\mathbb{C}^n$, specifying what “meromorphic 1-form” means on a higher-dimensional domain, and checking whether Hartogs-type arguments already resolve it.

5. Geometric Formulation on Riemann Surfaces #

The disk $D$ and the puncture at $0$ are not special: the same question can be posed on any Riemann surface $X$ with a marked point $p$. Given an open cover of $X\setminus\{p\}$ and injective holomorphic functions $f_j$ on each piece with compatible differentials, does $\omega = \mathrm{d}f_j$ extend meromorphically across $p$?

The answer may depend on the genus and the function theory of $X$. For the disk (simply connected, genus 0) the monodromy is a simple translation; for a torus or higher-genus surface the monodromy group is richer and the argument structure should change. Comparing these cases may isolate the essential input from the topology versus the analysis.

References #

Elsner, B. (1999). Hyperelliptic action integral. Annales de l’Institut Fourier, 49(1), 303–331. https://www.numdam.org/item/AIF_1999__49_1_303_0/
Ahlfors, L. V. (1979). Complex Analysis (3rd ed.). McGraw-Hill.
Conway, J. B. (1978). Functions of One Complex Variable (2nd ed.). Springer.
Nevanlinna, R. (1970). Analytic Functions. Springer.
Forster, O. (1981). Lectures on Riemann Surfaces. Springer.
Delabaere, E., Dillinger, H., & Pham, F. (1993). Résurgence de Voros et périodes des courbes hyperelliptiques. Annales de l’Institut Fourier, 43(1), 163–199.

Criterion for Boundedness of Power Series

Tue, 26 May 2026 00:00:00 +0000

Introduction & Problem Statement #

Power series constitute one of the most ubiquitous objects in analysis. A power series $\sum_{n=0}^{\infty}a_n x^n$ with infinite radius of convergence defines a real-entire function $f:\mathbb{R}\to\mathbb{R}$. Whereas the question of convergence is completely settled by Cauchy–Hadamard theory, the question of boundedness of the sum function is far subtler and, as of this writing, remains open.

Question 1 (Rüdinger, 2009)

Let $(a_n) _{n\ge 0}$ be a sequence of real numbers such that the power series $\sum _{n=0}^{\infty}a_n x^n$ converges for every $x\in\mathbb{R}$, thereby defining a smooth function $f:\mathbb{R}\to\mathbb{R}$. Give a necessary and sufficient criterion on $(a_n)$ for $f$ to be bounded on $\mathbb{R}$.

The problem is rated low importance on the Open Problem Garden and is recommended as accessible to undergraduates; nevertheless, a complete answer appears to be unknown.

Motivating examples.

Function	Power series	Bounded?
$\cos x$	$\displaystyle\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}x^{2k}$	$|\cos x|\le 1$
$\sin x$	$\displaystyle\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k+1)!}x^{2k+1}$	$|\sin x|\le 1$
$e^x$	$\displaystyle\sum_{n=0}^{\infty}\frac{x^n}{n!}$	$e^x\to+\infty$
$p(x)=a_0+\cdots+a_Nx^N,\ N\ge 1$	(polynomial)	unbounded

Background & Prerequisites #

This section collects the core mathematical tools needed to engage seriously with Question 1.

Power Series and Entire Functions #

Definition 1 (Power Series & Radius of Convergence)

A power series centred at the origin is a formal series $\sum_{n=0}^{\infty}a_n x^n$ with $a_n\in\mathbb{R}$. Its radius of convergence is $$ R = \frac{1}{\limsup_{n\to\infty}|a_n|^{1/n}} \in [0,+\infty]. $$

Throughout this note we always assume $R=+\infty$, i.e., $\limsup_{n\to\infty}|a_n|^{1/n}=0$.

Definition 2 (Entire Function)

A function $f:\mathbb{C}\to\mathbb{C}$ is called entire if it is holomorphic on all of $\mathbb{C}$. Every power series with $R=+\infty$ defines a real-entire function; the same series converges for every $z\in\mathbb{C}$, so it defines an entire function, and by the identity theorem this is the only entire extension of $f$.

Theorem 1 (Cauchy–Hadamard)

The radius of convergence of $\sum a_n z^n$ equals $$ R = \Bigl(\limsup_{n\to\infty}|a_n|^{1/n}\Bigr)^{-1}. $$
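
The root test is easy to probe numerically. The following is a minimal sketch, with the hypothetical helper `root_test_terms` working from $\log|a_n|$ to avoid floating-point underflow; it compares the exponential series ($a_n = 1/n!$, so $R=+\infty$) with a geometric series ($a_n = 2^{-n}$, so $R=2$).

```python
import math

def root_test_terms(log_abs_coeff, ns):
    """Return |a_n|^(1/n) for each n in ns, computed as exp(log|a_n| / n)."""
    return [math.exp(log_abs_coeff(n) / n) for n in ns]

ns = [10, 100, 1000, 10000]

# a_n = 1/n!: log|a_n| = -lgamma(n + 1); the terms should tend to 0 (R = +infinity)
exp_terms = root_test_terms(lambda n: -math.lgamma(n + 1), ns)

# a_n = 2^(-n): log|a_n| = -n log 2; the terms stay at 1/2 (R = 2)
geo_terms = root_test_terms(lambda n: -n * math.log(2), ns)
```

A finite sample of $|a_n|^{1/n}$ can only suggest the value of the $\limsup$; it is a sanity check, not a proof of convergence.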

Remark 1

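The condition $R=+\infty$ is equivalent to $a_n = O(r^n)$ for every $r>0$, i.e., the coefficients decay faster than any geometric sequence. By contrast, the stronger bound $a_n = O(r^n/n!)$ for some $r>0$ characterises entire functions of exponential type (order at most $1$ with finite type), which is the class relevant to Paley–Wiener theory.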

Order and Type of Entire Functions #

Definition 3 (Order and Type)

The order of an entire function $f$ is $$ \rho = \limsup_{r\to\infty}\frac{\log\log M(r)}{\log r}, \qquad M(r)=\max_{|z|=r}|f(z)|. $$ The type $\sigma$ (for $0<\rho<\infty$) is $$ \sigma = \limsup_{r\to\infty}\frac{\log M(r)}{r^{\rho}}. $$
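
As a worked example (a standard computation, added here for illustration), take $f(z)=\cos z$. Writing $z=x+iy$ gives $|\cos z|^2=\cos^2 x+\sinh^2 y\le\cosh^2 y$, with equality on the imaginary axis, so $$M(r)=\cosh r,\qquad \rho=\lim_{r\to\infty}\frac{\log\log\cosh r}{\log r}=1,\qquad \sigma=\lim_{r\to\infty}\frac{\log\cosh r}{r}=1.$$ Thus $\cos z$ has order $1$ and type $1$ even though it is bounded by $1$ on the real axis.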

An entire function that is bounded on all of $\mathbb{C}$ is constant by Liouville’s theorem (and so has order $\rho=0$), while an entire function that is merely bounded on the real axis, such as $\cos z$, can be non-constant. Boundedness on $\mathbb{R}$ is therefore a genuinely real-variable phenomenon.

Liouville’s Theorem and Its Limitations #

Theorem 2 (Liouville)

Every bounded entire function $f:\mathbb{C}\to\mathbb{C}$ is constant.

Remark 2 (Why Liouville does not solve the problem)

Question 1 concerns real-valued functions $f:\mathbb{R}\to\mathbb{R}$. A function may be bounded on $\mathbb{R}$ while its complex extension is unbounded. For instance, $\cos z$ satisfies $|\cos z|\to\infty$ along the imaginary axis (since $\cos(iy)=\cosh y\to+\infty$). Liouville’s theorem therefore does not apply, and the problem is genuinely non-trivial.

Algebraic Structure of the Relevant Function Space #

Definition 4 (Space of Bounded Power Series)

Let $\mathcal{B}$ denote the set of all functions $f:\mathbb{R}\to\mathbb{R}$ that can be represented as a convergent power series $\sum_{n\ge 0}a_n x^n$ (with $R=+\infty$) and that are bounded on $\mathbb{R}$.

Proposition 1, Algebraic Properties of $\mathcal{B}$ (Rüdinger, 2009)

(1) $\mathcal{B}$ is a linear subspace of $C^\infty(\mathbb{R})$: if $f,g\in\mathcal{B}$ and $\lambda\in\mathbb{R}$ then $f+\lambda g\in\mathcal{B}$.
(2) $\mathcal{B}$ is closed under pointwise multiplication: if $f,g\in\mathcal{B}$ then $fg\in\mathcal{B}$.
(3) $\mathcal{B}$ contains all functions of the form $c\cos(h(x))$, where $c\in\mathbb{R}$ and $h:\mathbb{R}\to\mathbb{R}$ is any entire function.

Remark 3

Part (3) follows from $\cos(h(x)) = \operatorname{Re}(e^{ih(x)})$ together with $|\cos(h(x))|\le 1$. The class is strictly larger than $\{c\cos(bx):c,b\in\mathbb{R}\}$; for example, $\cos(x^3-x)\in\mathcal{B}$.

Known Partial Results #

Necessary Conditions #

Proposition 2, Necessary Condition for Boundedness (Rüdinger, 2009)

Suppose $f(x)=\sum_{n=0}^{\infty}a_n x^n$ is bounded on $\mathbb{R}$. Then either:

(1) $a_0$ is the only non-zero coefficient (i.e., $f$ is the constant function $f\equiv a_0$), or
(2) there are infinitely many indices $n$ with $a_n\neq 0$, and the signs of the non-zero $a_n$ change infinitely often.

Remark 4

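The sign-change condition is necessary: if $a_n\ge 0$ for all $n\ge n_0$ and $a_m>0$ for some $m\ge\max(n_0,1)$, then for $x>0$ we have $f(x)\ge a_m x^m-\sum_{n<n_0}|a_n|x^n$, and the right-hand side tends to $+\infty$ as $x\to+\infty$ because $m$ exceeds the degree of the subtracted polynomial. The case of eventually non-positive coefficients follows by replacing $f$ with $-f$.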

Corollary 1

Every non-constant polynomial is unbounded on $\mathbb{R}$.

Proof.

A polynomial has only finitely many non-zero coefficients. By Proposition 2 (1), the only bounded polynomial is the constant function. Any non-constant polynomial satisfies $|p(x)|\to\infty$ as $|x|\to\infty$.

The Sign-Change Condition Is Not Sufficient #

The condition of Proposition 2 is not sufficient, as the following examples show.

Example 1

Consider the geometric series $$ f(x) = \sum_{n=0}^{\infty}(-1)^n x^{2n} = \frac{1}{1+x^2}, \qquad |x|<1. $$ The coefficients alternate in sign, yet $R=1\neq+\infty$. One must first require $R=+\infty$ before the sign-change condition becomes meaningful.

For a subtler case with $R=+\infty$: take $a_n=(-1)^n/n!$, so $$ f(x) = \sum_{n=0}^{\infty}\frac{(-1)^n}{n!}x^n = e^{-x}. $$ The signs alternate, yet $e^{-x}\to+\infty$ as $x\to-\infty$.

Remark 5

The $e^{-x}$ example reveals the key gap: sign alternation of the coefficients does not prevent the function from growing in one direction, because the series for $e^{-x}$ reconstructs exponential growth in the negative half-line. A complete criterion must capture cancellation in both directions.
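
The contrast between $\cos x$ and $e^{-x}$ can be seen directly from the coefficients. The sketch below is illustrative only: it sums truncated series on the grid $[-10,10]$ (the helper names, the grid, and the truncation at 120 terms are all assumptions chosen so that floating-point cancellation stays small).

```python
import math

def series_sum(coeff, x, n_terms=120):
    """Sum coeff(n) * x**n for n < n_terms in plain floating point."""
    return sum(coeff(n) * x**n for n in range(n_terms))

def cos_coeff(n):
    # a_n for cos x: zero for odd n, (-1)^k / (2k)! for n = 2k
    return 0.0 if n % 2 else (-1) ** (n // 2) / math.factorial(n)

def exp_neg_coeff(n):
    # a_n for e^{-x}: (-1)^n / n!
    return (-1) ** n / math.factorial(n)

grid = [k / 10 for k in range(-100, 101)]  # x from -10 to 10 in steps of 0.1

# Both coefficient sequences change sign infinitely often ...
sup_cos = max(abs(series_sum(cos_coeff, x)) for x in grid)
# ... but only the first sum stays bounded on the grid.
sup_exp_neg = max(abs(series_sum(exp_neg_coeff, x)) for x in grid)
```

On this grid `sup_cos` stays at about $1$, while `sup_exp_neg` is attained at $x=-10$, where every term $(-1)^n x^n/n!$ is positive and the sum reproduces $e^{10}$.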

Connections to Entire Function Theory #

Theorem 3 (Borel–Carathéodory)

Let $f$ be holomorphic in $|z|\le R$. Then for $0<r<R$, $$ M(r) \le \frac{2r}{R-r}\sup_{|z|=R}\operatorname{Re}f(z) + \frac{R+r}{R-r}\,|f(0)|. $$

Remark 6

Borel–Carathéodory shows that the real part of a complex-valued entire function controls its modulus. For a real-valued function on $\mathbb{R}$ the analogous control is more delicate, since we only observe the function on a line, not on a disk.

Theorem 4 (Hadamard Factorisation)

Every entire function of finite order $\rho$ can be written as $$ f(z) = z^m e^{g(z)}\prod_{n=1}^{\infty} E_p\!\left(\frac{z}{z_n}\right), $$ where $m\ge 0$ is the order of the zero of $f$ at the origin, the $z_n$ are the non-zero zeros of $f$ (repeated according to multiplicity), $p=\lfloor\rho\rfloor$, $g$ is a polynomial of degree $\le\rho$, and the $E_p$ are Weierstrass elementary factors.

Remark 7

A bounded real entire function of infinite order (if one exists) would not be directly covered by the Hadamard factorisation. Understanding the zero set and the exponential factor in $e^{g(z)}$ may be key to classifying all $f\in\mathcal{B}$.

The Open Sub-Question on the Generators of $\mathcal{B}$ #

Question 2 (Rüdinger, 2009)

Does $\mathcal{B}$ consist precisely of functions of the form $c\cos(h(x))$ and their linear combinations and products, where $h:\mathbb{R}\to\mathbb{R}$ is entire and $c\in\mathbb{R}$?

A positive answer would give an implicit characterisation via algebraic generators. A negative answer would require producing a bounded entire function on $\mathbb{R}$ that does not lie in the $\mathbb{R}$-algebra generated by $\{\cos\circ\, h : h\text{ entire}\}$.

Remark 8

By Proposition 1 (3), every $c\cos(h(x))$ belongs to $\mathcal{B}$, and $\mathcal{B}$ is an algebra, so all products and sums remain in $\mathcal{B}$. What is unknown is whether every element of $\mathcal{B}$ arises this way. Note that $\sin x = \cos(x-\pi/2) \in \mathcal{B}$, so sine is already covered.

Research Directions and Conjectures #

Direction 1: Coefficient Growth Rate #

A promising approach is to examine the rate of decay of $|a_n|$, not just the sign pattern.

Question 3

Is there a decay condition on $|a_n|$, combined with the sign-change condition, that gives a sufficient criterion for $f\in\mathcal{B}$?

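Approach. The Cauchy estimates give $|a_n| = |f^{(n)}(0)|/n!\le M(r)/r^n$ for all $r>0$, where $M(r)$ is the maximum over the complex circle $|z|=r$. A bound $|f|\le C$ on $\mathbb{R}$ does not control $M(r)$: if $|a_n|\le C/r^n$ held for every $r>0$, letting $r\to\infty$ would force $a_n=0$ for all $n\ge 1$, which is Liouville’s theorem again. The Cauchy estimates therefore recover only the $R=+\infty$ condition. Is there a sharper constraint that uses boundedness on the real line alone?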

Direction 2: Fourier-Analytic Approach #

Every $f\in L^\infty(\mathbb{R})\cap L^2(\mathbb{R})$ possesses a square-integrable Fourier transform. If $f$ is also entire of exponential type, the Paley–Wiener theorem forces the transform to be compactly supported; without the exponential-type assumption this fails, as the Gaussian $e^{-x^2}$ shows. However, a generic $f\in\mathcal{B}$ may not lie in $L^2$ (e.g., $\cos x\notin L^2(\mathbb{R})$).

Question 4

Can the Fourier theory for tempered distributions give a necessary and sufficient condition for $f\in\mathcal{B}$ in terms of the spectral support of $f$?

Direction 3: Differential Equation Characterisation #

Bounded entire functions often arise as solutions to ODEs. For instance $y''+y=0$ has bounded solutions $A\cos x + B\sin x$. More generally, $y''+\omega(x)y=0$ with $\omega$ entire and bounded can produce bounded solutions.

Question 5

Characterise those linear differential operators $L$ with entire coefficients whose full solution space lies within $\mathcal{B}$.

Direction 4: Even/Odd Decomposition and Reduction #

Every $f\in\mathcal{B}$ splits as $f=f_e+f_o$ where $$ f_e(x)=\tfrac{1}{2}(f(x)+f(-x))=\sum_{k\ge 0}a_{2k}x^{2k} \quad\text{and}\quad f_o(x)=\tfrac{1}{2}(f(x)-f(-x))=\sum_{k\ge 0}a_{2k+1}x^{2k+1}. $$ Since $f_e(x)=g(x^2)$ for the entire function $g(t)=\sum_{k\ge 0}a_{2k}t^k$, boundedness of $f_e$ reduces to: is $g$ bounded on $[0,+\infty)$? This reduction may make the even and odd parts easier to study separately.
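
The reduction can be sketched in a few lines. The example below is an illustration under stated assumptions: it uses the truncated coefficients of $f(x)=\cos x+\sin x$ (chosen arbitrarily), and it writes the odd part as $f_o(x)=x\,h(x^2)$ with $h(t)=\sum_{k\ge 0}a_{2k+1}t^k$, the natural counterpart of $g$ that the text does not name.

```python
import math

def poly_eval(coeffs, t):
    """Evaluate sum(coeffs[k] * t**k) by Horner's rule."""
    acc = 0.0
    for a in reversed(coeffs):
        acc = acc * t + a
    return acc

# Truncated coefficients of f(x) = cos x + sin x: a_n = (-1)^(n // 2) / n!
N = 40
a = [((-1) ** (n // 2)) / math.factorial(n) for n in range(N)]

even_coeffs = a[0::2]  # a_0, a_2, a_4, ...  -> coefficients of g(t)
odd_coeffs = a[1::2]   # a_1, a_3, a_5, ...  -> coefficients of h(t)

def f_even(x):
    # f_e(x) = g(x^2)
    return poly_eval(even_coeffs, x * x)

def f_odd(x):
    # f_o(x) = x * h(x^2)
    return x * poly_eval(odd_coeffs, x * x)
```

For this choice `f_even` reproduces $\cos x$ and `f_odd` reproduces $\sin x$ for moderate $|x|$, so the boundedness question for $f_e$ becomes a question about $g$ on $[0,+\infty)$ only.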

Direction 5: Polynomial Approximation and Numerics #

Question 6

If the partial sums $S_N(x)=\sum_{n=0}^{N}a_n x^n$ are uniformly bounded on growing intervals $[-R_N,R_N]$ (with $R_N\to\infty$), does it follow that $f\in\mathcal{B}$? Conversely, if $f\in\mathcal{B}$, how fast must $R_N$ grow relative to $N$ for the bound to hold?

Summary of Open Problems #

#	Statement
Q1	Give a necessary and sufficient condition on $(a_n)$ for $f=\sum a_n x^n$ to be bounded on $\mathbb{R}$.
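Q2	Is $\mathcal{B}$ generated (as an algebra) precisely by $\{c\cos(h(x)):h\text{ entire}\}$?
Q3	Does a sharper decay condition on $|a_n|$, combined with the sign-change condition, give a sufficient criterion for $f\in\mathcal{B}$?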
Q4	Can spectral-support (Paley–Wiener / distribution) theory characterise $\mathcal{B}$?
Q5	Which linear ODEs with entire coefficients have solution space $\subseteq\mathcal{B}$?
Q6	What is the precise relationship between truncation bounds on $[-R_N,R_N]$ and $f\in\mathcal{B}$?

References #

Ahlfors, L. V. (1979). Complex Analysis, 3rd ed. McGraw-Hill.
Boas, R. P. (1954). Entire Functions. Academic Press.
Conway, J. B. (1978). Functions of One Complex Variable, 2nd ed. Springer.
Levin, B. Ya. (1996). Lectures on Entire Functions. AMS Translations of Mathematical Monographs, vol. 150.
Rudin, W. (1976). Principles of Mathematical Analysis, 3rd ed. McGraw-Hill.
Rudin, W. (1987). Real and Complex Analysis, 3rd ed. McGraw-Hill.
Rüdinger, A. (2009). Criterion for boundedness of power series. Open Problem Garden. http://www.openproblemgarden.org/op/criterion_for_boundedness_of_power_series
Stein, E. M. and Shakarchi, R. (2003). Fourier Analysis: An Introduction. Princeton University Press.
Stein, E. M. and Shakarchi, R. (2010). Complex Analysis. Princeton University Press.
Titchmarsh, E. C. (1939). The Theory of Functions, 2nd ed. Oxford University Press.

Brezis' first open problem - An elliptic equation involving the critical exponent in 3D

Sat, 18 Apr 2026 00:00:00 +0000

Yamabe problem #

Yamabe problem: Let $(\mathcal{M}, g_0)$ be a closed (compact, without boundary) Riemannian manifold of dimension $N \geq 3$. Does there exist a conformal metric $g = u^{\frac{4}{N-2}}g_0$ with constant scalar curvature $R_g \equiv C$?

Find $u > 0$ on $\mathcal{M}$ such that $$ -\frac{4(N-1)}{N-2}\Delta_{g_0}u + R_{g_0}u = Cu^{\frac{N+2}{N-2}}\qquad\text{on }\mathcal{M}. $$

Some results:

Trudinger [1968]: solved when $g_0$ has non-positive scalar curvature.
Aubin [1976]: $N \geq 6$ and $(\mathcal{M}, g_0)$ not locally conformally flat.
Schoen [1984]: any dimension, the remaining cases, assuming the Positive Mass Theorem by Schoen-Yau [1979].

A special case #

Consider the special case where $\mathcal{M}$ is a bounded domain $\Omega$ in $\mathbb{R}^{N}$: $$ \begin{cases} -\Delta u = u^{\frac{N+2}{N-2}}\qquad\text{in }\Omega, \\ u > 0\qquad\text{in }\Omega, \\ u = 0\qquad\text{on }\partial\Omega. \end{cases} $$

Pohozaev [1965]: if $\Omega$ is star-shaped, then there is no nontrivial solution.

Brezis-Nirenberg problem #

Consider a lower-order perturbation: $$ \begin{cases} -\Delta u = u^{\frac{N+2}{N-2}} + \lambda u\qquad\text{in }\Omega, \\ u > 0\qquad\text{in }\Omega, \\ u = 0\qquad\text{on }\partial\Omega. \end{cases} $$

Some results:

Pohozaev’s result also yields nonexistence when $\lambda \leq 0$ and $\Omega$ is star-shaped.
If a positive solution exists, then necessarily $\lambda < \lambda_1$, where $\lambda_1$ is the first eigenvalue of $-\Delta$ on $\Omega$ with zero Dirichlet boundary condition.

Hence, for positive solutions on star-shaped domains, $$ 0 < \lambda < \lambda_1. $$
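
The upper bound $\lambda<\lambda_1$ follows from a short, standard computation. Let $\varphi_1>0$ be the first Dirichlet eigenfunction of $-\Delta$ on $\Omega$ (the symbol $\varphi_1$ is introduced here for illustration). Multiplying the equation by $\varphi_1$ and integrating by parts twice gives $$\lambda_1\int_\Omega u\,\varphi_1\,dx=\int_\Omega(-\Delta u)\,\varphi_1\,dx=\int_\Omega u^{\frac{N+2}{N-2}}\varphi_1\,dx+\lambda\int_\Omega u\,\varphi_1\,dx,$$ so that $$(\lambda_1-\lambda)\int_\Omega u\,\varphi_1\,dx=\int_\Omega u^{\frac{N+2}{N-2}}\varphi_1\,dx>0.$$ Since $u>0$ and $\varphi_1>0$, the integral on the left is positive, which forces $\lambda<\lambda_1$.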

Brezis’ Open Problem 1.1 #

Let $N=3$, and let $\Omega = B_1 \subset \mathbb{R}^3$ be the unit ball. Consider $$ \begin{cases} -\Delta u = u^5 + \lambda u \qquad \text{in } B_1, \\ u = 0 \qquad \text{on } \partial B_1. \end{cases} $$ We ask whether this problem admits a nontrivial positive solution $u \not\equiv 0$.

Here the exponent $5 = \frac{N+2}{N-2}$ is the critical Sobolev exponent when $N=3$, and this is exactly the source of the main compactness difficulty.

Let $\lambda_1$ be the first Dirichlet eigenvalue of $-\Delta$ on $B_1$. The classical Brezis-Nirenberg theory shows:

If $\lambda \leq 0$, then the only solution is $u \equiv 0$.
If $\frac{1}{4}\lambda_1 < \lambda < \lambda_1$, then there exists a positive radial solution.
If $0 < \lambda \leq \frac{1}{4}\lambda_1$, then any radial solution must be trivial; hence there is no positive radial solution.
If $\lambda > \lambda_1$, there exist sign-changing solutions, but no positive solution.

Therefore the unresolved case is:

Open Problem 1.1. Assume $$ 0 < \lambda \leq \frac{1}{4}\lambda_1. $$ Does there exist a nontrivial solution?
Since no positive radial solution exists in this range and, by the Gidas-Ni-Nirenberg symmetry theorem, every positive solution on a ball is radial, any nontrivial solution would have to be sign-changing and non-radial.

This problem has remained open for decades, even if one restricts further to a smaller interval such as $$ 0 < \lambda < \varepsilon $$ for some sufficiently small $\varepsilon > 0$.
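
For the unit ball in $\mathbb{R}^3$ the thresholds are explicit (a standard computation, added here for orientation). For a radial function $u(r)$, the substitution $v(r)=r\,u(r)$ turns $-\Delta u=\lambda u$ into $-v''=\lambda v$ with $v(0)=v(1)=0$, whose first positive solution is $v(r)=\sin(\pi r)$. Hence $$\varphi_1(x)=\frac{\sin(\pi|x|)}{|x|},\qquad \lambda_1=\pi^2\approx 9.87,\qquad \tfrac14\lambda_1=\frac{\pi^2}{4}\approx 2.47,$$ and the open range is $0<\lambda\le\pi^2/4$.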

Remarks #

A few points are worth emphasizing:

By the Gidas-Ni-Nirenberg symmetry theorem, every positive solution on a ball is radial; since in this regime Brezis observed that any radial solution must vanish, any nontrivial solution would have to be sign-changing and genuinely non-radial.
This makes dimension $3$ sharply different from higher-dimensional cases, where the Brezis-Nirenberg existence theory is better understood.
The bifurcation picture suggests branches of sign-changing non-radial solutions emerging from higher eigenvalues, but it is not known whether such branches can reach the interval $\left(0,\frac14\lambda_1\right]$.

References #

H. Brezis and L. Nirenberg, Positive solutions of nonlinear elliptic equations involving critical Sobolev exponents, Comm. Pure Appl. Math. 36 (1983), 437–477.
H. Brezis, Some of My Favorite Open Problems, Open Problem 1.1.
M. Comte, Solutions of elliptic equations with critical Sobolev exponent in dimension three, Nonlinear Anal. 17 (1991), 445–455.
O. Druet, Elliptic equations with critical Sobolev exponents in dimension 3, Ann. Inst. H. Poincaré Anal. Non Linéaire 19 (2002), 125–142.

Recent Advances in KAN-Based Numerical PDE Solvers

Mon, 30 Mar 2026 00:00:00 +0000

Kolmogorov-Arnold Networks (KANs), introduced in 2024, have rapidly become one of the most active frontiers in scientific machine learning for solving partial differential equations (PDEs) (Liu et al., 2024). Unlike Multi-Layer Perceptrons (MLPs), which apply fixed activation functions at nodes, KANs place learnable univariate activation functions on edges, grounded in the Kolmogorov-Arnold representation theorem: every continuous multivariate function can be expressed as a composition of univariate functions and summations. This structural difference gives KANs two key properties relevant to PDE numerics — higher interpretability and parameter efficiency — making them an appealing successor to MLP-based Physics-Informed Neural Networks (PINNs).

From 2024 through early 2026, researchers have published dozens of frameworks combining KANs with classical numerical concepts (spectral methods, operator learning, energy-stable time-stepping, neural operators) and targeting problems ranging from single PDEs to high-dimensional systems with hundreds of variables.

Overview #

The KAN-for-PDEs landscape organises into several interrelated research threads:

Physics-Informed KAN Frameworks (PIKANs / KINN) — direct replacements of MLP layers in PINNs with KAN layers, using strong, energy, and inverse PDE formulations.
Spectral-Basis and Wavelet-Enriched KANs — embedding orthogonal polynomial or wavelet bases to combat spectral bias.
KAN-Based Neural Operators — KAN sub-networks inside DeepONet, FNO, and pseudo-differential operator frameworks for learning PDE solution maps.
Time-Dependent and Evolutionary KANs — energy-stable schemes, KAN-ODEs, and moving-boundary solvers.
Discontinuities, Shock Waves, and Turbulence — specialised architectures for sharp transitions.
High-Dimensional PDEs — separable and tensor-product KAN surrogates scaling to hundreds of dimensions.
Data-Driven Discovery and Inverse Problems — interpretability-driven model identification.

Architecture	Key Strength	Representative Work
KINN	Forward/inverse problems, strong/energy/inverse forms	Wang et al., 2024
ChebPIKAN	Fluid mechanics PDEs, orthogonal basis	Cui et al., 2024
KANO	Symbolic operator recovery, variable-coefficient PDEs	arXiv:2509.16825
EvoKAN	Long-horizon time evolution, energy stability	arXiv:2503.01618
Anant-KAN	High-dimensional PDEs (up to 300D)	arXiv:2505.03595
DPINN	Shock waves and discontinuities	arXiv:2507.08338

Background #

The Kolmogorov-Arnold Representation Theorem #

The theoretical foundation of KANs is the Kolmogorov-Arnold theorem: any continuous function $f: [0,1]^n \to \mathbb{R}$ can be written as

$$f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right),$$

where $\phi_{q,p}: [0,1] \to \mathbb{R}$ and $\Phi_q: \mathbb{R} \to \mathbb{R}$ are univariate continuous functions. In contrast to MLPs — where activations are fixed and weights are learned — KANs parameterise the activation functions themselves (typically as B-splines or orthogonal polynomials) on each edge of the network graph.
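
The "univariate functions on edges, sums at nodes" structure can be sketched without any learning. The example below is a minimal illustration, not a KAN implementation from any of the cited works: the helper `kan_layer` and the hand-picked edge functions are assumptions, chosen so that two stacked layers reproduce $x_1x_2=\tfrac14\bigl((x_1+x_2)^2-(x_1-x_2)^2\bigr)$ exactly. It uses fewer outer terms than the $2n+1$ allowed by the theorem.

```python
def kan_layer(x, edge_functions):
    """One KAN-style layer: y[j] = sum_i edge_functions[j][i](x[i]).
    Each edge i -> j carries its own univariate function; nodes only add."""
    return [sum(phi(xi) for phi, xi in zip(row, x)) for row in edge_functions]

# Inner layer (2 inputs -> 2 hidden sums): s1 = x1 + x2, s2 = x1 - x2
inner = [
    [lambda t: t, lambda t: t],
    [lambda t: t, lambda t: -t],
]

# Outer layer (2 hidden -> 1 output): s1^2 / 4 - s2^2 / 4
outer = [
    [lambda s: s * s / 4.0, lambda s: -s * s / 4.0],
]

def product(x1, x2):
    """Multiplication written as a composition of univariate maps and sums."""
    hidden = kan_layer([x1, x2], inner)
    return kan_layer(hidden, outer)[0]
```

In a trained KAN the fixed lambdas would be replaced by parameterised univariate functions (B-splines or orthogonal polynomials), with the parameters fitted by gradient descent.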

Physics-Informed Neural Networks (PINNs) — The Starting Point #

PINNs (Raissi, Perdikaris, & Karniadakis, 2019) embed physical laws directly into the neural network loss function. For a PDE $\mathcal{N}[u] = f$ on domain $\Omega$ with boundary condition $\mathcal{B}[u] = g$ on $\partial\Omega$, the PINN loss is

$$\mathcal{L} = \underbrace{\frac{1}{N_r}\sum_{i=1}^{N_r}|\mathcal{N}[u_\theta](x_i) - f(x_i)|^2}_{\text{PDE residual}} + \underbrace{\frac{1}{N_b}\sum_{j=1}^{N_b}|\mathcal{B}[u_\theta](x_j) - g(x_j)|^2}_{\text{boundary condition}}.$$

The substitution of MLP layers with KAN layers in this framework is the basic idea behind all PIKAN architectures.
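
To make the loss concrete, here is a minimal sketch for the 1D Poisson problem $-u''=\pi^2\sin(\pi x)$ on $(0,1)$ with $u(0)=u(1)=0$. Everything in it is an illustrative assumption: the one-parameter surrogate $\theta\sin(\pi x)$ stands in for a network $u_\theta$, a central difference stands in for automatic differentiation, and the boundary operator is taken to be the identity (Dirichlet data).

```python
import math

def pinn_loss(u, pde_residual, interior_pts, boundary_pts, g):
    """Mean-squared PDE residual plus mean-squared boundary mismatch."""
    res = sum(pde_residual(u, x) ** 2 for x in interior_pts) / len(interior_pts)
    bc = sum((u(x) - g(x)) ** 2 for x in boundary_pts) / len(boundary_pts)
    return res + bc

def poisson_residual(u, x, h=1e-3):
    # Residual N[u](x) - f(x) with N[u] = -u'' and f(x) = pi^2 sin(pi x);
    # u'' is approximated by a central difference (stand-in for autodiff).
    u_xx = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
    return -u_xx - math.pi ** 2 * math.sin(math.pi * x)

def make_candidate(theta):
    # Hypothetical one-parameter surrogate standing in for u_theta.
    return lambda x: theta * math.sin(math.pi * x)

interior = [(i + 0.5) / 50 for i in range(50)]  # collocation points in (0, 1)
boundary = [0.0, 1.0]
zero = lambda x: 0.0                            # boundary data g = 0

loss_exact = pinn_loss(make_candidate(1.0), poisson_residual, interior, boundary, zero)
loss_wrong = pinn_loss(make_candidate(0.5), poisson_residual, interior, boundary, zero)
```

The loss is close to zero at $\theta=1$, the exact solution, and large at $\theta=0.5$. Swapping the surrogate for an MLP or a KAN changes only how `u` is parameterised, not the structure of the loss.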

Recent Developments #

1. Physics-Informed KAN Frameworks #

KINN — The Foundational Framework #

The Kolmogorov-Arnold-Informed Neural Network (KINN) is the primary physics-informed framework replacing MLP layers in PINNs with KAN layers (Wang et al., 2024). KINN supports three PDE formulations: the strong form (collocating the PDE residual directly), the energy form (minimising a variational energy functional), and the inverse form (recovering unknown parameters from observations).

Systematic benchmarks demonstrate that KINN significantly outperforms MLP-based PINNs in accuracy and convergence speed for multi-scale problems, stress concentration, singularities, nonlinear hyperelasticity, and heterogeneous materials. The one domain where MLP remains competitive is complex geometry problems. Published in Computer Methods in Applied Mechanics and Engineering (2024), KINN has become the canonical reference for subsequent KAN-PDE research.

Chebyshev and Polynomial Basis PIKANs #

A major architectural refinement has been substituting B-spline basis functions with orthogonal polynomial bases. The ChebPIKAN model leverages orthogonality of Chebyshev polynomials and integrates physics-informed loss functions for fluid-mechanics PDEs including the Allen-Cahn, Burgers, Helmholtz, Kovasznay flow, cylinder wake flow, and cavity flow equations (Cui et al., 2024). ChebPIKAN significantly outperforms vanilla KAN by embedding essential physical information and alleviating overfitting.
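
A Chebyshev-parameterised edge function can be sketched as follows. This is a generic illustration of the basis swap, not the ChebPIKAN code: the function names are hypothetical, and inputs are assumed to lie in $[-1,1]$ (in practice they would first be rescaled or squashed into that interval).

```python
import math

def chebyshev_basis(t, degree):
    """Return [T_0(t), ..., T_degree(t)] via T_{k+1} = 2 t T_k - T_{k-1}."""
    values = [1.0, t]
    for _ in range(2, degree + 1):
        values.append(2.0 * t * values[-1] - values[-2])
    return values[: degree + 1]

def edge_function(t, coeffs):
    """Univariate edge activation phi(t) = sum_k coeffs[k] * T_k(t).
    In a Chebyshev KAN the coeffs would be the trainable parameters."""
    basis = chebyshev_basis(t, len(coeffs) - 1)
    return sum(c * b for c, b in zip(coeffs, basis))

# Example: t^2 = (T_0(t) + T_2(t)) / 2, so these coefficients encode phi(t) = t^2
square_coeffs = [0.5, 0.0, 0.5]
value = edge_function(0.3, square_coeffs)
```

The three-term recurrence avoids evaluating $\cos(k\arccos t)$ directly and keeps each edge function a short linear combination of basis values.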

The AC-PKAN (Attention-Enhanced Chebyshev PKAN) further addresses the rank collapse problem in Chebyshev-based KANs by integrating wavelet-activated MLPs with an internal attention mechanism, provably preserving a full-rank Jacobian and approximating PDEs of arbitrary order (arXiv:2505.08687). An external Residual Gradient Attention (RGA) mechanism dynamically re-weights individual loss terms based on gradient norms, stabilising training of stiff PDE systems.

The Legendre-KAN method applies Legendre polynomial orthogonality to solve the fully nonlinear Monge-Ampère equation with Dirichlet boundary conditions, demonstrating effectiveness on both smooth and singular solutions across various dimensions and in the optimal transport problem.

Hybrid KAN–MLP and Augmented Lagrangian Approaches #

The AL-PKAN introduces a hybrid encoder-decoder architecture where the decoder maps hidden variable features from high-dimensional latent space into trainable univariate activation functions via KAN (Zhang et al., 2025). An augmented Lagrangian function treats penalty factors and Lagrangian multipliers as learnable parameters to dynamically balance constraint terms. This approach typically improves prediction accuracy by one to two orders of magnitude compared to traditional neural networks.

The HPKM-PINN combines MLP and KAN branches with a trainable convex mixing parameter to blend features optimally across subdomains, especially effective for multi-scale problems.

2. Spectral-Basis and Wavelet-Enriched KANs #

Wav-KAN incorporates wavelet functions into the KAN structure, capturing both high-frequency and low-frequency components via continuous dyadic wavelet transforms for multiresolution analysis. This directly addresses the spectral bias problem inherent in standard neural networks, which struggle to resolve high-frequency features in PDE solutions.

PIKANs have been extended to multi-resolution spectral hybridisations (HWF-PIKAN), combining wavelet and Fourier features to explicitly counteract spectral bias and accelerate convergence for advection-dominated and kinetic equations.

A unified benchmark published in February 2026 provides a systematic, controlled comparison between MLP-based PINNs and KAN-based PIKANs across a representative collection of ODEs and PDEs (arXiv:2602.15068). The results show that PIKANs consistently achieve more accurate solutions, converge in fewer iterations, and yield superior gradient estimates.

3. KAN-Based Neural Operators #

Neural operators learn mappings between infinite-dimensional function spaces, enabling generalisation across families of PDEs. KANs are increasingly embedded in operator architectures.

DeepOKAN replaces MLP sub-networks in the Deep Operator Network (DeepONet) framework with KAN sub-networks using Gaussian Radial Basis Functions (Abueidda et al., 2024). The branch and trunk networks of DeepONet are re-implemented as RBF-KAN layers. Evaluated on 1D sinusoidal waves, 2D orthotropic elasticity, and transient Poisson problems, DeepOKAN consistently achieves lower training losses and more accurate predictions compared to standard DeepONet.

PO-CKAN (Physics-informed Deep Operator KAN with Chunk Rational Structure) integrates PDE residual loss into a DeepONet-style branch–trunk architecture using Chunkwise Rational KAN sub-networks (arXiv:2510.08795). On Burgers’ equation with viscosity $\nu = 0.01$, PO-CKAN reduces mean relative $L^2$ error by approximately 48% compared to PI-DeepONet.

KANO (Kolmogorov-Arnold Neural Operator) is the most theoretically ambitious framework, jointly parameterising operators in both spectral and spatial bases within a pseudo-differential operator framework (arXiv:2509.16825). KANO overcomes the pure-spectral bottleneck of Fourier Neural Operators (FNO): while FNO remains practical only for spectrally sparse operators, KANO remains expressive over generic variable-coefficient PDEs. Crucially, KANO achieves symbolic recovery of the learned operator, enabling closed-form extraction of governing equations. On the quantum Hamiltonian learning benchmark, KANO attains state infidelity $\approx 6 \times 10^{-6}$ compared to FNO’s $\approx 1.5 \times 10^{-2}$.

KAN-ONets embeds adaptive, learnable B-spline activations from KAN into FNO (yielding FNO-KAN for uniform grids) and into the attention-based GNOT (yielding GNOT-KAN for arbitrary grids). Across seven challenging PDE benchmarks, KAN-ONets achieves MSE reductions of 10.2–30.2% compared to existing models.

4. Time-Dependent and Evolutionary KANs #

EvoKAN (Evolutionary Kolmogorov-Arnold Network, March 2025) introduces a novel paradigm: rather than retraining repeatedly, EvoKAN encodes only the PDE’s initial state during an initial learning phase, then evolves the network parameters numerically, governed by the same PDE (arXiv:2503.01618). KAN weights are treated as time-dependent functions updated through time steps, enabling prediction over arbitrarily long time horizons.

EvoKAN integrates the Scalar Auxiliary Variable (SAV) method to guarantee unconditional energy stability: at each time step, SAV requires only solving decoupled linear systems with constant coefficients. EvoKAN has been validated on the 1D and 2D Allen-Cahn equations (phase-field phenomena with sharp interfaces) and the 2D Navier-Stokes equations (turbulent flows), closely matching analytical references.
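The energy-stability property of SAV can be illustrated on a toy problem. The sketch below applies a first-order SAV scheme to the scalar gradient flow $u' = -(\lambda u + F'(u))$ with the Allen-Cahn double-well $F(u) = (u^2-1)^2/4$. This is an assumed stand-in for illustration only: EvoKAN applies SAV to evolving network parameters for a PDE, whereas here the linear solve collapses to a single division. The function name and default parameters are hypothetical.

```python
import math


def sav_gradient_flow(u0, dt, n_steps, lam=1.0, shift=1.0):
    """First-order SAV scheme for u' = -(lam*u + F'(u)), F(u) = (u^2 - 1)^2 / 4.

    Returns the trajectory and the modified energy lam/2*u^2 + r^2 per step.
    """
    F = lambda u: 0.25 * (u * u - 1.0) ** 2
    dF = lambda u: u * (u * u - 1.0)

    u = u0
    r = math.sqrt(F(u) + shift)               # scalar auxiliary variable
    us, energies = [u], [0.5 * lam * u * u + r * r]
    for _ in range(n_steps):
        b = dF(u) / math.sqrt(F(u) + shift)   # explicit in the old state
        # The update is linear in the unknown increment delta = u_new - u:
        #   delta/dt = -lam*(u + delta) - (r + 0.5*b*delta)*b
        delta = -(lam * u + r * b) / (1.0 / dt + lam + 0.5 * b * b)
        u, r = u + delta, r + 0.5 * b * delta
        us.append(u)
        energies.append(0.5 * lam * u * u + r * r)
    return us, energies


# Deliberately large step: the modified energy still never increases.
_, energies = sav_gradient_flow(u0=2.0, dt=10.0, n_steps=50)
```

The monotone decay holds for any positive step size because the discrete energy difference is a sum of non-positive squares; note that it is the modified energy $\tfrac{\lambda}{2}u^2 + r^2$, not the original one, that is guaranteed to decay.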

KAN-ODEs apply KANs as the backbone of neural ordinary differential equation (ODE) frameworks, enabling data-driven discovery of governing dynamics with greater interpretability compared to MLP-based neural ODEs (arXiv:2407.04192).

Shallow-KAN addresses Stefan-type moving boundary problems (melting, solidification) by approximating the temperature distribution and moving interface while enforcing governing PDEs, phase equilibrium, and the Stefan condition through physics-informed residuals (arXiv:2601.09818). A key finding is that two hidden layers with tens of learnable parameters suffice — far fewer than the nearly one million parameters required by standard MLP-based PINNs for the same problem.

5. Discontinuities, Shock Waves, and Turbulence #

A known weakness of smooth neural networks is difficulty resolving sharp spatial transitions and discontinuities such as shock waves. Two specialised frameworks address this:

DPINN (Discontinuity-aware PINN) incorporates a discontinuity-aware KAN for modelling shock-wave properties, combined with an adaptive Fourier-feature embedding layer to mitigate spectral bias, mesh transformation for complex geometries, and learnable local artificial viscosity to stabilise the algorithm near discontinuities (arXiv:2507.08338). Numerical experiments on the inviscid Burgers’ equation and transonic/supersonic airfoil flows demonstrate superior accuracy over existing methods.

A Physics-Infused KAN for Turbulence (2026) targets turbulent flow prediction integrated with CFD, applying KAN within the Reynolds-Averaged Navier-Stokes (RANS) framework. It addresses the information bottleneck phenomenon in multi-output KANs and proposes pruning-based network optimisation, achieving high prediction accuracy for Navier-Stokes equations.

6. High-Dimensional PDEs and the Curse of Dimensionality #

High-dimensional PDEs (tens to hundreds of dimensions) are where conventional mesh-based methods become intractable, because their cost grows exponentially with the dimension. KANs have shown early promise here.

Anant-Net (2025) is a scalable neural surrogate employing a tensor product formulation with dimension-wise sweeps and selective automatic differentiation (arXiv:2505.03595). Benchmarked on the Poisson, Sine-Gordon, Allen-Cahn, and transient heat equations, Anant-Net solves PDEs in up to 300 dimensions on a single GPU within a few hours. The framework includes Anant-KAN, an interpretable KAN-based variant offering deeper insights into the learned solution structure.

Separable PIKANs (SPIKANs) represent the PDE solution as a sum of products of one-dimensional KANs, one per coordinate, drastically reducing computational cost for high-dimensional problems while retaining accuracy and interpretability.
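A small sketch of why a separable ansatz helps: if the solution is written as a sum of products of univariate functions, tabulating it on a tensor grid needs only a handful of one-dimensional evaluations per axis. The helper names, the rank, and the plain Python callables standing in for one-dimensional KANs are assumptions made for illustration.

```python
import math


def separable_eval(factors, point):
    """u(x) = sum_r prod_i factors[r][i](x_i): a rank-R separable ansatz."""
    return sum(
        math.prod(f(xi) for f, xi in zip(term, point))
        for term in factors
    )


def cost_comparison(n_per_axis, dim, rank):
    """Network evaluations needed to tabulate u on a tensor grid."""
    separable = rank * dim * n_per_axis   # each 1D factor on its own axis
    dense = n_per_axis ** dim             # one full-network call per grid point
    return separable, dense


# Rank-2 example in 3D: u = sin(x)sin(y)sin(z) + x*y*z.
factors = [
    [math.sin, math.sin, math.sin],
    [lambda t: t, lambda t: t, lambda t: t],
]
value = separable_eval(factors, (0.5, 1.0, 1.5))
sep_cost, dense_cost = cost_comparison(n_per_axis=100, dim=10, rank=8)
```

With 100 points per axis in 10 dimensions, the separable form needs 8,000 univariate evaluations against $10^{20}$ full-network evaluations; combining the factors into grid values still costs multiplications, but no further network calls.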

7. Data-Driven Discovery and Inverse Problems #

KANs are especially powerful for scientific discovery tasks where interpretability of the learned function is critical.

Data-driven model discovery with KANs has been demonstrated on complex dynamical systems — including the Ikeda map and optical-cavity systems — where sparse optimisation methods fail due to non-sparse governing equations (arXiv:2409.15167). KAN captures complex behaviour while offering interpretability through its edge-wise univariate functions, providing insight into governing dynamics inaccessible in black-box MLPs.

PI-KAN-PointNet extends PIKAN to simultaneously solve inverse problems over multiple irregular geometries within a single training run, demonstrated on natural convection over 135 geometries with sparse data. KINN for Inverse Problems enables identification of unknown material parameters in heterogeneous or hyperelastic materials from partial observations. KANHedge applies KANs to high-dimensional BSDE solvers for option pricing, demonstrating improved hedging performance over MLP-based deep BSDE solvers (arXiv:2601.11097).

8. Comparative Analysis: KAN vs. MLP for PDEs #

A comprehensive comparison between MLP and KAN representations for differential equations establishes nuanced findings (arXiv:2406.02917):

Architecture	Shallow Networks	Deep Networks	Robustness	Interpretability
KAN (B-spline)	Superior accuracy	Comparable to MLP	Lower (may diverge with different seeds)	High — symbolic extraction possible
KAN (Chebyshev/Legendre)	High accuracy	Competitive	Moderate — rank collapse risk	High
MLP/PINN	Moderate accuracy	Robust	High	Low
PIKAN (optimised)	Superior	Superior or comparable	Moderate	High

Key findings: KANs in shallow settings significantly outperform MLPs, leveraging per-edge nonlinear expressiveness. In deep settings, KANs do not consistently outperform MLPs, but when properly optimised (e.g., with L-BFGS or Self-Scaled Broyden second-order optimisers), they achieve superior accuracy. JAX-based PIKAN implementations have achieved up to 84× training speedup over original NumPy/PyTorch KANs.

Open Problems #

Despite rapid progress, several challenges remain:

Computational cost. B-spline activations are typically evaluated with the recursive Cox–de Boor formula, whose cost grows with the spline order, making KANs significantly slower per parameter than MLPs. Variants like PowerMLP propose more efficient formulations (arXiv:2412.13571), but a satisfactory solution to raw training speed at scale is still outstanding.
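The cost of spline evaluation is easiest to see from the Cox–de Boor recursion itself. The sketch below is a direct, unoptimised transcription for a single edge activation; the function names and the uniform knot vector are illustrative choices, and production KAN code vectorises this recursion rather than calling it point by point.

```python
def bspline_basis(i, degree, knots, x):
    """Cox-de Boor recursion for the i-th B-spline basis of a given degree."""
    if degree == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + degree] - knots[i]
    right_den = knots[i + degree + 1] - knots[i + 1]
    # Terms with a zero denominator (repeated knots) are defined to vanish.
    left = 0.0 if left_den == 0 else (
        (x - knots[i]) / left_den * bspline_basis(i, degree - 1, knots, x))
    right = 0.0 if right_den == 0 else (
        (knots[i + degree + 1] - x) / right_den
        * bspline_basis(i + 1, degree - 1, knots, x))
    return left + right


def spline_activation(coeffs, degree, knots, x):
    """One KAN edge function: a learnable combination of B-spline bases."""
    return sum(c * bspline_basis(i, degree, knots, x)
               for i, c in enumerate(coeffs))


# Cubic splines on a uniform knot vector: 11 knots give 11 - 3 - 1 = 7 bases.
knots = [k / 10 for k in range(11)]
coeffs = [1.0] * 7
```

Each degree-$k$ basis value depends on two degree-$(k-1)$ values, so a single activation touches several levels of recursion per input, compared with one multiply-add per weight in an MLP. With all coefficients set to one, the bases sum to one on the interior interval $[0.3, 0.7)$, which is a convenient sanity check.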

Scalability to complex geometries. KINN and standard PIKANs underperform MLPs on irregular geometry problems. This remains a practical bottleneck for engineering applications involving complex domains.

Gradient instability in deep KANs. Deep PIKANs face vanishing/exploding gradient challenges, motivating Glorot-like initialisation strategies and residual-gated architectures.

Theoretical guarantees. Generalisation bounds for KANs trained on PDE collocation have been studied — bounds scale with $\ell_1$ norms of spline coefficients — but practical understanding of how architecture choices affect convergence and generalisation remains incomplete (arXiv:2410.08026).

Operator learning completeness. While KANO achieves symbolic operator recovery, the theoretical relationship between KAN architecture depth/width and approximation of PDE solution operators is still under active development.

The trajectory is clear: KAN-based PDE solvers are moving from proof-of-concept demonstrations on canonical benchmarks toward production-ready frameworks for engineering simulation, turbulence modelling, inverse problems, and high-dimensional scientific computing. The combination of interpretability, parameter efficiency, and growing theoretical foundations positions KANs as a genuinely transformative architecture for numerical PDEs.

References #

Abueidda, D. W., Pantidis, P., & Mobasher, M. E. (2024). DeepOKAN: Deep operator network based on Kolmogorov Arnold networks for mechanics problems. arXiv:2405.19143. https://www.alphaxiv.org/overview/2405.19143v3

Cui, Z., et al. (2025). Physics-informed Kolmogorov–Arnold network with Chebyshev polynomials for fluid mechanics. Physics of Fluids, 37(9), 095120. https://pubs.aip.org/aip/pof/article-abstract/37/9/095120/3361431

Knottenbelt, W., et al. (2026). KANHedge: Efficient hedging of high-dimensional options using Kolmogorov-Arnold network-based BSDE solver. arXiv:2601.11097. https://arxiv.org/abs/2601.11097

Kovachki, N., et al. (2023). Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research, 24(89), 1–97.

Li, Z., et al. (2025). Discontinuity-aware KAN-based physics-informed neural networks. arXiv:2507.08338. https://arxiv.org/html/2507.08338v1

Liu, Z., et al. (2024). KAN: Kolmogorov–Arnold Networks. arXiv:2404.19756. https://storage.prod.researchhub.com/uploads/papers/2024/05/04/2404.19756.pdf

Shukla, K., Toscano, J. D., Wang, Z., Zou, Z., & Karniadakis, G. E. (2024). A comprehensive and FAIR comparison between MLP and KAN representations for differential equations and operator networks. arXiv:2406.02917. https://arxiv.org/abs/2406.02917

Liu, Z., et al. (2026). A unified benchmark of physics-informed neural networks and Kolmogorov-Arnold networks. arXiv:2602.15068. https://arxiv.org/html/2602.15068v1

Peng, W., et al. (2025). KANO: Kolmogorov-Arnold Neural Operator. arXiv:2509.16825. https://arxiv.org/abs/2509.16825

Shukla, K., et al. (2025). Anant-Net: Breaking the curse of dimensionality with scalable and interpretable neural surrogates for high-dimensional PDEs. arXiv:2505.03595. https://arxiv.org/html/2505.03595v3

Tang, K., et al. (2025). AC-PKAN: Attention-enhanced and Chebyshev polynomial-based Kolmogorov-Arnold networks. arXiv:2505.08687. https://arxiv.org/html/2505.08687v2

Wang, Z., et al. (2025). EvoKAN: Energy-dissipative evolutionary Kolmogorov-Arnold networks for complex PDE systems. arXiv:2503.01618. https://arxiv.org/abs/2503.01618

Wang, Z., et al. (2024). Kolmogorov–Arnold-Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems based on Kolmogorov–Arnold Networks. Computer Methods in Applied Mechanics and Engineering. arXiv:2406.11045. https://www.sciencedirect.com/science/article/abs/pii/S0045782524007722

Xu, Y., et al. (2026). Shallow-KAN based solution of moving boundary PDEs. arXiv:2601.09818. https://arxiv.org/html/2601.09818v1

Koenig, B. C., Kim, S., & Deng, S. (2024). KAN-ODEs: Kolmogorov-Arnold network ordinary differential equations for learning dynamical systems and hidden physics. arXiv:2407.04192. https://arxiv.org/html/2407.04192v1

Zhang, Z., et al. (2025). Physics-informed neural networks with hybrid Kolmogorov-Arnold networks. PMC. https://pmc.ncbi.nlm.nih.gov/articles/PMC11950322/

Zuo, Q., et al. (2025). Data-driven model discovery with Kolmogorov-Arnold networks. arXiv:2409.15167. https://arxiv.org/abs/2409.15167

Recent Advances in Numerical PDEs

Mon, 30 Mar 2026 00:00:00 +0000

Numerical methods for partial differential equations (PDEs) have entered a period of rapid transformation, driven by two converging forces: deep learning’s maturation as a tool for high-dimensional function approximation, and the resurgence of classical methods augmented by machine learning. The field broadly divides into physics-informed machine learning, neural operator learning, foundation models for PDEs, and the continuing evolution of classical high-order, structure-preserving, and data-driven discovery methods. Quantum computing and laser-based hardware solvers are also beginning to enter the landscape. This survey organises the most active research fronts, highlights landmark and recent key papers, and identifies open problems as of early 2026.

Overview #

The table below summarises the major approaches covered in this survey, their representative key papers, and their current status.

Approach	Representative Key Papers	Status
PINNs (adaptive/staged training)	Raissi et al. (2019); IEEE 2025 staged training; PhysicsNeMo/Modulus	Production-ready
KANs for PDEs	Liu et al. (2024, ICLR 2025); KINN; PI-KAN; HRKANs	Active frontier
Fourier Neural Operators	Li et al. (2020); O-FNO (2025); ReBA accelerator	Widely adopted
DeepONet variants	Lu et al. (2019); L-DeepONet; Hybrid KAN-DeepONet; Quantum DeepONet	Mature + expanding
PDE Foundation Models	Poseidon; OmniArch; PDEformer; Geo-NeW	Emerging (2024–2026)
Deep BSDE & high-dimensional	Han, Jentzen, & E (PNAS 2018); Deep Shotgun; DRDM; Heun-BSDE	Active
Data-driven PDE discovery	SINDy (Brunton et al.); GN-SINDy; Evo-SINDy; Bayesian-SINDy	Active
Structure-preserving methods	Hairer et al. (2006); Stochastic multisymplectic; Geo-NeW	Maturing
High-order FEM/DG	hp-DGFEM Boltzmann; ML-accelerated FEM; FEX-PG	Mature + augmented
Fractional PDEs	Review (2024); O-FNO for fractional Poisson; Fractional Laplacian meshfree	Active
Hamilton–Jacobi PDEs	Review arXiv:2502.20833; Actor-critic NN; Deep BSDE for HJB	Active
Multiscale / ROM	MLP-based multiscale; POD-DL-ROM; Multi-fidelity ROM	Active
Uncertainty quantification	QMC/RQMC; PDE-DKL	Active
Quantum computing	Schrödingerisation; H-DES (ColibriTD); Quantum DeepONet	Early-stage
Photonic/analog solvers	LightSolver LPU	Very early-stage

Background #

The Classical PDE Problem #

A general PDE on a domain $\Omega \subseteq \mathbb{R}^d$ takes the form

$$\mathcal{N} [u] (x) = f(x), \quad x \in \Omega, \qquad \mathcal{B} [u] (x) = g(x), \quad x \in \partial \Omega,$$

where $\mathcal{N}$ is a (possibly nonlinear) differential operator, $\mathcal{B}$ encodes boundary or initial conditions, and $u: \Omega \to \mathbb{R}$ is the unknown. Classical mesh-based methods — finite element (FEM), finite difference (FDM), finite volume (FVM), and spectral methods — discretise $\Omega$ into $N$ degrees of freedom and solve a resulting algebraic system. Their complexity typically scales as $O(N^\alpha)$ for some $\alpha \geq 1$, and in $d$ dimensions $N \sim h^{-d}$ for mesh spacing $h$, leading to exponential cost as $d$ grows.
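The scaling $N \sim h^{-d}$ is worth seeing in numbers. The snippet below simply counts the degrees of freedom of a tensor-product grid with a fixed resolution per axis; the helper names and the choice of 100 points per axis are illustrative.

```python
def grid_dofs(points_per_axis, dim):
    """Degrees of freedom of a tensor-product grid: N = n^d."""
    return points_per_axis ** dim


def memory_gib(n_dofs, bytes_per_value=8):
    """Memory to store one double-precision value per degree of freedom."""
    return n_dofs * bytes_per_value / 2 ** 30


# 100 points per axis (h ≈ 0.01) in increasing dimension.
table = {d: grid_dofs(100, d) for d in (1, 2, 3, 6, 10)}
```

At $d = 3$ this is a manageable $10^6$ unknowns, at $d = 6$ it is already $10^{12}$ (several terabytes for a single solution vector in double precision), and at $d = 10$ it reaches $10^{20}$, before any solver cost $O(N^\alpha)$ is taken into account.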

The Deep Learning Turn #

The 2019 PINN paper by Raissi, Perdikaris, and Karniadakis, and the 2020 FNO paper by Li et al., triggered an explosion of mesh-free and operator-learning approaches. Rather than discretising $\Omega$, these methods parameterise $u$ (or the solution operator $\mathcal{N}^{-1}$) as a neural network and minimise a physics-informed or data-driven loss. The key advantages are mesh-free flexibility, natural handling of inverse problems, and — in the operator-learning setting — the ability to generalise across PDE instances.

Recent Developments #

1. Physics-Informed Neural Networks (PINNs) and Variants #

PINNs, introduced by Raissi, Perdikaris, and Karniadakis (2019), embed physical laws directly into the neural network loss function as residual terms of the form $\mathcal{L}_{\text{phys}} = \|\mathcal{N}[\hat{u}] - f\|^2$, where $\hat{u}$ is the network approximation of $u$, supplemented by data, boundary, and initial condition constraints. Their appeal lies in a mesh-free design that handles irregular geometries and inverse problems naturally. Yet PINN training is notoriously fragile (subject to spectral bias, loss imbalance, and stiffness), which has motivated a rich line of training improvements.
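As a minimal sketch of how such a physics-informed loss is assembled, consider $-u'' = f$ on $(0,1)$ with homogeneous Dirichlet conditions. To stay dependency-free, a small sine series with a constant offset stands in for the neural network and its second derivative is written out by hand; a real PINN would use a network and automatic differentiation. All names, the boundary weight, and the collocation points are assumptions.

```python
import math


def model(theta, x):
    """Hypothetical trial function: offset + sine series (stands in for a network)."""
    offset, amps = theta
    return offset + sum(a * math.sin((k + 1) * math.pi * x)
                        for k, a in enumerate(amps))


def model_xx(theta, x):
    """Second derivative of the trial function, written out analytically."""
    _, amps = theta
    return -sum(a * ((k + 1) * math.pi) ** 2 * math.sin((k + 1) * math.pi * x)
                for k, a in enumerate(amps))


def pinn_loss(theta, collocation, source, bc_weight=1.0):
    """Mean squared PDE residual of -u'' = f plus a Dirichlet boundary penalty."""
    residuals = [-model_xx(theta, x) - source(x) for x in collocation]
    loss_phys = sum(r * r for r in residuals) / len(residuals)
    loss_bc = model(theta, 0.0) ** 2 + model(theta, 1.0) ** 2
    return loss_phys + bc_weight * loss_bc


# f = pi^2 sin(pi x) has the exact solution u = sin(pi x).
source = lambda x: math.pi ** 2 * math.sin(math.pi * x)
collocation = [(i + 0.5) / 32 for i in range(32)]
loss_exact = pinn_loss((0.0, [1.0, 0.0]), collocation, source)
loss_wrong = pinn_loss((0.2, [0.5, 0.1]), collocation, source)
```

The exact parameters drive both terms to (numerically) zero, while the perturbed parameters are penalised by the residual and the boundary term together. The relative weight `bc_weight` is exactly the kind of knob that makes the loss imbalance mentioned above a practical issue.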

Staged training strategies. A 2025 IEEE paper proposes a two-stage process: a short-time pretraining phase followed by extension to the full time domain, combined with uncertainty-guided sampling. This significantly improves accuracy and efficiency for time-dependent PDEs compared to standard PINNs (IEEE, 2025).

Evolutionary optimisation of PINNs. A 2025 arXiv paper introduces evolutionary optimisation to tune PINN architectures, improving robustness when data are scarce and the physics enters only through the training loss (arXiv:2501.06572).

Automatic structure discovery via knowledge distillation. A 2025 Nature Communications paper proposes a physics-informed distillation framework that decouples physical and parameter regularisation in teacher–student networks, then uses clustering and parameter reconstruction to embed physically meaningful structures. Experiments on Laplace, Burgers, Poisson, and fluid mechanics equations show improved accuracy, training efficiency, and transferability (arXiv:2502.06026).

Production-ready frameworks include NVIDIA PhysicsNeMo (formerly Modulus; CUDA-optimised kernels with reported 4× speedups) and DeepXDE, which support adaptive weighting schemes, curriculum learning, intelligent residual point sampling, and domain decomposition for stiff problems.

2. Kolmogorov–Arnold Networks (KANs) for PDEs #

Proposed by Liu, Wang, Vaidya et al. (2024, accepted ICLR 2025), KANs replace fixed activation functions at MLP nodes with learnable spline-parameterised functions on each edge. This change — inspired by the Kolmogorov-Arnold representation theorem — provides faster neural scaling laws, improved interpretability, and comparable or better accuracy with far fewer parameters, especially for scientific AI tasks. The major PINN-KAN hybrid architectures are as follows:

Architecture	PDE focus	Key claim
KINN	Solid mechanics, multi-scale, singularities	Significantly outperforms MLP-PINNs in accuracy and convergence speed
PI-KAN	Navier–Stokes (forward)	High prediction accuracy; addresses information bottleneck
HRKANs	Poisson, Burgers	Highest fitting accuracy, lowest training time vs. KAN and ReLU-KAN
PIKANs (adaptive grid)	Forward PDE problems	Up to 84× faster training; adaptive state transition reduces $L^2$ error by 43%
EvoKAN	Complex PDE systems	Energy-dissipative; encodes only the initial state, avoiding retraining
KAN-ODEs	Schrödinger, Allen–Cahn, dynamical systems	Improved performance over Neural ODEs in discovering hidden physics
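For reference, the Kolmogorov–Arnold representation theorem that motivates the KAN design states that every continuous function $f : [0,1]^n \to \mathbb{R}$ admits a representation

$$f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right),$$

with continuous univariate functions $\Phi_q$ and $\phi_{q,p}$. A KAN layer keeps this "sum of univariate functions" structure but allows arbitrary widths and depths and makes each univariate function a learnable spline. In the notation of Liu et al. (2024), layer $l$ maps $x^{(l)}$ to

$$x^{(l+1)}_j = \sum_{i} \phi^{(l)}_{j,i}\big(x^{(l)}_i\big).$$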

KANs are also being used inside DeepONet branch/trunk networks for hybrid neural operator surrogates in porous media flows, including Darcy flow and 2D/3D multiphase problems (arXiv:2511.02962). For a deeper treatment of KAN architectures for PDEs, see the companion post in this series.

3. Neural Operator Learning #

Neural operators learn mappings between infinite-dimensional function spaces — enabling resolution-invariant, discretisation-agnostic PDE solvers. The two dominant architectures are the Fourier Neural Operator (FNO) and Deep Operator Networks (DeepONet).

FNO applies global convolution in Fourier space, giving resolution invariance and fast inference. The 2025 Optimised FNO (O-FNO) integrates residual connections and enhanced spectral resolution for the 2D fractional Poisson equation, achieving over 98% test accuracy and outperforming both base FNO and DeepONet. A hardware/algorithm co-design chip, ReBA, implements the Galerkin Transformer achieving 34.57× speedup over CPUs and up to 51.26× over prior accelerators (IEEE, 2025).
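The phrase "global convolution in Fourier space" can be made concrete with a single-channel, one-dimensional sketch: transform, multiply the lowest modes by learnable weights, discard the rest, and transform back. The naive $O(n^2)$ DFT replaces an FFT purely to avoid dependencies, and the function names and mode cutoff are assumptions; a full FNO layer also adds channel mixing, a pointwise linear path, and a nonlinearity.

```python
import cmath
import math


def dft(signal):
    n = len(signal)
    return [sum(signal[j] * cmath.exp(-2j * math.pi * k * j / n)
                for j in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * j / n)
                for k in range(n)) / n for j in range(n)]


def spectral_conv_1d(signal, weights):
    """Multiply the lowest len(weights) Fourier modes by learnable weights.

    Assumes len(weights) <= len(signal) // 2. Modes above the cutoff are
    zeroed; mirroring with conjugate weights keeps the output real.
    """
    n, modes = len(signal), len(weights)
    spectrum = dft(signal)
    out = [0j] * n
    for k in range(modes):
        out[k] = weights[k] * spectrum[k]
        if 0 < k < n - k:   # mirror to the negative-frequency partner
            out[n - k] = weights[k].conjugate() * spectrum[n - k]
    return [v.real for v in idft(out)]


# A mode-1 wave plus a mode-5 wave; keeping modes 0..2 removes the latter.
n = 16
signal = [math.sin(2 * math.pi * j / n) + 0.5 * math.sin(2 * math.pi * 5 * j / n)
          for j in range(n)]
filtered = spectral_conv_1d(signal, [1 + 0j] * 3)
```

Because the weights act on Fourier modes rather than grid points, the same weights can be applied to a signal sampled at a different resolution, which is the source of the resolution invariance mentioned above.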

DeepONet’s branch-trunk architecture excels under noise and complex geometries where FNO degrades. Recent extensions include multi-fidelity physics-guided DeepONet (2025), Fusion DeepONet for hypersonic flow predictions on arbitrary grids (arXiv:2501.01934), and Latent-space DeepONet (L-DeepONet) (Nature Communications, 2024), which outperforms all other neural operators with small latent dimensions ($d \leq 100$), enabling real-time high-dimensional predictions. Ensemble and Mixture-of-Experts DeepONets achieve 2–4× lower relative $\ell_2$ errors through basis enrichment and spatial locality (arXiv:2405.11907). Taylor Mode Neural Operators provide an order-of-magnitude speed-up for DeepONet and 8× for FNO in computing high-order derivatives via Taylor-mode automatic differentiation.

Graph Neural Operator Methods. The GOLA framework (2025) addresses the limitation of regular-grid assumptions by constructing graphs from irregularly sampled spatial points with a Fourier-based encoder for learnable complex-coefficient embeddings, outperforming baselines in data-scarce regimes across 2D Darcy, Advection, Eikonal, and Nonlinear Diffusion problems (arXiv:2505.18923).

4. Foundation Models for PDEs #

Inspired by the success of LLMs, PDE foundation models represent a paradigm shift: large transformers pre-trained on diverse physical systems that can be fine-tuned for downstream tasks with minimal data.

Poseidon (ETH Zurich, 2024) is a multiscale operator transformer with time-conditioned layer norms, enabling continuous-in-time evaluation. Pre-trained on diverse physical systems, it exploits the semigroup property of time-dependent PDEs for significant data scaling (arXiv:2405.19101).

OmniArch (ICML 2025) is the first multi-scale and multi-physics scientific computing foundation model, featuring a Fourier encoder-decoder and transformer backbone with a PDE-Aligner for physics-informed fine-tuning. It achieves unified 1D-2D-3D pre-training on PDEBench and demonstrates zero-shot learning on new physics.

PDEformer (2024) represents PDEs as computational graphs integrating symbolic and numerical information; a graph transformer with implicit neural representation enables mesh-free predictions with zero-shot accuracy comparable to specialist models (arXiv:2402.12652).

Multimodal PDE Foundation Model (UCLA, 2025) integrates both numerical inputs (equation parameters, initial conditions) and text descriptions. It achieves average relative error below 3.3% in-distribution and generates interpretable scientific text — bridging NLP and scientific computing (arXiv:2502.06026).

Physics-informed fine-tuning (arXiv:2603.15431, 2026) establishes that hybrid fine-tuning (combining physics-informed and data-driven objectives) achieves superior extrapolation to downstream tasks and enables data-free learning of unseen PDE families.

Geo-NeW (arXiv:2602.02788, Feb 2026) — General-Geometry Neural Whitney Forms — is a data-driven finite element method jointly learning differential operators and compatible finite element spaces on the geometry. It exactly preserves physical conservation laws via Finite Element Exterior Calculus, with state-of-the-art performance on out-of-distribution geometries.

5. Deep Learning for High-Dimensional PDEs #

Classical mesh-based methods suffer exponential complexity growth in dimension $d$. Three principal deep learning paradigms address this.

The Deep BSDE method (Han, Jentzen, & E, PNAS, 2018) reformulates semilinear parabolic PDEs using backward stochastic differential equations (BSDEs) and learns the gradient of the solution with neural networks, enabling solution of PDEs in hundreds to thousands of dimensions. A 2025 review by the original authors traces subsequent advances. Key recent improvements include:

Deep Shotgun Method (J. Sci. Comput., 2025): avoids full trajectory simulation, using only data distribution, achieving results up to dimension 10,000 (Springer, 2025).
XNet-enhanced Deep BSDE (2025): a new network architecture with fewer parameters, significantly improving computational efficiency and accuracy (arXiv:2502.06238).
Deep Random Difference Method (DRDM) (2025): approximates the convection-diffusion operator using only first-order differences, avoiding Hessian computations, with proved first-order accuracy in time step $h$ (arXiv:2506.20308).
Stratonovich-based BSDE with Heun integration (2025): identifies that Euler-Maruyama discretisation bias is the root cause of BSDE underperformance relative to PINNs; Heun integration eliminates this bias and achieves competitive results across high-dimensional benchmarks (arXiv:2505.01078).
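For reference, the reformulation underlying the Deep BSDE method can be stated as follows, in the setting of Han, Jentzen, & E (2018). The symbols $\mu$, $\sigma$, $\phi$, and $\psi$ (drift, diffusion, nonlinearity, and terminal condition) are chosen here to avoid clashing with $f$ and $g$ above. Consider the semilinear parabolic PDE

$$\partial_t u + \tfrac{1}{2} \operatorname{Tr}\!\big(\sigma \sigma^{\top} \nabla^2 u\big) + \mu \cdot \nabla u + \phi\big(t, x, u, \sigma^{\top} \nabla u\big) = 0, \qquad u(T, x) = \psi(x).$$

If $X_t$ solves $dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t$, then $Y_t = u(t, X_t)$ and $Z_t = \sigma^{\top}(t, X_t)\, \nabla u(t, X_t)$ satisfy the BSDE

$$dY_t = -\phi(t, X_t, Y_t, Z_t)\,dt + Z_t^{\top}\, dW_t, \qquad Y_T = \psi(X_T).$$

After time discretisation, the method treats $Y_0$ as a trainable parameter, approximates $Z_t$ at each time step by a neural network, and minimises the terminal mismatch $\mathbb{E}\,|\psi(X_T) - Y_T|^2$ over simulated paths. No spatial grid is required, which is why the cost does not grow exponentially with $d$.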

The Deep Ritz method (E & Yu, 2018) minimises energy functionals using neural networks. Extensions to multiscale problems leverage scale convergence theory to derive $\Gamma$-limits of oscillatory energy functionals.

The full-history recursive multilevel Picard method (abbreviated MLP in that literature, and unrelated to multilayer perceptrons) combines Picard iterations with multilevel Monte Carlo. It was the first method proven to overcome the curse of dimensionality for semilinear parabolic PDEs and remains one of very few methods with such proven guarantees.

PDE-DKL (2025) combines deep learning for low-dimensional latent representations with Gaussian Processes for kernel regression under explicit PDE constraints, providing both high accuracy and principled uncertainty quantification in limited-data regimes (arXiv:2501.18258).

6. Classical High-Order Methods: FEM, DG, and Spectral #

Despite the deep learning surge, classical methods continue to mature, particularly in rigorous error analysis and efficiency.

The hp-version DG finite element method for the Boltzmann transport problem (J. Sci. Comput., 2024) achieves arbitrary-order convergence rates and handles polytopic elements, enabling efficient parallel implementation within existing multigroup discrete ordinates software. High-order DG methods for unsteady compressible flows — targeting acoustic waves, turbulence, and magnetohydrodynamics — benefit from block-diagonal mass matrices allowing efficient explicit time-stepping.

A systematic 2024 approach uses neural networks to learn the element-wise solution map of PDEs, accelerating finite element-type methods in an “element neural network” paradigm that generalises across element geometries. Machine learning-based spectral methods combine orthogonal function expansions (Fourier, Legendre) with deep neural operator learning for highly accurate solutions with fewer grid points.

FEX-PG (2024) solves high-dimensional partial integro-differential equations using parameter grouping to reduce coefficient count and Taylor series approximation for integral terms, achieving relative errors on the order of single-precision machine epsilon while providing interpretable, explicit solution formulas absent from most DL methods (arXiv:2410.00835).

7. Structure-Preserving Numerical Methods #

Structure-preserving methods retain intrinsic properties of the continuous system — symplecticity, energy conservation, divergence-free constraints — at the discrete level. They enhance numerical stability and long-term accuracy, ensuring computed solutions respect the underlying mathematical structure.
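A classic toy illustration of the difference is the harmonic oscillator $\dot{q} = p$, $\dot{p} = -q$. The sketch below compares explicit Euler with symplectic Euler at the same step size; the function names and parameters are illustrative, and the example is an ODE chosen for brevity rather than a PDE discretisation.

```python
def explicit_euler(q, p, dt):
    # Both updates use the old state; the map is not symplectic.
    return q + dt * p, p - dt * q


def symplectic_euler(q, p, dt):
    p_new = p - dt * q          # kick with the old position
    return q + dt * p_new, p_new  # drift with the new momentum


def energy_history(step, q0, p0, dt, n_steps):
    """Energy H = (q^2 + p^2) / 2 along the discrete trajectory."""
    q, p = q0, p0
    energies = [0.5 * (q * q + p * p)]
    for _ in range(n_steps):
        q, p = step(q, p, dt)
        energies.append(0.5 * (q * q + p * p))
    return energies


drifting = energy_history(explicit_euler, 1.0, 0.0, dt=0.1, n_steps=1000)
bounded = energy_history(symplectic_euler, 1.0, 0.0, dt=0.1, n_steps=1000)
```

Explicit Euler multiplies the energy by $(1 + \Delta t^2)$ every step, so it grows without bound. Symplectic Euler exactly conserves the nearby quantity $\tfrac12(q^2 + p^2) - \tfrac{\Delta t}{2}\,pq$, which keeps the true energy within a few percent of its initial value for all time.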

Recent research encompasses geometric integrators and mimetic discretisations for conservative finite element, difference, and volume schemes; stochastic multisymplectic PDEs and their structure-preserving discretisations (Studies in Applied Mathematics, 2025); and structure-preserving learning via the Geo-NeW model, which exactly preserves physical conservation laws through Finite Element Exterior Calculus. A 2024 University of Maryland workshop identified integration of structure-preserving methods with uncertainty quantification as a key open problem.

8. Data-Driven PDE Discovery #

SINDy and its extensions use sparse regression over a dictionary of candidate functions. GN-SINDy (2024–2026) addresses high dimensionality and large datasets by combining Q-DEIM greedy sampling, differentiable surrogate modelling, and sparse regression, showing robustness on Burgers, Allen-Cahn, and KdV equations. Evo-SINDy (ACM, 2025) uses multi-population co-evolutionary algorithms for universal PDE identification. Bayesian-SINDy quantifies parameter uncertainty robustly (arXiv:2402.15357).
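The core of SINDy, sparse regression over a candidate library, can be sketched with sequentially thresholded least squares. The example below recovers $\dot{x} = -2x$ from a sampled trajectory using the library $\{1, x, x^2\}$. The helper names, threshold, and synthetic data are assumptions, and the hand-written linear solver only stands in for a numerical library to keep the sketch dependency-free.

```python
import math


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(theta, y, active):
    """Solve the normal equations restricted to the active library columns."""
    gram = [[sum(row[i] * row[j] for row in theta) for j in active]
            for i in active]
    rhs = [sum(row[i] * yi for row, yi in zip(theta, y)) for i in active]
    return solve_linear(gram, rhs)


def stlsq(theta, y, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares over the library columns."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coeffs = [0.0] * n_terms
    for _ in range(n_iter):
        if not active:
            break
        sol = least_squares(theta, y, active)
        coeffs = [0.0] * n_terms
        for idx, value in zip(active, sol):
            coeffs[idx] = value
        # Drop small terms, then refit on the survivors in the next pass.
        active = [i for i in active if abs(coeffs[i]) >= threshold]
        coeffs = [c if abs(c) >= threshold else 0.0 for c in coeffs]
    return coeffs


# Synthetic data from dx/dt = -2x, with derivatives from central differences.
dt = 0.01
xs = [math.exp(-2.0 * k * dt) for k in range(101)]
x_mid = xs[1:-1]
dx_mid = [(xs[k + 1] - xs[k - 1]) / (2 * dt) for k in range(1, 100)]
library = [[1.0, x, x * x] for x in x_mid]   # candidate terms: 1, x, x^2
coeffs = stlsq(library, dx_mid)
```

Only the coefficient of $x$ survives thresholding, with a value close to $-2$; the small offset comes from the finite-difference derivative estimate. With noisy data the derivative estimate and the threshold become the critical choices, which is part of what the extensions above address.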

On the neural-symbolic front, Mechanistic PDE Networks (arXiv:2502.18377, 2025) represent spatiotemporal data as space-time dependent linear PDEs within neural network hidden representations, then solve and decode for specific tasks. MORL4PDEs (Chaos Solitons Fractals, 2024) uses reinforcement learning and genetic algorithms for symbolic PDE regression without pre-specified candidate libraries. The Physics-Informed Information Criterion (PIC) (Research, 2022) selects the most appropriate PDE from candidates by incorporating symmetry constraints.

9. Hamilton–Jacobi PDEs #

Hamilton–Jacobi (HJ) PDEs govern optimal control, level-set methods, and front propagation. A comprehensive 2025 review (arXiv:2502.20833) covers grid-based methods, representation formula methods, Monte Carlo via Laplace’s method, and deep learning approaches. Key deep learning advances include actor-critic neural network frameworks for static HJ equations (convergence analysed in 2024), and variational methods that solve HJ PDEs up to 100 dimensions with relative errors of 1–5%. Deep BSDE methods naturally apply to Hamilton-Jacobi-Bellman (HJB) equations arising in stochastic optimal control.

10. Fractional and Non-Local PDEs #

Fractional-order derivatives model anomalous diffusion, viscoelastic behaviour, and memory effects that integer-order PDEs cannot capture. Recent advances include semi-analytical methods (Adomian Decomposition, Variational Iteration) applied to 3D time-fractional diffusion, telegraph, and wave equations; a 2024 comprehensive review of fractional stochastic PDEs covering the latest numerical methods and practical implementations; the Optimised FNO (O-FNO, 2025) achieving 98%+ test accuracy for fractional Poisson equations; and a 2025 meshfree finite difference scheme for the fractional Laplacian on arbitrary bounded domains.

11. Multiscale Methods and Model Order Reduction #

The 2024 Numerical Multiscale Methods dissertation establishes an equivalence between time averaging and space homogenisation, and extends Deep Ritz to multiscale problems via scale convergence theory. Multi-fidelity reduced order models for PDE-constrained optimisation (arXiv:2503.21252, 2025) use a hierarchical trust region algorithm with active learning, constructing a full/reduced/ML model hierarchy on-the-fly. POD-DL-ROMs (Politecnico di Milano, 2024) combine proper orthogonal decomposition with autoencoder architectures for nonlinear parametric PDEs, providing a mathematically rigorous framework enhancing accuracy of reduced models.

12. Uncertainty Quantification and Stochastic PDEs #

Quasi-Monte Carlo (QMC) methods achieve faster convergence than Monte Carlo for smooth integrands. A 2024 paper studies QMC with generalised Gaussian random variables and Gevrey regular inputs, relaxing the standard uniform-boundedness assumption, and jointly analyses the dimension truncation, FEM, and QMC errors for randomly shifted rank-1 lattice rules (arXiv:2411.03793). Randomised QMC (RQMC) with scrambled Sobol’ sequences achieves smaller bias and RMSE than Monte Carlo for risk-averse optimisation (arXiv:2408.02842). A 2024 ICERM semester at Brown University (“Numerical PDEs: Analysis, Algorithms, and Data Challenges”) served as a major gathering point for researchers integrating uncertainty quantification with PDE methods.
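A randomly shifted rank-1 lattice rule is short enough to sketch directly. The example below compares it with plain Monte Carlo on a smooth periodic integrand over $[0,1]^2$ whose exact integral is 1. The Fibonacci generating vector $(1, 610)$ with $n = 987$ points, the integrand, and the function names are illustrative assumptions and are not taken from the cited papers, which concern PDEs with random inputs.

```python
import math
import random


def shifted_lattice_rule(f, n, gen, shift):
    """Average f over the rank-1 lattice points {k*gen/n + shift} mod 1."""
    total = 0.0
    for k in range(n):
        # Reduce k*g modulo n in integer arithmetic to avoid rounding error.
        point = [(((k * g) % n) / n + s) % 1.0 for g, s in zip(gen, shift)]
        total += f(point)
    return total / n


def monte_carlo(f, n, dim, rng):
    """Plain Monte Carlo average over n uniform random points."""
    return sum(f([rng.random() for _ in range(dim)]) for _ in range(n)) / n


def integrand(x):
    # Smooth, 1-periodic test function whose exact integral over [0,1]^2 is 1.
    return (1 + math.sin(2 * math.pi * x[0])) * (1 + math.sin(2 * math.pi * x[1]))


rng = random.Random(0)
n, gen = 987, (1, 610)                       # Fibonacci lattice in 2D
shift = [rng.random(), rng.random()]
qmc_error = abs(shifted_lattice_rule(integrand, n, gen, shift) - 1.0)
mc_error = abs(monte_carlo(integrand, n, 2, rng) - 1.0)
```

For this trigonometric integrand the lattice rule is exact up to rounding, whereas the Monte Carlo error is of the expected order $n^{-1/2}$. The random shift does not change the lattice structure; it makes the estimator unbiased and allows the error to be estimated from independent shifts.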

13. Quantum and Photonic Computing for PDEs #

Schrödingerisation techniques convert general linear PDEs into Schrödinger-type equations via the “warped phase transformation,” enabling direct quantum Hamiltonian simulation. A 2024 Quantum journal paper provides explicit quantum circuit implementations for the heat and advection equations with complexity analysis demonstrating quantum advantage in high dimensions. ColibriTD’s H-DES (March 2025) was reported as the first real-hardware solution of a PDE via variational quantum algorithm, executing on IBM’s 156-qubit Heron R2 processor for the inviscid Burgers’ equation.

LightSolver’s Laser Processing Unit (LPU) gained the ability to map and solve PDEs directly (announced September 2025). According to the company, iteration steps take constant time independent of problem size and yield up to 100× speed gains over GPU solvers; it has also announced partnerships with Ansys for engineering integration.

Open Problems #

PINN training stability. Despite many improvements, PINN training remains fragile for stiff and multi-scale problems. A general theory of loss landscape conditioning and principled hyperparameter selection is lacking.

Neural operator generalisation theory. While FNO and DeepONet generalise empirically across PDE instances, rigorous approximation-theoretic guarantees relating operator-learning error to network width, depth, and training data remain incomplete.

Foundation model reliability and extrapolation. PDE foundation models show impressive zero-shot accuracy within their pre-training distribution, but their failure modes on out-of-distribution physics — and the extent to which physics-informed fine-tuning can compensate — are not yet well understood.

High-dimensional solvers beyond parabolic PDEs. The Deep BSDE method and the multilevel Picard (MLP) method primarily address semilinear parabolic PDEs. Extending their curse-of-dimensionality guarantees to elliptic, hyperbolic, or fully nonlinear PDEs remains largely open.

Structure-preserving deep learning. Integrating conservation laws and geometric structure (symplecticity, divergence-free constraints) into neural PDE solvers at scale — beyond the Geo-NeW approach for specific exterior calculus structures — is an active and unresolved challenge.

Quantum hardware advantage. Near-term quantum devices face noise and connectivity limitations that restrict their practical advantage over classical HPC for PDE solving. Demonstrating genuine quantum speedup for industrially relevant PDEs on real hardware remains an open goal.

References #

Brunton, S. L., Proctor, J. L., & Kutz, J. N. (2016). Discovering governing equations from data by sparse identification of nonlinear dynamical systems. PNAS, 113(15), 3932–3937.

ColibriTD. (2025, March). H-DES: First real-hardware PDE solver via variational quantum algorithm. The Quantum Insider. https://thequantuminsider.com/2025/03/25/colibritd-announces-h-des-pde-solver-as-a-step-toward-accessible-quantum-simulation-in-engineering/

E, W., & Yu, B. (2018). The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1), 1–12.

E, W., Han, J., & Jentzen, A. (2022). Algorithms for solving high dimensional PDEs: From nonlinear Monte Carlo to machine learning. Nonlinearity, 35(1), 278.

Han, J., Jentzen, A., & E, W. (2018). Solving high-dimensional partial differential equations using deep learning. PNAS, 115(34), 8505–8510. https://www.pnas.org/doi/10.1073/pnas.1718942115

Han, J. (2025). A brief review of the Deep BSDE method for solving high-dimensional partial differential equations. arXiv:2505.17032. https://arxiv.org/abs/2505.17032

Hu, J., Jin, S., Liu, N., & Zhang, L. (2024). Quantum circuits for partial differential equations via Schrödingerisation. Quantum, 8, 1563. https://quantum-journal.org/papers/q-2024-12-12-1563/

IEEE. (2025). A staged training approach for physics-informed neural networks in solving partial differential equations. https://ieeexplore.ieee.org/document/11172661/

IEEE. (2025). Higher-order-ReLU-KANs (HRKANs) for solving physics-informed neural networks more accurately, robustly and faster. https://ieeexplore.ieee.org/document/11105234/

IEEE. (2025). ReBA: A hybrid sparse reconfigurable butterfly accelerator for solving PDEs via hardware and algorithm co-design. https://ieeexplore.ieee.org/document/11044078/

IEEE. (2025). An optimized Fourier neural operator for the 2D fractional Poisson equation. https://ieeexplore.ieee.org/document/11405135/

Li, Z., et al. (2020). Fourier neural operator for parametric partial differential equations. arXiv:2010.08895.

LightSolver. (2025, September). LightSolver announces advance in physical modeling on the LPU. The Quantum Insider. https://thequantuminsider.com/2025/09/16/lightsolver-announces-advance-in-physical-modeling-on-the-lpu-and-new-roadmap-for-optical-analog-pde-solving/

Liu, Z., et al. (2024). KAN: Kolmogorov-Arnold Networks. arXiv:2404.19756. ICLR 2025. https://arxiv.org/abs/2404.19756

Lu, L., Jin, P., Pang, G., Zhang, Z., & Karniadakis, G. E. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3, 218–229.

Lu, L., et al. (2024). Learning nonlinear operators in latent spaces for real-time predictions of complex dynamics in physical systems. Nature Communications. https://www.nature.com/articles/s41467-024-49411-w

Herde, M., et al. (2024). Poseidon: Efficient foundation models for PDEs. arXiv:2405.19101. https://arxiv.org/html/2405.19101v2

Peng, W., et al. (2025). OmniArch: Building foundation model for scientific computing. ICML 2025. https://icml.cc/virtual/2025/poster/45099

Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707.

Shi, Z., et al. (2025). Physics-informed fine-tuning of foundation models for partial differential equations. arXiv:2603.15431. https://arxiv.org/html/2603.15431v1

Wang, S., et al. (2025). Geo-NeW: Structure-preserving learning improves geometry generalization in PDEs. arXiv:2602.02788. https://arxiv.org/abs/2602.02788

Wang, Z., et al. (2024). Kolmogorov–Arnold-Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems. Computer Methods in Applied Mechanics and Engineering. https://linkinghub.elsevier.com/retrieve/pii/S0045782524007722

Xiao, P., et al. (2025). Quantum DeepONet: Neural operators accelerated by quantum computing. Quantum, 9, 1761. https://quantum-journal.org/papers/q-2025-06-04-1761/

Xie, Z., et al. (2025). Anant-Net: Breaking the curse of dimensionality with scalable and interpretable neural surrogates. arXiv:2505.03595. https://arxiv.org/html/2505.03595v3

Xie, Z., et al. (2025). A deep shotgun method for solving high-dimensional parabolic partial differential equations. Journal of Scientific Computing. https://link.springer.com/10.1007/s10915-025-02983-1

Xu, K., & Darve, E. (2025). Integration matters for learning PDEs with backwards SDEs. arXiv:2505.01078. https://arxiv.org/abs/2505.01078

Zeng, Q., et al. (2025). Automatic network structure discovery of physics informed neural networks via knowledge distillation. Nature Communications. https://www.nature.com/articles/s41467-025-64624-3

Zhang, Y., et al. (2024). PDEformer: Towards a foundation model for one-dimensional partial differential equations. arXiv:2402.12652. http://arxiv.org/pdf/2402.12652.pdf

Zhang, Y., et al. (2025). A multimodal PDE foundation model for prediction and scientific text descriptions. arXiv:2502.06026. https://arxiv.org/abs/2502.06026

Recent Advances in Steady States of Navier-Stokes Equations

Mon, 30 Mar 2026 00:00:00 +0000

The study of steady-state and self-similar solutions of the incompressible Navier-Stokes equations (NSE) has undergone remarkable progress in the 2020s. This post surveys landmark results from 2024–2026 touching on existence, uniqueness, classification, and stability of such solutions. The stationary (steady) NSE in $\mathbb{R}^3$ reads:

$$-\nu \Delta u + (u \cdot \nabla) u + \nabla p = 0, \quad \operatorname{div} u = 0.$$

A central object of the self-similar theory is the class of $(-1)$-homogeneous (scale-invariant) solutions: a function $u$ is $(-1)$-homogeneous if $u(\lambda x) = \lambda^{-1} u(x)$ for all $\lambda > 0$. Such fields are the natural initial data for forward self-similar solutions $u(x,t) = t^{-1/2} U(x/\sqrt{t})$ of the time-dependent NSE, and a $(-1)$-homogeneous steady solution is itself a time-independent self-similar solution.
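A quick scaling check, added here as a sketch, shows why $(-1)$-homogeneity fits the steady equation. If $u(\lambda x) = \lambda^{-1} u(x)$, then each derivative lowers the degree of homogeneity by one, so

$$(\nabla u)(\lambda x) = \lambda^{-2}\, \nabla u(x), \qquad (\Delta u)(\lambda x) = \lambda^{-3}\, \Delta u(x), \qquad \bigl((u \cdot \nabla) u\bigr)(\lambda x) = \lambda^{-3}\, \bigl((u \cdot \nabla) u\bigr)(x).$$

Assuming the pressure is taken $(-2)$-homogeneous, $\nabla p$ is $(-3)$-homogeneous as well. Every term of the steady NSE then scales in the same way, and any external force compatible with this class must be $(-3)$-homogeneous.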

Overview #

Five landmark results define the frontier of this area in 2024–2026:

Non-uniqueness of Leray–Hopf solutions via a computer-assisted proof in the self-similar framework (Hou, Wang, & Yang, 2025).
Forward self-similar solutions in 2D for arbitrarily large initial data (Albritton, Guillod, Korobkov, & Ren, 2026).
Existence of self-similar solutions in high dimensions ($4 \leq n \leq 16$) without smallness conditions (Bang, Gui, Liu, Wang, & Xie, 2025).
Sharp removable singularity results for $(-1)$-homogeneous solutions with singular rays (Li, Li, & Yan, 2024).
Steady NSE in junction domains with large, non-small fluxes (Gazzola, Korobkov, Ren, & Sperone, 2025).

Paper	Authors	Contribution
arXiv:2410.11170	Li, Li, Yan	Optimal removable singularity for $(-1)$-homogeneous solutions
arXiv:2412.07283	Bang, Gui, Liu, Wang, Xie	Self-similar solutions in 2D sector: existence/non-uniqueness
arXiv:2505.14642	Gazzola, Korobkov, Ren, Sperone	Steady NSE in junction channels, non-small fluxes
arXiv:2509.25116	Hou, Wang, Yang	First rigorous non-uniqueness of Leray–Hopf
arXiv:2510.10488	Bang, Gui, Liu, Wang, Xie	$(-1)$-homogeneous solutions, dimensions $4 \leq n \leq 16$
arXiv:2601.03161	Albritton, Guillod, Korobkov, Ren	Forward self-similar solutions, 2D, large data
arXiv:2601.03833	Gui, Liu, Xie	Global existence of 2D forward self-similar solutions
arXiv:2602.19846	Fujii	Sharp uniqueness/non-uniqueness in critical Besov spaces

Background #

Landau Solutions and Šverák’s Classification #

In 1944, Landau discovered a three-parameter explicit family of $(-1)$-homogeneous axisymmetric no-swirl solutions of the 3D stationary NSE. Known as Landau solutions, they are parameterized by vectors $b \in \mathbb{R}^3$ and represent fluid jets emanating from the origin. A seminal result of Šverák (2006) established that all $(-1)$-homogeneous solutions smooth on $\mathbb{S}^2$ must be Landau solutions — the only scale-invariant flows without singularities on the sphere.

Forward Self-Similar Solutions #

A forward self-similar solution takes the form

$$u(x, t) = \frac{1}{\sqrt{t}}\, U\!\left(\frac{x}{\sqrt{t}}\right),$$

where the self-similar profile $U$ solves the stationary scaled NSE. The seminal work of Jia and Šverák (2014) showed that for any $(-1)$-homogeneous initial data smooth away from the origin, at least one global self-similar solution exists, with no smallness restriction on the data. Because the data may be large, existence is proved via the Leray–Schauder continuation theorem rather than a fixed-point contraction (Jia & Šverák, 2015).
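For concreteness, the profile equation can be obtained by direct substitution. This is a sketch in notation introduced here: $y = x/\sqrt{t}$ is the similarity variable and the pressure is assumed to have the form $p(x,t) = t^{-1} P(y)$. Each term of the time-dependent NSE then carries the same factor $t^{-3/2}$:

$$\partial_t u = -\tfrac{1}{2}\, t^{-3/2} \bigl(U + (y \cdot \nabla) U\bigr), \qquad (u \cdot \nabla) u = t^{-3/2}\, (U \cdot \nabla) U, \qquad \Delta u = t^{-3/2}\, \Delta U, \qquad \nabla p = t^{-3/2}\, \nabla P.$$

Cancelling the common factor gives

$$-\nu \Delta U - \tfrac{1}{2} U - \tfrac{1}{2} (y \cdot \nabla) U + (U \cdot \nabla) U + \nabla P = 0, \qquad \operatorname{div} U = 0.$$

Compared with the steady NSE, the two extra linear terms come from the time derivative and encode the self-similar rescaling.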

Discretely self-similar (DSS) solutions, where $u(\lambda x, \lambda^2 t) = \lambda^{-1} u(x,t)$ for a specific $\lambda > 1$, were constructed for large data by Tsai (2014).

Classification of $(-1)$-Homogeneous Solutions #

Tian and Xin (1998) proved that all $(-1)$-homogeneous axisymmetric solutions with exactly one singularity must be Landau solutions. A key series of papers by Li, Li, and Yan (2016–2023) classified all $(-1)$-homogeneous axisymmetric no-swirl solutions with singularities at both the north and south poles of $\mathbb{S}^2$, parameterizing them as a four-dimensional surface with boundary. They also constructed the first non-axisymmetric $(-1)$-homogeneous solutions with swirl using the Weierstrass representation of minimal surfaces.

Recent Developments #

1. Removable Singularity Theorem (Li, Li, & Yan, 2024) #

One of the sharpest results of 2024 is the removable singularity theorem proved by Li, Li, and Yan (arXiv:2410.11170, to appear in Trans. Amer. Math. Soc.): any local $(-1)$-homogeneous solution $u$ near a potential singular ray through $P \in \mathbb{S}^2$ extends smoothly across $P$, provided $u = o(\ln \operatorname{dist}(x, P))$ on $\mathbb{S}^2$.

The result is sharp: for any $\alpha > 0$, there exist local solutions where $|u(x)| / \ln |x'| \to -\alpha$ as $x \to P$, showing that logarithmic growth exactly prevents smooth extension. The paper also establishes existence of solutions with any finite number of singularities located arbitrarily on $\mathbb{S}^2$. A companion survey by Li and Yan (arXiv:2509.07243, Sep 2025) provides a state-of-the-art exposition of this topic.

2. Self-Similar Solutions in High Dimensions (Bang et al., 2025) #

Bang, Gui, Liu, Wang, and Xie (arXiv:2510.10488, Oct 2025) proved existence of $(-1)$-homogeneous solutions to the steady NSE in high spatial dimensions:

For any $(-3)$-homogeneous, locally Lipschitz external force on $\mathbb{R}^n \setminus \{0\}$ with $4 \leq n \leq 16$, the steady NSE admit at least one $(-1)$-homogeneous solution that is scale-invariant and regular away from the origin.

Global uniqueness holds when the external force is small. The key novelty is a dimension-reduction effect from self-similarity: integral estimates of the positive part of the total head pressure enable energy estimates even in the supercritical dimension regime. For forces with only a nonnegative radial component, existence extends to all $n \geq 4$.

The same group (arXiv:2412.07283, Dec 2024) also established existence, uniqueness, and non-uniqueness of self-similar solutions to the steady NSE in 2D sectors with no-slip boundary conditions, providing rigorous corrections to classical Rosenhead (1940) calculations.

3. Forward Self-Similar Solutions in 2D for Large Data (2026) #

Two independent papers in January 2026 addressed the 2D problem, where classical local energy estimates break down because a $(-1)$-homogeneous initial velocity has a $(-2)$-homogeneous vorticity, which is not locally integrable in the plane:

Gui, Liu, and Xie (arXiv:2601.03833) established global existence of forward self-similar solutions for any divergence-free, $(-1)$-homogeneous, locally Hölder continuous initial velocity, with no smallness assumption.
Albritton, Guillod, Korobkov, and Ren (arXiv:2601.03161) independently constructed such solutions from arbitrarily large initial data and provided numerical evidence for non-uniqueness — the first construction and validation of non-uniqueness for the 2D self-similar problem.

4. Non-Uniqueness of Leray–Hopf Solutions (Hou, Wang, & Yang, 2025) #

The most dramatic recent development is the first rigorous computer-assisted proof of non-uniqueness of Leray–Hopf solutions to the unforced 3D NSE by Hou, Wang, and Yang (arXiv:2509.25116, Sep 2025, revised Mar 2026):

There exist infinitely many distinct suitable Leray–Hopf solutions to the 3D NSE on $\mathbb{R}^3 \times [0,1]$ with the same compactly supported, divergence-free initial condition $u_{in} \in L^q$ for any $q < 3$.

The proof executes the Jia–Šverák program (Jia & Šverák, 2015), which requires finding a large forward self-similar background flow whose linearized operator has an unstable eigenvalue (positive real part), then bifurcating to produce infinitely many Leray–Hopf solutions. The key steps are:

A finite-element + spectral-basis numerical method computes a highly precise candidate profile $\tilde{U}$.
The linearized operator $L_{\tilde{U}}$ is decomposed into a coercive part plus a finite-rank perturbation, whose invertibility is certified by computer-assisted interval arithmetic.
This certifies an unstable eigenpair $(\tilde{v}, \tilde{\lambda})$ with $\operatorname{Re}(\tilde{\lambda}) > 0$, yielding the second (and infinitely many) solutions via Riesz projection and Duhamel analysis.
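To give a flavour of what "certified by interval arithmetic" means, here is a toy Python sketch. It is not the method of Hou, Wang, and Yang: the `Interval` class, the `ball` helper, and the 2x2 matrix are hypothetical stand-ins chosen for illustration. Every operation is rounded outward, so the final interval is guaranteed to contain the exact value, and a strict sign condition on its endpoints is a rigorous conclusion rather than a numerical observation.

```python
import math

class Interval:
    """Closed interval [lo, hi] with outward rounding after every operation."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    @staticmethod
    def _outward(lo, hi):
        # Widen by one floating-point step on each side so that rounding
        # errors in lo and hi can never shrink the enclosure.
        return Interval(math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval._outward(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval._outward(min(products), max(products))

def ball(center, radius):
    """Interval containing every real number within `radius` of `center`."""
    return Interval._outward(center - radius, center + radius)

# Hypothetical 2x2 matrix, each entry known only to within 1e-8.
a, b = ball(1.0, 1e-8), ball(2.0, 1e-8)
c, d = ball(3.0, 1e-8), ball(-0.5, 1e-8)

det = a * d - b * c
# A real 2x2 matrix with negative determinant has one positive and one
# negative eigenvalue, so det.hi < 0 certifies an unstable direction.
certified_unstable = det.hi < 0
print(det.lo, det.hi, certified_unstable)
```

The actual proof works with an infinite-dimensional linearized operator, where the finite-rank part is handled by enclosures of this kind and the remainder by analytic estimates; the sketch only shows the enclosure principle. It relies on `math.nextafter`, which requires Python 3.9 or later.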

These solutions just miss the Prodi–Serrin condition that guarantees uniqueness. Guillod and Šverák (2017) had provided strong numerical evidence that such unstable profiles exist, but the rigorous proof remained elusive until Hou et al.

5. Sharp Non-Uniqueness for Weak Solutions via Convex Integration (2022–2026) #

A parallel program uses convex integration to prove non-uniqueness of weak solutions. Cheskidov and Luo (Invent. Math., 2022) proved sharp non-uniqueness in $L^p_t L^\infty$ for any $p < 2$ in the periodic setting. Miao, Nie, and Ye (arXiv:2412.09637, Dec 2024) extended this to $\mathbb{R}^3$. Fujii (arXiv:2602.19846, Feb 2026) completed a sharp classification in critical Besov spaces $C([0,T); \dot{B}^{n/p-1}_{p,q}(\mathbb{R}^n))$, finding that large-time asymptotics of non-unique solutions are governed by non-trivial stationary flows — a first in the critical regularity setting.
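The exponents in these statements are fixed by scaling. As a short check (a sketch in notation introduced here, with $u_\lambda(x,t) = \lambda u(\lambda x, \lambda^2 t)$, time norms taken over all $t > 0$, and $\sim$ meaning equivalence up to constants independent of $\lambda$):

$$\|u_\lambda\|_{L^p_t L^\infty_x} = \lambda^{1 - 2/p}\, \|u\|_{L^p_t L^\infty_x}, \qquad \|u_\lambda(\cdot,0)\|_{\dot{B}^{s}_{p,q}(\mathbb{R}^n)} \sim \lambda^{1 + s - n/p}\, \|u(\cdot,0)\|_{\dot{B}^{s}_{p,q}(\mathbb{R}^n)}.$$

The first norm is scale-invariant exactly when $p = 2$, which is why non-uniqueness for every $p < 2$ counts as sharp. The second is scale-invariant exactly when $s = n/p - 1$, which is the regularity index of the critical Besov spaces above.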

Result	Authors	Year	Setting	Self-similar?
Non-uniqueness, $L^p_t L^\infty$, torus	Cheskidov & Luo	2022	3D periodic	No
Non-uniqueness, $L^p_t L^\infty$, $\mathbb{R}^3$	Miao, Nie & Ye	2024	3D whole space	No
Non-uniqueness of Leray–Hopf, 3D	Hou, Wang & Yang	2025	3D whole space	Yes
Forward self-similar, 2D, large data	Albritton et al.	2026	2D whole space	Yes
Steady NSE in 2D sector	Bang et al.	2024	2D sector	Yes

6. Liouville Theorems and Stability of Landau Solutions #

Tan (arXiv:2501.03609, Jan 2025) proved new Liouville theorems for the stationary NSE (including the fractional case) under growth conditions in Lebesgue spaces. Ding and Tan (arXiv:2501.03615, Jan 2025) proved a Liouville theorem for the stationary inhomogeneous NSE via frequency localization of the Dirichlet energy near the origin.

The asymptotic stability of small Landau solutions in $L^3$ was sharpened by Bradshaw and Wang (arXiv:2409.12918, Sep 2024): asymptotic stability holds for perturbations in the Lorentz spaces $L^{3,q}$ with $q < \infty$ but fails in $L^{3,\infty}$ (weak-$L^3$), which marks the precise boundary of stability.

7. Steady NSE in Bounded and Unbounded Domains #

A major reference work by Korobkov, Pileckas, and Russo (Springer/Birkhäuser, March 2024) provides the first comprehensive book treatment of Leray’s problem: existence of a solution in bounded domains under only the condition of zero total flux — without smallness on the boundary data.

Gazzola, Korobkov, Ren, and Sperone (arXiv:2505.14642, May 2025) studied steady NSE in a junction of unbounded channels with sources and sinks, under inhomogeneous Dirichlet boundary conditions and without smallness of fluxes. They prove existence of a solution with uniformly bounded Dirichlet integral in every compact subset via Leray’s reductio ad absurdum argument using Morse–Sard-type theorems in Sobolev spaces.

Open Problems #

Several central questions remain unresolved or only partially answered:

The Clay Millennium Prize Problem. Whether 3D NSE solutions from smooth initial data can blow up in finite time is not resolved. The Hou et al. non-uniqueness result concerns Leray–Hopf solutions from singular $L^q$ ($q < 3$) initial data, not smooth data.

Complete classification of $(-1)$-homogeneous solutions in 3D. The axisymmetric no-swirl case is fully classified, and swirl solutions are well-studied, but a complete classification for all $(-1)$-homogeneous solutions with arbitrarily many singular rays and all possible swirl configurations is not yet achieved.

Rigorous non-uniqueness of forward self-similar solutions in 3D. The Jia–Šverák program produced numerical evidence (Guillod & Šverák, 2017), but a fully rigorous, non-computer-assisted proof of non-uniqueness for the forward (not backward) self-similar 3D problem remains open.

Asymptotic stability of large Landau solutions. While small Landau solutions are asymptotically stable in $L^3$, stability for large-parameter Landau solutions is not fully understood.

The Leray problem in non-axisymmetric 3D exterior domains without flux restrictions. The axisymmetric case was solved by Korobkov, Pileckas, and Russo, but the general 3D exterior domain problem under large flux remains open.

References #

Albritton, D., Guillod, J., Korobkov, M., & Ren, X. (2026). Forward self-similar solutions to the 2D Navier-Stokes equations from large data. arXiv:2601.03161. https://arxiv.org/abs/2601.03161

Bang, J., Gui, C., Liu, Y., Wang, C., & Xie, C. (2024). Self-similar solutions to the steady Navier-Stokes equations in 2D sectors. arXiv:2412.07283. https://arxiv.org/abs/2412.07283

Bang, J., Gui, C., Liu, Y., Wang, C., & Xie, C. (2025). On the existence of self-similar solutions to the steady Navier-Stokes equations in high dimensions. arXiv:2510.10488. https://arxiv.org/abs/2510.10488

Bradshaw, Z., & Wang, X. (2024). Asymptotic stability of Landau solutions in Lorentz spaces. arXiv:2409.12918. https://arxiv.org/pdf/2409.12918.pdf

Cheskidov, A., & Luo, X. (2022). Sharp nonuniqueness for the Navier-Stokes equations. Inventiones Mathematicae. arXiv:2009.06596. https://arxiv.org/abs/2009.06596

Ding, M., & Tan, W. (2025). Liouville-type theorem for the stationary inhomogeneous Navier-Stokes equations. arXiv:2501.03615. https://arxiv.org/abs/2501.03615

Fujii, M. (2026). Sharp non-uniqueness for the Navier-Stokes equations in critical Besov spaces. arXiv:2602.19846. https://arxiv.org/html/2602.19846v1

Gazzola, F., Korobkov, M., Ren, X., & Sperone, G. (2025). The steady Navier-Stokes equations in a system of unbounded channels with sources and sinks. arXiv:2505.14642. https://arxiv.org/abs/2505.14642

Gui, C., Liu, Y., & Xie, C. (2026). On the forward self-similar solutions to the two-dimensional Navier-Stokes equations. arXiv:2601.03833. https://arxiv.org/html/2601.03833v2

Hou, T., Wang, Y., & Yang, C. (2025). Nonuniqueness of Leray-Hopf solutions to the unforced incompressible 3D Navier-Stokes equations. arXiv:2509.25116. https://arxiv.org/abs/2509.25116

Jia, H., & Šverák, V. (2015). Are the incompressible 3d Navier–Stokes equations locally ill-posed in the natural energy space? Journal of Functional Analysis, 268(12), 3734–3766. https://www.sciencedirect.com/science/article/pii/S002212361500138X

Korobkov, M., Pileckas, K., & Russo, R. (2024). The Steady Navier-Stokes System: Basics of the Theory and the Leray Problem. Springer/Birkhäuser. https://books.google.com/books/about/The_Steady_Navier_Stokes_System.html?id=GOf8EAAAQBAJ

Korobkov, M., & Ren, X. (2024). On basic velocity estimates for the plane steady-state Navier-Stokes equations in convex domains. arXiv:2405.17884. https://arxiv.org/abs/2405.17884

Li, L., Li, Y., & Yan, Y. (2024). Removable singularity of $(-1)$-homogeneous solutions of stationary Navier-Stokes equations. Transactions of the American Mathematical Society. arXiv:2410.11170. https://arxiv.org/abs/2410.11170

Li, Y., & Yan, Y. (2025). Recent research on $(-1)$-homogeneous solutions of stationary Navier-Stokes equations. arXiv:2509.07243. https://arxiv.org/abs/2509.07243

Miao, C., Nie, Y., & Ye, W. (2024). Sharp non-uniqueness for the Navier-Stokes equations in the whole space. arXiv:2412.09637. https://arxiv.org/abs/2412.09637

Tan, W. (2025). New Liouville type theorems for the stationary Navier-Stokes equations. arXiv:2501.03609. https://arxiv.org/pdf/2501.03609.pdf

Tsai, T.-P. (2014). Forward discretely self-similar solutions of the Navier-Stokes equations. arXiv:1210.2783. https://arxiv.org/abs/1210.2783

Recent Research Directions in Analysis of PDEs 2021–2026

Mon, 30 Mar 2026 00:00:00 +0000

The arXiv section of Analysis of Partial Differential Equations is one of the most prolific areas of pure mathematics, producing roughly 300–400 preprints per month (408 in February 2026). The period 2021–2026 has witnessed landmark breakthroughs, including a computer-assisted proof of finite-time singularity for the 3D Euler equations in the presence of a boundary, a derivation of fluid equations from Newton’s laws that addresses Hilbert’s Sixth Problem for hard-sphere gases, and the emergence of probabilistic and nonlocal operator methods as dominant paradigms. This survey identifies, categorises, and profiles the key research directions and landmark papers in math.AP during this era.

Overview #

The landscape of math.AP in 2021–2026 organises into several major research directions:

Direction	Landmark Papers	Landmark Results
Fluid singularity (Euler)	Chen & Hou (2022–2023)	Finite-time blowup for 3D Euler/2D Boussinesq, smooth data (PNAS 2025)
NS non-uniqueness	Albritton, Brué & Colombo (2021)	Non-unique Leray–Hopf solutions for forced NS
Hilbert’s 6th Problem	Deng, Hani & Ma (2024–2025)	Long-time Boltzmann derivation; fluid equations from Newton’s laws
Wave kinetic equation	Deng & Hani (2021)	Rigorous WKE derivation from cubic NLS
Mixed local-nonlocal operators	Biagi, Dipierro, Valdinoci et al. (2020–2022)	Regularity, max. principles, Faber-Krahn inequalities
Double phase functionals	De Filippis & Mingione (2022–2023)	Gradient regularity in mixed/double phase settings
Normalized Schrödinger	Wei & Wu (2021); Jeanjean & Le (2020)	Critical mass constraints, ground states, NLS
MFG inverse problems	Imanuvilov, Liu & Yamamoto (2023)	Lipschitz stability, Carleman estimates for MFG
Keller-Segel chemotaxis	Li & Winkler (2022); Lyu & Wang (2021)	Signal-dependent motility, global regularity
Stefan/free boundary	Ferrari et al. (2024); Arya, Jeon & Julin (2026)	$C^{1,\alpha}$ regularity, supercooled Stefan
Stochastic PDEs	Bailleul & Bruned (2021); Bailleul & Hoshino (2025)	Renormalisation, regularity structures
Calderón inverse problem	Cârstea, Uhlmann et al. (2021); Krupchyk (2025)	Nonlinear and fractional settings
Dispersive PDEs	Deng, Nahmod & Yue (2020); Gubinelli et al. (2025)	Random tensors, modulated dispersive equations

Background #

The math.AP Landscape #

Analysis of PDEs is the mathematical study of equations involving unknown functions and their partial derivatives, arising in physics, geometry, probability, and engineering. The arXiv math.AP category encompasses everything from regularity theory for elliptic and parabolic equations to global well-posedness for dispersive equations, from geometric flows to inverse problems, and from kinetic theory to stochastic PDEs. With roughly 300–400 papers per month (408 in February 2026 alone), it is one of the most active and interconnected areas of pure mathematics.

The period 2021–2026 is characterised by three broad trends. First, grand-challenge resolutions: several longstanding open problems, including Hilbert’s Sixth Problem for hard-sphere gases and the existence of finite-time singularities for the 3D Euler equations with smooth data in the presence of a boundary, were settled using novel combinations of rigorous analysis, Feynman-diagram combinatorics, and computer-assisted numerics. Second, new paradigm emergence: mixed local-nonlocal operators, double phase functionals, and normalised solutions have matured from isolated curiosities into systematic research programmes with their own regularity theories. Third, interdisciplinary expansion: MFG systems, optimal transport, SPDEs, and AI-assisted methods have become structural parts of the math.AP ecosystem.

Recent Developments #

1. Mathematical Fluid Dynamics: Singularity, Non-Uniqueness, and Stability #

Finite-Time Blowup of the 3D Euler Equations #

The question of whether the 3D incompressible Euler equations

$$\partial_t u + (u \cdot \nabla) u + \nabla p = 0, \qquad \operatorname{div} u = 0,$$

can develop a singularity from smooth initial data — open since Euler introduced the equations in 1757 — saw a decisive resolution in a bounded-domain setting through a landmark two-part series by Jiajie Chen and Thomas Y. Hou (arXiv:2210.07191, arXiv:2305.05660, PNAS 2025). Their work proves finite-time, nearly self-similar blowup of both the 2D Boussinesq and 3D axisymmetric Euler equations with smooth initial data and finite energy in the presence of a solid boundary. The proof employs weighted $L^\infty$ and $C^{1/2}$ norms, sharp functional inequalities inspired by optimal transport, and computer-assisted rigorous numerics to verify nonlinear stability constants. The result was praised as one of the most significant advances in mathematical fluid mechanics in decades.

Prior to Chen–Hou, Tarek Elgindi (2021) showed finite-time singularity for the 3D axisymmetric Euler equations without swirl from $C^{1,\alpha}$ initial velocity. Chen, Hou, and Huang (2021) proved asymptotically self-similar blowup from smooth data for the Hou–Luo (HL) model. Concurrently, Hou and collaborators presented numerical evidence for a potential singularity in 3D Navier-Stokes, with a $10^7$-fold increase in maximum vorticity, and DeepMind (2025) used AI-assisted methods to discover families of unstable singularities in the Incompressible Porous Media and Boussinesq equations.

Non-Uniqueness of Leray–Hopf Solutions for Navier-Stokes #

A 2021 breakthrough by Dallas Albritton, Elia Brué, and Maria Colombo proved non-uniqueness of Leray–Hopf solutions to the forced 3D Navier-Stokes equations: they exhibited two distinct Leray solutions with zero initial velocity and identical body force, exploiting the extreme instability of a self-similar background solution. Recognised as the most influential 2021 math.AP paper on arXiv by Paper Digest, the result was subsequently extended to bounded domains via gluing methods (arXiv:2209.03530) and to stochastic settings (Electronic Journal of Probability, 2024).

Stability of Shear Flows and Kinetic Theory #

Parallel to the singularity programme, sharp asymptotic stability results for 2D monotone shear flows with no-slip boundary conditions, and extensive work on inviscid damping and enhanced dissipation near shear flows, have appeared throughout 2025–2026.

Arguably the most monumental result in kinetic PDE theory during this period: Yu Deng, Zaher Hani, and Xiao Ma provided a rigorous long-time derivation of the Boltzmann equation from hard-sphere dynamics (arXiv:2408.07818, 2024), extending Lanford’s 1975 short-time theorem to all times within the lifespan of the Boltzmann solution. In a companion paper (arXiv:2503.01800, 2025), they completed the derivation of the compressible Euler and incompressible Navier-Stokes-Fourier equations from Newton’s laws, effectively resolving Hilbert’s Sixth Problem for rarefied hard-sphere gases. The proof uses cumulant ansätze, Feynman-diagram combinatorics, and a molecule-reduction algorithm. This followed Deng and Hani’s 2021 derivation of the wave kinetic equation from the cubic NLS.

2. Nonlocal and Fractional PDEs: Mixed Local-Nonlocal Operators #

One of the dominant new paradigms of the 2020s is the study of operators of the form

$$\mathcal{L} u = -\Delta u + (-\Delta)^s u, \quad s \in (0,1),$$

which superpose a classical Laplacian with a fractional (nonlocal) Laplacian. These arise naturally in models combining Brownian and Lévy diffusion processes. The foundational paper by Biagi, Dipierro, Valdinoci, and Vecchi (2020/2021) initiated a systematic theory of regularity and maximum principles for such operators.
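On a periodic domain the two parts of $\mathcal{L}$ share the same eigenfunctions, which makes the superposition easy to see. The Python sketch below is an illustration under stated assumptions and not drawn from the cited papers: it works on the one-dimensional torus, takes $(-\Delta)^s$ to be the Fourier multiplier $|k|^{2s}$, and uses hypothetical helper names (`mixed_symbol`, `solve_mean_zero`, `evaluate`).

```python
import math

def mixed_symbol(k, s):
    """Fourier multiplier of -Laplacian + (-Laplacian)^s at integer frequency k."""
    return abs(k) ** 2 + abs(k) ** (2 * s)

def solve_mean_zero(f_hat, s):
    """Solve L u = f mode by mode for mean-zero periodic f given as {k: coefficient}."""
    if 0 in f_hat and f_hat[0] != 0:
        raise ValueError("f must have zero mean for L u = f to be solvable")
    return {k: c / mixed_symbol(k, s) for k, c in f_hat.items() if k != 0}

def evaluate(u_hat, x):
    """Evaluate the trigonometric polynomial sum_k u_hat[k] * exp(i k x)."""
    return sum(c * complex(math.cos(k * x), math.sin(k * x)) for k, c in u_hat.items())

# f(x) = 2 cos(x) + 2 cos(4x), written through its Fourier coefficients.
f_hat = {1: 1.0, -1: 1.0, 4: 1.0, -4: 1.0}
u_hat = solve_mean_zero(f_hat, s=0.5)
print(u_hat)
print(evaluate(u_hat, 0.0).real)
```

The symbol $|k|^2 + |k|^{2s}$ shows the two regimes directly: the local term dominates at high frequencies, while the nonlocal term is the larger contribution for $|k| < 1$, which on the whole space corresponds to large spatial scales.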

Between 2021 and 2026 an explosion of activity produced: gradient regularity for mixed local-nonlocal problems (De Filippis & Mingione, 2022: minimisers of mixed functionals are locally $C^{1,\beta}$-regular); Hölder regularity for mixed local-nonlocal degenerate elliptic equations (Garain & Lindgren, 2022); the Wiener criterion for nonlocal Dirichlet problems (Kim, Lee & Lee, 2022); and a Faber-Krahn inequality for mixed operators (Biagi, Dipierro, Valdinoci & Vecchi, 2021). Serena Dipierro and Enrico Valdinoci were among the most prolific contributors, publishing on nonlocal logistic equations with Neumann conditions, ecological niches for mixed dispersal, and Sobolev inequalities for mixed operators.

Giovanni Leoni’s 2023 treatise A First Course in Fractional Sobolev Spaces provided a self-contained reference covering definitions, embeddings, Hardy inequalities, and interpolation inequalities, and ranked among the most-cited arXiv math.AP papers of 2023. Concurrently, a 2025 paper established well-posedness and regularity theory for time-fractional stochastic PDEs involving Caputo derivatives and general nonlocal operators driven by Gaussian and Lévy noise (arXiv:2512.03754).

3. Double Phase Operators and Nonstandard Growth #

The double phase functional

$$\mathcal{H}(u) := \int_\Omega \bigl(|Du|^p + a(x)|Du|^q\bigr)\,dx, \quad q > p > 1,\ a(x) \geq 0,$$

introduced by Colombo and Mingione, generated a remarkable surge of activity throughout 2021–2026.

Year	Paper	Authors	Key Contribution
2021	A new class of double phase variable exponent problems	Crespo-Blanco, Gasiński, Harjulehto, Winkert	Existence/uniqueness for new double phase with variable exponents
2021	Double phase implicit obstacle problems	Zeng, Rădulescu, Winkert	Mixed BVPs with convection and multivalued conditions
2022	Nonuniformly elliptic Schauder theory	De Filippis, Mingione	Schauder estimates in nonuniform elliptic settings
2022	New embedding results for double phase problems	Ho, Winkert	Musielak-Orlicz Sobolev spaces with variable exponent
2023	Regularity at nearly linear growth	De Filippis, Mingione	Hölder gradient regularity for log-type functionals
2025	Partial regularity for parabolic double phase systems	Ok, Scilla, Stroffolini	Partial Hölder regularity for parabolic systems

The work of Cristiana De Filippis and Giuseppe Mingione is particularly prominent throughout, providing a comprehensive regularity theory for double phase and nonuniformly elliptic functionals (arXiv:2308.10222).

4. Normalized Solutions and Variational Methods for Schrödinger Equations #

The problem of finding solutions $u \in H^1(\mathbb{R}^N)$ with prescribed $L^2$-norm — the mass constraint

$$\int_{\mathbb{R}^N} |u|^2\,dx = c$$

— has become a central theme in the study of nonlinear Schrödinger equations. The influential papers by Louis Jeanjean and Thanh Trung Le on multiple normalized solutions for Sobolev critical equations (2020–2021) and by Juncheng Wei and Yuanze Wu on normalized solutions with critical Sobolev exponent and mixed nonlinearities (2021) launched a wave of activity. Key directions include: normalized ground states for NLS with potential (Bartsch, Molle, Rizzi & Verzini); normalized solutions for Schrödinger-Poisson-Slater equations; and standing waves and stability for Choquard equations. The March 2026 arXiv listings confirm that sharp exponents, existence and asymptotics for Choquard equations, and boosted ground states for pseudo-relativistic Schrödinger equations remain highly active.
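A standard scaling computation indicates why the exponent of the nonlinearity matters under a mass constraint. The sketch below assumes the model energy $E(u) = \tfrac{1}{2}\int |\nabla u|^2\,dx - \tfrac{1}{p}\int |u|^p\,dx$ with a pure power nonlinearity, which is an illustrative choice and not the setting of any one cited paper. The rescaling $u_\lambda(x) = \lambda^{N/2} u(\lambda x)$ preserves the mass, and

$$\int_{\mathbb{R}^N} |u_\lambda|^2\,dx = \int_{\mathbb{R}^N} |u|^2\,dx, \qquad E(u_\lambda) = \frac{\lambda^2}{2} \int_{\mathbb{R}^N} |\nabla u|^2\,dx - \frac{\lambda^{N(p-2)/2}}{p} \int_{\mathbb{R}^N} |u|^p\,dx.$$

The two powers of $\lambda$ balance when $N(p-2)/2 = 2$, that is, at the mass-critical exponent $p = 2 + 4/N$. Below it the energy is bounded from below on the constraint and one can look for global minimisers. Above it $E(u_\lambda) \to -\infty$ as $\lambda \to \infty$, so normalized solutions have to be found as constrained critical points of another type, for example of mountain-pass type.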

Parallel work on eigenvalue problems addresses Steklov eigenvalues (monotonicity for regular $N$-gons, sharp geometric bounds), eigenvalues of Pucci’s extremal operator in 3D, and biharmonic Steklov problems on thin sets.

5. Mean Field Games and Aggregation-Diffusion PDEs #

Mean field game theory generated a prolific suite of PDE questions between 2021 and 2026. Highlights include: Imanuvilov, Liu, and Yamamoto (2023) proving Lipschitz stability for determining states and inverse sources in MFG equations using Carleman estimates; Klibanov, Li, and Liu (2023) on Hölder stability via Carleman estimates; the inverse boundary problem for first-order master equations (Liu & Zhang, 2022); and Bresch, Jabin, and Soler (2022) introducing a novel probabilistic derivation of the mean-field limit applicable to Vlasov-Poisson-Fokker-Planck in 2D. By 2025–2026, nonlocal MFG models with spatial interactions and new work on Wasserstein gradient flows of kernel mean discrepancies with connections to machine learning appeared on arXiv (arXiv:2506.01200).

Optimal transport has deeply influenced aggregation-diffusion equations and gradient flows. The March 2026 arXiv listings include a major 73-page paper by Carrillo, Gwiazda, and Skrzeczkowski presenting a new formula for the Wasserstein distance between solutions to nonlinear continuity equations.

6. Chemotaxis and Reaction-Diffusion Systems #

Chemotaxis systems — in particular Keller-Segel models with signal-dependent motility (density-suppressed diffusion) — generated intense activity. Key papers include logistic damping effects and global classical solutions for reaction-diffusion systems with density-suppressed motility (Lyu & Wang, 2021), refined regularity analysis for Keller-Segel-consumption systems (Li & Winkler, 2022), and global existence with uniform boundedness under signal-dependent motility (Jiang & Laurençot, 2021). In 2024, a construction of smooth finite-time blowup solutions for the 3D Keller-Segel-Navier-Stokes (chemotaxis-fluid) system with buoyancy appeared, using a quantitative method that directly constructs the singular solution (arXiv:2404.17228).

In parallel, free boundary reaction-diffusion models for species spreading and SIS epidemic models — including 2026 work on asymmetric kernels in advective periodic environments — continue to produce threshold and long-time dynamics results.

7. Free Boundary Problems #

The Stefan problem (modelling solidification and melting) remained highly active throughout 2021–2026. Key results include $C^{1,\alpha}$ regularity of flat free boundaries for the inhomogeneous one-phase Stefan problem (Ferrari, Forcillo, Giovagnoli & Jesus, 2024; arXiv:2404.07535); regularity of the free boundary for the supercooled Stefan problem in arbitrary dimensions (2025; arXiv:2512.10136), where the free boundary decomposes into regular, singular, and jump parts with the singular part having controlled parabolic dimension; and well-posedness and regularity of physical solutions for the supercooled Stefan problem assuming only integrable initial temperature, with explicit classification of free boundary points (2025; arXiv:2506.18741). These results use obstacle problem techniques, non-degeneracy estimates, and sharp free boundary classification arguments.

Shape optimisation for principal eigenvalues of Pucci operators and $\Gamma$-convergence of convolution-type functionals for free discontinuity problems are active related directions in 2026.

8. Stochastic PDEs and Regularity Structures #

Martin Hairer’s theory of regularity structures generated deep ongoing activity. The period 2021–2026 saw Bailleul and Bruned (2021) extending the algebraic renormalisation framework of regularity structures to a broader class of singular SPDEs (arXiv:2101.11949); the publication of “A tourist’s guide to regularity structures” by Bailleul and Hoshino (2025/2026) in EMS Surveys as an essentially self-contained treatment; applications to stochastic quantisation ($\Phi^4_3$), the KPZ equation, and stochastic geometric flows (Hairer, 2021); and variance renormalisation in regularity structures for the 2D generalised Parabolic Anderson Model (Gerencsér & Hsu, 2026).

On the fluid side, global unique solvability for stochastic Navier-Stokes-Korteweg equations and stochastic Allen-Cahn-Navier-Stokes systems with ergodic invariant measures appeared in 2025, and non-uniqueness of Leray-Hopf solutions was extended to the stochastic forced setting.

9. Dispersive PDEs: Wave Turbulence, Well-Posedness, and Blowup #

The full derivation of the wave kinetic equation from the cubic NLS by Deng and Hani (arXiv:1912.09518, 2021) was the most impactful dispersive result of the era. Their analysis relies on absolutely convergent Feynman-diagram (paired-tree) expansions and identifies favourable scaling laws $\alpha \sim L^{-\varepsilon}$ for the kinetic limit.

Ongoing work includes polynomial growth of Sobolev norms for the fractional NLS on $\mathbb{T}^d$ (Wang, 2026); low-regularity global well-posedness for generalised Zakharov-Kuznetsov equations (Nowicki-Koth, 2026); modulated dispersive equations (modulated KdV with normal form reduction; Gubinelli, Li, Li & Oh, 2025; arXiv:2505.24270); and probabilistic well-posedness of dispersive PDEs beyond variance blowup (2025; arXiv:2509.02344). Scattering results for the quintic generalised Benjamin-Bona-Mahony equation and the 3D Zakharov-Kuznetsov equation, and long-time asymptotics via Riemann-Hilbert and inverse scattering methods for integrable equations, appear in the March 2026 listings.

10. Geometric PDEs #

Ricci flow uniqueness in the non-compact setting (Lee, 2025; arXiv:2503.20292) and a new non-Kähler expanding Ricci soliton construction with Kähler tangent cone at infinity (Bamler, Chen & Conlon, 2026) reflect the continued health of geometric flows. The volume-preserving mean curvature flow regularity in dimensions 2 and 3 appeared in March 2026 (Arya, Jeon & Julin).

Regularity theory for Monge-Ampère equations received major contributions via a geometric approach: Brendle, Léger, McCann, and Rankin (2023; arXiv:2311.10208) derived the Pogorelov second-derivative bound using Kim-McCann-Warren’s pseudo-Riemannian geometry, providing a new approach to $C^1$ estimates for optimal transport maps. Liouville theorems and sharp solvability for the parabolic Monge-Ampère equation with periodic data appeared in March 2026.

11. Inverse Problems for PDEs #

The Calderón problem — recovering a coefficient from boundary Dirichlet-to-Neumann data — attracted major advances: the quasilinear setting (Cârstea, Feizmohammadi, Kian, Krupchyk & Uhlmann, 2021), inverse problems for fractional semilinear elliptic equations (Lai & Lin, 2020), the Calderón problem via Vekua theory (Clifford analysis framework, 2026; arXiv:2601.17313), and the convex lifting approach (Alberti, Petit & Sanna, 2025; arXiv:2507.00645). The anisotropic Calderón problem for fractional Schrödinger operators on closed Riemannian manifolds (Krupchyk, 2025) was an important further advance.

Inverse moving source problems for parabolic equations (Zhao, 2023), reconstruction of scalar parameters in subdiffusion, and inverse problems for multi-term time-fractional diffusion with Caputo derivatives are active in 2025–2026.

12. Semi-Classical Analysis, Spectral Theory, and Nonlinear Elliptic Theory #

A 2024 arXiv survey on semi-classical analysis introducing three representative topics, which Paper Digest ranked as the top 2024 math.AP paper, and a 2026 paper celebrating the 100th anniversary of the WKB papers (Vũ Ngọc) both indicate that semi-classical methods remain foundational.

In nonlinear elliptic and parabolic theory, major contributions include: Regularity Theory for Elliptic PDEs by Fernández-Real and Ros-Oton (2023), a comprehensive self-contained reference; Fujita-type results for degenerate parabolic equations on Heisenberg groups (Fino, Ruzhansky & Torebek, 2023), ranked the highest-impact 2023 math.AP paper; and singularity formation for nonlinear heat equations on infinite graphs (Punko & Zucchero, 2026).

Emerging and Cross-Cutting Themes (2025–2026) #

Computer-assisted proofs and rigorous numerics. The Chen–Hou Euler blowup proof and related work on the CLM model (Hou-Wang, 2026) demonstrate that computer-assisted methods with rigorous error control are becoming standard for complex nonlinear stability analyses. These methods combine spectral Galerkin approximations with interval arithmetic and weighted norm frameworks to certify nonlinear stability constants — a methodology likely to expand further.

AI and machine learning for PDEs. The 2026 workshop MLPDES26 and the NSF/AMS report on AI for the mathematical sciences signal growing interplay between pure math.AP and deep learning. Neural PDE networks for equation discovery (arXiv:2502.18377), geometric operator learning via optimal transport (arXiv:2507.20065), and AI-assisted singularity discovery (DeepMind, 2025) represent this interdisciplinary frontier.

PDE methods in geometry and probability. The intersection of math.AP with differential geometry, probability (SPDEs), and mathematical physics remains extremely active. The March 2026 listings span general relativity (tensorial wave equations), Kähler geometry (Ricci solitons), and stochastic PDEs — confirming that math.AP functions as a hub connecting multiple mathematical disciplines.

Open Problems #

Smooth-data Euler regularity beyond bounded domains. The Chen–Hou result proves blowup in a bounded domain. Whether finite-time singularity occurs for the 3D Euler equations in all of $\mathbb{R}^3$ from smooth, rapidly decaying initial data — the original Euler problem — remains open.

Navier-Stokes uniqueness from smooth initial data. The Albritton-Brué-Colombo result proves non-uniqueness for forced NS from zero initial velocity. Non-uniqueness (or uniqueness) of Leray–Hopf solutions for the unforced equations from smooth $H^1$ initial data is unresolved (see the companion survey on self-similar solutions).

Optimal regularity theory for double phase problems. Despite the comprehensive work of De Filippis and Mingione, optimal Schauder estimates for parabolic double phase systems at the boundary and under critical growth conditions are not fully established.

Complete derivation programme for Hilbert’s Sixth Problem. Deng-Hani-Ma resolved the case of hard-sphere gases in the Boltzmann regime. The derivation of hydrodynamic equations from particle dynamics in other regimes — dense gases, quantum systems, plasma — remains largely open.

Global well-posedness for energy-critical NLS in high dimensions. Despite progress on wave kinetic theory and probabilistic well-posedness, the deterministic global well-posedness theory for energy-critical and supercritical dispersive equations in dimensions $d \geq 5$ has significant gaps.

Quantum and numerical computation in pure math.AP. The growing use of computer-assisted proofs raises methodological questions about standards of verification, reproducibility, and the scope of problems accessible to these techniques.

References #

Albritton, D., Brué, E., & Colombo, M. (2021). Non-uniqueness of Leray solutions of the forced Navier-Stokes equations. https://cvgmt.sns.it/media/doc/paper/5405/main.pdf

Bailleul, I., & Bruned, Y. (2021). Renormalised singular stochastic PDEs. arXiv:2101.11949. https://www.pure.ed.ac.uk/ws/portalfiles/portal/194767736/2101.11949.pdf

Bailleul, I., & Hoshino, M. (2025). A tourist’s guide to regularity structures and singular stochastic PDEs. EMS Surveys in Mathematical Sciences. https://ems.press/journals/emss/articles/14298505

Brendle, S., Léger, F., McCann, R. J., & Rankin, C. (2023). A geometric approach to a priori estimates for optimal transport maps. arXiv:2311.10208. https://arxiv.org/abs/2311.10208

Chen, J., & Hou, T. Y. (2022). Stable nearly self-similar blowup of the 2D Boussinesq and 3D Euler equations with smooth data I: Analysis. arXiv:2210.07191. https://arxiv.org/abs/2210.07191

Chen, J., & Hou, T. Y. (2023). Stable nearly self-similar blowup of the 2D Boussinesq and 3D Euler equations with smooth data II: Rigorous numerics. arXiv:2305.05660. https://arxiv.org/abs/2305.05660

Chen, J., & Hou, T. Y. (2025). Singularity formation in 3D Euler equations with smooth initial data. PNAS, 122(28). https://www.pnas.org/doi/10.1073/pnas.2500940122

De Filippis, C., & Mingione, G. (2023). Regularity for double phase problems at nearly linear growth. arXiv:2308.10222. https://arxiv.org/abs/2308.10222

DeepMind. (2025). Discovering new solutions to century-old problems in fluid dynamics. https://deepmind.google/blog/discovering-new-solutions-to-century-old-problems-in-fluid-dynamics/

Deng, Y., & Hani, Z. (2021). On the derivation of the wave kinetic equation for NLS. arXiv:1912.09518. http://arxiv.org/pdf/1912.09518.pdf

Deng, Y., Hani, Z., & Ma, X. (2024). Long time derivation of the Boltzmann equation from hard sphere dynamics. arXiv:2408.07818. https://www.semanticscholar.org/paper/91b67412a6058c1ace054a32fbf36fa2d2998d3d

Deng, Y., Hani, Z., & Ma, X. (2025). Hilbert’s sixth problem: Derivation of fluid equations via Boltzmann’s kinetic theory. arXiv:2503.01800. https://www.semanticscholar.org/paper/01d8f11b5d31f7037fb4914797e938db11d76ec5

Ferrari, F., Forcillo, N., Giovagnoli, D., & Jesus, B. (2024). Free boundary regularity for the inhomogeneous one-phase Stefan problem. arXiv:2404.07535. https://arxiv.org/abs/2404.07535

Gubinelli, M., Li, J., Li, T., & Oh, T. (2025). Nonlinear PDEs with modulated dispersion IV: Normal form reduction for modulated KdV. arXiv:2505.24270. https://arxiv.org/pdf/2505.24270.pdf

Hou, T. Y. (2021). The potentially singular behavior of the 3D Navier-Stokes equations. arXiv:2107.06509. https://arxiv.org/abs/2107.06509

Hu, J., Jin, S., Liu, N., & Zhang, L. (2024). Quantum circuits for partial differential equations via Schrödingerisation. Quantum, 8, 1563.

Imanuvilov, O. Y., Liu, Y., & Yamamoto, M. (2023). Lipschitz stability for determining states and inverse sources in MFG equations. [Journal of Mathematical Analysis].

Ok, J., Scilla, G., & Stroffolini, B. (2025). Partial regularity for parabolic systems of double phase type. arXiv:2510.03849. https://arxiv.org/pdf/2510.03849.pdf

Paper Digest. (2025, March). Most influential arXiv (Analysis of PDEs) papers — 2025-03 version. https://www.paperdigest.org/2025/03/most-influential-arxiv-analysis-of-pdes-papers-2025-03-version/

Segata, J., & Chen, M. (2026). Scattering for the 3D Zakharov-Kuznetsov equation [arXiv preprint]. arXiv math.AP March 2026.

arXiv math.AP listings. (2026, February–March). https://arxiv.org/list/math.AP/2026-03

Paper Reading - Optimization problems for elliptic PDEs (2601.01591)

Fri, 20 Feb 2026 00:00:00 +0000

This paper is a panoramic tour of three families of optimal control problems for elliptic PDEs: where the control is the coefficient, the potential, or the source term, unifying and sharpening results from the authors’ previous works.

Three ways to control an elliptic PDE #

The authors always consider a Dirichlet problem on a bounded domain $\Omega \subset \mathbb{R}^d$, with the solution $u$ as the state and a function (or measure) as the control. They study three settings:

Optimal coefficients $a(x)$: $$ -\mathrm{div}(a(x)\nabla u) = f \text{ in } \Omega, \quad u=0 \text{ on } \partial\Omega, $$ cost function $J(u,a) = \int_\Omega j(u,a)\,dx$, with a constraint $\int_\Omega \psi(a)\,dx \le 1$.
Optimal potentials $V(x)$: $$ -\Delta u + V(x)u = f \text{ in } \Omega, \quad u\in H_0^1(\Omega), $$ cost function $J(u,V) = \int_\Omega (j(x,u) + \psi(V))\,dx$.
Optimal sources $f$: $$ -\Delta u = f \text{ in } \Omega, \quad u\in H_0^1(\Omega), $$ cost function $J(f) = \int_\Omega j(x,u_f,f)\,dx$ with $\int_\Omega \psi(f)\,dx \le m$.

In all cases, $\psi$ is convex and lower semi-continuous (l.s.c.), encoding constraints and penalizations on the control. The paper focuses on existence of optimal controls (sometimes as measures), characterization via auxiliary variational problems and adjoint states, bang–bang behavior, and regularity of optimal controls and their induced interfaces.

Optimal Coefficients: Where to Put the Good Material? #

Minimal Compliance and Measure-Valued Coefficients #

The model problem is compliance minimization for $-\mathrm{div}(a(x)\nabla u) = f$, $u=0$, with non-negative $a$.

Compliance is defined as: $$ C(a) = \int_\Omega f u_a\,dx, $$ and it relates to the energy $$ E(a) = \inf_{u\in H_0^1} \int_\Omega \left(\tfrac{1}{2} a|\nabla u|^2 - f u\right)dx $$ via $C(a) = -2E(a)$.
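The identity $C(a) = -2E(a)$ follows from the weak form of the state equation. A short derivation, assuming $a$ is regular enough that the infimum in $E(a)$ is attained at the state $u_a \in H_0^1(\Omega)$: testing $-\mathrm{div}(a\nabla u_a) = f$ with $u_a$ itself gives

$$ \int_\Omega a|\nabla u_a|^2\,dx = \int_\Omega f u_a\,dx = C(a), \qquad\text{so}\qquad E(a) = \tfrac{1}{2}\int_\Omega a|\nabla u_a|^2\,dx - \int_\Omega f u_a\,dx = \tfrac{1}{2}C(a) - C(a) = -\tfrac{1}{2}C(a). $$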

The optimization problem is written as: $$ \min_{a \geq 0} \left\{ C(a) + \int_\Omega \psi(a)dx \right\}, $$ or equivalently as a max–min problem in $(a,u)$.

Two growth regimes of $\psi$ are crucial:

Superlinear: $\psi(s)/s \to +\infty$. Then admissible coefficients are in $L^1(\Omega)$, and there exists an optimal $a_{\mathrm{opt}}\in L^1(\Omega)$.
Linear growth: $\psi(s)/s \to k>0$. Then it is natural to extend the problem to measures $\mu\ge 0$, allowing “thin” structures on lower-dimensional sets. The cost $\int \psi(\mu)$ is interpreted through the Lebesgue–singular decomposition and the recession function $\psi_\infty$. An optimal measure $\mu_{\mathrm{opt}}\in \mathcal{M}^+(\Omega)$ still exists.

Because the functional is convex in $u$ and concave in $a$, the authors exchange inf and sup and reduce to an auxiliary minimization problem in $u$ alone: $$ \inf_{u} \int_\Omega \psi^{*}(|\nabla u|^2)dx - 2\int_\Omega u df, $$ where $\psi^{*}$ is the Legendre–Fenchel conjugate. Under mild assumptions this problem has a unique minimizer $\bar u$, and the optimal coefficient is recovered point-wise from the optimality condition: $$ a_{\mathrm{opt}}|\nabla\bar u|^2 = \psi(a_{\mathrm{opt}}) + \psi^*(|\nabla\bar u|^2). $$

Examples:

Power penalization $\psi(s) = s^p/p$, $p>1$: The auxiliary problem involves a nonlinear PDE $$-\Delta_{2p/(p-1)} u = \tfrac{2p}{p-1} f,$$ and the optimal coefficient is $a_{\mathrm{opt}}(x) = |\nabla \bar u(x)|^{2/(p-1)}$. For $\Omega$ a ball and $f=1$ or $f=\delta_0$, the authors give explicit radial formulas and plots for $\bar u$ and $a_{\mathrm{opt}}$.
Two-phase box constraint $\psi(s) = s$ on $[\alpha,\beta]$, $+\infty$ otherwise: The auxiliary problem yields an optimal coefficient $a_{\mathrm{opt}}\in L^\infty(\Omega)$ taking values in $[\alpha,\beta]$, and under regularity of $\Omega$ and $f$ one gets extra smoothness (e.g. $\nabla a_{\mathrm{opt}}\cdot \nabla \bar u \in L^2(\Omega)$).
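The power-penalization example can be checked numerically. The sketch below is only an illustration, not code from the paper: the function names (`psi`, `psi_conj`, `psi_conj_numeric`, `a_opt`) and the sample values $p=3$, $|\nabla \bar u| = 1.7$ are assumptions. It uses the closed form $\psi^*(t) = t^{q}/q$ with $q = p/(p-1)$ for $t \ge 0$, compares it with a brute-force supremum, and verifies the point-wise optimality condition $a_{\mathrm{opt}}|\nabla\bar u|^2 = \psi(a_{\mathrm{opt}}) + \psi^*(|\nabla\bar u|^2)$.

```python
def psi(s, p):
    """Power penalization psi(s) = s^p / p for s >= 0."""
    return s ** p / p


def psi_conj(t, p):
    """Closed-form conjugate for t >= 0: psi*(t) = t^q / q, q = p / (p - 1)."""
    q = p / (p - 1.0)
    return t ** q / q


def psi_conj_numeric(t, p, s_max=10.0, n=20001):
    """Brute-force sup of s*t - psi(s) over a grid in [0, s_max] (sanity check only)."""
    best = 0.0
    for i in range(n):
        s = s_max * i / (n - 1)
        best = max(best, s * t - psi(s, p))
    return best


def a_opt(grad_norm, p):
    """Point-wise optimal coefficient |grad u|^(2 / (p - 1))."""
    return grad_norm ** (2.0 / (p - 1.0))


# Sample values (assumptions chosen for illustration).
p, grad_norm = 3.0, 1.7
t = grad_norm ** 2
a = a_opt(grad_norm, p)

# Optimality condition: a * t - psi(a) - psi*(t) should vanish up to round-off.
gap = a * t - psi(a, p) - psi_conj(t, p)
print(gap, psi_conj(t, p), psi_conj_numeric(t, p))
```

The same check works for any $p > 1$; only the grid bound `s_max` has to be large enough to contain the maximizer $s = t^{1/(p-1)}$.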

General Coefficients and G-Closure #

For a general cost: $$\min_{a\ge 0}\min_{u} \int_\Omega (j(x,u)+\psi(a))\,dx \quad \text{s.t. } u \text{ solves } -\mathrm{div}(a\nabla u)=f,$$ existence of an optimal $a$ may fail.

The relaxed problem is naturally expressed via G-convergence: sequences of scalar coefficients $a_n\in[\alpha,\beta]$ can generate limit operators with matrix-valued coefficients $A(x)$, described by the celebrated Murat–Tartar G-closure.

The G-closure set $\mathcal{A}$ consists of symmetric matrices $A(x)$ whose eigenvalues $\lambda_1\le\cdots\le\lambda_d$ lie in $[\alpha,\beta]$ and satisfy a family of inequalities depending on a mixing parameter $t\in[0,1]$, involving the arithmetic and harmonic means $\mu_t, \nu_t$ of $\alpha,\beta$. For $d=2$, this gives an explicit admissible region in the $(\lambda_1,\lambda_2)$-plane.
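For orientation, here is the form in which these inequalities are usually stated in the homogenization literature. This is a sketch from the standard Murat–Tartar characterization, not a transcription of the paper, and the convention that $t$ is the proportion of the phase $\alpha$ is an assumption:

$$ \mu_t = t\alpha + (1-t)\beta, \qquad \nu_t = \left(\frac{t}{\alpha} + \frac{1-t}{\beta}\right)^{-1}, \qquad \nu_t \le \lambda_i \le \mu_t \quad (i=1,\dots,d), $$

$$ \sum_{i=1}^d \frac{1}{\lambda_i - \alpha} \le \frac{1}{\nu_t - \alpha} + \frac{d-1}{\mu_t - \alpha}, \qquad \sum_{i=1}^d \frac{1}{\beta - \lambda_i} \le \frac{1}{\beta - \nu_t} + \frac{d-1}{\beta - \mu_t}. $$

In $d=2$ the two sum constraints cut out a lens-shaped region between the points $(\nu_t,\mu_t)$ and $(\mu_t,\nu_t)$, which correspond to simple laminates.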

Relaxed functionals of the form $\int \psi(x,a)\,dx$ over G-limits have been studied in special cases, e.g. $\psi(x,a)=g(x)a$, where one can express the relaxation in terms of the largest eigenvalue $\lambda_{\max}(A(x))$. The authors show a numerical example where the relaxed optimal matrix $A_{\mathrm{opt}}$ has eigenvalues $\lambda_1\neq \lambda_2$ on a set of positive measure, revealing genuine microstructure.

Optimal Potentials: Shaping the “Landscape” $V(x)$ #

Here the control is a nonnegative potential $V$ in $$-\Delta u + V u = f, \quad u\in H_0^1(\Omega).$$ The cost is: $$\min \int_\Omega (j(x,u) + \psi(V))\,dx,$$ with $V\ge 0$ and $\psi$ convex, l.s.c., super-linear (so any finite-cost $V$ lies in $L^1(\Omega)$).

Compliance Case: Eliminating the Control #

For the compliance choice $j(x,u) = f(x)u$, the problem can again be reduced to a variational problem in $u$ only.

Define: $$ E(V) = \min_{u\in H_0^1(\Omega)} \int_\Omega \left(\tfrac{1}{2} |\nabla u|^2 + \tfrac{1}{2} V u^2 - f u\right)dx, \quad \Psi(V)=\int_\Omega \psi(V)\,dx. $$

Minimizing $-2E(V)+\Psi(V)$ over $V\ge 0$ is equivalent to: $$ \min_{u\in H_0^1(\Omega)} \int_\Omega \left(|\nabla u|^2 + \psi^*(u^2) - 2 f u\right)dx, $$ a semi-linear elliptic problem in $u$ with nonlinearity $g(s)=s(\psi^*)'(s^2)$. The optimal state $\bar u$ solves: $$ -\Delta u + g(u) = f, \quad u\in H_0^1(\Omega), $$ and the optimal potential is: $$ V_{\mathrm{opt}} = (\psi^*)'(\bar u^2). $$ So in this special case the control can be explicitly reconstructed from the state.
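To make the reconstruction concrete, here is a minimal one-dimensional sketch under assumptions that are not taken from the paper: $\Omega = (0,1)$, constant load $f = 10$, and $\psi(s) = s^2/2$ on $s \ge 0$, so that $\psi^*(t) = t^2/2$ for $t \ge 0$, $g(s) = s^3$ and $V_{\mathrm{opt}} = \bar u^2$. The state equation $-u'' + u^3 = f$ is discretized by finite differences and solved by Newton's method; all function names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (lower[0], upper[-1] unused)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def residual(u, f, h):
    """Discrete residual of -u'' + u^3 - f with zero Dirichlet boundary values."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / h ** 2 + u[i] ** 3 - f[i])
    return out


def optimal_state_and_potential(f, h, tol=1e-9, max_iter=50):
    """Newton iteration for the state, then V_opt = (psi*)'(u^2) = u^2."""
    n = len(f)
    u = [0.0] * n
    off = [-1.0 / h ** 2] * n
    for _ in range(max_iter):
        r = residual(u, f, h)
        if max(abs(x) for x in r) < tol:
            break
        # Jacobian of the residual: discrete Laplacian plus diag(3 u^2).
        diag = [2.0 / h ** 2 + 3.0 * ui ** 2 for ui in u]
        du = solve_tridiagonal(off, diag, off, [-x for x in r])
        u = [ui + dui for ui, dui in zip(u, du)]
    return u, [ui ** 2 for ui in u]


n = 99                      # interior grid points (assumption)
h = 1.0 / (n + 1)
f = [10.0] * n              # constant load (assumption)
u, V = optimal_state_and_potential(f, h)
print(max(u), max(V))
```

The potential concentrates where the state is largest, here in the middle of the interval, and the resulting state stays below the solution of $-u'' = f$ obtained with $V = 0$.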

General Costs, Adjoint Equation, and Regularity #

For a general $j(x,u)$, the authors prove an existence theorem of an optimal $V_{\mathrm{opt}}\in L^1(\Omega)$ under natural growth and coercivity assumptions on $j$ and super-linearity of $\psi$.

Optimality conditions involve:

The state $\bar u$ solving $-\Delta u + V_{\mathrm{opt}}u = f$.
An adjoint state $v$ solving $-\Delta v + V_{\mathrm{opt}} v = \partial_s j(x,\bar u)$.
A sub-differential relation $\bar u v \in \partial\psi(V_{\mathrm{opt}})$, rewritten as a point-wise inequality $h^{-}(\bar u v) \le V_{\mathrm{opt}} \le h(\bar u v)$, where $h$ is built from the sub-differential of $\psi$.

From here, regularity of $V_{\mathrm{opt}}$ is linked to properties of $h$ and to elliptic regularity for $\bar u$ and $v$. Under strengthened assumptions on $j$, $f$, and $\Omega$, the authors show that $\bar u, v \in W^{2,q}(\Omega)$ for some $q>d/2$ (hence continuous), and the product $\bar u v V_{\mathrm{opt}}$ is in $BV(\Omega)$, so $V_{\mathrm{opt}}\in BV_{\mathrm{loc}}(\Omega\setminus K)$ where $K = \{\bar u v =0\}$. This identifies the “degeneracy set” $K$ as the core where singularities of the optimal potential may concentrate.

Bang–Bang Potentials: If $\psi$ is flat on an interval $[\alpha,\beta]$ (e.g. $\psi(s) = s$ on $[\alpha,\beta]$, $+\infty$ otherwise), the function $h$ becomes multi-valued and the optimal potential is bang–bang: $$ V_{\mathrm{opt}} = \alpha + (\beta-\alpha)\mathbf{1}_E $$ for some set $E$ of finite perimeter. The paper includes numerical simulations showing the geometry of such sets for specific loads $f$.

Optimal Sources: Choosing the Right-Hand Side #

Finally, the control is the source $f$ in $-\Delta u = f$, $u\in H_0^1(\Omega)$, with cost $J(f) = \int_\Omega j(x,u_f,f)\,dx$ and constraint $\int_\Omega \psi(f)\,dx\le m$.

Existence with Superlinear and Linear $\psi$: If $\psi$ is super-linear and $j$ satisfies suitable lower bounds and convexity in $f$, then an optimal $f_{\mathrm{opt}}\in L^1(\Omega)$ exists.

If $\psi$ has linear growth, the natural admissible class is signed measures $f$ with finite total variation, and $\int \psi(f)$ is defined via the Lebesgue–singular decomposition and recession coefficients $c_-(\psi), c_+(\psi)$. Under a decomposition $j(x,s,z)=A(x,s)+B(x,z)$ with specific structure and lower bounds, the functional is lower semi-continuous under weak-* convergence of measures, and there exists an optimal measure-valued source $f_{\mathrm{opt}}$.

Optimality Conditions and Bang–Bang Description: Introduce the self-adjoint resolvent operator $R$ mapping a source $f$ to the solution $u_f$. Under differentiability and growth conditions on $j$, the authors derive necessary (and, under convexity, sufficient) conditions for optimality. For super-linear $\psi$, define: $$ w := R\big(\partial_s j(x, R(f_{\mathrm{opt}}), f_{\mathrm{opt}})\big) + \partial_z j(x, R(f_{\mathrm{opt}}), f_{\mathrm{opt}}). $$ Then there is $\lambda \ge 0$ such that either:

$\lambda=0$: $w$ has a fixed sign and $f_{\mathrm{opt}}$ saturates the endpoints of $\mathrm{dom}(\psi)$ on the regions where $w$ is strictly positive/negative — a pure bang–bang behavior.
$\lambda>0$: the constraint is saturated, $\int \psi(f_{\mathrm{opt}})=m$, and $f_{\mathrm{opt}}$ satisfies a point-wise equality involving $\psi$, its conjugate $\psi^*$, and $w$.

For linear-growth $\psi$, a similar structure holds, but the singular part of $f_{\mathrm{opt}}$ is supported on level sets where $w$ hits thresholds determined by the slopes $c_-(\psi), c_+(\psi)$.

Spectral Example: Maximizing Energy Under an $L^2$ Constraint

For: $$ j(u) = -\tfrac{1}{2} u^2, \quad \psi(s)=\tfrac{1}{2} s^2, $$ the problem becomes: $$ \max \left\{\frac{1}{2}\int_\Omega u_f^2\,dx : \frac{1}{2}\int_\Omega f^2\,dx \le m \right\}. $$

The optimality system shows that the optimal source $f$ satisfies a fourth-order eigenvalue problem $\Delta^2 f = f/\lambda$, equivalent to an eigenvalue problem for the Laplacian. The maximizer is a multiple of the first Dirichlet eigenfunction $\varphi$ of $-\Delta$, normalized in $L^2(\Omega)$: $$ f = \pm \sqrt{2m}\,\varphi, \quad \lambda = 1/\mu_1^2, $$ where $\mu_1$ is the first eigenvalue. The paper includes a numerical plot for such an optimal source in an ellipse.
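The spectral example can be reproduced in one dimension. The sketch below is an illustration under assumptions not taken from the paper: $\Omega = (0,1)$, a uniform finite-difference grid, and $m = 1$. Since the maximizer is the top eigenfunction of $R^2$, power iteration on the resolvent $R$ (one Poisson solve per step, followed by rescaling to $\int f^2 = 2m$) should converge to $\sqrt{2m}\,\varphi$ with $\varphi(x) = \sqrt{2}\sin(\pi x)$, and the optimal value should be close to $m/\mu_1^2 = m/\pi^4$. The function names are hypothetical.

```python
import math


def solve_poisson_1d(f, h):
    """Apply the resolvent R: solve -u'' = f with zero Dirichlet data (Thomas algorithm)."""
    n = len(f)
    diag, off = 2.0 / h ** 2, -1.0 / h ** 2
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = off / diag, f[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (f[i] - off * d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def l2_norm(v, h):
    """Discrete L^2 norm on the uniform grid."""
    return math.sqrt(h * sum(x * x for x in v))


def optimal_source(n, m, iters=60):
    """Power iteration on R, rescaled so that the discrete integral of f^2 equals 2m."""
    h = 1.0 / (n + 1)
    f = [1.0] * n  # any start with a component along the first eigenfunction
    for _ in range(iters):
        f = solve_poisson_1d(f, h)
        scale = math.sqrt(2.0 * m) / l2_norm(f, h)
        f = [scale * x for x in f]
    u = solve_poisson_1d(f, h)
    energy = 0.5 * h * sum(x * x for x in u)  # (1/2) * integral of u_f^2
    return f, energy, h


m = 1.0
f_opt, energy, h = optimal_source(199, m)
print(energy, m / math.pi ** 4)
```

The small gap between the two printed numbers comes from the grid: the discrete first eigenvalue is slightly below $\pi^2$.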

Compliance with Box Constraints on the Source: For compliance with box constraints: $$ \min \left\{\int_\Omega f\,R(f)\,dx : \int_\Omega f\,dx \ge m,\ f\in[\alpha,\beta]\right\}, \quad 0\le \alpha<\beta, $$ the optimal source is bang–bang: $$ f _{\mathrm{opt}} = \beta\,\mathbf{1} _E + \alpha\,\mathbf{1} _{\Omega\setminus E}, $$ with $E = \{R(f _{\mathrm{opt}}) < s\}$ and $s$ chosen to fit the mass constraint. The corresponding state solves: $$ -\Delta u = \beta\,\mathbf{1} _{\{u<s\}} + \alpha\,\mathbf{1} _{\{u>s\}}. $$

Using results from their previous work on optimal potentials, the authors prove that $f _{\mathrm{opt}} \in BV(\Omega)$: the interface between the regions where $f=\alpha$ and $f=\beta$ has finite perimeter.

If $\Omega$ is convex, they go further: in the special case $\alpha = 0$, $f _{\mathrm{opt}} = \mathbf{1} _E$ with $E = \{w < s\}$, where $w$ solves $-\Delta w = \mathbf{1} _{\{w<s\}}$. They show that the optimal set $E$ is convex and its boundary is of class $C^1$. So in convex domains, the region where you “turn on” the source to maximize stiffness is itself a smooth convex set.
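The thresholding step ("$s$ chosen to fit the mass constraint") can be sketched on a grid. This is a discrete illustration only, with hypothetical names and sample data: given state values $u_i$ on cells of size $h$, it assigns $\beta$ to the cells with the smallest state values, in line with the state equation $-\Delta u = \beta\,\mathbf{1}_{\{u<s\}} + \alpha\,\mathbf{1}_{\{u>s\}}$, until the mass reaches $m$. It does not solve the coupled problem, in which $u$ itself depends on $f$.

```python
def bang_bang_source(u, alpha, beta, m, h):
    """Return (f, s): f = beta on the cells with smallest u, alpha elsewhere,
    using as few beta-cells as needed for h * sum(f) >= m.
    Ties in u are broken arbitrarily by the sort order."""
    n = len(u)
    if h * beta * n < m:
        raise ValueError("mass constraint cannot be met with f <= beta")
    order = sorted(range(n), key=lambda i: u[i])
    f = [alpha] * n
    mass = h * alpha * n
    k = 0
    while mass < m:
        f[order[k]] = beta
        mass += h * (beta - alpha)
        k += 1
    # Threshold: the smallest state value that is still assigned alpha.
    s = u[order[k]] if k < n else float("inf")
    return f, s


# Sample data (assumptions): a concave profile on 100 cells of (0, 1).
n, h = 100, 0.01
u = [((i + 0.5) * h) * (1.0 - (i + 0.5) * h) for i in range(n)]
f, s = bang_bang_source(u, alpha=0.0, beta=1.0, m=0.295, h=h)
print(sum(1 for v in f if v == 1.0), s)
```

In a full solver this step would alternate with a Poisson solve for $u = R(f)$ until the pair $(f, u)$ stops changing; whether that iteration converges is not addressed here.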

References #

[1] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2025). Optimal sources for elliptic PDEs. arXiv preprint arXiv:2509.01521.


@article{buttazzo2025optimal,
 title={Optimal sources for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2509.01521},
 year={2025}
}

[2] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2025). Optimal coefficients for elliptic PDEs. arXiv preprint arXiv:2512.08431.


@article{buttazzo2025optimalcoefficients,
 title={Optimal coefficients for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2512.08431},
 year={2025}
}

[3] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2026). Optimization problems for elliptic PDEs. arXiv preprint arXiv:2601.01591.


@article{buttazzo2026optimization,
 title={Optimization problems for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2601.01591},
 year={2026}
}

Paper Reading - Optimal coefficients for elliptic PDEs (2512.08431)

Thu, 19 Feb 2026 00:00:00 +0000

This paper gives a clear, fairly complete picture of how to optimally choose the coefficient $a(x)$ (think “material quality”) in an elliptic PDE, with compliance as the main model and then a general optimal control formulation.

Problem Setup #

Consider the boundary value problem: $$ -{\rm div}(a(x)\nabla u) = f \quad\text{in } \Omega,\qquad u=0 \text{ on } \partial\Omega, $$ where $\Omega$ is a bounded domain, $f$ is a given load, and $a(x)$ is the design variable.

Typical assumptions on $a(x)$:

Point-wise bounds $\alpha \le a(x) \le \beta$ (two material qualities, e.g., “soft” vs “stiff”).
Possibly a budget constraint (e.g., only a fixed fraction of the domain can use the best material $\beta$).

The map $a \mapsto u_a$ is well-defined by elliptic theory: for each admissible $a$, the PDE has a unique weak solution in $H_0^1(\Omega)$.

Example #

The elastic compliance is a classical cost in mechanics: it measures how much the structure deforms under the load $f$. In this setting, a standard functional is

either $C(a) = \int_\Omega f\,u_a\,dx$ (work of the load),
or equivalently the elastic energy $\int_\Omega a(x)\,|\nabla u_a|^2\,dx$ up to constants.

Minimizing the compliance means:

Given a fixed load and a given volume of good material, distribute $a(x)$ in $\Omega$ so that the resulting displacement $u_a$ is as small as possible in the energy sense.

Key qualitative facts the paper emphasizes in this compliance setting:

Existence: under standard bounds $\alpha \le a \le \beta$ and a convex constraint (like a fixed integral of $a$), there exists at least one optimal coefficient $a_{\text{opt}}$.
Extremal behavior: because the compliance functional is convex in $u$ but often leads to a concave dependence on $a$ under constraints, optimal $a_{\text{opt}}$ tend to take values only at the extremes $\alpha$ or $\beta$ almost everywhere, a typical “black-and-white” design phenomenon known in topology optimization.

Intuitively, if we can choose between “bad” and “good” material at each point but only have a limited budget of good material, it is never optimal to mix them continuously; we either go full good or full bad locally and let the PDE determine where gradients are large so good material is most effective.

From two-phase design to optimal control #

The authors then move to a more general PDE-constrained optimal control view: $a(x)$ is the control, the PDE is the state equation, and the cost is an abstract functional $$ J(a) = \int_\Omega j(x, u_a(x), a(x), \nabla u_a(x))\,dx, $$ possibly plus boundary or integral terms.

In this general framework:

The admissible set $\mathcal{A}$ of coefficients may encode box constraints, integral constraints, or more refined structure (e.g., multi-phase materials).
The goal is to minimize $J(a)$ over $\mathcal{A}$.

The paper outlines how standard tools of optimal control of PDEs apply:

Adjoint equation: one introduces an adjoint state $p$ solving its own elliptic problem linked to derivatives of $j$ with respect to $u$ and $\nabla u$.
First-order optimality: optimal coefficients satisfy variational inequalities or pointwise optimality conditions involving $a_{\text{opt}}$, $u_{a_{\text{opt}}}$, and $p$.

In simple situations, one gets an explicit “gradient” of the cost with respect to the coefficient:

local changes in $a(x)$ are weighted by expressions involving $\nabla u$ and $\nabla p$;
this tells us where increasing stiffness (raising $a$) helps most, and where it is wasteful.
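For compliance the sensitivity can be written down and checked directly. The sketch below is an illustration under assumptions not taken from the paper: a one-dimensional problem $-(a u')' = f$ on $(0,1)$ with piecewise-constant $a$ on the cells of a uniform mesh, linear finite elements with a lumped load, and hypothetical function names. In this self-adjoint case the adjoint state coincides with the state up to sign, so the derivative of the compliance with respect to the coefficient on cell $e$ reduces to $-|u'|^2$ integrated over that cell.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (lower[0], upper[-1] unused)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_state(a, f, h):
    """State at the n interior nodes; a has n + 1 entries, one per cell."""
    n = len(f)
    diag = [(a[i] + a[i + 1]) / h for i in range(n)]
    lower = [-a[i] / h for i in range(n)]       # coupling through the left cell
    upper = [-a[i + 1] / h for i in range(n)]   # coupling through the right cell
    return solve_tridiagonal(lower, diag, upper, [h * fi for fi in f])


def compliance(a, f, h):
    """Discrete compliance: integral of f * u_a."""
    u = solve_state(a, f, h)
    return h * sum(fi * ui for fi, ui in zip(f, u))


def compliance_gradient(a, f, h):
    """dC/da_e = -(u_{e+1} - u_e)^2 / h, i.e. -|u'|^2 integrated over cell e."""
    u = [0.0] + solve_state(a, f, h) + [0.0]    # pad with boundary values
    return [-(u[e + 1] - u[e]) ** 2 / h for e in range(len(a))]


n = 19
h = 1.0 / (n + 1)
a = [1.0 + 0.5 * (e % 3) for e in range(n + 1)]  # sample coefficient (assumption)
f = [1.0] * n
grad = compliance_gradient(a, f, h)
print(min(grad), max(grad))
```

Every entry of the gradient is non-positive, so raising $a$ anywhere can only lower the compliance; the entries of largest magnitude mark the cells where $|u'|$ is large and extra stiffness helps most.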

This general perspective makes clear that compliance minimization is just one concrete instance of a broader family of coefficient optimization problems.

Bang–bang and intermediate materials #

A recurring theme, already visible in compliance, is whether optimal coefficients are bang–bang (only $\alpha$ or $\beta$) or can take intermediate values.

The paper’s message, in line with the authors’ broader work, is:

Under linear or suitably convex-structured costs and simple constraints, the optimization problem often favors extreme coefficients because any “grey” intermediate material can be improved by redistributing toward the extremes while keeping constraints satisfied.
If instead the cost penalizes variations of $a$ (e.g., includes $|\nabla a|$ or a strictly convex cost of $a$), then intermediate values can become optimal and the design becomes smoother.

This has practical consequences:

For pure stiffness or compliance problems, we should expect “black-and-white” topologies.
For problems where manufacturing or grading costs matter, optimal designs may be graded rather than sharply two-phase.

Applications #

Even though the arXiv abstract is brief, the paper’s role is clear: it systematizes and clarifies the theory of optimal coefficients for elliptic PDEs in two complementary regimes—compliance and more general optimal control.

For engineers and applied mathematicians, the main takeaways are:

We can rigorously frame “optimal material distribution” as an elliptic PDE with a coefficient control and prove existence of optimal designs under realistic constraints.
In many practically relevant cases (especially compliance), optimal designs heavily favor extreme phases, justifying the common use of binary material models in topology optimization.
Adjoint-based optimality conditions give a computable sensitivity of the cost to local changes in $a$, providing the mathematical underpinning for gradient-based optimization algorithms.

If we imagine designing a bridge deck or a heat sink, this theory tells us:

where to place stiff or conductive material,
why optimal layouts tend to be sharply separated regions of different material,
and how to systematically refine the design using PDE solutions and their adjoints.

References #

[1] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2025). Optimal sources for elliptic PDEs. arXiv preprint arXiv:2509.01521.


@article{buttazzo2025optimal,
 title={Optimal sources for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2509.01521},
 year={2025}
}

[2] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2025). Optimal coefficients for elliptic PDEs. arXiv preprint arXiv:2512.08431.


@article{buttazzo2025optimal,
 title={Optimal coefficients for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2512.08431},
 year={2025}
}

Paper Reading - Optimal sources for elliptic PDEs (2509.01521)

Wed, 18 Feb 2026 00:00:00 +0000

Introduction #

The authors study how to “best choose” a source term $f$ in a Poisson-type equation $$ -\Delta u = f \quad\quad\text{in }\Omega,\quad u = 0\text{ on }\partial\Omega, $$ so that a given performance measure (a cost functional) is optimized. The twist is that the source itself is the control, and it can be subject to various constraints (size, bounds, sign, etc.). This makes the problem sit at the intersection of optimal control, shape optimization, and regularity theory.

The basic optimization setup #

First, we fix a bounded domain $\Omega \subset \mathbb{R}^d$ and, for each admissible source $f$, we solve the PDE to get the state $u_f$. Then we evaluate a cost functional defined as follows: $$ J(f) = \int_\Omega j(x, u_f(x), f(x))\,dx, $$ and we want to minimize $J$ over all admissible $f$.
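As a minimal sketch of this source-to-state-to-cost pipeline, the following assumes the one-dimensional case $\Omega=(0,1)$, a second-order finite-difference discretization, and a compliance-type integrand $j(x,u,f)=f\,u$ chosen only for illustration; the function names are hypothetical.

```python
import numpy as np

def solve_poisson_1d(f):
    """Solve -u'' = f on (0,1), u(0)=u(1)=0, by second-order finite differences.

    `f` holds the source at the n interior grid points."""
    n = len(f)
    h = 1.0 / (n + 1)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return np.linalg.solve(A, f)

def cost(f, j):
    """Approximate J(f) = int_0^1 j(x, u_f, f) dx by a Riemann sum."""
    n = len(f)
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    u = solve_poisson_1d(f)
    return h * np.sum(j(x, u, f))

# Constant source f = 1: the exact state is u(x) = x (1 - x) / 2.
f = np.ones(199)
x = np.linspace(1.0 / 200, 1.0 - 1.0 / 200, 199)
u = solve_poisson_1d(f)
J = cost(f, lambda x, u, f: f * u)
print(J)
```

For $f\equiv 1$ the exact value is $J=\int_0^1 x(1-x)/2\,dx = 1/12$, which the discrete cost reproduces up to quadrature error.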

The admissible class is defined via an integral constraint: $$ \int_\Omega \psi(f)\,dx \le m, $$ for some convex function $\psi$. Different choices of $\psi$ encode different types of constraints:

Super-linear $\psi$ (growing faster than $|s|$) keeps $f$ in $L^1$ and “penalizes” large values strongly.
Linearly growing $\psi$ allows $f$ to be a measure (e.g., sums of Dirac masses), not just a function.

The first main result: under mild assumptions on $j$ and $\psi$, the problem always has at least one optimal source $f_{\text{opt}}$ (either as a function or a finite measure, depending on growth).

When optimal sources are “all or nothing” (bang–bang phenomenon) #

A central theme is the bang–bang phenomenon: in many natural constraints, the best source uses only its extreme admissible values, like $f = \alpha$ or $f = \beta$, with no intermediate levels.

This occurs, for instance, when we impose point-wise bounds: $$ \alpha \le f \le \beta $$ and choose a suitable $\psi$ that is affine on $[\alpha,\beta]$. Then the optimal source takes the form: $$ f_{\text{opt}} = \beta\,\mathbf{1}_E + \alpha\,\mathbf{1}_{\Omega\setminus E} $$ for some measurable set $E\subset \Omega$. At that point the problem becomes a shape optimization problem in the unknown set $E$.

The authors derive a precise system of necessary optimality conditions using a Lagrange multiplier $\lambda$ and an adjoint state $w$ (solution of another elliptic problem). Roughly:

$w$ is built from derivatives of the integrand $j$ with respect to $u$ and $f$.
The sign of $w+\lambda$ decides whether $f_{\text{opt}}$ equals $\alpha$ or $\beta$ at each point.
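The switching rule can be illustrated with a small sketch. It assumes a simplified setting that is not the paper's: the cost is linearised around a fixed (frozen) field $w$, so we minimise $\int w f\,dx$ under $\alpha\le f\le\beta$ and $\int f\,dx = m$ on a uniform grid. In the real problem $w$ depends on $f_{\text{opt}}$, and the sign convention here ($f=\beta$ where $w$ is smallest) is the one of this sketch. The field `w` and the function name are hypothetical.

```python
import numpy as np

def bang_bang_source(w, alpha, beta, mass, h):
    """Minimise h*sum(w*f) subject to alpha <= f <= beta and h*sum(f) == mass.

    Greedy solution of this linear program: start from f = alpha and raise
    cells to beta in order of increasing w until the mass budget is spent."""
    f = np.full(len(w), float(alpha))
    budget = mass - alpha * len(w) * h
    for i in np.argsort(w):
        if budget <= 0.0:
            break
        bump = min(beta - alpha, budget / h)
        f[i] += bump
        budget -= bump * h
    return f

n = 200
h = 1.0 / n
x = (np.arange(n) + 0.5) * h
w = np.cos(2.0 * np.pi * x)            # hypothetical frozen adjoint field
alpha, beta, mass = 0.0, 1.0, 0.3
f_opt = bang_bang_source(w, alpha, beta, mass, h)

# Cells strictly between the bounds: at most one, where the budget runs out.
n_grey = int(np.sum((f_opt > alpha + 1e-9) & (f_opt < beta - 1e-9)))
print(n_grey, h * np.sum(f_opt))
```

The result equals $\beta$ on a sublevel set of $w$ and $\alpha$ elsewhere; the threshold value of $w$ plays the role of $-\lambda$.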

They show when these conditions are also sufficient, so we can fully characterize optimal controls in convex cases.

A key structural insight: bang–bang behavior appears if and only if $\psi$ is not strictly convex on some interval (it is affine on a nontrivial segment). If $\psi$ is strictly convex (e.g., $\psi(s)=s^2$), the optimal source is more regular and not bang–bang.

Important model examples #

The paper discusses several instructive choices of $\psi$ and $j$, each corresponding to a classical PDE optimization problem:

Total variation constraint: $\psi(s)=|s|$.
- The admissible sources are bounded measures with total variation at most $m$.
- Optimality conditions show that $f_{\text{opt}}$ is supported where an adjoint field $w$ saturates a threshold.
- In radially symmetric cases (e.g., $\Omega$ a ball, linear cost), the optimal source is a Dirac delta at the center.
Nonnegative sources with mass constraint:
- $\psi(s)=s$ for $s\ge0$, $\psi(s)=+\infty$ otherwise.
- One finds conditions under which the optimal $f$ is a single Dirac mass carrying all the “budget”.
- For certain power-type functionals $\int |u|^p$, existence and structure of maximizers are detailed.
Box-constrained sources $\alpha \le f \le \beta$ with a volume (mass) constraint $\int f \le m$:
- The authors show precisely when the optimal $f$ is constant (always $\alpha$ or always $\beta$) and when it becomes a genuine bang–bang mixture of both extremes.
- Strict monotonicity of $j$ in $u$ tends to force true bang–bang solutions.
Tracking a target state:
- Cost $J(f)=\int_\Omega |u_f - u_0|^2 dx$ with $\alpha \le f \le \beta$.
- Under mild assumptions on the target $u_0$, the unique optimal control is bang–bang almost everywhere, again determined by the sign of an adjoint field.
Strictly convex $\psi$, like $\psi(s)=s^2$:
- Then the optimal control is not bang–bang but a continuous function explicitly related to $w$ and the mass constraint.
Compliance optimization:
- Minimize $\int_\Omega f u_f\,dx$ under $\alpha \le f \le \beta$ and $\int f \ge m$.
- This is equivalent to maximizing the elastic energy of the system with bounded loads.
- For $0\le \alpha < \beta$, the optimal right-hand side is bang–bang; the domain splits into two regions where the load is either $\alpha$ or $\beta$.
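To see why a strictly convex $\psi$ removes the bang–bang structure, here is a minimal sketch under simplifying assumptions that are not the paper's: the cost is linearised around a fixed field $w$, and the constraint is $\int f^2\,dx \le m$ (that is, $\psi(s)=s^2$) with no pointwise bounds. By the Cauchy–Schwarz inequality the minimiser of $\int w f\,dx$ is then $f = -\sqrt{m}\,w/\|w\|_{L^2}$, a smooth multiple of $w$. The field `w` and the function name are hypothetical.

```python
import numpy as np

def quadratic_budget_source(w, mass, h):
    """Minimise h*sum(w*f) subject to h*sum(f**2) <= mass.

    By Cauchy-Schwarz the minimiser is f = -sqrt(mass) * w / ||w||_{L^2}."""
    w_norm = np.sqrt(h * np.sum(w**2))
    return -np.sqrt(mass) * w / w_norm

n = 200
h = 1.0 / n
x = (np.arange(n) + 0.5) * h
w = np.cos(2.0 * np.pi * x)            # hypothetical frozen adjoint field
m = 0.3
f_smooth = quadratic_budget_source(w, m, h)

# Compare with a random competitor that uses the full budget.
rng = np.random.default_rng(0)
g = rng.normal(size=n)
g *= np.sqrt(m / (h * np.sum(g**2)))
print(h * np.sum(w * f_smooth), h * np.sum(w * g))
```

The optimal profile takes a continuum of values rather than two, in contrast with the affine-$\psi$ case.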

Regularity of the optimal sets and interfaces #

Once we know the optimal control is bang–bang, the main qualitative object is the interface between the regions where $f=\alpha$ and $f=\beta$.

The interface is essentially a level set of an elliptic solution $u$ (or of the adjoint $w$), so understanding its geometry is a regularity problem.

Bounded variation (BV) regularity #

In a first model case (compliance with $0\le \alpha < \beta$), the authors show that the optimal source $f_{\text{opt}}$ belongs to the space $BV(\Omega)$. This means the interface set has finite perimeter: geometrically, the boundary between phases has finite $(d-1)$-dimensional measure.

More generally, they derive estimates that control the curvature-like quantities of $u$ via the $BV$-norm of $f$.

A refined view near critical points #

A tougher issue is what happens on the set where $\nabla u=0$, because level sets can get very wild there. The authors prove:

For data $f \in BV(\Omega)$ satisfying a uniform positivity $f \ge \alpha>0$, certain weighted quantities like

$$ \int \frac{1}{|\nabla u|}\,\frac{1}{\log^q(1/|\nabla u|)}\,dx $$

stay finite for any $q>1$.

They then construct weights involving $\log(1/|\nabla u|)$ which “switch off” exactly where $\nabla u=0$, and show that appropriately weighted indicators of level sets belong to $BV$.

In particular, they define a refined Hausdorff-type measure $H_{d-1,q}$ with logarithmic weights and prove that, for sufficiently regular $f$, the set $\{\nabla u=0\}$ has zero $H_{d-1,q}$-measure for all $q>1$. This implies that the critical set has Hausdorff dimension at most $d-1$, with an even stronger “thinness” encoded by the log weights.

Convex domains: convex and smooth optimal regions #

In the compliance case on a convex domain $\Omega$, the structure is even nicer. The optimal set $E=\{x : f_{\text{opt}}(x)=\beta\}$ coincides with a sublevel set of a solution to a semi-linear equation.

Using a result of Caffarelli–Spruck type convexity for level sets, they show:

$E$ is itself convex.
One can rule out “corners”, and deduce that the boundary of $E$ is actually of class $C^1$.

So in convex domains, the optimal high-load region is a smooth convex set.

Summary #

This work gives a unified and quite complete picture of how optimal sources for elliptic PDEs behave under natural constraints:

It establishes existence of optimal controls for broad classes of convex functionals and constraints.
It identifies exactly when we get bang–bang sources, turning a PDE control problem into a shape optimization problem.
It provides sharp optimality conditions through adjoint states and sub-differential characterizations, allowing practical characterization and numerical approximation of optimal controls.
It develops regularity theory for the resulting optimal sets and interfaces, including BV estimates, structure of level sets, and refined control of critical sets.
For people working in optimal design, structural mechanics, or inverse problems, the message is: if our cost is convex and our constraint has a “flat” part (non-strictly convex $\psi$), expect extreme, piecewise-constant sources with reasonably regular interfaces that we can analyze geometrically and approximate numerically.

References #

[1] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2025). Optimal sources for elliptic PDEs. arXiv preprint arXiv:2509.01521.


@article{buttazzo2025optimal,
 title={Optimal sources for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2509.01521},
 year={2025}
}

Restriction and extension

Wed, 29 Oct 2025 00:00:00 +0000

Consider a smooth compact hyper-surface $S$ in $\mathbb{R}^d$ with surface measure $d\sigma$. Given $f \in L^1(\mathbb{R}^d)$, its Fourier transform is defined as follows: $$ \begin{equation} \hat{f}(\xi) = \int_{\mathbb{R}^d}e^{-2\pi i x \cdot \xi}f(x)\,dx, \end{equation} $$ which by the Riemann-Lebesgue lemma is a bounded, continuous function vanishing at infinity.

Since $\hat{f}$ is continuous on $\mathbb{R}^d$, its restriction to the compact hyper-surface $S \subset \mathbb{R}^d$ is well-defined pointwise. Specifically, the restriction $\hat{f}\mid_{S}: S \rightarrow \mathbb{C}$ is the continuous function given by $$ \begin{equation} \hat{f}\mid_{S}(\sigma) = \hat{f}(\sigma) = \int_{\mathbb{R}^d}e^{-2\pi i x \cdot \sigma}f(x)\,dx \end{equation} $$ for each $\sigma \in S$. This is bounded (as $\hat{f}$ is bounded) and can be integrated against the surface measure $d\sigma$ on $S$.

Thus when we restrict $\hat{f}$ to $S$, we get a meaningful function which has finite $L^q(S, d\sigma)$-norm for every $q$: since $|\hat{f}(\sigma)| \le ||f||_{L^1}$ pointwise and $S$ has finite surface measure, $||\hat{f}||_{L^q(S, d\sigma)} \le \sigma(S)^{1/q}\,||f||_{L^1(\mathbb{R}^d)}$.
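A small numerical illustration of the $L^1$ case, under assumptions chosen only for this sketch: $d=2$, $S$ the unit circle, and $f(x)=e^{-\pi|x|^2}$, whose Fourier transform is known to be $e^{-\pi|\xi|^2}$. The integral is approximated by a Riemann sum on a truncated grid.

```python
import numpy as np

# Tensor grid on [-6, 6]^2; the Gaussian is negligible outside this box.
h = 0.05
t = np.arange(-6.0, 6.0 + h / 2, h)
X, Y = np.meshgrid(t, t, indexing="ij")
f = np.exp(-np.pi * (X**2 + Y**2))          # f in L^1(R^2), with ||f||_1 = 1
l1_norm = h**2 * np.sum(np.abs(f))

def fourier_at(xi):
    """Riemann-sum approximation of f_hat(xi) = int exp(-2 pi i x.xi) f(x) dx."""
    phase = np.exp(-2j * np.pi * (X * xi[0] + Y * xi[1]))
    return h**2 * np.sum(phase * f)

# Sample points sigma on the unit circle S in R^2.
angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
restriction = np.array([fourier_at((np.cos(a), np.sin(a))) for a in angles])

print(np.max(np.abs(restriction)), l1_norm)
```

The restricted values are constant on the circle (the transform is radial) and bounded by $||f||_{L^1}$, as the pointwise estimate predicts.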

When starting with $f \in L^2(\mathbb{R}^d)$, the Fourier transform $\hat{f}$ is not well-defined point-wise in general, so there is no meaningful way to restrict an arbitrary $L^2$ function to a set of measure zero such as the hyper-surface $S$.

More precisely, for any given $f \in L^2(\mathbb{R}^d)$, the Fourier transform is defined in the $L^2$ sense via the Plancherel theorem: $$ \begin{equation} \mathcal{F}: L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d), \quad || \hat{f} ||_{L^2} = || f ||_{L^2} \end{equation} $$ It is an isometry. So: $$ \begin{equation} \hat{f} \in L^2(\mathbb{R}^d) \end{equation} $$ Since $\hat{f}$ is only an $L^2$ function, it is not necessarily continuous or even bounded, and it is defined only up to sets of measure zero.

So the expression: $$ \begin{equation} \hat{f}|_S(\sigma) = \hat{f}(\sigma), \quad \sigma \in S \end{equation} $$ does not make sense pointwise for arbitrary $f \in L^2$.

The question arises: what happens for $1 < p < 2$?

Question 1:

For which $p$ and $q$ do we have: $$ \begin{equation} ||\hat{f}|| _{L^q(S, d\sigma)} \lesssim ||f|| _{L^p(\mathbb{R}^d)}, \quad \forall f. \end{equation} $$

This is the restriction problem for Fourier transforms on hyper-surfaces in harmonic analysis.

Proof of Theorem of solution of wave equation in the case $n = 1$

Thu, 31 Jul 2025 00:00:00 +0000

Solution of Brezis Problem 8.24 (1) and (2)

Thu, 31 Jul 2025 00:00:00 +0000

Solution of Evans PDE Problem 13

Thu, 31 Jul 2025 00:00:00 +0000

A lemma of J. L. Lions

Tue, 24 Jun 2025 00:00:00 +0000

This post explores J. L. Lions’ lemma about Banach spaces with compact injection, including applications to functional analysis.

Lemma statement:

Let $X$, $Y$, and $Z$ be three Banach spaces with norms $|| \cdot ||_X$, $|| \cdot ||_Y$, and $|| \cdot ||_Z$. Assume that $X \subset Y$ with compact injection and that $Y \subset Z$ with continuous injection. Prove that

$$ \forall \varepsilon > 0, \exists C_\varepsilon > 0 \text{ satisfying } || u ||_Y \leq \varepsilon || u ||_X + C _{\varepsilon}|| u ||_Z,\quad \forall u \in X $$

Applications:

Prove that for every $\varepsilon > 0$ there exists $C_\varepsilon > 0$ satisfying

$$ \max_{t \in [0,1]} |u(t)| \leq \varepsilon \max_{t \in [0,1]} |u'(t)| + C_\varepsilon ||u ||_{L^1}, \quad \forall u \in C^1([0,1]). $$

Pick $p > 1$. Prove that for every $\varepsilon > 0$ there exists $C = C(\varepsilon, p)$ such that

$$ || u || _{L^\infty(0,1)} \leq \varepsilon || u || _{W^{1,p}(0,1)} + C || u || _{L^1(0,1)}, \quad \forall u \in W^{1,p}(0,1). $$

Proof:

For the lemma, we argue by contradiction. Suppose that there exist some $\varepsilon_0 > 0$ and a sequence $(u_n)_{n \in \mathbb{Z}^{+}} \subset X$ such that

$$ || u_n ||_Y > \varepsilon_0 || u_n ||_X + n || u_n ||_Z, \quad \forall n \in \mathbb{Z}^{+}. $$

Then $u_n \ne 0, \forall n \in \mathbb{Z}^{+}$.

Let $v_n := \dfrac{u_n}{|| u_n||_X}$.

Then clearly, $||v_n||_X = 1$ and we have

$$ ||v_n||_Y > \varepsilon_0 + n||v_n||_Z. $$

Since $X \subset Y$ with compact injection and $(v_n)$ is bounded in $X$, we may assume without loss of generality (passing to a subsequence) that there is $v \in Y$ such that $|| v_n - v||_Y \rightarrow 0$ as $n \rightarrow \infty$. In particular, $(||v_n||_Y)_{n \in \mathbb{Z}^{+}}$ is bounded. Since $||v_n||_Z < \dfrac{1}{n}||v_n||_Y$, it follows that $||v_n||_Z \rightarrow 0$ as $n \rightarrow \infty$.

And because $Y \subset Z$ with continuous injection, we obtain:

$$ ||v_n - v||_Z \rightarrow 0 \quad \text{as} \quad n \rightarrow \infty $$

Then $v = 0$ and $||v_n||_Y \rightarrow 0$ as $n \rightarrow \infty$.

On the other hand, $||v_n||_Y > \varepsilon_0$ for every $n$, so

$$ \lim_{n \rightarrow \infty} ||v_n||_Y \geq \varepsilon_0. $$

Consequently,

$$ 0 \geq \varepsilon_0 > 0, $$ which is a contradiction. The two applications are more or less immediate consequences of the lemma. The proof is completed.
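A sketch of how the first application follows from the lemma. The choice of spaces and norms is an assumption, since the post leaves it to the reader: take $X = C^1([0,1])$ with $||u||_X = \max|u| + \max|u'|$, $Y = C([0,1])$ with the max norm, and $Z = L^1(0,1)$. The injection $X \subset Y$ is compact by the Arzelà-Ascoli theorem, and $Y \subset Z$ is continuous because $||u||_{L^1} \leq \max|u|$. Given $\varepsilon > 0$, apply the lemma with $\varepsilon' = \varepsilon/(1+\varepsilon) \in (0,1)$:

$$ \begin{aligned} \max|u| &\leq \varepsilon'\left(\max|u| + \max|u'|\right) + C_{\varepsilon'}||u||_{L^1}, \\ (1-\varepsilon')\max|u| &\leq \varepsilon'\max|u'| + C_{\varepsilon'}||u||_{L^1}, \\ \max|u| &\leq \frac{\varepsilon'}{1-\varepsilon'}\max|u'| + \frac{C_{\varepsilon'}}{1-\varepsilon'}||u||_{L^1} = \varepsilon\max|u'| + (1+\varepsilon)\,C_{\varepsilon'}||u||_{L^1}. \end{aligned} $$

The second application is obtained directly from the lemma with $X = W^{1,p}(0,1)$, $Y = L^\infty(0,1)$ and $Z = L^1(0,1)$, using that for $p > 1$ the injection $W^{1,p}(0,1) \subset C([0,1])$ is compact.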

Complex Hahn-Banach Theorem

Tue, 24 Jun 2025 00:00:00 +0000

Let $X$ be a complex vector space, $X_0$ one of its subspaces, $p: X \to \mathbb{R}_+$ such that

$$ p(\lambda x) = |\lambda| p(x), \quad \forall \lambda \in \mathbb{C}, x \in X \text{ and } p(x + y) \leq p(x) + p(y), \quad \forall x, y \in X, $$

satisfying $|f(x)| \leq p(x)$, $\forall x \in X_0$, where $f: X_0 \to \mathbb{C}$ is linear.

Under these conditions, there exists a linear functional $F: X \to \mathbb{C}$ such that $F|_{X_0} = f$ and

$$ |F(x)| \leq p(x), \quad \forall x \in X. $$

Proof: Since $f$ is linear, it follows that $\text{Re } f: X_0 \to \mathbb{R}$ is linear and $$ \text{Re } f(x) \leq |f(x)| \leq p(x), \quad \forall x \in X_0. $$

By the Real Hahn-Banach Theorem there exists $g: X \to \mathbb{R}$ a linear functional such that $g$ is an extension for $\text{Re } f$ and $g(x) \leq p(x)$, $\forall x \in X$. We also have $g(x) = -g(-x) \geq -p(x)$ so $|g(x)| \leq p(x)$, $\forall x \in X$.

Define now $F(x) = g(x) - i g(ix)$, $\forall x \in X$. This is real-linear, and $F(ix) = g(ix) - i g(-x) = g(ix) + i g(x) = i F(x)$, so $F$ is complex-linear. If $x \in X_0$ we have $$ F(x) = g(x) - i g(ix) = \text{Re } f(x) - i\, \text{Re}\,(i f(x)) = \text{Re } f(x) + i\, \text{Im } f(x) = f(x), \quad \forall x \in X_0. $$

For the last part, fix $x \in X$ and choose $\theta \in \mathbb{R}$ such that $|F(x)| = e^{i\theta} F(x)$. Then $|F(x)| = e^{i\theta} F(x) = F(e^{i\theta} x) = g(e^{i\theta} x)$, because this is a real number. Furthermore, we have $g(e^{i\theta} x) \leq p(e^{i\theta} x) = p(x)$. Combining the two above, we get $$ |F(x)| \leq p(x), \quad \forall x \in X, $$ which proves the theorem.
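The formula $F(x) = g(x) - i g(ix)$ can be checked numerically in a finite-dimensional example. This is only a sanity check on $\mathbb{C}^3$ with a randomly chosen complex-linear functional (the coefficients `c` and the helper `complexify` are hypothetical), not a substitute for the proof.

```python
import numpy as np

def complexify(g):
    """Build F(x) = g(x) - i g(i x) from a real-linear functional g on C^n."""
    return lambda x: g(x) - 1j * g(1j * x)

rng = np.random.default_rng(0)
c = rng.normal(size=3) + 1j * rng.normal(size=3)   # hypothetical coefficients
f = lambda x: np.dot(c, x)                         # complex-linear functional on C^3
g = lambda x: np.real(f(x))                        # its real part, real-linear
F = complexify(g)

x = rng.normal(size=3) + 1j * rng.normal(size=3)
print(abs(F(x) - f(x)), abs(F(1j * x) - 1j * F(x)))
```

Both printed quantities vanish up to rounding: $F$ recovers $f$ from its real part, and $F(ix) = iF(x)$.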

Real Hahn-Banach Theorem

Tue, 24 Jun 2025 00:00:00 +0000

Suppose $X$ is a vector space over $\mathbb{R}$, $p: X \to \mathbb{R}$ has the following properties:

$p(\lambda x) = \lambda p(x)$, $\forall x \in X$, $\lambda \in \mathbb{R}_+$ and $p(x + y) \leq p(x) + p(y)$, $\forall x, y \in X$.
Let $X_0$ be a subspace of $X$ and $u: X_0 \to \mathbb{R}$ a linear functional such that $u(x) \leq p(x)$, $\forall x \in X_0$.

Then we can find $f: X \to \mathbb{R}$ a linear functional such that $f|_{X_0} = u$ and $f(x) \leq p(x)$, $\forall x \in X$.

Proof: Consider the set $M$ of all pairs $(Y, g)$, where $Y$ is a subspace of $X$ containing $X_0$ and $g: Y \to \mathbb{R}$ is a linear functional which extends $u$ and satisfies $g \leq p$ on $Y$. The set $M$ is nonempty, since $(X_0, u) \in M$.

Define an order relation on $M$ like this: $(Y_1, g_1) \leq (Y_2, g_2)$ if $Y_1 \subset Y_2$ and $g_2$ is an extension of $g_1$.

We show that in $M$ every chain has an upper bound. Suppose $M_0$ is a totally ordered subset of $M$. Then define $Y_0 = \bigcup_{(Y,g) \in M_0} Y$ and $g_0: Y_0 \to \mathbb{R}$, $g_0(y) = g(y)$ if $y \in Y$ and $(Y, g) \in M_0$. This function is well defined and linear, and $Y_0$ is a subspace of $X$, because the set $M_0$ is totally ordered.

Furthermore, from the definition for $g_0$, we have that $g_0 \leq p$. Therefore $(Y_0, g_0) \in M$, and is obviously an upper bound for $M_0$. By Zorn’s Lemma, we find that $M$ has at least one maximal element $(Z, h)$.

Suppose $X \neq Z$. Then we can find $x_0 \in X \setminus Z$. Define $W = \text{Span}\{Z, x_0\} = \mathbb{R} \cdot x_0 \oplus Z$. Therefore, $W$ is a linear subspace in $X$. Let $y, z \in Z$. Then $$ h(y) + h(z) = h(y + z) \leq p(y + z) = p(y - x_0 + x_0 + z) \leq p(y - x_0) + p(x_0 + z). $$ Therefore, we have $$ h(y) - p(y - x_0) \leq -h(z) + p(x_0 + z), \quad\forall y, z \in Z. $$

Therefore, we can say $$ a = \sup_{y \in Z} (h(y) - p(y - x_0)) \leq \inf_{z \in Z} (-h(z) + p(x_0 + z)) = b. $$ Pick one $c \in [a, b]$ and define $h_1(w) = \lambda c + h(y)$, where $w = \lambda x_0 + y$ with $\lambda \in \mathbb{R}$ and $y \in Z$ (unique representation). Then $h_1$ is linear and extends $h$ on $W$, which means that it extends $u$ on $X_0$.

We can check, using $a \leq c \leq b$ and the positive homogeneity of $p$, that $h_1 \leq p$ on $W$. Hence $(W, h_1) \in M$ and $(Z, h) \leq (W, h_1)$ with $W \neq Z$, which contradicts the maximality of $(Z, h)$.

Therefore $Z = X$, and the maximal element $h$ is the requested functional.
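The one-dimensional extension step can be made concrete in a toy case. The setting below is an assumption chosen for illustration: $X = \mathbb{R}^2$, $p(x) = |x_1| + |x_2|$, $Z = \text{Span}\{(1,0)\}$ with $h(t,0) = t$, and $x_0 = (0,1)$. The supremum and infimum are approximated over a finite sample of $Z$, which happens to be exact here.

```python
import numpy as np

p = lambda x: abs(x[0]) + abs(x[1])     # sublinear functional on R^2 (the l^1 norm)
h = lambda t: t                         # h(t, 0) = t on Z = span{(1, 0)}, so h <= p on Z
x0 = np.array([0.0, 1.0])               # direction added in the extension step

t = np.linspace(-50.0, 50.0, 2001)      # sample points (t, 0) of Z
a = max(h(s) - p(np.array([s, 0.0]) - x0) for s in t)
b = min(-h(s) + p(x0 + np.array([s, 0.0])) for s in t)

def extend(c):
    """h_1(lambda * x0 + (t, 0)) = lambda * c + h(t)."""
    return lambda x: x[1] * c + h(x[0])

rng = np.random.default_rng(1)
samples = rng.uniform(-10.0, 10.0, size=(1000, 2))
ok = all(extend(c)(x) <= p(x) + 1e-12 for c in (a, 0.5 * (a + b), b) for x in samples)
print(a, b, ok)
```

Here the admissible interval is $[a,b] = [-1,1]$, and every choice of $c$ in it gives an extension $h_1(x) = x_1 + c\,x_2$ dominated by $p$; the extension is not unique.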

Riesz Representation Theorem

Tue, 24 Jun 2025 00:00:00 +0000

1. Riesz Representation Theorem #

Let $H$ be a Hilbert space over $\mathbb{R}$ or $\mathbb{C}$, and $T$ be a bounded linear functional on $H$ (a bounded operator from $H$ to the field $\mathbb{R}$ or $\mathbb{C}$, where $H$ is defined over that field). The following is known as the Riesz Representation Theorem:

Theorem 1:

If $T$ is a bounded linear functional on the Hilbert space $H$, then there exists $g \in H$ such that for every $f \in H$, we have: $$ T(f) = \langle f, g \rangle. $$

Moreover, $|T| = |g|$ (here $|T|$ denotes the operator norm of $T$, while $|g|$ is the Hilbert space norm of $g$).

Now, let’s prove this theorem.

Proof:

Assume that $H$ is separable for now. The proof for any Hilbert space is not much more difficult, but the separable case nicely uses ideas we have developed related to Fourier analysis. Additionally, we will work over $\mathbb{R}$.

Since $H$ is separable, we can choose an orthonormal basis $\phi_j$, $j \geq 1$, for $H$. Let $T$ be a bounded linear functional and set $a_j = T(\phi_j)$. For $f \in H$, set $c_j = \langle f, \phi_j \rangle$, and define $$ f_n = \sum_{j=1}^{n} c_j \phi_j. $$

Since the $\phi_j$ form a basis, we know that $|f - f_n| \to 0$ as $n \to \infty$.

Since $T$ is linear, we have: $$ T(f_n) = \sum_{j=1}^{n} a_j c_j. \tag{1} $$

Since $T$ is bounded, with norm $|T| < \infty$, we have: $$ |T(f) - T(f_n)| \leq |T| |f - f_n|. \tag{2} $$

Because $|f - f_n| \to 0$ as $n \to \infty$, we conclude from equations (1) and (2) that: $$ T(f) = \lim_{n\to\infty} T(f_n) = \sum_{j=1}^{\infty} a_j c_j. \tag{3} $$

In fact, the sequence $a_j$ must be square-summable. To see this, first note that since $|T(f)| \leq |T| |f|$, we have: $$ \left|\sum_{j=1}^{\infty} c_j a_j\right| \leq |T| \left(\sum_{j=1}^{\infty} c_j^2\right)^{1/2}. \tag{4} $$

Equation (4) must hold for every square-summable sequence $c_j$ (since any such $c_j$ corresponds to some element in $H$). Fix a positive integer $N$ and define the sequence $c_j = a_j$ for $j \leq N$, $c_j = 0$ for $j > N$. Clearly, such a sequence is square-summable, and equation (4) gives us: $$ \left(\sum_{j=1}^{N} a_j^2\right)^{1/2} \leq |T|. \tag{5} $$

Thus, $a_j$ is square-summable, as the sequence of partial sums is bounded above.

Since $a_j$ is square-summable, the function $g = \sum_{j} a_j \phi_j$ is well-defined as an element of $H$, and $T(f) = \sum_{j} a_j c_j = \langle f, g \rangle$. Finally, equation (5) shows that $|g| \leq |T|$. But from the Cauchy-Schwarz inequality, we also have $|T(f)| = |\langle f, g \rangle| \leq |f| |g|$ or $\frac{|T(f)|}{|f|} \leq |g|$, implying $|T| \leq |g|$, hence $|T| = |g|$. The proof is complete.
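The construction in the proof can be traced in a finite-dimensional sketch. The setting is an assumption for illustration: $H = \mathbb{R}^6$ with the Euclidean inner product, a random orthonormal basis obtained from a QR factorization, and a functional given by a hypothetical vector `t_vec`.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
Phi, _ = np.linalg.qr(rng.normal(size=(n, n)))   # columns: an orthonormal basis phi_j
t_vec = rng.normal(size=n)                       # hypothetical functional T(f) = <f, t_vec>
T = lambda f: float(t_vec @ f)

a = np.array([T(Phi[:, j]) for j in range(n)])   # a_j = T(phi_j)
g = Phi @ a                                      # g = sum_j a_j phi_j

f = rng.normal(size=n)
c = Phi.T @ f                                    # c_j = <f, phi_j>
print(T(f), a @ c, f @ g)                        # three expressions for T(f)

# Operator norm: the supremum of |T(f)| over unit vectors is attained at g/|g|.
op_norm = abs(T(g / np.linalg.norm(g)))
print(op_norm, np.linalg.norm(g))
```

The representer `g` built from the coefficients $a_j$ coincides with `t_vec`, and the operator norm equals $|g|$.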

2. Application to PDE #

This example illustrates how functional analysis methods are used in PDEs (although the example is for an ODE). Consider the ODE: $$ -f''(x) + b(x)f(x) = q(x) \tag{6} $$

on the interval $0 < x < 1$, with $b(x) \geq \delta > 0$ for some $\delta$; assume the functions $b$ and $q$ are continuous on $[0, 1]$. We want to find a solution to equation (6) with $f'(0) = f'(1) = 0$ (other boundary conditions could also be applied). If we multiply (6) by a $C^1$ function $\phi$ and integrate the first term, $-f''\phi$, by parts from $x = 0$ to $x = 1$, we obtain: $$ \int_0^1 (f'(x)\phi'(x) + b(x)f(x)\phi(x))\,dx = \int_0^1 q(x)\phi(x)\,dx. \tag{7} $$

Equation (7) must hold for every $\phi \in C^1([0, 1])$, if $f$ is a $C^2$ solution of equation (6) with $f'(0) = f'(1) = 0$. Conversely, if for a $C^2$ function $f$, we find that (7) holds for every $\phi$, then $f$ must be a solution of equation (6), because if we “undo” the integration by parts in (7), we get: $$ \phi(1)f'(1) - \phi(0)f'(0) + \int_0^1 \phi(x)(-f''(x) + b(x)f(x))\,dx = \int_0^1 \phi(x)q(x)\,dx $$ for every $\phi$.

A familiar PDE argument then shows that $f'(0) = f'(1) = 0$ and equation (6) must hold.

We will show that there is a unique solution to equation (7). Such a “solution” does not necessarily need to be twice differentiable as required by equation (6), but it will satisfy equation (7). Equation (7) is often called the “weak” form of the problem.

Define an inner product: $$ \langle g, h \rangle = \int_0^1 (g'(x)h'(x) + b(x)g(x)h(x))\,dx $$

on the space $C^1([0, 1])$, and let $H$ denote the completion of this space. This is essentially the procedure used on the third problem of the first exam; the presence of $b(x)$ makes no difference. (Note that we must use $b \geq \delta > 0$ to ensure that $\langle \cdot, \cdot \rangle$ is indeed an inner product, so that $|g| = \sqrt{\langle g, g \rangle} = 0$ if and only if $g \equiv 0$.) The space $H$ is a Hilbert space and can be understood (if needed) as a subspace of $C([0, 1])$.

Define a functional $T : H \to \mathbb{R}$ by: $$ T(\phi) = \int_0^1 q(x)\phi(x)\,dx $$

You can easily check that $T$ is bounded on $H$ (using Cauchy-Schwarz). From the Riesz Representation Theorem, it follows that there must exist a function $f \in H$ such that: $$ T(\phi) = \langle f, \phi \rangle $$

for every $\phi \in H$. This is exactly equation (7), the weak form of the ODE!
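Restricting the weak form (7) to a finite-dimensional subspace of $H$ gives a linear system, which is one way to approximate the Riesz representer $f$. The sketch below uses piecewise-linear finite elements on a uniform grid; the data $b \equiv 1$ and $q(x) = (1+\pi^2)\cos(\pi x)$ are hypothetical, chosen so that $f(x) = \cos(\pi x)$ solves (6) with $f'(0) = f'(1) = 0$. The load is approximated by interpolating $q$ at the nodes.

```python
import numpy as np

def solve_weak_neumann(b, q, n):
    """Galerkin (P1) approximation of: <f, phi> = int q phi for all phi,
    with <g, h> = int (g' h' + b g h) dx on (0, 1) and n uniform elements."""
    x = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    A = np.zeros((n + 1, n + 1))        # matrix of the inner product
    M = np.zeros((n + 1, n + 1))        # mass matrix, used for the load
    k_loc = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    m_loc = np.array([[2.0, 1.0], [1.0, 2.0]]) * h / 6.0
    for e in range(n):
        idx = np.array([e, e + 1])
        b_mid = b(0.5 * (x[e] + x[e + 1]))      # b sampled at the element midpoint
        A[np.ix_(idx, idx)] += k_loc + b_mid * m_loc
        M[np.ix_(idx, idx)] += m_loc
    rhs = M @ q(x)                       # int q phi_i, with q interpolated at nodes
    return x, np.linalg.solve(A, rhs)

b = lambda x: 1.0
q = lambda x: (1.0 + np.pi**2) * np.cos(np.pi * x)
x, f_h = solve_weak_neumann(b, q, 200)
err = np.max(np.abs(f_h - np.cos(np.pi * x)))
print(err)
```

No boundary condition is imposed on the discrete space: the Neumann conditions are natural for this weak form, as in the "undo the integration by parts" argument.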

The function $f$ satisfying equation (7) lies in $H$. Under the conditions on $b$ (specifically, $b \geq \delta > 0$ and $|b|_\infty < \infty$ since $b \in C([0, 1])$), the function $f$ lies in the same space defined in the third problem of the first exam. Specifically, $f$ is a continuous function. Proving that $f$ is actually twice differentiable requires more work, along with additional assumptions about the function $q$.

References #

[1] (Original) The Riesz Representation Theorem, MA 466, Kurt Bryan

The application of Hahn-Banach Theorem 01

Tue, 24 Jun 2025 00:00:00 +0000

Suppose $X$ is a normed space and $X_0$ is a closed subspace of $X$ and $x_0 \in X \setminus X_0$. Then we can find $f \in X'$ such that $f(x_0) = 1$ and $f(x) = 0$, $\forall x \in X_0$.

Proof: Since $x_0 \notin X_0$ and $X_0$ is closed, we can find $\delta > 0$ such that $|x_0 - x| \geq \delta$, $\forall x \in X_0$, which is equivalent to $1 \leq \dfrac{|x_0 - x|}{\delta}$, $\forall x \in X_0$.

Define $Y = \text{Span}\{x_0, X_0\} = X_0 \oplus \mathbb{K} \cdot x_0$. Then for each $y \in Y$ we can find a unique $\lambda \in \mathbb{K}$ and a unique $x \in X_0$ such that $y = \lambda x_0 + x$. Define $u: Y \to \mathbb{K}$ by $u(y) = u(\lambda x_0 + x) = \lambda$. It is well defined and linear.

Furthermore, we have: $$|u(y)| = |\lambda| \leq |\lambda| \frac{|x_0 + \lambda^{-1} x|}{\delta} = \frac{|\lambda x_0 + x|}{\delta} = \frac{1}{\delta} |y| \quad \text{for } \lambda \neq 0,$$ because $-\lambda^{-1} x \in X_0$. If $\lambda = 0$, then $y \in X_0$ and $u(y) = 0 \leq \frac{1}{\delta} |y|$.

Therefore, we obtain
$$ |u(y)| \leq \frac{1}{\delta} |y| \quad\forall y \in Y $$ By the Hahn-Banach Theorem, we can extend $u$ to a linear functional $f: X \to \mathbb{K}$ such that $f|_Y = u$ and $|f(x)| \leq \dfrac{1}{\delta} |x|$, $\forall x \in X$, so $f \in X'$. Therefore $f(x_0) = u(x_0) = 1$ and $x \in X_0 \Rightarrow f(x) = 0$.
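In a Euclidean space the separating functional can be written down explicitly, which gives a quick sanity check of the statement and of the bound $|f| \leq 1/\delta$. The setting is an assumption for illustration only: $X = \mathbb{R}^5$ with the Euclidean norm, $X_0$ spanned by two random vectors, and a random $x_0$; no Hahn-Banach argument is needed in this case.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(5, 2))            # columns span a (closed) subspace X_0 of R^5
x0 = rng.normal(size=5)                # a point that is almost surely not in X_0

# Orthogonal projection of x0 onto X_0 and the distance delta = dist(x0, X_0).
coeffs, *_ = np.linalg.lstsq(B, x0, rcond=None)
normal = x0 - B @ coeffs
delta = np.linalg.norm(normal)

# In the Euclidean case the separating functional is a scaled inner product.
f = lambda x: float(normal @ x) / delta**2
f_norm = np.linalg.norm(normal) / delta**2     # operator norm of f

print(f(x0), np.abs(normal @ B).max(), f_norm, 1.0 / delta)
```

Here $f(x_0) = 1$, $f$ vanishes on $X_0$, and the norm of $f$ equals $1/\delta$, so the bound from the proof is attained.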

The application of Hahn-Banach Theorem 02

Tue, 24 Jun 2025 00:00:00 +0000

Let $X$ be a Banach space over $\mathbb{K}$ and let $X' = \{ f: X \to \mathbb{K} : f \text{ is linear and continuous} \}$. Prove that $X' \neq \{0\}$ whenever $X \neq \{0\}$; in fact, for every $x \neq 0$ in $X$, we can find $f \in X'$ such that $f(x) = |x|$ and $|f| = 1$.

Proof: Pick $x_0 \in X$ with $x_0 \neq 0$. Define $X_0 = \mathbb{K} \cdot x_0$, a subspace of $X$, and $g: X_0 \to \mathbb{K}$, $g(\lambda x_0) = \lambda |x_0|$, which is linear and satisfies $|g(x)| = |x|$ on $X_0$. Since $g$ and $|\cdot|$ satisfy the conditions of the Hahn-Banach theorem, we can find $f: X \to \mathbb{K}$ such that $f|_{X_0} = g$, $f$ is linear and $|f(x)| \leq |x|$, $\forall x \in X$. Therefore $f(x_0) = g(x_0) = |x_0|$ and $|f| \leq 1$. The equality $f(x_0) = |x_0|$ guarantees that $|f| = 1$.
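For a concrete non-Hilbert example, a norming functional can be written explicitly. The setting is an assumption chosen for illustration: $X = \mathbb{R}^8$ with the $\ell^1$ norm, where $f(x) = \sum_i \text{sign}(x_{0,i})\,x_i$ satisfies $f(x_0) = |x_0|_1$ and $|f(x)| \leq |x|_1$.

```python
import numpy as np

rng = np.random.default_rng(4)
x0 = rng.normal(size=8)
norm1 = lambda x: np.sum(np.abs(x))          # the l^1 norm on R^8

s = np.sign(x0)
f = lambda x: float(s @ x)                   # candidate norming functional

samples = rng.normal(size=(1000, 8))
ratios = np.array([abs(f(x)) / norm1(x) for x in samples])
print(f(x0), norm1(x0), ratios.max())
```

The ratio $|f(x)|/|x|_1$ never exceeds $1$ on the samples and equals $1$ at $x_0$, so $|f| = 1$.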