Home on Nam Le

Navier–Stokes Existence and Smoothness

Fri, 29 May 2026 00:00:00 +0000

The motion of a viscous incompressible fluid is described by the Navier–Stokes equations, first written down by Claude-Louis Navier in 1822 and given their modern form by George Gabriel Stokes. Whether smooth solutions to these equations can always be continued for all time, or whether they can spontaneously develop a singularity at some finite time, is one of the deepest open problems in mathematics. It is one of the seven Clay Millennium Prize Problems, each carrying a prize of one million US dollars.

Problem (Clay Millennium Prize, Fefferman 2000)

Let $u_0 : \mathbb{R}^3 \to \mathbb{R}^3$ be a smooth divergence-free vector field. Does there exist a smooth solution $u(x,t)$, $p(x,t)$ to the 3D incompressible Navier–Stokes equations $$\partial_t u + (u \cdot \nabla)u - \nu\Delta u + \nabla p = 0, \qquad \nabla \cdot u = 0, \qquad u(\cdot,0) = u_0$$ defined for all $t > 0$ and satisfying $\int_{\mathbb{R}^3}|u(x,t)|^2\,dx < C$ for all $t \geq 0$? Either a proof of existence or a counterexample (a smooth $u_0$ for which no such smooth solution exists) qualifies for the prize.

The Equations and Their Scaling

Compared to the Euler equations (which describe inviscid flow), the Navier–Stokes equations add the viscous term $\nu\Delta u$, where $\nu > 0$ is the kinematic viscosity. This term dissipates energy and regularises the flow locally. The central tension is that the nonlinear term $(u\cdot\nabla)u$ can concentrate energy at small spatial scales faster than viscosity can diffuse it away.

Scaling symmetry. The Navier–Stokes equations are invariant under the rescaling $$u(x,t) \mapsto \lambda u(\lambda x,\, \lambda^2 t), \qquad p(x,t) \mapsto \lambda^2 p(\lambda x,\, \lambda^2 t).$$ A norm is critical (or scale-invariant) if it is preserved by this rescaling. The critical norm among the spaces $L^p(\mathbb{R}^3)$ is $L^3$, since $\|\lambda u(\lambda\,\cdot)\|_{L^3} = \|u\|_{L^3}$. The energy norm $\|u\|_{L^2}$ is not scale-invariant: it scales as $\lambda^{-1/2}\|u\|_{L^2}$, which shrinks as $\lambda \to \infty$ (that is, as one zooms into small scales). In the standard terminology this makes the energy a supercritical quantity. This mismatch is the core of the difficulty: global energy control does not prevent concentration at arbitrarily small scales.
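The exponents in the scaling discussion can be tabulated with a few lines of exact arithmetic. The sketch below is illustrative only: the function name `lebesgue_scaling_exponent` is introduced here and is not part of any library. It returns the power $a$ in $\|u_\lambda\|_{L^p} = \lambda^{a}\|u\|_{L^p}$ for $u_\lambda(x) = \lambda u(\lambda x)$ on $\mathbb{R}^d$, which follows from the change of variables $y = \lambda x$.

```python
from fractions import Fraction


def lebesgue_scaling_exponent(p, dim=3):
    """Exponent a with ||u_lambda||_{L^p} = lambda**a * ||u||_{L^p},
    where u_lambda(x) = lambda * u(lambda * x) on R^dim.

    Hypothetical helper for illustration; p and dim are positive integers.
    """
    # The prefactor lambda contributes +1; substituting y = lambda * x
    # in the integral contributes -dim/p.
    return 1 - Fraction(dim, p)


if __name__ == "__main__":
    for p in (2, 3, 6):
        print(f"p = {p}: exponent {lebesgue_scaling_exponent(p)}")
```

An exponent of zero marks the critical space. A negative exponent means the norm shrinks as $\lambda \to \infty$, so a bound on it cannot detect concentration at small scales.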

2D global regularity. In two dimensions the scaling is different: the energy $\|u\|_{L^2}$ is itself scale-invariant, and the enstrophy $\|\nabla u\|_{L^2}^2$ is non-increasing in time because vortex stretching is absent. Global regularity in 2D follows from these two bounds, a classical fact going back to the work of Leray and Ladyzhenskaya. In 3D no analogous critical quantity is controlled globally, and the problem is open.

The Hierarchy of Known Results

Leray–Hopf Weak Solutions (1934)

Theorem (Leray 1934, Hopf 1951)

For any divergence-free $u_0 \in L^2(\mathbb{R}^3)$, there exists a global weak solution $u \in L^\infty(0,\infty;\, L^2) \cap L^2(0,\infty;\, \dot{H}^1)$ satisfying the energy inequality $$\|u(t)\|_{L^2}^2 + 2\nu\int_0^t \|\nabla u(s)\|_{L^2}^2\, ds \leq \|u_0\|_{L^2}^2.$$

Leray’s construction, via a compactness argument on regularised equations, produces a solution that is globally defined but potentially not smooth. The term “weak” refers to the fact that the equations are satisfied only in an integral (distributional) sense, not pointwise. The energy inequality is the only bound available globally. By weak–strong uniqueness, a Leray–Hopf solution coincides with the smooth solution for as long as the latter exists. Whether Leray–Hopf solutions are unique in general, and whether they remain smooth for all time when the initial data is smooth, is unknown.

Partial Regularity: The CKN Theorem

The best known result limiting the size of potential singularities is the following.

Theorem (Caffarelli–Kohn–Nirenberg, 1982)

For any suitable weak solution to the 3D Navier–Stokes equations, the set of space-time singular points has parabolic Hausdorff dimension at most 1; more precisely, its one-dimensional parabolic Hausdorff measure is zero. In particular, the set of singular times has Hausdorff dimension at most $\dfrac{1}{2}$.

A “suitable weak solution” is a weak solution satisfying a local energy inequality. The CKN theorem proves that singularities, if they exist, cannot fill a curve or surface: they can occupy at most a set of dimension one in space-time. This is the most quantitative partial regularity result available and was simplified by Lin (1998). Scheffer (1977) had earlier shown singular times have Hausdorff dimension at most $\dfrac{1}{2}$.
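The link between the space-time statement and the bound on singular times can be seen from a short covering argument. The following is a sketch, with the parabolic cylinder $Q_r(x,t) = B_r(x) \times (t - r^2,\, t)$ and the time projection $\pi_t$ as notation introduced here for illustration. The CKN estimate shows that the singular set $S$ has vanishing one-dimensional parabolic Hausdorff measure, so for every $\varepsilon > 0$ it can be covered by cylinders $Q_{r_i}(x_i,t_i)$ with $\sum_i r_i < \varepsilon$. Each cylinder projects onto a time interval of length $r_i^2$, hence $$\sum_i \bigl(\operatorname{length}\,\pi_t(Q_{r_i})\bigr)^{1/2} = \sum_i r_i < \varepsilon,$$ and the set of singular times $\pi_t(S)$ has vanishing $\tfrac{1}{2}$-dimensional Hausdorff measure. The halving of the dimension reflects the parabolic scaling, in which one unit of time counts as two units of length.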

Conditional Regularity: Ladyzhenskaya–Prodi–Serrin

Theorem (Ladyzhenskaya 1967, Prodi 1959, Serrin 1962)

If a weak solution additionally satisfies $u \in L^r(0,T;\, L^s(\mathbb{R}^3))$ with $\dfrac{2}{r} + \dfrac{3}{s} = 1$ and $3 < s \leq \infty$, then $u$ is smooth on $(0,T]$.

The condition $\dfrac{2}{r} + \dfrac{3}{s} = 1$ is precisely the scale-invariant line in the $(r,s)$ plane, and membership in any of these spaces implies regularity. The family runs from $(r,s)=(2,\infty)$ (square-integrable in time with values in $L^\infty$) towards the endpoint $(r,s)=(\infty, 3)$ (critical $L^3$ control in space, uniform in time), which the theorem as stated excludes. These are conditional results: they do not prove that a weak solution lies in such a space, only that if it does, it must be smooth.
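To make the scale-invariant line concrete, the following minimal sketch solves $\dfrac{2}{r} + \dfrac{3}{s} = 1$ for the time exponent $r$ given the spatial exponent $s$. The helper name `lps_time_exponent` and the convention of passing `None` for $s = \infty$ are assumptions made for this illustration only.

```python
from fractions import Fraction


def lps_time_exponent(s):
    """Return r such that 2/r + 3/s = 1, for a spatial exponent s > 3.

    Hypothetical helper: pass s=None to mean s = infinity, in which
    case the relation reduces to 2/r = 1, i.e. r = 2.
    """
    if s is None:
        return Fraction(2)
    s = Fraction(s)
    if s <= 3:
        # At s = 3 the relation forces r = infinity (the endpoint case);
        # below 3 no admissible r exists.
        raise ValueError("the relation needs s > 3 for a finite r")
    # Solve 2/r = 1 - 3/s = (s - 3)/s for r.
    return 2 * s / (s - 3)


if __name__ == "__main__":
    for s in (4, 6, 9, None):
        print(f"s = {s}: r = {lps_time_exponent(s)}")
```

As $s$ decreases towards 3 the required time integrability $r$ grows without bound, which is one way to see why the endpoint needs separate treatment.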

The Critical Endpoint: Escauriaza–Seregin–Šverák

Theorem (Escauriaza–Seregin–Šverák, 2003)

If $u$ is a Leray–Hopf weak solution with $\sup_{t \in [0,T^*)} \|u(\cdot,t)\|_{L^3(\mathbb{R}^3)} < \infty$, then $u$ can be extended as a smooth solution past $T^*$.

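The endpoint case $s=3$ of the LPS family is the hardest one: $L^3(\mathbb{R}^3)$ is exactly the scale-invariant Lebesgue norm for Navier–Stokes, and a bound in $L^\infty_t L^3_x$ does not become small on short time intervals, so the perturbative arguments used for $s > 3$ break down. The ESS proof is substantially harder than the cases $s > 3$; it uses a rescaling and compactness argument to produce a limiting solution that vanishes at the final time, and then invokes backward uniqueness and unique continuation theorems for parabolic equations to rule it out.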

Tao’s Quantitative Criterion

Theorem (Tao, 2019)

If a smooth finite-energy solution first becomes singular at time $T^*$, then $$\limsup_{t \uparrow T^*} \dfrac{\|u(\cdot,t)\|_{L^3(\mathbb{R}^3)}}{\bigl(\log\log\log\tfrac{1}{T^*-t}\bigr)^c} = \infty$$ for some absolute constant $c>0$. In particular, the critical $L^3$ norm must blow up at least as fast as a triple-logarithm in $(T^*-t)^{-1}$.
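To get a feel for how weak a triple-logarithmic rate is, the sketch below evaluates $\log\log\log\frac{1}{T^*-t}$ for a few values of the remaining time. The function name `triple_log` and the sample gaps are choices made for this illustration; the constant $c$ from the theorem is not modelled.

```python
import math


def triple_log(gap):
    """Return log(log(log(1/gap))) for a time gap T* - t.

    Hypothetical helper: requires 1/gap > e**e so that the result
    is positive.
    """
    x = 1.0 / gap
    if x <= math.e ** math.e:
        raise ValueError("need 1/gap > e**e for a positive triple logarithm")
    return math.log(math.log(math.log(x)))


if __name__ == "__main__":
    for k in (2, 10, 100, 300):
        print(f"T* - t = 1e-{k}: triple log = {triple_log(10.0 ** -k):.3f}")
```

Even when the gap is as small as $10^{-300}$ the triple logarithm stays below 2, which illustrates how far this lower bound is from any power-law rate.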

Tao’s result is the first supercritical regularity criterion for Navier–Stokes: it gives quantitative information about the blowup rate that goes (by a triple logarithm) beyond what scaling alone can detect. The proof quantifies the compactness arguments in the ESS proof, replacing each use of a compactness method by an explicit Carleman inequality, and propagates lower bounds for the vorticity across dyadic annuli. The triple-exponential dependence in Tao’s bound has since been localised and sharpened by Barker–Prange (2021) and others.

The Supercriticality Problem

The fundamental analytical obstruction is that Navier–Stokes is supercritical with respect to the only globally controlled quantity, the energy (the $L^2$ norm).

Define the critical regularity index as the Sobolev exponent $s$ such that $\dot{H}^s(\mathbb{R}^3)$ is scale-invariant. For Navier–Stokes, $s = 1/2$. The energy controls only $\dot{H}^0 = L^2$, half a derivative below the critical index, whereas regularity theory requires control of a critical quantity such as $\dot{H}^{1/2}$ (the critical Sobolev norm) or $L^3$ (the critical Lebesgue norm). There is therefore a regularity gap between what is globally available ($L^2$) and what is needed ($L^3$ or $\dot{H}^{1/2}$). Every known approach to closing this gap runs into the same obstruction: the nonlinearity can create structure at arbitrarily small scales that the $L^2$ bound cannot see.
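The value $s = 1/2$ can be checked directly on the Fourier side. As a sketch, write $u_\lambda(x) = \lambda u(\lambda x)$ (notation introduced here), so that $\widehat{u_\lambda}(\xi) = \lambda^{-2}\,\hat{u}(\xi/\lambda)$ in three dimensions. Substituting $\xi = \lambda\eta$ gives $$\|u_\lambda\|_{\dot{H}^s}^2 = \int_{\mathbb{R}^3} |\xi|^{2s}\,\lambda^{-4}\,|\hat{u}(\xi/\lambda)|^2\,d\xi = \lambda^{2s-1}\int_{\mathbb{R}^3} |\eta|^{2s}\,|\hat{u}(\eta)|^2\,d\eta = \lambda^{2s-1}\,\|u\|_{\dot{H}^s}^2.$$ The exponent vanishes exactly when $s = 1/2$, while $s = 0$ recovers the factor $\lambda^{-1/2}$ for the energy norm.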

Tao (2016) made this gap precise by constructing an averaged Navier–Stokes system, where the bilinear nonlinearity $(u\cdot\nabla)u$ is replaced by a carefully designed convex average of related nonlinearities, for which finite-time blowup can be rigorously proved. This construction does not produce a counterexample to the true Navier–Stokes equations, but it demonstrates that the specific algebraic structure of the nonlinearity is load-bearing: any proof of global regularity must use something specific about $(u\cdot\nabla)u$ that is not shared by its averages.

Research Directions

1. Improving the Quantitative Blowup Rate

Tao’s triple-logarithmic rate is the sharpest known lower bound on blowup of the critical $L^3$ norm. Scaling considerations suggest that the true rate, if blowup occurs, should be much faster, conjecturally $\|u(\cdot,t)\|_{L^3} \sim (T^*-t)^{-\delta}$ for some $\delta > 0$, analogous to Type I blowup in nonlinear heat equations. The gap between the triple-logarithmic lower bound and the conjectured power-law rate represents the frontier of quantitative regularity theory. Closing even part of this gap, for instance establishing a single-logarithmic or power-of-log lower bound, would require new ideas beyond Carleman estimates.

2. Type I vs. Type II Blowup

A blowup at time $T^*$ is called Type I if the solution grows no faster than the rate dictated by scaling, $\|u(\cdot,t)\|_{L^\infty} \leq C\,(T^*-t)^{-1/2}$ as $t \uparrow T^*$. It is Type II otherwise. For the Navier–Stokes equations, ruling out Type I blowup would be a significant advance: all self-similar singularities (where $u(x,t) = (T^*-t)^{-1/2}U(x/(T^*-t)^{1/2})$ with a bounded profile $U$) are of Type I, and several results (including work of Růžička and Seregin) already rule them out under mild additional assumptions. Whether all Type I blowup can be excluded, leaving only the less structured Type II, is open.
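For the self-similar ansatz quoted above, the growth of each Lebesgue norm can be read off from the change of variables $y = x/(T^*-t)^{1/2}$. As a worked sketch, valid whenever the profile satisfies $U \in L^p(\mathbb{R}^3)$: $$\|u(\cdot,t)\|_{L^p(\mathbb{R}^3)} = (T^*-t)^{-\frac{1}{2}+\frac{3}{2p}}\,\|U\|_{L^p(\mathbb{R}^3)}.$$ Taking $p = \infty$ gives the rate $(T^*-t)^{-1/2}$, whereas $p = 3$ gives a norm that is constant in time, consistent with $L^3$ being scale-invariant.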

3. Uniqueness of Weak Solutions

Leray–Hopf weak solutions exist globally, but they may not be unique. This is a separate, equally deep question. By weak–strong uniqueness a Leray–Hopf solution coincides with the smooth solution for as long as the latter exists, so the question concerns rough initial data and times after a possible singularity. Recent work of Buckmaster and Vicol (2019) showed that finite-energy weak solutions that are not required to satisfy the energy inequality are indeed non-unique, using convex integration techniques developed for the Euler equations (De Lellis–Székelyhidi). Whether Leray–Hopf solutions with the energy inequality are unique is still open and is perhaps the central problem in the weak solution theory.

4. Self-Similar and Discretely Self-Similar Solutions

Self-similar solutions of the form $u(x,t) = (T^*-t)^{-1/2} U(x/(T^*-t)^{1/2})$ satisfy a nonlinear elliptic system for the profile $U$. Several non-existence theorems show that backward self-similar solutions with certain integrability must be trivial (Nečas–Růžička–Šverák, 1996). The case of discretely self-similar solutions, where $u(x,t) = \lambda u(\lambda x, \lambda^2 t)$ for a fixed $\lambda \neq 1$, is less well understood. Whether the set of self-similar profiles that could appear as blowup limits is empty is not known.
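The elliptic system for the profile can be written out by substituting the ansatz into the equations. The following is a sketch, with $y = x/(T^*-t)^{1/2}$ and the pressure ansatz $p(x,t) = (T^*-t)^{-1}P(y)$ as notation introduced here. Every term then carries the common factor $(T^*-t)^{-3/2}$: $$\partial_t u = (T^*-t)^{-3/2}\Bigl(\tfrac{1}{2}U + \tfrac{1}{2}(y\cdot\nabla)U\Bigr), \qquad (u\cdot\nabla)u = (T^*-t)^{-3/2}(U\cdot\nabla)U, \qquad \Delta u = (T^*-t)^{-3/2}\Delta U.$$ Dividing by that factor yields the system often called Leray's equations, $$-\nu\Delta U + \tfrac{1}{2}U + \tfrac{1}{2}(y\cdot\nabla)U + (U\cdot\nabla)U + \nabla P = 0, \qquad \nabla\cdot U = 0,$$ and the non-existence theorems assert that its only solutions in suitable integrability classes are trivial.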

5. Computer-Assisted Proofs via Rigorous Numerics

The Chen–Hou approach to Euler singularities (2025) used a computer-assisted proof framework: construct a numerical approximate profile, then verify its stability rigorously using interval arithmetic. For Navier–Stokes the presence of viscosity complicates such an approach (the profile is dissipated rather than transported), but the same framework (dynamical rescaling plus nonlinear stability verification) might in principle detect or rule out singularities in specific axi-symmetric geometries. Applying and adapting the Hou group’s methods to the viscous problem is an active direction.

6. The Zero-Viscosity Limit and Euler–Navier–Stokes Connection

As $\nu \to 0$, Navier–Stokes formally converges to Euler. The precise relationship is subtle: in the presence of boundaries (Prandtl layers) or after a potential Euler singularity, the zero-viscosity limit can fail to hold in strong norms. If Euler develops a finite-time singularity at time $T^*_E$ from smooth data (as Chen–Hou suggest for bounded domains), then for small $\nu$ the Navier–Stokes solution must either also develop a near-singularity or be regularised by viscosity before $T^*_E$. Whether viscosity is always sufficient to regularise an Euler singularity, or whether a Navier–Stokes singularity can arise from a nearby Euler one, is entirely open.

References

Fefferman, C. L. (2000). Existence and smoothness of the Navier–Stokes equation. Clay Mathematics Institute Millennium Prize Problems. https://www.claymath.org/wp-content/uploads/2022/06/navierstokes.pdf
Leray, J. (1934). Sur le mouvement d’un liquide visqueux emplissant l’espace. Acta Mathematica, 63, 193–248.
Hopf, E. (1951). Über die Anfangswertaufgabe für die hydrodynamischen Grundgleichungen. Mathematische Nachrichten, 4(1–6), 213–231.
Caffarelli, L., Kohn, R., & Nirenberg, L. (1982). Partial regularity of suitable weak solutions of the Navier–Stokes equations. Communications on Pure and Applied Mathematics, 35(6), 771–831.
Ladyzhenskaya, O. A. (1967). On uniqueness and smoothness of generalized solutions to the Navier–Stokes equations. Zapiski Nauchnykh Seminarov LOMI, 5, 169–185.
Escauriaza, L., Seregin, G. A., & Šverák, V. (2003). $L_{3,\infty}$-solutions of the Navier–Stokes equations and backward uniqueness. Russian Mathematical Surveys, 58(2), 211–250.
Tao, T. (2019). Quantitative bounds for critically bounded solutions to the Navier–Stokes equations. arXiv:1908.04958. Published in Nine Mathematical Challenges, AMS, 2021, pp. 149–193.
Tao, T. (2016). Finite time blowup for an averaged three-dimensional Navier–Stokes equation. Journal of the American Mathematical Society, 29(3), 601–674.
Buckmaster, T. & Vicol, V. (2019). Nonuniqueness of weak solutions to the Navier–Stokes equation. Annals of Mathematics, 189(1), 101–144.
Barker, T. & Prange, C. (2021). Localized quantitative estimates and potential blow-up rates for the Navier–Stokes equations. Communications in Mathematical Physics, 385, 717–792.

Navier–Stokes Regularity: The Uniqueness of Weak Solutions

Fri, 29 May 2026 00:00:00 +0000

The companion post on Navier–Stokes existence and smoothness asked whether smooth solutions can break down in finite time. This post asks the opposite question: when a solution is only weakly defined, satisfying the equations in an integral sense rather than pointwise, is it uniquely determined by its initial data? The answer, developed over the last two decades through a dramatic series of results, is a resounding no in many regimes. The frontier is now whether the physically natural class of Leray–Hopf weak solutions retains uniqueness.

Question (Weak Uniqueness)

Are Leray–Hopf weak solutions of the 3D incompressible Navier–Stokes equations $$\partial_t u + (u\cdot\nabla)u - \nu\Delta u + \nabla p = 0, \qquad \nabla\cdot u = 0$$ uniquely determined by their initial data $u_0 \in L^2(\mathbb{R}^3)$?

The question is one of the most urgent open problems in the PDE theory of fluid dynamics. It is logically independent of the blowup question: Leray–Hopf solutions exist globally for all time regardless of whether smooth solutions break down. What is not known is whether two Leray–Hopf solutions started from the same data must coincide.

Nash’s h-Principle: The Conceptual Ancestor

The story begins not in fluid mechanics but in differential geometry. In 1954, John Nash proved that any Riemannian manifold admits a $C^1$ isometric embedding into Euclidean space, a result that contradicted the expectation, based on the rigid behaviour of $C^2$ embeddings (Cauchy), that the metric should impose strong constraints. The key insight is that $C^1$ embeddings are flexible: one can deform them by adding high-frequency oscillations that are invisible at the large scale but locally produce any prescribed metric tensor.

Gromov formulated this phenomenon as the h-principle: for certain underdetermined differential relations, the topological (homotopy-theoretic) obstructions are the only ones, and any formal solution can be deformed into an actual solution. The h-principle is a flexibility result: it says geometry is surprisingly unconstrained below a critical regularity threshold.

De Lellis and Székelyhidi recognised in the mid-2000s that the incompressible Euler equations are formally analogous to Nash’s embedding problem. The Euler system is underdetermined (more unknowns than equations), and one can attempt to construct wild solutions by adding high-frequency oscillations. The crucial observation is that the nonlinearity $u\otimes u$ in the Reynolds stress tensor plays the role of the metric tensor in Nash’s problem.

Wild Euler Solutions

The first step was to show that the Euler equations possess infinitely many weak solutions for given initial data.

Theorem (De Lellis–Székelyhidi, 2009–2013)

For any divergence-free $u_0 \in L^2(\mathbb{T}^3)$ and any prescribed energy profile $e(t) \in C^\infty([0,T])$ with $e(t) > \|u_0\|_{L^2}^2$ for all $t > 0$, there exist infinitely many weak solutions $u \in C_t^0 L_x^2$ of the 3D Euler equations with $u(\cdot,0) = u_0$ and $\|u(\cdot,t)\|_{L^2}^2 = e(t)$.

In particular, the Euler equations admit weak solutions that spontaneously gain or lose kinetic energy for no reason: wild solutions. The construction proceeds by convex integration: one builds the solution iteratively, at each stage adding a high-frequency perturbation (a Beltrami wave) that corrects the error in the momentum equation while staying nearly invisible in the velocity field.

Earlier, Scheffer (1993) and Shnirelman (1997) had shown the existence of weak Euler solutions with compact support in space-time: the fluid is at rest, then spontaneously moves, then returns to rest. Their constructions were indirect, however, and De Lellis and Székelyhidi’s convex integration scheme gave the first systematic and quantitative approach.

Onsager’s Conjecture

The De Lellis–Székelyhidi results raise an immediate question: at what regularity does the fluid behaviour transition from flexible (wild, non-unique) to rigid (energy-conserving, unique)? This is precisely what Lars Onsager conjectured in 1949.

Onsager's Conjecture (1949)

For the 3D incompressible Euler equations, the threshold regularity for energy conservation is the Hölder exponent $1/3$:

Every weak solution $u \in C^{0,\alpha}$ with $\alpha > 1/3$ conserves kinetic energy.
For every $\alpha < 1/3$, there exist weak solutions in $C^{0,\alpha}$ that dissipate energy.

The positive direction (conservation above $1/3$) was proved by Constantin–E–Titi (1994). The negative direction (dissipation possible below $1/3$) required much more work and was fully resolved only recently.
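A heuristic for why the exponent is $1/3$, in the spirit of the Constantin–E–Titi argument, comes from estimating the energy flux through a length scale $\ell$. As a sketch, with $\Pi_\ell$ denoting that flux and $\delta_\ell u$ a velocity increment over distance $\ell$ (both notation introduced here), the flux is cubic in the increments and carries one derivative: $$|\Pi_\ell| \lesssim \frac{|\delta_\ell u|^3}{\ell} \lesssim \|u\|_{C^{0,\alpha}}^3\,\ell^{3\alpha-1}.$$ For $\alpha > 1/3$ the right-hand side tends to zero as $\ell \to 0$, so no energy can escape to infinitely small scales; for $\alpha < 1/3$ the bound gives no control, which leaves room for anomalous dissipation.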

Theorem (Isett, 2018)

For every $\alpha < 1/3$ there exist weak solutions $u \in C^{0,\alpha}(\mathbb{T}^3\times[0,T])$ of the 3D Euler equations that fail to conserve kinetic energy.

Isett’s proof, published in the Annals of Mathematics in 2018, was the culmination of a decade of refinements of the De Lellis–Székelyhidi scheme. The key difficulty near the threshold $1/3$ is that the high-frequency perturbations must be sized to cancel the Reynolds stress error while staying in $C^{0,\alpha}$ for $\alpha$ arbitrarily close to $1/3$; Isett achieved this by combining Mikado flows (introduced by Daneri and Székelyhidi) with a gluing approximation in time. De Lellis, Székelyhidi, Buckmaster, and Vicol also obtained solutions attaining any prescribed energy profile in $C^{1/3-}$. Onsager’s conjecture is now a theorem.

Viscous Non-Uniqueness: Buckmaster–Vicol

Adapting the convex integration scheme from Euler to Navier–Stokes requires overcoming the viscous term $\nu\Delta u$, which smooths out high-frequency oscillations. Buckmaster and Vicol introduced intermittent Beltrami waves, building blocks that concentrate on sparse spatial sets and thereby reduce their interaction with the Laplacian, and used them to bring convex integration into the viscous setting.

Theorem (Buckmaster–Vicol, 2019)

There exist infinitely many weak solutions $u \in C_t^0 L_x^2(\mathbb{T}^3)$ of the 3D Navier–Stokes equations, with finite kinetic energy at every time, that do not satisfy the global energy inequality. In particular, weak solutions of 3D Navier–Stokes are not unique in the class $C_t^0 L_x^2$.

The Buckmaster–Vicol solutions, published in the Annals of Mathematics 189 (2019), 101–144, are weak in both the PDE sense and the energy sense: they satisfy the equations distributionally and have finite kinetic energy, but they can gain energy spontaneously, violating the natural dissipation law $\partial_t\|u\|_{L^2}^2 \leq -2\nu\|\nabla u\|_{L^2}^2$.

This non-uniqueness is striking but also limited: the Buckmaster–Vicol solutions are not Leray–Hopf solutions, because Leray–Hopf solutions are required to satisfy the energy inequality $\|u(t)\|_{L^2}^2 \leq \|u_0\|_{L^2}^2$. Whether this single additional constraint, that energy does not increase, suffices to restore uniqueness is the open question.

Crossing the Energy Barrier: Albritton–Brué–Colombo

The energy inequality distinguishing Leray–Hopf solutions from Buckmaster–Vicol wild solutions seemed for a long time to be a genuine barrier to non-uniqueness. The following result crossed this barrier, but required introducing an external force.

Theorem (Albritton–Brué–Colombo, 2022)

There exists a body force $f \in L^1(0,T;\, L^2(\mathbb{R}^3))$ and two distinct Leray–Hopf weak solutions of the forced 3D Navier–Stokes equations $\partial_t u + (u\cdot\nabla)u - \nu\Delta u + \nabla p = f$ with the same initial data $u_0 \equiv 0$ and the same force $f$.

Published in the Annals of Mathematics 196 (2022), 415–455, the proof uses a completely different mechanism from convex integration. The key ingredient is an unstable background solution: using Vishik’s construction of spectrally unstable steady states of the 2D Euler equations, Albritton–Brué–Colombo lift a 2D unstable vortex to an axisymmetric 3D vortex ring and embed it into the Navier–Stokes flow via a self-similar change of variables. The force $f$ is chosen precisely to make this background exactly solve the forced equations; the instability then allows two different solutions to branch from the same initial data.

The force is singular; it belongs to $L^1_t L^2_x$ but is not smooth, and is concentrated near the initial time $t=0$. Whether the same non-uniqueness can be achieved with a smooth or zero force is the remaining open problem.

The Unforced Case: Current Frontier

Non-uniqueness of Leray–Hopf solutions for the unforced Navier–Stokes equations remains open. The route to the unforced case requires finding a self-similar background profile that solves the unforced equations exactly and has an unstable eigenvalue, a far more demanding task than the forced case, where the profile can be any divergence-free function.

Open Problem (Jia–Šverák Programme)

Do there exist two distinct Leray–Hopf solutions of the 3D Navier–Stokes equations with the same initial data and no external force?

Jia and Šverák (2013–2014) showed that non-uniqueness would follow from a spectral assumption: if there exists a forward self-similar Navier–Stokes solution whose linearised operator has an eigenvalue with positive real part, then Leray–Hopf solutions are non-unique. Guillod and Šverák (2017) provided compelling numerical evidence that such an unstable self-similar profile exists.

In September 2025, Giri and Kwon posted a preprint (arXiv:2509.25116) claiming a computer-assisted proof of the existence of an unstable self-similar profile for the unforced equations, which, via the Jia–Šverák mechanism, would establish non-uniqueness of Leray–Hopf solutions. The proof uses rigorous interval arithmetic to verify the existence of an unstable eigenvalue. As of this writing the preprint is under review by the community.

The Regularity Threshold

The accumulated results suggest the following picture of the flexibility-rigidity dichotomy for the Euler and Navier–Stokes equations.

| Regularity class | Euler | Navier–Stokes |
|---|---|---|
| $C^{0,\alpha}$, $\alpha < 1/3$ | non-unique, dissipative (Isett 2018) | n/a |
| $C^{0,\alpha}$, $\alpha > 1/3$ | energy-conserving (Constantin–E–Titi 1994) | n/a |
| $L^2$ (global energy inequality) | non-unique | open (unforced); non-unique when forced (ABC 2022) |
| $L^\infty_t L^3_x$ (LPS regularity) | n/a | unique and smooth (ESS 2003) |

The Leray–Hopf class sits precisely at the boundary where uniqueness is expected to break down but has not yet been proved to do so in the unforced case.

Research Directions

1. Resolving the Jia–Šverák Spectral Condition

The most direct path to unforced Leray–Hopf non-uniqueness is to rigorously confirm or refute the spectral condition of Jia–Šverák: find (or prove the nonexistence of) a forward self-similar Navier–Stokes profile with an unstable linearised eigenvalue. The 2025 Giri–Kwon computer-assisted preprint claims this is now done. If confirmed, the consequence is striking: Leray’s 1934 existence theorem cannot be supplemented by uniqueness, and the Navier–Stokes Cauchy problem is ill-posed in the Leray–Hopf class.

2. Selection Principles and Physical Solutions

If Leray–Hopf solutions are indeed non-unique, a fundamental question becomes which solution is the physically correct one, the one observed in experiments and computed in simulations. Several selection criteria have been proposed: the vanishing viscosity limit of the Navier–Stokes solution as $\nu\to 0$ from above, entropy conditions analogous to those for hyperbolic conservation laws, and renormalisation group or statistical ensemble approaches motivated by turbulence theory. None of these has been rigorously validated as a selection criterion that distinguishes a unique Leray–Hopf solution from the others.

3. Sharp Regularity Thresholds for Navier–Stokes

For Euler, Onsager’s conjecture identifies $C^{1/3}$ as the sharp regularity threshold for energy conservation. What is the analogous threshold for Navier–Stokes? The Buckmaster–Vicol solutions are in $C_t^0 L_x^2$ (very rough), while the Ladyzhenskaya–Prodi–Serrin class gives uniqueness. The precise exponent at which uniqueness breaks down, if it does, is not known. Determining the sharp Sobolev or Hölder regularity threshold for Navier–Stokes uniqueness, analogous to Onsager’s $1/3$, is a central open problem.

4. Uniqueness for Axisymmetric Initial Data

A natural restricted problem is whether Leray–Hopf solutions with axisymmetric, swirl-free initial data are unique. Such data imposes a strong geometric constraint that eliminates most of the degrees of freedom available to convex integration. Partial results are known: for sufficiently regular axisymmetric data without swirl, global regularity was proved by Ladyzhenskaya and by Ukhovskii and Yudovich in 1968, while the case with swirl is open. Uniqueness for rough axisymmetric data in the Leray–Hopf class has not been established. If the Giri–Kwon instability is confirmed, understanding whether the instability mechanism survives axisymmetric perturbations is an immediate question.

5. Stochastic Regularisation

There is a well-studied phenomenon, regularisation by noise, in which adding a stochastic forcing term to an ill-posed deterministic PDE restores well-posedness. For the Navier–Stokes equations, Hofmanová–Zhu–Zhu (2023) showed non-uniqueness persists even under multiplicative noise for certain body forces, by adapting the Albritton–Brué–Colombo construction. Whether a generic stochastic perturbation can restore uniqueness of Leray–Hopf solutions, and what the appropriate notion of “generic” should be, is a rich open direction combining convex integration with stochastic analysis.

References

Nash, J. (1954). $C^1$ isometric imbeddings. Annals of Mathematics, 60(3), 383–396.
De Lellis, C. & Székelyhidi, L. (2009). The Euler equations as a differential inclusion. Annals of Mathematics, 170(3), 1417–1436.
De Lellis, C. & Székelyhidi, L. (2013). Dissipative continuous Euler flows. Inventiones Mathematicae, 193(2), 377–407.
Constantin, P., E, W., & Titi, E. S. (1994). Onsager’s conjecture on the energy conservation for solutions of Euler’s equation. Communications in Mathematical Physics, 165(1), 207–209.
Isett, P. (2018). A proof of Onsager’s conjecture. Annals of Mathematics, 188(3), 871–963.
Buckmaster, T. & Vicol, V. (2019). Nonuniqueness of weak solutions to the Navier–Stokes equation. Annals of Mathematics, 189(1), 101–144.
Buckmaster, T. & Vicol, V. (2019). Convex integration and phenomenologies in turbulence. EMS Surveys in Mathematical Sciences, 6(1–2), 1–88.
Albritton, D., Brué, E., & Colombo, M. (2022). Non-uniqueness of Leray solutions of the forced Navier–Stokes equations. Annals of Mathematics, 196(1), 415–455.
Jia, H. & Šverák, V. (2014). Local-in-space estimates near initial time for weak solutions of the Navier–Stokes equations and forward self-similar solutions. Inventiones Mathematicae, 196(1), 233–265.
Giri, V. & Kwon, H. (2025). Nonuniqueness of Leray–Hopf solutions to the unforced incompressible 3D Navier–Stokes equation. arXiv:2509.25116.

The Regularity Problem for the 3D Euler Equations

Fri, 29 May 2026 00:00:00 +0000

Leonhard Euler wrote down the equations governing the motion of an ideal incompressible fluid in 1757. Whether smooth solutions to these equations can develop a singularity in finite time, a point at which derivatives of the velocity blow up, has been an open problem ever since, and remains one of the central questions in mathematical fluid dynamics.

Problem (Euler Regularity)

Let $u_0 : \mathbb{R}^3 \to \mathbb{R}^3$ be a smooth, divergence-free initial velocity field with sufficient decay at infinity. Does the unique local smooth solution $u(x,t)$ to the 3D incompressible Euler equations $$\partial_t u + (u \cdot \nabla)u + \nabla p = 0, \qquad \nabla \cdot u = 0, \qquad u(\cdot,0)=u_0$$ remain smooth for all time $t > 0$?

The problem is rated L4 on UnsolvedMath, reflecting its depth, and is closely related to the Clay Millennium Prize Problem on the Navier–Stokes equations. The two questions are linked through the zero-viscosity limit, but neither implies the other.

The Equations and What Regularity Means

The Euler equations express conservation of momentum (first equation) and incompressibility (second equation) for an inviscid fluid. The unknowns are the velocity field $u(x,t) \in \mathbb{R}^3$ and pressure $p(x,t) \in \mathbb{R}$; the pressure is determined implicitly by incompressibility via an elliptic equation.

Vorticity. The central quantity for singularity analysis is the vorticity $\omega = \nabla \times u$, which satisfies the vorticity equation $$\partial_t \omega + (u \cdot \nabla)\omega = (\omega \cdot \nabla)u.$$ The right-hand side, the vortex stretching term, is the essential source of difficulty. It creates a quadratic feedback: large $\omega$ produces large $(\omega \cdot \nabla)u$, which can further amplify $\omega$.
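The vorticity equation follows from taking the curl of the momentum equation. As a sketch of the computation, first rewrite the advection term with the vector identity $$(u\cdot\nabla)u = \nabla\bigl(\tfrac{1}{2}|u|^2\bigr) + \omega\times u.$$ The curl annihilates both gradient terms, $\nabla(\tfrac{1}{2}|u|^2)$ and $\nabla p$, and the remaining term expands as $$\nabla\times(\omega\times u) = (u\cdot\nabla)\omega - (\omega\cdot\nabla)u + \omega\,(\nabla\cdot u) - u\,(\nabla\cdot\omega).$$ Since $\nabla\cdot u = 0$ by incompressibility and $\nabla\cdot\omega = 0$ identically, only the transport term and the stretching term survive, which gives $\partial_t\omega + (u\cdot\nabla)\omega - (\omega\cdot\nabla)u = 0$.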

Local well-posedness. For $u_0 \in H^s(\mathbb{R}^3)$ with $s > 5/2$, there exists a unique smooth solution on a time interval $[0, T^*)$ for some $T^* > 0$ depending on $\|u_0\|_{H^s}$ (Kato, 1972). The question is whether $T^*$ can be taken equal to $+\infty$.

Why 2D is easy, 3D is not. In two dimensions the vortex stretching term $(\omega \cdot \nabla)u$ vanishes identically: the vorticity points out of the plane of motion, while $u$ does not vary in that direction. The scalar vorticity $\omega = \partial_1 u_2 - \partial_2 u_1$ is then simply transported along fluid particle paths without amplification, and $\|\omega\|_{L^\infty}$ is conserved. Global regularity in 2D follows from this bound. In 3D no such conservation holds, and the problem is genuinely open.

The Beale–Kato–Majda Criterion

The first major structural result reduces the regularity problem to a single quantity.

Theorem (Beale–Kato–Majda, 1984)

A smooth solution $u$ of the 3D Euler equations loses regularity at time $T^*$ if and only if $$\int_0^{T^*} \|\omega(\cdot,t)\|_{L^\infty(\mathbb{R}^3)}\, dt = +\infty.$$ In particular, if the vorticity remains bounded in $L^\infty$ on $[0,T]$ for every finite $T$, the solution remains smooth globally.

The BKM criterion redirects the problem: one must show that the vorticity magnitude $\|\omega\|_{L^\infty}$ cannot accumulate to infinity in finite time. Since $\omega$ satisfies a transport-stretching equation, this requires understanding the geometric structure of the vorticity field under its own evolution.
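The quadratic feedback and the diverging time integral can both be seen in a scalar toy model. The sketch below is not the Euler equations: it replaces the vorticity equation by the ODE $w' = w^2$, an assumption made purely for illustration, whose solution $w(t) = w_0/(1 - w_0 t)$ blows up at $T^* = 1/w_0$. The function names are hypothetical.

```python
import math


def toy_vorticity(t, w0=1.0):
    """Solution of the model ODE dw/dt = w**2 with w(0) = w0 > 0.

    Toy stand-in for the sup-norm of vorticity; blows up at T* = 1/w0.
    """
    return w0 / (1.0 - w0 * t)


def bkm_integral(t, w0=1.0):
    """Closed form of the integral of toy_vorticity over [0, t], t < 1/w0."""
    return -math.log(1.0 - w0 * t)


def bkm_integral_numeric(t, w0=1.0, n=20_000):
    """Midpoint-rule approximation of the same integral, as a cross-check."""
    h = t / n
    return h * sum(toy_vorticity((i + 0.5) * h, w0) for i in range(n))


if __name__ == "__main__":
    for t in (0.9, 0.99, 0.9999, 0.999999):
        print(f"t = {t}: integral of w over [0, t] = {bkm_integral(t):.3f}")
```

In this model the integral grows like $\log\frac{1}{T^*-t}$ and diverges exactly at the blowup time, mirroring the statement of the criterion; the divergence is only logarithmic, so it is easy to miss numerically.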

Geometric Conditions and Depletion of Stretching

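The vortex stretching term $(\omega \cdot \nabla)u$ can be written as $$(\omega \cdot \nabla)u = |\omega|\,(\hat\omega \cdot \nabla)u,$$ where $\hat\omega = \omega/|\omega|$ is the unit vorticity direction. Consequently the vorticity magnitude satisfies $(\partial_t + u\cdot\nabla)|\omega| = \alpha\,|\omega|$ with stretching rate $\alpha = \hat\omega \cdot S\hat\omega$, where $S = \tfrac{1}{2}(\nabla u + \nabla u^{\mathsf{T}})$ is the strain tensor. The key observation is that stretching is governed not only by the magnitude of $\omega$ but also by the geometry of the vorticity field.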

Theorem (Constantin–Fefferman–Majda, 1996)

If the unit vorticity direction $\hat\omega = \omega/|\omega|$ is uniformly Lipschitz in a neighbourhood of the set $\{|\omega| > \lambda\}$ for all $t \in [0, T]$ and some $\lambda > 0$, then the solution remains smooth on $[0,T]$.

This result says that blowup, if it occurs, must be accompanied by violent geometric irregularity of vortex lines: not just large vorticity magnitude, but also loss of Lipschitz regularity of the vorticity direction. It has motivated a line of research on the geometric structure of vortex tubes near potential singularities.

Blowup for Less Regular Data #

Recent years have seen dramatic progress on singularity formation for initial data that is smooth except at isolated points.

Theorem (Elgindi, 2021)

There exist axisymmetric, swirl-free initial velocity fields $u_0 \in C^{1,\alpha}(\mathbb{R}^3)$ for sufficiently small $\alpha > 0$ such that the corresponding solution to the 3D Euler equations develops a finite-time singularity.

Elgindi’s proof, published in the Annals of Mathematics 194 (2021), 647–727, constructs a self-similar blowup profile and establishes its nonlinear stability using a dynamical rescaling formulation. The initial data is not smooth: it belongs to $C^{1,\alpha}$ but not to $C^2$. The singularity forms at the axis of symmetry $r=0$.

This was a breakthrough, but it left open the smooth case. Elgindi himself noted the next target: constructing blowup from initial data that is non-smooth only at a single point, or eventually from fully smooth data.

Extending Elgindi’s construction. Chen and Hou (2022) proved the same type of $C^{1,\alpha}$ blowup for the 3D axisymmetric Euler equations with boundary (inside a periodic cylinder), realising the Hou–Luo blowup scenario numerically proposed in 2014. Subsequent work by Córdoba, Martínez-Zoroa, and Zheng (2025, Annals of PDE) showed that the singularity can be formed from initial data in $C^\infty(\mathbb{R}^3 \setminus \{0\}) \cap C^{1,\alpha}$, with non-smoothness at a single point, a further step toward the smooth case.

The 2025 Breakthrough: Smooth Blowup with Boundary #

The most significant recent development is the following result, which provides a rigorous proof of finite-time singularity from smooth initial data.

Theorem (Chen–Hou, PNAS 2025)

There exists a family of smooth, finite-energy initial data for the 3D axisymmetric Euler equations in a smooth bounded domain (periodic cylinder) such that the corresponding solutions develop a finite-time singularity. The blowup is nearly self-similar and occurs at the intersection of the boundary $r=1$ and the symmetry plane $z=0$.

The paper, contributed by Thomas Hou and published in PNAS in June 2025 (reviewed by Caflisch, Gómez-Serrano, Sverak, and Tao), provides a computer-assisted proof. The strategy is to:

construct a numerical approximate self-similar blowup profile via the dynamical rescaling formulation,
prove rigorously that the true solution remains close to this profile using energy estimates with carefully verified error bounds (computed with interval arithmetic), and
conclude nonlinear stability of the blowup via a bootstrap argument.
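The interval-arithmetic step can be illustrated in miniature. The sketch below is not the authors' verification code; the `Interval` class and the `enclose` helper are hypothetical names introduced here, and it assumes Python 3.9 or later for `math.nextafter`. It shows the basic idea: every floating-point operation is widened outward, so the exact real-number result is guaranteed to lie inside the computed interval.

```python
import math
from fractions import Fraction


class Interval:
    """Closed interval [lo, hi] of floats with outward rounding (toy version)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # Float addition is rounded to nearest, so widening each endpoint by
        # one ulp guarantees the exact sum of any two members is enclosed.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(products), -math.inf),
                        math.nextafter(max(products), math.inf))

    def contains(self, exact):
        # Compare exactly, using rational arithmetic on the endpoints.
        return Fraction(self.lo) <= exact <= Fraction(self.hi)


def enclose(x):
    """Interval reaching one float below and one float above x."""
    return Interval(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))


# Enclose 1/10 * 3/10 + 2/10 rigorously, although none of these is a binary float.
result = enclose(0.1) * enclose(0.3) + enclose(0.2)
print(result.lo, result.hi, result.contains(Fraction(23, 100)))
```

A computer-assisted proof applies the same principle to far larger computations, typically with a dedicated interval library rather than a hand-rolled class.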

This resolves the problem affirmatively in the setting of smooth data and a smooth bounded domain. The boundary plays a crucial role: it creates an antisymmetric flow pattern driving azimuthal vorticity toward a critical ring, generating intense vortex stretching at a hyperbolic saddle point on the wall.

The remaining open case. The problem in $\mathbb{R}^3$ (or on the periodic torus $\mathbb{T}^3$) without boundary remains open. It is not known whether smooth initial data in free space can produce a singularity, or whether the absence of a boundary provides a genuine stabilising mechanism.

Research Directions #

1. Removing the Boundary #

The most pressing open question is whether the Chen–Hou construction can be extended to $\mathbb{R}^3$ or $\mathbb{T}^3$. The boundary in the 2025 result acts as a geometric catalyst: it enforces a no-flow condition that concentrates vorticity at a specific ring on the wall. Without a boundary, the antisymmetric flow structure that drives the singularity must be sustained entirely by the initial data and the nonlinear dynamics. Whether a comparable mechanism can persist in free space, without the reflective constraint of the wall, is the central open question.

2. Self-Similar Blowup in Full 3D #

All current singularity results are for axisymmetric flows, which reduce the problem from 3 spatial dimensions to 2 (the $rz$-plane). In full 3D, the angular variable $\theta$ is active, and perturbations in the azimuthal direction can either stabilise or destabilise the singularity. Elgindi, Ghoul, and Masmoudi (2021) proved stability of the $C^{1,\alpha}$ blowup under axisymmetric perturbations. Whether the singularity survives fully 3D (non-axisymmetric) perturbations, a question Elgindi posed as open, is crucial: a blowup that is destroyed by any non-symmetric perturbation has limited physical relevance.

3. Quantitative Vortex Stretching and the Role of Geometry #

The BKM criterion and the Constantin–Fefferman–Majda theorem both express the same idea from opposite directions: blowup is controlled by the magnitude and geometry of the vorticity. Current research asks whether a quantitative version can be made sharp. Specifically: if the vorticity direction $\hat\omega$ becomes Hölder-continuous but not Lipschitz, does blowup necessarily follow? Or is there a finer scale-invariant quantity, perhaps involving the Hessian of the velocity or the curvature of vortex lines, that governs the problem?

4. Weak Solutions and Non-Uniqueness #

Separate from the question of whether smooth solutions blow up is the question of what happens after a potential singularity. De Lellis and Székelyhidi (2009–2013) proved that the Euler equations have infinitely many weak $L^\infty$ solutions for generic initial data, via convex integration. Isett (2018) proved that weak solutions can dissipate energy, confirming Onsager’s 1949 conjecture. These results show that the solution concept must be carefully chosen. After a smooth blowup, the system likely enters a regime of non-unique weak solutions, and identifying the physically relevant selection criterion (entropy conditions, vanishing viscosity, the $h$-principle) is a major open problem.

5. Vanishing Viscosity and the Navier–Stokes Connection #

The Navier–Stokes equations add a viscous term $\nu \Delta u$ to the right-hand side. For any $\nu > 0$, global regularity of Navier–Stokes in 3D is itself open (the Clay Millennium Problem). For the zero-viscosity limit $\nu \to 0$, the central question is whether Navier–Stokes solutions converge to Euler solutions uniformly in time, a question tied to boundary layer behaviour (the Prandtl conjecture) and to the regularity of the Euler solution. If Euler develops a singularity at time $T^*$, the behaviour of Navier–Stokes solutions near $T^*$ as $\nu \to 0$ is completely unknown.

References #

Euler, L. (1757). Principes généraux du mouvement des fluides. Mémoires de l’Académie des Sciences de Berlin, 11, 274–315.
Beale, J. T., Kato, T., & Majda, A. (1984). Remarks on the breakdown of smooth solutions for the 3-D Euler equations. Communications in Mathematical Physics, 94(1), 61–66.
Constantin, P., Fefferman, C., & Majda, A. J. (1996). Geometric constraints on potentially singular solutions for the 3-D Euler equations. Communications in Partial Differential Equations, 21(3–4), 559–571.
Elgindi, T. M. (2021). Finite-time singularity formation for $C^{1,\alpha}$ solutions to the incompressible Euler equations on $\mathbb{R}^3$. Annals of Mathematics, 194(3), 647–727.
Elgindi, T. M., Ghoul, T.-E., & Masmoudi, N. (2021). On the stability of self-similar blow-up for $C^{1,\alpha}$ solutions to the incompressible Euler equations. Cambridge Journal of Mathematics, 9(4), 1035–1075.
Chen, J. & Hou, T. Y. (2023). Finite time blowup of 2D Boussinesq and 3D Euler equations with $C^{1,\alpha}$ velocity and boundary. Communications in Mathematical Physics, 383, 4827–4890.
Chen, J. & Hou, T. Y. (2025). Singularity formation in 3D Euler equations with smooth initial data and boundary. Proceedings of the National Academy of Sciences, 122(27). https://doi.org/10.1073/pnas.2500940122
Córdoba, D., Martínez-Zoroa, L., & Zheng, F. (2025). Finite time singularities to the 3D incompressible Euler equations for solutions in $C^\infty(\mathbb{R}^3\setminus\{0\})\cap C^{1,\alpha}\cap L^2$. Annals of PDE. https://doi.org/10.1007/s40818-025-00214-2
Isett, P. (2018). A proof of Onsager’s conjecture. Annals of Mathematics, 188(3), 871–963.
Majda, A. J. & Bertozzi, A. L. (2002). Vorticity and Incompressible Flow. Cambridge University Press.

$C^r$ Stability Conjecture

Thu, 28 May 2026 00:00:00 +0000

Structural stability is a global topological property: a dynamical system is structurally stable if all nearby systems have the same orbit structure, up to continuous reparametrisation. Hyperbolicity is a local differential property: the tangent bundle over the recurrent set splits into uniformly contracting and expanding directions. That these two conditions should be equivalent is one of the deepest principles in smooth dynamics.

Conjecture ($C^r$ Stability Conjecture, Palis–Smale, ~1970)

Let $M$ be a closed smooth manifold and $r \geq 1$. If $f \in \mathrm{Diff}^r(M)$ is $C^r$-structurally stable, then $f$ is hyperbolic, i.e., it satisfies Axiom A and the Strong Transversality Condition.

The problem is rated L3 on UnsolvedMath and sits at the heart of the global theory of smooth dynamical systems. The case $r = 1$ is resolved. The case $r \geq 2$ is open, and even basic consequences of structural stability that are elementary for $r = 1$ remain unknown for $r = 2$.

Key Definitions #

Structural stability. A diffeomorphism $f \in \mathrm{Diff}^r(M)$ is $C^r$-structurally stable if there exists a $C^r$-neighborhood $\mathcal{U}$ of $f$ such that every $g \in \mathcal{U}$ is topologically conjugate to $f$: there is a homeomorphism $h : M \to M$ with $h \circ f = g \circ h$. The system is therefore robust under $C^r$-small perturbations in the strongest possible sense: topology, not just orbit counts, is preserved.
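The conjugacy equation $h \circ f = g \circ h$ can be checked numerically on a classical pair of maps. The example below is only an illustration of the equation itself: the tent map and the logistic map are interval maps rather than diffeomorphisms of a closed manifold, and the function names are chosen here for the sketch. The homeomorphism $h(x) = \sin^2(\pi x/2)$ of $[0,1]$ conjugates the tent map to the logistic map $x \mapsto 4x(1-x)$.

```python
import math


def tent(x):
    """Tent map on [0, 1]."""
    return 2 * x if x < 0.5 else 2 - 2 * x


def logistic(x):
    """Logistic map x -> 4x(1 - x) on [0, 1]."""
    return 4 * x * (1 - x)


def h(x):
    """Candidate conjugacy h(x) = sin^2(pi x / 2), a homeomorphism of [0, 1]."""
    return math.sin(math.pi * x / 2) ** 2


# Check the conjugacy equation h(tent(x)) == logistic(h(x)) on a grid.
max_defect = max(abs(h(tent(i / 1000)) - logistic(h(i / 1000))) for i in range(1001))
print(f"largest defect on the grid: {max_defect:.2e}")
```

The defect is at the level of floating-point rounding, reflecting the identity $4\sin^2\theta\cos^2\theta = \sin^2 2\theta$. The two maps have the same orbit structure even though their formulas look unrelated.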

Axiom A. The diffeomorphism $f$ satisfies Axiom A if:

the non-wandering set $\Omega(f)$ is hyperbolic: there is a $Df$-invariant splitting $T_x M = E^s_x \oplus E^u_x$ over $\Omega(f)$ with uniform exponential contraction on $E^s$ and expansion on $E^u$;
the periodic points of $f$ are dense in $\Omega(f)$.
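A standard example where both conditions can be seen by hand is Arnold's cat map, the torus automorphism induced by the matrix $\begin{pmatrix}2 & 1\\ 1 & 1\end{pmatrix}$. The sketch below (variable names are chosen here for illustration) computes the contracting and expanding directions, which for a linear map are the same at every point.

```python
import math

# Arnold's cat map on the 2-torus is induced by the integer matrix A below.
A = ((2, 1), (1, 1))

trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Eigenvalues of a 2x2 matrix from its characteristic polynomial.
disc = math.sqrt(trace * trace - 4 * det)
lam_u = (trace + disc) / 2   # expanding eigenvalue, > 1
lam_s = (trace - disc) / 2   # contracting eigenvalue, in (0, 1)


def eigenvector(lam):
    """Unit eigenvector of A for eigenvalue lam, from (A - lam I) v = 0."""
    v = (A[0][1], lam - A[0][0])
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm)


E_u, E_s = eigenvector(lam_u), eigenvector(lam_s)
print(f"expansion rate {lam_u:.4f} along E^u = {E_u}")
print(f"contraction rate {lam_s:.4f} along E^s = {E_s}")
```

Because the determinant is $1$, the two rates are reciprocal, and the splitting $E^s \oplus E^u$ is invariant with uniform rates over the whole torus.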

Strong Transversality Condition (STC). For every $x, y \in \Omega(f)$, the stable manifold $W^s(x)$ and the unstable manifold $W^u(y)$ intersect transversally. Tangential intersections, namely homoclinic or heteroclinic tangencies, are forbidden.

Together, Axiom A and the STC constitute what is usually meant by saying $f$ is hyperbolic in the sense of the stability conjecture.

The Two Directions #

The conjecture, as an equivalence, has an easy direction and a hard direction.

Structural stability follows from hyperbolicity (the easy direction). Robbin (1971) proved this for $C^2$ diffeomorphisms; Robinson (1976) extended it to $C^1$. Both proofs use the implicit function theorem on an appropriate space of conjugacies, and work for all $r \geq 1$ since Axiom A + STC is the hypothesis.

Theorem (Robbin 1971, Robinson 1976)

For every $r \geq 1$, if $f \in \mathrm{Diff}^r(M)$ satisfies Axiom A and the Strong Transversality Condition, then $f$ is $C^r$-structurally stable.

Hyperbolicity follows from structural stability (the hard direction). This is the conjecture itself. It requires understanding what structural stability forces on the dynamics, ruling out every non-hyperbolic mechanism compatible with stability. This is where the difficulty lies, and where the gap between $r = 1$ and $r \geq 2$ opens.

The $C^1$ Case: Mañé’s Theorem #

The $C^1$ stability conjecture was fully proved by Mañé in 1987.

Theorem (Mañé, 1987)

Every $C^1$-structurally stable diffeomorphism of a closed manifold satisfies Axiom A and the Strong Transversality Condition.

The proof, published in Publ. Math. IHÉS 66 (1987), 161–210, is a tour de force of $C^1$ perturbation theory. It rests on several tools that are available only in the $C^1$ topology:

Pugh’s $C^1$ closing lemma (1967): Given a non-wandering point $x$ of $f$, one can make an arbitrarily small $C^1$ perturbation of $f$ to create a periodic orbit passing near $x$. This is the essential mechanism for showing that periodic points are dense in $\Omega(f)$.
Mañé’s ergodic closing lemma (1982): A more refined version that controls the Lyapunov exponents of the created periodic orbit, allowing the construction of hyperbolic periodic points that shadow the orbit of an ergodic measure.
Franks’ lemma (1971): Any small perturbation of the derivative $Df$ along a periodic orbit can be realised by a $C^1$-small perturbation of $f$ that preserves the orbit, allowing one to test whether a given splitting is genuinely hyperbolic or can be destroyed by a small $C^1$ perturbation.

The strategy is to assume structural stability and use these tools to show, step by step, that the non-wandering set must be hyperbolic and that tangencies cannot persist. Mañé had proved the surface case ($\dim M = 2$, $r = 1$) earlier, with the full higher-dimensional result completed in the 1987 paper. Aoki (1992) and Hayashi (1992) subsequently settled the closely related Mañé conjecture on the $C^1$ interior of the set of diffeomorphisms with all hyperbolic periodic points.

The Wall at $r \geq 2$ #

The $C^r$ case for $r \geq 2$ is not merely an incremental extension. The tools that power Mañé’s proof are fundamentally $C^1$ phenomena.

The $C^r$ closing lemma is open for $r \geq 2$. Pugh’s proof does not extend to $r \geq 2$: Gutierrez showed that the local perturbation argument used for $C^1$ does not work in the $C^2$ topology. A $C^r$ closing lemma is available only for specific classes of diffeomorphisms:

Conservative (volume-preserving) diffeomorphisms on surfaces: Asaoka–Irie ($C^\infty$, 2015), Cristofaro-Gardiner–Prasad–Zhang (2023).
Partially hyperbolic diffeomorphisms with one-dimensional center bundle (all $r \geq 2$ including $r = \infty$): Gan–Shi (2022) and the follow-up $C^r$-chain closing lemma of Shi–Wang (Ergodic Theory Dynam. Syst. 44, 2024).

In the absence of a general $C^r$ closing lemma, the first step of Mañé’s proof, showing that periodic points are dense in $\Omega(f)$ under $C^r$ structural stability, is not known for $r \geq 2$.

Mañé himself underscored this gap. In the 1987 paper, immediately after the proof of Theorem A, he writes that for $r > 1$ “not even [being] known whether a $C^2$ structurally stable diffeomorphism has at least one periodic point, it seems, to say the least, difficult to prove that they are dense.”

Franks’ lemma also fails for $r \geq 2$. Controlling linear maps along periodic orbits requires $C^1$ perturbations; in higher regularity the ambient perturbation must be smooth and the constraints on higher derivatives can prevent the desired linear behaviour from being achieved.

Research Directions #

1. The $C^r$ Closing Lemma for General Diffeomorphisms #

The most direct path to the $C^r$ stability conjecture passes through a general $C^r$ closing lemma. For $r \geq 2$ this asks: given any non-wandering point of a $C^r$ diffeomorphism, can one make an arbitrarily small $C^r$ perturbation to close the orbit? Answering this in the affirmative for all closed manifolds and all $r \geq 2$ would be a landmark result, and would immediately advance the stability conjecture. The recent progress in conservative surface dynamics (Cristofaro-Gardiner et al., 2023) and partially hyperbolic settings shows the question is not hopeless, but the general dissipative case remains untouched.

2. The Surface Case $\dim M = 2$, $r \geq 2$ #

On surfaces the dynamics is simpler: the non-wandering set has lower-dimensional structure, and the absence of a center bundle means “partially hyperbolic” reduces to “hyperbolic.” Mañé settled the surface case for $r = 1$. The $C^r$ stability conjecture for surfaces and $r \geq 2$ is already an important open target and may be the most accessible subcase. Recent $C^\infty$ closing lemmas for conservative surface diffeomorphisms (Asaoka–Irie) suggest that the conservative surface case may be reachable.

3. Partially Hyperbolic Diffeomorphisms #

A diffeomorphism is partially hyperbolic if the tangent bundle splits as $TM = E^{ss} \oplus E^c \oplus E^{uu}$ with uniform contraction on $E^{ss}$, uniform expansion on $E^{uu}$, and an intermediate “center” bundle $E^c$. For these systems, Gan–Shi (2022) and Shi–Wang (2024) have established $C^r$ closing and chain-closing lemmas when $\dim E^c = 1$. The question is whether $C^r$-structural stability of a partially hyperbolic diffeomorphism forces the center bundle to also become hyperbolic, that is, whether partial hyperbolicity implies full hyperbolicity under stability.

4. The Palis Global Conjecture #

Palis proposed that the complement of the hyperbolic diffeomorphisms is exactly the closure of systems exhibiting homoclinic tangencies or heteroclinic cycles. This is a positive description of non-hyperbolic dynamics, and is a strengthening of the $C^r$ stability conjecture (it would also characterise what structural stability forbids). In $C^1$ topology this programme is largely complete through Bonatti–Crovisier’s connecting lemma (2004) and related results. For $r \geq 2$ it is wide open, and progress on the Palis conjecture in $C^r$ would likely resolve the stability conjecture as a corollary.

5. Flows and the Vector Field Analogue #

The stability conjecture has a natural analogue for $C^r$ vector fields: a $C^r$-structurally stable flow should satisfy Axiom A and the strong transversality condition. For $r = 1$ this is also proved. For $r \geq 2$ it is open. The vector field setting introduces additional complications from singular points (zeros of the vector field): Labarca–Pacifico showed that on manifolds with boundary, structurally stable flows can fail Axiom A, so the correct formulation may need adaptation. Progress on the diffeomorphism case would likely shed light on the flow case as well.

References #

Palis, J. & Smale, S. (1970). Structural stability theorems. Proc. Sympos. Pure Math., 14, 223–231.
Robbin, J. W. (1971). A structural stability theorem. Annals of Mathematics, 94(2), 447–493.
Robinson, C. (1976). Structural stability of $C^1$ diffeomorphisms. Journal of Differential Equations, 22(1), 28–73.
Mañé, R. (1987). A proof of the $C^1$ stability conjecture. Publications Mathématiques de l’IHÉS, 66, 161–210.
Aoki, N. (1992). The set of Axiom A diffeomorphisms with no cycles. Bol. Soc. Brasil. Mat., 23(1–2), 21–65.
Hayashi, S. (1992). Diffeomorphisms in $\mathcal{F}^1(M)$ satisfy Axiom A. Ergodic Theory Dynam. Systems, 12(2), 233–253.
Gan, S. & Shi, Y. (2022). $C^r$-closing lemma for partially hyperbolic diffeomorphisms with 1D-center bundle. Journal of Differential Equations, 334, 337–363.
Shi, Y. & Wang, X. (2024). $C^r$-chain closing lemma for certain partially hyperbolic diffeomorphisms. Ergodic Theory Dynam. Systems, 44(7), 1923–1944.
Bonatti, C. & Crovisier, S. (2004). Récurrence et généricité. Inventiones Mathematicae, 158(1), 33–104.
Berger, P. (2017). Lectures on structural stability in dynamics. arXiv:1703.00092.

Inequality for Square-Summable Complex Series

Thu, 28 May 2026 00:00:00 +0000

Some inequalities look formidable until the right decomposition makes them transparent. The conjecture below, posed by Zoltan Retkes on the Open Problem Garden in 2012 with a £10 prize attached, is one such case: once the dyadic structure of the positive integers is made explicit, the proof reduces to two classical facts.

Conjecture (Retkes, 2012), now proved

For all $\alpha = (\alpha_1, \alpha_2, \ldots) \in \ell^2(\mathbb{C})$, $$\sum_{n \geq 1} |\alpha_n|^2 \geq \frac{6}{\pi^2} \sum_{k \geq 0} \left|\, \sum_{l \geq 0} \frac{\alpha_{2^k(2l+1)}}{l+1} \,\right|^2.$$

The conjecture was confirmed by an anonymous comment on the problem page in November 2013. A self-contained proof and an extension to $\ell^p$ were subsequently published by Ibragimov and Salimova in Elemente der Mathematik 70 (2015), 79–81.

The Dyadic Decomposition #

The index $2^k(2l+1)$ running over $k \geq 0$ and $l \geq 0$ is not arbitrary: it encodes a canonical partition of the positive integers. Every $n \in \mathbb{N}^+$ factors uniquely as $$n = 2^k \cdot r, \qquad k \geq 0,\quad r \text{ odd positive},$$ where $k = v_2(n)$ is the 2-adic valuation of $n$ and $r = n/2^k$ is its odd part. Writing $r = 2l+1$ gives the bijection $\mathbb{N}_0 \times \mathbb{N}_0 \to \mathbb{N}^+$, $(k, l) \mapsto 2^k(2l+1)$. In particular the sets $$A_k = \{2^k(2l+1) : l \geq 0\} = \{2^k, 3 \cdot 2^k, 5 \cdot 2^k, \ldots\}$$ form a partition of $\mathbb{N}^+$. Explicitly: $A_0 = \{1, 3, 5, 7, \ldots\}$ (odd numbers), $A_1 = \{2, 6, 10, 14, \ldots\}$ (twice an odd number), and so on. This partition is the key structural fact behind the proof.
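The bijection is easy to compute and to sanity-check on an initial segment of the integers. In the sketch below, the function name `dyadic_index` is introduced here for illustration.

```python
def dyadic_index(n):
    """Return (k, l) with n == 2**k * (2*l + 1) for an integer n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    k = 0
    while n % 2 == 0:   # strip factors of 2 to find the 2-adic valuation
        n //= 2
        k += 1
    return k, (n - 1) // 2   # the remaining odd part is 2l + 1


N = 64
pairs = [dyadic_index(n) for n in range(1, N + 1)]
# Distinct integers give distinct pairs, and each pair reconstructs its integer.
assert len(set(pairs)) == N
assert all(2 ** k * (2 * l + 1) == n for n, (k, l) in enumerate(pairs, start=1))

# Members of the layer A_1 up to 20.
print([n for n in range(1, 21) if dyadic_index(n)[0] == 1])  # → [2, 6, 10, 14, 18]
```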

Proof #

The argument has two ingredients: the Basel sum $\sum_{l \geq 0}(l+1)^{-2} = \pi^2/6$, and the Cauchy–Schwarz inequality in $\ell^2(\mathbb{C})$.

Define two sequences in $\ell^2(\mathbb{C})$: $$x = \left(1,\, \tfrac{1}{2},\, \tfrac{1}{3},\, \ldots\right), \qquad y_k = \left(\alpha_{2^k},\, \alpha_{3 \cdot 2^k},\, \alpha_{5 \cdot 2^k},\, \ldots\right) \quad (k \geq 0).$$

The inner sum in the conjecture is exactly the $\ell^2$ inner product $\langle x, y_k \rangle$: $$\sum_{l \geq 0} \frac{\alpha_{2^k(2l+1)}}{l+1} = \langle x, y_k \rangle.$$

Step 1: Apply Cauchy–Schwarz. For each $k$,

$$|\langle x, y_k \rangle|^2 \leq |x|_2^2 \cdot |y_k|_2^2.$$

Summing over $k \geq 0$,

$$\sum _{k \geq 0} |\langle x, y _k \rangle|^2 \leq |x| _2^2 \sum _{k \geq 0} |y _k| _2^2.$$

Step 2: Evaluate using the Basel problem and the partition. The Basel problem gives $$|x| _2^2 = \sum _{l \geq 0} \frac{1}{(l+1)^2} = \frac{\pi^2}{6}.$$

Since the sets $A_k$ partition $\mathbb{N}^+$, $$\sum _{k \geq 0} |y_k|_2^2 = \sum _{k \geq 0} \sum _{l \geq 0} |\alpha _{2^k(2l+1)}|^2 = \sum _{n \geq 1} |\alpha_n|^2.$$

Combining both steps, $$\sum_{k \geq 0} \left|\sum_{l \geq 0} \frac{\alpha_{2^k(2l+1)}}{l+1}\right|^2 \leq \frac{\pi^2}{6} \sum_{n \geq 1} |\alpha_n|^2,$$ which is the inequality with the $\frac{6}{\pi^2}$ factor moved to the other side.

Sharpness of the Constant #

The constant $6/\pi^2$ is the best possible. To see this, consider the truncated sequence $\alpha^{(N)}$ defined by $\alpha^{(N)}_{2l+1} = 1/(l+1)$ for $l = 0, 1, \ldots, N-1$ and $\alpha^{(N)}_n = 0$ otherwise. Then:

The left-hand side equals $\displaystyle\sum_{l=0}^{N-1} \frac{1}{(l+1)^2} \to \frac{\pi^2}{6}$.
The only non-zero contribution to the right-hand side comes from $k = 0$ (since all non-zero indices are odd, i.e. in $A_0$), giving $\displaystyle\frac{6}{\pi^2}\left(\sum_{l=0}^{N-1} \frac{1}{(l+1)^2}\right)^2 \to \frac{6}{\pi^2} \cdot \frac{\pi^4}{36} = \frac{\pi^2}{6}$.

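The ratio of the right-hand side to the left-hand side therefore tends to $1$ as $N \to \infty$, so no larger constant than $6/\pi^2$ can hold universally. Equality is in fact attained: the limiting sequence $\alpha_{2l+1} = 1/(l+1)$, $\alpha_n = 0$ for even $n$, belongs to $\ell^2(\mathbb{C})$ (its squared norm is $\pi^2/6$), and both sides then equal $\pi^2/6$. More generally, by the equality case of Cauchy–Schwarz, equality holds exactly when every $y_k$ is a scalar multiple of $x$.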
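Both the inequality and the sharpness computation can be checked numerically on finitely supported sequences. The sketch below is a sanity check, not a proof; the helper name `retkes_sides` is introduced here for illustration.

```python
import math
import random


def retkes_sides(alpha):
    """Return (lhs, rhs) of the inequality for a finitely supported sequence.

    alpha[n - 1] holds alpha_n; entries beyond len(alpha) are treated as zero.
    """
    n_max = len(alpha)
    lhs = sum(abs(a) ** 2 for a in alpha)
    total = 0.0
    k = 0
    while 2 ** k <= n_max:
        inner = 0j
        l = 0
        while 2 ** k * (2 * l + 1) <= n_max:
            inner += alpha[2 ** k * (2 * l + 1) - 1] / (l + 1)
            l += 1
        total += abs(inner) ** 2
        k += 1
    return lhs, 6 / math.pi ** 2 * total


random.seed(0)
alpha = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)]
lhs, rhs = retkes_sides(alpha)
print(f"random sequence: lhs = {lhs:.3f}, rhs = {rhs:.3f}")

# Truncated sequence from the sharpness argument: alpha_{2l+1} = 1/(l+1).
N = 2000
extremal = [0.0] * (2 * N - 1)
for l in range(N):
    extremal[2 * l] = 1 / (l + 1)   # list index 2l holds alpha_{2l+1}
lhs_e, rhs_e = retkes_sides(extremal)
print(f"truncated extremal ratio rhs/lhs = {rhs_e / lhs_e:.6f}")
```

For a generic random sequence the two sides are far apart, while for the truncated extremal sequence the ratio is already close to $1$ at $N = 2000$.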

Extension to $\ell^p$ #

The Cauchy–Schwarz inequality used above is a special case of Hölder’s inequality, and the proof generalises immediately.

Theorem (Ibragimov–Salimova, 2015)

Let $p, q \in (1,\infty)$ with $\tfrac{1}{p} + \tfrac{1}{q} = 1$. For all $\alpha = (\alpha_1, \alpha_2, \ldots) \in \ell^p(\mathbb{C})$ and $x = (x_0, x_1, \ldots) \in \ell^q(\mathbb{C})$, $$\sum_{n \geq 1} |\alpha_n|^p \geq \left(\sum_{l \geq 0} |x_l|^q\right)^{-p/q} \sum_{k \geq 0} \left|\sum_{l \geq 0} x_l\, \alpha_{2^k(2l+1)}\right|^p.$$

Retkes’s original inequality is the case $p = q = 2$ and $x_l = 1/(l+1)$, where $(\sum_{l\geq 0}|x_l|^2)^{-1} = 6/\pi^2$ by the Basel problem.
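The proof follows the $\ell^2$ argument line by line, with Hölder's inequality in place of Cauchy–Schwarz. As a sketch of the steps: for each fixed $k$, Hölder's inequality applied to $x$ and the layer $(\alpha_{2^k(2l+1)})_{l \geq 0}$, raised to the power $p$, gives

$$\left|\sum_{l \geq 0} x_l\, \alpha_{2^k(2l+1)}\right|^p \leq \left(\sum_{l \geq 0} |x_l|^q\right)^{p/q} \sum_{l \geq 0} |\alpha_{2^k(2l+1)}|^p.$$

Summing over $k \geq 0$ and using that the sets $A_k$ partition $\mathbb{N}^+$,

$$\sum_{k \geq 0} \left|\sum_{l \geq 0} x_l\, \alpha_{2^k(2l+1)}\right|^p \leq \left(\sum_{l \geq 0} |x_l|^q\right)^{p/q} \sum_{n \geq 1} |\alpha_n|^p,$$

and dividing by the first factor on the right (assuming $x \neq 0$) yields the stated inequality.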

Remarks on Structure #

The role of the dyadic partition. The sets $A_k$ are the dyadic layers of $\mathbb{N}^+$: each integer sits in exactly one layer determined by its 2-adic valuation. This structure also appears in the theory of Hardy spaces, where the dyadic martingale decomposition underpins the $H^1$–BMO duality, and in wavelets, where the dyadic scaling of the real line organises the multiresolution analysis. The inequality can be read as a norm comparison between the $\ell^2$ norm and a weighted sum over dyadic layers.

Relation to the Basel problem. The constant $6/\pi^2$, the reciprocal of $\zeta(2)$, appears here because the weight sequence $1/(l+1)$ used in the inner sum is precisely the harmonic sequence, whose $\ell^2$ norm squared is $\zeta(2)$. Any other weight sequence $x \in \ell^2(\mathbb{C})$ would produce the analogous inequality with $|x|_2^{-2}$ in place of $6/\pi^2$.

The inequality as a rearrangement estimate. The right-hand side reorganises the entries of $\alpha$ by their dyadic layer and applies a weighted average within each layer. The inequality says the total $\ell^2$ energy cannot be less than $6/\pi^2$ times the energy of this rearranged, averaged version of the sequence, a quantitative statement about how averaging destroys energy.

Further Questions #

While the original conjecture is settled, several natural variants remain.

Question 1

What is the sharp constant in the inequality if the dyadic partition is replaced by the partition induced by a prime $p \neq 2$, i.e. by the sets $A_k^{(p)} = \{p^k m : \gcd(m, p) = 1\}$? The same argument applies with $x_l = w_l$ for any weight sequence $w \in \ell^2(\mathbb{C})$, but the resulting constant depends on $|w|_2$ and the choice of weight, not on $\pi$.

Question 2

The inner sum $\sum_{l \geq 0} \alpha_{2^k(2l+1)}/(l+1)$ averages the entries in layer $A_k$ with the harmonic weights. What happens if the harmonic weight $1/(l+1)$ is replaced by a weight $w(l)$ depending on the position $l$ within the layer in a more general way, for instance $w(l) = (l+1)^{-s}$ for $s > 1/2$? The sharp constant would then involve $\zeta(2s)$ instead of $\zeta(2) = \pi^2/6$.

Question 3

For $p = 1$ the Ibragimov–Salimova theorem requires $q = \infty$, and the Hölder inequality takes a different form. Does an analogue of Retkes’s inequality hold for $\alpha \in \ell^1(\mathbb{C})$, and if so, what is the sharp constant?

References #

Ibragimov, Z. O. & Salimova, D. F. (2015). On an inequality in $\ell_p(\mathbb{C})$ involving Basel problem. Elemente der Mathematik, 70(2), 79–81. https://ems.press/content/serial-article-files/45532
Retkes, Z. (2012). Inequality for square summable complex series. Open Problem Garden. http://www.openproblemgarden.org/op/inequality_for_square_summable_complex_series
Benko, D. & Molokach, J. (2013). The Basel problem as a rearrangement of series. College Mathematics Journal, 44(3), 171–176.
Ritelli, D. (2013). Another proof of $\zeta(2) = \pi^2/6$ using double integrals. American Mathematical Monthly, 120(7), 642–645.

Recent Advances in Neural Network Optimization for LLM Training

Thu, 28 May 2026 00:00:00 +0000

The optimization landscape for LLM training looks very different from two years ago. AdamW still dominates production runs, but a wave of research is eroding that dominance from multiple angles simultaneously: matrix-aware optimizers, horizon-free schedulers, a sharply revised understanding of µP, and communication-efficient distributed methods. This post synthesizes 18 recent papers across five interconnected fronts.

The unifying thread is an active re-examination of long-held assumptions, from whether gradient geometry matters, to what µP is actually doing, to whether weight decay is a regularizer at all.

1. Muon and Non-Euclidean Optimizers #

Background #

Muon (MomentUm Orthogonalized by Newton-Schulz) applies a gradient orthogonalization step via a Newton-Schulz iteration before each weight update. Rather than treating each parameter as an independent scalar (as Adam does), Muon recognizes that weight matrices have geometric structure and optimizes them accordingly, performing steepest descent under the spectral norm.

The core Newton-Schulz iteration, which runs stably in bfloat16 on tensor cores, is:

$$ X \leftarrow aX + b(XX^\top)X + c(XX^\top)^2 X $$

with coefficients $a = 3.4445$, $b = -4.7750$, $c = 2.0315$. In PyTorch:


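import torch


def newtonschulz5(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2D tensor G with a quintic Newton-Schulz iteration."""
    a, b, c = (3.4445, -4.7750, 2.0315)
    X = G.bfloat16()
    # Out-of-place division: avoids modifying G when it is already bfloat16.
    # Frobenius normalization bounds the spectral norm of X by 1.
    X = X / (X.norm() + eps)
    if G.size(0) > G.size(1):
        X = X.T  # use the wide orientation so X @ X.T is the smaller Gram matrix
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    if G.size(0) > G.size(1):
        X = X.T
    return X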
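To see what the iteration does, note that in exact arithmetic, if $X = U\Sigma V^\top$ then $aX + b(XX^\top)X + c(XX^\top)^2X = U(a\Sigma + b\Sigma^3 + c\Sigma^5)V^\top$, so each singular value evolves independently under the scalar map $s \mapsto as + bs^3 + cs^5$. The sketch below iterates that scalar map alone; it ignores bfloat16 rounding, and the function name is introduced here for illustration.

```python
def newton_schulz_scalar(s, steps=5, coeffs=(3.4445, -4.7750, 2.0315)):
    """Apply s -> a*s + b*s**3 + c*s**5 repeatedly to one singular value."""
    a, b, c = coeffs
    for _ in range(steps):
        s = a * s + b * s ** 3 + c * s ** 5
    return s


for s0 in (0.05, 0.2, 0.5, 1.0):
    print(f"sigma = {s0:4.2f} -> {newton_schulz_scalar(s0):.3f}")
```

With these coefficients the singular values do not converge to exactly $1$; after five steps, inputs between $0.05$ and $1$ land in a band around $1$ (roughly $0.7$ to $1.2$). The large linear coefficient $a$ trades that accuracy for fast inflation of small singular values.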

A ready-to-use implementation lives at KellerJordan/Muon. Install via:

pip install git+https://github.com/KellerJordan/Muon

Muon is intended for hidden-layer matrix weights only. Embeddings, the output head, and scalar/vector parameters should still use AdamW:


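from muon import MuonWithAuxAdam

# NOTE: `model` is assumed to expose `.blocks` and `.lm_head`. The keyword
# names below follow this post; check them against the README of the Muon
# version you install, since the interface may differ between releases.

hidden_matrix_params = [
    p for n, p in model.blocks.named_parameters()
    if p.ndim >= 2 and "embed" not in n
]
embed_params = [p for n, p in model.named_parameters() if "embed" in n]
# Exclude embedding parameters here so that no tensor lands in two groups
scalar_params = [
    p for n, p in model.named_parameters()
    if p.ndim < 2 and "embed" not in n
]
head_params = [model.lm_head.weight]

optimizer = MuonWithAuxAdam(
    muon_params=hidden_matrix_params,
    lr=0.02,
    adamw_params=embed_params + scalar_params + head_params,
    adamw_lr=3e-4,
    adamw_wd=0.1,
)
# LR has built-in muP scaling, so no retuning is needed as you scale up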

Scaling Muon: the Moonlight result #

MoonshotAI’s Moonlight (3B/16B-parameter MoE, trained on 5.7T tokens) provides the strongest evidence yet that Muon scales to real LLM training (arXiv:2502.16982, GitHub). Two fixes are needed to make Muon work beyond small scale:

Weight decay: without it, weight and output RMS norms grow until they overflow bfloat16.
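Per-parameter update scale adjustment: rescaling each matrix update so that its RMS matches that of a typical AdamW update. The Moonlight paper does this with a factor of $0.2\sqrt{\max(A, B)}$ for an $A \times B$ weight matrix.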

With these in place, scaling-law experiments indicate roughly 2× computational efficiency compared to AdamW at compute-optimal settings.


# Train a Qwen-like dense model with Muon (from Moonlight repo)
python3 examples/toy_train.py \
 --model qwen --optimizer muon \
 --dataset openwebtext-100k \
 --hidden_size 896 --lr 1e-3

A further efficiency variant is Flash-Muon, which reimplements the Newton-Schulz inner loop using a custom Triton kernel that exploits the symmetry of the $XX^\top$ computation, halving the effective FLOP count.

Theoretical foundations #

Kovalev (2025) shows in Understanding Gradient Orthogonalization via Non-Euclidean Trust-Region Optimization that the orthogonalized gradient update can be interpreted as a first-order trust-region method where the trust-region is defined in terms of the matrix spectral norm. This framework unifies Muon with normalized SGD and signSGD with momentum.
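The link between orthogonalization and the spectral norm can be made explicit with a short worked identity. Writing $\|\cdot\|_{\mathrm{op}}$ for the spectral norm and $\|\cdot\|_*$ for the nuclear norm (notation introduced here), and taking a reduced singular value decomposition $G = U\Sigma V^\top$ of the gradient,

$$\max_{\|\Delta\|_{\mathrm{op}} \leq 1} \langle G, \Delta \rangle = \max_{\|\Delta\|_{\mathrm{op}} \leq 1} \operatorname{tr}(G^\top \Delta) = \sum_i \sigma_i(G) = \|G\|_*, \qquad \text{attained at } \Delta = UV^\top.$$

The linear model of the loss is therefore decreased the most, among all updates of spectral norm at most $\eta$, by the step $-\eta\, UV^\top$. This is the orthogonalized gradient that the Newton-Schulz iteration approximates.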

Pethick et al. (2025) propose Scion, a family of LMO-based algorithms that subsumes Muon, AdamW, and normalized SGD under a single framework (arXiv:2502.07529). By choosing an explicit norm for deep architectures, Scion also achieves hyperparameter transferability across model widths.

The Polar Express (Amsel et al., 2025) replaces Newton-Schulz with a minimax polar decomposition, solving a minimax problem at each iteration to minimize worst-case error. It converges faster than Newton-Schulz in both early and asymptotic stages, while remaining numerically stable in bfloat16.

Challenging the geometric narrative #

Despite the theoretical appeal, Shumaylov et al. (2026) mount a systematic challenge in Muon is Not That Special: Random or Inverted Spectra Work Just as Well. They introduce:

Freon: a family of optimizers based on Schatten (quasi-)norms, interpolating between SGD and Muon. The best-performing Schatten parameter for GPT-2 lies in the quasi-norm regime, which no LMO-based optimizer can represent.
Kaon: replaces Muon’s singular values with random noise, yet still matches Muon’s validation loss on GPT-2.

Their key insight: performance is primarily controlled by two local quantities, alignment (how well the update direction aligns with the gradient) and descent potential (step-size optimality). Muon succeeds by guaranteeing step-size optimality, not by tracking an ideal geometry.

Optimizer	Core mechanism	Key claim
Muon	Newton-Schulz orthogonalization	~2× efficiency over AdamW at compute-optimal
Scion	LMO over norm-ball	Unifies Muon/Adam; HP transferable across widths
Polar Express	Minimax polar decomposition	Faster convergence; bfloat16-safe
Freon / Kaon	Schatten quasi-norms / random SVs	Geometry is irrelevant; alignment drives performance

2. Learning Rate Scheduling #

Linear decay is provably optimal #

Defazio et al. (2023/2024) close a long-standing gap between theory and practice in Optimal Linear Decay Learning Rate Schedules and Further Refinements (arXiv:2310.07831). Under worst-case analysis, linear decay, setting $\eta_t \propto (1 - t/T)$, is the theoretically optimal schedule for a broad class of optimizers including SGD. Across 10 diverse benchmarks, it consistently outperforms cosine annealing.

$$ \eta_t = \eta_{\max} \cdot \left(1 - \frac{t}{T}\right) $$

# PyTorch built-in, the optimal default
scheduler = torch.optim.lr_scheduler.LinearLR(
 optimizer, start_factor=1.0, end_factor=0.0, total_iters=total_steps
)

The WSD cooldown phase #

The Warmup-Stable-Decay (WSD) scheduler separates training into distinct phases ending in a sharp LR drop. Dremov et al. (2025) analyse the cooldown phase specifically in Training Dynamics of the Cooldown Stage in WSD, finding:

Cooldown shapes that balance exploration and exploitation consistently outperform purely exploratory or exploitative alternatives.
There is substantial sensitivity to AdamW’s $\beta_2$ parameter during cooldown, and higher $\beta_2$ values yield consistent improvements.
Loss-landscape visualisations support the “river valley” perspective: the cooldown follows a narrow valley in parameter space.
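
The three phases of a WSD schedule can be written as a plain function of the step index. This is a minimal sketch, assuming a linear warmup and a linear cooldown; the helper name `wsd_lr`, the phase lengths, and the cooldown shape are illustrative assumptions and not the specific shapes compared by Dremov et al.

```python
def wsd_lr(step: int, total_steps: int, lr_max: float,
           warmup_steps: int, decay_steps: int) -> float:
    """Warmup-Stable-Decay learning rate at a given step (hypothetical helper)."""
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Linear warmup: reaches lr_max on the last warmup step.
        return lr_max * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable phase: constant learning rate.
        return lr_max
    # Cooldown: linear decay from lr_max towards 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return lr_max * remaining / decay_steps


schedule = [
    wsd_lr(s, total_steps=1000, lr_max=3e-4, warmup_steps=100, decay_steps=200)
    for s in range(1000)
]
```

Swapping the last branch for a different decay curve is the natural place to experiment with cooldown shapes.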

Convex theory meets LLM practice #

Schaipp et al. (2025) show in The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training that schedules for large model training obey performance bounds from non-smooth convex optimisation. For the constant schedule with linear cooldown, the bound is:

$$ \bar{f}_T - f^* \leq \frac{\|x_0 - x^*\|^2}{2\eta T} + \frac{\eta}{2} \sum_{t=0}^{T-1} \sigma_t^2 $$

where the cooldown benefit appears explicitly through the absence of logarithmic terms. This enables principled LR transfer: exploiting the theory yields noticeable validation loss improvements for 124M and 210M Llama-type models when extending schedules for continued training.

Anytime schedules and weight averaging #

Meterez et al. (2026) prove in Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging (arXiv:2602.03702) that horizon-free (anytime) schedules exist for overparameterised linear regression, with weight averaging central to achieving minimax-optimal convergence. At 150M–300M params trained at 1–32× Chinchilla scale, a constant LR with weight averaging matches well-tuned cosine decay across the full training duration.

Weight averaging is a largely underutilised practical lever. It should be a default, not an afterthought.
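
As a rough illustration of how little machinery weight averaging needs, the sketch below keeps a running average of a flat list of weights. The class name `RunningWeightAverage` and the flat-list representation are assumptions made for brevity; it shows a generic uniform or exponential moving average, not the specific averaging scheme analysed by Meterez et al.

```python
class RunningWeightAverage:
    """Keeps a running average of a flat list of weights (hypothetical helper)."""

    def __init__(self, weights, ema_decay=None):
        self.avg = list(weights)
        self.count = 1
        self.ema_decay = ema_decay  # None means a uniform average

    def update(self, weights):
        self.count += 1
        if self.ema_decay is None:
            # Uniform average: avg_n = avg_{n-1} + (w_n - avg_{n-1}) / n
            rate = 1.0 / self.count
        else:
            # Exponential moving average with a fixed decay
            rate = 1.0 - self.ema_decay
        self.avg = [a + rate * (w - a) for a, w in zip(self.avg, weights)]


averager = RunningWeightAverage([0.0, 4.0])
for w in ([2.0, 2.0], [4.0, 0.0]):
    averager.update(w)
```

In a real training loop the same update would run over the model's parameter tensors after each optimizer step, with evaluation done on the averaged copy.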

ScheduleFree+ at LLM scale #

Defazio (2026) extends schedule-free learning to full LLM pretraining in ScheduleFree+: Scaling Learning-Rate-Free and Schedule-Free Learning to Large Language Models (arXiv:2605.19095). Practical fixes for large batch and model sizes enable ScheduleFree+ to achieve a 31% improvement over WSD schedules at 1000 tokens per parameter, while also providing a theoretical foundation for checkpoint merging during pretraining.

pip install schedulefree

from schedulefree import AdamWScheduleFree


optimizer = AdamWScheduleFree(
 model.parameters(), lr=1e-3, warmup_steps=1000
)


# Must switch to eval mode before evaluation
optimizer.eval()
val_loss = evaluate(model)
optimizer.train()

GitHub: facebookresearch/schedule_free

3. Hyperparameter Transfer and Scaling Laws (µP) #

Weight decay as the true driver of LR transfer #

The Maximal Update Parameterisation (µP) is widely used to transfer optimal learning rates from proxy models to large ones without re-tuning. Kosson et al. (2025/2026), accepted to ICLR 2026, provide a large-scale empirical refutation of the standard µP narrative in Weight Decay May Matter More than µP for Learning Rate Transfer in Practice.

Their finding: µP’s geometric alignment assumptions, which require alignment between a layer’s inputs, weights, and gradient updates, hold only briefly at the start of training. For the remainder, it is weight decay that stabilises update dynamics across widths and facilitates LR transfer. This implies µP’s scaling primarily acts as an implicit warmup, and can be largely replaced by modified warmup schedules.

Embedding layer LR as the key factor #

Kalra & Barkeshli (2026) provide complementary evidence in Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate, tracing µP’s advantage over standard parameterisation (SP) to a single factor: the embedding layer learning rate.

In SP, the embedding LR acts as a training bottleneck. Simply increasing it by a factor of model width, matching µP, eliminates most of the gap. Three quantitative metrics are used: quality of scaling law fit, robustness to extrapolation errors, and asymptotic loss penalty.

# Simple fix that captures most of µP's benefit in SP
embed_lr_multiplier = model_width / base_width # = d_model / d_model_proxy


param_groups = [
 {"params": model.embed.parameters(), "lr": base_lr * embed_lr_multiplier},
 {"params": non_embed_params, "lr": base_lr},
]
optimizer = torch.optim.AdamW(param_groups, weight_decay=0.1)

Open question: Kosson et al. argue µP acts as an implicit warmup; Kalra & Barkeshli argue it is about the embedding LR. Both contradict µP’s original geometric motivation. No consensus has emerged, and the practical implications differ significantly.

4. Normalization, Weight Decay, and Variance Reduction #

The end-of-training gradient spike #

Defazio (2025) identifies a subtle pathology in Why Gradients Rapidly Increase Near the End of Training: gradient norms spike sharply near the end of long LLM runs. The diagnosis is a three-way interaction between weight decay, normalisation layers, and the LR schedule.

When a layer is followed by normalisation, its scale becomes irrelevant to the forward pass, but weight decay continues shrinking the parameters. This creates an implicit competition between the optimizer’s effective update size and normalisation rescaling, causing gradient norms to grow unchecked as the LR decays.
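
One ingredient of this mechanism can be checked on a toy example: when the weights are followed by normalisation, shrinking them leaves the output unchanged but makes the gradient larger, in inverse proportion to the weight norm. The sketch below is an assumption-laden illustration (a single weight vector, an explicit division by its norm, hypothetical function names), not a reproduction of Defazio's analysis.

```python
import math


def normalised_output(w, x):
    """Output of a weight vector followed by normalisation: (w / ||w||) . x"""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return sum(wi * xi for wi, xi in zip(w, x)) / norm


def grad_wrt_w(w, x):
    """Analytic gradient of normalised_output with respect to w."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return [xi / norm - dot * wi / norm**3 for wi, xi in zip(w, x)]


def l2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


w = [3.0, 4.0]
x = [1.0, 2.0]
w_small = [0.5 * wi for wi in w]  # same direction, as if weight decay had halved it

same_output = math.isclose(normalised_output(w, x), normalised_output(w_small, x))
grad_ratio = l2(grad_wrt_w(w_small, x)) / l2(grad_wrt_w(w, x))
```

Halving the weight norm doubles the gradient norm here, which is the scale effect that weight decay keeps feeding as training proceeds.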

Fix: disable weight decay for AdamW-updated layers in architectures where those layers are directly followed by normalisation (e.g. every transformer block):

# Split parameters into groups with and without weight decay
no_wd, wd = [], []
for name, param in model.named_parameters():
    if "norm" in name or "embed" in name or param.ndim < 2:
        no_wd.append(param)
    else:
        wd.append(param)


optimizer = torch.optim.AdamW([
 {"params": wd, "weight_decay": 0.1},
 {"params": no_wd, "weight_decay": 0.0},
], lr=3e-4)

This simultaneously eliminates the spike and reduces loss throughout training. The analysis explains why weight decay should be disabled for AdamW-updated layers in architectures like modded-nanoGPT.

Weight normalisation as an alternative #

Nemotron-Flash (Fu et al., 2025, NeurIPS 2025) investigates weight normalisation as a practical mechanism in small language models, finding that it enables more effective weight updates and improves final convergence. Weight normalisation sidesteps the weight-decay/normalisation interaction described above, though at the cost of slightly worse final loss compared to a well-tuned baseline.

MARS: variance reduction meets preconditioned gradients #

Despite decades of theoretical work, variance reduction has largely failed to yield practical gains in deep learning. Yuan et al. (2024/2025) attempt to change this in MARS: Unleashing the Power of Variance Reduction for Training Large Models, proposing a unified framework that reconciles AdamW, Lion, and Shampoo with variance reduction via a scaled stochastic recursive momentum technique.

GPT-2 training results look strong. However, the comprehensive benchmark by Semenov et al. (2025), Benchmarking Optimizers for Large Language Model Pretraining, a 73-page study covering 44 figures and 48 tables across standardised scenarios, reveals that MARS does not work well with small batch sizes, limiting its practical applicability in memory-constrained settings.

This underscores the danger of evaluating optimizers on a single benchmark setup: MARS looks excellent at the batch sizes used in the original paper and brittle elsewhere.

5. Distributed Training: DiLoCo and Its Descendants #

DiLoCo (Distributed Low-Communication training) uses AdamW as an inner optimizer for $H$ local steps on each worker (typically $H = 500$), then synchronises by applying Nesterov momentum to the pseudo-gradient: the net change in parameters over those inner steps, averaged across workers. With $H = 500$ this cuts the number of synchronisations by a factor of 500 relative to syncing every step.
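
The round structure can be sketched on a toy problem. This is a minimal sketch under stated assumptions: plain gradient descent stands in for AdamW as the inner optimizer, each worker minimises a hypothetical 1-D quadratic, and the name `diloco_round` is illustrative rather than part of any library. The outer step uses the SGD-with-Nesterov-momentum update rule in the form PyTorch implements it.

```python
def diloco_round(theta, buf, targets, inner_lr=0.1, inner_steps=5,
                 outer_lr=0.7, momentum=0.9):
    """One DiLoCo-style round on toy 1-D quadratics 0.5 * (x - target)**2."""
    worker_params = []
    for target in targets:  # one target per worker
        x = theta
        for _ in range(inner_steps):
            x -= inner_lr * (x - target)  # plain gradient step, standing in for AdamW
        worker_params.append(x)

    # Pseudo-gradient: starting point minus the average of the workers' results.
    pseudo_grad = theta - sum(worker_params) / len(worker_params)

    # Outer step: SGD with Nesterov momentum applied to the pseudo-gradient.
    buf = momentum * buf + pseudo_grad
    theta -= outer_lr * (pseudo_grad + momentum * buf)
    return theta, buf


theta, buf = 0.0, 0.0
for _ in range(100):
    theta, buf = diloco_round(theta, buf, targets=[3.0, 5.0])
```

With two workers pulling towards 3 and 5, the shared parameter settles near 4.0, the minimiser of the averaged objective. Passing a single target gives the single-worker variant discussed later in this section.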

OpenDiLoCo: the open-source foundation #

PrimeIntellect’s OpenDiLoCo provides a reproducible drop-in implementation, demonstrated training across two continents and three countries with 90–95% compute utilisation. It later served as the foundation for INTELLECT-1, a 10B-parameter model trained globally.

from functools import partial
from open_diloco.hivemind_diloco import DiLoCoOptimizer


inner_optimizer = partial(torch.optim.AdamW, lr=4e-4)
outer_optimizer = partial(
 torch.optim.SGD, lr=0.7, momentum=0.9, nesterov=True
)


optimizer = DiLoCoOptimizer(
 dht=dht,
 params=model.parameters(),
 batch_size=512,
 num_inner_steps=500, # sync every 500 steps, 500× fewer communications
 inner_optimizer=inner_optimizer,
 outer_optimizer=outer_optimizer,
)

Why DiLoCo works on a single node: SNOO #

Kallusky et al. (2025) show in SNOO: Step-K Nesterov Outer Optimizer that DiLoCo’s effectiveness, even on a single node, stems from applying Nesterov momentum to the pseudo-gradient. Their method isolates this as a standalone Lookahead variant. Results:

1.5–2.5× FLOPs efficiency gains up to $10^{23}$ training FLOPs.
Improvements increase with model size.
Compatible with both AdamW and Muon as inner optimizers.
Minimal memory overhead.

The single-worker DiLoCo achieves speedups of up to 6.32% in steps-to-loss over AdamW on a 160M Llama model.

Smoothing DiLoCo: Generalized Primal Averaging (GPA) #

Defazio et al. (2025/2026) propose GPA in Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs (arXiv:2512.17131), which decouples DiLoCo’s interpolation constants to enable smooth iterate averaging at every step, replacing uniform averaging with exponential moving averaging.

GPA unifies single-worker DiLoCo and ScheduleFree within a single non-distributed framework. Speedups over AdamW in steps-to-target-loss:

Model	Speedup
Llama-160M	8.71%
Llama-1B	10.13%
Llama-8B	9.58%

Streaming DiLoCo: towards free distributed training #

Douillard et al. (2025) address the remaining bottleneck in Streaming DiLoCo with Overlapping Communication: Towards a Distributed Free Lunch (arXiv:2501.18512): even with infrequent synchronisation, each sync exchanges all parameters simultaneously. Three fixes:

Streaming sync: synchronise only subsets of parameters at a time.
Overlapping communication: continue training during synchronisation.
Quantisation: reduce cross-worker data to fewer bits.

Together, required bandwidth drops by two orders of magnitude while maintaining comparable quality at billion-parameter scale.

Method	Setting	Key contribution	Gain
SNOO	Single-node	Nesterov momentum on pseudo-gradient	1.5–2.5× FLOP efficiency
GPA	Single-node	Smooth iterate averaging; unifies DiLoCo + SF	~9% steps-to-loss
Streaming DiLoCo	Distributed	Streaming sync + quantisation	~100× bandwidth reduction

6. Cross-Cutting Themes and Open Questions #

Several recurrent tensions emerge from reading these papers together.

Geometry vs. step-size calibration in Muon #

Kovalev, Pethick et al., and Amsel et al. offer geometric explanations for Muon’s success. Shumaylov et al. argue that geometry is practically irrelevant and step-size optimality is the true driver. Which narrative guides future research matters: geometry points toward more sophisticated matrix norms; the step-size interpretation suggests much simpler paths to similar gains.

What µP is actually doing #

Kosson et al. argue µP is primarily an implicit warmup mechanism. Kalra & Barkeshli argue it is essentially about the embedding layer LR. Both stand in contrast to µP’s original geometric motivation. The practical stakes are high: the warmup interpretation suggests µP can be discarded with a schedule change; the embedding LR interpretation suggests a single-line fix.

Weight decay as a multi-role hyperparameter #

Weight decay appears as a protagonist in three independent stories in this survey:

Defazio: source of end-of-training gradient spikes via interaction with normalisation.
Kosson et al.: the true driver of LR transfer, not µP geometry.
Kalra & Barkeshli: improves scaling law fits but hurts extrapolation robustness.

It is no longer tenable to treat weight decay as a simple regulariser with a sensible default. It must be understood per-layer and in interaction with your normalisation strategy.

DiLoCo as the practical distributed optimizer #

Despite a large body of research on distributed optimizers, DiLoCo and its derivatives appear to be the only methods that consistently add value beyond simply scaling the batch size. The finding that its benefits carry over to single-node settings (via SNOO and GPA) makes it a particularly important line of work for practitioners at all scales.

Practical Recommendations for 2026 #

Based on the convergence of evidence across these papers, for a new large training run consider:

Optimizer: Muon for hidden-layer matrix weights + AdamW for embeddings/head. The Moonlight scaling fixes (weight decay + update scale adjustment) are necessary above ~1B parameters.
Schedule: ScheduleFree+ or linear decay instead of cosine. If you need a fixed-horizon schedule, WSD with higher $\beta_2$ during cooldown.
Weight decay: Disable it for layers directly followed by normalisation to avoid end-of-training gradient spikes.
Outer optimizer: Wrap your training loop with single-worker DiLoCo (SNOO or GPA) for a ~9% efficiency gain with no architectural changes.
µP alternatives: Before adopting full µP overhead, try increasing the embedding layer LR by a factor of $d_{\text{model}} / d_{\text{proxy}}$. This may reproduce most of the benefit.

None of these require fundamental architectural changes.

References #

#	Paper	Venue	Links
1	Jordan et al. (2024): Muon: An optimizer for hidden layers	n/a	blog · GitHub
2	Liu et al. (2025): Muon is Scalable for LLM Training (Moonlight)	n/a	arXiv:2502.16982 · GitHub
3	Kovalev (2025): Understanding Gradient Orthogonalization	n/a	n/a
4	Pethick et al. (2025): Training Deep Learning Models with Norm-Constrained LMOs (Scion)	n/a	arXiv:2502.07529
5	Amsel et al. (2025): The Polar Express	n/a	n/a
6	Shumaylov et al. (2026): Muon is Not That Special (Freon/Kaon)	n/a	n/a
7	Defazio et al. (2023): Optimal Linear Decay Learning Rate Schedules	n/a	arXiv:2310.07831
8	Dremov et al. (2025): Training Dynamics of the Cooldown Stage in WSD	n/a	n/a
9	Schaipp et al. (2025): Surprising Agreement Between Convex Theory and LR Scheduling	n/a	n/a
10	Meterez et al. (2026): Anytime Pretraining	n/a	arXiv:2602.03702
11	Defazio (2026): ScheduleFree+	n/a	arXiv:2605.19095 · GitHub
12	Kosson et al. (2026): Weight Decay May Matter More than µP	ICLR 2026	n/a
13	Kalra & Barkeshli (2026): Quantifying HP Transfer and Embedding LR	n/a	n/a
14	Defazio (2025): Why Gradients Rapidly Increase Near End of Training	n/a	n/a
15	Fu et al. (2025): Nemotron-Flash	NeurIPS 2025	n/a
16	Yuan et al. (2025): MARS	n/a	n/a
17	Semenov et al. (2025): Benchmarking Optimizers for LLM Pretraining	n/a	n/a
18	Kallusky et al. (2025): SNOO	n/a	n/a
19	Defazio et al. (2026): Smoothing DiLoCo with Primal Averaging (GPA)	n/a	arXiv:2512.17131
20	Douillard et al. (2025): Streaming DiLoCo	n/a	arXiv:2501.18512
21	Douillard et al. (2023/2024): DiLoCo (original)	n/a	arXiv:2311.08105
22	PrimeIntellect AI (2024): OpenDiLoCo	n/a	GitHub · blog

The Invariant Subspace Problem

Thu, 28 May 2026 00:00:00 +0000

Few questions in functional analysis have attracted sustained attention across as many decades as this one. It sits at the confluence of operator theory, spectral theory, and complex analysis, and every partial result has opened new territory rather than narrowing the problem to a routine case.

Problem (Invariant Subspace Problem)

Does every bounded linear operator $T$ on an infinite-dimensional separable complex Hilbert space $\mathcal{H}$ have a non-trivial closed invariant subspace?

That is, does there always exist a closed subspace $\mathcal{M} \subsetneq \mathcal{H}$ with $\mathcal{M} \neq \{0\}$ such that $T\mathcal{M} \subseteq \mathcal{M}$?

The problem is rated medium importance on the Open Problem Garden. It is old enough to have accumulated a rich history of partial results, yet still open in the Hilbert space setting after more than seventy years.

Trivial Observations and Why They Run Out #

Two subspaces are always invariant: $\{0\}$ and $\mathcal{H}$ itself. These are the trivial invariant subspaces; the problem asks whether anything else must exist.

On finite-dimensional spaces the answer is immediate: every operator on $\mathbb{C}^n$ has an eigenvector (by the fundamental theorem of algebra applied to the characteristic polynomial), and the span of any eigenvector is a one-dimensional invariant subspace. This argument fails completely in infinite dimensions, where the spectrum can be continuous and eigenvectors need not exist.
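
The finite-dimensional argument can be checked by hand on the smallest interesting case. The sketch below uses the rotation of the plane by 90 degrees, which has no real eigenvector but does have a complex one, which is why the statement is made over $\mathbb{C}^n$. The helper name `apply` and the choice of matrix are illustrative assumptions.

```python
def apply(matrix, vector):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


# Rotation by 90 degrees: no real eigenvector, but a complex one exists.
T = [[0, -1],
     [1, 0]]
eigenvalue = 1j
eigenvector = [1, -1j]

image = apply(T, eigenvector)                   # T v
scaled = [eigenvalue * v for v in eigenvector]  # lambda v
# image and scaled agree, so span{v} is a one-dimensional invariant subspace.
```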

On non-separable Hilbert spaces the problem is also trivial but for a different reason: for any non-zero vector $x \in \mathcal{H}$, the closed linear span $\overline{\operatorname{span}\{T^n x : n \geq 0\}}$ is a closed invariant subspace, and since it is separable it cannot equal all of a non-separable $\mathcal{H}$. So the problem is genuinely about separable spaces.

Landscape of Known Results #

Positive Results: Classes with Invariant Subspaces #

Theorem (Aronszajn–Smith, 1954)

Every compact operator on a Banach space of dimension greater than one has a non-trivial closed invariant subspace.

The compact case was already known to von Neumann in the 1930s for Hilbert spaces, but was never published; Aronszajn and Smith gave the first published proof, extended to Banach spaces. The key idea is that a compact operator can be approximated by finite-rank operators, each of which has invariant subspaces, and a limiting argument produces an invariant subspace for the compact operator.

Theorem (Lomonosov, 1973)

If a bounded operator $T$ on a Banach space commutes with a non-zero compact operator, then $T$ has a non-trivial hyperinvariant subspace (a subspace invariant under every operator that commutes with $T$).

Lomonosov’s proof is strikingly short, less than a page, and uses the Schauder fixed-point theorem in an unexpected way. It subsumes both the compact case (an operator commutes with itself) and the polynomially compact case ($T$ commutes with $p(T)$, which is compact by assumption). For several years it seemed that Lomonosov’s theorem might resolve the problem entirely, until Hadwin, Nordgren, Radjavi, and Rosenthal (1980) exhibited an operator that does not commute with any non-zero compact operator yet still has invariant subspaces.

Theorem (Brown, 1978)

Every subnormal operator on a Hilbert space has a non-trivial invariant subspace.

An operator $T$ is subnormal if it is the restriction of a normal operator on a larger Hilbert space to an invariant subspace. Normal operators are handled by the spectral theorem, which produces a rich lattice of invariant subspaces, but those subspaces need not lie inside the smaller space, so the subnormal case does not follow by simple restriction. Brown’s proof uses techniques from rational approximation theory (the solution of the Halmos problem on subnormal operators).

Beyond these landmark theorems, invariant subspaces are also known for: hyponormal operators with thick spectra (Brown, 1987), operators whose spectrum has interior points, operators satisfying growth conditions on the resolvent, and polynomially bounded operators with spectrum containing the unit circle under further constraints (Liu, 2017; Réjasse, 2023).

Beurling’s Theorem: A Complete Classification #

Theorem (Beurling, 1949)

The closed invariant subspaces of the unilateral shift $S : H^2(\mathbb{D}) \to H^2(\mathbb{D})$, $(Sf)(z) = zf(z)$, are exactly the subspaces of the form $\varphi H^2(\mathbb{D})$ where $\varphi$ is an inner function (i.e. $|\varphi(e^{i\theta})| = 1$ a.e.).

Beurling’s theorem is a landmark because it gives not merely existence but a full classification of all invariant subspaces for a single operator. The shift on $H^2$ is in many senses the canonical operator for the Hilbert space invariant subspace problem: a counterexample to the full problem would be an operator with no non-trivial closed invariant subspace at all, and the shift shows how rich the lattice of invariant subspaces can be even for a single operator.
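
The simplest inner functions are the monomials $\varphi(z) = z^k$, for which $\varphi H^2(\mathbb{D})$ consists of the functions whose first $k$ Taylor coefficients vanish. The sketch below checks the invariance of that subspace under the shift, representing a function by its list of Taylor coefficients. Using polynomials (finitely many coefficients) in place of general $H^2$ functions, and the helper names `shift` and `in_zk_H2`, are simplifying assumptions.

```python
def shift(coeffs):
    """Unilateral shift on Taylor coefficients: f(z) -> z f(z)."""
    return [0.0] + list(coeffs)


def in_zk_H2(coeffs, k):
    """True if f lies in z^k H^2, i.e. its first k Taylor coefficients vanish."""
    return all(c == 0 for c in coeffs[:k])


k = 2
f = [0.0, 0.0, 1.5, -0.5, 2.0]  # f(z) = 1.5 z^2 - 0.5 z^3 + 2 z^4, inside z^2 H^2
g = [1.0, 0.0, 3.0]             # g(z) = 1 + 3 z^2, outside z^2 H^2

# Applying the shift keeps f inside the subspace.
stays_inside = in_zk_H2(f, k) and in_zk_H2(shift(f), k)
```

Shifting only prepends a zero coefficient, so it can never reintroduce a low-order term, which is the invariance in this special case.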

Negative Results: Counterexamples on Banach Spaces #

Theorem (Enflo, 1975/1987; Read, 1984)

There exist separable Banach spaces and bounded linear operators on them with no non-trivial closed invariant subspace. In particular, Read constructed such an operator on $\ell^1$.

Enflo’s counterexample was the first, constructed in 1975 though not published until 1987 due to its length and complexity. Read’s construction (1984) arrived independently and somewhat earlier in print; a further, more explicit example by Read (1985) lives on the classical space $\ell^1$. These results make clear that the answer to the invariant subspace problem is negative for general Banach spaces. The Hilbert space case remains the central open question precisely because no counterexample on any reflexive Banach space, much less a Hilbert space, has been found.

The Hilbert–Banach Gap #

The separation between Hilbert space and general Banach space behaviour is a recurring theme. Several features of Hilbert spaces that Banach spaces lack suggest why counterexamples might not exist in the Hilbert setting:

The inner product gives every operator an adjoint $T^*$, and the lattice of invariant subspaces of $T$ and of $T^*$ are related by orthogonal complementation.
The spectral theorem for normal operators provides a complete invariant subspace theory for that class, anchoring intuition.
Reflexivity and the existence of unconditional bases in specific Hilbert spaces constrain operator behaviour more than in $\ell^1$.

None of these features has yet been converted into a proof for the general case.

Recent Proof Attempts #

The problem has attracted renewed attention in recent years.

In May 2023, Per Enflo, the same mathematician who produced the first Banach space counterexample, posted a preprint to arXiv (2305.15442) claiming a positive resolution for all separable Hilbert spaces. The original preprint was 13 pages; a substantially expanded version (52 KB) appeared in April 2024. Enflo himself has been cautious about the result, noting that expert review is ongoing. As of this writing the preprint has not received a definitive verdict from the community.

In July 2023 an independent preprint by Neville (arXiv:2307.08176) also claimed a positive solution for separable Hilbert spaces.

In September 2024 a peer-reviewed article in Axioms by Khalil, Yousef, Alshanti, and Abu Hammad announced a proof, but basic errors were identified shortly after publication (Ghatasheh, arXiv:2411.19409, November 2024).

The problem therefore remains officially open. The cluster of recent attempts reflects both its difficulty and its continued centrality in functional analysis.

Research Directions #

1. Cyclic Vectors and the Spectral Radius Formula #

A vector $x \in \mathcal{H}$ is cyclic for $T$ if $\mathcal{H} = \overline{\operatorname{span}\{T^n x : n \geq 0\}}$. If a non-zero vector $x$ is not cyclic, the closed span of its orbit is a non-trivial closed invariant subspace; conversely, any non-zero vector inside a non-trivial closed invariant subspace fails to be cyclic. So $T$ is a counterexample exactly when every non-zero vector is cyclic.
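
In finite dimensions the definition can be tested directly: on $\mathbb{C}^2$, a vector $x$ is cyclic for $T$ exactly when $x$ and $Tx$ are linearly independent. The sketch below is only an illustration of the definition (finite-dimensional operators always have invariant subspaces), and the helper name `krylov_det_2x2` is hypothetical.

```python
def krylov_det_2x2(T, x):
    """Determinant of the matrix with columns x and T x, for a 2x2 matrix T."""
    Tx = [T[0][0] * x[0] + T[0][1] * x[1],
          T[1][0] * x[0] + T[1][1] * x[1]]
    return x[0] * Tx[1] - x[1] * Tx[0]


T = [[1.0, 0.0],
     [0.0, 2.0]]

# x = (1, 1): x and T x are independent, so x is cyclic.
cyclic = krylov_det_2x2(T, [1.0, 1.0]) != 0
# x = (1, 0): the orbit stays on the first axis, an invariant subspace.
not_cyclic = krylov_det_2x2(T, [1.0, 0.0]) == 0
```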

Read’s Banach-space constructions proceed by building hypercyclic operators whose orbits are dense. On Hilbert spaces, the geometry severely constrains the density of orbits. Making this constraint quantitative, via growth estimates on $\|T^n x\|$ or on the resolvent $\|(T-\lambda)^{-1}\|$, might close the gap between known positive results and the general case.

2. Dual Algebra Techniques #

A powerful modern approach studies the dual algebra $\mathcal{A}_T$, the weak-$*$ closure of the polynomials in $T$ as a subalgebra of $\mathcal{B}(\mathcal{H})$. If $\mathcal{A}_T$ is reflexive, meaning that it coincides with the algebra of all operators that leave every invariant subspace of $T$ invariant, then $T$ has many invariant subspaces, and one can sometimes extract them from the structure of the algebra. Results along these lines have been obtained for $C_{00}$ contractions (Bercovici, Foiaş, Pearcy) and for polynomially bounded operators under spectral conditions (Liu, 2017). The key open question is whether the dual algebra approach can be made to work for all contractions via Sz.-Nagy–Foiaş theory.

3. Contractions and the Sz.-Nagy–Foiaş Calculus #

Every contraction ($\|T\| \leq 1$) on a Hilbert space admits a minimal unitary dilation (Sz.-Nagy’s dilation theorem), and Sz.-Nagy and Foiaş developed a functional calculus for contractions based on $H^\infty(\mathbb{D})$. The rich structure of this calculus has produced invariant subspace theorems for $C_{11}$ contractions and for contractions whose spectrum is rich enough, for example when it contains the unit circle. The question is whether the calculus can be pushed to all contractions; the general invariant subspace problem for contractions is equivalent to the full problem (by rescaling), so this is not a simplification but a different vantage point that has been productive.

4. Almost Invariant Half-Spaces #

A weaker notion, studied by Androulakis, Popov, Tcaciuc, and Troitsky, asks for almost invariant half-spaces: closed subspaces $\mathcal{M}$ of infinite dimension and infinite codimension such that $T\mathcal{M} \subseteq \mathcal{M} + \mathcal{F}$ for some finite-dimensional subspace $\mathcal{F}$. These exist for every operator on any infinite-dimensional Banach space. Whether every operator on a Hilbert space has a genuinely invariant (not just almost invariant) infinite-dimensional subspace of infinite codimension remains open and is a concrete intermediate target.

5. Hyperinvariant Subspaces #

A subspace is hyperinvariant for $T$ if it is invariant under every operator that commutes with $T$. Every hyperinvariant subspace is invariant, so an operator with a non-trivial hyperinvariant subspace also has a non-trivial invariant subspace. Lomonosov’s 1973 theorem gives hyperinvariant subspaces when $T$ commutes with a non-zero compact operator. The hyperinvariant subspace problem (does every operator on a Hilbert space, other than a scalar multiple of the identity, have a non-trivial hyperinvariant subspace?) is also open and may be harder than the invariant subspace problem itself.

References #

Aronszajn, N. & Smith, K. T. (1954). Invariant subspaces of completely continuous operators. Annals of Mathematics, 60(2), 345–350.
Beurling, A. (1949). On two problems concerning linear transformations in Hilbert space. Acta Mathematica, 81, 239–255.
Brown, S. (1987). Hyponormal operators with thick spectra have invariant subspaces. Annals of Mathematics, 125(1), 93–103.
Enflo, P. H. (1987). On the invariant subspace problem for Banach spaces. Acta Mathematica, 158, 213–313.
Enflo, P. H. (2023). On the invariant subspace problem in Hilbert spaces. arXiv:2305.15442.
Lomonosov, V. I. (1973). Invariant subspaces of operators commuting with compact operators. Functional Analysis and Its Applications, 7(3), 213–214.
Read, C. J. (1984). A solution to the invariant subspace problem. Bulletin of the London Mathematical Society, 16(4), 337–401.
Read, C. J. (1985). A solution to the invariant subspace problem on the space $\ell^1$. Bulletin of the London Mathematical Society, 17(4), 305–317.
Radjavi, H. & Rosenthal, P. (2003). Invariant Subspaces (2nd ed.). Dover.
Bercovici, H., Foiaş, C., & Pearcy, C. (1985). Dual Algebras with Applications to Invariant Subspaces and Dilation Theory. AMS.

Something Like Picard for 1-Forms

Wed, 27 May 2026 00:00:00 +0000

Picard’s great theorem is a statement about how wildly a holomorphic function can behave near an essential singularity. The conjecture below asks whether injectivity of local primitives of a 1-form is enough to rule out such wild behaviour at the origin, forcing the 1-form to extend meromorphically across the puncture.

Conjecture (Elsner, 2010)

Let $D$ be the open unit disk and let $U_1,\dots,U_n$ be open sets with $\bigcup_{j=1}^n U_j = D\setminus\{0\}$. Suppose there are injective holomorphic functions $f_j : U_j \to \mathbb{C}$ such that $$\mathrm{d}f_j = \mathrm{d}f_k \quad \text{on every connected component of } U_j \cap U_k.$$ Then the $\mathrm{d}f_j$ glue together to a meromorphic 1-form on $D$.

The problem is rated medium importance on the Open Problem Garden and is not recommended for undergraduates, reflecting the depth of the tools involved. It arises from Elsner’s study of hyperelliptic action integrals in the context of the exact WKB method for Schrödinger equations with polynomial potential (Elsner, Ann. Inst. Fourier 49(1), 1999).

Setup and Interpretation #

The compatibility condition $\mathrm{d}f_j = \mathrm{d}f_k$ on each connected component of $U_j \cap U_k$ is equivalent to saying $f_j - f_k$ is locally constant there. The local differentials therefore glue together unambiguously to a global holomorphic 1-form $$\omega \in \Omega^1(D\setminus\{0\})$$ whose restriction to each $U_j$ equals $\mathrm{d}f_j$. The conjecture asserts that $\omega$ does not have an essential singularity at the origin: it extends to a meromorphic 1-form on all of $D$, meaning near $0$ it looks like $$\omega = \left(\frac{c_{-m}}{z^m} + \cdots + \frac{c_{-1}}{z} + c_0 + c_1 z + \cdots\right)\mathrm{d}z$$ for some $m \ge 0$.

The injectivity of each $f_j$ is the crucial hypothesis. Without it the statement is false: any holomorphic 1-form $\omega$ on $D\setminus\{0\}$ with an essential singularity at $0$ is locally $\mathrm{d}f_j$ for some holomorphic $f_j$, and these $f_j$ can be chosen on contractible pieces of the cover; injectivity is what prohibits essential singularities from arising.

What Is Already Known #

Partial Result

Under the hypotheses of the conjecture:

1. The 1-form $\omega$ is holomorphic on $D\setminus\{0\}$.
2. If the residue of $\omega$ at the origin vanishes, Picard’s big theorem can be applied to conclude that $\omega$ extends meromorphically across $0$.

Point (1) is straightforward: each $\mathrm{d}f_j$ is holomorphic on $U_j$ and the local forms agree on overlaps, so $\omega$ is holomorphic wherever it is defined, i.e. on $D\setminus\{0\}$.

Point (2) is the key partial result recorded by Elsner. If $\operatorname{Res}_0\omega = 0$, then $\omega$ has trivial monodromy around the origin and admits a single-valued holomorphic primitive $F$ on the punctured disk: $\omega = \mathrm{d}F$. The injectivity of each local branch $f_j$ then forces $F$ itself to be injective on some punctured neighbourhood of $0$ (since $f_j = F + c$ locally). An injective holomorphic function on a punctured disk cannot have an essential singularity there, and this is where Picard enters: at an essential singularity, by Picard’s big theorem, every value with at most one exception is taken infinitely often in any punctured neighbourhood, contradicting injectivity. Hence $F$ has at most a pole at $0$, and $\omega = \mathrm{d}F$ is meromorphic.

The open case is when $\operatorname{Res}_0\omega \ne 0$, so that $\omega$ has non-trivial monodromy and no single-valued global primitive exists. The local primitives $f_j$ then experience monodromy as one loops around the origin, and the injectivity constraint must be leveraged in this more delicate multi-valued setting.

Connection to Picard’s Theorem #

The title of the conjecture reflects a precise structural analogy.

Theorem (Picard's Great Theorem)

If $f$ has an essential singularity at $z_0$, then in every punctured neighbourhood of $z_0$ the function $f$ takes every value in $\mathbb{C}$, with at most one exception, infinitely many times.
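As a concrete illustration (not taken from the source), the function $e^{1/z}$ has an essential singularity at $0$ and omits only the value $0$. The Python sketch below solves $e^{1/z} = w$ explicitly via $z_k = 1/(\operatorname{Log} w + 2\pi i k)$ and checks that the solutions accumulate at the origin. The helper name `preimages` and the sample value `w` are illustrative choices.

```python
import cmath
import math

def preimages(w, count=5):
    """Return z_k = 1 / (Log w + 2*pi*i*k), k = 1..count, which solve exp(1/z) = w (w != 0)."""
    base = cmath.log(w)  # principal logarithm of w
    return [1 / (base + 2j * math.pi * k) for k in range(1, count + 1)]

w = 2 - 1j
zs = preimages(w)

# Each z_k is a genuine preimage of w ...
residuals = [abs(cmath.exp(1 / z) - w) for z in zs]
# ... and the preimages shrink towards the essential singularity at 0.
moduli = [abs(z) for z in zs]
```

Letting `count` grow produces infinitely many preimages of the same value in every punctured neighbourhood of $0$, which is exactly the failure of injectivity the theorem describes.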

In particular, a function with an essential singularity is far from injective near that point. The conjecture elevates this observation to the level of 1-forms: an injective holomorphic primitive should preclude essential singularities in the 1-form itself, even when the primitive is only locally and multi-valuedly defined.

Standard Picard covers the zero-residue case by reducing to a single-valued primitive. The conjecture asks for an analogue that works when the monodromy is non-trivial, a genuinely new statement about multi-valued functions and their differential geometry.

Origin: Hyperelliptic Action Integrals #

The problem arises from the exact WKB method applied to the stationary Schrödinger equation $-\psi'' + V(x)\psi = E\psi$ with polynomial potential $V$. The formal WKB ansatz $\psi \sim e^{S/\hbar}$ produces a multivalued action integral $$\mathcal{I}(E) = \int_\gamma \sqrt{V(x) - E}\,\mathrm{d}x$$ defined on a hyperelliptic Riemann surface whose branch structure depends on the energy parameter $E$. Elsner’s 1999 paper constructs the Riemann surface of $\mathcal{I}$ explicitly and shows its branch points accumulate densely in the value plane, a phenomenon that obstructs Borel–Laplace resummation of the WKB symbols.

In this setting the local inverses of $\mathcal{I}$ play the role of the $f_j$: they are locally injective holomorphic functions whose differentials agree on overlaps. The conjecture asks whether the obstruction to global meromorphic extension can arise only from a pole, a controlled singularity, rather than an essential one.

Research Directions #

1. The Non-Zero Residue Case #

The open heart of the problem is the case $\operatorname{Res}_0\omega \ne 0$. Here $\omega$ is not exact near $0$, the monodromy of the primitive is a non-trivial translation $f_j \mapsto f_j + 2\pi i\,\operatorname{Res}_0\omega$, and no single injective function encompasses the full behaviour near the singularity.
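The translation by $2\pi i\,\operatorname{Res}_0\omega$ can be checked numerically on a toy example. The sketch below integrates a sample meromorphic 1-form around a small circle with the trapezoid rule; the particular form, the residue value, and the function name `loop_integral` are assumptions made for illustration only.

```python
import cmath
import math

def loop_integral(g, radius=0.5, n=4096):
    """Approximate the integral of g(z) dz over the circle |z| = radius (trapezoid rule)."""
    total = 0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        z = radius * cmath.exp(1j * theta)
        dz = 1j * z * (2 * math.pi / n)  # dz = i z d(theta)
        total += g(z) * dz
    return total

res = 0.3 - 0.2j  # hypothetical residue of omega at 0

def omega_coeff(z):
    """Coefficient of dz in a sample form with a double pole and residue `res`."""
    return 1 / z**2 + res / z + 2 + z

# Change of a local primitive after one positive loop around the origin.
shift = loop_integral(omega_coeff)
expected = 2j * math.pi * res
```

Only the $1/z$ term contributes to `shift`; the pole of order two and the holomorphic part integrate to zero, so a local primitive returns to itself exactly when the residue vanishes.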

A natural approach is to pass to a cover $\tilde D \to D\setminus\{0\}$ that trivialises the monodromy (since the monodromy is a non-zero translation, hence of infinite order, this must be the infinite cyclic cover), construct a single-valued primitive on $\tilde D$, and then adapt the zero-residue argument there. The key difficulty is that the injectivity of each $f_j$ on $U_j$ does not immediately imply injectivity of the lifted primitive on $\tilde D$, since different sheets can collide. Making this argument precise, or finding a counterexample, is the main open problem.

2. Quantitative Control via Nevanlinna Theory #

An alternative strategy replaces Picard’s theorem by its quantitative form. If $F$ is a meromorphic function on the punctured disk with an essential singularity, the Nevanlinna characteristic $T(r,F)$ grows faster than any constant multiple of $\log(1/r)$ as $r\to 0$. For an injective function the counting functions $N(r,a,F)$, recording how often $F = a$ in the punctured disk, satisfy strong constraints.

Nevanlinna-theoretic methods might give a direct bound on $T(r,f_j)$ in terms of the geometry of the cover ${U_j}$ and the injectivity of $f_j$, ruling out essential singularities of $\omega$ without passing through the monodromy argument. This would require adapting the standard Nevanlinna machinery to functions that are only locally defined on an open cover.

3. Replacing Injectivity by Finite Valence #

One can ask whether the conjecture remains true if “injective” is weakened to “at most $d$-to-one” for some fixed integer $d$. Finite-valence holomorphic functions cannot have essential singularities either: a function of valence at most $d$ takes each value at most $d$ times, whereas by Picard’s theorem a function with an essential singularity takes every value, with at most one exception, infinitely often in any punctured neighbourhood.

If the conjecture extends to finite valence, the proof strategy will likely yield a valence-independent argument that illuminates the zero-residue case more transparently. If it fails for finite valence, the counterexample geometry would clarify what role injectivity plays beyond the mere avoidance of essential singularities.

4. Several Complex Variables #

In $\mathbb{C}^n$ for $n \ge 2$ the theory of isolated singularities of holomorphic functions changes dramatically: by Hartogs’ extension theorem, isolated singularities of holomorphic functions are always removable. One would expect the analogous conjecture for holomorphic 1-forms in $\mathbb{C}^n$ to be more tractable, or even to follow from known extension results.

Three steps would clarify which features of the problem are genuinely one-dimensional: formulating the precise analogue with the punctured disk replaced by a domain $\Omega\setminus\{0\}$ in $\mathbb{C}^n$, specifying what “meromorphic 1-form” means on a higher-dimensional domain, and checking whether Hartogs-type arguments already resolve the question.

5. Geometric Formulation on Riemann Surfaces #

The disk $D$ and the puncture at $0$ are not special: the same question can be posed on any Riemann surface $X$ with a marked point $p$. Given an open cover of $X\setminus\{p\}$ and injective holomorphic functions $f_j$ on each piece with compatible differentials, does $\omega = \mathrm{d}f_j$ extend meromorphically across $p$?

The answer may depend on the genus and the function theory of $X$. For the disk (simply connected, genus 0) the monodromy is a simple translation; for a torus or higher-genus surface the monodromy group is richer and the argument structure should change. Comparing these cases may isolate the essential input from the topology versus the analysis.

References #

Elsner, B. (1999). Hyperelliptic action integral. Annales de l’Institut Fourier, 49(1), 303–331. https://www.numdam.org/item/AIF_1999__49_1_303_0/
Ahlfors, L. V. (1979). Complex Analysis (3rd ed.). McGraw-Hill.
Conway, J. B. (1978). Functions of One Complex Variable (2nd ed.). Springer.
Nevanlinna, R. (1970). Analytic Functions. Springer.
Forster, O. (1981). Lectures on Riemann Surfaces. Springer.
Delabaere, E., Dillinger, H., & Pham, F. (1993). Résurgence de Voros et périodes des courbes hyperelliptiques. Annales de l’Institut Fourier, 43(1), 163–199.

Criterion for Boundedness of Power Series

Tue, 26 May 2026 00:00:00 +0000

Introduction & Problem Statement #

Power series constitute one of the most ubiquitous objects in analysis. A power series $\sum_{n=0}^{\infty}a_n x^n$ with infinite radius of convergence defines a real-entire function $f:\mathbb{R}\to\mathbb{R}$. Whereas the question of convergence is completely settled by Cauchy–Hadamard theory, the question of boundedness of the sum function is far subtler and, as of this writing, remains open.

Question 1 (Rüdinger, 2009)

Let $(a_n) _{n\ge 0}$ be a sequence of real numbers such that the power series $\sum _{n=0}^{\infty}a_n x^n$ converges for every $x\in\mathbb{R}$, thereby defining a smooth function $f:\mathbb{R}\to\mathbb{R}$. Give a necessary and sufficient criterion on $(a_n)$ for $f$ to be bounded on $\mathbb{R}$.

The problem is rated low importance on the Open Problem Garden and is recommended as accessible to undergraduates; nevertheless, a complete answer appears to be unknown.

Motivating examples.

Function	Power series	Bounded?
$\cos x$	$\displaystyle\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k)!}x^{2k}$	$|\cos x|\le 1$
$\sin x$	$\displaystyle\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k+1)!}x^{2k+1}$	$|\sin x|\le 1$
$e^x$	$\displaystyle\sum_{n=0}^{\infty}\frac{x^n}{n!}$	$e^x\to+\infty$
$p(x)=a_0+\cdots+a_Nx^N,\ N\ge 1$	(polynomial)	unbounded

Background & Prerequisites #

This section collects the core mathematical tools needed to engage seriously with Question 1.

Power Series and Entire Functions #

Definition 1 (Power Series & Radius of Convergence)

A power series centred at the origin is a formal series $\sum_{n=0}^{\infty}a_n x^n$ with $a_n\in\mathbb{R}$. Its radius of convergence is $$ R = \frac{1}{\limsup_{n\to\infty}|a_n|^{1/n}} \in [0,+\infty]. $$

Throughout this note we always assume $R=+\infty$, i.e., $\limsup_{n\to\infty}|a_n|^{1/n}=0$.
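The standing assumption can be observed numerically. The sketch below evaluates $|a_n|^{1/n}$ for the exponential series ($a_n = 1/n!$) and, for contrast, for a geometric sequence with $R = 2$; both sequences and the helper name `nth_root_of_coeff` are illustrative choices rather than part of the source.

```python
import math

def nth_root_of_coeff(log_abs_coeff, n):
    """Return |a_n|**(1/n) given log|a_n| (working with logs avoids underflow)."""
    return math.exp(log_abs_coeff / n)

ns = (10, 100, 1000, 10000)

# a_n = 1/n!  (series of exp): log|a_n| = -lgamma(n + 1), and |a_n|^(1/n) -> 0.
exp_roots = [nth_root_of_coeff(-math.lgamma(n + 1), n) for n in ns]

# a_n = 2**(-n)  (geometric, R = 2): |a_n|^(1/n) stays at 1/2.
geo_roots = [nth_root_of_coeff(-n * math.log(2), n) for n in ns]
```

The first list decreases towards $0$ (roughly like $e/n$ by Stirling's formula), matching $R = +\infty$, while the second is constant and gives the finite radius $R = 2$.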

Definition 2 (Entire Function)

A function $f:\mathbb{C}\to\mathbb{C}$ is called entire if it is holomorphic on all of $\mathbb{C}$. Every real power series with $R=+\infty$ also converges for every $z\in\mathbb{C}$, so the real-entire function it defines on $\mathbb{R}$ extends to an entire function; by the identity theorem this extension is unique.

Theorem 1 (Cauchy–Hadamard)

The radius of convergence of $\sum a_n z^n$ equals $$ R = \Bigl(\limsup_{n\to\infty}|a_n|^{1/n}\Bigr)^{-1}. $$

Remark 1

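The condition $R=+\infty$ is equivalent to $a_n = O(r^n)$ for every $r>0$, i.e., the coefficients decay faster than any geometric sequence. The stronger bound $a_n = O(r^n/n!)$ for some $r>0$ is a different condition: it characterises entire functions of exponential type (order at most $1$ with finite type), the class to which Paley–Wiener theory applies.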

Order and Type of Entire Functions #

Definition 3 (Order and Type)

The order of an entire function $f$ is $$ \rho = \limsup_{r\to\infty}\frac{\log\log M(r)}{\log r}, \qquad M(r)=\max_{|z|=r}|f(z)|. $$ The type $\sigma$ (for $0<\rho<\infty$) is $$ \sigma = \limsup_{r\to\infty}\frac{\log M(r)}{r^{\rho}}. $$

An entire function that is bounded on all of $\mathbb{C}$ is constant by Liouville’s theorem (and so has order $\rho=0$), whereas an entire function that is merely bounded on the real line can be non-constant, as $\cos z$ shows. Boundedness on $\mathbb{R}$ is therefore a genuinely real-variable phenomenon.
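For $\cos z$ the maximum modulus is $M(r) = \cosh r$, attained at $z = \pm ir$, so the order and type can be estimated directly from the definition. The sketch below does this at a few radii; the function names and the chosen radii are illustrative assumptions.

```python
import math

def log_max_modulus_cos(r):
    """log M(r) for cos z, using M(r) = cosh(r) in an overflow-safe form."""
    return r + math.log1p(math.exp(-2 * r)) - math.log(2)

def order_estimate(log_M, r):
    """Finite-r proxy for the order: log log M(r) / log r."""
    return math.log(log_M(r)) / math.log(r)

def type_estimate(log_M, r, rho=1.0):
    """Finite-r proxy for the type: log M(r) / r**rho."""
    return log_M(r) / r**rho

radii = (10.0, 100.0, 1e4)
orders = [order_estimate(log_max_modulus_cos, r) for r in radii]
types = [type_estimate(log_max_modulus_cos, r) for r in radii]
```

Both lists approach $1$, consistent with $\cos$ having order $1$ and type $1$ even though $|\cos x| \le 1$ on the real axis.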

Liouville’s Theorem and Its Limitations #

Theorem 2 (Liouville)

Every bounded entire function $f:\mathbb{C}\to\mathbb{C}$ is constant.

Remark 2 (Why Liouville does not solve the problem)

Question 1 concerns real-valued functions $f:\mathbb{R}\to\mathbb{R}$. A function may be bounded on $\mathbb{R}$ while its complex extension is unbounded. For instance, $\cos z$ satisfies $|\cos z|\to\infty$ along the imaginary axis (since $\cos(iy)=\cosh y\to+\infty$). Liouville’s theorem therefore does not apply, and the problem is genuinely non-trivial.

Algebraic Structure of the Relevant Function Space #

Definition 4 (Space of Bounded Power Series)

Let $\mathcal{B}$ denote the set of all functions $f:\mathbb{R}\to\mathbb{R}$ that can be represented as a convergent power series $\sum_{n\ge 0}a_n x^n$ (with $R=+\infty$) and that are bounded on $\mathbb{R}$.

Proposition 1, Algebraic Properties of $\mathcal{B}$ (Rüdinger, 2009)

(1) $\mathcal{B}$ is a linear subspace of $C^\infty(\mathbb{R})$: if $f,g\in\mathcal{B}$ and $\lambda\in\mathbb{R}$ then $f+\lambda g\in\mathcal{B}$.
(2) $\mathcal{B}$ is closed under pointwise multiplication: if $f,g\in\mathcal{B}$ then $fg\in\mathcal{B}$.
(3) $\mathcal{B}$ contains all functions of the form $c\cos(h(x))$, where $c\in\mathbb{R}$ and $h:\mathbb{R}\to\mathbb{R}$ is any entire function.

Remark 3

Part (3) follows from $\cos(h(x)) = \operatorname{Re}(e^{ih(x)})$ together with $|\cos(h(x))|\le 1$. The class is strictly larger than $\{c\cos(bx):c,b\in\mathbb{R}\}$; for example, $\cos(x^3-x)\in\mathcal{B}$.

Known Partial Results #

Necessary Conditions #

Proposition 2, Necessary Condition for Boundedness (Rüdinger, 2009)

Suppose $f(x)=\sum_{n=0}^{\infty}a_n x^n$ is bounded on $\mathbb{R}$. Then either:

(1) $a_0$ is the only non-zero coefficient (i.e., $f$ is the constant function $f\equiv a_0$), or
(2) there are infinitely many indices $n$ with $a_n\neq 0$, and the signs of the non-zero $a_n$ change infinitely often.

Remark 4

The sign-change condition is necessary: if the non-zero coefficients are eventually of one sign, the dominant-term comparison shows $f(x)\to\pm\infty$ as $x\to+\infty$ or $x\to-\infty$.

Corollary 1

Every non-constant polynomial is unbounded on $\mathbb{R}$.

Proof.

A polynomial has only finitely many non-zero coefficients. By Proposition 2 (1), the only bounded polynomial is the constant function. Any non-constant polynomial satisfies $|p(x)|\to\infty$ as $|x|\to\infty$.

The Sign-Change Condition Is Not Sufficient #

The condition of Proposition 2 is not sufficient, as the following examples show.

Example 1

Consider the geometric series $$ f(x) = \sum_{n=0}^{\infty}(-1)^n x^{2n} = \frac{1}{1+x^2}, \qquad |x|<1. $$ The coefficients alternate in sign, yet $R=1\neq+\infty$. One must first require $R=+\infty$ before the sign-change condition becomes meaningful.

For a subtler case with $R=+\infty$: take $a_n=(-1)^n/n!$, so $$ f(x) = \sum_{n=0}^{\infty}\frac{(-1)^n}{n!}x^n = e^{-x}. $$ The signs alternate, yet $e^{-x}\to+\infty$ as $x\to-\infty$.

Remark 5

The $e^{-x}$ example reveals the key gap: sign alternation of the coefficients does not prevent the function from growing in one direction, because the series for $e^{-x}$ reconstructs exponential growth in the negative half-line. A complete criterion must capture cancellation in both directions.

Connections to Entire Function Theory #

Theorem 3 (Borel–Carathéodory)

Let $f$ be holomorphic in $|z|\le R$. Then for $0<r<R$, $$ M(r) \le \frac{2r}{R-r}\sup_{|z|=R}\operatorname{Re}f(z) + \frac{R+r}{R-r}\,|f(0)|. $$

Remark 6

Borel–Carathéodory shows that the real part of a complex-valued entire function controls its modulus. For a real-valued function on $\mathbb{R}$ the analogous control is more delicate, since we only observe the function on a line, not on a disk.

Theorem 4 (Hadamard Factorisation)

Every entire function of finite order $\rho$ can be written as $$ f(z) = z^m e^{g(z)}\prod_{n=1}^{\infty} E_p\!\left(\frac{z}{z_n}\right), $$ where $m\ge 0$ is the order of the zero at the origin, the $z_n$ are the non-zero zeros of $f$, $p=\lfloor\rho\rfloor$, $g$ is a polynomial of degree $\le\rho$, and the $E_p$ are Weierstrass elementary factors.
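A classical instance is $\sin(\pi z)$, which has order $1$ and zeros at the integers. Taking $m = 1$, $p = 1$ and $e^{g} = \pi$, and pairing the factors for $\pm n$ via $E_1(z/n)E_1(-z/n) = 1 - z^2/n^2$, the factorisation becomes Euler's product $\sin(\pi z) = \pi z\prod_{n\ge 1}(1 - z^2/n^2)$. The sketch below compares a truncated product with the sine; the truncation length and the name `sine_product` are arbitrary illustrative choices.

```python
import math

def sine_product(x, terms=100_000):
    """Truncated Hadamard (Euler) product pi*x*prod_{n<=terms}(1 - x^2/n^2) for sin(pi*x)."""
    value = math.pi * x
    for n in range(1, terms + 1):
        value *= 1 - (x * x) / (n * n)
    return value

x = 0.5
approx = sine_product(x)
exact = math.sin(math.pi * x)
```

The product converges slowly (the relative error after $N$ factors is about $x^2/N$), so it serves as a structural check rather than a practical way to evaluate the sine.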

Remark 7

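A real-entire function that is bounded on $\mathbb{R}$ can have infinite order (for example $\cos(e^x)$, which lies in $\mathcal{B}$ by Proposition 1 (3)), and such functions are not covered by the Hadamard factorisation. For elements of $\mathcal{B}$ of finite order, understanding the zero set and the exponential factor $e^{g(z)}$ may be key to a classification.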

The Open Sub-Question on the Generators of $\mathcal{B}$ #

Question 2 (Rüdinger, 2009)

Does $\mathcal{B}$ consist precisely of functions of the form $c\cos(h(x))$ and their linear combinations and products, where $h:\mathbb{R}\to\mathbb{R}$ is entire and $c\in\mathbb{R}$?

A positive answer would give an implicit characterisation via algebraic generators. A negative answer would require producing a real-entire function, bounded on $\mathbb{R}$, that does not lie in the $\mathbb{R}$-algebra generated by $\{\cos\circ\, h : h\text{ entire}\}$.

Remark 8

By Proposition 1 (3), every $c\cos(h(x))$ belongs to $\mathcal{B}$, and $\mathcal{B}$ is an algebra, so all products and sums remain in $\mathcal{B}$. What is unknown is whether every element of $\mathcal{B}$ arises this way. Note that $\sin x = \cos(x-\pi/2) \in \mathcal{B}$, so sine is already covered.

Research Directions and Conjectures #

Direction 1: Coefficient Growth Rate #

A promising approach is to examine the rate of decay of $|a_n|$, not just the sign pattern.

Question 3

Is there a decay condition on $|a_n|$, combined with the sign-change condition, that gives a sufficient criterion for $f\in\mathcal{B}$?

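Approach. The Cauchy estimates give $|a_n| = |f^{(n)}(0)|/n!\le M(r)/r^n$ for all $r>0$, where $M(r)=\max_{|z|=r}|f(z)|$ is taken over the complex circle. A bound $|f|\le C$ on $\mathbb{R}$ does not control $M(r)$ (for $\cos$, $M(r)=\cosh r$), so boundedness on the real line yields no estimate of the form $|a_n|\le C/r^n$; indeed such an estimate for every $r>0$ would force $a_n=0$ for all $n\ge 1$. The question is whether real boundedness nevertheless imposes a usable constraint on the decay of $|a_n|$.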

Direction 2: Fourier-Analytic Approach #

Every $f\in L^\infty(\mathbb{R})\cap L^2(\mathbb{R})$ possesses a square-integrable Fourier transform. If $f$ is in addition entire of exponential type, the Paley–Wiener theorem forces the transform to be compactly supported. However, a generic $f\in\mathcal{B}$ need not lie in $L^2$ (e.g., $\cos x\notin L^2(\mathbb{R})$) and need not be of exponential type.

Question 4

Can the Fourier theory for tempered distributions give a necessary and sufficient condition for $f\in\mathcal{B}$ in terms of the spectral support of $f$?

Direction 3: Differential Equation Characterisation #

Bounded entire functions often arise as solutions to ODEs. For instance $y''+y=0$ has bounded solutions $A\cos x + B\sin x$. More generally, $y''+\omega(x)y=0$ with $\omega$ entire and bounded can produce bounded solutions.

Question 5

Characterise those linear differential operators $L$ with entire coefficients whose full solution space lies within $\mathcal{B}$.

Direction 4: Even/Odd Decomposition and Reduction #

Every $f\in\mathcal{B}$ splits as $f=f_e+f_o$ where $$ f_e(x)=\tfrac{1}{2}(f(x)+f(-x))=\sum_{k\ge 0}a_{2k}x^{2k} \quad\text{and}\quad f_o(x)=\tfrac{1}{2}(f(x)-f(-x))=\sum_{k\ge 0}a_{2k+1}x^{2k+1}. $$ Since $f_e(x)=g(x^2)$ for the entire function $g(t)=\sum_{k\ge 0}a_{2k}t^k$, boundedness of $f_e$ reduces to: is $g$ bounded on $[0,+\infty)$? This reduction may make the even and odd parts easier to study separately.
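The decomposition is easy to carry out on coefficient lists. The sketch below uses the truncated series of $e^x$ purely as a test case (it is not an element of $\mathcal{B}$), for which $f_e = \cosh$ and $f_o = \sinh$, and checks the identity $f_e(x) = g(x^2)$. The helper names and the truncation length are illustrative assumptions.

```python
import math

def horner(coeffs, x):
    """Evaluate sum(coeffs[n] * x**n) by Horner's rule."""
    total = 0.0
    for a in reversed(coeffs):
        total = total * x + a
    return total

def split_even_odd(coeffs):
    """Return the coefficient lists of f_e and f_o (same length as coeffs)."""
    even = [a if n % 2 == 0 else 0.0 for n, a in enumerate(coeffs)]
    odd = [a if n % 2 == 1 else 0.0 for n, a in enumerate(coeffs)]
    return even, odd

# Truncated series of exp(x): a_n = 1/n!
coeffs = [1 / math.factorial(n) for n in range(40)]
even, odd = split_even_odd(coeffs)
g = coeffs[0::2]  # g(t) = sum_k a_{2k} t^k

x = 1.5
f_e = horner(even, x)
f_o = horner(odd, x)
g_val = horner(g, x * x)  # should agree with f_e
```

Evaluating `g` on $[0,+\infty)$ alone is enough to decide whether `f_e` is bounded, which is the reduction described above.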

Direction 5: Polynomial Approximation and Numerics #

Question 6

If the partial sums $S_N(x)=\sum_{n=0}^{N}a_n x^n$ are uniformly bounded on growing intervals $[-R_N,R_N]$ (with $R_N\to\infty$), does it follow that $f\in\mathcal{B}$? Conversely, if $f\in\mathcal{B}$, how fast must $R_N$ grow relative to $N$ for the bound to hold?
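The cosine series gives a feel for how $R_N$ and $N$ interact. In the sketch below the degree-$20$ partial sum tracks $\cos x$ closely at $x = 2$ but is enormous at $x = 15$, even though $|\cos x| \le 1$ everywhere. The degree, the sample points, and the function name are arbitrary illustrative choices.

```python
import math

def cos_partial_sum(x, degree):
    """S_N(x) for the cosine series, keeping the terms of degree <= `degree`."""
    terms = [(-1) ** k * x ** (2 * k) / math.factorial(2 * k)
             for k in range(degree // 2 + 1)]
    return math.fsum(terms)  # fsum limits rounding error in the alternating sum

N = 20
inside = cos_partial_sum(2.0, N)    # well inside the range where S_N tracks cos
outside = cos_partial_sum(15.0, N)  # far outside that range
```

For this series the partial sum stays close to $\cos x$ only while $|x|$ is small compared with $N$, so any uniform bound of the kind asked about in Question 6 restricts how fast $R_N$ may grow.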

Summary of Open Problems #

#	Statement
Q1	Give a necessary and sufficient condition on $(a_n)$ for $f=\sum a_n x^n$ to be bounded on $\mathbb{R}$.
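Q2	Is $\mathcal{B}$ generated (as an algebra) precisely by $\{c\cos(h(x)):h\text{ entire}\}$?
Q3	Does a sharper decay condition on $|a_n|$, combined with the sign-change condition, give a sufficient criterion for $f\in\mathcal{B}$?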
Q4	Can spectral-support (Paley–Wiener / distribution) theory characterise $\mathcal{B}$?
Q5	Which linear ODEs with entire coefficients have solution space $\subseteq\mathcal{B}$?
Q6	What is the precise relationship between truncation bounds on $[-R_N,R_N]$ and $f\in\mathcal{B}$?

References #

Ahlfors, L. V. (1979). Complex Analysis, 3rd ed. McGraw-Hill.
Boas, R. P. (1954). Entire Functions. Academic Press.
Conway, J. B. (1978). Functions of One Complex Variable, 2nd ed. Springer.
Levin, B. Ya. (1996). Lectures on Entire Functions. AMS Translations of Mathematical Monographs, vol. 150.
Rudin, W. (1976). Principles of Mathematical Analysis, 3rd ed. McGraw-Hill.
Rudin, W. (1987). Real and Complex Analysis, 3rd ed. McGraw-Hill.
Rüdinger, A. (2009). Criterion for boundedness of power series. Open Problem Garden. http://www.openproblemgarden.org/op/criterion_for_boundedness_of_power_series
Stein, E. M. and Shakarchi, R. (2003). Fourier Analysis: An Introduction. Princeton University Press.
Stein, E. M. and Shakarchi, R. (2010). Complex Analysis. Princeton University Press.
Titchmarsh, E. C. (1939). The Theory of Functions, 2nd ed. Oxford University Press.

Brezis' first open problem - An elliptic equation involving the critical exponent in 3D

Sat, 18 Apr 2026 00:00:00 +0000

Yamabe problem #

Yamabe problem: Let $(\mathcal{M}, g_0)$ be a closed (compact, without boundary) Riemannian manifold of dimension $N \geq 3$. Does there exist a conformal metric $g = u^{\frac{4}{N-2}}g_0$ with constant scalar curvature $R_g \equiv C$?

Find $u > 0$ on $\mathcal{M}$ such that $$ -\frac{4(N-1)}{N-2}\Delta_{g_0}u + R_{g_0}u = Cu^{\frac{N+2}{N-2}}\qquad\text{on }\mathcal{M}. $$

Some results:

Trudinger [1968]: if $g_0$ has non-positive scalar curvature.
Aubin [1976]: $N \geq 6$ and $(\mathcal{M}, g_0)$ not locally conformally flat.
Schoen [1984]: any dimension, the remaining cases, assuming the Positive Mass Theorem by Schoen-Yau [1979].

A special case #

Consider the special case where $\mathcal{M}$ is a bounded domain $\Omega$ in $\mathbb{R}^{N}$: $$ \begin{cases} -\Delta u = u^{\frac{N+2}{N-2}}\qquad\text{in }\Omega, \\ u > 0\qquad\text{in }\Omega, \\ u = 0\qquad\text{on }\partial\Omega. \end{cases} $$

Pohozaev [1965]: if $\Omega$ is star-shaped, then there is no nontrivial solution.

Brezis-Nirenberg problem #

Consider a lower-order perturbation: $$ \begin{cases} -\Delta u = u^{\frac{N+2}{N-2}} + \lambda u\qquad\text{in }\Omega, \\ u > 0\qquad\text{in }\Omega, \\ u = 0\qquad\text{on }\partial\Omega. \end{cases} $$

Some results:

Pohozaev’s result also yields nonexistence when $\lambda \leq 0$ and $\Omega$ is star-shaped.
If a positive solution exists, then necessarily $\lambda < \lambda_1$, where $\lambda_1$ is the first eigenvalue of $-\Delta$ on $\Omega$ with zero Dirichlet boundary condition.

Hence, for positive solutions on star-shaped domains, $$ 0 < \lambda < \lambda_1. $$

Brezis’ Open Problem 1.1 #

Let $N=3$, and let $\Omega = B_1 \subset \mathbb{R}^3$ be the unit ball. Consider $$ \begin{cases} -\Delta u = u^5 + \lambda u \qquad \text{in } B_1, \\ u = 0 \qquad \text{on } \partial B_1. \end{cases} $$ We ask whether this problem admits a nontrivial positive solution $u \not\equiv 0$.

Here the exponent $5 = \frac{N+2}{N-2}$ is the critical Sobolev exponent when $N=3$, and this is exactly the source of the main compactness difficulty.

Let $\lambda_1$ be the first Dirichlet eigenvalue of $-\Delta$ on $B_1$. The classical Brezis-Nirenberg theory shows:

If $\lambda \leq 0$, then the only solution is $u \equiv 0$.
If $\frac{1}{4}\lambda_1 < \lambda < \lambda_1$, then there exists a positive radial solution.
If $0 < \lambda \leq \frac{1}{4}\lambda_1$, then any radial solution must be trivial; hence there is no positive radial solution.
If $\lambda > \lambda_1$, there exist sign-changing solutions, but no positive solution.

Therefore the unresolved case is:

Open Problem 1.1. Assume $$ 0 < \lambda \leq \frac{1}{4}\lambda_1. $$ Does there exist a nontrivial solution?
By the Gidas-Ni-Nirenberg theorem every positive solution on a ball is radial, and no nontrivial radial solution exists in this range, so any nontrivial solution would have to be sign-changing and non-radial.

This problem has remained open for decades, even if one restricts further to a smaller interval such as $$ 0 < \lambda < \varepsilon $$ for some sufficiently small $\varepsilon > 0$.
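The threshold can be made explicit. On the unit ball of $\mathbb{R}^3$ the first Dirichlet eigenfunction is the radial function $\sin(\pi r)/r$ with $\lambda_1 = \pi^2$, so $\tfrac14\lambda_1 = \pi^2/4 \approx 2.467$. The sketch below checks the eigenvalue relation numerically with a central-difference radial Laplacian; the step size, sample radii, and function names are illustrative assumptions.

```python
import math

def phi(r):
    """First Dirichlet eigenfunction on the unit ball of R^3 (radial profile)."""
    return math.sin(math.pi * r) / r

def radial_laplacian(f, r, h=1e-3):
    """Central-difference approximation of f''(r) + (2/r) f'(r), the 3D radial Laplacian."""
    second = (f(r + h) - 2 * f(r) + f(r - h)) / h**2
    first = (f(r + h) - f(r - h)) / (2 * h)
    return second + 2 * first / r

lambda_1 = math.pi ** 2

# -Laplace(phi) / phi should equal lambda_1 at every interior radius.
ratios = [-radial_laplacian(phi, r) / phi(r) for r in (0.25, 0.5, 0.75)]

# Upper end of the open range 0 < lambda <= lambda_1 / 4.
threshold = lambda_1 / 4
```

The function `phi` is positive inside the ball and vanishes at $r = 1$, which is what identifies $\pi^2$ as the first eigenvalue rather than a higher one.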

Remarks #

A few points are worth emphasizing:

By the Gidas-Ni-Nirenberg symmetry theorem, every positive solution on a ball is radial. Since Brezis observed that in this regime any radial solution must vanish, there is no positive solution, and any nontrivial solution would have to be sign-changing and genuinely non-radial.
This makes dimension $3$ sharply different from higher-dimensional cases, where the Brezis-Nirenberg existence theory is better understood.
The bifurcation picture suggests branches of sign-changing non-radial solutions emerging from higher eigenvalues, but it is not known whether such branches can reach the interval $\left(0,\frac14\lambda_1\right]$.

References #

H. Brezis and L. Nirenberg, Positive solutions of nonlinear elliptic equations involving critical Sobolev exponents, Comm. Pure Appl. Math. 36 (1983), 437–477.
H. Brezis, Some of My Favorite Open Problems, Open Problem 1.1.
M. Comte, Solutions of elliptic equations with critical Sobolev exponent in dimension three, Nonlinear Anal. 17 (1991), 445–455.
O. Druet, Elliptic equations with critical Sobolev exponents in dimension 3, Ann. Inst. H. Poincaré Anal. Non Linéaire 19 (2002), 125–142.

Recent Advances in KAN-Based Numerical PDE Solvers

Mon, 30 Mar 2026 00:00:00 +0000

Kolmogorov-Arnold Networks (KANs), introduced in 2024, have rapidly become one of the most active frontiers in scientific machine learning for solving partial differential equations (PDEs) (Liu et al., 2024). Unlike Multi-Layer Perceptrons (MLPs), which apply fixed activation functions at nodes, KANs place learnable univariate activation functions on edges, grounded in the Kolmogorov-Arnold representation theorem: every continuous multivariate function can be expressed as a composition of univariate functions and summations. This structural difference gives KANs two key properties relevant to PDE numerics — higher interpretability and parameter efficiency — making them an appealing successor to MLP-based Physics-Informed Neural Networks (PINNs).

From 2024 through early 2026, researchers have published dozens of frameworks combining KANs with classical numerical concepts (spectral methods, operator learning, energy-stable time-stepping, neural operators) and targeting problems ranging from single PDEs to high-dimensional systems with hundreds of variables.

Overview #

The KAN-for-PDEs landscape organises into several interrelated research threads:

Physics-Informed KAN Frameworks (PIKANs / KINN) — direct replacements of MLP layers in PINNs with KAN layers, using strong, energy, and inverse PDE formulations.
Spectral-Basis and Wavelet-Enriched KANs — embedding orthogonal polynomial or wavelet bases to combat spectral bias.
KAN-Based Neural Operators — KAN sub-networks inside DeepONet, FNO, and pseudo-differential operator frameworks for learning PDE solution maps.
Time-Dependent and Evolutionary KANs — energy-stable schemes, KAN-ODEs, and moving-boundary solvers.
Discontinuities, Shock Waves, and Turbulence — specialised architectures for sharp transitions.
High-Dimensional PDEs — separable and tensor-product KAN surrogates scaling to hundreds of dimensions.
Data-Driven Discovery and Inverse Problems — interpretability-driven model identification.

Architecture	Key Strength	Representative Work
KINN	Forward/inverse problems, strong/energy/inverse forms	Wang et al., 2024
ChebPIKAN	Fluid mechanics PDEs, orthogonal basis	Cui et al., 2024
KANO	Symbolic operator recovery, variable-coefficient PDEs	arXiv:2509.16825
EvoKAN	Long-horizon time evolution, energy stability	arXiv:2503.01618
Anant-KAN	High-dimensional PDEs (up to 300D)	arXiv:2505.03595
DPINN	Shock waves and discontinuities	arXiv:2507.08338

Background #

The Kolmogorov-Arnold Representation Theorem #

The theoretical foundation of KANs is the Kolmogorov-Arnold theorem: any continuous function $f: [0,1]^n \to \mathbb{R}$ can be written as

$$f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right),$$

where $\phi_{q,p}: [0,1] \to \mathbb{R}$ and $\Phi_q: \mathbb{R} \to \mathbb{R}$ are univariate continuous functions. In contrast to MLPs — where activations are fixed and weights are learned — KANs parameterise the activation functions themselves (typically as B-splines or orthogonal polynomials) on each edge of the network graph.
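The inner sum in the representation is what a single KAN layer computes: each edge carries its own univariate function and each output node adds up its incoming edges. The sketch below is a minimal pure-Python illustration using a Chebyshev basis on each edge; the basis choice, the fixed (untrained) coefficients, and the names `chebyshev_basis` and `kan_layer` are assumptions for illustration, not the implementation of any of the cited frameworks.

```python
def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] via the three-term recurrence."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2 * x * values[-1] - values[-2])
    return values[: degree + 1]

def kan_layer(inputs, coeffs):
    """One KAN layer: output_q = sum_p phi_{q,p}(x_p),
    where phi_{q,p}(x) = sum_k coeffs[q][p][k] * T_k(x)."""
    outputs = []
    for edge_row in coeffs:                    # one row per output node q
        total = 0.0
        for x_p, c in zip(inputs, edge_row):   # one edge per input p
            basis = chebyshev_basis(x_p, len(c) - 1)
            total += sum(ck * tk for ck, tk in zip(c, basis))
        outputs.append(total)
    return outputs

# Two inputs, one output:
# phi_{0,0}(x) = T_2(x) = 2x^2 - 1 and phi_{0,1}(x) = 3*T_1(x) = 3x.
coeffs = [[[0.0, 0.0, 1.0], [0.0, 3.0, 0.0]]]
y = kan_layer([0.5, -0.2], coeffs)
```

In a trained network the entries of `coeffs` would be the learnable parameters, and stacking two such layers gives the outer functions $\Phi_q$ applied to the inner sums.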

Physics-Informed Neural Networks (PINNs) — The Starting Point #

PINNs (Raissi, Perdikaris, & Karniadakis, 2019) embed physical laws directly into the neural network loss function. For a PDE $\mathcal{N}[u] = f$ on domain $\Omega$ with boundary condition $\mathcal{B}[u] = g$ on $\partial\Omega$, the PINN loss is

$$\mathcal{L} = \underbrace{\frac{1}{N_r}\sum_{i=1}^{N_r}\bigl|\mathcal{N}[u_\theta](x_i) - f(x_i)\bigr|^2}_{\text{PDE residual}} + \underbrace{\frac{1}{N_b}\sum_{j=1}^{N_b}\bigl|\mathcal{B}[u_\theta](x_j) - g(x_j)\bigr|^2}_{\text{boundary condition}}.$$

The substitution of MLP layers with KAN layers in this framework is the basic idea behind all PIKAN architectures.
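To make the loss concrete, the sketch below evaluates it for the toy problem $-u'' = \pi^2\sin(\pi x)$ on $(0,1)$ with $u(0)=u(1)=0$, whose exact solution is $\sin(\pi x)$. Everything here is a simplifying assumption: the one-parameter family $\theta\sin(\pi x)$ stands in for a network $u_\theta$, a central difference stands in for automatic differentiation, and the collocation points are a uniform grid.

```python
import math

def second_derivative(u, x, h=1e-3):
    """Central-difference stand-in for automatic differentiation."""
    return (u(x + h) - 2 * u(x) + u(x - h)) / h**2

def pinn_loss(u, f, g, interior_pts, boundary_pts):
    """Mean squared residual of -u'' = f plus mean squared boundary mismatch u = g."""
    residual = sum((-second_derivative(u, x) - f(x)) ** 2
                   for x in interior_pts) / len(interior_pts)
    boundary = sum((u(x) - g(x)) ** 2 for x in boundary_pts) / len(boundary_pts)
    return residual + boundary

def f(x):
    return math.pi ** 2 * math.sin(math.pi * x)  # source term

def g(x):
    return 0.0  # homogeneous Dirichlet data

interior = [i / 20 for i in range(1, 20)]
boundary = [0.0, 1.0]

def model(theta):
    """One-parameter stand-in for the network u_theta."""
    return lambda x: theta * math.sin(math.pi * x)

loss_exact = pinn_loss(model(1.0), f, g, interior, boundary)  # essentially zero
loss_wrong = pinn_loss(model(0.5), f, g, interior, boundary)  # clearly positive
```

Training a PINN or PIKAN amounts to minimising this quantity over the parameters; swapping the MLP for a KAN changes only how `model` is parameterised, not the loss.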

Recent Developments #

1. Physics-Informed KAN Frameworks #

KINN — The Foundational Framework #

The Kolmogorov-Arnold-Informed Neural Network (KINN) is the primary physics-informed framework replacing MLP layers in PINNs with KAN layers (Wang et al., 2024). KINN supports three PDE formulations: the strong form (collocating the PDE residual directly), the energy form (minimising a variational energy functional), and the inverse form (recovering unknown parameters from observations).

Systematic benchmarks demonstrate that KINN significantly outperforms MLP-based PINNs in accuracy and convergence speed for multi-scale problems, stress concentration, singularities, nonlinear hyperelasticity, and heterogeneous materials. The one domain where MLP remains competitive is complex geometry problems. Published in Computer Methods in Applied Mechanics and Engineering (2024), KINN has become the canonical reference for subsequent KAN-PDE research.

Chebyshev and Polynomial Basis PIKANs #

A major architectural refinement has been substituting B-spline basis functions with orthogonal polynomial bases. The ChebPIKAN model leverages orthogonality of Chebyshev polynomials and integrates physics-informed loss functions for fluid-mechanics PDEs including the Allen-Cahn, Burgers, Helmholtz, Kovasznay flow, cylinder wake flow, and cavity flow equations (Cui et al., 2024). ChebPIKAN significantly outperforms vanilla KAN by embedding essential physical information and alleviating overfitting.

The AC-PKAN (Attention-Enhanced Chebyshev PKAN) further addresses the rank collapse problem in Chebyshev-based KANs by integrating wavelet-activated MLPs with an internal attention mechanism, provably preserving a full-rank Jacobian and approximating PDEs of arbitrary order (arXiv:2505.08687). An external Residual Gradient Attention (RGA) mechanism dynamically re-weights individual loss terms based on gradient norms, stabilising training of stiff PDE systems.

The Legendre-KAN method applies Legendre polynomial orthogonality to solve the fully nonlinear Monge-Ampère equation with Dirichlet boundary conditions, demonstrating effectiveness on both smooth and singular solutions across various dimensions and in the optimal transport problem.

Hybrid KAN–MLP and Augmented Lagrangian Approaches #

The AL-PKAN introduces a hybrid encoder-decoder architecture where the decoder maps hidden variable features from high-dimensional latent space into trainable univariate activation functions via KAN (Zhang et al., 2025). An augmented Lagrangian function treats penalty factors and Lagrangian multipliers as learnable parameters to dynamically balance constraint terms. This approach typically improves prediction accuracy by one to two orders of magnitude compared to traditional neural networks.

The HPKM-PINN combines MLP and KAN branches with a trainable convex mixing parameter to blend features optimally across subdomains, especially effective for multi-scale problems.

2. Spectral-Basis and Wavelet-Enriched KANs #

Wav-KAN incorporates wavelet functions into the KAN structure, capturing both high-frequency and low-frequency components via continuous dyadic wavelet transforms for multiresolution analysis. This directly addresses the spectral bias problem inherent in standard neural networks, which struggle to resolve high-frequency features in PDE solutions.

PIKANs have been extended to multi-resolution spectral hybridisations (HWF-PIKAN), combining wavelet and Fourier features to explicitly counteract spectral bias and accelerate convergence for advection-dominated and kinetic equations.

A unified benchmark published in February 2026 provides a systematic, controlled comparison between MLP-based PINNs and KAN-based PIKANs across a representative collection of ODEs and PDEs (arXiv:2602.15068). The results show that PIKANs consistently achieve more accurate solutions, converge in fewer iterations, and yield superior gradient estimates.

3. KAN-Based Neural Operators #

Neural operators learn mappings between infinite-dimensional function spaces, enabling generalisation across families of PDEs. KANs are increasingly embedded in operator architectures.

DeepOKAN replaces MLP sub-networks in the Deep Operator Network (DeepONet) framework with KAN sub-networks using Gaussian Radial Basis Functions (Abueidda et al., 2024). The branch and trunk networks of DeepONet are re-implemented as RBF-KAN layers. Evaluated on 1D sinusoidal waves, 2D orthotropic elasticity, and transient Poisson problems, DeepOKAN consistently achieves lower training losses and more accurate predictions compared to standard DeepONet.

PO-CKAN (Physics-informed Deep Operator KAN with Chunk Rational Structure) integrates PDE residual loss into a DeepONet-style branch–trunk architecture using Chunkwise Rational KAN sub-networks (arXiv:2510.08795). On Burgers’ equation with viscosity $\nu = 0.01$, PO-CKAN reduces mean relative $L^2$ error by approximately 48% compared to PI-DeepONet.

KANO (Kolmogorov-Arnold Neural Operator) is the most theoretically ambitious framework, jointly parameterising operators in both spectral and spatial bases within a pseudo-differential operator framework (arXiv:2509.16825). KANO overcomes the pure-spectral bottleneck of Fourier Neural Operators (FNO): while FNO remains practical only for spectrally sparse operators, KANO remains expressive over generic variable-coefficient PDEs. Crucially, KANO achieves symbolic recovery of the learned operator, enabling closed-form extraction of governing equations. On the quantum Hamiltonian learning benchmark, KANO attains state infidelity $\approx 6 \times 10^{-6}$ compared to FNO’s $\approx 1.5 \times 10^{-2}$.

KAN-ONets embeds adaptive, learnable B-spline activations from KAN into FNO (yielding FNO-KAN for uniform grids) and into the attention-based GNOT (yielding GNOT-KAN for arbitrary grids). Across seven challenging PDE benchmarks, KAN-ONets achieves MSE reductions of 10.2–30.2% compared to existing models.

4. Time-Dependent and Evolutionary KANs #

EvoKAN (Evolutionary Kolmogorov-Arnold Network, March 2025) introduces a novel paradigm: rather than retraining repeatedly, EvoKAN encodes only the PDE’s initial state during an initial learning phase, then evolves the network parameters numerically, governed by the same PDE (arXiv:2503.01618). KAN weights are treated as time-dependent functions updated through time steps, enabling prediction over arbitrarily long time horizons.

EvoKAN integrates the Scalar Auxiliary Variable (SAV) method to guarantee unconditional energy stability: at each time step, SAV requires only solving decoupled linear systems with constant coefficients. EvoKAN has been validated on the 1D and 2D Allen-Cahn equations (phase-field phenomena with sharp interfaces) and the 2D Navier-Stokes equations (turbulent flows), closely matching analytical references.

KAN-ODEs apply KANs as the backbone of neural ordinary differential equation (ODE) frameworks, enabling data-driven discovery of governing dynamics with greater interpretability compared to MLP-based neural ODEs (arXiv:2407.04192).

Shallow-KAN addresses Stefan-type moving boundary problems (melting, solidification) by approximating the temperature distribution and moving interface while enforcing governing PDEs, phase equilibrium, and the Stefan condition through physics-informed residuals (arXiv:2601.09818). A key finding is that two hidden layers with tens of learnable parameters suffice — far fewer than the nearly one million parameters required by standard MLP-based PINNs for the same problem.

5. Discontinuities, Shock Waves, and Turbulence #

A known weakness of smooth neural networks is difficulty resolving sharp spatial transitions and discontinuities such as shock waves. Two specialised frameworks address this:

DPINN (Discontinuity-aware PINN) incorporates a discontinuity-aware KAN for modelling shock-wave properties, combined with an adaptive Fourier-feature embedding layer to mitigate spectral bias, mesh transformation for complex geometries, and learnable local artificial viscosity to stabilise the algorithm near discontinuities (arXiv:2507.08338). Numerical experiments on the inviscid Burgers’ equation and transonic/supersonic airfoil flows demonstrate superior accuracy over existing methods.

A Physics-Infused KAN for Turbulence (2026) targets turbulent flow prediction integrated with CFD, applying KAN within the Reynolds-Averaged Navier-Stokes (RANS) framework. It addresses the information bottleneck phenomenon in multi-output KANs and proposes pruning-based network optimisation, achieving high prediction accuracy for Navier-Stokes equations.

6. High-Dimensional PDEs and the Curse of Dimensionality #

High-dimensional PDEs (tens to hundreds of dimensions) are where conventional mesh-based methods become intractable, because their cost grows exponentially with the dimension. KANs have shown early promise here.

Anant-Net (2025) is a scalable neural surrogate employing a tensor product formulation with dimension-wise sweeps and selective automatic differentiation (arXiv:2505.03595). Benchmarked on the Poisson, Sine-Gordon, Allen-Cahn, and transient heat equations, Anant-Net solves PDEs in up to 300 dimensions on a single GPU within a few hours. The framework includes Anant-KAN, an interpretable KAN-based variant offering deeper insights into the learned solution structure.

Separable PIKANs (SPIKANs) decompose the PDE solution into products of one-dimensional KAN networks, drastically reducing computational complexity for high-dimensional problems while retaining accuracy and interpretability.
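The saving from a separable representation can be illustrated on a 2D grid. In the sketch below the univariate factors are fixed closed-form functions standing in for trained one-dimensional KANs, and the sum over a rank index is an assumption borrowed from low-rank separable representations; the function name `separable_grid_eval` is hypothetical.

```python
import math

def separable_grid_eval(factors_x, factors_y, xs, ys):
    """Evaluate u(x, y) = sum_r f_r(x) * g_r(y) on the grid xs x ys."""
    # Each univariate factor is evaluated once per axis point ...
    fx = [[f(x) for x in xs] for f in factors_x]
    gy = [[g(y) for y in ys] for g in factors_y]
    # ... and the grid values follow from cheap multiply-adds.
    grid = [[sum(fr[i] * gr[j] for fr, gr in zip(fx, gy))
             for j in range(len(ys))]
            for i in range(len(xs))]
    n_univariate_evals = len(factors_x) * len(xs) + len(factors_y) * len(ys)
    return grid, n_univariate_evals

# Stand-ins for trained 1D networks (rank 2).
factors_x = [lambda x: math.sin(math.pi * x), lambda x: x]
factors_y = [lambda y: math.cos(math.pi * y), lambda y: y * y]

n = 50
xs = [i / (n - 1) for i in range(n)]
ys = [j / (n - 1) for j in range(n)]
grid, n_evals = separable_grid_eval(factors_x, factors_y, xs, ys)
```

With 50 points per axis, the 2,500 grid values are obtained from 200 univariate evaluations. In $d$ dimensions the count of univariate evaluations grows linearly in $d$, while the number of grid points grows as $n^d$.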

7. Data-Driven Discovery and Inverse Problems #

KANs are especially powerful for scientific discovery tasks where interpretability of the learned function is critical.

Data-driven model discovery with KANs has been demonstrated on complex dynamical systems — including the Ikeda map and optical-cavity systems — where sparse optimisation methods fail due to non-sparse governing equations (arXiv:2409.15167). KAN captures complex behaviour while offering interpretability through its edge-wise univariate functions, providing insight into governing dynamics inaccessible in black-box MLPs.

PI-KAN-PointNet extends PIKAN to simultaneously solve inverse problems over multiple irregular geometries within a single training run, demonstrated on natural convection over 135 geometries with sparse data. KINN for Inverse Problems enables identification of unknown material parameters in heterogeneous or hyperelastic materials from partial observations. KANHedge applies KANs to high-dimensional BSDE solvers for option pricing, demonstrating improved hedging performance over MLP-based deep BSDE solvers (arXiv:2601.11097).

8. Comparative Analysis: KAN vs. MLP for PDEs #

A comprehensive comparison between MLP and KAN representations for differential equations establishes nuanced findings (arXiv:2406.02917):

Architecture	Shallow Networks	Deep Networks	Robustness	Interpretability
KAN (B-spline)	Superior accuracy	Comparable to MLP	Lower (may diverge with different seeds)	High — symbolic extraction possible
KAN (Chebyshev/Legendre)	High accuracy	Competitive	Moderate — rank collapse risk	High
MLP/PINN	Moderate accuracy	Robust	High	Low
PIKAN (optimised)	Superior	Superior or comparable	Moderate	High

Key findings: KANs in shallow settings significantly outperform MLPs, leveraging per-edge nonlinear expressiveness. In deep settings, KANs do not consistently outperform MLPs, but when properly optimised (e.g., with L-BFGS or Self-Scaled Broyden second-order optimisers), they achieve superior accuracy. JAX-based PIKAN implementations have achieved up to 84× training speedup over original NumPy/PyTorch KANs.

Open Problems #

Despite rapid progress, several challenges remain:

Computational cost. B-spline evaluation is recursive: each basis function of degree $k$ is assembled from lower-degree ones, making KANs significantly slower per parameter than MLPs. Variants like PowerMLP propose more efficient formulations (arXiv:2412.13571), but a satisfactory solution to raw training speed at scale is still outstanding.
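To give a feel for where the per-parameter cost comes from, here is a naive Cox–de Boor recursion for B-spline basis functions with a call counter. It is a sketch for illustration only: practical KAN implementations vectorise or cache this computation, and the uniform knot vector and evaluation point are arbitrary choices.

```python
def bspline_basis(i, k, t, x, counter):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree k.

    t is the knot vector; counter[0] records how many calls were made.
    """
    counter[0] += 1
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = right = 0.0
    if t[i + k] > t[i]:
        left = ((x - t[i]) / (t[i + k] - t[i])
                * bspline_basis(i, k - 1, t, x, counter))
    if t[i + k + 1] > t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x, counter))
    return left + right

knots = [float(j) for j in range(9)]   # uniform knots 0, 1, ..., 8
degree = 3
n_basis = len(knots) - degree - 1      # 5 cubic basis functions

counter = [0]
values = [bspline_basis(i, degree, knots, 3.5, counter) for i in range(n_basis)]
total = sum(values)                    # partition of unity inside [3, 5)
calls = counter[0]
```

Evaluating the five cubic basis functions at a single point takes 75 recursive calls here, whereas a fixed activation such as ReLU is a single comparison.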

Scalability to complex geometries. KINN and standard PIKANs underperform MLPs on irregular geometry problems. This remains a practical bottleneck for engineering applications involving complex domains.

Gradient instability in deep KANs. Deep PIKANs face vanishing/exploding gradient challenges, motivating Glorot-like initialisation strategies and residual-gated architectures.

Theoretical guarantees. Generalisation bounds for KANs trained on PDE collocation have been studied — bounds scale with $\ell_1$ norms of spline coefficients — but practical understanding of how architecture choices affect convergence and generalisation remains incomplete (arXiv:2410.08026).

Operator learning completeness. While KANO achieves symbolic operator recovery, the theoretical relationship between KAN architecture depth/width and approximation of PDE solution operators is still under active development.

The trajectory is clear: KAN-based PDE solvers are moving from proof-of-concept demonstrations on canonical benchmarks toward production-ready frameworks for engineering simulation, turbulence modelling, inverse problems, and high-dimensional scientific computing. The combination of interpretability, parameter efficiency, and growing theoretical foundations positions KANs as a genuinely transformative architecture for numerical PDEs.

References #

Abueidda, D. W., Pantidis, P., & Mobasher, M. E. (2024). DeepOKAN: Deep operator network based on Kolmogorov Arnold networks for mechanics problems. arXiv:2405.19143. https://www.alphaxiv.org/overview/2405.19143v3

Cui, Z., et al. (2024). Physics-informed Kolmogorov–Arnold network with Chebyshev polynomials for fluid mechanics. Physics of Fluids, 37(9), 095120. https://pubs.aip.org/aip/pof/article-abstract/37/9/095120/3361431

Knottenbelt, W., et al. (2026). KANHedge: Efficient hedging of high-dimensional options using Kolmogorov-Arnold network-based BSDE solver. arXiv:2601.11097. https://arxiv.org/abs/2601.11097

Kovachki, N., et al. (2023). Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research, 24(89), 1–97.

Li, Z., et al. (2025). Discontinuity-aware KAN-based physics-informed neural networks. arXiv:2507.08338. https://arxiv.org/html/2507.08338v1

Liu, Z., et al. (2024). KAN: Kolmogorov–Arnold Networks. arXiv:2404.19756. https://storage.prod.researchhub.com/uploads/papers/2024/05/04/2404.19756.pdf

Liu, Z., et al. (2024). A comprehensive and FAIR comparison between MLP and KAN representations for differential equations and operator networks. arXiv:2406.02917. https://arxiv.org/abs/2406.02917

Liu, Z., et al. (2026). A unified benchmark of physics-informed neural networks and Kolmogorov-Arnold networks. arXiv:2602.15068. https://arxiv.org/html/2602.15068v1

Peng, W., et al. (2025). KANO: Kolmogorov-Arnold Neural Operator. arXiv:2509.16825. https://arxiv.org/abs/2509.16825

Shukla, K., et al. (2025). Anant-Net: Breaking the curse of dimensionality with scalable and interpretable neural surrogates for high-dimensional PDEs. arXiv:2505.03595. https://arxiv.org/html/2505.03595v3

Tang, K., et al. (2025). AC-PKAN: Attention-enhanced and Chebyshev polynomial-based Kolmogorov-Arnold networks. arXiv:2505.08687. https://arxiv.org/html/2505.08687v2

Wang, Z., et al. (2025). EvoKAN: Energy-dissipative evolutionary Kolmogorov-Arnold networks for complex PDE systems. arXiv:2503.01618. https://arxiv.org/abs/2503.01618

Wang, Z., et al. (2024). Kolmogorov–Arnold-Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems based on Kolmogorov–Arnold Networks. Computer Methods in Applied Mechanics and Engineering. arXiv:2406.11045. https://www.sciencedirect.com/science/article/abs/pii/S0045782524007722

Xu, Y., et al. (2026). Shallow-KAN based solution of moving boundary PDEs. arXiv:2601.09818. https://arxiv.org/html/2601.09818v1

Yang, L., et al. (2025). KAN-ODEs: Kolmogorov-Arnold network ordinary differential equations for learning dynamical systems and hidden physics. arXiv:2407.04192. https://arxiv.org/html/2407.04192v1

Zhang, Z., et al. (2025). Physics-informed neural networks with hybrid Kolmogorov-Arnold networks. PMC. https://pmc.ncbi.nlm.nih.gov/articles/PMC11950322/

Zuo, Q., et al. (2025). Data-driven model discovery with Kolmogorov-Arnold networks. arXiv:2409.15167. https://arxiv.org/abs/2409.15167

Recent Advances in Numerical PDEs

Mon, 30 Mar 2026 00:00:00 +0000

Numerical methods for partial differential equations (PDEs) have entered a period of rapid transformation, driven by two converging forces: deep learning’s maturation as a tool for high-dimensional function approximation, and the resurgence of classical methods augmented by machine learning. The field broadly divides into physics-informed machine learning, neural operator learning, foundation models for PDEs, and the continuing evolution of classical high-order, structure-preserving, and data-driven discovery methods. Quantum computing and laser-based hardware solvers are also beginning to enter the landscape. This survey organises the most active research fronts, highlights landmark and recent key papers, and identifies open problems as of early 2026.

Overview #

The table below summarises the major approaches covered in this survey, their representative key papers, and their current status.

Approach	Representative Key Papers	Status
PINNs (adaptive/staged training)	Raissi et al. (2019); IEEE 2025 staged training; PhysicsNeMo/Modulus	Production-ready
KANs for PDEs	Liu et al. (2024, ICLR 2025); KINN; PI-KAN; HRKANs	Active frontier
Fourier Neural Operators	Li et al. (2020); O-FNO (2025); ReBA accelerator	Widely adopted
DeepONet variants	Lu et al. (2019); L-DeepONet; Hybrid KAN-DeepONet; Quantum DeepONet	Mature + expanding
PDE Foundation Models	Poseidon; OmniArch; PDEformer; Geo-NeW	Emerging (2024–2026)
Deep BSDE & high-dimensional	Han, Jentzen, & E (PNAS 2018); Deep Shotgun; DRDM; Heun-BSDE	Active
Data-driven PDE discovery	SINDy (Brunton et al.); GN-SINDy; Evo-SINDy; Bayesian-SINDy	Active
Structure-preserving methods	Hairer et al. (2006); Stochastic multisymplectic; Geo-NeW	Maturing
High-order FEM/DG	hp-DGFEM Boltzmann; ML-accelerated FEM; FEX-PG	Mature + augmented
Fractional PDEs	Review (2024); O-FNO for fractional Poisson; Fractional Laplacian meshfree	Active
Hamilton–Jacobi PDEs	Review arXiv:2502.20833; Actor-critic NN; Deep BSDE for HJB	Active
Multiscale / ROM	MLP-based multiscale; POD-DL-ROM; Multi-fidelity ROM	Active
Uncertainty quantification	QMC/RQMC; PDE-DKL	Active
Quantum computing	Schrödingerisation; H-DES (ColibriTD); Quantum DeepONet	Early-stage
Photonic/analog solvers	LightSolver LPU	Very early-stage

Background #

The Classical PDE Problem #

A general PDE on a domain $\Omega \subseteq \mathbb{R}^d$ takes the form

$$\mathcal{N} [u] (x) = f(x), \quad x \in \Omega, \qquad \mathcal{B} [u] (x) = g(x), \quad x \in \partial \Omega,$$

where $\mathcal{N}$ is a (possibly nonlinear) differential operator, $\mathcal{B}$ encodes boundary or initial conditions, and $u: \Omega \to \mathbb{R}$ is the unknown. Classical mesh-based methods — finite element (FEM), finite difference (FDM), finite volume (FVM), and spectral methods — discretise $\Omega$ into $N$ degrees of freedom and solve a resulting algebraic system. Their complexity typically scales as $O(N^\alpha)$ for some $\alpha \geq 1$, and in $d$ dimensions $N \sim h^{-d}$ for mesh spacing $h$, leading to exponential cost as $d$ grows.
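To make the scaling $N \sim h^{-d}$ concrete, the following sketch counts the grid points of a uniform mesh on the unit cube. The domain and the spacing $h = 0.01$ are illustrative assumptions.

```python
def degrees_of_freedom(h, d):
    """Grid points of a uniform mesh with spacing h on the unit cube [0, 1]^d."""
    points_per_axis = round(1 / h) + 1
    return points_per_axis ** d

for d in (1, 2, 3, 6, 10):
    print(d, degrees_of_freedom(0.01, d))
```

At $h = 0.01$ there are 101 points per axis, so $d = 3$ already needs about $10^6$ unknowns and $d = 10$ about $10^{20}$, before any solver cost $O(N^\alpha)$ is taken into account.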

The Deep Learning Turn #

The 2019 PINN paper by Raissi, Perdikaris, and Karniadakis, and the 2020 FNO paper by Li et al., triggered an explosion of mesh-free and operator-learning approaches. Rather than discretising $\Omega$, these methods parameterise $u$ (or the solution operator $\mathcal{N}^{-1}$) as a neural network and minimise a physics-informed or data-driven loss. The key advantages are mesh-free flexibility, natural handling of inverse problems, and — in the operator-learning setting — the ability to generalise across PDE instances.

Recent Developments #

1. Physics-Informed Neural Networks (PINNs) and Variants #

PINNs, introduced by Raissi, Perdikaris, and Karniadakis (2019), embed physical laws directly into the neural network loss function as residual terms of the form $\mathcal{L}_{\text{phys}} = \|\mathcal{N}[\hat{u}] - f\|^2$, where $\hat{u}$ is the network output and $\mathcal{N}$ and $f$ are the operator and source term of the PDE above. These terms are supplemented by data, boundary, and initial condition constraints. Their appeal lies in a mesh-free design that handles irregular geometries and inverse problems naturally. Yet PINN training is notoriously fragile (subject to spectral bias, loss imbalance, and stiffness), motivating a rich line of training improvements.
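The structure of such a loss can be sketched on a 1D Poisson problem, $u''(x) = f(x)$ on $(0,1)$ with $u(0) = u(1) = 0$. Two simplifications are assumed here: the trial function is an ordinary Python callable rather than a neural network, and the second derivative is taken by central finite differences instead of automatic differentiation. The names `pinn_loss` and `second_derivative` are hypothetical.

```python
import math

def second_derivative(u, x, eps=1e-4):
    """Central finite difference, standing in for automatic differentiation."""
    return (u(x + eps) - 2.0 * u(x) + u(x - eps)) / eps ** 2

def pinn_loss(u_hat, f, collocation, boundary, g):
    """Mean squared PDE residual plus mean squared boundary mismatch."""
    phys = sum((second_derivative(u_hat, x) - f(x)) ** 2
               for x in collocation) / len(collocation)
    bc = sum((u_hat(x) - g(x)) ** 2 for x in boundary) / len(boundary)
    return phys + bc

f = lambda x: -math.pi ** 2 * math.sin(math.pi * x)   # source term
g = lambda x: 0.0                                     # boundary data
collocation = [(i + 0.5) / 20 for i in range(20)]     # interior points
boundary = [0.0, 1.0]

exact = lambda x: math.sin(math.pi * x)   # satisfies the PDE and the boundary data
wrong = lambda x: x * (1.0 - x)           # satisfies only the boundary data

loss_exact = pinn_loss(exact, f, collocation, boundary, g)
loss_wrong = pinn_loss(wrong, f, collocation, boundary, g)
```

The exact solution drives the loss to nearly zero, while the parabola matches the boundary conditions but is penalised by the residual term. In practice the two terms are often weighted differently, which is where the loss-imbalance issue arises.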

Staged training strategies. A 2025 IEEE paper proposes a two-stage process: a short-time pretraining phase followed by extension to the full time domain, combined with uncertainty-guided sampling. This significantly improves accuracy and efficiency for time-dependent PDEs compared to standard PINNs (IEEE, 2025).

Evolutionary optimisation of PINNs. A 2025 arXiv paper introduces evolutionary optimisation to tune PINN architectures. Because compliance with physical laws is enforced through the training loss, the approach improves robustness when data are scarce (arXiv:2501.06572).

Automatic structure discovery via knowledge distillation. A 2025 Nature Communications paper proposes a physics-informed distillation framework that decouples physical and parameter regularisation in teacher–student networks, then uses clustering and parameter reconstruction to embed physically meaningful structures. Experiments on Laplace, Burgers, Poisson, and fluid mechanics equations show improved accuracy, training efficiency, and transferability (arXiv:2502.06026).

Production-ready frameworks include PhysicsNeMo/Modulus (CUDA-optimised kernels with 4× speedups) and DeepXDE, which support adaptive weighting schemes, curriculum learning, intelligent residual point sampling, and domain decomposition for stiff problems.

2. Kolmogorov–Arnold Networks (KANs) for PDEs #

Proposed by Liu, Wang, Vaidya et al. (2024, accepted ICLR 2025), KANs replace fixed activation functions at MLP nodes with learnable spline-parameterised functions on each edge. This change — inspired by the Kolmogorov-Arnold representation theorem — provides faster neural scaling laws, improved interpretability, and comparable or better accuracy with far fewer parameters, especially for scientific AI tasks. The major PINN-KAN hybrid architectures are as follows:

Architecture	PDE focus	Key claim
KINN	Solid mechanics, multi-scale, singularities	Significantly outperforms MLP-PINNs in accuracy and convergence speed
PI-KAN	Navier–Stokes (forward)	High prediction accuracy; addresses information bottleneck
HRKANs	Poisson, Burgers	Highest fitting accuracy, lowest training time vs. KAN and ReLU-KAN
PIKANs (adaptive grid)	Forward PDE problems	Up to 84× faster training; adaptive state transition reduces $L^2$ error by 43%
EvoKAN	Complex PDE systems	Energy-dissipative; encodes only the initial state, avoiding retraining
KAN-ODEs	Schrödinger, Allen–Cahn, dynamical systems	Improved performance over Neural ODEs in discovering hidden physics

KANs are also being used inside DeepONet branch/trunk networks for hybrid neural operator surrogates in porous media flows, including Darcy flow and 2D/3D multiphase problems (arXiv:2511.02962). For a deeper treatment of KAN architectures for PDEs, see the companion post in this series.

3. Neural Operator Learning #

Neural operators learn mappings between infinite-dimensional function spaces — enabling resolution-invariant, discretisation-agnostic PDE solvers. The two dominant architectures are the Fourier Neural Operator (FNO) and Deep Operator Networks (DeepONet).

FNO applies global convolution in Fourier space, giving resolution invariance and fast inference. The 2025 Optimised FNO (O-FNO) integrates residual connections and enhanced spectral resolution for the 2D fractional Poisson equation, achieving over 98% test accuracy and outperforming both base FNO and DeepONet. A hardware/algorithm co-design chip, ReBA, implements the Galerkin Transformer achieving 34.57× speedup over CPUs and up to 51.26× over prior accelerators (IEEE, 2025).
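The spectral convolution at the heart of an FNO layer can be sketched in one dimension. This is a simplified illustration under stated assumptions: it uses a naive $O(n^2)$ DFT instead of an FFT, a single channel, hand-picked weights rather than trained ones, and omits the pointwise linear path and nonlinearity that a full layer would add.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]

def spectral_conv(v, weights):
    """Scale the lowest len(weights) Fourier modes and discard the rest."""
    n = len(v)
    V = dft(v)
    out = [0j] * n
    for k, w in enumerate(weights):
        out[k] = w * V[k]
        if 0 < k < n - k:
            # Mirror onto the conjugate mode so the output stays real.
            out[n - k] = w.conjugate() * V[n - k]
    return [z.real for z in idft(out)]

weights = [1 + 0j, 2 + 0j, 0j]  # acts on modes 0, 1, 2 (illustrative values)

def sample(func, n):
    return [func(2 * math.pi * j / n) for j in range(n)]

coarse = spectral_conv(sample(math.sin, 16), weights)
fine = spectral_conv(sample(math.sin, 32), weights)
filtered = spectral_conv(sample(lambda s: math.cos(3 * s), 16), weights)
```

The same three weights double the mode-1 sine on both the 16-point and the 32-point grid, which is the sense in which the layer is resolution-invariant, and the mode-3 input is removed because no weight acts on it.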

DeepONet’s branch-trunk architecture excels under noise and complex geometries where FNO degrades. Recent extensions include multi-fidelity physics-guided DeepONet (2025), Fusion DeepONet for hypersonic flow predictions on arbitrary grids (arXiv:2501.01934), and Latent-space DeepONet (L-DeepONet) (Nature Communications, 2024), which outperforms all other neural operators with small latent dimensions ($d \leq 100$), enabling real-time high-dimensional predictions. Ensemble and Mixture-of-Experts DeepONets achieve 2–4× lower relative $\ell_2$ errors through basis enrichment and spatial locality (arXiv:2405.11907). Taylor Mode Neural Operators provide an order-of-magnitude speed-up for DeepONet and 8× for FNO in computing high-order derivatives via Taylor-mode automatic differentiation.

Graph Neural Operator Methods. The GOLA framework (2025) addresses the limitation of regular-grid assumptions by constructing graphs from irregularly sampled spatial points with a Fourier-based encoder for learnable complex-coefficient embeddings, outperforming baselines in data-scarce regimes across 2D Darcy, Advection, Eikonal, and Nonlinear Diffusion problems (arXiv:2505.18923).

4. Foundation Models for PDEs #

Inspired by the success of LLMs, PDE foundation models represent a paradigm shift: large transformers pre-trained on diverse physical systems that can be fine-tuned for downstream tasks with minimal data.

Poseidon (ETH Zurich, 2024) is a multiscale operator transformer with time-conditioned layer norms, enabling continuous-in-time evaluation. Pre-trained on diverse physical systems, it exploits the semigroup property of time-dependent PDEs for significant data scaling (arXiv:2405.19101).

OmniArch (ICML 2025) is the first multi-scale and multi-physics scientific computing foundation model, featuring a Fourier encoder-decoder and transformer backbone with a PDE-Aligner for physics-informed fine-tuning. It achieves unified 1D-2D-3D pre-training on PDEBench and demonstrates zero-shot learning on new physics.

PDEformer (2025) represents PDEs as computational graphs integrating symbolic and numerical information; a graph transformer with implicit neural representation enables mesh-free predictions with zero-shot accuracy comparable to specialist models (arXiv:2402.12652).

Multimodal PDE Foundation Model (UCLA, 2025) integrates both numerical inputs (equation parameters, initial conditions) and text descriptions. It achieves average relative error below 3.3% in-distribution and generates interpretable scientific text — bridging NLP and scientific computing (arXiv:2502.06026).

Physics-informed fine-tuning (arXiv:2603.15431, 2026) establishes that hybrid fine-tuning (combining physics-informed and data-driven objectives) achieves superior extrapolation to downstream tasks and enables data-free learning of unseen PDE families.

Geo-NeW (arXiv:2602.02788, Feb 2026) — General-Geometry Neural Whitney Forms — is a data-driven finite element method jointly learning differential operators and compatible finite element spaces on the geometry. It exactly preserves physical conservation laws via Finite Element Exterior Calculus, with state-of-the-art performance on out-of-distribution geometries.

5. Deep Learning for High-Dimensional PDEs #

Classical mesh-based methods suffer exponential complexity growth in dimension $d$. Three principal deep learning paradigms address this.

The Deep BSDE method (Han, Jentzen, & E, PNAS, 2018) reformulates semilinear parabolic PDEs using backward stochastic differential equations (BSDEs) and learns the gradient of the solution with neural networks, enabling solution of PDEs in hundreds to thousands of dimensions. A 2025 review by the original authors traces subsequent advances. Key recent improvements include:

Deep Shotgun Method (J. Sci. Comput., 2025): avoids full trajectory simulation, using only data distribution, achieving results up to dimension 10,000 (Springer, 2025).
XNet-enhanced Deep BSDE (2025): a new network architecture with fewer parameters, significantly improving computational efficiency and accuracy (arXiv:2502.06238).
Deep Random Difference Method (DRDM) (2025): approximates the convection-diffusion operator using only first-order differences, avoiding Hessian computations, with proved first-order accuracy in time step $h$ (arXiv:2506.20308).
Stratonovich-based BSDE with Heun integration (2025): identifies that Euler-Maruyama discretisation bias is the root cause of BSDE underperformance relative to PINNs; Heun integration eliminates this bias and achieves competitive results across high-dimensional benchmarks (arXiv:2505.01078).
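The two integrators named in the last item can be compared on a scalar test problem, the Stratonovich SDE $dX = \sigma X \circ dW$ with exact solution $X_T = X_0 e^{\sigma W_T}$. This sketch illustrates only the time-stepping schemes; it does not reproduce the BSDE loss or the experiments of the cited paper, and the parameter values are arbitrary.

```python
import math
import random

def simulate(sigma, x0, increments):
    """Integrate the Stratonovich SDE dX = sigma * X o dW with two schemes
    driven by the same Brownian increments."""
    x_em = x_heun = x0
    for dw in increments:
        # Euler-Maruyama: a single slope evaluation at the left endpoint.
        x_em = x_em + sigma * x_em * dw
        # Heun: predictor step, then average the left and predicted slopes.
        x_pred = x_heun + sigma * x_heun * dw
        x_heun = x_heun + 0.5 * sigma * (x_heun + x_pred) * dw
    return x_em, x_heun

rng = random.Random(0)
n_steps, T, sigma = 4000, 1.0, 0.5
h = T / n_steps
increments = [rng.gauss(0.0, math.sqrt(h)) for _ in range(n_steps)]
w_T = sum(increments)

x_em, x_heun = simulate(sigma, 1.0, increments)
stratonovich_exact = math.exp(sigma * w_T)
ito_exact = math.exp(sigma * w_T - 0.5 * sigma ** 2 * T)
```

Heun tracks the Stratonovich solution, whereas Euler–Maruyama applied to the same coefficients converges to the Itô solution, which differs by the factor $e^{-\sigma^2 T/2}$. The gap does not vanish as the step size shrinks, which is one way a choice of discretisation can introduce a systematic bias.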

The Deep Ritz method (E & Yu, 2018) minimises energy functionals using neural networks. Extensions to multiscale problems leverage scale convergence theory to derive $\Gamma$-limits of oscillatory energy functionals.
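The variational idea can be sketched without a neural network. For $-u'' = f$ on $(0,1)$ with zero boundary values, the solution minimises $E[u] = \int_0^1 \left(\tfrac{1}{2}|u'|^2 - f u\right) dx$. The example below assumes a one-parameter trial function $u_a(x) = a \sin(\pi x)$ in place of a network, a midpoint quadrature rule, and a finite-difference gradient; with $f = \pi^2 \sin(\pi x)$ the minimiser is $a = 1$.

```python
import math

def energy(a, n=200):
    """Midpoint-rule approximation of the Ritz energy for u = a * sin(pi x)."""
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        du = a * math.pi * math.cos(math.pi * x)    # u'(x)
        u = a * math.sin(math.pi * x)
        f = math.pi ** 2 * math.sin(math.pi * x)    # right-hand side
        total += 0.5 * du ** 2 - f * u
    return total / n

# Plain gradient descent on the single parameter a.
a, lr, eps = 0.0, 0.1, 1e-6
for _ in range(100):
    grad = (energy(a + eps) - energy(a - eps)) / (2 * eps)
    a -= lr * grad
```

Deep Ritz follows the same pattern with the trial function replaced by a neural network, the quadrature by Monte Carlo sampling, and the finite-difference gradient by backpropagation.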

The Full History Recursive Multilevel Picard methodology (abbreviated MLP in that literature, not to be confused with the multilayer perceptron) combines Picard iterations with multilevel Monte Carlo. It was the first method proven to overcome the curse of dimensionality for semilinear parabolic PDEs and remains one of very few methods with such proven guarantees.

PDE-DKL (2025) combines deep learning for low-dimensional latent representations with Gaussian Processes for kernel regression under explicit PDE constraints, providing both high accuracy and principled uncertainty quantification in limited-data regimes (arXiv:2501.18258).

6. Classical High-Order Methods: FEM, DG, and Spectral #

Despite the deep learning surge, classical methods continue to mature, particularly in rigorous error analysis and efficiency.

The hp-version DG finite element method for the Boltzmann transport problem (J. Sci. Comput., 2024) achieves arbitrary-order convergence rates and handles polytopic elements, enabling efficient parallel implementation within existing multigroup discrete ordinates software. High-order DG methods for unsteady compressible flows — targeting acoustic waves, turbulence, and magnetohydrodynamics — benefit from block-diagonal mass matrices allowing efficient explicit time-stepping.

A systematic 2024 approach uses neural networks to learn the element-wise solution map of PDEs, accelerating finite element-type methods in an “element neural network” paradigm that generalises across element geometries. Machine learning-based spectral methods combine orthogonal function expansions (Fourier, Legendre) with deep neural operator learning for highly accurate solutions with fewer grid points.

FEX-PG (2024) solves high-dimensional partial integro-differential equations using parameter grouping to reduce coefficient count and Taylor series approximation for integral terms, achieving relative errors on the order of single-precision machine epsilon while providing interpretable, explicit solution formulas absent from most DL methods (arXiv:2410.00835).

7. Structure-Preserving Numerical Methods #

Structure-preserving methods retain intrinsic properties of the continuous system — symplecticity, energy conservation, divergence-free constraints — at the discrete level. They enhance numerical stability and long-term accuracy, ensuring computed solutions respect the underlying mathematical structure.
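A small ODE example shows what preserving structure buys over long times. The sketch compares explicit Euler with symplectic Euler on the harmonic oscillator $\dot q = p$, $\dot p = -q$, whose energy $\tfrac{1}{2}(q^2 + p^2)$ is conserved by the exact flow. The step size and step count are arbitrary choices for illustration.

```python
def explicit_euler(q, p, h):
    # Both updates use the old state; energy grows by (1 + h^2) per step.
    return q + h * p, p - h * q

def symplectic_euler(q, p, h):
    # Update p first, then use the new p to update q.
    p_new = p - h * q
    return q + h * p_new, p_new

def energy(q, p):
    return 0.5 * (q * q + p * p)

def integrate(step, n_steps=1000, h=0.1):
    q, p = 1.0, 0.0
    for _ in range(n_steps):
        q, p = step(q, p, h)
    return energy(q, p)

e_explicit = integrate(explicit_euler)
e_symplectic = integrate(symplectic_euler)
```

Starting from energy 0.5, explicit Euler drifts upward by orders of magnitude over 1,000 steps, while the symplectic variant stays within a few percent because it exactly conserves the nearby quantity $q^2 + p^2 - h\,pq$.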

Recent research encompasses geometric integrators and mimetic discretisations for conservative finite element, difference, and volume schemes; stochastic multisymplectic PDEs and their structure-preserving discretisations (Studies in Applied Mathematics, 2025); and structure-preserving learning via the Geo-NeW model, which exactly preserves physical conservation laws through Finite Element Exterior Calculus. A 2024 University of Maryland workshop identified integration of structure-preserving methods with uncertainty quantification as a key open problem.

8. Data-Driven PDE Discovery #

SINDy and its extensions use sparse regression over a dictionary of candidate functions. GN-SINDy (2024–2026) addresses high dimensionality and large datasets by combining Q-DEIM greedy sampling, differentiable surrogate modelling, and sparse regression, showing robustness on Burgers, Allen-Cahn, and KdV equations. Evo-SINDy (ACM, 2025) uses multi-population co-evolutionary algorithms for universal PDE identification. Bayesian-SINDy quantifies parameter uncertainty robustly (arXiv:2402.15357).
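The sparse-regression core shared by these methods can be sketched with sequentially thresholded least squares on a toy problem. The assumptions here are loud ones: the data come from the ODE $\dot x = -2x$ rather than a PDE, the derivative values are exact instead of estimated from noisy measurements, the candidate library is just $\{1, x, x^2\}$, and the helper names are hypothetical.

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def least_squares(theta, y, active):
    """Least squares over the active library columns via the normal equations."""
    A = [[sum(row[i] * row[j] for row in theta) for j in active] for i in active]
    b = [sum(row[i] * y_k for row, y_k in zip(theta, y)) for i in active]
    return solve(A, b)

def stlsq(theta, y, threshold=0.1, n_iter=10):
    """Alternate between least squares and zeroing small coefficients."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(n_iter):
        sol = least_squares(theta, y, active)
        coef = [0.0] * n_terms
        for i, c in zip(active, sol):
            if abs(c) >= threshold:
                coef[i] = c
        new_active = [i for i in active if coef[i] != 0.0]
        if new_active == active or not new_active:
            break
        active = new_active
    return coef

# Samples of x(t) = 3 exp(-2t) and its exact derivative dx/dt = -2x.
ts = [k / 20 for k in range(21)]
xs = [3.0 * math.exp(-2.0 * t) for t in ts]
dxdt = [-2.0 * x for x in xs]
theta = [[1.0, x, x * x] for x in xs]   # candidate library: 1, x, x^2

coef = stlsq(theta, dxdt)
```

The recovered coefficient vector keeps only the $x$ term, with coefficient close to $-2$. For PDE discovery the library would also contain spatial derivatives such as $u_x$, $u u_x$, and $u_{xx}$.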

On the neural-symbolic front, Mechanistic PDE Networks (arXiv:2502.18377, 2025) represent spatiotemporal data as space-time dependent linear PDEs within neural network hidden representations, then solve and decode for specific tasks. MORL4PDEs (Chaos Solitons Fractals, 2024) uses reinforcement learning and genetic algorithms for symbolic PDE regression without pre-specified candidate libraries. The Physics-Informed Information Criterion (PIC) (Research, 2022) selects the most appropriate PDE from candidates by incorporating symmetry constraints.

9. Hamilton–Jacobi PDEs #

Hamilton–Jacobi (HJ) PDEs govern optimal control, level-set methods, and front propagation. A comprehensive 2025 review (arXiv:2502.20833) covers grid-based methods, representation formula methods, Monte Carlo via Laplace’s method, and deep learning approaches. Key deep learning advances include actor-critic neural network frameworks for static HJ equations (convergence analysed in 2024), and variational methods that solve HJ PDEs up to 100 dimensions with relative errors of 1–5%. Deep BSDE methods naturally apply to Hamilton-Jacobi-Bellman (HJB) equations arising in stochastic optimal control.

10. Fractional and Non-Local PDEs #

Fractional-order derivatives model anomalous diffusion, viscoelastic behaviour, and memory effects that integer-order PDEs cannot capture. Recent advances include semi-analytical methods (Adomian Decomposition, Variational Iteration) applied to 3D time-fractional diffusion, telegraph, and wave equations; a 2024 comprehensive review of fractional stochastic PDEs covering the latest numerical methods and practical implementations; the Optimised FNO (O-FNO, 2025) achieving 98%+ test accuracy for fractional Poisson equations; and a 2025 meshfree finite difference scheme for the fractional Laplacian on arbitrary bounded domains.

11. Multiscale Methods and Model Order Reduction #

A 2024 dissertation on numerical multiscale methods establishes an equivalence between time averaging and space homogenisation, and extends Deep Ritz to multiscale problems via scale convergence theory. Multi-fidelity reduced order models for PDE-constrained optimisation (arXiv:2503.21252, 2025) use a hierarchical trust region algorithm with active learning, constructing a full/reduced/ML model hierarchy on-the-fly. POD-DL-ROMs (Politecnico di Milano, 2024) combine proper orthogonal decomposition with autoencoder architectures for nonlinear parametric PDEs, providing a mathematically rigorous framework enhancing accuracy of reduced models.

12. Uncertainty Quantification and Stochastic PDEs #

Quasi-Monte Carlo (QMC) methods achieve faster convergence than Monte Carlo for smooth integrands. A 2024 paper analyses QMC with generalised Gaussian random variables and Gevrey regular inputs — relaxing the standard uniformly bounded assumption — analysing dimension truncation, FEM, and QMC errors jointly for randomly shifted rank-1 lattice rules (arXiv:2411.03793). Randomised QMC (RQMC) with scrambled Sobol’ sequences achieves smaller bias and RMSE than Monte Carlo for risk-averse optimisation (arXiv:2408.02842). A 2024 ICERM semester at Brown University (“Numerical PDEs: Analysis, Algorithms, and Data Challenges”) served as a major gathering point for researchers integrating uncertainty quantification with PDE methods.
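A randomly shifted rank-1 lattice rule is short enough to sketch directly. The example below assumes a 2D Fibonacci lattice ($n = 987$ points, generating vector $(1, 610)$) and a smooth periodic test integrand whose exact integral is 1; both are illustrative choices and are not taken from the cited papers.

```python
import math
import random

def integrand(x, y):
    """Smooth periodic test function with exact integral 1 over [0, 1]^2."""
    return (1.0 + math.sin(2 * math.pi * x)) * (1.0 + math.sin(2 * math.pi * y))

def shifted_lattice_rule(f, n, z, shift):
    """Rank-1 lattice rule with points frac(i * z / n + shift), i = 0..n-1."""
    total = 0.0
    for i in range(n):
        x = (i * z[0] / n + shift[0]) % 1.0
        y = (i * z[1] / n + shift[1]) % 1.0
        total += f(x, y)
    return total / n

def monte_carlo(f, n, rng):
    return sum(f(rng.random(), rng.random()) for _ in range(n)) / n

rng = random.Random(0)
n, z = 987, (1, 610)                      # consecutive Fibonacci numbers
shift = (rng.random(), rng.random())      # one random shift of the whole lattice
qmc_estimate = shifted_lattice_rule(integrand, n, z, shift)
mc_estimate = monte_carlo(integrand, n, rng)
```

For this low-frequency trigonometric integrand the lattice rule is exact up to rounding, while plain Monte Carlo with the same number of points carries a statistical error of order $n^{-1/2}$. Averaging over several independent shifts gives an unbiased estimator with a practical error estimate.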

13. Quantum and Photonic Computing for PDEs #

Schrödingerisation techniques convert general linear PDEs into Schrödinger-type equations via the “warped phase transformation,” enabling direct quantum Hamiltonian simulation. A 2024 Quantum journal paper provides explicit quantum circuit implementations for the heat and advection equations with complexity analysis demonstrating quantum advantage in high dimensions. ColibriTD’s H-DES (March 2025) was reported as the first real-hardware solution of a PDE via variational quantum algorithm, executing on IBM’s 156-qubit Heron R2 processor for the inviscid Burgers’ equation.

LightSolver’s Laser Processing Unit (LPU), in an update announced in September 2025, is reported to map and solve PDEs directly, with constant-time iteration steps independent of problem size. The company claims up to 100× speed gains over GPU solvers and reports partnering with Ansys for engineering integration.

Open Problems #

PINN training stability. Despite many improvements, PINN training remains fragile for stiff and multi-scale problems. A general theory of loss landscape conditioning and principled hyperparameter selection is lacking.

Neural operator generalisation theory. While FNO and DeepONet generalise empirically across PDE instances, rigorous approximation-theoretic guarantees relating operator-learning error to network width, depth, and training data remain incomplete.

Foundation model reliability and extrapolation. PDE foundation models show impressive zero-shot accuracy within their pre-training distribution, but their failure modes on out-of-distribution physics — and the extent to which physics-informed fine-tuning can compensate — are not yet well understood.

High-dimensional solvers beyond parabolic PDEs. The Deep BSDE method and the multilevel Picard (MLP) method primarily address semilinear parabolic PDEs. Extending their curse-of-dimensionality guarantees to elliptic, hyperbolic, or fully nonlinear PDEs remains largely open.

Structure-preserving deep learning. Integrating conservation laws and geometric structure (symplecticity, divergence-free constraints) into neural PDE solvers at scale — beyond the Geo-NeW approach for specific exterior calculus structures — is an active and unresolved challenge.

Quantum hardware advantage. Near-term quantum devices face noise and connectivity limitations that restrict their practical advantage over classical HPC for PDE solving. Demonstrating genuine quantum speedup for industrially relevant PDEs on real hardware remains an open goal.

References #

Brunton, S. L., Proctor, J. L., & Kutz, J. N. (2016). Discovering governing equations from data by sparse identification of nonlinear dynamical systems. PNAS, 113(15), 3932–3937.

ColibriTD. (2025, March). H-DES: First real-hardware PDE solver via variational quantum algorithm. The Quantum Insider. https://thequantuminsider.com/2025/03/25/colibritd-announces-h-des-pde-solver-as-a-step-toward-accessible-quantum-simulation-in-engineering/

E, W., & Yu, B. (2018). The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1), 1–12.

E, W., Han, J., & Jentzen, A. (2022). Algorithms for solving high dimensional PDEs: From nonlinear Monte Carlo to machine learning. Nonlinearity, 35(1), 278.

Han, J., Jentzen, A., & E, W. (2018). Solving high-dimensional partial differential equations using deep learning. PNAS, 115(34), 8505–8510. https://www.pnas.org/doi/10.1073/pnas.1718942115

Han, J. (2025). A brief review of the Deep BSDE method for solving high-dimensional partial differential equations. arXiv:2505.17032. https://arxiv.org/abs/2505.17032

Hu, J., Jin, S., Liu, N., & Zhang, L. (2024). Quantum circuits for partial differential equations via Schrödingerisation. Quantum, 8, 1563. https://quantum-journal.org/papers/q-2024-12-12-1563/

IEEE. (2025). A staged training approach for physics-informed neural networks in solving partial differential equations. https://ieeexplore.ieee.org/document/11172661/

IEEE. (2025). Higher-order-ReLU-KANs (HRKANs) for solving physics-informed neural networks more accurately, robustly and faster. https://ieeexplore.ieee.org/document/11105234/

IEEE. (2025). ReBA: A hybrid sparse reconfigurable butterfly accelerator for solving PDEs via hardware and algorithm co-design. https://ieeexplore.ieee.org/document/11044078/

IEEE. (2025). An optimized Fourier neural operator for the 2D fractional Poisson equation. https://ieeexplore.ieee.org/document/11405135/

Li, Z., et al. (2020). Fourier neural operator for parametric partial differential equations. arXiv:2010.08895.

LightSolver. (2025, September). LightSolver announces advance in physical modeling on the LPU. The Quantum Insider. https://thequantuminsider.com/2025/09/16/lightsolver-announces-advance-in-physical-modeling-on-the-lpu-and-new-roadmap-for-optical-analog-pde-solving/

Liu, Z., et al. (2024). KAN: Kolmogorov-Arnold Networks. arXiv:2404.19756. ICLR 2025. https://arxiv.org/abs/2404.19756

Lu, L., Jin, P., Pang, G., Zhang, Z., & Karniadakis, G. E. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3, 218–229.

Lu, L., et al. (2024). Learning nonlinear operators in latent spaces for real-time predictions of complex dynamics in physical systems. Nature Communications. https://www.nature.com/articles/s41467-024-49411-w

Herde, M., et al. (2024). Poseidon: Efficient foundation models for PDEs. arXiv:2405.19101. https://arxiv.org/html/2405.19101v2

Peng, W., et al. (2025). OmniArch: Building foundation model for scientific computing. ICML 2025. https://icml.cc/virtual/2025/poster/45099

Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707.

Shi, Z., et al. (2026). Physics-informed fine-tuning of foundation models for partial differential equations. arXiv:2603.15431. https://arxiv.org/html/2603.15431v1

Wang, S., et al. (2026). Geo-NeW: Structure-preserving learning improves geometry generalization in PDEs. arXiv:2602.02788. https://arxiv.org/abs/2602.02788

Wang, Z., et al. (2024). Kolmogorov–Arnold-Informed neural network: A physics-informed deep learning framework for solving forward and inverse problems. Computer Methods in Applied Mechanics and Engineering. https://linkinghub.elsevier.com/retrieve/pii/S0045782524007722

Xiao, P., et al. (2025). Quantum DeepONet: Neural operators accelerated by quantum computing. Quantum, 9, 1761. https://quantum-journal.org/papers/q-2025-06-04-1761/

Xie, Z., et al. (2025). Anant-Net: Breaking the curse of dimensionality with scalable and interpretable neural surrogates. arXiv:2505.03595. https://arxiv.org/html/2505.03595v3

Xie, Z., et al. (2025). A deep shotgun method for solving high-dimensional parabolic partial differential equations. Journal of Scientific Computing. https://link.springer.com/10.1007/s10915-025-02983-1

Xu, K., & Darve, E. (2025). Integration matters for learning PDEs with backwards SDEs. arXiv:2505.01078. https://arxiv.org/abs/2505.01078

Zeng, Q., et al. (2025). Automatic network structure discovery of physics informed neural networks via knowledge distillation. Nature Communications. https://www.nature.com/articles/s41467-025-64624-3

Zhang, Y., et al. (2024). PDEformer: Towards a foundation model for one-dimensional partial differential equations. arXiv:2402.12652. http://arxiv.org/pdf/2402.12652.pdf

Zhang, Y., et al. (2025). A multimodal PDE foundation model for prediction and scientific text descriptions. arXiv:2502.06026. https://arxiv.org/abs/2502.06026

Recent Advances in Steady States of Navier-Stokes Equations

Mon, 30 Mar 2026 00:00:00 +0000

The study of steady-state and self-similar solutions of the incompressible Navier-Stokes equations (NSE) has undergone remarkable progress in the 2020s. This post surveys landmark results from 2024–2026 touching on existence, uniqueness, classification, and stability of such solutions. The stationary (steady) NSE in $\mathbb{R}^3$ reads:

$$-\nu \Delta u + (u \cdot \nabla) u + \nabla p = 0, \quad \operatorname{div} u = 0.$$

A central object of the self-similar theory is the class of $(-1)$-homogeneous (scale-invariant) solutions: a function $u$ is $(-1)$-homogeneous if $u(\lambda x) = \lambda^{-1} u(x)$ for all $\lambda > 0$. Such fields serve as the initial data of forward self-similar solutions $u(x,t) = t^{-1/2} U(x/\sqrt{t})$ of the time-dependent NSE, and a steady $(-1)$-homogeneous solution is itself self-similar, with profile $U = u$.
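
As a quick check of the terminology, assume the standard Navier-Stokes scaling $u_\lambda(x,t) = \lambda u(\lambda x, \lambda^2 t)$, $p_\lambda(x,t) = \lambda^2 p(\lambda x, \lambda^2 t)$, and take the pressure to be $(-2)$-homogeneous. A time-independent $(-1)$-homogeneous velocity is then left unchanged by the rescaling:

$$u_\lambda(x) = \lambda\,u(\lambda x) = \lambda\cdot\lambda^{-1}u(x) = u(x), \qquad p_\lambda(x) = \lambda^2 p(\lambda x) = \lambda^2\cdot\lambda^{-2}p(x) = p(x).$$

In particular, such a field has the form $u(x) = |x|^{-1}\,u(x/|x|)$, so it is determined by its values on the unit sphere.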

Overview #

Five landmark results define the frontier of this area in 2024–2026:

Non-uniqueness of Leray–Hopf solutions via a computer-assisted proof in the self-similar framework (Hou, Wang, & Yang, 2025).
Forward self-similar solutions in 2D for arbitrarily large initial data (Albritton, Guillod, Korobkov, & Ren, 2026).
Existence of self-similar solutions in high dimensions ($4 \leq n \leq 16$) without smallness conditions (Bang, Gui, Liu, Wang, & Xie, 2025).
Sharp removable singularity results for $(-1)$-homogeneous solutions with singular rays (Li, Li, & Yan, 2024).
Steady NSE in junction domains with large, non-small fluxes (Gazzola, Korobkov, Ren, & Sperone, 2025).

Paper	Authors	Contribution
arXiv:2410.11170	Li, Li, Yan	Optimal removable singularity for $(-1)$-homogeneous solutions
arXiv:2412.07283	Bang, Gui, Liu, Wang, Xie	Self-similar solutions in 2D sector: existence/non-uniqueness
arXiv:2505.14642	Gazzola, Korobkov, Ren, Sperone	Steady NSE in junction channels, non-small fluxes
arXiv:2509.25116	Hou, Wang, Yang	Computer-assisted proof of non-uniqueness of Leray–Hopf solutions (unforced 3D NSE)
arXiv:2510.10488	Bang, Gui, Liu, Wang, Xie	$(-1)$-homogeneous solutions, dimensions $4 \leq n \leq 16$
arXiv:2601.03161	Albritton, Guillod, Korobkov, Ren	Forward self-similar solutions, 2D, large data
arXiv:2601.03833	Gui, Liu, Xie	Global existence of 2D forward self-similar solutions
arXiv:2602.19846	Fujii	Sharp uniqueness/non-uniqueness in critical Besov spaces

Background #

Landau Solutions and Šverák’s Classification #

In 1944, Landau discovered a three-parameter explicit family of $(-1)$-homogeneous axisymmetric no-swirl solutions of the 3D stationary NSE. Known as Landau solutions, they are parameterized by vectors $b \in \mathbb{R}^3$ and represent fluid jets emanating from the origin. A seminal result of Šverák (2006) established that all $(-1)$-homogeneous solutions smooth on $\mathbb{S}^2$ must be Landau solutions — the only scale-invariant flows without singularities on the sphere.

Forward Self-Similar Solutions #

A forward self-similar solution takes the form

$$u(x, t) = \frac{1}{\sqrt{t}} U\!\left(\frac{x}{\sqrt{t}}\right),$$

where the self-similar profile $U$ solves a stationary rescaled system (the Leray equations). The seminal work of Jia and Šverák (2014) showed that for any $(-1)$-homogeneous, divergence-free initial data smooth away from the origin, at least one global forward self-similar solution exists, with no smallness restriction on the data. Existence is proved via a Leray–Schauder argument rather than a fixed-point contraction.
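
The equation satisfied by the profile can be derived directly from the ansatz. As a sketch, write $y = x/\sqrt{t}$ and take the pressure in the form $p(x,t) = t^{-1}P(y)$ (the symbols $y$ and $P$ are notation introduced here). Each term of the time-dependent NSE then carries the common factor $t^{-3/2}$:

$$\partial_t u = -\tfrac{1}{2}\,t^{-3/2}\bigl(U + (y\cdot\nabla)U\bigr), \qquad (u\cdot\nabla)u = t^{-3/2}(U\cdot\nabla)U, \qquad \Delta u = t^{-3/2}\Delta U, \qquad \nabla p = t^{-3/2}\nabla P.$$

Cancelling $t^{-3/2}$ leaves a time-independent system for $U$:

$$-\nu\Delta U - \tfrac{1}{2}U - \tfrac{1}{2}(y\cdot\nabla)U + (U\cdot\nabla)U + \nabla P = 0, \qquad \operatorname{div} U = 0.$$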

Discretely self-similar (DSS) solutions, where $u(\lambda x, \lambda^2 t) = \lambda^{-1} u(x,t)$ for a specific $\lambda > 1$, were constructed for large data by Tsai (2014).

Classification of $(-1)$-Homogeneous Solutions #

Tian and Xin (1998) proved that all $(-1)$-homogeneous axisymmetric solutions whose only singularity is at the origin must be Landau solutions. A key series of papers by Li, Li, and Yan (2016–2023) classified all $(-1)$-homogeneous axisymmetric no-swirl solutions with singularities at both the north and south poles of $\mathbb{S}^2$, parameterizing them as a four-dimensional surface with boundary. They also constructed the first non-axisymmetric $(-1)$-homogeneous solutions with swirl using the Weierstrass representation of minimal surfaces.

Recent Developments #

1. Removable Singularity Theorem (Li, Li, & Yan, 2024) #

One of the sharpest results of 2024 is the removable singularity theorem proved by Li, Li, and Yan (arXiv:2410.11170, to appear in Trans. Amer. Math. Soc.): any local $(-1)$-homogeneous solution $u$ near a potential singular ray through $P \in \mathbb{S}^2$ extends smoothly across $P$, provided $u = o(\ln \operatorname{dist}(x, P))$ on $\mathbb{S}^2$.

The result is sharp: for any $\alpha > 0$, there exist local solutions where $|u(x)| / \ln |x'| \to -\alpha$ as $x \to P$, showing that logarithmic growth exactly prevents smooth extension. The paper also establishes existence of solutions with any finite number of singularities located arbitrarily on $\mathbb{S}^2$. A companion survey by Li and Yan (arXiv:2509.07243, Sep 2025) provides a state-of-the-art exposition of this topic.

2. Self-Similar Solutions in High Dimensions (Bang et al., 2025) #

Bang, Gui, Liu, Wang, and Xie (arXiv:2510.10488, Oct 2025) proved existence of $(-1)$-homogeneous solutions to the steady NSE in high spatial dimensions:

For any $(-3)$-homogeneous, locally Lipschitz external force on $\mathbb{R}^n \setminus \{0\}$ with $4 \leq n \leq 16$, the steady NSE admit at least one $(-1)$-homogeneous (scale-invariant) solution that is regular away from the origin.

Global uniqueness holds when the external force is small. The key novelty is a dimension-reduction effect from self-similarity: integral estimates of the positive part of the total head pressure enable energy estimates even in the supercritical dimension regime. For forces with only a nonnegative radial component, existence extends to all $n \geq 4$.
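
The degree of homogeneity required of the force follows from counting degrees term by term. The forced steady system $-\nu\Delta u + (u\cdot\nabla)u + \nabla p = f$ is assumed here as the natural forced version of the stationary equation. If $u$ is $(-1)$-homogeneous and $p$ is $(-2)$-homogeneous, each derivative lowers the degree by one, so

$$(\Delta u)(\lambda x) = \lambda^{-3}(\Delta u)(x), \qquad \bigl((u\cdot\nabla)u\bigr)(\lambda x) = \lambda^{-1}\lambda^{-2}\bigl((u\cdot\nabla)u\bigr)(x), \qquad (\nabla p)(\lambda x) = \lambda^{-3}(\nabla p)(x).$$

Every term on the left-hand side is $(-3)$-homogeneous, which is why $f$ must be as well.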

The same group (arXiv:2412.07283, Dec 2024) also established existence, uniqueness, and non-uniqueness of self-similar solutions to the steady NSE in 2D sectors with no-slip boundary conditions, providing rigorous corrections to classical Rosenhead (1940) calculations.

3. Forward Self-Similar Solutions in 2D for Large Data (2026) #

Two independent papers in January 2026 addressed the 2D problem, where classical local energy estimates break down because $(-1)$-homogeneous initial velocity is not locally square-integrable in 2D (and the associated $(-2)$-homogeneous vorticity is not locally integrable):

Gui, Liu, and Xie (arXiv:2601.03833) established global existence of forward self-similar solutions for any divergence-free, $(-1)$-homogeneous, locally Hölder continuous initial velocity, with no smallness assumption.
Albritton, Guillod, Korobkov, and Ren (arXiv:2601.03161) independently constructed such solutions from arbitrarily large initial data and provided numerical evidence for non-uniqueness, the first such evidence for the 2D self-similar problem.
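
The local integrability obstruction in 2D can be seen by a direct computation in polar coordinates. Writing a $(-1)$-homogeneous velocity as $u_0(x) = a(\theta)/r$, its vorticity is $(-2)$-homogeneous, $\omega_0(x) = b(\theta)/r^2$ (the symbols $a$ and $b$ are introduced here for illustration), and

$$\int_{|x|<1} |u_0|^2\,dx = \int_0^{2\pi}|a(\theta)|^2\,d\theta \int_0^1 \frac{dr}{r} = \infty, \qquad \int_{|x|<1} |\omega_0|\,dx = \int_0^{2\pi}|b(\theta)|\,d\theta \int_0^1 \frac{dr}{r} = \infty,$$

unless $a$, respectively $b$, vanishes identically. In 3D the volume element contributes $r^2\,dr$ instead of $r\,dr$, so the corresponding energy integral reduces to $\int_0^1 dr$ and is finite, which is why local energy methods are available there.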

4. Non-Uniqueness of Leray–Hopf Solutions (Hou, Wang, & Yang, 2025) #

The most dramatic recent development is the first rigorous computer-assisted proof of non-uniqueness of Leray–Hopf solutions to the unforced 3D NSE by Hou, Wang, and Yang (arXiv:2509.25116, Sep 2025, revised Mar 2026):

There exist infinitely many distinct suitable Leray–Hopf solutions to the 3D NSE on $\mathbb{R}^3 \times [0,1]$ with the same compactly supported, divergence-free initial condition $u_{in} \in L^q$ for any $q < 3$.

The proof executes the Jia–Šverák program (Jia & Šverák, 2015), which requires finding a large forward self-similar background flow whose linearized operator has an unstable eigenvalue (positive real part), then bifurcating to produce infinitely many Leray–Hopf solutions. The key steps are:

A finite-element + spectral-basis numerical method computes a highly precise candidate profile $\tilde{U}$.
The linearized operator $L_{\tilde{U}}$ is decomposed into a coercive part plus a finite-rank perturbation, whose invertibility is certified by computer-assisted interval arithmetic.
This certifies an unstable eigenpair $(\tilde{v}, \tilde{\lambda})$ with $\operatorname{Re}(\tilde{\lambda}) > 0$, yielding the second (and infinitely many) solutions via Riesz projection and Duhamel analysis.
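
The invertibility step can be illustrated with a toy finite-dimensional analogue. The sketch below is not the authors' method: it replaces the infinite-dimensional operator by a hypothetical $2 \times 2$ matrix (identity plus a rank-one perturbation) and replaces interval arithmetic by exact rational arithmetic. It shows only the underlying principle, namely that an approximate inverse $B$ with $\|I - BA\| < 1$ certifies that $A$ is invertible. All names and numbers are assumptions.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Exact product of two square matrices with Fraction entries."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def inf_norm(a):
    """Exact induced infinity norm (maximum absolute row sum)."""
    return max(sum(abs(x) for x in row) for row in a)


def certify_invertible(a, approx_inverse):
    """Return (certified, defect) where defect = ||I - B A||_inf, computed exactly.

    If defect < 1, the Neumann series shows B A is invertible, hence A is
    injective, hence (being a square matrix) A is invertible.
    """
    n = len(a)
    ba = mat_mul(approx_inverse, a)
    residual = [
        [Fraction(int(i == j)) - ba[i][j] for j in range(n)] for i in range(n)
    ]
    defect = inf_norm(residual)
    return defect < 1, defect


# Hypothetical toy operator: identity plus a small rank-one perturbation.
A = [[Fraction(11, 10), Fraction(1, 5)],
     [Fraction(1, 10), Fraction(6, 5)]]
# Hypothetical approximate inverse, e.g. rounded output of a floating-point solver.
B = [[Fraction(92, 100), Fraction(-15, 100)],
     [Fraction(-8, 100), Fraction(85, 100)]]

certified, defect = certify_invertible(A, B)
```

The design point is that the floating-point solver only has to produce a good guess for $B$; rigour comes entirely from the final exact (or interval) evaluation of the defect.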

These solutions just miss the Prodi–Serrin condition that guarantees uniqueness. Guillod and Šverák (2017) had provided strong numerical evidence that such unstable profiles exist, but the rigorous proof remained elusive until Hou et al.

5. Sharp Non-Uniqueness for Weak Solutions via Convex Integration (2022–2026) #

A parallel program uses convex integration to prove non-uniqueness of weak solutions. Cheskidov and Luo (Invent. Math., 2022) proved sharp non-uniqueness in $L^p_t L^\infty$ for any $p < 2$ in the periodic setting. Miao, Nie, and Ye (arXiv:2412.09637, Dec 2024) extended this to $\mathbb{R}^3$. Fujii (arXiv:2602.19846, Feb 2026) completed a sharp classification in critical Besov spaces $C([0,T); \dot{B}^{n/p-1}_{p,q}(\mathbb{R}^n))$, finding that large-time asymptotics of non-unique solutions are governed by non-trivial stationary flows — a first in the critical regularity setting.
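
The index $n/p - 1$ is exactly the one that makes the space scale-invariant. Assuming the standard scaling of homogeneous Besov norms, $\|f(\lambda\,\cdot)\|_{\dot{B}^{s}_{p,q}(\mathbb{R}^n)} \approx \lambda^{s - n/p}\|f\|_{\dot{B}^{s}_{p,q}(\mathbb{R}^n)}$ (with equality for dyadic $\lambda$ under the usual Littlewood–Paley definition), the rescaled velocity $u_\lambda = \lambda u(\lambda\,\cdot)$ satisfies

$$\|u_\lambda\|_{\dot{B}^{s}_{p,q}} \approx \lambda^{1 + s - n/p}\,\|u\|_{\dot{B}^{s}_{p,q}},$$

so the norm is invariant precisely when $s = n/p - 1$.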

Result	Authors	Year	Setting	Self-similar?
Non-uniqueness, $L^p_t L^\infty$, torus	Cheskidov & Luo	2022	3D periodic	No
Non-uniqueness, $L^p_t L^\infty$, $\mathbb{R}^3$	Miao, Nie & Ye	2024	3D whole space	No
Non-uniqueness of Leray–Hopf, 3D	Hou, Wang & Yang	2025	3D whole space	Yes
Forward self-similar, 2D, large data	Albritton et al.	2026	2D whole space	Yes
Steady NSE in 2D sector	Bang et al.	2024	2D sector	Yes

6. Liouville Theorems and Stability of Landau Solutions #

Tan (arXiv:2501.03609, Jan 2025) proved new Liouville theorems for the stationary NSE (including the fractional case) under growth conditions in Lebesgue spaces. Ding and Tan (arXiv:2501.03615, Jan 2025) proved a Liouville theorem for the stationary inhomogeneous NSE via frequency localization of the Dirichlet energy near the origin.

The asymptotic stability of small Landau solutions, previously known in $L^3$, was sharpened by Bradshaw and Wang (arXiv:2409.12918, Sep 2024): asymptotic stability holds in the Lorentz spaces $L^{3,q}$ for $q < \infty$, but fails in $L^{3,\infty}$ (weak-$L^3$), marking the precise boundary of stability.
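
The role of weak-$L^3$ can be illustrated with the model function $f(x) = |x|^{-1}$ on $\mathbb{R}^3$, which has the same homogeneity as a Landau solution (the function $f$ is used here purely as an illustration). Its distribution function is

$$\bigl|\{x : |x|^{-1} > s\}\bigr| = \bigl|\{|x| < 1/s\}\bigr| = \frac{4\pi}{3}\,s^{-3},$$

so $\sup_{s>0} s\,\bigl|\{f > s\}\bigr|^{1/3} = (4\pi/3)^{1/3} < \infty$ and $f \in L^{3,\infty}$, whereas

$$\int_{\mathbb{R}^3}|x|^{-3}\,dx = 4\pi\int_0^\infty \frac{dr}{r} = \infty,$$

so $f \notin L^3$. Scale-invariant profiles therefore live naturally in $L^{3,\infty}$ rather than in the smaller spaces $L^{3,q}$ with $q < \infty$.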

7. Steady NSE in Bounded and Unbounded Domains #

A major reference work by Korobkov, Pileckas, and Russo (Springer/Birkhäuser, March 2024) provides the first comprehensive book treatment of Leray’s problem: existence of a solution in bounded domains under only the condition of zero total flux — without smallness on the boundary data.

Gazzola, Korobkov, Ren, and Sperone (arXiv:2505.14642, May 2025) studied steady NSE in a junction of unbounded channels with sources and sinks, under inhomogeneous Dirichlet boundary conditions and without smallness of fluxes. They prove existence of a solution with uniformly bounded Dirichlet integral in every compact subset via Leray’s reductio ad absurdum argument using Morse–Sard-type theorems in Sobolev spaces.

Open Problems #

Several central questions remain unresolved or only partially answered:

The Clay Millennium Prize Problem. Whether 3D NSE solutions from smooth initial data can blow up in finite time is not resolved. The Hou et al. non-uniqueness result concerns Leray–Hopf solutions from singular $L^q$ ($q < 3$) initial data, not smooth data.

Complete classification of $(-1)$-homogeneous solutions in 3D. The axisymmetric no-swirl case is fully classified, and swirl solutions are well-studied, but a complete classification for all $(-1)$-homogeneous solutions with arbitrarily many singular rays and all possible swirl configurations is not yet achieved.

Rigorous non-uniqueness of forward self-similar solutions in 3D. The Jia–Šverák program produced numerical evidence (Guillod & Šverák, 2017), but a fully rigorous, non-computer-assisted proof of non-uniqueness for the forward (not backward) self-similar 3D problem remains open.

Asymptotic stability of large Landau solutions. While small Landau solutions are asymptotically stable in $L^3$, stability for large-parameter Landau solutions is not fully understood.

The Leray problem in non-axisymmetric 3D exterior domains without flux restrictions. The axisymmetric case was solved by Korobkov, Pileckas, and Russo, but the general 3D exterior domain problem under large flux remains open.

References #

Albritton, D., Guillod, J., Korobkov, M., & Ren, X. (2026). Forward self-similar solutions to the 2D Navier-Stokes equations from large data. arXiv:2601.03161. https://arxiv.org/abs/2601.03161

Bang, J., Gui, C., Liu, Y., Wang, C., & Xie, C. (2024). Self-similar solutions to the steady Navier-Stokes equations in 2D sectors. arXiv:2412.07283. https://arxiv.org/abs/2412.07283

Bang, J., Gui, C., Liu, Y., Wang, C., & Xie, C. (2025). On the existence of self-similar solutions to the steady Navier-Stokes equations in high dimensions. arXiv:2510.10488. https://arxiv.org/abs/2510.10488

Bradshaw, Z., & Wang, X. (2024). Asymptotic stability of Landau solutions in Lorentz spaces. arXiv:2409.12918. https://arxiv.org/pdf/2409.12918.pdf

Cheskidov, A., & Luo, X. (2022). Sharp nonuniqueness for the Navier-Stokes equations. Inventiones Mathematicae. arXiv:2009.06596. https://arxiv.org/abs/2009.06596

Ding, M., & Tan, W. (2025). Liouville-type theorem for the stationary inhomogeneous Navier-Stokes equations. arXiv:2501.03615. https://arxiv.org/abs/2501.03615

Fujii, M. (2026). Sharp non-uniqueness for the Navier-Stokes equations in critical Besov spaces. arXiv:2602.19846. https://arxiv.org/html/2602.19846v1

Gazzola, F., Korobkov, M., Ren, X., & Sperone, G. (2025). The steady Navier-Stokes equations in a system of unbounded channels with sources and sinks. arXiv:2505.14642. https://arxiv.org/abs/2505.14642

Gui, C., Liu, Y., & Xie, C. (2026). On the forward self-similar solutions to the two-dimensional Navier-Stokes equations. arXiv:2601.03833. https://arxiv.org/html/2601.03833v2

Hou, T., Wang, Y., & Yang, C. (2025). Nonuniqueness of Leray-Hopf solutions to the unforced incompressible 3D Navier-Stokes equations. arXiv:2509.25116. https://arxiv.org/abs/2509.25116

Jia, H., & Šverák, V. (2015). Are the incompressible 3d Navier–Stokes equations locally ill-posed in the natural energy space? Journal of Functional Analysis, 268(12), 3734–3766. https://www.sciencedirect.com/science/article/pii/S002212361500138X

Korobkov, M., Pileckas, K., & Russo, R. (2024). The Steady Navier-Stokes System: Basics of the Theory and the Leray Problem. Springer/Birkhäuser. https://books.google.com/books/about/The_Steady_Navier_Stokes_System.html?id=GOf8EAAAQBAJ

Korobkov, M., & Ren, X. (2024). On basic velocity estimates for the plane steady-state Navier-Stokes equations in convex domains. arXiv:2405.17884. https://arxiv.org/abs/2405.17884

Li, L., Li, Y., & Yan, X. (2024). Removable singularity of $(-1)$-homogeneous solutions of stationary Navier-Stokes equations. Transactions of the American Mathematical Society. arXiv:2410.11170. https://arxiv.org/abs/2410.11170

Li, Y., & Yan, X. (2025). Recent research on $(-1)$-homogeneous solutions of stationary Navier-Stokes equations. arXiv:2509.07243. https://arxiv.org/abs/2509.07243

Miao, C., Nie, Y., & Ye, W. (2024). Sharp non-uniqueness for the Navier-Stokes equations in the whole space. arXiv:2412.09637. https://arxiv.org/abs/2412.09637

Tan, W. (2025). New Liouville type theorems for the stationary Navier-Stokes equations. arXiv:2501.03609. https://arxiv.org/pdf/2501.03609.pdf

Tsai, T.-P. (2014). Forward discretely self-similar solutions of the Navier-Stokes equations. arXiv:1210.2783. https://arxiv.org/abs/1210.2783

Recent Research Directions in Analysis of PDEs 2021–2026

Mon, 30 Mar 2026 00:00:00 +0000

The arXiv section of Analysis of Partial Differential Equations is one of the most prolific areas of pure mathematics, producing on the order of 400 preprints per month as of early 2026. The period 2021–2026 has witnessed landmark breakthroughs, including a computer-assisted proof of finite-time singularity in the 3D Euler equations in the presence of a boundary, major progress on Hilbert’s Sixth Problem via kinetic theory, and the emergence of probabilistic and nonlocal operator methods as dominant paradigms. This survey identifies, categorises, and profiles the key research directions and landmark papers in math.AP during this era.

Overview #

The landscape of math.AP in 2021–2026 organises into several major research directions:

Direction	Landmark Papers	Landmark Results
Fluid singularity (Euler)	Chen & Hou (2022–2023)	Finite-time blowup for 3D Euler/2D Boussinesq, smooth data (PNAS 2025)
NS non-uniqueness	Albritton, Brué & Colombo (2021)	Non-unique Leray–Hopf solutions for forced NS
Hilbert’s 6th Problem	Deng, Hani & Ma (2024–2025)	Long-time Boltzmann derivation; fluid equations from Newton’s laws
Wave kinetic equation	Deng & Hani (2021)	Rigorous WKE derivation from cubic NLS
Mixed local-nonlocal operators	Biagi, Dipierro, Valdinoci et al. (2020–2022)	Regularity, max. principles, Faber-Krahn inequalities
Double phase functionals	De Filippis & Mingione (2022–2023)	Gradient regularity in mixed/double phase settings
Normalized Schrödinger	Wei & Wu (2021); Jeanjean & Le (2020)	Critical mass constraints, ground states, NLS
MFG inverse problems	Imanuvilov, Liu & Yamamoto (2023)	Lipschitz stability, Carleman estimates for MFG
Keller-Segel chemotaxis	Li & Winkler (2022); Lyu & Wang (2021)	Signal-dependent motility, global regularity
Stefan/free boundary	Ferrari et al. (2024); Arya, Jeon & Julin (2026)	$C^{1,\alpha}$ regularity, supercooled Stefan
Stochastic PDEs	Bailleul & Bruned (2021); Bailleul & Hoshino (2025)	Renormalisation, regularity structures
Calderón inverse problem	Cârstea, Uhlmann et al. (2021); Krupchyk (2025)	Nonlinear and fractional settings
Dispersive PDEs	Deng, Nahmod & Yue (2020); Gubinelli et al. (2025)	Random tensors, modulated dispersive equations

Background #

The math.AP Landscape #

Analysis of PDEs is the mathematical study of equations involving unknown functions and their partial derivatives, arising in physics, geometry, probability, and engineering. The arXiv math.AP category encompasses everything from regularity theory for elliptic and parabolic equations to global well-posedness for dispersive equations, from geometric flows to inverse problems, and from kinetic theory to stochastic PDEs. With roughly 300–400 papers per month (408 in February 2026 alone), it is one of the most active and interconnected areas of pure mathematics.

The period 2021–2026 is characterised by three broad trends. First, grand-challenge resolutions: several longstanding open problems, including Hilbert’s Sixth Problem (for rarefied hard-sphere gases) and the existence of finite-time singularities for 3D Euler equations with smooth data (in the presence of a boundary), were settled using novel combinations of rigorous analysis, Feynman-diagram combinatorics, and computer-assisted numerics. Second, new paradigm emergence: mixed local-nonlocal operators, double phase functionals, and normalised solutions have matured from isolated curiosities into systematic research programmes with their own regularity theories. Third, interdisciplinary expansion: MFG systems, optimal transport, SPDEs, and AI-assisted methods have become structural parts of the math.AP ecosystem.

Recent Developments #

1. Mathematical Fluid Dynamics: Singularity, Non-Uniqueness, and Stability #

Finite-Time Blowup of the 3D Euler Equations #

The question of whether the 3D incompressible Euler equations

$$\partial_t u + (u \cdot \nabla) u + \nabla p = 0, \qquad \operatorname{div} u = 0,$$

can develop a singularity from smooth initial data — open since Euler introduced the equations in 1757 — saw a decisive resolution in a bounded-domain setting through a landmark two-part series by Jiajie Chen and Thomas Y. Hou (arXiv:2210.07191, arXiv:2305.05660, PNAS 2025). Their work proves finite-time, nearly self-similar blowup of both the 2D Boussinesq and 3D axisymmetric Euler equations with smooth initial data and finite energy in the presence of a solid boundary. The proof employs weighted $L^\infty$ and $C^{1/2}$ norms, sharp functional inequalities inspired by optimal transport, and computer-assisted rigorous numerics to verify nonlinear stability constants. The result was praised as one of the most significant advances in mathematical fluid mechanics in decades.

Prior to Chen–Hou, Tarek Elgindi (2021) showed finite-time singularity for the 3D axisymmetric Euler equations without swirl from $C^{1,\alpha}$ initial velocity. Chen and Hou’s 2021 paper on the Hou-Luo (HL) model proved asymptotically self-similar blowup from smooth data. Concurrently, Hou and collaborators presented numerical evidence for singularity in 3D Navier-Stokes achieving a $10^7$-fold increase in maximum vorticity, and DeepMind (2025) used AI-assisted methods to discover families of unstable singularities in the incompressible porous media and Boussinesq equations.

Non-Uniqueness of Leray–Hopf Solutions for Navier-Stokes #

A 2021 breakthrough by Dallas Albritton, Elia Brué, and Maria Colombo proved non-uniqueness of Leray–Hopf solutions to the forced 3D Navier-Stokes equations: they exhibited two distinct Leray solutions with zero initial velocity and identical body force, exploiting the extreme instability of a self-similar background solution. Recognised as the most influential 2021 math.AP paper on arXiv by Paper Digest, the result was subsequently extended to bounded domains via gluing methods (arXiv:2209.03530) and to stochastic settings (Electronic Journal of Probability, 2024).

Stability of Shear Flows and Kinetic Theory #

Parallel to the singularity programme, sharp asymptotic stability results for 2D monotone shear flows with no-slip boundary conditions, and extensive work on inviscid damping and enhanced dissipation near shear flows, have appeared throughout 2025–2026.

Arguably the most monumental result in kinetic PDE theory during this period is due to Yu Deng, Zaher Hani, and Xiao Ma, who provided a rigorous long-time derivation of the Boltzmann equation from hard-sphere dynamics (arXiv:2408.07818, 2024), extending Lanford’s 1975 short-time theorem to all times within the lifespan of the Boltzmann solution. In a companion paper (arXiv:2503.01800, 2025), they completed the derivation of the compressible Euler and incompressible Navier-Stokes-Fourier equations from Newton’s laws, effectively resolving Hilbert’s Sixth Problem for rarefied hard-sphere gases. The proof uses cumulant ansätze, Feynman-diagram combinatorics, and a molecule-reduction algorithm. This followed Deng and Hani’s 2021 derivation of the wave kinetic equation from the cubic NLS.

2. Nonlocal and Fractional PDEs: Mixed Local-Nonlocal Operators #

One of the dominant new paradigms of the 2020s is the study of operators of the form

$$\mathcal{L} u = -\Delta u + (-\Delta)^s u, \quad s \in (0,1),$$

which superpose a classical Laplacian with a fractional (nonlocal) Laplacian. These arise naturally in models combining Brownian and Lévy diffusion processes. The foundational paper by Biagi, Dipierro, Valdinoci, and Vecchi (2020/2021) initiated a systematic theory of regularity and maximum principles for such operators.
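
In Fourier variables the mixed operator acts as multiplication by the symbol $|\xi|^2 + |\xi|^{2s}$. The sketch below applies it on a periodic 1D grid, an assumption made purely for illustration (the cited works pose the problem on $\mathbb{R}^N$ or on domains); the function names and the naive transform are hypothetical helpers, not from any cited paper.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform (O(n^2), fine for small n)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * j / n) for j, v in enumerate(values))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Naive inverse discrete Fourier transform."""
    n = len(coeffs)
    return [
        sum(c * cmath.exp(2j * math.pi * k * j / n) for k, c in enumerate(coeffs)) / n
        for j in range(n)
    ]


def mixed_operator_periodic(samples, s):
    """Apply -Laplacian + (-Laplacian)^s to samples on a uniform grid of [0, 2*pi).

    Each Fourier mode exp(i*k*x) is multiplied by the symbol |k|^2 + |k|^(2s).
    """
    n = len(samples)
    coeffs = dft(samples)
    scaled = []
    for idx, c in enumerate(coeffs):
        k = idx if idx <= n // 2 else idx - n  # signed integer frequency
        symbol = abs(k) ** 2 + abs(k) ** (2 * s)
        scaled.append(symbol * c)
    return [z.real for z in inverse_dft(scaled)]


n, s = 32, 0.5
grid = [2 * math.pi * j / n for j in range(n)]
u = [math.sin(3 * x) for x in grid]
Lu = mixed_operator_periodic(u, s)  # close to (3**2 + 3**(2*s)) * sin(3x) = 12 * sin(3x)
```

The symbol makes the two regimes visible: for $|k| \gg 1$ the local term $|k|^2$ dominates, while for $|k| < 1$ (relevant on the whole space) the nonlocal term $|k|^{2s}$ is the larger one.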

Between 2021 and 2026, an explosion of activity produced gradient regularity for mixed local-nonlocal problems (De Filippis & Mingione, 2022, showing that minimisers of mixed functionals are locally $C^{1,\beta}$-regular); Hölder regularity for mixed local-nonlocal degenerate elliptic equations (Garain & Lindgren, 2022); the Wiener criterion for nonlocal Dirichlet problems (Kim, Lee & Lee, 2022); and a Faber-Krahn inequality for mixed operators (Biagi, Dipierro, Valdinoci & Vecchi, 2021). Serena Dipierro and Enrico Valdinoci were among the most prolific contributors, publishing on nonlocal logistic equations with Neumann conditions, ecological niches for mixed dispersal, and Sobolev inequalities for mixed operators.

Giovanni Leoni’s 2023 treatise A First Course in Fractional Sobolev Spaces provided a self-contained reference covering definitions, embeddings, Hardy inequalities, and interpolation inequalities, and ranked among the most-cited arXiv math.AP papers of 2023. Concurrently, a 2025 paper established well-posedness and regularity theory for time-fractional stochastic PDEs involving Caputo derivatives and general nonlocal operators driven by Gaussian and Lévy noise (arXiv:2512.03754).

3. Double Phase Operators and Nonstandard Growth #

The double phase functional

$$\mathcal{H}(u) := \int_\Omega \bigl(|Du|^p + a(x)|Du|^q\bigr)\,dx, \quad q > p > 1,\ a(x) \geq 0,$$

introduced by Colombo and Mingione, generated a remarkable surge of activity throughout 2021–2026.
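
To see how the two phases enter, the following minimal sketch evaluates a one-dimensional version of the functional on $\Omega = (0,1)$ by the midpoint rule. The function name, the choice of $u$, and the coefficient $a$ are hypothetical and chosen only so that the integral can be checked by hand.

```python
def double_phase_energy(du, a, p, q, num_cells=1000):
    """Midpoint-rule approximation of the integral over (0, 1) of
    |u'(x)|^p + a(x) * |u'(x)|^q, given callables du (for u') and a."""
    if not (q > p > 1):
        raise ValueError("expected q > p > 1")
    h = 1.0 / num_cells
    total = 0.0
    for i in range(num_cells):
        x = (i + 0.5) * h
        slope = abs(du(x))
        weight = a(x)
        if weight < 0:
            raise ValueError("the coefficient a(x) must be nonnegative")
        total += (slope ** p + weight * slope ** q) * h
    return total


# Hypothetical data: u(x) = x, so u' = 1, and a(x) = max(x - 1/2, 0),
# which vanishes on the "p-phase" region x <= 1/2.
energy = double_phase_energy(du=lambda x: 1.0, a=lambda x: max(x - 0.5, 0.0), p=2, q=4)
```

For this data the exact value is $1 + \int_{1/2}^{1}(x - \tfrac12)\,dx = 1.125$. Where $a$ vanishes the integrand has $p$-growth only, and where $a > 0$ the $q$-term takes over for large gradients, which is the source of the nonuniform ellipticity.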

Year	Paper	Authors	Key Contribution
2021	A new class of double phase variable exponent problems	Crespo-Blanco, Gasiński, Harjulehto, Winkert	Existence/uniqueness for new double phase with variable exponents
2021	Double phase implicit obstacle problems	Zeng, Rădulescu, Winkert	Mixed BVPs with convection and multivalued conditions
2022	Nonuniformly elliptic Schauder theory	De Filippis, Mingione	Schauder estimates in nonuniform elliptic settings
2022	New embedding results for double phase problems	Ho, Winkert	Musielak-Orlicz Sobolev spaces with variable exponent
2023	Regularity at nearly linear growth	De Filippis, Mingione	Hölder gradient regularity for log-type functionals
2025	Partial regularity for parabolic double phase systems	Ok, Scilla, Stroffolini	Partial Hölder regularity for parabolic systems

The work of Cristiana De Filippis and Giuseppe Mingione is particularly prominent throughout, providing a comprehensive regularity theory for double phase and nonuniformly elliptic functionals (arXiv:2308.10222).

4. Normalized Solutions and Variational Methods for Schrödinger Equations #

The problem of finding solutions $u \in H^1(\mathbb{R}^N)$ with prescribed $L^2$-norm — the mass constraint

$$\int_{\mathbb{R}^N} |u|^2\,dx = c$$

— has become a central theme in the study of nonlinear Schrödinger equations. The influential papers by Louis Jeanjean and Thanh Trung Le on multiple normalized solutions for Sobolev critical equations (2020–2021) and by Juncheng Wei and Yuanze Wu on normalized solutions with critical Sobolev exponent and mixed nonlinearities (2021) launched a wave of activity. Key directions include: normalized ground states for NLS with potential (Bartsch, Molle, Rizzi & Verzini); normalized solutions for Schrödinger-Poisson-Slater equations; and standing waves and stability for Choquard equations. The March 2026 arXiv listings confirm that sharp exponents, existence and asymptotics for Choquard equations, and boosted ground states for pseudo-relativistic Schrödinger equations remain highly active.
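
A standard tool behind many results of this type is the mass-preserving dilation. As a sketch, assume a pure power nonlinearity with energy $E(u) = \tfrac12\|\nabla u\|_2^2 - \tfrac1r\|u\|_r^r$ (the exponent $r$ and the parameter $\tau$ are notation introduced here), and set $u_\tau(x) = \tau^{N/2}u(\tau x)$ for $\tau > 0$. Then

$$\int_{\mathbb{R}^N}|u_\tau|^2\,dx = \int_{\mathbb{R}^N}|u|^2\,dx = c, \qquad E(u_\tau) = \frac{\tau^2}{2}\|\nabla u\|_2^2 - \frac{\tau^{N(r-2)/2}}{r}\|u\|_r^r.$$

The two powers of $\tau$ balance when $N(r-2)/2 = 2$, that is $r = 2 + 4/N$, the mass-critical exponent. Above it the energy is unbounded from below on the constraint, which is what makes the mass-supercritical and Sobolev-critical cases delicate.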

Parallel work on eigenvalue problems addresses Steklov eigenvalues (monotonicity for regular $N$-gons, sharp geometric bounds), eigenvalues of Pucci’s extremal operator in 3D, and biharmonic Steklov problems on thin sets.

5. Mean Field Games and Aggregation-Diffusion PDEs #

Mean field game theory generated a prolific suite of PDE questions between 2021 and 2026. Highlights include: Imanuvilov, Liu, and Yamamoto (2023) proving Lipschitz stability for determining states and inverse sources in MFG equations using Carleman estimates; Klibanov, Li, and Liu (2023) on Hölder stability via Carleman estimates; the inverse boundary problem for first-order master equations (Liu & Zhang, 2022); and Bresch, Jabin, and Soler (2022) introducing a novel probabilistic derivation of the mean-field limit applicable to Vlasov-Poisson-Fokker-Planck in 2D. By 2025–2026, nonlocal MFG models with spatial interactions and new work on Wasserstein gradient flows of kernel mean discrepancies with connections to machine learning appeared on arXiv (arXiv:2506.01200).

Optimal transport has deeply influenced aggregation-diffusion equations and gradient flows. The March 2026 arXiv listings include a major 73-page paper by Carrillo, Gwiazda, and Skrzeczkowski presenting a new formula for the Wasserstein distance between solutions to nonlinear continuity equations.

6. Chemotaxis and Reaction-Diffusion Systems #

Chemotaxis systems — in particular Keller-Segel models with signal-dependent motility (density-suppressed diffusion) — generated intense activity. Key papers include logistic damping effects and global classical solutions for reaction-diffusion systems with density-suppressed motility (Lyu & Wang, 2021), refined regularity analysis for Keller-Segel-consumption systems (Li & Winkler, 2022), and global existence with uniform boundedness under signal-dependent motility (Jiang & Laurençot, 2021). In 2024, a construction of smooth finite-time blowup solutions for the 3D Keller-Segel-Navier-Stokes (chemotaxis-fluid) system with buoyancy appeared, using a quantitative method that directly constructs the singular solution (arXiv:2404.17228).

In parallel, free boundary reaction-diffusion models for species spreading and SIS epidemic models — including 2026 work on asymmetric kernels in advective periodic environments — continue to produce threshold and long-time dynamics results.

7. Free Boundary Problems #

The Stefan problem (modelling solidification and melting) remained highly active throughout 2021–2026. Key results include $C^{1,\alpha}$ regularity of flat free boundaries for the inhomogeneous one-phase Stefan problem (Ferrari, Forcillo, Giovagnoli & Jesus, 2024; arXiv:2404.07535); regularity of the free boundary for the supercooled Stefan problem in arbitrary dimensions (2025; arXiv:2512.10136), where the free boundary decomposes into regular, singular, and jump parts with the singular part having controlled parabolic dimension; and well-posedness and regularity of physical solutions for the supercooled Stefan problem assuming only integrable initial temperature, with explicit classification of free boundary points (2025; arXiv:2506.18741). These results use obstacle problem techniques, non-degeneracy estimates, and sharp free boundary classification arguments.

Shape optimisation for principal eigenvalues of Pucci operators and $\Gamma$-convergence of convolution-type functionals for free discontinuity problems are active related directions in 2026.

8. Stochastic PDEs and Regularity Structures #

Martin Hairer’s theory of regularity structures generated deep ongoing activity. The period 2021–2026 saw Bailleul and Bruned (2021) extending the algebraic renormalisation framework of regularity structures to a broader class of singular SPDEs (arXiv:2101.11949); the publication of “A tourist’s guide to regularity structures” by Bailleul and Hoshino (2025/2026) in EMS Surveys as an essentially self-contained treatment; applications to stochastic quantisation ($\Phi^4_3$), the KPZ equation, and stochastic geometric flows (Hairer, 2021); and variance renormalisation in regularity structures for the 2D generalised Parabolic Anderson Model (Gerencsér & Hsu, 2026).

On the fluid side, global unique solvability for stochastic Navier-Stokes-Korteweg equations and stochastic Allen-Cahn-Navier-Stokes systems with ergodic invariant measures appeared in 2025, and non-uniqueness of Leray-Hopf solutions was extended to the stochastic forced setting.

9. Dispersive PDEs: Wave Turbulence, Well-Posedness, and Blowup #

The full derivation of the wave kinetic equation from the cubic NLS by Deng and Hani (arXiv:1912.09518, 2021) was the most impactful dispersive result of the era. Their analysis relies on absolutely convergent Feynman-diagram (paired-tree) expansions and identifies favourable scaling laws $\alpha \sim L^{-\varepsilon}$ for the kinetic limit.

Ongoing work includes polynomial growth of Sobolev norms for the fractional NLS on $\mathbb{T}^d$ (Wang, 2026); low-regularity global well-posedness for generalised Zakharov-Kuznetsov equations (Nowicki-Koth, 2026); modulated dispersive equations (modulated KdV with normal form reduction; Gubinelli, Li, Li & Oh, 2025; arXiv:2505.24270); and probabilistic well-posedness of dispersive PDEs beyond variance blowup (2025; arXiv:2509.02344). Scattering results for the quintic generalised Benjamin-Bona-Mahony equation and the 3D Zakharov-Kuznetsov equation, and long-time asymptotics via Riemann-Hilbert and inverse scattering methods for integrable equations, appear in the March 2026 listings.

10. Geometric PDEs #

Ricci flow uniqueness in the non-compact setting (Lee, 2025; arXiv:2503.20292) and a new non-Kähler expanding Ricci soliton construction with Kähler tangent cone at infinity (Bamler, Chen & Conlon, 2026) reflect the continued health of geometric flows. A regularity result for volume-preserving mean curvature flow in dimensions 2 and 3 appeared in March 2026 (Arya, Jeon & Julin).

Regularity theory for Monge-Ampère equations received major contributions via a geometric approach: Brendle, Léger, McCann, and Rankin (2023; arXiv:2311.10208) derived the Pogorelov second-derivative bound using Kim-McCann-Warren’s pseudo-Riemannian geometry, providing a new approach to $C^1$ estimates for optimal transport maps. Liouville theorems and sharp solvability for the parabolic Monge-Ampère equation with periodic data appeared in March 2026.

11. Inverse Problems for PDEs #

The Calderón problem — recovering a coefficient from boundary Dirichlet-to-Neumann data — attracted major advances: the quasilinear setting (Cârstea, Feizmohammadi, Kian, Krupchyk & Uhlmann, 2021), inverse problems for fractional semilinear elliptic equations (Lai & Lin, 2020), the Calderón problem via Vekua theory (Clifford analysis framework, 2026; arXiv:2601.17313), and the convex lifting approach (Alberti, Petit & Sanna, 2025; arXiv:2507.00645). The anisotropic Calderón problem for fractional Schrödinger operators on closed Riemannian manifolds (Krupchyk, 2025) was an important further advance.

Inverse moving source problems for parabolic equations (Zhao, 2023), reconstruction of scalar parameters in subdiffusion, and inverse problems for multi-term time-fractional diffusion with Caputo derivatives are active in 2025–2026.

12. Semi-Classical Analysis, Spectral Theory, and Nonlinear Elliptic Theory #

A 2024 arXiv survey on semi-classical analysis, which introduces three representative topics and was ranked the top 2024 math.AP paper by Paper Digest, and a 2026 paper celebrating the 100th anniversary of the WKB papers (Vũ Ngọc) both indicate that semi-classical methods remain foundational.

In nonlinear elliptic and parabolic theory, major contributions include: Regularity Theory for Elliptic PDEs by Fernández-Real and Ros-Oton (2023), a comprehensive self-contained reference; Fujita-type results for degenerate parabolic equations on Heisenberg groups (Fino, Ruzhansky & Torebek, 2023), ranked the highest-impact 2023 math.AP paper; and singularity formation for nonlinear heat equations on infinite graphs (Punko & Zucchero, 2026).

Emerging and Cross-Cutting Themes (2025–2026) #

Computer-assisted proofs and rigorous numerics. The Chen–Hou Euler blowup proof and related work on the CLM model (Hou-Wang, 2026) demonstrate that computer-assisted methods with rigorous error control are becoming standard for complex nonlinear stability analyses. These methods combine spectral Galerkin approximations with interval arithmetic and weighted norm frameworks to certify nonlinear stability constants — a methodology likely to expand further.

AI and machine learning for PDEs. The 2026 workshop MLPDES26 and the NSF/AMS report on AI for the mathematical sciences signal growing interplay between pure math.AP and deep learning. Neural PDE networks for equation discovery (arXiv:2502.18377), geometric operator learning via optimal transport (arXiv:2507.20065), and AI-assisted singularity discovery (DeepMind, 2025) represent this interdisciplinary frontier.

PDE methods in geometry and probability. The intersection of math.AP with differential geometry, probability (SPDEs), and mathematical physics remains extremely active. The March 2026 listings span general relativity (tensorial wave equations), Kähler geometry (Ricci solitons), and stochastic PDEs — confirming that math.AP functions as a hub connecting multiple mathematical disciplines.

Open Problems #

Smooth-data Euler regularity beyond bounded domains. The Chen–Hou result proves blowup in a bounded domain. Whether finite-time singularity occurs for the 3D Euler equations in all of $\mathbb{R}^3$ from smooth, rapidly decaying initial data — the original Euler problem — remains open.

Navier-Stokes uniqueness from smooth initial data. The Albritton-Brué-Colombo result proves non-uniqueness for forced NS from zero initial velocity. Non-uniqueness (or uniqueness) of Leray–Hopf solutions for the unforced equations from smooth $H^1$ initial data is unresolved (see the companion survey on self-similar solutions).

Optimal regularity theory for double phase problems. Despite the comprehensive work of De Filippis and Mingione, optimal Schauder estimates for parabolic double phase systems at the boundary and under critical growth conditions are not fully established.

Complete derivation programme for Hilbert’s Sixth Problem. Deng-Hani-Ma resolved the case of hard-sphere gases in the Boltzmann regime. The derivation of hydrodynamic equations from particle dynamics in other regimes — dense gases, quantum systems, plasma — remains largely open.

Global well-posedness for energy-critical NLS in high dimensions. Despite progress on wave kinetic theory and probabilistic well-posedness, the deterministic global well-posedness theory for energy-critical and supercritical dispersive equations in dimensions $d \geq 5$ has significant gaps.

Quantum and numerical computation in pure math.AP. The growing use of computer-assisted proofs raises methodological questions about standards of verification, reproducibility, and the scope of problems accessible to these techniques.

References #

Albritton, D., Brué, E., & Colombo, M. (2021). Non-uniqueness of Leray solutions of the forced Navier-Stokes equations. https://cvgmt.sns.it/media/doc/paper/5405/main.pdf

Bailleul, I., & Bruned, Y. (2021). Renormalised singular stochastic PDEs. arXiv:2101.11949. https://www.pure.ed.ac.uk/ws/portalfiles/portal/194767736/2101.11949.pdf

Bailleul, I., & Hoshino, M. (2025). A tourist’s guide to regularity structures and singular stochastic PDEs. EMS Surveys in Mathematical Sciences. https://ems.press/journals/emss/articles/14298505

Brendle, S., Léger, F., McCann, R. J., & Rankin, C. (2023). A geometric approach to a priori estimates for optimal transport maps. arXiv:2311.10208. https://arxiv.org/abs/2311.10208

Chen, J., & Hou, T. Y. (2022). Stable nearly self-similar blowup of the 2D Boussinesq and 3D Euler equations with smooth data I: Analysis. arXiv:2210.07191. https://arxiv.org/abs/2210.07191

Chen, J., & Hou, T. Y. (2023). Stable nearly self-similar blowup of the 2D Boussinesq and 3D Euler equations with smooth data II: Rigorous numerics. arXiv:2305.05660. https://arxiv.org/abs/2305.05660

Chen, J., & Hou, T. Y. (2025). Singularity formation in 3D Euler equations with smooth initial data. PNAS, 122(28). https://www.pnas.org/doi/10.1073/pnas.2500940122

De Filippis, C., & Mingione, G. (2023). Regularity for double phase problems at nearly linear growth. arXiv:2308.10222. https://arxiv.org/abs/2308.10222

DeepMind. (2025). Discovering new solutions to century-old problems in fluid dynamics. https://deepmind.google/blog/discovering-new-solutions-to-century-old-problems-in-fluid-dynamics/

Deng, Y., & Hani, Z. (2021). On the derivation of the wave kinetic equation for NLS. arXiv:1912.09518. http://arxiv.org/pdf/1912.09518.pdf

Deng, Y., Hani, Z., & Ma, X. (2024). Long time derivation of the Boltzmann equation from hard sphere dynamics. arXiv:2408.07818. https://www.semanticscholar.org/paper/91b67412a6058c1ace054a32fbf36fa2d2998d3d

Deng, Y., Hani, Z., & Ma, X. (2025). Hilbert’s sixth problem: Derivation of fluid equations via Boltzmann’s kinetic theory. arXiv:2503.01800. https://www.semanticscholar.org/paper/01d8f11b5d31f7037fb4914797e938db11d76ec5

Ferrari, F., Forcillo, N., Giovagnoli, D., & Jesus, B. (2024). Free boundary regularity for the inhomogeneous one-phase Stefan problem. arXiv:2404.07535. https://arxiv.org/abs/2404.07535

Gubinelli, M., Li, J., Li, T., & Oh, T. (2025). Nonlinear PDEs with modulated dispersion IV: Normal form reduction for modulated KdV. arXiv:2505.24270. https://arxiv.org/pdf/2505.24270.pdf

Hou, T. Y. (2021). The potentially singular behavior of the 3D Navier-Stokes equations. arXiv:2107.06509. https://arxiv.org/abs/2107.06509

Hu, J., Jin, S., Liu, N., & Zhang, L. (2024). Quantum circuits for partial differential equations via Schrödingerisation. Quantum, 8, 1563.

Imanuvilov, O. Y., Liu, Y., & Yamamoto, M. (2023). Lipschitz stability for determining states and inverse sources in MFG equations. [Journal of Mathematical Analysis].

Ok, J., Scilla, G., & Stroffolini, B. (2025). Partial regularity for parabolic systems of double phase type. arXiv:2510.03849. https://arxiv.org/pdf/2510.03849.pdf

Paper Digest. (2025, March). Most influential arXiv (Analysis of PDEs) papers — 2025-03 version. https://www.paperdigest.org/2025/03/most-influential-arxiv-analysis-of-pdes-papers-2025-03-version/

Segata, J., & Chen, M. (2026). Scattering for the 3D Zakharov-Kuznetsov equation [arXiv preprint]. arXiv math.AP March 2026.

arXiv math.AP listings. (2026, February–March). https://arxiv.org/list/math.AP/2026-03

Paper Reading - Optimization problems for elliptic PDEs (2601.01591)

Fri, 20 Feb 2026 00:00:00 +0000

This paper is a panoramic tour of three families of optimal control problems for elliptic PDEs: where the control is the coefficient, the potential, or the source term, unifying and sharpening results from the authors’ previous works.

Three ways to control an elliptic PDE #

The authors always consider a Dirichlet problem on a bounded domain $\Omega \subset \mathbb{R}^d$, with the solution $u$ as the state and a function (or measure) as the control. They study three settings:

Optimal coefficients $a(x)$: $$ -\mathrm{div}(a(x)\nabla u) = f \text{ in } \Omega, \quad u=0 \text{ on } \partial\Omega, $$ cost function $J(u,a) = \int_\Omega j(u,a)\,dx$, with a constraint $\int_\Omega \psi(a)\,dx \le 1$.
Optimal potentials $V(x)$: $$ -\Delta u + V(x)u = f \text{ in } \Omega, \quad u\in H_0^1(\Omega), $$ cost function $J(u,V) = \int_\Omega (j(x,u) + \psi(V))\,dx$.
Optimal sources $f$: $$ -\Delta u = f \text{ in } \Omega, \quad u\in H_0^1(\Omega), $$ cost function $J(f) = \int_\Omega j(x,u_f,f)\,dx$ with $\int_\Omega \psi(f)\,dx \le m$.

In all cases, $\psi$ is convex and lower semi-continuous (l.s.c.), encoding constraints and penalizations on the control. The paper focuses on existence of optimal controls (sometimes as measures), characterization via auxiliary variational problems and adjoint states, bang–bang behavior, and regularity of optimal controls and their induced interfaces.

Optimal Coefficients: Where to Put the Good Material? #

Minimal Compliance and Measure-Valued Coefficients #

The model problem is compliance minimization for $-\mathrm{div}(a(x)\nabla u) = f$, $u=0$, with non-negative $a$.

Compliance is defined as: $$ C(a) = \int_\Omega f u_a\,dx, $$ and it relates to the energy $$ E(a) = \inf_{u\in H_0^1} \int_\Omega \left(\tfrac{1}{2} a|\nabla u|^2 - f u\right)dx $$ via $C(a) = -2E(a)$.
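A quick numerical sanity check of the identity $C(a) = -2E(a)$ can be sketched in one dimension. This is a minimal illustration and not something taken from the paper: the finite-difference discretization on $(0,1)$, the coefficient `a`, the load `f = 1`, and the helper name `stiffness_1d` are all assumptions made for the example.

```python
import numpy as np

def stiffness_1d(a, h):
    """Matrix of -(a u')' on (0, 1) with u(0) = u(1) = 0; `a` has one value per cell."""
    n = len(a) - 1                      # number of interior nodes
    D = np.zeros((n + 1, n))            # cell-wise differences (Du)_i = u_i - u_{i-1}
    idx = np.arange(n)
    D[idx, idx] = 1.0
    D[idx + 1, idx] = -1.0
    return D.T @ (a[:, None] * D) / h**2

n = 199
h = 1.0 / (n + 1)
x_cells = (np.arange(n + 1) + 0.5) * h
a = 1.0 + 0.5 * np.sin(2 * np.pi * x_cells)   # hypothetical positive coefficient
f = np.ones(n)                                 # hypothetical load

K = stiffness_1d(a, h)
u = np.linalg.solve(K, f)                      # discrete state u_a

compliance = h * (f @ u)                       # C(a): integral of f * u_a
energy = h * (0.5 * u @ K @ u - f @ u)         # E(a): energy evaluated at its minimizer u_a

print(compliance, -2.0 * energy)               # the two numbers should agree
```

The agreement is exact up to rounding here because the discrete state satisfies `K @ u = f`, so the quadratic term equals half of the load term, mirroring the continuous argument.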

The optimization problem is written as: $$ \min_{a \geq 0} \left\{ C(a) + \int_\Omega \psi(a)dx \right\}, $$ or equivalently as a max–min problem in $(a,u)$.

Two growth regimes of $\psi$ are crucial:

Superlinear: $\psi(s)/s \to +\infty$. Then admissible coefficients are in $L^1(\Omega)$, and there exists an optimal $a_{\mathrm{opt}}\in L^1(\Omega)$.
Linear growth: $\psi(s)/s \to k>0$. Then it is natural to extend the problem to measures $\mu\ge 0$, allowing “thin” structures on lower-dimensional sets. The cost $\int \psi(\mu)$ is interpreted through the Lebesgue–singular decomposition and the recession function $\psi_\infty$. An optimal measure $\mu_{\mathrm{opt}}\in \mathcal{M}^+(\Omega)$ still exists.

Because the functional is convex in $u$ and concave in $a$, the authors exchange inf and sup and reduce to an auxiliary minimization problem in $u$ alone: $$ \inf_{u} \int_\Omega \psi^{*}(|\nabla u|^2)dx - 2\int_\Omega u df, $$ where $\psi^{*}$ is the Legendre–Fenchel conjugate. Under mild assumptions this problem has a unique minimizer $\bar u$, and the optimal coefficient is recovered point-wise from the optimality condition: $$ a_{\mathrm{opt}}|\nabla\bar u|^2 = \psi(a_{\mathrm{opt}}) + \psi^*(|\nabla\bar u|^2). $$

Examples:

Power penalization $\psi(s) = s^p/p$, $p>1$: The auxiliary problem involves a nonlinear PDE $$-\Delta_{2p/(p-1)} u = \tfrac{2p}{p-1} f,$$ and the optimal coefficient is $a_{\mathrm{opt}}(x) = |\nabla \bar u(x)|^{2/(p-1)}$. For $\Omega$ a ball and $f=1$ or $f=\delta_0$, the authors give explicit radial formulas and plots for $\bar u$ and $a_{\mathrm{opt}}$.
Two-phase box constraint $\psi(s) = s$ on $[\alpha,\beta]$, $+\infty$ otherwise: The auxiliary problem yields an optimal coefficient $a_{\mathrm{opt}}\in L^\infty(\Omega)$ taking values in $[\alpha,\beta]$, and under regularity of $\Omega$ and $f$ one gets extra smoothness (e.g. $\nabla a_{\mathrm{opt}}\cdot \nabla \bar u \in L^2(\Omega)$).
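For the power penalization, the conjugate and the pointwise recovery of the coefficient can be checked numerically. The sketch below assumes $p = 3$ and a sample value `t` standing in for $|\nabla\bar u|^2$; it uses the closed form $\psi^*(t) = t^{p'}/p'$ with $p' = p/(p-1)$ (valid for $t \ge 0$), compares it with a brute-force supremum over a grid, and then verifies the optimality condition $a_{\mathrm{opt}}\,t = \psi(a_{\mathrm{opt}}) + \psi^*(t)$ for $a_{\mathrm{opt}} = t^{1/(p-1)}$. The grid bounds and function names are illustrative choices, not part of the paper.

```python
import numpy as np

p = 3.0                        # assumed exponent, p > 1
p_conj = p / (p - 1.0)         # conjugate exponent p'

def psi(s):
    return s**p / p

def psi_star_closed(t):
    """Closed-form Legendre-Fenchel conjugate of psi on t >= 0."""
    return t**p_conj / p_conj

def psi_star_numeric(t, s_grid):
    """Brute-force sup over s >= 0 of s*t - psi(s) on a grid."""
    return np.max(s_grid * t - psi(s_grid))

s_grid = np.linspace(0.0, 10.0, 200001)
t = 2.0                                    # sample value of |grad u|^2

a_opt = t ** (1.0 / (p - 1.0))             # same as |grad u|^(2/(p-1))
gap = a_opt * t - psi(a_opt) - psi_star_closed(t)

print(psi_star_numeric(t, s_grid), psi_star_closed(t), gap)
```

The vanishing `gap` is the Fenchel equality case, which is exactly what the optimality condition expresses pointwise.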

General Coefficients and G-Closure #

For a general cost: $$\min_{a\ge 0}\min_{u} \int_\Omega (j(x,u)+\psi(a))\,dx \quad \text{s.t. } u \text{ solves } -\mathrm{div}(a\nabla u)=f,$$ existence of an optimal $a$ may fail.

The relaxed problem is naturally expressed via G-convergence: sequences of scalar coefficients $a_n\in[\alpha,\beta]$ can generate limit operators with matrix-valued coefficients $A(x)$, described by the celebrated Murat–Tartar G-closure.

The G-closure set $\mathcal{A}$ consists of symmetric matrices $A(x)$ whose eigenvalues $\lambda_1\le\cdots\le\lambda_d$ lie in $[\alpha,\beta]$ and satisfy a family of inequalities depending on a mixing parameter $t\in[0,1]$, involving the arithmetic and harmonic means $\mu_t, \nu_t$ of $\alpha,\beta$. For $d=2$, this gives an explicit admissible region in the $(\lambda_1,\lambda_2)$-plane.

Relaxed functionals of the form $\int \psi(x,a)\,dx$ over G-limits have been studied in special cases, e.g. $\psi(x,a)=g(x)a$, where one can express the relaxation in terms of the largest eigenvalue $\lambda_{\max}(A(x))$. The authors show a numerical example where the relaxed optimal matrix $A_{\mathrm{opt}}$ has eigenvalues $\lambda_1\neq \lambda_2$ on a set of positive measure, revealing genuine microstructure.

Optimal Potentials: Shaping the “Landscape” $V(x)$ #

Here the control is a nonnegative potential $V$ in $$-\Delta u + V u = f, \quad u\in H_0^1(\Omega).$$ The cost is: $$\min \int_\Omega (j(x,u) + \psi(V))\,dx,$$ with $V\ge 0$ and $\psi$ convex, l.s.c., super-linear (so any finite-cost $V$ lies in $L^1(\Omega)$).

Compliance Case: Eliminating the Control #

For the compliance choice $j(x,u) = f(x)u$, the problem can again be reduced to a variational problem in $u$ only.

Define: $$ E(V) = \min_{u\in H_0^1(\Omega)} \int_\Omega \left(\tfrac{1}{2} |\nabla u|^2 + \tfrac{1}{2} V u^2 - f u\right)dx, \quad \Psi(V)=\int_\Omega \psi(V)\,dx. $$

Minimizing $-2E(V)+\Psi(V)$ over $V\ge 0$ is equivalent to: $$ \min_{u\in H_0^1(\Omega)} \int_\Omega \left(|\nabla u|^2 + \psi^*(u^2) - 2 f u\right)dx, $$ a semi-linear elliptic problem in $u$ with nonlinearity $g(s)=s(\psi^*)'(s^2)$. The optimal state $\bar u$ solves: $$ -\Delta u + g(u) = f, \quad u\in H_0^1(\Omega), $$ and the optimal potential is: $$ V_{\mathrm{opt}} = (\psi^*)'(\bar u^2). $$ So in this special case the control can be explicitly reconstructed from the state.
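To make the reduction concrete, here is a small one-dimensional sketch under an assumed quadratic penalization $\psi(s) = \tfrac12 s^2$ on $s \ge 0$. For that choice $\psi^*(t) = \tfrac12 t^2$ for $t \ge 0$, so $(\psi^*)'(t) = t$, $g(s) = s^3$, and $V_{\mathrm{opt}} = \bar u^2$. The domain $(0,1)$, the load `f`, the grid size, and the plain Newton iteration are all illustrative assumptions rather than the paper's setup.

```python
import numpy as np

n = 199
h = 1.0 / (n + 1)
x = np.arange(1, n + 1) * h
f = 10.0 * np.sin(np.pi * x)              # hypothetical load

# Centred-difference matrix for -u'' with homogeneous Dirichlet conditions
L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

# Newton iteration for the semilinear state equation -u'' + u^3 = f
u = np.zeros(n)
for _ in range(50):
    residual = L @ u + u**3 - f
    if np.max(np.abs(residual)) < 1e-8:
        break
    jacobian = L + np.diag(3.0 * u**2)
    u -= np.linalg.solve(jacobian, residual)

V_opt = u**2                               # potential reconstructed from the state

# With this potential, u also solves the linear state equation -u'' + V_opt * u = f
print(np.max(np.abs(L @ u + V_opt * u - f)))
```

The last line illustrates the point of the reduction: once the semilinear problem is solved, the same $\bar u$ is the state associated with the reconstructed potential.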

General Costs, Adjoint Equation, and Regularity #

For a general $j(x,u)$, the authors prove the existence of an optimal $V_{\mathrm{opt}}\in L^1(\Omega)$ under natural growth and coercivity assumptions on $j$ and super-linearity of $\psi$.

Optimality conditions involve:

The state $\bar u$ solving $-\Delta u + V_{\mathrm{opt}}u = f$.
An adjoint state $v$ solving $-\Delta v + V_{\mathrm{opt}} v = \partial_s j(x,\bar u)$.
A sub-differential relation $\bar u v \in \partial\psi(V_{\mathrm{opt}})$, rewritten as a point-wise inequality $h^{-}(\bar u v) \le V_{\mathrm{opt}} \le h(\bar u v)$, where $h$ is built from the sub-differential of $\psi$.

From here, regularity of $V_{\mathrm{opt}}$ is linked to properties of $h$ and to elliptic regularity for $\bar u$ and $v$. Under strengthened assumptions on $j$, $f$, and $\Omega$, the authors show that $\bar u, v \in W^{2,q}(\Omega)$ for some $q>d/2$ (hence continuous), and the product $\bar u v V_{\mathrm{opt}}$ is in $BV(\Omega)$, so $V_{\mathrm{opt}}\in BV_{\mathrm{loc}}(\Omega\setminus K)$ where $K = \{\bar u v = 0\}$. This identifies the “degeneracy set” $K$ as the core where singularities of the optimal potential may concentrate.

Bang–Bang Potentials: If $\psi$ is flat on an interval $[\alpha,\beta]$ (e.g. $\psi(s) = s$ on $[\alpha,\beta]$, $+\infty$ otherwise), the function $h$ becomes multi-valued and the optimal potential is bang–bang: $$ V_{\mathrm{opt}} = \alpha + (\beta-\alpha)\mathbf{1}_E $$ for some set $E$ of finite perimeter. The paper includes numerical simulations showing the geometry of such sets for specific loads $f$.

Optimal Sources: Choosing the Right-Hand Side #

Finally, the control is the source $f$ in $-\Delta u = f$, $u\in H_0^1(\Omega)$, with cost $J(f) = \int_\Omega j(x,u_f,f)\,dx$ and constraint $\int_\Omega \psi(f)\,dx\le m$.

Existence with Superlinear and Linear $\psi$: If $\psi$ is super-linear and $j$ satisfies suitable lower bounds and convexity in $f$, then an optimal $f_{\mathrm{opt}}\in L^1(\Omega)$ exists.

If $\psi$ has linear growth, the natural admissible class is signed measures $f$ with finite total variation, and $\int \psi(f)$ is defined via the Lebesgue–singular decomposition and recession coefficients $c_-(\psi), c_+(\psi)$. Under a decomposition $j(x,s,z)=A(x,s)+B(x,z)$ with specific structure and lower bounds, the functional is lower semi-continuous under weak-* convergence of measures, and there exists an optimal measure-valued source $f_{\mathrm{opt}}$.

Optimality Conditions and Bang–Bang Description: Introduce the self-adjoint resolvent operator $R$ mapping a source $f$ to the solution $u_f$. Under differentiability and growth conditions on $j$, the authors derive necessary (and, under convexity, sufficient) conditions for optimality. For super-linear $\psi$, define: $$ w := R\big(\partial_s j(x, R(f_{\mathrm{opt}}), f_{\mathrm{opt}})\big) + \partial_z j(x, R(f_{\mathrm{opt}}), f_{\mathrm{opt}}). $$ Then there is $\lambda \ge 0$ such that either:

$\lambda=0$: $w$ has a fixed sign and $f_{\mathrm{opt}}$ saturates the endpoints of $\mathrm{dom}(\psi)$ on the regions where $w$ is strictly positive/negative — a pure bang–bang behavior.
$\lambda>0$: the constraint is saturated, $\int \psi(f_{\mathrm{opt}})=m$, and $f_{\mathrm{opt}}$ satisfies a point-wise equality involving $\psi$, its conjugate $\psi^*$, and $w$.

For linear-growth $\psi$, a similar structure holds, but the singular part of $f_{\mathrm{opt}}$ is supported on level sets where $w$ hits thresholds determined by the slopes $c_-(\psi), c_+(\psi)$.

Spectral Example: Maximizing Energy Under an $L^2$ Constraint

For $$ j(u) = -\tfrac{1}{2} u^2, \quad \psi(s)=\tfrac{1}{2} s^2, $$ the constraint $\int_\Omega \psi(f)\,dx \le m$ reads $\int_\Omega f^2\,dx \le 2m$, and the problem becomes: $$ \max \left\{\frac{1}{2}\int_\Omega u_f^2\,dx : \int_\Omega f^2\,dx \le 2m \right\}. $$

The optimality system shows that the optimal source $f$ satisfies a fourth-order eigenvalue problem $\Delta^2 f = f/\lambda$, equivalent to an eigenvalue problem for the Laplacian. The maximizer is a multiple of the first Dirichlet eigenfunction $\varphi$ of $-\Delta$, normalized in $L^2(\Omega)$: $$ f = \pm \sqrt{2m}\,\varphi, \quad \lambda = 1/\mu_1^2, $$ where $\mu_1$ is the first eigenvalue. The paper includes a numerical plot for such an optimal source in an ellipse.
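The spectral characterization is easy to test in a discrete toy setting. The sketch below works on $\Omega = (0,1)$ with a finite-difference Laplacian, which is an assumption for illustration (the paper's plot is for an ellipse). Writing $R$ for the discrete resolvent, $\int u_f^2\,dx \approx h\, f^\top R^2 f$, so the maximizer under $h\, f^\top f = 2m$ is the top eigenvector of $R^2$, and the top eigenvalue plays the role of $\lambda = 1/\mu_1^2$. The budget `m` and all variable names are hypothetical.

```python
import numpy as np

n = 199
h = 1.0 / (n + 1)
x = np.arange(1, n + 1) * h
m = 0.5                                    # hypothetical budget: integral of f^2 <= 2m

# Discrete Dirichlet Laplacian and its inverse (the resolvent f -> u_f)
L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
R = np.linalg.inv(L)

# Top eigenpair of R^2: eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(R @ R)
lam = eigvals[-1]
phi = eigvecs[:, -1] / np.sqrt(h)          # normalise so that h * sum(phi^2) = 1
phi *= np.sign(phi.sum())                  # fix the sign for comparison

f_opt = np.sqrt(2.0 * m) * phi             # discrete analogue of sqrt(2m) * phi
mu1 = 1.0 / np.sqrt(lam)                   # recovered first Dirichlet eigenvalue

print(mu1, np.pi**2)                       # on (0, 1) the exact value is pi^2
```

On the unit interval the first Dirichlet eigenfunction is $\sqrt{2}\sin(\pi x)$, so with `m = 0.5` the computed `f_opt` should match that profile.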

Compliance with Box Constraints on the Source: For compliance with box constraints: $$ \min \left\{\int_\Omega f\,R(f)\,dx : \int_\Omega f\,dx \ge m,\ f\in[\alpha,\beta]\right\}, \quad 0\le \alpha<\beta, $$ the optimal source is bang–bang: $$ f _{\mathrm{opt}} = \beta\,\mathbf{1} _E + \alpha\,\mathbf{1} _{\Omega\setminus E}, $$ with $E = \{R(f _{\mathrm{opt}}) < s\}$ and $s$ chosen to fit the mass constraint. The corresponding state solves: $$ -\Delta u = \beta\,\mathbf{1} _{\{u<s\}} + \alpha\,\mathbf{1} _{\{u>s\}}. $$

Using results from their previous work on optimal potentials, the authors prove that $f _{\mathrm{opt}} \in BV(\Omega)$: the interface between the regions where $f=\alpha$ and $f=\beta$ has finite perimeter.

If $\Omega$ is convex, they go further: in the special case $\alpha = 0$, $f _{\mathrm{opt}} = \mathbf{1} _E$ with $E = \{w < s\}$, where $w$ solves $-\Delta w = \mathbf{1} _{\{w<s\}}$. They show that the optimal set $E$ is convex and its boundary is of class $C^1$. So in convex domains, the region where you “turn on” the source to maximize stiffness is itself a smooth convex set.

References #

[1] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2025). Optimal sources for elliptic PDEs. arXiv preprint arXiv:2509.01521.


@article{buttazzo2025optimal,
 title={Optimal sources for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2509.01521},
 year={2025}
}

[2] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2025). Optimal coefficients for elliptic PDEs. arXiv preprint arXiv:2512.08431.


@article{buttazzo2025coefficients,
 title={Optimal coefficients for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2512.08431},
 year={2025}
}

[3] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2026). Optimization problems for elliptic PDEs. arXiv preprint arXiv:2601.01591.


@article{buttazzo2026optimization,
 title={Optimization problems for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2601.01591},
 year={2026}
}

Paper Reading - Optimal coefficients for elliptic PDEs (2512.08431)

Thu, 19 Feb 2026 00:00:00 +0000

This paper gives a clear, fairly complete picture of how to optimally choose the coefficient $a(x)$ (think “material quality”) in an elliptic PDE, with compliance as the main model and then a general optimal control formulation.

Problem Setup #

Consider the boundary value problem: $$ -{\rm div}(a(x)\nabla u) = f \quad\text{in } \Omega,\qquad u=0 \text{ on } \partial\Omega, $$ where $\Omega$ is a bounded domain, $f$ is a given load, and $a(x)$ is the design variable.

Typical assumptions on $a(x)$:

Point-wise bounds $\alpha \le a(x) \le \beta$ (two material qualities, e.g., “soft” vs “stiff”).
Possibly a budget constraint (e.g., only a fixed fraction of the domain can use the best material $\beta$).

The map $a \mapsto u_a$ is well-defined by elliptic theory: for each admissible $a$, the PDE has a unique weak solution in $H_0^1(\Omega)$.

Example #

The elastic compliance is a classical cost in mechanics: it measures how much the structure deforms under the load $f$. In this setting, a standard functional is

either $C(a) = \int_\Omega f\,u_a\,dx$ (work of the load),
or equivalently the elastic energy $\int_\Omega a(x)\,|\nabla u_a|^2\,dx$ up to constants.

Minimizing the compliance means:

Given a fixed load and a given volume of good material, distribute $a(x)$ in $\Omega$ so that the resulting displacement $u_a$ is as small as possible in the energy sense.

Key qualitative facts the paper emphasizes in this compliance setting:

Existence: under standard bounds $\alpha \le a \le \beta$ and a convex constraint (like a fixed integral of $a$), there exists at least one optimal coefficient $a_{\text{opt}}$.
Extremal behavior: because the compliance functional is convex in $u$ but often leads to a concave dependence on $a$ under constraints, optimal $a_{\text{opt}}$ tend to take values only at the extremes $\alpha$ or $\beta$ almost everywhere, a typical “black-and-white” design phenomenon known in topology optimization.

Intuitively, if we can choose between “bad” and “good” material at each point but only have a limited budget of good material, it is never optimal to mix them continuously; we either go full good or full bad locally and let the PDE determine where gradients are large so good material is most effective.

From two-phase design to optimal control #

The authors then move to a more general PDE-constrained optimal control view: $a(x)$ is the control, the PDE is the state equation, and the cost is an abstract functional $$ J(a) = \int_\Omega j(x, u_a(x), a(x), \nabla u_a(x))\,dx, $$ possibly plus boundary or integral terms.

In this general framework:

The admissible set $\mathcal{A}$ of coefficients may encode box constraints, integral constraints, or more refined structure (e.g., multi-phase materials).
The goal is to minimize $J(a)$ over $\mathcal{A}$.

The paper outlines how standard tools of optimal control of PDEs apply:

Adjoint equation: one introduces an adjoint state $p$ solving its own elliptic problem linked to derivatives of $j$ with respect to $u$ and $\nabla u$.
First-order optimality: optimal coefficients satisfy variational inequalities or pointwise optimality conditions involving $a_{\text{opt}}$, $u_{a_{\text{opt}}}$, and $p$.

In simple situations, one gets an explicit “gradient” of the cost with respect to the coefficient:

local changes in $a(x)$ are weighted by expressions involving $\nabla u$ and $\nabla p$;
this tells us where increasing stiffness (raising $a$) helps most, and where it is wasteful.
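The adjoint-based sensitivity can be sketched in one dimension and checked against a finite difference. Everything specific here is an assumption chosen for illustration: the tracking cost $j = \tfrac12 (u - u_{\text{target}})^2$, the piecewise-constant coefficient with one value per cell, the grid, and the function names. In this discrete setting the derivative of the cost with respect to the coefficient in cell $k$ is $-\int_{\text{cell } k} u'\,p'\,dx$, where $p$ solves the adjoint equation with right-hand side $\partial_u j$.

```python
import numpy as np

def difference_matrix(n):
    """Cell-wise differences (Du)_i = u_i - u_{i-1}, with zero boundary values."""
    D = np.zeros((n + 1, n))
    idx = np.arange(n)
    D[idx, idx] = 1.0
    D[idx + 1, idx] = -1.0
    return D

def cost_and_gradient(a, f, u_target, h):
    """Tracking cost J(a) and its gradient with respect to each cell value of a."""
    D = difference_matrix(len(f))
    K = D.T @ (a[:, None] * D) / h**2          # discrete -(a u')'
    u = np.linalg.solve(K, f)                  # state equation
    p = np.linalg.solve(K, u - u_target)       # adjoint equation (K is symmetric)
    cost = 0.5 * h * np.sum((u - u_target) ** 2)
    grad = -(D @ u) * (D @ p) / h              # per cell: -(integral of u' p')
    return cost, grad

n = 49
h = 1.0 / (n + 1)
x_nodes = np.arange(1, n + 1) * h
a = np.ones(n + 1)                             # hypothetical uniform material
f = np.ones(n)                                 # hypothetical load
u_target = 0.1 * np.sin(np.pi * x_nodes)       # hypothetical target state

cost, grad = cost_and_gradient(a, f, u_target, h)

# Finite-difference check of one gradient entry
k, eps = 10, 1e-6
a_pert = a.copy()
a_pert[k] += eps
fd = (cost_and_gradient(a_pert, f, u_target, h)[0] - cost) / eps

print(grad[k], fd)
```

One state solve and one adjoint solve give the full gradient, whereas finite differences need one extra solve per cell. For the compliance cost $j = f u$ the adjoint coincides with the state, and the sensitivity reduces to $-|\nabla u|^2$, which is negative wherever the gradient is nonzero: raising $a$ there always lowers compliance.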

This general perspective makes clear that compliance minimization is just one concrete instance of a broader family of coefficient optimization problems.

Bang–bang and intermediate materials #

A recurring theme, already visible in compliance, is whether optimal coefficients are bang–bang (only $\alpha$ or $\beta$) or can take intermediate values.

The paper’s message, in line with the authors’ broader work, is:

Under linear or suitably convex-structured costs and simple constraints, the optimization problem often favors extreme coefficients because any “grey” intermediate material can be improved by redistributing toward the extremes while keeping constraints satisfied.
If instead the cost penalizes variations of $a$ (e.g., includes $|\nabla a|$ or a strictly convex cost of $a$), then intermediate values can become optimal and the design becomes smoother.

This has practical consequences:

For pure stiffness or compliance problems, we should expect “black-and-white” topologies.
For problems where manufacturing or grading costs matter, optimal designs may be graded rather than sharply two-phase.

Applications #

Even though the arXiv abstract is brief, the paper’s role is clear: it systematizes and clarifies the theory of optimal coefficients for elliptic PDEs in two complementary regimes—compliance and more general optimal control.

For engineers and applied mathematicians, the main takeaways are:

We can rigorously frame “optimal material distribution” as an elliptic PDE with a coefficient control and prove existence of optimal designs under realistic constraints.
In many practically relevant cases (especially compliance), optimal designs heavily favor extreme phases, justifying the common use of binary material models in topology optimization.
Adjoint-based optimality conditions give a computable sensitivity of the cost to local changes in $a$, providing the mathematical underpinning for gradient-based optimization algorithms.

If we imagine designing a bridge deck or a heat sink, this theory tells us:

where to place stiff or conductive material,
why optimal layouts tend to be sharply separated regions of different material,
and how to systematically refine the design using PDE solutions and their adjoints.

References #

[1] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2025). Optimal sources for elliptic PDEs. arXiv preprint arXiv:2509.01521.


@article{buttazzo2025optimal,
 title={Optimal sources for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2509.01521},
 year={2025}
}

[2] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2025). Optimal coefficients for elliptic PDEs. arXiv preprint arXiv:2512.08431.


@article{buttazzo2025optimal,
 title={Optimal coefficients for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2512.08431},
 year={2025}
}

Paper Reading - Optimal sources for elliptic PDEs (2509.01521)

Wed, 18 Feb 2026 00:00:00 +0000

Introduction #

The authors study how to “best choose” a source term $f$ in a Poisson-type equation $$ -\Delta u = f \quad\quad\text{in }\Omega,\quad u = 0\text{ on }\partial\Omega, $$ so that a given performance measure (a cost functional) is optimized. The twist is that the source itself is the control, and it can be subject to various constraints (size, bounds, sign, etc.). This makes the problem sit at the intersection of optimal control, shape optimization, and regularity theory.

The basic optimization setup #

First, we fix a bounded domain $\Omega \subset \mathbb{R}^d$ and, for each admissible source $f$, solve the PDE to get the state $u_f$. Then we evaluate a cost functional defined as follows: $$ J(f) = \int_\Omega j(x, u_f(x), f(x))\,dx, $$ and we want to minimize $J$ over all admissible $f$.

The admissible class is defined via an integral constraint: $$ \int_\Omega \psi(f)\,dx \le m, $$ for some convex function $\psi$. Different choices of $\psi$ encode different types of constraints:

Super-linear $\psi$ (growing faster than $|s|$) keeps $f$ in $L^1$ and “penalizes” large values strongly.
Linearly growing $\psi$ allows $f$ to be a measure (e.g., sums of Dirac masses), not just a function.

The first main result: under mild assumptions on $j$ and $\psi$, the problem always has at least one optimal source $f_{\text{opt}}$ (either as a function or a finite measure, depending on growth).

When optimal sources are “all or nothing” (bang–bang phenomenon) #

A central theme is the bang–bang phenomenon: in many natural constraints, the best source uses only its extreme admissible values, like $f = \alpha$ or $f = \beta$, with no intermediate levels.

This occurs, for instance, when we impose pointwise bounds: $$ \alpha \le f \le \beta $$ and choose a suitable $\psi$ that is affine on $[\alpha,\beta]$. Then the optimal source takes the form: $$ f_{\text{opt}} = \beta\,\mathbf{1}_E + \alpha\,\mathbf{1}_{\Omega\setminus E} $$ for some measurable set $E\subset \Omega$. At that point the problem becomes a shape optimization problem in the unknown set $E$.

The authors derive a precise system of necessary optimality conditions using a Lagrange multiplier $\lambda$ and an adjoint state $w$ (solution of another elliptic problem). Roughly:

$w$ is built from derivatives of the integrand $j$ with respect to $u$ and $f$.
The sign of $w+\lambda$ decides whether $f_{\text{opt}}$ equals $\alpha$ or $\beta$ at each point.
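
The switching rule can be illustrated on a grid. The sketch below is not the paper's algorithm: it assumes a linearized cost $\sum_i w_i f_i$ on cells of equal volume, a budget $\int f \le m$, and the sign convention that negative $w$ favors $f=\beta$. The function name `bang_bang_source` and its parameters are hypothetical. Raising the cells with the most negative $w$ first is the same as thresholding $w+\lambda$ at a multiplier $\lambda \ge 0$ fixed by the budget.

```python
def bang_bang_source(w, alpha, beta, cell_volume, mass):
    """Pick f in {alpha, beta} on each cell to minimise sum(w[i] * f[i]).

    Cells with the most negative w are raised to beta first, until either
    w becomes non-negative or the budget sum(f) * cell_volume <= mass is spent.
    """
    n = len(w)
    f = [alpha] * n
    used = alpha * cell_volume * n
    if used > mass:
        raise ValueError("even f = alpha everywhere violates the mass budget")
    step = (beta - alpha) * cell_volume  # mass added by switching one cell
    for i in sorted(range(n), key=lambda k: w[k]):
        if w[i] >= 0 or used + step > mass + 1e-12:
            break
        f[i] = beta
        used += step
    return f


# Hypothetical adjoint values on five cells; the budget allows two cells at beta.
w = [0.3, -0.5, -0.1, 0.2, -0.9]
print(bang_bang_source(w, alpha=0.0, beta=1.0, cell_volume=0.2, mass=0.4))
```

In the full problem $w$ itself depends on $f_{\text{opt}}$ through the state equation, so in practice a rule like this would sit inside an outer iteration rather than be applied once.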

They show when these conditions are also sufficient, so we can fully characterize optimal controls in convex cases.

A key structural insight: bang–bang behavior appears if and only if $\psi$ is not strictly convex on some interval (it is affine on a nontrivial segment). If $\psi$ is strictly convex (e.g., $\psi(s)=s^2$), the optimal source is more regular and not bang–bang.

Important model examples #

The paper discusses several instructive choices of $\psi$ and $j$, each corresponding to a classical PDE optimization problem:

Total variation constraint: $\psi(s)=|s|$.
- The admissible sources are bounded measures with total variation at most $m$.
- Optimality conditions show that $f_{\text{opt}}$ is supported where an adjoint field $w$ saturates a threshold.
- In radially symmetric cases (e.g., $\Omega$ a ball, linear cost), the optimal source is a Dirac delta at the center.
Nonnegative sources with mass constraint:
- $\psi(s)=s$ for $s\ge0$, $\psi(s)=+\infty$ otherwise.
- One finds conditions under which the optimal $f$ is a single Dirac mass carrying all the “budget”.
- For certain power-type functionals $\int |u|^p$, existence and structure of maximizers are detailed.
Box-constrained sources $\alpha \le f \le \beta$ with a volume (mass) constraint $\int f \le m$:
- The authors show precisely when the optimal $f$ is constant (always $\alpha$ or always $\beta$) and when it becomes a genuine bang–bang mixture of both extremes.
- Strict monotonicity of $j$ in $u$ tends to force true bang–bang solutions.
Tracking a target state:
- Cost $J(f)=\int_\Omega |u_f - u_0|^2 dx$ with $\alpha \le f \le \beta$.
- Under mild assumptions on the target $u_0$, the unique optimal control is bang–bang almost everywhere, again determined by the sign of an adjoint field.
Strictly convex $\psi$, like $\psi(s)=s^2$:
- Then the optimal control is not bang–bang but a continuous function explicitly related to $w$ and the mass constraint.
Compliance optimization:
- Minimize $\int_\Omega f u_f\,dx$ under $\alpha \le f \le \beta$ and $\int f \ge m$.
- This is equivalent to maximizing the elastic energy of the system with bounded loads.
- For $0\le \alpha < \beta$, the optimal right-hand side is bang–bang; the domain splits into two regions where the load is either $\alpha$ or $\beta$.
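
For the strictly convex example $\psi(s)=s^2$ in the list above, a formal Lagrangian computation shows where an explicit formula can come from. This is a sketch under assumptions introduced here, not the paper's statement: the cost is linearized so that its derivative in a direction $g$ is $\int_\Omega w\,g\,dx$, the constraint $\int_\Omega f^2\,dx \le m$ is active, and $w \not\equiv 0$. Stationarity of $\int_\Omega w f\,dx + \lambda\left(\int_\Omega f^2\,dx - m\right)$ with respect to $f$ gives $$ w + 2\lambda f_{\text{opt}} = 0, \qquad f_{\text{opt}} = -\frac{w}{2\lambda}, $$ and inserting this into $\int_\Omega f_{\text{opt}}^2\,dx = m$ fixes the multiplier: $$ \lambda = \frac{\|w\|_{L^2(\Omega)}}{2\sqrt{m}}, \qquad f_{\text{opt}} = -\sqrt{m}\,\frac{w}{\|w\|_{L^2(\Omega)}}. $$ So $f_{\text{opt}}$ inherits the regularity of $w$ and takes a continuum of values, in contrast with the bang–bang case. In the true nonlinear problem $w$ depends on $f_{\text{opt}}$, so this is a fixed-point relation rather than a closed formula.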

Regularity of the optimal sets and interfaces #

Once we know the optimal control is bang–bang, the main qualitative object is the interface between the regions where $f=\alpha$ and $f=\beta$.

The interface is essentially a level set of an elliptic solution $u$ (or of the adjoint $w$), so understanding its geometry is a regularity problem.

Bounded variation (BV) regularity #

In a first model case (compliance with $0\le \alpha < \beta$), the authors show that the optimal source $f_{\text{opt}}$ belongs to the space $BV(\Omega)$. This means the interface set has finite perimeter: geometrically, the boundary between phases has finite $(d-1)$-dimensional measure.

More generally, they derive estimates that control the curvature-like quantities of $u$ via the $BV$-norm of $f$.

A refined view near critical points #

A tougher issue is what happens on the set where $\nabla u=0$, because level sets can get very wild there. The authors prove:

For data $f \in BV(\Omega)$ satisfying a uniform positivity $f \ge \alpha>0$, certain weighted quantities like

$$ \int \frac{1}{|\nabla u|}\,\frac{1}{\log^q(1/|\nabla u|)}\,dx $$

stay finite for any $q>1$.

They then construct weights involving $\log(1/|\nabla u|)$ which “switch off” exactly where $\nabla u=0$, and show that appropriately weighted indicators of level sets belong to $BV$.

In particular, they define a refined Hausdorff-type measure $H_{d-1,q}$ with logarithmic weights and prove that, for sufficiently regular $f$, the set $\{\nabla u=0\}$ has zero $H_{d-1,q}$-measure for all $q>1$. This implies that the critical set has Hausdorff dimension at most $d-1$, with an even stronger “thinness” encoded by the log weights.

Convex domains: convex and smooth optimal regions #

In the compliance case on a convex domain $\Omega$, the structure is even nicer. The optimal set $E=\{x : f_{\text{opt}}(x)=\beta\}$ coincides with a sublevel set of a solution to a semi-linear equation.

Using a result of Caffarelli–Spruck type convexity for level sets, they show:

$E$ is itself convex.
One can rule out “corners”, and deduce that the boundary of $E$ is actually of class $C^1$.

So in convex domains, the optimal high-load region is a smooth convex set.

Summary #

This work gives a unified and quite complete picture of how optimal sources for elliptic PDEs behave under natural constraints:

It establishes existence of optimal controls for broad classes of convex functionals and constraints.
It identifies exactly when we get bang–bang sources, turning a PDE control problem into a shape optimization problem.
It provides sharp optimality conditions through adjoint states and sub-differential characterizations, allowing practical characterization and numerical approximation of optimal controls.
It develops regularity theory for the resulting optimal sets and interfaces, including BV estimates, structure of level sets, and refined control of critical sets.
For people working in optimal design, structural mechanics, or inverse problems, the message is: if our cost is convex and our constraint has a “flat” part (non-strictly convex $\psi$), expect extreme, piecewise-constant sources with reasonably regular interfaces that we can analyze geometrically and approximate numerically.

References #

[1] Buttazzo, G., Casado-Díaz, J., & Maestre, F. (2025). Optimal sources for elliptic PDEs. arXiv preprint arXiv:2509.01521.


@article{buttazzo2025optimal,
 title={Optimal sources for elliptic PDEs},
 author={Buttazzo, Giuseppe and Casado-D{\'\i}az, Juan and Maestre, Faustino},
 journal={arXiv preprint arXiv:2509.01521},
 year={2025}
}

Restriction and extension

Wed, 29 Oct 2025 00:00:00 +0000

Consider a smooth compact hyper-surface $S$ in $\mathbb{R}^d$ with surface measure $d\sigma$. Given $f \in L^1(\mathbb{R}^d)$, the Fourier transform is defined as follows: $$ \begin{equation} \hat{f}(\xi) = \int_{\mathbb{R}^d}e^{-2\pi i x \cdot \xi}f(x)\,dx \end{equation} $$ which, by the Riemann–Lebesgue lemma, is a bounded, continuous function vanishing at infinity.

Since $\hat{f}$ is continuous on $\mathbb{R}^d$, its restriction to the compact hyper-surface $S \subset \mathbb{R}^d$ is well-defined pointwise. Specifically, the restriction $\hat{f}\mid_{S}: S \rightarrow \mathbb{C}$ is the continuous function given by $$ \begin{equation} \hat{f}\mid_{S}(\sigma) = \hat{f}(\sigma) = \int_{\mathbb{R}^d}e^{-2\pi i x \cdot \sigma}f(x)\,dx \end{equation} $$ for each $\sigma \in S$. This is bounded (as $\hat{f}$ is bounded) and can be integrated against the surface measure $d\sigma$ on $S$.

Thus when we restrict $\hat{f}$ to $S$, we get a meaningful function which has finite $L^q(S, d\sigma)$-norm for every $1 \le q \le \infty$.

When starting with $f \in L^2(\mathbb{R}^d)$, the Fourier transform $\hat{f}$ is not well-defined pointwise in general, so there is no meaningful way to restrict it to a set of measure zero such as the hyper-surface $S$.

More precisely, for any given $f \in L^2(\mathbb{R}^d)$, the Fourier transform is defined in the $L^2$ sense via the Plancherel theorem: $$ \begin{equation} \mathcal{F}: L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^d), \quad \| \hat{f} \|_{L^2} = \| f \|_{L^2} \end{equation} $$ It is an isometry. So: $$ \begin{equation} \hat{f} \in L^2(\mathbb{R}^d) \end{equation} $$ Since $\hat{f}$ is only an $L^2$ function, it is not necessarily continuous or even bounded, and it is defined only up to sets of measure zero. Its values on a null set such as $S$ can be changed arbitrarily without changing $\hat{f}$ as an element of $L^2$.

So the expression: $$ \begin{equation} \hat{f}|_S(\sigma) = \hat{f}(\sigma), \quad \sigma \in S \end{equation} $$ does not make sense pointwise for arbitrary $f \in L^2$.

The question arises: what happens for $1 < p < 2$?

Question 1:

For which $p$ and $q$ do we have: $$ \begin{equation} \|\hat{f}\|_{L^q(S, d\sigma)} \lesssim \|f\|_{L^p(\mathbb{R}^d)}, \quad \forall f? \end{equation} $$

This is the restriction problem for Fourier transforms on hyper-surfaces in harmonic analysis.
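
The endpoint $p = 1$ can be written out directly from the discussion above; the following is a short worked estimate, with $\sigma(S)$ denoting the total surface measure of $S$ (notation introduced here). Since $|e^{-2\pi i x\cdot\xi}| = 1$, $$ \sup_{\xi \in S}|\hat{f}(\xi)| \le \int_{\mathbb{R}^d}|f(x)|\,dx = \|f\|_{L^1(\mathbb{R}^d)}, $$ and because $S$ is compact, $\sigma(S) < \infty$, so for every $1 \le q < \infty$ $$ \|\hat{f}\|_{L^q(S, d\sigma)} \le \sigma(S)^{1/q}\,\|f\|_{L^1(\mathbb{R}^d)}. $$ So the inequality in Question 1 holds trivially for $p = 1$ and all $q$, and the discussion above shows it cannot hold for $p = 2$. The substance of the problem lies in between. A known benchmark for the unit sphere is the Stein–Tomas theorem, which gives the estimate with $q = 2$ for $1 \le p \le \frac{2(d+1)}{d+3}$.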

Proof of Theorem of solution of wave equation in the case $n = 1$

Thu, 31 Jul 2025 00:00:00 +0000

Solution of Brezis Problem 8.24 (1) and (2)

Thu, 31 Jul 2025 00:00:00 +0000

Solution of Evans PDE Problem 13

Thu, 31 Jul 2025 00:00:00 +0000

Bin Packing Problem (BPP)

Mon, 07 Jul 2025 00:00:00 +0000

Bin Packing Problem (BPP) #

The Bin Packing Problem involves packing items into bins with minimum number of bins or minimum cost. It has many applications in logistics, manufacturing, and resource allocation.
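
As a baseline for the one-dimensional version of the problem, here is a minimal sketch of the classical first-fit decreasing heuristic. It is not taken from any of the papers listed below, and the function name `first_fit_decreasing` is chosen here for illustration. Items are sorted from largest to smallest and each one goes into the first bin with enough room.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items into bins of a fixed capacity; returns a list of bins."""
    bins = []   # bins[i] is the list of item sizes placed in bin i
    loads = []  # loads[i] is the total size already in bin i
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} does not fit in any bin")
        for i, load in enumerate(loads):
            if load + size <= capacity:
                bins[i].append(size)
                loads[i] += size
                break
        else:
            # No open bin has room, so open a new one.
            bins.append([size])
            loads.append(size)
    return bins


print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))
# -> [[8, 2], [4, 4, 1, 1]]
```

The heuristic is fast and simple but not optimal in general, which is one motivation for the learned approaches collected in this section.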

Recent Literature #

Small Boxes Big Data: A Deep Learning Approach to Optimize Variable Sized Bin Packing BigDataService, 2017. paper

Mao, Feng and Blanco, Edgar and Fu, Mingang and Jain, Rohit and Gupta, Anurag and Mancel, Sebastien and Yuan, Rong and Guo, Stephen and Kumar, Sai and Tian, Yayang
Solving a New 3D Bin Packing Problem with Deep Reinforcement Learning Method Arxiv, 2017. paper

Hu, Haoyuan and Zhang, Xiaodong and Yan, Xiaowei and Wang, Longfei and Xu, Yinghui
Best Arm Identification in Multi-armed Bandits with Delayed Feedback PMLR, 2018. paper

Grover, Aditya and Markov, Todor and Attia, Peter and Jin, Norman and Perkins, Nicolas and Cheong, Bryan and Chen, Michael and Yang, Zi and Harris, Stephen and Chueh, William and others
Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization Arxiv, 2018. paper

Laterre, Alexandre and Fu, Yunguan and Jabri, Mohamed Khalil and Cohen, Alain-Sam and Kas, David and Hajjar, Karl and Dahl, Torbjorn S and Kerkeni, Amine and Beguir, Karim
A Multi-task Selected Learning Approach for Solving 3D Bin Packing Problem. AAMAS, 2019. paper

Duan, Lu and Hu, Haoyuan and Qian, Yu and Gong, Yu and Zhang, Xiaodong and Xu, Yinghui and Wei, Jiangwen.
A Data-Driven Approach for Multi-level Packing Problems in Manufacturing Industry KDD, 2019. paper

Chen, Lei and Tong, Xialiang and Yuan, Mingxuan and Zeng, Jia and Chen, Lei
Solving Packing Problems by Conditional Query Learning OpenReview, 2019. paper

Li, Dongda and Ren, Changwei and Gu, Zhaoquan and Wang, Yuexuan and Lau, Francis
RePack: Dense Object Packing Using Deep CNN with Reinforcement Learning CACS, 2019. paper

Chu, Yu-Cheng and Lin, Horng-Horng
Reinforcement learning driven heuristic optimization Arxiv, 2019. paper

Cai, Qingpeng and Hang, Will and Mirhoseini, Azalia and Tucker, George and Wang, Jingtao and Wei, Wei
A Generalized Reinforcement Learning Algorithm for Online 3D Bin-Packing. AAAI Workshop, 2020. paper

Verma, Richa and Singhal, Aniruddha and Khadilkar, Harshad and Basumatary, Ansuma and Nayak, Siddharth and Singh, Harsh Vardhan and Kumar, Swagat and Sinha, Rajesh.
Robot Packing with Known Items and Nondeterministic Arrival Order. TASAE, 2020. paper

Wang, Fan and Hauser, Kris.
TAP-Net: Transport-and-Pack using Reinforcement Learning. TOG, 2020. paper, code

Hu, Ruizhen and Xu, Juzhan and Chen, Bin and Gong, Minglun and Zhang, Hao and Huang, Hui.
Simultaneous Planning for Item Picking and Placing by Deep Reinforcement Learning IROS, 2020. paper

Tanaka, Tatsuya and Kaneko, Toshimitsu and Sekine, Masahiro and Tangkaratt, Voot and Sugiyama, Masashi
Monte Carlo Tree Search on Perfect Rectangle Packing Problem Instances GECCO, 2020. paper

Pejic, Igor and van den Berg, Daan
PackIt: A Virtual Environment for Geometric Planning ICML, 2020. paper, code

Goyal, Ankit and Deng, Jia
Online 3D Bin Packing with Constrained Deep Reinforcement Learning. AAAI, 2021. paper, code

Zhao, Hang and She, Qijin and Zhu, Chenyang and Yang, Yin and Xu, Kai.
Learning Practically Feasible Policies for Online 3D Bin Packing Arxiv, 2021. paper

Hang Zhao and Chenyang Zhu and Xin Xu and Hui Huang and Kai Xu
Attend2Pack: Bin Packing through Deep Reinforcement Learning with Attention ICML Workshop, 2021. paper

Jingwei Zhang and Bin Zi and Xiaoyu Ge
Solving 3D bin packing problem via multimodal deep reinforcement learning AAMAS, 2021. paper

Jiang, Yuan, Zhiguang Cao, and Jie Zhang
Learning to Solve 3-D Bin Packing Problem via Deep Reinforcement Learning and Constraint Programming IEEE transactions on cybernetics, 2021. paper

Jiang, Yuan and Cao, Zhiguang and Zhang, Jie
Learning to Pack: A Data-Driven Tree Search Algorithm for Large-Scale 3D Bin Packing Problem CIKM, 2021. paper

Zhu, Qianwen and Li, Xihan and Zhang, Zihan and Luo, Zhixing and Tong, Xialiang and Yuan, Mingxuan and Zeng, Jia
Learning Efficient Online 3D Bin Packing on Packing Configuration Trees. ICLR, 2022. paper

Hang Zhao and Kai Xu
Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback ICML, 2023. paper

Kim, Wonyoung and Iyengar, Garud and Zeevi, Assaf
Adjustable Robust Reinforcement Learning for Online 3D Bin Packing NeurIPS, 2023. paper

Pan, Yuxin and Chen, Yize and Lin, Fangzhen
A Neural Column Generation Approach to the Vehicle Routing Problem with Two-Dimensional Loading and Last-In-First-Out Constraints IJCAI, 2024. paper, code

Yifan Xia, Xiangyi Zhang

Boolean Satisfiability (SAT)

Mon, 07 Jul 2025 00:00:00 +0000

Boolean Satisfiability (SAT) #

Boolean Satisfiability is a fundamental problem in computer science with applications to formal verification and automated reasoning. Machine learning approaches are increasingly being applied to improve SAT solver heuristics.
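
To make the role of solver heuristics concrete, the sketch below is a deliberately small DPLL-style search with unit propagation. It is an illustration only: the function name `dpll` and the encoding (clauses as lists of nonzero integers, where `-3` means "not x3") are choices made here, and the branching rule is the naive "first literal of the first clause". That branching choice is the kind of decision the learned heuristics in the papers below aim to improve.

```python
def dpll(clauses, assignment=None):
    """Return a satisfying (possibly partial) assignment as a dict, or None."""
    assignment = dict(assignment or {})
    simplified = []
    for clause in clauses:
        remaining = []
        satisfied = False
        for lit in clause:
            var, want = abs(lit), lit > 0
            if var in assignment:
                if assignment[var] == want:
                    satisfied = True
                    break
            else:
                remaining.append(lit)
        if satisfied:
            continue
        if not remaining:
            return None  # every literal is false: conflict
        simplified.append(remaining)
    if not simplified:
        return assignment  # all clauses satisfied

    # Unit propagation: a one-literal clause forces that literal.
    unit = next((c[0] for c in simplified if len(c) == 1), None)
    if unit is not None:
        assignment[abs(unit)] = unit > 0
        return dpll(simplified, assignment)

    # Naive branching: try both values of the first unassigned literal.
    lit = simplified[0][0]
    for value in (lit > 0, lit < 0):
        trial = dict(assignment)
        trial[abs(lit)] = value
        result = dpll(simplified, trial)
        if result is not None:
            return result
    return None


print(dpll([[1, 2], [-1, 3], [-2, -3]]))  # a satisfying assignment
print(dpll([[1], [-1]]))                  # None: unsatisfiable
```

Variables missing from the returned dictionary are unconstrained and can take either value.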

Recent Literature #

Graph neural networks and boolean satisfiability. Arxiv, 2017. paper

Bünz, Benedikt, and Matthew Lamm.
Learning a SAT solver from single-bit supervision. Arxiv, 2018. paper, code

Selsam, Daniel, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L. Dill.
Machine learning-based restart policy for CDCL SAT solvers. SAT, 2018. paper

Liang, Jia Hui, Chanseok Oh, Minu Mathew, Ciza Thomas, Chunxiao Li, and Vijay Ganesh.
Learning to solve circuit-SAT: An unsupervised differentiable approach. ICLR, 2019. paper, code

Amizadeh, Saeed, Sergiy Matusevych, and Markus Weimer.
Learning Local Search Heuristics for Boolean Satisfiability. NeurIPS, 2019. paper, code

Yolcu, Emre and Poczos, Barnabas
Improving SAT solver heuristics with graph networks and reinforcement learning. Arxiv, 2019. paper

Kurin, Vitaly, Saad Godil, Shimon Whiteson, and Bryan Catanzaro.
Graph neural reasoning may fail in certifying boolean unsatisfiability. Arxiv, 2019. paper

Chen, Ziliang, and Zhanfu Yang.
Guiding high-performance SAT solvers with unsat-core predictions. SAT, 2019. paper

Selsam, Daniel, and Nikolaj Bjørner.
G2SAT: Learning to Generate SAT Formulas. NeurIPS, 2019. paper, code

You, Jiaxuan, Haoze Wu, Clark Barrett, Raghuram Ramanujan, and Jure Leskovec.
Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning. Arxiv, 2019. paper, code

Lederman, Gil, Markus N. Rabe, Edward A. Lee, and Sanjit A. Seshia.
Enhancing SAT solvers with glue variable predictions. Arxiv, 2020. paper

Han, Jesse Michael.
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? NeurIPS, 2020. paper

Whiteson, Shimon.
Online Bayesian Moment Matching based SAT Solver Heuristics. ICML, 2020. paper, code

Duan, Haonan, Saeed Nejati, George Trimponias, Pascal Poupart, and Vijay Ganesh.
Learning Clause Deletion Heuristics with Reinforcement Learning. AITP, 2020. paper

Vaezipoor, Pashootan, Gil Lederman, Yuhuai Wu, Roger Grosse, and Fahiem Bacchus.
Classification of SAT problem instances by machine learning methods. CEUR, 2020. paper

Danisovszky, Márk, Zijian Győző Yang, and Gábor Kusper.
Predicting Propositional Satisfiability via End-to-End Learning. AAAI, 2020. paper

Cameron, Chris, Rex Chen, Jason Hartford, and Kevin Leyton-Brown.
Neural heuristics for SAT solving. Arxiv, 2020. paper

Jaszczur, Sebastian, Michał Łuszczyk, and Henryk Michalewski.
NLocalSAT: Boosting Local Search with Solution Prediction. Arxiv, 2020. paper, code

Zhang, Wenjie, Zeyu Sun, Qihao Zhu, Ge Li, Shaowei Cai, Yingfei Xiong, and Lu Zhang.
Optimistic tree search strategies for black-box combinatorial optimization NeurIPS, 2022. paper

Malherbe, Cedric and Grosnit, Antoine and Tutunov, Rasul and Ammar, Haitham Bou and Wang, Jun
Goal-Aware Neural SAT Solver. IJCNN, 2022. paper

Ozolins, Emils, Karlis Freivalds, Andis Draguns, Eliza Gaile, Ronalds Zakovskis, and Sergejs Kozlovics.
NeuroComb: Improving SAT Solving with Graph Neural Networks. Arxiv, 2022. paper

Wang, Wenxi, Yang Hu, Mohit Tiwari, Sarfraz Khurshid, Kenneth McMillan, and Risto Miikkulainen.
On the Performance of Deep Generative Models of Realistic SAT Instances. SAT, 2022. paper

Garzón, Iván, Pablo Mesejo, and Jesús Giráldez-Cru.
DeepSAT: An EDA-Driven Learning Framework for SAT. Arxiv, 2022. paper

Li, Min, Zhengyuan Shi, Qiuxia Lai, Sadaf Khan, and Qiang Xu.
SATformer: Transformers for SAT Solving. Arxiv, 2022. paper

Shi, Zhengyuan, Min Li, Sadaf Khan, Hui-Ling Zhen, Mingxuan Yuan, and Qiang Xu.
Augment with Care: Contrastive Learning for Combinatorial Problems. ICML, 2022. paper, code

Duan, Haonan, Pashootan Vaezipoor, Max B. Paulus, Yangjun Ruan and Chris J. Maddison
NSNet: A General Neural Probabilistic Framework for Satisfiability Problems NeurIPS, 2022. paper

Zhaoyu Li, Xujie Si
Neural Set Function Extensions: Learning with Discrete Functions in High Dimensions NeurIPS, 2022. paper

Nikolaos Karalias, Joshua Robinson, Andreas Loukas, Stefanie Jegelka
Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness ICLR, 2022. paper

Simon Geisler, Johanna Sommer, Jan Schuchardt, Aleksandar Bojchevski and Stephan Günnemann
Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets NeurIPS, 2023. paper, code

Dinghuai Zhang, Hanjun Dai, Nikolay Malkin, Aaron Courville, Yoshua Bengio, Ling Pan
⭐HardSATGEN: Understanding the Difficulty of Hard SAT Formula Generation and A Strong Structure-Hardness-Aware Baseline KDD, 2023. paper, code

Yang Li, Xinyan Chen, Wenxuan Guo, Xijun Li, Wanqian Luo, Junhua Huang, Hui-Ling Zhen, Mingxuan Yuan, Junchi Yan
Distributed Constrained Combinatorial Optimization leveraging Hypergraph Neural Networks Nature Machine Intelligence, 2024. paper, code

Nasimeh Heydaribeni, Xinrui Zhan, Ruisi Zhang, Tina Eliassi-Rad, Farinaz Koushanfar
Efficient Combinatorial Optimization via Heat Diffusion NeurIPS, 2024. paper

Hengyuan Ma, Wenlian Lu, Jianfeng Feng
⭐UniCO: On Unified Combinatorial Optimization via Problem Reduction to Matrix-Encoded General TSP ICLR, 2025. paper, code

Wenzheng Pan, Hao Xiong, Jiale Ma, Wentao Zhao, Yang Li, Junchi Yan

Car Dispatch

Mon, 07 Jul 2025 00:00:00 +0000

Car Dispatch #

Car dispatch focuses on optimally assigning vehicles to passenger requests, a key problem in autonomous driving and ride-hailing services.

Recent Literature #

Reinforcement Learning for Autonomous Taxi Fleet Dispatch NeurIPS, 2022. paper

Philip Thomas, Bruno Castro Da Silva, Kemo Adeyemo, Jacob Tyo

Causal Discovery

Mon, 07 Jul 2025 00:00:00 +0000

Causal Discovery #

Causal discovery focuses on learning the causal structure behind observational data, identifying causal relationships between variables.

Recent Literature #

A Scalable and General Framework for Privacy-Preserving Causality-Aware X AISTATS, 2024. paper

Xupeng Cao, Yuming Huang, Zining Zhu, Jing Ma
Scalable Computational Methods for Bayesian Additive Regression Trees Journal of Computational and Graphical Statistics, 2021. paper

Brent R. Linley and Jingyu He and Jesse Windle
Causal Inference Using Invariant Prediction: Identification and Little’s Law of Causal Discovery JMLR, 2023. paper

Andrea Rotnitzky, James M. Robins, Rajeeva Karandikar
Learning Temporal Causal Graphs for Approximately Stationary Environments ICML, 2023. paper, code

Kevin Marx, Jiji Zhang and Kun Zhang
Graph neural networks for improved electroencephalographic seizure detection Nature Communications, 2023. paper

Akshay Gujral and Eleonora Spinelli and Ibrahim Alachiotis and Cosmin Anitescu and Pieter Collins
Causal structure learning through deep generative models: Applications to real-world time series in clinical neuroscience ICML, 2024. paper

Kion Fallah, Tim Suereth, Houman Dreyfuss, et al.
Graph Structure Learning for Temporal Reinforcement Learning NeurIPS, 2022. paper

Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, Rémi Munos, Georg Ostrovski
Causal Graph Learning for Large-scale Heterogeneous Biological Networks Nature Machine Intelligence, 2023. paper

Alexander Statnikov, Constantine F. Aliferis, Ioannis Tsamardinos, Douglas P. Hardin, Melissa Levy
Constraint-based Causal Discovery with Mixed Data Machine Learning, 2023. paper

Jiji Zhang

Collected Lectures on Calculus of Variations

Mon, 07 Jul 2025 00:00:00 +0000

Gentle introductions #

Blog post “The Calculus of Variations” on Bounded Rationality, with intuitive explanations and worked brachistochrone-style examples.

Classic introductory textbooks (PDF) #

Gelfand & Fomin – Calculus of Variations (Dover). A standard first text, concise and focused on core theory and mechanics applications.
Bruce van Brunt – The Calculus of Variations (Springer Universitext); a bit more modern, with geometry and physics examples, suitable after multivariable calculus and basic analysis.
Hunter College notes “The Calculus of Variations” (covers lemmas, Euler–Lagrange, Weierstrass condition, etc.) for a structured, textbook-like PDF.

Lecture note sets #

Lukas Koch, Lecture notes for Calculus of Variations (Leipzig, 3rd-year course, includes classical theory and direct method, up to modern topics).
Riccardo Cristoferi, Calculus of Variations Lecture Notes (Carnegie Mellon, classical necessary and sufficient conditions, many examples).
Filip Rindler, Introduction to the Modern Calculus of Variations (goes beyond classical theory toward modern functional-analytic treatment).
Pisa “Lecture Notes Calculus of Variations A” (introduction, first variation, Euler–Lagrange, with PDE flavor).
Long Chen, Classic theory of calculus of variation (focused on Euler–Lagrange, Legendre, Jacobi, Weierstrass conditions, weak vs strong minima).

Collected Lectures on Complex Analysis

Mon, 07 Jul 2025 00:00:00 +0000

📝 Introduction to Complex Analysis - Michael Taylor
📝 An Introduction to Complex Analysis and Geometry - John P. D’Angelo (University of Illinois)
📝 A First Course in Complex Analysis - Matthias Beck, Gerald Marchesi, Dennis Pixton, Lucas Sabalka
📝 A Guide to Complex Variables - Steven G. Krantz
📝 Complex Analysis - Charles Walkden
📝 Complex Analysis - Christian Berg
📝 Complex Variables - R. B. Ash, W.P. Novinger
📝 Complex Analysis - Christer Bennewitz
📝 Complex Analysis - Donald E. Marshall
📝 A Concise Course in Complex Analysis and Riemann Surfaces - Wilhelm Schlag
📝 Complex Analysis - G. Cain (Georgia Tech)
📝 Complex Analysis - Juan Carlos Ponce Campuzano

Collected Lectures on Functional Analysis

Mon, 07 Jul 2025 00:00:00 +0000

📝 An Introduction to Functional Analysis - Laurent W. Marcoux (University of Waterloo)
📝 Functional Analysis: Lecture Notes - Jeff Schenker (Michigan State University)
📝 Functional Analysis Lecture Notes - T.B. Ward (University of East Anglia)
📝 Functional Analysis - Alexander C. R. Belton
📝 Topics in Real and Functional Analysis - Gerald Teschl
📝 Functional Analysis - Christian Remling
📝 Theory of Functions of a Real Variable - Shlomo Sternberg
📝 Functional Analysis - Lawrence Baggett

Collected Lectures on Harmonic Analysis

Mon, 07 Jul 2025 00:00:00 +0000

📝 Harmonic Analysis Lecture Notes - Richard S. Laugesen (University of Illinois at Urbana–Champaign)
📝 Harmonic Analysis - W. Schlag
📝 Lecture Notes: Fourier Transform and its Applications - Brad Osgood
📝 Fourier Analysis - Lucas Illing
📝 Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications - Julius O. Smith III (Stanford University)

Collected Lectures on Measure Theory

Mon, 07 Jul 2025 00:00:00 +0000

📝 An Introduction to Measure Theory - Terence Tao (UCLA)
📝 Lecture Notes on Measure Theory and Functional Analysis - P. Cannarsa, T. D’Aprile
📝 Lecture Notes in Measure Theory - Christer Borell
📝 A Crash Course on the Lebesgue Integral and Measure Theory - Steve Cheng
📝 Measure Theory - John K. Hunter (University of California at Davis)
📝 Measure and Integration - Dietmar A. Salamon (ETH Zürich)
📝 Lecture notes: Measure Theory - Bruce K. Driver

Collected Lectures on Ordinary Differential Equations (ODE)

Mon, 07 Jul 2025 00:00:00 +0000

📝 Difference Equations To Differential Equations - Dan Sloughter
📝 Ordinary Differential Equation - Alexander Grigorian (University of Bielefeld)
📝 Ordinary Differential Equations: Lecture Notes - Eugen J. Ionascu
📝 Ordinary Differential Equations - Peter Philip
📝 Ordinary Differential Equations - Gabriel Nagy
📝 Ordinary Differential Equations and Dynamical Systems - Gerald Teschl
📝 Notes on Differential Equations - Bob Terrell
📝 Elementary Differential Equations - William F. Trench
📝 Elementary Differential Equations With Boundary Value Problems - William F. Trench
📝 Notes on Diffy Qs: Differential Equations for Engineers - Jiří Lebl
📝 Differential Equations - H. B. Phillips (1922)

Collected Lectures on Partial Differential Equations (PDE)

Mon, 07 Jul 2025 00:00:00 +0000

📝 Notes on Partial Differential Equations - John K. Hunter (University of California at Davis)
📝 Partial Differential Equations: Lecture Notes - Erich Miersemann (Leipzig University)
📝 Linear Methods of Applied Mathematics - E. Harrell, J. Herod (Georgia Tech)

Collected Lectures on Real Analysis

Mon, 07 Jul 2025 00:00:00 +0000

📝 MIT OpenCourseWare Lectures on Calculus - G. Strang
📝 Elementary Calculus: An Approach Using Infinitesimals - Professor H. Jerome Keisler
📝 An Introduction to Real Analysis - John K. Hunter (University of California at Davis)
📝 Introduction to Real Analysis - William F. Trench (Trinity University, Texas)
📝 Basic Analysis: Introduction to Real Analysis - Jiří Lebl
📝 Elementary Real Analysis - Thomson, Bruckner
📝 Lecture Notes in Real Analysis - Eric T. Sawyer (McMaster University)
📝 Real Analysis - C. McMullen
📝 Real Analysis for Graduate Students - Richard F. Bass
📝 Modern Real Analysis - William P. Ziemer (Indiana University)
📝 Mathematical Analysis Vol I - Elias Zakon
📝 Mathematical Analysis Vol II - Elias Zakon
📝 Advanced Calculus - Lynn Loomis, Shlomo Sternberg
📝 Analysis of Functions of a Single Variable - Lawrence Baggett
📝 The Calculus of Functions of Several Variables - Dan Sloughter
📝 A ProblemText in Advanced Calculus - John M. Erdman
📝 Calculus and Linear Algebra. Vol. 1 - Wilfred Kaplan, Donald J. Lewis
📝 Calculus and Linear Algebra. Vol. 2 - Wilfred Kaplan, Donald J. Lewis
📝 Introduction to Calculus I and II - J.H. Heinbockel
📝 Active Calculus - Matt Boelkins
📝 Supplements to the Exercises in Chapters 1-7 of Walter Rudin’s “Principles of Mathematical Analysis” - George M. Bergman
📝 Calculus Made Easy - Silvanus P. Thompson (1910)
📝 Elements of Differential and Integral Calculus - William Anthony Granville (1911)
📝 Precalculus - Carl Stitz, Jeff Zeager

Combinatorial Drug Recommendation

Mon, 07 Jul 2025 00:00:00 +0000

Combinatorial Drug Recommendation #

Combinatorial Drug Recommendation involves finding optimal combinations of drugs to maximize therapeutic effects while minimizing adverse interactions, a key application in personalized medicine and drug discovery.

Recent Literature #

Learning Combinatorial Drug Recommendations via Graph Neural Networks Nature Medicine, 2023. paper

Xin He, Yong Liu, Ying Wei, Yuqiao Zhang, Yizhou Wang
Graph Neural Networks for Drug-Drug Interactions Bioinformatics, 2021. paper, code

Yu-Hao Yang, Fan Chen, Yajun Wang, Kun Huang
Deep Learning Approaches for Drug Combination Analysis Nature Computational Science, 2022. paper

Jing Yang, Fang Liu, Yung-Jen Chen, Kimberly Glass, Jill P. Mesirov
Knowledge-Guided Neural Networks for Drug Interaction Prediction Briefings in Bioinformatics, 2023. paper

Xiaowan Kuang, Yihang Pan, Hongmin Cai, Wentao Liu, De-Shuang Huang
Synergistic Drug Interaction Prediction NeurIPS 2023 Workshop on AI for Drug Discovery, Biodesign and Therapeutics, 2023. paper

Chen Wen, Xiaowei Zhang, Tengfei Ma
Explainable Machine Learning for Drug Combinations Machine Learning for Healthcare, 2023. paper

Nathan Leung, Jingxi Jessica Lu, Michael Vigh
Transfer Learning for Combinatorial Drug Sensitivity Prediction IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2023. paper

Zheng Zhang, Jing Ma, Yong Liu

Conjunctive Query Containment

Mon, 07 Jul 2025 00:00:00 +0000

Conjunctive Query Containment #

Conjunctive Query Containment (CQC) is a fundamental problem in database theory and reasoning: deciding whether, on every database instance, the result of one conjunctive query is guaranteed to be a subset of the result of another.
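
As an illustration of the definition above, the classical Chandra-Merlin characterisation says that $Q_1 \subseteq Q_2$ holds exactly when there is a homomorphism from $Q_2$ to $Q_1$ that maps head to head. The sketch below is a brute-force check under simplifying assumptions that are not from this page: queries contain only variables (no constants), and the encoding `(head, body)` and the function name `is_contained_in` are hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]          # (relation name, variables)
Query = Tuple[Tuple[str, ...], List[Atom]]  # (head variables, body atoms)


def is_contained_in(q1: Query, q2: Query) -> bool:
    """Return True if q1's result is a subset of q2's on every database."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False

    # The homomorphism must send q2's head to q1's head, position by position.
    start: Dict[str, str] = {}
    for x2, x1 in zip(head2, head1):
        if start.setdefault(x2, x1) != x1:
            return False

    def extend(i: int, h: Dict[str, str]) -> bool:
        # Try to map the i-th atom of q2 onto some atom of q1.
        if i == len(body2):
            return True
        rel, args = body2[i]
        for rel1, args1 in body1:
            if rel1 != rel or len(args1) != len(args):
                continue
            h2 = dict(h)  # copy so a failed attempt leaves h untouched
            if all(h2.setdefault(a, b) == b for a, b in zip(args, args1)):
                if extend(i + 1, h2):
                    return True
        return False

    return extend(0, start)


# Q_sym(x) :- E(x, y), E(y, x)      Q_edge(x) :- E(x, y)
q_sym: Query = (("x",), [("E", ("x", "y")), ("E", ("y", "x"))])
q_edge: Query = (("x",), [("E", ("x", "y"))])
```

Here `is_contained_in(q_sym, q_edge)` holds because every vertex on a symmetric edge also has an outgoing edge, while the converse containment fails. The search is exponential in the worst case, which is consistent with the problem being NP-complete.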

Recent Literature #

Learning to Reason over Relational Data ICLR, 2020. paper

Dario Amodei, Tom Brown, Ben Wang, Jared Kaplan, Chris Olah, Sam McCandlish

Differentiable Optimization

Mon, 07 Jul 2025 00:00:00 +0000

Differentiable Optimization #

Differentiable optimization makes optimization layers differentiable so they can be embedded in neural networks, enabling end-to-end learning with optimization as a component.

Recent Literature #

OptNet: Differentiable Optimization as a Layer in Neural Networks ICML, 2017. paper, code

Brandon Amos, J. Zico Kolter
Differentiation of Blackbox Combinatorial Solvers ICLR, 2020. paper, code

Marin Vlastelica Pogančić, Anselm Paulus, Vit Musil, Georg Martius, Michal Rolinek
CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints ICML, 2021. paper, code

Anselm Paulus, Michal Rolinek, Vit Musil, Brandon Amos, Georg Martius
Implicit Differentiation of Nonlinear Optimization Problems NeurIPS, 2021. paper, code

Jean-Pierre Hespanha, Noureddine Elhadji Boularas, Daniel Cremers
Decision-Focused Learning in Games ICML, 2023. paper

Yoann Thesot, Maxime Wabartha, Vincent François-Lavet
Learning to Prescribe with Differentiable Optimization ICML, 2023. paper, code

Niki Zadeh, J. Zico Kolter, Brandon Amos

Ebooks on Combinatorics

Mon, 07 Jul 2025 00:00:00 +0000

Electronic Design Automation

Mon, 07 Jul 2025 00:00:00 +0000

Electronic Design Automation #

Electronic Design Automation (EDA) involves computational tools for designing and verifying electronic circuits and systems. ML approaches optimize placement, routing, timing, and other design parameters.

Recent Literature #

Machine Learning for Electronic Design Automation: A Survey ACM Transactions on Design Automation of Electronic Systems, 2021. paper

Guyue Huang, Jingbo Hu, Yifan He, Jialong Liu, Mingjie Liu, Zhaoyang Shen, Jian Shi, Yuanfeng Peng, Chenxi Wang, Bin He, Young-Joon Lee, Haoxing Ren
Chip Placement with Deep Reinforcement Learning ICLR, 2021. paper, code

Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Olivier Bastien, Joe Bobba, Naveen Bobbili, Paul N. Chen, Mike Compt, Paul H. Huang, Abe Kahng, Seunggeun Lee, Megan Li, Lukasz Lew, Mark Marson, Peilin Song, Sameer Vora, Jeff Weinberg, Zihan Ye, Hailong Yun
RouteNet: Leveraging Graph Neural Networks for Network Modeling and Optimization in SDN NSDI, 2019. paper, code

Gerardo Ferrando, Eduard Almendares, Miquel Ferriol, Albert López, David Cordobés, Sergi Abadal, Eduard Alarcón, Albert Cabellos-Aparicio, Jordi Suñé
Learning Heuristics over Large Graphs via Deep Reinforcement Learning ICLR, 2018. paper

Guyue Huang, Zemin Wang, Haoxing Ren
GCN-RL Circuit Designer: Transferable Transductive Boundary Search for Analog Circuit Optimization ICLR, 2022. paper, code

Keren Zhu, Mingjie Liu, Yaguang Li, Yisong Yue, Haoxing Ren
RL4RewriteRules: Generating Rewrite Rules from Offline Reinforcement Learning Trajectories NeurIPS, 2024. paper, code

Kaiyuan Hu, Runpeng Guo, Changlin Yan, Jianye Hao, Ping Zhang

Facility Location Problem

Mon, 07 Jul 2025 00:00:00 +0000

Facility Location Problem #

The Facility Location Problem determines optimal locations for facilities (warehouses, hospitals, etc.) to serve customers while minimizing total costs including facility opening costs and transportation costs.
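
One common way to make this description precise is the uncapacitated variant written as an integer program. The notation below is an assumption introduced for illustration and does not come from this page: $F$ is the set of candidate facilities, $D$ the set of customers, $f_i$ the opening cost of facility $i$, and $c_{ij}$ the cost of serving customer $j$ from facility $i$.

$$
\begin{aligned}
\min_{x,\,y} \quad & \sum_{i \in F} f_i\, y_i + \sum_{i \in F} \sum_{j \in D} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{i \in F} x_{ij} = 1 && \forall j \in D, \\
& x_{ij} \le y_i && \forall i \in F,\ j \in D, \\
& x_{ij},\, y_i \in \{0, 1\} && \forall i \in F,\ j \in D.
\end{aligned}
$$

Here $y_i = 1$ means facility $i$ is opened and $x_{ij} = 1$ means customer $j$ is assigned to it. The first constraint assigns every customer to exactly one facility, and the second allows assignments only to open facilities.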

Recent Literature #

Learning Combinatorial Optimization via Variational Graph Autoencoders NeurIPS, 2021. paper

Jieyi Bi, Peng Lin, Chao Qu
Deep Learning for Combinatorial Optimization IJCAI, 2021. paper

Shiyu Zhao, Yong Tao, Keyvan Mohajer

Game Theoretic Semantics

Mon, 07 Jul 2025 00:00:00 +0000

Game Theoretic Semantics #

Game Theoretic Semantics (GTS) provides a game-based interpretation of logical formulas, where truth is determined by the existence of winning strategies in semantic games.

Recent Literature #

Game-Theoretic Aspects of Computation and Approximation Algorithms for Combinatorial Optimization Handbook of Computational Complexity, 2012. book-chapter

Steve Chien, Alistair Sinclair

Generalization

Mon, 07 Jul 2025 00:00:00 +0000

Generalization #

Generalization is a critical aspect of machine learning for combinatorial optimization. This section covers approaches to improve generalization across different problem instances and scales.

Recent Literature #

It’s Not What Machines Can Learn It’s What We Cannot Teach ICML, 2020. paper

Gal Yehuda, Moshe Gabel and Assaf Schuster
Learning TSP Requires Rethinking Generalization CP, 2021. paper, code

Chaitanya K. Joshi, Quentin Cappart, Louis-Martin Rousseau and Thomas Laurent
Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness ICLR, 2022. paper

Simon Geisler, Johanna Sommer, Jan Schuchardt, Aleksandar Bojchevski and Stephan Günnemann
Learning for Robust Combinatorial Optimization: Algorithm and Application INFOCOM, 2022. journal

Shao, Zhihui and Yang, Jianyi and Shen, Cong and Ren, Shaolei
⭐ROCO: A General Framework for Evaluating Robustness of Combinatorial Optimization Solvers on Graphs ICLR, 2023. paper, code

Lu, Han and Li, Zenan and Wang, Runzhong and Ren, Qibing and Li, Xijun and Yuan, Mingxuan and Zeng, Jia and Yang, Xiaokang and Yan, Junchi
Towards Omni-generalizable Neural Methods for Vehicle Routing Problems ICML, 2023. paper, code

Zhou Jianan, Yaoxin Wu, Wen Song, Zhiguang Cao, Jie Zhang
GOAL: A Generalist Combinatorial Optimization Agent Learner ICLR, 2025. paper

Darko Drakulic, Sofia Michel, Jean-Marc Andreoli

Graph Coloring

Mon, 07 Jul 2025 00:00:00 +0000

Graph Coloring #

Graph Coloring is the problem of assigning colors to vertices such that no two adjacent vertices have the same color, with applications in scheduling and frequency assignment.
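
To make the constraint concrete, here is a minimal sketch of the standard greedy heuristic, which gives each vertex the smallest color not already used by a colored neighbor. It always produces a proper coloring but not necessarily one with the fewest colors. The adjacency-dictionary input and the name `greedy_coloring` are assumptions made for this example.

```python
from typing import Dict, Hashable, Iterable


def greedy_coloring(adj: Dict[Hashable, Iterable[Hashable]]) -> Dict[Hashable, int]:
    """Color vertices in dictionary order with the smallest available color."""
    color: Dict[Hashable, int] = {}
    for v in adj:
        # Colors already taken by neighbors that have been colored so far.
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# A 5-cycle needs three colors because its length is odd.
cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring = greedy_coloring(cycle5)
```

The number of colors used depends on the vertex order; the learning-based methods listed below can be read as ways of finding better orders or better pricing decisions than this fixed rule.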

Recent Literature #

Deep Learning-based Hybrid Graph-Coloring Algorithm for Register Allocation. Arxiv, 2019. paper

Das, Dibyendu and Ahmad, Shahid Asghar and Venkataramanan, Kumar.
Neural Models for Output-Space Invariance in Combinatorial Problems ICLR, 2022. paper

Nandwani, Yatin and Jain, Vidit and Singla, Parag and others
Enhancing Column Generation by a Machine-Learning-Based Pricing Heuristic for Graph Coloring AAAI, 2022. paper, code

Shen, Yunzhuang, Yuan Sun, Xiaodong Li, Andrew Craig Eberhard and Andreas T. Ernst.
Learning to Generate Columns with Application to Vertex Coloring ICLR, 2023. paper, code

Sun, Yuan and Ernst, Andreas T and Li, Xiaodong and Weiner, Jake

Graph Edit Distance (GED)

Mon, 07 Jul 2025 00:00:00 +0000

Graph Edit Distance (GED) #

Graph Edit Distance measures the minimum cost of transformations needed to change one graph into another. It has applications in pattern matching and graph similarity computation.

Recent Literature #

SimGNN - A Neural Network Approach to Fast Graph Similarity Computation WSDM, 2019. paper, code

Bai, Yunsheng and Ding, Hao and Bian, Song and Chen, Ting and Sun, Yizhou and Wang, Wei
Graph Matching Networks for Learning the Similarity of Graph Structured Objects ICML, 2019. paper, code

Li, Yujia and Gu, Chenjie and Dullien, Thomas and Vinyals, Oriol and Kohli, Pushmeet
Convolutional Embedding for Edit Distance SIGIR, 2020. paper, code

Dai, Xinyan and Yan, Xiao and Zhou, Kaiwen and Wang, Yuxuan and Yang, Han and Cheng, James
Learning-Based Efficient Graph Similarity Computation via Multi-Scale Convolutional Set Matching AAAI, 2020. paper, code

Bai, Yunsheng and Ding, Hao and Gu, Ken and Sun, Yizhou and Wang, Wei
⭐A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs NeurIPS, 2021. paper, code

Wang, Runzhong and Hua, Zhigang and Liu, Gan and Zhang, Jiayi and Yan, Junchi and Qi, Feng and Yang, Shuang and Zhou, Jun and Yang, Xiaokang
⭐Combinatorial Learning of Graph Edit Distance via Dynamic Embedding. CVPR, 2021. paper, code

Wang, Runzhong and Zhang, Tianqi and Yu, Tianshu and Yan, Junchi and Yang, Xiaokang.

Graph Matching (GM)

Mon, 07 Jul 2025 00:00:00 +0000

Graph Matching (GM) #

Graph Matching is a fundamental combinatorial optimization problem that involves finding correspondences between vertices of two graphs.

Recent Literature #

Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks Arxiv, 2017. paper, code

Nowak, Alex and Villar, Soledad and Bandeira, S. Afonso and Bruna, Joan
Deep Learning of Graph Matching. CVPR, 2018. paper

Zanfir, Andrei and Sminchisescu, Cristian
⭐Learning Combinatorial Embedding Networks for Deep Graph Matching. ICCV, 2019. paper, code

Wang, Runzhong and Yan, Junchi and Yang, Xiaokang
Deep Graphical Feature Learning for the Feature Matching Problem. ICCV, 2019. paper

Zhang, Zhen and Lee, Wee Sun
GLMNet: Graph Learning-Matching Networks for Feature Matching. Arxiv, 2019. paper

Jiang, Bo and Sun, Pengfei and Tang, Jin and Luo, Bin
⭐Learning deep graph matching with channel-independent embedding and Hungarian attention. ICLR, 2020. paper, code

Yu, Tianshu and Wang, Runzhong and Yan, Junchi and Li, Baoxin
Deep Graph Matching Consensus. ICLR, 2020. paper

Fey, Matthias and Lenssen, Jan E. and Morris, Christopher and Masci, Jonathan and Kriege, Nils M.
⭐Graduated Assignment for Joint Multi-Graph Matching and Clustering with Application to Unsupervised Graph Matching Network Learning. NeurIPS, 2020. paper, code

Wang, Runzhong and Yan, Junchi and Yang, Xiaokang
⭐Combinatorial Learning of Robust Deep Graph Matching: An Embedding Based Approach. TPAMI, 2020. paper, code

Wang, Runzhong and Yan, Junchi and Yang, Xiaokang
Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers. ECCV, 2020. paper, code

Rolinek, Michal and Swoboda, Paul and Zietlow, Dominik and Paulus, Anselm and Musil, Vit and Martius, Georg
⭐Neural Graph Matching Network: Learning Lawler’s Quadratic Assignment Problem with Extension to Hypergraph and Multiple-graph Matching. TPAMI, 2021. paper, code

Wang, Runzhong and Yan, Junchi and Yang, Xiaokang
⭐Deep Latent Graph Matching ICML, 2021. paper

Yu, Tianshu and Wang, Runzhong and Yan, Junchi and Li, Baoxin.
IA-GM: A Deep Bidirectional Learning Method for Graph Matching AAAI, 2021. paper

Zhao, Kaixuan and Tu, Shikui and Xu, Lei
Deep Graph Matching under Quadratic Constraint CVPR, 2021. paper

Gao, Quankai and Wang, Fudong and Xue, Nan and Yu, Jin-Gang and Xia, Gui-Song
GAMnet: Robust Feature Matching via Graph Adversarial-Matching Network MM, 2021. paper

Jiang, Bo and Sun, Pengfei and Zhang, Ziyan and Tang, Jin and Luo, Bin
Hypergraph Neural Networks for Hypergraph Matching ICCV, 2021. paper

Liao, Xiaowei and Xu, Yong and Ling, Haibin
Learning to Match Features with Seeded Graph Matching Network ICCV, 2021. paper

Chen, Hongkai and Luo, Zixin and Zhang, Jiahui and Zhou, Lei and Bai, Xuyang and Hu, Zeyu and Tai, Chiew-Lan and Quan, Long
⭐Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond CVPR, 2022. paper, code

Ren, Qibing and Bao, Qingquan and Wang, Runzhong and Yan, Junchi
⭐Self-supervised Learning of Visual Graph Matching ECCV, 2022. paper, code

Liu, Chang and Zhang, Shaofeng and Yang, Xiaokang and Yan, Junchi
⭐Revocable Deep Reinforcement Learning with Affinity Regularization for Outlier-Robust Graph Matching. ICLR, 2023. paper, code

Liu, Chang and Jiang, Zetian and Wang, Runzhong and Yan, Junchi and Huang, Lingxiao and Lu, Pinyan
SeedGNN: Graph Neural Network for Supervised Seeded Graph Matching ICML, 2023. paper

Yu, Liren and Xu, Jiaming and Lin, Xiaojun
D2Match: Leveraging Deep Learning and Degeneracy for Subgraph Matching ICML, 2023. paper

Liu, Xuan, Lin Zhang, Jiaqi Sun, Yujiu Yang and Haiqing Yang
⭐LinSATNet: The Positive Linear Satisfiability Neural Networks ICML, 2023. paper, code

Runzhong Wang and Yunhao Zhang and Ziao Guo and Tianyi Chen and Xiaokang Yang and Junchi Yan
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching NeurIPS, 2023. paper, code

Nguyen, Duy MH and Nguyen, Hoang and Diep, Nghiem T and Pham, Tan N and Cao, Tri and Nguyen, Binh T and Swoboda, Paul and Ho, Nhat and Albarqouni, Shadi and Xie, Pengtao and others
Improving Graph Matching with Positional Reconstruction Encoder-Decoder Network NeurIPS, 2023. paper

Zhou, Yixiao and Jia, Ruiqi and Lin, Hongxiang and Quan, Hefeng and Zhao, Yumeng and Lyu, Xiaoqing
Learning to Prune Instances of Steiner Tree Problem in Graphs INOC, 2024. paper, code

Jiwei Zhang, Dena Tayebi, Saurabh Ray, Deepak Ajwani

Hamiltonian Cycle Problem (HCP)

Mon, 07 Jul 2025 00:00:00 +0000

Hamiltonian Cycle Problem (HCP) #

The Hamiltonian Cycle Problem asks whether a graph contains a cycle that visits each vertex exactly once. Its decision version is NP-complete, making it a standard reference point in the study of NP-hardness.

Recent Literature #

⭐A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs NeurIPS, 2021. paper, code

Wang, Runzhong and Hua, Zhigang and Liu, Gan and Zhang, Jiayi and Yan, Junchi and Qi, Feng and Yang, Shuang and Zhou, Jun and Yang, Xiaokang
⭐UniCO: On Unified Combinatorial Optimization via Problem Reduction to Matrix-Encoded General TSP ICLR, 2025. paper, code

Wenzheng Pan, Hao Xiong, Jiale Ma, Wentao Zhao, Yang Li, Junchi Yan

Influence Maximization

Mon, 07 Jul 2025 00:00:00 +0000

Influence Maximization #

Influence Maximization seeks to select a set of influential nodes in a network to maximize information spread. It has applications in social network marketing.

Recent Literature #

Learning Heuristics over Large Graphs via Deep Reinforcement Learning. NeurIPS, 2020. paper

Mittal, Akash and Dhawan, Anuj and Manchanda, Sahil and Medya, Sourav and Ranu, Sayan and Singh, Ambuj.
Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks. ICML, 2021. paper

Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik
LeNSE: Learning To Navigate Subgraph Embeddings for Large-Scale Combinatorial Optimisation ICML, 2022. paper, code

Ireland, David and G. Montana
⭐Towards One-shot Neural Combinatorial Solvers: Theoretical and Empirical Notes on the Cardinality-Constrained Case ICLR, 2023. paper, code

Wang, Runzhong and Shen, Li and Chen, Yiting and Yan, Junchi and Yang, Xiaokang and Tao, Dacheng
Deep Graph Representation Learning and Optimization for Influence Maximization ICML, 2023. paper

Chen Ling and Junji Jiang and Junxiang Wang and My T. Thai and Lukas Xue and James Song and Meikang Qiu and Liang Zhao

Job Shop Scheduling Problem (JSSP)

Mon, 07 Jul 2025 00:00:00 +0000

Job Shop Scheduling Problem (JSSP) #

The Job Shop Scheduling Problem is a classic combinatorial optimization problem where jobs must be scheduled on machines with precedence constraints.

Recent Literature #

Smart Manufacturing Scheduling With Edge Computing Using Multiclass Deep Q Network Transactions on Industrial Informatics, 2019. journal

Chun-Cheng Lin, Der-Jiunn Deng, Yen-Ling Chih, Hsin-Ting Chiu
Multi-Agent Reinforcement Learning for Job Shop Scheduling in Flexible Manufacturing Systems International Conference on Artificial Intelligence for Industries (AI4I), 2019. paper

Schirin Baer, Jupiter Bakakeu, Richard Meyes, Tobias Meisen
Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning. NeurIPS, 2020. paper, code

Zhang, Cong and Song, Wen and Cao, Zhiguang and Zhang, Jie and Tan, Puay Siew and Xu, Chi.
ScheduleNet: Learn to Solve Multi-agent Scheduling Problems with Reinforcement Learning Arxiv, 2021. paper

Junyoung Park, Sanjar Bakhtiyar, Jinkyoo Park
Dynamic job-shop scheduling in smart manufacturing using deep reinforcement learning Computer Networks, 2021. journal

Libing Wang, Xin Hu, Yin Wang, Sujie Xu, Shijun Ma, Kexin Yang, Zhijun Liu, Weidong Wang
Learning to schedule job-shop problems: Representation and policy learning using graph neural network and reinforcement learning. International Journal of Production Research, 2021. journal

Junyoung Park, Jaehyeong Chun, Sang Hun Kim, Youngkook Kim, Jinkyoo Park
Explainable reinforcement learning in production control of job shop manufacturing system. International Journal of Production Research, 2021. journal

Andreas Kuhnle, Marvin Carl May, Louis Schäfer, Gisela Lanza
DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization NeurIPS, 2023. paper, code

Ye, Haoran and Wang, Jiarui and Cao, Zhiguang and Liang, Helan and Li, Yong
Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization NeurIPS, 2023. paper

Grinsztajn, Nathan and Furelos-Blanco, Daniel and Surana, Shikha and Bonnet, Clément and Barrett, Thomas D
Combinatorial Optimization with Policy Adaptation using Latent Space Search NeurIPS, 2023. paper

Chalumeau, Felix and Surana, Shikha and Bonnet, Clément and Grinsztajn, Nathan and Pretorius, Arnu and Laterre, Alexandre and Barrett, Thomas D
Neural DAG Scheduling via One-Shot Priority Sampling ICLR, 2023. paper

Jeon, Wonseok and Gagrani, Mukul and Bartan, Burak and Zeng, Weiliang Will and Teague, Harris and Zappi, Piero and Lott, Christopher
Robust Scheduling with GFlowNets ICLR, 2023. paper

Zhang, David W and Rainone, Corrado and Peschl, Markus and Bondesan, Roberto
Continual Task Allocation in Meta-Policy Network via Sparse Prompting ICML, 2023. paper

Yang, Yijun, Tianyi Zhou, Jing Jiang, Guodong Long and Yuhui Shi.
Applicability of Neural Combinatorial Optimization: A Critical View TELO, 2024. journal, code

Andoni I. Garmendia, Josu Ceberio, Alexander Mendiburu

Knapsack Problem

Mon, 07 Jul 2025 00:00:00 +0000

Knapsack Problem #

The Knapsack Problem is a classic optimization problem where items with weights and values must be selected to maximize total value while respecting a weight constraint.
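
As a reference point for the learned methods listed below, the 0/1 variant (each item taken at most once) can be solved exactly by dynamic programming when the weights are non-negative integers. This is a minimal sketch under those assumptions; the function name `knapsack_01` is illustrative and not taken from any of the cited papers.

```python
from typing import Sequence


def knapsack_01(weights: Sequence[int], values: Sequence[int], capacity: int) -> int:
    """Best total value using each item at most once within the weight limit."""
    # best[c] = best value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Go downwards so each item is counted at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```

The running time is proportional to the number of items times the capacity, which is pseudo-polynomial: it is fast for small capacities but does not contradict the NP-hardness of the problem.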

Recent Literature #

A Novel Method to Solve Neural Knapsack Problems ICML, 2021. paper, code

Li Duanshun and Liu Jing and Lee Dongeun and Seyedmazloom Ali and Kaushik Giridhar and Lee Kookjin and Park Noseong
DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization NeurIPS, 2023. paper, code

Ye, Haoran and Wang, Jiarui and Cao, Zhiguang and Liang, Helan and Li, Yong
Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization NeurIPS, 2023. paper

Grinsztajn, Nathan and Furelos-Blanco, Daniel and Surana, Shikha and Bonnet, Clément and Barrett, Thomas D
Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization NeurIPS, 2023. paper, code

Chen, Jinbiao and Wang, Jiahai and Zhang, Zizhen and Cao, Zhiguang and Ye, Te and Chen, Siyuan
BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial Optimization NeurIPS, 2023. paper, code

Drakulic, Darko and Michel, Sofia and Mai, Florian and Sors, Arnaud and Andreoli, Jean-Marc
Neural Multi-Objective Combinatorial Optimization with Diversity Enhancement NeurIPS, 2023. paper, code

Chen, Jinbiao and Zhang, Zizhen and Cao, Zhiguang and Wu, Yaoxin and Ma, Yining and Ye, Te and Wang, Jiahai
Rethinking Neural Multi-Objective Combinatorial Optimization via Neat Weight Embedding ICLR, 2025. paper

Jinbiao Chen, Zhiguang Cao, Jiahai Wang, Yaoxin Wu, Hanzhang Qin, Zizhen Zhang, Yue-Jiao Gong
Approximation algorithms for combinatorial optimization with predictions ICLR, 2025. paper

Antonios Antoniadis, Marek Elias, Adam Polak, Moritz Venzin

Max Clique

Mon, 07 Jul 2025 00:00:00 +0000

Max Clique #

The Maximum Clique problem seeks the largest clique in a graph. A clique is a subset of vertices where every vertex is connected to every other vertex.

Recent Literature #

Can Hybrid Geometric Scattering Networks Help Solve the Maximum Clique Problem NeurIPS, 2022. paper, code

Yimeng Min, Frederik Wenkel, Michael Perlmutter, Guy Wolf
Variational Annealing on Graphs for Combinatorial Optimization NeurIPS, 2023. paper, code

Sanokowski, Sebastian and Berghammer, Wilhelm Franz and Hochreiter, Sepp and Lehner, Sebastian
DISCS: A Benchmark for Discrete Sampling NeurIPS, 2023. paper, code

Katayoon Goshvadi, Haoran Sun, Xingchao Liu, Azade Nova, Ruqi Zhang, Will Sussman Grathwohl, Dale Schuurmans, Hanjun Dai
Learning fine-grained search space pruning and heuristics for combinatorial optimization. Journal of Heuristics, 2023. journal

Juho Lauri, Sourav Dutta, Marco Grassia, Deepak Ajwani
A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization ICML, 2024. paper, code

Sanokowski, Sebastian and Hochreiter, Sepp and Lehner, Sebastian
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics ICLR, 2025. paper

Sebastian Sanokowski, Wilhelm Franz Berghammer, Haoyu Peter Wang, Martin Ennemoser, Sepp Hochreiter, Sebastian Lehner
Approximation algorithms for combinatorial optimization with predictions ICLR, 2025. paper

Antonios Antoniadis, Marek Elias, Adam Polak, Moritz Venzin
⭐COExpander: Adaptive Solution Expansion for Combinatorial Optimization ICML, 2025. paper, code

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan
⭐ML4CO-Bench-101: Benchmark Machine Learning for Classic Combinatorial Problems on Graphs NeurIPS, 2025. paper, code

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan

Maximal Common Subgraph (MCS)

Mon, 07 Jul 2025 00:00:00 +0000

Maximal Common Subgraph (MCS) #

The Maximal Common Subgraph problem finds the largest subgraph common to two graphs, with applications in molecular matching and pattern discovery.

Recent Literature #

Fast Detection of Maximum Common Subgraph via Deep Q-Learning. Arxiv, 2020. paper

Bai, Yunsheng and Xu, Derek and Wang, Alex and Gu, Ken and Wu, Xueqing and Marinovic, Agustin and Ro, Christopher and Sun, Yizhou and Wang, Wei.

Maximal Cut (Max-Cut)

Mon, 07 Jul 2025 00:00:00 +0000

Maximal Cut (Max-Cut) #

The Maximal Cut problem is to partition the vertices of a graph into two sets to maximize the number of edges between them. It’s a fundamental problem in combinatorial optimization.
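
The objective can be illustrated with a simple one-flip local search: move a vertex to the other side whenever that increases the number of cut edges. This is a baseline sketch, not a method from the papers below; it assumes an unweighted graph with vertices numbered `0..n-1`, and the names `cut_size` and `local_search_max_cut` are hypothetical.

```python
from typing import List, Sequence, Tuple


def cut_size(edges: Sequence[Tuple[int, int]], side: Sequence[int]) -> int:
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def local_search_max_cut(n: int, edges: Sequence[Tuple[int, int]]) -> List[int]:
    """Flip single vertices until no flip increases the cut."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        if u != v:  # self-loops can never be cut, so ignore them
            adj[u].append(v)
            adj[v].append(u)
    side = [0] * n
    improved = True
    while improved:
        improved = False
        for v in range(n):
            same = sum(1 for u in adj[v] if side[u] == side[v])
            # Flipping v turns its same-side edges into cut edges and vice versa.
            if same > len(adj[v]) - same:
                side[v] ^= 1
                improved = True
    return side
```

Every flip strictly increases the cut, so the loop terminates. At a local optimum each vertex has at least half of its edges crossing the cut, which means at least half of all edges are cut.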

Recent Literature #

Learning Combinatorial Optimization Algorithms over Graphs. NeurIPS, 2017. paper

Dai, Hanjun and Khalil, Elias B and Zhang, Yuyu and Dilkina, Bistra and Song, Le
Exploratory Combinatorial Optimization with Reinforcement Learning. AAAI, 2020. paper

Barrett, Thomas and Clements, William and Foerster, Jakob and Lvovsky, Alex.
Erdos Goes Neural: an Unsupervised Learning Framework for Combinatorial Optimization on Graphs. NeurIPS, 2020. paper

Karalias, Nikolaos and Loukas, Andreas
Reversible Action Design for Combinatorial Optimization with Reinforcement Learning Arxiv, 2021. paper

Yao, Fan and Cai, Renqin and Wang, Hongning
LeNSE: Learning To Navigate Subgraph Embeddings for Large-Scale Combinatorial Optimisation ICML, 2022. paper, code

Ireland, David and G. Montana
Learning to Solve Combinatorial Graph Partitioning Problems via Efficient Exploration Arxiv, 2022. paper, code

Barrett, Thomas D and Parsonson, Christopher WF and Laterre, Alexandre
Revisiting Sampling for Combinatorial Optimization ICML, 2023. paper

Sun, Haoran and Goshvadi, Katayoon and Nova, Azade and Schuurmans, Dale and Dai, Hanjun
Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods NeurIPS, 2023. paper

Caramanis, Constantine and Fotakis, Dimitris and Kalavasis, Alkis and Kontonis, Vasilis and Tzamos, Christos
Neural Improvement Heuristics for Graph Combinatorial Optimization Problems TNNLS, 2023. journal

Andoni I. Garmendia, Josu Ceberio, Alexander Mendiburu
Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets NeurIPS, 2023. paper, code

Dinghuai Zhang, Hanjun Dai, Nikolay Malkin, Aaron Courville, Yoshua Bengio, Ling Pan
Variational Annealing on Graphs for Combinatorial Optimization NeurIPS, 2023. paper, code

Sanokowski, Sebastian and Berghammer, Wilhelm Franz and Hochreiter, Sepp and Lehner, Sebastian
DISCS: A Benchmark for Discrete Sampling NeurIPS, 2023. paper

Katayoon Goshvadi, Haoran Sun, Xingchao Liu, Azade Nova, Ruqi Zhang, Will Sussman Grathwohl, Dale Schuurmans, Hanjun Dai
MARCO: A Memory-Augmented Reinforcement Framework for Combinatorial Optimization IJCAI, 2024. paper, code

Andoni I. Garmendia, Quentin Cappart, Josu Ceberio, Alexander Mendiburu
Controlling Continuous Relaxation for Combinatorial Optimization NeurIPS, 2024. paper

Yuma Ichikawa
Efficient Combinatorial Optimization via Heat Diffusion NeurIPS, 2024. paper

Hengyuan Ma, Wenlian Lu, Jianfeng Feng
⭐COExpander: Adaptive Solution Expansion for Combinatorial Optimization ICML, 2025. paper, code

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan
⭐ML4CO-Bench-101: Benchmark Machine Learning for Classic Combinatorial Problems on Graphs NeurIPS, 2025. paper, code

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan

Maximum Independent Set

Mon, 07 Jul 2025 00:00:00 +0000

Maximum Independent Set #

The Maximum Independent Set problem is about finding the largest subset of vertices in a graph with no edges between them. It’s an NP-hard problem with important applications.
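
For intuition, a simple greedy baseline scans vertices in order of increasing degree and keeps a vertex whenever none of its neighbors has been kept. The result is always independent and maximal (nothing more can be added), but it is not guaranteed to be a maximum set. The adjacency-dictionary input and the name `greedy_independent_set` are assumptions made for this sketch.

```python
from typing import Dict, Iterable, Set


def greedy_independent_set(adj: Dict[int, Iterable[int]]) -> Set[int]:
    """Pick low-degree vertices first, skipping neighbors of earlier picks."""
    chosen: Set[int] = set()
    blocked: Set[int] = set()
    # Ties in degree are broken by vertex id to keep the result deterministic.
    for v in sorted(adj, key=lambda v: (len(list(adj[v])), v)):
        if v not in blocked:
            chosen.add(v)
            blocked.add(v)
            blocked.update(adj[v])
    return chosen


path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

On the path and the star above the heuristic happens to find optimal sets (the alternating vertices and the leaves, respectively); in general it only provides a lower bound on the optimum.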

Recent Literature #

Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search. NeurIPS, 2018. paper

Li, Zhuwen and Chen, Qifeng and Koltun, Vladlen.
Learning What to Defer for Maximum Independent Sets ICML, 2020. paper

Ahn, Sungsoo and Seo, Younggyo and Shin, Jinwoo
Distributed Scheduling Using Graph Neural Networks ICASSP, 2021. paper

Zhao, Zhongyuan and Verma, Gunjan and Rao, Chirag and Swami, Ananthram and Segarra, Santiago
Solving Graph-based Public Good Games with Tree Search and Imitation Learning NeurIPS, 2021. paper

Darvariu, Victor-Alexandru and Hailes, Stephen and Musolesi, Mirco
NN-Baker: A Neural-network Infused Algorithmic Framework for Optimization Problems on Geometric Intersection Graphs NeurIPS, 2021. paper

McCarty, Evan and Zhao, Qi and Sidiropoulos, Anastasios and Wang, Yusu
What’s Wrong with Deep Learning in Tree Search for Combinatorial Optimization ICLR, 2022. paper, code

Bother, Maximilian and Kissig, Otto and Taraz, Martin and Cohen, Sarel and Seidel, Karen and Friedrich, Tobias
Optimistic tree search strategies for black-box combinatorial optimization NeurIPS, 2022. paper

Malherbe, Cedric and Grosnit, Antoine and Tutunov, Rasul and Ammar, Haitham Bou and Wang, Jun
⭐ROCO: A General Framework for Evaluating Robustness of Combinatorial Optimization Solvers on Graphs ICLR, 2023. paper, code

Lu, Han and Li, Zenan and Wang, Runzhong and Ren, Qibing and Li, Xijun and Yuan, Mingxuan and Zeng, Jia and Yang, Xiaokang and Yan, Junchi
Revisiting Sampling for Combinatorial Optimization ICML, 2023. paper

Sun, Haoran and Goshvadi, Katayoon and Nova, Azade and Schuurmans, Dale and Dai, Hanjun
DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization NeurIPS, 2023. paper, code

Zhiqing Sun, Yiming Yang
⭐T2T: From Distribution Learning in Training to Gradient Search in Testing for Combinatorial Optimization NeurIPS, 2023. paper, code

Yang Li, Jinpei Guo, Runzhong Wang, Junchi Yan
Unsupervised Learning for Combinatorial Optimization Needs Meta Learning ICLR, 2023. paper, code

Wang, Haoyu and Li, Pan
Graph-based Deterministic Policy Gradient for Repetitive Combinatorial Optimization Problems ICLR, 2023. paper, code

Zhao, Zhongyuan and Swami, Ananthram and Segarra, Santiago
Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets NeurIPS, 2023. paper, code

Dinghuai Zhang, Hanjun Dai, Nikolay Malkin, Aaron Courville, Yoshua Bengio, Ling Pan
Variational Annealing on Graphs for Combinatorial Optimization NeurIPS, 2023. paper, code

Sanokowski, Sebastian and Berghammer, Wilhelm Franz and Hochreiter, Sepp and Lehner, Sebastian
Maximum Independent Set: Self-Training through Dynamic Programming NeurIPS, 2023. paper, code

Brusca, Lorenzo and Quaedvlieg, Lars CPM and Skoulakis, Stratis and Chrysos, Grigorios G and Cevher, Volkan
DISCS: A Benchmark for Discrete Sampling NeurIPS, 2023. paper, code

Katayoon Goshvadi, Haoran Sun, Xingchao Liu, Azade Nova, Ruqi Zhang, Will Sussman Grathwohl, Dale Schuurmans, Hanjun Dai
MARCO: A Memory-Augmented Reinforcement Framework for Combinatorial Optimization IJCAI, 2024. paper, code

Andoni I. Garmendia, Quentin Cappart, Josu Ceberio, Alexander Mendiburu
⭐Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization NeurIPS, 2024. paper, code

Yang Li, Jinpei Guo, Runzhong Wang, Hongyuan Zha, Junchi Yan
Controlling Continuous Relaxation for Combinatorial Optimization NeurIPS, 2024. paper

Yuma Ichikawa
Distributed Constrained Combinatorial Optimization leveraging Hypergraph Neural Networks Nature Machine Intelligence, 2024. paper, code

Nasimeh Heydaribeni, Xinrui Zhan, Ruisi Zhang, Tina Eliassi-Rad, Farinaz Koushanfar
Efficient Combinatorial Optimization via Heat Diffusion NeurIPS, 2024. paper

Hengyuan Ma, Wenlian Lu, Jianfeng Feng
A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization ICML, 2024. paper, code

Sanokowski, Sebastian and Hochreiter, Sepp and Lehner, Sebastian
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics ICLR, 2025. paper

Sebastian Sanokowski, Wilhelm Franz Berghammer, Haoyu Peter Wang, Martin Ennemoser, Sepp Hochreiter, Sebastian Lehner
⭐COExpander: Adaptive Solution Expansion for Combinatorial Optimization ICML, 2025. paper, code

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan
⭐ML4CO-Bench-101: Benchmark Machine Learning for Classic Combinatorial Problems on Graphs NeurIPS, 2025. paper, code

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan

Metric $k$-center

Mon, 07 Jul 2025 00:00:00 +0000

General $k$-center problem statement: Let $(X, d)$ be a metric space where $X$ is a set and $d$ is a metric. A set $V \subseteq X$ is provided together with a parameter $k$. The goal is to find a subset $C \subseteq V$ with $|C| = k$ such that the maximum distance of a point in $V$ to the closest point in $C$ is minimized. The problem can be formally defined as follows:

Input: a set $V \subseteq X$, and a parameter $k$.
Output: a set $C \subseteq V$ of $k$ points.
Goal: Minimize the cost $r^C(V) = \max_{v \in V} d(v, C)$, where $d(v, C) = \min_{c \in C} d(v, c)$ is the distance from $v$ to its closest center.

The $k$-Center Clustering problem can also be defined on a complete undirected graph $G = (V, E)$ as follows:

The $k$-Center Clustering problem: Given a complete undirected graph $G = (V, E)$ with distances $d(v_i, v_j) \in \mathbb{N}$ satisfying the triangle inequality, find a subset $C \subseteq V$ with $|C| = k$ while minimizing:

$$ \max_{v \in V} \min_{c \in C} d(v, c) $$
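
The objective above can be evaluated directly. The following is a minimal Python sketch, not a reference implementation: `k_center_cost` computes $\max_{v \in V} \min_{c \in C} d(v, c)$, `brute_force_k_center` enumerates all size-$k$ subsets (only viable for tiny instances), and `farthest_first` is the classical greedy farthest-first traversal, a well-known 2-approximation in metric spaces that is not described in the text above. The function names and the toy instance are illustrative assumptions.

```python
from itertools import combinations


def k_center_cost(points, centers, d):
    """Max over all points of the distance to the nearest center."""
    return max(min(d(v, c) for c in centers) for v in points)


def brute_force_k_center(points, k, d):
    """Exact solution by enumerating every k-subset (tiny instances only)."""
    return min(combinations(points, k),
               key=lambda C: k_center_cost(points, C, d))


def farthest_first(points, k, d):
    """Greedy heuristic: repeatedly add the point farthest from the
    current centers. This is an assumption of this sketch, not a method
    taken from the text."""
    centers = [points[0]]  # arbitrary first center
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda v: min(d(v, c) for c in centers)))
    return centers


# Hypothetical toy instance: points on a line with d(a, b) = |a - b|.
points = [0, 1, 2, 10, 11, 20]
dist = lambda a, b: abs(a - b)

optimal = brute_force_k_center(points, 2, dist)
greedy = farthest_first(points, 2, dist)
opt_cost = k_center_cost(points, optimal, dist)
greedy_cost = k_center_cost(points, greedy, dist)
```

On this instance the greedy centers are somewhat worse than the optimum but stay within twice the optimal cost, as the 2-approximation guarantee requires.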

Mixed Integer Programming (MIP)

Mon, 07 Jul 2025 00:00:00 +0000

Mixed Integer Programming (MIP) #

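Mixed Integer Programming is a fundamental optimization framework widely used in operations research. Machine learning is increasingly applied inside MIP solvers, for example to learn branching rules, cut selection, primal heuristics, and large neighborhood search policies.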

Recent Literature #

Sequential model-based optimization for general algorithm configuration International conference on learning and intelligent optimization, 2011. journal

Hutter, Frank and Hoos, Holger H and Leyton-Brown, Kevin
Non-model-based Search Guidance for Set Partitioning Problems AAAI, 2012. paper

Kadioglu, Serdar and Malitsky, Yuri and Sellmann, Meinolf
A Supervised Machine Learning Approach to Variable Branching in Branch-and-bound Citeseer, 2014. journal

Alvarez, Alejandro Marcos and Louveaux, Quentin and Wehenkel, Louis
Learning to Search in Branch-and-Bound Algorithms NeurIPS, 2014. paper

He, He and Daume III, Hal and Eisner, Jason M
Learning to Branch in Mixed Integer Programming AAAI, 2016. paper

E. B. Khalil, P. L. Bodic, L. Song, G. Nemhauser, B. Dilkina
Dash: Dynamic Approach for Switching Heuristics European Journal of Operational Research, 2016. journal

Di Liberto, Giovanni and Kadioglu, Serdar and Leo, Kevin and Malitsky, Yuri
Learning When to Use a Decomposition International conference on AI and OR techniques in constraint programming for combinatorial optimization problems, 2017. journal

Kruber, Markus and Lübbecke, Marco E and Parmentier, Axel
Learning to Run Heuristics in Tree Search IJCAI, 2017. paper

Khalil, Elias B and Dilkina, Bistra and Nemhauser, George L and Ahmed, Shabbir and Shao, Yufen
Exact Combinatorial Optimization with Graph Convolutional Neural Networks NeurIPS, 2019. paper, code

Gasse, Maxime and Chetelat, Didier and Ferroni, Nicola and Charlin, Laurent and Lodi, Andrea
Improving Learning to Branch via Reinforcement Learning NeurIPS Workshop, 2020. paper

Sun, Haoran and Chen, Wenbo and Li, Hui and Song, Le.
Reinforcement learning for variable selection in a branch and bound algorithm International Conference on Integration of Constraint Programming, 2020. journal

Etheve, Marc and Alès, Zacharie and Bissuel, Côme and Juan, Olivier and Kedad-Sidhoum, Safia
Random sampling and machine learning to understand good decompositions Annals of Operations Research, 2020. journal

Basso, Saverio and Ceselli, Alberto and Tettamanzi, Andrea
Hybrid Models for Learning to Branch NeurIPS, 2020. paper, code

Gupta, Prateek and Gasse, Maxime and Khalil, Elias B and Kumar, M Pawan and Lodi, Andrea and Bengio, Yoshua
Reinforcement Learning for Integer Programming: Learning to Cut ICML, 2020. paper

Tang, Yunhao and Agrawal, Shipra and Faenza, Yuri
Solving Mixed Integer Programs Using Neural Networks Arxiv, 2020. paper

Nair, Vinod and Bartunov, Sergey and Gimeno, Felix and von Glehn, Ingrid and Lichocki, Pawel and Lobov, Ivan and O’Donoghue, Brendan and Sonnerat, Nicolas and Tjandraatmadja, Christian and Wang, Pengming and others
Learning Efficient Search Approximation in Mixed Integer Branch and Bound Arxiv, 2020. paper

Yilmaz, Kaan and Yorke-Smith, Neil
Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs Arxiv, 2020. paper

Sonnerat, Nicolas and Wang, Pengming and Ktena, Ira and Bartunov, Sergey and Nair, Vinod
A General Large Neighborhood Search Framework for Solving Integer Linear Programs NeurIPS, 2020. paper

Song, Jialin and Lanka, Ravi and Yue, Yisong and Dilkina, Bistra
Neural Large Neighborhood Search NeurIPS Workshop, 2020. paper

Nair, Vinod and Alizadeh, Mohammad and others
Accelerating Primal Solution Findings for Mixed Integer Programs Based on Solution Prediction AAAI, 2020. paper

Ding, Jian-Ya, Chao Zhang, Lei Shen, Shengyin Li, Bing Wang, Yinghui Xu, and Le Song
CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints Arxiv, 2021. paper, code

Paulus, Anselm and Rolinek, Michal and Musil, Vit and Amos, Brandon and Martius, Georg
Reinforcement Learning for (Mixed) Integer Programming: Smart Feasibility Pump ICML Workshop, 2021. paper

Qi, Meng and Wang, Mengxin and Shen, Zuo-Jun
Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies AAAI, 2021. paper, code

Zarpellon, Giulia and Jo, Jason and Lodi, Andrea and Bengio, Yoshua
Learning to Select Cuts for Efficient Mixed-Integer Programming Arxiv, 2021. journal

Huang, Zeren and Wang, Kerong and Liu, Furui and Zhen, Hui-ling and Zhang, Weinan and Yuan, Mingxuan and Hao, Jianye and Yu, Yong and Wang, Jun
Confidence Threshold Neural Diving NeurIPS ML4CO Competition Workshop, 2021. paper

Taehyun Yoon
Learning large neighborhood search policy for integer programming NeurIPS, 2021. paper

Wu, Yaoxin and Song, Wen and Cao, Zhiguang and Zhang, Jie
Generative Deep Learning for Decision Making in Gas Networks Arxiv, 2021. paper

Lovis Anderson and Mark Turner and Thorsten Koch
Offline Constraint Screening for Online Mixed-integer Optimization Arxiv, 2021. paper

Asunción Jiménez-Cordero and Juan Miguel Morales and Salvador Pineda
Mixed Integer Programming versus Evolutionary Computation for Optimizing a Hard Real-World Staff Assignment Problem ICAPS, 2021. paper

Peters, Jannik and Stephan, Daniel and Amon, Isabel and Gawendowicz, Hans and Lischeid, Julius and Salabarria, Lennart and Umland, Jonas and Werner, Felix and Krejca, Martin S and Rothenberger, Ralf and others
Learning To Scale Mixed-Integer Programs AAAI, 2021. paper

Berthold, Timo, and Gregor Hendel
Learning Pseudo-Backdoors for Mixed Integer Programs AAAI, 2021. paper

Aaron Ferber and Jialin Song and Bistra Dilkina and Yisong Yue
Learning Primal Heuristics for Mixed Integer Programs IJCNN, 2021. paper

Shen, Yunzhuang and Sun, Yuan and Eberhard, Andrew and Li, Xiaodong
Learning to Solve Large-scale Security-constrained Unit Commitment Problems INFORMS Journal on Computing, 2021. journal

Xavier, Álinson S and Qiu, Feng and Ahmed, Shabbir
Learning to Branch with Tree MDPs Arxiv, 2022. paper, code

Scavuzzo, Lara, F. Chen, Didier Chételat, Maxime Gasse, Andrea Lodi, N. Yorke-Smith and Karen Aardal.
A Deep Reinforcement Learning Framework For Column Generation Arxiv, 2022. paper

Chi, Cheng, Amine Mohamed Aboussalah, Elias Boutros Khalil, Juyoung Wang and Zoha Sherkat-Masoumi.
Ranking Constraint Relaxations for Mixed Integer Programs Using a Machine Learning Approach Arxiv, 2022. journal

Weiner, Jake, Andreas T. Ernst, Xiaodong Li and Yuan Sun.
Learning to Accelerate Approximate Methods for Solving Integer Programming via Early Fixing Arxiv, 2022. journal, code

Li, Longkang and Baoyuan Wu.
Learning to Cut by Looking Ahead: Cutting Plane Selection via Imitation Learning ICML, 2022. paper

Paulus, Max B., Giulia Zarpellon, Andreas Krause, Laurent Charlin and Chris J. Maddison.
Lookback for Learning to Branch Arxiv, 2022. journal

Gupta, Prateek, Elias Boutros Khalil, Didier Chételat, Maxime Gasse, Yoshua Bengio, Andrea Lodi and M. Pawan Kumar.
Learning to Search in Local Branching AAAI, 2022. paper, code

Liu, Defeng and Fischetti, Matteo and Lodi, Andrea
Deep Reinforcement Learning for Exact Combinatorial Optimization: Learning to Branch Arxiv, 2022. paper

Zhang, Tianyu and Banitalebi-Dehkordi, Amin and Zhang, Yong
Learning to Branch with Tree-aware Branching Transformers Knowledge-Based Systems, 2022. journal, code

Lin, Jiacheng and Zhu, Jialin and Wang, Huangang and Zhang, Tao
An Improved Reinforcement Learning Algorithm for Learning to Branch Arxiv, 2022. paper

Qu, Qingyu and Li, Xijun and Zhou, Yunfan and Zeng, Jia and Yuan, Mingxuan and Wang, Jie and Lv, Jinhu and Liu, Kexin and Mao, Kun
Learning to Use Local Cuts Arxiv, 2022. paper

Berthold, Timo and Francobaldi, Matteo and Hendel, Gregor
DOGE-Train: Discrete Optimization on GPU with End-to-end Training Arxiv, 2022. paper

Abbas, Ahmed and Swoboda, Paul
Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts NeurIPS, 2022. paper

Balcan, Maria-Florina and Prasad, Siddharth and Sandholm, Tuomas and Vitercik, Ellen
Constrained Discrete Black-Box Optimization using Mixed-Integer Programming ICML, 2022. paper

Papalexopoulos, Theodore, Christian Tjandraatmadja, Ross Anderson, Juan Pablo Vielma and David Belanger.
A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming ICLR, 2023. paper, code

Han, Qingyu and Yang, Linxin and Chen, Qian and Zhou, Xiang and Zhang, Dong and Wang, Akang and Sun, Ruoyu and Luo, Xiaodong
Learning Cut Selection for Mixed-Integer Linear Programming via Hierarchical Sequence Model ICLR, 2023. paper, code

Wang, Zhihai and Li, Xijun and Wang, Jie and Kuang, Yufei and Yuan, Mingxuan and Zeng, Jia and Zhang, Yongdong and Wu, Feng
On Representing Mixed-Integer Linear Programs by Graph Neural Networks ICLR, 2023. paper, code

Ziang Chen, Jialin Liu, Xinshang Wang, Wotao Yin
GNN-GBDT-Guided Fast Optimizing Framework for Large-scale Integer Programming ICML, 2023. paper, code

Huigen Ye, Hua Xu, Hongyan Wang, Chengming Wang, Yu Jiang
Searching Large Neighborhoods for Integer Linear Programs with Contrastive Learning ICML, 2023. paper, code

Taoan Huang, Aaron M Ferber, Yuandong Tian, Bistra Dilkina, Benoit Steiner
Learning to Configure Separators in Branch-and-Cut NeurIPS, 2023. paper

Li, Sirui and Ouyang, Wenbin and Paulus, Max B and Wu, Cathy
Learning to Dive in Branch and Bound NeurIPS, 2023. paper

Paulus, Max B and Krause, Andreas
A Deep Instance Generative Framework for MILP Solvers Under Limited Data Availability NeurIPS, 2023. paper, code

Geng, Zijie and Li, Xijun and Wang, Jie and Li, Xiao and Zhang, Yongdong and Wu, Feng
Scalable Primal Heuristics Using Graph Neural Networks for Combinatorial Optimization JAIR, 2024. journal, code

Canturk, Furkan and Varol, Taha and Aydogan, Reyhan and Ozener, Okan O

Optimal Power Flow

Mon, 07 Jul 2025 00:00:00 +0000

Optimal Power Flow #

Optimal Power Flow (OPF) is a fundamental problem in power systems optimization, determining the setpoints for generators to supply electricity while minimizing costs and satisfying physical and operational constraints.

Recent Literature #

Learning-based Optimal Power Flow ICLR, 2023. paper, code

Yunqi Ding, Kai Wang, Yuanzhang Xiao, Dongyu Zhang
Physics-Informed Neural Networks for Power Systems in the Presence of Uncertainty IEEE Power & Energy Society General Meeting, 2023. paper

Javed Nasir, Yanlong Sun, Johannes Pschera, Luis Ochoa
Federated Learning for Optimal Power Flow in Smart Grids IEEE Access, 2023. paper

Shuiqing Liu, Ying Tan, Wei Liu, Yuntao Liu

Orienteering Problem (OP)

Mon, 07 Jul 2025 00:00:00 +0000

Orienteering Problem (OP) #

The Orienteering Problem asks for a route through a subset of locations that maximizes the total collected profit subject to a limit on the travel distance.
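
To make the trade-off between profit and distance concrete, here is a small brute-force Python sketch. It assumes one common variant that the text does not specify: a closed route that starts and ends at a depot (index 0), Euclidean distances, and a single distance budget. The names `route_length`, `brute_force_op`, and the toy data are hypothetical.

```python
from itertools import permutations
from math import dist


def route_length(coords, route):
    """Length of the closed route depot -> route stops -> depot."""
    stops = [0, *route, 0]
    return sum(dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))


def brute_force_op(coords, profit, budget):
    """Try every ordered subset of non-depot locations (tiny instances only)
    and keep the feasible route with the highest total profit."""
    best_route, best_profit = (), 0
    customers = range(1, len(coords))
    for size in range(1, len(coords)):
        for route in permutations(customers, size):
            if route_length(coords, route) <= budget:
                total = sum(profit[i] for i in route)
                if total > best_profit:
                    best_route, best_profit = route, total
    return best_route, best_profit


# Hypothetical instance: location 3 is the most profitable but too far away.
coords = [(0, 0), (1, 0), (2, 0), (0, 3)]
profit = [0, 2, 3, 10]
best_route, best_profit = brute_force_op(coords, profit, budget=5.0)
```

Here the high-profit location 3 cannot be reached within the budget, so the best route visits locations 1 and 2 instead.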

Recent Literature #

A reinforcement learning approach to the orienteering problem with time windows Computers & Operations Research, 2021. paper, code

Ricardo Gama, Hugo L. Fernandes
Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optimization ICML, 2023. paper

Son, Jiwoo and Kim, Minsu and Kim, Hyeonah and Park, Jinkyoo
DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization NeurIPS, 2023. paper, code

Ye, Haoran and Wang, Jiarui and Cao, Zhiguang and Liang, Helan and Li, Yong
UDC: A Unified Neural Divide-and-Conquer Framework for Large-Scale Combinatorial Optimization Problems NeurIPS, 2024. paper, code

Zhi Zheng, Changliang Zhou, Tong Xialiang, Mingxuan Yuan, Zhenkun Wang

Portfolio Optimization (PortOpt)

Mon, 07 Jul 2025 00:00:00 +0000

Portfolio Optimization (PortOpt) #

Portfolio Optimization is about selecting and managing assets to achieve financial goals. Machine learning is increasingly being applied to improve portfolio management strategies.

Recent Literature #

⭐LinSATNet: The Positive Linear Satisfiability Neural Networks ICML, 2023. paper, code

Runzhong Wang and Yunhao Zhang and Ziao Guo and Tianyi Chen and Xiaokang Yang and Junchi Yan
Integrating prediction in mean-variance portfolio optimization Quantitative Finance, 2023. paper

Butler, Andrew and Kwon, Roy H
⭐Towards One-shot Neural Combinatorial Solvers: Theoretical and Empirical Notes on the Cardinality-Constrained Case ICLR, 2023. paper, code

Wang, Runzhong and Shen, Li and Chen, Yiting and Yan, Junchi and Yang, Xiaokang and Tao, Dacheng

Predict+Optimize

Mon, 07 Jul 2025 00:00:00 +0000

Predict+Optimize #

Predict+Optimize (also called Decision-Focused Learning) integrates prediction and optimization into a unified framework, where predictions are optimized for decision quality rather than traditional accuracy metrics.
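
The difference between prediction accuracy and decision quality can be illustrated with a toy example. The sketch below assumes a deliberately simple downstream problem (pick the `k` items with the largest predicted value) and measures decision quality by regret, the true value lost relative to the best possible decision. The helper names and numbers are illustrative assumptions, not taken from any paper listed here.

```python
def top_k(values, k):
    """Indices of the k largest values: the downstream 'optimizer'."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


def mse(pred, true):
    """Mean squared prediction error (a traditional accuracy metric)."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def regret(pred, true, k):
    """True value of the best decision minus true value of the decision
    made by optimizing over the predictions."""
    chosen = top_k(pred, k)
    best = top_k(true, k)
    return sum(true[i] for i in best) - sum(true[i] for i in chosen)


true_values = [10.0, 9.0, 1.0]
pred_a = [9.4, 9.6, 1.0]   # small errors, but items 0 and 1 swap rank
pred_b = [14.0, 5.0, 1.0]  # large errors, ranking preserved

mse_a, mse_b = mse(pred_a, true_values), mse(pred_b, true_values)
regret_a = regret(pred_a, true_values, k=1)
regret_b = regret(pred_b, true_values, k=1)
```

In this example `pred_a` has the lower mean squared error yet leads to the worse decision, while `pred_b` is less accurate but incurs zero regret, which is the kind of mismatch a decision-focused loss is meant to address.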

Recent Literature #

Predict then Optimize Operations Research, 2021. paper, code

Adam Elmachtoub, Paul Grigas
Decision-Focused Learning of Robust Predictive Models ICML, 2019. paper, code

Adam N. Elmachtoub, Paul Grigas
Optimization-Based Algorithms for Decision-Focused Evaluation ICML, 2021. paper, code

Yochanan Kotary, Yehuda Navon, Atara Nowik, Yaron Lipman
Decision-Focused Learning with Offline Data NeurIPS, 2022. paper, code

Rian Bruce, Anirudh Jayakumar, Milind Tambe, David Abel
Learning to Optimize in Finance with Large Language Models NeurIPS, 2023. paper

Yizhi Li, Yintao Qi, Zhaozhun Cheng, Yishi Xu
Decision-Focused Learning with Reinforcement Learning ICML, 2023. paper, code

Yochanan Kotary, Anirudh Jayakumar, Milan Yuchao Li, Yaron Lipman
Learning to Minimize Resources for Prediction NeurIPS, 2023. paper

Damien Scieur, Maximilian Balandat, Tom Everitt, Yisong Yue
End-to-End Learning for Optimization-Based Control ICLR, 2019. paper, code

Brandon Amos, Ivan Duriskovic, Gavin Kerrigan, J. Zico Kolter
Learning to Minimize Regret in Convex Games NeurIPS, 2021. paper, code

Guanghui Huang, Johan Suksman, Kai Zhou, Tony Cai
Learning Optimal Thresholds Via Distributionally Robust Optimization AISTATS, 2023. paper

Stefan Ankirchner, Reza Mahmoudi, Sven Wang
Predict then Optimize for Power Systems Climate Change AI, 2021. paper

Xiaobing Sun, Matija Jovanovic, Tongxin Li, Chaoyue Zhao
Decision-Focused Prediction with Limited Information NeurIPS, 2022. paper

Yao Xie, Felipe Caro, Xinya Liang, Yang Liu, Nicholas G Polson
Optimization-Based Prediction with Applications to Wind Energy JMLR, 2020. paper

Adam Elmachtoub, Paul Grigas, Suhrid Balakrishnan
Differentiable Learning of Integer Programs for Portfolio Optimization NeurIPS, 2022. paper, code

Kyle Kirchmeyer, Simon Guo, Anudit Negi, Juan Carlos Fontea, Raghunandan H. Koppula, Dan Feldman
Integrating Deep Learning with Logic Fusion for Information Extraction ACL, 2023. paper

Ruixuan Xiao, Boyang Liu, Hailong Sun, Weiwen Liu, Gang Tang, Jing Huang
Learning with Optimization-Based Uncertainty Estimates for Imbalanced Classification NeurIPS, 2022. paper

Haozhe Sun, Shaoyu Wang, Jiaqi Ma, Chen Gong, Chen Tian

Quadratic Assignment Problem (QAP)

Mon, 07 Jul 2025 00:00:00 +0000

Quadratic Assignment Problem (QAP) #

The Quadratic Assignment Problem is a classical NP-hard combinatorial optimization problem with applications in location theory and circuit design.
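
For concreteness, the sketch below uses the standard Koopmans–Beckmann form of the problem, which the text above does not spell out: given a flow matrix $F$ between facilities and a distance matrix $D$ between locations, find a permutation $\pi$ minimizing $\sum_{i,j} F_{ij} D_{\pi(i)\pi(j)}$. The function names and the 3×3 instance are hypothetical, and exhaustive search is only feasible for very small $n$.

```python
from itertools import permutations


def qap_cost(flow, dist, perm):
    """Koopmans-Beckmann objective: facility i is placed at location perm[i]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def brute_force_qap(flow, dist):
    """Enumerate all n! assignments (tiny instances only)."""
    n = len(flow)
    return min(permutations(range(n)),
               key=lambda p: qap_cost(flow, dist, p))


# Hypothetical symmetric instance with three facilities and three locations.
flow = [[0, 5, 1],
        [5, 0, 2],
        [1, 2, 0]]
dist = [[0, 1, 3],
        [1, 0, 2],
        [3, 2, 0]]

best_perm = brute_force_qap(flow, dist)
best_cost = qap_cost(flow, dist, best_perm)
```

The optimum places the pair of facilities with the heaviest flow at the two closest locations, which matches the intuition behind the facility-location reading of the problem.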

Recent Literature #

Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks Arxiv, 2017. paper, code

Nowak, Alex and Villar, Soledad and Bandeira, Afonso S. and Bruna, Joan
⭐Neural Graph Matching Network: Learning Lawler’s Quadratic Assignment Problem with Extension to Hypergraph and Multiple-graph Matching. TPAMI, 2021. paper, code

Wang, Runzhong and Yan, Junchi and Yang, Xiaokang
⭐Revocable Deep Reinforcement Learning with Affinity Regularization for Outlier-Robust Graph Matching. ICLR, 2023. paper, code

Liu, Chang and Jiang, Zetian and Wang, Runzhong and Yan, Junchi and Huang, Lingxiao and Lu, Pinyan
⭐Towards Quantum Machine Learning for Constrained Combinatorial Optimization: a Quantum QAP Solver ICML, 2023. paper

Ye, Xinyu and Yan, Ge and Yan, Junchi

Sorting & Ranking (Sort&Rank)

Mon, 07 Jul 2025 00:00:00 +0000

Sorting & Ranking (Sort&Rank) #

Sorting and ranking problems involve learning to order elements according to some criteria, with applications in information retrieval and preference learning.

Recent Literature #

Ranking via Sinkhorn Propagation Arxiv, 2011. paper

Ryan Prescott Adams, Richard S. Zemel
Predict+optimise with ranking objectives: exhaustively learning linear functions IJCAI, 2019. paper

Demirovic, Emir and Stuckey, Peter J. and Bailey, James and Chan, Jeffrey and Leckie, Christopher and Ramamohanarao, Kotagiri and Guns, Tias
Stochastic Optimization of Sorting Networks via Continuous Relaxations ICLR, 2019. paper, code

Aditya Grover, Eric Wang, Aaron Zweig, Stefano Ermon
Differentiable Ranking and Sorting using Optimal Transport NeurIPS, 2019. paper

Marco Cuturi, Olivier Teboul, Jean-Philippe Vert
Optimizing Rank-Based Metrics With Blackbox Differentiation CVPR, 2020. paper, code

Marin Vlastelica, Anselm Paulus, Vít Musil, Georg Martius and Michal Rolínek
Fast Differentiable Sorting and Ranking ICML, 2020. paper, code

Mathieu Blondel, Olivier Teboul, Quentin Berthet, Josip Djolonga
SoftSort: A Continuous Relaxation for the argsort Operator ICML, 2020. paper, code

Sebastian Prillo, Julian Martin Eisenschlos
Differentiable Top-k with Optimal Transport NeurIPS, 2020. paper

Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister
Automatic Loss Function Search for Predict-Then-Optimize Problems with Strong Ranking Property ICLR, 2022. paper, code

Boshi Wang, Jialin Yi, Hang Dong, Bo Qiao, Chuan Luo, Qingwei Lin
Decision-Focused Learning: Through the Lens of Learning to Rank ICML, 2022. paper, code

Jayanta Mandi, Víctor Bucarey, Maxime Mulamba Ke Tchomba, Tias Guns
PiRank-Scalable Learning To Rank via Differentiable Sorting NeurIPS, 2022. paper, code

Robin Marcel Edwin Swezey, Aditya Grover, Bruno Charron, Stefano Ermon
Neural Improvement Heuristics for Graph Combinatorial Optimization Problems TNNLS, 2023. journal, code

Andoni I. Garmendia, Josu Ceberio, Alexander Mendiburu
Applicability of Neural Combinatorial Optimization: A Critical View TELO, 2024. journal, code

Andoni I. Garmendia, Josu Ceberio, Alexander Mendiburu

Stochastic Combinatorial Optimization

Mon, 07 Jul 2025 00:00:00 +0000

Stochastic Combinatorial Optimization #

Stochastic Combinatorial Optimization addresses CO problems where some parameters are random or uncertain, requiring robust or adaptive solutions that perform well under uncertainty.

Recent Literature #

Robust Combinatorial Optimization with Locally Predictable Uncertainty ICLR, 2023. paper

Haozhe Sun, Shaoyu Wang, Jiaqi Ma, Chen Gong, Chen Tian
Learning Robust Policies for Combinatorial Optimization ICML, 2022. paper, code

Ankit Anupam, Joon Oh, Jure Leskovec
Stochastic Combinatorial Optimization with Oracle Subsampling NeurIPS, 2021. paper

Paul Grigas, Adam Elmachtoub, Yunchao Liu
Adaptive Policies for Stochastic Knapsack Problems Operations Research Letters, 2020. paper

Wenbo Gao, Oleg V. Pikhurko, Nicholas Harvey
Online Stochastic Optimization under Time-Varying Distributions ICML, 2023. paper

Yudi Zhou, Yinhan He, Jason D. Lee, Yixuan Qiu

Travelling Salesman Problem (TSP)

Mon, 07 Jul 2025 00:00:00 +0000

Travelling Salesman Problem (TSP) #

The Travelling Salesman Problem is one of the most famous NP-hard optimization problems, with extensive research on neural and ML-based approaches.
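
As a point of reference for the learned solvers listed below, the following Python sketch states the objective in its simplest form: the length of a closed tour visiting every city once. It assumes the Euclidean variant with cities given as 2D coordinates, and the exhaustive `brute_force_tsp` is only meant for tiny instances; both function names and the instance are illustrative assumptions.

```python
from itertools import permutations
from math import dist


def tour_length(cities, tour):
    """Total Euclidean length of the closed tour, returning to the start."""
    n = len(tour)
    return sum(dist(cities[tour[i]], cities[tour[(i + 1) % n]])
               for i in range(n))


def brute_force_tsp(cities):
    """Exact solution by enumeration; city 0 is fixed as the start so that
    rotations of the same tour are not counted repeatedly."""
    rest = range(1, len(cities))
    candidates = ((0, *p) for p in permutations(rest))
    return min(candidates, key=lambda t: tour_length(cities, t))


# Hypothetical instance: the four corners of the unit square.
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
best_tour = brute_force_tsp(cities)
best_length = tour_length(cities, best_tour)
```

The number of candidate tours grows factorially with the number of cities, which is why practical work relies on heuristics and learned policies rather than enumeration.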

Recent Literature #

Learning Combinatorial Optimization Algorithms over Graphs. NeurIPS, 2017. paper

Dai, Hanjun and Khalil, Elias B and Zhang, Yuyu and Dilkina, Bistra and Song, Le
Learning Heuristics for the TSP by Policy Gradient CPAIOR, 2018. paper, code

Michel Deudon, Pierre Cournut, Alexandre Lacoste
Attention, Learn to Solve Routing Problems! ICLR, 2019. paper

Kool, Wouter and Van Hoof, Herke and Welling, Max.
Learning to Solve NP-Complete Problems: A Graph Neural Network for Decision TSP. AAAI, 2019. paper

Prates, Marcelo and Avelar, Pedro HC and Lemos, Henrique and Lamb, Luis C and Vardi, Moshe Y.
An Efficient Graph Convolutional Network Technique for the Travelling Salesman Problem Arxiv, 2019. paper, code

Chaitanya K. Joshi, Thomas Laurent, Xavier Bresson
POMO: Policy Optimization with Multiple Optima for Reinforcement Learning. NeurIPS, 2020. paper, code

Kwon, Yeong-Dae and Choo, Jinho and Kim, Byoungjip and Yoon, Iljoo and Min, Seungjai and Gwon, Youngjune.
Generalize a Small Pre-trained Model to Arbitrarily Large TSP Instances. Arxiv, 2020. paper

Fu, Zhang-Hua and Qiu, Kai-Bin and Zha, Hongyuan.
A Reinforcement Learning Approach for Optimizing Multiple Traveling Salesman Problems over Graphs KBS, 2020. journal

Hu, Yujiao and Yao, Yuan and Lee, Wee Sun
Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning ACML, 2020. paper, code

da Costa, Paulo R de O and Rhuggenaath, Jason and Zhang, Yingqian and Akcay, Alp
Deep Reinforcement Learning for Combinatorial Optimization: Covering Salesman Problems. IEEE Trans Cybern, 2021. journal

Kaiwen Li, Tao Zhang, Rui Wang, Yuheng Wang, and Yi Han
The Transformer Network for the Traveling Salesman Problem IPAM, 2021. paper

Xavier Bresson, Thomas Laurent
Learning Improvement Heuristics for Solving Routing Problems TNNLS, 2021. journal

Wu, Yaoxin and Song, Wen and Cao, Zhiguang and Zhang, Jie and Lim, Andrew
Reversible Action Design for Combinatorial Optimization with Reinforcement Learning Arxiv, 2021. paper

Yao, Fan and Cai, Renqin and Wang, Hongning
Solving Dynamic Traveling Salesman Problems with Deep Reinforcement Learning. TNNLS, 2021. journal

Zizhen Zhang, Hong Liu, Meng Chu Zhou, Jiahai Wang
ScheduleNet: Learn to Solve Multi-agent Scheduling Problems with Reinforcement Learning Arxiv, 2021. paper

Junyoung Park, Sanjar Bakhtiyar, Jinkyoo Park
DAN: Decentralized Attention-based Neural Network to Solve the MinMax Multiple Traveling Salesman Problem Arxiv, 2021. paper

Cao, Yuhong and Sun, Zhanhong and Sartoretti, Guillaume
Reinforcement Learning for Route Optimization with Robustness Guarantees IJCAI, 2021. paper

Jacobs, Tobias and Alesiani, Francesco and Ermis, Gulcin
Combining Reinforcement Learning with Lin-Kernighan-Helsgaun Algorithm for the Traveling Salesman Problem AAAI, 2021. paper, code

Jiongzhi Zheng, Kun He, Jianrong Zhou, Yan Jin, Chu-Min Li
Learning to Sparsify Travelling Salesman Problem Instances CPAIOR, 2021. paper

James Fitzpatrick, Deepak Ajwani, Paula Carroll
Learning TSP Requires Rethinking Generalization CP, 2021. paper, code

Chaitanya K. Joshi, Quentin Cappart, Louis-Martin Rousseau and Thomas Laurent
The First AI4TSP Competition: Learning to Solve Stochastic Routing Problems Arxiv, 2022. paper, code

Bliek, Laurens and da Costa, Paulo and Afshar, Reza Refaei and Zhang, Yingqian and Catshoek, Tom and Vos, Daniel and Verwer, Sicco and Schmitt-Ulms, Fynn and Hottung, Andre and Shah, Tapan and others
Graph Neural Network Guided Local Search for the Traveling Salesperson Problem ICLR, 2022. paper

Hudson, Benjamin and Li, Qingbiao and Malencia, Matthew and Prorok, Amanda
Preference Conditioned Neural Multi-objective Combinatorial Optimization ICLR, 2022. paper

Lin, Xi and Yang, Zhiyuan and Zhang, Qingfu
Learning Generalizable Models for Vehicle Routing Problems via Knowledge Distillation NeurIPS, 2022. paper, code

Bi, Jieyi and Ma, Yining and Wang, Jiahai and Cao, Zhiguang and Chen, Jinbiao and Sun, Yuan and Chee, Yeow Meng
DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems NeurIPS, 2022. paper

Qiu, Ruizhong and Sun, Zhiqing and Yang, Yiming
Sym-NCO: Leveraging Symmetricity for Neural Combinatorial Optimization NeurIPS, 2022. paper, code

Kim, Minsu and Park, Junyoung and Park, Jinkyoo
Simulation-guided Beam Search for Neural Combinatorial Optimization NeurIPS, 2022. paper, code

Choo, Jinho and Kwon, Yeong-Dae and Kim, Jihoon and Jae, Jeongwoo and Hottung, André and Tierney, Kevin and Gwon, Youngjune
Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness ICLR, 2022. paper

Simon Geisler, Johanna Sommer, Jan Schuchardt, Aleksandar Bojchevski and Stephan Günnemann
⭐LinSATNet: The Positive Linear Satisfiability Neural Networks ICML, 2023. paper, code

Runzhong Wang and Yunhao Zhang and Ziao Guo and Tianyi Chen and Xiaokang Yang and Junchi Yan
Learning to CROSS exchange to solve min-max vehicle routing problems ICLR, 2023. paper

Kim, Minjun and Park, Junyoung and Park, Jinkyoo
Generalize Learned Heuristics to Solve Large-scale Vehicle Routing Problems in Real-time ICLR, 2023. paper

Hou, Qingchun and Yang, Jingwei and Su, Yiqiang and Wang, Xiaoqing and Deng, Yuming
⭐ROCO: A General Framework for Evaluating Robustness of Combinatorial Optimization Solvers on Graphs ICLR, 2023. paper, code

Lu, Han and Li, Zenan and Wang, Runzhong and Ren, Qibing and Li, Xijun and Yuan, Mingxuan and Zeng, Jia and Yang, Xiaokang and Yan, Junchi
Pointerformer: Deep Reinforced Multi-Pointer Transformer for the Traveling Salesman Problem Arxiv, 2023. paper, code

Yan Jin, Yuandong Ding, Xuanhao Pan, Kun He, Li Zhao, Tao Qin, Lei Song, Jiang Bian
H-TSP: Hierarchically Solving the Large-Scale Traveling Salesman Problem AAAI, 2023. paper, code

Xuanhao Pan, Yan Jin, Yuandong Ding, Mingxiao Feng, Li Zhao, Lei Song, Jiang Bian
Select and Optimize: Learning to solve large-scale TSP instances AISTATS, 2023. paper

Hanni Cheng, Haosi Zheng, Ya Cong, Weihao Jiang, Shiliang Pu
Multi-View Graph Contrastive Learning for Solving Vehicle Routing Problems UAI, 2023. paper

Yuan Jiang, Zhiguang Cao, Yaoxin Wu, Jie Zhang
Revisiting Sampling for Combinatorial Optimization ICML, 2023. paper

Sun, Haoran and Goshvadi, Katayoon and Nova, Azade and Schuurmans, Dale and Dai, Hanjun
Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optimization ICML, 2023. paper

Son, Jiwoo and Kim, Minsu and Kim, Hyeonah and Park, Jinkyoo
Towards Omni-generalizable Neural Methods for Vehicle Routing Problems ICML, 2023. paper, code

Zhou Jianan, Yaoxin Wu, Wen Song, Zhiguang Cao, Jie Zhang
DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization NeurIPS, 2023. paper, code

Zhiqing Sun, Yiming Yang
DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization NeurIPS, 2023. paper, code

Ye, Haoran and Wang, Jiarui and Cao, Zhiguang and Liang, Helan and Li, Yong
Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization NeurIPS, 2023. paper

Grinsztajn, Nathan and Furelos-Blanco, Daniel and Surana, Shikha and Bonnet, Clément and Barrett, Thomas D
Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods NeurIPS, 2023. paper, code

Caramanis, Constantine and Fotakis, Dimitris and Kalavasis, Alkis and Kontonis, Vasilis and Tzamos, Christos
Combinatorial Optimization with Policy Adaptation using Latent Space Search NeurIPS, 2023. paper

Chalumeau, Felix and Surana, Shikha and Bonnet, Clément and Grinsztajn, Nathan and Pretorius, Arnu and Laterre, Alexandre and Barrett, Thomas D
Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization NeurIPS, 2023. paper, code

Chen, Jinbiao and Wang, Jiahai and Zhang, Zizhen and Cao, Zhiguang and Ye, Te and Chen, Siyuan
BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial Optimization NeurIPS, 2023. paper, code

Drakulic, Darko and Michel, Sofia and Mai, Florian and Sors, Arnaud and Andreoli, Jean-Marc
Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization NeurIPS, 2023. paper, code

Luo, Fu and Lin, Xi and Liu, Fei and Zhang, Qingfu and Wang, Zhenkun
Neural Multi-Objective Combinatorial Optimization with Diversity Enhancement NeurIPS, 2023. paper, code

Chen, Jinbiao and Zhang, Zizhen and Cao, Zhiguang and Wu, Yaoxin and Ma, Yining and Ye, Te and Wang, Jiahai
Unsupervised Learning for Solving the Travelling Salesman Problem NeurIPS, 2023. paper

Min, Yimeng and Bai, Yiwei and Gomes, Carla P
Ensemble-based Deep Reinforcement Learning for Vehicle Routing Problems under Distribution Shift NeurIPS, 2023. paper

Jiang, Yuan and Cao, Zhiguang and Wu, Yaoxin and Song, Wen and Zhang, Jie
Learning to Search Feasible and Infeasible Regions of Routing Problems with Flexible Neural k-Opt NeurIPS, 2023. paper, code

Ma, Yining and Cao, Zhiguang and Chee, Yeow Meng
⭐T2T: From Distribution Learning in Training to Gradient Search in Testing for Combinatorial Optimization NeurIPS, 2023. paper, code

Yang Li, Jinpei Guo, Runzhong Wang, Junchi Yan
Reinforced Lin–Kernighan–Helsgaun Algorithms for the Traveling Salesman Problems Knowledge-Based Systems, 2023. journal, code

Jiongzhi Zheng, Kun He, Jianrong Zhou, Yan Jin, Chu-Min Li
Neural Improvement Heuristics for Graph Combinatorial Optimization Problems TNNLS, 2023. journal, code

Andoni I. Garmendia, Josu Ceberio, Alexander Mendiburu
GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time AAAI, 2024. paper, code

Haoran Ye, Jiarui Wang, Helan Liang, Zhiguang Cao, Yong Li, Fanzhang Li
Distilling Autoregressive Models to Obtain High-Performance Non-autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed AAAI, 2024. paper, code

Yubin Xiao, Di Wang, Boyang Li, Mingzhao Wang, Xuan Wu, Changliang Zhou, You Zhou
Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems ICML, 2024. paper, code

Yifan Xia, Xianliang Yang, Zichuan Liu, Zhihao Liu, Lei Song, Jiang Bian
MARCO: A Memory-Augmented Reinforcement Framework for Combinatorial Optimization IJCAI, 2024. paper, code

Andoni I. Garmendia, Quentin Cappart, Josu Ceberio, Alexander Mendiburu
Neural Combinatorial Optimization for Robust Routing Problem with Uncertain Travel Times NeurIPS, 2024. paper

Pei Xiao, Zizhen Zhang, Jinbiao Chen, Jiahai Wang, Zhenzhen Zhang
Collaboration! Towards Robust Neural Methods for Routing Problems NeurIPS, 2024. paper, code

Jianan Zhou, Yaoxin Wu, Zhiguang Cao, Wen Song, Jie Zhang, Zhiqi Shen
UDC: A Unified Neural Divide-and-Conquer Framework for Large-Scale Combinatorial Optimization Problems NeurIPS, 2024. paper, code

Zhi Zheng, Changliang Zhou, Tong Xialiang, Mingxuan Yuan, Zhenkun Wang
Learning to Handle Complex Constraints for Vehicle Routing Problems NeurIPS, 2024. paper

Jieyi Bi, Yining Ma, Jianan Zhou, Wen Song, Zhiguang Cao, Yaoxin Wu, Jie Zhang
⭐Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization NeurIPS, 2024. paper, code

Yang Li, Jinpei Guo, Runzhong Wang, Hongyuan Zha, Junchi Yan
⭐UniCO: On Unified Combinatorial Optimization via Problem Reduction to Matrix-Encoded General TSP ICLR, 2025. paper, code

Wenzheng Pan, Hao Xiong, Jiale Ma, Wentao Zhao, Yang Li, Junchi Yan
Efficient and Robust Neural Combinatorial Optimization via Wasserstein-Based Coresets ICLR, 2025. paper

Xu Wang, Fuyou Miao, Wenjie Liu, Yan Xiong
⭐Unify ML4TSP: Drawing Methodological Principles for TSP and Beyond from Streamlined Design Space of Learning and Search ICLR, 2025. paper, code

Yang Li, Jiale Ma, Wenzheng Pan, Runzhong Wang, Haoyu Geng, Nianzu Yang, Junchi Yan
⭐COExpander: Adaptive Solution Expansion for Combinatorial Optimization ICML, 2025. paper, code

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan
⭐ML4CO-Bench-101: Benchmark Machine Learning for Classic Combinatorial Problems on Graphs NeurIPS, 2025. paper, code

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan

Useful resources for studying Algebraic and Analytic Number Theory

Mon, 07 Jul 2025 00:00:00 +0000

General #

Elements of Number Theory by John Stillwell
Elementary Number Theory: Primes, Congruences, and Secrets by William Stein
MIT’s Theory of Numbers
Berkeley’s Number Theory by Richard E Borcherds, 1998 Fields Medalist
UCLA’s Introduction to Number Theory

Algebraic Number Theory #

Algebraic Number Theory, by J.S. Milne
An Algebraic introduction to Number Theory by Kimball Martin

Analytic Number Theory #

Vehicle Routing Problem (VRP)

Mon, 07 Jul 2025 00:00:00 +0000

Vehicle Routing Problem (VRP) #

The Vehicle Routing Problem is about finding optimal routes for a fleet of vehicles to serve a set of customers, a fundamental problem in logistics and transportation.
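
To make the objective concrete, here is a minimal Python sketch of how a candidate solution to a capacitated variant could be scored. The depot at index 0, the Euclidean distances, the single shared `capacity`, and the function names are all illustrative assumptions rather than a formulation taken from any particular paper.

```python
import math

def route_length(route, coords):
    """Length of one depot -> customers -> depot tour (depot is index 0)."""
    stops = [0] + list(route) + [0]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))

def evaluate(routes, coords, demand, capacity):
    """Return the total length, or raise ValueError if the solution is infeasible."""
    visited = [c for r in routes for c in r]
    if sorted(visited) != list(range(1, len(coords))):
        raise ValueError("every customer must be served exactly once")
    for r in routes:
        if sum(demand[c] for c in r) > capacity:
            raise ValueError("route exceeds vehicle capacity")
    return sum(route_length(r, coords) for r in routes)

coords = [(0, 0), (0, 3), (4, 3), (4, 0)]   # depot plus three customers
demand = [0, 1, 1, 1]
total = evaluate([[1, 2], [3]], coords, demand, capacity=2)
```

A solver, learned or classical, searches over the `routes` argument; the evaluation itself stays this simple.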

Recent Literature #

Learning to Perform Local Rewriting for Combinatorial Optimization. NeurIPS, 2019. paper, code

Chen, Xinyun and Tian, Yuandong.
Deep Reinforcement Learning for the Electric Vehicle Routing Problem with Time Windows. Arxiv, 2020. paper

Lin, Bo and Ghaddar, Bissan and Nathwani, Jatin.
Efficiently Solving the Practical Vehicle Routing Problem: A Novel Joint Learning Approach. KDD, 2020. paper

Lu Duan, Yang Zhan, Haoyuan Hu, Yu Gong, Jiangwen Wei, Xiaodong Zhang, Yinghui Xu
Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing NeurIPS, 2020. paper, code

Arthur Delarue, Ross Anderson, Christian Tjandraatmadja
A Learning-based Iterative Method for Solving Vehicle Routing Problems ICLR, 2020. paper

Lu, Hao and Zhang, Xingwen and Yang, Shuang
Neural Large Neighborhood Search for the Capacitated Vehicle Routing Problem Arxiv, 2020. paper

Hottung, Andre and Tierney, Kevin
Learning Improvement Heuristics for Solving Routing Problems TNNLS, 2021. journal

Wu, Yaoxin and Song, Wen and Cao, Zhiguang and Zhang, Jie and Lim, Andrew
Reinforcement Learning for Route Optimization with Robustness Guarantees IJCAI, 2021. paper

Jacobs, Tobias and Alesiani, Francesco and Ermis, Gulcin
Multi-Decoder Attention Model with Embedding Glimpse for Solving Vehicle Routing Problems. AAAI, 2021. paper, code

Liang Xin, Wen Song, Zhiguang Cao, Jie Zhang
Analytics and Machine Learning in Vehicle Routing Research Arxiv, 2021. paper

Bai, Ruibin and Chen, Xinan and Chen, Zhi-Long and Cui, Tianxiang and Gong, Shuhui and He, Wentao and Jiang, Xiaoping and Jin, Huan and Jin, Jiahuan and Kendall, Graham and others
RP-DQN: An application of Q-Learning to Vehicle Routing Problems Arxiv, 2021. paper

Bdeir, Ahmad and Boeder, Simon and Dernedde, Tim and Tkachuk, Kirill and Falkner, Jonas K and Schmidt-Thieme, Lars
Deep Policy Dynamic Programming for Vehicle Routing Problems Arxiv, 2021. paper

Kool, Wouter and van Hoof, Herke and Gromicho, Joaquim and Welling, Max
Learning to Delegate for Large-scale Vehicle Routing NeurIPS, 2021. paper

Li, Sirui and Yan, Zhongxia and Wu, Cathy
Learning a Latent Search Space for Routing Problems using Variational Autoencoders ICLR, 2021. paper

Hottung, Andre and Bhandari, Bhanu and Tierney, Kevin
Preference Conditioned Neural Multi-objective Combinatorial Optimization ICLR, 2022. paper

Lin, Xi and Yang, Zhiyuan and Zhang, Qingfu
Learning Generalizable Models for Vehicle Routing Problems via Knowledge Distillation NeurIPS, 2022. paper, code

Bi, Jieyi and Ma, Yining and Wang, Jiahai and Cao, Zhiguang and Chen, Jinbiao and Sun, Yuan and Chee, Yeow Meng
Sym-NCO: Leveraging Symmetricity for Neural Combinatorial Optimization NeurIPS, 2022. paper, code

Kim, Minsu and Park, Junyoung and Park, Jinkyoo
Simulation-guided Beam Search for Neural Combinatorial Optimization NeurIPS, 2022. paper, code

Choo, Jinho and Kwon, Yeong-Dae and Kim, Jihoon and Jae, Jeongwoo and Hottung, André and Tierney, Kevin and Gwon, Youngjune
Learning to CROSS exchange to solve min-max vehicle routing problems ICLR, 2023. paper

Kim, Minjun and Park, Junyoung and Park, Jinkyoo
Generalize Learned Heuristics to Solve Large-scale Vehicle Routing Problems in Real-time ICLR, 2023. paper

Hou, Qingchun and Yang, Jingwei and Su, Yiqiang and Wang, Xiaoqing and Deng, Yuming
Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optimization ICML, 2023. paper

Son, Jiwoo and Kim, Minsu and Kim, Hyeonah and Park, Jinkyoo
Towards Omni-generalizable Neural Methods for Vehicle Routing Problems ICML, 2023. paper, code

Jianan Zhou, Yaoxin Wu, Wen Song, Zhiguang Cao, Jie Zhang
DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization NeurIPS, 2023. paper, code

Ye, Haoran and Wang, Jiarui and Cao, Zhiguang and Liang, Helan and Li, Yong
Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization NeurIPS, 2023. paper

Grinsztajn, Nathan and Furelos-Blanco, Daniel and Surana, Shikha and Bonnet, Clément and Barrett, Thomas D
Combinatorial Optimization with Policy Adaptation using Latent Space Search NeurIPS, 2023. paper

Chalumeau, Felix and Surana, Shikha and Bonnet, Clément and Grinsztajn, Nathan and Pretorius, Arnu and Laterre, Alexandre and Barrett, Thomas D
Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization NeurIPS, 2023. paper, code

Chen, Jinbiao and Wang, Jiahai and Zhang, Zizhen and Cao, Zhiguang and Ye, Te and Chen, Siyuan
BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial Optimization NeurIPS, 2023. paper, code

Drakulic, Darko and Michel, Sofia and Mai, Florian and Sors, Arnaud and Andreoli, Jean-Marc
Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization NeurIPS, 2023. paper, code

Luo, Fu and Lin, Xi and Liu, Fei and Zhang, Qingfu and Wang, Zhenkun
Neural Multi-Objective Combinatorial Optimization with Diversity Enhancement NeurIPS, 2023. paper, code

Chen, Jinbiao and Zhang, Zizhen and Cao, Zhiguang and Wu, Yaoxin and Ma, Yining and Ye, Te and Wang, Jiahai
Ensemble-based Deep Reinforcement Learning for Vehicle Routing Problems under Distribution Shift NeurIPS, 2023. paper

Jiang, Yuan and Cao, Zhiguang and Wu, Yaoxin and Song, Wen and Zhang, Jie
Learning to Search Feasible and Infeasible Regions of Routing Problems with Flexible Neural k-Opt NeurIPS, 2023. paper, code

Ma, Yining and Cao, Zhiguang and Chee, Yeow Meng
Learning to Prune Electric Vehicle Routing Problems LION, 2023. paper

James Fitzpatrick, Deepak Ajwani, Paula Carroll
GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time AAAI, 2024. paper, code

Haoran Ye, Jiarui Wang, Helan Liang, Zhiguang Cao, Yong Li, Fanzhang Li
Distilling Autoregressive Models to Obtain High-Performance Non-autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed AAAI, 2024. paper, code

Yubin Xiao, Di Wang, Boyang Li, Mingzhao Wang, Xuan Wu, Changliang Zhou, You Zhou
Neural Combinatorial Optimization for Robust Routing Problem with Uncertain Travel Times NeurIPS, 2024. paper

Pei Xiao, Zizhen Zhang, Jinbiao Chen, Jiahai Wang, Zhenzhen Zhang
Collaboration! Towards Robust Neural Methods for Routing Problems NeurIPS, 2024. paper, code

Jianan Zhou, Yaoxin Wu, Zhiguang Cao, Wen Song, Jie Zhang, Zhiqi Shen
UDC: A Unified Neural Divide-and-Conquer Framework for Large-Scale Combinatorial Optimization Problems NeurIPS, 2024. paper, code

Zhi Zheng, Changliang Zhou, Tong Xialiang, Mingxuan Yuan, Zhenkun Wang
A Scalable Learning Approach for the Capacitated Vehicle Routing Problem Computers and Operations Research, 2024. journal

James Fitzpatrick, Deepak Ajwani, Paula Carroll
A Neural Column Generation Approach to the Vehicle Routing Problem with Two-Dimensional Loading and Last-In-First-Out Constraints IJCAI, 2024. paper, code

Yifan Xia, Xiangyi Zhang
Rethinking Neural Multi-Objective Combinatorial Optimization via Neat Weight Embedding ICLR, 2025. paper

Jinbiao Chen, Zhiguang Cao, Jiahai Wang, Yaoxin Wu, Hanzhang Qin, Zizhen Zhang, Yue-Jiao Gong
Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems ICLR, 2025. paper

Fu Luo, Xi Lin, Yaoxin Wu, Zhenkun Wang, Tong Xialiang, Mingxuan Yuan, Qingfu Zhang
⭐COExpander: Adaptive Solution Expansion for Combinatorial Optimization ICML, 2025. paper, code

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan
⭐ML4CO-Bench-101: Benchmark Machine Learning for Classic Combinatorial Problems on Graphs NeurIPS, 2025. paper, code

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan

Vertex Cover

Mon, 07 Jul 2025 00:00:00 +0000

Vertex Cover #

The Vertex Cover problem seeks the smallest set of vertices such that every edge in the graph is incident to at least one vertex in the set. This is a fundamental NP-hard problem in graph theory.
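
As a point of comparison for learned methods, the classical 2-approximation can be sketched in a few lines of Python: build a maximal matching greedily and take both endpoints of every matched edge. The edge-list representation and the function names are assumptions made for this illustration.

```python
def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)

def matching_vertex_cover(edges):
    """Take both endpoints of each edge of a greedily built maximal matching."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge is still uncovered
            cover.update((u, v))
    return cover

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
cover = matching_vertex_cover(edges)
```

Any cover must contain at least one endpoint of each matched edge, so the returned set is at most twice the size of a minimum cover.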

Recent Literature #

Learning Vertex Cover via Reinforcement Learning ICLR, 2024. paper

Kevin Kuo, Adeola Oscar Adeniyi, Henry Hoffmann
⭐NN-Baker: Neural Network-Guided Baker’s Algorithm for Vertex Cover NeurIPS, 2024. paper, code

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan
⭐ GNN-based Generalization for Vertex Cover and Maximum Independent Set ICLR, 2025. paper

Jiale Ma and Wenzheng Pan and Yang Li and Junchi Yan

Virtual Network Embedding

Mon, 07 Jul 2025 00:00:00 +0000

Virtual Network Embedding #

Virtual Network Embedding (VNE) is the problem of mapping virtual network components (nodes and links) onto a physical network infrastructure, optimizing resource utilization and quality of service.

Recent Literature #

Deep Reinforcement Learning for Virtual Network Embedding ACM SIGCOMM, 2020. paper, code

Zhu Ren, Liang Hong, Wei Zhang
GNN-based Reinforcement Learning for Virtual Network Embedding IEEE ICDCS, 2021. paper

Yikang Wang, Zhu Ren, Mingwei Xu, Wei Zhang
Neural Network Assisted Heuristics for Virtual Network Embedding IEEE INFOCOM, 2021. paper

Xiaoming Huo, Shilin Dong, Chen Sun, Yonggang Wen
Graph Reinforcement Learning Based Learning-to-Rank for Node Classification ICDM, 2020. paper

Yupeng Liu, Shuai Zhang, Juncheng Liu, Weiye Li, Shuai Li, Houfeng Wang
Scalable Virtual Network Embedding with Deep Reinforcement Learning IEEE Transactions on Network and Service Management, 2021. paper

Yongmin Choi, Inyoung Kim, Namkyu Park
Machine Learning-Based Resource Allocation for Virtual Network Embedding IEEE Network, 2022. paper

Jun Sun, Shen Su, Shaohua Wan, Qiang Ye
Deep Learning Assisted VNE in Multi-domain Networks IEEE JSAC, 2019. paper

Peng Sun, Mingwei Xu, Yiming Sun
Virtual Network Embedding via Attributed Graph Embeddings and Deep Learning IEEE Access, 2020. paper

Yu Chen, Xiaofeng Zhang, Xiangyang Gong, Jianxin Wang
Accelerating Virtual Network Embedding with Deep Neural Networks IEEE INFOCOM, 2020. paper

Jian Sun, Yangxiu Cui, Yufeng Wang, Tingyu Ma
DRL-based Virtual Network Embedding with Guaranteed Resource Constraints IEEE Transactions on Network and Service Management, 2021. paper

Xuesong Yin, Yong Xia, Zhuo Su
Graph Neural Networks for Virtual Network Embedding IEEE IJCNN, 2021. paper

Jamal Hasan, Mohammed Alreshoodi, Ramin Sadre
Resource Prediction in Virtual Network Embedding using Graph Neural Networks IEEE CLOUDNET, 2021. paper

Jérôme François, Thomas Engel
Virtual Network Embedding: A State-of-the-Art Survey IEEE Communications Surveys & Tutorials, 2020. paper

Nashid Shahriar, Atta ur Rehman Khan, Sanjay P. Deshpande, Reaz Ahmed

A lemma of J. L. Lions

Tue, 24 Jun 2025 00:00:00 +0000

This post explores J. L. Lions’ lemma about Banach spaces with compact injection, including applications to functional analysis.

Lemma statement:

Let $X$, $Y$, and $Z$ be three Banach spaces with norms $|| \cdot ||_X$, $|| \cdot ||_Y$, and $|| \cdot ||_Z$. Assume that $X \subset Y$ with compact injection and that $Y \subset Z$ with continuous injection. Prove that

$$ \forall \varepsilon > 0, \exists C_\varepsilon > 0 \text{ satisfying } || u ||_Y \leq \varepsilon || u ||_X + C _{\varepsilon}|| u ||_Z,\quad \forall u \in X $$

Applications:

Prove that for every $\varepsilon > 0$ there exists $C_\varepsilon > 0$ satisfying

$$ \max_{t \in [0,1]} |u(t)| \leq \varepsilon \max_{t \in [0,1]} |u’(t)| + C_\varepsilon ||u ||_{L^1}, \quad \forall u \in C^1([0,1]). $$

Pick $p > 1$. Prove that for every $\varepsilon > 0$ there exists $C = C(\varepsilon, p)$ such that

$$ || u || _{L^\infty(0,1)} \leq \varepsilon || u || _{W^{1,p}(0,1)} + C || u || _{L^1(0,1)}, \quad \forall u \in W^{1,p}(0,1). $$

Proof:

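For the lemma, argue by contradiction. Suppose the conclusion fails. Then there exist some $\varepsilon_0 > 0$ and a sequence $(u_n)_{n \in \mathbb{Z}^{+}} \subset X$ such that

$$ || u_n ||_Y > \varepsilon_0 || u_n ||_X + n || u_n ||_Z, \quad \forall n \in \mathbb{Z}^{+}. $$

Then $u_n \ne 0, \forall n \in \mathbb{Z}^{+}$.

Let $v_n := \dfrac{u_n}{|| u_n||_X}$

Then clearly, $||v_n||_X = 1$ and we have

$$ ||v_n|| _Y > \varepsilon_0 + n||v_n||_Z $$

Since $X \subset Y$ with compact injection, the bounded sequence $(v_n)$ has a subsequence that converges in $Y$.

Without loss of generality (passing to that subsequence), there is $v \in Y$ such that $|| v_n - v|| _Y \rightarrow 0$ as $n \rightarrow \infty$. In particular, $(||v_n||_Y) _{n \in \mathbb{Z}^{+}}$ is bounded, say by $K$. Since $n||v_n||_Z < ||v_n||_Y \leq K$, it follows that $||v_n||_Z \rightarrow 0$ as $n \rightarrow \infty$.

And because $Y \subset Z$ with continuous injection, we obtain:

$$ ||v_n - v||_Z \rightarrow 0 \quad \text{as} \quad n \rightarrow \infty $$

Then $v = 0$ by uniqueness of the limit in $Z$, and hence $||v_n||_Y \rightarrow 0$ as $n \rightarrow \infty$

On the other hand, $||v_n||_Y > \varepsilon_0 + n||v_n||_Z \geq \varepsilon_0$ for every $n$, so

$$ \lim_{n \rightarrow \infty} ||v_n||_Y \geq \varepsilon_0 $$

Consequently,

$$ 0 \geq \varepsilon_0 > 0 $$ which is a contradiction. This proves the lemma.

The two applications follow from the lemma. For the first, take $X = C^1([0,1])$ with norm $\max |u| + \max |u’|$, $Y = C([0,1])$ and $Z = L^1(0,1)$; the injection $X \subset Y$ is compact by the Arzelà-Ascoli theorem, and the term $\varepsilon \max |u|$ produced by the lemma can be absorbed into the left-hand side for small $\varepsilon$. For the second, take $X = W^{1,p}(0,1)$, $Y = L^\infty(0,1)$ and $Z = L^1(0,1)$; the injection $W^{1,p}(0,1) \subset C([0,1])$ is compact for $p > 1$. The proof is completed.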
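
The first application can also be checked numerically. The Python sketch below assumes the explicit constant $C_\varepsilon = 1/\varepsilon$ for $0 < \varepsilon \leq 1$. That constant is not part of the lemma; it comes from a separate direct argument (compare $u$ at its maximum point with its values on an interval of length $\varepsilon$ around that point, then integrate). The test function and grid size are arbitrary choices.

```python
import math

def check_interpolation(u, du, eps, n=20000):
    """Return (lhs, rhs) for max|u| <= eps*max|u'| + (1/eps)*||u||_{L^1} on [0, 1]."""
    ts = [(i + 0.5) / n for i in range(n)]       # midpoint grid on [0, 1]
    max_u = max(abs(u(t)) for t in ts)
    max_du = max(abs(du(t)) for t in ts)
    l1 = sum(abs(u(t)) for t in ts) / n          # midpoint rule for the L^1 norm
    return max_u, eps * max_du + l1 / eps

# A sharply peaked bump: large sup norm relative to its L^1 norm.
u = lambda t: math.exp(-100.0 * (t - 0.5) ** 2)
du = lambda t: -200.0 * (t - 0.5) * u(t)

results = {eps: check_interpolation(u, du, eps) for eps in (0.05, 0.1, 0.5, 1.0)}
```

Each entry of `results` holds the left-hand and right-hand sides for one value of `eps`. Shrinking `eps` reduces the derivative term but inflates the $L^1$ term, which is the trade-off the lemma encodes.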

Complex Hahn-Banach Theorem

Tue, 24 Jun 2025 00:00:00 +0000

Let $X$ be a complex vector space, $X_0$ one of its subspaces, $p: X \to \mathbb{R}_+$ such that

$$ p(\lambda x) = |\lambda| p(x), \quad \forall \lambda \in \mathbb{C}, x \in X \text{ and } p(x + y) \leq p(x) + p(y), \quad \forall x, y \in X, $$

and let $f: X_0 \to \mathbb{C}$ be a linear functional satisfying $|f(x)| \leq p(x)$, $\forall x \in X_0$.

Under these conditions, there exists a linear functional $F: X \to \mathbb{C}$ such that $F|_{X_0} = f$ and

$$ |F(x)| \leq p(x), \quad \forall x \in X. $$

Proof: Since $f$ is linear, it follows that $\text{Re } f: X_0 \to \mathbb{R}$ is linear and $$ \text{Re } f(x) \leq |f(x)| \leq p(x), \quad \forall x \in X_0. $$

By the Real Hahn-Banach Theorem there exists $g: X \to \mathbb{R}$ a linear functional such that $g$ is an extension for $\text{Re } f$ and $g(x) \leq p(x)$, $\forall x \in X$. We also have $g(x) = -g(-x) \geq -p(x)$ so $|g(x)| \leq p(x)$, $\forall x \in X$.

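Define now $F(x) = g(x) - i g(ix)$, $\forall x \in X$. This is $\mathbb{R}$-linear, and $F(ix) = g(ix) - i g(-x) = g(ix) + i g(x) = i F(x)$, so $F$ is $\mathbb{C}$-linear. If $x \in X_0$, then $ix \in X_0$ as well, and we have $$ F(x) = g(x) - i g(ix) = \text{Re } f(x) - i\, \text{Re } f(ix) = \text{Re } f(x) - i\, \text{Re}(i f(x)) = \text{Re } f(x) + i \text{Im } f(x) = f(x), \quad \forall x \in X_0. $$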

For the last part, fix $x \in X$ and choose $\theta \in \mathbb{R}$ such that $|F(x)| = e^{i\theta} F(x)$. Then $|F(x)| = e^{i\theta} F(x) = F(e^{i\theta} x) = g(e^{i\theta} x)$, because this is a real number and $g = \text{Re } F$. Furthermore, we have $g(e^{i\theta} x) \leq p(e^{i\theta} x) = p(x)$. Combining the two above, we get $$ |F(x)| \leq p(x), \quad \forall x \in X, $$ which proves the theorem.

Real Hahn-Banach Theorem

Tue, 24 Jun 2025 00:00:00 +0000

Suppose $X$ is a vector space over $\mathbb{R}$, $p: X \to \mathbb{R}$ has the following properties:

$p(\lambda x) = \lambda p(x)$, $\forall x \in X$, $\lambda \in \mathbb{R}_+$ and $p(x + y) \leq p(x) + p(y)$, $\forall x, y \in X$.
Let $X_0$ be a subspace of $X$ and $u: X_0 \to \mathbb{R}$ a linear functional such that $u(x) \leq p(x)$, $\forall x \in X_0$.

Then we can find $f: X \to \mathbb{R}$ a linear functional such that $f|_{X_0} = u$ and $f(x) \leq p(x)$, $\forall x \in X$.

Proof: Consider the set $M$ of all pairs $(Y, g)$, where $Y$ is a subspace of $X$ containing $X_0$ and $g: Y \to \mathbb{R}$ is a linear functional which extends $u$ and satisfies $g \leq p$ on $Y$. The set $M$ is nonempty, since $(X_0, u) \in M$.

Define an order relation on $M$ as follows: $(Y_1, g_1) \leq (Y_2, g_2)$ if $Y_1 \subset Y_2$ and $g_2$ is an extension of $g_1$.

We show that in $M$ every chain has an upper bound. Suppose $M_0$ is a totally ordered subset of $M$. Then define $Y_0 = \bigcup_{(Y,g) \in M_0} Y$ and $g_0: Y_0 \to \mathbb{R}$, $g_0(y) = g(y)$ if $y \in Y$ and $(Y, g) \in M_0$. This function is well defined, and $Y_0$ is a subspace of $X$, because the set $M_0$ is totally ordered.

Furthermore, from the definition for $g_0$, we have that $g_0 \leq p$. Therefore $(Y_0, g_0) \in M$, and is obviously an upper bound for $M_0$. By Zorn’s Lemma, we find that $M$ has at least one maximal element $(Z, h)$.

Suppose $X \neq Z$. Then we can find $x_0 \in X \setminus Z$. Define $W = \text{Span}\{Z, x_0\} = \mathbb{R} \cdot x_0 \oplus Z$, which is a linear subspace of $X$. Let $y, z \in Z$. Then $$ h(y) + h(z) = h(y + z) \leq p(y + z) = p(y - x_0 + x_0 + z) \leq p(y - x_0) + p(x_0 + z) $$ Therefore, we have $$ h(y) - p(y - x_0) \leq -h(z) + p(x_0 + z), \quad\forall y, z \in Z $$

Therefore, we can say $$ a = \sup_{y \in Z} (h(y) - p(y - x_0)) \leq \inf_{z \in Z} (-h(z) + p(x_0 + z)) = b $$ Pick one $c \in [a, b]$ and define $h_1(w) = \lambda c + h(y)$, where $w = \lambda x_0 + y$ with $\lambda \in \mathbb{R}$ and $y \in Z$ (unique representation). Then $h_1$ is linear and extends $h$ to $W$, which means that it extends $u$ on $X_0$.

We can check that $h_1 \leq p$ on $W$. For $\lambda > 0$, the inequality $c \leq b$ gives $h_1(\lambda x_0 + y) = \lambda (c + h(y/\lambda)) \leq \lambda p(x_0 + y/\lambda) = p(\lambda x_0 + y)$. For $\lambda < 0$, writing $\mu = -\lambda > 0$, the inequality $c \geq a$ gives $h_1(\lambda x_0 + y) = \mu (h(y/\mu) - c) \leq \mu p(y/\mu - x_0) = p(\lambda x_0 + y)$. Hence $(W, h_1) \in M$ and $(Z, h) \leq (W, h_1)$ with $Z \neq W$, which contradicts the maximality of $(Z, h)$.

Therefore $Z = X$, and the maximal element $h$ is the requested functional.

Riesz Representation Theorem

Tue, 24 Jun 2025 00:00:00 +0000

1. Riesz Representation Theorem #

Let $H$ be a Hilbert space over $\mathbb{R}$ or $\mathbb{C}$, and $T$ be a bounded linear functional on $H$ (a bounded operator from $H$ to the field $\mathbb{R}$ or $\mathbb{C}$, where $H$ is defined over that field). The following is known as the Riesz Representation Theorem:

Theorem 1:

If $T$ is a bounded linear functional on the Hilbert space $H$, then there exists $g \in H$ such that for every $f \in H$, we have: $$ T(f) = \langle f, g \rangle. $$

Moreover, $|T| = |g|$ (here $|T|$ denotes the operator norm of $T$, while $|g|$ is the Hilbert space norm of $g$).

Now, let’s prove this theorem.

Proof:

Assume that $H$ is separable for now. The proof for any Hilbert space is not much more difficult, but the separable case nicely uses ideas we have developed related to Fourier analysis. Additionally, we will work over $\mathbb{R}$.

Since $H$ is separable, we can choose an orthonormal basis $\phi_j$, $j \geq 1$, for $H$. Let $T$ be a bounded linear functional and set $a_j = T(\phi_j)$. For $f \in H$, set $c_j = \langle f, \phi_j \rangle$, and define $$ f_n = \sum_{j=1}^{n} c_j \phi_j. $$

Since the $\phi_j$ form a basis, we know that $|f - f_n| \to 0$ as $n \to \infty$.

Since $T$ is linear, we have: $$ T(f_n) = \sum_{j=1}^{n} a_j c_j. \tag{1} $$

Since $T$ is bounded, with norm $|T| < \infty$, we have: $$ |T(f) - T(f_n)| \leq |T| |f - f_n|. \tag{2} $$

Because $|f - f_n| \to 0$ as $n \to \infty$, we conclude from equations (1) and (2) that: $$ T(f) = \lim_{n\to\infty} T(f_n) = \sum_{j=1}^{\infty} a_j c_j. \tag{3} $$

In fact, the sequence $a_j$ must be square-summable. To see this, first note that since $|T(f)| \leq |T| |f|$, we have: $$ \left|\sum_{j=1}^{\infty} c_j a_j\right| \leq |T| \left(\sum_{j=1}^{\infty} c_j^2\right)^{1/2}. \tag{4} $$

Equation (4) must hold for every square-summable sequence $c_j$ (since any such $c_j$ corresponds to some element in $H$). Fix a positive integer $N$ and define the sequence $c_j = a_j$ for $j \leq N$, $c_j = 0$ for $j > N$. Clearly, such a sequence is square-summable, and equation (4) gives us: $$ \left(\sum_{j=1}^{N} a_j^2\right)^{1/2} \leq |T|. \tag{5} $$

Thus, $a_j$ is square-summable, as the sequence of partial sums is bounded above.

Since $a_j$ is square-summable, the function $g = \sum_{j} a_j \phi_j$ is well-defined as an element of $H$, and $T(f) = \sum_{j} a_j c_j = \langle f, g \rangle$. Finally, equation (5) shows that $|g| \leq |T|$. But from the Cauchy-Schwarz inequality, we also have $|T(f)| = |\langle f, g \rangle| \leq |f| |g|$ or $\frac{|T(f)|}{|f|} \leq |g|$, implying $|T| \leq |g|$, hence $|T| = |g|$. The proof is complete.
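
The construction in the proof can be traced in finite dimensions. The Python sketch below uses $H = \mathbb{R}^4$ with the standard basis, where no convergence questions arise; the particular functional `T` is an arbitrary example chosen for illustration.

```python
import math
import random

def inner(f, g):
    return sum(x * y for x, y in zip(f, g))

def norm(f):
    return math.sqrt(inner(f, f))

# A linear functional on R^4, treated as a black box.
def T(f):
    return 2.0 * f[0] - 1.0 * f[1] + 0.5 * f[3]

n = 4
basis = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
g = [T(phi) for phi in basis]              # a_j = T(phi_j), as in the proof

random.seed(0)
f = [random.uniform(-1, 1) for _ in range(n)]
representation_gap = abs(T(f) - inner(f, g))   # T(f) versus <f, g>

# The operator norm is attained at the unit vector g / |g|.
unit_g = [x / norm(g) for x in g]
op_norm_at_g = T(unit_g)
```

Here `g` is recovered purely from the values of `T` on the basis, and evaluating `T` at `g / |g|` returns `|g|`, matching $|T| = |g|$.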

2. Application to PDE #

This example illustrates how functional analysis methods are used in PDEs (although the example is for an ODE). Consider the ODE: $$ -f’’(x) + b(x)f(x) = q(x) \tag{6} $$

on the interval $0 < x < 1$, with $b(x) \geq \delta > 0$ for some $\delta$; assume the functions $b$ and $q$ are continuous on $[0, 1]$. We want to find a solution to equation (6) with $f’(0) = f’(1) = 0$ (other boundary conditions could also be applied). If we multiply (6) by a $C^1$ function $\phi$, integrate from $x = 0$ to $x = 1$, integrate the first term, $-f’’\phi$, by parts, and use the boundary conditions, we obtain: $$ \int_0^1 (f’(x)\phi’(x) + b(x)f(x)\phi(x))\,dx = \int_0^1 q(x)\phi(x)\,dx. \tag{7} $$

Equation (7) must hold for every $\phi \in C^1([0, 1])$, if $f$ is a $C^2$ solution of equation (6) satisfying the boundary conditions. Conversely, if for a $C^2$ function $f$, we find that (7) holds for every $\phi$, then $f$ must be a solution of equation (6), because if we “undo” the integration by parts in (7), we get: $$ \phi(1)f’(1) - \phi(0)f’(0) + \int_0^1 \phi(x)(-f’’(x) + b(x)f(x))\,dx = \int_0^1 \phi(x)q(x)\,dx $$ for every $\phi$.

A familiar PDE argument then shows that $f’(0) = f’(1) = 0$ and equation (6) must hold.

We will show that there is a unique solution to equation (7). Such a “solution” does not necessarily need to be twice differentiable as required by equation (6), but it will satisfy equation (7). Equation (7) is often called the “weak” form of the problem.

Define an inner product: $$ \langle g, h \rangle = \int_0^1 (g’(x)h’(x) + b(x)g(x)h(x))\,dx $$

on the space $C^1([0, 1])$, and let $H$ denote the completion of this space. This is essentially the procedure used on the third problem of the first exam; the presence of $b(x)$ makes no difference. (Note that we must use $b \geq \delta > 0$ to ensure that $\langle \cdot, \cdot \rangle$ is indeed an inner product, so that $|g| = \sqrt{\langle g, g \rangle} = 0$ if and only if $g \equiv 0$.) The space $H$ is a Hilbert space and can be understood (if needed) as a subspace of $C([0, 1])$.

Define a functional $T : H \to \mathbb{R}$ by: $$ T(\phi) = \int_0^1 q(x)\phi(x)\,dx $$

You can easily check that $T$ is bounded on $H$ (using Cauchy-Schwarz). From the Riesz Representation Theorem, it follows that there must exist a function $f \in H$ such that: $$ T(\phi) = \langle f, \phi \rangle $$

for every $\phi \in H$. This is exactly equation (7), the weak form of the ODE!

The function $f$ satisfying equation (7) lies in $H$. Under the conditions on $b$ (specifically, $b \geq \delta > 0$ and $|b|_\infty < \infty$ since $b \in C([0, 1])$), the function $f$ lies in the same space defined in the third problem of the first exam. Specifically, $f$ is a continuous function. Proving that $f$ is actually twice differentiable requires more work, along with additional assumptions about the function $q$.
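
The weak form (7) also lends itself to computation. The Python sketch below is a Galerkin discretization under stated assumptions: piecewise-linear hat functions on a uniform grid, the trapezoid rule for the $b$ and $q$ integrals, and a manufactured solution $f(x) = \cos(\pi x)$ with $b \equiv 1$ chosen only so the result can be compared with a known answer. The function name `solve_weak_form` is hypothetical.

```python
import math

def solve_weak_form(b, q, n=200):
    """Galerkin solve of (7) with hat functions on a uniform grid of n cells."""
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    # Stiffness part from f' phi', plus trapezoid-rule terms for b f phi and q phi.
    diag = [2.0 / h + h * b(x) for x in xs]
    rhs = [h * q(x) for x in xs]
    for i in (0, n):                        # boundary hats have half the support
        diag[i] = 1.0 / h + 0.5 * h * b(xs[i])
        rhs[i] = 0.5 * h * q(xs[i])
    off = -1.0 / h                          # sub- and super-diagonal entries
    # Thomas algorithm for the symmetric tridiagonal system.
    cp, rp = [0.0] * (n + 1), [0.0] * (n + 1)
    cp[0], rp[0] = off / diag[0], rhs[0] / diag[0]
    for i in range(1, n + 1):
        denom = diag[i] - off * cp[i - 1]
        cp[i] = off / denom
        rp[i] = (rhs[i] - off * rp[i - 1]) / denom
    f = [0.0] * (n + 1)
    f[n] = rp[n]
    for i in range(n - 1, -1, -1):
        f[i] = rp[i] - cp[i] * f[i + 1]
    return xs, f

# Manufactured test case: f(x) = cos(pi x) satisfies f'(0) = f'(1) = 0.
b = lambda x: 1.0
q = lambda x: (math.pi ** 2 + 1.0) * math.cos(math.pi * x)
xs, f = solve_weak_form(b, q)
max_err = max(abs(fi - math.cos(math.pi * x)) for x, fi in zip(xs, f))
```

Note that the boundary conditions $f’(0) = f’(1) = 0$ are never imposed explicitly: they are built into the weak form, which is one practical reason to work with (7) rather than (6).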

References #

[1] (Original) The Riesz Representation Theorem, MA 466, Kurt Bryan

The application of Hahn-Banach Theorem 01

Tue, 24 Jun 2025 00:00:00 +0000

Suppose $X$ is a normed space and $X_0$ is a closed subspace of $X$ and $x_0 \in X \setminus X_0$. Then we can find $f \in X’$ such that $f(x_0) = 1$ and $f(x) = 0$, $\forall x \in X_0$.

Proof: Since $X_0$ is closed and $x_0 \notin X_0$, the distance from $x_0$ to $X_0$ is positive, so we can find $\delta > 0$ such that $|x_0 - x| \geq \delta$, $\forall x \in X_0$, which is equivalent to $1 \leq \dfrac{|x_0 - x|}{\delta}$, $\forall x \in X_0$.

Define $Y = \text{Span}\{x_0, X_0\} = X_0 \oplus \mathbb{K} \cdot x_0$. Then for each $y \in Y$ we can find a unique $\lambda \in \mathbb{K}$ and a unique $x \in X_0$ such that $y = \lambda x_0 + x$. Define $u: Y \to \mathbb{K}$ by $u(y) = u(\lambda x_0 + x) = \lambda$. It is well defined and linear.

Furthermore, since $-\lambda^{-1} x \in X_0$, we have: $$|u(y)| = |\lambda| \leq |\lambda| \frac{|x _0 + \lambda^{-1} x|}{\delta} = \frac{|\lambda x _0 + x|}{\delta} = \frac{1}{\delta} |y| \quad \text{for } \lambda \neq 0$$ If $\lambda = 0$, then $y \in X_0$ and $|u(y)| = 0 \leq \frac{1}{\delta} |y|$.

Therefore, we obtain
$$ |u(y)| \leq \frac{1}{\delta} |y| \quad\forall y \in Y $$ By the Hahn-Banach Theorem, applied with $p(x) = \dfrac{1}{\delta} |x|$, we can extend $u$ to a linear functional $f: X \to \mathbb{K}$ such that $f|_Y = u$ and $|f(x)| \leq \dfrac{1}{\delta} |x|$, $\forall x \in X$. In particular $f$ is continuous, so $f \in X’$. Moreover $f(x_0) = u(x_0) = 1$ and $x \in X_0 \Rightarrow f(x) = u(x) = 0$.

The application of Hahn-Banach Theorem 02

Tue, 24 Jun 2025 00:00:00 +0000

Let $X$ be a Banach space over $\mathbb{K}$ and let $X' = \{ f: X \to \mathbb{K} \}$, where $f$ is linear and continuous. Prove that $X' \neq \{0\}$ whenever $X \neq \{0\}$; in fact, for every $x \in X$ with $x \neq 0$, we can find $f \in X’$ such that $f(x) = |x|$ and $|f| = 1$.

Proof: Pick $x_0 \in X$ with $x_0 \neq 0$. Define $X_0 = x_0 \cdot \mathbb{K}$, a subspace of $X$, and $g: X_0 \to \mathbb{K}$, $g(\lambda x_0) = \lambda |x_0|$, which is linear and satisfies $|g(x)| = |x|$ on $X_0$. Since $g$ and $|\cdot|$ satisfy the conditions of the Hahn-Banach theorem, we can find $f: X \to \mathbb{K}$ such that $f|_{X_0} = g$, $f$ is linear and $|f(x)| \leq |x|$, $\forall x \in X$. Therefore $f(x_0) = g(x_0) = |x_0|$ and $|f| \leq 1$. Since $x_0 \neq 0$, the equality $f(x_0) = |x_0|$ gives $|f| \geq 1$, hence $|f| = 1$.
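
In finite dimensions such a norming functional can be written down explicitly, without invoking Hahn-Banach. The Python sketch below assumes $X = \mathbb{R}^n$ with the $\ell^1$ norm (a choice made only for illustration) and takes $f(y) = \sum_j \operatorname{sign}(x_{0,j})\, y_j$.

```python
import random

def norm1(x):
    return sum(abs(t) for t in x)

def norming_functional(x0):
    """Return f with f(x0) = |x0|_1 and |f(y)| <= |y|_1 for all y."""
    signs = [(t > 0) - (t < 0) for t in x0]   # sign of each coordinate
    return lambda y: sum(s * t for s, t in zip(signs, y))

x0 = [3.0, -4.0, 0.0, 1.5]
f = norming_functional(x0)
value_at_x0 = f(x0)        # equals norm1(x0)

# Spot-check the bound |f(y)| <= |y|_1 on random vectors.
random.seed(0)
samples = [[random.uniform(-5, 5) for _ in x0] for _ in range(1000)]
bound_holds = all(abs(f(y)) <= norm1(y) + 1e-12 for y in samples)
```

The general theorem is needed precisely because, in an arbitrary Banach space, no such explicit formula is available.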

Optimization Papers in JMLR Volume 26

Sun, 29 Sep 2024 00:00:00 +0000

Optimization Research Papers in JMLR Volume 25

Sun, 29 Sep 2024 00:00:00 +0000

Optimization Research Papers in JMLR Volume 25 (2024) #

This document lists papers from JMLR Volume 25 (2024) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.

Convex Optimization #

Papers addressing convex optimization problems, including sparse NMF, differential privacy, and sparse regression.

Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction
Authors: Yuze Han, Guangzeng Xie, Zhihua Zhang
Description: Investigates lower complexity bounds for finite-sum optimization problems in convex settings.
Sparse NMF with Archetypal Regularization: Computational and Robustness Properties
Authors: Kayhan Behdin, Rahul Mazumder
Description: Proposes sparse non-negative matrix factorization with archetypal regularization using convex optimization.
Scaling the Convex Barrier with Sparse Dual Algorithms
Authors: Alessandro De Palma, Harkirat Singh Behl, Rudy Bunel, Philip H.S. Torr, M. Pawan Kumar
Description: Develops sparse dual algorithms for scaling convex optimization problems.
Faster Rates in Differentially Private Stochastic Convex Optimization
Authors: Jinyan Su, Lijie Hu, Di Wang
Description: Analyzes faster convergence rates for differentially private stochastic convex optimization.
Estimation of Sparse Gaussian Graphical Models with Hidden Clustering Structure
Authors: Meixia Lin, Defeng Sun, Kim-Chuan Toh, Chengjing Wang
Description: Develops convex optimization methods for sparse Gaussian graphical models with hidden clustering.
A Minimax Optimal Approach to High-Dimensional Double Sparse Linear Regression
Authors: Yanhang Zhang, Zhifan Li, Shixiang Liu, Jianxin Yin
Description: Proposes a minimax optimal approach for high-dimensional double sparse linear regression using convex optimization.
An Inexact Projected Regularized Newton Method for Fused Zero-Norms Regularization Problems
Authors: Yuqia Wu, Shaohua Pan, Xiaoqi Yang
Description: Introduces an inexact projected regularized Newton method for fused zero-norms regularization in convex optimization.

Nonconvex Optimization #

Papers tackling nonconvex optimization, focusing on ADMM, Adam-family methods, and stochastic minimax optimization.

Convergence for Nonconvex ADMM, with Applications to CT Imaging
Authors: Rina Foygel Barber, Emil Y. Sidky
Description: Studies convergence properties of nonconvex ADMM with applications to CT imaging.
Adam-Family Methods for Nonsmooth Optimization with Convergence Guarantees
Authors: Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh
Description: Develops Adam-family methods for nonsmooth nonconvex optimization with convergence guarantees.
Nonasymptotic Analysis of Stochastic Gradient Hamiltonian Monte Carlo under Local Conditions for Nonconvex Optimization
Authors: O. Deniz Akyildiz, Sotirios Sabanis
Description: Provides a nonasymptotic analysis of stochastic gradient Hamiltonian Monte Carlo for nonconvex optimization.
High Probability Convergence Bounds for Non-Convex Stochastic Gradient Descent with Sub-Weibull Noise
Authors: Liam Madden, Emiliano Dall’Anese, Stephen Becker
Description: Derives high-probability convergence bounds for nonconvex stochastic gradient descent with sub-Weibull noise.
Stochastic Regularized Majorization-Minimization with Weakly Convex and Multi-Convex Surrogates
Authors: Hanbaek Lyu
Description: Proposes stochastic regularized majorization-minimization for weakly convex and multi-convex problems.
Near-Optimal Algorithms for Stochastic Minimax Optimization
Authors: Lesi Chen, Luo Luo
Description: Develops near-optimal algorithms for stochastic minimax optimization in nonconvex settings.
Scaled Conjugate Gradient Method for Nonconvex Optimization in Deep Neural Networks
Authors: Naoki Sato, Koshiro Izumi, Hideaki Iiduka
Description: Introduces a scaled conjugate gradient method for nonconvex optimization in deep neural networks.

Stochastic Optimization #

Papers focusing on stochastic optimization methods, including continuous-time approximations, momentum, and curvature estimates.

A Comparison of Continuous-Time Approximations to Stochastic Gradient Descent
Authors: Stefan Ankirchner, Stefan Perko
Description: Compares continuous-time approximations to stochastic gradient descent for optimization.
On the Generalization of Stochastic Gradient Descent with Momentum
Authors: Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher, Ashish Khisti, Ben Liang
Description: Analyzes the generalization properties of stochastic gradient descent with momentum.
Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent
Authors: Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi
Description: Studies stochastic modified flows and mean-field limits for stochastic gradient descent dynamics.
Stochastic Approximation with Decision-Dependent Distributions: Asymptotic Normality and Optimality
Authors: Joshua Cutler, Mateo Díaz, Dmitriy Drusvyatskiy
Description: Investigates stochastic approximation with decision-dependent distributions, focusing on asymptotic normality and optimality.
An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization
Authors: Guy Kornowski, Ohad Shamir
Description: Proposes an algorithm with optimal dimension-dependence for zero-order nonsmooth nonconvex stochastic optimization.
On the Hyperparameters in Stochastic Gradient Descent with Momentum
Authors: Bin Shi
Description: Examines the impact of hyperparameters in stochastic gradient descent with momentum.
Almost Sure Convergence Rates Analysis and Saddle Avoidance of Stochastic Gradient Methods
Authors: Jun Liu, Ye Yuan
Description: Analyzes almost sure convergence rates and saddle avoidance in stochastic gradient methods.
PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates
Authors: Zachary Frangella, Pratik Rathore, Shipu Zhao, Madeleine Udell
Description: Introduces preconditioned stochastic optimization methods with scalable curvature estimates.
Zeroth-Order Stochastic Approximation Algorithms for DR-Submodular Optimization
Authors: Yuefang Lian, Xiao Wang, Dachuan Xu, Zhongrui Zhao
Description: Develops zeroth-order stochastic approximation algorithms for DR-submodular optimization.
Stochastic-Constrained Stochastic Optimization with Markovian Data
Authors: Yeongjong Kim, Dabeen Lee
Description: Studies stochastic-constrained optimization with Markovian data.
High Probability and Risk-Averse Guarantees for a Stochastic Accelerated Primal-Dual Method
Authors: Yassine Laguel, Necdet Serhat Aybat, Mert Gürbüzbalaban
Description: Provides high-probability and risk-averse guarantees for a stochastic accelerated primal-dual method.

Distributed/Decentralized Optimization #

Papers addressing distributed or decentralized optimization algorithms, focusing on communication efficiency and federated learning.

Distributed Gaussian Mean Estimation under Communication Constraints: Optimal Rates and Communication-Efficient Algorithms
Authors: T. Tony Cai, Hongji Wei
Description: Develops optimal rates and communication-efficient algorithms for distributed Gaussian mean estimation.
Accelerated Gradient Tracking over Time-Varying Graphs for Decentralized Optimization
Authors: Huan Li, Zhouchen Lin
Description: Proposes accelerated gradient tracking for decentralized optimization over time-varying graphs.
Compressed and Distributed Least-Squares Regression: Convergence Rates with Applications to Federated Learning
Authors: Constantin Philippenko, Aymeric Dieuleveut
Description: Analyzes convergence rates for compressed and distributed least-squares regression in federated learning.
Federated Automatic Differentiation
Authors: Keith Rush, Zachary Charles, Zachary Garrett
Description: Introduces federated automatic differentiation for distributed optimization.
A Random Projection Approach to Personalized Federated Learning: Enhancing Communication Efficiency, Robustness, and Fairness
Authors: Yuze Han, Xiang Li, Shiyun Lin, Zhihua Zhang
Description: Proposes a random projection approach to enhance communication efficiency in personalized federated learning.
Countering the Communication Bottleneck in Federated Learning: A Highly Efficient Zero-Order Optimization Technique
Authors: Elissa Mhanna, Mohamad Assaad
Description: Develops a zero-order optimization technique to address communication bottlenecks in federated learning.

Bandits and Online Learning #

Papers addressing multi-armed bandits, online optimization, and regret minimization.

Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment
Authors: Zixian Yang, Xin Liu, Lei Ying
Description: Studies exploration, exploitation, and engagement in multi-armed bandits with abandonment.
Adaptivity and Non-Stationarity: Problem-Dependent Dynamic Regret for Online Convex Optimization
Authors: Peng Zhao, Yu-Jie Zhang, Lijun Zhang, Zhi-Hua Zhou
Description: Analyzes problem-dependent dynamic regret for online convex optimization under non-stationarity.
Materials Discovery Using Max K-Armed Bandit
Authors: Nobuaki Kikkawa, Hiroshi Ohno
Description: Applies max k-armed bandit algorithms to materials discovery, focusing on regret minimization.
Finite-Time Analysis of Globally Nonstationary Multi-Armed Bandits
Authors: Junpei Komiyama, Edouard Fouché, Junya Honda
Description: Provides finite-time analysis for globally nonstationary multi-armed bandits.
Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization
Authors: Sijia Chen, Yu-Jie Zhang, Wei-Wei Tu, Peng Zhao, Lijun Zhang
Description: Develops optimistic online mirror descent for bridging stochastic and adversarial online convex optimization.
Continuous Prediction with Experts’ Advice
Authors: Nicholas J. A. Harvey, Christopher Liaw, Victor S. Portella
Description: Investigates continuous prediction with experts’ advice in online learning settings.
Regret Analysis of Bilateral Trade with a Smoothed Adversary
Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Federico Fusco, Stefano Leonardi
Description: Analyzes regret in bilateral trade with a smoothed adversary in online optimization.
Optimal Learning Policies for Differential Privacy in Multi-Armed Bandits
Authors: Siwei Wang, Jun Zhu
Description: Develops optimal learning policies for differential privacy in multi-armed bandits.
Information Capacity Regret Bounds for Bandits with Mediator Feedback
Authors: Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli
Description: Derives regret bounds for bandits with mediator feedback, focusing on information capacity.
Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression
Authors: Aleksandrs Slivkins, Xingyu Zhou, Karthik Abinav Sankararaman, Dylan J. Foster
Description: Proposes a modular Lagrangian approach for contextual bandits with packing and covering constraints.

Optimization in Reinforcement Learning #

Papers focusing on optimization techniques for reinforcement learning, including policy gradient, actor-critic, and safe RL.

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization
Authors: Shicong Cen, Yuting Wei, Yuejie Chi
Description: Develops fast policy extragradient methods for competitive games with entropy regularization in RL.
Sample-Efficient Adversarial Imitation Learning
Authors: Dahuin Jung, Hyungyu Lee, Sungroh Yoon
Description: Proposes sample-efficient adversarial imitation learning methods for RL optimization.
On the Sample Complexity and Metastability of Heavy-Tailed Policy Search in Continuous Control
Authors: Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel
Description: Analyzes sample complexity and metastability for heavy-tailed policy search in continuous control.
Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning
Authors: Ariyan Bighashdel, Daan de Geus, Pavol Jancura, Gijs Dubbelman
Description: Develops off-policy action anticipation methods for multi-agent RL optimization.
Policy Gradient Methods in the Presence of Symmetries and State Abstractions
Authors: Prakash Panangaden, Sahand Rezaei-Shoshtari, Rosie Zhao, David Meger, Doina Precup
Description: Investigates policy gradient methods with symmetries and state abstractions for RL optimization.
Log Barriers for Safe Black-Box Optimization with Application to Safe Reinforcement Learning
Authors: Ilnura Usmanova, Yarden As, Maryam Kamgarpour, Andreas Krause
Description: Proposes log barriers for safe black-box optimization with applications to safe RL.
Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning
Authors: Jinchi Chen, Jie Feng, Weiguo Gao, Ke Wei
Description: Develops decentralized natural policy gradient with variance reduction for multi-agent RL.
Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity
Authors: Laixi Shi, Yuejie Chi
Description: Studies distributionally robust model-based offline RL with near-optimal sample complexity.
Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
Authors: Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
Description: Analyzes sample complexity of neural policy mirror descent for policy optimization on low-dimensional manifolds.
Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)
Authors: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri
Description: Proposes mean-field approximations for cooperative constrained multi-agent RL optimization.
Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
Authors: Luofeng Liao, Zuyue Fu, Zhuoran Yang, Yixin Wang, Dingli Ma, Mladen Kolar, Zhaoran Wang
Description: Develops instrumental variable value iteration for causal offline RL optimization.
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
Authors: François G. Ged, Maria Han Veiga
Description: Introduces a Matryoshka policy gradient method for entropy-regularized RL with convergence guarantees.
Data-Efficient Policy Evaluation Through Behavior Policy Search
Authors: Josiah P. Hanna, Yash Chandak, Philip S. Thomas, Martha White, Peter Stone, Scott Niekum
Description: Proposes data-efficient policy evaluation methods for RL through behavior policy search.
Empirical Design in Reinforcement Learning
Authors: Andrew Patterson, Samuel Neumann, Martha White, Adam White
Description: Investigates empirical design strategies for optimization in reinforcement learning.
A New, Physics-Informed Continuous-Time Reinforcement Learning Algorithm with Performance Guarantees
Authors: Brent A. Wallace, Jennie Si
Description: Develops a physics-informed continuous-time RL algorithm with performance guarantees.

Ebooks & related papers on Convex Optimization

Mon, 15 Jul 2024 00:00:00 +0000

Ebooks #

Boris Mordukhovich, Nguyen Mau Nam. An Easy Path to Convex Analysis and Applications. 2023
Yurii Nesterov. Lectures on Convex Optimization. 2018
Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. 2015
Dimitri Bertsekas. Nonlinear Programming. 2016
Boris Teodorovich Polyak. Introduction to Optimization. 1987
R. T. Rockafellar. Convex Analysis. 1970
H. H. Bauschke & P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. 2011
Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. 2004

Papers #

Yu. E. Nesterov. A method of solving a convex programming problem with convergence rate $O(1/k^2)$. 1983

Pre-print articles on Adagrad-variant methods

Mon, 15 Jul 2024 00:00:00 +0000

1. Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models #

Authors: Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti

Abstract: Adam has been shown to outperform gradient descent on large language models by a larger margin than on other tasks, but it is unclear why. We show that a key factor in this performance gap is the heavy-tailed class imbalance found in language tasks. When trained with gradient descent, the loss of infrequent words decreases more slowly than the loss of frequent ones. This leads to a slow decrease on the average loss as most samples come from infrequent words. On the other hand, Adam and sign-based methods are less sensitive to this problem. To establish that this behavior is caused by class imbalance, we show empirically that it can be reproduced across architectures and data types, on language transformers, vision CNNs, and linear models. On a linear model with cross-entropy loss, we show that class imbalance leads to imbalanced, correlated gradients and Hessians that have been hypothesized to benefit Adam. We also prove that, in continuous time, gradient descent converges slowly on low-frequency classes while sign descent does not.

2. Accelerated Parameter-Free Stochastic Optimization #

Authors: Itai Kreisler, Maor Ivgi, Oliver Hinder, Yair Carmon

Abstract: We propose a method that achieves near-optimal rates for smooth stochastic convex optimization and requires essentially no prior knowledge of problem parameters. This improves on prior work which requires knowing at least the initial distance to optimality d0. Our method, U-DoG, combines UniXGrad (Kavis et al., 2019) and DoG (Ivgi et al., 2023) with novel iterate stabilization techniques. It requires only loose bounds on d0 and the noise magnitude, provides high probability guarantees under sub-Gaussian noise, and is also near-optimal in the non-smooth case. Our experiments show consistent, strong performance on convex problems and mixed results on neural network training.

3. Universal Gradient Methods for Stochastic Convex Optimization #

Authors: Anton Rodomanov, Ali Kavis, Yongtao Wu, Kimon Antonakopoulos, Volkan Cevher

Abstract: We develop universal gradient methods for Stochastic Convex Optimization (SCO). Our algorithms automatically adapt not only to the oracle’s noise but also to the Hölder smoothness of the objective function without a priori knowledge of the particular setting. The key ingredient is a novel strategy for adjusting step-size coefficients in the Stochastic Gradient Method (SGD). Unlike AdaGrad, which accumulates gradient norms, our Universal Gradient Method accumulates appropriate combinations of gradient- and iterate differences. The resulting algorithm has state-of-the-art worst-case convergence rate guarantees for the entire Hölder class including, in particular, both nonsmooth functions and those with Lipschitz continuous gradient. We also present the Universal Fast Gradient Method for SCO enjoying optimal efficiency estimates.
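The abstract contrasts the proposed method with AdaGrad, which accumulates gradient norms. As a point of reference only, here is a minimal sketch of the scalar AdaGrad-Norm variant of that baseline, not the Universal Gradient Method from the paper. The function name `adagrad_norm`, the parameters `eta` and `b0`, and the quadratic test problem are all assumptions made for illustration.

```python
import math

def adagrad_norm(grad, x0, eta=1.0, b0=1e-8, steps=200):
    """Gradient descent with the AdaGrad-Norm step size eta / sqrt(b0^2 + sum_s ||g_s||^2).

    Illustrative baseline only; names and defaults are hypothetical.
    """
    x = list(x0)
    acc = b0 ** 2  # running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        acc += sum(gi * gi for gi in g)  # accumulate ||g_t||^2
        step = eta / math.sqrt(acc)      # one scalar step size for all coordinates
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical test problem: f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_final = adagrad_norm(lambda x: list(x), [3.0, -4.0])
```

The step size can only shrink, because the accumulator never decreases. The abstract presents accumulating combinations of gradient and iterate differences as the alternative to this rule.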

Pre-print articles on Adaptive Optimization

Mon, 15 Jul 2024 00:00:00 +0000

1. A simple uniformly optimal method without line search for convex optimization #

Authors: Tianjiao Li, Guanghui Lan

Abstract: Line search (or backtracking) procedures have been widely employed into first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., Lipschitz constant). In this paper, we show that line search is superfluous in attaining the optimal rate of convergence for solving a convex optimization problem whose parameters are not given a priori. In particular, we present a novel accelerated gradient descent type algorithm called auto-conditioned fast gradient method (AC-FGM) that can achieve an optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring the estimate of a global Lipschitz constant or the employment of line search procedures. We then extend AC-FGM to solve convex optimization problems with Hölder continuous gradients and show that it automatically achieves the optimal rates of convergence uniformly for all problem classes with the desired accuracy of the solution as the only input. Finally, we report some encouraging numerical results that demonstrate the advantages of AC-FGM over the previously developed parameter-free methods for convex optimization.

Source code: https://github.com/tli432/AC-FGM-Implementation

2. Adaptive Proximal Gradient Method for Convex Optimization #

Authors: Yura Malitsky, Konstantin Mishchenko

Abstract: In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). Our focus is on making these algorithms entirely adaptive by leveraging local curvature information of smooth functions. We propose adaptive versions of GD and ProxGD that are based on observed gradient differences and, thus, have no added computational costs. Moreover, we prove convergence of our methods assuming only local Lipschitzness of the gradient. In addition, the proposed versions allow for even larger stepsizes than those initially suggested in [MM20].

Source code: https://github.com/ymalitsky/AdProxGD

3. An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes #

Authors: Antonio Orvieto, Lin Xiao

Abstract: We consider the problem of minimizing the average of a large number of smooth but possibly non-convex functions. In the context of most machine learning applications, each loss function is non-negative and thus can be expressed as the composition of a square and its real-valued square root. This reformulation allows us to apply the Gauss-Newton method, or the Levenberg-Marquardt method when adding a quadratic regularization. The resulting algorithm, while being computationally as efficient as the vanilla stochastic gradient method, is highly adaptive and can automatically warmup and decay the effective stepsize while tracking the non-negative loss landscape. We provide a tight convergence analysis, leveraging new techniques, in the stochastic convex and non-convex settings. In particular, in the convex case, the method does not require access to the gradient Lipschitz constant for convergence, and is guaranteed to never diverge. The convergence rates and empirical evaluations compare favorably to the classical (stochastic) gradient method as well as to several other adaptive methods.

4. Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance #

Authors: Dimitris Oikonomou, Nicolas Loizou

Abstract: Stochastic gradient descent with momentum, also known as Stochastic Heavy Ball method (SHB), is one of the most popular algorithms for solving large-scale stochastic optimization problems in various machine learning tasks. In practical scenarios, tuning the step-size and momentum parameters of the method is a prohibitively expensive and time-consuming process. In this work, inspired by the recent advantages of stochastic Polyak step-size in the performance of stochastic gradient descent (SGD), we propose and explore new Polyak-type variants suitable for the update rule of the SHB method. In particular, using the Iterate Moving Average (IMA) viewpoint of SHB, we propose and analyze three novel step-size selections: $\text{MomSPS} _{\max}$, $\text{MomDecSPS}$, and $\text{MomAdaSPS}$. For $\text{MomSPS} _{\max}$, we provide convergence guarantees for SHB to a neighborhood of the solution for convex and smooth problems (without assuming interpolation). If interpolation is also satisfied, then using $\text{MomSPS} _{\max}$, SHB converges to the true solution at a fast rate matching the deterministic HB. The other two variants, MomDecSPS and MomAdaSPS, are the first adaptive step-size for SHB that guarantee convergence to the exact minimizer - without a priori knowledge of the problem parameters and without assuming interpolation. Our convergence analysis of SHB is tight and obtains the convergence guarantees of stochastic Polyak step-size for SGD as a special case. We supplement our analysis with experiments validating our theory and demonstrating the effectiveness and robustness of our algorithms.

Where: 13th International Conference on Learning Representations (ICLR 2025)

OpenReview: https://openreview.net/forum?id=nuX2yPejiL
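For readers new to Polyak-type step sizes, the sketch below shows the classical deterministic rule $\gamma = (f(x) - f^\ast) / (c\,\|\nabla f(x)\|^2)$, capped at an upper bound, applied to plain gradient descent without momentum. It is not the MomSPS family from the paper. The helper names, the cap `gamma_max`, the constant `c`, and the quadratic test function are assumptions for illustration, and the optimal value `f_star` is taken as known.

```python
def polyak_step(f_val, f_star, grad, c=1.0, gamma_max=1.0):
    """Capped Polyak step size: min((f(x) - f*) / (c * ||grad||^2), gamma_max)."""
    sq_norm = sum(g * g for g in grad)
    if sq_norm == 0.0:
        return 0.0  # zero gradient: already stationary, take no step
    return min((f_val - f_star) / (c * sq_norm), gamma_max)

def gd_polyak(f, grad_f, x0, f_star=0.0, steps=300):
    """Gradient descent in which every step size comes from polyak_step."""
    x = list(x0)
    for _ in range(steps):
        g = grad_f(x)
        gamma = polyak_step(f(x), f_star, g)
        x = [xi - gamma * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical ill-conditioned quadratic with minimum value f* = 0 at the origin.
f = lambda x: 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)
grad_f = lambda x: [x[0], 10.0 * x[1]]
x_final = gd_polyak(f, grad_f, [1.0, 1.0])
```

No Lipschitz constant is passed in: the step size is computed from the current function value and gradient alone, which is the property the stochastic variants build on.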

Pre-print articles on gradient-clipping methods

Mon, 15 Jul 2024 00:00:00 +0000

1. Why gradient clipping accelerates training: A theoretical justification for adaptivity #

Authors: Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie

Abstract: We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks. The key ingredient is a new smoothness condition derived from practical neural network training examples. We observe that gradient smoothness, a concept central to the analysis of first-order optimization algorithms that is often assumed to be a constant, demonstrates significant variability along the training trajectory of deep neural networks. Further, this smoothness positively correlates with the gradient norm, and contrary to standard assumptions in the literature, it can grow with the norm of the gradient. These empirical observations limit the applicability of existing theoretical analyses of algorithms that rely on a fixed bound on smoothness. These observations motivate us to introduce a novel relaxation of gradient smoothness that is weaker than the commonly used Lipschitz smoothness assumption. Under the new condition, we prove that two popular methods, namely, gradient clipping and normalized gradient, converge arbitrarily faster than gradient descent with fixed stepsize. We further explain why such adaptively scaled gradient methods can accelerate empirical convergence and verify our results empirically in popular neural network training settings.

2. Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees #

Authors: Anastasia Koloskova, Hadrien Hendrikx, Sebastian U. Stich

Abstract: Gradient clipping is a popular modification to standard (stochastic) gradient descent, at every iteration limiting the gradient norm to a certain value $c >0$. It is widely used for example for stabilizing the training of deep learning models (Goodfellow et al., 2016), or for enforcing differential privacy (Abadi et al., 2016). Despite popularity and simplicity of the clipping mechanism, its convergence guarantees often require specific values of c and strong noise assumptions.

In this paper, we give convergence guarantees that show precise dependence on arbitrary clipping thresholds c and show that our guarantees are tight with both deterministic and stochastic gradients. In particular, we show that (i) for deterministic gradient descent, the clipping threshold only affects the higher-order terms of convergence, (ii) in the stochastic setting convergence to the true optimum cannot be guaranteed under the standard noise assumption, even under arbitrary small step-sizes. We give matching upper and lower bounds for convergence of the gradient norm when running clipped SGD, and illustrate these results with experiments.
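The clipping operation described above, limiting the gradient norm to a threshold $c > 0$ at every iteration, can be sketched as follows. This is a minimal deterministic illustration, not the authors' code or experimental setup. The function names, the learning rate `lr`, and the quadratic test problem are assumptions.

```python
import math

def clip_by_norm(g, c):
    """Return g rescaled so that its Euclidean norm is at most c."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm <= c:
        return list(g)  # inside the ball of radius c: leave unchanged
    scale = c / norm
    return [scale * gi for gi in g]

def clipped_gd(grad, x0, lr=0.5, c=1.0, steps=100):
    """Gradient descent in which every gradient is clipped to norm c first."""
    x = list(x0)
    for _ in range(steps):
        g = clip_by_norm(grad(x), c)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical test problem: f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_final = clipped_gd(lambda x: list(x), [3.0, -4.0])
```

On this problem the early iterations move a fixed distance `lr * c` per step because the gradient is clipped. Once the gradient norm drops below `c`, the method coincides with ordinary gradient descent.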

3. Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed #

Authors: Savelii Chezhegov, Yaroslav Klyukin, Andrei Semenov, Aleksandr Beznosikov, Alexander Gasnikov, Samuel Horváth, Martin Takáč, Eduard Gorbunov

Abstract: Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models. Typically, the noise in the stochastic gradients is heavy-tailed for the latter ones. Gradient clipping provably helps to achieve good high-probability convergence for such noises. However, despite the similarity between AdaGrad/Adam and Clip-SGD, the current understanding of the high-probability convergence of AdaGrad/Adam-type methods is limited in this case. In this work, we prove that AdaGrad/Adam (and their delayed version) can have provably bad high-probability convergence if the noise is heavy-tailed. We also show that gradient clipping fixes this issue, i.e., we derive new high-probability convergence bounds with polylogarithmic dependence on the confidence level for AdaGrad-Norm and Adam-Norm with clipping and with/without delay for smooth convex/non-convex stochastic optimization with heavy-tailed noise. Our empirical evaluations highlight the superiority of clipped versions of AdaGrad/Adam-Norm in handling the heavy-tailed noise.

About

Thu, 27 Jun 2024 23:14:15 +0800

My full name is Lê Nhựt Nam. I completed two separate Master's degrees, in Computer Science and in Applied Mathematics, in 2024 and 2025 respectively, at the University of Science, Vietnam National University, HCMC. Prior to my graduate studies, I earned a Bachelor of Science in Computer Science at the same institution. My main area of interest is optimization, especially algorithms and their applications. I also enjoy reading books on partial differential equations.

Free Books on Dynamical Systems

Thu, 27 Jun 2024 23:14:15 +0800

arXiv / Free Books #

1. Lectures on Neural Dynamics - Francesco Bullo #

Chapter 1: Neural circuit models based on firing rates and Hopfield networks: their dynamics, interconnections, and local Hebbian adaptation rules.
Chapter 2: Stability in dynamic neural networks using Lyapunov methods, multistability, and energy functions.
Chapter 3: Optimization in neural networks through biologically inspired gradient dynamics and sparse representations.
Chapter 4: Unsupervised learning via neural dynamics, linking Hebbian rules to tasks like PCA, clustering, and similarity-based representation learning.

2. Linear Geometry and Algebra - Taras Banakh #

Abstract: Linear Geometry studies geometric properties which can be expressed via the notion of a line. All information about lines is encoded in a ternary relation called a line relation. A set endowed with a line relation is called a liner. So, Linear Geometry studies liners. Imposing some additional axioms on a liner, we obtain some special classes of liners: regular, projective, affine, proaffine, etc. Linear Geometry includes Affine and Projective Geometries and is a part of Incidence Geometry. The aim of this book is to present a self-contained logical development of Linear Geometry, starting with some intuitive acceptable geometric axioms and ending with algebraic structures that necessarily arise from studying the structure of geometric objects that satisfy those simple and intuitive geometric axioms. We shall meet many quite exotic algebraic structures that arise this way: magmas, loops, ternary-ring, quasi-fields, alternative rings, procorps, profields, etc. We strongly prefer (synthetic) geometric proofs and use tools of analytic geometry only when no purely geometric proof is available. Linear Geometry has been developed by many great mathematicians since times of Antiquity (Thales, Euclides, Proclus, Pappus), through Renaissance (Descartes, Desargues), Early Modernity (Playfair, Gauss, Lobachevski, Bolyai, Poncelet, Steiner, Möbius), Late Modernity Times (Steinitz, Klein, Hilbert, Moufang, Hessenberg, Jordan, Beltrami, Fano, Gallucci, Veblen, Wedderburn, Lenz, Barlotti) till our contemporaries (Hartshorne, Hall, Buekenhout, Gleason, Kantor, Doyen, Hubault, Dembowski, Klingenberg, Grundhöfer).

3. An introduction to graph theory - Darij Grinberg #

Abstract: This is a graduate-level introduction to graph theory, corresponding to a quarter-long course. It covers simple graphs, multigraphs as well as their directed analogues, and more restrictive classes such as tournaments, trees and arborescences. Among the features discussed are Eulerian circuits, Hamiltonian cycles, spanning trees, the matrix-tree and BEST theorems, proper colorings, Turan’s theorem, bipartite matching and the Menger and Gallai–Milgram theorems. The basics of network flows are introduced in order to prove Hall’s marriage theorem.

4. An introduction to reservoir computing - Michael te Vrugt #

Abstract: There is a growing interest in the development of artificial neural networks that are implemented in a physical system. A major challenge in this context is that these networks are difficult to train since training here would require a change of physical parameters rather than simply of coefficients in a computer program. For this reason, reservoir computing, where one employs high-dimensional recurrent networks and trains only the final layer, is widely used in this context. In this chapter, I introduce the basic concepts of reservoir computing. Moreover, I present some important physical implementations coming from electronics, photonics, spintronics, mechanics, and biology. Finally, I provide a brief discussion of quantum reservoir computing.

5. Nonequilibrium and Irreversibility - Giovanni Gallavotti #

Abstract: The work concentrates on relations, which are general and model independent in chaotic systems, between time averages of a few (typically very few) observables. Equilibrium thermodynamics provides a guide, and here it is argued that the viewpoint of Sinai-Ruelle-Bowen can be regarded as a generalization to nonequilibrium phenomena of the theory of the ensembles, proposing an answer to classical questions such as which distributions describe the statistics of stationary states (hence extending the analysis selecting canonical, or equivalent, distributions in equilibrium among the uncountably many possibilities). The special name “Chaotic Hypothesis” (CH) is given to the above attempt and its mathematical meaning is discussed. General properties are presented and applied (e.g. ‘Fluctuation Theorem’, ‘Fluctuation Patterns’, ‘Pairing Symmetry’) and related to the basic Time Reversal symmetry, which presents irreversibility as due to chaotic motion rather than to viscous forces. The case of a simple incompressible fluid is discussed in some detail. The possibility that CH is violated in various cases is considered, and in the end it is suggested that CH is the paradigm of chaotic evolution, as the harmonic oscillators are a paradigm of ordered motions, but of course tertium datur. The exposition is informal and often restricted to heuristic analysis, with detailed references to the literature and attention to numerical simulations; the importance of stressing discrete models of physics, in an attempt to imitate the vision of Boltzmann, is widely considered.

6. Symmetries of Living Systems: Symmetry Fibrations and Synchronization in Biological Networks - Hernan A. Makse, Paolo Boldi, Francesco Sorrentino, Ian Stewart #

Abstract: A symmetry is a ‘change without change’. As simple as it sounds, this concept is the fundamental cornerstone that unifies all branches of theoretical physics. Virtually all physical laws – ranging from classical mechanics and electrodynamics to relativity, quantum mechanics, and the standard model – can be expressed in terms of symmetry invariances. In this book, we explore whether the same principle can also explain the emergent laws of biological systems. We introduce a new geometry for biological networks and AI architectures, drawing inspiration from the mystic genius of Grothendieck’s fibrations in category theory. We attempt to bridge the gap between physics and biology using symmetries but with a twist. The traditional symmetry groups of physics are global and too rigid to describe biology. Instead, the novel notion of symmetry fibration is local, flexible, and adaptable to evolutionary pressures, providing the right framework for understanding biological complexity. In other words, this more general symmetry invariance is necessary and sufficient to ensure that a given biological network configuration can support a synchronized function. In this book, we review the theoretical progress over the last decades from mathematics, physics, computer science, dynamical systems, and graph theory that has led to the discovery of symmetry fibrations in biological networks. These symmetries act as organizing principles for biological networks. They serve as effective tools for describing the structure of these networks, blending geometry and topology. Fibrations explain how structure dictates function across various biological domains, including the transcriptome, proteome, metabolome, and connectome. Additionally, they facilitate a reduction in the dimensionality of the network, simplifying it into its fundamental building blocks for biological computation.

7. Causal Fermion Systems: An Introduction to Fundamental Structures, Methods and Applications - Felix Finster, Sebastian Kindermann, Jan-Hendrik Treude #

Abstract: This textbook introduces the basic concepts of the theory of causal fermion systems, a recent approach to the description of fundamental physics. The theory yields quantum mechanics, general relativity and quantum field theory as limiting cases and is therefore a candidate for a unified physical theory. From the mathematical perspective, causal fermion systems provide a general framework for describing and analyzing non-smooth geometries and “quantum geometries.” The dynamics is described by a novel variational principle, the causal action principle. The book includes a detailed summary of the mathematical and physical preliminaries. It explains the physical concepts behind the causal fermion system approach from the basics. Moreover, all the mathematical objects and structures are introduced step by step. The mathematical methods used for the analysis of causal fermion systems and the causal action principle are introduced in depth. Many examples and applications are worked out. The textbook is addressed to master and graduate students in mathematics or physics. Furthermore, it serves as a reference work for researchers working in the field.

8. A gentle invitation to the fractional world - Nicola Abatangelo, Serena Dipierro, Enrico Valdinoci #

Abstract: This book is intended as a self-contained introduction to selected topics in the fractional world, focusing particularly on aspects that arise in the study of equations driven by the fractional Laplacian. The scope of this work is not intended to be exhaustive or all-encompassing. We have chosen topics that we believe will appeal to readers embarking on their journey into fractional analysis. It requires only fundamental calculus and a basic understanding of measure theory. In Chapter 1, we introduce the primary object of study, the fractional Laplacian. This operator appears in diverse contexts, prompting multiple definitions and viewpoints, many of which we explore, along with some key identities. A notable distinction between local and nonlocal analysis is that in the latter, explicit calculations are often impractical or impossible. There are anyway some fortunate exceptions which are gathered in Chapter 2, providing useful and instructive examples. Chapter 3 presents an introduction to the important aspect of Liouville-type results. A large portion of this book is devoted to the regularity theory of solutions in Lebesgue spaces. Chapter 4 examines global solutions using Riesz and Bessel potential analysis, capturing the impact of both low and high frequencies on smoothness, decay, and oscillations. These spaces are also flexible enough to provide, as a byproduct, a solid regularity theory in the more commonly used fractional Sobolev spaces. In Chapter 5 we derive the corresponding interior regularity theory for solutions within a bounded domain using appropriate cutoffs and localization techniques. Additionally, technical appendices include auxiliary results used in key proofs.

9. Kinetically constrained models - Ivailo Hartarsky, Cristina Toninelli #

Abstract: The goal of this book is to provide an introduction to the mathematical theory of kinetically constrained models developed in the last twenty years, intended for both mathematicians and physicists.

10. What is Entropy? - John C. Baez #

Abstract: This short book is an elementary course on entropy, leading up to a calculation of the entropy of hydrogen gas at standard temperature and pressure. Topics covered include information, Shannon entropy and Gibbs entropy, the principle of maximum entropy, the Boltzmann distribution, temperature and coolness, the relation between entropy, expected energy and temperature, the equipartition theorem, the partition function, the relation between expected energy, free energy and entropy, the entropy of a classical harmonic oscillator, the entropy of a classical particle in a box, and the entropy of a classical ideal gas.

11. Alice’s Adventures in a Differentiable Wonderland – Volume I, A Tour of the Land - Simone Scardapane #

Abstract: Neural networks surround us, in the form of large language models, speech transcription systems, molecular discovery algorithms, robotics, and much more. Stripped of anything else, neural networks are compositions of differentiable primitives, and studying them means learning how to program and how to interact with these models, a particular example of what is called differentiable programming. This primer is an introduction to this fascinating field imagined for someone, like Alice, who has just ventured into this strange differentiable wonderland. I overview the basics of optimizing a function via automatic differentiation, and a selection of the most common designs for handling sequences, graphs, texts, and audios. The focus is on an intuitive, self-contained introduction to the most important design techniques, including convolutional, attentional, and recurrent blocks, hoping to bridge the gap between theory and code (PyTorch and JAX) and leaving the reader capable of understanding some of the most advanced models out there, such as large language models (LLMs) and multimodal architectures.

12. Inverse Problems and Data Assimilation: A Machine Learning Approach - Eviatar Bach, Ricardo Baptista, Daniel Sanz-Alonso, Andrew Stuart #

Abstract: The aim of these notes is to demonstrate the potential for ideas in machine learning to impact on the fields of inverse problems and data assimilation. The perspective is one that is primarily aimed at researchers from inverse problems and/or data assimilation who wish to see a mathematical presentation of machine learning as it pertains to their fields. As a by-product, we include a succinct mathematical treatment of various topics in machine learning.

13. The Lanczos algorithm for matrix functions: a handbook for scientists - Tyler Chen #

Abstract: Lanczos-based methods have become standard tools for tasks involving matrix functions. Progress on these algorithms has been driven by several largely disjoint communities, resulting in many innovative and important advancements which would not have been possible otherwise. However, this also has resulted in a somewhat fragmented state of knowledge and the propagation of a number of incorrect beliefs about the behavior of Lanczos-based methods in finite precision arithmetic. This monograph aims to provide an accessible introduction to Lanczos-based methods for matrix functions. The intended audience is scientists outside of numerical analysis, graduate students, and researchers wishing to begin work in this area. Our emphasis is on conceptual understanding, with the goal of providing a starting point to learn more about the remarkable behavior of the Lanczos algorithm. Hopefully readers will come away from this text with a better understanding of how to think about Lanczos for modern problems involving matrix functions, particularly in the context of finite precision arithmetic.

14. New Book: Tensor Decompositions for Data Science #

Abstract: This book is intended for a graduate-level course in a data-science domain such as mathematics, computer science, engineering, statistics, physics, neuroscience, etc. It is written so that it can be used flexibly. It can be adapted for a subunit in a longer class or can stand on its own in a full semester course. We include substantial background material in linear algebra, optimization, and probability and statistics in the hopes of making the contents widely accessible. The book includes links to several real-world datasets to be used as examples for experiments in the book, grounding the material and providing a playground for student experimentation.

15. Calculus and applications - Teo Banica #

Abstract: This is an introduction to calculus, and its applications to basic questions from physics. We first discuss the theory of functions $f:\mathbb R\to\mathbb R$, with the notion of continuity, and the construction of the derivative $f'(x)$ and of the integral $\int_a^b f(x)\,dx$. Then we investigate the case of the complex functions $f:\mathbb C\to\mathbb C$, and notably the holomorphic functions, and harmonic functions. Then, we discuss the multivariable functions, $f:\mathbb R^N\to\mathbb R^M$ or $f:\mathbb R^N\to\mathbb C^M$ or $f:\mathbb C^N\to\mathbb C^M$, with general theory, integration results, maximization questions, and basic applications to physics.

16. Stochastic Partial Differential Equations, Space-time White Noise and Random Fields - Robert C. Dalang, Marta Sanz-Solé #

Abstract: This book is an introduction to the theory of stochastic partial differential equations (SPDEs), using the random field approach pioneered by J.B. Walsh (1986). The volume consists of two blocks: the core matter (Chapters 1 to 5) and the appendices (A, B and C). Chapter 1 introduces the subject, with a discussion of isonormal Gaussian processes, space-time white noise, and motivating examples of SPDEs. Chapter 2 presents a theory of stochastic integration with respect to space-time white noise. Chapter 3 deals with SPDEs with additive noise. In Chapter 4, we study a general class of SPDEs, in which additive and multiplicative nonlinearities appear. In Chapter 5, we present a selection of important topics in the theory of SPDEs, that have been the subject of much research over the last twenty years. Appendix A summarises the main results from the theory of stochastic processes and stochastic analysis that are used throughout the book. Appendix B is devoted to a systematic presentation of properties of fundamental solutions and Green’s functions associated to the classical linear differential operators (heat, fractional heat and wave operators). Appendix C is a toolbox section. Each chapter is followed by a “Notes” section, which gives historically important references, original sources and points towards other related important contributions.

17. Dynamic Programming: Finite States - Thomas J. Sargent, John Stachurski #

Abstract: This book is about dynamic programming and its applications in economics, finance, and adjacent fields. It brings together recent innovations in the theory of dynamic programming and provides applications and code that can help readers approach the research frontier. The book is aimed at graduate students and researchers, although most chapters are accessible to undergraduate students with solid quantitative backgrounds.

18. Resources of the Quantum World - Gilad Gour #

Abstract: This book delves into the burgeoning field of quantum resource theories, a novel and vibrant area of research within quantum information science that seeks to unify diverse quantum phenomena under a single framework. By recognizing various attributes of physical systems as “resources,” this approach offers a fresh perspective on quantum phenomena, transforming our understanding and application of concepts such as quantum entanglement, coherence, and more. With a focus on the pedagogical, the book aims to equip readers with the advanced mathematical tools and physical principles needed to navigate and contribute to this rapidly evolving field. It covers a wide range of topics, from the foundational aspects of quantum mechanics and quantum information to detailed explorations of specific resource theories, including entanglement, asymmetry, and thermodynamics. Through rigorous mathematical exposition and a unique axiomatic approach, the book provides deep insights into the operational and conceptual frameworks that underpin quantum resource theories, making it an invaluable resource for graduate students, early-career researchers, and anyone interested in the cutting-edge developments in quantum information science.

19. Funktionalanalysis Teil I - Christoph Bock #

Abstract: Roughly speaking, functional analysis is the study of the category of infinite-dimensional vector spaces over the field of real or complex numbers, together with their linear maps. In most cases, one further needs a topological structure on such a vector space, because one can then consider the continuous linear maps between such spaces. The name functional analysis is due to the fact that, in the early days of the theory, the authors wanted to extend calculus to functionals on spaces of functions. Functional-analytic results make it possible to solve problems in the theory of (partial) differential equations, in complex analysis, or in quantum mechanics. But the aim of these lines is not to explain the applications. We will discuss the mathematical theory of almost metric spaces, normed vector spaces and algebras, spaces of continuous and of $p$-integrable functions, as well as reflexive and uniformly convex spaces.

We added that, in the case $p \in {]}0,1{[}$, $L^p$ is the completion of the compactly supported continuous functions (with the obvious metric), too. Actually, the proof is the same as in the case $p \in {[}1, \infty{[}$.

20. Algebraic Topology for Data Scientists - Michael S. Postol #

Abstract: This book gives a thorough introduction to topological data analysis (TDA), the application of algebraic topology to data science. Algebraic topology is traditionally a very specialized field of math, and most mathematicians have never been exposed to it, let alone data scientists, computer scientists, and analysts. I have three goals in writing this book. The first is to bring people up to speed who are missing a lot of the necessary background. I will describe the topics in point-set topology, abstract algebra, and homology theory needed for a good understanding of TDA. The second is to explain TDA and some current applications and techniques. Finally, I would like to answer some questions about more advanced topics such as cohomology, homotopy, obstruction theory, and Steenrod squares, and what they can tell us about data. It is hoped that readers will acquire the tools to start to think about these topics and where they might fit in.

21. Discrete and Continuous Weak KAM Theory: an introduction through examples and its applications to twist maps - Maxime Zavidovique #

Abstract: The aim of these notes is to present a self-contained account of discrete weak KAM theory. Setting aside the intrinsic elegance of this theory, it is also a toy model for classical weak KAM theory, where many technical difficulties disappear, but where central ideas and results persist. It can therefore serve as a good introduction to (continuous) weak KAM theory. After an exposition of the general abstract theory, several examples are studied. The last section is devoted to the historical problem of conservative twist maps of the annulus. At the end of the first three chapters, the relations between the results proved in the discrete setting and the analogous theorems of classical weak KAM theory are discussed. Some key differences are also highlighted between the discrete and classical theory. Those results are new. The text also contains other results never published before, such as the convergence of solutions of discounted equations for degenerate perturbations.

Mathematics Books

Thu, 27 Jun 2024 23:14:15 +0800

Mathematics Lecture Notes

Thu, 27 Jun 2024 23:14:15 +0800

Mathematics MOOCS

Thu, 27 Jun 2024 23:14:15 +0800

Pre-print articles on Difference-of-Convex (DC) Programming

Thu, 27 Jun 2024 23:14:15 +0800

57. Stochastic Difference-of-Convex Optimization with Momentum #

Authors: El Mahdi Chayti, Martin Jaggi

Abstract: Stochastic difference-of-convex (DC) optimization is prevalent in numerous machine learning applications, yet its convergence properties under small batch sizes remain poorly understood. Existing methods typically require large batches or strong noise assumptions, which limit their practical use. In this work, we show that momentum enables convergence under standard smoothness and bounded variance assumptions (of the concave part) for any batch size. We prove that without momentum, convergence may fail regardless of stepsize, highlighting its necessity. Our momentum-based algorithm achieves provable convergence and demonstrates strong empirical performance.

URL: https://arxiv.org/abs/2510.17503

56. On the convergence rate of the boosted Difference-of-Convex Algorithm (DCA) #

Authors: Hadi Abbaszadehpeivasti, Etienne de Klerk, Adrien Taylor

Abstract: The difference-of-convex algorithm (DCA) is a well-established nonlinear programming technique that solves successive convex optimization problems. These sub-problems are obtained from the difference-of-convex (DC) decompositions of the objective and constraint functions. We investigate the worst-case performance of the unconstrained DCA, with and without boosting, where boosting simply performs an additional step in the direction generated by the usual DCA method. We show that, for certain classes of DC decompositions, the boosted DCA is provably better in the worst-case than the usual DCA. While several numerical studies have reported that boosted DCA outperforms classical DCA, a theoretical explanation for this behavior has, to the best of our knowledge, not been given until now. Our proof technique relies on semidefinite programming (SDP) performance estimation.

URL: https://arxiv.org/abs/2510.16569
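
As a reading aid for the DCA entries in this list, here is a minimal sketch of the iteration described in the abstract above, on a toy one-dimensional problem. Everything in it is an illustrative assumption rather than the paper's setup: the objective $f(x) = x^4/4 - x^2/2$, the split $g(x) = x^4/4$, $h(x) = x^2/2$, and the parameter names `alpha`, `lam0`, `beta`. The boosting step shown is one common backtracking form (an extra move along the direction $d = y - x$ produced by the plain DCA step), which may differ from the exact variant analyzed by the authors.

```python
import math


def f(x):
    # Toy DC objective f = g - h with g(x) = x**4 / 4 and h(x) = x**2 / 2 (both convex).
    return x**4 / 4 - x**2 / 2


def dca_step(x):
    # Linearize h at x (its gradient is x) and solve the convex subproblem
    #   min_y  y**4 / 4 - x * y,
    # whose optimality condition y**3 = x gives a closed-form real cube root.
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def dca(x0, iters=50, boosted=False, alpha=0.1, lam0=1.0, beta=0.5):
    # Plain DCA, optionally followed by a backtracking step along d = y - x.
    # alpha, lam0 and beta are hypothetical line-search parameters.
    x = x0
    for _ in range(iters):
        y = dca_step(x)
        if boosted:
            d = y - x
            lam = lam0
            # Shrink lam until the sufficient-decrease test holds (or give up).
            while lam > 1e-12 and f(y + lam * d) > f(y) - alpha * lam**2 * d**2:
                lam *= beta
            if lam > 1e-12:
                y = y + lam * d
        x = y
    return x


if __name__ == "__main__":
    print("plain DCA:  ", dca(0.1))
    print("boosted DCA:", dca(0.1, boosted=True))
```

On this toy problem the critical points are $x = \pm 1$ and $x = 0$; starting from a positive point, both variants move toward $x = 1$.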

55. Global solution algorithms for DC programming via polyhedral approximations of convex functions #

Authors: Fahaar M. Pirani & Firdevs Ulus

Abstract: We consider difference of convex (DC) programming problems and propose three algorithms to solve them globally. The main working mechanism of the proposed algorithms is to generate polyhedral underestimators to convex functions. Two of these algorithms generate a ‘fine’ polyhedral approximation of the first convex component over the compact feasible region of the DC programming problem. We prove the finiteness of these algorithms and establish the convergence rate of one of them. Moreover, we show that using the polyhedral approximation of the first component, it is possible to compute an approximate global solution of the corresponding DC programming problem without further computational effort. The third algorithm also computes a polyhedral underestimator of the first component of the DC function. Different from the first two algorithms, the third algorithm approximates it locally until finding an approximate global solution to the DC programming problem. It is shown that for any positive approximation error, the third algorithm stops after finitely many iterations. Computational results based on some test instances from the literature are provided.

URL: https://link.springer.com/article/10.1007/s10898-025-01535-z

54. Improved Rates for Stochastic Variance-Reduced Difference-of-Convex Algorithms #

Authors: Anh Duc Nguyen, Alp Yurtsever, Suvrit Sra, Kim-Chuan Toh

Abstract: In this work, we propose and analyze DCA-PAGE, a novel algorithm that integrates the difference-of-convex algorithm (DCA) with the ProbAbilistic Gradient Estimator (PAGE) to solve structured nonsmooth difference-of-convex programs. In the finite-sum setting, our method achieves a gradient computation complexity of $O(N + N^{1/2}\varepsilon^{-2})$ with sample size $N$, surpassing the previous best-known complexity of $O(N + N^{2/3}\varepsilon^{-2})$ for stochastic variance-reduced (SVR) DCA methods. Furthermore, DCA-PAGE readily extends to online settings with a similar optimal gradient computation complexity $O(b + b^{1/2}\varepsilon^{-2})$ with batch size $b$, a significant advantage over existing SVR DCA approaches that only work for the finite-sum setting. We further refine our analysis with a gap function, which enables us to obtain comparable convergence guarantees under milder assumptions.

Comment: Accepted at IEEE Conference on Decision and Control (IEEE CDC 2025)

URL: https://arxiv.org/pdf/2509.11657

53. New Algorithms for maximizing the difference of convex functions #

Authors: Aharon Ben-Tal, Luba Tetruashvili

Abstract: Maximizing the difference of two convex functions over a convex feasible set (the so-called DCA problem) is a hard problem. There is a large number of publications addressing this problem. Many of them are variations of the widely used DCA algorithm [20]. The success of this algorithm in reaching a good approximation of a global optimum depends crucially on the choice of its starting point. In the algorithm developed in our paper, MDCF (Maximizing the Difference of Convex Functions), a major effort is to generate a good starting point. This is obtained by using the COMAX algorithm for maximizing a convex function [6]. The solution found by COMAX is a basis for obtaining a good starting point for MDCF. Another contribution of the paper is the algorithm for solving problems with an indefinite quadratic objective function and compact and convex feasible set. The problem is first converted to maximizing a difference of convex quadratic functions. The new algorithm QMDCF is a specific adaptation of MDCF to this case. The performance of the two new algorithms developed in the paper is tested numerically, and results are compared to the performance of classical DCA, and some other algorithms.

URL: https://optimization-online.org/2025/04/new-algorithms-for-maximizing-the-difference-of-convex-functions/

52. A progressive decoupling algorithm for minimizing the difference of convex and weakly convex functions #

Authors: Welington de Oliveira & João Carlos de Oliveira Souza

Abstract: Commonly, decomposition and splitting techniques for optimization problems strongly depend on convexity. Implementable splitting methods for nonconvex and nonsmooth optimization problems are scarce and often lack convergence guarantees. Among the few exceptions is the Progressive Decoupling Algorithm (PDA), which has local convergence should convexity be elicitable. In this work, we furnish PDA with a descent test and extend the method to accommodate a broad class of nonsmooth optimization problems with non-elicitable convexity. More precisely, we focus on the problem of minimizing the difference of convex and weakly convex functions over a linear subspace. This framework covers, in particular, a family of stochastic programs with nonconvex recourse and statistical estimation problems for supervised learning.

URL: https://link.springer.com/article/10.1007/s10957-024-02574-4

51. An Inexact Proximal Framework for Nonsmooth Riemannian Difference-of-Convex Optimization [arXiv:2509.08561] #

Authors: Bo Jiang, Meng Xu, Xingju Cai, Ya-Feng Liu

Abstract: Nonsmooth Riemannian optimization has attracted increasing attention, especially in problems with sparse structures. While existing formulations typically involve convex nonsmooth terms, incorporating nonsmooth difference-of-convex (DC) penalties can enhance recovery accuracy. In this paper, we study a class of nonsmooth Riemannian optimization problems whose objective is the sum of a smooth function and a nonsmooth DC term. We establish, for the first time in the manifold setting, the equivalence between such DC formulations (with suitably chosen nonsmooth DC terms) and their $\ell_0$-regularized or $\ell_0$-constrained counterparts. To solve these problems, we propose an inexact Riemannian proximal DC (iRPDC) algorithmic framework, which returns an $\epsilon$-Riemannian critical point within $\mathcal{O}(\epsilon^{-2})$ outer iterations. Within this framework, we develop several practical algorithms based on different subproblem solvers. Among them, one achieves an overall iteration complexity of $\mathcal{O}(\epsilon^{-3})$, which matches the best-known bound in the literature. In contrast, existing algorithms either lack provable overall complexity or require $\mathcal{O}(\epsilon^{-3})$ iterations in both outer and overall complexity. A notable feature of the iRPDC algorithmic framework is a novel inexactness criterion that not only enables efficient subproblem solutions via first-order methods but also facilitates a linesearch procedure that adaptively captures the local curvature. Numerical results on sparse principal component analysis demonstrate the modeling flexibility of the DC formulation and the competitive performance of the proposed algorithmic framework.

URL: https://arxiv.org/abs/2509.08561

50. Tight Convergence Rates in Gradient Mapping for the Difference-of-Convex Algorithm [arXiv:2506.01791] #

Authors: Teodor Rotaru, Panagiotis Patrinos, François Glineur

Abstract: We establish new theoretical convergence guarantees for the difference-of-convex algorithm (DCA), where the second function is allowed to be weakly-convex, measuring progress via composite gradient mapping. Based on a tight analysis of two iterations of DCA, we identify six parameter regimes leading to sublinear convergence rates toward critical points and establish those rates by proving adapted descent lemmas. We recover existing rates for the standard difference-of-convex decompositions of nonconvex-nonconcave functions, while for all other curvature settings our results are new, complementing recently obtained rates on the gradient residual. Three of our sublinear rates are tight for any number of DCA iterations, while for the other three regimes we conjecture exact rates, using insights from the tight analysis of gradient descent and numerical validation using the performance estimation methodology. Finally, we show how the equivalence between proximal gradient descent (PGD) and DCA allows the derivation of exact PGD rates for any constant stepsize.

URL: https://arxiv.org/abs/2506.01791

49. Enforcing Fairness Where It Matters: An Approach Based on Difference-of-Convex Constraints [arXiv:2505.12530] #

Authors: Yutian He, Yankun Huang, Yao Yao, Qihang Lin

Abstract: Fairness in machine learning has become a critical concern, particularly in high-stakes applications. Existing approaches often focus on achieving full fairness across all score ranges generated by predictive models, ensuring fairness in both high and low-scoring populations. However, this stringent requirement can compromise predictive performance and may not align with the practical fairness concerns of stakeholders. In this work, we propose a novel framework for building partially fair machine learning models, which enforce fairness within a specific score range of interest, such as the middle range where decisions are most contested, while maintaining flexibility in other regions. We introduce two statistical metrics to rigorously evaluate partial fairness within a given score range, such as the top 20%-40% of scores. To achieve partial fairness, we propose an in-processing method by formulating the model training problem as constrained optimization with difference-of-convex constraints, which can be solved by an inexact difference-of-convex algorithm (IDCA). We provide the complexity analysis of IDCA for finding a nearly KKT point. Through numerical experiments on real-world datasets, we demonstrate that our framework achieves high predictive performance while enforcing partial fairness where it matters most.

URL: https://arxiv.org/abs/2505.12530

48. A smoothing moving balls approximation method for a class of conic-constrained difference-of-convex optimization problems [arXiv:2505.12314] #

Authors: Jiefeng Xu, Ting Kei Pong, Nung-sing Sze

Abstract: In this paper, we consider the problem of minimizing a difference-of-convex objective over a nonlinear conic constraint, where the cone is closed, convex, pointed and has a nonempty interior. We assume that the support function of a compact base of the polar cone exhibits a majorizing smoothing approximation, a condition that is satisfied by widely studied cones such as $\mathbb{R}^m_-$ and ${\cal S}^m_-$. Leveraging this condition, we reformulate the conic constraint equivalently as a single constraint involving the aforementioned support function, and adapt the moving balls approximation (MBA) method for its solution. In essence, in each iteration of our algorithm, we approximate the support function by a smooth approximation function and apply one MBA step. The subproblems that arise in our algorithm always involve only one single inequality constraint, and can thus be solved efficiently via one-dimensional root-finding procedures. We design explicit rules to evolve the smooth approximation functions from iteration to iteration and establish the corresponding iteration complexity for obtaining an $\epsilon$-Karush-Kuhn-Tucker point. In addition, in the convex setting, we establish convergence of the sequence generated, and study its local convergence rate under a standard Hölderian growth condition. Finally, we illustrate numerically the effects of different rules of evolving the smooth approximation functions on the rate of convergence.

URL: https://arxiv.org/abs/2505.12314

47. A preconditioned difference of convex functions algorithm with extrapolation and line search [arXiv:2505.11914] #

Authors: Ran Zhang, Hongpeng Sun

Abstract: This paper proposes a novel proximal difference-of-convex (DC) algorithm enhanced with extrapolation and aggressive non-monotone line search for solving non-convex optimization problems. We introduce an adaptive conservative update strategy of the extrapolation parameter determined by a computationally efficient non-monotone line search. The core of our algorithm is to unite the update of the extrapolation parameter with the step size of the non-monotone line search interactively. The global convergence of the two proposed algorithms is established through the Kurdyka-Łojasiewicz properties, ensuring convergence within a preconditioned framework for linear equations. Numerical experiments on two general non-convex problems: SCAD-penalized binary classification and graph-based Ginzburg-Landau image segmentation models, demonstrate the proposed method’s high efficiency compared to existing DC algorithms both in convergence rate and solution accuracy.

URL: https://arxiv.org/abs/2505.11914

46. Contractive difference-of-convex algorithms [arXiv:2505.10800] #

Authors: Songnian He, Qiao-Li Dong, Michael Th. Rassias

Abstract: The difference-of-convex algorithm (DCA) and its variants are the most popular methods to solve the difference-of-convex optimization problem. Each iteration of them is reduced to a convex optimization problem, which generally needs to be solved by iterative methods such as proximal gradient algorithm. However, these algorithms essentially belong to some iterative methods of fixed point problems of averaged mappings, and their convergence speed is generally slow. Furthermore, there is seldom research on the termination rule of these iterative algorithms solving the subproblem of DCA. To overcome these defects, we firstly show that the subproblem of the linearized proximal method (LPM) in each iteration is equal to the fixed point problem of a contraction. Secondly, by using Picard iteration to approximately solve the subproblem of LPM in each iteration, we propose a contractive difference-of-convex algorithm (cDCA) where an adaptive termination rule is presented. Both global subsequential convergence and global convergence of the whole sequence of cDCA are established. Finally, preliminary results from numerical experiments are promising.

URL: https://link.springer.com/article/10.1007/s10957-025-02689-2

Journal: Journal of Optimization Theory and Applications

45. A full splitting algorithm for structured difference-of-convex programs [arXiv:2505.02588] #

Authors: Radu Ioan Bot, Rossen Nenov, Min Tao

Abstract: In this paper, we study a class of nonconvex and nonsmooth structured difference-of-convex (DC) programs, which contain in the convex part the sum of a nonsmooth linearly composed convex function and a differentiable function, and in the concave part another nonsmooth linearly composed convex function. Among the various areas in which such problems occur, we would like to mention in particular the recovery of sparse signals. We propose an adaptive double-proximal, full-splitting algorithm with a moving center approach in the final subproblem, which addresses the challenge of evaluating compositions by decoupling the linear operator from the nonsmooth component. We establish the subsequential convergence of the generated sequence of iterates to an approximate stationary point and prove its global convergence under the Kurdyka-Łojasiewicz property. We also discuss the tightness of the convergence results and provide insights into the rationale for seeking an approximate KKT point. This is illustrated by constructing a counterexample showing that the algorithm can diverge when seeking exact solutions. Finally, we present a practical version of the algorithm that incorporates a nonmonotone line search, which significantly improves the convergence performance.

URL: https://arxiv.org/abs/2505.02588

44. Optimization over Trained Neural Networks: Difference-of-Convex Algorithm and Application to Data Center Scheduling [arXiv:2503.17506] #

Authors: Xinwei Liu, Vladimir Dvorkin

Abstract: When solving decision-making problems with mathematical optimization, some constraints or objectives may lack analytic expressions but can be approximated from data. When an approximation is made by neural networks, the underlying problem becomes optimization over trained neural networks. Despite recent improvements with cutting planes, relaxations, and heuristics, the problem remains difficult to solve in practice. We propose a new solution based on a bilinear problem reformulation that penalizes ReLU constraints in the objective function. This reformulation makes the problem amenable to efficient difference-of-convex algorithms (DCA), for which we propose a principled approach to penalty selection that facilitates convergence to stationary points of the original problem. We apply the DCA to the problem of the least-cost allocation of data center electricity demand in a power grid, reporting significant savings in congested cases.

URL: https://arxiv.org/abs/2503.17506

43. Tight Analysis of Difference-of-Convex Algorithm (DCA) Improves Convergence Rates for Proximal Gradient Descent [arXiv:2503.04486] #

Authors: Teodor Rotaru, Panagiotis Patrinos, François Glineur

Abstract: We investigate a difference-of-convex (DC) formulation where the second term is allowed to be weakly convex. We examine the precise behavior of a single iteration of the difference-of-convex algorithm (DCA), providing a tight characterization of the objective function decrease, distinguishing between six distinct parameter regimes. Our proofs, inspired by the performance estimation framework, are notably simplified compared to related prior research. We subsequently derive sublinear convergence rates for the DCA towards critical points, assuming at least one of the functions is smooth. Additionally, we explore the underexamined equivalence between proximal gradient descent (PGD) and DCA iterations, demonstrating how DCA, a parameter-free algorithm, without the need for a stepsize, serves as a tool for studying the exact convergence rates of PGD.

URL:
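
The PGD/DCA equivalence mentioned in the abstract above can be sketched in a few lines. The setting here is an illustrative assumption, not necessarily the paper's exact one: $F = f + g$ with $f$ differentiable and $L$-smooth, $g$ convex, and a step size $\gamma \le 1/L$. Write $F$ as a difference of two functions,
$$F = \varphi_1 - \varphi_2, \qquad \varphi_1 = g + \tfrac{1}{2\gamma}\|x\|^2, \qquad \varphi_2 = \tfrac{1}{2\gamma}\|x\|^2 - f,$$
where $\varphi_2$ is convex because $\tfrac{L}{2}\|x\|^2 - f$ is convex for $L$-smooth $f$. One DCA step linearizes $\varphi_2$ at $x_k$, using $\nabla\varphi_2(x_k) = x_k/\gamma - \nabla f(x_k)$, and completing the square gives
$$x_{k+1} = \arg\min_x \Big\{ g(x) + \tfrac{1}{2\gamma}\|x\|^2 - \big\langle \tfrac{1}{\gamma}x_k - \nabla f(x_k),\, x \big\rangle \Big\} = \arg\min_x \Big\{ g(x) + \tfrac{1}{2\gamma}\big\|x - \big(x_k - \gamma\nabla f(x_k)\big)\big\|^2 \Big\} = \operatorname{prox}_{\gamma g}\big(x_k - \gamma\nabla f(x_k)\big),$$
which is exactly the proximal gradient update. For step sizes $\gamma > 1/L$ the function $\varphi_2$ is in general only weakly convex, which is presumably why the abstract allows a weakly convex second term.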

42. Abstract nonautonomous difference inclusions in locally convex spaces [arXiv:2502.05184] #

Authors: Marko Kostic

Abstract: In this paper, we consider abstract nonautonomous difference inclusions in locally convex spaces with integer order differences. We particularly analyze the existence and uniqueness of almost periodic type solutions to abstract nonautonomous difference inclusions. Our results seem to be completely new even in the Banach space setting.

URL:

41. Learning Difference-of-Convex Regularizers for Inverse Problems: A Flexible Framework with Theoretical Guarantees [arXiv:2502.00240] #

Authors: Yasi Zhang, Oscar Leong

Abstract: Learning effective regularization is crucial for solving ill-posed inverse problems, which arise in a wide range of scientific and engineering applications. While data-driven methods that parameterize regularizers using deep neural networks have demonstrated strong empirical performance, they often result in highly nonconvex formulations that lack theoretical guarantees. Recent work has shown that incorporating structured nonconvexity into neural network-based regularizers, such as weak convexity, can strike a balance between empirical performance and theoretical tractability. In this paper, we demonstrate that a broader class of nonconvex functions, difference-of-convex (DC) functions, can yield improved empirical performance while retaining strong convergence guarantees. The DC structure enables the use of well-established optimization algorithms, such as the Difference-of-Convex Algorithm (DCA) and a Proximal Subgradient Method (PSM), which extend beyond standard gradient descent. Furthermore, we provide theoretical insights into the conditions under which optimal regularizers can be expressed as DC functions. Extensive experiments on computed tomography (CT) reconstruction tasks show that our approach achieves strong performance across sparse and limited-view settings, consistently outperforming other weakly supervised learned regularizers. Our code is available at https://github.com/YasminZhang/ADCR.

URL:

40. An Inexact Boosted Difference of Convex Algorithm for Nondifferentiable Functions [arXiv:2412.05697] #

Authors: Orizon P. Ferreira, Boris S. Mordukhovich, Wilkreffy M. S. Santos, João Carlos O. Souza

Abstract: In this paper, we introduce an inexact approach to the Boosted Difference of Convex Functions Algorithm (BDCA) for solving nonconvex and nondifferentiable problems involving the difference of two convex functions (DC functions). Specifically, when the first DC component is differentiable and the second may be nondifferentiable, BDCA utilizes the solution from the subproblem of the DC Algorithm (DCA) to define a descent direction for the objective function. A monotone linesearch is then performed to find a new point that improves the objective function relative to the subproblem solution. This approach enhances the performance of DCA. However, if the first DC component is nondifferentiable, the BDCA direction may become an ascent direction, rendering the monotone linesearch ineffective. To address this, we propose an Inexact nonmonotone Boosted Difference of Convex Algorithm (InmBDCA). This algorithm incorporates two main features of inexactness: First, the subproblem therein is solved approximately, allowing for a controlled relative error tolerance in defining the linesearch direction. Second, an inexact nonmonotone linesearch scheme is used to determine the step size for the next iteration. Under suitable assumptions, we demonstrate that InmBDCA is well-defined, with any accumulation point of the sequence generated by InmBDCA being a critical point of the problem. We also provide iteration-complexity bounds for the algorithm. Numerical experiments show that InmBDCA outperforms both the nonsmooth BDCA (nmBDCA) and the monotone version of DCA in practical scenarios.

URL:
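
To make the boosting idea in the abstract above concrete, here is a minimal sketch of the plain monotone BDCA in the easy case where the first DC component is differentiable. It is not the paper's InmBDCA (no inexact subproblems, no nonmonotone line search). The toy objective $f(x) = x^4 - 2x^2$ with $g(x) = x^4$ and $h(x) = 2x^2$, the function name `dc_solve`, and the parameters `alpha` and `tol` are all assumptions chosen for illustration.

```python
import numpy as np

def f(x):
    # Toy DC objective: f = g - h with g(x) = x**4 and h(x) = 2*x**2 (both convex).
    return x**4 - 2.0 * x**2

def dc_solve(x0, boost, alpha=0.1, tol=1e-10, max_iter=200):
    """Run DCA (boost=False) or monotone BDCA (boost=True) on the toy problem.

    Returns the final point and the number of iterations used.
    """
    x = float(x0)
    for k in range(1, max_iter + 1):
        # DCA subproblem: argmin_y y**4 - h'(x) * y with h'(x) = 4x,
        # i.e. 4 y**3 = 4 x, so y = cbrt(x) in closed form.
        y = float(np.cbrt(x))
        d = y - x  # BDCA search direction
        if abs(d) < tol:
            return y, k
        # Backtracking line search from y along d (skipped for plain DCA).
        lam = 1.0 if boost else 0.0
        while lam > 1e-12 and f(y + lam * d) > f(y) - alpha * lam**2 * d**2:
            lam *= 0.5
        if lam <= 1e-12:
            lam = 0.0  # fall back to the plain DCA point
        x = y + lam * d
    return x, max_iter

x_bdca, iters_bdca = dc_solve(0.5, boost=True)
x_dca, iters_dca = dc_solve(0.5, boost=False)
print("BDCA:", x_bdca, iters_bdca)
print("DCA: ", x_dca, iters_dca)
```

Both variants should reach the minimizer $x = 1$ on this toy problem, with the boosted variant needing fewer iterations. The line search only ever accepts points that are at least as good as the DCA point, so the objective values stay monotone.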

39. A preconditioned second-order convex splitting algorithm with a difference of varying convex functions and line search [arXiv:2411.07661] #

Authors: Xinhua Shen, Zaijiu Shang, Hongpeng Sun

Abstract: This paper introduces a preconditioned convex splitting algorithm enhanced with line search techniques for nonconvex optimization problems. The algorithm utilizes second-order backward differentiation formulas (BDF) for the implicit and linear components and the Adams-Bashforth scheme for the nonlinear and explicit parts of the gradient flow in variational functions. The proposed algorithm, resembling a generalized difference-of-convex-function approach, involves a changing set of convex functions in each iteration. It integrates the Armijo line search strategy to improve performance. The study also discusses classical preconditioners such as symmetric Gauss-Seidel, Jacobi, and Richardson within this context. The global convergence of the algorithm is established through the Kurdyka-Łojasiewicz properties, ensuring convergence within a finite number of preconditioned iterations. Numerical experiments demonstrate the superiority of the proposed second-order convex splitting with line search over conventional difference-of-convex-function algorithms.

URL:

38. Inertial Proximal Difference-of-Convex Algorithm with Convergent Bregman Plug-and-Play for Nonconvex Imaging [arXiv:2409.03262] #

Authors: Tsz Ching Chow, Chaoyan Huang, Zhongming Wu, Tieyong Zeng, Angelica I. Aviles-Rivero

Abstract: Imaging tasks are typically tackled using a structured optimization framework. This paper delves into a class of algorithms for difference-of-convex (DC) structured optimization, focusing on minimizing a DC function along with a possibly nonconvex function. Existing DC algorithm (DCA) versions often fail to effectively handle nonconvex functions or exhibit slow convergence rates. We propose a novel inertial proximal DC algorithm in Bregman geometry, named iBPDCA, designed to address nonconvex terms and enhance convergence speed through inertial techniques. We provide a detailed theoretical analysis, establishing both subsequential and global convergence of iBPDCA via the Kurdyka-Łojasiewicz property. Additionally, we introduce a Plug-and-Play variant, PnP-iBPDCA, which employs a deep neural network-based prior for greater flexibility and robustness while ensuring theoretical convergence. We also establish that the Gaussian gradient step denoiser used in our method is equivalent to evaluating the Bregman proximal operator for an implicitly weakly convex functional. We extensively validate our method on Rician noise and phase retrieval. We demonstrate that iBPDCA surpasses existing state-of-the-art methods.

URL:

37. Constructing Tight Quadratic Relaxations for Global Optimization: II. Underestimating Difference-of-Convex (D.C.) Functions [arXiv:2408.13058] #

Authors: William R. Strahl, Arvind U. Raghunathan, Nikolaos V. Sahinidis, Chrysanthos E. Gounaris

Abstract: Recent advances in the efficiency and robustness of algorithms solving convex quadratically constrained quadratic programming (QCQP) problems motivate developing techniques for creating convex quadratic relaxations that, although more expensive to compute, provide tighter bounds than their classical linear counterparts. In the first part of this two-paper series [Strahl et al., 2024], we developed a cutting plane algorithm to construct convex quadratic underestimators for twice-differentiable convex functions, which we extend here to address the case of non-convex difference-of-convex (d.c.) functions as well. Furthermore, we generalize our approach to consider a hierarchy of quadratic forms, thereby allowing the construction of even tighter underestimators. On a set of d.c. functions extracted from benchmark libraries, we demonstrate noteworthy reduction in the hypervolume between our quadratic underestimators and linear ones constructed at the same points. Additionally, we construct convex QCQP relaxations at the root node of a spatial branch-and-bound tree for a set of systematically created d.c. optimization problems in up to four dimensions, and we show that our relaxations reduce the gap between the lower bound computed by the state-of-the-art global optimization solver BARON and the optimal solution by an excess of 90%, on average.

URL:

36. Distributed Difference of Convex Optimization [arXiv:2407.16728] #

Authors: Vivek Khatana, Murti V. Salapaka

Abstract: In this article, we focus on solving a class of distributed optimization problems involving $n$ agents with the local objective function at every agent $i$ given by the difference of two convex functions $f_i$ and $g_i$ (difference-of-convex (DC) form), where $f_i$ and $g_i$ are potentially nonsmooth. The agents communicate via a directed graph containing $n$ nodes. We create smooth approximations of the functions $f_i$ and $g_i$ and develop a distributed algorithm utilizing the gradients of the smooth surrogates and a finite-time approximate consensus protocol. We term this algorithm as DDC-Consensus. The developed DDC-Consensus algorithm allows for non-symmetric directed graph topologies and can be synthesized distributively. We establish that the DDC-Consensus algorithm converges to a stationary point of the nonconvex distributed optimization problem. The performance of the DDC-Consensus algorithm is evaluated via a simulation study to solve a nonconvex DC-regularized distributed least squares problem. The numerical results corroborate the efficacy of the proposed algorithm.

URL:

35. An Inexact Bregman Proximal Difference-of-Convex Algorithm with Two Types of Relative Stopping Criteria [arXiv:2406.04646] #

Authors: Lei Yang, Jingjing Hu, Kim-Chuan Toh

Abstract: In this paper, we consider a class of difference-of-convex (DC) optimization problems, which require only a weaker restricted $L$-smooth adaptable property on the smooth part of the objective function, instead of the standard global Lipschitz gradient continuity assumption. Such problems are prevalent in many contemporary applications such as compressed sensing, statistical regression, and machine learning, and can be solved by a general Bregman proximal DC algorithm (BPDCA). However, the existing BPDCA is developed based on the stringent requirement that the involved subproblems must be solved exactly, which is often impractical and limits the applicability of the BPDCA. To facilitate the practical implementations and wider applications of the BPDCA, we develop an inexact Bregman proximal difference-of-convex algorithm (iBPDCA) by incorporating two types of relative-type stopping criteria for solving the subproblems. The proposed inexact framework has considerable flexibility to encompass many existing exact and inexact methods, and can accommodate different types of errors that may occur when solving the subproblem. This enables the potential application of our inexact framework across different DC decompositions to facilitate the design of a more efficient DCA scheme in practice. The global subsequential convergence and the global sequential convergence of our iBPDCA are established under suitable conditions including the Kurdyka-Łojasiewicz property. Some numerical experiments are conducted to show the superior performance of our iBPDCA in comparison to existing algorithms. These results also empirically validate the necessity and significance of developing different types of stopping criteria to facilitate the efficient computation of the subproblem in each iteration of our iBPDCA.

URL:

34. Single-Loop Stochastic Algorithms for Difference of Max-Structured Weakly Convex Functions [arXiv:2405.18577] #

Authors: Quanqi Hu, Qi Qi, Zhaosong Lu, Tianbao Yang

Abstract: In this paper, we study a class of non-smooth non-convex problems in the form of $\min_{x}[\max_{y\in Y}φ(x, y) - \max_{z\in Z}ψ(x, z)]$, where both $Φ(x) = \max_{y\in Y}φ(x, y)$ and $Ψ(x)=\max_{z\in Z}ψ(x, z)$ are weakly convex functions, and $φ(x, y), ψ(x, z)$ are strongly concave functions in terms of $y$ and $z$, respectively. It covers two families of problems that have been studied but are missing single-loop stochastic algorithms, i.e., difference of weakly convex functions and weakly convex strongly-concave min-max problems. We propose a stochastic Moreau envelope approximate gradient method dubbed SMAG, the first single-loop algorithm for solving these problems, and provide a state-of-the-art non-asymptotic convergence rate. The key idea of the design is to compute an approximate gradient of the Moreau envelopes of $Φ, Ψ$ using only one step of stochastic gradient update of the primal and dual variables. Empirically, we conduct experiments on positive-unlabeled (PU) learning and partial area under ROC curve (pAUC) optimization with an adversarial fairness regularizer to validate the effectiveness of our proposed algorithms.

URL:

33. Improved convergence rates for the Difference-of-Convex algorithm [arXiv:2403.16864] #

Authors: Teodor Rotaru, Panagiotis Patrinos, François Glineur

Abstract: We consider a difference-of-convex formulation where one of the terms is allowed to be hypoconvex (or weakly convex). We first examine the precise behavior of a single iteration of the Difference-of-Convex algorithm (DCA), giving a tight characterization of the objective function decrease. This requires distinguishing between eight distinct parameter regimes. Our proofs are inspired by the performance estimation framework, but are much simplified compared to similar previous work. We then derive sublinear DCA convergence rates towards critical points, distinguishing between cases where at least one of the functions is smooth and where both functions are nonsmooth. We conjecture the tightness of these rates for four parameter regimes, based on strong numerical evidence obtained via performance estimation, as well as the leading constant in the asymptotic sublinear rate for two more regimes.

URL:

32. An Efficient Difference-of-Convex Solver for Privacy Funnel [arXiv:2403.04778] #

Authors: Teng-Hui Huang, Hesham El Gamal

Abstract: We propose an efficient solver for the privacy funnel (PF) method, leveraging its difference-of-convex (DC) structure. The proposed DC separation results in a closed-form update equation, which allows straightforward application to both known and unknown distribution settings. For the known distribution case, we prove the convergence (to local stationary points) of the proposed non-greedy solver, and empirically show that it outperforms the state-of-the-art approaches in characterizing the privacy-utility trade-off. The insights of our DC approach apply to unknown distribution settings where labeled empirical samples are available instead. Leveraging these insights, our alternating minimization solver satisfies the fundamental Markov relation of PF, in contrast to previous variational inference-based solvers. Empirically, we evaluate the proposed solver with the MNIST and Fashion-MNIST datasets. Our results show that, under a comparable reconstruction quality, an adversary suffers from higher prediction error when clustering our compressed codes than with the compared methods. Most importantly, our solver is independent of private information in the inference phase, contrary to the baselines.

URL:

31. Approximation analysis for the minimization problem of difference-of-convex functions with Moreau envelopes [arXiv:2402.13461] #

Authors: Yan Tang, Shiqing Zhang

Abstract: In this work, the minimization problem for difference of convex (DC) functions is studied by using Moreau envelopes, and the descent method with the Moreau gradient is employed to approximate the numerical solution. The main regularization idea in this work is inspired by Hiriart-Urruty [14] and Moudafi [17]: the components of the DC problem are regularized by flexibly adapting different parameters and strategic matrices to evaluate the whole DC problem. It is shown that the inertial gradient method as well as the classic gradient descent scheme tend towards an approximate stationary point of the original problem.

URL:

30. The Boosted Difference of Convex Functions Algorithm for Value-at-Risk Constrained Portfolio Optimization [arXiv:2402.09194] #

Authors: Marah-Lisanne Thormann, Phan Tu Vuong, Alain B. Zemkoho

Abstract: A highly relevant problem of modern finance is the design of Value-at-Risk (VaR) optimal portfolios. Due to contemporary financial regulations, banks and other financial institutions are tied to use the risk measure to control their credit, market and operational risks. For a portfolio with a discrete return distribution and finitely many scenarios, a Difference of Convex (DC) functions representation of the VaR can be derived. Wozabal (2012) showed that this yields a solution to a VaR constrained Markowitz style portfolio selection problem using the Difference of Convex Functions Algorithm (DCA). A recent algorithmic extension is the so-called Boosted Difference of Convex Functions Algorithm (BDCA) which accelerates the convergence due to an additional line search step. It has been shown that the BDCA converges linearly for solving non-smooth quadratic problems with linear inequality constraints. In this paper, we prove that the linear rate of convergence is also guaranteed for a piecewise linear objective function with linear equality and inequality constraints using the Kurdyka-Łojasiewicz property. An extended case study under consideration of best practices for comparing optimization algorithms demonstrates the superiority of the BDCA over the DCA for real-world financial market data. We are able to show that the results of the BDCA are significantly closer to the efficient frontier compared to the DCA. Due to the open availability of all data sets and code, this paper further provides a practical guide for transparent and easily reproducible comparisons of VaR constrained portfolio selection problems in Python.

URL:

29. A Globally Convergent Algorithm for Neural Network Parameter Optimization Based on Difference-of-Convex Functions [arXiv:2401.07936] #

Authors: Daniel Tschernutter, Mathias Kraus, Stefan Feuerriegel

Abstract: We propose an algorithm for optimizing the parameters of single hidden layer neural networks. Specifically, we derive a blockwise difference-of-convex (DC) functions representation of the objective function. Based on the latter, we propose a block coordinate descent (BCD) approach that we combine with a tailored difference-of-convex functions algorithm (DCA). We prove global convergence of the proposed algorithm. Furthermore, we mathematically analyze the convergence rate of parameters and the convergence rate in value (i.e., the training loss). We give conditions under which our algorithm converges linearly or even faster depending on the local shape of the loss function. We confirm our theoretical derivations numerically and compare our algorithm against state-of-the-art gradient-based solvers in terms of both training loss and test loss.

URL:

28. Higher-order tensor methods for minimizing difference of convex functions [arXiv:2401.05063] #

Authors: Ion Necoara

Abstract: Higher-order tensor methods were recently proposed for minimizing smooth convex and nonconvex functions. Higher-order algorithms accelerate the convergence of the classical first-order methods thanks to the higher-order derivatives used in the updates. The purpose of this paper is twofold. Firstly, to show that the higher-order algorithmic framework can be generalized and successfully applied to (nonsmooth) difference of convex functions, namely, those that can be expressed as the difference of two smooth convex functions and a possibly nonsmooth convex one. We also provide examples when the subproblem can be solved efficiently, even globally. Secondly, to derive a complete convergence analysis for our higher-order difference of convex functions (HO-DC) algorithm. In particular, we prove that any limit point of the HO-DC iterative sequence is a critical point of the problem under consideration, the corresponding objective value is monotonically decreasing and the minimum value of the norms of its subgradients converges globally to zero at a sublinear rate. The sublinear or linear convergence rates of the iterations are obtained under the Kurdyka-Lojasiewicz property.

URL:

27. Handling nonlinearities and uncertainties of fed-batch cultivations with difference of convex functions tube MPC [arXiv:2312.00847] #

Authors: Niels Krausch, Martin Doff-Sotta, Mark Canon, Peter Neubauer, Mariano Nicolas Cruz Bournazou

Abstract: Bioprocesses are often characterized by nonlinear and uncertain dynamics. This poses particular challenges in the context of model predictive control (MPC). Several approaches have been proposed to solve this problem, such as robust or stochastic MPC, but they can be computationally expensive when the system is nonlinear. Recent advances in optimal control theory have shown that concepts from convex optimization, tube-based MPC, and difference of convex functions (DC) enable stable and robust online process control. The approach is based on systematic DC decompositions of the dynamics and successive linearizations around feasible trajectories. By convexity, the linearization errors can be bounded tightly and treated as bounded disturbances in a robust tube-based MPC framework. However, finding the DC composition can be a difficult task. To overcome this problem, we used a neural network with special convex structure to learn the dynamics in DC form and express the uncertainty sets using simplices to maximize the product formation rate of a cultivation with uncertain substrate concentration in the feed. The results show that this is a promising approach for computationally tractable data-driven robust MPC of bioprocesses.

URL:

26. A qualitative difference between gradient flows of convex functions in finite- and infinite-dimensional Hilbert spaces [arXiv:2310.17610] #

Authors: Jonathan W. Siegel, Stephan Wojtowytsch

Abstract: We consider gradient flow/gradient descent and heavy ball/accelerated gradient descent optimization for convex objective functions. In the gradient flow case, we prove the following:

If $f$ does not have a minimizer, the convergence $f(x_t)\to \inf f$ can be arbitrarily slow.
If $f$ does have a minimizer, the excess energy $f(x_t) - \inf f$ is integrable/summable in time. In particular, $f(x_t) - \inf f = o(1/t)$ as $t\to\infty$.
In Hilbert spaces, this is optimal: $f(x_t) - \inf f$ can decay to $0$ as slowly as any given function which is monotone decreasing and integrable at $\infty$, even for a fixed quadratic objective.
In finite dimension (or more generally, for all gradient flow curves of finite length), this is not optimal: We prove that there are convex monotone decreasing integrable functions $g(t)$ which decrease to zero slower than $f(x_t)-\inf f$ for the gradient flow of any convex function on $\mathbb R^d$. For instance, we show that any gradient flow $x_t$ of a convex function $f$ in finite dimension satisfies $\liminf_{t\to\infty} \big(t\cdot \log^2(t)\cdot \big\{f(x_t) -\inf f\big\}\big)=0$. This improves on the commonly reported $O(1/t)$ rate and provides a sharp characterization of the energy decay law. We also note that it is impossible to establish a rate $O(1/(tφ(t)))$ for any function $φ$ which satisfies $\lim_{t\to\infty}φ(t) = \infty$, even asymptotically. Similar results are obtained in related settings for (1) discrete time gradient descent, (2) stochastic gradient descent with multiplicative noise and (3) the heavy ball ODE. In the case of stochastic gradient descent, the summability of $\mathbb E[f(x_n) - \inf f]$ is used to prove that $f(x_n)\to \inf f$ almost surely - an improvement on the convergence almost surely up to a subsequence which follows from the $O(1/n)$ decay estimate.

URL:

25. Large Convex sets in Difference sets [arXiv:2309.07527] #

Authors: Krishnendu Bhowmick, Ben Lund, Oliver Roche-Newton

Abstract: We give a construction of a convex set $A \subset \mathbb R$ with cardinality $n$ such that $A-A$ contains a convex subset with cardinality $Ω(n^2)$. We also consider the following variant of this problem: given a convex set $A$, what is the size of the largest matching $M \subset A \times A$ such that the set $\{ a-b : (a,b) \in M \}$ is convex? We prove that there always exists such an $M$ with $|M| \geq \sqrt n$, and that this lower bound is best possible, up to a multiplicative constant.

URL:

24. Moreau Envelope Based Difference-of-weakly-Convex Reformulation and Algorithm for Bilevel Programs [arXiv:2306.16761] #

Authors: Lucy L. Gao, Jane J. Ye, Haian Yin, Shangzhi Zeng, Jin Zhang

Abstract: Bilevel programming has emerged as a valuable tool for hyperparameter selection, a central concern in machine learning. In a recent study by Ye et al. (2023), a value function-based difference of convex algorithm was introduced to address bilevel programs. This approach proves particularly powerful when dealing with scenarios where the lower-level problem exhibits convexity in both the upper-level and lower-level variables. Examples of such scenarios include support vector machines and $\ell_1$ and $\ell_2$ regularized regression. In this paper, we significantly expand the range of applications, now requiring convexity only in the lower-level variables of the lower-level program. We present an innovative single-level difference of weakly convex reformulation based on the Moreau envelope of the lower-level problem. We further develop a sequentially convergent Inexact Proximal Difference of Weakly Convex Algorithm (iP-DwCA). To evaluate the effectiveness of the proposed iP-DwCA, we conduct numerical experiments focused on tuning hyperparameters for kernel support vector machines on simulated data.

URL:

23. Generalized Graph Signal Sampling by Difference-of-Convex Optimization [arXiv:2306.14634] #

Authors: Keitaro Yamashita, Kazuki Naganuma, Shunsuke Ono

Abstract: We propose a method for designing a flexible sampling operator for graph signals via a difference-of-convex (DC) optimization algorithm. A fundamental challenge in graph signal processing is sampling, especially for graph signals that are not bandlimited. In order to sample beyond bandlimited graph signals, there are studies that extend generalized sampling theory to the graph setting. Vertex-wise sampling and flexible sampling are the two main strategies for sampling graph signals. The recovery accuracy of existing vertex-wise sampling methods is highly dependent on the specific vertices selected to generate a sampled graph signal, which may compromise the accuracy, especially when noise is generated at the vertices. In contrast, flexible sampling mixes values at multiple vertices to generate a sampled signal for robust sampling; however, existing flexible sampling methods impose strict assumptions and aggressive relaxations. To address these limitations, we aim to design a flexible sampling operator without such strict assumptions and aggressive relaxations by introducing DC optimization. By formulating the problem of designing a flexible sampling operator as a DC optimization problem, our method ensures robust sampling for graph signals under arbitrary priors based on generalized sampling theory. We develop an efficient solver based on the general double-proximal gradient DC algorithm, which guarantees convergence to a critical point. Experimental results demonstrate the superiority of our method in sampling and recovering beyond bandlimited graph signals compared to existing approaches.

URL:

22. A globally convergent difference-of-convex algorithmic framework and application to log-determinant optimization problems [arXiv:2306.02001] #

Authors: Chaorui Yao, Xin Jiang

Abstract: The difference-of-convex algorithm (DCA) is a conceptually simple method for the minimization of (possibly) nonconvex functions that are expressed as the difference of two convex functions. At each iteration, DCA constructs a global overestimator of the objective and solves the resulting convex subproblem. Despite its conceptual simplicity, the theoretical understanding and algorithmic framework of DCA needs further investigation. In this paper, global convergence of DCA at a linear rate is established under an extended Polyak–Łojasiewicz condition. The proposed condition holds for a class of DC programs with a bounded, closed, and convex constraint set, for which global convergence of DCA cannot be covered by existing analyses. Moreover, the DCProx computational framework is proposed, in which the DCA subproblems are solved by a primal–dual proximal algorithm with Bregman distances. With a suitable choice of Bregman distances, DCProx has simple update rules with cheap per-iteration complexity. As an application, DCA is applied to several fundamental problems in network information theory, for which no existing numerical methods are able to compute the global optimum. For these problems, our analysis proves the global convergence of DCA, and more importantly, DCProx solves the DCA subproblems efficiently. Numerical experiments are conducted to verify the efficiency of DCProx.

URL:
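
The DCA iteration described in the abstract above (linearize the subtracted convex part, then minimize the resulting convex overestimator) can be illustrated on a one-dimensional toy problem. This is a minimal sketch, not the paper's DCProx framework: the objective $f(x) = x^4 - 2x^2$, its split into $g(x) = x^4$ and $h(x) = 2x^2$, and the names `dca`, `tol` and `max_iter` are assumptions made for illustration. The split is chosen so that the convex subproblem has a closed-form solution.

```python
import numpy as np

def f(x):
    # Toy DC objective: f = g - h with g(x) = x**4 and h(x) = 2*x**2 (both convex).
    return x**4 - 2.0 * x**2

def dca(x0, tol=1e-10, max_iter=200):
    """Plain DCA on the toy problem; returns the final point and objective history."""
    x = float(x0)
    history = [f(x)]
    for _ in range(max_iter):
        slope = 4.0 * x  # h'(x_k)
        # Convex subproblem: argmin_x g(x) - slope * x.
        # Optimality condition 4 x**3 = slope gives x = cbrt(slope / 4).
        x_next = float(np.cbrt(slope / 4.0))
        history.append(f(x_next))
        converged = abs(x_next - x) < tol
        x = x_next
        if converged:
            break
    return x, history

x_star, values = dca(0.5)
print(x_star, values[-1])
```

Because $g(x) - h(x_k) - h'(x_k)(x - x_k)$ overestimates $f$ everywhere and touches it at $x_k$, each step cannot increase the objective. DCA only guarantees critical points in general: started at $x_0 = 0$, this sketch stays at $0$, which is a critical point of $f$ but not a minimizer.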

21. A property of strictly convex functions which differ from each other by a constant on the boundary of their domain [arXiv:2305.12183] #

Authors: Biagio Ricceri

Abstract: In this paper, in particular, we prove the following result: Let $E$ be a reflexive real Banach space and let $C\subset E$ be a closed convex set, with non-empty interior, whose boundary is sequentially weakly closed and non-convex. Then, for every function $\varphi:\partial C\to {\bf R}$ and for every convex set $S\subseteq E^*$ dense in $E^*$, there exists $\tilde{γ} \in S$ having the following property: for every strictly convex lower semicontinuous function $J:C \to {\bf R}$, Gâteaux differentiable in $\hbox{int}(C)$, such that $J_{\mid\partial C}-\varphi$ is constant in $\partial C$ and $\lim_{\|x\|\to +\infty}{{J(x)}\over {\|x\|}} = +\infty$ if $C$ is unbounded, $\tilde{γ}$ is an algebraically interior point of $J'(\hbox{int}(C))$ (with respect to $E^*$).

URL:

20. Local Differences Determined by Convex sets [arXiv:2304.00888] #

Authors: Krishnendu Bhowmick, Miriam Patry, Oliver Roche-Newton

Abstract: This paper introduces a new problem concerning additive properties of convex sets. Let $S= \{s_1 < \dots <s_n \}$ be a set of real numbers and let $D_i(S)= \{s_x-s_y: 1 \leq x-y \leq i\}$. We expect that $D_i(S)$ is large, with respect to the size of $S$ and the parameter $i$, for any convex set $S$. We give a construction to show that $D_3(S)$ can be as small as $n+2$, and show that this is the smallest possible size. On the other hand, we use an elementary argument to prove a non-trivial lower bound for $D_4(S)$, namely $|D_4(S)| \geq \frac{5}{4}n -1$. For sufficiently large values of $i$, we are able to prove a non-trivial bound that grows with $i$ using incidence geometry.

URL:

19. Preconditioned Algorithm for Difference of Convex Functions with applications to Graph Ginzburg-Landau Model [arXiv:2303.14495] #

Authors: Xinhua Shen, Hongpeng Sun, Xuecheng Tai

Abstract: In this work, we propose and study a preconditioned framework with a graphic Ginzburg-Landau functional for image segmentation and data clustering by parallel computing. Solving nonlocal models is usually challenging due to the huge computation burden. For the nonconvex and nonlocal variational functional, we propose several damped Jacobi and generalized Richardson preconditioners for the large-scale linear systems within a difference of convex functions algorithms framework. They are efficient for parallel computing with GPU and can leverage the computational cost. Our framework also provides flexible step sizes with a global convergence guarantee. Numerical experiments show the proposed algorithms are very competitive compared to the singular value decomposition based spectral method.

URL:

18. Multi-UAV trajectory planning problem using the difference of convex function programming [arXiv:2303.07581] #

Authors: Anh Phuong Ngo, Christian Thomas, Ali Karimoddini, Hieu T. Nguyen

Abstract: The trajectory planning problem for a swarm of multiple UAVs is known as a challenging nonconvex optimization problem, particularly due to a large number of collision avoidance constraints required for individual pairs of UAVs in the swarm. In this paper, we tackle this nonconvexity by leveraging the difference of convex function (DC) programming. We introduce the slack variables to relax and reformulate the collision avoidance conditions and employ the penalty function term to equivalently convert the problem into a DC form. Consequently, we construct a penalty DC algorithm in which we sequentially solve a set of convex optimization problems obtained by linearizing the collision avoidance constraint. The algorithm iteratively tightens the safety condition and reduces the objective cost of the planning problem and the additional penalty term. Numerical results demonstrate the effectiveness of the proposed approach in planning a large number of UAVs in congested space.

URL:

17. Approximate Bilevel Difference Convex Programming for Bayesian Risk Markov Decision Processes [arXiv:2301.11415] #

Authors: Yifan Lin, Enlu Zhou

Abstract: We consider infinite-horizon Markov Decision Processes where parameters, such as transition probabilities, are unknown and estimated from data. The popular distributionally robust approach to addressing the parameter uncertainty can sometimes be overly conservative. In this paper, we utilize the recently proposed formulation, Bayesian risk Markov Decision Process (BR-MDP), to address parameter (or epistemic) uncertainty in MDPs. To solve the infinite-horizon BR-MDP with a class of convex risk measures, we propose a computationally efficient approach called approximate bilevel difference convex programming (ABDCP). The optimization is performed offline and produces the optimal policy that is represented as a finite state controller with desirable performance guarantees. We also demonstrate the empirical performance of the BR-MDP formulation and the proposed algorithm.

URL:

16. Single-Crossing Differences in Convex Environments [arXiv:2212.12009] #

Authors: Navin Kartik, SangMok Lee, Daniel Rappoport

Abstract: An agent’s preferences depend on an ordered parameter or type. We characterize the set of utility functions with single-crossing differences (SCD) in convex environments. These include preferences over lotteries, both in expected utility and rank-dependent utility frameworks, and preferences over bundles of goods and over consumption streams. Our notion of SCD does not presume an order on the choice space. This unordered SCD is necessary and sufficient for “interval choice” comparative statics. We present applications to cheap talk, observational learning, and collective choice, showing how convex environments arise in these problems and how SCD/interval choice are useful. Methodologically, our main characterization stems from a result on linear aggregations of single-crossing functions.

URL:

15. Control of Uncertain PWA Systems using Difference-of-Convex Decompositions [arXiv:2209.12990] #

Authors: Siddharth H. Nair, Yvonne R. Stürz

Abstract: In this report, we analyze and design feedback policies for discrete-time Piecewise-Affine (PWA) systems with uncertainty in both the affine dynamics and the polytopic partition. The main idea is to utilise the Difference-of-Convex (DC) decomposition of continuous PWA systems to derive quadratic Lyapunov functions as stability certificates and stabilizing affine policies in a higher dimensional space. When projected back to the state space, we obtain time-varying PWQ Lyapunov functions and time-varying PWA feedback policies.

URL:

14. Encoding inductive invariants as barrier certificates: synthesis via difference-of-convex programming [arXiv:2209.09703] #

Authors: Qiuye Wang, Mingshuai Chen, Bai Xue, Naijun Zhan, Joost-Pieter Katoen

Abstract: A barrier certificate often serves as an inductive invariant that isolates an unsafe region from the reachable set of states, and hence is widely used in proving safety of hybrid systems possibly over an infinite time horizon. We present a novel condition on barrier certificates, termed the invariant barrier-certificate condition, that witnesses unbounded-time safety of differential dynamical systems. The proposed condition is the weakest possible one to attain inductive invariance. We show that discharging the invariant barrier-certificate condition – thereby synthesizing invariant barrier certificates – can be encoded as solving an optimization problem subject to bilinear matrix inequalities (BMIs). We further propose a synthesis algorithm based on difference-of-convex programming, which approaches a local optimum of the BMI problem via solving a series of convex optimization problems. This algorithm is incorporated in a branch-and-bound framework that searches for the global optimum in a divide-and-conquer fashion. We present a weak completeness result of our method, namely, a barrier certificate is guaranteed to be found (under some mild assumptions) whenever there exists an inductive invariant (in the form of a given template) that suffices to certify safety of the system. Experimental results on benchmarks demonstrate the effectiveness and efficiency of our approach.

URL:

13. A convex set with a rich difference [arXiv:2208.03258] #

Authors: Oliver Roche-Newton, Audie Warren

Abstract: We construct a convex set $A$ with cardinality $2n$ and with the property that an element of the difference set $A-A$ can be represented in $n$ different ways. We also show that this construction is optimal by proving that for any convex set $A$, the maximum possible number of representations an element of $A-A$ can have is $\lfloor |A|/2 \rfloor $.

URL:

12. Value Function Based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems [arXiv:2206.05976] #

Authors: Lucy Gao, Jane J. Ye, Haian Yin, Shangzhi Zeng, Jin Zhang

Abstract: Gradient-based optimization methods for hyperparameter tuning guarantee theoretical convergence to stationary solutions when for fixed upper-level variable values, the lower level of the bilevel program is strongly convex (LLSC) and smooth (LLS). This condition is not satisfied for bilevel programs arising from tuning hyperparameters in many machine learning algorithms. In this work, we develop a sequentially convergent Value Function based Difference-of-Convex Algorithm with inexactness (VF-iDCA). We show that this algorithm achieves stationary solutions without LLSC and LLS assumptions for bilevel programs from a broad class of hyperparameter tuning applications. Our extensive experiments confirm our theoretical findings and show that the proposed VF-iDCA yields superior performance when applied to tune hyperparameters.

URL:

11. Decentralized Saddle-Point Problems with Different Constants of Strong Convexity and Strong Concavity [arXiv:2206.00090] #

Authors: Dmitriy Metelev, Alexander Rogozin, Alexander Gasnikov, Dmitry Kovalev

Abstract: Large-scale saddle-point problems arise in such machine learning tasks as GANs and linear models with affine constraints. In this paper, we study distributed saddle-point problems (SPP) with strongly-convex-strongly-concave smooth objectives that have different strong convexity and strong concavity parameters of composite terms, which correspond to min and max variables, and bilinear saddle-point part. We consider two types of first-order oracles: deterministic (returns gradient) and stochastic (returns unbiased stochastic gradient). Our method works in both cases and takes several consensus steps between oracle calls.

URL:

10. The difference of convex algorithm on Hadamard manifolds [arXiv:2112.05250] #

Authors: Ronny Bergmann, Orizon P. Ferreira, Elianderson M. Santos, João Carlos O. Souza

Abstract: In this paper, we propose a Riemannian version of the difference of convex algorithm (DCA) to solve a minimization problem involving the difference of convex (DC) function. We establish the equivalence between the classical and simplified Riemannian versions of the DCA. We also prove that, under mild assumptions, the Riemannian version of the DCA is well-defined, and every cluster point of the sequence generated by the proposed method, if any, is a critical point of the objective DC function. Additionally, we establish some duality relations between the DC problem and its dual. To illustrate the effectiveness of the algorithm, we present some numerical experiments.

URL:

9. Data Fitting with Signomial Programming Compatible Difference of Convex Functions [arXiv:2110.12104] #

Authors: Cody Karcher

Abstract: Signomial Programming (SP) has proven to be a powerful tool for engineering design optimization, striking a balance between the computational efficiency of Geometric Programming (GP) and the extensibility of more general optimization methods like Sequential Quadratic Programming (SQP). But when an existing engineering analysis tool is incompatible with the mathematics of the SP formulation, options are limited. Previous literature has suggested schemes for fitting GP compatible models to pre-computed data, but no methods have yet been proposed that take advantage of the increased modeling flexibility available in SP. This paper describes a new Soft Difference of Max Affine (SDMA) function class that is constructed from existing methods of GP compatible fitting and the theory of Difference of Convex (DC) functions. When a SDMA function is fit to data in log-log transformed space, it becomes either a signomial or a set of signomials upon inverse transformation. Three examples of fitting are presented here, including simple test cases in 2D and 3D, and a fit to the performance data of the NACA 24xx family of airfoils. In each case, RMS error is driven to less than 1%.

URL:

8. Factored couplings in multi-marginal optimal transport via difference of convex programming [arXiv:2110.00629] #

Authors: Quang Huy Tran, Hicham Janati, Ievgen Redko, Rémi Flamary, Nicolas Courty

Abstract: Optimal transport (OT) theory underlies many emerging machine learning (ML) methods nowadays solving a wide range of tasks such as generative modeling, transfer learning and information retrieval. These latter works, however, usually build upon a traditional OT setup with two distributions, while leaving a more general multi-marginal OT formulation somewhat unexplored. In this paper, we study the multi-marginal OT (MMOT) problem and unify several popular OT methods under its umbrella by promoting structural information on the coupling. We show that incorporating such structural information into MMOT results in an instance of a difference of convex (DC) programming problem allowing us to solve it numerically. Despite high computational cost of the latter procedure, the solutions provided by DC optimization are usually as qualitative as those obtained using currently employed optimization schemes.

URL:

7. On the rate of convergence of the Difference-of-Convex Algorithm (DCA) [arXiv:2109.13566] #

Authors: Hadi Abbaszadehpeivasti, Etienne de Klerk, Moslem Zamani

Abstract: In this paper, we study the convergence rate of the DCA (Difference-of-Convex Algorithm), also known as the convex-concave procedure, with two different termination criteria that are suitable for smooth and nonsmooth decompositions respectively. The DCA is a popular algorithm for difference-of-convex (DC) problems, and known to converge to a stationary point of the objective under some assumptions. We derive a worst-case convergence rate of $O(1/\sqrt{N})$ after $N$ iterations of the objective gradient norm for certain classes of DC problems, without assuming strong convexity in the DC decomposition, and give an example which shows the convergence rate is exact. We also provide a new convergence rate of $O(1/N)$ for the DCA with the second termination criterion. Moreover, we derive a new linear convergence rate result for the DCA under the assumption of the Polyak-Łojasiewicz inequality. The novel aspect of our analysis is that it employs semidefinite programming performance estimation.

URL:
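
Since most entries in this list build on the DCA, a toy instance may help fix ideas. The sketch below is not taken from any of the papers: it assumes the one-dimensional objective $f(x) = x^4 - 2x^2$ with the (hypothetical, hand-picked) decomposition $g(x) = x^4$, $h(x) = 2x^2$, for which each convex subproblem has a closed-form solution.

```python
import math


def dca(x0, num_iters=60):
    """Toy DCA for f(x) = g(x) - h(x) with g(x) = x**4 and h(x) = 2 * x**2.

    Each iteration replaces h by its linearization at the current point and
    minimizes the resulting convex function g(x) - h'(x_k) * x exactly.
    """
    x = x0
    for _ in range(num_iters):
        slope = 4.0 * x  # h'(x_k)
        # Convex subproblem: minimize x**4 - slope * x, i.e. solve 4 * x**3 = slope.
        x = math.copysign(abs(slope / 4.0) ** (1.0 / 3.0), slope)
    return x


print(dca(0.5))   # approaches the stationary point x = 1
print(dca(-2.0))  # approaches the stationary point x = -1
print(dca(0.0))   # stays at the stationary point x = 0
```

The stationary points of $f$ are $x \in \{-1, 0, 1\}$, and the iterates settle on one of them depending on the starting point, which matches the "converges to a stationary point" behaviour described in the abstract above.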

6. A Different Perspective On The Stochastic Convex Feasibility Problem [arXiv:2108.12029] #

Authors: James Renegar, Song Zhou

Abstract: We analyze a simple randomized subgradient method for approximating solutions to stochastic systems of convex functional constraints, the only input to the algorithm being the size of minibatches. By introducing a new notion of what is meant for a point to approximately solve the constraints, determining bounds on the expected number of iterations reduces to determining a hitting time for a compound Bernoulli process, elementary probability. Besides bounding the expected number of iterations quite generally, we easily establish concentration inequalities on the number of iterations, and more interesting, we establish much-improved bounds when a notion akin to Hölderian growth is satisfied, for all degrees of growth, not just the linear growth of piecewise-linear convex functions or the quadratic growth of strongly convex functions. Finally, we establish the analogous results under a slight modification to the algorithm which results in the user knowing with high confidence an iterate is in hand that approximately solves the system. Perhaps surprisingly, the iteration bounds here are deterministic – all of the probability gets wrapped into the confidence level (albeit at the expense of potentially large minibatches).

URL:

5. Retraction-based first-order feasible methods for difference-of-convex programs with smooth inequality and simple geometric constraints [arXiv:2106.08584] #

Authors: Yongle Zhang, Guoyin Li, Ting Kei Pong, Shiqi Xu

Abstract: In this paper, we propose first-order feasible methods for difference-of-convex (DC) programs with smooth inequality and simple geometric constraints. Our strategy for maintaining feasibility of the iterates is based on a “retraction” idea adapted from the literature of manifold optimization. When the constraints are convex, we establish the global subsequential convergence of the sequence generated by our algorithm under strict feasibility condition, and analyze its convergence rate when the objective is in addition convex according to the Kurdyka-Lojasiewicz (KL) exponent of the extended objective (i.e., sum of the objective and the indicator function of the constraint set). We also show that the extended objective of a large class of Euclidean norm (and more generally, group LASSO penalty) regularized convex optimization problems is a KL function with exponent $\frac12$; consequently, our algorithm is locally linearly convergent when applied to these problems. We then extend our method to solve DC programs with a single specially structured nonconvex constraint. Finally, we discuss how our algorithms can be applied to solve two concrete optimization problems, namely, group-structured compressed sensing problems with Gaussian measurement noise and compressed sensing problems with Cauchy measurement noise, and illustrate the empirical performance of our algorithms.

URL:

4. Synthesizing Invariant Barrier Certificates via Difference-of-Convex Programming [arXiv:2105.14311] #

Authors: Qiuye Wang, Mingshuai Chen, Bai Xue, Naijun Zhan, Joost-Pieter Katoen

Abstract: A barrier certificate often serves as an inductive invariant that isolates an unsafe region from the reachable set of states, and hence is widely used in proving safety of hybrid systems possibly over the infinite time horizon. We present a novel condition on barrier certificates, termed the invariant barrier-certificate condition, that witnesses unbounded-time safety of differential dynamical systems. The proposed condition is by far the least conservative one on barrier certificates, and can be shown as the weakest possible one to attain inductive invariance. We show that discharging the invariant barrier-certificate condition – thereby synthesizing invariant barrier certificates – can be encoded as solving an optimization problem subject to bilinear matrix inequalities (BMIs). We further propose a synthesis algorithm based on difference-of-convex programming, which approaches a local optimum of the BMI problem via solving a series of convex optimization problems. This algorithm is incorporated in a branch-and-bound framework that searches for the global optimum in a divide-and-conquer fashion. We present a weak completeness result of our method, in the sense that a barrier certificate is guaranteed to be found (under some mild assumptions) whenever there exists an inductive invariant (in the form of a given template) that suffices to certify safety of the system. Experimental results on benchmark examples demonstrate the effectiveness and efficiency of our approach.

URL:

3. Algorithms for Difference-of-Convex (DC) Programs Based on Difference-of-Moreau-Envelopes Smoothing [arXiv:2104.01470] #

Authors: Kaizhao Sun, Xu Andy Sun

Abstract: In this paper we consider minimization of a difference-of-convex (DC) function with and without linear constraints. We first study a smooth approximation of a generic DC function, termed difference-of-Moreau-envelopes (DME) smoothing, where both components of the DC function are replaced by their respective Moreau envelopes. The resulting smooth approximation is shown to be Lipschitz differentiable, capture stationary points, local, and global minima of the original DC function, and enjoy some growth conditions, such as level-boundedness and coercivity, for broad classes of DC functions. We then develop four algorithms for solving DC programs with and without linear constraints based on the DME smoothing. In particular, for a smoothed DC program without linear constraints, we show that the classic gradient descent method as well as an inexact variant can obtain a stationary solution in the limit with a convergence rate of $\mathcal{O}(K^{-1/2})$, where $K$ is the number of proximal evaluations of both components. Furthermore, when the DC program is explicitly constrained in an affine subspace, we combine the smoothing technique with the augmented Lagrangian function and derive two variants of the augmented Lagrangian method (ALM), named LCDC-ALM and composite LCDC-ALM, focusing on different structures of the DC objective function. We show that both algorithms find an $ε$-approximate stationary solution of the original DC program in $\mathcal{O}(ε^{-2})$ iterations. Comparing to existing methods designed for linearly constrained weakly convex minimization, the proposed ALM-based algorithms can be applied to a broader class of problems, where the objective contains a nonsmooth concave component. Finally, numerical experiments are presented to demonstrate the performance of the proposed algorithms.

URL:

2. CDiNN -Convex Difference Neural Networks [arXiv:2103.17231] #

Authors: Parameswaran Sankaranarayanan, Raghunathan Rengaswamy

Abstract: Neural networks with ReLU activation function have been shown to be universal function approximators and learn function mapping as non-smooth functions. Recently, there is considerable interest in the use of neural networks in applications such as optimal control. It is well-known that optimization involving non-convex, non-smooth functions are computationally intensive and have limited convergence guarantees. Moreover, the choice of optimization hyper-parameters used in gradient descent/ascent significantly affect the quality of the obtained solutions. A new neural network architecture called the Input Convex Neural Networks (ICNNs) learn the output as a convex function of inputs thereby allowing the use of efficient convex optimization methods. Use of ICNNs for determining the input for minimizing output has two major problems: learning of a non-convex function as a convex mapping could result in significant function approximation error, and we also note that the existing representations cannot capture simple dynamic structures like linear time delay systems. We attempt to address the above problems by introduction of a new neural network architecture, which we call the CDiNN, which learns the function as a difference of polyhedral convex functions from data. We also discuss that, in some cases, the optimal input can be obtained from CDiNN through difference of convex optimization with convergence guarantees and that at each iteration, the problem is reduced to a linear programming problem.

URL:

1. A Difference-of-Convex Cutting Plane Algorithm for Mixed-Binary Linear Program [arXiv:2103.00717] #

Authors: Yi-Shuai Niu, Yu You

Abstract: In this paper, we propose a cutting plane algorithm based on DC (Difference-of-Convex) programming and DC cut for globally solving Mixed-Binary Linear Program (MBLP). We first use a classical DC programming formulation via the exact penalization to formulate MBLP as a DC program, which can be solved by DCA algorithm. Then, we focus on the construction of DC cuts, which serves either as a local cut (namely type-I DC cut) at feasible local minimizer of MBLP, or as a global cut (namely type-II DC cut) at infeasible local minimizer of MBLP if some particular assumptions are verified. Otherwise, the constructibility of DC cut is still unclear, and we propose to use classical global cuts (such as the Lift-and-Project cut) instead. Combining DC cut and classical global cuts, a cutting plane algorithm, namely DCCUT, is established for globally solving MBLP. The convergence theorem of DCCUT is proved. Restarting DCA in DCCUT helps to quickly update the upper bound solution and to introduce more DC cuts for lower bound improvement. A variant of DCCUT by introducing more classical global cuts in each iteration is proposed, and parallel versions of DCCUT and its variant are also designed which use the power of multiple processors for better performance. Numerical simulations of DCCUT type algorithms comparing with the classical cutting plane algorithm using Lift-and-Project cuts are reported. Tests on some specific samples and the MIPLIB 2017 benchmark dataset demonstrate the benefits of DC cut and good performance of DCCUT algorithms.

URL:

Publications

Thu, 27 Jun 2024 23:14:15 +0800

Notes and Pre-prints #

[1] Nam Le, Extreme Points and the Krein–Milman Theorem: A note on Brezis Problem 1, 2026. Comments are welcome. pdf

Slides, Talks #

[1] Nam Le, “GIÁ TRỊ SHAP TRONG HỌC MÁY GIẢI THÍCH” (SHAP Values in Explainable Machine Learning), 2026. pdf

[2] Nam Le, Lecture slides for Introduction to Machine Learning, 2026. pdfs

Journal Publications #

[1] Le, Thanh, Nam Le, and Bac Le. “Knowledge graph embedding by relational rotation and complex convolution for link prediction.” Expert Systems with Applications 214 (2023): 119122. (ISI, Q1, IF: 8.6, 2023)

International Conference Publications #

[1] Thanh Le, Nam Le, and Bac Le. “Embedding Model with Attention over Convolution Kernels and Dynamic Mapping Matrix for Link Prediction.” In Asian Conference on Intelligent Information and Database Systems, pp. 234-246. Springer, Cham, 2022. (Rank B, CORERANK 2021)

[2] Tung Luu*, Nam Le, Duc Le, and Bac Le. (2025, February). From Visual Explanations to Counterfactual Explanations with Latent Diffusion. Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 420–429. (Rank A, CORE 2023, * means first author)

[3] Nam Le, Thanh Le, and Bac Le (2025). Improving Temporal Knowledge Graph Completion via Tensor Decomposition with Relation-Time Context and Multi-Time Perspective. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3, ISBN 978-989-758-737-5, ISSN 2184-433X, pages 326-333. (Rank B, CORE 2023) [Slide]

[4] Nam Le, Thanh Le, and Bac Le (2025). Improving Temporal Knowledge Graph Forecasting via Multi-Rewards Mechanism and Confidence-Guided Tensor Decomposition Reinforcement Learning. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 1, ISBN 978-989-758-737-5, ISSN 2184-433X, pages 68-79. (Rank B, CORE 2023) [Slide]

Domestic Conference Publications #

[1] Nam Le, Thanh Le, and Bac Le (2025). Improving Temporal Knowledge Graph Forecasting via Multi-reward mechanism and Confidence-Augmented Reinforcement Learning. The 14th Scientific Conference (VNUHCM-US Conf 2024)

Recent Advances in Research on Difference-of-Convex (DC) Programming

Thu, 27 Jun 2024 23:14:15 +0800

Second-order Stochastic Optimization methods for Machine Learning

Thu, 27 Jun 2024 23:14:15 +0800

Analysis of the Hessian #

1. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks #

Year: 2017
Authors: Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou
ArXiv ID: arXiv:1706.04454
URL: https://arxiv.org/abs/1706.04454

Abstract: We study the properties of common loss surfaces through their Hessian matrix. In particular, in the context of deep learning, we empirically show that the spectrum of the Hessian is composed of two parts: (1) the bulk centered near zero, (2) and outliers away from the bulk. We present numerical evidence and mathematical justifications to the following conjectures laid out by Sagun et al. (2016): Fixing data, increasing the number of parameters merely scales the bulk of the spectrum; fixing the dimension and changing the data (for instance adding more clusters or making the data less separable) only affects the outliers. We believe that our observations have striking implications for non-convex optimization in high dimensions. First, the flatness of such landscapes (which can be measured by the singularity of the Hessian) implies that classical notions of basins of attraction may be quite misleading. And that the discussion of wide/narrow basins may be in need of a new perspective around over-parametrization and redundancy that are able to create large connected components at the bottom of the landscape. Second, the dependence of small number of large eigenvalues to the data distribution can be linked to the spectrum of the covariance matrix of gradients of model outputs. With this in mind, we may reevaluate the connections within the data-architecture-algorithm framework of a model, hoping that it would shed light into the geometry of high-dimensional and non-convex spaces in modern applications. In particular, we present a case that links the two observations: small and large batch gradient descent appear to converge to different basins of attraction but we show that they are in fact connected through their flat region and so belong to the same basin.

Source Code: No explicit source code information found

2. The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size #

Year: 2018
Authors: Vardan Papyan
ArXiv ID: arXiv:1811.07062
URL: https://arxiv.org/abs/1811.07062

Abstract: We apply state-of-the-art tools in modern high-dimensional numerical linear algebra to approximate efficiently the spectrum of the Hessian of modern deepnets, with tens of millions of parameters, trained on real data. Our results corroborate previous findings, based on small-scale networks, that the Hessian exhibits “spiked” behavior, with several outliers isolated from a continuous bulk. We decompose the Hessian into different components and study the dynamics with training and sample size of each term individually.

Source Code: No explicit source code information found

3. PyHessian: Neural Networks Through the Lens of the Hessian #

Year: 2019
Authors: Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney
ArXiv ID: arXiv:1912.07145
URL: https://arxiv.org/abs/1912.07145

Abstract: We present PYHESSIAN, a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks. PYHESSIAN enables fast computations of the top Hessian eigenvalues, the Hessian trace, and the full Hessian eigenvalue/spectral density, and it supports distributed-memory execution on cloud/supercomputer systems and is available as open source. This general framework can be used to analyze neural network models, including the topology of the loss landscape (i.e., curvature information) to gain insight into the behavior of different models/optimizers. To illustrate this, we analyze the effect of residual connections and Batch Normalization layers on the trainability of neural networks. One recent claim, based on simpler first-order analysis, is that residual connections and Batch Normalization make the loss landscape smoother, thus making it easier for Stochastic Gradient Descent to converge to a good solution. Our extensive analysis shows new finer-scale insights, demonstrating that, while conventional wisdom is sometimes validated, in other cases it is simply incorrect. In particular, we find that Batch Normalization does not necessarily make the loss landscape smoother, especially for shallower networks.

Source Code: Known repository: https://github.com/amirgholami/PyHessian
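
The "top Hessian eigenvalue" computations mentioned in this group of papers only need Hessian-vector products, never the full matrix. As a rough illustration (not the code of any library listed here), the sketch below runs plain power iteration on a small explicit matrix that stands in for the Hessian; `matvec` and `top_eigenvalue` are hypothetical helper names.

```python
import math
import random


def matvec(H, v):
    """Dense matrix-vector product, standing in for a Hessian-vector product."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def top_eigenvalue(hvp, dim, num_iters=200, seed=0):
    """Power iteration: estimate the eigenvalue of largest magnitude using only hvp."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    eig = 0.0
    for _ in range(num_iters):
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]                 # keep the iterate at unit length
        hv = hvp(v)
        eig = sum(a * b for a, b in zip(v, hv))   # Rayleigh quotient v^T H v
        v = hv
    return eig


H = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues are 1 and 3
print(top_eigenvalue(lambda v: matvec(H, v), dim=2))
```

In a deep-learning setting the `hvp` callable would be supplied by automatic differentiation rather than an explicit matrix, so the memory cost stays linear in the number of parameters.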

4. A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization #

Year: 2020
Authors: Adepu Ravi Sankar, Yash Khasbage, Rahul Vigneswaran, Vineeth N Balasubramanian
ArXiv ID: arXiv:2012.03801
URL: https://arxiv.org/abs/2012.03801

Abstract: Loss landscape analysis is extremely useful for a deeper understanding of the generalization ability of deep neural network models. In this work, we propose a layerwise loss landscape analysis where the loss surface at every layer is studied independently and also on how each correlates to the overall loss surface. We study the layerwise loss landscape by studying the eigenspectra of the Hessian at each layer. In particular, our results show that the layerwise Hessian geometry is largely similar to the entire Hessian. We also report an interesting phenomenon where the Hessian eigenspectrum of middle layers of the deep neural network are observed to be most similar to the overall Hessian eigenspectrum. We also show that the maximum eigenvalue and the trace of the Hessian (both full network and layerwise) reduce as training of the network progresses. We leverage on these observations to propose a new regularizer based on the trace of the layerwise Hessian. Penalizing the trace of the Hessian at every layer indirectly forces Stochastic Gradient Descent to converge to flatter minima, which are shown to have better generalization performance. In particular, we show that such a layerwise regularizer can be leveraged to penalize the middlemost layers alone, which yields promising results. Our empirical studies on well-known deep nets across datasets support the claims of this work.

Source Code: No explicit source code information found

Diagonal Scaling #

1. AdaHessian: An Adaptive Second Order Optimizer for Machine Learning #

Year: 2020
Authors: Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney
ArXiv ID: arXiv:2006.00719
Algorithm: AdaHessian
URL: https://arxiv.org/abs/2006.00719

Abstract: We introduce ADAHESSIAN, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN. Second order algorithms are among the most powerful optimization algorithms with superior convergence properties as compared to first order methods such as SGD and Adam. The main disadvantage of traditional second order methods is their heavier per iteration computation and poor accuracy as compared to first order methods. To address these, we incorporate several novel approaches in ADAHESSIAN, including: (i) a fast Hutchinson based method to approximate the curvature matrix with low computational overhead; (ii) a root-mean-square exponential moving average to smooth out variations of the Hessian diagonal across different iterations; and (iii) a block diagonal averaging to reduce the variance of Hessian diagonal elements. We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods, including variants of Adam. In particular, we perform extensive tests on CV, NLP, and recommendation system tasks and find that ADAHESSIAN: (i) achieves 1.80%/1.45% higher accuracy on ResNets20/32 on Cifar10, and 5.55% higher accuracy on ImageNet as compared to Adam; (ii) outperforms AdamW for transformers by 0.13/0.33 BLEU score on IWSLT14/WMT14 and 2.7/1.0 PPL on PTB/Wikitext-103; (iii) outperforms AdamW for SqueezeBert by 0.41 points on GLUE; and (iv) achieves 0.032% better score than Adagrad for DLRM on the Criteo Ad Kaggle dataset. Importantly, we show that the cost per iteration of ADAHESSIAN is comparable to first order methods, and that it exhibits robustness towards its hyperparameters.

Source Code: Known repository: https://github.com/amirgholami/adahessian
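
The Hutchinson-based curvature estimate mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard identity diag(H) ≈ E[z ⊙ (Hz)] for Rademacher vectors z; the function name `hutchinson_diagonal`, the toy matrix, and the sample count are illustrative assumptions and are not taken from the AdaHessian repository.

```python
import numpy as np

def hutchinson_diagonal(hvp, dim, num_samples, rng):
    """Estimate diag(H) by averaging z * (H z) over Rademacher probes z."""
    estimate = np.zeros(dim)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # random +/-1 probe vector
        estimate += z * hvp(z)                 # elementwise product z * (H z)
    return estimate / num_samples

# Toy symmetric matrix standing in for a Hessian (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = A + A.T

# Only Hessian-vector products are needed, never the full matrix.
diag_estimate = hutchinson_diagonal(lambda v: H @ v, dim=5, num_samples=20000, rng=rng)
max_error = np.max(np.abs(diag_estimate - np.diag(H)))
print("largest absolute error on the diagonal:", max_error)
```

In a neural network the product `H @ v` would come from automatic differentiation rather than an explicit matrix, which is what keeps the overhead low.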

2. Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training #

Year: 2023
Authors: Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma
ArXiv ID: arXiv:2305.14342
Algorithm: Sophia
URL: https://arxiv.org/abs/2305.14342

Abstract: Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction on the time and cost of training. Adam and its variants have been state-of-the-art for years, and more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple scalable second-order optimizer that uses a light-weight estimate of the diagonal Hessian as the pre-conditioner. The update is the moving average of the gradients divided by the moving average of the estimated Hessian, followed by element-wise clipping. The clipping controls the worst-case update size and tames the negative impact of non-convexity and rapid change of Hessian along the trajectory. Sophia only estimates the diagonal Hessian every handful of iterations, which has negligible average per-step time and memory overhead. On language modeling with GPT models of sizes ranging from 125M to 1.5B, Sophia achieves a 2x speed-up compared to Adam in the number of steps, total compute, and wall-clock time, achieving the same perplexity with 50% fewer steps, less total compute, and reduced wall-clock time. Theoretically, we show that Sophia, in a much simplified setting, adapts to the heterogeneous curvatures in different parameter dimensions, and thus has a run-time bound that does not depend on the condition number of the loss.

Source Code: Known repository: https://github.com/Liuhong99/Sophia
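
The update described in the abstract (a moving average of gradients divided by a moving average of the estimated diagonal Hessian, followed by element-wise clipping) might look roughly like the sketch below. The function name `sophia_like_step`, the hyperparameter names and values, and the toy quadratic are assumptions made for illustration; the exact rule in the paper and repository may differ in details.

```python
import numpy as np

def sophia_like_step(theta, m, h, grad, hess_diag, lr=0.1,
                     beta1=0.9, beta2=0.9, rho=1.0, eps=1e-12):
    """One clipped, diagonally preconditioned step (illustrative sketch)."""
    m = beta1 * m + (1.0 - beta1) * grad            # moving average of gradients
    if hess_diag is not None:                       # curvature is refreshed only occasionally
        h = beta2 * h + (1.0 - beta2) * hess_diag   # moving average of diagonal Hessian
    ratio = m / np.maximum(h, eps)                  # preconditioned direction
    update = np.clip(ratio, -rho, rho)              # element-wise clipping bounds the step
    return theta - lr * update, m, h

# Toy quadratic 0.5 * sum(c * theta^2) with very different curvatures per coordinate.
curvature = np.array([1.0, 100.0])
theta = np.array([3.0, -2.0])
m, h = np.zeros(2), np.zeros(2)

first_step, _, _ = sophia_like_step(theta, m, h, curvature * theta, curvature)
initial_loss = 0.5 * np.sum(curvature * theta**2)

for step in range(300):
    hess = curvature if step % 5 == 0 else None     # estimate the Hessian every 5 steps
    theta, m, h = sophia_like_step(theta, m, h, curvature * theta, hess)

final_loss = 0.5 * np.sum(curvature * theta**2)
print(first_step, initial_loss, final_loss)
```

On the first step both coordinates are clipped, so each moves by exactly `lr * rho` regardless of its curvature, which is the worst-case control the abstract refers to.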

Hessian-free Optimization #

1. Learning Recurrent Neural Networks with Hessian-Free Optimization #

Year: 2011
Authors: James Martens, Ilya Sutskever
ArXiv ID:
URL: https://www.cs.toronto.edu/~jmartens/docs/RNN_HF.pdf

Abstract: In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free optimization approach (Martens, 2010), together with a novel damping scheme, we successfully train RNNs on two sets of challenging problems. First, a collection of pathological synthetic datasets which are known to be impossible for standard optimization approaches (due to their extremely long-term dependencies), and second, on three natural and highly complex real-world sequence datasets where we find that our method significantly outperforms the previous state-of-the-art method for training neural sequence models: the Long Short-term Memory approach of Hochreiter and Schmidhuber (1997). Additionally, we offer a new interpretation of the generalized Gauss-Newton matrix of Schraudolph (2002) which is used within the HF approach of Martens.

Source Code: No explicit source code information found

2. Training Neural Networks with Stochastic Hessian-Free Optimization #

Year: 2013
Authors: Ryan Kiros
ArXiv ID: arXiv:1301.3641
Algorithm: SHF
URL: https://arxiv.org/abs/1301.3641

Abstract: Hessian-free (HF) optimization has been successfully used for training deep autoencoders and recurrent networks. HF uses the conjugate gradient algorithm to construct update directions through curvature-vector products that can be computed on the same order of time as gradients. In this paper we exploit this property and study stochastic HF with gradient and curvature mini-batches independent of the dataset size. We modify Martens’ HF for these settings and integrate dropout, a method for preventing co-adaptation of feature detectors, to guard against overfitting. Stochastic Hessian-free optimization gives an intermediary between SGD and HF that achieves competitive performance on both classification and deep autoencoder experiments.

Source Code: Mentions ‘code’ in abstract
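
As a rough illustration of how conjugate gradient builds an update direction from curvature-vector products alone, the sketch below solves a damped system B d = -g without ever forming B inside the solver. The toy curvature matrix, the damping value, and the function name `conjugate_gradient` are assumptions; mini-batching and dropout, which the paper adds, are not modelled here.

```python
import numpy as np

def conjugate_gradient(matvec, b, num_iters=50, tol=1e-10):
    """Solve B x = b for symmetric positive definite B, given only v -> B v."""
    x = np.zeros_like(b)
    r = b.copy()               # residual b - B x with x = 0
    p = r.copy()               # first search direction
    rs_old = r @ r
    for _ in range(num_iters):
        if np.sqrt(rs_old) < tol:
            break
        Bp = matvec(p)         # the only access to the curvature matrix
        alpha = rs_old / (p @ Bp)
        x = x + alpha * p
        r = r - alpha * Bp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((20, 6))
damping = 0.1
g = rng.standard_normal(6)     # stands in for a mini-batch gradient

def curvature_vector_product(v):
    # (M^T M + damping * I) v computed as two matrix-vector products
    return M.T @ (M @ v) + damping * v

direction = conjugate_gradient(curvature_vector_product, -g)
residual = np.linalg.norm(curvature_vector_product(direction) + g)
print("residual norm:", residual)
```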

Quasi-Newton #

1. A Stochastic Quasi-Newton Method for Large-Scale Optimization #

Year: 2014
Authors: R.H. Byrd, S.L. Hansen, J. Nocedal, Y. Singer
ArXiv ID: arXiv:1401.7020
URL: https://arxiv.org/abs/1401.7020

Abstract: The question of how to incorporate curvature information in stochastic approximation methods is challenging. The direct application of classical quasi-Newton updating techniques for deterministic optimization leads to noisy curvature estimates that have harmful effects on the robustness of the iteration. In this paper, we propose a stochastic quasi-Newton method that is efficient, robust and scalable. It employs the classical BFGS update formula in its limited memory form, and is based on the observation that it is beneficial to collect curvature information pointwise, and at regular intervals, through (sub-sampled) Hessian-vector products. This technique differs from the classical approach that would compute differences of gradients, and where controlling the quality of the curvature estimates can be difficult. We present numerical results on problems arising in machine learning that suggest that the proposed method shows much promise.

Source Code: No explicit source code information found

2. A Multi-Batch L-BFGS Method for Machine Learning #

Year: 2016
Authors: Albert S. Berahas, Jorge Nocedal, Martin Takáč
ArXiv ID: arXiv:1605.06049
URL: https://arxiv.org/abs/1605.06049

Abstract: The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employ second-order information. In order to improve the learning process, we follow a multi-batch approach in which the batch changes at each iteration. This can cause difficulties because L-BFGS employs gradient differences to update the Hessian approximations, and when these gradients are computed using different data points the process can be unstable. This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, illustrates the behavior of the algorithm in a distributed computing platform, and studies its convergence properties for both the convex and nonconvex cases.

Source Code: No explicit source code information found

3. Stochastic Quasi-Newton with Line-Search Regularization #

Year: 2019
Authors: Adrian Wills, Thomas Schön
ArXiv ID: arXiv:1909.01238
Algorithm: SQN
URL: https://arxiv.org/abs/1909.01238

Abstract: In this paper we present a novel quasi-Newton algorithm for use in stochastic optimisation. Quasi-Newton methods have had an enormous impact on deterministic optimisation problems because they afford rapid convergence and computationally attractive algorithms. In essence, this is achieved by learning the second-order (Hessian) information based on observing first-order gradients. We extend these ideas to the stochastic setting by employing a highly flexible model for the Hessian and infer its value based on observing noisy gradients. In addition, we propose a stochastic counterpart to standard line-search procedures and demonstrate the utility of this combination on maximum likelihood identification for general nonlinear state space models.

Source Code: No explicit source code information found

4. Practical Quasi-Newton Methods for Training Deep Neural Networks #

Year: 2020
Authors: Donald Goldfarb, Yi Ren, Achraf Bahamou
ArXiv ID: arXiv:2006.08877
URL: https://arxiv.org/abs/2006.08877

Abstract: We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs). In DNN training, the number of variables and components of the gradient $n$ is often of the order of tens of millions and the Hessian has $n^2$ elements. Consequently, computing and storing a full $n \times n$ BFGS approximation or storing a modest number of (step, change in gradient) vector pairs for use in an L-BFGS implementation is out of the question. In our proposed methods, we approximate the Hessian by a block-diagonal matrix and use the structure of the gradient and Hessian to further approximate these blocks, each of which corresponds to a layer, as the Kronecker product of two much smaller matrices. This is analogous to the approach in KFAC, which computes a Kronecker-factored block-diagonal approximation to the Fisher matrix in a stochastic natural gradient method. Because the indefinite and highly variable nature of the Hessian in a DNN, we also propose a new damping approach to keep the upper as well as the lower bounds of the BFGS and L-BFGS approximations bounded. In tests on autoencoder feed-forward neural network models with either nine or thirteen layers applied to three datasets, our methods outperformed or performed comparably to KFAC and state-of-the-art first-order stochastic methods.

Source Code: Mentions ‘code’ in abstract; Mentions ‘implementation’ in abstract

Gauss-Newton #

1. Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks #

Year: 2019
Authors: Yi Ren, Donald Goldfarb
ArXiv ID: arXiv:1906.02353
Algorithm: SWM-GN, SWM-NG
URL: https://arxiv.org/abs/1906.02353

Abstract: We present practical Levenberg-Marquardt variants of Gauss-Newton and natural gradient methods for solving non-convex optimization problems that arise in training deep neural networks involving enormous numbers of variables and huge data sets. Our methods use subsampled Gauss-Newton or Fisher information matrices and either subsampled gradient estimates (fully stochastic) or full gradients (semi-stochastic), which, in the latter case, we prove convergent to a stationary point. By using the Sherman-Morrison-Woodbury formula with automatic differentiation (backpropagation) we show how our methods can be implemented to perform efficiently. Finally, numerical results are presented to demonstrate the effectiveness of our proposed methods.

Source Code: No explicit source code information found

2. On the Promise of the Stochastic Generalized Gauss-Newton Method for Training DNNs #

Year: 2020
Authors: Matilde Gargiani, et al.
ArXiv ID: arXiv:2006.02409
Algorithm: SGN
URL: https://arxiv.org/abs/2006.02409

Abstract: Following early work on Hessian-free methods for deep learning, we study a stochastic generalized Gauss-Newton method (SGN) for training DNNs. SGN is a second-order optimization method, with efficient iterations, that we demonstrate to often require substantially fewer iterations than standard SGD to converge. As the name suggests, SGN uses a Gauss-Newton approximation for the Hessian matrix, and, in order to compute an approximate search direction, relies on the conjugate gradient method combined with forward and reverse automatic differentiation. Despite the success of SGD and its first-order variants, and despite Hessian-free methods based on the Gauss-Newton Hessian approximation having been already theoretically proposed as practical methods for training DNNs, we believe that SGN has a lot of undiscovered and yet not fully displayed potential in big mini-batch scenarios. For this setting, we demonstrate that SGN does not only substantially improve over SGD in terms of the number of iterations, but also in terms of runtime. This is made possible by an efficient, easy-to-use and flexible implementation of SGN we propose in the Theano deep learning platform, which, unlike Tensorflow and Pytorch, supports forward automatic differentiation. This enables researchers to further study and improve this promising optimization technique and hopefully reconsider stochastic second-order methods as competitive optimization techniques for training DNNs; we also hope that the promise of SGN may lead to forward automatic differentiation being added to Tensorflow or Pytorch. Our results also show that in big mini-batch scenarios SGN is more robust than SGD with respect to its hyperparameters (we never had to tune its step-size for our benchmarks!), which eases the expensive process of hyperparameter tuning that is instead crucial for the performance of first-order methods.

Source Code: Mentions ‘implementation’ in abstract

3. Stochastic Gauss-Newton Algorithms for Nonconvex Compositional Optimization #

Year: 2020
Authors: Quoc Tran-Dinh, et al.
ArXiv ID: arXiv:2002.07290
Algorithm: SGN with SARAH estimators
URL: https://arxiv.org/abs/2002.07290

Abstract: We develop two new stochastic Gauss-Newton algorithms for solving a class of non-convex stochastic compositional optimization problems frequently arising in practice. We consider both the expectation and finite-sum settings under standard assumptions, and use both classical stochastic and SARAH estimators for approximating function values and Jacobians. In the expectation case, we establish $\mathcal{O}(\varepsilon^{-2})$ iteration-complexity to achieve a stationary point in expectation and estimate the total number of stochastic oracle calls for both function value and its Jacobian, where $\varepsilon$ is a desired accuracy. In the finite sum case, we also estimate $\mathcal{O}(\varepsilon^{-2})$ iteration-complexity and the total oracle calls with high probability. To our best knowledge, this is the first time such global stochastic oracle complexity is established for stochastic Gauss-Newton methods. Finally, we illustrate our theoretical results via two numerical examples on both synthetic and real datasets.

Source Code: No explicit source code information found

4. Nonlinear Least Squares for Large-Scale Machine Learning using Stochastic Jacobian Estimates #

Year: 2021
Authors: Johannes J. Brust
ArXiv ID: arXiv:2107.05598
Algorithm: NLLS1, NLLSL
URL: https://arxiv.org/abs/2107.05598

Abstract: For large nonlinear least squares loss functions in machine learning we exploit the property that the number of model parameters typically exceeds the data in one batch. This implies a low-rank structure in the Hessian of the loss, which enables effective means to compute search directions. Using this property, we develop two algorithms that estimate Jacobian matrices and perform well when compared to state-of-the-art methods.

Source Code: No explicit source code information found

5. Improving Levenberg-Marquardt Algorithm for Neural Networks #

Year: 2022
Authors: Omead Pooladzandi, Yiming Zhou
ArXiv ID: arXiv:2212.08769
Algorithm: LM
URL: https://arxiv.org/abs/2212.08769

Abstract: We explore the usage of the Levenberg-Marquardt (LM) algorithm for regression (non-linear least squares) and classification (generalized Gauss-Newton methods) tasks in neural networks. We compare the performance of the LM method with other popular first-order algorithms such as SGD and Adam, as well as other second-order algorithms such as L-BFGS, Hessian-Free and KFAC. We further speed up the LM method by using adaptive momentum, learning rate line search, and uphill step acceptance.

Source Code: No explicit source code information found

6. Rethinking Gauss-Newton for learning over-parameterized models #

Year: 2023
Authors: Michael Arbel, et al.
ArXiv ID: arXiv:2302.02904
URL: https://arxiv.org/abs/2302.02904

Abstract: This work studies the global convergence and implicit bias of Gauss Newton’s (GN) when optimizing over-parameterized one-hidden layer networks in the mean-field regime. We first establish a global convergence result for GN in the continuous-time limit exhibiting a faster convergence rate compared to GD due to improved conditioning. We then perform an empirical study on a synthetic regression task to investigate the implicit bias of GN’s method. While GN is consistently faster than GD in finding a global optimum, the learned model generalizes well on test data when starting from random initial weights with a small variance and using a small step size to slow down convergence. Specifically, our study shows that such a setting results in a hidden learning phenomenon, where the dynamics are able to recover features with good generalization properties despite the model having sub-optimal training and test performances due to an under-optimized linear layer. This study exhibits a trade-off between the convergence speed of GN and the generalization ability of the learned solution.

Source Code: No explicit source code information found

7. Exact Gauss-Newton Optimization for Training Deep Neural Networks #

Year: 2024
Authors: Mikalai Korbit, Adeyemi D. Adeoye, Alberto Bemporad, Mario Zanon
ArXiv ID: arXiv:2405.14402
Algorithm: EGN
URL: https://arxiv.org/abs/2405.14402

Abstract: We present EGN, a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix which has the size of the mini-batch. This is particularly advantageous for large-scale machine learning problems where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges to an $\epsilon$-stationary point at a linear rate. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at most matches the generalization performance of well-tuned SGD, Adam, and SGN optimizers across various supervised and reinforcement learning tasks.

Source Code: No explicit source code information found
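
To see why a matrix of mini-batch size can suffice, the sketch below checks a push-through identity from the same family as the Duncan-Guttman and Woodbury identities: (JᵀJ + λI)⁻¹Jᵀr = Jᵀ(JJᵀ + λI)⁻¹r. The abstract does not state which exact form EGN uses, so this is only an assumed illustration; the sizes, the damping value, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_params, damping = 8, 200, 1e-2

J = rng.standard_normal((batch_size, num_params))  # stands in for a mini-batch Jacobian
r = rng.standard_normal(batch_size)                # stands in for the residual vector

# Direct route: factorize a num_params x num_params matrix.
direct = np.linalg.solve(J.T @ J + damping * np.eye(num_params), J.T @ r)

# Low-rank route: factorize only a batch_size x batch_size matrix.
small = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(batch_size), r)

max_gap = np.max(np.abs(direct - small))
print("largest difference between the two routes:", max_gap)
```

Both routes give the same damped Gauss-Newton direction up to rounding, but the second one scales with the batch size rather than the number of parameters.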

Fisher Information #

1. Optimizing Neural Networks with Kronecker-factored Approximate Curvature #

Year: 2015
Authors: James Martens, Roger Grosse
ArXiv ID: arXiv:1503.05671
Algorithm: K-FAC
URL: https://arxiv.org/abs/1503.05671

Abstract: We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-Factored Approximate Curvature (K-FAC). K-FAC is based on an efficiently invertible approximation of a neural network’s Fisher information matrix which is neither diagonal nor low-rank, and in some cases is completely non-sparse. It is derived by approximating various large blocks of the Fisher (corresponding to entire layers) as being the Kronecker product of two much smaller matrices. While only several times more expensive to compute than the plain stochastic gradient, the updates produced by K-FAC make much more progress optimizing the objective, which results in an algorithm that can be much faster than stochastic gradient descent with momentum in practice. And unlike some previously proposed approximate natural-gradient/Newton methods which use high-quality non-diagonal curvature matrices (such as Hessian-free optimization), K-FAC works very well in highly stochastic optimization regimes. This is because the cost of storing and inverting K-FAC’s approximation to the curvature matrix does not depend on the amount of data used to estimate it, which is a feature typically associated only with diagonal or low-rank approximations to the curvature matrix.

Source Code: Known repository: Various implementations available
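
The reason a Kronecker-factored block is cheap to invert can be checked numerically. The sketch below assumes a single layer whose curvature block is A ⊗ G with small symmetric positive definite factors; the names `A`, `G`, and `V` and the sizes are illustrative assumptions and do not come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    """Random symmetric positive definite matrix (toy factor)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A = random_spd(4)                 # factor on the input side (4 x 4)
G = random_spd(3)                 # factor on the output side (3 x 3)
V = rng.standard_normal((3, 4))   # stands in for the gradient of a 3 x 4 weight matrix

# Naive route: build the 12 x 12 Kronecker product and solve with it.
naive = np.linalg.solve(np.kron(A, G), V.flatten(order="F"))

# Factored route: (A kron G)^{-1} vec(V) = vec(G^{-1} V A^{-1}) for symmetric A.
factored = np.linalg.solve(G, V) @ np.linalg.inv(A)

max_gap = np.max(np.abs(naive - factored.flatten(order="F")))
print("largest difference between the two routes:", max_gap)
```

The column-major `order="F"` flattening matches the usual vec convention under which the identity holds.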

Other #

1. Second-order optimization with lazy Hessians #

Year: 2022
Authors: Nikita Doikov, El Mahdi Chayti, Martin Jaggi
ArXiv ID: arXiv:2212.00781
URL: https://arxiv.org/abs/2212.00781

Abstract: We analyze Newton’s method with lazy Hessian updates for solving general possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method. This significantly reduces the overall arithmetical complexity of second-order optimization schemes. By using the cubic regularization technique, we establish fast global convergence of our method to a second-order stationary point, while the Hessian does not need to be updated each iteration. For convex problems, we justify global and local superlinear rates for lazy Newton steps with quadratic regularization, which is easier to compute. The optimal frequency for updating the Hessian is once every $d$ iterations, where $d$ is the dimension of the problem. This provably improves the total arithmetical complexity of second-order algorithms by a factor $\sqrt{d}$.

Source Code: No explicit source code information found

Some popular partial differential equations (PDEs)

Thu, 27 Jun 2024 23:14:15 +0800

Single PDEs #

Linear equations #

Laplace’s equation

$$ \begin{equation} \Delta u = \sum_{i=1}^{n} u_{x_i x_i} = 0. \end{equation} $$

Helmholtz’s (or eigenvalue) equation

$$ \begin{equation} -\Delta u = \lambda u. \end{equation} $$

Linear transport equation

$$ \begin{equation} u_t + \sum_{i=1}^{n} b^i u_{x_i} = 0. \end{equation} $$

Liouville’s equation

$$ \begin{equation} u_t + \sum_{i=1}^{n} (b^i u)_{x_i} = 0. \end{equation} $$

Heat (or diffusion) equation

$$ \begin{equation} u_t - \Delta u = 0. \end{equation} $$

Schrödinger’s equation

$$ \begin{equation} i u_t + \Delta u = 0. \end{equation} $$

Kolmogorov’s equation

$$ \begin{equation} u_t - \sum_{i,j=1}^{n} a^{ij} u_{x_i x_j} + \sum_{i=1}^{n} b^i u_{x_i} = 0. \end{equation} $$

Fokker–Planck equation

$$ \begin{equation} u_t - \sum_{i,j=1}^{n} (a^{ij} u)_{x_i x_j} - \sum_{i=1}^{n} (b^i u)_{x_i} = 0. \end{equation} $$

Wave equation

$$ \begin{equation} u_{tt} - \Delta u = 0. \end{equation} $$

Klein–Gordon equation

$$ \begin{equation} u_{tt} - \Delta u + m^2 u = 0. \end{equation} $$

Telegraph equation

$$ \begin{equation} u_{tt} + 2\delta u_t - u_{xx} = 0. \end{equation} $$

General wave equation

$$ \begin{equation} u_{tt} - \sum_{i,j=1}^{n} a^{ij} u_{x_i x_j} + \sum_{i=1}^{n} b^i u_{x_i} = 0. \end{equation} $$

Airy’s equation

$$ \begin{equation} u_t + u_{xxx} = 0. \end{equation} $$

Beam equation

$$ \begin{equation} u_{tt} + u_{xxxx} = 0. \end{equation} $$

Nonlinear equations #

Eikonal equation

$$ \begin{equation} |Du| = 1. \end{equation} $$

Nonlinear Poisson equation

$$ \begin{equation} -\Delta u = f(u). \end{equation} $$

$p$-Laplacian equation

$$ \begin{equation} \operatorname{div}(|Du|^{p-2} Du) = 0. \end{equation} $$

Minimal surface equation

$$ \begin{equation} \operatorname{div} \left( \frac{Du}{\sqrt{1 + |Du|^2}} \right) = 0. \end{equation} $$

Monge–Ampère equation

$$ \begin{equation} \det(D^2 u) = f. \end{equation} $$

Hamilton–Jacobi equation

$$ \begin{equation} u_t + H(Du, x) = 0. \end{equation} $$

Scalar conservation law

$$ \begin{equation} u_t + \operatorname{div} F(u) = 0. \end{equation} $$

Inviscid Burgers’ equation

$$ \begin{equation} u_t + u u_x = 0. \end{equation} $$

Scalar reaction-diffusion equation

$$ \begin{equation} u_t - \Delta u = f(u). \end{equation} $$

Porous medium equation

$$ \begin{equation} u_t - \Delta(u^m) = 0. \end{equation} $$

Nonlinear wave equation

$$ \begin{equation} u_{tt} - \Delta u + f(u) = 0. \end{equation} $$

Korteweg–de Vries (KdV) equation

$$ \begin{equation} u_t + u u_x + u_{xxx} = 0. \end{equation} $$

Nonlinear Schrödinger equation

$$ \begin{equation} i u_t + \Delta u = f(|u|^2) u. \end{equation} $$

Systems of PDEs #

Linear systems #

Equilibrium equations of linear elasticity

$$ \begin{equation} \mu \Delta u + (\lambda + \mu) D(\operatorname{div} u) = 0. \end{equation} $$

Evolution equations of linear elasticity

$$ \begin{equation} u_{tt} - \mu \Delta u - (\lambda + \mu) D(\operatorname{div} u) = 0. \end{equation} $$

Maxwell’s equations

$$ \begin{equation} \begin{cases} E_t = \operatorname{curl} B \\ B_t = -\operatorname{curl} E \\ \operatorname{div} B = \operatorname{div} E = 0. \end{cases} \end{equation} $$

Nonlinear systems #

System of conservation laws

$$ \begin{equation} u_t + \operatorname{div} F(u) = 0. \end{equation} $$

Reaction-diffusion system

$$ \begin{equation} u_t - \Delta u = f(u). \end{equation} $$

Euler’s equations for incompressible, inviscid flow

$$ \begin{equation} \begin{cases} u_t + u \cdot Du = -Dp \\ \operatorname{div} u = 0. \end{cases} \end{equation} $$

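Navier–Stokes equations for incompressible, viscous flow

$$ \begin{equation} \begin{cases} u_t + u \cdot Du - \Delta u = -Dp \\ \operatorname{div} u = 0. \end{cases} \end{equation} $$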

Study Mathematics at HCMUS

Thu, 27 Jun 2024 23:14:15 +0800

1. Applied Mathematics #

MNC - Research Methodologies
MTT001 - Advanced Functional Analysis
MTT006 - Advanced Linear Algebra
MTT011 - Numerical Analysis
MTT012 - Stochastic Process
MTT081 - Optimization Algorithms
MTT106 - Non-linear Programming
MTT107 - Set-valued Analysis
MTT083 - Convex Analysis
MTT130 - Numerical Programming for Applied Problems
MTT131 - Seminar in Applied Mathematics
MTT139 - Mathematical Models in Economics
MTT147 - Statistical Modelling
MTT099 - Differential Equations
MTT097 - Partial Differential Equations
MTH10403 - Functional Analysis
MTT090 - Complex Analysis
MTT149 - Convex Analysis and Optimization

2. Mathematical Analysis #

MTT001 - Advanced Functional Analysis
MTT006 - Advanced Linear Algebra
MTT099 - Differential Equations
MTT097 - Partial Differential Equations
MTT090 - Complex Analysis
MTT149 - Convex Analysis and Optimization

Explainable Reinforcement Learning (XRL)

Fri, 09 Feb 2024 00:00:00 +0000

In progress…

Reinforcement Learning (RL)

Fri, 09 Feb 2024 00:00:00 +0000

In progress…

Temporal Knowledge Graph Completion

Fri, 09 Feb 2024 00:00:00 +0000

In progress…

Knowledge Extrapolation for Knowledge Graphs

Wed, 22 Nov 2023 00:00:00 +0000

Research motivation #

In many real-world applications, such as graph database systems, recommendation systems, and question answering systems, knowledge graphs (KGs) serve as a valuable source of knowledge. There are many approaches to exploiting this kind of knowledge base. Among them, knowledge graph embedding (KGE) is a feasible and effective approach for many downstream tasks such as link prediction (missing fact completion) and entity alignment. However, KGE methods still face many problems and challenges, one of which is handling unseen objects (entities or relations) during model evaluation or deployment.

Motivated by this problem, a new research direction has emerged from a series of recent works: knowledge extrapolation (KE). In these notes, we draw on the paper Generalizing to Unseen Elements: A Survey on Knowledge Extrapolation for Knowledge Graphs by Mingyang Chen et al. to summarize recent methods in the KE research direction and to present some additional ones.

If you are interested in this research direction, please read the paper for further details:

Chen, M., Zhang, W., Geng, Y., Xu, Z., Pan, J. Z., & Chen, H. (2023). Generalizing to Unseen Elements: A Survey on Knowledge Extrapolation for Knowledge Graphs. arXiv preprint arXiv:2302.01859.

Knowledge graph embedding #

We formally define a knowledge graph as $\mathcal{G} = \{\mathcal{E}, \mathcal{R}, \mathcal{T}\}$, where:

$\mathcal{E}$ is the set of entities.
$\mathcal{R}$ is the set of relations.
$\mathcal{T}$ is the set of fact triplets. A fact triplet $(h, r, t)$ expresses a relationship between two entities through a relation, so that $\mathcal{T} = \{(h, r, t)\} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$.
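
The definition above can be made concrete with a small sketch. The entity and relation names are made-up examples, and `is_valid_knowledge_graph` is a hypothetical helper that simply checks the condition that every triplet lies in E × R × E.

```python
# A toy knowledge graph; all names are illustrative assumptions.
entities = {"Hanoi", "Vietnam", "Asia"}
relations = {"capital_of", "located_in"}
triplets = {
    ("Hanoi", "capital_of", "Vietnam"),
    ("Vietnam", "located_in", "Asia"),
}

def is_valid_knowledge_graph(entities, relations, triplets):
    """Check that every (h, r, t) lies in E x R x E."""
    return all(
        h in entities and r in relations and t in entities
        for h, r, t in triplets
    )

valid = is_valid_knowledge_graph(entities, relations, triplets)

# A triplet that mentions an entity outside E breaks the definition.
with_unseen = triplets | {("Hue", "located_in", "Vietnam")}
valid_with_unseen = is_valid_knowledge_graph(entities, relations, with_unseen)
print(valid, valid_with_unseen)
```

The failing case is exactly the situation that knowledge extrapolation targets: a triplet involving an element that was not in the original sets.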

Because this knowledge base has a graph structure, it can be represented by an adjacency matrix. However, that representation is very costly and therefore inefficient. Instead of such a "naive" embedding, a simpler yet more effective method is used: "shallow lookup embedding". (In a shallow embedding, the encoder is defined by a "lookup table" such that similarity in the embedding space approximates similarity in the original space. Each column of this matrix is the embedding of one node, and the number of rows is the embedding dimension, or embedding size. Shallow embedding should also be distinguished from "deep embedding".) In general, the main goal of knowledge graph embedding is to map the elements of the entity set $\mathcal{E}$ and the relation set $\mathcal{R}$ into a low-dimensional continuous vector space while preserving the intrinsic structure of the graph data.
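
A minimal sketch of a shallow lookup embedding, assuming a randomly initialized table `Z` with one column per entity (in practice the entries would be learned). The entity names, the dimension, and the helper `encode` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

entity_ids = {"Hanoi": 0, "Vietnam": 1, "Asia": 2}   # toy vocabulary
embedding_dim = 4

# Lookup table: embedding_dim rows, one column per entity.
Z = rng.standard_normal((embedding_dim, len(entity_ids)))

def encode(entity):
    """Shallow encoder: return the column of Z that belongs to the entity."""
    return Z[:, entity_ids[entity]]

# The lookup is the same as multiplying Z by a one-hot indicator vector.
one_hot = np.zeros(len(entity_ids))
one_hot[entity_ids["Vietnam"]] = 1.0
lookup_matches_matmul = np.allclose(Z @ one_hot, encode("Vietnam"))
print(encode("Vietnam"), lookup_matches_matmul)
```

Because the table has a fixed column per known entity, an entity that is missing from `entity_ids` has no embedding at all, which is the limitation discussed in the rest of these notes.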

To judge how good a knowledge graph embedding method is, the link prediction task is commonly used to measure the effectiveness of the proposed KGE method. Link prediction can be understood as predicting missing fact triplets; this is not strictly accurate, but it is an acceptable simplification.

(a) Training set and (b) test set for traditional KGE. Example test sets for the entity extrapolation setting (c) and the relation extrapolation setting (d). The support set may contain any kind of auxiliary information about the unseen entities; related fact triplets are used here as examples.

Methods proposed for the knowledge extrapolation setting aim to perform link prediction on unseen elements. In every case, two sets are used for evaluation during knowledge extrapolation:

One set provides supporting information about the unseen elements;
The other set evaluates the model's link prediction ability.
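
The role of the two evaluation sets can be sketched as follows. The triplets are made-up examples, and the names `support_triplets` and `query_triplets` are assumptions chosen for this illustration; the check only verifies that every unseen entity appearing in the evaluated triplets is described somewhere in the support set.

```python
# Toy data; all names are illustrative assumptions.
train_triplets = {
    ("Hanoi", "capital_of", "Vietnam"),
    ("Vietnam", "located_in", "Asia"),
}
support_triplets = {("Hue", "located_in", "Vietnam")}  # information about unseen elements
query_triplets = {("Hue", "located_in", "Asia")}       # links the model must predict

def entities_of(triplets):
    """Collect every head and tail entity that occurs in a set of triplets."""
    return {entity for h, _, t in triplets for entity in (h, t)}

seen_entities = entities_of(train_triplets)
unseen_in_query = entities_of(query_triplets) - seen_entities
covered_by_support = unseen_in_query <= entities_of(support_triplets)
print(unseen_in_query, covered_by_support)
```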

In terms of taxonomy, current approaches can be divided into two directions: entity extrapolation and relation extrapolation. The figure below gives an overview of this taxonomy.

Entity extrapolation methods #

Entity encoding #

One way to handle unseen entities is to learn how to encode entities instead of learning "fixed" embedding tables. These learned encoders can be run on the support set of the unseen entities to produce reasonable embeddings for them. There are currently many ways to design such encoders, and the right choice depends on the nature of the support set.
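
As a rough sketch of what running an encoder on the support set could mean, the example below builds an embedding for an unseen entity by averaging messages from its known neighbours. The translation-style messages, the identity matrix `W` standing in for learned weights, and all names are assumptions made for illustration; this is not the method of any specific paper listed below.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Embeddings of already known entities and relations (random stand-ins).
entity_emb = {"Vietnam": rng.standard_normal(dim), "Asia": rng.standard_normal(dim)}
relation_emb = {"located_in": rng.standard_normal(dim)}
W = np.eye(dim)  # stands in for learned encoder weights

def encode_unseen(entity, support_triplets):
    """Average neighbour messages taken from the support set."""
    messages = []
    for h, r, t in support_triplets:
        if h == entity and t in entity_emb:
            messages.append(entity_emb[t] - relation_emb[r])  # assumes h is close to t - r
        elif t == entity and h in entity_emb:
            messages.append(entity_emb[h] + relation_emb[r])  # assumes t is close to h + r
    if not messages:
        raise ValueError("the support set says nothing about this entity")
    return W @ np.mean(messages, axis=0)

support = [("Hue", "located_in", "Vietnam"), ("Hue", "located_in", "Asia")]
hue_embedding = encode_unseen("Hue", support)

expected = np.mean(
    [entity_emb["Vietnam"] - relation_emb["located_in"],
     entity_emb["Asia"] - relation_emb["located_in"]],
    axis=0,
)
print(hue_embedding)
```

Unlike a fixed lookup table, the encoder needs no dedicated row or column for "Hue"; it only needs support triplets that connect the new entity to known ones.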

Encode from structural information (when the support set contains only information about the unseen triplets):
- (MEAN) Bi, Z., Zhang, T., Zhou, P., & Li, Y. (2020). Knowledge transfer for out-of-knowledge-base entities: Improving graph-neural-network-based embedding using convolutional layers. IEEE Access, 8, 159039-159049.
- (LAN) Wang, P., Han, J., Li, C., & Pan, R. (2019, July). Logic attention based neighborhood aggregation for inductive knowledge graph embedding. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, No. 01, pp. 7152-7159).
- Bhowmik, R., & de Melo, G. (2020). Explainable link prediction for emerging entities in knowledge graphs. In The Semantic Web–ISWC 2020: 19th International Semantic Web Conference, Athens, Greece, November 2–6, 2020, Proceedings, Part I 19 (pp. 39-55). Springer International Publishing.
- Albooyeh, M., Goel, R., & Kazemi, S. M. (2020, November). Out-of-sample representation learning for knowledge graphs. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 2657-2666).
- (CFAG) Wang, C., Zhou, X., Pan, S., Dong, L., Song, Z., & Sha, Y. (2022, June). Exploring Relational Semantics for Inductive Knowledge Graph Completion. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 4, pp. 4184-4192).
- (ARGCN) Cui, Y., Wang, Y., Sun, Z., Liu, W., Jiang, Y., Han, K., & Hu, W. (2022, October). Inductive knowledge graph reasoning for multi-batch emerging entities. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (pp. 335-344).
- (QBLP) Ali, M., Berrendorf, M., Galkin, M., Thost, V., Ma, T., Tresp, V., & Lehmann, J. (2021). Improving inductive link prediction using hyper-relational facts. In The Semantic Web–ISWC 2021: 20th International Semantic Web Conference, ISWC 2021, Virtual Event, October 24–28, 2021, Proceedings 20 (pp. 74-92). Springer International Publishing.
- (GEN) Baek, J., Lee, D. B., & Hwang, S. J. (2020). Learning to extrapolate knowledge: Transductive few-shot out-of-graph link prediction. Advances in Neural Information Processing Systems, 33, 546-560.
- (HRFN) Zhang, Y., Wang, W., Chen, W., Xu, J., Liu, A., & Zhao, L. (2021, October). Meta-learning based hyper-relation feature modeling for out-of-knowledge-base embedding. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (pp. 2637-2646).
- (INDIGO) Liu, S., Grau, B., Horrocks, I., & Kostylev, E. (2021). Indigo: Gnn-based inductive knowledge graph completion using pair-wise encoding. Advances in Neural Information Processing Systems, 34, 2034-2045.
- (MorsE) Chen, M., Zhang, W., Zhu, Y., Zhou, H., Yuan, Z., Xu, C., & Chen, H. (2022, July). Meta-knowledge transfer for inductive knowledge graph embedding. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 927-937).
- (NodePiece) Galkin, M., Denis, E., Wu, J., & Hamilton, W. L. (2021). Nodepiece: Compositional and parameter-efficient representations of large knowledge graphs. arXiv preprint arXiv:2106.12144.
Encode from other information (when the support set also contains other kinds of information):
- (DKRL) Xie, R., Liu, Z., Jia, J., Luan, H., & Sun, M. (2016, March). Representation learning of knowledge graphs with entity descriptions. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1).
- (ConMask) Shi, B., & Weninger, T. (2018, April). Open-world knowledge graph completion. In Proceedings of the AAAI conference on artificial intelligence (Vol. 32, No. 1).
- (OWE) Shah, H., Villmow, J., Ulges, A., Schwanecke, U., & Shafait, F. (2019, July). An open-world extension to knowledge graph completion models. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, No. 01, pp. 3044-3051).
- (KEPLER) Wang, X., Gao, T., Zhu, Z., Zhang, Z., Liu, Z., Li, J., & Tang, J. (2021). KEPLER: A unified model for knowledge embedding and pre-trained language representation. Transactions of the Association for Computational Linguistics, 9, 176-194.
- (StAR) Wang, B., Shen, T., Long, G., Zhou, T., Wang, Y., & Chang, Y. (2021, April). Structure-augmented text representation learning for efficient knowledge graph completion. In Proceedings of the Web Conference 2021 (pp. 1737-1748).
- (BLP) Daza, D., Cochez, M., & Groth, P. (2021, April). Inductive entity representations from text via link prediction. In Proceedings of the Web Conference 2021 (pp. 798-808).
- (SimKGC) Wang, L., Zhao, W., Wei, Z., & Liu, J. (2022). SimKGC: Simple contrastive knowledge graph completion with pre-trained language models. arXiv preprint arXiv:2203.02167.
- (StATIK) Markowitz, E., Balasubramanian, K., Mirtaheri, M., Annavaram, M., Galstyan, A., & Ver Steeg, G. (2022, July). StATIK: Structure and text for inductive knowledge graph completion. In Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 604-615).

Subgraph predicting #

(GraIL) Teru, K., Denis, E., & Hamilton, W. (2020, November). Inductive relation prediction by subgraph reasoning. In International Conference on Machine Learning (pp. 9448-9457). PMLR.
(CoMPILE) Mai, S., Zheng, S., Yang, Y., & Hu, H. (2021, May). Communicative message passing for inductive relation reasoning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 5, pp. 4294-4302).
(TACT) Chen, J., He, H., Wu, F., & Wang, J. (2021, May). Topology-aware correlations between relations for inductive link prediction in knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 7, pp. 6271-6278).
(ConGLR) Lin, Q., Liu, J., Xu, F., Pan, Y., Zhu, Y., Zhang, L., & Zhao, T. (2022, July). Incorporating context graph with logical reasoning for inductive relation prediction. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 893-903).
(SNRI) Xu, X., Zhang, P., He, Y., Chao, C., & Yan, C. (2022). Subgraph neighboring relations infomax for inductive link prediction on knowledge graphs. arXiv preprint arXiv:2208.00850.
(BertRL) Zha, H., Chen, Z., & Yan, X. (2022, June). Inductive relation prediction by BERT. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 5, pp. 5923-5931).
(RMPI) Geng, Y., Chen, J., Pan, J. Z., Chen, M., Jiang, S., Zhang, W., & Chen, H. (2023, April). Relational message passing for fully inductive knowledge graph completion. In 2023 IEEE 39th International Conference on Data Engineering (ICDE) (pp. 1221-1233). IEEE.
(PathCon) Wang, H., Ren, H., & Leskovec, J. (2021, August). Relational message passing for knowledge graph completion. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 1697-1707).
(NBFNet) Zhu, Z., Zhang, Z., Xhonneux, L. P., & Tang, J. (2021). Neural bellman-ford networks: A general graph neural network framework for link prediction. Advances in Neural Information Processing Systems, 34, 29476-29490.
(RED-GNN) Zhang, Y., & Yao, Q. (2022, April). Knowledge graph reasoning with relational digraph. In Proceedings of the ACM web conference 2022 (pp. 912-924).

Rule mining #

(AMIE) Galárraga, L. A., Teflioudi, C., Hose, K., & Suchanek, F. (2013, May). AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In Proceedings of the 22nd international conference on World Wide Web (pp. 413-422).
(RuleN) Meilicke, C., Fink, M., Wang, Y., Ruffinelli, D., Gemulla, R., & Stuckenschmidt, H. (2018). Fine-grained evaluation of rule-and embedding-based systems for knowledge graph completion. In The Semantic Web–ISWC 2018: 17th International Semantic Web Conference, Monterey, CA, USA, October 8–12, 2018, Proceedings, Part I 17 (pp. 3-20). Springer International Publishing.
(AnyBURL) Meilicke, C., Chekol, M. W., Ruffinelli, D., & Stuckenschmidt, H. (2019, August). Anytime Bottom-Up Rule Learning for Knowledge Graph Completion. In IJCAI (pp. 3137-3143).
(NeuralLP) Yang, F., Yang, Z., & Cohen, W. W. (2017). Differentiable learning of logical rules for knowledge base reasoning. Advances in neural information processing systems, 30.
(DRUM) Sadeghian, A., Armandpour, M., Ding, P., & Wang, D. Z. (2019). Drum: End-to-end differentiable rule mining on knowledge graphs. Advances in Neural Information Processing Systems, 32.
(CBGNN) Yan, Z., Ma, T., Gao, L., Tang, Z., & Chen, C. (2022, June). Cycle representation learning for inductive relation prediction. In International Conference on Machine Learning (pp. 24895-24910). PMLR

Relation extrapolation methods #

Relation encoding #

Encode from structural information (when the support set contains only triples involving the unseen relations):
- (MetaR) Chen, M., Zhang, W., Zhang, W., Chen, Q., & Chen, H. (2019). Meta relational learning for few-shot link prediction in knowledge graphs. arXiv preprint arXiv:1909.01515.
- (GANA) Niu, G., Li, Y., Tang, C., Geng, R., Dai, J., Liu, Q., … & Si, L. (2021, July). Relational learning with gated and attentive neighbor aggregator for few-shot knowledge graph completion. In Proceedings of the 44th International ACM SIGIR conference on research and development in information retrieval (pp. 213-222).
Encode from other information (when the support set also contains other kinds of information):
- (ZSGAN) Qin, P., Wang, X., Chen, W., Zhang, C., Xu, W., & Wang, W. Y. (2020, April). Generative adversarial zero-shot relational learning for knowledge graphs. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 05, pp. 8673-8680).
- (OntoZSL) Geng, Y., Chen, J., Chen, Z., Pan, J. Z., Ye, Z., Yuan, Z., … & Chen, H. (2021, April). Ontozsl: Ontology-enhanced zero-shot learning. In Proceedings of the Web Conference 2021 (pp. 3325-3336).
- (DMoG) Song, R., He, S., Zheng, S., Gao, S., Liu, K., Yu, Z., & Zhao, J. (2022, October). Decoupling Mixture-of-Graphs: Unseen Relational Learning for Knowledge Graph Completion by Fusing Ontology and Textual Experts. In Proceedings of the 29th International Conference on Computational Linguistics (pp. 2237-2246).
- (HAPZSL) Li, X., Ma, J., Yu, J., Xu, T., Zhao, M., Liu, H., … & Yu, R. (2022). HAPZSL: A hybrid attention prototype network for knowledge graph zero-shot relational learning. Neurocomputing, 508, 324-336.
- (DOZSL) Geng, Y., Chen, J., Zhang, W., Xu, Y., Chen, Z., Z. Pan, J., … & Chen, H. (2022, August). Disentangled ontology embedding for zero-shot learning. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining (pp. 443-453).

Entity pair matching #

Representative works

(GMatching) Xiong, W., Yu, M., Chang, S., Guo, X., & Wang, W. Y. (2018). One-shot relational learning for knowledge graphs. arXiv preprint arXiv:1808.09040.
(FSRL) Zhang, C., Yao, H., Huang, C., Jiang, M., Li, Z., & Chawla, N. V. (2020, April). Few-shot knowledge graph completion. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 03, pp. 3041-3048).
(FAAN) Sheng, J., Guo, S., Chen, Z., Yue, J., Wang, L., Liu, T., & Xu, H. (2020). Adaptive attentional network for few-shot knowledge graph completion. arXiv preprint arXiv:2010.09638.
(MetaP) Jiang, Z., Gao, J., & Lv, X. (2021, July). Metap: Meta pattern learning for one-shot knowledge graph completion. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2232-2236).
(P-INT) Xu, J., Zhang, J., Ke, X., Dong, Y., Chen, H., Li, C., & Liu, Y. (2021, November). P-INT: A path-based interaction model for few-shot knowledge graph completion. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 385-394).
(GraphANGEL) Jin, J., Wang, Y., Du, K., Zhang, W., Zhang, Z., Wipf, D., … & Gan, Q. (2021, October). Inductive Relation Prediction Using Analogy Subgraph Embeddings. In International Conference on Learning Representations.
(CSR) Huang, Q., Ren, H., & Leskovec, J. (2022). Few-shot relational reasoning via connection subgraph pretraining. Advances in Neural Information Processing Systems, 35, 6397-6409.

Data #

Datasets:

WN11-{Head/Tail/Both}-{1,000/3,000/5,000}
- Proposed by
{WN18RR/FB15k-237/NELL995}-{v1/2/3/4}
NELL-One/Wiki-One
NELL-ZS/Wiki-ZS

Discussion #

Discussion 1: Assumptions about entity extrapolation

There are usually two different assumptions about entity extrapolation.

First assumption: the unseen entities in the support set are connected to seen entities. This setting is called semi-entity extrapolation.
Second assumption: the unseen entities form an entirely new knowledge graph within the support sets and are not connected to any seen entity. This setting is called fully-entity extrapolation.

Consequently, models designed for the fully-entity setting can also be applied to the semi-entity setting, but not the other way around.
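The difference between the two settings can be checked mechanically on a support set. The sketch below is only illustrative: the function name `extrapolation_setting` and the returned labels are hypothetical, and it assumes the support set is given as a list of (head, relation, tail) triples.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def extrapolation_setting(support: List[Triple], seen_entities: Iterable[str]) -> str:
    """Return "semi" if any support triple touches a seen entity, else "fully"."""
    seen = set(seen_entities)
    for head, _relation, tail in support:
        if head in seen or tail in seen:
            return "semi"   # at least one link back to the training graph
    return "fully"          # the support set is a disconnected, brand-new graph


seen = {"a", "b"}
linked_support = [("x", "r1", "a"), ("x", "r2", "y")]
isolated_support = [("x", "r1", "y"), ("y", "r2", "z")]
setting_linked = extrapolation_setting(linked_support, seen)
setting_isolated = extrapolation_setting(isolated_support, seen)
```

A model that never looks up embeddings of seen entities works on both kinds of input, which is why the fully-entity setting is the stricter one.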

Most semi-entity extrapolation models belong to the entity-encoding group and encode unseen entities from structural information, because they typically design modules that transfer knowledge from seen entities. Some models design entity-independent encoders, which lets them handle fully-entity extrapolation as well.

Methods that encode unseen entities from other sources of information, such as textual descriptions, can also solve fully-entity extrapolation. Methods based on subgraph predicting and rule learning can handle fully-entity extrapolation because subgraphs and rules are independent of specific entities.
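Entity independence is easy to see with a rule: it mentions only relations and variables, so it applies unchanged to a graph made entirely of unseen entities. The sketch below applies one hand-written length-two path rule; the helper `apply_path_rule` and the relation names are hypothetical, and real systems such as the rule miners cited above learn many weighted rules rather than a single fixed one.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def apply_path_rule(triples: Iterable[Triple], body: Tuple[str, str], head: str) -> Set[Triple]:
    """Apply the rule  body[0](x, y) AND body[1](y, z)  =>  head(x, z)."""
    triples = list(triples)
    first, second = body
    targets = defaultdict(set)            # (relation, source) -> set of targets
    for h, r, t in triples:
        targets[(r, h)].add(t)
    existing = set(triples)
    inferred: Set[Triple] = set()
    for x, r, y in triples:
        if r != first:
            continue
        for z in targets.get((second, y), ()):
            candidate = (x, head, z)
            if candidate not in existing:  # only report genuinely new facts
                inferred.add(candidate)
    return inferred


rule_body, rule_head = ("born_in", "city_of"), "nationality"

# Graph used during training.
train_graph = [("alice", "born_in", "hanoi"), ("hanoi", "city_of", "vietnam")]
# Entirely new graph: none of these entities appeared above.
new_graph = [("e1", "born_in", "e2"), ("e2", "city_of", "e3"), ("e4", "born_in", "e5")]

train_inferred = apply_path_rule(train_graph, rule_body, rule_head)
new_inferred = apply_path_rule(new_graph, rule_body, rule_head)
```

The same rule object produces predictions on both graphs without any per-entity parameters.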

Discussion 2: Exploiting the information in the support set

Many types of information can be used to build support sets for unseen elements, including fact triples, textual descriptions, and ontologies. We look at each in turn.

First, fact triples provide structural information. They are an intuitive kind of support information for unseen elements, since such elements usually appear together with other elements in a triple rather than in isolation. The triples carry knowledge from seen elements that the unseen elements can use.

Textual descriptions are also common for KGs, because many KGs are built from text data. They naturally provide extrapolation ability for unseen elements and are usually passed through text encoders that turn the text into embeddings.

Finally, ontologies are often used as prior knowledge about the correlations between seen and unseen elements, and several recent works use them to handle unseen relations. An ontology is usually represented as a graph containing hierarchical relations together with constraints on the domains and ranges of relations. Embeddings of unseen relations can be generated by ontology-based methods that use techniques such as GANs or disentangled representation learning.

Future directions #

Direction 1: Moving toward applications

Most current knowledge extrapolation methods are evaluated on link prediction over test sets. Although link prediction can demonstrate a model's effectiveness and helps complete the knowledge graph, it is also worth exploring how to handle unseen KG elements in applications such as answering logical queries expressed in a subset of first-order logic, entity alignment under a growing KG, and question answering.
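For readers unfamiliar with link-prediction evaluation, the sketch below shows how ranking metrics are typically computed from model scores. The text above does not name specific metrics; mean reciprocal rank (MRR) and Hits@k are used here as an assumption because they are commonly reported for this task, and the function names and toy scores are hypothetical.

```python
from typing import Dict, List


def rank_of(scores: Dict[str, float], target: str) -> int:
    """1-based rank of the target among all candidates (higher score is better)."""
    target_score = scores[target]
    return 1 + sum(1 for s in scores.values() if s > target_score)


def mean_reciprocal_rank(ranks: List[int]) -> float:
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: List[int], k: int) -> float:
    """Fraction of test queries whose correct answer is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# One test query (head, relation, ?): scores for each candidate tail.
candidate_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
rank_b = rank_of(candidate_scores, "b")

# Ranks of the correct answer over three test queries.
ranks = [2, 1, 4]
mrr = mean_reciprocal_rank(ranks)
hits1 = hits_at_k(ranks, 1)
hits3 = hits_at_k(ranks, 3)
```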

Direction 2: Multi-modal support information

Multi-modal knowledge graphs have been a frequently discussed research topic recently. While many knowledge extrapolation methods use natural language as support information for unseen elements, relatively few works explore the potential of visual information.

Direction 3: Joint entity and relation extrapolation

Existing studies mostly address entity extrapolation and relation extrapolation completely independently, but in many real applications unseen entities and unseen relations can emerge at the same time. A feasible solution is a method that effectively integrates both entity and relation extrapolation.

Direction 4: Dynamic and lifelong settings

In many real applications, KGs carry temporal constraints, so temporal information has to be taken into account when scoring a triple. Dynamic knowledge graphs also face the challenge of emerging elements because of their dynamic nature. To address this, several works define an entity extrapolation problem on dynamic graphs and use various techniques to obtain embeddings for unseen entities.

References

[1] Chen, M., Zhang, W., Geng, Y., Xu, Z., Pan, J. Z., & Chen, H. (2023). Generalizing to Unseen Elements: A Survey on Knowledge Extrapolation for Knowledge Graphs. arXiv preprint arXiv:2302.01859.

What actually are knowledge graphs?

Thu, 26 Oct 2023 00:00:00 +0000

In knowledge representation and reasoning, data integration is an important task, and it is usually carried out with knowledge bases. There are many kinds of knowledge bases, and knowledge graphs are one of them.

A knowledge graph is built on a knowledge model, a graph-structured data model that is also called an ontology. This is why the knowledge model is said to be the heart of a knowledge graph.

Knowledge graphs are typically used to store interlinked descriptions of entities, including objects, events, situations, and abstract concepts.

The descriptions in the graph carry encoded formal semantics, which lets both people and machines process them efficiently and unambiguously. Moreover, descriptions contribute to one another, forming a network in which each entity represents part of the description of the entities related to it. Through the knowledge model, diverse data is also connected across the graph and described by semantic metadata.

History #

In 1972, the term "knowledge graph" was coined by the Austrian linguist Edgar W. Schneider in a discussion of how to build modular instructional systems for courses. In the late 1980s, the University of Groningen and the University of Twente jointly started a project called Knowledge Graphs, focusing on the design of semantic networks whose edges are restricted to a limited set of relations, in order to facilitate algebras on the graph. Over the following decades, the distinction between semantic networks and knowledge graphs became blurred.

The first knowledge graphs were knowledge bases for a specific domain. In 1985, WordNet was founded, capturing semantic relationships between words and their meanings. In 1998, Andrew Edmonds of Science in Finance Ltd in the UK created a system called ThinkBase, which offered fuzzy-logic-based reasoning in a graphical context. In 2005, Marc Wirk founded Geonames, capturing relationships between geographic names, locations, and associated entities.

In 2007, both DBpedia and Freebase were founded as graph-based knowledge bases for general-purpose knowledge. DBpedia focused on data extracted from Wikipedia, while Freebase aggregated a large number of public datasets. Neither described itself as a "knowledge graph".

In 2012, Google introduced its Knowledge Graph, built on DBpedia and Freebase along with many other data sources. It later incorporated content extracted from web pages as RDFa, Microdata, and JSON-LD, as well as the CIA World Factbook, Wikidata, and Wikipedia. Entity and relationship types in this knowledge graph were further organized using terms from the schema.org vocabulary.

Definition #

A knowledge base is a dataset that represents real-world data and semantic relations in the form of triples. When the triples are represented as a graph whose edges are relations and whose nodes are entities, it is regarded as a knowledge graph. In general, knowledge graphs and knowledge bases are treated as conceptually the same and are often used interchangeably.
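The step from a list of triples to a graph can be made concrete with a few lines of code. This is a minimal sketch using plain Python data structures; the helper names `build_graph` and `match` and the example facts are hypothetical and only meant to illustrate the nodes-as-entities, edges-as-relations view.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list: each entity maps to its outgoing (relation, tail) edges."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
        graph.setdefault(tail, [])  # make sure tail entities are nodes too
    return graph


def match(triples: List[Triple], head: Optional[str] = None,
          relation: Optional[str] = None, tail: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (h, r, t) for h, r, t in triples
        if (head is None or h == head)
        and (relation is None or r == relation)
        and (tail is None or t == tail)
    ]


triples = [
    ("Rome", "capital_of", "Italy"),
    ("Italy", "located_in", "Europe"),
    ("Italy", "member_of", "EU"),
]
graph = build_graph(triples)
italy_facts = match(triples, head="Italy")
```

The same data answers both graph-style questions (what are the neighbours of Italy?) and query-style questions (which triples have Italy as head?).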

So what, really, is a knowledge graph?

There is no single accepted definition. Most definitions take the Semantic Web point of view and include the following key features:

Flexible relations among knowledge in topical domains: a knowledge graph
- defines abstract classes and relations of entities in a schema,
- mainly describes real-world entities and their interrelations, organized in a graph,
- allows any entity to be potentially related to any other,
- covers a variety of topical domains.
General structure: a network of entities, their semantic types, properties, and relationships.
Supporting reasoning over inferred ontologies: a knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.

However, some of these features are not strictly necessary or relevant in every situation. A simpler way to put it:

A knowledge graph is a digital structure that represents knowledge as concepts and the relationships between them (facts).

Core characteristics of knowledge graphs #

Knowledge graphs combine characteristics of several data management paradigms:

Database $\rightarrow$ the data can be explored through structured queries.
Graph $\rightarrow$ the data can be analyzed like any other network data structure.
Knowledge base $\rightarrow$ the data carries formal semantics, which can be used for integration and inference tasks.

Knowledge graphs are usually represented in the Resource Description Framework (RDF), which supports integration, unification, linking, and reuse because it offers:

Expressivity: it can effectively represent many kinds of data and content.
Performance: it can handle billions of facts and properties.
Interoperability: querying through the SPARQL Protocol, management through the SPARQL Graph Store, and federation are all supported.
Standardization: all of the above is standardized through the W3C process.

Ontologies and formal semantics #

Ontologies are the backbone of a knowledge graph's formal semantics. An ontology can be seen as the schema of the graph. It is the link between the developers of a knowledge graph and what users expect the data in the graph to mean.

A user can be a person or a software application that wants to integrate the data in a reliable and precise way. Ontologies ensure a shared, correct understanding of the data and its meaning.

When formal semantics are used to express and integrate the data of a knowledge graph, the following modelling elements need to be defined:

Classes
Relationship types
Categories
Free-text descriptions

What is NOT a knowledge graph? #

Not every RDF graph is a knowledge graph. For example, a set of statistical data, such as the GDP of countries, represented in RDF is not a knowledge graph. A graph representation of data is often useful, but it may not need to capture the semantic knowledge of the data. It may be enough for an application to have the string "Italy" associated with the string "GDP" and the number "1 billion", without defining what a country is or what the "Gross Domestic Product" of a country is. It is the connections and the graph structure that make a knowledge graph, not the language used to represent the data.

Not every knowledge base is a knowledge graph. A core characteristic of a knowledge graph is that entity descriptions should be interlinked: the definition of one entity includes another entity. This linking is how the graph forms, for example: A is B, B is C, and C has D, so A has D. A knowledge base without formal structure and semantics, such as a question-and-answer knowledge base about some domain, is not a knowledge graph. It is entirely possible to have an expert system whose data is organized in a form other than a graph, for example as a set of "if-then" rules.
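The "A is B, B is C, C has D, so A has D" chain can be written as a small inference routine over interlinked triples. This is a sketch under stated assumptions: the relation names `is_a` and `has` and the function `infer_has` are hypothetical, and a real knowledge graph would delegate this kind of reasoning to an ontology reasoner.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def infer_has(triples: Iterable[Triple]) -> Set[Triple]:
    """Propagate "has" facts down "is_a" chains and return only the new facts."""
    is_a = defaultdict(set)
    has = defaultdict(set)
    for s, r, o in triples:
        if r == "is_a":
            is_a[s].add(o)
        elif r == "has":
            has[s].add(o)

    def ancestors(node: str) -> Set[str]:
        # Depth-first walk up the is_a links; `seen` guards against cycles.
        seen: Set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in is_a.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    inferred: Set[Triple] = set()
    for node in list(is_a):
        for ancestor in ancestors(node):
            for prop in has.get(ancestor, ()):
                if prop not in has.get(node, ()):
                    inferred.add((node, "has", prop))
    return inferred


facts = [("A", "is_a", "B"), ("B", "is_a", "C"), ("C", "has", "D")]
new_facts = infer_has(facts)
```

The inference only works because the descriptions of A, B, and C point at one another; a flat table of unrelated records would give the routine nothing to traverse.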

Large knowledge graphs #

Google Knowledge Graph

DBpedia

Geonames

WordNet

FactForge

References #

[1] What is a Knowledge Graph?, https://www.ontotext.com/knowledgehub/fundamentals/what-is-a-knowledge-graph/

Miscellanea

Tue, 17 Oct 2023 00:00:00 +0000

Miscellanea #

$\LaTeX$ Template #

Report Template

[1] DoCS HCMUS - Template Report 01

Link: https://www.overleaf.com/read/mqvdqztstvnf#77b130

[2] DoCS HCMUS - Template Report 02

Link: https://www.overleaf.com/read/qvqpqytgztsn#9c5467

Thesis Template

[1] Master Thesis Proposal

Link: https://www.overleaf.com/read/pmmkbqmsrvnq#7c860f

[2] Master Thesis Template

Link: https://www.overleaf.com/read/ybsqztfjnvjc#a23f6b

[3] Master Math Thesis Template

Link: https://www.overleaf.com/read/bzwmvkymwfwb#2b968e

Beamer Template

[1] DoCS HCMUS - Template Slide 01

Link: https://www.overleaf.com/read/dhkcxygmnxjv#fa7ec3

[2] Slide-template

Link: https://www.overleaf.com/read/jfgnzwpsxmhk#4d4625

Advice #

Links #

Videos #

Mathematics - The Language of the Universe

The World of Mathematical Reality

Paul Lockhart teaching Go

Five Principles of Extraordinary Math Teaching

The map of Mathematics

The map of Computer Science

YouTube channels #

[1] MIT OpenCourseWare

[2] 3Blue1Brown

[3] StatQuest with Josh Starmer

[4] Computer Science Theory Explained

[5] The Math District

Pre-print on Optimization and Operations Research #

Teaching

Tue, 17 Oct 2023 00:00:00 +0000

Teaching Assistant #

Applied in Data Science
Data Hiding and Secret Sharing
Data Structures and Algorithms
Data Mining and Applications
Data Visualization
Fundamentals of Artificial Intelligence
Fundamentals of Programming
Introduction to Programming
Introduction to Data Science
Introduction to Machine Learning
Introduction to Big Data
Introduction to Information Technology
Graph Mining
Parallel Programming
Programming for Data Science
Swarm Intelligence

Optimization Research Papers in JMLR Volume 24

Fri, 29 Sep 2023 00:00:00 +0000

Optimization Research Papers in JMLR Volume 24 (2023) #

This document lists papers from JMLR Volume 24 (2023) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.

Convex Optimization #

Papers addressing convex optimization problems, including sparse PCA, L0 regularization, and matrix decomposition.

1. Sparse PCA: A Geometric Approach
   Authors: Dimitris Bertsimas, Driss Lahlou Kitane
   Description: Develops a geometric approach for sparse principal component analysis using convex optimization techniques.
2. Fundamental Limits and Algorithms for Sparse Linear Regression with Sublinear Sparsity
   Authors: Lan V. Truong
   Description: Investigates algorithms and theoretical limits for sparse linear regression with sublinear sparsity in a convex framework.
3. Sparse Training with Lipschitz Continuous Loss Functions and a Weighted Group L0-norm Constraint
   Authors: Michael R. Metel
   Description: Proposes sparse training methods using Lipschitz continuous loss functions and group L0-norm constraints.
4. MARS: A Second-Order Reduction Algorithm for High-Dimensional Sparse Precision Matrices Estimation
   Authors: Qian Li, Binyan Jiang, Defeng Sun
   Description: Presents a second-order reduction algorithm for sparse precision matrix estimation using convex optimization.
5. Sparse GCA and Thresholded Gradient Descent
   Authors: Sheng Gao, Zongming Ma
   Description: Develops sparse generalized correlation analysis with thresholded gradient descent in a convex framework.
6. A Parameter-Free Conditional Gradient Method for Composite Minimization under Hölder Condition
   Authors: Masaru Ito, Zhaosong Lu, Chuan He
   Description: Introduces a parameter-free conditional gradient method for composite minimization under Hölder smoothness.
7. L0Learn: A Scalable Package for Sparse Learning using L0 Regularization
   Authors: Hussein Hazimeh, Rahul Mazumder, Tim Nonet
   Description: Presents a scalable package for sparse learning with L0 regularization.
8. Sparse Plus Low Rank Matrix Decomposition: A Discrete Optimization Approach
   Authors: Dimitris Bertsimas, Ryan Cory-Wright, Nicholas A. G. Johnson
   Description: Proposes a discrete optimization approach for sparse plus low-rank matrix decomposition.
9. Distributed Sparse Regression via Penalization
   Authors: Yao Ji, Gesualdo Scutari, Ying Sun, Harsha Honnappa
   Description: Develops distributed sparse regression algorithms using penalization techniques in convex optimization.
10. Elastic Gradient Descent, an Iterative Optimization Method Approximating the Solution Paths of the Elastic Net
    Authors: Oskar Allerbo, Johan Jonasson, Rebecka Jörnsten
    Description: Introduces an iterative method approximating elastic net solution paths in convex settings.
11. A Novel Integer Linear Programming Approach for Global L0 Minimization
    Authors: Diego Delle Donne, Matthieu Kowalski, Leo Liberti
    Description: Proposes an integer linear programming approach for global L0 minimization.

Nonconvex Optimization #

Papers tackling nonconvex optimization, focusing on descent algorithms, majorization minimization, and minimax problems.

1. A Line-Search Descent Algorithm for Strict Saddle Functions with Complexity Guarantees
   Authors: Michael J. O’Neill, Stephen J. Wright
   Description: Develops a line-search descent algorithm for nonconvex strict saddle functions with complexity guarantees.
2. An Inertial Block Majorization Minimization Framework for Nonsmooth Nonconvex Optimization
   Authors: Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis
   Description: Proposes an inertial block majorization minimization framework for nonsmooth nonconvex optimization.
3. Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity
   Authors: Huan Li, Zhouchen Lin
   Description: Introduces a restarted accelerated gradient descent method for nonconvex optimization, eliminating polylogarithmic factors.
4. Preconditioned Gradient Descent for Overparameterized Nonconvex Burer-Monteiro Factorization with Global Optimality Certification
   Authors: Gavin Zhang, Salar Fattahi, Richard Y. Zhang
   Description: Develops preconditioned gradient descent for nonconvex Burer-Monteiro factorization with global optimality guarantees.
5. Zeroth-Order Alternating Gradient Descent Ascent Algorithms for A Class of Nonconvex-Nonconcave Minimax Problems
   Authors: Zi Xu, Zi-Qi Wang, Jun-Lin Wang, Yu-Hong Dai
   Description: Proposes zeroth-order alternating gradient descent ascent for nonconvex-nonconcave minimax problems.

Stochastic Optimization #

Papers focusing on stochastic optimization methods, including gradient descent, proximal point methods, and continuous-time approaches.

1. On the Convergence of Stochastic Gradient Descent with Bandwidth-Based Step Size
   Authors: Xiaoyu Wang, Ya-xiang Yuan
   Description: Analyzes convergence of stochastic gradient descent with bandwidth-based step sizes.
2. Stochastic Optimization under Distributional Drift
   Authors: Joshua Cutler, Dmitriy Drusvyatskiy, Zaid Harchaoui
   Description: Studies stochastic optimization under distributional drift with theoretical guarantees.
3. Improved Powered Stochastic Optimization Algorithms for Large-Scale Machine Learning
   Authors: Zhuang Yang
   Description: Proposes improved powered stochastic optimization algorithms for large-scale machine learning.
4. Sharper Analysis for Minibatch Stochastic Proximal Point Methods: Stability, Smoothness, and Deviation
   Authors: Xiao-Tong Yuan, Ping Li
   Description: Provides a sharper analysis of minibatch stochastic proximal point methods, focusing on stability and smoothness.
5. A Continuous-Time Stochastic Gradient Descent Method for Continuous Data
   Authors: Kexin Jin, Jonas Latz, Chenguang Liu, Carola-Bibiane Schönlieb
   Description: Introduces a continuous-time stochastic gradient descent method for continuous data optimization.
6. Sensitivity-Free Gradient Descent Algorithms
   Authors: Ion Matei, Maksym Zhenirovskyy, Johan de Kleer, John Maxwell
   Description: Develops sensitivity-free gradient descent algorithms for stochastic optimization.

Distributed/Decentralized Optimization #

Papers addressing distributed or decentralized optimization algorithms, focusing on federated learning, asynchronous updates, and network topology.

Decentralized Learning: Theoretical Optimality and Practical Improvements
Authors: Yucheng Lu, Christopher De Sa
Description: Analyzes theoretical optimality and practical improvements for decentralized learning algorithms.
A General Theory for Federated Optimization with Asynchronous and Heterogeneous Clients Updates
Authors: Yann Fraboni, Richard Vidal, Laetitia Kameni, Marco Lorenzi
Description: Provides a general theory for federated optimization with asynchronous and heterogeneous client updates.
Buffered Asynchronous SGD for Byzantine Learning
Authors: Yi-Rui Yang, Wu-Jun Li
Description: Proposes buffered asynchronous SGD for Byzantine-resilient distributed learning.
Minimax Estimation for Personalized Federated Learning: An Alternative Between FedAvg and Local Training
Authors: Shuxiao Chen, Qinqing Zheng, Qi Long, Weijie J. Su
Description: Investigates minimax estimation for personalized federated learning, comparing FedAvg and local training.
Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD
Authors: Kun Yuan, Sulaiman A. Alghunaim, Xinmeng Huang
Description: Enhances decentralized SGD by addressing data heterogeneity and network topology dependence.
Multi-Consensus Decentralized Accelerated Gradient Descent
Authors: Haishan Ye, Luo Luo, Ziang Zhou, Tong Zhang
Description: Develops multi-consensus decentralized accelerated gradient descent for distributed optimization.
Accelerated Primal-Dual Mirror Dynamics for Centralized and Distributed Constrained Convex Optimization Problems
Authors: You Zhao, Xiaofeng Liao, Xing He, Mingliang Zhou, Chaojie Li
Description: Proposes accelerated primal-dual mirror dynamics for centralized and distributed convex optimization.
Beyond Spectral Gap: The Role of the Topology in Decentralized Learning
Authors: Thijs Vogels, Hadrien Hendrikx, Martin Jaggi
Description: Examines the role of network topology in decentralized learning optimization.

Bandits and Online Learning #

Papers addressing multi-armed bandits, online optimization, and regret minimization.

Adaptation to the Range in K-Armed Bandits
Authors: Hédi Hadiji, Gilles Stoltz
Description: Studies adaptation to the range in k-armed bandit problems with regret minimization.
Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection
Authors: Wenhao Li, Ningyuan Chen, L. Jeff Hong
Description: Proposes dimension reduction techniques for contextual online learning with nonparametric variable selection.
Non-Stationary Online Learning with Memory and Non-Stochastic Control
Authors: Peng Zhao, Yu-Hu Yan, Yu-Xiang Wang, Zhi-Hua Zhou
Description: Investigates non-stationary online learning with memory and non-stochastic control strategies.
Online Non-Stochastic Control with Partial Feedback
Authors: Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou
Description: Develops online non-stochastic control methods with partial feedback for optimization.
A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits
Authors: Yasin Abbasi-Yadkori, András György, Nevena Lazić
Description: Analyzes dynamic regret in non-stationary stochastic bandit problems.
A PDE Approach for Regret Bounds under Partial Monitoring
Authors: Erhan Bayraktar, Ibrahim Ekren, Xin Zhang
Description: Uses a PDE-based approach to derive regret bounds for partial monitoring in online learning.
Continuous-in-Time Limit for Bayesian Bandits
Authors: Yuhua Zhu, Zachary Izzo, Lexing Ying
Description: Explores the continuous-time limit for Bayesian bandit algorithms with theoretical guarantees.
Bandit Problems with Fidelity Rewards
Authors: Gábor Lugosi, Ciara Pike-Burke, Pierre-André Savalle
Description: Studies bandit problems with fidelity rewards, focusing on regret minimization.
Linear Partial Monitoring for Sequential Decision Making: Algorithms, Regret Bounds and Applications
Authors: Johannes Kirschner, Tor Lattimore, Andreas Krause
Description: Develops algorithms and regret bounds for linear partial monitoring in sequential decision-making.

Optimization in Reinforcement Learning #

Papers focusing on optimization techniques for reinforcement learning, including actor-critic methods and constrained RL.

Reinforcement Learning for Joint Optimization of Multiple Rewards
Authors: Mridul Agarwal, Vaneet Aggarwal
Description: Focuses on reinforcement learning for optimizing multiple rewards simultaneously.
Provably Sample-Efficient Model-Free Algorithm for MDPs with Peak Constraints
Authors: Qinbo Bai, Vaneet Aggarwal, Ather Gattami
Description: Proposes a sample-efficient model-free algorithm for MDPs with peak constraints.
Off-Policy Actor-Critic with Emphatic Weightings
Authors: Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White
Description: Develops off-policy actor-critic methods with emphatic weightings for RL optimization.
Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity
Authors: Ali Devran Kara, Naci Saldi, Serdar Yüksel
Description: Analyzes q-learning convergence and near-optimality for MDPs with general state spaces.
Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
Authors: Kaiqing Zhang, Sham M. Kakade, Tamer Basar, Lin F. Yang
Description: Studies model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity.
F2A2: Flexible Fully-Decentralized Approximate Actor-Critic for Cooperative Multi-Agent Reinforcement Learning
Authors: Wenhao Li, Bo Jin, Xiangfeng Wang, Junchi Yan, Hongyuan Zha
Description: Proposes a flexible fully-decentralized approximate actor-critic method for cooperative multi-agent RL.
Adaptation Augmented Model-Based Policy Optimization
Authors: Jian Shen, Hang Lai, Minghuan Liu, Han Zhao, Yong Yu, Weinan Zhang
Description: Introduces adaptation-augmented model-based policy optimization for RL.
Single Timescale Actor-Critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees
Authors: Mo Zhou, Jianfeng Lu
Description: Develops a single timescale actor-critic method for linear quadratic regulators with convergence guarantees.
Convex Reinforcement Learning in Finite Trials
Authors: Mirco Mutti, Riccardo De Santi, Piersilvio De Bartolomeis, Marcello Restelli
Description: Investigates convex reinforcement learning with finite trials, focusing on optimization techniques.
Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning
Authors: Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang
Description: Proposes a variational primal-dual policy optimization method for constrained RL.
Instance-Dependent Confidence and Early Stopping for Reinforcement Learning
Authors: Eric Xia, Koulik Khamaru, Martin J. Wainwright, Michael I. Jordan
Description: Develops instance-dependent confidence bounds and early stopping strategies for RL optimization.

List of Selected Papers on Algorithms for Large-Scale Graph Processing.

Sat, 19 Aug 2023 00:00:00 +0000

1/ [ISAAC'11] Goodrich, M. T., Sitchinava, N., & Zhang, Q. (2011, December). Sorting, searching, and simulation in the mapreduce framework. In International Symposium on Algorithms and Computation (pp. 374-383). Springer, Berlin, Heidelberg.

@inproceedings{goodrich2011sorting,
 title={Sorting, searching, and simulation in the mapreduce framework},
 author={Goodrich, Michael T and Sitchinava, Nodari and Zhang, Qin},
 booktitle={International Symposium on Algorithms and Computation},
 pages={374--383},
 year={2011},
 organization={Springer}
}

2/ [STOC'14] Andoni, A., Nikolov, A., Onak, K., & Yaroslavtsev, G. (2014, May). Parallel algorithms for geometric graph problems. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing (pp. 574-583).

@inproceedings{andoni2014parallel,
 title={Parallel algorithms for geometric graph problems},
 author={Andoni, Alexandr and Nikolov, Aleksandar and Onak, Krzysztof and Yaroslavtsev, Grigory},
 booktitle={Proceedings of the forty-sixth annual ACM symposium on Theory of computing},
 pages={574--583},
 year={2014}
}

3/ [STOC'17] Im, S., Moseley, B., & Sun, X. (2017, June). Efficient massively parallel methods for dynamic programming. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (pp. 798-811).

@inproceedings{im2017efficient,
 title={Efficient massively parallel methods for dynamic programming},
 author={Im, Sungjin and Moseley, Benjamin and Sun, Xiaorui},
 booktitle={Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing},
 pages={798--811},
 year={2017}
}

4/ [FOCS'18] Andoni, A., Song, Z., Stein, C., Wang, Z., & Zhong, P. (2018, October). Parallel graph connectivity in log diameter rounds. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS) (pp. 674-685). IEEE.

@inproceedings{andoni2018parallel,
 title={Parallel graph connectivity in log diameter rounds},
 author={Andoni, Alexandr and Song, Zhao and Stein, Clifford and Wang, Zhengyu and Zhong, Peilin},
 booktitle={2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS)},
 pages={674--685},
 year={2018},
 organization={IEEE}
}

5/ [SOSA'19] Liu, P., & Vondrák, J. (2018). Submodular optimization in the mapreduce model. arXiv preprint arXiv:1810.01489.

@article{liu2018submodular,
 title={Submodular optimization in the mapreduce model},
 author={Liu, Paul and Vondr{\'a}k, Jan},
 journal={arXiv preprint arXiv:1810.01489},
 year={2018}
}

6/ [PODC'19] Behnezhad, S., Brandt, S., Derakhshan, M., Fischer, M., Hajiaghayi, M., Karp, R. M., & Uitto, J. (2019, July). Massively parallel computation of matching and MIS in sparse graphs. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing (pp. 481-490).

@inproceedings{behnezhad2019massively,
 title={Massively parallel computation of matching and MIS in sparse graphs},
 author={Behnezhad, Soheil and Brandt, Sebastian and Derakhshan, Mahsa and Fischer, Manuela and Hajiaghayi, MohammadTaghi and Karp, Richard M and Uitto, Jara},
 booktitle={Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing},
 pages={481--490},
 year={2019}
}

Brandt, S., Fischer, M., & Uitto, J. (2018). Matching and MIS for uniformly sparse graphs in the low-memory MPC model. arXiv preprint arXiv:1807.05374.

@article{brandt2018matching,
 title={Matching and MIS for uniformly sparse graphs in the low-memory MPC model},
 author={Brandt, Sebastian and Fischer, Manuela and Uitto, Jara},
 journal={arXiv preprint arXiv:1807.05374},
 year={2018}
}

Behnezhad, S., Derakhshan, M., Hajiaghayi, M., & Karp, R. M. (2018). Massively parallel symmetry breaking on sparse graphs: MIS and maximal matching. arXiv preprint arXiv:1807.06701.

@article{behnezhad2018massively,
 title={Massively parallel symmetry breaking on sparse graphs: MIS and maximal matching},
 author={Behnezhad, Soheil and Derakhshan, Mahsa and Hajiaghayi, MohammadTaghi and Karp, Richard M},
 journal={arXiv preprint arXiv:1807.06701},
 year={2018}
}

7/ [PODC'19] Chang, Y. J., Fischer, M., Ghaffari, M., Uitto, J., & Zheng, Y. (2019, July). The complexity of $(\Delta+1)$ coloring in congested clique, massively parallel computation, and centralized local computation. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing (pp. 471-480).

@inproceedings{chang2019complexity,
 title={The complexity of ($\Delta$+ 1) coloring in congested clique, massively parallel computation, and centralized local computation},
 author={Chang, Yi-Jun and Fischer, Manuela and Ghaffari, Mohsen and Uitto, Jara and Zheng, Yufan},
 booktitle={Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing},
 pages={471--480},
 year={2019}
}

8/ [FOCS'19] Ghaffari, M., Kuhn, F., & Uitto, J. (2019, November). Conditional hardness results for massively parallel computation from distributed lower bounds. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS) (pp. 1650-1663). IEEE.

@inproceedings{ghaffari2019conditional,
 title={Conditional hardness results for massively parallel computation from distributed lower bounds},
 author={Ghaffari, Mohsen and Kuhn, Fabian and Uitto, Jara},
 booktitle={2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)},
 pages={1650--1663},
 year={2019},
 organization={IEEE}
}

9/ [SODA'20] Ghaffari, M., Nowicki, K., & Thorup, M. (2020). Faster algorithms for edge connectivity via random 2-out contractions. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 1260-1279). Society for Industrial and Applied Mathematics.

@inproceedings{ghaffari2020faster,
 title={Faster algorithms for edge connectivity via random 2-out contractions},
 author={Ghaffari, Mohsen and Nowicki, Krzysztof and Thorup, Mikkel},
 booktitle={Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms},
 pages={1260--1279},
 year={2020},
 organization={SIAM}
}

Reading list & mathematics resources.

Sat, 19 Aug 2023 00:00:00 +0000

Foundations of Mathematics #

Number Theory #

Algebra #

1/ Lay, D. C., Lay, S. R., & McDonald, J. (2016). Linear algebra and its applications. Pearson Education.

2/ Strang, G. (2019). Linear algebra and learning from data (Vol. 4). Cambridge: Wellesley-Cambridge Press.

Combinatorics - Graph Theory #

Graph theory books #

[1] Lewis, R. (2015). A guide to graph colouring (Vol. 7). Berlin: Springer.

[2] Tucker, A. (1994). Applied combinatorics. John Wiley & Sons, Inc.

[3] Li, Y., & Lin, Q. (2022). Elementary Methods of Graph Ramsey Theory (Vol. 211). Springer Nature.

[4] Conlon, D. Extremal graph theory.

[5] Trudeau, R. J. (1994). Introduction to graph theory. Dover Pubns.

[6] Diestel, R. (2017). Graph Theory. Graduate Texts in Mathematics, vol. 173. Springer.

[7] Bondy, J. A., & Murty, U. S. R. (1976). Graph theory with applications (Vol. 290). London: Macmillan.

[8] Bollobás, B. (1998). Modern graph theory (Vol. 184). Springer Science & Business Media.

[9] Needham, M., & Hodler, A. E. (2019). Graph algorithms: practical examples in Apache Spark and Neo4j. O’Reilly Media.

[10] Guia, J., Soares, V. G., & Bernardino, J. (2017, April). Graph Databases: Neo4j Analysis. In ICEIS (1) (pp. 351-356).

[11] Harary, F. (1999). Graph Theory. Perseus Books.

[12] Bóna, M. (2016). A Walk Through Combinatorics: An Introduction to Enumeration and Graph Theory. World Scientific.

[13] Wilson, R. J. (1996). Introduction to Graph Theory (4th ed.). Addison Wesley.

[14] Gross, J. L., Yellen, J., & Anderson, M. (2018). Graph Theory and Its Applications (3rd ed.). Textbooks in Mathematics.

[15] West, D. B. Introduction to Graph Theory.

Geometry & Topology #

Analysis #

Probability & Statistics #

1/ Gould, R., & Ryan, C. N. (2015). Introductory statistics: Exploring the world through data. Pearson.

2/ Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis.

3/ Härdle, W. K., & Simar, L. (2019). Applied multivariate statistical analysis. Springer Nature.

Numerical Analysis #

Signal processing #

Applied mathematics #

Machine learning #

1/ Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for machine learning. Cambridge University Press.

2/ Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.

3/ Koller, D., & Friedman, N. (2009). Probabilistic graphical models: principles and techniques. MIT press.

4/ Barber, D. (2012). Bayesian reasoning and machine learning. Cambridge University Press.

5/ Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction (Vol. 2, pp. 1-758). New York: Springer.

6/ Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2018). Foundations of machine learning. MIT press.

7/ Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. Cambridge, MA: MIT Press.

8/ Vapnik, V. (1999). The nature of statistical learning theory. Springer science & business media.

9/ Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge university press.

10/ Wainwright, M. J., & Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1–2), 1-305.

Optimization #

1/ Kochenderfer, M. J., & Wheeler, T. A. (2019). Algorithms for optimization. MIT Press.

2/ Kochenderfer, M. J., Wheeler, T. A., & Wray, K. H. (2022). Algorithms for decision making. MIT press.

3/ Boyd, S. P., & Vandenberghe, L. (2004). Convex optimization. Cambridge university press.

4/ Bertsekas, D. (2009). Convex optimization theory (Vol. 1). Athena Scientific.

5/ Papadimitriou, C. H., & Steiglitz, K. (1998). Combinatorial optimization: algorithms and complexity. Courier Corporation.

6/ Cook, W. J., Cunningham, W. H., Pulleyblank, W. R., & Schrijver, A. (2009). Combinatorial optimization. Oberwolfach Reports, 5(4), 2875-2942.

Reading list on Graph Learning - Explainable artificial intelligence (xAI).

Sat, 19 Aug 2023 00:00:00 +0000

XAI-Graph #

2023 #

[1] Azzolin, S., Longa, A., Barbiero, P., Liò, P., & Passerini, A. (2022). Global explainability of gnns via logic combination of learned concepts. arXiv preprint arXiv:2210.07147.

[2] Miao, S., Luo, Y., Liu, M., & Li, P. (2022). Interpretable Geometric Deep Learning via Learnable Randomness Injection. arXiv preprint arXiv:2210.16966.

[3] Liu, Y., Zhang, X., & Xie, S. (2023, February). A Differential Geometric View and Explainability of GNN on Evolving Graphs. In The Eleventh International Conference on Learning Representations.

[4] Wang, X., & Shen, H. W. (2022). GNNInterpreter: A Probabilistic Generative Model-Level Explanation for Graph Neural Networks. arXiv preprint arXiv:2209.07924.

[5] Xia, W., Lai, M., Shan, C., Zhang, Y., Dai, X., Li, X., & Li, D. (2023, February). Explaining Temporal Graph Models through an Explorer-Navigator Framework. In The Eleventh International Conference on Learning Representations.

2022 #

[1] Zhang, S., Liu, Y., Shah, N., & Sun, Y. (2022, January). GStarX: Explaining Graph Neural Networks with Structure-Aware Cooperative Games. In Advances in Neural Information Processing Systems.

[2] Xie, Y., Katariya, S., Tang, X., Huang, E., Rao, N., Subbian, K., & Ji, S. (2022). Task-agnostic graph explanations. arXiv preprint arXiv:2202.08335.

[3] Peng, X., Riedl, M., & Ammanabrolu, P. (2022). Inherently explainable reinforcement learning in natural language. Advances in Neural Information Processing Systems, 35, 16178-16190.

[4] Ma, J., Guo, R., Mishra, S., Zhang, A., & Li, J. (2022). CLEAR: Generative Counterfactual Explanations on Graphs. arXiv preprint arXiv:2210.08443.

[5] Xiong, P., Schnake, T., Montavon, G., Müller, K. R., & Nakajima, S. (2022, June). Efficient Computation of Higher-Order Subgraph Attribution via Message Passing. In International Conference on Machine Learning (pp. 24478-24495). PMLR.

[6] Miao, S., Liu, M., & Li, P. (2022, June). Interpretable and generalizable graph learning via stochastic attention mechanism. In International Conference on Machine Learning (pp. 15524-15543). PMLR.

[7] Wu, Y. X., Wang, X., Zhang, A., He, X., & Chua, T. S. (2022). Discovering invariant rationales for graph neural networks. arXiv preprint arXiv:2201.12872.

[8] Feng, Q., Liu, N., Yang, F., Tang, R., Du, M., & Hu, X. (2023). Degree: Decomposition based explanation for graph neural networks. arXiv preprint arXiv:2305.12895.

[9] Tena Cucala, D. J., Cuenca Grau, B., Kostylev, E. V., & Motik, B. (2022). Explainable GNN-based models over knowledge graphs.

[10] Dong, Y., Wang, S., Wang, Y., Derr, T., & Li, J. (2022, August). On structural explanation of bias in graph neural networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 316-326).

[11] Liu, G., Zhao, T., Xu, J., Luo, T., & Jiang, M. (2022, August). Graph rationalization with environment-based augmentations. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 1069-1078).

[12] Wang, P., Cai, R., & Wang, H. (2022, April). Graph-based Extractive Explainer for Recommendations. In Proceedings of the ACM Web Conference 2022 (pp. 2163-2171).

[13] Tan, J., Geng, S., Fu, Z., Ge, Y., Xu, S., Li, Y., & Zhang, Y. (2022, April). Learning and evaluating graph neural network explanations based on counterfactual and factual reasoning. In Proceedings of the ACM Web Conference 2022 (pp. 1018-1027).

[14] Islam, S. M., & Bhattacharya, S. (2022, April). AR-BERT: Aspect-relation enhanced Aspect-level Sentiment Classification with Multi-modal Explanations. In Proceedings of the ACM Web Conference 2022 (pp. 987-998).

[15] Zhang, Z., Liu, Q., Wang, H., Lu, C., & Lee, C. (2022, June). Protgnn: Towards self-explaining graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 8, pp. 9127-9135).

[16] Feng, A., You, C., Wang, S., & Tassiulas, L. (2022, June). Kergnns: Interpretable graph neural networks with graph kernels. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 6, pp. 6614-6622).

[17] Aglionby, G., & Teufel, S. (2022, December). Faithful Knowledge Graph Explanations in Commonsense Question Answering. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (pp. 10811-10817).

[18] Li, X., Zhang, X., JiaHao, P., Mao, R., Zhou, M., Xie, X., & Liao, H. (2022, December). A Joint Learning Framework for Restaurant Survival Prediction and Explanation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (pp. 3285-3297).

2021 #

[1] Shan, C., Shen, Y., Zhang, Y., Li, X., & Li, D. (2021). Reinforcement learning enhanced explainer for graph neural networks. Advances in Neural Information Processing Systems, 34, 22523-22533.

[2] Wang, X., Wu, Y., Zhang, A., He, X., & Chua, T. S. (2021). Towards multi-grained explainability for graph neural networks. Advances in Neural Information Processing Systems, 34, 18446-18458.

[3] Bajaj, M., Chu, L., Xue, Z. Y., Pei, J., Wang, L., Lam, P. C. H., & Zhang, Y. (2021). Robust counterfactual explanations on graph neural networks. Advances in Neural Information Processing Systems, 34, 5644-5655.

[4] Yuan, H., Yu, H., Wang, J., Li, K., & Ji, S. (2021, July). On explainability of graph neural networks via subgraph explorations. In International Conference on Machine Learning (pp. 12241-12252). PMLR.

[5] Lin, W., Lan, H., & Li, B. (2021, July). Generative causal explanations for graph neural networks. In International Conference on Machine Learning (pp. 6666-6679). PMLR.

[6] Henderson, R., Clevert, D. A., & Montanari, F. (2021, July). Improving molecular graph neural network explainability with orthonormalization and induced sparsity. In International Conference on Machine Learning (pp. 4203-4213). PMLR.

[7] Wang, X., Fan, S., Kuang, K., & Zhu, W. (2021, July). Explainable automated graph representation learning with hyperparameter importance. In International Conference on Machine Learning (pp. 10727-10737). PMLR.

[8] Faber, L., K. Moghaddam, A., & Wattenhofer, R. (2021, August). When comparing to ground truth is wrong: On evaluating gnn explanation methods. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 332-341).

[9] Abrate, C., & Bonchi, F. (2021, August). Counterfactual graphs for explainable classification of brain networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 2495-2504).

[10] Liu, Y., Chen, C., Liu, Y., Zhang, X., & Xie, S. (2021, December). Multi-objective Explanations of GNN Predictions. In 2021 IEEE International Conference on Data Mining (ICDM) (pp. 409-418). IEEE.

[11] Gao, Y., Sun, T., Bhatt, R., Yu, D., Hong, S., & Zhao, L. (2021, December). Gnes: Learning to explain graph neural networks. In 2021 IEEE International Conference on Data Mining (ICDM) (pp. 131-140). IEEE.

[12] Fan, Y., Yao, Y., & Joe-Wong, C. (2021, December). Gcn-se: Attention as explainability for node classification in dynamic graphs. In 2021 IEEE International Conference on Data Mining (ICDM) (pp. 1060-1065). IEEE.

2020 #

[1] Vu, M., & Thai, M. T. (2020). Pgm-explainer: Probabilistic graphical model explanations for graph neural networks. Advances in neural information processing systems, 33, 12225-12235.

[2] Luo, D., Cheng, W., Xu, D., Yu, W., Zong, B., Chen, H., & Zhang, X. (2020). Parameterized explainer for graph neural network. Advances in neural information processing systems, 33, 19620-19631.

[3] Sanchez-Lengeling, B., Wei, J., Lee, B., Reif, E., Wang, P., Qian, W., … & Wiltschko, A. (2020). Evaluating attribution for graph neural networks. Advances in neural information processing systems, 33, 5898-5910.

Research & Teaching

Sat, 19 Aug 2023 00:00:00 +0000

Research in Mathematics and Computational #

(Vietnamese Translation) Cambridge Notes. You can access it using this link.
(Vietnamese Translation) Daniel Raban’s Note Repository. You can access it using this link.

Research in Computer Science and Machine Learning #

Teaching Assistant @ FIT-HCMUS #

Applied in Data Science
Data Hiding and Secret Sharing
Data Structures and Algorithms
Data Mining and Applications
Data Visualization
Fundamentals of Artificial Intelligence
Fundamentals of Programming
Introduction to Programming
Introduction to Data Science
Introduction to Machine Learning
Introduction to Big Data
Introduction to Information Technology
Graph Mining
Parallel Programming
Programming for Data Science
Swarm Intelligence

Miscellanea #

$\LaTeX$ Resources
Blogs and Advice
Mathematical Journals

JSTOR (lots of back issues of journals)
Electronic Library of Mathematics (lots of free online journals, proceedings, etc.)
Math Journal Archive
Elsevier Science, ScienceDirect, SpringerOnline, SpringerLink, Kluwer Online Journals, Birkhauser, Cambridge University Press, AMS Journals, SIAM Journals, INFORMS Journals, ACM Journals, Project Euclid, Wiley Interscience, World Scientific, Marcel Dekker, Taylor & Francis, Palgrave Macmillan

Mathematical books: Academic Press, A K Peters, AMS, Birkhauser, Cambridge, CRC Press, Dover, INFORMS, International Press, Kluwer, Oxford, Prentice-Hall, SIAM, Springer, Wiley, World Scientific.

Timeless Quotes

Sat, 19 Aug 2023 00:00:00 +0000

There are some things which cannot be learned quickly, and time, which is all we have, must be paid heavily for their acquiring. They are the very simplest things, and because it takes a man’s life to know them the little new that each man gets from life is very costly and the only heritage he has to leave.

– Ernest Hemingway (From A. E. Hotchner, Papa Hemingway, Random House, NY, 1966)

We are punished by our sins, not for them.

– Elbert Hubbard

the lyf so short, the craft so long to lerne

– Chaucer (1340-1400)

Ars longa, vita brevis, occasio praeceps, experimentum periculosum, iudicium difficile (Life is short, [the] craft long, opportunity fleeting, experiment treacherous, judgment difficult.)

– Hippocrates (c. 400BC)

‘the cat sat on the mat’ is not the beginning of a story, but ‘the cat sat on the dog’s mat’ is.

– John le Carré (David John Moore Cornwell)

Excellence in any department can be attained only by the labor of a lifetime; it is not to be purchased at a lesser price.

– Samuel Johnson

Only one who devotes himself to a cause with his whole strength and soul can be a true master. For this reason mastery demands all of a person.

– Albert Einstein

Books are attracted to me. They make a beeline for me, and stick to me. I have been so fond of them that at last they have begun to reciprocate. In my hands books burst like ripe fruit. Like magic flowers they unfold their petals to show me the vital thought, the suggestive word, the confirming quotation, the decisive illustration.

– Sergei Eisenstein

If we concentrate our attention on trying to solve a problem of geometry, and if at the end of an hour we are no nearer to doing so than at the beginning, we have nevertheless been making progress each minute of that hour in another more mysterious dimension. Without knowing or feeling it, this apparent barren effort has brought more light into the soul.

– Simone Weil

We see things not as they are, but as we are.

– The Talmud

The scientist does not study nature because it is useful; he studies it because he delights in it, and he delights in it because it is beautiful. If nature were not beautiful, it would not be worth knowing, and if nature were not worth knowing, life would not be worth living.

– Henri Poincaré

A noble man compares and estimates himself by an idea which is higher than himself; and a mean man, by one lower than himself. The one produces aspiration; the other ambition, which is the way in which a vulgar man aspires.

– Joseph Conrad

Believe that none of the effort you put into coming closer to God is ever wasted – even if in the end you don’t achieve what you are striving for.

– Rebbe Nachman of Breslov

When you look at a human being, you see his hands working, his feet walking, his mouth talking. You don’t see his heart, his brain, his lungs and kidneys. They work quietly, inside. But they are the essential organs of life. The world, too, has hands and feet—those who are making the news, moving things around, shaking things up. The heart, the inner organs, they are those who work quietly from the inside, those unnoticed, those who do a simple act of kindness with no thought of reward.

– Rabbi M. M. Schneerson

Too many people spend money they haven’t earned to buy things they don’t want to impress people they don’t like.

– Will Rogers

When you thwart what’s real about you in order to keep creating content for financial need, you’re just not gonna make it. You’re not gonna keep going. You have your number. It’s very dangerous to be liked by more people than should like you. It’s bad for them, and it’s bad for you. There’s gonna be a shock down the road for them, or you’re gonna dilute yourself and take yourself to a place where you can’t live with who you are. I think that you make an honest account of who you are and you live with the results. The results will be appropriate to who you are… If you’re saying things just to piss people off, then I don’t know why do it. If you’re saying things just to please people, that’s a short-lived victory. But if you just say the things you believe, and the things you like to say, and that mean something to you — if you stay close to the gut — then everything will work itself out.

– Louis C.K.

To exist is to change, to change is to mature, to mature is to go on creating oneself endlessly.

– Henri Bergson

What we have done for ourselves alone dies with us; what we have done for others and the world remains and is immortal.

– Albert Pike

Perhaps all the dragons of our lives are princesses who are only waiting to see us once beautiful and brave.

– Rainer Maria Rilke

You must stay drunk on writing so reality cannot destroy you.

– Ray Bradbury

The ultimate test of a man’s conscience may be his willingness to sacrifice something today for future generations whose words of thanks will not be heard.

– Gaylord Nelson

Finish each day and be done with it. You have done what you could; some blunders and absurdities have crept in; forget them as soon as you can. Tomorrow is a new day; you shall begin it serenely and with too high a spirit to be encumbered with your old nonsense.

– Ralph Waldo Emerson

Marriage is an alliance entered into by a man who can’t sleep with the window shut and a woman who can’t sleep with the window open.

– George Bernard Shaw

If you think education is expensive, try ignorance.

– Derek Bok

People talk about “wasting time,” or even “killing time.” Neither term is accurate. Time does not belong to you that you can waste it. Yet neither does it have a life of its own that you can take away. Rather, time awaits you to give it life.

– Rabbi M. M. Schneerson

Most folks are about as happy as they make up their minds to be.

– Abraham Lincoln

One who loves must learn fear. One who fears must learn love. The thinker must do. The doer must think. The pacifist must fight, the fighter must find peace. If you flow as a river, burn as a fire. If you burn as a furnace, flow as a river. If you fly as a bird, sit firm as a rock. If you sit firmly, then fly as a bird. Be a fire that flows. A rock that flies. Love with fear and fear with love. For we are not fire, not water, not air, not rocks, not thoughts, not deeds, not fear, not love. We are G-dly beings.

– Rabbi M. M. Schneerson

When you come to the end of all the light you know, and it’s time to step into the darkness of the unknown, faith is knowing that one of two things shall happen: Either you will be given something solid to stand on or you will be taught to fly.

– Edward Teller

Whatever you can do, or dream you can do, begin it. Boldness has genius and power and magic in it.

– Johann Goethe (John Anster’s translation of Faust)

It is impossible to enjoy idling thoroughly unless one has plenty of work to do.

– Jerome K. Jerome

Every society honors its live conformists and its dead troublemakers.

– Mignon McLaughlin

You can easily judge the character of a man by how he treats those who can do nothing for him.

– James D. Miles

In our thinking…we attribute to this concept of the bodily object a significance, which is to a high degree independent of the sense impression which originally gives rise to it. This is what we mean when we attribute to the bodily object a real existence. …By means of such concepts and mental relations between them, we are able to orient ourselves in the labyrinth of sense impressions. These notions and relations…appear to us as stronger and more unalterable than the individual sense experience itself, the character of which as anything other than the result of an illusion or hallucination is never completely guaranteed.

– Albert Einstein

Praise and blame, gain and loss, pleasure and sorrow come and go like the wind. To be happy, rest like a giant tree in the midst of them all.

– Buddha

I am always doing things I can’t do, that’s how I get to do them.

– Pablo Picasso

This above all: to thine own self be true. And it must follow, as the night the day, Thou canst not then be false to any man.

– William Shakespeare

If the world is cold make it your business to build fires.

– Horace Traubel

Nearly all men can stand adversity, but if you want to test a man’s character, give him power.

– Abraham Lincoln

Your work is to discover your work and then, with all your heart, to give yourself to it.

– Buddha

Strive to realize a state of inward happiness, independent of circumstances.

– J.P. Greaves

When one door of happiness closes, another opens; but often we look so long at the closed door that we do not see the one which has opened for us.

– Helen Keller

I keep six honest serving men (They taught me all I know) Their names are What and Why and When And How and Where and Who

– Rudyard Kipling, in Just So Stories

Whatsoever is, is in God, and without God nothing can be, or be conceived.

– Baruch Spinoza

We must not forget that when radium was discovered no one knew that it would prove useful in hospitals. The work was one of pure science. And this is a proof that scientific work must not be considered from the point of view of the direct usefulness of it. It must be done for itself, for the beauty of science, and then there is always the chance that a scientific discovery may become like the radium a benefit for humanity.

– Marie Curie

I believe that a scientist looking at nonscientific problems is just as dumb as the next guy.

– Richard Feynman

To be what we are, and to become what we are capable of becoming, is the only end in life.

– Baruch Spinoza

The highest activity a human being can attain is learning for understanding, because to understand is to be free.

– Baruch Spinoza

I call him free who is led solely by reason.

– Baruch Spinoza

God is the indwelling and not the transient cause of all things.

– Baruch Spinoza

He who finds a thought that enables him to obtain a slightly deeper glimpse into the eternal secrets of nature has been given great grace.

– Albert Einstein

Watch your thoughts; they become words. Watch your words; they become actions. Watch your actions, they become habits. Watch your habits, they become character. Watch your character; it becomes your destiny.

– Frank Outlaw

Creativity is God’s gift to you. What you do with it is your gift to God.

– Bob Moawad

In the long run men hit only what they aim at. Therefore, though they should fail immediately, they had better aim at something high.

– Henry David Thoreau

We act as though comfort and luxury were the chief requirements of life, when all that we need to make us really happy is something to be enthusiastic about.

– Charles Kingsley

I have always believed that whatever good or bad fortune may come our way we can always give it meaning and transform it into something of value.

– Hermann Hesse

It is even harder for the average ape to believe that he has descended from man.

– H.L. Mencken

Truth, like gold, is to be obtained not by its growth, but by washing away from it all that is not gold.

– Leo Tolstoy

If you do not change direction, you may end up where you are heading.

– Lao Tzu

The best thing for being sad is to learn something. That is the only thing that never fails. You may grow old and trembling in your anatomies, you may lie awake at night listening to the disorder of your veins, you may miss your only love, you may see the world about you devastated by evil lunatics, or know your honor trampled in the sewers of baser minds. There is only one thing for it then: to learn. Learn why the world wags and what wags it. That is the only thing which the mind can never exhaust, never alienate, never be tortured by, never fear or distrust, and never dream of regretting.

– T. H. White, in The Once and Future King

What we hope ever to do with ease we may learn first to do with diligence.

– Samuel Johnson

The way is long if one follows precepts, but short… if one follows patterns.

– Lucius Annaeus Seneca

Find out just what any people will quietly submit to and you have found out the exact measure of injustice and wrong which will be imposed upon them.

– Frederick Douglass

Somewhere, something incredible is waiting to be known.

– Carl Sagan

Principles for the Development of a Complete Mind: Study the science of art. Study the art of science. Develop your senses – especially learn how to see. Realise that everything connects to everything else.

– Leonardo da Vinci

I have come here to chew bubblegum and kick ass … and I’m all out of bubblegum.

– Nada, in They Live (1988) by John Carpenter

Where the mind is without fear and the head is held high Where knowledge is free Where the world has not been broken up into fragments By narrow domestic walls Where the words come out From the depth of truth Where the tireless striving stretches its arms towards perfection Where the clear stream of reason has not lost its way into the dreary desert sand of dead habit Where the mind is led forward by thee In ever widening thought and action Into that heaven of freedom, my father Let my country awake.

– Rabindranath Tagore (from Gitanjali)

By all means marry; if you get a good wife, you’ll become happy; if you get a bad one, you’ll become a philosopher.

– Socrates

The belief in an external world independent of the perceiving subject is the basis of all natural science. Since, however, sense perception only gives information of this external world or of “physical reality” indirectly, we can only grasp the latter by speculative means. It follows from this that our notions of physical reality can never be final. We must always be ready to change these notions – that is to say, the axiomatic basis of physics – in order to do justice to perceived facts in the most perfect way logically.

– Albert Einstein

I love you when you bow in your mosque, kneel in your temple, pray in your church. For you and I are sons of one religion, and it is the spirit.

– Kahlil Gibran

To find yourself, think for yourself.

– Socrates

The earth is but one country, and mankind its citizens.

– Baha’u’llah

There is only one good, knowledge, and one evil, ignorance.

– Socrates

For every complicated problem there is a solution that is simple, direct, understandable, and wrong.

– H. L. Mencken

If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is.

– John Louis von Neumann

The only true wisdom is in knowing you know nothing.

– Socrates

It’s only recently that I’ve come to understand that writers are not marginal to our society, that they, in fact, do all our thinking for us, that we are writing myths and our myths are believed, and that old myths are believed until someone writes a new one.

– Kurt Vonnegut

All ads do the same: create an anxiety relievable by purchase.

– David Foster Wallace

Beginnings are hard. For good reason. If they were easy, we would prowl into each new venture like a snug fat cat. When you begin pent up in an iron cage, a new life emerges. A tiger that breaks through the door of its cage and pounces with a vengeance. Bless those cages, those impossible brick walls, those rivers of fire that lie at the outset of each worthwhile journey. Without them we would be only as powerful as we appear.

– Rabbi M. M. Schneerson

I really think the mark of experience isn’t the ability to write a lot of good pages, it’s the ability to generate shitty pages faster without worrying so much about it.

– Justin Marks

The more subtle and elegant you are in hiding your plot points, the better you are as a writer.

– Billy Wilder

Inspiration does exist, but it must find you working.

– Pablo Picasso

Every character should want something, even if it is only a glass of water.

– Kurt Vonnegut

There is no abstract art. You must always start with something. Afterward you can remove all traces of reality.

– Pablo Picasso

To the complaint, ‘There are no people in these photographs,’ I respond, There are always two people: the photographer and the viewer.

– Ansel Adams

The more abstract is form, the more clear and direct its appeal.

– Wassily Kandinsky

The artist must have something to say, for mastery over form is not his goal but rather the adapting of form to its inner meaning.

– Wassily Kandinsky

Treat a man as he appears to be, and you make him worse. But treat a man as if he were what he potentially could be, and you make him what he should be.

– Johann Wolfgang von Goethe

Pure mathematics is, in its way, the poetry of logical ideas. One seeks the most general ideas of operation which will bring together in simple, logical and unified form the largest possible circle of formal relationships. In this effort toward logical beauty spiritual formulas are discovered necessary for the deeper penetration into the laws of nature.

– Albert Einstein

Pursue some path, however narrow and crooked, in which you can walk with love and reverence.

– Henry David Thoreau

If any man wish to write in a clear style, let him be first clear in his thoughts; and if any would write in a noble style, let him first possess a noble soul.

– Johann Wolfgang von Goethe

Only the curious will learn, only the resolute overcome the obstacles to learning. The Quest quotient has always excited me more than the intelligence quotient.

– Eugene S. Wilson

Human beings can attain a worthy and harmonious life only if they are able to rid themselves, within the limits of human nature, of striving to fulfill wishes of the material kind.

– Albert Einstein

You can’t wait for inspiration. You have to go after it with a club.

– Jack London

If you don’t have time to read, you don’t have the time – or the tools – to write.

– Stephen King

A professor must have a theory as a dog must have fleas.

– H. L. Mencken

We shall not cease from exploration, and the end of all our exploring will be to arrive where we started, and know the place for the first time.

– T. S. Eliot

Humankind has not woven the web of life. We are but one thread within it. Whatever we do to the web we do to ourselves. All things are bound together. All things are connected.

– Chief Seattle

We do not inherit the earth from our ancestors, we borrow it from our children.

– Native American Proverb

We cannot command Nature except by obeying her.

– Francis Bacon

Each player must accept the cards life deals him or her: but once they are in hand, he or she alone must decide how to play the cards in order to win the game.

– Voltaire

We are what we think. All that we are arises with our thoughts. With our thoughts, we make the world.

– Buddha

Every intellectual has a very special responsibility. He has the privilege and opportunity of studying. In return, he owes it to his fellow men (or ‘to society’) to represent the results of his study as simply, clearly and modestly as he can. The worst thing that intellectuals can do – the cardinal sin – is to try to set themselves up as great prophets vis-a-vis their fellow men and to impress them with puzzling philosophies. Anyone who cannot speak simply and clearly should say nothing and continue to work until he can do so.

– Karl Popper

A man who stands for nothing will fall for anything.

– Malcolm X

You need not leave your room. Remain sitting at your table and listen. You need not even listen, simply wait, just learn to become quiet, and still, and solitary. The world will freely offer itself to you to be unmasked. It has no choice; it will roll in ecstasy at your feet.

– Franz Kafka

Creativity is essentially a lonely art. An even lonelier struggle. To some a blessing. To others a curse. It is in reality the ability to reach inside yourself and drag forth from your very soul an idea.

– Lou Dorfsman

Life is not easy for any of us. But what of that? We must have perseverance and above all confidence in ourselves. We must believe that we are gifted for something, and that this thing, at whatever cost, must be attained.

– Marie Curie

Nothing in this world can take the place of persistence. Talent will not; nothing is more common than unsuccessful people with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination alone are omnipotent. The slogan “press on” has solved and always will solve the problems of the human race.

– Calvin Coolidge

Education is the passport to the future, for tomorrow belongs to those who prepare for it today.

– Malcolm X

Jump off the cliff and build your wings on the way down.

– Ray Bradbury

However great a man’s natural talent may be, the act of writing cannot be learned all at once.

– Jean Jacques Rousseau

Talent is cheaper than table salt. What separates the talented individual from the successful one is a lot of hard work.

– Stephen King

My ambition is to find freedom, without taking it from someone else.

– George Dyson

writing = ass + chair

– Oliver Stone

Truth is a demure lady, much too ladylike to knock you on your head and drag you to her cave. She is there, but people must want her, and seek her out.

– William F. Buckley

Death is a dignitary who when he comes announced is to be received with formal manifestations of respect, even by those most familiar with him. In the code of military etiquette silence and fixity are forms of deference.

– Ambrose Bierce (from An Occurrence at Owl Creek Bridge, 1890)

Don’t market yourself. Editors and readers don’t know what they want until they see it. Scratch what itches. Write what you need to write, feed the hunger for meaning in your life. Play at the serious questions of life and death.

– Donald M. Murray

Never, under any circumstances, hate a movie. It won’t help you and it’s a waste of time. There’s plenty of reasons not to like a movie. But if you hate them? Meaning if you let them bother you? Then they’ll do nothing but bother you. And I mean if you want to do this for a fucking living and you’re absolutely serious, then never hate a movie. You can learn so much about the craft from bad movies. Bad movies teach you what not to do and what to correct in your process and that’s way more helpful. And fuck man, hating movies closes you off to stuff that seems like whatever you hate. Or stuff by the same guy. And who knows? That other stuff could be awesome. Some of my favorite filmmakers made bad movies. It won’t help you. It just won’t. It stops your development right in its tracks, okay? I mean like everything and I ain’t trying to get you to be like me or anything. I’m just saying I think it’s better for you. And it makes me way, way happier. Never hate a movie. They’re gifts. Every fucking one of em.

– Quentin Tarantino

I think everybody should get rich and famous and do everything they ever dreamed of so they can see that it’s not the answer.

– Jim Carrey

Due to circumstances beyond my control, I am the master of my fate and captain of my soul.

– Ashleigh Brilliant (variant from a line in the poem “Invictus” by William Ernest Henley, written in 1875)

Entertain yourself. Luck comes just as often (and just as rarely) to every writer. Don’t be the writer that got lucky doing something they hate.

– Dan Harmon

If there’s a book you really want to read but it hasn’t been written yet, then you must write it.

– Toni Morrison

I’d just say to aspiring journalists or writers – who I meet a lot of – do it now. Don’t wait for permission to make something that’s interesting or amusing to you. Just do it now. Don’t wait. Find a story idea, start making it, give yourself a deadline, show it to people who’ll give you notes to make it better. Don’t wait till you’re older, or in some better job than you have now. Don’t wait for anything. Don’t wait till some magical story idea drops into your lap. That’s not where ideas come from. Go looking for an idea and it’ll show up. Begin now. Be a fucking soldier about it and be tough.

– Ira Glass

Success consists of going from failure to failure without loss of enthusiasm.

– Winston Churchill

All the gods, all the heavens, all the hells, are within you.

– Joseph Campbell

We see things not as they are, but as we are.

– The Talmud

The right time to show your good character is when you are pestered by somebody weaker than you.

– Buddha

Mathematics

Sat, 08 Apr 2023 00:00:00 +0000

Study Mathematics #

Master of Science in Mathematics @ HCMUS

Branches of Mathematics #

1. Foundations of Mathematics #

Transition To Pure Rigour Math
Set Theory
Logic
Category Theory
Type Theory
Homotopy Type Theory
Surreal Numbers

2. Number Theory #

Algebraic Number Theory
Analytic Number Theory

3. Algebra #

4. Combinatorics #

Probabilistic Methods in Combinatorics
Algebraic Combinatorics
Graph Theory

5. Geometry and Topology #

Differential Geometry
Algebraic Geometry
Algebraic Statistics
Topology
Algebraic Topology

6. Mathematical Analysis #

Real Analysis
Harmonic Analysis
Complex Analysis
Functional Analysis
Measure Theory
ODE
PDE
Variational Analysis
Calculus of Variations
Calculus (Single/Multi-variable)
Optimization & Operations Research
Dynamical Systems
Set-valued Analysis

7. Probability and Statistics #

Probability Theory
Statistics
Statistical Learning
Stochastic Processes

8. Numerical Analysis #

Numerical Methods for PDEs
Numerical Methods for ODEs
Computational Linear Algebra

9. Signal Processing #

10. Mathematics for Computer Science #

11. Mathematical Physics #

Parallel Programming

Sat, 08 Apr 2023 00:00:00 +0000

CUDA C++ Programming Guide (Link)

Table of instruction throughputs (Link)

PTX Reference Manual (Link)

Inline PTX syntax guide (Link)
Tensor core instruction data layouts (Link)

SASS Instruction List (Link)
Compiler Explorer by Matt Godbolt (Link)
GPU Mode (Link)
Modal GPU Glossary (Link)

Theoretical Computer Science

Sat, 08 Apr 2023 00:00:00 +0000

Study Computer Science #

Master of Science in Computer Science @ HCMUS

Branches of Theoretical Computer Science #

1. Theory of Computation #

Computational Complexity

Communication Complexity
Circuit Complexity
Quantum Complexity
Proof Complexity

Computability Theory

2. Logic #

3. Programming Language Theory #

Basics of Programming Language Theory
Formal Verification
Type Theory
Functional Programming

4. Algorithms #

General Algorithms
Lower Bounds
Randomization & Probability for Algorithms
Approximation Algorithms
Parameterized Algorithms
Learning-augmented Algorithms

5. Information/Coding Theory #

6. Cryptography #

7. Machine Learning Theory #

8. Game Theory #

Cambridge Notes (Vietnamese)

Wed, 11 Jan 2023 00:00:00 +0000

Cambridge Lecture Notes #

All notes are translated from the Cambridge Notes edited by Dexter Chua. The Vietnamese translations are intended for study purposes. Please do not use them for commercial purposes.

Part IA #

Michaelmas Term

Differential Equations: HTML, PDF, PDF (Trim), PDF (defs), PDF (thm), PDF (thm+proof), Official Notes, PDF (Vi)
Groups
Numbers and Sets
Vectors and Matrices

Lent Term

Analysis I
Dynamics and Relativity
Probability
Vector Calculus

Part IB #

Michaelmas Term

Analysis II
Linear Algebra
Markov Chains
Methods
Quantum Mechanics

Lent Term

Complex Analysis
Complex Methods
Electromagnetism
Fluid Dynamics
Geometry
Groups, Rings and Modules
Numerical Analysis
Statistics

Easter Term

Metric and Topological Spaces
Optimisation
Variational Principles

Part II #

Michaelmas Term

Algebraic Topology
Galois Theory
Integrable Systems
Linear Analysis
Probability and Measure

Lent Term

Logic and Set Theory
Number Fields
Representation Theory
Statistical Physics

Part III #

Michaelmas Term

Advanced Probability
Algebraic Topology
Analysis of Partial Differential Equations
Combinatorics
Differential Geometry
Extremal Graph Theory
Hydrodynamic Stability
Local Fields
Modern Statistical Methods
Percolation and Random Walks on Graphs
Quantum Computation
Quantum Field Theory
Symmetries, Fields and Particles

Lent Term

Advanced Quantum Field Theory
Algebras
Logic
Modular Forms and L-functions
Positivity in Algebraic Geometry
Ramsey Theory
Riemannian Geometry
Schramm–Loewner Evolutions
Stochastic Calculus and Applications
Symplectic Geometry
The Standard Model
Theoretical Physics of Soft Condensed Matter

Easter Term

Classical and Quantum Solitons

Part IV #

Michaelmas Term

Topics in Geometric Group Theory

Lent Term

Topics in Number Theory

Easter Term

Bounded Cohomology

Daniel Raban's Note Repository Notes (Vietnamese)

Wed, 11 Jan 2023 00:00:00 +0000

Daniel Raban’s Note Repository #

[UCLA] Math 206A: Combinatorial Discrete Geometry (Igor Pak, F18): PDF, PDF (Vi)

[UCLA] Math 206B: Algebraic Combinatorics (Igor Pak, W19): [PDF], PDF (Vi)

[UCLA] Math 210A: Algebra (Romyar Sharifi, F18): PDF, PDF (Vi)

[UCLA] Math 210B: Algebra (Romyar Sharifi, W19): PDF, PDF (Vi)

[UCLA] Math 210C: Algebra (Romyar Sharifi, Sp19): [PDF], PDF (Vi)

[UCLA] Math 245B: Real Analysis (Tim Austin, W19): PDF, PDF (Vi)

[UCLA] Math 245C: Real Analysis (Wilfrid Gangbo, Sp19): PDF, PDF (Vi)

[UCLA] Math 246A: Complex Analysis (John Garnett, F18): [PDF], PDF (Vi)

[UCLA] Math 246B: Complex Analysis (Michael Hitrik, W19): PDF, PDF (Vi)

[UCLA] Math 246C: Complex Analysis (Michael Hitrik, Sp19): PDF, PDF (Vi)

[UCLA] Math 247A: Classical Fourier Analysis (Monica Visan, W20): PDF, PDF (Vi)

[UCLA] Math 254A: Topics in Entropy and Statistical Mechanics (Tim Austin, Sp21): PDF, PDF (Vi)

[UCLA] Math 254B: Ergodic Theory and Fractals (Tim Austin, Sp19): PDF, PDF (Vi)

[UCLA] Math 255A: Functional Analysis (Michael Hitrik, F18): PDF, PDF (Vi)

[UCLA] Math 255A’: Functional Analysis (Tim Austin, F19): PDF, PDF (Vi)

[UCLA] Math 255B: Functional Analysis (Michael Hitrik, W20): PDF, PDF (Vi)

[UCLA] Math 259A: Operator Algebras in Hilbert Space (Sorin Popa, F19): PDF, PDF (Vi)

[UCLA] Math 275D: Stochastic Calculus (Jun Yin, F19): PDF, PDF (Vi)

[UC Berkeley] CS 294: Analysis of Boolean Functions (Avishay Tal, Sp23): PDF, PDF (Vi)

[UC Berkeley] EE 229A: Information Theory and Coding (Venkat Anantharam, F21): PDF, PDF (Vi)

[UC Berkeley] Math 142: Algebraic Topology (Jamie Conway, Sp18): PDF, PDF (Vi)

[UC Berkeley] Math 222A: Partial Differential Equations (Daniel Tataru, F21): PDF, PDF (Vi)

[UC Berkeley] Math 222B: Partial Differential Equations (Sung-Jin Oh, Sp22): PDF, PDF (Vi)

[UC Berkeley] Math 249: Algebraic Combinatorics (Mark Haiman, F17): PDF, PDF (Vi)

[UC Berkeley] Math 250A: Groups, Rings, and Fields (Richard Borcherds, F17): PDF, PDF (Vi)

[UC Berkeley] Math 272: Theory of Combinatorial Limits (Dan Král, S25): PDF, PDF (Vi)

[UC Berkeley] Math 279: Topics in Stochastic Partial Differential Equations (Fraydoun Rezakhanlou, F21): PDF, PDF (Vi)

[UC Berkeley] Stat 155: Game Theory (Oscar Hernan Madrid Padilla, Sp18): PDF, PDF (Vi)

[UC Berkeley] Stat 206B: Stochastic Processes (Jim Pitman, Sp18): PDF, PDF (Vi)

[UC Berkeley] Stat C206B: Statics and Dynamics of Random Surfaces (Shirshendu Ganguly, Sp22): PDF, PDF (Vi)

[UC Berkeley] Stat 210A: Theoretical Statistics (Will Fithian, F21): PDF, PDF (Vi)

[UC Berkeley] Stat 210B: High-Dimensional Statistics (Song Mei, Sp22): PDF, PDF (Vi)

List of Ebooks for Data Science

Wed, 11 Jan 2023 00:00:00 +0000

The Law - The mathematical foundations #

Statistical Inference - Casella & Berger
Foundations of Applied Mathematics

History - Foundational works that provide additional context for more advanced concepts #

Convex Optimization - Boyd & Vandenberghe
Probability Theory: The Logic of Science - Jaynes
Clean Code - Martin

Poetry - Prose type works #

Major Prophets - Seminal works on major topics #

Applied Regression Analysis - Draper & Smith
The Data Warehouse Toolkit - Kimball
Bayesian Data Analysis - Gelman
Forecasting: Principles and Practice - Hyndman & Athanasopoulos

Minor Prophets - Important works, but not quite at the level of the DS Major Prophets #

Trustworthy Online Controlled Experiments

The Gospels - The fulfillment of the DS Law #

History Pt. 2 - Data science goes to the Gentiles (non-DS/execs) #

Letters - Further explanation and interpretation of the DS Gospel #

List of GitHub Repositories for Data Science

Wed, 11 Jan 2023 00:00:00 +0000

The Data Engineering Cookbook, GitHub
A curated list of data engineering tools for software developers, GitHub
Data Engineering Zoomcamp, GitHub
Python Data Science Handbook: full text in Jupyter Notebooks, GitHub
Data Science for Beginners - A Curriculum, GitHub
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines, GitHub
Papers & tech blogs by companies sharing their work on data science & machine learning in production, GitHub
An awesome Data Science repository to learn and apply for real world problems, GitHub
List of Data Science Cheatsheets to rule the world, GitHub
Data science interview questions and answers, GitHub
A curated list of applied machine learning and data science notebooks and libraries across different industries, GitHub
A curated list of data science blogs, GitHub

Optimization Research Papers in JMLR Volume 23

Thu, 29 Sep 2022 00:00:00 +0000

Optimization Research Papers in JMLR Volume 23 (2022) #

This document lists papers from JMLR Volume 23 (2022) that focus on optimization research, grouped by primary theme. Each entry gives the paper's title and authors, followed by a brief description of its key contribution to optimization theory, algorithms, or applications.

Convex Optimization #

Papers addressing convex optimization problems, including sparse PCA, L1-regularized SVMs, and metric-constrained problems.

Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality
Authors: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet
Description: Develops convex optimization techniques for large-scale sparse principal component analysis with certifiable near-optimal solutions.
Novel Min-Max Reformulations of Linear Inverse Problems
Authors: Mohammed Rayyan Sheriff, Debasish Chatterjee
Description: Proposes min-max reformulations for linear inverse problems using convex optimization frameworks.
New Insights for the Multivariate Square-Root Lasso
Authors: Aaron J. Molstad
Description: Analyzes the square-root Lasso in multivariate settings, focusing on its convex optimization properties.
Towards An Efficient Approach for the Nonconvex $\ell_p$ Ball Projection: Algorithm and Analysis
Authors: Xiangyu Yang, Jiashan Wang, Hao Wang
Description: Develops an efficient algorithm, with convergence analysis, for projecting onto the nonconvex $\ell_p$ ball ($0 < p < 1$).
Solving L1-Regularized SVMs and Related Linear Programs: Revisiting the Effectiveness of Column and Constraint Generation
Authors: Antoine Dedieu, Rahul Mazumder, Haoyue Wang
Description: Investigates L1-regularized SVMs using convex optimization with column and constraint generation.
Extensions to the Proximal Distance Method of Constrained Optimization
Authors: Alfonso Landeros, Oscar Hernan Madrid Padilla, Hua Zhou, Kenneth Lange
Description: Extends the proximal distance method for constrained convex optimization problems.
Stochastic Subgradient for Composite Convex Optimization with Functional Constraints
Authors: Ion Necoara, Nitesh Kumar Singh
Description: Analyzes stochastic subgradient methods for composite convex optimization with functional constraints.
On Regularized Square-Root Regression Problems: Distributionally Robust Interpretation and Fast Computations
Authors: Hong T.M. Chu, Kim-Chuan Toh, Yangjing Zhang
Description: Studies regularized square-root regression with a distributionally robust perspective and efficient computational methods.
Project and Forget: Solving Large-Scale Metric Constrained Problems
Authors: Rishi Sonthalia, Anna C. Gilbert
Description: Proposes a convex optimization approach for large-scale metric-constrained problems.
Faster Randomized Interior Point Methods for Tall/Wide Linear Programs
Authors: Agniva Chowdhury, Gregory Dexter, Palma London, Haim Avron, Petros Drineas
Description: Develops randomized interior point methods for efficient optimization of tall/wide linear programs.

Nonconvex Optimization #

Papers tackling nonconvex optimization, focusing on optimality, stability, and convergence in nonsmooth and game settings.

Optimality and Stability in Non-Convex Smooth Games
Authors: Guojun Zhang, Pascal Poupart, Yaoliang Yu
Description: Analyzes optimality and stability in nonconvex smooth games with convergence guarantees.
Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization
Authors: Zhize Li, Jian Li
Description: Proposes simple and optimal stochastic gradient methods for nonsmooth, nonconvex optimization.
Oracle Complexity in Nonsmooth Nonconvex Optimization
Authors: Guy Kornowski, Ohad Shamir
Description: Studies the oracle complexity of nonsmooth nonconvex optimization problems.
Distributed Stochastic Gradient Descent: Nonconvexity, Nonsmoothness, and Convergence to Local Minima
Authors: Brian Swenson, Ryan Murray, H. Vincent Poor, Soummya Kar
Description: Investigates distributed SGD for nonconvex, nonsmooth optimization with convergence to local minima.

Stochastic Optimization #

Papers focusing on stochastic optimization methods, including bundle methods, zeroth-order algorithms, and adaptive techniques.

A Stochastic Bundle Method for Interpolation
Authors: Alasdair Paren, Leonard Berrada, Rudra P. K. Poudel, M. Pawan Kumar
Description: Introduces a stochastic bundle method for training models that are capable of interpolating the data, such as deep neural networks.
On Biased Stochastic Gradient Estimation
Authors: Derek Driggs, Jingwei Liang, Carola-Bibiane Schönlieb
Description: Analyzes biases in stochastic gradient estimation and their impact on optimization performance.
Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to Minimax Optimization
Authors: Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang
Description: Proposes accelerated zeroth-order and first-order momentum methods for a range of optimization problems.
Stochastic Zeroth-Order Optimization under Nonstationarity and Nonconvexity
Authors: Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, Prasant Mohapatra
Description: Studies zeroth-order optimization in nonstationary and nonconvex settings.
Accelerating Adaptive Cubic Regularization of Newton’s Method via Random Sampling
Authors: Xi Chen, Bo Jiang, Tianyi Lin, Shuzhong Zhang
Description: Enhances Newton’s method with adaptive cubic regularization using random sampling.
A Momentumized, Adaptive, Dual Averaged Gradient Method
Authors: Aaron Defazio, Samy Jelassi
Description: Develops a momentum-based adaptive gradient method for stochastic optimization.
Stochastic DCA with Variance Reduction and Applications in Machine Learning
Authors: Hoai An Le Thi, Hoang Phuc Hau Luu, Hoai Minh Le, Tao Pham Dinh
Description: Introduces a stochastic difference-of-convex-functions algorithm with variance reduction for machine learning.
Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks
Authors: Alireza Fallah, Mert Gürbüzbalaban, Asuman Ozdaglar, Umut Şimşekli, Lingjiong Zhu
Description: Proposes robust stochastic gradient methods for distributed optimization in multi-agent networks.
On Acceleration for Convex Composite Minimization with Noise-Corrupted Gradients and Approximate Proximal Mapping
Authors: Qiang Zhou, Sinno Jialin Pan
Description: Addresses acceleration in convex composite minimization with noisy gradients.
Asymptotic Study of Stochastic Adaptive Algorithms in Non-Convex Landscape
Authors: Sébastien Gadat, Ioana Gavra
Description: Analyzes the asymptotic behavior of stochastic adaptive algorithms in nonconvex settings.
Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration
Authors: Congliang Chen, Li Shen, Fangyu Zou, Wei Liu
Description: Studies the Adam optimizer, focusing on nonconvexity, convergence, and mini-batch acceleration.
An Efficient Sampling Algorithm for Non-Smooth Composite Potentials
Authors: Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, Peter L. Bartlett
Description: Develops an efficient sampling algorithm for nonsmooth composite potentials in stochastic optimization.
SGD with Coordinate Sampling: Theory and Practice
Authors: Rémi Leluc, François Portier
Description: Explores coordinate sampling in stochastic gradient descent with theoretical and practical insights.

Distributed/Decentralized Optimization #

Papers addressing distributed or decentralized optimization algorithms, focusing on communication efficiency and convergence.

Asymptotic Network Independence and Step-Size for a Distributed Subgradient Method
Authors: Alex Olshevsky
Description: Analyzes step-size and convergence for a distributed subgradient optimization method.
Projection-Free Distributed Online Learning with Sublinear Communication Complexity
Authors: Yuanyu Wan, Guanghui Wang, Wei-Wei Tu, Lijun Zhang
Description: Develops projection-free algorithms for distributed online learning with reduced communication complexity.
Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization
Authors: Huan Li, Zhouchen Lin, Yongchun Fang
Description: Proposes variance-reduced methods for decentralized optimization with optimal acceleration.

Submodular Optimization #

Papers focusing on submodular optimization, particularly in model selection.

Joint Continuous and Discrete Model Selection via Submodularity
Authors: Jonathan Bunton, Paulo Tabuada
Description: Uses submodularity for joint continuous and discrete model selection in optimization.

Bandits and Online Learning #

Papers addressing multi-armed bandits, online optimization, and regret minimization.

Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism
Authors: Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos
Description: Studies multi-agent online optimization with delays, focusing on asynchronicity and optimism.
Online Mirror Descent and Dual Averaging: Keeping Pace in the Dynamic Case
Authors: Huang Fang, Nicholas J. A. Harvey, Victor S. Portella, Michael P. Friedlander
Description: Analyzes online mirror descent and dual averaging for dynamic online optimization.
No Weighted-Regret Learning in Adversarial Bandits with Delays
Authors: Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet
Description: Investigates regret minimization in adversarial bandits with delays.
KL-UCB-Switch: Optimal Regret Bounds for Stochastic Bandits from Both a Distribution-Dependent and a Distribution-Free Viewpoints
Authors: Aurélien Garivier, Hédi Hadiji, Pierre Ménard, Gilles Stoltz
Description: Provides optimal regret bounds for stochastic bandits using KL-UCB-Switch.
Multi-Agent Multi-Armed Bandits with Limited Communication
Authors: Mridul Agarwal, Vaneet Aggarwal, Kamyar Azizzadenesheli
Description: Explores multi-agent bandits with limited communication, focusing on regret minimization.
Nonstochastic Bandits with Composite Anonymous Feedback
Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Claudio Gentile, Yishay Mansour
Description: Studies nonstochastic bandits with composite feedback, analyzing regret and optimization.
Expected Regret and Pseudo-Regret are Equivalent When the Optimal Arm is Unique
Authors: Daron Anderson, Douglas J. Leith
Description: Proves that expected regret and pseudo-regret are equivalent in bandit problems where the optimal arm is unique.

Bayesian and Hyperparameter Optimization #

Papers addressing Bayesian optimization and hyperparameter tuning for efficient optimization.

SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization
Authors: Marius Lindauer, Katharina Eggensperger, Matthias Feurer, André Biedenkapp, Difan Deng, Carolin Benjamins, Tim Ruhkopf, René Sass, Frank Hutter
Description: Presents SMAC3, a versatile Bayesian optimization package for hyperparameter tuning.
Implicit Differentiation for Fast Hyperparameter Selection in Non-Smooth Convex Learning
Authors: Quentin Bertrand, Quentin Klopfenstein, Mathurin Massias, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon
Description: Uses implicit differentiation for efficient hyperparameter selection in nonsmooth convex optimization.
Auto-Sklearn 2.0: Hands-Free AutoML via Meta-Learning
Authors: Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, Frank Hutter
Description: Introduces Auto-Sklearn 2.0, leveraging meta-learning for automated hyperparameter optimization.

Optimization in Reinforcement Learning #

Papers focusing on optimization techniques for reinforcement learning, including policy gradient and value estimation.

A Generalized Projected Bellman Error for Off-Policy Value Estimation in Reinforcement Learning
Authors: Andrew Patterson, Adam White, Martha White
Description: Develops optimization methods for off-policy value estimation using a generalized projected Bellman error.
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Authors: Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White
Description: Investigates greedification operators for policy optimization, focusing on KL divergences.
Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
Authors: Yanwei Jia, Xun Yu Zhou
Description: Analyzes policy gradient and actor-critic methods for continuous-time RL optimization.
On the Convergence Rates of Policy Gradient Methods
Authors: Lin Xiao
Description: Studies convergence rates of policy gradient methods in reinforcement learning.
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor-Critic under State Distribution Mismatch
Authors: Shangtong Zhang, Remi Tachet des Combes, Romain Laroche
Description: Examines global optimality in softmax off-policy actor-critic methods under distribution mismatch.

Optimization Research Papers in JMLR Volume 22

Wed, 29 Sep 2021 00:00:00 +0000

Optimization Research Papers in JMLR Volume 22 (2021) #

This document lists papers from JMLR Volume 22 (2021) that focus on optimization research, grouped by primary theme. Each entry gives the paper's title and authors, followed by a brief description of its key contribution to optimization theory, algorithms, or applications.

Convex Optimization #

Papers addressing convex optimization problems, including clustering, Wasserstein barycenters, sparse optimization, and bandits.

Convex Clustering: Model, Theoretical Guarantee and Efficient Algorithm
Authors: Defeng Sun, Kim-Chuan Toh, Yancheng Yuan
Description: Proposes a convex clustering model with theoretical guarantees and an efficient algorithm.
A Fast Globally Linearly Convergent Algorithm for the Computation of Wasserstein Barycenters
Authors: Lei Yang, Jia Li, Defeng Sun, Kim-Chuan Toh
Description: Develops a fast, globally linearly convergent algorithm for computing Wasserstein barycenters.
Wasserstein Barycenters Can Be Computed in Polynomial Time in Fixed Dimension
Authors: Jason M. Altschuler, Enric Boix-Adsera
Description: Demonstrates that Wasserstein barycenters can be computed in polynomial time for fixed dimensions.
From Low Probability to High Confidence in Stochastic Convex Optimization
Authors: Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang
Description: Analyzes methods to achieve high-confidence solutions in stochastic convex optimization.
Sparse and Smooth Signal Estimation: Convexification of L0-Formulations
Authors: Alper Atamturk, Andres Gomez, Shaoning Han
Description: Proposes convexification techniques for L0-formulations in sparse and smooth signal estimation.
Stochastic Proximal AUC Maximization
Authors: Yunwen Lei, Yiming Ying
Description: Develops stochastic proximal methods for maximizing the area under the ROC curve (AUC) in convex settings.
Sparse Convex Optimization via Adaptively Regularized Hard Thresholding
Authors: Kyriakos Axiotis, Maxim Sviridenko
Description: Introduces adaptively regularized hard thresholding for sparse convex optimization.
Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives
Authors: Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder
Description: Explores continuous and mixed-integer optimization approaches for learning sparse classifiers.
First-Order Convergence Theory for Weakly-Convex-Weakly-Concave Min-max Problems
Authors: Mingrui Liu, Hassan Rafique, Qihang Lin, Tianbao Yang
Description: Provides first-order convergence theory for weakly convex-weakly concave min-max problems.
Convex Geometry and Duality of Over-parameterized Neural Networks
Authors: Tolga Ergen, Mert Pilanci
Description: Analyzes convex geometry and duality in over-parameterized neural networks.
Linear Bandits on Uniformly Convex Sets
Authors: Thomas Kerdreux, Christophe Roux, Alexandre d’Aspremont, Sebastian Pokutta
Description: Studies linear bandits on uniformly convex sets, focusing on convex optimization techniques.

Nonconvex Optimization #

Papers tackling nonconvex optimization, including stochastic gradient descent, neural network training, and stability properties.

Online Stochastic Gradient Descent on Non-Convex Losses from High-Dimensional Inference
Authors: Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath
Description: Analyzes online stochastic gradient descent for nonconvex losses in high-dimensional inference.
Non-attracting Regions of Local Minima in Deep and Wide Neural Networks
Authors: Henning Petzka, Cristian Sminchisescu
Description: Investigates non-attracting regions of local minima in deep and wide neural networks.
When Does Gradient Descent with Logistic Loss Find Interpolating Two-Layer Networks?
Authors: Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett
Description: Examines conditions under which gradient descent with logistic loss finds interpolating two-layer networks.
Replica Exchange for Non-Convex Optimization
Authors: Jing Dong, Xin T. Tong
Description: Proposes replica exchange methods for nonconvex optimization problems.
Failures of Model-Dependent Generalization Bounds for Least-Norm Interpolation
Authors: Peter L. Bartlett, Philip M. Long
Description: Analyzes limitations of model-dependent generalization bounds in least-norm interpolation.
On the Stability Properties and the Optimization Landscape of Training Problems with Squared Loss for Neural Networks and General Nonlinear Conic Approximation Schemes
Authors: Constantin Christof
Description: Studies stability and optimization landscapes for neural network training with squared loss.

Stochastic Optimization #

Papers focusing on stochastic optimization methods, including momentum, Langevin dynamics, and communication-efficient algorithms.

Continuous Time Analysis of Momentum Methods
Authors: Nikola B. Kovachki, Andrew M. Stuart
Description: Provides a continuous-time analysis of momentum methods in stochastic optimization.
Generalization Performance of Multi-pass Stochastic Gradient Descent with Convex Loss Functions
Authors: Yunwen Lei, Ting Hu, Ke Tang
Description: Analyzes generalization performance of multi-pass stochastic gradient descent for convex losses.
High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm
Authors: Wenlong Mou, Yi-An Ma, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan
Description: Develops an accelerated MCMC algorithm using high-order Langevin diffusion.
Path Length Bounds for Gradient Descent and Flow
Authors: Chirag Gupta, Sivaraman Balakrishnan, Aaditya Ramdas
Description: Establishes bounds on the length of the path traced by gradient descent iterates and gradient flow trajectories.
Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives
Authors: Michael Muehlebach, Michael I. Jordan
Description: Analyzes momentum-based optimization from dynamical, control-theoretic, and symplectic perspectives.
L-SVRG and L-Katyusha with Arbitrary Sampling
Authors: Xun Qian, Zheng Qu, Peter Richtárik
Description: Introduces L-SVRG and L-Katyusha algorithms with arbitrary sampling for stochastic optimization.
A Lyapunov Analysis of Accelerated Methods in Optimization
Authors: Ashia C. Wilson, Ben Recht, Michael I. Jordan
Description: Provides a Lyapunov analysis for accelerated optimization methods.
NUQSGD: Provably Communication-Efficient Data-Parallel SGD via Nonuniform Quantization
Authors: Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy
Description: Proposes NUQSGD, a communication-efficient stochastic gradient descent method using nonuniform quantization.
An Inertial Newton Algorithm for Deep Learning
Authors: Camille Castera, Jérôme Bolte, Cédric Févotte, Edouard Pauwels
Description: Develops an inertial Newton algorithm for deep learning optimization.
Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent
Authors: Tian Tong, Cong Ma, Yuejie Chi
Description: Proposes scaled gradient descent for accelerating ill-conditioned low-rank matrix estimation.
On ADMM in Deep Learning: Convergence and Saturation-Avoidance
Authors: Jinshan Zeng, Shao-Bo Lin, Yuan Yao, Ding-Xuan Zhou
Description: Analyzes convergence and saturation-avoidance properties of ADMM in deep learning.
A Unified Convergence Analysis for Shuffling-Type Gradient Methods
Authors: Lam M. Nguyen, Quoc Tran-Dinh, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk
Description: Provides a unified convergence analysis for shuffling-type gradient methods.
Stochastic Online Optimization Using Kalman Recursion
Authors: Joseph de Vilmarest, Olivier Wintenberger
Description: Applies Kalman recursion to stochastic online optimization.
Expanding Boundaries of Gap Safe Screening
Authors: Cassio F. Dantas, Emmanuel Soubies, Cédric Févotte
Description: Extends Gap safe screening rules to a broader class of sparse optimization problems.
Consensus-Based Optimization on the Sphere: Convergence to Global Minimizers and Machine Learning
Authors: Massimo Fornasier, Lorenzo Pareschi, Hui Huang, Philippe Sünnen
Description: Develops consensus-based optimization on the sphere with applications to machine learning.
Decentralized Stochastic Gradient Langevin Dynamics and Hamiltonian Monte Carlo
Authors: Mert Gürbüzbalaban, Xuefeng Gao, Yuanhan Hu, Lingjiong Zhu
Description: Proposes decentralized stochastic gradient Langevin dynamics and Hamiltonian Monte Carlo methods.

Distributed/Decentralized Optimization #

Papers addressing distributed or decentralized optimization algorithms, focusing on communication efficiency and scalability.

Projection-Free Decentralized Online Learning for Submodular Maximization over Time-Varying Networks
Authors: Junlong Zhu, Qingtao Wu, Mingchuan Zhang, Ruijuan Zheng, Keqin Li
Description: Develops projection-free decentralized online learning for submodular maximization over time-varying networks.
Communication-Efficient Distributed Covariance Sketch, with Application to Distributed PCA
Authors: Zengfeng Huang, Xuemin Lin, Wenjie Zhang, Ying Zhang
Description: Proposes a communication-efficient distributed covariance sketch for distributed PCA.
Optimal Rates of Distributed Regression with Imperfect Kernels
Authors: Hongwei Sun, Qiang Wu
Description: Establishes optimal rates for distributed regression with imperfect kernels.
One-Shot Federated Learning: Theoretical Limits and Algorithms to Achieve Them
Authors: Saber Salehkaleybar, Arsalan Sharifnassab, S. Jamaloddin Golestani
Description: Analyzes theoretical limits and algorithms for one-shot federated learning.
Cooperative SGD: A Unified Framework for the Design and Analysis of Local-Update SGD Algorithms
Authors: Jianyu Wang, Gauri Joshi
Description: Introduces a unified framework for designing and analyzing local-update SGD algorithms.
DeEPCA: Decentralized Exact PCA with Linear Convergence Rate
Authors: Haishan Ye, Tong Zhang
Description: Develops DeEPCA, a decentralized exact PCA method with linear convergence.

Submodular Optimization #

Papers focusing on submodular optimization, particularly in experimental design.

Batch Greedy Maximization of Non-Submodular Functions: Guarantees and Applications to Experimental Design
Authors: Jayanth Jagalur-Mohan, Youssef Marzouk
Description: Provides guarantees for batch greedy maximization of non-submodular functions with applications to experimental design.

Bandits and Online Learning #

Papers addressing multi-armed bandits, online optimization, and regret minimization.

Regulating Greed Over Time in Multi-Armed Bandits
Authors: Stefano Tracà, Cynthia Rudin, Weiyu Yan
Description: Studies methods to regulate greed over time in multi-armed bandits.
Preference-Based Online Learning with Dueling Bandits: A Survey
Authors: Viktor Bengs, Róbert Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier
Description: Surveys preference-based online learning with dueling bandits.
On Multi-Armed Bandit Designs for Dose-Finding Trials
Authors: Maryam Aziz, Emilie Kaufmann, Marie-Karelle Riviere
Description: Explores multi-armed bandit designs for dose-finding trials.
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits
Authors: Julian Zimmert, Yevgeny Seldin
Description: Proposes Tsallis-INF, an optimal algorithm for stochastic and adversarial bandits.
Bandit Convex Optimization in Non-Stationary Environments
Authors: Peng Zhao, Guanghui Wang, Lijun Zhang, Zhi-Hua Zhou
Description: Addresses bandit convex optimization in non-stationary environments.
A Contextual Bandit Bake-off
Authors: Alberto Bietti, Alekh Agarwal, John Langford
Description: Compares contextual bandit algorithms in a comprehensive evaluation.
MetaGrad: Adaptation Using Multiple Learning Rates in Online Learning
Authors: Tim van Erven, Wouter M. Koolen, Dirk van der Hoeven
Description: Introduces MetaGrad, an adaptive online learning algorithm with multiple learning rates.
Achieving Fairness in the Stochastic Multi-Armed Bandit Problem
Authors: Vishakha Patil, Ganesh Ghalme, Vineet Nair, Y. Narahari
Description: Develops methods for achieving fairness in stochastic multi-armed bandits.
Refined Approachability Algorithms and Application to Regret Minimization with Global Costs
Authors: Joon Kwon
Description: Proposes refined approachability algorithms for regret minimization with global costs.
Bandit Learning in Decentralized Matching Markets
Authors: Lydia T. Liu, Feng Ruan, Horia Mania, Michael I. Jordan
Description: Applies bandit learning to decentralized matching markets.
Thompson Sampling Algorithms for Cascading Bandits
Authors: Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan
Description: Develops Thompson sampling algorithms for cascading bandits.
Fast Learning for Renewal Optimization in Online Task Scheduling
Authors: Michael J. Neely
Description: Proposes fast learning methods for renewal optimization in online task scheduling.

Bayesian and Hyperparameter Optimization #

Papers addressing Bayesian optimization and hyperparameter tuning for scalable and robust optimization.

An Empirical Study of Bayesian Optimization: Acquisition Versus Partition
Authors: Erich Merrill, Alan Fern, Xiaoli Fern, Nima Dolatnia
Description: Conducts an empirical study comparing acquisition and partition strategies in Bayesian optimization.
Hyperparameter Optimization via Sequential Uniform Designs
Authors: Zebin Yang, Aijun Zhang
Description: Proposes sequential uniform designs for hyperparameter optimization.
Are We Forgetting about Compositional Optimisers in Bayesian Optimisation?
Authors: Antoine Grosnit, Alexander I. Cowen-Rivers, Rasul Tutunov, Ryan-Rhys Griffiths, Jun Wang, Haitham Bou-Ammar
Description: Explores the role of compositional optimizers in Bayesian optimization.
GIBBON: General-Purpose Information-Based Bayesian Optimisation
Authors: Henry B. Moss, David S. Leslie, Javier Gonzalez, Paul Rayson
Description: Introduces GIBBON, a general-purpose information-based Bayesian optimization framework.
On $\ell_p$-Hyperparameter Learning via Bilevel Nonsmooth Optimization
Authors: Takayuki Okuno, Akiko Takeda, Akihiro Kawana, Motokazu Watanabe
Description: Studies $\ell_p$-hyperparameter learning using bilevel nonsmooth optimization.

Optimization in Reinforcement Learning #

Papers focusing on optimization techniques for reinforcement learning, including policy iteration and Q-learning.

Safe Policy Iteration: A Monotonically Improving Approximate Policy Iteration Approach
Authors: Alberto Maria Metelli, Matteo Pirotta, Daniele Calandriello, Marcello Restelli
Description: Proposes a safe policy iteration method with monotonic improvement for reinforcement learning.
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Authors: Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan
Description: Analyzes the optimality, approximation, and distribution shift in policy gradient methods.
Langevin Dynamics for Adaptive Inverse Reinforcement Learning of Stochastic Gradient Algorithms
Authors: Vikram Krishnamurthy, George Yin
Description: Applies Langevin dynamics to adaptive inverse reinforcement learning for stochastic gradient algorithms.
Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls
Authors: Jeongho Kim, Jaeuk Shin, Insoon Yang
Description: Develops Hamilton-Jacobi deep Q-learning for deterministic continuous-time systems.
Partial Policy Iteration for L1-Robust Markov Decision Processes
Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
Description: Introduces partial policy iteration for L1-robust Markov decision processes.
Gaussian Approximation for Bias Reduction in Q-Learning
Authors: Carlo D’Eramo, Andrea Cini, Alessandro Nuara, Matteo Pirotta, Cesare Alippi, Jan Peters, Marcello Restelli
Description: Proposes Gaussian approximation techniques for bias reduction in Q-learning.

Optimization Research Papers in JMLR Volume 21

Tue, 29 Sep 2020 00:00:00 +0000

Optimization Research Papers in JMLR Volume 21 (2020) #

This document lists papers from JMLR Volume 21 (2020) that focus on optimization research, grouped by primary theme. Each entry gives the paper's title, its authors, and a brief description of its key contribution to optimization theory, algorithms, or applications.

Convex Optimization #

Papers addressing convex optimization problems, including complexity bounds, convergence analysis, and applications in regression and assortment optimization.

A Low Complexity Algorithm with O(√T) Regret and O(1) Constraint Violations for Online Convex Optimization with Long Term Constraints
Authors: Hao Yu, Michael J. Neely
Description: Proposes a low-complexity algorithm for online convex optimization with long-term constraints, achieving O(√T) regret and O(1) constraint violations.
Lower Bounds for Parallel and Randomized Convex Optimization
Authors: Jelena Diakonikolas, Cristóbal Guzmán
Description: Establishes lower complexity bounds for parallel and randomized algorithms in convex optimization.
Discerning the Linear Convergence of ADMM for Structured Convex Optimization through the Lens of Variational Analysis
Authors: Xiaoming Yuan, Shangzhi Zeng, Jin Zhang
Description: Analyzes the linear convergence of ADMM for structured convex optimization using variational analysis.
A Data Efficient and Feasible Level Set Method for Stochastic Convex Optimization with Expectation Constraints
Authors: Qihang Lin, Selvaprabu Nadarajah, Negar Soheili, Tianbao Yang
Description: Develops a data-efficient level set method for stochastic convex optimization with expectation constraints.
Conic Optimization for Quadratic Regression Under Sparse Noise
Authors: Igor Molybog, Ramtin Madani, Javad Lavaei
Description: Applies conic optimization to quadratic regression under sparse noise conditions.
Dynamic Assortment Optimization with Changing Contextual Information
Authors: Xi Chen, Yining Wang, Yuan Zhou
Description: Addresses dynamic assortment optimization with changing contextual information using convex optimization techniques.
Convex Programming for Estimation in Nonlinear Recurrent Models
Authors: Sohail Bahmani, Justin Romberg
Description: Uses convex programming for parameter estimation in nonlinear recurrent models.

Nonconvex Optimization #

Papers tackling nonconvex optimization, focusing on guarantees for local minima, variance reduction, and algorithmic advancements.

Exact Guarantees on the Absence of Spurious Local Minima for Non-negative Rank-1 Robust Principal Component Analysis
Authors: Salar Fattahi, Somayeh Sojoudi
Description: Provides exact guarantees for the absence of spurious local minima in non-negative rank-1 robust PCA.
Stochastic Nested Variance Reduction for Nonconvex Optimization
Authors: Dongruo Zhou, Pan Xu, Quanquan Gu
Description: Introduces a stochastic nested variance reduction method for nonconvex optimization.
ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization
Authors: Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Quoc Tran-Dinh
Description: Proposes ProxSARAH, an efficient framework for stochastic composite nonconvex optimization.
Convergence Rates for the Stochastic Gradient Descent Method for Non-Convex Objective Functions
Authors: Benjamin Fehrman, Benjamin Gess, Arnulf Jentzen
Description: Analyzes convergence rates of stochastic gradient descent for nonconvex objective functions.
AdaGrad Stepsizes: Sharp Convergence Over Nonconvex Landscapes
Authors: Rachel Ward, Xiaoxia Wu, Leon Bottou
Description: Studies sharp convergence of AdaGrad stepsize schedules in nonconvex optimization.
A Sparse Semismooth Newton Based Proximal Majorization-Minimization Algorithm for Nonconvex Square-Root-Loss Regression Problems
Authors: Peipei Tang, Chengjing Wang, Defeng Sun, Kim-Chuan Toh
Description: Develops a sparse semismooth Newton-based proximal majorization-minimization algorithm for nonconvex square-root-loss regression.

Stochastic Optimization #

Papers focusing on stochastic optimization methods, including gradient descent, variance reduction, and robustness to noise.

Convergences of Regularized Algorithms and Stochastic Gradient Methods with Random Projections
Authors: Junhong Lin, Volkan Cevher
Description: Analyzes convergence of regularized algorithms and stochastic gradient methods with random projections.
Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent
Authors: Dominic Richards, Patrick Rebeschini
Description: Studies graph-dependent implicit regularization in distributed stochastic subgradient descent.
Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions
Authors: Artin Spiridonoff, Alex Olshevsky, Ioannis Ch. Paschalidis
Description: Proposes a robust asynchronous stochastic gradient-push method with asymptotically optimal performance for strongly convex functions.
On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics
Authors: Xi Chen, Simon S. Du, Xin T. Tong
Description: Investigates stationary-point hitting time and ergodicity in stochastic gradient Langevin dynamics.
Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization
Authors: Aryan Mokhtari, Hamed Hassani, Amin Karbasi
Description: Extends stochastic conditional gradient methods from convex minimization to submodular maximization.
A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning
Authors: Aryan Mokhtari, Alec Koppel, Martin Takac, Alejandro Ribeiro
Description: Introduces parallel doubly stochastic algorithms for large-scale learning.
Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers
Authors: Yao Ma, Alex Olshevsky, Csaba Szepesvari, Venkatesh Saligrama
Description: Applies gradient descent to sparse rank-one matrix completion for crowd-sourced worker aggregation.
Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral Algorithms
Authors: Junhong Lin, Volkan Cevher
Description: Establishes optimal convergence rates for distributed learning using stochastic gradient methods and spectral algorithms.
Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise
Authors: Andrei Kulunchakov, Julien Mairal
Description: Develops estimate sequences for stochastic composite optimization with variance reduction and noise robustness.
A Unified q-Memorization Framework for Asynchronous Stochastic Optimization
Authors: Bin Gu, Wenhan Xian, Zhouyuan Huo, Cheng Deng, Heng Huang
Description: Proposes a unified q-memorization framework for asynchronous stochastic optimization.
Asymptotic Analysis via Stochastic Differential Equations of Gradient Descent Algorithms in Statistical and Computational Paradigms
Authors: Yazhen Wang, Shang Wu
Description: Analyzes gradient descent algorithms using stochastic differential equations in statistical and computational settings.
The Error-Feedback Framework: SGD with Delayed Gradients
Authors: Sebastian U. Stich, Sai Praneeth Karimireddy
Description: Introduces an error-feedback framework for stochastic gradient descent with delayed gradients.

Distributed/Parallel Optimization #

Papers addressing distributed or parallel optimization algorithms, focusing on communication efficiency and scalability.

On the Complexity Analysis of the Primal Solutions for the Accelerated Randomized Dual Coordinate Ascent
Authors: Huan Li, Zhouchen Lin
Description: Analyzes the complexity of primal solutions for accelerated randomized dual coordinate ascent in distributed settings.
WONDER: Weighted One-shot Distributed Ridge Regression in High Dimensions
Authors: Edgar Dobriban, Yue Sheng
Description: Proposes WONDER, a weighted one-shot distributed ridge regression method for high-dimensional data.
GADMM: Fast and Communication Efficient Framework for Distributed Machine Learning
Authors: Anis Elgabli, Jihong Park, Amrit S. Bedi, Mehdi Bennis, Vaneet Aggarwal
Description: Introduces GADMM, a fast and communication-efficient framework for distributed machine learning.
Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction
Authors: Boyue Li, Shicong Cen, Yuxin Chen, Yuejie Chi
Description: Develops communication-efficient distributed optimization with gradient tracking and variance reduction.
On Convergence of Distributed Approximate Newton Methods: Globalization, Sharper Bounds and Beyond
Authors: Xiao-Tong Yuan, Ping Li
Description: Analyzes convergence of distributed approximate Newton methods with sharper bounds and globalization techniques.

Submodular Optimization #

Papers focusing on submodular optimization, including minimization and maximization problems.

Quadratic Decomposable Submodular Function Minimization: Theory and Practice
Authors: Pan Li, Niao He, Olgica Milenkovic
Description: Studies quadratic decomposable submodular function minimization with theoretical and practical insights.
Optimal Algorithms for Continuous Non-monotone Submodular and DR-Submodular Maximization
Authors: Rad Niazadeh, Tim Roughgarden, Joshua R. Wang
Description: Develops optimal algorithms for continuous non-monotone submodular and DR-submodular maximization.

Bayesian and Hyperparameter Optimization #

Papers addressing Bayesian optimization and hyperparameter tuning for scalable and robust optimization.

Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly
Authors: Kirthevasan Kandasamy, Karun Raju Vysyaraju, Willie Neiswanger, Biswajit Paria, Christopher R. Collins, Jeff Schneider, Barnabas Poczos, Eric P. Xing
Description: Introduces Dragonfly, a scalable and robust Bayesian optimization framework for hyperparameter tuning.
Distributionally Ambiguous Optimization for Batch Bayesian Optimization
Authors: Nikitas Rontsis, Michael A. Osborne, Paul J. Goulart
Description: Proposes distributionally ambiguous optimization for batch Bayesian optimization.
The Kalai-Smorodinsky Solution for Many-Objective Bayesian Optimization
Authors: Mickael Binois, Victor Picheny, Patrick Taillandier, Abderrahmane Habbal
Description: Applies the Kalai-Smorodinsky solution to many-objective Bayesian optimization.
Robust Reinforcement Learning with Bayesian Optimisation and Quadrature
Authors: Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson
Description: Integrates Bayesian optimization and quadrature for robust reinforcement learning.

Optimization in Reinforcement Learning #

Papers focusing on optimization techniques for policy optimization and reinforcement learning.

Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
Authors: Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright
Description: Develops derivative-free methods for policy optimization in linear quadratic systems with guarantees.
Expected Policy Gradients for Reinforcement Learning
Authors: Kamil Ciosek, Shimon Whiteson
Description: Introduces expected policy gradients for reinforcement learning optimization.
Importance Sampling Techniques for Policy Optimization
Authors: Alberto Maria Metelli, Matteo Papini, Nico Montali, Marcello Restelli
Description: Proposes importance sampling techniques for efficient policy optimization in reinforcement learning.

APA References for Convex Optimization and Analysis Books #

Hendrix, E. M. T., & G.-Tóth, B. (2010). Introduction to nonlinear and global optimization. Springer. https://link.springer.com/book/10.1007/978-0-387-88670-1
Horst, R., & Pardalos, P. M. (Eds.). (1995). Handbook of global optimization. Springer. https://link.springer.com/book/10.1007/978-1-4615-2025-2
Mordukhovich, B. S. (2006a). Variational analysis and generalized differentiation I: Basic theory. Springer. https://link.springer.com/book/10.1007/3-540-31247-1
Mordukhovich, B. S. (2006b). Variational analysis and generalized differentiation II: Applications. Springer. https://link.springer.com/book/10.1007/3-540-31246-3
Mordukhovich, B. S. (2018). Variational analysis and applications. Springer. https://link.springer.com/book/10.1007/978-3-319-92775-6
Mordukhovich, B. S. (2024). Second-order variational analysis in optimization, variational stability, and control: Theory, algorithms, applications. Springer.
Mordukhovich, B. S., & Nguyen, M. N. (2022). Convex analysis and beyond: Volume I: Basic theory. Springer. https://link.springer.com/book/10.1007/978-3-030-14784-6