<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Optimization on Nam Le</title><link>https://blog.namln.org/en/categories/optimization/</link><description>Recent content in Optimization on Nam Le</description><generator>Hugo</generator><language>en-US</language><lastBuildDate>Sun, 29 Sep 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.namln.org/en/categories/optimization/index.xml" rel="self" type="application/rss+xml"/><item><title>Optimization Papers in JMLR Volume 26</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v26/</link><pubDate>Sun, 29 Sep 2024 00:00:00 +0000</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v26/</guid><description/></item><item><title>Optimization Research Papers in JMLR Volume 25</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v25/</link><pubDate>Sun, 29 Sep 2024 00:00:00 +0000</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v25/</guid><description>&lt;h1 class="heading" id="optimization-research-papers-in-jmlr-volume-25-2024"&gt;
 Optimization Research Papers in JMLR Volume 25 (2024)&lt;span class="heading__anchor"&gt; &lt;a href="#optimization-research-papers-in-jmlr-volume-25-2024"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h1&gt;&lt;p&gt;This document lists papers from JMLR Volume 25 (2024) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.&lt;/p&gt;
&lt;h2 class="heading" id="convex-optimization"&gt;
 Convex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#convex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing convex optimization problems, including sparse NMF, differential privacy, and sparse regression.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yuze Han, Guangzeng Xie, Zhihua Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates lower complexity bounds for finite-sum optimization problems in convex settings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sparse NMF with Archetypal Regularization: Computational and Robustness Properties&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Kayhan Behdin, Rahul Mazumder&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes sparse non-negative matrix factorization with archetypal regularization using convex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scaling the Convex Barrier with Sparse Dual Algorithms&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Alessandro De Palma, Harkirat Singh Behl, Rudy Bunel, Philip H.S. Torr, M. Pawan Kumar&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops sparse dual algorithms that scale tight convex relaxations for neural network verification.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Faster Rates in Differentially Private Stochastic Convex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jinyan Su, Lijie Hu, Di Wang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes faster convergence rates for differentially private stochastic convex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Estimation of Sparse Gaussian Graphical Models with Hidden Clustering Structure&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Meixia Lin, Defeng Sun, Kim-Chuan Toh, Chengjing Wang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops convex optimization methods for sparse Gaussian graphical models with hidden clustering.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Minimax Optimal Approach to High-Dimensional Double Sparse Linear Regression&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yanhang Zhang, Zhifan Li, Shixiang Liu, Jianxin Yin&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a minimax optimal approach for high-dimensional double sparse linear regression using convex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;An Inexact Projected Regularized Newton Method for Fused Zero-Norms Regularization Problems&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yuqia Wu, Shaohua Pan, Xiaoqi Yang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces an inexact projected regularized Newton method for fused zero-norms regularization problems.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="nonconvex-optimization"&gt;
 Nonconvex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#nonconvex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers tackling nonconvex optimization, focusing on ADMM, Adam-family methods, and stochastic minimax optimization.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Convergence for Nonconvex ADMM, with Applications to CT Imaging&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Rina Foygel Barber, Emil Y. Sidky&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies convergence properties of nonconvex ADMM with applications to CT imaging.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Adam-Family Methods for Nonsmooth Optimization with Convergence Guarantees&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops Adam-family methods for nonsmooth nonconvex optimization with convergence guarantees.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Nonasymptotic Analysis of Stochastic Gradient Hamiltonian Monte Carlo under Local Conditions for Nonconvex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: O. Deniz Akyildiz, Sotirios Sabanis&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides a nonasymptotic analysis of stochastic gradient Hamiltonian Monte Carlo for nonconvex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;High Probability Convergence Bounds for Non-Convex Stochastic Gradient Descent with Sub-Weibull Noise&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Liam Madden, Emiliano Dall&amp;rsquo;Anese, Stephen Becker&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Derives high-probability convergence bounds for nonconvex stochastic gradient descent with sub-Weibull noise.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic Regularized Majorization-Minimization with Weakly Convex and Multi-Convex Surrogates&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Hanbaek Lyu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes stochastic regularized majorization-minimization for weakly convex and multi-convex problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Near-Optimal Algorithms for Stochastic Minimax Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Lesi Chen, Luo Luo&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops near-optimal algorithms for stochastic minimax optimization in nonconvex settings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scaled Conjugate Gradient Method for Nonconvex Optimization in Deep Neural Networks&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Naoki Sato, Koshiro Izumi, Hideaki Iiduka&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces a scaled conjugate gradient method for nonconvex optimization in deep neural networks.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="stochastic-optimization"&gt;
 Stochastic Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#stochastic-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on stochastic optimization methods, including continuous-time approximations, momentum, and curvature estimates.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Comparison of Continuous-Time Approximations to Stochastic Gradient Descent&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Stefan Ankirchner, Stefan Perko&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Compares continuous-time approximations to stochastic gradient descent for optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Generalization of Stochastic Gradient Descent with Momentum&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher, Ashish Khisti, Ben Liang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes the generalization properties of stochastic gradient descent with momentum.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies stochastic modified flows and mean-field limits for stochastic gradient descent dynamics.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic Approximation with Decision-Dependent Distributions: Asymptotic Normality and Optimality&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Joshua Cutler, Mateo Díaz, Dmitriy Drusvyatskiy&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates stochastic approximation with decision-dependent distributions, focusing on asymptotic normality and optimality.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Guy Kornowski, Ohad Shamir&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes an algorithm with optimal dimension-dependence for zero-order nonsmooth nonconvex stochastic optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Hyperparameters in Stochastic Gradient Descent with Momentum&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Bin Shi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Examines the impact of hyperparameters in stochastic gradient descent with momentum.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Almost Sure Convergence Rates Analysis and Saddle Avoidance of Stochastic Gradient Methods&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jun Liu, Ye Yuan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes almost sure convergence rates and saddle avoidance in stochastic gradient methods.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zachary Frangella, Pratik Rathore, Shipu Zhao, Madeleine Udell&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces preconditioned stochastic optimization methods with scalable curvature estimates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Zeroth-Order Stochastic Approximation Algorithms for DR-Submodular Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yuefang Lian, Xiao Wang, Dachuan Xu, Zhongrui Zhao&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops zeroth-order stochastic approximation algorithms for DR-submodular optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic-Constrained Stochastic Optimization with Markovian Data&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yeongjong Kim, Dabeen Lee&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies stochastic-constrained optimization with Markovian data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;High Probability and Risk-Averse Guarantees for a Stochastic Accelerated Primal-Dual Method&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yassine Laguel, Necdet Serhat Aybat, Mert Gürbüzbalaban&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides high-probability and risk-averse guarantees for a stochastic accelerated primal-dual method.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="distributeddecentralized-optimization"&gt;
 Distributed/Decentralized Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#distributeddecentralized-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing distributed or decentralized optimization algorithms, focusing on communication efficiency and federated learning.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Distributed Gaussian Mean Estimation under Communication Constraints: Optimal Rates and Communication-Efficient Algorithms&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: T. Tony Cai, Hongji Wei&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Establishes optimal rates and develops communication-efficient algorithms for distributed Gaussian mean estimation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accelerated Gradient Tracking over Time-Varying Graphs for Decentralized Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Huan Li, Zhouchen Lin&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes accelerated gradient tracking for decentralized optimization over time-varying graphs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Compressed and Distributed Least-Squares Regression: Convergence Rates with Applications to Federated Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Constantin Philippenko, Aymeric Dieuleveut&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes convergence rates for compressed and distributed least-squares regression in federated learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Federated Automatic Differentiation&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Keith Rush, Zachary Charles, Zachary Garrett&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces federated automatic differentiation for distributed optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Random Projection Approach to Personalized Federated Learning: Enhancing Communication Efficiency, Robustness, and Fairness&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yuze Han, Xiang Li, Shiyun Lin, Zhihua Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a random projection approach to enhance communication efficiency in personalized federated learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Countering the Communication Bottleneck in Federated Learning: A Highly Efficient Zero-Order Optimization Technique&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Elissa Mhanna, Mohamad Assaad&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops a zero-order optimization technique to address communication bottlenecks in federated learning.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="bandits-and-online-learning"&gt;
 Bandits and Online Learning&lt;span class="heading__anchor"&gt; &lt;a href="#bandits-and-online-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing multi-armed bandits, online optimization, and regret minimization.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zixian Yang, Xin Liu, Lei Ying&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies exploration, exploitation, and engagement in multi-armed bandits with abandonment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Adaptivity and Non-Stationarity: Problem-Dependent Dynamic Regret for Online Convex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Peng Zhao, Yu-Jie Zhang, Lijun Zhang, Zhi-Hua Zhou&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes problem-dependent dynamic regret for online convex optimization under non-stationarity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Materials Discovery Using Max K-Armed Bandit&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Nobuaki Kikkawa, Hiroshi Ohno&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Applies max k-armed bandit algorithms to materials discovery, focusing on regret minimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Finite-Time Analysis of Globally Nonstationary Multi-Armed Bandits&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Junpei Komiyama, Edouard Fouché, Junya Honda&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides finite-time analysis for globally nonstationary multi-armed bandits.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Sijia Chen, Yu-Jie Zhang, Wei-Wei Tu, Peng Zhao, Lijun Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops optimistic online mirror descent for bridging stochastic and adversarial online convex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Continuous Prediction with Experts&amp;rsquo; Advice&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Nicholas J. A. Harvey, Christopher Liaw, Victor S. Portella&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates continuous prediction with experts&amp;rsquo; advice in online learning settings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Regret Analysis of Bilateral Trade with a Smoothed Adversary&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Federico Fusco, Stefano Leonardi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes regret in bilateral trade with a smoothed adversary in online optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimal Learning Policies for Differential Privacy in Multi-Armed Bandits&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Siwei Wang, Jun Zhu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops optimal learning policies for differential privacy in multi-armed bandits.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Information Capacity Regret Bounds for Bandits with Mediator Feedback&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Derives regret bounds for bandits with mediator feedback, focusing on information capacity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Aleksandrs Slivkins, Xingyu Zhou, Karthik Abinav Sankararaman, Dylan J. Foster&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a modular Lagrangian approach for contextual bandits with packing and covering constraints.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="optimization-in-reinforcement-learning"&gt;
 Optimization in Reinforcement Learning&lt;span class="heading__anchor"&gt; &lt;a href="#optimization-in-reinforcement-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on optimization techniques for reinforcement learning, including policy gradient, policy mirror descent, and safe RL.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Shicong Cen, Yuting Wei, Yuejie Chi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops fast policy extragradient methods for competitive games with entropy regularization in RL.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sample-Efficient Adversarial Imitation Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Dahuin Jung, Hyungyu Lee, Sungroh Yoon&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes sample-efficient adversarial imitation learning methods for RL optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Sample Complexity and Metastability of Heavy-Tailed Policy Search in Continuous Control&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes sample complexity and metastability for heavy-tailed policy search in continuous control.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ariyan Bighashdel, Daan de Geus, Pavol Jancura, Gijs Dubbelman&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops off-policy action anticipation methods for multi-agent RL optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Policy Gradient Methods in the Presence of Symmetries and State Abstractions&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Prakash Panangaden, Sahand Rezaei-Shoshtari, Rosie Zhao, David Meger, Doina Precup&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates policy gradient methods with symmetries and state abstractions for RL optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Log Barriers for Safe Black-Box Optimization with Application to Safe Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ilnura Usmanova, Yarden As, Maryam Kamgarpour, Andreas Krause&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes log barriers for safe black-box optimization with applications to safe RL.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jinchi Chen, Jie Feng, Weiguo Gao, Ke Wei&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops decentralized natural policy gradient with variance reduction for multi-agent RL.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Laixi Shi, Yuejie Chi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies distributionally robust model-based offline RL with near-optimal sample complexity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes sample complexity of neural policy mirror descent for policy optimization on low-dimensional manifolds.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes mean-field approximations for cooperative constrained multi-agent RL optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Luofeng Liao, Zuyue Fu, Zhuoran Yang, Yixin Wang, Dingli Ma, Mladen Kolar, Zhaoran Wang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops instrumental variable value iteration for causal offline RL optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: François G. Ged, Maria Han Veiga&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces a Matryoshka policy gradient method for entropy-regularized RL with convergence guarantees.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data-Efficient Policy Evaluation Through Behavior Policy Search&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Josiah P. Hanna, Yash Chandak, Philip S. Thomas, Martha White, Peter Stone, Scott Niekum&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes data-efficient policy evaluation methods for RL through behavior policy search.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Empirical Design in Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Andrew Patterson, Samuel Neumann, Martha White, Adam White&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Discusses sound experimental design and evaluation practice for reinforcement learning research.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A New, Physics-Informed Continuous-Time Reinforcement Learning Algorithm with Performance Guarantees&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Brent A. Wallace, Jennie Si&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops a physics-informed continuous-time RL algorithm with performance guarantees.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="other-optimization-topics"&gt;
 Other Optimization Topics&lt;span class="heading__anchor"&gt; &lt;a href="#other-optimization-topics"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers covering miscellaneous optimization topics, including optimal transport, bilevel optimization, and tensor recovery.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On Efficient and Scalable Computation of the Nonparametric Maximum Likelihood Estimator in Mixture Models&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yangjing Zhang, Ying Cui, Bodhisattva Sen, Kim-Chuan Toh&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes efficient and scalable methods for computing the nonparametric MLE in mixture models.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Tangential Wasserstein Projections&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Florian Gunsilius, Meng Hsuan Hsieh, Myung Jin Lee&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops tangential Wasserstein projections for optimization in optimal transport.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Win: Weight-Decay-Integrated Nesterov Acceleration for Faster Network Training&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Pan Zhou, Xingyu Xie, Zhouchen Lin, Kim-Chuan Toh, Shuicheng Yan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces a weight-decay-integrated Nesterov acceleration method for faster network training.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimal Algorithms for Stochastic Bilevel Optimization under Relaxed Smoothness Conditions&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Xuxing Chen, Tesi Xiao, Krishnakumar Balasubramanian&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops optimal algorithms for stochastic bilevel optimization under relaxed smoothness conditions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Learning to Warm-Start Fixed-Point Optimization Algorithms&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Rajiv Sambharya, Georgina Hall, Brandon Amos, Bartolomeo Stellato&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes learning-based warm-start techniques for fixed-point optimization algorithms.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Wasserstein Proximal Coordinate Gradient Algorithms&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Rentian Yao, Xiaohui Chen, Yun Yang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops Wasserstein proximal coordinate gradient algorithms for optimal transport optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Convergence of Projected Alternating Maximization for Equitable and Optimal Transport&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Minhui Huang, Shiqian Ma, Lifeng Lai&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes convergence of projected alternating maximization for equitable and optimal transport.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Lower Complexity Adaptation for Empirical Entropic Optimal Transport&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Michel Groppe, Shayan Hundrieser&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Shows that the statistical convergence rate of empirical entropic optimal transport adapts to the less complex of the two measures.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accelerating Nuclear-Norm Regularized Low-Rank Matrix Optimization Through Burer-Monteiro Decomposition&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ching-pei Lee, Ling Liang, Tianyun Tang, Kim-Chuan Toh&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Accelerates nuclear-norm regularized low-rank matrix optimization using the Burer-Monteiro decomposition.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zhen Qin, Michael B. Wakin, Zhihui Zhu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops a guaranteed nonconvex factorization approach for tensor train recovery.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Pierre Ablin, Simon Vary, Bin Gao, Pierre-Antoine Absil&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes algorithms for optimization under orthogonality constraints, including deterministic, stochastic, and variance-reduction methods.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;</description></item><item><title>Ebooks &amp; related papers on Convex Optimization</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/cvx-refs/</link><pubDate>Mon, 15 Jul 2024 00:00:00 +0000</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/cvx-refs/</guid><description>&lt;h2 class="heading" id="ebooks"&gt;
 Ebooks&lt;span class="heading__anchor"&gt; &lt;a href="#ebooks"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;Boris Mordukhovich, Nguyen Mau Nam. &lt;a href="https://link.springer.com/book/10.1007/978-3-031-26458-0"&gt;An Easy Path to Convex Analysis and Applications&lt;/a&gt;. 2023&lt;/li&gt;
&lt;li&gt;Yurii Nesterov. &lt;a href="https://link.springer.com/book/10.1007/978-3-319-91578-4"&gt;Lectures on Convex Optimization&lt;/a&gt;. 2018&lt;/li&gt;
&lt;li&gt;Sébastien Bubeck. &lt;a href="https://arxiv.org/abs/1405.4980"&gt;Convex Optimization: Algorithms and Complexity&lt;/a&gt;. 2015&lt;/li&gt;
&lt;li&gt;Dimitri Bertsekas. &lt;a href="https://mcube.lab.nycu.edu.tw/~cfung/docs/books/bertsekas1999nonlinear_programming.pdf"&gt;Nonlinear Programming&lt;/a&gt;. 2016&lt;/li&gt;
&lt;li&gt;Boris Teodorovich Polyak. &lt;a href="https://www.researchgate.net/profile/Boris-Polyak-2/publication/342978480_Introduction_to_Optimization/links/5f1033e5299bf1e548ba4636/Introduction-to-Optimization.pdf"&gt;Introduction to Optimization&lt;/a&gt;. 1987&lt;/li&gt;
&lt;li&gt;R. T. Rockafellar. Convex Analysis. 1970&lt;/li&gt;
&lt;li&gt;H. H. Bauschke &amp;amp; P. L. Combettes. &lt;a href="https://link.springer.com/book/10.1007/978-3-319-48311-5"&gt;Convex Analysis and Monotone Operator Theory in Hilbert Spaces&lt;/a&gt;. 2011&lt;/li&gt;
&lt;li&gt;Stephen P. Boyd and Lieven Vandenberghe. &lt;a href="https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf"&gt;Convex Optimization&lt;/a&gt;. 2004&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="papers"&gt;
 Papers&lt;span class="heading__anchor"&gt; &lt;a href="#papers"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;Yu. E. Nesterov. &lt;a href="https://hengshuaiyao.github.io/papers/nesterov83.pdf"&gt;A method of solving a convex programming problem with convergence rate O(1/k²)&lt;/a&gt;. 1983&lt;/li&gt;
&lt;/ol&gt;</description></item><item><title>Pre-print articles on Adagrad-variant methods</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/adagrad-variant/</link><pubDate>Mon, 15 Jul 2024 00:00:00 +0000</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/adagrad-variant/</guid><description>&lt;h2 class="heading" id="1-heavy-tailed-class-imbalance-and-why-adam-outperforms-gradient-descent-on-language-models"&gt;
 1. Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models&lt;span class="heading__anchor"&gt; &lt;a href="#1-heavy-tailed-class-imbalance-and-why-adam-outperforms-gradient-descent-on-language-models"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Adam has been shown to outperform gradient descent on large language models by a larger margin than on other tasks, but it is unclear why. We show that a key factor in this performance gap is the heavy-tailed class imbalance found in language tasks. When trained with gradient descent, the loss of infrequent words decreases more slowly than the loss of frequent ones. This leads to a slow decrease on the average loss as most samples come from infrequent words. On the other hand, Adam and sign-based methods are less sensitive to this problem. To establish that this behavior is caused by class imbalance, we show empirically that it can be reproduced across architectures and data types, on language transformers, vision CNNs, and linear models. On a linear model with cross-entropy loss, we show that class imbalance leads to imbalanced, correlated gradients and Hessians that have been hypothesized to benefit Adam. We also prove that, in continuous time, gradient descent converges slowly on low-frequency classes while sign descent does not.&lt;/p&gt;
&lt;h2 class="heading" id="2-accelerated-parameter-free-stochastic-optimization"&gt;
 2. Accelerated Parameter-Free Stochastic Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#2-accelerated-parameter-free-stochastic-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Itai Kreisler, Maor Ivgi, Oliver Hinder, Yair Carmon&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We propose a method that achieves near-optimal rates for smooth stochastic convex optimization and requires essentially no prior knowledge of problem parameters. This improves on prior work which requires knowing at least the initial distance to optimality $d_0$. Our method, U-DoG, combines UniXGrad (Kavis et al., 2019) and DoG (Ivgi et al., 2023) with novel iterate stabilization techniques. It requires only loose bounds on $d_0$ and the noise magnitude, provides high probability guarantees under sub-Gaussian noise, and is also near-optimal in the non-smooth case. Our experiments show consistent, strong performance on convex problems and mixed results on neural network training.&lt;/p&gt;
&lt;h2 class="heading" id="3-universal-gradient-methods-for-stochastic-convex-optimization"&gt;
 3. Universal Gradient Methods for Stochastic Convex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#3-universal-gradient-methods-for-stochastic-convex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Anton Rodomanov, Ali Kavis, Yongtao Wu, Kimon Antonakopoulos, Volkan Cevher&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We develop universal gradient methods for Stochastic Convex Optimization (SCO). Our algorithms automatically adapt not only to the oracle&amp;rsquo;s noise but also to the Hölder smoothness of the objective function without a priori knowledge of the particular setting. The key ingredient is a novel strategy for adjusting step-size coefficients in the Stochastic Gradient Method (SGD). Unlike AdaGrad, which accumulates gradient norms, our Universal Gradient Method accumulates appropriate combinations of gradient- and iterate differences. The resulting algorithm has state-of-the-art worst-case convergence rate guarantees for the entire Hölder class including, in particular, both nonsmooth functions and those with Lipschitz continuous gradient. We also present the Universal Fast Gradient Method for SCO enjoying optimal efficiency estimates.&lt;/p&gt;</description></item><item><title>Pre-print articles on Adaptive Optimization</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/adaptive-optimization/</link><pubDate>Mon, 15 Jul 2024 00:00:00 +0000</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/adaptive-optimization/</guid><description>&lt;h2 class="heading" id="1-a-simple-uniformly-optimal-method-without-line-search-for-convex-optimization"&gt;
 1. &lt;a href="https://arxiv.org/pdf/2310.10082"&gt;A simple uniformly optimal method without line search for convex optimization&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#1-a-simple-uniformly-optimal-method-without-line-search-for-convex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Tianjiao Li, Guanghui Lan&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Line search (or backtracking) procedures have been widely employed into first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., Lipschitz constant). In this paper, we show that line search is superfluous in attaining the optimal rate of convergence for solving a convex optimization problem whose parameters are not given a priori. In particular, we present a novel accelerated gradient descent type algorithm called auto-conditioned fast gradient method (AC-FGM) that can achieve an optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring the estimate of a global Lipschitz constant or the employment of line search procedures. We then extend AC-FGM to solve convex optimization problems with Hölder continuous gradients and show that it automatically achieves the optimal rates of convergence uniformly for all problem classes with the desired accuracy of the solution as the only input. Finally, we report some encouraging numerical results that demonstrate the advantages of AC-FGM over the previously developed parameter-free methods for convex optimization.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source code&lt;/strong&gt;: &lt;a href="https://github.com/tli432/AC-FGM-Implementation"&gt;https://github.com/tli432/AC-FGM-Implementation&lt;/a&gt;&lt;/p&gt;
&lt;h2 class="heading" id="2-adaptive-proximal-gradient-method-for-convex-optimization"&gt;
 2. &lt;a href="https://arxiv.org/pdf/2308.02261"&gt;Adaptive Proximal Gradient Method for Convex Optimization&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#2-adaptive-proximal-gradient-method-for-convex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Yura Malitsky, Konstantin Mishchenko&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). Our focus is on making these algorithms entirely adaptive by leveraging local curvature information of smooth functions. We propose adaptive versions of GD and ProxGD that are based on observed gradient differences and, thus, have no added computational costs. Moreover, we prove convergence of our methods assuming only local Lipschitzness of the gradient. In addition, the proposed versions allow for even larger stepsizes than those initially suggested in [MM20].&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source code&lt;/strong&gt;: &lt;a href="https://github.com/ymalitsky/AdProxGD"&gt;https://github.com/ymalitsky/AdProxGD&lt;/a&gt;&lt;/p&gt;
&lt;h2 class="heading" id="3-an-adaptive-stochastic-gradient-method-with-non-negative-gauss-newton-stepsizes"&gt;
 3. &lt;a href="https://arxiv.org/pdf/2407.04358"&gt;An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#3-an-adaptive-stochastic-gradient-method-with-non-negative-gauss-newton-stepsizes"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Antonio Orvieto, Lin Xiao&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We consider the problem of minimizing the average of a large number of smooth but possibly non-convex functions. In the context of most machine learning applications, each loss function is non-negative and thus can be expressed as the composition of a square and its real-valued square root. This reformulation allows us to apply the Gauss-Newton method, or the Levenberg-Marquardt method when adding a quadratic regularization. The resulting algorithm, while being computationally as efficient as the vanilla stochastic gradient method, is highly adaptive and can automatically warmup and decay the effective stepsize while tracking the non-negative loss landscape. We provide a tight convergence analysis, leveraging new techniques, in the stochastic convex and non-convex settings. In particular, in the convex case, the method does not require access to the gradient Lipschitz constant for convergence, and is guaranteed to never diverge. The convergence rates and empirical evaluations compare favorably to the classical (stochastic) gradient method as well as to several other adaptive methods.&lt;/p&gt;
&lt;h2 class="heading" id="4-stochastic-polyak-step-sizes-and-momentum-convergence-guarantees-and-practical-performance"&gt;
 4. Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance&lt;span class="heading__anchor"&gt; &lt;a href="#4-stochastic-polyak-step-sizes-and-momentum-convergence-guarantees-and-practical-performance"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Dimitris Oikonomou, Nicolas Loizou&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Stochastic gradient descent with momentum, also known as Stochastic Heavy Ball method (SHB), is one of the most popular algorithms for solving large-scale stochastic optimization problems in various machine learning tasks. In practical scenarios, tuning the step-size and momentum parameters of the method is a prohibitively expensive and time-consuming process. In this work, inspired by the recent advantages of stochastic Polyak step-size in the performance of stochastic gradient descent (SGD), we propose and explore new Polyak-type variants suitable for the update rule of the SHB method. In particular, using the Iterate Moving Average (IMA) viewpoint of SHB, we propose and analyze three novel step-size selections: $\text{MomSPS} _{\max}$, $\text{MomDecSPS}$, and $\text{MomAdaSPS}$. For $\text{MomSPS} _{\max}$, we provide convergence guarantees for SHB to a neighborhood of the solution for convex and smooth problems (without assuming interpolation). If interpolation is also satisfied, then using $\text{MomSPS} _{\max}$, SHB converges to the true solution at a fast rate matching the deterministic HB. The other two variants, MomDecSPS and MomAdaSPS, are the first adaptive step-size for SHB that guarantee convergence to the exact minimizer - without a priori knowledge of the problem parameters and without assuming interpolation. Our convergence analysis of SHB is tight and obtains the convergence guarantees of stochastic Polyak step-size for SGD as a special case. We supplement our analysis with experiments validating our theory and demonstrating the effectiveness and robustness of our algorithms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Where&lt;/strong&gt;: 13th International Conference on Learning Representations (ICLR 2025)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL&lt;/strong&gt;: &lt;a href="https://openreview.net/forum?id=nuX2yPejiL"&gt;https://openreview.net/forum?id=nuX2yPejiL&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Pre-print articles on gradient-clipping methods</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/gradient-clipping/</link><pubDate>Mon, 15 Jul 2024 00:00:00 +0000</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/gradient-clipping/</guid><description>&lt;h2 class="heading" id="1-why-gradient-clipping-accelerates-training-a-theoretical-justification-for-adaptivity"&gt;
 1. &lt;a href="https://arxiv.org/pdf/1905.11881"&gt;Why gradient clipping accelerates training: A theoretical justification for adaptivity&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#1-why-gradient-clipping-accelerates-training-a-theoretical-justification-for-adaptivity"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks. The key ingredient is a new smoothness condition derived from practical neural network training examples. We observe that gradient smoothness, a concept central to the analysis of first-order optimization algorithms that is often assumed to be a constant, demonstrates significant variability along the training trajectory of deep neural networks. Further, this smoothness positively correlates with the gradient norm, and contrary to standard assumptions in the literature, it can grow with the norm of the gradient. These empirical observations limit the applicability of existing theoretical analyses of algorithms that rely on a fixed bound on smoothness. These observations motivate us to introduce a novel relaxation of gradient smoothness that is weaker than the commonly used Lipschitz smoothness assumption. Under the new condition, we prove that two popular methods, namely, &lt;em&gt;gradient clipping&lt;/em&gt; and &lt;em&gt;normalized gradient&lt;/em&gt;, converge arbitrarily faster than gradient descent with fixed stepsize. We further explain why such adaptively scaled gradient methods can accelerate empirical convergence and verify our results empirically in popular neural network training settings.&lt;/p&gt;
&lt;h2 class="heading" id="2-revisiting-gradient-clipping-stochastic-bias-and-tight-convergence-guarantees"&gt;
 2. &lt;a href="https://arxiv.org/pdf/2305.01588"&gt;Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#2-revisiting-gradient-clipping-stochastic-bias-and-tight-convergence-guarantees"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Anastasia Koloskova, Hadrien Hendrikx, Sebastian U. Stich&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Gradient clipping is a popular modification to standard (stochastic) gradient descent, at every iteration limiting the gradient norm to a certain value $c &amp;gt; 0$. It is widely used for example for stabilizing the training of deep learning models (Goodfellow et al., 2016), or for enforcing differential privacy (Abadi et al., 2016). Despite popularity and simplicity of the clipping mechanism, its convergence guarantees often require specific values of $c$ and strong noise assumptions.&lt;/p&gt;
&lt;p&gt;In this paper, we give convergence guarantees that show precise dependence on arbitrary clipping thresholds $c$ and show that our guarantees are tight with both deterministic and stochastic gradients. In particular, we show that (i) for deterministic gradient descent, the clipping threshold only affects the higher-order terms of convergence, (ii) in the stochastic setting convergence to the true optimum cannot be guaranteed under the standard noise assumption, even under arbitrary small step-sizes. We give matching upper and lower bounds for convergence of the gradient norm when running clipped SGD, and illustrate these results with experiments.&lt;/p&gt;
&lt;h2 class="heading" id="3-clipping-improves-adam-norm-and-adagrad-norm-when-the-noise-is-heavy-tailed"&gt;
 3. &lt;a href="https://arxiv.org/pdf/2406.04443"&gt;Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#3-clipping-improves-adam-norm-and-adagrad-norm-when-the-noise-is-heavy-tailed"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Savelii Chezhegov, Yaroslav Klyukin, Andrei Semenov, Aleksandr Beznosikov, Alexander Gasnikov, Samuel Horváth, Martin Takáč, Eduard Gorbunov&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models. Typically, the noise in the stochastic gradients is heavy-tailed for the latter ones. Gradient clipping provably helps to achieve good high-probability convergence for such noises. However, despite the similarity between AdaGrad/Adam and Clip-SGD, the current understanding of the high-probability convergence of AdaGrad/Adam-type methods is limited in this case. In this work, we prove that AdaGrad/Adam (and their delayed version) can have provably bad high-probability convergence if the noise is heavy-tailed. We also show that gradient clipping fixes this issue, i.e., we derive new high-probability convergence bounds with polylogarithmic dependence on the confidence level for AdaGrad-Norm and Adam-Norm with clipping and with/without delay for smooth convex/non-convex stochastic optimization with heavy-tailed noise. Our empirical evaluations highlight the superiority of clipped versions of AdaGrad/Adam-Norm in handling the heavy-tailed noise.&lt;/p&gt;</description></item><item><title>Mathematics - Optimization</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/</link><pubDate>Thu, 27 Jun 2024 23:14:15 +0800</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/</guid><description>&lt;h1 class="heading" id="branches-of-optimization-research"&gt;
 Branches of Optimization Research&lt;span class="heading__anchor"&gt; &lt;a href="#branches-of-optimization-research"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h1&gt;&lt;h2 class="heading" id="convex-optimization"&gt;
 Convex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#convex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Convex optimization focuses on problems where the objective function and constraints are convex, which ensures that every local optimum is also a global optimum. This field is foundational in machine learning, signal processing, and control systems because of its convergence guarantees and efficient algorithms.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Convex Optimization&lt;/em&gt; by Boyd and Vandenberghe - &lt;a href="https://web.stanford.edu/~boyd/cvxbook/"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Convex Optimization Theory&lt;/em&gt; by Dimitri P. Bertsekas - &lt;a href="https://web.mit.edu/dimitrib/www/Convex_Theory_Entire_Book.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 class="heading" id="discrete-combinatorial-and-integer-optimization"&gt;
 Discrete, Combinatorial, and Integer Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#discrete-combinatorial-and-integer-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;This branch deals with optimization problems involving discrete variables, such as integers or combinatorial structures, often encountered in scheduling, network design, and logistics. Bayesian optimization, listed here as a related topic, is particularly useful for optimizing expensive black-box functions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Bayesian Optimization In Action&lt;/em&gt; by Quan Nguyen - &lt;a href="https://www.amazon.com/Bayesian-Optimization-Action-Quan-Nguyen/dp/1633439070"&gt;Amazon&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Experimentation for Engineers&lt;/em&gt; by David Sweet - &lt;a href="https://www.amazon.com/Tuning-Up-testing-Bayesian-optimization/dp/1617298158"&gt;Amazon&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 class="heading" id="operations-research"&gt;
 Operations Research&lt;span class="heading__anchor"&gt; &lt;a href="#operations-research"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Operations research applies mathematical modeling and optimization to complex decision-making in logistics, supply chain, and resource allocation. It integrates techniques like linear programming, simulation, and heuristic methods to optimize real-world systems.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Operations Research: An Introduction&lt;/em&gt; by Hamdy A. Taha - &lt;a href="https://www.pearson.com/en-us/subject-catalog/p/operations-research-an-introduction/P200000003221"&gt;Pearson&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Introduction to Operations Research&lt;/em&gt; by Frederick Hillier and Gerald Lieberman - &lt;a href="https://www.mheducation.com/highered/product/introduction-operations-research-hillier-lieberman/M9781259872990.html"&gt;McGraw Hill&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Julia Programming for Operations Research&lt;/em&gt; by Changhyun Kwon - &lt;a href="https://juliabook.chkwon.net/book"&gt;PDF&lt;/a&gt; - &lt;a href="https://github.com/chkwon/jpor_codes"&gt;code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Mathematical Programming and Operations Research: Modeling, Algorithms, and Complexity. Examples in Python and Julia&lt;/em&gt;. Edited by Robert Hildebrand - &lt;a href="https://github.com/open-optimization/open-optimization-or-book/blob/master/MathematicalProgrammingandOperationsResearch.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;A First Course in Linear Optimization&lt;/em&gt; by Jon Lee - &lt;a href="https://www.solvermax.com/downloads/lee-linearoptimization4.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Decomposition Techniques in Mathematical Programming&lt;/em&gt; by Conejo, Castillo, Mínguez, and García-Bertrand - &lt;a href="https://link.springer.com/book/10.1007/3-540-27686-6"&gt;Springer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Algorithms for Optimization&lt;/em&gt; by Mykel J. Kochenderfer and Tim A. Wheeler - &lt;a href="https://algorithmsbook.com/optimization/files/optimization.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Model Building in Mathematical Programming&lt;/em&gt; - Introductory modeling book by H. Paul Williams - &lt;a href="https://www.wiley.com/en-ie/Model&amp;#43;Building&amp;#43;in&amp;#43;Mathematical&amp;#43;Programming,&amp;#43;5th&amp;#43;Edition-p-9781118443330"&gt;Wiley&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 class="heading" id="meta-heuristics"&gt;
 Meta-heuristics&lt;span class="heading__anchor"&gt; &lt;a href="#meta-heuristics"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Meta-heuristics are high-level strategies for solving complex optimization problems where exact methods are computationally infeasible. They include nature-inspired algorithms like genetic algorithms and simulated annealing, widely used in engineering and data science.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Metaheuristics&lt;/em&gt; by Patrick Siarry - &lt;a href="https://link.springer.com/book/10.1007/978-3-319-45403-0"&gt;Springer (open access)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Essentials of Metaheuristics&lt;/em&gt; by Sean Luke - &lt;a href="https://cs.gmu.edu/~sean/book/metaheuristics/"&gt;link&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Handbook of Metaheuristics&lt;/em&gt; by Michel Gendreau and Jean-Yves Potvin - &lt;a href="https://link.springer.com/book/10.1007/978-1-4419-1665-5"&gt;Springer (open access)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;An Introduction to Metaheuristics for Optimization&lt;/em&gt; by Bastien Chopard and Marco Tomassini - &lt;a href="https://link.springer.com/book/10.1007/978-3-319-93073-2"&gt;Springer (open access)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Metaheuristic and Evolutionary Computation: Algorithms and Applications&lt;/em&gt; by Hasmat Malik, Atif Iqbal, Puneet Joshi, Sanjay Agrawal, and Farhad Ilahi Bakhsh - &lt;a href="https://link.springer.com/book/10.1007/978-981-15-7571-6"&gt;Springer (open access)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Clever Algorithms: Nature-Inspired Programming Recipes&lt;/em&gt; by Jason Brownlee - &lt;a href="https://github.com/clever-algorithms/CleverAlgorithms"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Metaheuristics: from design to implementation&lt;/em&gt; by El-Ghazali Talbi - &lt;a href="https://www.wiley.com/en-us/Metaheuristics%3A&amp;#43;From&amp;#43;Design&amp;#43;to&amp;#43;Implementation&amp;#43;-p-9780470278581#:~:text=Description,-A%20unified%20view&amp;amp;text=This%20book%20provides%20a%20complete,design%2C%20routing%2C%20and%20scheduling."&gt;Wiley&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 class="heading" id="dynamic-programming-and-reinforcement-learning"&gt;
 Dynamic Programming and Reinforcement Learning&lt;span class="heading__anchor"&gt; &lt;a href="#dynamic-programming-and-reinforcement-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Dynamic programming and reinforcement learning address sequential decision-making problems, breaking them into subproblems or learning optimal policies through interaction with environments. These methods are critical in robotics, finance, and AI.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Various titles on &lt;em&gt;Dynamic Programming, Optimal Control and Reinforcement Learning&lt;/em&gt; by Dimitri Bertsekas - &lt;a href="http://www.athenasc.com/index.html"&gt;List&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Reinforcement Learning: An Introduction (2nd Edition)&lt;/em&gt; by Richard Sutton and Andrew Barto - &lt;a href="http://incompleteideas.net/book/RLbook2020.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Decision Making Under Uncertainty: Theory and Application&lt;/em&gt; by Mykel J. Kochenderfer - &lt;a href="https://web.stanford.edu/group/sisl/public/dmu.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Algorithms for Decision Making&lt;/em&gt; by Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray - &lt;a href="https://algorithmsbook.com/files/dm.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 class="heading" id="constraint-programming"&gt;
 Constraint Programming&lt;span class="heading__anchor"&gt; &lt;a href="#constraint-programming"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Constraint programming solves problems by defining constraints that must be satisfied, often used in scheduling, planning, and configuration tasks. It excels in problems with complex logical constraints and discrete variables.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Handbook of Constraint Programming&lt;/em&gt; by Francesca Rossi, Peter van Beek and Toby Walsh - &lt;a href="https://www.amazon.com/dp/0444527265"&gt;Amazon&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;A Tutorial on Constraint Programming&lt;/em&gt; by Barbara M. Smith (University of Leeds) - &lt;a href="https://www.dcs.gla.ac.uk/~pat/cpM/papers/95_14.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 class="heading" id="combinatorial-optimization"&gt;
 Combinatorial Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#combinatorial-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Combinatorial optimization focuses on finding optimal solutions in discrete structures, such as graphs or sets, often using algorithms for problems like the traveling salesman or graph coloring, with applications in logistics and network design.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Combinatorial Optimization: Algorithms and Complexity&lt;/em&gt; by Christos H. Papadimitriou and Kenneth Steiglitz - &lt;a href="https://www.amazon.com/Combinatorial-Optimization-Algorithms-Complexity-Computer-ebook/dp/B00C8UQZAO"&gt;Amazon&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Combinatorial Optimization: Theory and Algorithms&lt;/em&gt; by Bernhard Korte and Jens Vygen - &lt;a href="https://link.springer.com/book/10.1007/978-3-662-56039-6"&gt;Springer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;A First Course in Combinatorial Optimization&lt;/em&gt; by Jon Lee - &lt;a href="https://www.amazon.com/Combinatorial-Optimization-Cambridge-Applied-Mathematics/dp/0521010128"&gt;Amazon&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 class="heading" id="stochastic-optimization-and-control"&gt;
 Stochastic Optimization and Control&lt;span class="heading__anchor"&gt; &lt;a href="#stochastic-optimization-and-control"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Stochastic optimization handles problems with uncertainty or randomness, using probabilistic models to optimize objectives. It is widely applied in machine learning, finance, and operations research for robust decision-making.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Lectures on Stochastic Programming Modeling and Theory&lt;/em&gt; (SIAM) - by Shapiro, Dentcheva, and Ruszczynski - &lt;a href="https://bpb-us-w2.wpmucdn.com/sites.gatech.edu/dist/4/1470/files/2021/03/SPbook.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Introductory Lectures on Stochastic Optimization&lt;/em&gt; by John C. Duchi - &lt;a href="https://web.stanford.edu/~jduchi/PCMIConvex/Duchi16.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 class="heading" id="useful-resources"&gt;
 Useful Resources&lt;span class="heading__anchor"&gt; &lt;a href="#useful-resources"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;Prof. Nguyen Mau Nam, &lt;a href="https://maunamn.wordpress.com/"&gt;Convex Analysis - An introduction to convexity and nonsmooth analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Ben Recht, &lt;a href="https://www.argmin.net/"&gt;arg min&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Prof. Dimitri P. Bertsekas, &lt;a href="http://www.athenasc.com/convexity.html"&gt;Convex Analysis and Optimization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Prof. Dimitri P. Bertsekas, &lt;a href="http://www.athenasc.com/nonlinbook.html"&gt;Nonlinear Programming: 3rd Edition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.offconvex.org/"&gt;Off the convex path&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h1 class="heading" id="post-on-optimization"&gt;
 Post on Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#post-on-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h1&gt;</description></item><item><title>Pre-print articles on Difference-of-Convex (DC) Programming</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/dc-programming/</link><pubDate>Thu, 27 Jun 2024 23:14:15 +0800</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/dc-programming/</guid><description>&lt;h2 class="heading" id="57-stochastic-difference-of-convex-optimization-with-momentum"&gt;
 57. &lt;a href="https://arxiv.org/abs/2510.17503"&gt;Stochastic Difference-of-Convex Optimization with Momentum&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#57-stochastic-difference-of-convex-optimization-with-momentum"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; El Mahdi Chayti, Martin Jaggi&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Stochastic difference-of-convex (DC) optimization is prevalent in numerous machine learning applications, yet its convergence properties under small batch sizes remain poorly understood. Existing methods typically require large batches or strong noise assumptions, which limit their practical use. In this work, we show that momentum enables convergence under standard smoothness and bounded variance assumptions (of the concave part) for any batch size. We prove that without momentum, convergence may fail regardless of stepsize, highlighting its necessity. Our momentum-based algorithm achieves provable convergence and demonstrates strong empirical performance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL&lt;/strong&gt;: &lt;a href="https://arxiv.org/abs/2510.17503"&gt;https://arxiv.org/abs/2510.17503&lt;/a&gt;&lt;/p&gt;
&lt;h2 class="heading" id="56-on-the-convergence-rate-of-the-boosted-difference-of-convex-algorithm-dca"&gt;
 56. &lt;a href="https://arxiv.org/abs/2510.16569"&gt;On the convergence rate of the boosted Difference-of-Convex Algorithm (DCA)&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#56-on-the-convergence-rate-of-the-boosted-difference-of-convex-algorithm-dca"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Hadi Abbaszadehpeivasti, Etienne de Klerk, Adrien Taylor&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; The difference-of-convex algorithm (DCA) is a well-established nonlinear programming technique that solves successive convex optimization problems. These sub-problems are obtained from the difference-of-convex (DC) decompositions of the objective and constraint functions. We investigate the worst-case performance of the unconstrained DCA, with and without boosting, where boosting simply performs an additional step in the direction generated by the usual DCA method. We show that, for certain classes of DC decompositions, the boosted DCA is provably better in the worst-case than the usual DCA. While several numerical studies have reported that boosted DCA outperforms classical DCA, a theoretical explanation for this behavior has, to the best of our knowledge, not been given until now. Our proof technique relies on semidefinite programming (SDP) performance estimation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL&lt;/strong&gt;: &lt;a href="https://arxiv.org/abs/2510.16569"&gt;https://arxiv.org/abs/2510.16569&lt;/a&gt;&lt;/p&gt;
&lt;h2 class="heading" id="55-global-solution-algorithms-for-dc-programming-via-polyhedral-approximations-of-convex-functions"&gt;
 55. &lt;a href="https://link.springer.com/article/10.1007/s10898-025-01535-z"&gt;Global solution algorithms for DC programming via polyhedral approximations of convex functions&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#55-global-solution-algorithms-for-dc-programming-via-polyhedral-approximations-of-convex-functions"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Fahaar M. Pirani &amp;amp; Firdevs Ulus&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We consider difference of convex (DC) programming problems and propose three algorithms to solve them globally. The main working mechanism of the proposed algorithms is to generate polyhedral underestimators to convex functions. Two of these algorithms generate a ‘fine’ polyhedral approximation of the first convex component over the compact feasible region of the DC programming problem. We prove the finiteness of these algorithms and establish the convergence rate of one of them. Moreover, we show that using the polyhedral approximation of the first component, it is possible to compute an approximate global solution of the corresponding DC programming problem without further computational effort. The third algorithm also computes a polyhedral underestimator of the first component of the DC function. Different from the first two algorithms, the third algorithm approximates it locally until finding an approximate global solution to the DC programming problem. It is shown that for any positive approximation error, the third algorithm stops after finitely many iterations. Computational results based on some test instances from the literature are provided.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL&lt;/strong&gt;: &lt;a href="https://link.springer.com/article/10.1007/s10898-025-01535-z"&gt;https://link.springer.com/article/10.1007/s10898-025-01535-z&lt;/a&gt;&lt;/p&gt;
&lt;h2 class="heading" id="54-improved-rates-for-stochastic-variance-reduced-difference-of-convex-algorithms"&gt;
 54. &lt;a href="https://arxiv.org/abs/2509.11657"&gt;Improved Rates for Stochastic Variance-Reduced Difference-of-Convex Algorithms&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#54-improved-rates-for-stochastic-variance-reduced-difference-of-convex-algorithms"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Anh Duc Nguyen, Alp Yurtsever, Suvrit Sra, Kim-Chuan Toh&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this work, we propose and analyze DCA-PAGE, a novel algorithm that integrates the difference-of-convex algorithm (DCA) with the ProbAbilistic Gradient Estimator (PAGE) to solve structured nonsmooth difference-of-convex programs. In the finite-sum setting, our method achieves a gradient computation complexity of $O(N + N^{1/2}\varepsilon^{-2})$ with sample size $N$, surpassing the previous best-known complexity of $O(N + N^{2/3}\varepsilon^{-2})$ for stochastic variance-reduced (SVR) DCA methods. Furthermore, DCA-PAGE readily extends to online settings with a similar optimal gradient computation complexity $O(b + b^{1/2}\varepsilon^{-2})$ with batch size $b$, a significant advantage over existing SVR DCA approaches that only work for the finite-sum setting. We further refine our analysis with a gap function, which enables us to obtain comparable convergence guarantees under milder assumptions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Comment&lt;/strong&gt;: Accepted at IEEE Conference on Decision and Control (IEEE CDC 2025)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL&lt;/strong&gt;: &lt;a href="https://arxiv.org/pdf/2509.11657"&gt;https://arxiv.org/pdf/2509.11657&lt;/a&gt;&lt;/p&gt;
&lt;h2 class="heading" id="53-new-algorithms-for-maximizing-the-difference-of-convex-functions"&gt;
 53. &lt;a href="https://optimization-online.org/wp-content/uploads/2025/04/comaxdc1.pdf"&gt;New Algorithms for maximizing the difference of convex functions&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#53-new-algorithms-for-maximizing-the-difference-of-convex-functions"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Aharon Ben-Tal, Luba Tetruashvili&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Maximizing the difference of 2 convex functions over a convex feasible set (the so called
DCA problem) is a hard problem. There is a large number of publications addressing this
problem. Many of them are variations of widely used DCA algorithm [20]. The success of
this algorithm to reach a good approximation of a global optimum, depends crucially on the
choice of its starting point. In the algorithm developed in our paper MDCF (Maximizing the
Difference of Convex Functions) a major effort is to generate a good starting point. This is
obtained by using the COMAX algorithm for maximizing a convex function [6]. The solution
found by COMAX is a basis for obtaining a good starting point for MDCF.
Another contribution of the paper is the algorithm for solving problems with an indefinite
quadratic objective function and compact and convex feasible set. The problem is first
converted to maximizing a difference of convex quadratic functions. The new algorithm
QMDCF is a specific adaptation of MDCF to this case.
The performance of the two new algorithms developed in the paper is tested numerically,
and results are compared to the performance of classical DCA, and some other algorithms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://optimization-online.org/2025/04/new-algorithms-for-maximizing-the-difference-of-convex-functions/"&gt;https://optimization-online.org/2025/04/new-algorithms-for-maximizing-the-difference-of-convex-functions/&lt;/a&gt;&lt;/p&gt;
&lt;h2 class="heading" id="52-a-progressive-decoupling-algorithm-for-minimizing-the-difference-of-convex-and-weakly-convex-functions"&gt;
 52. &lt;a href="https://link.springer.com/article/10.1007/s10957-024-02574-4"&gt;A progressive decoupling algorithm for minimizing the difference of convex and weakly convex functions&lt;/a&gt;&lt;span class="heading__anchor"&gt; &lt;a href="#52-a-progressive-decoupling-algorithm-for-minimizing-the-difference-of-convex-and-weakly-convex-functions"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Welington de Oliveira &amp;amp; João Carlos de Oliveira Souza&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Commonly, decomposition and splitting techniques for optimization problems strongly depend on convexity. Implementable splitting methods for nonconvex and nonsmooth optimization problems are scarce and often lack convergence guarantees. Among the few exceptions is the Progressive Decoupling Algorithm (PDA), which has local convergence should convexity be elicitable. In this work, we furnish PDA with a descent test and extend the method to accommodate a broad class of nonsmooth optimization problems with non-elicitable convexity. More precisely, we focus on the problem of minimizing the difference of convex and weakly convex functions over a linear subspace. This framework covers, in particular, a family of stochastic programs with nonconvex recourse and statistical estimation problems for supervised learning.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://link.springer.com/article/10.1007/s10957-024-02574-4"&gt;https://link.springer.com/article/10.1007/s10957-024-02574-4&lt;/a&gt;&lt;/p&gt;
&lt;h2 class="heading" id="51-an-inexact-proximal-framework-for-nonsmooth-riemannian-difference-of-convex-optimization-arxiv250908561"&gt;
 51. An Inexact Proximal Framework for Nonsmooth Riemannian Difference-of-Convex Optimization [arXiv:2509.08561]&lt;span class="heading__anchor"&gt; &lt;a href="#51-an-inexact-proximal-framework-for-nonsmooth-riemannian-difference-of-convex-optimization-arxiv250908561"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Bo Jiang, Meng Xu, Xingju Cai, Ya-Feng Liu&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Nonsmooth Riemannian optimization has attracted increasing attention, especially in problems with sparse structures. While existing formulations typically involve convex nonsmooth terms, incorporating nonsmooth difference-of-convex (DC) penalties can enhance recovery accuracy. In this paper, we study a class of nonsmooth Riemannian optimization problems whose objective is the sum of a smooth function and a nonsmooth DC term. We establish, for the first time in the manifold setting, the equivalence between such DC formulations (with suitably chosen nonsmooth DC terms) and their $\ell_0$-regularized or $\ell_0$-constrained counterparts. To solve these problems, we propose an inexact Riemannian proximal DC (iRPDC) algorithmic framework, which returns an $\epsilon$-Riemannian critical point within $\mathcal{O}(\epsilon^{-2})$ outer iterations. Within this framework, we develop several practical algorithms based on different subproblem solvers. Among them, one achieves an overall iteration complexity of $\mathcal{O}(\epsilon^{-3})$, which matches the best-known bound in the literature. In contrast, existing algorithms either lack provable overall complexity or require $\mathcal{O}(\epsilon^{-3})$ iterations in both outer and overall complexity. A notable feature of the iRPDC algorithmic framework is a novel inexactness criterion that not only enables efficient subproblem solutions via first-order methods but also facilitates a linesearch procedure that adaptively captures the local curvature. Numerical results on sparse principal component analysis demonstrate the modeling flexibility of the DC formulation and the competitive performance of the proposed algorithmic framework.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2509.08561"&gt;https://arxiv.org/abs/2509.08561&lt;/a&gt;&lt;/p&gt;
&lt;h2 class="heading" id="50-tight-convergence-rates-in-gradient-mapping-for-the-difference-of-convex-algorithm-arxiv250601791"&gt;
 50. Tight Convergence Rates in Gradient Mapping for the Difference-of-Convex Algorithm [arXiv:2506.01791]&lt;span class="heading__anchor"&gt; &lt;a href="#50-tight-convergence-rates-in-gradient-mapping-for-the-difference-of-convex-algorithm-arxiv250601791"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Teodor Rotaru, Panagiotis Patrinos, François Glineur&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We establish new theoretical convergence guarantees for the difference-of-convex algorithm (DCA), where the second function is allowed to be weakly-convex, measuring progress via composite gradient mapping. Based on a tight analysis of two iterations of DCA, we identify six parameter regimes leading to sublinear convergence rates toward critical points and establish those rates by proving adapted descent lemmas. We recover existing rates for the standard difference-of-convex decompositions of nonconvex-nonconcave functions, while for all other curvature settings our results are new, complementing recently obtained rates on the gradient residual. Three of our sublinear rates are tight for any number of DCA iterations, while for the other three regimes we conjecture exact rates, using insights from the tight analysis of gradient descent and numerical validation using the performance estimation methodology. Finally, we show how the equivalence between proximal gradient descent (PGD) and DCA allows the derivation of exact PGD rates for any constant stepsize.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2506.01791"&gt;https://arxiv.org/abs/2506.01791&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="49-enforcing-fairness-where-it-matters-an-approach-based-on-difference-of-convex-constraints-arxiv250512530"&gt;
 49. Enforcing Fairness Where It Matters: An Approach Based on Difference-of-Convex Constraints [arXiv:2505.12530]&lt;span class="heading__anchor"&gt; &lt;a href="#49-enforcing-fairness-where-it-matters-an-approach-based-on-difference-of-convex-constraints-arxiv250512530"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Yutian He, Yankun Huang, Yao Yao, Qihang Lin&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Fairness in machine learning has become a critical concern, particularly in high-stakes applications. Existing approaches often focus on achieving full fairness across all score ranges generated by predictive models, ensuring fairness in both high and low-scoring populations. However, this stringent requirement can compromise predictive performance and may not align with the practical fairness concerns of stakeholders. In this work, we propose a novel framework for building partially fair machine learning models, which enforce fairness within a specific score range of interest, such as the middle range where decisions are most contested, while maintaining flexibility in other regions. We introduce two statistical metrics to rigorously evaluate partial fairness within a given score range, such as the top 20%-40% of scores. To achieve partial fairness, we propose an in-processing method by formulating the model training problem as constrained optimization with difference-of-convex constraints, which can be solved by an inexact difference-of-convex algorithm (IDCA). We provide the complexity analysis of IDCA for finding a nearly KKT point. Through numerical experiments on real-world datasets, we demonstrate that our framework achieves high predictive performance while enforcing partial fairness where it matters most.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2505.12530"&gt;https://arxiv.org/abs/2505.12530&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="48-a-smoothing-moving-balls-approximation-method-for-a-class-of-conic-constrained-difference-of-convex-optimization-problems-arxiv250512314"&gt;
 48. A smoothing moving balls approximation method for a class of conic-constrained difference-of-convex optimization problems [arXiv:2505.12314]&lt;span class="heading__anchor"&gt; &lt;a href="#48-a-smoothing-moving-balls-approximation-method-for-a-class-of-conic-constrained-difference-of-convex-optimization-problems-arxiv250512314"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Jiefeng Xu, Ting Kei Pong, Nung-sing Sze&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, we consider the problem of minimizing a difference-of-convex objective over a nonlinear conic constraint, where the cone is closed, convex, pointed and has a nonempty interior. We assume that the support function of a compact base of the polar cone exhibits a majorizing smoothing approximation, a condition that is satisfied by widely studied cones such as $\mathbb{R}^m_-$ and ${\cal S}^m_-$. Leveraging this condition, we reformulate the conic constraint equivalently as a single constraint involving the aforementioned support function, and adapt the moving balls approximation (MBA) method for its solution. In essence, in each iteration of our algorithm, we approximate the support function by a smooth approximation function and apply one MBA step. The subproblems that arise in our algorithm always involve only one single inequality constraint, and can thus be solved efficiently via one-dimensional root-finding procedures. We design explicit rules to evolve the smooth approximation functions from iteration to iteration and establish the corresponding iteration complexity for obtaining an $\epsilon$-Karush-Kuhn-Tucker point. In addition, in the convex setting, we establish convergence of the sequence generated, and study its local convergence rate under a standard Hölderian growth condition. Finally, we illustrate numerically the effects of different rules of evolving the smooth approximation functions on the rate of convergence.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2505.12314"&gt;https://arxiv.org/abs/2505.12314&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="47-a-preconditioned-difference-of-convex-functions-algorithm-with-extrapolation-and-line-search-arxiv250511914"&gt;
 47. A preconditioned difference of convex functions algorithm with extrapolation and line search [arXiv:2505.11914]&lt;span class="heading__anchor"&gt; &lt;a href="#47-a-preconditioned-difference-of-convex-functions-algorithm-with-extrapolation-and-line-search-arxiv250511914"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Ran Zhang, Hongpeng Sun&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; This paper proposes a novel proximal difference-of-convex (DC) algorithm enhanced with extrapolation and aggressive non-monotone line search for solving non-convex optimization problems. We introduce an adaptive conservative update strategy of the extrapolation parameter determined by a computationally efficient non-monotone line search. The core of our algorithm is to unite the update of the extrapolation parameter with the step size of the non-monotone line search interactively. The global convergence of the two proposed algorithms is established through the Kurdyka-Łojasiewicz properties, ensuring convergence within a preconditioned framework for linear equations. Numerical experiments on two general non-convex problems: SCAD-penalized binary classification and graph-based Ginzburg-Landau image segmentation models, demonstrate the proposed method&amp;rsquo;s high efficiency compared to existing DC algorithms both in convergence rate and solution accuracy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2505.11914"&gt;https://arxiv.org/abs/2505.11914&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="46-contractive-difference-of-convex-algorithms-arxiv250510800"&gt;
 46. Contractive difference-of-convex algorithms [arXiv:2505.10800]&lt;span class="heading__anchor"&gt; &lt;a href="#46-contractive-difference-of-convex-algorithms-arxiv250510800"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Songnian He, Qiao-Li Dong, Michael Th. Rassias&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; The difference-of-convex algorithm (DCA) and its variants are the most popular methods to solve the difference-of-convex optimization problem. Each iteration of them is reduced to a convex optimization problem, which generally needs to be solved by iterative methods such as proximal gradient algorithm. However, these algorithms essentially belong to some iterative methods of fixed point problems of averaged mappings, and their convergence speed is generally slow. Furthermore, there is seldom research on the termination rule of these iterative algorithms solving the subproblem of DCA. To overcome these defects, we firstly show that the subproblem of the linearized proximal method (LPM) in each iteration is equal to the fixed point problem of a contraction. Secondly, by using Picard iteration to approximately solve the subproblem of LPM in each iteration, we propose a contractive difference-of-convex algorithm (cDCA) where an adaptive termination rule is presented. Both global subsequential convergence and global convergence of the whole sequence of cDCA are established. Finally, preliminary results from numerical experiments are promising.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://link.springer.com/article/10.1007/s10957-025-02689-2"&gt;https://link.springer.com/article/10.1007/s10957-025-02689-2&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Journal&lt;/strong&gt;: Journal of Optimization Theory and Applications&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="45-a-full-splitting-algorithm-for-structured-difference-of-convex-programs-arxiv250502588"&gt;
 45. A full splitting algorithm for structured difference-of-convex programs [arXiv:2505.02588]&lt;span class="heading__anchor"&gt; &lt;a href="#45-a-full-splitting-algorithm-for-structured-difference-of-convex-programs-arxiv250502588"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Radu Ioan Bot, Rossen Nenov, Min Tao&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, we study a class of nonconvex and nonsmooth structured difference-of-convex (DC) programs, which contain in the convex part the sum of a nonsmooth linearly composed convex function and a differentiable function, and in the concave part another nonsmooth linearly composed convex function. Among the various areas in which such problems occur, we would like to mention in particular the recovery of sparse signals. We propose an adaptive double-proximal, full-splitting algorithm with a moving center approach in the final subproblem, which addresses the challenge of evaluating compositions by decoupling the linear operator from the nonsmooth component. We establish the subsequential convergence of the generated sequence of iterates to an approximate stationary point and prove its global convergence under the Kurdyka-Łojasiewicz property. We also discuss the tightness of the convergence results and provide insights into the rationale for seeking an approximate KKT point. This is illustrated by constructing a counterexample showing that the algorithm can diverge when seeking exact solutions. Finally, we present a practical version of the algorithm that incorporates a nonmonotone line search, which significantly improves the convergence performance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2505.02588"&gt;https://arxiv.org/abs/2505.02588&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="44-optimization-over-trained-neural-networks-difference-of-convex-algorithm-and-application-to-data-center-scheduling-arxiv250317506"&gt;
 44. Optimization over Trained Neural Networks: Difference-of-Convex Algorithm and Application to Data Center Scheduling [arXiv:2503.17506]&lt;span class="heading__anchor"&gt; &lt;a href="#44-optimization-over-trained-neural-networks-difference-of-convex-algorithm-and-application-to-data-center-scheduling-arxiv250317506"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Xinwei Liu, Vladimir Dvorkin&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; When solving decision-making problems with mathematical optimization, some constraints or objectives may lack analytic expressions but can be approximated from data. When an approximation is made by neural networks, the underlying problem becomes optimization over trained neural networks. Despite recent improvements with cutting planes, relaxations, and heuristics, the problem remains difficult to solve in practice. We propose a new solution based on a bilinear problem reformulation that penalizes ReLU constraints in the objective function. This reformulation makes the problem amenable to efficient difference-of-convex algorithms (DCA), for which we propose a principled approach to penalty selection that facilitates convergence to stationary points of the original problem. We apply the DCA to the problem of the least-cost allocation of data center electricity demand in a power grid, reporting significant savings in congested cases.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2503.17506"&gt;https://arxiv.org/abs/2503.17506&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="43-tight-analysis-of-difference-of-convex-algorithm-dca-improves-convergence-rates-for-proximal-gradient-descent-arxiv250304486"&gt;
 43. Tight Analysis of Difference-of-Convex Algorithm (DCA) Improves Convergence Rates for Proximal Gradient Descent [arXiv:2503.04486]&lt;span class="heading__anchor"&gt; &lt;a href="#43-tight-analysis-of-difference-of-convex-algorithm-dca-improves-convergence-rates-for-proximal-gradient-descent-arxiv250304486"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Teodor Rotaru, Panagiotis Patrinos, François Glineur&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We investigate a difference-of-convex (DC) formulation where the second term is allowed to be weakly convex. We examine the precise behavior of a single iteration of the difference-of-convex algorithm (DCA), providing a tight characterization of the objective function decrease, distinguishing between six distinct parameter regimes. Our proofs, inspired by the performance estimation framework, are notably simplified compared to related prior research. We subsequently derive sublinear convergence rates for the DCA towards critical points, assuming at least one of the functions is smooth. Additionally, we explore the underexamined equivalence between proximal gradient descent (PGD) and DCA iterations, demonstrating how DCA, a parameter-free algorithm, without the need for a stepsize, serves as a tool for studying the exact convergence rates of PGD.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2503.04486"&gt;https://arxiv.org/abs/2503.04486&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="42-abstract-nonautonomous-difference-inclusions-in-locally-convex-spaces-arxiv250205184"&gt;
 42. Abstract nonautonomous difference inclusions in locally convex spaces [arXiv:2502.05184]&lt;span class="heading__anchor"&gt; &lt;a href="#42-abstract-nonautonomous-difference-inclusions-in-locally-convex-spaces-arxiv250205184"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Marko Kostic&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, we consider abstract nonautonomous difference inclusions in locally convex spaces with integer order differences. We particularly analyze the existence and uniqueness of almost periodic type solutions to abstract nonautonomous difference inclusions. Our results seem to be completely new even in the Banach space setting.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2502.05184"&gt;https://arxiv.org/abs/2502.05184&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="41-learning-difference-of-convex-regularizers-for-inverse-problems-a-flexible-framework-with-theoretical-guarantees-arxiv250200240"&gt;
 41. Learning Difference-of-Convex Regularizers for Inverse Problems: A Flexible Framework with Theoretical Guarantees [arXiv:2502.00240]&lt;span class="heading__anchor"&gt; &lt;a href="#41-learning-difference-of-convex-regularizers-for-inverse-problems-a-flexible-framework-with-theoretical-guarantees-arxiv250200240"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Yasi Zhang, Oscar Leong&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Learning effective regularization is crucial for solving ill-posed inverse problems, which arise in a wide range of scientific and engineering applications. While data-driven methods that parameterize regularizers using deep neural networks have demonstrated strong empirical performance, they often result in highly nonconvex formulations that lack theoretical guarantees. Recent work has shown that incorporating structured nonconvexity into neural network-based regularizers, such as weak convexity, can strike a balance between empirical performance and theoretical tractability. In this paper, we demonstrate that a broader class of nonconvex functions, difference-of-convex (DC) functions, can yield improved empirical performance while retaining strong convergence guarantees. The DC structure enables the use of well-established optimization algorithms, such as the Difference-of-Convex Algorithm (DCA) and a Proximal Subgradient Method (PSM), which extend beyond standard gradient descent. Furthermore, we provide theoretical insights into the conditions under which optimal regularizers can be expressed as DC functions. Extensive experiments on computed tomography (CT) reconstruction tasks show that our approach achieves strong performance across sparse and limited-view settings, consistently outperforming other weakly supervised learned regularizers. Our code is available at &lt;a href="https://github.com/YasminZhang/ADCR"&gt;https://github.com/YasminZhang/ADCR&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2502.00240"&gt;https://arxiv.org/abs/2502.00240&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="40-an-inexact-boosted-difference-of-convex-algorithm-for-nondifferentiable-functions-arxiv241205697"&gt;
 40. An Inexact Boosted Difference of Convex Algorithm for Nondifferentiable Functions [arXiv:2412.05697]&lt;span class="heading__anchor"&gt; &lt;a href="#40-an-inexact-boosted-difference-of-convex-algorithm-for-nondifferentiable-functions-arxiv241205697"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Orizon P. Ferreira, Boris S. Mordukhovich, Wilkreffy M. S. Santos, João Carlos O. Souza&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, we introduce an inexact approach to the Boosted Difference of Convex Functions Algorithm (BDCA) for solving nonconvex and nondifferentiable problems involving the difference of two convex functions (DC functions). Specifically, when the first DC component is differentiable and the second may be nondifferentiable, BDCA utilizes the solution from the subproblem of the DC Algorithm (DCA) to define a descent direction for the objective function. A monotone linesearch is then performed to find a new point that improves the objective function relative to the subproblem solution. This approach enhances the performance of DCA. However, if the first DC component is nondifferentiable, the BDCA direction may become an ascent direction, rendering the monotone linesearch ineffective. To address this, we propose an Inexact nonmonotone Boosted Difference of Convex Algorithm (InmBDCA). This algorithm incorporates two main features of inexactness: First, the subproblem therein is solved approximately, allowing for a controlled relative error tolerance in defining the linesearch direction. Second, an inexact nonmonotone linesearch scheme is used to determine the step size for the next iteration. Under suitable assumptions, we demonstrate that InmBDCA is well-defined, with any accumulation point of the sequence generated by InmBDCA being a critical point of the problem. We also provide iteration-complexity bounds for the algorithm. Numerical experiments show that InmBDCA outperforms both the nonsmooth BDCA (nmBDCA) and the monotone version of DCA in practical scenarios.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2412.05697"&gt;https://arxiv.org/abs/2412.05697&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="39-a-preconditioned-second-order-convex-splitting-algorithm-with-a-difference-of-varying-convex-functions-and-line-search-arxiv241107661"&gt;
 39. A preconditioned second-order convex splitting algorithm with a difference of varying convex functions and line search [arXiv:2411.07661]&lt;span class="heading__anchor"&gt; &lt;a href="#39-a-preconditioned-second-order-convex-splitting-algorithm-with-a-difference-of-varying-convex-functions-and-line-search-arxiv241107661"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Xinhua Shen, Zaijiu Shang, Hongpeng Sun&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; This paper introduces a preconditioned convex splitting algorithm enhanced with line search techniques for nonconvex optimization problems. The algorithm utilizes second-order backward differentiation formulas (BDF) for the implicit and linear components and the Adams-Bashforth scheme for the nonlinear and explicit parts of the gradient flow in variational functions. The proposed algorithm, resembling a generalized difference-of-convex-function approach, involves a changing set of convex functions in each iteration. It integrates the Armijo line search strategy to improve performance. The study also discusses classical preconditioners such as symmetric Gauss-Seidel, Jacobi, and Richardson within this context. The global convergence of the algorithm is established through the Kurdyka-Łojasiewicz properties, ensuring convergence within a finite number of preconditioned iterations. Numerical experiments demonstrate the superiority of the proposed second-order convex splitting with line search over conventional difference-of-convex-function algorithms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="38-inertial-proximal-difference-of-convex-algorithm-with-convergent-bregman-plug-and-play-for-nonconvex-imaging-arxiv240903262"&gt;
 38. Inertial Proximal Difference-of-Convex Algorithm with Convergent Bregman Plug-and-Play for Nonconvex Imaging [arXiv:2409.03262]&lt;span class="heading__anchor"&gt; &lt;a href="#38-inertial-proximal-difference-of-convex-algorithm-with-convergent-bregman-plug-and-play-for-nonconvex-imaging-arxiv240903262"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Tsz Ching Chow, Chaoyan Huang, Zhongming Wu, Tieyong Zeng, Angelica I. Aviles-Rivero&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Imaging tasks are typically tackled using a structured optimization framework. This paper delves into a class of algorithms for difference-of-convex (DC) structured optimization, focusing on minimizing a DC function along with a possibly nonconvex function. Existing DC algorithm (DCA) versions often fail to effectively handle nonconvex functions or exhibit slow convergence rates. We propose a novel inertial proximal DC algorithm in Bregman geometry, named iBPDCA, designed to address nonconvex terms and enhance convergence speed through inertial techniques. We provide a detailed theoretical analysis, establishing both subsequential and global convergence of iBPDCA via the Kurdyka-Łojasiewicz property. Additionally, we introduce a Plug-and-Play variant, PnP-iBPDCA, which employs a deep neural network-based prior for greater flexibility and robustness while ensuring theoretical convergence. We also establish that the Gaussian gradient step denoiser used in our method is equivalent to evaluating the Bregman proximal operator for an implicitly weakly convex functional. We extensively validate our method on Rician noise and phase retrieval. We demonstrate that iBPDCA surpasses existing state-of-the-art methods.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="37-constructing-tight-quadratic-relaxations-for-global-optimization-ii-underestimating-difference-of-convex-dc-functions-arxiv240813058"&gt;
 37. Constructing Tight Quadratic Relaxations for Global Optimization: II. Underestimating Difference-of-Convex (D.C.) Functions [arXiv:2408.13058]&lt;span class="heading__anchor"&gt; &lt;a href="#37-constructing-tight-quadratic-relaxations-for-global-optimization-ii-underestimating-difference-of-convex-dc-functions-arxiv240813058"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; William R. Strahl, Arvind U. Raghunathan, Nikolaos V. Sahinidis, Chrysanthos E. Gounaris&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Recent advances in the efficiency and robustness of algorithms solving convex quadratically constrained quadratic programming (QCQP) problems motivate developing techniques for creating convex quadratic relaxations that, although more expensive to compute, provide tighter bounds than their classical linear counterparts. In the first part of this two-paper series [Strahl et al., 2024], we developed a cutting plane algorithm to construct convex quadratic underestimators for twice-differentiable convex functions, which we extend here to address the case of non-convex difference-of-convex (d.c.) functions as well. Furthermore, we generalize our approach to consider a hierarchy of quadratic forms, thereby allowing the construction of even tighter underestimators. On a set of d.c. functions extracted from benchmark libraries, we demonstrate noteworthy reduction in the hypervolume between our quadratic underestimators and linear ones constructed at the same points. Additionally, we construct convex QCQP relaxations at the root node of a spatial branch-and-bound tree for a set of systematically created d.c. optimization problems in up to four dimensions, and we show that our relaxations reduce the gap between the lower bound computed by the state-of-the-art global optimization solver BARON and the optimal solution by an excess of 90%, on average.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="36-distributed-difference-of-convex-optimization-arxiv240716728"&gt;
 36. Distributed Difference of Convex Optimization [arXiv:2407.16728]&lt;span class="heading__anchor"&gt; &lt;a href="#36-distributed-difference-of-convex-optimization-arxiv240716728"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Vivek Khatana, Murti V. Salapaka&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this article, we focus on solving a class of distributed optimization problems involving $n$ agents with the local objective function at every agent $i$ given by the difference of two convex functions $f_i$ and $g_i$ (difference-of-convex (DC) form), where $f_i$ and $g_i$ are potentially nonsmooth. The agents communicate via a directed graph containing $n$ nodes. We create smooth approximations of the functions $f_i$ and $g_i$ and develop a distributed algorithm utilizing the gradients of the smooth surrogates and a finite-time approximate consensus protocol. We term this algorithm as DDC-Consensus. The developed DDC-Consensus algorithm allows for non-symmetric directed graph topologies and can be synthesized distributively. We establish that the DDC-Consensus algorithm converges to a stationary point of the nonconvex distributed optimization problem. The performance of the DDC-Consensus algorithm is evaluated via a simulation study to solve a nonconvex DC-regularized distributed least squares problem. The numerical results corroborate the efficacy of the proposed algorithm.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="35-an-inexact-bregman-proximal-difference-of-convex-algorithm-with-two-types-of-relative-stopping-criteria-arxiv240604646"&gt;
 35. An Inexact Bregman Proximal Difference-of-Convex Algorithm with Two Types of Relative Stopping Criteria [arXiv:2406.04646]&lt;span class="heading__anchor"&gt; &lt;a href="#35-an-inexact-bregman-proximal-difference-of-convex-algorithm-with-two-types-of-relative-stopping-criteria-arxiv240604646"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Lei Yang, Jingjing Hu, Kim-Chuan Toh&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, we consider a class of difference-of-convex (DC) optimization problems, which require only a weaker restricted $L$-smooth adaptable property on the smooth part of the objective function, instead of the standard global Lipschitz gradient continuity assumption. Such problems are prevalent in many contemporary applications such as compressed sensing, statistical regression, and machine learning, and can be solved by a general Bregman proximal DC algorithm (BPDCA). However, the existing BPDCA is developed based on the stringent requirement that the involved subproblems must be solved exactly, which is often impractical and limits the applicability of the BPDCA. To facilitate the practical implementations and wider applications of the BPDCA, we develop an inexact Bregman proximal difference-of-convex algorithm (iBPDCA) by incorporating two types of relative-type stopping criteria for solving the subproblems. The proposed inexact framework has considerable flexibility to encompass many existing exact and inexact methods, and can accommodate different types of errors that may occur when solving the subproblem. This enables the potential application of our inexact framework across different DC decompositions to facilitate the design of a more efficient DCA scheme in practice. The global subsequential convergence and the global sequential convergence of our iBPDCA are established under suitable conditions including the Kurdyka-Łojasiewicz property. Some numerical experiments are conducted to show the superior performance of our iBPDCA in comparison to existing algorithms. These results also empirically validate the necessity and significance of developing different types of stopping criteria to facilitate the efficient computation of the subproblem in each iteration of our iBPDCA.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="34-single-loop-stochastic-algorithms-for-difference-of-max-structured-weakly-convex-functions-arxiv240518577"&gt;
 34. Single-Loop Stochastic Algorithms for Difference of Max-Structured Weakly Convex Functions [arXiv:2405.18577]&lt;span class="heading__anchor"&gt; &lt;a href="#34-single-loop-stochastic-algorithms-for-difference-of-max-structured-weakly-convex-functions-arxiv240518577"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Quanqi Hu, Qi Qi, Zhaosong Lu, Tianbao Yang&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, we study a class of non-smooth non-convex problems in the form of $\min_{x}[\max_{y\in Y}φ(x, y) - \max_{z\in Z}ψ(x, z)]$, where both $Φ(x) = \max_{y\in Y}φ(x, y)$ and $Ψ(x)=\max_{z\in Z}ψ(x, z)$ are weakly convex functions, and $φ(x, y), ψ(x, z)$ are strongly concave functions in terms of $y$ and $z$, respectively. It covers two families of problems that have been studied but are missing single-loop stochastic algorithms, i.e., difference of weakly convex functions and weakly convex strongly-concave min-max problems. We propose a stochastic Moreau envelope approximate gradient method dubbed SMAG, the first single-loop algorithm for solving these problems, and provide a state-of-the-art non-asymptotic convergence rate. The key idea of the design is to compute an approximate gradient of the Moreau envelopes of $Φ, Ψ$ using only one step of stochastic gradient update of the primal and dual variables. Empirically, we conduct experiments on positive-unlabeled (PU) learning and partial area under ROC curve (pAUC) optimization with an adversarial fairness regularizer to validate the effectiveness of our proposed algorithms.&lt;/p&gt;
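&lt;p&gt;&lt;strong&gt;Sketch (not taken from the paper):&lt;/strong&gt; The Moreau envelope mentioned in the abstract has a standard closed-form gradient, which may help to see what is being approximated. The weak-convexity modulus $\rho$ and the smoothing parameter $\lambda$ are notation assumed for this sketch only.&lt;/p&gt;
&lt;p&gt;For a $\rho$-weakly convex $Φ$ and $\lambda \in (0, 1/\rho)$, $$Φ_\lambda(x) = \min_u \Big( Φ(u) + \frac{1}{2\lambda}\|u - x\|^2 \Big), \qquad \nabla Φ_\lambda(x) = \frac{1}{\lambda}\big( x - \mathrm{prox}_{\lambda Φ}(x) \big),$$ where $\mathrm{prox}_{\lambda Φ}(x)$ is the unique minimizer $u$. Applying the same formula to $Ψ$ and subtracting gives $$\nabla \big( Φ_\lambda - Ψ_\lambda \big)(x) = \frac{1}{\lambda}\big( \mathrm{prox}_{\lambda Ψ}(x) - \mathrm{prox}_{\lambda Φ}(x) \big).$$&lt;/p&gt;
&lt;p&gt;Under this reading, a single-loop method only needs running estimates of the two proximal points rather than exact inner solves, which matches the abstract's description of using one stochastic gradient update per iteration.&lt;/p&gt;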
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="33-improved-convergence-rates-for-the-difference-of-convex-algorithm-arxiv240316864"&gt;
 33. Improved convergence rates for the Difference-of-Convex algorithm [arXiv:2403.16864]&lt;span class="heading__anchor"&gt; &lt;a href="#33-improved-convergence-rates-for-the-difference-of-convex-algorithm-arxiv240316864"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Teodor Rotaru, Panagiotis Patrinos, François Glineur&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We consider a difference-of-convex formulation where one of the terms is allowed to be hypoconvex (or weakly convex). We first examine the precise behavior of a single iteration of the Difference-of-Convex algorithm (DCA), giving a tight characterization of the objective function decrease. This requires distinguishing between eight distinct parameter regimes. Our proofs are inspired by the performance estimation framework, but are much simplified compared to similar previous work.
We then derive sublinear DCA convergence rates towards critical points, distinguishing between cases where at least one of the functions is smooth and where both functions are nonsmooth. We conjecture the tightness of these rates for four parameter regimes, based on strong numerical evidence obtained via performance estimation, as well as the leading constant in the asymptotic sublinear rate for two more regimes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="32-an-efficient-difference-of-convex-solver-for-privacy-funnel-arxiv240304778"&gt;
 32. An Efficient Difference-of-Convex Solver for Privacy Funnel [arXiv:2403.04778]&lt;span class="heading__anchor"&gt; &lt;a href="#32-an-efficient-difference-of-convex-solver-for-privacy-funnel-arxiv240304778"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Teng-Hui Huang, Hesham El Gamal&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We propose an efficient solver for the privacy funnel (PF) method, leveraging its difference-of-convex (DC) structure. The proposed DC separation results in a closed-form update equation, which allows straightforward application to both known and unknown distribution settings. For the known distribution case, we prove the convergence (to local stationary points) of the proposed non-greedy solver, and empirically show that it outperforms the state-of-the-art approaches in characterizing the privacy-utility trade-off. The insights of our DC approach apply to unknown distribution settings where labeled empirical samples are available instead. Leveraging these insights, our alternating minimization solver satisfies the fundamental Markov relation of PF, in contrast to previous variational inference-based solvers. Empirically, we evaluate the proposed solver with the MNIST and Fashion-MNIST datasets. Our results show that, under a comparable reconstruction quality, an adversary suffers from higher prediction error when clustering our compressed codes than with the compared methods. Most importantly, our solver is independent of private information in the inference phase, contrary to the baselines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="31-approximation-analysis-for-the-minimization-problem-of-difference-of-convex-functions-with-moreau-envelopes-arxiv240213461"&gt;
 31. Approximation analysis for the minimization problem of difference-of-convex functions with Moreau envelopes [arXiv:2402.13461]&lt;span class="heading__anchor"&gt; &lt;a href="#31-approximation-analysis-for-the-minimization-problem-of-difference-of-convex-functions-with-moreau-envelopes-arxiv240213461"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Yan Tang, Shiqing Zhang&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this work, the minimization problem for the difference of convex (DC) functions is studied using Moreau envelopes, and the descent method with Moreau gradient is employed to approximate the numerical solution. The main regularization idea, inspired by Hiriart-Urruty [14] and Moudafi [17], is to regularize the components of the DC problem by flexibly adapting the different parameters and strategic matrices to evaluate the whole DC problem. It is shown that the inertial gradient method as well as the classic gradient descent scheme tend towards an approximate stationary point of the original problem.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="30-the-boosted-difference-of-convex-functions-algorithm-for-value-at-risk-constrained-portfolio-optimization-arxiv240209194"&gt;
 30. The Boosted Difference of Convex Functions Algorithm for Value-at-Risk Constrained Portfolio Optimization [arXiv:2402.09194]&lt;span class="heading__anchor"&gt; &lt;a href="#30-the-boosted-difference-of-convex-functions-algorithm-for-value-at-risk-constrained-portfolio-optimization-arxiv240209194"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Marah-Lisanne Thormann, Phan Tu Vuong, Alain B. Zemkoho&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; A highly relevant problem of modern finance is the design of Value-at-Risk (VaR) optimal portfolios. Due to contemporary financial regulations, banks and other financial institutions are tied to use the risk measure to control their credit, market and operational risks. For a portfolio with a discrete return distribution and finitely many scenarios, a Difference of Convex (DC) functions representation of the VaR can be derived. Wozabal (2012) showed that this yields a solution to a VaR constrained Markowitz style portfolio selection problem using the Difference of Convex Functions Algorithm (DCA). A recent algorithmic extension is the so-called Boosted Difference of Convex Functions Algorithm (BDCA) which accelerates the convergence due to an additional line search step. It has been shown that the BDCA converges linearly for solving non-smooth quadratic problems with linear inequality constraints. In this paper, we prove that the linear rate of convergence is also guaranteed for a piecewise linear objective function with linear equality and inequality constraints using the Kurdyka-Łojasiewicz property. An extended case study under consideration of best practices for comparing optimization algorithms demonstrates the superiority of the BDCA over the DCA for real-world financial market data. We are able to show that the results of the BDCA are significantly closer to the efficient frontier compared to the DCA. Due to the open availability of all data sets and code, this paper further provides a practical guide for transparent and easily reproducible comparisons of VaR constrained portfolio selection problems in Python.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="29-a-globally-convergent-algorithm-for-neural-network-parameter-optimization-based-on-difference-of-convex-functions-arxiv240107936"&gt;
 29. A Globally Convergent Algorithm for Neural Network Parameter Optimization Based on Difference-of-Convex Functions [arXiv:2401.07936]&lt;span class="heading__anchor"&gt; &lt;a href="#29-a-globally-convergent-algorithm-for-neural-network-parameter-optimization-based-on-difference-of-convex-functions-arxiv240107936"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Daniel Tschernutter, Mathias Kraus, Stefan Feuerriegel&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We propose an algorithm for optimizing the parameters of single hidden layer neural networks. Specifically, we derive a blockwise difference-of-convex (DC) functions representation of the objective function. Based on the latter, we propose a block coordinate descent (BCD) approach that we combine with a tailored difference-of-convex functions algorithm (DCA). We prove global convergence of the proposed algorithm. Furthermore, we mathematically analyze the convergence rate of parameters and the convergence rate in value (i.e., the training loss). We give conditions under which our algorithm converges linearly or even faster depending on the local shape of the loss function. We confirm our theoretical derivations numerically and compare our algorithm against state-of-the-art gradient-based solvers in terms of both training loss and test loss.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="28-higher-order-tensor-methods-for-minimizing-difference-of-convex-functions-arxiv240105063"&gt;
 28. Higher-order tensor methods for minimizing difference of convex functions [arXiv:2401.05063]&lt;span class="heading__anchor"&gt; &lt;a href="#28-higher-order-tensor-methods-for-minimizing-difference-of-convex-functions-arxiv240105063"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Ion Necoara&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Higher-order tensor methods were recently proposed for minimizing smooth convex and nonconvex functions. Higher-order algorithms accelerate the convergence of the classical first-order methods thanks to the higher-order derivatives used in the updates. The purpose of this paper is twofold. Firstly, to show that the higher-order algorithmic framework can be generalized and successfully applied to (nonsmooth) difference of convex functions, namely, those that can be expressed as the difference of two smooth convex functions and a possibly nonsmooth convex one. We also provide examples when the subproblem can be solved efficiently, even globally. Secondly, to derive a complete convergence analysis for our higher-order difference of convex functions (HO-DC) algorithm. In particular, we prove that any limit point of the HO-DC iterative sequence is a critical point of the problem under consideration, the corresponding objective value is monotonically decreasing and the minimum value of the norms of its subgradients converges globally to zero at a sublinear rate. The sublinear or linear convergence rates of the iterations are obtained under the Kurdyka-Łojasiewicz property.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="27-handling-nonlinearities-and-uncertainties-of-fed-batch-cultivations-with-difference-of-convex-functions-tube-mpc-arxiv231200847"&gt;
 27. Handling nonlinearities and uncertainties of fed-batch cultivations with difference of convex functions tube MPC [arXiv:2312.00847]&lt;span class="heading__anchor"&gt; &lt;a href="#27-handling-nonlinearities-and-uncertainties-of-fed-batch-cultivations-with-difference-of-convex-functions-tube-mpc-arxiv231200847"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Niels Krausch, Martin Doff-Sotta, Mark Canon, Peter Neubauer, Mariano Nicolas Cruz Bournazou&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Bioprocesses are often characterized by nonlinear and uncertain dynamics. This poses particular challenges in the context of model predictive control (MPC). Several approaches have been proposed to solve this problem, such as robust or stochastic MPC, but they can be computationally expensive when the system is nonlinear. Recent advances in optimal control theory have shown that concepts from convex optimization, tube-based MPC, and difference of convex functions (DC) enable stable and robust online process control. The approach is based on systematic DC decompositions of the dynamics and successive linearizations around feasible trajectories. By convexity, the linearization errors can be bounded tightly and treated as bounded disturbances in a robust tube-based MPC framework. However, finding the DC composition can be a difficult task. To overcome this problem, we used a neural network with special convex structure to learn the dynamics in DC form and express the uncertainty sets using simplices to maximize the product formation rate of a cultivation with uncertain substrate concentration in the feed. The results show that this is a promising approach for computationally tractable data-driven robust MPC of bioprocesses.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="26-a-qualitative-difference-between-gradient-flows-of-convex-functions-in-finite--and-infinite-dimensional-hilbert-spaces-arxiv231017610"&gt;
 26. A qualitative difference between gradient flows of convex functions in finite- and infinite-dimensional Hilbert spaces [arXiv:2310.17610]&lt;span class="heading__anchor"&gt; &lt;a href="#26-a-qualitative-difference-between-gradient-flows-of-convex-functions-in-finite--and-infinite-dimensional-hilbert-spaces-arxiv231017610"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Jonathan W. Siegel, Stephan Wojtowytsch&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We consider gradient flow/gradient descent and heavy ball/accelerated gradient descent optimization for convex objective functions. In the gradient flow case, we prove the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;If $f$ does not have a minimizer, the convergence $f(x_t)\to \inf f$ can be arbitrarily slow.&lt;/li&gt;
&lt;li&gt;If $f$ does have a minimizer, the excess energy $f(x_t) - \inf f$ is integrable/summable in time. In particular, $f(x_t) - \inf f = o(1/t)$ as $t\to\infty$.&lt;/li&gt;
&lt;li&gt;In Hilbert spaces, this is optimal: $f(x_t) - \inf f$ can decay to $0$ as slowly as any given function which is monotone decreasing and integrable at $\infty$, even for a fixed quadratic objective.&lt;/li&gt;
&lt;li&gt;In finite dimension (or more generally, for all gradient flow curves of finite length), this is not optimal: We prove that there are convex monotone decreasing integrable functions $g(t)$ which decrease to zero slower than $f(x_t)-\inf f$ for the gradient flow of any convex function on $\mathbb R^d$. For instance, we show that any gradient flow $x_t$ of a convex function $f$ in finite dimension satisfies $\liminf_{t\to\infty} \big(t\cdot \log^2(t)\cdot \big\{f(x_t) -\inf f\big\}\big)=0$.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This improves on the commonly reported $O(1/t)$ rate and provides a sharp characterization of the energy decay law. We also note that it is impossible to establish a rate $O(1/(tφ(t)))$ for any function $φ$ which satisfies $\lim_{t\to\infty}φ(t) = \infty$, even asymptotically.&lt;/p&gt;
&lt;p&gt;Similar results are obtained in related settings for (1) discrete time gradient descent, (2) stochastic gradient descent with multiplicative noise and (3) the heavy ball ODE. In the case of stochastic gradient descent, the summability of $\mathbb E[f(x_n) - \inf f]$ is used to prove that $f(x_n)\to \inf f$ almost surely - an improvement on the convergence almost surely up to a subsequence which follows from the $O(1/n)$ decay estimate.&lt;/p&gt;
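&lt;p&gt;&lt;strong&gt;Sketch (not taken from the paper):&lt;/strong&gt; The step from integrability to the $o(1/t)$ rate in the second item follows from a short standard argument, reproduced here for convenience. The shorthand $E(t) = f(x_t) - \inf f$ is notation assumed for this sketch.&lt;/p&gt;
&lt;p&gt;Along a gradient flow the energy is nonincreasing, so $E(s) \ge E(t)$ for every $s \in [t/2, t]$. Integrating over that interval gives $$\frac{t}{2}\, E(t) \le \int_{t/2}^{t} E(s)\, ds .$$ If $E$ is integrable on $[0,\infty)$, the right-hand side is a tail of a convergent integral and tends to $0$ as $t\to\infty$, hence $t\,E(t) \to 0$, that is, $E(t) = o(1/t)$.&lt;/p&gt;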
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="25-large-convex-sets-in-difference-sets-arxiv230907527"&gt;
 25. Large Convex sets in Difference sets [arXiv:2309.07527]&lt;span class="heading__anchor"&gt; &lt;a href="#25-large-convex-sets-in-difference-sets-arxiv230907527"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Krishnendu Bhowmick, Ben Lund, Oliver Roche-Newton&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We give a construction of a convex set $A \subset \mathbb R$ with cardinality $n$ such that $A-A$ contains a convex subset with cardinality $Ω(n^2)$. We also consider the following variant of this problem: given a convex set $A$, what is the size of the largest matching $M \subset A \times A$ such that the set $\{ a-b : (a,b) \in M \}$ is convex? We prove that there always exists such an $M$ with $|M| \geq \sqrt n$, and that this lower bound is best possible, up to a multiplicative constant.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="24-moreau-envelope-based-difference-of-weakly-convex-reformulation-and-algorithm-for-bilevel-programs-arxiv230616761"&gt;
 24. Moreau Envelope Based Difference-of-weakly-Convex Reformulation and Algorithm for Bilevel Programs [arXiv:2306.16761]&lt;span class="heading__anchor"&gt; &lt;a href="#24-moreau-envelope-based-difference-of-weakly-convex-reformulation-and-algorithm-for-bilevel-programs-arxiv230616761"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Lucy L. Gao, Jane J. Ye, Haian Yin, Shangzhi Zeng, Jin Zhang&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Bilevel programming has emerged as a valuable tool for hyperparameter selection, a central concern in machine learning. In a recent study by Ye et al. (2023), a value function-based difference of convex algorithm was introduced to address bilevel programs. This approach proves particularly powerful when dealing with scenarios where the lower-level problem exhibits convexity in both the upper-level and lower-level variables. Examples of such scenarios include support vector machines and $\ell_1$ and $\ell_2$ regularized regression. In this paper, we significantly expand the range of applications, now requiring convexity only in the lower-level variables of the lower-level program. We present an innovative single-level difference of weakly convex reformulation based on the Moreau envelope of the lower-level problem. We further develop a sequentially convergent Inexact Proximal Difference of Weakly Convex Algorithm (iP-DwCA). To evaluate the effectiveness of the proposed iP-DwCA, we conduct numerical experiments focused on tuning hyperparameters for kernel support vector machines on simulated data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="23-generalized-graph-signal-sampling-by-difference-of-convex-optimization-arxiv230614634"&gt;
 23. Generalized Graph Signal Sampling by Difference-of-Convex Optimization [arXiv:2306.14634]&lt;span class="heading__anchor"&gt; &lt;a href="#23-generalized-graph-signal-sampling-by-difference-of-convex-optimization-arxiv230614634"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Keitaro Yamashita, Kazuki Naganuma, Shunsuke Ono&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We propose a method for designing a flexible sampling operator for graph signals via a difference-of-convex (DC) optimization algorithm. A fundamental challenge in graph signal processing is sampling, especially for graph signals that are not bandlimited. In order to sample beyond bandlimited graph signals, there are studies to expand the generalized sampling theory for the graph setting. Vertex-wise sampling and flexible sampling are two main strategies to sample graph signals. Recovery accuracy of existing vertex-wise sampling methods is highly dependent on the specific vertices selected to generate a sampled graph signal, which may compromise the accuracy, especially when noise is generated at the vertices. In contrast, flexible sampling mixes values at multiple vertices to generate a sampled signal for robust sampling; however, existing flexible sampling methods impose strict assumptions and aggressive relaxations. To address these limitations, we aim to design a flexible sampling operator without such strict assumptions and aggressive relaxations by introducing DC optimization. By formulating the problem of designing a flexible sampling operator as a DC optimization problem, our method ensures robust sampling for graph signals under arbitrary priors based on generalized sampling theory. We develop an efficient solver based on the general double-proximal gradient DC algorithm, which guarantees convergence to a critical point. Experimental results demonstrate the superiority of our method in sampling and recovering beyond bandlimited graph signals compared to existing approaches.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="22-a-globally-convergent-difference-of-convex-algorithmic-framework-and-application-to-log-determinant-optimization-problems-arxiv230602001"&gt;
 22. A globally convergent difference-of-convex algorithmic framework and application to log-determinant optimization problems [arXiv:2306.02001]&lt;span class="heading__anchor"&gt; &lt;a href="#22-a-globally-convergent-difference-of-convex-algorithmic-framework-and-application-to-log-determinant-optimization-problems-arxiv230602001"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Chaorui Yao, Xin Jiang&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; The difference-of-convex algorithm (DCA) is a conceptually simple method for the minimization of (possibly) nonconvex functions that are expressed as the difference of two convex functions. At each iteration, DCA constructs a global overestimator of the objective and solves the resulting convex subproblem. Despite its conceptual simplicity, the theoretical understanding and algorithmic framework of DCA need further investigation. In this paper, global convergence of DCA at a linear rate is established under an extended Polyak&amp;ndash;Łojasiewicz condition. The proposed condition holds for a class of DC programs with a bounded, closed, and convex constraint set, for which global convergence of DCA cannot be covered by existing analyses. Moreover, the DCProx computational framework is proposed, in which the DCA subproblems are solved by a primal&amp;ndash;dual proximal algorithm with Bregman distances. With a suitable choice of Bregman distances, DCProx has simple update rules with cheap per-iteration complexity. As an application, DCA is applied to several fundamental problems in network information theory, for which no existing numerical methods are able to compute the global optimum. For these problems, our analysis proves the global convergence of DCA, and more importantly, DCProx solves the DCA subproblems efficiently. Numerical experiments are conducted to verify the efficiency of DCProx.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
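&lt;p&gt;&lt;strong&gt;Illustration (not from the paper):&lt;/strong&gt; The DCA iteration described in the abstract can be sketched on a toy problem. The objective $f(x) = x^4 - x^2$, its split into $g(x) = x^4$ and $h(x) = x^2$, and the helper names &lt;code&gt;dca_step&lt;/code&gt; and &lt;code&gt;dca&lt;/code&gt; are assumptions chosen for illustration; this is plain DCA with a closed-form subproblem, not the DCProx framework.&lt;/p&gt;

```python
import math


def f(x):
    # Toy objective f = g - h with g(x) = x**4 and h(x) = x**2 (both convex).
    return x**4 - x**2


def dca_step(x_k):
    # Linearize h at x_k: h(x) is bounded below by h(x_k) + 2*x_k*(x - x_k),
    # so g(x) - 2*x_k*x (plus a constant) is a convex overestimator of f.
    # Its minimizer solves 4*x**3 = 2*x_k, i.e. x is the real cube root of x_k/2.
    t = x_k / 2.0
    return math.copysign(abs(t) ** (1.0 / 3.0), t)


def dca(x0, iters=60):
    # Run a fixed number of DCA iterations and keep the whole trajectory.
    xs = [x0]
    for _ in range(iters):
        xs.append(dca_step(xs[-1]))
    return xs


xs = dca(1.0)
x_star = xs[-1]
print(x_star, f(x_star))
```

&lt;p&gt;Starting from $x_0 = 1$, the iterates approach the critical point $1/\sqrt{2}$ and the objective values do not increase, which is the descent property DCA inherits from minimizing an overestimator at each step.&lt;/p&gt;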
&lt;hr&gt;
&lt;h2 class="heading" id="21-a-property-of-strictly-convex-functions-which-differ-from-each-other-by-a-constant-on-the-boundary-of-their-domain-arxiv230512183"&gt;
 21. A property of strictly convex functions which differ from each other by a constant on the boundary of their domain [arXiv:2305.12183]&lt;span class="heading__anchor"&gt; &lt;a href="#21-a-property-of-strictly-convex-functions-which-differ-from-each-other-by-a-constant-on-the-boundary-of-their-domain-arxiv230512183"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Biagio Ricceri&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, in particular, we prove the following result: Let $E$ be a reflexive real Banach space and let $C\subset E$ be a closed convex set, with non-empty interior, whose boundary is sequentially weakly closed and non-convex. Then, for every function $\varphi:\partial C\to {\bf R}$ and for every convex set $S\subseteq E^*$ dense in $E^*$, there exists $\tilde{\gamma} \in S$ having the following property: for every strictly convex lower semicontinuous function $J:C \to {\bf R}$, Gâteaux differentiable in $\hbox{int}(C)$, such that $J _{\mid\partial C}-\varphi$ is constant in $\partial C$ and $\lim _{|x|\to +\infty}{{J(x)}\over {|x|}} = +\infty$ if $C$ is unbounded, $\tilde{\gamma}$ is an algebraically interior point of $J'(\hbox{int}(C))$ (with respect to $E^*$).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="20-local-differences-determined-by-convex-sets-arxiv230400888"&gt;
 20. Local Differences Determined by Convex sets [arXiv:2304.00888]&lt;span class="heading__anchor"&gt; &lt;a href="#20-local-differences-determined-by-convex-sets-arxiv230400888"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Krishnendu Bhowmick, Miriam Patry, Oliver Roche-Newton&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; This paper introduces a new problem concerning additive properties of convex sets. Let $S= \{s_1 &amp;lt; \dots &amp;lt;s_n \}$ be a set of real numbers and let $D_i(S)= \{s_x-s_y: 1 \leq x-y \leq i\}$. We expect that $D_i(S)$ is large, with respect to the size of $S$ and the parameter $i$, for any convex set $S$. We give a construction to show that $D_3(S)$ can be as small as $n+2$, and show that this is the smallest possible size. On the other hand, we use an elementary argument to prove a non-trivial lower bound for $D_4(S)$, namely $|D_4(S)| \geq \frac{5}{4}n -1$. For sufficiently large values of $i$, we are able to prove a non-trivial bound that grows with $i$ using incidence geometry.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="19-preconditioned-algorithm-for-difference-of-convex-functions-with-applications-to-graph-ginzburg-landau-model-arxiv230314495"&gt;
 19. Preconditioned Algorithm for Difference of Convex Functions with applications to Graph Ginzburg-Landau Model [arXiv:2303.14495]&lt;span class="heading__anchor"&gt; &lt;a href="#19-preconditioned-algorithm-for-difference-of-convex-functions-with-applications-to-graph-ginzburg-landau-model-arxiv230314495"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Xinhua Shen, Hongpeng Sun, Xuecheng Tai&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this work, we propose and study a preconditioned framework with a graph Ginzburg-Landau functional for image segmentation and data clustering by parallel computing. Solving nonlocal models is usually challenging due to the heavy computational burden. For the nonconvex and nonlocal variational functional, we propose several damped Jacobi and generalized Richardson preconditioners for the large-scale linear systems within a difference-of-convex-functions algorithm framework. They are efficient for parallel computing with GPU and can reduce the computational cost. Our framework also provides flexible step sizes with a global convergence guarantee. Numerical experiments show the proposed algorithms are very competitive compared to the singular value decomposition based spectral method.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="18-multi-uav-trajectory-planning-problem-using-the-difference-of-convex-function-programming-arxiv230307581"&gt;
 18. Multi-UAV trajectory planning problem using the difference of convex function programming [arXiv:2303.07581]&lt;span class="heading__anchor"&gt; &lt;a href="#18-multi-uav-trajectory-planning-problem-using-the-difference-of-convex-function-programming-arxiv230307581"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Anh Phuong Ngo, Christian Thomas, Ali Karimoddini, Hieu T. Nguyen&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; The trajectory planning problem for a swarm of multiple UAVs is known as a challenging nonconvex optimization problem, particularly due to a large number of collision avoidance constraints required for individual pairs of UAVs in the swarm. In this paper, we tackle this nonconvexity by leveraging the difference of convex function (DC) programming. We introduce the slack variables to relax and reformulate the collision avoidance conditions and employ the penalty function term to equivalently convert the problem into a DC form. Consequently, we construct a penalty DC algorithm in which we sequentially solve a set of convex optimization problems obtained by linearizing the collision avoidance constraint. The algorithm iteratively tightens the safety condition and reduces the objective cost of the planning problem and the additional penalty term. Numerical results demonstrate the effectiveness of the proposed approach in planning a large number of UAVs in congested space.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="17-approximate-bilevel-difference-convex-programming-for-bayesian-risk-markov-decision-processes-arxiv230111415"&gt;
 17. Approximate Bilevel Difference Convex Programming for Bayesian Risk Markov Decision Processes [arXiv:2301.11415]&lt;span class="heading__anchor"&gt; &lt;a href="#17-approximate-bilevel-difference-convex-programming-for-bayesian-risk-markov-decision-processes-arxiv230111415"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Yifan Lin, Enlu Zhou&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We consider infinite-horizon Markov Decision Processes where parameters, such as transition probabilities, are unknown and estimated from data. The popular distributionally robust approach to addressing the parameter uncertainty can sometimes be overly conservative. In this paper, we utilize the recently proposed formulation, Bayesian risk Markov Decision Process (BR-MDP), to address parameter (or epistemic) uncertainty in MDPs. To solve the infinite-horizon BR-MDP with a class of convex risk measures, we propose a computationally efficient approach called approximate bilevel difference convex programming (ABDCP). The optimization is performed offline and produces the optimal policy that is represented as a finite state controller with desirable performance guarantees. We also demonstrate the empirical performance of the BR-MDP formulation and the proposed algorithm.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="16-single-crossing-differences-in-convex-environments-arxiv221212009"&gt;
 16. Single-Crossing Differences in Convex Environments [arXiv:2212.12009]&lt;span class="heading__anchor"&gt; &lt;a href="#16-single-crossing-differences-in-convex-environments-arxiv221212009"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Navin Kartik, SangMok Lee, Daniel Rappoport&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; An agent&amp;rsquo;s preferences depend on an ordered parameter or type. We characterize the set of utility functions with single-crossing differences (SCD) in convex environments. These include preferences over lotteries, both in expected utility and rank-dependent utility frameworks, and preferences over bundles of goods and over consumption streams. Our notion of SCD does not presume an order on the choice space. This unordered SCD is necessary and sufficient for &amp;ldquo;interval choice&amp;rdquo; comparative statics. We present applications to cheap talk, observational learning, and collective choice, showing how convex environments arise in these problems and how SCD/interval choice are useful. Methodologically, our main characterization stems from a result on linear aggregations of single-crossing functions.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="15-control-of-uncertain-pwa-systems-using-difference-of-convex-decompositions-arxiv220912990"&gt;
 15. Control of Uncertain PWA Systems using Difference-of-Convex Decompositions [arXiv:2209.12990]&lt;span class="heading__anchor"&gt; &lt;a href="#15-control-of-uncertain-pwa-systems-using-difference-of-convex-decompositions-arxiv220912990"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Siddharth H. Nair, Yvonne R. Stürz&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this report, we analyze and design feedback policies for discrete-time Piecewise-Affine (PWA) systems with uncertainty in both the affine dynamics and the polytopic partition. The main idea is to utilise the Difference-of-Convex (DC) decomposition of continuous PWA systems to derive quadratic Lyapunov functions as stability certificates and stabilizing affine policies in a higher dimensional space. When projected back to the state space, we obtain time-varying PWQ Lyapunov functions and time-varying PWA feedback policies.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="14-encoding-inductive-invariants-as-barrier-certificates-synthesis-via-difference-of-convex-programming-arxiv220909703"&gt;
 14. Encoding inductive invariants as barrier certificates: synthesis via difference-of-convex programming [arXiv:2209.09703]&lt;span class="heading__anchor"&gt; &lt;a href="#14-encoding-inductive-invariants-as-barrier-certificates-synthesis-via-difference-of-convex-programming-arxiv220909703"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Qiuye Wang, Mingshuai Chen, Bai Xue, Naijun Zhan, Joost-Pieter Katoen&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; A barrier certificate often serves as an inductive invariant that isolates an unsafe region from the reachable set of states, and hence is widely used in proving safety of hybrid systems possibly over an infinite time horizon. We present a novel condition on barrier certificates, termed the invariant barrier-certificate condition, that witnesses unbounded-time safety of differential dynamical systems. The proposed condition is the weakest possible one to attain inductive invariance. We show that discharging the invariant barrier-certificate condition &amp;ndash; thereby synthesizing invariant barrier certificates &amp;ndash; can be encoded as solving an optimization problem subject to bilinear matrix inequalities (BMIs). We further propose a synthesis algorithm based on difference-of-convex programming, which approaches a local optimum of the BMI problem via solving a series of convex optimization problems. This algorithm is incorporated in a branch-and-bound framework that searches for the global optimum in a divide-and-conquer fashion. We present a weak completeness result of our method, namely, a barrier certificate is guaranteed to be found (under some mild assumptions) whenever there exists an inductive invariant (in the form of a given template) that suffices to certify safety of the system. Experimental results on benchmarks demonstrate the effectiveness and efficiency of our approach.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="13-a-convex-set-with-a-rich-difference-arxiv220803258"&gt;
 13. A convex set with a rich difference [arXiv:2208.03258]&lt;span class="heading__anchor"&gt; &lt;a href="#13-a-convex-set-with-a-rich-difference-arxiv220803258"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Oliver Roche-Newton, Audie Warren&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We construct a convex set $A$ with cardinality $2n$ and with the property that an element of the difference set $A-A$ can be represented in $n$ different ways. We also show that this construction is optimal by proving that for any convex set $A$, the maximum possible number of representations an element of $A-A$ can have is $\lfloor |A|/2 \rfloor $.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
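&lt;p&gt;&lt;strong&gt;Illustration (not from the paper):&lt;/strong&gt; The bound $\lfloor |A|/2 \rfloor$ on the number of representations can be checked by brute force on a small example. Here a set is taken to be convex in the additive-combinatorics sense, meaning its consecutive gaps are strictly increasing. The set of the first twelve squares and the helper names &lt;code&gt;is_convex_set&lt;/code&gt; and &lt;code&gt;max_representations&lt;/code&gt; are assumptions for illustration; the squares are not the paper's extremal construction.&lt;/p&gt;

```python
from collections import Counter
from itertools import combinations


def is_convex_set(a):
    # Convex set: sorted elements whose consecutive gaps strictly increase.
    s = sorted(a)
    gaps = [y - x for x, y in zip(s, s[1:])]
    # Strictly increasing gaps are exactly those equal to their sorted,
    # de-duplicated version.
    return gaps == sorted(set(gaps))


def max_representations(a):
    # For every positive difference y - x with x, y in a, count how many
    # pairs produce it, and return the largest count.
    s = sorted(a)
    counts = Counter(y - x for x, y in combinations(s, 2))
    return max(counts.values())


n = 12
squares = [k * k for k in range(1, n + 1)]  # gaps are 3, 5, 7, ...
convex = is_convex_set(squares)
reps = max_representations(squares)
bound = len(squares) // 2
print(convex, reps, bound)
```

&lt;p&gt;For this sample set the largest representation count stays within the stated bound; the paper's construction is what shows the bound can actually be attained.&lt;/p&gt;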
&lt;hr&gt;
&lt;h2 class="heading" id="12-value-function-based-difference-of-convex-algorithm-for-bilevel-hyperparameter-selection-problems-arxiv220605976"&gt;
 12. Value Function Based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems [arXiv:2206.05976]&lt;span class="heading__anchor"&gt; &lt;a href="#12-value-function-based-difference-of-convex-algorithm-for-bilevel-hyperparameter-selection-problems-arxiv220605976"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Lucy Gao, Jane J. Ye, Haian Yin, Shangzhi Zeng, Jin Zhang&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Gradient-based optimization methods for hyperparameter tuning guarantee theoretical convergence to stationary solutions when, for fixed upper-level variable values, the lower level of the bilevel program is strongly convex (LLSC) and smooth (LLS). This condition is not satisfied for bilevel programs arising from tuning hyperparameters in many machine learning algorithms. In this work, we develop a sequentially convergent Value Function based Difference-of-Convex Algorithm with inexactness (VF-iDCA). We show that this algorithm achieves stationary solutions without LLSC and LLS assumptions for bilevel programs from a broad class of hyperparameter tuning applications. Our extensive experiments confirm our theoretical findings and show that the proposed VF-iDCA yields superior performance when applied to tune hyperparameters.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="11-decentralized-saddle-point-problems-with-different-constants-of-strong-convexity-and-strong-concavity-arxiv220600090"&gt;
 11. Decentralized Saddle-Point Problems with Different Constants of Strong Convexity and Strong Concavity [arXiv:2206.00090]&lt;span class="heading__anchor"&gt; &lt;a href="#11-decentralized-saddle-point-problems-with-different-constants-of-strong-convexity-and-strong-concavity-arxiv220600090"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Dmitriy Metelev, Alexander Rogozin, Alexander Gasnikov, Dmitry Kovalev&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Large-scale saddle-point problems arise in such machine learning tasks as GANs and linear models with affine constraints. In this paper, we study distributed saddle-point problems (SPP) with strongly-convex-strongly-concave smooth objectives that have different strong convexity and strong concavity parameters of composite terms, which correspond to min and max variables, and bilinear saddle-point part. We consider two types of first-order oracles: deterministic (returns gradient) and stochastic (returns unbiased stochastic gradient). Our method works in both cases and takes several consensus steps between oracle calls.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="10-the-difference-of-convex-algorithm-on-hadamard-manifolds-arxiv211205250"&gt;
 10. The difference of convex algorithm on Hadamard manifolds [arXiv:2112.05250]&lt;span class="heading__anchor"&gt; &lt;a href="#10-the-difference-of-convex-algorithm-on-hadamard-manifolds-arxiv211205250"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Ronny Bergmann, Orizon P. Ferreira, Elianderson M. Santos, João Carlos O. Souza&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, we propose a Riemannian version of the difference of convex algorithm (DCA) to solve a minimization problem involving a difference of convex (DC) function. We establish the equivalence between the classical and simplified Riemannian versions of the DCA. We also prove that, under mild assumptions, the Riemannian version of the DCA is well-defined, and every cluster point of the sequence generated by the proposed method, if any, is a critical point of the objective DC function. Additionally, we establish some duality relations between the DC problem and its dual. To illustrate the effectiveness of the algorithm, we present some numerical experiments.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="9-data-fitting-with-signomial-programming-compatible-difference-of-convex-functions-arxiv211012104"&gt;
 9. Data Fitting with Signomial Programming Compatible Difference of Convex Functions [arXiv:2110.12104]&lt;span class="heading__anchor"&gt; &lt;a href="#9-data-fitting-with-signomial-programming-compatible-difference-of-convex-functions-arxiv211012104"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Cody Karcher&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Signomial Programming (SP) has proven to be a powerful tool for engineering design optimization, striking a balance between the computational efficiency of Geometric Programming (GP) and the extensibility of more general optimization methods like Sequential Quadratic Programming (SQP). But when an existing engineering analysis tool is incompatible with the mathematics of the SP formulation, options are limited. Previous literature has suggested schemes for fitting GP compatible models to pre-computed data, but no methods have yet been proposed that take advantage of the increased modeling flexibility available in SP. This paper describes a new Soft Difference of Max Affine (SDMA) function class that is constructed from existing methods of GP compatible fitting and the theory of Difference of Convex (DC) functions. When an SDMA function is fit to data in log-log transformed space, it becomes either a signomial or a set of signomials upon inverse transformation. Three examples of fitting are presented here, including simple test cases in 2D and 3D, and a fit to the performance data of the NACA 24xx family of airfoils. In each case, RMS error is driven to less than 1%.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="8-factored-couplings-in-multi-marginal-optimal-transport-via-difference-of-convex-programming-arxiv211000629"&gt;
 8. Factored couplings in multi-marginal optimal transport via difference of convex programming [arXiv:2110.00629]&lt;span class="heading__anchor"&gt; &lt;a href="#8-factored-couplings-in-multi-marginal-optimal-transport-via-difference-of-convex-programming-arxiv211000629"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Quang Huy Tran, Hicham Janati, Ievgen Redko, Rémi Flamary, Nicolas Courty&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Optimal transport (OT) theory underlies many emerging machine learning (ML) methods nowadays solving a wide range of tasks such as generative modeling, transfer learning and information retrieval. These latter works, however, usually build upon a traditional OT setup with two distributions, while leaving a more general multi-marginal OT formulation somewhat unexplored. In this paper, we study the multi-marginal OT (MMOT) problem and unify several popular OT methods under its umbrella by promoting structural information on the coupling. We show that incorporating such structural information into MMOT results in an instance of a difference of convex (DC) programming problem, allowing us to solve it numerically. Despite the high computational cost of the latter procedure, the solutions provided by DC optimization are usually as qualitative as those obtained using currently employed optimization schemes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="7-on-the-rate-of-convergence-of-the-difference-of-convex-algorithm-dca-arxiv210913566"&gt;
 7. On the rate of convergence of the Difference-of-Convex Algorithm (DCA) [arXiv:2109.13566]&lt;span class="heading__anchor"&gt; &lt;a href="#7-on-the-rate-of-convergence-of-the-difference-of-convex-algorithm-dca-arxiv210913566"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Hadi Abbaszadehpeivasti, Etienne de Klerk, Moslem Zamani&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, we study the convergence rate of the DCA (Difference-of-Convex Algorithm), also known as the convex-concave procedure, with two different termination criteria that are suitable for smooth and nonsmooth decompositions respectively. The DCA is a popular algorithm for difference-of-convex (DC) problems, and known to converge to a stationary point of the objective under some assumptions. We derive a worst-case convergence rate of $O(1/\sqrt{N})$ after $N$ iterations of the objective gradient norm for certain classes of DC problems, without assuming strong convexity in the DC decomposition, and give an example which shows the convergence rate is exact. We also provide a new convergence rate of $O(1/N)$ for the DCA with the second termination criterion. Moreover, we derive a new linear convergence rate result for the DCA under the assumption of the Polyak-Łojasiewicz inequality. The novel aspect of our analysis is that it employs semidefinite programming performance estimation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="6-a-different-perspective-on-the-stochastic-convex-feasibility-problem-arxiv210812029"&gt;
 6. A Different Perspective On The Stochastic Convex Feasibility Problem [arXiv:2108.12029]&lt;span class="heading__anchor"&gt; &lt;a href="#6-a-different-perspective-on-the-stochastic-convex-feasibility-problem-arxiv210812029"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; James Renegar, Song Zhou&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We analyze a simple randomized subgradient method for approximating solutions to stochastic systems of convex functional constraints, the only input to the algorithm being the size of minibatches. By introducing a new notion of what is meant for a point to approximately solve the constraints, determining bounds on the expected number of iterations reduces to determining a hitting time for a compound Bernoulli process, elementary probability. Besides bounding the expected number of iterations quite generally, we easily establish concentration inequalities on the number of iterations, and more interesting, we establish much-improved bounds when a notion akin to Hölderian growth is satisfied, for all degrees of growth, not just the linear growth of piecewise-linear convex functions or the quadratic growth of strongly convex functions. Finally, we establish the analogous results under a slight modification to the algorithm which results in the user knowing with high confidence an iterate is in hand that approximately solves the system. Perhaps surprisingly, the iteration bounds here are deterministic &amp;ndash; all of the probability gets wrapped into the confidence level (albeit at the expense of potentially large minibatches).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="5-retraction-based-first-order-feasible-methods-for-difference-of-convex-programs-with-smooth-inequality-and-simple-geometric-constraints-arxiv210608584"&gt;
 5. Retraction-based first-order feasible methods for difference-of-convex programs with smooth inequality and simple geometric constraints [arXiv:2106.08584]&lt;span class="heading__anchor"&gt; &lt;a href="#5-retraction-based-first-order-feasible-methods-for-difference-of-convex-programs-with-smooth-inequality-and-simple-geometric-constraints-arxiv210608584"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Yongle Zhang, Guoyin Li, Ting Kei Pong, Shiqi Xu&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, we propose first-order feasible methods for difference-of-convex (DC) programs with smooth inequality and simple geometric constraints. Our strategy for maintaining feasibility of the iterates is based on a &amp;ldquo;retraction&amp;rdquo; idea adapted from the literature of manifold optimization. When the constraints are convex, we establish the global subsequential convergence of the sequence generated by our algorithm under a strict feasibility condition, and analyze its convergence rate when the objective is in addition convex according to the Kurdyka-Łojasiewicz (KL) exponent of the extended objective (i.e., sum of the objective and the indicator function of the constraint set). We also show that the extended objective of a large class of Euclidean norm (and more generally, group LASSO penalty) regularized convex optimization problems is a KL function with exponent $\frac12$; consequently, our algorithm is locally linearly convergent when applied to these problems. We then extend our method to solve DC programs with a single specially structured nonconvex constraint. Finally, we discuss how our algorithms can be applied to solve two concrete optimization problems, namely, group-structured compressed sensing problems with Gaussian measurement noise and compressed sensing problems with Cauchy measurement noise, and illustrate the empirical performance of our algorithms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="4-synthesizing-invariant-barrier-certificates-via-difference-of-convex-programming-arxiv210514311"&gt;
 4. Synthesizing Invariant Barrier Certificates via Difference-of-Convex Programming [arXiv:2105.14311]&lt;span class="heading__anchor"&gt; &lt;a href="#4-synthesizing-invariant-barrier-certificates-via-difference-of-convex-programming-arxiv210514311"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Qiuye Wang, Mingshuai Chen, Bai Xue, Naijun Zhan, Joost-Pieter Katoen&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; A barrier certificate often serves as an inductive invariant that isolates an unsafe region from the reachable set of states, and hence is widely used in proving safety of hybrid systems possibly over the infinite time horizon. We present a novel condition on barrier certificates, termed the invariant barrier-certificate condition, that witnesses unbounded-time safety of differential dynamical systems. The proposed condition is by far the least conservative one on barrier certificates, and can be shown as the weakest possible one to attain inductive invariance. We show that discharging the invariant barrier-certificate condition &amp;ndash; thereby synthesizing invariant barrier certificates &amp;ndash; can be encoded as solving an optimization problem subject to bilinear matrix inequalities (BMIs). We further propose a synthesis algorithm based on difference-of-convex programming, which approaches a local optimum of the BMI problem via solving a series of convex optimization problems. This algorithm is incorporated in a branch-and-bound framework that searches for the global optimum in a divide-and-conquer fashion. We present a weak completeness result of our method, in the sense that a barrier certificate is guaranteed to be found (under some mild assumptions) whenever there exists an inductive invariant (in the form of a given template) that suffices to certify safety of the system. Experimental results on benchmark examples demonstrate the effectiveness and efficiency of our approach.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="3-algorithms-for-difference-of-convex-dc-programs-based-on-difference-of-moreau-envelopes-smoothing-arxiv210401470"&gt;
 3. Algorithms for Difference-of-Convex (DC) Programs Based on Difference-of-Moreau-Envelopes Smoothing [arXiv:2104.01470]&lt;span class="heading__anchor"&gt; &lt;a href="#3-algorithms-for-difference-of-convex-dc-programs-based-on-difference-of-moreau-envelopes-smoothing-arxiv210401470"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Kaizhao Sun, Xu Andy Sun&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper we consider minimization of a difference-of-convex (DC) function with and without linear constraints. We first study a smooth approximation of a generic DC function, termed difference-of-Moreau-envelopes (DME) smoothing, where both components of the DC function are replaced by their respective Moreau envelopes. The resulting smooth approximation is shown to be Lipschitz differentiable, capture stationary points, local, and global minima of the original DC function, and enjoy some growth conditions, such as level-boundedness and coercivity, for broad classes of DC functions. We then develop four algorithms for solving DC programs with and without linear constraints based on the DME smoothing. In particular, for a smoothed DC program without linear constraints, we show that the classic gradient descent method as well as an inexact variant can obtain a stationary solution in the limit with a convergence rate of $\mathcal{O}(K^{-1/2})$, where $K$ is the number of proximal evaluations of both components. Furthermore, when the DC program is explicitly constrained in an affine subspace, we combine the smoothing technique with the augmented Lagrangian function and derive two variants of the augmented Lagrangian method (ALM), named LCDC-ALM and composite LCDC-ALM, focusing on different structures of the DC objective function. We show that both algorithms find an $\epsilon$-approximate stationary solution of the original DC program in $\mathcal{O}(\epsilon^{-2})$ iterations. Compared to existing methods designed for linearly constrained weakly convex minimization, the proposed ALM-based algorithms can be applied to a broader class of problems, where the objective contains a nonsmooth concave component. Finally, numerical experiments are presented to demonstrate the performance of the proposed algorithms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2104.01470"&gt;https://arxiv.org/abs/2104.01470&lt;/a&gt;&lt;/p&gt;
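&lt;p&gt;&lt;strong&gt;Sketch:&lt;/strong&gt; The DME smoothing idea from the abstract can be illustrated on a one-dimensional toy problem. The example below is an assumption-laden sketch, not the authors' algorithm: the function f(x) = x^2 - |x|, the names &lt;code&gt;prox_g&lt;/code&gt;, &lt;code&gt;prox_h&lt;/code&gt;, &lt;code&gt;dme_gradient&lt;/code&gt;, and the values of &lt;code&gt;MU&lt;/code&gt; and the step size are all chosen by hand for illustration. It relies only on the standard fact that a Moreau envelope with parameter mu has gradient (x - prox(x)) / mu.&lt;/p&gt;

```python
import math

MU = 0.1  # smoothing parameter, hand-picked for this toy problem

def prox_g(x, mu=MU):
    """Proximal map of g(y) = y**2: argmin over y of y**2 + (x - y)**2 / (2 * mu)."""
    return x / (1.0 + 2.0 * mu)

def prox_h(x, mu=MU):
    """Proximal map of h(y) = |y|, i.e. soft-thresholding by mu."""
    return math.copysign(max(abs(x) - mu, 0.0), x)

def dme_gradient(x, mu=MU):
    """Gradient of (Moreau envelope of g) minus (Moreau envelope of h).

    Each envelope has gradient (x - prox(x)) / mu, so the difference
    simplifies to (prox_h(x) - prox_g(x)) / mu.
    """
    return (prox_h(x, mu) - prox_g(x, mu)) / mu

def gradient_descent(x0, step, num_iters, mu=MU):
    """Plain gradient descent on the smoothed function."""
    x = x0
    for _ in range(num_iters):
        x -= step * dme_gradient(x, mu)
    return x

x_smooth = gradient_descent(x0=2.0, step=0.1, num_iters=300)
x_original = prox_g(x_smooth)  # proximal point, a candidate for f = g - h
print(x_smooth, x_original)
```

&lt;p&gt;In this toy case the smoothed iterate settles near 0.5 + mu = 0.6, and its proximal point lands near 0.5, which is a minimizer of x^2 - |x|. The step size here was tuned by hand and is not taken from the paper's analysis.&lt;/p&gt;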
&lt;hr&gt;
&lt;h2 class="heading" id="2-cdinn--convex-difference-neural-networks-arxiv210317231"&gt;
 2. CDiNN - Convex Difference Neural Networks [arXiv:2103.17231]&lt;span class="heading__anchor"&gt; &lt;a href="#2-cdinn--convex-difference-neural-networks-arxiv210317231"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Parameswaran Sankaranarayanan, Raghunathan Rengaswamy&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Neural networks with ReLU activation function have been shown to be universal function approximators and learn function mapping as non-smooth functions. Recently, there is considerable interest in the use of neural networks in applications such as optimal control. It is well-known that optimization involving non-convex, non-smooth functions is computationally intensive and has limited convergence guarantees. Moreover, the choice of optimization hyper-parameters used in gradient descent/ascent significantly affects the quality of the obtained solutions. A new neural network architecture called the Input Convex Neural Network (ICNN) learns the output as a convex function of inputs, thereby allowing the use of efficient convex optimization methods. Use of ICNNs for determining the input for minimizing output has two major problems: learning of a non-convex function as a convex mapping could result in significant function approximation error, and we also note that the existing representations cannot capture simple dynamic structures like linear time delay systems. We attempt to address the above problems by introducing a new neural network architecture, which we call the CDiNN, which learns the function as a difference of polyhedral convex functions from data. We also discuss that, in some cases, the optimal input can be obtained from CDiNN through difference of convex optimization with convergence guarantees and that at each iteration, the problem is reduced to a linear programming problem.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2103.17231"&gt;https://arxiv.org/abs/2103.17231&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="1-a-difference-of-convex-cutting-plane-algorithm-for-mixed-binary-linear-program-arxiv210300717"&gt;
 1. A Difference-of-Convex Cutting Plane Algorithm for Mixed-Binary Linear Program [arXiv:2103.00717]&lt;span class="heading__anchor"&gt; &lt;a href="#1-a-difference-of-convex-cutting-plane-algorithm-for-mixed-binary-linear-program-arxiv210300717"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; Yi-Shuai Niu, Yu You&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper, we propose a cutting plane algorithm based on DC (Difference-of-Convex) programming and DC cut for globally solving Mixed-Binary Linear Program (MBLP). We first use a classical DC programming formulation via the exact penalization to formulate MBLP as a DC program, which can be solved by DCA algorithm. Then, we focus on the construction of DC cuts, which serves either as a local cut (namely type-I DC cut) at feasible local minimizer of MBLP, or as a global cut (namely type-II DC cut) at infeasible local minimizer of MBLP if some particular assumptions are verified. Otherwise, the constructibility of DC cut is still unclear, and we propose to use classical global cuts (such as the Lift-and-Project cut) instead. Combining DC cut and classical global cuts, a cutting plane algorithm, namely DCCUT, is established for globally solving MBLP. The convergence theorem of DCCUT is proved. Restarting DCA in DCCUT helps to quickly update the upper bound solution and to introduce more DC cuts for lower bound improvement. A variant of DCCUT by introducing more classical global cuts in each iteration is proposed, and parallel versions of DCCUT and its variant are also designed which use the power of multiple processors for better performance. Numerical simulations of DCCUT type algorithms comparing with the classical cutting plane algorithm using Lift-and-Project cuts are reported. Tests on some specific samples and the MIPLIB 2017 benchmark dataset demonstrate the benefits of DC cut and good performance of DCCUT algorithms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2103.00717"&gt;https://arxiv.org/abs/2103.00717&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;</description></item><item><title>Recent Advances in Research on Difference-of-Convex (DC) Programming</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/dc-programming/</link><pubDate>Thu, 27 Jun 2024 23:14:15 +0800</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/dc-programming/</guid><description/></item><item><title>Second-order Stochastic Optimization Methods for Machine Learning</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/soms/</link><pubDate>Thu, 27 Jun 2024 23:14:15 +0800</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/soms/</guid><description>&lt;h2 class="heading" id="analysis-of-the-hessian"&gt;
 Analysis of the Hessian&lt;span class="heading__anchor"&gt; &lt;a href="#analysis-of-the-hessian"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;h3 class="heading" id="1-empirical-analysis-of-the-hessian-of-over-parametrized-neural-networks"&gt;
 1. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks&lt;span class="heading__anchor"&gt; &lt;a href="#1-empirical-analysis-of-the-hessian-of-over-parametrized-neural-networks"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2017&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:1706.04454&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/1706.04454"&gt;https://arxiv.org/abs/1706.04454&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We study the properties of common loss surfaces through their Hessian matrix. In particular, in the context of deep learning, we empirically show that the spectrum of the Hessian is composed of two parts: (1) the bulk centered near zero, (2) and outliers away from the bulk. We present numerical evidence and mathematical justifications to the following conjectures laid out by Sagun et al. (2016): Fixing data, increasing the number of parameters merely scales the bulk of the spectrum; fixing the dimension and changing the data (for instance adding more clusters or making the data less separable) only affects the outliers. We believe that our observations have striking implications for non-convex optimization in high dimensions. First, the flatness of such landscapes (which can be measured by the singularity of the Hessian) implies that classical notions of basins of attraction may be quite misleading. And that the discussion of wide/narrow basins may be in need of a new perspective around over-parametrization and redundancy that are able to create large connected components at the bottom of the landscape. Second, the dependence of small number of large eigenvalues to the data distribution can be linked to the spectrum of the covariance matrix of gradients of model outputs. With this in mind, we may reevaluate the connections within the data-architecture-algorithm framework of a model, hoping that it would shed light into the geometry of high-dimensional and non-convex spaces in modern applications. In particular, we present a case that links the two observations: small and large batch gradient descent appear to converge to different basins of attraction but we show that they are in fact connected through their flat region and so belong to the same basin.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="2-the-full-spectrum-of-deepnet-hessians-at-scale-dynamics-with-sgd-training-and-sample-size"&gt;
 2. The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size&lt;span class="heading__anchor"&gt; &lt;a href="#2-the-full-spectrum-of-deepnet-hessians-at-scale-dynamics-with-sgd-training-and-sample-size"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2018&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Vardan Papyan&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:1811.07062&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/1811.07062"&gt;https://arxiv.org/abs/1811.07062&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We apply state-of-the-art tools in modern high-dimensional numerical linear algebra to approximate efficiently the spectrum of the Hessian of modern deepnets, with tens of millions of parameters, trained on real data. Our results corroborate previous findings, based on small-scale networks, that the Hessian exhibits &amp;ldquo;spiked&amp;rdquo; behavior, with several outliers isolated from a continuous bulk. We decompose the Hessian into different components and study the dynamics with training and sample size of each term individually.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="3-pyhessian-neural-networks-through-the-lens-of-the-hessian"&gt;
 3. PyHessian: Neural Networks Through the Lens of the Hessian&lt;span class="heading__anchor"&gt; &lt;a href="#3-pyhessian-neural-networks-through-the-lens-of-the-hessian"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2019&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:1912.07145&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/1912.07145"&gt;https://arxiv.org/abs/1912.07145&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We present PYHESSIAN, a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks. PYHESSIAN enables fast computations of the top Hessian eigenvalues, the Hessian trace, and the full Hessian eigenvalue/spectral density, and it supports distributed-memory execution on cloud/supercomputer systems and is available as open source. This general framework can be used to analyze neural network models, including the topology of the loss landscape (i.e., curvature information) to gain insight into the behavior of different models/optimizers. To illustrate this, we analyze the effect of residual connections and Batch Normalization layers on the trainability of neural networks. One recent claim, based on simpler first-order analysis, is that residual connections and Batch Normalization make the loss landscape smoother, thus making it easier for Stochastic Gradient Descent to converge to a good solution. Our extensive analysis shows new finer-scale insights, demonstrating that, while conventional wisdom is sometimes validated, in other cases it is simply incorrect. In particular, we find that Batch Normalization does not necessarily make the loss landscape smoother, especially for shallower networks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; Described as open source in the abstract. Known repository: &lt;a href="https://github.com/amirgholami/PyHessian"&gt;https://github.com/amirgholami/PyHessian&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="4-a-deeper-look-at-the-hessian-eigenspectrum-of-deep-neural-networks-and-its-applications-to-regularization"&gt;
 4. A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization&lt;span class="heading__anchor"&gt; &lt;a href="#4-a-deeper-look-at-the-hessian-eigenspectrum-of-deep-neural-networks-and-its-applications-to-regularization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2020&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Adepu Ravi Sankar, Yash Khasbage, Rahul Vigneswaran, Vineeth N Balasubramanian&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:2012.03801&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2012.03801"&gt;https://arxiv.org/abs/2012.03801&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Loss landscape analysis is extremely useful for a deeper understanding of the generalization ability of deep neural network models. In this work, we propose a layerwise loss landscape analysis where the loss surface at every layer is studied independently and also on how each correlates to the overall loss surface. We study the layerwise loss landscape by studying the eigenspectra of the Hessian at each layer. In particular, our results show that the layerwise Hessian geometry is largely similar to the entire Hessian. We also report an interesting phenomenon where the Hessian eigenspectrum of middle layers of the deep neural network is observed to be most similar to the overall Hessian eigenspectrum. We also show that the maximum eigenvalue and the trace of the Hessian (both full network and layerwise) reduce as training of the network progresses. We leverage these observations to propose a new regularizer based on the trace of the layerwise Hessian. Penalizing the trace of the Hessian at every layer indirectly forces Stochastic Gradient Descent to converge to flatter minima, which are shown to have better generalization performance. In particular, we show that such a layerwise regularizer can be leveraged to penalize the middlemost layers alone, which yields promising results. Our empirical studies on well-known deep nets across datasets support the claims of this work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="diagonal-scaling"&gt;
 Diagonal Scaling&lt;span class="heading__anchor"&gt; &lt;a href="#diagonal-scaling"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;h3 class="heading" id="1-adahessian-an-adaptive-second-order-optimizer-for-machine-learning"&gt;
 1. AdaHessian: An Adaptive Second Order Optimizer for Machine Learning&lt;span class="heading__anchor"&gt; &lt;a href="#1-adahessian-an-adaptive-second-order-optimizer-for-machine-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2020&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:2006.00719&lt;br&gt;
&lt;strong&gt;Algorithm:&lt;/strong&gt; AdaHessian&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2006.00719"&gt;https://arxiv.org/abs/2006.00719&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We introduce ADAHESSIAN, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN. Second order algorithms are among the most powerful optimization algorithms with superior convergence properties as compared to first order methods such as SGD and Adam. The main disadvantage of traditional second order methods is their heavier per iteration computation and poor accuracy as compared to first order methods. To address these, we incorporate several novel approaches in ADAHESSIAN, including: (i) a fast Hutchinson based method to approximate the curvature matrix with low computational overhead; (ii) a root-mean-square exponential moving average to smooth out variations of the Hessian diagonal across different iterations; and (iii) a block diagonal averaging to reduce the variance of Hessian diagonal elements. We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods, including variants of Adam. In particular, we perform extensive tests on CV, NLP, and recommendation system tasks and find that ADAHESSIAN: (i) achieves 1.80%/1.45% higher accuracy on ResNets20/32 on Cifar10, and 5.55% higher accuracy on ImageNet as compared to Adam; (ii) outperforms AdamW for transformers by 0.13/0.33 BLEU score on IWSLT14/WMT14 and 2.7/1.0 PPL on PTB/Wikitext-103; (iii) outperforms AdamW for SqueezeBert by 0.41 points on GLUE; and (iv) achieves 0.032% better score than Adagrad for DLRM on the Criteo Ad Kaggle dataset. Importantly, we show that the cost per iteration of ADAHESSIAN is comparable to first order methods, and that it exhibits robustness towards its hyperparameters.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; Known repository: &lt;a href="https://github.com/amirgholami/adahessian"&gt;https://github.com/amirgholami/adahessian&lt;/a&gt;&lt;/p&gt;
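&lt;p&gt;&lt;strong&gt;Sketch:&lt;/strong&gt; The Hutchinson-based curvature approximation mentioned in item (i) of the abstract can be illustrated with a generic estimator of the Hessian diagonal. This is a minimal standard-library sketch, not the AdaHessian implementation: the helper names &lt;code&gt;hessian_vector_product&lt;/code&gt; and &lt;code&gt;hutchinson_diagonal&lt;/code&gt; are hypothetical, and an explicit small matrix stands in for the Hessian-vector products that a deep learning framework would supply through automatic differentiation.&lt;/p&gt;

```python
import random

def hessian_vector_product(H, v):
    """Multiply an explicit symmetric matrix H (a list of rows) by the vector v."""
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in H]

def hutchinson_diagonal(hvp, dim, num_samples, seed=0):
    """Estimate diag(H) as the average of z * (H z) over random sign vectors z.

    Since E[z_i * z_j] is 1 when i == j and 0 otherwise, the expectation of
    z_i * (H z)_i equals H_ii.
    """
    rng = random.Random(seed)
    estimate = [0.0] * dim
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hz = hvp(z)
        for i in range(dim):
            estimate[i] += z[i] * hz[i] / num_samples
    return estimate

# Toy symmetric matrix standing in for a Hessian (assumed for illustration).
H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 2.0]]
diag_estimate = hutchinson_diagonal(
    lambda v: hessian_vector_product(H, v), dim=3, num_samples=2000
)
print(diag_estimate)  # close to the true diagonal [4.0, 3.0, 2.0]
```

&lt;p&gt;The estimator only needs matrix-vector products, which is why its cost stays comparable to a gradient evaluation. The moving-average and block-averaging steps from items (ii) and (iii) are not shown here.&lt;/p&gt;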
&lt;hr&gt;
&lt;h3 class="heading" id="2-sophia-a-scalable-stochastic-second-order-optimizer-for-language-model-pre-training"&gt;
 2. Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training&lt;span class="heading__anchor"&gt; &lt;a href="#2-sophia-a-scalable-stochastic-second-order-optimizer-for-language-model-pre-training"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2023&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:2305.14342&lt;br&gt;
&lt;strong&gt;Algorithm:&lt;/strong&gt; Sophia&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2305.14342"&gt;https://arxiv.org/abs/2305.14342&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction on the time and cost of training. Adam and its variants have been state-of-the-art for years, and more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple scalable second-order optimizer that uses a light-weight estimate of the diagonal Hessian as the pre-conditioner. The update is the moving average of the gradients divided by the moving average of the estimated Hessian, followed by element-wise clipping. The clipping controls the worst-case update size and tames the negative impact of non-convexity and rapid change of Hessian along the trajectory. Sophia only estimates the diagonal Hessian every handful of iterations, which has negligible average per-step time and memory overhead. On language modeling with GPT models of sizes ranging from 125M to 1.5B, Sophia achieves a 2x speed-up compared to Adam in the number of steps, total compute, and wall-clock time, achieving the same perplexity with 50% fewer steps, less total compute, and reduced wall-clock time. Theoretically, we show that Sophia, in a much simplified setting, adapts to the heterogeneous curvatures in different parameter dimensions, and thus has a run-time bound that does not depend on the condition number of the loss.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; Known repository: &lt;a href="https://github.com/Liuhong99/Sophia"&gt;https://github.com/Liuhong99/Sophia&lt;/a&gt;&lt;/p&gt;
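&lt;p&gt;&lt;strong&gt;Sketch:&lt;/strong&gt; The update rule summarized in the abstract (moving average of gradients divided by moving average of the estimated diagonal Hessian, then element-wise clipping) can be sketched as follows. This is a simplified illustration under assumptions, not the released Sophia optimizer: the function names and the parameters &lt;code&gt;lr&lt;/code&gt;, &lt;code&gt;rho&lt;/code&gt;, &lt;code&gt;eps&lt;/code&gt; and &lt;code&gt;beta&lt;/code&gt; are hypothetical, and the Hessian values are given directly instead of being estimated.&lt;/p&gt;

```python
def ema(previous, new, beta):
    """Exponential moving average of two equally long lists."""
    return [beta * p + (1.0 - beta) * n for p, n in zip(previous, new)]

def clipped_step(params, grad_ema, hess_ema, lr=0.1, rho=1.0, eps=1e-12):
    """Move each coordinate by lr * clip(grad_ema / max(hess_ema, eps), -rho, rho)."""
    updated = []
    for p, m, h in zip(params, grad_ema, hess_ema):
        ratio = m / max(h, eps)            # preconditioned gradient
        step = max(-rho, min(rho, ratio))  # element-wise clipping bounds the move
        updated.append(p - lr * step)
    return updated

# Four coordinates: steep, flat, moderate, and one with negative curvature.
params = [1.0, 1.0, 1.0, 1.0]
grads = [100.0, 0.01, 1.0, 3.0]
hess_diag = [100.0, 0.01, 2.0, -5.0]

grad_ema = ema([0.0] * 4, grads, beta=0.9)
hess_ema = ema([0.0] * 4, hess_diag, beta=0.9)
new_params = clipped_step(params, grad_ema, hess_ema)
print(new_params)  # approximately [0.9, 0.9, 0.95, 0.9]
```

&lt;p&gt;The steep and flat coordinates receive the same step because the curvature cancels in the ratio, and the coordinate with negative curvature falls back to a bounded step of size lr * rho instead of blowing up.&lt;/p&gt;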
&lt;hr&gt;
&lt;h2 class="heading" id="hessian-free-optimization"&gt;
 Hessian-free Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#hessian-free-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;h3 class="heading" id="1-learning-recurrent-neural-networks-with-hessian-free-optimization"&gt;
 1. Learning Recurrent Neural Networks with Hessian-Free Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#1-learning-recurrent-neural-networks-with-hessian-free-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2011&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; James Martens, Ilya Sutskever&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://www.cs.toronto.edu/~jmartens/docs/RNN_HF.pdf"&gt;https://www.cs.toronto.edu/~jmartens/docs/RNN_HF.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free optimization approach (Martens, 2010), together with a novel damping scheme, we successfully train RNNs on two sets of challenging problems. First, a collection of pathological synthetic datasets which are known to be impossible for standard optimization approaches (due to their extremely long-term dependencies), and second, on three natural and highly complex real-world sequence datasets where we find that our method significantly outperforms the previous state-of-the-art method for training neural sequence models: the Long Short-term Memory approach of Hochreiter and Schmidhuber (1997). Additionally, we offer a new interpretation of the generalized Gauss-Newton matrix of Schraudolph (2002) which is used within the HF approach of Martens.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="2-training-neural-networks-with-stochastic-hessian-free-optimization"&gt;
 2. Training Neural Networks with Stochastic Hessian-Free Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#2-training-neural-networks-with-stochastic-hessian-free-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2013&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Ryan Kiros&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:1301.3641&lt;br&gt;
&lt;strong&gt;Algorithm:&lt;/strong&gt; SHF&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/1301.3641"&gt;https://arxiv.org/abs/1301.3641&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Hessian-free (HF) optimization has been successfully used for training deep autoencoders and recurrent networks. HF uses the conjugate gradient algorithm to construct update directions through curvature-vector products that can be computed on the same order of time as gradients. In this paper we exploit this property and study stochastic HF with gradient and curvature mini-batches independent of the dataset size. We modify Martens&amp;rsquo; HF for these settings and integrate dropout, a method for preventing co-adaptation of feature detectors, to guard against overfitting. Stochastic Hessian-free optimization gives an intermediary between SGD and HF that achieves competitive performance on both classification and deep autoencoder experiments.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
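&lt;p&gt;&lt;strong&gt;Sketch:&lt;/strong&gt; The abstract notes that HF builds update directions with the conjugate gradient algorithm using only curvature-vector products. The sketch below shows that mechanism on a tiny explicit matrix. It is an illustrative assumption, not the SHF implementation: the names &lt;code&gt;conjugate_gradient&lt;/code&gt; and &lt;code&gt;curvature_vector_product&lt;/code&gt;, the matrix &lt;code&gt;H&lt;/code&gt;, and the &lt;code&gt;damping&lt;/code&gt; value are hypothetical, and mini-batching and dropout are omitted.&lt;/p&gt;

```python
import math

def dot(u, v):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(matvec, b, max_iters, tol=1e-12):
    """Approximately solve A x = b using only products v -> A v (A symmetric positive definite)."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the starting point x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iters):
        if math.isclose(rs_old, 0.0, abs_tol=tol):
            break  # squared residual norm is negligible
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [r_i + (rs_new / rs_old) * p_i for r_i, p_i in zip(r, p)]
        rs_old = rs_new
    return x

# Toy curvature matrix and gradient (assumed values for illustration).
H = [[4.0, 1.0], [1.0, 3.0]]
gradient = [1.0, 2.0]
damping = 0.0  # a positive value would solve with H + damping * I instead

def curvature_vector_product(v):
    """Return (H + damping * I) v without ever inverting H."""
    hv = [dot(row, v) for row in H]
    return [hv_i + damping * v_i for hv_i, v_i in zip(hv, v)]

direction = conjugate_gradient(curvature_vector_product, [-g for g in gradient], max_iters=10)
print(direction)  # close to [-1/11, -7/11], the Newton direction for this toy problem
```

&lt;p&gt;In a neural network setting the explicit matrix would be replaced by a routine that computes curvature-vector products from a mini-batch, so the full matrix is never formed.&lt;/p&gt;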
&lt;hr&gt;
&lt;h2 class="heading" id="quasi-newton"&gt;
 Quasi-Newton&lt;span class="heading__anchor"&gt; &lt;a href="#quasi-newton"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;h3 class="heading" id="1-a-stochastic-quasi-newton-method-for-large-scale-optimization"&gt;
 1. A Stochastic Quasi-Newton Method for Large-Scale Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#1-a-stochastic-quasi-newton-method-for-large-scale-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2014&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; R.H. Byrd, S.L. Hansen, J. Nocedal, Y. Singer&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:1401.7020&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/1401.7020"&gt;https://arxiv.org/abs/1401.7020&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; The question of how to incorporate curvature information in stochastic approximation methods is challenging. The direct application of classical quasi-Newton updating techniques for deterministic optimization leads to noisy curvature estimates that have harmful effects on the robustness of the iteration. In this paper, we propose a stochastic quasi-Newton method that is efficient, robust and scalable. It employs the classical BFGS update formula in its limited memory form, and is based on the observation that it is beneficial to collect curvature information pointwise, and at regular intervals, through (sub-sampled) Hessian-vector products. This technique differs from the classical approach that would compute differences of gradients, and where controlling the quality of the curvature estimates can be difficult. We present numerical results on problems arising in machine learning that suggest that the proposed method shows much promise.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="2-a-multi-batch-l-bfgs-method-for-machine-learning"&gt;
 2. A Multi-Batch L-BFGS Method for Machine Learning&lt;span class="heading__anchor"&gt; &lt;a href="#2-a-multi-batch-l-bfgs-method-for-machine-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2016&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Albert S. Berahas, Jorge Nocedal, Martin Takáč&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:1605.06049&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/1605.06049"&gt;https://arxiv.org/abs/1605.06049&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employ second-order information. In order to improve the learning process, we follow a multi-batch approach in which the batch changes at each iteration. This can cause difficulties because L-BFGS employs gradient differences to update the Hessian approximations, and when these gradients are computed using different data points the process can be unstable. This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, illustrates the behavior of the algorithm in a distributed computing platform, and studies its convergence properties for both the convex and nonconvex cases.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="3-stochastic-quasi-newton-with-line-search-regularization"&gt;
 3. Stochastic Quasi-Newton with Line-Search Regularization&lt;span class="heading__anchor"&gt; &lt;a href="#3-stochastic-quasi-newton-with-line-search-regularization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2019&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Adrian Wills, Thomas Schön&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:1909.01238&lt;br&gt;
&lt;strong&gt;Algorithm:&lt;/strong&gt; SQN&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/1909.01238"&gt;https://arxiv.org/abs/1909.01238&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; In this paper we present a novel quasi-Newton algorithm for use in stochastic optimisation. Quasi-Newton methods have had an enormous impact on deterministic optimisation problems because they afford rapid convergence and computationally attractive algorithms. In essence, this is achieved by learning the second-order (Hessian) information based on observing first-order gradients. We extend these ideas to the stochastic setting by employing a highly flexible model for the Hessian and infer its value based on observing noisy gradients. In addition, we propose a stochastic counterpart to standard line-search procedures and demonstrate the utility of this combination on maximum likelihood identification for general nonlinear state space models.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="4-practical-quasi-newton-methods-for-training-deep-neural-networks"&gt;
 4. Practical Quasi-Newton Methods for Training Deep Neural Networks&lt;span class="heading__anchor"&gt; &lt;a href="#4-practical-quasi-newton-methods-for-training-deep-neural-networks"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2020&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Donald Goldfarb, Yi Ren, Achraf Bahamou&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:2006.08877&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2006.08877"&gt;https://arxiv.org/abs/2006.08877&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs). In DNN training, the number of variables and components of the gradient $n$ is often of the order of tens of millions and the Hessian has $n^2$ elements. Consequently, computing and storing a full $n \times n$ BFGS approximation or storing a modest number of (step, change in gradient) vector pairs for use in an L-BFGS implementation is out of the question. In our proposed methods, we approximate the Hessian by a block-diagonal matrix and use the structure of the gradient and Hessian to further approximate these blocks, each of which corresponds to a layer, as the Kronecker product of two much smaller matrices. This is analogous to the approach in KFAC, which computes a Kronecker-factored block-diagonal approximation to the Fisher matrix in a stochastic natural gradient method. Because of the indefinite and highly variable nature of the Hessian in a DNN, we also propose a new damping approach to keep the upper as well as the lower bounds of the BFGS and L-BFGS approximations bounded. In tests on autoencoder feed-forward neural network models with either nine or thirteen layers applied to three datasets, our methods outperformed or performed comparably to KFAC and state-of-the-art first-order stochastic methods.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="gauss-newton"&gt;
 Gauss-Newton&lt;span class="heading__anchor"&gt; &lt;a href="#gauss-newton"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;h3 class="heading" id="1-efficient-subsampled-gauss-newton-and-natural-gradient-methods-for-training-neural-networks"&gt;
 1. Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks&lt;span class="heading__anchor"&gt; &lt;a href="#1-efficient-subsampled-gauss-newton-and-natural-gradient-methods-for-training-neural-networks"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2019&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Yi Ren, Donald Goldfarb&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:1906.02353&lt;br&gt;
&lt;strong&gt;Algorithm:&lt;/strong&gt; SWM-GN, SWM-NG&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/1906.02353"&gt;https://arxiv.org/abs/1906.02353&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We present practical Levenberg-Marquardt variants of Gauss-Newton and natural gradient methods for solving non-convex optimization problems that arise in training deep neural networks involving enormous numbers of variables and huge data sets. Our methods use subsampled Gauss-Newton or Fisher information matrices and either subsampled gradient estimates (fully stochastic) or full gradients (semi-stochastic), which, in the latter case, we prove convergent to a stationary point. By using the Sherman-Morrison-Woodbury formula with automatic differentiation (backpropagation) we show how our methods can be implemented to perform efficiently. Finally, numerical results are presented to demonstrate the effectiveness of our proposed methods.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="2-on-the-promise-of-the-stochastic-generalized-gauss-newton-method-for-training-dnns"&gt;
 2. On the Promise of the Stochastic Generalized Gauss-Newton Method for Training DNNs&lt;span class="heading__anchor"&gt; &lt;a href="#2-on-the-promise-of-the-stochastic-generalized-gauss-newton-method-for-training-dnns"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2020&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Matilde Gargiani, et al.&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:2006.02409&lt;br&gt;
&lt;strong&gt;Algorithm:&lt;/strong&gt; SGN&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2006.02409"&gt;https://arxiv.org/abs/2006.02409&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Following early work on Hessian-free methods for deep learning, we study a stochastic generalized Gauss-Newton method (SGN) for training DNNs. SGN is a second-order optimization method, with efficient iterations, that we demonstrate to often require substantially fewer iterations than standard SGD to converge. As the name suggests, SGN uses a Gauss-Newton approximation for the Hessian matrix, and, in order to compute an approximate search direction, relies on the conjugate gradient method combined with forward and reverse automatic differentiation. Despite the success of SGD and its first-order variants, and despite Hessian-free methods based on the Gauss-Newton Hessian approximation having been already theoretically proposed as practical methods for training DNNs, we believe that SGN has a lot of undiscovered and yet not fully displayed potential in big mini-batch scenarios. For this setting, we demonstrate that SGN does not only substantially improve over SGD in terms of the number of iterations, but also in terms of runtime. This is made possible by an efficient, easy-to-use and flexible implementation of SGN we propose in the Theano deep learning platform, which, unlike TensorFlow and PyTorch, supports forward automatic differentiation. This enables researchers to further study and improve this promising optimization technique and hopefully reconsider stochastic second-order methods as competitive optimization techniques for training DNNs; we also hope that the promise of SGN may lead to forward automatic differentiation being added to TensorFlow or PyTorch. Our results also show that in big mini-batch scenarios SGN is more robust than SGD with respect to its hyperparameters (we never had to tune its step-size for our benchmarks!), which eases the expensive process of hyperparameter tuning that is instead crucial for the performance of first-order methods.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; Abstract describes a Theano implementation of SGN; no repository link given&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="3-stochastic-gauss-newton-algorithms-for-nonconvex-compositional-optimization"&gt;
 3. Stochastic Gauss-Newton Algorithms for Nonconvex Compositional Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#3-stochastic-gauss-newton-algorithms-for-nonconvex-compositional-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2020&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Quoc Tran-Dinh, et al.&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:2002.07290&lt;br&gt;
&lt;strong&gt;Algorithm:&lt;/strong&gt; SGN with SARAH estimators&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2002.07290"&gt;https://arxiv.org/abs/2002.07290&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We develop two new stochastic Gauss-Newton algorithms for solving a class of non-convex stochastic compositional optimization problems frequently arising in practice. We consider both the expectation and finite-sum settings under standard assumptions, and use both classical stochastic and SARAH estimators for approximating function values and Jacobians. In the expectation case, we establish $\mathcal{O}(\varepsilon^{-2})$ iteration-complexity to achieve a stationary point in expectation and estimate the total number of stochastic oracle calls for both function value and its Jacobian, where $\varepsilon$ is a desired accuracy. In the finite sum case, we also estimate $\mathcal{O}(\varepsilon^{-2})$ iteration-complexity and the total oracle calls with high probability. To our best knowledge, this is the first time such global stochastic oracle complexity is established for stochastic Gauss-Newton methods. Finally, we illustrate our theoretical results via two numerical examples on both synthetic and real datasets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="4-nonlinear-least-squares-for-large-scale-machine-learning-using-stochastic-jacobian-estimates"&gt;
 4. Nonlinear Least Squares for Large-Scale Machine Learning using Stochastic Jacobian Estimates&lt;span class="heading__anchor"&gt; &lt;a href="#4-nonlinear-least-squares-for-large-scale-machine-learning-using-stochastic-jacobian-estimates"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2021&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Johannes J. Brust&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:2107.05598&lt;br&gt;
&lt;strong&gt;Algorithm:&lt;/strong&gt; NLLS1, NLLSL&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2107.05598"&gt;https://arxiv.org/abs/2107.05598&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; For large nonlinear least squares loss functions in machine learning we exploit the property that the number of model parameters typically exceeds the data in one batch. This implies a low-rank structure in the Hessian of the loss, which enables effective means to compute search directions. Using this property, we develop two algorithms that estimate Jacobian matrices and perform well when compared to state-of-the-art methods.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="5-improving-levenberg-marquardt-algorithm-for-neural-networks"&gt;
 5. Improving Levenberg-Marquardt Algorithm for Neural Networks&lt;span class="heading__anchor"&gt; &lt;a href="#5-improving-levenberg-marquardt-algorithm-for-neural-networks"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2022&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Omead Pooladzandi, Yiming Zhou&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:2212.08769&lt;br&gt;
&lt;strong&gt;Algorithm:&lt;/strong&gt; LM&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2212.08769"&gt;https://arxiv.org/abs/2212.08769&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We explore the usage of the Levenberg-Marquardt (LM) algorithm for regression (non-linear least squares) and classification (generalized Gauss-Newton methods) tasks in neural networks. We compare the performance of the LM method with other popular first-order algorithms such as SGD and Adam, as well as other second-order algorithms such as L-BFGS, Hessian-Free and KFAC. We further speed up the LM method by using adaptive momentum, learning rate line search, and uphill step acceptance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="6-rethinking-gauss-newton-for-learning-over-parameterized-models"&gt;
 6. Rethinking Gauss-Newton for learning over-parameterized models&lt;span class="heading__anchor"&gt; &lt;a href="#6-rethinking-gauss-newton-for-learning-over-parameterized-models"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2023&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Michael Arbel, et al.&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:2302.02904&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2302.02904"&gt;https://arxiv.org/abs/2302.02904&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; This work studies the global convergence and implicit bias of the Gauss-Newton (GN) method when optimizing over-parameterized one-hidden-layer networks in the mean-field regime. We first establish a global convergence result for GN in the continuous-time limit exhibiting a faster convergence rate compared to GD due to improved conditioning. We then perform an empirical study on a synthetic regression task to investigate the implicit bias of GN. While GN is consistently faster than GD in finding a global optimum, the learned model generalizes well on test data when starting from random initial weights with a small variance and using a small step size to slow down convergence. Specifically, our study shows that such a setting results in a hidden learning phenomenon, where the dynamics are able to recover features with good generalization properties despite the model having sub-optimal training and test performances due to an under-optimized linear layer. This study exhibits a trade-off between the convergence speed of GN and the generalization ability of the learned solution.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h3 class="heading" id="7-exact-gauss-newton-optimization-for-training-deep-neural-networks"&gt;
 7. Exact Gauss-Newton Optimization for Training Deep Neural Networks&lt;span class="heading__anchor"&gt; &lt;a href="#7-exact-gauss-newton-optimization-for-training-deep-neural-networks"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2024&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Mikalai Korbit, Adeyemi D. Adeoye, Alberto Bemporad, Mario Zanon&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:2405.14402&lt;br&gt;
&lt;strong&gt;Algorithm:&lt;/strong&gt; EGN&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2405.14402"&gt;https://arxiv.org/abs/2405.14402&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We present EGN, a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix which has the size of the mini-batch. This is particularly advantageous for large-scale machine learning problems where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges to an $\epsilon$-stationary point at a linear rate. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at most matches the generalization performance of well-tuned SGD, Adam, and SGN optimizers across various supervised and reinforcement learning tasks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="fisher-information"&gt;
 Fisher Information&lt;span class="heading__anchor"&gt; &lt;a href="#fisher-information"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;h3 class="heading" id="1-optimizing-neural-networks-with-kronecker-factored-approximate-curvature"&gt;
 1. Optimizing Neural Networks with Kronecker-factored Approximate Curvature&lt;span class="heading__anchor"&gt; &lt;a href="#1-optimizing-neural-networks-with-kronecker-factored-approximate-curvature"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2015&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; James Martens, Roger Grosse&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:1503.05671&lt;br&gt;
&lt;strong&gt;Algorithm:&lt;/strong&gt; K-FAC&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/1503.05671"&gt;https://arxiv.org/abs/1503.05671&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-Factored Approximate Curvature (K-FAC). K-FAC is based on an efficiently invertible approximation of a neural network&amp;rsquo;s Fisher information matrix which is neither diagonal nor low-rank, and in some cases is completely non-sparse. It is derived by approximating various large blocks of the Fisher (corresponding to entire layers) as being the Kronecker product of two much smaller matrices. While only several times more expensive to compute than the plain stochastic gradient, the updates produced by K-FAC make much more progress optimizing the objective, which results in an algorithm that can be much faster than stochastic gradient descent with momentum in practice. And unlike some previously proposed approximate natural-gradient/Newton methods which use high-quality non-diagonal curvature matrices (such as Hessian-free optimization), K-FAC works very well in highly stochastic optimization regimes. This is because the cost of storing and inverting K-FAC&amp;rsquo;s approximation to the curvature matrix does not depend on the amount of data used to estimate it, which is a feature typically associated only with diagonal or low-rank approximations to the curvature matrix.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; Known repository: Various implementations available&lt;/p&gt;
&lt;hr&gt;
&lt;h2 class="heading" id="other"&gt;
 Other&lt;span class="heading__anchor"&gt; &lt;a href="#other"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;h3 class="heading" id="1-second-order-optimization-with-lazy-hessians"&gt;
 1. Second-order optimization with lazy Hessians&lt;span class="heading__anchor"&gt; &lt;a href="#1-second-order-optimization-with-lazy-hessians"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Year:&lt;/strong&gt; 2022&lt;br&gt;
&lt;strong&gt;Authors:&lt;/strong&gt; Nikita Doikov, El Mahdi Chayti, Martin Jaggi&lt;br&gt;
&lt;strong&gt;ArXiv ID:&lt;/strong&gt; arXiv:2212.00781&lt;br&gt;
&lt;strong&gt;URL:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2212.00781"&gt;https://arxiv.org/abs/2212.00781&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; We analyze Newton&amp;rsquo;s method with lazy Hessian updates for solving general possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method. This significantly reduces the overall arithmetical complexity of second-order optimization schemes. By using the cubic regularization technique, we establish fast global convergence of our method to a second-order stationary point, while the Hessian does not need to be updated each iteration. For convex problems, we justify global and local superlinear rates for lazy Newton steps with quadratic regularization, which is easier to compute. The optimal frequency for updating the Hessian is once every $d$ iterations, where $d$ is the dimension of the problem. This provably improves the total arithmetical complexity of second-order algorithms by a factor $\sqrt{d}$.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt; No explicit source code information found&lt;/p&gt;
&lt;hr&gt;</description></item><item><title>Optimization Research Papers in JMLR Volume 24</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v24/</link><pubDate>Fri, 29 Sep 2023 00:00:00 +0000</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v24/</guid><description>&lt;h1 class="heading" id="optimization-research-papers-in-jmlr-volume-24-2023"&gt;
 Optimization Research Papers in JMLR Volume 24 (2023)&lt;span class="heading__anchor"&gt; &lt;a href="#optimization-research-papers-in-jmlr-volume-24-2023"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h1&gt;&lt;p&gt;This document lists papers from JMLR Volume 24 (2023) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.&lt;/p&gt;
&lt;h2 class="heading" id="convex-optimization"&gt;
 Convex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#convex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing convex optimization and related sparse-learning problems, including sparse PCA, L0 regularization, and matrix decomposition.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sparse PCA: A Geometric Approach&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Dimitris Bertsimas, Driss Lahlou Kitane&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops a geometric approach for sparse principal component analysis using convex optimization techniques.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fundamental Limits and Algorithms for Sparse Linear Regression with Sublinear Sparsity&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Lan V. Truong&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates algorithms and theoretical limits for sparse linear regression with sublinear sparsity in a convex framework.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sparse Training with Lipschitz Continuous Loss Functions and a Weighted Group L0-norm Constraint&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Michael R. Metel&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes sparse training methods using Lipschitz continuous loss functions and group L0-norm constraints.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;MARS: A Second-Order Reduction Algorithm for High-Dimensional Sparse Precision Matrices Estimation&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Qian Li, Binyan Jiang, Defeng Sun&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Presents a second-order reduction algorithm for sparse precision matrix estimation using convex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sparse GCA and Thresholded Gradient Descent&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Sheng Gao, Zongming Ma&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops a thresholded gradient descent algorithm for sparse generalized correlation analysis.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Parameter-Free Conditional Gradient Method for Composite Minimization under Hölder Condition&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Masaru Ito, Zhaosong Lu, Chuan He&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces a parameter-free conditional gradient method for composite minimization under Hölder smoothness.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;L0Learn: A Scalable Package for Sparse Learning using L0 Regularization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Hussein Hazimeh, Rahul Mazumder, Tim Nonet&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Presents a scalable package for sparse learning with L0 regularization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sparse Plus Low Rank Matrix Decomposition: A Discrete Optimization Approach&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Dimitris Bertsimas, Ryan Cory-Wright, Nicholas A. G. Johnson&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a discrete optimization approach for sparse plus low-rank matrix decomposition.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Distributed Sparse Regression via Penalization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yao Ji, Gesualdo Scutari, Ying Sun, Harsha Honnappa&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops distributed sparse regression algorithms using penalization techniques in convex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Elastic Gradient Descent, an Iterative Optimization Method Approximating the Solution Paths of the Elastic Net&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Oskar Allerbo, Johan Jonasson, Rebecka Jörnsten&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces an iterative method approximating elastic net solution paths in convex settings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Novel Integer Linear Programming Approach for Global L0 Minimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Diego Delle Donne, Matthieu Kowalski, Leo Liberti&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes an integer linear programming approach for global L0 minimization.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="nonconvex-optimization"&gt;
 Nonconvex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#nonconvex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers tackling nonconvex optimization, focusing on descent algorithms, majorization minimization, and minimax problems.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Line-Search Descent Algorithm for Strict Saddle Functions with Complexity Guarantees&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Michael J. O&amp;rsquo;Neill, Stephen J. Wright&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops a line-search descent algorithm for nonconvex strict saddle functions with complexity guarantees.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;An Inertial Block Majorization Minimization Framework for Nonsmooth Nonconvex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes an inertial block majorization minimization framework for nonsmooth nonconvex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the O(epsilon^(-7/4)) Complexity&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Huan Li, Zhouchen Lin&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces a restarted accelerated gradient descent method for nonconvex optimization, eliminating polylogarithmic factors.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Preconditioned Gradient Descent for Overparameterized Nonconvex Burer-Monteiro Factorization with Global Optimality Certification&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Gavin Zhang, Salar Fattahi, Richard Y. Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops preconditioned gradient descent for nonconvex Burer-Monteiro factorization with global optimality guarantees.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Zeroth-Order Alternating Gradient Descent Ascent Algorithms for A Class of Nonconvex-Nonconcave Minimax Problems&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zi Xu, Zi-Qi Wang, Jun-Lin Wang, Yu-Hong Dai&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes zeroth-order alternating gradient descent ascent for nonconvex-nonconcave minimax problems.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="stochastic-optimization"&gt;
 Stochastic Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#stochastic-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on stochastic optimization methods, including gradient descent, proximal point methods, and continuous-time approaches.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Convergence of Stochastic Gradient Descent with Bandwidth-Based Step Size&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Xiaoyu Wang, Ya-xiang Yuan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes convergence of stochastic gradient descent with bandwidth-based step sizes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic Optimization under Distributional Drift&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Joshua Cutler, Dmitriy Drusvyatskiy, Zaid Harchaoui&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies stochastic optimization under distributional drift with theoretical guarantees.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Improved Powered Stochastic Optimization Algorithms for Large-Scale Machine Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zhuang Yang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes improved powered stochastic optimization algorithms for large-scale machine learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sharper Analysis for Minibatch Stochastic Proximal Point Methods: Stability, Smoothness, and Deviation&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Xiao-Tong Yuan, Ping Li&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides a sharper analysis of minibatch stochastic proximal point methods, focusing on stability and smoothness.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Continuous-Time Stochastic Gradient Descent Method for Continuous Data&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Kexin Jin, Jonas Latz, Chenguang Liu, Carola-Bibiane Schönlieb&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces a continuous-time stochastic gradient descent method for continuous data optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sensitivity-Free Gradient Descent Algorithms&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ion Matei, Maksym Zhenirovskyy, Johan de Kleer, John Maxwell&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops sensitivity-free gradient descent algorithms for stochastic optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="distributeddecentralized-optimization"&gt;
 Distributed/Decentralized Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#distributeddecentralized-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing distributed or decentralized optimization algorithms, focusing on federated learning, asynchronous updates, and network topology.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Decentralized Learning: Theoretical Optimality and Practical Improvements&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yucheng Lu, Christopher De Sa&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes theoretical optimality and practical improvements for decentralized learning algorithms.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A General Theory for Federated Optimization with Asynchronous and Heterogeneous Clients Updates&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yann Fraboni, Richard Vidal, Laetitia Kameni, Marco Lorenzi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides a general theory for federated optimization with asynchronous and heterogeneous client updates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Buffered Asynchronous SGD for Byzantine Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yi-Rui Yang, Wu-Jun Li&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes buffered asynchronous SGD for Byzantine-resilient distributed learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Minimax Estimation for Personalized Federated Learning: An Alternative Between FedAvg and Local Training&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Shuxiao Chen, Qinqing Zheng, Qi Long, Weijie J. Su&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates minimax estimation for personalized federated learning, comparing FedAvg and local training.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Kun Yuan, Sulaiman A. Alghunaim, Xinmeng Huang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Shows that removing the influence of data heterogeneity improves the dependence of decentralized SGD on network topology.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Multi-Consensus Decentralized Accelerated Gradient Descent&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Haishan Ye, Luo Luo, Ziang Zhou, Tong Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops multi-consensus decentralized accelerated gradient descent for distributed optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accelerated Primal-Dual Mirror Dynamics for Centralized and Distributed Constrained Convex Optimization Problems&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: You Zhao, Xiaofeng Liao, Xing He, Mingliang Zhou, Chaojie Li&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes accelerated primal-dual mirror dynamics for centralized and distributed convex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Beyond Spectral Gap: The Role of the Topology in Decentralized Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Thijs Vogels, Hadrien Hendrikx, Martin Jaggi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Examines the role of network topology in decentralized learning optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="bandits-and-online-learning"&gt;
 Bandits and Online Learning&lt;span class="heading__anchor"&gt; &lt;a href="#bandits-and-online-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing multi-armed bandits, online optimization, and regret minimization.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Adaptation to the Range in K-Armed Bandits&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Hédi Hadiji, Gilles Stoltz&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies adaptation to the range in K-armed bandit problems with regret minimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Wenhao Li, Ningyuan Chen, L. Jeff Hong&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes dimension reduction techniques for contextual online learning with nonparametric variable selection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Non-Stationary Online Learning with Memory and Non-Stochastic Control&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Peng Zhao, Yu-Hu Yan, Yu-Xiang Wang, Zhi-Hua Zhou&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates non-stationary online learning with memory and non-stochastic control strategies.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Online Non-Stochastic Control with Partial Feedback&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops online non-stochastic control methods with partial feedback for optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yasin Abbasi-Yadkori, András György, Nevena Lazić&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes dynamic regret in non-stationary stochastic bandit problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A PDE Approach for Regret Bounds under Partial Monitoring&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Erhan Bayraktar, Ibrahim Ekren, Xin Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Uses a PDE-based approach to derive regret bounds for partial monitoring in online learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Continuous-in-Time Limit for Bayesian Bandits&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yuhua Zhu, Zachary Izzo, Lexing Ying&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Explores the continuous-time limit for Bayesian bandit algorithms with theoretical guarantees.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Bandit Problems with Fidelity Rewards&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Gábor Lugosi, Ciara Pike-Burke, Pierre-André Savalle&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies bandit problems with fidelity rewards, focusing on regret minimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Linear Partial Monitoring for Sequential Decision Making: Algorithms, Regret Bounds and Applications&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Johannes Kirschner, Tor Lattimore, Andreas Krause&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops algorithms and regret bounds for linear partial monitoring in sequential decision-making.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="optimization-in-reinforcement-learning"&gt;
 Optimization in Reinforcement Learning&lt;span class="heading__anchor"&gt; &lt;a href="#optimization-in-reinforcement-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on optimization techniques for reinforcement learning, including actor-critic methods and constrained RL.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reinforcement Learning for Joint Optimization of Multiple Rewards&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Mridul Agarwal, Vaneet Aggarwal&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies reinforcement learning for jointly optimizing multiple rewards.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Provably Sample-Efficient Model-Free Algorithm for MDPs with Peak Constraints&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Qinbo Bai, Vaneet Aggarwal, Ather Gattami&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a sample-efficient model-free algorithm for MDPs with peak constraints.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Off-Policy Actor-Critic with Emphatic Weightings&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops off-policy actor-critic methods with emphatic weightings for RL optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ali Devran Kara, Naci Saldi, Serdar Yüksel&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes the convergence and near-optimality of Q-learning for MDPs with general state and action spaces, using quantization under weak continuity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Kaiqing Zhang, Sham M. Kakade, Tamer Başar, Lin F. Yang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;F2A2: Flexible Fully-Decentralized Approximate Actor-Critic for Cooperative Multi-Agent Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Wenhao Li, Bo Jin, Xiangfeng Wang, Junchi Yan, Hongyuan Zha&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a flexible fully-decentralized approximate actor-critic method for cooperative multi-agent RL.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Adaptation Augmented Model-Based Policy Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jian Shen, Hang Lai, Minghuan Liu, Han Zhao, Yong Yu, Weinan Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces adaptation-augmented model-based policy optimization for RL.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Single Timescale Actor-Critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Mo Zhou, Jianfeng Lu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops a single-timescale actor-critic method for the linear quadratic regulator with convergence guarantees.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Convex Reinforcement Learning in Finite Trials&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Mirco Mutti, Riccardo De Santi, Piersilvio De Bartolomeis, Marcello Restelli&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates convex reinforcement learning with finite trials, focusing on optimization techniques.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a variational primal-dual policy optimization method for constrained RL.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Instance-Dependent Confidence and Early Stopping for Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Eric Xia, Koulik Khamaru, Martin J. Wainwright, Michael I. Jordan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops instance-dependent confidence bounds and early stopping strategies for RL optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="other-optimization-topics"&gt;
 Other Optimization Topics&lt;span class="heading__anchor"&gt; &lt;a href="#other-optimization-topics"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers covering miscellaneous optimization topics, including Riemannian optimization, matrix completion, and optimal transport.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Relaxed Inertial Forward-Backward-Forward Algorithm for Solving Monotone Inclusions with Application to GANs&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Radu I. Bot, Michael Sedlmayer, Phan Tu Vuong&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a relaxed inertial forward-backward-forward algorithm for monotone inclusions with applications to GANs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Discrete Variational Calculus for Accelerated Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Cédric M. Campos, Alejandro Mahillo, David Martín de Diego&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Uses discrete variational calculus to derive accelerated optimization methods.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Online Optimization over Riemannian Manifolds&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Xi Wang, Zhipeng Tu, Yiguang Hong, Yingyi Wu, Guodong Shi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops online optimization algorithms over Riemannian manifolds.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fast Objective &amp;amp; Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zhishuai Guo, Yan Yan, Zhuoning Yuan, Tianbao Yang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Establishes fast convergence of the objective and the duality gap for non-convex strongly-concave min-max problems under the Polyak-Łojasiewicz condition.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Hamid Reza Feyzmahdavian, Mikael Johansson&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides new sequence results and sharper guarantees for asynchronous optimization iterations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Proximal ID Algorithm&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ilya Shpitser, Zach Wood-Doughty, Eric J. Tchetgen Tchetgen&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces the proximal ID algorithm, which combines proximal causal inference with the ID algorithm to identify causal effects in the presence of unobserved confounders.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural Network with Group Sparsity&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Wei Liu, Xin Liu, Xiaojun Chen&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops an inexact augmented Lagrangian algorithm for training leaky ReLU networks with group sparsity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Optimality of Nuclear-Norm-Based Matrix Completion for Problems with Smooth Non-Linear Structure&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yunhua Xiang, Tianyu Zhang, Xu Wang, Ali Shojaie, Noah Simon&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies nuclear-norm-based matrix completion for problems with smooth nonlinear structures.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Importance Sparsification for Sinkhorn Algorithm&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Mengyu Li, Jun Yu, Tao Li, Cheng Meng&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes importance sparsification techniques for the Sinkhorn algorithm in optimal transport.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Near-Optimal Weighted Matrix Completion&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Oscar López&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates near-optimal weighted matrix completion using optimization techniques.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Implicit Regularization and Entrywise Convergence of Riemannian Optimization for Low Tucker-Rank Tensor Completion&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Haifeng Wang, Jinchi Chen, Ke Wei&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes implicit regularization and entrywise convergence in Riemannian optimization for low Tucker-rank tensor completion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On Unbalanced Optimal Transport: Gradient Methods, Sparsity and Approximation Error&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Quang Minh Nguyen, Hoang H. Nguyen, Yi Zhou, Lam M. Nguyen&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies gradient methods for unbalanced optimal transport, focusing on sparsity and approximation error.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;</description></item><item><title>Optimization Research Papers in JMLR Volume 23</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v23/</link><pubDate>Thu, 29 Sep 2022 00:00:00 +0000</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v23/</guid><description>&lt;h1 class="heading" id="optimization-research-papers-in-jmlr-volume-23-2022"&gt;
 Optimization Research Papers in JMLR Volume 23 (2022)&lt;span class="heading__anchor"&gt; &lt;a href="#optimization-research-papers-in-jmlr-volume-23-2022"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h1&gt;&lt;p&gt;This document lists papers from JMLR Volume 23 (2022) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.&lt;/p&gt;
&lt;h2 class="heading" id="convex-optimization"&gt;
 Convex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#convex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing convex optimization problems, including sparse PCA, L1-regularized SVMs, and metric-constrained problems.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops convex optimization techniques for large-scale sparse principal component analysis with certifiable near-optimal solutions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Novel Min-Max Reformulations of Linear Inverse Problems&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Mohammed Rayyan Sheriff, Debasish Chatterjee&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes min-max reformulations for linear inverse problems using convex optimization frameworks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New Insights for the Multivariate Square-Root Lasso&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Aaron J. Molstad&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes the square-root Lasso in multivariate settings, focusing on its convex optimization properties.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Towards An Efficient Approach for the Nonconvex ℓp Ball Projection: Algorithm and Analysis&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Xiangyu Yang, Jiashan Wang, Hao Wang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops and analyzes an efficient algorithm for Euclidean projection onto the nonconvex ℓp ball (0 &amp;lt; p &amp;lt; 1).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Solving L1-Regularized SVMs and Related Linear Programs: Revisiting the Effectiveness of Column and Constraint Generation&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Antoine Dedieu, Rahul Mazumder, Haoyue Wang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates L1-regularized SVMs using convex optimization with column and constraint generation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Extensions to the Proximal Distance Method of Constrained Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Alfonso Landeros, Oscar Hernan Madrid Padilla, Hua Zhou, Kenneth Lange&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Extends the proximal distance method for constrained convex optimization problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic Subgradient for Composite Convex Optimization with Functional Constraints&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ion Necoara, Nitesh Kumar Singh&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes stochastic subgradient methods for composite convex optimization with functional constraints.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On Regularized Square-Root Regression Problems: Distributionally Robust Interpretation and Fast Computations&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Hong T.M. Chu, Kim-Chuan Toh, Yangjing Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies regularized square-root regression with a distributionally robust perspective and efficient computational methods.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Project and Forget: Solving Large-Scale Metric Constrained Problems&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Rishi Sonthalia, Anna C. Gilbert&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a convex optimization approach for large-scale metric-constrained problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Faster Randomized Interior Point Methods for Tall/Wide Linear Programs&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Agniva Chowdhury, Gregory Dexter, Palma London, Haim Avron, Petros Drineas&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops randomized interior point methods for efficient optimization of tall/wide linear programs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="nonconvex-optimization"&gt;
 Nonconvex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#nonconvex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers tackling nonconvex optimization, focusing on optimality, stability, and convergence in nonsmooth and game settings.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimality and Stability in Non-Convex Smooth Games&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Guojun Zhang, Pascal Poupart, Yaoliang Yu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes optimality and stability in nonconvex smooth games with convergence guarantees.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zhize Li, Jian Li&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes simple and optimal stochastic gradient methods for nonsmooth, nonconvex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Oracle Complexity in Nonsmooth Nonconvex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Guy Kornowski, Ohad Shamir&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies the oracle complexity of nonsmooth nonconvex optimization problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Distributed Stochastic Gradient Descent: Nonconvexity, Nonsmoothness, and Convergence to Local Minima&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Brian Swenson, Ryan Murray, H. Vincent Poor, Soummya Kar&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates distributed SGD for nonconvex, nonsmooth optimization with convergence to local minima.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="stochastic-optimization"&gt;
 Stochastic Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#stochastic-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on stochastic optimization methods, including bundle methods, zeroth-order algorithms, and adaptive techniques.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Stochastic Bundle Method for Interpolation&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Alasdair Paren, Leonard Berrada, Rudra P. K. Poudel, M. Pawan Kumar&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces a stochastic bundle method for training models that can interpolate the training data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On Biased Stochastic Gradient Estimation&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Derek Driggs, Jingwei Liang, Carola-Bibiane Schönlieb&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes biases in stochastic gradient estimation and their impact on optimization performance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to Minimax Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes accelerated zeroth-order and first-order momentum methods for both minimization and minimax optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic Zeroth-Order Optimization under Nonstationarity and Nonconvexity&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, Prasant Mohapatra&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies zeroth-order optimization in nonstationary and nonconvex settings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accelerating Adaptive Cubic Regularization of Newton’s Method via Random Sampling&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Xi Chen, Bo Jiang, Tianyi Lin, Shuzhong Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Accelerates the adaptive cubic regularization of Newton’s method using random sampling.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Momentumized, Adaptive, Dual Averaged Gradient Method&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Aaron Defazio, Samy Jelassi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops MADGRAD, an adaptive gradient method that combines momentum with dual averaging for stochastic optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic DCA with Variance Reduction and Applications in Machine Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Hoai An Le Thi, Hoang Phuc Hau Luu, Hoai Minh Le, Tao Pham Dinh&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces a stochastic difference-of-convex-functions algorithm with variance reduction for machine learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Alireza Fallah, Mert Gürbüzbalaban, Asuman Ozdaglar, Umut Şimşekli, Lingjiong Zhu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes robust stochastic gradient methods for distributed optimization in multi-agent networks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On Acceleration for Convex Composite Minimization with Noise-Corrupted Gradients and Approximate Proximal Mapping&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Qiang Zhou, Sinno Jialin Pan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Addresses acceleration in convex composite minimization when gradients are noise-corrupted and the proximal mapping is computed only approximately.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Asymptotic Study of Stochastic Adaptive Algorithms in Non-Convex Landscape&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Sébastien Gadat, Ioana Gavra&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes the asymptotic behavior of stochastic adaptive algorithms in nonconvex settings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Congliang Chen, Li Shen, Fangyu Zou, Wei Liu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies the Adam optimizer, focusing on nonconvexity, convergence, and mini-batch acceleration.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;An Efficient Sampling Algorithm for Non-Smooth Composite Potentials&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, Peter L. Bartlett&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops an efficient algorithm for sampling from distributions with nonsmooth composite potentials.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SGD with Coordinate Sampling: Theory and Practice&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Rémi Leluc, François Portier&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Explores coordinate sampling in stochastic gradient descent with theoretical and practical insights.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="distributeddecentralized-optimization"&gt;
 Distributed/Decentralized Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#distributeddecentralized-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing distributed or decentralized optimization algorithms, focusing on communication efficiency and convergence.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Asymptotic Network Independence and Step-Size for a Distributed Subgradient Method&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Alex Olshevsky&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes asymptotic network independence and step-size choice for a distributed subgradient method.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Projection-Free Distributed Online Learning with Sublinear Communication Complexity&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yuanyu Wan, Guanghui Wang, Wei-Wei Tu, Lijun Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops projection-free algorithms for distributed online learning with reduced communication complexity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Huan Li, Zhouchen Lin, Yongchun Fang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes variance-reduced variants of EXTRA and DIGing, with optimal acceleration, for strongly convex decentralized optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="submodular-optimization"&gt;
 Submodular Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#submodular-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on submodular optimization, particularly in model selection.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Joint Continuous and Discrete Model Selection via Submodularity&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jonathan Bunton, Paulo Tabuada&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Uses submodularity for joint continuous and discrete model selection in optimization.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="bandits-and-online-learning"&gt;
 Bandits and Online Learning&lt;span class="heading__anchor"&gt; &lt;a href="#bandits-and-online-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing multi-armed bandits, online optimization, and regret minimization.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies multi-agent online optimization with delays, focusing on asynchronicity and optimism.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Online Mirror Descent and Dual Averaging: Keeping Pace in the Dynamic Case&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Huang Fang, Nicholas J. A. Harvey, Victor S. Portella, Michael P. Friedlander&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes online mirror descent and dual averaging for dynamic online optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;No Weighted-Regret Learning in Adversarial Bandits with Delays&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates regret minimization in adversarial bandits with delays.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;KL-UCB-Switch: Optimal Regret Bounds for Stochastic Bandits from Both a Distribution-Dependent and a Distribution-Free Viewpoints&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Aurélien Garivier, Hédi Hadiji, Pierre Ménard, Gilles Stoltz&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces KL-UCB-Switch, which attains optimal distribution-dependent and distribution-free regret bounds for stochastic bandits.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Multi-Agent Multi-Armed Bandits with Limited Communication&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Mridul Agarwal, Vaneet Aggarwal, Kamyar Azizzadenesheli&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Explores multi-agent bandits with limited communication, focusing on regret minimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Nonstochastic Bandits with Composite Anonymous Feedback&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Claudio Gentile, Yishay Mansour&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies nonstochastic bandits with composite anonymous feedback and analyzes the resulting regret.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Expected Regret and Pseudo-Regret are Equivalent When the Optimal Arm is Unique&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Daron Anderson, Douglas J. Leith&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proves that expected regret and pseudo-regret are equivalent when the optimal arm is unique.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="bayesian-and-hyperparameter-optimization"&gt;
 Bayesian and Hyperparameter Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#bayesian-and-hyperparameter-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing Bayesian optimization and hyperparameter tuning for efficient optimization.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Marius Lindauer, Katharina Eggensperger, Matthias Feurer, André Biedenkapp, Difan Deng, Carolin Benjamins, Tim Ruhkopf, René Sass, Frank Hutter&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Presents SMAC3, a versatile Bayesian optimization package for hyperparameter tuning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Implicit Differentiation for Fast Hyperparameter Selection in Non-Smooth Convex Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Quentin Bertrand, Quentin Klopfenstein, Mathurin Massias, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Uses implicit differentiation for efficient hyperparameter selection in nonsmooth convex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Auto-Sklearn 2.0: Hands-Free AutoML via Meta-Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, Frank Hutter&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces Auto-Sklearn 2.0, an AutoML system that uses meta-learning to make automated machine learning hands-free.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="optimization-in-reinforcement-learning"&gt;
 Optimization in Reinforcement Learning&lt;span class="heading__anchor"&gt; &lt;a href="#optimization-in-reinforcement-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on optimization techniques for reinforcement learning, including policy gradient and value estimation.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Generalized Projected Bellman Error for Off-Policy Value Estimation in Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Andrew Patterson, Adam White, Martha White&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops optimization methods for off-policy value estimation using a generalized projected Bellman error.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates greedification operators for policy optimization, comparing the forward and reverse KL divergences.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yanwei Jia, Xun Yu Zhou&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes policy gradient and actor-critic methods for continuous-time RL optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Convergence Rates of Policy Gradient Methods&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Lin Xiao&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies convergence rates of policy gradient methods in reinforcement learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor-Critic under State Distribution Mismatch&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Shangtong Zhang, Remi Tachet des Combes, Romain Laroche&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Examines global optimality and finite-sample behavior of softmax off-policy actor-critic under state distribution mismatch.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="other-optimization-topics"&gt;
 Other Optimization Topics&lt;span class="heading__anchor"&gt; &lt;a href="#other-optimization-topics"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers covering miscellaneous optimization topics, including proximal algorithms, tensor completion, and learning-to-optimize frameworks.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;TFPnP: Tuning-Free Plug-and-Play Proximal Algorithms with Applications to Inverse Imaging Problems&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Kaixuan Wei, Angelica Aviles-Rivero, Jingwei Liang, Ying Fu, Hua Huang, Carola-Bibiane Schönlieb&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces tuning-free proximal algorithms for inverse imaging problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Complexity of Approximating Multimarginal Optimal Transport&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Tianyi Lin, Nhat Ho, Marco Cuturi, Michael I. Jordan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes the complexity of approximating multimarginal optimal transport problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Riemannian Stochastic Proximal Gradient Methods for Nonsmooth Optimization over the Stiefel Manifold&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Bokun Wang, Shiqian Ma, Lingzhou Xue&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes stochastic proximal gradient methods for nonsmooth optimization on the Stiefel manifold.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Provable Tensor-Train Format Tensor Completion by Riemannian Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jian-Feng Cai, Jingyang Li, Dong Xia&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops Riemannian optimization for tensor-train format tensor completion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Let&amp;rsquo;s Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Julie Nutini, Issam Laradji, Mark Schmidt&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Speeds up block coordinate descent through faster greedy selection rules and message-passing updates, with active-set complexity and superlinear convergence results.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Efficiency of Entropic Regularized Algorithms for Optimal Transport&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Tianyi Lin, Nhat Ho, Michael I. Jordan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies the efficiency of entropic-regularized algorithms for optimal transport.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Explicit Convergence Rates of Greedy and Random Quasi-Newton Methods&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Dachao Lin, Haishan Ye, Zhihua Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides explicit convergence rates for greedy and random quasi-Newton methods.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Tian Tong, Cong Ma, Ashley Prater-Bennette, Erin Tripp, Yuejie Chi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Addresses nonconvex low-rank tensor estimation with provable guarantees.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Learning to Optimize: A Primer and A Benchmark&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Tianlong Chen, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zhangyang Wang, Wotao Yin&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides a primer and benchmark for learning-to-optimize techniques.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Clustering with Semidefinite Programming and Fixed Point Iteration&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Pedro Felzenszwalb, Caroline Klivans, Alice Paul&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Uses semidefinite programming and fixed-point iteration for clustering optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Bregman Learning Framework for Sparse Neural Networks&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces a Bregman learning framework for optimizing sparse neural networks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;When is the Convergence Time of Langevin Algorithms Dimension Independent? A Composite Optimization Viewpoint&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yoav Freund, Yi-An Ma, Tong Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes dimension-independent convergence of Langevin algorithms from a composite optimization perspective.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sparse Continuous Distributions and Fenchel-Young Losses&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Explores sparse continuous distributions using Fenchel-Young losses for optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Handling Hard Affine SDP Shape Constraints in RKHSs&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Pierre-Cyril Aubin-Frankowski, Zoltan Szabo&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Addresses affine SDP constraints in reproducing kernel Hilbert spaces for optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;OMLT: Optimization &amp;amp; Machine Learning Toolkit&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Francesco Ceccon, Jordan Jalving, Joshua Haddad, Alexander Thebelt, Calvin Tsay, Carl D. Laird, Ruth Misener&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Presents OMLT, a Python package for embedding trained machine learning models, such as neural networks and gradient-boosted trees, in optimization formulations.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;</description></item><item><title>Optimization Research Papers in JMLR Volume 22</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v22/</link><pubDate>Wed, 29 Sep 2021 00:00:00 +0000</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v22/</guid><description>&lt;h1 class="heading" id="optimization-research-papers-in-jmlr-volume-22-2021"&gt;
 Optimization Research Papers in JMLR Volume 22 (2021)&lt;span class="heading__anchor"&gt; &lt;a href="#optimization-research-papers-in-jmlr-volume-22-2021"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h1&gt;&lt;p&gt;This document lists papers from JMLR Volume 22 (2021) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.&lt;/p&gt;
&lt;h2 class="heading" id="convex-optimization"&gt;
 Convex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#convex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing convex optimization problems, including clustering, Wasserstein barycenters, sparse optimization, and bandits.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Convex Clustering: Model, Theoretical Guarantee and Efficient Algorithm&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Defeng Sun, Kim-Chuan Toh, Yancheng Yuan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a convex clustering model with theoretical guarantees and an efficient algorithm.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Fast Globally Linearly Convergent Algorithm for the Computation of Wasserstein Barycenters&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Lei Yang, Jia Li, Defeng Sun, Kim-Chuan Toh&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops a fast, globally linearly convergent algorithm for computing Wasserstein barycenters.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Wasserstein Barycenters Can Be Computed in Polynomial Time in Fixed Dimension&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jason M. Altschuler, Enric Boix-Adsera&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Demonstrates that Wasserstein barycenters can be computed in polynomial time when the dimension is fixed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;From Low Probability to High Confidence in Stochastic Convex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes methods to achieve high-confidence solutions in stochastic convex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sparse and Smooth Signal Estimation: Convexification of L0-Formulations&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Alper Atamturk, Andres Gomez, Shaoning Han&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes convexification techniques for L0-formulations in sparse and smooth signal estimation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic Proximal AUC Maximization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yunwen Lei, Yiming Ying&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops stochastic proximal methods for maximizing the area under the ROC curve (AUC) in convex settings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sparse Convex Optimization via Adaptively Regularized Hard Thresholding&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Kyriakos Axiotis, Maxim Sviridenko&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces adaptively regularized hard thresholding for sparse convex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Explores continuous and mixed-integer optimization approaches for learning sparse classifiers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;First-Order Convergence Theory for Weakly-Convex-Weakly-Concave Min-max Problems&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Mingrui Liu, Hassan Rafique, Qihang Lin, Tianbao Yang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides first-order convergence theory for weakly-convex-weakly-concave min-max problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Convex Geometry and Duality of Over-parameterized Neural Networks&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Tolga Ergen, Mert Pilanci&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes convex geometry and duality in over-parameterized neural networks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Linear Bandits on Uniformly Convex Sets&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Thomas Kerdreux, Christophe Roux, Alexandre d&amp;rsquo;Aspremont, Sebastian Pokutta&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies linear bandits on uniformly convex sets, focusing on convex optimization techniques.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="nonconvex-optimization"&gt;
 Nonconvex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#nonconvex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers tackling nonconvex optimization, including stochastic gradient descent, neural network training, and stability properties.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Online Stochastic Gradient Descent on Non-Convex Losses from High-Dimensional Inference&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes online stochastic gradient descent for nonconvex losses in high-dimensional inference.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Non-attracting Regions of Local Minima in Deep and Wide Neural Networks&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Henning Petzka, Cristian Sminchisescu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates non-attracting regions of local minima in deep and wide neural networks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;When Does Gradient Descent with Logistic Loss Find Interpolating Two-Layer Networks?&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Examines conditions under which gradient descent with logistic loss finds interpolating two-layer networks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Replica Exchange for Non-Convex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jing Dong, Xin T. Tong&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes replica exchange methods for nonconvex optimization problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Failures of Model-Dependent Generalization Bounds for Least-Norm Interpolation&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Peter L. Bartlett, Philip M. Long&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes limitations of model-dependent generalization bounds in least-norm interpolation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Stability Properties and the Optimization Landscape of Training Problems with Squared Loss for Neural Networks and General Nonlinear Conic Approximation Schemes&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Constantin Christof&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies stability and optimization landscapes for neural network training with squared loss.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="stochastic-optimization"&gt;
 Stochastic Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#stochastic-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on stochastic optimization methods, including momentum, Langevin dynamics, and communication-efficient algorithms.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Continuous Time Analysis of Momentum Methods&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Nikola B. Kovachki, Andrew M. Stuart&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides a continuous-time analysis of momentum-based optimization methods.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Generalization Performance of Multi-pass Stochastic Gradient Descent with Convex Loss Functions&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yunwen Lei, Ting Hu, Ke Tang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes generalization performance of multi-pass stochastic gradient descent for convex losses.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Wenlong Mou, Yi-An Ma, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops an accelerated MCMC algorithm using high-order Langevin diffusion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Path Length Bounds for Gradient Descent and Flow&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Chirag Gupta, Sivaraman Balakrishnan, Aaditya Ramdas&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Establishes bounds on the path length of gradient descent and gradient flow.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Michael Muehlebach, Michael I. Jordan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes momentum-based optimization from dynamical, control-theoretic, and symplectic perspectives.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;L-SVRG and L-Katyusha with Arbitrary Sampling&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Xun Qian, Zheng Qu, Peter Richtárik&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops and analyzes the loopless variance-reduced methods L-SVRG and L-Katyusha under arbitrary sampling.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Lyapunov Analysis of Accelerated Methods in Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ashia C. Wilson, Ben Recht, Michael I. Jordan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides a Lyapunov analysis for accelerated optimization methods.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;NUQSGD: Provably Communication-Efficient Data-Parallel SGD via Nonuniform Quantization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes NUQSGD, a communication-efficient stochastic gradient descent method using nonuniform quantization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;An Inertial Newton Algorithm for Deep Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Camille Castera, Jérôme Bolte, Cédric Févotte, Edouard Pauwels&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops an inertial Newton algorithm for deep learning optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Tian Tong, Cong Ma, Yuejie Chi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes scaled gradient descent for accelerating ill-conditioned low-rank matrix estimation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On ADMM in Deep Learning: Convergence and Saturation-Avoidance&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jinshan Zeng, Shao-Bo Lin, Yuan Yao, Ding-Xuan Zhou&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes convergence and saturation-avoidance properties of ADMM in deep learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Unified Convergence Analysis for Shuffling-Type Gradient Methods&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Lam M. Nguyen, Quoc Tran-Dinh, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides a unified convergence analysis for shuffling-type gradient methods.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic Online Optimization Using Kalman Recursion&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Joseph de Vilmarest, Olivier Wintenberger&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Applies Kalman recursion to stochastic online optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Expanding Boundaries of Gap Safe Screening&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Cassio F. Dantas, Emmanuel Soubies, Cédric Févotte&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Extends Gap Safe screening rules to a broader class of optimization problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Consensus-Based Optimization on the Sphere: Convergence to Global Minimizers and Machine Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Massimo Fornasier, Lorenzo Pareschi, Hui Huang, Philippe Sünnen&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops consensus-based optimization on the sphere with applications to machine learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Decentralized Stochastic Gradient Langevin Dynamics and Hamiltonian Monte Carlo&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Mert Gürbüzbalaban, Xuefeng Gao, Yuanhan Hu, Lingjiong Zhu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes decentralized stochastic gradient Langevin dynamics and Hamiltonian Monte Carlo methods.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="distributeddecentralized-optimization"&gt;
 Distributed/Decentralized Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#distributeddecentralized-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing distributed or decentralized optimization algorithms, focusing on communication efficiency and scalability.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Projection-Free Decentralized Online Learning for Submodular Maximization over Time-Varying Networks&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Junlong Zhu, Qingtao Wu, Mingchuan Zhang, Ruijuan Zheng, Keqin Li&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops projection-free decentralized online learning for submodular maximization over time-varying networks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Communication-Efficient Distributed Covariance Sketch, with Application to Distributed PCA&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zengfeng Huang, Xuemin Lin, Wenjie Zhang, Ying Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a communication-efficient distributed covariance sketch for distributed PCA.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimal Rates of Distributed Regression with Imperfect Kernels&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Hongwei Sun, Qiang Wu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Establishes optimal rates for distributed regression with imperfect kernels.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;One-Shot Federated Learning: Theoretical Limits and Algorithms to Achieve Them&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Saber Salehkaleybar, Arsalan Sharifnassab, S. Jamaloddin Golestani&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes theoretical limits and algorithms for one-shot federated learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cooperative SGD: A Unified Framework for the Design and Analysis of Local-Update SGD Algorithms&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jianyu Wang, Gauri Joshi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces a unified framework for designing and analyzing local-update SGD algorithms.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DeEPCA: Decentralized Exact PCA with Linear Convergence Rate&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Haishan Ye, Tong Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops DeEPCA, a decentralized exact PCA method with linear convergence.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="submodular-optimization"&gt;
 Submodular Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#submodular-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers on maximizing submodular and related non-submodular set functions, particularly in experimental design.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Batch Greedy Maximization of Non-Submodular Functions: Guarantees and Applications to Experimental Design&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jayanth Jagalur-Mohan, Youssef Marzouk&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides guarantees for batch greedy maximization of non-submodular functions with applications to experimental design.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="bandits-and-online-learning"&gt;
 Bandits and Online Learning&lt;span class="heading__anchor"&gt; &lt;a href="#bandits-and-online-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing multi-armed bandits, online optimization, and regret minimization.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Regulating Greed Over Time in Multi-Armed Bandits&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Stefano Tracà, Cynthia Rudin, Weiyu Yan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies methods to regulate greed over time in multi-armed bandits.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Preference-Based Online Learning with Dueling Bandits: A Survey&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Viktor Bengs, Róbert Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Surveys preference-based online learning with dueling bandits.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On Multi-Armed Bandit Designs for Dose-Finding Trials&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Maryam Aziz, Emilie Kaufmann, Marie-Karelle Riviere&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Explores multi-armed bandit designs for dose-finding trials.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Julian Zimmert, Yevgeny Seldin&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes Tsallis-INF, an optimal algorithm for stochastic and adversarial bandits.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Bandit Convex Optimization in Non-Stationary Environments&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Peng Zhao, Guanghui Wang, Lijun Zhang, Zhi-Hua Zhou&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Addresses bandit convex optimization in non-stationary environments.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Contextual Bandit Bake-off&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Alberto Bietti, Alekh Agarwal, John Langford&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Compares contextual bandit algorithms in a comprehensive evaluation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;MetaGrad: Adaptation Using Multiple Learning Rates in Online Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Tim van Erven, Wouter M. Koolen, Dirk van der Hoeven&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces MetaGrad, an adaptive online learning algorithm with multiple learning rates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Achieving Fairness in the Stochastic Multi-Armed Bandit Problem&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Vishakha Patil, Ganesh Ghalme, Vineet Nair, Y. Narahari&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops methods for achieving fairness in stochastic multi-armed bandits.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Refined Approachability Algorithms and Application to Regret Minimization with Global Costs&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Joon Kwon&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes refined approachability algorithms for regret minimization with global costs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Bandit Learning in Decentralized Matching Markets&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Lydia T. Liu, Feng Ruan, Horia Mania, Michael I. Jordan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Applies bandit learning to decentralized matching markets.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Thompson Sampling Algorithms for Cascading Bandits&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops Thompson sampling algorithms for cascading bandits.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fast Learning for Renewal Optimization in Online Task Scheduling&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Michael J. Neely&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes fast learning methods for renewal optimization in online task scheduling.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="bayesian-and-hyperparameter-optimization"&gt;
 Bayesian and Hyperparameter Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#bayesian-and-hyperparameter-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing Bayesian optimization and hyperparameter tuning for scalable and robust optimization.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;An Empirical Study of Bayesian Optimization: Acquisition Versus Partition&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Erich Merrill, Alan Fern, Xiaoli Fern, Nima Dolatnia&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Conducts an empirical study comparing acquisition and partition strategies in Bayesian optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hyperparameter Optimization via Sequential Uniform Designs&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zebin Yang, Aijun Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes sequential uniform designs for hyperparameter optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Are We Forgetting about Compositional Optimisers in Bayesian Optimisation?&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Antoine Grosnit, Alexander I. Cowen-Rivers, Rasul Tutunov, Ryan-Rhys Griffiths, Jun Wang, Haitham Bou-Ammar&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Explores the role of compositional optimizers in Bayesian optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GIBBON: General-Purpose Information-Based Bayesian Optimisation&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Henry B. Moss, David S. Leslie, Javier Gonzalez, Paul Rayson&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces GIBBON, a general-purpose information-based Bayesian optimization framework.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On lp-Hyperparameter Learning via Bilevel Nonsmooth Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Takayuki Okuno, Akiko Takeda, Akihiro Kawana, Motokazu Watanabe&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies lp-hyperparameter learning using bilevel nonsmooth optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="optimization-in-reinforcement-learning"&gt;
 Optimization in Reinforcement Learning&lt;span class="heading__anchor"&gt; &lt;a href="#optimization-in-reinforcement-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on optimization techniques for reinforcement learning, including policy iteration and Q-learning.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Safe Policy Iteration: A Monotonically Improving Approximate Policy Iteration Approach&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Alberto Maria Metelli, Matteo Pirotta, Daniele Calandriello, Marcello Restelli&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a safe policy iteration method with monotonic improvement for reinforcement learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes optimality, approximation error, and distribution shift in policy gradient methods.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Langevin Dynamics for Adaptive Inverse Reinforcement Learning of Stochastic Gradient Algorithms&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Vikram Krishnamurthy, George Yin&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Applies Langevin dynamics to adaptive inverse reinforcement learning for stochastic gradient algorithms.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jeongho Kim, Jaeuk Shin, Insoon Yang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops Hamilton-Jacobi deep Q-learning for deterministic continuous-time systems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Partial Policy Iteration for L1-Robust Markov Decision Processes&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces partial policy iteration for L1-robust Markov decision processes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Gaussian Approximation for Bias Reduction in Q-Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Carlo D&amp;rsquo;Eramo, Andrea Cini, Alessandro Nuara, Matteo Pirotta, Cesare Alippi, Jan Peters, Marcello Restelli&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes Gaussian approximation techniques for bias reduction in Q-learning.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="other-optimization-topics"&gt;
 Other Optimization Topics&lt;span class="heading__anchor"&gt; &lt;a href="#other-optimization-topics"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers covering miscellaneous optimization topics, including Newton methods, SVM training, and eigenvector computation.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Global and Quadratic Convergence of Newton Hard-Thresholding Pursuit&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Shenglong Zhou, Naihua Xiu, Hou-Duo Qi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes global and quadratic convergence of Newton hard-thresholding pursuit.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Two-Level Decomposition Framework Exploiting First and Second Order Information for SVM Training Problems&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Giulio Galvan, Matteo Lapucci, Chih-Jen Lin, Marco Sciandrone&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a two-level decomposition framework for SVM training using first and second-order information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Approximate Newton Methods&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Haishan Ye, Luo Luo, Zhihua Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops approximate Newton methods for optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Guodong Zhang, Xuchan Bao, Laurent Lessard, Roger Grosse&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides a unified analysis of first-order methods for smooth games using integral quadratic constraints.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LassoNet: A Neural Network with Feature Sparsity&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ismael Lemhadri, Feng Ruan, Louis Abraham, Robert Tibshirani&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces LassoNet, a neural network architecture promoting feature sparsity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;An Algorithmic View of L2 Regularization and Some Path-Following Algorithms&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yunzhang Zhu, Renxiong Liu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Explores L2 regularization from an algorithmic perspective with path-following algorithms.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Ensmallen Library for Flexible Numerical Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Ryan R. Curtin, Marcus Edel, Rahul Ganesh Prabhu, Suryoday Basak, Zhihao Lou, Conrad Sanderson&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces the ensmallen library for flexible numerical optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Black-Box Reductions for Zeroth-Order Gradient Algorithms to Achieve Lower Query Complexity&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Bin Gu, Xiyuan Wei, Shangqian Gao, Ziran Xiong, Cheng Deng, Heng Huang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes black-box reductions for zeroth-order gradient algorithms to reduce query complexity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Riemannian Search for Eigenvector Computation&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zhiqiang Xu, Ping Li&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops Riemannian search methods for eigenvector computation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;</description></item><item><title>Optimization Research Papers in JMLR Volume 21</title><link>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v21/</link><pubDate>Tue, 29 Sep 2020 00:00:00 +0000</pubDate><guid>https://blog.namln.org/en/mathematics/analysis/optimization/jmlr-v21/</guid><description>&lt;h1 class="heading" id="optimization-research-papers-in-jmlr-volume-21-2020"&gt;
 Optimization Research Papers in JMLR Volume 21 (2020)&lt;span class="heading__anchor"&gt; &lt;a href="#optimization-research-papers-in-jmlr-volume-21-2020"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h1&gt;&lt;p&gt;This document lists papers from JMLR Volume 21 (2020) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.&lt;/p&gt;
&lt;h2 class="heading" id="convex-optimization"&gt;
 Convex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#convex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing convex optimization problems, including complexity bounds, convergence analysis, and applications in regression and assortment optimization.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Low Complexity Algorithm with O(√T) Regret and O(1) Constraint Violations for Online Convex Optimization with Long Term Constraints&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Hao Yu, Michael J. Neely&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a low-complexity algorithm for online convex optimization with long-term constraints, achieving O(√T) regret and O(1) constraint violations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Lower Bounds for Parallel and Randomized Convex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Jelena Diakonikolas, Cristóbal Guzmán&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Establishes lower complexity bounds for parallel and randomized algorithms in convex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Discerning the Linear Convergence of ADMM for Structured Convex Optimization through the Lens of Variational Analysis&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Xiaoming Yuan, Shangzhi Zeng, Jin Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes the linear convergence of ADMM for structured convex optimization using variational analysis.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Data Efficient and Feasible Level Set Method for Stochastic Convex Optimization with Expectation Constraints&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Qihang Lin, Selvaprabu Nadarajah, Negar Soheili, Tianbao Yang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops a data-efficient level set method for stochastic convex optimization with expectation constraints.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Conic Optimization for Quadratic Regression Under Sparse Noise&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Igor Molybog, Ramtin Madani, Javad Lavaei&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Applies conic optimization to quadratic regression under sparse noise conditions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Assortment Optimization with Changing Contextual Information&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Xi Chen, Yining Wang, Yuan Zhou&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Addresses dynamic assortment optimization with changing contextual information using convex optimization techniques.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Convex Programming for Estimation in Nonlinear Recurrent Models&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Sohail Bahmani, Justin Romberg&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Uses convex programming for parameter estimation in nonlinear recurrent models.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="nonconvex-optimization"&gt;
 Nonconvex Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#nonconvex-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers tackling nonconvex optimization, focusing on guarantees for local minima, variance reduction, and algorithmic advancements.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Exact Guarantees on the Absence of Spurious Local Minima for Non-negative Rank-1 Robust Principal Component Analysis&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Salar Fattahi, Somayeh Sojoudi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides exact guarantees for the absence of spurious local minima in non-negative rank-1 robust PCA.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic Nested Variance Reduction for Nonconvex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Dongruo Zhou, Pan Xu, Quanquan Gu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces a stochastic nested variance reduction method for nonconvex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Quoc Tran-Dinh&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes ProxSARAH, an efficient framework for stochastic composite nonconvex optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Convergence Rates for the Stochastic Gradient Descent Method for Non-Convex Objective Functions&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Benjamin Fehrman, Benjamin Gess, Arnulf Jentzen&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes convergence rates of stochastic gradient descent for nonconvex objective functions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;AdaGrad Stepsizes: Sharp Convergence Over Nonconvex Landscapes&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Rachel Ward, Xiaoxia Wu, Leon Bottou&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies the sharp convergence of AdaGrad stepsizes over nonconvex landscapes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Sparse Semismooth Newton Based Proximal Majorization-Minimization Algorithm for Nonconvex Square-Root-Loss Regression Problems&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Peipei Tang, Chengjing Wang, Defeng Sun, Kim-Chuan Toh&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops a sparse semismooth Newton-based proximal majorization-minimization algorithm for nonconvex square-root-loss regression.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="stochastic-optimization"&gt;
 Stochastic Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#stochastic-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on stochastic optimization methods, including gradient descent, variance reduction, and robustness to noise.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Convergences of Regularized Algorithms and Stochastic Gradient Methods with Random Projections&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Junhong Lin, Volkan Cevher&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes convergence of regularized algorithms and stochastic gradient methods with random projections.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Dominic Richards, Patrick Rebeschini&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies graph-dependent implicit regularization in distributed stochastic subgradient descent.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Artin Spiridonoff, Alex Olshevsky, Ioannis Ch. Paschalidis&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a robust asynchronous stochastic gradient-push method with asymptotically optimal performance for strongly convex functions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Xi Chen, Simon S. Du, Xin T. Tong&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Investigates stationary-point hitting time and ergodicity in stochastic gradient Langevin dynamics.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Aryan Mokhtari, Hamed Hassani, Amin Karbasi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Extends stochastic conditional gradient methods from convex minimization to submodular maximization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Aryan Mokhtari, Alec Koppel, Martin Takac, Alejandro Ribeiro&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces parallel doubly stochastic algorithms for large-scale learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yao Ma, Alex Olshevsky, Csaba Szepesvari, Venkatesh Saligrama&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Applies gradient descent to sparse rank-one matrix completion for crowd-sourced worker aggregation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral Algorithms&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Junhong Lin, Volkan Cevher&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Establishes optimal convergence rates for distributed learning using stochastic gradient methods and spectral algorithms.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Andrei Kulunchakov, Julien Mairal&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops estimate sequences for stochastic composite optimization with variance reduction and noise robustness.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Unified q-Memorization Framework for Asynchronous Stochastic Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Bin Gu, Wenhan Xian, Zhouyuan Huo, Cheng Deng, Heng Huang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a unified q-memorization framework for asynchronous stochastic optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Asymptotic Analysis via Stochastic Differential Equations of Gradient Descent Algorithms in Statistical and Computational Paradigms&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yazhen Wang, Shang Wu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes gradient descent algorithms using stochastic differential equations in statistical and computational settings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Error-Feedback Framework: SGD with Delayed Gradients&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Sebastian U. Stich, Sai Praneeth Karimireddy&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces an error-feedback framework for stochastic gradient descent with delayed gradients.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="distributedparallel-optimization"&gt;
 Distributed/Parallel Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#distributedparallel-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing distributed or parallel optimization algorithms, focusing on communication efficiency and scalability.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On the Complexity Analysis of the Primal Solutions for the Accelerated Randomized Dual Coordinate Ascent&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Huan Li, Zhouchen Lin&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes the complexity of primal solutions for accelerated randomized dual coordinate ascent in distributed settings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;WONDER: Weighted One-shot Distributed Ridge Regression in High Dimensions&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Edgar Dobriban, Yue Sheng&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes WONDER, a weighted one-shot distributed ridge regression method for high-dimensional data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GADMM: Fast and Communication Efficient Framework for Distributed Machine Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Anis Elgabli, Jihong Park, Amrit S. Bedi, Mehdi Bennis, Vaneet Aggarwal&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces GADMM, a fast and communication-efficient framework for distributed machine learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Communication-Efficient Distributed Optimization in Networks with Gradient Tracking and Variance Reduction&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Boyue Li, Shicong Cen, Yuxin Chen, Yuejie Chi&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops communication-efficient distributed optimization with gradient tracking and variance reduction.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;On Convergence of Distributed Approximate Newton Methods: Globalization, Sharper Bounds and Beyond&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Xiao-Tong Yuan, Ping Li&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes convergence of distributed approximate Newton methods with sharper bounds and globalization techniques.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="submodular-optimization"&gt;
 Submodular Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#submodular-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on submodular optimization, including minimization and maximization problems.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Quadratic Decomposable Submodular Function Minimization: Theory and Practice&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Pan Li, Niao He, Olgica Milenkovic&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies quadratic decomposable submodular function minimization with theoretical and practical insights.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Optimal Algorithms for Continuous Non-monotone Submodular and DR-Submodular Maximization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Rad Niazadeh, Tim Roughgarden, Joshua R. Wang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops optimal algorithms for continuous non-monotone submodular and DR-submodular maximization.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="bayesian-and-hyperparameter-optimization"&gt;
 Bayesian and Hyperparameter Optimization&lt;span class="heading__anchor"&gt; &lt;a href="#bayesian-and-hyperparameter-optimization"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers addressing Bayesian optimization and hyperparameter tuning for scalable and robust optimization.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Kirthevasan Kandasamy, Karun Raju Vysyaraju, Willie Neiswanger, Biswajit Paria, Christopher R. Collins, Jeff Schneider, Barnabas Poczos, Eric P. Xing&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces Dragonfly, a scalable and robust Bayesian optimization framework for hyperparameter tuning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Distributionally Ambiguous Optimization for Batch Bayesian Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Nikitas Rontsis, Michael A. Osborne, Paul J. Goulart&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes distributionally ambiguous optimization for batch Bayesian optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Kalai-Smorodinsky Solution for Many-Objective Bayesian Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Mickael Binois, Victor Picheny, Patrick Taillandier, Abderrahmane Habbal&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Applies the Kalai-Smorodinsky solution to many-objective Bayesian optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Robust Reinforcement Learning with Bayesian Optimisation and Quadrature&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Integrates Bayesian optimization and quadrature for robust reinforcement learning.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="optimization-in-reinforcement-learning"&gt;
 Optimization in Reinforcement Learning&lt;span class="heading__anchor"&gt; &lt;a href="#optimization-in-reinforcement-learning"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers focusing on optimization techniques for policy optimization and reinforcement learning.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Establishes guarantees for derivative-free policy optimization methods in linear quadratic systems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Expected Policy Gradients for Reinforcement Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Kamil Ciosek, Shimon Whiteson&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces expected policy gradients for reinforcement learning optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Importance Sampling Techniques for Policy Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Alberto Maria Metelli, Matteo Papini, Nico Montali, Marcello Restelli&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes importance sampling techniques for efficient policy optimization in reinforcement learning.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 class="heading" id="other-optimization-topics"&gt;
 Other Optimization Topics&lt;span class="heading__anchor"&gt; &lt;a href="#other-optimization-topics"&gt;#&lt;/a&gt;&lt;/span&gt;
&lt;/h2&gt;&lt;p&gt;Papers covering miscellaneous optimization topics, including dictionary learning, neural network verification, and differential privacy.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Learning with Fenchel-Young Losses&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Mathieu Blondel, André F.T. Martins, Vlad Niculae&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces Fenchel-Young losses, a generic construction of convex loss functions for regularized prediction functions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Branch and Bound for Piecewise Linear Neural Network Verification&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Rudy Bunel, Jingyue Lu, Ilker Turkaslan, Philip H.S. Torr, Pushmeet Kohli, M. Pawan Kumar&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Applies branch and bound techniques for piecewise linear neural network verification.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Conjugate Gradients for Kernel Machines&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Simon Bartels, Philipp Hennig&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops conjugate gradient methods for optimization in kernel machines.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unique Sharp Local Minimum in L1-Minimization Complete Dictionary Learning&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yu Wang, Siqi Wu, Bin Yu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes unique sharp local minima in L1-minimization for complete dictionary learning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Community-Based Group Graphical Lasso&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Eugen Pircalabelu, Gerda Claeskens&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a community-based group graphical Lasso for structured optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Constrained Dynamic Programming and Supervised Penalty Learning Algorithms for Peak Detection in Genomic Data&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Toby Dylan Hocking, Guillem Rigaill, Paul Fearnhead, Guillaume Bourque&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops constrained dynamic programming and supervised penalty learning for peak detection in genomic data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Loss Control with Rank-One Covariance Estimate for Short-Term Portfolio Optimization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Zhao-Rong Lai, Liming Tan, Xiaotian Wu, Liangda Fang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Applies rank-one covariance estimation for loss control in short-term portfolio optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A Unified Framework of Online Learning Algorithms for Training Recurrent Neural Networks&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Owen Marschall, Kyunghyun Cho, Cristina Savin&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a unified framework of online learning algorithms for training recurrent neural networks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Networks&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Amir R. Asadi, Emmanuel Abbe&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Introduces multilevel entropic regularization for neural network training, combining chaining with the chain rule.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Nesterov&amp;rsquo;s Acceleration for Approximate Newton&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Haishan Ye, Luo Luo, Zhihua Zhang&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Applies Nesterov’s acceleration to approximate Newton methods for optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New Insights and Perspectives on the Natural Gradient Method&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: James Martens&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Provides new insights into the natural gradient method for optimization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Complete Dictionary Learning via L4-Norm Maximization over the Orthogonal Group&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Yuexiang Zhai, Zitong Yang, Zhenyu Liao, John Wright, Yi Ma&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops complete dictionary learning via L4-norm maximization over the orthogonal group.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Empirical Risk Minimization in the Non-Interactive Local Model of Differential Privacy&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Di Wang, Marco Gaboardi, Adam Smith, Jinhui Xu&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Studies empirical risk minimization in the non-interactive local model of differential privacy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stable Regression: On the Power of Optimization over Randomization&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Dimitris Bertsimas, Ivan Paskov&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Analyzes the power of optimization over randomization in stable regression.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fast Exact Matrix Completion: A Unified Optimization Framework for Matrix Completion&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Dimitris Bertsimas, Michael Lingzhi Li&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Proposes a unified optimization framework for fast exact matrix completion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Rank-Based Lasso - Efficient Methods for High-Dimensional Robust Model Selection&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;Authors&lt;/em&gt;: Wojciech Rejchel, Małgorzata Bogdan&lt;br&gt;
&lt;em&gt;Description&lt;/em&gt;: Develops rank-based Lasso methods for high-dimensional robust model selection.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;</description></item></channel></rss>