VARIATIONAL METHODS FOR NEURAL NETWORK TRAINING: APPLICATIONS OF STURM-LIOUVILLE ENERGY ESTIMATES
Keywords:
variational methods, artificial neural networks, boundary value problems
DOI:
https://doi.org/10.17654/0972087125015
Abstract
This paper establishes a novel connection between local minimization principles for Sturm-Liouville equations and optimization techniques used in training neural networks. By interpreting the training of neural networks as a variational problem, we demonstrate how recent results on energy estimates for mixed boundary value problems in Sturm-Liouville theory can be adapted to analyze and improve neural network convergence. We present two main theorems: the first establishes conditions for guaranteed convergence to non-zero local minima in neural network training, and the second demonstrates the existence of multiple critical points with energy estimates. Our theoretical results are supported by experimental validation on benchmark datasets, showing improved performance in avoiding trivial solutions during training. This work bridges the gap between classical differential equation theory and modern machine learning optimization.
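To make the variational reading concrete, here is a minimal sketch on a toy problem of our own choosing. It is not the mixed boundary value problem or the network analysed in the paper. It assumes the simplified equation -u''(x) = λ f(u(x)) on (0, 1) with mixed conditions u(0) = 0 and u'(1) = 0. It uses the illustrative nonlinearity f(u) = u - u^3, whose primitive is F(u) = u^2/2 - u^4/4, and it omits the first-order term of a complete Sturm-Liouville equation. Weak solutions are critical points of the energy E(u) = ∫_0^1 [u'(x)^2/2 - λ F(u(x))] dx, and u ≡ 0 is always one of them, with E(0) = 0. Once λ exceeds π^2/4, the first eigenvalue of -u'' under these boundary conditions, the trivial solution is no longer a local minimum. Plain gradient descent on a finite-difference discretization (the "training" loop), started from a small non-zero guess, should therefore settle at a non-zero local minimizer with strictly negative energy. The helper names (f, F, energy, gradient, descend), λ = 10, the 20-cell grid and the step size are all hypothetical choices made for this illustration.

```python
# Hypothetical model problem, chosen for illustration only (not the paper's setting):
#   -u''(x) = lam * f(u(x)) on (0, 1),  u(0) = 0 (Dirichlet),  u'(1) = 0 (Neumann),
# with f(u) = u - u**3 and primitive F(u) = u**2/2 - u**4/4.
# Energy functional: E(u) = int_0^1 [ u'(x)**2 / 2 - lam * F(u(x)) ] dx.
# u = 0 is always a critical point of E, with E(0) = 0.


def f(u):
    return u - u ** 3


def F(u):
    return u ** 2 / 2 - u ** 4 / 4


def trapezoid_weight(i, n):
    # Node n lies on the boundary x = 1, so the trapezoid rule gives it half weight.
    return 0.5 if i == n else 1.0


def energy(u, lam, h):
    """Discretized E: forward differences for u', trapezoid rule for the F term.

    u[0] is pinned to 0 by the Dirichlet condition. u[n] is left free, so the
    Neumann condition at x = 1 enters only as a natural boundary condition.
    """
    n = len(u) - 1
    stiffness_term = sum((u[i + 1] - u[i]) ** 2 for i in range(n)) / (2 * h)
    potential = h * sum(trapezoid_weight(i, n) * F(u[i]) for i in range(1, n + 1))
    return stiffness_term - lam * potential


def gradient(u, lam, h):
    """Exact gradient of energy() with respect to the free nodal values u[1..n]."""
    n = len(u) - 1
    g = [0.0] * (n + 1)  # g[0] stays 0 because u[0] is fixed
    for i in range(1, n + 1):
        left = (u[i] - u[i - 1]) / h
        right = (u[i + 1] - u[i]) / h if i < n else 0.0
        g[i] = left - right - lam * h * trapezoid_weight(i, n) * f(u[i])
    return g


def descend(lam=10.0, n=20, steps=20000):
    """Plain gradient descent (the 'training' loop) from a small non-zero start."""
    h = 1.0 / n
    lr = 0.2 * h  # below 2/L, where L is roughly 4/h for the stiffness term
    u = [0.1 * i * h for i in range(n + 1)]  # u(x) = 0.1 x: small, non-zero, u[0] = 0
    for _ in range(steps):
        g = gradient(u, lam, h)
        u = [ui - lr * gi for ui, gi in zip(u, g)]
    return u, energy(u, lam, h)


u, e = descend()
print(f"E(u) = {e:.4f}, u(1) = {u[-1]:.4f}")
```

In this toy setting the starting guess already has negative energy, and with a sufficiently small step each gradient step lowers E further, so the iterates cannot drift back to u ≡ 0. This is only loosely analogous to energy estimates that separate non-trivial critical points from the trivial one.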
Received: April 12, 2025
Accepted: May 9, 2025
License
Copyright (c) 2025 PUSHPA PUBLISHING HOUSE, PRAYAGRAJ, INDIA

This work is licensed under a Creative Commons Attribution 4.0 International License.
_________________________
Attribution: Credit Pushpa Publishing House as the original publisher, including title and author(s) if applicable.
Non-Commercial Use: For non-commercial purposes only. No commercial activities without explicit permission.
Contact Pushpa Publishing House for more information or permissions.





