A Machine Learning Approach to Multi-Join Query Optimization in Large-Scale Relational Databases

Jalel Messaoudi
Nacim Touahria

Abstract

Large-scale relational database systems increasingly serve analytical workloads whose queries join many large tables. Traditional cost-based optimizers rely on manually engineered heuristics, dynamic programming, and cardinality models that were calibrated for smaller data volumes and simpler schemas. As workloads and schemas evolve, the assumptions underlying these components drift, leading to suboptimal multi-join plans and increased query latency. At the same time, modern database deployments generate extensive execution traces and performance logs that implicitly encode information about plan quality and runtime behavior under diverse conditions. These trends motivate approaches that complement classical optimization techniques with data-driven models that adapt to the observed behavior of the system. This study investigates a machine learning formulation of multi-join query optimization in large-scale relational databases. The approach represents candidate join plans, join graph structures, and runtime properties as feature vectors and trains predictive models to approximate plan cost and to guide join order enumeration. The learned models are integrated into the optimizer as an additional source of information rather than as a replacement for existing mechanisms. The study examines how linear and piecewise linear models can capture dominant interactions between joins and how they can be trained on offline traces while remaining stable under changing workloads. An experimental evaluation on synthetic and production-inspired benchmarks illustrates how the learned models influence plan selection, how they interact with existing cost models, and which design choices affect robustness. The results highlight the practical trade-offs of this machine-learning-based optimizer extension in realistic environments.
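
The formulation summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the table names, cardinalities, selectivities, the three-element feature vector, the synthetic latencies, and the blending weight `alpha` are all assumptions chosen for illustration. The sketch featurizes left-deep join orders, fits a linear cost model to offline traces by least squares, and blends the learned prediction with a classical estimate when ranking candidate plans.

```python
from itertools import permutations
from math import log, prod

# Hypothetical catalog statistics (illustrative values, not from the article).
CARD = {"orders": 1_000_000, "customers": 50_000, "nations": 25}
SEL = {frozenset(("orders", "customers")): 1 / 50_000,
       frozenset(("customers", "nations")): 1 / 25}

def featurize(order):
    """Map a left-deep join order to [bias, log(sum of intermediate rows), cross products]."""
    rows, total, cross = float(CARD[order[0]]), 0.0, 0
    for i, table in enumerate(order[1:], start=1):
        # Selectivities of every join predicate linking `table` to the tables joined so far.
        sels = [SEL[frozenset((t, table))] for t in order[:i]
                if frozenset((t, table)) in SEL]
        if not sels:
            cross += 1  # no predicate applies, so this step is a cross product
        rows *= CARD[table] * prod(sels)
        total += rows
    return [1.0, log(1.0 + total), float(cross)]

def fit_linear(X, y, ridge=1e-9):
    """Least-squares weights via the normal equations and Gaussian elimination."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def choose_plan(w, alpha=0.5):
    """Pick the order minimizing a blend of classical estimate and learned prediction."""
    def score(order):
        x = featurize(order)
        classical = x[1]  # stand-in for the existing optimizer's cost estimate
        return (1 - alpha) * classical + alpha * predict(w, x)
    return min(permutations(CARD), key=score)

# Synthetic "offline traces": latencies generated from an assumed ground truth
# in which cross products are far more expensive than the classical estimate suggests.
candidates = list(permutations(CARD))
X = [featurize(o) for o in candidates]
y = [1.0 + 2.0 * x[1] + 5.0 * x[2] for x in X]

w = fit_linear(X, y)
best = choose_plan(w)
print(best)
```

Exhaustive enumeration over permutations is only feasible for a handful of tables; it stands in here for whatever enumeration strategy the optimizer already uses. Setting `alpha=0` recovers the classical ranking, which reflects the idea that the learned model supplements the existing cost model and does not replace it.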

How to Cite

A Machine Learning Approach to Multi-Join Query Optimization in Large-Scale Relational Databases. (2025). Orient Journal of Emerging Paradigms in Artificial Intelligence and Autonomous Systems, 15(11), 1-15. https://orientacademies.com/index.php/OJEPAIAS/article/view/2025-11-16