Architectural Advancements in Big Data Analytics: A Comparative Study of Scalable Frameworks for High-Performance Computing
Abstract
Big data analytics has emerged as a critical domain, driving advances across industries by enabling data-driven decision-making. As datasets grow exponentially, traditional data processing architectures struggle to handle large-scale computations efficiently. High-performance computing (HPC) frameworks have been developed to address these challenges, offering scalability, fault tolerance, and optimized resource utilization. This paper presents a comparative study of scalable architectures for big data analytics, focusing on the distributed computing frameworks Apache Hadoop, Apache Spark, and Dask. We analyze their architectural differences, computational efficiency, and adaptability to large-scale workloads. The study examines key design principles, including data partitioning, in-memory processing, parallel execution, and cluster management. Furthermore, we evaluate their suitability for real-time and batch processing applications, highlighting their strengths and limitations. Benchmarks and case studies from the existing literature are reviewed to provide insight into performance trade-offs across different workloads. Understanding the architectural advances in these frameworks allows organizations to make informed decisions when selecting the technology best suited to their big data needs. Our findings indicate that while Hadoop remains relevant for batch processing, Spark's in-memory execution substantially accelerates iterative computation, and Dask's dynamic task scheduling improves scalability for complex analytics. The paper concludes with a discussion of emerging trends and future research directions in high-performance big data computing.
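To make the two execution models named above concrete, the following minimal Python sketch (not taken from the paper; it assumes pyspark and dask are installed, and all object names are illustrative) contrasts Spark's explicit in-memory caching with Dask's lazy task-graph construction and dynamic scheduling.

```python
# A minimal sketch, assuming pyspark and dask are available;
# all names and sizes here are illustrative, not from the paper.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("inmemory-demo").getOrCreate()

# Spark: cache() pins the intermediate DataFrame in cluster memory,
# so later actions reuse it instead of recomputing from the source.
df = spark.range(10_000_000).withColumnRenamed("id", "value")
df.cache()
print(df.count())                          # first action materializes the cache
print(df.filter("value % 2 = 0").count())  # second action is served from memory
spark.stop()

import dask.array as da

# Dask: operations build a lazy task graph; the scheduler assigns
# chunk-level tasks to workers at runtime (dynamic task scheduling).
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
result = (x + x.T).mean()  # graph construction only, no computation yet
print(result.compute())    # scheduler executes the graph in parallel
```

The contrast in the sketch mirrors the trade-off the paper evaluates: Spark gains speed by holding working sets in memory across actions, while Dask gains flexibility by deferring execution until the scheduler can place fine-grained tasks across workers.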