A Learned Bloom Filter Cascade for High-Selectivity Lookups in Distributed Key–Value Stores with Tight Error Bounds

Main Article Content

Adel Benamar
Karim Selmani

Abstract

Distributed key--value stores increasingly serve read-heavy workloads in which negative lookups dominate, such as cache misses, spam and abuse filtering, authorization checks, and existence tests preceding conditional writes. In such regimes, classical Bloom filters reduce backend load by compactly representing set membership, but they incur either additional memory to meet low false-positive targets or additional round trips when deployed as multi-level filters. Learned Bloom filters offer an alternative by using a model to score membership likelihood, yet their deployment in distributed systems is constrained by operational requirements, distribution drift, calibration error, and the need for tight and auditable bounds on false positives and false negatives. This paper develops a learned Bloom filter cascade tailored to high-selectivity lookups in distributed key--value stores, combining a calibrated scorer with a small sequence of fallback filters and an optional exact guard to enforce correctness policies. The cascade is constructed to minimize total memory under explicit constraints on end-to-end error rates and query cost, while supporting shard-local operation, online threshold adjustment, and bounded degradation under data shift. We present a system model that captures networked lookup pipelines and heterogeneous shard populations, derive tight error bounds that compose across cascade stages, and formulate an optimization procedure that selects thresholds and filter sizes to satisfy a prescribed global false-positive budget. The analysis emphasizes practical tightness, monotonicity, and verifiability, enabling deployment in environments where probabilistic admission control must be justified quantitatively.

Article Details

Section

Articles

How to Cite

A Learned Bloom Filter Cascade for High-Selectivity Lookups in Distributed Key–Value Stores with Tight Error Bounds. (2023). Orient Journal of Emerging Paradigms in Artificial Intelligence and Autonomous Systems, 13(3), 1-16. https://orientacademies.com/index.php/OJEPAIAS/article/view/2023-03-04