PaperHub


ICLR 2025

Differentially Private Learned Indexes

OpenReview | PDF
Submitted: 2024-09-27 · Updated: 2024-10-15
TL;DR

This work proposes the first differentially-private learned indexes to accelerate predicate search on encrypted data.

Abstract

In this paper, we study the problem of efficiently answering predicate queries over encrypted databases, i.e., those powered by Trusted Execution Environments (TEEs), which allow untrusted providers to process encrypted user data without revealing sensitive details. A common strategy in conventional databases to accelerate query processing is the use of indexes, which map attribute values to their corresponding record locations within a sorted data array. This allows for fast lookup and retrieval of data subsets that satisfy specific predicates. Unfortunately, these traditional indexing methods cannot be directly applied to encrypted databases due to strong data-dependent leakages. Recent approaches use differential privacy (DP) to construct noisy indexes that enable faster access to encrypted data while maintaining provable privacy guarantees. However, these methods often suffer from significant data loss and high overhead. To address these challenges, we propose to explore learned indexes, a trending technique that repurposes machine learning models as indexing structures, to build more efficient DP indexes. Our contributions are threefold: (i) We propose a flat learned index structure that seamlessly integrates with differentially private stochastic gradient descent (DPSGD) algorithms for efficient and private index training. (ii) We introduce a novel noisy-max based private index lookup technique that ensures lossless indexing while maintaining provable privacy. (iii) We benchmark our DP learned indexes against state-of-the-art (SOTA) DP indexing methods. Results show that our method outperforms existing DP indexes by up to 925.6$\times$.
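To make the two main ideas concrete, the following is a minimal, hypothetical sketch of a flat learned index with a DP-noised lookup: a single linear model predicts a key's position in a sorted array, the prediction is perturbed with Gaussian noise, and a fallback scan keeps the lookup lossless. All names, the linear structure, and the noise calibration here are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_index(keys):
    """Least-squares fit of position ~ a*key + b over the sorted key array."""
    positions = np.arange(len(keys), dtype=float)
    a, b = np.polyfit(keys, positions, deg=1)
    return a, b

def noisy_lookup(keys, query, a, b, epsilon=1.0, delta=1e-5, sensitivity=1.0):
    """Perturb the predicted position with Gaussian noise calibrated to
    (epsilon, delta), then return a search window around it. Falling back
    to a full scan when the window misses keeps the lookup lossless."""
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    noisy_pred = a * query + b + rng.normal(0, sigma)
    lo = max(0, int(noisy_pred - 3 * sigma))
    hi = min(len(keys), int(noisy_pred + 3 * sigma) + 1)
    if lo < hi and keys[lo] <= query <= keys[hi - 1]:
        return lo, hi          # narrow band around the noisy prediction
    return 0, len(keys)        # lossless fallback: scan everything

keys = np.sort(rng.uniform(0, 1000, 10_000))
a, b = fit_linear_index(keys)
lo, hi = noisy_lookup(keys, keys[5000], a, b)
```

The fallback branch is what makes the sketch "lossless" in the abstract's sense: noise can shift the window, but a query key outside the window degrades to a full scan rather than a missed record.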
Keywords

learned index, differential privacy, encrypted databases

Reviews and Discussion

Withdrawal Notice

We identified an issue with our method, specifically the sensitivity used for generating Gaussian noise: it should also scale with the batch size. Following discussions, we have decided to withdraw the submission to prepare a more rigorous and complete manuscript.
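The calibration point in the notice can be illustrated with a standard DPSGD step (in the style of Abadi et al.): each per-example gradient is clipped to norm $C$, the clipped gradients are summed (L2 sensitivity $C$), Gaussian noise with std $\sigma C$ is added, and the result is divided by the batch size $B$, so $B$ directly scales the effective noise on the update. This sketch is generic DPSGD, not the withdrawn paper's code, and its names are illustrative.

```python
import numpy as np

def dpsgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DPSGD update: clip each per-example gradient to clip_norm,
    sum them (L2 sensitivity of the sum is clip_norm), add Gaussian noise
    with std = noise_multiplier * clip_norm, then average over the batch.
    The final division by B shrinks the added noise by B as well, which is
    why the noise calibration must account for the batch size."""
    B = len(per_example_grads)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / B

rng = np.random.default_rng(42)
grads = [rng.normal(0, 1, 10) for _ in range(32)]      # 32 per-example grads
update = dpsgd_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```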