Tesi magistrale su distance sensitive Bloom filters

Immagine Silvestri Francesco
Tesi magistrale su distance sensitive Bloom filters
di Silvestri Francesco - giovedì, 6 aprile 2017, 08:59

Distance Sensitive Bloom Filters

Given a set S, a Bloom filter [1] is a data structure that efficiently answers the query “Is an element x in S?”. A Bloom filter requires less space than the trivial solution of storing the entire set S, however it has false positives: for some elements x, the filter returns “Yes” even when the correct answer is “No”.

This kind of errors is acceptable in many applications. For instance, when checking if a new image is already contained in a set S of images: if the answer is “no”, then we can certainly conclude that the image is not in S; if the answer is “yes” we can check the correctness with a (expensive) standard search within S. This approach is efficient when the number of “no” answers is much larger than the “yes” ones. Bloom filters are used in many applications like Google Chrome, Apache HBase, Bitcoin…

A desirable feature that is not supported by Bloom filters is to recognise similar objects: we would like to answer the query “Does set S contain an element similar to x?”. Filters supporting these queries are called ”distance sensitive Bloom filters”. For instance, we would like to recognise if set S contains an image that differs from an image x in a few pixels, or a document that differs from a text x in a few words. 

A recent [2] result has shown that such a data structure exists and has theoretical proprieties similar to the standard Bloom filters. The goal of this thesis is to further investigate distance sensitive Bloom filters by studying the filters in some specific settings, providing an efficient implementation and then by experimentally evaluating them.

For more information, contact Francesco Silvestri (,

[2] M. Goswami, R. Pagh, F. Silvestri and J. Sivertsen. Distance Sensitive Bloom Filters Without False Negatives. Proc. 28th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 257-269. 2017.