I’ve been playing with the naive implementation of the Bloomier filter recently. The Bloomier filter is essentially a stack of Bloom filters, one per level (at least, the naive version described in the first few pages of the paper is). Is it just me, or is there a huge savings to be had by sizing the second layer for FPR * N items, where FPR is the guaranteed false-positive rate and N is the total number of items inserted? After all, how small is the set of items that can make it to the next layer? Its expected size is exactly FPR * N. If I were smarter, I’d go further and calculate the variance of this random variable, to work out how much extra false-positive probability I add overall by making the second layer just a fraction of the first…
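Here’s a quick back-of-envelope sketch of that savings, using the standard optimal-size formula for a Bloom filter, m = -n ln(p) / (ln 2)^2. The numbers (100k items, 1% FPR) are just illustrative, not from the paper:

```python
import math

def bloom_bits(n, p):
    """Optimal bit count for a Bloom filter holding n items at FPR p."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))

n = 100_000  # items inserted into the first layer (made-up example)
p = 0.01     # guaranteed false-positive rate per layer

first_layer = bloom_bits(n, p)
# Expected number of items that spill into layer 2 is FPR * N.
spill = int(p * n)
second_layer = bloom_bits(spill, p)

print(first_layer, second_layer, second_layer / first_layer)
```

The second layer comes out roughly FPR times the size of the first, so at a 1% FPR the second layer costs about 1% of the bits of the first, and each deeper layer shrinks by the same factor.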
I guess the authors probably say this somewhere in the paper, but alas, it took me a day to figure out.
Actually, in practice it is very rare for anything to break out of the first layer at all, unless I construct a really obnoxious case: tell the Bloom filter that I plan to insert 100 items at an FPR of 0.00001, and then insert 100k items. At that point it blows up and generates extra false positives.
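That blow-up is easy to see with the usual FPR approximation, p ≈ (1 - e^(-kn/m))^k. A filter sized for 100 items at 0.00001 FPR is fine at 100 inserts, but at 1000x overload the false-positive rate saturates to essentially 1, so nearly everything spills to the next layer. The sizing here is my own sketch, not taken from the paper:

```python
import math

def bloom_bits(n, p):
    """Optimal bit count for n items at target FPR p."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))

def actual_fpr(m, k, n):
    """Standard approximation of the FPR after inserting n items."""
    return (1 - math.exp(-k * n / m)) ** k

planned_n, target_p = 100, 0.00001
m = bloom_bits(planned_n, target_p)
k = max(1, round((m / planned_n) * math.log(2)))  # optimal hash count

print(actual_fpr(m, k, 100))      # close to the designed 0.00001
print(actual_fpr(m, k, 100_000))  # essentially 1.0 when overloaded 1000x
```

So the "guarantee" only holds when the advertised capacity is honest; overfill the first layer and every layer below it gets flooded.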
This is such an interesting data structure. Have you had success with this algorithm?