-
Notifications
You must be signed in to change notification settings - Fork 2
Open
Description
I was expecting the False Positive rate to stay below the level the bloomfilter was initialized with until the bloomfilter reaches capacity, but instead it vastly exceeds it.
Create a bloomfilter with 400 million capacity, FPP of 0.001, put 100 million things into the bloomfilter, then test 1 million things that weren't put into the bloomfilter.
I expected to find 1,000 or less false positives, but instead I find 23,242.
iex(1)> b = Blex.new(400000000, 0.001)
%Blex{
a: #Reference<0.3778163223.2071855107.97792>,
b: 30,
hash_fn: #Function<4.114810444/2 in Blex.get_hash_fn/1>,
hash_id: 202,
k: 10,
m: 1073741824
}
iex(2)> 1..100000000 |> Enum.map(fn x -> Blex.put(b, x) end)
[:ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok,
:ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok,
:ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok,
:ok, :ok, ...]
iex(3)> 101000001..102000000 |> Enum.map(fn x -> Blex.member?(b, x) end) |> Enum.filter(fn x -> x end) |> Enum.count
23242
I get the exact same false positive count when I initialize the bloomfilter with a capacity of 1.6 billion.
While experimenting I also tried creating a bloomfilter with capacity of 3.2 billion, but that resulted in a very different error that I'll open a separate issue for.
Terkwood
Metadata
Metadata
Assignees
Labels
No labels