What are Bloom filters?
To understand Bloom filters, you first have to understand hashing.
A hash is like a
fingerprint for data. A hash function takes your data — which can be any
length — as an input, and gives you back an identifier of a smaller
(usually), fixed (usually) length, which you can use to index or compare
or identify the data.
This is a hash of my full name:
b3f9b3a3504ccb29c4183730a42c8d56
--
There are a few properties which are desirable in a useful hash function. The most important one is that the same input must always hash to the same output.
--
When suggesting stories to a
user, the system may come up with thousands of possibilities, but not
every one will be appropriate for a user. Perhaps a story has been
suggested to them too many times already. Perhaps they’ve already read
it. We need to filter those stories out, but we don’t want to retrieve
thousands of records from the database to do it. We need a mechanism
that is particularly good at answering the question “has this been seen
before?”.
And for that, we use a Bloom filter.
Inga kommentarer:
Skicka en kommentar