tisdag 21 juli 2015

What are Bloom filters?

To understand Bloom filters, you first have to understand hashing.
A hash is like a fingerprint for data. A hash function takes your data — which can be any length — as an input, and gives you back an identifier of a smaller (usually), fixed (usually) length, which you can use to index or compare or identify the data.
This is a hash of my full name:
b3f9b3a3504ccb29c4183730a42c8d56
--
There are a few properties which are desirable in a useful hash 
function. The most important one is that the same input must always hash
 to the same output. 
--
When suggesting stories to a user, the system may come up with thousands of possibilities, but not every one will be appropriate for a user. Perhaps a story has been suggested to them too many times already. Perhaps they’ve already read it. We need to filter those stories out, but we don’t want to retrieve thousands of records from the database to do it. We need a mechanism that is particularly good at answering the question “has this been seen before?”.
And for that, we use a Bloom filter.
 

Inga kommentarer:

Skicka en kommentar

Clicky Web Analytics Web Analytics