The purpose of this one-way-hash filter is to have large scale hashing in a small subset of data.
It is similar to a bloom filter except that instead of using a binary system the filter uses an a-z system where each point can hold more that one value and is not limited to binary values.
Why use these?
A 250x250 matrix (large) has a unique factor of over 40 million and is around 16 megabytes blank. The structure is designed for large hashes. It would be easily possible to hold the entire domain name system hashed in around 20 megs. What this means is you could have very fast lookups on if a name is in the system.
How does this filter work?
Each item hashed in the database will have a unique value set to each of the above markers. For lookups, the letter must be in two exact places withing the entire filter. In the above example of the spelling checker, there is a 1/1.625M probabily of a duplicate at any given point (due to doubling data inserting).
These filters can take a while to generate, they are very processor intensive to create. These filters use a matrix format for speed in lookups, each matrix point holds a small filter.
How is that this filter can hold so much information?
Example small filter
0000000000Insert "d" at position '2' (note:Counting starts at zero)
00d0000000Insert "z" at position '5' (note:Counting starts at zero)
00d00z0000Now it becomes interesting. insert "f" at position '2' (on top of another letter in place)
00Df00z0000If another needs to be positioned at the same point. insert "g" at position '2' (on top of another two letters in place)
00DFg00z0000The array maintains all positions and if it encounters an uppercase letter it concats until a non-uppercase letter is found.
Array
(
[0] => 0
[1] => 0
[2] => DFg
[3] => 0
[4] => 0
[5] => z
[6] => 0
[7] => 0
[8] => 0
[9] => 0
)
Multiple hashing algorithms are used for placements.
TO DO: