Why should hash functions use a prime number modulus?



A long time ago, I bought a data structures book off the bargain table for $1.25. In it, the explanation for a hashing function said that it should ultimately mod by a prime number because of "the nature of math".

What do you expect from a $1.25 book?

Anyway, I've had years to think about the nature of math, and still can't figure it out.

Is the distribution of numbers truly more even when there are a prime number of buckets? Or is this an old programmer's tale that everyone accepts because everybody else accepts it?


Posted 2009-07-17T19:30:04.560

Reputation: 5 430

1This question appears to be off-topic because it more than likely belongs on [cs.se]. – Lightness Races in Orbit – 2014-06-06T17:00:26.387

1http://cs.stackexchange.com/a/64191/64222 another well argued explanation. – Green Tree – 2017-03-17T19:14:30.063

1Perfectly reasonable question: Why should there be a prime number of buckets? – Draemon – 2009-07-17T19:35:50.430



Usually a simple hash function works by taking the "component parts" of the input (characters in the case of a string), and multiplying them by the powers of some constant, and adding them together in some integer type. So for example a typical (although not especially good) hash of a string might be:

(first char) + k * (second char) + k^2 * (third char) + ...

Then if a bunch of strings all having the same first char are fed in, then the results will all be the same modulo k, at least until the integer type overflows.
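A minimal sketch of that toy hash in Python (the function name and test strings are mine, purely for illustration):

```python
def toy_hash(s, k=31):
    # (first char) + k*(second char) + k^2*(third char) + ...
    h = 0
    for i, ch in enumerate(s):
        h += ord(ch) * k**i
    return h

# Strings sharing a first character all agree modulo k:
print([toy_hash(s) % 31 for s in ("apple", "ant", "axe")])  # [4, 4, 4]
```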

[As an example, Java's string hashCode is eerily similar to this - it processes the characters in the reverse order, with k=31. So you get striking relationships modulo 31 between strings that end the same way, and striking relationships modulo 2^32 between strings that are the same except near the end. This doesn't seriously mess up hashtable behaviour.]
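To make the mod-31 relationship concrete, here's a hedged sketch of a Java-style hash (31*h + c per character, truncated to 32 bits; the function name is mine). Because each step multiplies the accumulated hash by 31, the final value is congruent to the last character modulo 31 (for short strings that don't overflow 32 bits), so strings that end the same way collide mod 31:

```python
def java_like_hash(s):
    # h = 31*h + c per character, masked to 32 bits like Java's int
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

# Different strings, same final character 't' => same residue mod 31:
print(java_like_hash("cat") % 31, java_like_hash("format") % 31)
```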

A hashtable works by taking the modulus of the hash over the number of buckets.

It's important in a hashtable not to produce collisions for likely cases, since collisions reduce the efficiency of the hashtable.

Now, suppose someone puts a whole bunch of values into a hashtable that have some relationship between the items, like all having the same first character. This is a fairly predictable usage pattern, I'd say, so we don't want it to produce too many collisions.

It turns out that "because of the nature of maths", if the constant used in the hash, and the number of buckets, are coprime, then collisions are minimised in some common cases. If they are not coprime, then there are some fairly simple relationships between inputs for which collisions are not minimised. All the hashes come out equal modulo the common factor, which means they'll all fall into the 1/n th of the buckets which have that value modulo the common factor. You get n times as many collisions, where n is the common factor. Since n is at least 2, I'd say it's unacceptable for a fairly simple use case to generate at least twice as many collisions as normal. If some user is going to break our distribution into buckets, we want it to be a freak accident, not some simple predictable usage.

Now, hashtable implementations obviously have no control over the items put into them. They can't prevent them being related. So the thing to do is to ensure that the constant and the bucket counts are coprime. That way you aren't relying on the "last" component alone to determine the modulus of the bucket with respect to some small common factor. As far as I know they don't have to be prime to achieve this, just coprime.
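A quick numeric illustration of the common-factor effect, using the toy hash from above with a small constant k = 10 and bucket counts picked for the example (all values are mine):

```python
def toy_hash(s, k=10):
    # same shape as the hash described above, with constant k
    h = 0
    for i, ch in enumerate(s):
        h += ord(ch) * k**i
    return h

words = ["ant", "axe", "arm", "ash", "awe"]        # all share first char 'a'
composite = sorted({toy_hash(w) % 20 for w in words})  # gcd(10, 20) = 10
prime = sorted({toy_hash(w) % 19 for w in words})      # gcd(10, 19) = 1
print(composite)  # [7, 17] - only buckets congruent to ord('a') mod 10
print(prime)      # five distinct buckets
```

With 20 buckets the five words pile into just two of them; with 19 (coprime to k) they spread out.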

But if the hash function and the hashtable are written independently, then the hashtable doesn't know how the hash function works. It might be using a constant with small factors. If you're lucky it might work completely differently and be nonlinear. If the hash is good enough, then any bucket count is just fine. But a paranoid hashtable can't assume a good hash function, so should use a prime number of buckets. Similarly a paranoid hash function should use a largeish prime constant, to reduce the chance that someone uses a number of buckets which happens to have a common factor with the constant.

In practice, I think it's fairly normal to use a power of 2 as the number of buckets. This is convenient and saves having to search around or pre-select a prime number of the right magnitude. So you rely on the hash function not to use even multipliers, which is generally a safe assumption. But you can still get occasional bad hashing behaviours based on hash functions like the one above, and prime bucket count could help further.

Putting about the principle that "everything has to be prime" is, as far as I know, a sufficient but not necessary condition for good distribution over hashtables. It allows everybody to interoperate without needing to assume that the others have followed the same rule.

[Edit: there's another, more specialized reason to use a prime number of buckets, which is if you handle collisions with open addressing and compute a probe stride from the hashcode (double hashing, rather than plain linear probing). If that stride comes out to be a factor of the bucket count then you can only do (bucket_count / stride) probes before you're back where you started. The case you most want to avoid is stride = 0, of course, which must be special-cased, but to avoid also special-casing bucket_count / stride equal to a small integer, you can just make the bucket_count prime and not care what the stride is provided it isn't 0.]
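The stride effect can be sketched like this (the stride and bucket counts are illustrative values only):

```python
def probe_cycle_length(start, stride, bucket_count):
    # count the distinct buckets a fixed-stride probe visits before looping
    visited, i = 0, start
    while True:
        visited += 1
        i = (i + stride) % bucket_count
        if i == start:
            return visited

print(probe_cycle_length(0, 4, 12))  # 3: the stride divides the bucket count
print(probe_cycle_length(0, 4, 13))  # 13: prime count, every bucket reachable
```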

Steve Jessop

Posted 2009-07-17T19:30:04.560

Reputation: 231 153

Just as a side note: a discussion for a sensible choice of the factor k for hashCodes is here: http://stackoverflow.com/q/1835976/21499

– Hans-Peter Störr – 2012-08-16T07:30:27.100

8this is an awesome answer. can you please explain this further "So you get striking relationships modulo 31 between strings that end the same way, and striking relationships modulo 2^32 between strings that are the same except near the end. This doesn't seriously mess up hashtable behaviour." I especially dont understand the 2^32 part – ordinary – 2013-11-16T07:33:43.550

1Additional note to make things more clear about this: "All the hashes come out equal modulo the common factor" -> This is because, if you consider the example hash function hash = 1st char + 2nd char*k + ... , and take strings with the same first character, hash%k will be the same for these strings. If M is the size of the hashtable and g is the gcd of M and k, then (hash%k)%g equals hash%g (since g divides k) and hence hash%g will also be the same for these strings. Now consider (hash%M)%g, this is equal to hash%g (since g divides M). So (hash%M)%g is equal for all these strings. – Quark – 2015-07-04T00:02:51.470

This kind of raises the question "why are we using a hash function of that form in the first place?" – Daniel McLaury – 2017-01-18T23:36:39.490

@DanielMcLaury: well, whoever wrote that function for Sun might have their own version of the story, but generally it's because it's easy to write, it runs fast, and it has adequate distribution for most practical purposes. Anyway the rule of thumb the questioner is asking about, "hashes should ultimately mod by a prime number", doesn't apply if your hash is designed to use some other mathematical means to ensure there are no nasty surprises modulo 2^N, N being the number of bits in the hash. – Steve Jessop – 2017-01-24T19:47:22.983


@DanielMcLaury Joshua Bloch explained why for Java - it was recommended in two popular books (K&R, Dragon book) and performed well with low collisions on the English dictionary. It is fast (uses Horner's method). Apparently even K&R don't remember where it came from. Similar function is Rabin fingerprint from Rabin-Karp algorithm (1981) but K&R (1978) predates that.

– bain – 2017-04-23T10:32:25.290

This is a wonderfully detailed, well explained answer. I can only wish more people would read it. – Tanveer Badar – 2017-08-23T13:53:10.313

1@SteveJessop, please can you explain "striking relationships modulo 2^32 between strings that are the same except near the end."? Thanks. – Khanna111 – 2017-12-19T21:39:34.170


The first thing you do when inserting/retrieving from a hash table is to calculate the hashCode for the given key and then find the correct bucket by trimming the hashCode to the size of the hashTable by doing hashCode % table_length. Here are 2 'statements' that you most probably have read somewhere:

  1. If you use a power of 2 for table_length, finding (hashCode(key) % 2^n ) is as simple and quick as (hashCode(key) & (2^n -1)). But if your function to calculate hashCode for a given key isn't good, you will definitely suffer from clustering of many keys in a few hash buckets.
  2. But if you use prime numbers for table_length, hashCodes calculated could map into the different hash buckets even if you have a slightly stupid hashCode function.

And here is the proof.

Suppose your hashCode function results in the following hashCodes among others {x, 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial to verify/derive this.) Now you can do one of the following to avoid clustering:

Make sure that you don't generate too many hashCodes that are multiples of another hashCode, like in {x, 2x, 3x, 4x, 5x, 6x...}. But this may be kind of difficult if your hashTable is supposed to have millions of entries. Or simply make m equal to table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e. by making table_length coprime with x. And if x can be just about any number, then make sure that table_length is a prime number.
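The m = table_length/GreatestCommonFactor(table_length, x) claim is easy to check numerically; the values of x and table_length below are arbitrary examples:

```python
from math import gcd

def buckets_hit(x, table_length, n=1000):
    # hashCodes {x, 2x, 3x, ...} land in table_length/gcd(table_length, x) buckets
    return len({(i * x) % table_length for i in range(1, n + 1)})

print(buckets_hit(x=6, table_length=16))  # 16 / gcd(16, 6) = 8
print(buckets_hit(x=6, table_length=17))  # 17 / gcd(17, 6) = 17 (prime)
```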

From - http://srinvis.blogspot.com/2006/07/hash-table-lengths-and-prime-numbers.html


Posted 2009-07-17T19:30:04.560




Pretty clear explanation, with pictures too.

Edit: As a summary, primes are used because you have the best chance of obtaining a unique value when multiplying values by the prime number chosen and adding them all up. For example given a string, multiplying each letter value with the prime number and then adding those all up will give you its hash value.

A better question would be, why exactly the number 31?


Posted 2009-07-17T19:30:04.560

Reputation: 10 289

3@SteveJessop The number 31 is easily optimized by the CPU as an `(x*32)-x` operation, in which `*32` is a simple bit shift, or even better an immediate address scale factor (e.g. `lea` on x86/x64). So `*31` is a good candidate for prime number multiplication. This was pretty much true some years ago - now the latest CPU architectures have an almost instant multiplication - division is always slower... – Arnaud Bouchez – 2013-09-27T13:48:57.543

@AlbertoPL If that were so, they could as well have used 15 since its factors (besides itself and 1) are 3 and 5, both primes. Many other numbers: 10 (2 and 5), 14 (2 and 7), 21 (3 and 7) etc – afaolek – 2016-05-03T13:56:07.820

4Although, I think a summary would be helpful, in case that site is ever dead, some remnant of its content will be saved here on SO. – Thomas Owens – 2009-07-17T19:35:44.830

1+1 for linking, but the article doesn't exactly delve into the math. – Aiden Bell – 2009-07-17T19:35:49.773

2The article does not explain why, but says "Researchers found that using a prime of 31 gives a better distribution to the keys, and lesser no of collisions. No one knows why..." Funny, asking the same question as me in effect. – theschmitzer – 2009-07-17T20:20:02.603

> A better question would be, why exactly the number 31?

If you mean why is the number 31 used, then the article you point to tells you why, i.e. because it is quick to multiply by and because tests show it is the best one to use. The other popular multiplier I have seen is 33, which lends weight to the theory that the speed issue was (at least initially) an important factor.

If you mean, what is it about 31 that makes it better in the tests, then I'm afraid I don't know. – sgmoore – 2009-07-17T20:21:54.597

333 is not prime. – starblue – 2009-07-17T20:45:14.987

Exactly, so the only reason that it could have been used as a multiplier was because it was easy to multiply by. (When I say I have seen 33 used as a multiplier, I don't mean recently, this was probably decades ago, and possible before a lot of analysis was done on hashing). – sgmoore – 2009-07-17T21:02:29.897

They probably used it because the factors of 33 (besides itself and 1) are 3 and 11, both primes. – AlbertoPL – 2009-07-17T21:36:21.683

Well obviously the less factors the better, but if you knew that, why pick something that has an extra factor. I assumed it was picked by someone who did not realise the importance of primes. Maybe someone thought addition would be quicker than subtraction and hence x33 would be faster than x31. – sgmoore – 2009-07-17T21:47:08.027



index[hash(input)%2] would result in a collision for half of all possible hashes and a range of values. index[hash(input)%prime] results in a collision of <2 of all possible hashes. Fixing the divisor to the table size also ensures that the number cannot be greater than the table.


Posted 2009-07-17T19:30:04.560

Reputation: 1 828

2 is a prime number dude – Ganesh Chowdhary Sadanala – 2018-10-25T15:25:15.030


Primes are used because you have good chances of obtaining a unique value for a typical hash function which uses polynomials modulo P. Say you use such a hash function for strings of length <= N, and you have a collision. That means that 2 different polynomials produce the same value modulo P. The difference of those polynomials is again a polynomial of the same degree N (or less). It has no more than N roots (this is where the nature of math shows itself, since this claim is only true for a polynomial over a field => prime number). So if N is much less than P, you are likely not to have a collision. After that, experiment can probably show that 37 is big enough to avoid collisions for a hash-table of strings which have length 5-10, and is small enough to use for calculations.
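The "no more than N roots only over a field" claim can be demonstrated numerically; x^2 - 1 is my example polynomial (degree 2, so at most 2 roots modulo a prime):

```python
def roots_of_x_squared_minus_1(n):
    # roots of x^2 - 1 in the integers modulo n
    return [x for x in range(n) if (x * x - 1) % n == 0]

print(roots_of_x_squared_minus_1(7))  # [1, 6]: at most deg = 2 roots mod a prime
print(roots_of_x_squared_minus_1(8))  # [1, 3, 5, 7]: four roots mod a composite
```

Modulo 8 the ring has zero divisors, so the degree bound fails and extra collisions become possible.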


Posted 2009-07-17T19:30:04.560

Reputation: 878

1While the explanation seems now obvious, it got to me after reading a book by A.Shen "Programming: Theorems and problems" (in Russian), see discussion of Rabin algorithm. Not sure if an English translation exists. – TT_ – 2013-11-26T01:20:45.207


Just to provide an alternate viewpoint there's this site:


Which contends that you should use the largest number of buckets possible, as opposed to rounding down to a prime number of buckets. It seems like a reasonable possibility. Intuitively, I can certainly see how a larger number of buckets would be better, but I'm unable to make a mathematical argument for this.


Posted 2009-07-17T19:30:04.560

Reputation: 6 057

1@Unknown you have missed that collisions also depend on the hash function itself. So if the hash function is really bad, then no matter how much you increase the size, there still might be a significant number of collisions – Suraj Chandran – 2011-11-24T02:25:40.730

Larger number of buckets means less collisions: See the pigeonhole principle. – Unknown – 2009-09-10T03:23:47.227

11@Unknown: I don't believe that's true. Please correct me if I'm wrong, but I believe applying the pigeonhole principle to hash tables only allows you to assert that there WILL be collisions if you have more elements than bins, not to draw any conclusions on the amount or density of collisions. I still believe that the larger number of bins is the correct route, however. – Falaina – 2009-09-10T14:18:43.590

If you assume that the collisions are for all intents and purposes random, then by the birthday paradox a larger space (buckets) will reduce the probability of a collision occurring. – Unknown – 2009-09-29T02:57:52.267

The original article seems to be gone, but there are some insightful comments here, including a discussion with the original author. https://news.ycombinator.com/item?id=650487

– Adrian McCarthy – 2014-10-29T22:52:55.290

@AdrianMcCarthy: but watch out: someone says "if n is prime, then so's 2^n-1", which is back to front. Some make the point that if you write a library allowing the user to provide a hash function, it may favour certain residues modulo some N, which can be countered by a table size coprime to N. And it sounds plausible that the cost of a slightly smaller table (~Δ#buckets) could be less than the benefit of using all buckets (~#buckets/x). – PJTraill – 2015-05-21T22:21:14.503


Primes are unique numbers. They are unique in that, the product of a prime with any other number has the best chance of being unique (not as unique as the prime itself of-course) due to the fact that a prime is used to compose it. This property is used in hashing functions.

Given a string "Samuel", you can generate a unique hash by multiplying each of the constituent digits or letters with a prime number and adding them up. This is why primes are used.

However, using primes is an old technique. The key here is to understand that as long as you can generate a sufficiently unique key, you can move to other hashing techniques too. Go here for more on this topic: http://www.azillionmonkeys.com/qed/hash.html



Posted 2009-07-17T19:30:04.560

Reputation: 6 880

@Beska Here "uniqueness" is defined recursively, so I believe "non-uniqueness" should be defined in the same way :) – TT_ – 2016-11-16T17:56:43.463

1What does "not as unique" mean? – Beska – 2009-07-17T19:55:39.017

1hahahah.... actually doesn't the product of 2 primes have a better chance of being 'unique' than the product of a prime and any other number? – SpaceghostAli – 2009-07-17T20:14:54.577


It depends on the choice of hash function.

Many hash functions combine the various elements in the data by multiplying them with some factors modulo the power of two corresponding to the word size of the machine (that modulus is free by just letting the calculation overflow).

You don't want any common factor between a multiplier for a data element and the size of the hash table, because then it could happen that varying the data element doesn't spread the data over the whole table. If you choose a prime for the size of the table such a common factor is highly unlikely.

On the other hand, those factors are usually made up from odd primes, so you should also be safe using powers of two for your hash table (e.g. Eclipse uses 31 when it generates the Java hashCode() method).


Posted 2009-07-17T19:30:04.560

Reputation: 44 551


Suppose your table-size (or the number for modulo) is T = (B*C). Now if the hash for your input is like (N*A*B) where N can be any integer, then your output won't be well distributed: every time N becomes C, 2C, 3C etc., your output will start repeating, i.e. your output will be distributed over only C positions. Note that C here is (T / HCF(table-size, hash)).

This problem can be eliminated by making HCF 1. Prime numbers are very good for that.

Another interesting case is when T is 2^N. Then the output is exactly the same as the lower N bits of the input hash. Since every number can be represented as a sum of powers of 2, taking a number modulo T = 2^N simply discards all the bits at positions >= N, so the output follows a pattern determined entirely by the low bits of the input. This is also a bad choice.
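A small demonstration of the T = 2^N case (the bit patterns are arbitrary examples):

```python
# Modulo 2^N keeps only the low N bits; the high bits never influence the bucket.
N = 4
hashes = (0b10100110, 0b00010110, 0b11110110)  # differ only in their high bits
print([h % 2**N for h in hashes])              # [6, 6, 6]
print([h & (2**N - 1) for h in hashes])        # same thing, as a bit mask
```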

Similarly, T as 10^N is bad as well because of similar reasons (pattern in decimal notation of numbers instead of binary).

So, prime numbers tend to give better-distributed results, and hence are a good choice for table size.


Posted 2009-07-17T19:30:04.560

Reputation: 561


I'd like to add something to Steve Jessop's answer (I can't comment on it since I don't have enough reputation), because I found some helpful material. His answer is very helpful, but he made a mistake: the bucket size should not be a power of 2. I'll just quote from the book "Introduction to Algorithms" by Thomas Cormen, Charles Leiserson, et al. on page 263:

When using the division method, we usually avoid certain values of m. For example, m should not be a power of 2, since if m = 2^p, then h(k) is just the p lowest-order bits of k. Unless we know that all low-order p-bit patterns are equally likely, we are better off designing the hash function to depend on all the bits of the key. As Exercise 11.3-3 asks you to show, choosing m = 2^p-1 when k is a character string interpreted in radix 2^p may be a poor choice, because permuting the characters of k does not change its hash value.

Hope it helps.


Posted 2009-07-17T19:30:04.560

Reputation: 11


Copying from my other answer https://stackoverflow.com/a/43126969/917428. See it for more details and examples.

I believe that it just has to do with the fact that computers work in base 2. Just think about how the same thing works for base 10:

  • 8 % 10 = 8
  • 18 % 10 = 8
  • 87865378 % 10 = 8

It doesn't matter what the number is: as long as it ends with 8, its modulo 10 will be 8.
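The same contrast in code (the prime 7 is an arbitrary example):

```python
# A base-10 "bucket count" only sees the last decimal digit,
# while a prime modulus like 7 depends on every digit of the input.
nums = (8, 18, 87865378)
print([n % 10 for n in nums])  # [8, 8, 8]
print([n % 7 for n in nums])   # three different buckets
```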

Picking a big enough, non-power-of-two number will make sure the hash function really is a function of all the input bits, rather than a subset of them.


Posted 2009-07-17T19:30:04.560

Reputation: 145


For a hash function it's not only important to minimize collisions generally, but to make it impossible to keep the same hash while changing a few bytes.

Say you have an equation: (x + y*z) % key = x with 0 < x < key and 0 < z < key. If key is a prime number, n*y = key is true for every n in N and false for every other number.

An example where key isn't prime: x=1, z=2 and key=8. Because key/z=4 is still a natural number, 4 becomes a solution for our equation, and in this case (n/2)*y = key is true for every n in N. The number of solutions for the equation has practically doubled because 8 isn't a prime.

If our attacker already knows that 8 is a possible solution for the equation, he can change the file from producing 8 to 4 and still get the same hash.


Posted 2009-07-17T19:30:04.560

Reputation: 10 686


I've read the popular wordpress website linked in some of the above popular answers at the top. From what I've understood, I'd like to share a simple observation I made.

You can find all the details in the article here, but assume the following holds true:

  • Using a prime number gives us the "best chance" of an unique value

A general hashmap implementation wants 2 things to be unique.

  • Unique hash code for the key
  • Unique index to store the actual value

How do we get the unique index? By making the initial size of the internal container a prime as well. So basically, prime is involved because it possesses this unique trait of producing unique numbers which we end up using to ID objects and finding indexes inside the internal container.


key = "key"
value = "value"
uniqueId = 'k' * 31^2 + 'e' * 31^1 + 'y' * 31^0

maps to a unique id.

Now we want a unique location for our value - so we

uniqueId % internalContainerSize == uniqueLocationForValue, assuming internalContainerSize is also a prime.

I know this is simplified, but I'm hoping to get the general idea through.


Posted 2009-07-17T19:30:04.560

Reputation: 1 263