Get floored!

This another episode of friends don’t let friends commit bugs into main. I was reading a previous post here on the FAM blog. It gave me a flash back of the one time when I discovered hash tables.

…to bucket a String, s, say in the jvm, you might like writing “s.hashCode() % bucketCount”

Except that expression will produce, as you will quickly discover if you had the whole hash table, negative numbers half of the time for strings with just a few characters.

The ‘%’ operator returns a remainder that has the same sign as the dividend of the division in question, and in this case the dividend is the hash code which is negative sometime due to overflow of 32-but integer range. One replacement operator you probably can use is “math.floorMod(s.hashCode(), bucketCount)”. The floor modulo operator produces remainders that always has same sign as the divisor, in this case the bucket count. This behavior is well documented since C and backwards, so hopefully you will find this information corroborated by many webpages.

Python, (and other scripting languages like Perl and Ruby) on the other hand, has the opposite convention. ‘%’ operator is C/Java’s floorMod. In the python ecosystem, ‘numpy.mod’, ‘tf.math.mod’ and ‘torch.remainder’ all correspond to floor division and return remainder having same sign as the devisor. Confusingly, to get C/Java ‘%’ method in python, you have to call something these all call ‘fmod’.

Some websites suggest masking the number to take the lower bits then %, which may be slightly different from computing the hash in long integers followed by %. Another (stack) overflow post adds the module calculation inside the String.hashCode implementation so that only remainders accumulate, thus avoiding the negativity entirely.

In any case, if you had to write this, mostly likely the slight imbalance between positive and negative numbers would not bother you and floorMod is easiest method to call on any generic “hashCode()” implementation.

Friends don’t let friends drink and drive.

Friends don’t let friends err and merge.

Leave a comment