A funny thing happened to me today. It's like the time I found out about integer division (//) by googling during an interview. This time it wasn't an interview, but it was still a hair-raising surprise. Try the following with numpy arrays:
>>> import numpy as np
>>> a = 16796160.0
>>> np.array([a,], 'float32') == (a-1)
array([ True])
>>> np.array([a,], 'float32') == (a+1)
array([ True])
Cool, huh?
The problem is that a 32-bit float has a 24-bit significand, good for about 7 significant decimal digits: integers are represented exactly only up to 2**24 = 16,777,216. This number has 8 digits, so the least significant one is not kept. And even though numpy defaults to float64 out of the box, some tool somewhere in the Great Python Data-Science Stack defaults to 32-bit floats.
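You can see the precision cliff at 2**24 directly; this is just a small illustration in plain numpy, not part of the original surprise:

>>> import numpy as np
>>> np.float32(16777216.0) + np.float32(1.0)   # 2**24: adjacent float32 values are now 2 apart
16777216.0
>>> np.float32(16777217.0)                     # 2**24 + 1 rounds back down
16777216.0
>>> np.finfo(np.float32).precision             # decimal digits you can actually trust
6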
The data we deal with now counts in the tens of millions, and you cannot count that with a 32-bit float.
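The defensive move is to spell out the dtype instead of trusting whatever the stack hands you. A minimal sketch, assuming plain numpy, showing that 64-bit floats (or just integers) keep the digit:

>>> import numpy as np
>>> a = 16796160.0
>>> np.array([a,], 'float64') == (a-1)         # float64 has 53 significand bits, plenty here
array([False])
>>> np.array([int(a),], 'int64') == int(a) - 1 # or count with integers in the first place
array([False])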