Get real!

A funny thing happened to me today. It's like the time I found out about integer division (//) by googling during an interview. This time it was not an interview, but it was still a hair-raising surprise. Consider the following with numpy arrays:

>>> import numpy as np

>>> a = 16796160.0

>>> np.array([a,], 'float32') == (a-1)

array([ True])

>>> np.array([a,], 'float32') == (a+1)

array([ True])

Cool, huh?

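The problem is that a 32-bit float carries only about 7 significant decimal digits (a 24-bit significand), so it represents every integer exactly only up to 2^24 = 16777216. This number has 8 digits and lies above that limit, where adjacent float32 values are 2 apart, so the least significant digit is not kept and both a-1 and a+1 round back to a. Even though numpy defaults to float64 out of the box, somewhere in the Great Python Data-science Stack some tool defaults to 32-bit floats.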
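To see where the digit goes, here is a small sketch that inspects float32 around this value. It only uses standard numpy calls (np.float32, np.spacing); the variable names are mine, chosen for illustration.

```python
import numpy as np

a = 16796160.0

# float32 has a 24-bit significand, so integers are exact only up to 2**24.
limit = 2 ** 24                       # 16777216, smaller than a

# Gap between a and the next representable float32 value.
spacing = np.spacing(np.float32(a))   # 2.0 in this range

# Odd neighbours are not representable and round back to a.
below = np.float32(a - 1)
above = np.float32(a + 1)

print(limit < a)                                       # True
print(spacing)                                         # 2.0
print(below == np.float32(a), above == np.float32(a))  # True True

# In float64 the same neighbours stay distinct.
print(np.float64(a - 1) == np.float64(a))              # False
```

So the comparison in the session above is not lying: after conversion to float32, all three numbers really are the same value.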

The data we deal with now counts in the tens of millions, and you cannot count it reliably using 32-bit floats.
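A quick sketch of what "cannot count" means in practice: once a float32 counter reaches 2^24, adding 1 no longer changes it, while an integer counter keeps going. This is just an illustration of the effect, not code from whichever tool bit me.

```python
import numpy as np

# A float32 counter stalls at 2**24: the next integer is not representable.
limit = np.float32(2 ** 24)
stalled = limit + np.float32(1)

# An integer counter (or a float64 one) has no trouble at this scale.
exact = np.int64(2 ** 24) + np.int64(1)

print(stalled == limit)   # True
print(int(exact))         # 16777217
```

If a library hands you float32 arrays, casting counts to an integer dtype or to float64 before accumulating is one way to sidestep this.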
