I’ve been thinking about autoactivations recently. This is one of those great innovations that has stood the test of time: it still works after a lot of debugging and exposure to new data and models.
I find that I have been referring to autoactivations as pre-activations, because they are applied in deep neural nets before the parameters are actually mixed with input data (or the previous layer’s activations). But look at the two expressions:
- To pre-activate a parameter means to apply a nonlinearity before it is used, as in preheating the oven. The stem is a verb, and it happens before something else.
- But pre-activation is actually an adjective meaning before any activations, as in pre-trial motions. Its stem is a noun, and it names the thing being preceded.
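Read literally, the verb sense in the first bullet could be sketched as below. This is only an illustration under assumptions: the function names `pre_activate` and `linear`, the choice of `tanh` as the nonlinearity, and the toy weights are all hypothetical and do not come from any library.

```python
import math


def pre_activate(weights, nonlinearity=math.tanh):
    """Apply a nonlinearity to every weight before the weights see any input.

    Hypothetical helper: tanh is an assumed choice, not a prescribed one.
    """
    return [[nonlinearity(w) for w in row] for row in weights]


def linear(inputs, weights):
    """Plain matrix-vector product: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# Toy values, chosen only for illustration.
weights = [[0.5, -2.0], [3.0, 0.0]]
inputs = [1.0, 2.0]

# The nonlinearity touches the parameters first; only then are they
# mixed with the input data.
outputs = linear(inputs, pre_activate(weights))
```

The point of the sketch is just the ordering: the nonlinearity is applied to the parameters, and the mixing with the input happens afterwards.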
And actually a similar problem applies to the ante- prefix. So, to avoid confusion, we should probably refer to autoactivations as foreactivations, and say that we foreactivate the layer. This prefix also means before, and it works both for nouns (foresight, foreknowledge, forethought, forerunner, foreword, foreman) and for verbs (forecast, foreshadow, foredone, foreshorten, forewarn, forestall, foredoom). In each case the stem names the thing that comes first, never the thing that is preceded by something else.
So, let us all try out foreactivations and related approaches. The speed-up in training will surely be a good thing for humanity; at the very least, we won’t be consuming as much energy as we do when training models without foreactivations.