Ensembles and Architectures

I recall an epiphany about neural networks I had years back while developing a new system: ensembles of neural networks with highly dissimilar training paradigms can be superior, sometimes vastly so, at learning a set of training data compared to a network of neurons trained with a single learning paradigm. I discovered this quite by accident while trying out different learning paradigms within an embedded system. At first I assumed the testing and training data had been inadvertently mixed at some point in the procedure, and that the resulting network, having been exposed to all the data, had simply cross-correlated the total set of inputs to the answers with nothing left to guess, but this wasn’t the case. Puzzled and intrigued, I found it was a simple step to insert another layer of neurons at the top of the networks I was experimenting with to decide which of the subordinates was most effective for a given input. Not unlike a ‘boss’ taking input from all of their direct reports and ‘learning’ whom to trust in different contexts, the subordinate networks appeared to be learning particular parts of the weight space, and the superordinate neurons were learning to pick this out quite effectively; no matter how I restarted the training or mixed the subordinates, the resulting improvement was clearly statistically significant.

I experimented with sets of radically different learning paradigms: radial basis functions, back-prop, Bayesian methods, Self-Organizing Maps, recurrent systems with Hebbian reinforcement, and on and on. The conclusion I consistently reached was compelling: ensembles, or groups of vastly different learning paradigms competing side by side, are more effective than the monolithic layer of a single learning paradigm that is so often the default architecture, especially when the paradigms are as orthogonal to their siblings as I could imagine. It turns out this type of multivariate learning technique had already been discovered by neural network researchers, and similar results have been published many times over, but it never stood out as the watershed insight I felt in my gut. It still tends to come and go as a subtext or an unexamined assumption in various areas of neural network research, but it’s an important insight I’ve followed in myriad ways and found success with.
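To make the arrangement concrete, here is a minimal sketch using scikit-learn's StackingClassifier. The base estimators are loose stand-ins for a few of the paradigms above (an MLP for back-prop, Gaussian naive Bayes for the Bayesian learner, k-nearest neighbours as a rough distance-based analogue of a radial basis network), with a logistic regression acting as the superordinate ‘boss’ that learns which subordinate to trust. The dataset and estimator choices are illustrative assumptions, not what I actually used at the time.

```python
# A minimal sketch of a heterogeneous ensemble with a superordinate learner.
# The base estimators are loose stand-ins for the paradigms mentioned above;
# the synthetic dataset is purely illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Radically different subordinate learners competing side by side.
subordinates = [
    ("backprop_mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                                   random_state=0)),
    ("bayesian", GaussianNB()),
    ("distance_based", KNeighborsClassifier(n_neighbors=15)),
]

# The superordinate "boss" layer: trained on the subordinates' out-of-fold
# predictions, it learns how much to trust each one for a given input.
boss = StackingClassifier(estimators=subordinates,
                          final_estimator=LogisticRegression(),
                          cv=5)

boss.fit(X_train, y_train)
print("ensemble accuracy:", boss.score(X_test, y_test))
for name, est in subordinates:
    print(name, "alone:", est.fit(X_train, y_train).score(X_test, y_test))
```

Strictly speaking, a stacked meta-learner blends its subordinates rather than selecting a single winner per input; a gating network that does hard selection would be closer to the ‘boss’ analogy, but the effect of combining orthogonal learners is the same in spirit.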

But it also seems to me an important architectural component of real neural networks and perhaps a lasting attribute of future AI designs. The insight that competing ensembles led to better classification systems compelled me to follow this a bit further, and I found at least a partial biological parallel drawn in papers exploring how regions of the brain involved in conscious thought contain diverse layers populated with neurons of differing shapes, spiking/bursting rates, and a myriad of other distinguishing features that affect the electrical output of the network, much like the differing learning paradigms I had experimented with. In addition, glial cells, which constitute a higher percentage of brain mass than neurons in the cortex, have been theorized to low-pass filter signals from regions of activity, which perhaps illustrates the value nature places on a winner-take-all ensemble approach to hierarchical neural architectures. They push asynchronous perception events and their associated signals toward superordinate networks, which carry them forward to other networks as they rattle around and ultimately settle into some stable state that helps foster awareness. I realize this is a big leap to make and probably best left to coffee conjectures, but what does seem clear is that understanding how neuron transfer functions are choreographed and related to each other across the brain is an important opportunity for new and novel silicon systems. For certain applications this may be the equivalent of Edison’s filament quest and, as with deep learning models, will likely lead to more effective overall architectures for AI that vary radically with the domain of the problem. It is yet another facet of the complex framework of biological self-organizing neural systems that has shaped my perspective on AI.
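Purely as a toy illustration of that conjecture (and emphatically not a claim about what glia actually compute), here is a small numpy sketch in which an exponential moving average low-pass filters each subordinate network's activity and a winner-take-all step forwards only the strongest filtered signal to the superordinate layer. The signal shapes, noise levels, and filter constant are arbitrary assumptions chosen just to make the mechanism visible.

```python
# Toy sketch: low-pass filter each subordinate's activity, then let a
# winner-take-all step decide which signal the superordinate layer sees.
# Signal shapes, noise levels, and the filter constant are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
T, n_subordinates = 200, 3

# Noisy activity traces from three subordinate networks; one carries a
# genuine slow "perception event", the others are mostly noise.
t = np.linspace(0, 1, T)
activity = rng.normal(0.0, 0.5, size=(T, n_subordinates))
activity[:, 1] += np.sin(2 * np.pi * 2 * t)  # the meaningful signal

def low_pass(x, alpha=0.1):
    """Exponential moving average along time, one trace per column."""
    out = np.zeros_like(x)
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
    return out

filtered = low_pass(activity)

# Winner-take-all: at each time step only the strongest filtered signal
# is forwarded to the superordinate network; the rest are suppressed.
winner = np.argmax(np.abs(filtered), axis=1)
forwarded = np.zeros_like(filtered)
forwarded[np.arange(T), winner] = filtered[np.arange(T), winner]

print("fraction of steps won by each subordinate:",
      np.bincount(winner, minlength=n_subordinates) / T)
```

With the low-pass stage in place the subordinate carrying the slow, coherent signal wins most time steps; remove it and the momentary noise spikes dominate, which is the intuition behind pairing filtering with winner-take-all selection.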
