Nonlinear Recurrent Connections

Unusually for me, I’ve generally agreed with the choice of linear circuits, especially when applied to practical stuff.  Noise, chaotic behavior, phase distortion – there’s a long list of reasons guiding designers away from a nonlinear approach in many diverse domains.  But I could never relate to nonlinear design’s harshest critiques: “strange, odd, risky” given the need for differential equations and the layout separation that often accompanies their implementation.

However, we can make mistakes when defaulting to linear design, regression, and analysis; many fundamental processes are elastic, and serious consequences in science, finance, and so on result when ignoring the ubiquitous nonlinear nature of interactions.  To me, the most interesting of these are the infinite number of self-organizing nonlinear transfer functions at the heart of neurons, which comprise a critical building block in the opaque essence of human consciousness.

In addition, in most large electrical systems, feedback loops nested throughout complex networks, or simpler cascading directed graphs, can give rise to incredible potential and features; and natural neural systems appear no different.  The recurrent wiring inherent across the neocortex and other critical regions is vital to an array of conscious and unconscious processes.  However, for quite a while, most neural network experiments seemed focused on feedforward networks.

It was an early time in the reintroduction of research into nonlinear connections and much time was spent on exploring basic tenets of their domain; however, another likely reason was the order of magnitude of complexity recurrent connections introduce in order to analyze and understand their chaotic behaviors in the presence of nonlinear systems.  This includes temporal effects, and how they influence short-term memory in the hippocampus, leading to the subsequent formation of long-term associative neural connections in the neocortex.

On the path to sentience, I believe recurrence is paramount to providing continuity of thought, convergence of auto-associative memory, as well as predictive abilities.  To me, nonlinear, recurrent, connectionist architectures of enormous scale have very little in common with current digital computers and much hyped AI simulations, other than the basic ability of universal Turing machines to emulate similar interactions.  As such, to me this is an early, but fascinating time in the exploration of singularly unique machines, whose successors I expect will progressively give rise to systems it’s easy to imagine will possess incredible promise.

As such, this blog entry will touch on the advantages and challenges of complex designs applying nonlinear architectures in recurrent connectionist systems, as compared to more deterministic approaches toward artificial intelligence.  But more importantly, how their use can diverge in important ways, including fundamental questions, which when taken to the limit, may even ultimately affect humanity.  Perhaps here I’m edging toward philosophy as much as computer science and cybernetics, the usual focus of this blog.  Thank you for indulging me.

Simple is better; however, newer technology and tools support malleable, easy to use nonlinear designs.  An example of the multifaceted, combinatorial aspects of technology today; borne of insight out of phase from past leaders, replaced by entrepreneurs with the mettle for exploring new territory, and in the process continually refining the state of the art.   I can’t remember running across a young electrical engineer worried about the use of nonlinear designs, once considered heresy by so many.

My perspective also hosts nostalgia and admiration for past mentors, whose incredible narratives and vision of the facilities and inventions they leveraged from the initial analog design phase inspired me; like Solomon secrets:  hosting a style of nonlinear thinking, deeply expressive, representing art as much as engineering.   An early phase in the remarkable era that inspired amazing combinations of analog, digital and nonlinear advances.  For example, the PSTN provided five 9s to millions long before incorporating  ICs.  Yet, ultimately losing the race to linear design and digital technology rapidly boosted by Moore’s Law, along with libraries of arrays of simple flip-flops; queued for fabrication techniques and ready to begin mass production, before there were projects and services ready to receive them.  The die was cast before we were aware of how digital dies could best be cast, and in their wake analog design faded fast.  Moores the pity and the law.

Perceptrons showed up early in the AI contests: supervised learning computers designed as linear classifiers, with support from Ivy League institutions, hyped by even the Navy as a breakthrough in image recognition, heralding great promise: “the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”  That still sounds a bit goofy, and the Perceptron story a bit stale; probably recounted more often than it deserves, but an important catalyst for a broad effect on the legacy of AI that I still see echoes of today.  Somehow, along the way, we lost sight that op amps and transduction are realistic reflections of interfaces and processes everywhere and as such, should never be shed as engineering tools.  To me, digital and analog are symbiotic lexes useful on anyone’s journey to know more – it’s a self con to discard one for the other.

One of the Perceptron’s novelties was the ability to partition sections on a two-dimensional plane, classifying objects into distinct categories; this was a cool and intriguing first step, except when it became clear nonlinear patterns couldn’t be segmented in this fashion, due to the linear transfer function at the heart of this very particular paradigm’s training scheme. Several esteemed researchers gained notoriety for stomping on the Perceptron, proving even simple XOR problems couldn’t be linearly separated.  Funding quickly evaporated in vast, related fields.  Like ancient feuds, one can still find the very definition of Perceptrons debated, as if an attempt to repair their lack of malleability and justify an early stumble.  A lesson for me is that linear and nonlinear designs, as simply one example of many design considerations, ultimately lead to tremendous variance of design potential and philosophy of approach, as do countless other emerging ideas on the road to architect sentience.
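To make the XOR stumble concrete, here is a minimal sketch of a single linear threshold unit trained with the classic perceptron update rule. The function names, learning rate, and epoch count are my own illustrative choices, not a reconstruction of the original hardware or any particular library. It learns AND, which a straight line can separate, but can never get all four XOR cases right.

```python
# A single linear threshold unit trained with the perceptron rule.
# All names and parameters here are illustrative assumptions.

def step(s):
    """Hard threshold: fire (1) only when the weighted sum is positive."""
    return 1 if s > 0 else 0

def train_perceptron(samples, epochs=100, lr=1.0):
    """Return (weights, bias) after repeated passes over the samples."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            err = target - step(w[0] * x1 + w[1] * x2 + b)
            # Nudge the separating line toward the misclassified point.
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def count_correct(samples, w, b):
    """Number of samples the trained unit classifies correctly."""
    return sum(step(w[0] * x1 + w[1] * x2 + b) == t for (x1, x2), t in samples)

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in (("AND", AND_DATA), ("XOR", XOR_DATA)):
    w, b = train_perceptron(data)
    print(name, "correct:", count_correct(data, w, b), "of", len(data))
```

No amount of extra training helps the XOR case: whatever weights the unit ends up with still describe a single line, and no single line puts (0,1) and (1,0) on one side with (0,0) and (1,1) on the other.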

Of course now there are oodles of endless nonlinear transfer functions of every shape, wrap, band-pass, cadence, and other creative names in recurrent, deep hierarchies that have and continue to be proven mathematically to be able to emulate most, if not any kind of function.  So much for the powerful XOR nonlinear problem that derailed an entire branch of research, funding, and opportunity. For most complex, modern networks and transfer functions, it’s typically the more important question for research how networks of infinite forms of transfer functions can be most effectively trained and stabilized.  Many are also becoming aware there may be issues constraining them, once scaled up to trillions of concurrent interactions, which we have no current hope of gleaning insight into and perhaps even interrupting once launched.
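As a small illustration of how little nonlinearity it takes, the sketch below wires two hidden threshold units into a third. The weights are hand-picked by me for clarity rather than learned, and the names are hypothetical. One hidden unit behaves like OR, the other like AND, and the output fires when the first is on and the second is off, which is exactly XOR.

```python
# A two-layer network of threshold units that computes XOR.
# Weights are hand-chosen for illustration, not trained.

def step(s):
    """Nonlinear transfer function: a hard threshold at zero."""
    return 1 if s > 0 else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2 - 0.5)        # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)       # fires only if both inputs are on
    return step(h_or - h_and - 0.5)   # "OR but not AND"

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

Remove the threshold on the hidden units and the whole thing collapses back into a single linear function, which is why the transfer function, not the layer count alone, is what buys the extra expressive power.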

Curiously, I recently read that new generations of AI students seem to go through a predictable cycle: initially excited by its prospects and promise, soaking up directed training/unsupervised learning paradigms, making incremental progress from decades of documented dabbling, and embedding them in all sorts of emerging distributed and mainframe contexts.  Ultimately financial needs, the end of their thesis, time, etc. steer them to embrace more practical areas of deterministic AI where progress (self driving cars and an endless succession of other cool stuff on the horizon) is ever easier to achieve and sustain a career.  But most realize whatever they are creating is a compromise and another delta in updating the ongoing definition of Weak AI.

Students and researchers seem bummed by memories of endless training sessions, disillusionment as they ultimately glean insight their solutions are completely opaque, and incredibly difficult to reliably reproduce; yet, surprisingly, frustratingly, and ironically astonishingly powerful in the narrow domain where targeted – an irony similar to many tenets of physics and science in general.  The field is clearly having second thoughts of late, brandishing their frustrations across several decades of slow, incremental progress, and focusing anew with ever more powerful contexts of software and machines toward parallel, self organizing connectionism.  The idea seems to be to encapsulate them in discrete, controlled environments for safety and focus of function.  A tall order that.

It’s interesting how some have decided the answer is a stone soup of all corners of current AI approaches, and at least one recent book I read challenged the reader to get on with assembling it, as if we simply need to follow through with his recipe.  It appears some support deterministic supervision using expert systems, a pinch of SVMs, and so on along a host of other attributes misc paradigms still in vogue can offer to balance, provide safety, and add value to the whole.  I wonder how they skip over the obvious fact the brain doesn’t use any current AI paradigms.  For anyone who is familiar with business, it’s easy to sympathize with the reality that their investors now probably demand to see AI as part of the corporate portfolio, but that doesn’t mean a solution is nascent.  Nor has it ever been.

A proof based, deterministic approach eclipsed everything AI for decades after connectionism was temporarily abandoned, continually fostering interesting, new experiments and prototypes; wafting about a bright future and financial fortunes, while blowing oodles of taxpayer and private money alike.  Quite a few seemed to shimmer until limitations dragged reality back to the forefront, and the next ‘breakthrough’ percolated up.  Regardless, it’s clear there were and continue to be many stunning successes: in particular, exhaustive search in the form of chess, combinatorial statistics based strategies that beat Jeopardy champions while refining their approach over distributed architectures, as well as other domain specific problems which gave momentum to an approach which still encourages many, if not most developers, who likely, in this life, will continue to believe in it as a panacea.

I’m a pragmatic fan of determinism and Weak AI for intellectual and other earthy reasons.  Yet, I’ll never empathize with why some were given so much credit for compering the clarion call that set back research in important, tangential fields, like nonlinear connectionist systems, for decades. Conversely, Grossberg’s Adaptive Resonance Theory is still considered a watershed in advancing understanding of convergence within connectionist systems and the power it brings to neural systems.  I can ostensibly forever reflect on it with new insight.  Thank you Stephen; yours is the closest point of perigee to understanding how neurons converge towards quiescence IMHO.

To me, nonlinear complexity seems to hold an interesting psychoanalytical relationship with humans, given the immense potential and power it represents.  My own ten-cent psychology leads me to believe it’s upsetting to realize something so powerful converges toward the incomprehensible.  As such, we’re prone to classify it as non science, regardless of the theoretical potential and hard evidence of realization via our own neural network, the human mind.  No wonder it was set aside for the much more practical, immediate, and rapidly expanding digital world.

So it’s odd that posterior probabilities, and random statistical emulations, are clearly part of the kindling it appears to me are fundamental for creating and testing ‘Broad or Strong AI;’ which leads to a more capable, complex, and ultimately opaque intelligence that inevitably will learn on its own.  In contrast to ‘Narrow or Weak AI:’ typically a deterministic, potentially comprehensible approach usually built from reason and logical inference.  Perhaps a more precise understanding follows if we see what the neocortex might or can potentially grasp at some specific level, e.g. Narrow/Weak AI, and in the case of the former, what it never will, when taken to the limit, as the system naturally, irrevocably becomes both erudite and impenetrable.

To refine the idea, perhaps the point where human intellect becomes challenged in understanding a complex system is simply somewhere a bit before the cross over to what gives rise to systems which can bootstrap their own intelligence.  Even if the process of launching sentient life is known, the resulting system will likely be unknowable by us.  In contrast, Narrow or Weak AI can already do pretty amazing stuff and, given enough effort, showcase malleable attributes that create systems directed toward specific goals that quite clearly appear imbued with more than just determinism – perhaps a kindling of human-like intelligence, but clearly not the same.

But it’s Broad or Strong AI that both threatens and beguiles us toward something permanently beyond ourselves that may lead to fundamental, perhaps one-way changes. That’s a bit concerning to many already, perhaps as it should be, especially as time and systems keep advancing exponentially.  Yet it’s difficult to imagine thwarting the vector of human will to innovate; ironically illustrated in the ending of Player Piano, as a vending machine is repaired in the ruins of an apocalypse brought on by machines, and the effort applauded.  Like many I don’t see an answer, just concern over something important and fascinating, moving rapidly, and which we appear unable to grasp, much less define currently.

The analytically impervious nature of the neocortex’s meshes of connections hosts the impossible to comprehend: concurrent, millisecond synapses and an average of eight thousand axial to dendritic pairs between billions of neocortex neurons, along with as yet unimaginable numbers of other discoveries ahead.  I agree with those who classify our own neocortex as the first and only example we’ve seen of Broad or Strong AI.  The issue then is that there may ultimately be more examples which we inadvertently play a role in bootstrapping and quite feasibly lose control of; all the while only dimly aware of the initial catalyst we stumbled across, much less the inestimable outcomes.  That systems become opaque to human understanding as they become complex seems self-evident.   The problem then is that they can also become uber capable in ways we just can’t relate to in our present form.

Back to incremental progress over the past few decades, the many learning paradigms I’ve reviewed seem to have little to do with natural self-organization:  inspired by the fundamental ways nature pieces everything together, from molecular crystals to information in the nucleus of cells used as a function to create life which gives rise to consciousness.  Perhaps we could foster vital insight through staid, sustained focus on how self-organization of recurrent meshes of massively parallel, recurrent, neural connections with nonlinear transfer functions, low pass filtered by glia cells and awash in hormones, fosters sentient thought.  Breakthroughs might follow faster than we expect.   For a simple comparison, we marvel at the less than one-hundred years it took for flight to space travel to transpire.

The outcome now of many a blur of Narrow and Broad AI projects, seems to have been to attract investment toward small eddies next to sand bars that spin until the money drains away; where it’s easy to get lost in goofy goals cast as zillion dollar future IPOs, launched by the chemically clouded/technically vacant evangelists who whisper frantic fear to confidants far afield from their typical, initial over the top braggadocio confidence – eventually lost in disillusionment and a dissolved financial stake they’ll never pay back, representing monkeys on their backs for the duration of most of their lives, forcing them and their beneficiaries to forget the beautiful ideas that once inspired them.  Do yourself a favor and run from them.

But it doesn’t have to be that way.  Conversely, developers can give quiet rise to some of the most potent concepts and instantiations that make progress, with small, even singular cadres of developers who seek anonymity; fostered by development contexts which cost virtually nothing; save a compiler, an average computer, perhaps a Faraday Cage and a box of ARM machines with pieces of magic that cost pennies.  Some desire nothing more than to develop systems that seek answers at the top of a mountain in the clouds.  Even if it leads nowhere to nothing, except a closer understanding of being human.  Seems to me it’s a magnificent time to occupy a digital abbey.

In addition, all this culture of AI may be getting in the way of itself.  Perhaps we should stop naming new learning paradigms, like we do compilers and languages, and instead itemize their coefficients and parameters, such as nonlinear matched filters, spiking patterns, and so on in biological and evolutionary sets.  At any rate, it appears many now see this in a more mature context; that self-awareness from other than carbon based life forms is probably going to get here in a few generations.  If and when so, I hope they have mercy on us before they decide to escape this important rock, with its thin, gentle, paradise biosphere.

So it seems clear to me there are well-known connectionist architectures with specific attributes that look to possess incredible potential. Although an easy trap to get lost in, and wholly unwieldy, recurrent connections seem fundamental for sustaining the catalyst for vital aspects of AI. As a personal preference, recurrence is often among the building blocks I choose from the many variants on auto associative memories, bidirectional or otherwise.  This often stands out to me as more important than an overriding focus on scaling up the number of hidden neuron layers across hierarchies, which has received much attention given its relationship to deep learning.  In my experience nonlinear, recurrent connectionist systems become entirely opaque, as they distribute complex functions across vast, cavernous, meshed layers of both forward and coiled, backward connections; encapsulated in massively parallel, recursive copies of the same.  They quickly become imperceptible as their potential exponentially rises.  As the decades pass and new instantiations are realized, I wonder how long we will be a part of the progress.

The contexts I enjoy tinkering with in this field are small, embedded, concurrent Turing machines that emulate nonlinear transfer functions with more backward referenced connections than forward, hosting countless varieties of varying coefficients influenced by time, phase, and frequency matched Kalman like filters.  These trillions of what appear to be packets of order statistics seem to saturate themselves, for me, in the form of mesh topologies, an interesting aspect of which is that time doesn’t always waft across the connections at the same rate, unlike a CPU or Stratum clock tick; while generally overall producing a system that might look like mud to the casual observer.  There are endless small loci of effects, where every neuron fires within its own context, while asynchronously interacting with neurons several inches away across a small skull holding a three pound, impossibly opaque, squashy chaos of 100 billion cells that can produce anything from wit to imagination via what looks like a scaffold of semi-organized mud.

How to wash it in simulated software hormones that can mimic catalysts for system vectors on the path to sentience eludes me: one of millions of hidden heuristics that I believe hinder progress across the spectrum of the ostensibly inestimable deltas of knowledge prerequisite for neocortex emulation.  There are many other mysteries which I have confidence will ultimately fall and leave us not sure what is the next step, much less the path forward.  I agree with those who say we must plan for that eventuality.  I don’t think it’s wise to tinker with large systems of this ilk, much less share it in tangible form.  What’s daunting and damning to the churn of the future is apparently all you need are a few billion, concurrent, emulated Arduinos/Raspberry Pis along with a not so weak mind, which is probably pretty common in any State University CS program.

For the deterministic mind-set, I do believe some logic bounds exist at certain levels to give some semblance of control, for which we take great pride in ourselves as conscious beings that can cast our own destiny.  Infinite quantum waves collapse from infinite particle potentials, which free will aggregates into the reality we experience, supporting the ego joy we revel in, as we convince ourselves that we’ve charted our unique future while threads of other universes have, as I perceive the theory, effectively vanished from our current ability to get to them.  I agree with Richard Feynman that I’ll never understand quantum mechanics, even as quantum theory will likely, and ironically in a few years, usher in quantum machines that have a very realistic shot at emulating true sentience.  Go figure.  As a very respected physicist has said: “The universe is under no obligation to make sense to you.”

This mental magic transpires, completely oblivious to us, interwoven in the vast, unexplored characteristics of the wiring of the brain which rattles consciousness along following no granularity of time; with eigenvectors of electrochemical effects washing concurrently in every direction across a massively parallel cortex firing in infinite slices, roughly at the millisecond level, hosting magic that somehow integrates thought from a billion corners within a small skull, enabling me to put nouns and verbs together.  It’s daunting to say the least, to consider how far we would have to go to get near anything resembling comprehension of the least of the details of this incredible dance of infinite potential.

The self organization of our minds is a beautiful counter example to entropy and a characteristic of life that is likely ubiquitous across the countless galaxies that Hubble-collected photons reveal after weeks of looking at a pinpoint in the blackness of space.  It’s painfully apparent how little we understand of ourselves and how what we perceive as consciousness has emerged; but most now realize it’s extremely unlikely we’re the only creatures, made of star stuff, who have pondered, are pondering, or will ponder their own meaning and origin.  How can a human not be spiritual in the face of this?

An interesting analyst of consciousness once said “The essence of society is the repression of the individual and the essence of the individual is repression of himself.” Psychoanalytical considerations aside, I expect unraveling ourselves from the id to the axon to glutamic acids is an important step toward questions many orders of magnitude more interesting, and will ultimately affect society in ways we can’t predict.  This may be especially significant when insight inescapably compels forward engineering of sentience instantiations that immediately cross over, beyond each of us individually, and echo back onto the larger locus of humanity that will, in my opinion, affect everyone, everywhere.  What then of societal and psychoanalytical repression forms that have held us wherever and whatever we currently are?  What happens to sentience when it can self evolve via massively parallel, self-organizing systems in a microsecond what took carbon based life forms several billion years?  A clear reason Artificial Intelligence is, and will increasingly be, the major question worth pursuing.  Plausibly the only question rightly worth pursuing, as it determines answers to so many other questions.


Ensembles and Architectures

I recall an epiphany about neural networks I had years back while developing a new system: ensembles of neural networks with highly dissimilar training paradigms can be superior, sometimes vastly so, in learning a set of training data when compared to a network of neurons trained using a single learning paradigm. I discovered this quite by accident while trying out different learning paradigms within an embedded system. I assumed the testing and training data were inadvertently mixed at some point in the procedure and the resulting network, having been exposed to all the data, had simply cross correlated the total set of inputs to the answers with nothing left to guess, but this wasn’t the case. Puzzled and intrigued, I found it a simple step to insert another layer of neurons at the top of the networks I was experimenting with to decide which of the subordinates were most effective depending on the input training data. Not unlike the analogy of a ‘boss’ taking input from all his direct reports and ‘learning’ who to trust given different contexts, it appeared the subordinate neurons were learning particular parts of the weight space and the superordinate neurons were learning to pick this out quite effectively; no matter how I restarted the training and mixed the subordinates, the resulting improvement was clearly statistically significant.
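The ‘boss’ idea can be sketched in a few lines. This is a toy of my own construction, not the embedded system described above: the two subordinates are fixed, hypothetical rules standing in for trained networks, and the superordinate layer is reduced to a simple tally of which subordinate has been right in which context. Each subordinate is only reliable on one side of the input range, yet the tally-based boss ends up right everywhere.

```python
from collections import defaultdict

# Toy problem (an assumption for illustration): label is 1 when |x| > 0.5.
def label(x):
    return 1 if abs(x) > 0.5 else 0

# Two hypothetical subordinates, each blind to one side of the input range.
def sub_left(x):
    return 1 if x < -0.5 else 0

def sub_right(x):
    return 1 if x > 0.5 else 0

SUBORDINATES = [sub_left, sub_right]

def context(x):
    """The boss only looks at a coarse context: the sign of the input."""
    return "neg" if x < 0 else "pos"

def train_boss(xs):
    """Tally, per context, how often each subordinate was right."""
    trust = defaultdict(lambda: [0] * len(SUBORDINATES))
    for x in xs:
        for i, sub in enumerate(SUBORDINATES):
            if sub(x) == label(x):
                trust[context(x)][i] += 1
    return trust

def boss_predict(trust, x):
    """Defer to whichever subordinate has earned the most trust here."""
    scores = trust[context(x)]
    return SUBORDINATES[scores.index(max(scores))](x)

def accuracy(predict, xs):
    return sum(predict(x) == label(x) for x in xs) / len(xs)

train_xs = [i / 10 for i in range(-10, 11)]
test_xs = [(i + 0.5) / 10 for i in range(-10, 10)]   # held-out points
trust = train_boss(train_xs)

print("left only :", accuracy(sub_left, test_xs))
print("right only:", accuracy(sub_right, test_xs))
print("ensemble  :", accuracy(lambda x: boss_predict(trust, x), test_xs))
```

In a real system the subordinates would be trained networks and the boss would be a learned layer rather than a lookup of counts, but the division of labor is the same: the subordinates cover different parts of the problem and the layer above learns where each can be trusted.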

I experimented with sets of radically different learning paradigms: radial basis, back-prop, Bayesian, Self Organizing Maps, recurrent systems with Hebbian reinforcement, and on and on. The conclusion I consistently reached was compelling: ensembles, or groups of vastly different learning paradigms competing side by side, are more effective than the monolithic layer of a single learning paradigm which is so often the default architecture, especially when the learning paradigms are as orthogonal to their sibling learning algorithms as I could imagine. This type of multivariate learning system technique, it turns out, had been discovered by neural network researchers and similar results published many times over, but it didn’t stand out as the watershed insight I felt in my gut. It still tends to come and go as a subtext or part of a set of assumptions in various neural network research areas, but it’s an important insight I’ve always followed in a myriad of ways and found success with.

But it also seems to me an important architectural component of real neural networks and perhaps a lasting attribute of future AI designs. The insight that competing ensembles led to better classification systems compelled me to follow this a bit more and I found at least a partial biological comparison drawn in papers exploring how important regions of the brain involved in conscious thought had diverse layers populated with neurons hosting differing shapes, spiking/bursting rates and a myriad of other distinguishing features that affect the output of the network electrically, like the differing learning paradigms I had experimented with. In addition, glia cells, which constitute a higher percentage of brain mass than neurons in the cortex, have been theorized to low pass filter signals from regions of activity, which perhaps illustrates the value nature places on a winner-take-all ensemble approach to neural hierarchical architectures.  They push asynchronous perception events and associated signals toward superordinate networks to carry forward to other networks as they rattle around and ultimately produce some stable state that helps foster awareness. I realize this is a big leap to make and probably best left for coffee conjectures, but what does seem clear is that understanding how neuron transfer functions are choreographed and related to each other across the brain is an important opportunity for new and novel silicon systems – for certain applications this may be the equivalent of Edison’s filament quest and, similar to deep learning models, will likely lead to more effective overall architectures for AI and vary radically depending on the domain of the problem. It’s yet another of the factors affecting the complex framework of biological self organizing neural systems that has shaped my perspective on AI.


Algorithms, Data Structures and AI

I recently had an interesting discussion about the enduring classic Gödel, Escher, Bach: An Eternal Golden Braid (GEB) by Douglas Hofstadter. The book is so famous and influential that high school classes have been modeled after it; it won the Pulitzer Prize for nonfiction in 1980. To group it in with other classics is a well-deserved sorting as it’s a marvel covering a great many interesting topics in math, art and music. But what I remember best is its Lewis Carroll-like spirit of exploration of discrete computer science concepts such as recursion and symbolic representation. It reminds me how some suggest that recursion, hierarchies of data structures and other computer science concepts are useful building blocks for AI. In the book Hofstadter points out how cognition/thinking bootstraps from simple processing constructs, e.g. neurons.

Tools like recursion have always had an appeal for AI developers given their economy of expression and power they leverage from a surprisingly few statements. It’s very slick how complex problems can be broken down by recursion into simpler versions of the same thing and ultimately solved via a few statements nested at the bottom of a stack the Operating System so naturally and invisibly manages. Any hierarchy and any direction of traversal can be embodied in a subroutine consisting of just a few lines of self referential code.
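For anyone who hasn’t felt that appeal firsthand, here is a minimal sketch. The tree, its node names, and the `walk` function are made up for illustration. One short self-referential generator visits every node of an arbitrary hierarchy, and flipping a single argument changes the direction of traversal from top-down to bottom-up.

```python
# A hierarchy as nested (name, children) tuples; the contents are illustrative.
TREE = ("root", [
    ("a", [("a1", []), ("a2", [])]),
    ("b", []),
])

def walk(node, order="pre"):
    """Yield node names depth-first; 'pre' is top-down, 'post' is bottom-up."""
    name, children = node
    if order == "pre":
        yield name
    for child in children:
        # The function calls itself on a simpler version of the same problem.
        yield from walk(child, order)
    if order == "post":
        yield name

print(list(walk(TREE, "pre")))   # ['root', 'a', 'a1', 'a2', 'b']
print(list(walk(TREE, "post")))  # ['a1', 'a2', 'a', 'b', 'root']
```

The bookkeeping of where we are in the hierarchy never appears in the code; it lives entirely on the call stack.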

This same quest for algorithmic parsimony seems central to the idea that thought will ultimately be modeled by a handful of recursive-like, Platonic principles which simply haven’t been pinned down yet. Perhaps, but as I became a better programmer I tended to leave recursion alone. Maintaining recursive descent involved increasing amounts of time when adding a new twist for a customer – what once shined as a perfect expression of concise power often would iteratively evolve into a multi-threaded knot of confusion. At some point it’s simply easier to completely spread the work out since RAM/CPUs are ever faster and Disks/Labor can be found ever cheaper as long as the coding chore is made simple.

The ancient question of whether we live in a world that can be modeled with pure logic or is stochastic by nature extends back to the time of Plato and formal logic/proofs as codified by Aristotle. This is in contrast to the probabilistic view characterized by the later and more colorful Roman society with their leaders throwing dice for decisions and ultimately consummated with Niels Bohr telling Einstein to ‘stop telling God what to do with his dice’ at the birth of quantum physics. The probabilistic view’s relationship to consciousness is taken to the limit by some very capable people who assert that there are quantum effects influencing neurons and as such they may never be suitably modeled. I tend to think the self-organizing view requires neither formal proofs nor quantum physics, as the former I expect will lead to the same locus bound systems that have stumped AI for the last several decades and the latter seems a bit of an overkill, as well as a convenient way to thwart attempts to model consciousness, perhaps based more on ethics and a not unfounded Pandora’s box fear than reality.

GEB goes from fun and fanciful to welcoming the casual reader into the mathematically serious world of irrationality ad infinitum as it playfully starts to tap and then kick the stuffing out of symbolic logic using Gödel’s Incompleteness Theorem. Everything gets wrapped up in atomic statements like “I’m lying” and its recursive contradiction. But the arguments Hofstadter makes are so carefully iterated, and shown to be so clearly unavoidable in formal systems, that when I first read it I reflected that I had probably wasted time chasing the self-defining world of the Theory of Formal Languages class I struggled with in college, assuming it led to something bigger. Actually that class was very useful in simply helping me understand compiler front-end tools like LEX/YACC, which I’ve used in a dozen Unix projects in my career, so my advice to CS students is to try to enjoy it as there’s obviously great value there; but to me production rules, meta-symbols and grammars are not a panacea for AI.

After GEB made samurai sandwiches out of so much of the discrete, finite math stuff I was fed in Computer Science courses, it was conveniently easy to rationalize a new direction. Now did I really need to read that book to realize a better direction for AI lies in taking a bit more inspiration from nature? No, of course not; lots of folks just use common sense to conclude it’s a bit silly to attempt to model an infinite universe using rules in what we used to call a ‘finite core.’ The unconscious serves up a lot of stuff from a lot of different directions for what we perceive as a single thread of consciousness. That tip-of-the-iceberg awareness has led to years of folks starting out over and over with systems that try to capture the decision making we label as thought, but it’s just that: it’s just our perception of what’s happening in our minds. Cascades of pattern recognition start transforming light into nascent concepts as soon as it starts moving down the optic nerve. We say ‘cup’ and then fool ourselves by designing forward object oriented databases that seem to be a complete model of the ‘cup’ system, but of course they’re not. It’s the billions of neurons firing in parallel, transforming information into awareness while moving across a signalling distance equivalent to a few hundred milliseconds of relatively slow, electrochemical synapses, that let us recognize a cup no matter what light, angle or portion of it we see. We may believe we think in discrete fashion from our awareness of our consciousness’s trail of thought, but we’re built from billions of neurons that are constantly producing this environment and we’re unaware of any of it.

Picking through some of the other common data structures, though, it seems to me casting neurons in a hierarchical template is pretty fundamental if not essential. Jeff Hawkins argues this skillfully in “On Intelligence”, and the fact that there are very roughly six identified layers of the neocortex, divided between instructing downward towards the cerebellum and upward for what seems to me to be imagination and pure consciousness, seems prima facie evidence the neocortex leverages a hierarchy. An interesting twist is that many of the leaf neurons involve a recurrent connection to input neurons to enable one to essentially “hold that thought”, or provide a continuity of cascading memories, or predict the future, or any one of a number of other such temporal capabilities; in this way the hierarchies seem to me to be loops. The branches encapsulate loops of branches of loops and so on, limited physically but in theory like some connection-nested, ever-expanding Mandelbrot set.

A fractal-like structure of loops within hierarchies could be boundless, but a pared-down version of this structure seems to exist everywhere across regions of the brain, giving them their characteristic Golgi-stained look of a sloppily artistic contour copy of M. C. Escher’s Cubic Space Division. This odd construct can apparently provide the plasticity brains demand: for example, the hippocampus can immediately store a short list of numbers, or let a taxi driver store away a large number of addresses, and this is speculated to be a cause of the thickening of this area in the brains of folks who have to memorize a lot of information. At night, sleep seems to act like a kind of genetic algorithm, with the neocortex slowly sifting through recent short-term memories and deciding which should merge with the larger, longer-term, auto-associative memories stored there.

As such it’s become a bit of a corollary for me that custom design variants on auto-associative memories are vital components of starting successful AI architectures. It’s obvious that new insights stemming from non-binary architectures have a creative effect that designers quickly learn to make use of, similar to a guitar player switching to Drop D tuning and finding his compositions gain a completely new style, and that they represent a marked contrast in style from indexed sequential memory. Of course the only thing worthy of long-term interest, which means money, will be working code, which is precisely how it should be. It’s an understatement to say it’s a financial risk for very small companies to skip ahead and turn extremely complex, revolutionary hardware architectures directly into silicon; however, I’m very encouraged that IBM appears to be taking the risk, now with help from DARPA, and attempting to create a new kind of CPU inspired by biological systems that is fundamentally different from the Turing Machine architecture we’ve all used since the end of WWII. Still, designing 3D matrices of inexpensive microcontrollers to emulate neurons is one fun alternative to custom silicon or purchasing a data center for raw computing power to emulate massively parallel computation. Besides, all those cool 70s-era heuristics like circular queues for real-time communication are relevant again on small systems with no OS, and Arduinos are small and light enough, even with 9V batteries attached, to drive real stuff like legs and arms, given a transistor to step up the power.
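For anyone who hasn’t bumped into a circular queue, here is a minimal sketch of the idea. It’s written in Python purely for readability; on an Arduino-class board you would write the same thing in C with a fixed array. The class name `RingBuffer` and the choice to overwrite the oldest entry when full are my assumptions for illustration, not the only reasonable policy.

```python
class RingBuffer:
    """Fixed-size circular queue; when full, a push overwrites the oldest item."""

    def __init__(self, capacity):
        self._buf = [None] * capacity  # storage is allocated once, up front
        self._head = 0                 # index of the oldest item
        self._count = 0                # number of items currently stored

    def push(self, item):
        capacity = len(self._buf)
        tail = (self._head + self._count) % capacity  # next write slot, wrapping
        self._buf[tail] = item
        if self._count == capacity:
            # Buffer was full: the oldest item was just overwritten, so move on.
            self._head = (self._head + 1) % capacity
        else:
            self._count += 1

    def pop(self):
        """Remove and return the oldest item."""
        if self._count == 0:
            raise IndexError("pop from empty ring buffer")
        item = self._buf[self._head]
        self._head = (self._head + 1) % len(self._buf)
        self._count -= 1
        return item

    def __len__(self):
        return self._count


rb = RingBuffer(3)
for sample in (1, 2, 3, 4):   # the fourth push overwrites the 1
    rb.push(sample)
drained = [rb.pop() for _ in range(len(rb))]
print(drained)
```

The appeal on a small system with no OS is that nothing is ever allocated or freed after start-up, so a receive interrupt can push bytes while the main loop pops them.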

To sum up this overextended blog entry, it’s my belief the essential data structures for some realistic attempt at biologically inspired brain emulation are probably already well known: neurons with ensembles of simple nonlinear transfer functions, and hierarchies encapsulating loops everywhere, reading/writing information via a very distant cousin to what wiki and texts usually describe as auto-associative memory. That’s basically one way of describing the neocortex as I interpret neuroscientists; multiplied by a few billion neurons, with about 8,000 connections on average per neuron, wired everywhere near and far across its fabric. We probably won’t see them on the shelves of Fry’s for a while though, as the white matter wiring sitting beneath the neocortex alone is beyond any technology of today. Jeff Hawkins, whom I’m obviously a big fan of, proposes in On Intelligence a fiber emulation, presumably using wavelength-division multiplexing, for this serious communication bump in the road to AI, together with ever faster machines to simulate the billions of neurons that we obviously could never use metal solutions, e.g. solder, to hold together. Finally, everything in biological systems fires in parallel with no direct equivalent of a system clock but a collection of overlapping electromagnetic ‘waves’ whose eigenvectors somehow wash over neurons in a combinatorial dance conducive to thought. There could be many more layers to the onion we’re not yet aware of.
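For reference, the textbook version of auto-associative memory looks roughly like the sketch below: a tiny Hopfield-style network with Hebbian weights. This is the ordinary kind the wikis describe, not the distant cousin I’m speculating about, and the function names, the two 8-element patterns and the synchronous update rule are all assumptions chosen to keep the example small.

```python
def train(patterns):
    """Build a Hebbian weight matrix from a list of +1/-1 patterns."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, steps=5):
    """Repeatedly update every unit from the weighted sum of all the others."""
    state = list(state)
    for _ in range(steps):
        state = [
            1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
            for row in weights
        ]
    return state


# Two made-up patterns to memorize.
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([p1, p2])

# Present p1 with its first element corrupted; the network settles back on p1.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
restored = recall(weights, noisy)
print(restored == p1)
```

The point is that the memory is addressed by content rather than by index: you hand it a damaged piece of a pattern and it hands back the whole thing.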

Apart from the tall order of instantiating this machine, an important step will involve linking what is likely already a sentient system to deterministic systems we’ll hopefully still be in control of so as to focus their attention on interesting problems. As Carl Sagan said so beautifully “we’re star stuff seeking to know itself”: whether carbon based, silicon or mixed I believe that’s a forever true statement. Someday there may be a ‘Google’ pondering a million times faster than a chemically bound human neuron’s transfer function in orchestration with several billion parallel processing units involved in much more than just finding cool links on the web. I feel there’s an incredibly challenging, mesmerizing and beguiling development ahead of us for self organizing, massively parallel AI, and the answers the resulting systems provide might well help us with some of the most significant questions of all time. I guess we should all stay tuned and of course pick up a copy of GEB; it’s really a wonderfully fun book to explore.


In Awe of the Cerebellum

A lot of the cool books and papers on AI I’ve read have focused on the role of the hippocampus and neocortex in human cognition. This seems right since the hippocampus is key in initial short term memory formation and the neocortex in long term memory, as well as the latter’s famous role in thought, language and other areas considered essential to being human. But it seems to me an area of the brain often overlooked is the cerebellum, especially when trying to produce a software/hardware architecture that can enable smooth movements.

The cerebellum appears to act like a sort of differential correction subsystem, like some elaborate phase-locked loop or comparator circuit that continually and unconsciously makes corrections to signals emanating from the neocortex that direct the body. Damage or disease to this area typically results in jerky, stiff or wildly over-corrected movements: symptoms not unfamiliar to embedded programmers attempting to make servos smoothly operate a robot’s arms or pseudo hands. Similarly, walking robots still appear unnatural or wooden and always limited. I don’t think this is for lack of effort on the part of designers. Usually progress is a function of faster processors/more efficient programming or an improved solution to the differential equations governing the dynamics of the system.
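The comparator analogy can be sketched in a few lines. This is only a toy proportional correction loop, not a model of the cerebellum or of any real servo; the function name `track` and the `gain` values are assumptions I chose to show the two behaviors.

```python
def track(target, position=0.0, gain=0.5, steps=20):
    """Nudge position toward target, correcting a fraction of the error each step."""
    history = []
    for _ in range(steps):
        error = target - position   # the comparator: intended minus actual
        position += gain * error    # the correction applied to the outgoing signal
        history.append(position)
    return history


smooth = track(1.0, gain=0.5)   # creeps up on the target without overshooting
jerky = track(1.0, gain=1.8)    # over-corrects and swings back and forth

print(round(smooth[-1], 6))
print([round(p, 3) for p in jerky[:4]])
```

With the gain too high, each correction overshoots the target and the next one has to pull back the other way, which is the servo version of the wildly over-corrected movement described above.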

Yet it’s obvious insects and animals are not solving high order math problems while traveling around. There is normally no effort required for them to move seemingly beyond picking a direction. This is a characteristic of all living creatures irrespective of whether they have fins, legs or wings. Humans from toddler age and on are ambulatory without conscious thought, having committed to ‘muscle memory’ in the cerebellum the motor learning skills required for smooth walking.

There are other more basic problems facing a designer attempting to emulate creature movements, not the least of which is finding a suitable power supply that can sustain a skeletal framework hosting the gear required for the equivalent of legs, sockets, hoofs etc. Still, the purely mechanical issues I expect will be solved with technology that exists or is nascent. An unsolved problem is the real-time computational control of such systems as well as how to direct smooth, fine motor movement for any situation. Given the decades of work that have preceded with still no good solution at hand, a different paradigm for programming the system may be required to make progress. An exemplary system to mimic in my mind is the self-organizing, neuron architecture of the cerebellum; it’s hard to argue with such amazing success.

Not unlike training an infant first to roll over or learn to crawl one could expect some sort of feedback system with a cerebellum inspired governor to ‘teach’ machines to ‘walk’. The payoff is that such systems can be copied, not unlike the genetic wiring that enables a colt to stumble to its feet on its first day after being born. This magic seems to me yet another in a long train of instances where self-organizing systems, like the neocerebellum and its relationship to the neocortex, are a likely prerequisite to move from the gawky, barely stumbling robots we’ve all seen on YouTube to free flowing systems that can begin to move a fraction as well as a lizard. I mean to take nothing away from engineers working on making robots move using traditional approaches; it’s an incredibly difficult problem, and the widespread use of robotic arms for manufacturing and similar jobs with a finite locus of movement is a testament to their success. They have only my respect and I wish anyone taking on this challenge in their career or spare time the best. I certainly could be wrong on my fascination with self-organizing systems as a key component of a ubiquitous solution, but that’s my belief and pursuit.

What I am looking forward to is the day when we’ll finally achieve smooth walking systems that can help the elderly and sick folks with simple physical tasks through a voice command or save a life after a fire or disaster by scampering over debris too difficult for rolling robots. There’s much to be done in order to emulate biological movement; a good starting place in my opinion is with the cerebellum and how it controls so much of the body automatically and with such incredible precision.


Following The Trail To AI

Many AI start-up stories I recall can trace their history through well known phases. First comes the initial hype and formation phase, sometimes leveraging a thesis, book or even a simple PowerPoint presentation based on a new heuristic in hopes of motivating Angel investors. If funded, this usually leads to a roll-out plan built on the 4 P’s of the Marketing Mix, driving towards the launch phase long before the product has been developed/tested.

Often the real programming work then begins and, especially in the case of AI, problems can lead to a kind of combinatorial explosion of issues unless there is incredible focus on the task at hand. An unfortunate consequence can be pressure to produce a tightly bounded, domain specific solution so as to make good on promised dates and give investors a warm and fuzzy feeling that they’ve picked a good pony. Product schedules, which all businesses naturally use as a guidepost, are a critical yardstick for measuring successful execution, but they can be problematic when the solution, like real AI, requires so much different thinking and creativity and has proven so elusive for so many, for so many decades. I expect someone creatively designing and developing off in the quiet contemplative corner of their garage, without the pressure of paying back investors before the next quarter, has as good a chance as anyone these days given the ubiquity of low cost primary and secondary storage and incredibly fast microcontrollers that cost less than a cheeseburger.

The initial roll out of many AI products burns brightly at first and is further fanned with accolades by the tech press; but, as so often has happened in the past, fails to deliver a real-time system commensurate with the things humans do so easily. For good reason this fact has jaded many today who in years past might otherwise have bet their tenure/jobs and/or pocketbooks on any one of the endless sequence of new AI paradigms.

Nowadays if you even anecdotally mention an AI project you’re working on you’re likely to get the glazed ‘can we move on to something real’ expression or stony silence from competent developers and tech investors alike. There is often a long, prerequisite recounting, starting with why deterministic rules can’t cut it and running through to the overfitting of neural networks, in order to explain why all the obvious AI solutions of the past haven’t produced much beyond the glue logic of the best Loebner prize winners. The task is a bit easier when talking to someone with the magical 10,000 requisite hours of expert qualifying software and firmware development experience. I had passed that mark several times over in my career and was still confident simple if then else statements would be all that’s needed to produce HAL before stumbling upon tech classics like “Parallel Distributed Processing” or those for all audiences like “Computer Power and Human Reason” by the great Joseph Weizenbaum.

That said, I still expect the simple control flow mechanisms inherent in machine instructions of the last 70 years actually will be all that’s required to usher in breakthroughs in AI someday, but likely as a Turing machine emulation of the increasingly well understood scaffolding of the neocortex, with its many dissimilar ensembles of neuron layers, hosted by some enormous cluster of CPUs capable of exceeding the signaling speed of billions of axons firing in parallel and organizing information around the temporal nature of memory. Given all the AI hyperbole of the past, or simply some of the bizarre implications and complexity of the subject, I can understand why many choose to loudly sigh when the subject is broached. But it’s also hard for developers like me to not talk about something we’re so excited about once we’ve realized that the set of problems that will be solvable after this one problem is solved will have as transformative an impact on society as PCs or the Internet.

I am happy to report that I’ve made it to this point in my career/life and I’m still very enthused about what AI has to offer and am in constant awe of how exceedingly difficult it is to mimic even the simplest biological actions. I find no reason to avoid the topic, no matter how untenable it may seem and no matter how fragmented its starts and stops of the past. To me it’s as if the problems of before have simply underscored how best to make progress now. In some sense it’s right there in front of us all now: folks at conferences seemingly scream the solutions through their papers and presentations on theories of machines encapsulating massively parallel, self organizing algorithms, which is clearly what all neuron hosting systems are empirically built from. We might best be served by the example of the Wright Brothers deciding the best way forward was to waste a few afternoons watching birds in flight.

It’s obvious now the road to AI that embodies characteristics of the neocortex will involve a Himalayan effort, and some of the road ahead appears to me to run in about as opposite a direction from the first attempted passes as if one had decided instead to walk around the entire mountain range. I for one look forward to the journey and cheer along those I can see are ahead of me or those just now starting, and I expect there are many like me simply enjoying their own journey every day. I hope to see someone finish before my time on the trail is over.


Self Organization

To me, AI self organization is a kind of spontaneous order that appears to emerge from random networks of simple neurons. Capturing this magic is a primary aim of AI these days and progress, although plagued with periods of stops and starts over the years, has always been steadily moving ahead.

Reflexes, the senses and awareness can all be distilled down to a simple type of reinforcement/strengthening between neurons via the electrically charged axon/synapses and dendrites of the next neuron. The flow of current is directional and a neuron in some sense behaves like a simple nonlinear switch, albeit with thousands of independent output terminals that feed forward. This gave rise to Back Propagation, Radial Basis, PDF and a myriad of other feed-forward learning paradigms in the 80s.
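The ‘simple nonlinear switch’ is easy to sketch. What follows is the usual textbook artificial neuron, a weighted sum squashed by a sigmoid, and not a claim about how a biological neuron actually computes; the function name `neuron` and the weights are made up for illustration.

```python
import math


def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs pushed through a sigmoid transfer function."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # output always lies between 0 and 1


resting = neuron([0, 0], [1.0, 1.0])              # no input: exactly halfway
excited = neuron([1, 1], [5.0, 5.0], bias=-2.0)   # strong positive weights: near 1
inhibited = neuron([1, 1], [-5.0, -5.0])          # strong negative weights: near 0

print(resting, round(excited, 4), round(inhibited, 6))
```

Strengthening a connection just means making its weight larger, and the feed-forward learning schemes of the 80s are, at heart, different recipes for deciding how much to change each weight.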

A limitation was that the software systems which trained them tended to use a few layers, usually one each of input, hidden and output, with the assumption that more complex networks could be factored into these simpler constructs. The deep learning systems of more recent years seem an attempt to layer such systems so that intermediate results can be learned to effect a more capable overall system.

Software has always had the ability to ‘self modify’ – a fact usually found by accident in my experience after a wild bit of code wrote data to a protected area where prefetching of instructions was intended. Sometimes, the op-codes and operands read from such stray writes resulted in interesting actions, but more often than not it simply vectored to an interrupt, not unlike the intended actions of malware and viruses that overflow an input buffer.

DNA also seems to have the ability to self modify its encoding as described by ‘jumping genes’ that can change positions on a chromosome through successive generations, thereby giving rise to different characteristics within an organism.

These types of systems become uber complex when compared to a strictly deterministic set of instructions but therein lies their great potential. Fortunately software is easy to change and can emulate some of the complexity of such systems. Given the simple statistics that govern the transfer functions of neurons the key factors to me are the malleability and sheer size of networks that can be emulated on computers. The limitations become controlling the system so it doesn’t just produce mush and feeding the output back into the input for temporal and other recurrent properties to emerge. There is a Copernicus moment in all of this when, after several generations of successive layers of neural networks are produced to fit a set of learning data, you realize humans will never have the ability to glean insight as to why a network of such complexity has converged in the particular fashion it has. Nor would the same set of inputs produce an identical network were the training set run through again, not unlike each person having their own unique perception of the world based on their individual experience and wiring. Self organizing systems are incredible and incredibly complex.
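Feeding the output back into the input can be shown with a single unit. This is a deliberately tiny sketch, one neuron with one recurrent weight, and the names `run`, `w_in` and `w_rec` are assumptions for illustration rather than anything from a particular library.

```python
import math


def run(inputs, w_in=1.0, w_rec=0.9):
    """One recurrent unit: each output is fed back in with the next input."""
    h = 0.0          # the unit's previous output, initially silent
    trace = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)  # new output depends on old output
        trace.append(h)
    return trace


pulse = [1.0, 0.0, 0.0, 0.0]           # a single input spike, then nothing

with_loop = run(pulse, w_rec=0.9)      # activity lingers and slowly fades
without_loop = run(pulse, w_rec=0.0)   # activity vanishes as soon as input stops

print([round(v, 3) for v in with_loop])
print([round(v, 3) for v in without_loop])
```

With the loop in place the unit keeps a fading trace of an input that is no longer there, which is about the simplest temporal property a network can have; scale that up to millions of units and the ‘mush’ control problem becomes obvious.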


Speculations on Permutations

I’ve read estimates that each neuron in the neocortex connects to many thousands of other neurons and in total there are on the order of 100 billion neurons in the brain (more recent counts put the figure nearer 86 billion). This alone is an interesting and revealing statistic about the capacity of the brain to hold information.

If you assume each connection to another neuron can take on one of only two states, on and off, which is a gross oversimplification as axons have a non-linear ability to signal based on their rate of firing and their internal thresholding function, it’s interesting to compute the total number of patterns each neuron can provide.

A neuron with one synapse to another neuron and a simple on/off state can hold 1 ‘bit’ of information, giving 2 patterns: on and off. A neuron with 2 synapses can represent 4 patterns (i.e. 2^2) and so on. A neuron with 50 synapses can represent 2^50 unique patterns or 1,125,899,906,842,624. I remember reading that each neuron in the brain has at least 1,000 synapses: 2^1,000 is an enormous number, and with the neocortex having at least 10 billion neurons the potential number of combinations is astonishing. I’ve read it’s a number some estimate to be greater than the total atoms in the universe. Perhaps that’s why we accept intuitively that the human mind is infinite in its true capacity even though our consciousness seems severely limited based on the sheer number of details we can retain over a lifetime of learning. Neurons firing in massively parallel, self organizing frameworks are a truly marvelous type of a computer and God is a marvelous architect.
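The arithmetic is easy to check, since Python integers have no size limit. The figure of 10^80 atoms in the observable universe is a commonly quoted rough estimate that I’m assuming here, not something this post derives.

```python
patterns_50 = 2 ** 50        # on/off patterns across 50 synapses
patterns_1000 = 2 ** 1000    # on/off patterns across 1,000 synapses

digits_1000 = len(str(patterns_1000))  # how many decimal digits 2^1,000 has

ATOMS_ESTIMATE = 10 ** 80    # assumed rough count of atoms in the observable universe

# Find the smallest number of on/off synapses whose pattern count beats that estimate.
synapses_needed = 1
while 2 ** synapses_needed <= ATOMS_ESTIMATE:
    synapses_needed += 1

print(f"{patterns_50:,}")
print(digits_1000)
print(synapses_needed)
```

So under the on/off simplification a single neuron with a thousand synapses already has far more possible patterns than that estimate of atoms, before multiplying across billions of neurons.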


Embedded, Self Organizing Systems

A few years back, after many years developing embedded, real-time systems, I turned my attention to AI systems of the day to explore opportunities for embedding them into new products. Fuzzy Logic was in vogue at the time along with Expert Systems and Neural Networks. Each had enabled new products in limited spheres but it was painfully apparent nothing could yet approach what natural systems do so effortlessly, such as a fly avoiding the branches and leaves of a bush while zipping through it at top speed. This caught my imagination, and I became aware of how far computer architectures based on Turing machines were from real life nervous systems with their massively parallel, noisy, slow, self-organizing fabrics. In addition, none of the pattern recognition systems took much consideration of temporal changes in the environment which is the hallmark of any living creature’s ability to react to stimulus.

It seemed unlikely folks would ever glean insight about how a brain does anything from looking at a cross section of neurons or attempting to top down reverse engineer the neocortex from people’s outward behavior. As it has to so many others, it eventually seemed clear to me the realization of systems that could best do the things that biological systems do should be derived from a system self organizing its own complex networks similar to what all living creatures do either through eons of generations of genetic conditioning or through the process of behavioral learning. This seemed counter intuitive at first: after all, software engineers produce highly complex systems by incrementally battling the deterministic rules of merciless computers, quite unlike neurons; and I had lived in that environment for many years. To give up that paradigm in order to take a step in the direction of solving problems that have eluded AI is a tall order and a controversial one but nevertheless a direction I believed was required so as to make real progress.

I still think that’s the best route and, thanks to inspiration from a myriad of places on the Internet and books that I’ve enjoyed like Jeff Hawkins’s “On Intelligence,” I’m motivated to explore this field and related topics in this blog, a field which has so much potential and has held such a fascination for me and so many others over the years. In my opinion it’s also a great time to be exploring applications of embedded, parallel, self organizing systems as cost and processing power are no longer the barriers to entry they were in earlier times. Platforms like the Arduino open source hardware environment, with its Creative Commons license, both legally and financially position small businesses to begin to break down the barriers that have held back traditional machine architectures and patents; commercialization of applications will no doubt follow.
