
Brain Pattern-based Recommender Systems–Coming Soon?

Now things are going to start to get really interesting! In The Learning Layer, I categorized the types of behaviors that could be used by recommender systems to infer people’s preferences and interests. The categories I described are:

  1. Navigation and Access Behaviors (e.g., click streams, searches, transactions)
  2. Collaborative Behaviors (e.g., person-to-person communications, broadcasts, contributing comments and content)
  3. Reference Behaviors (e.g., saving, tagging, or organizing information)
  4. Direct Feedback Behaviors (e.g., ratings, comments)
  5. Self-Profiling and Subscription Behaviors (e.g., personal attributes and affiliations, subscriptions to topics and people)
  6. Physical Location and Environment (e.g., location relative to physical objects and people, lighting levels, local weather conditions)

Various subsets of these categories are already being used in a variety of systems to provide intelligent, personalized recommendations. Location awareness is perhaps the most recently leveraged type of behavioral information, while the sensing of environmental conditions and the incorporation of that information into recommendation engines represents the very leading edge.

But I also described an intriguing, to say the least, seventh category–monitored attention and physiological response behaviors. This category includes the monitoring of extrinsic behaviors such as gaze, gestures, and movements, as well as more intrinsic physiological “behaviors” such as heart rate, galvanic responses, and brain wave patterns and imaging. Exotic and futuristic stuff to be sure. But given this new advance in smartphone brain scanner systems, widespread practice may apparently not be as far off as one might think.

Sure, it will take time for this type of technology to become cost-effective, user-friendly, and able to scale to the mass market. But can there be any doubt that it will eventually play a role in providing information to our intelligent recommender systems?


Social Networking and the Curse of Aristotle

The recent release and early rapid growth of Google+ has mostly been a direct consequence of social networking privacy concerns—with the Circles functionality being the key distinguishing feature versus Facebook.  Circles allows for a somewhat easier categorization of people with whom you would like to share (and gratefully only you see the categorizations in which you place people!).

What people rapidly find as their connections and number of Circles or Facebook Lists grow, however, is that the core issue isn’t so much privacy per se, but the ability to effectively and efficiently categorize at scale. A good perspective on this is Yoav Shoham’s recent blog on TechCrunch about the difficulties of manual categorization and his experience trying to categorize 300+ friends on Facebook. Circles is susceptible to the same problem—it just makes it easier and faster to run headlong into the inevitable categorization problem.

A root cause of the problem, as I harp on in The Learning Layer, rests with that purveyor of what-seems-to-be-common-sense-that-isn’t-quite-right, Aristotle.  Aristotle had the notion that an item must either fit in a category or not.  There was no maybe fits, sort of fits, or partly fits for Aristotle.  And Google+ (like Facebook and most other social networks) only enables you to compartmentalize people via the standard Aristotelian (i.e., “crisp”) set. A person is either fully in a circle/list/group or not—there is no capacity for partial inclusion.

But our brain actually tends to categorize in accordance with non-Aristotelian, or “fuzzy,” sets—that is, a person may be included in any given set by degree.  For example, someone may be sort of a friend and also a colleague, but not really a close friend; another person may be a soul mate; another may be mostly interested in a mutually shared hobby; and so on. Sure, some social categories are not fuzzy—either a person was your 12th grade classmate or not—but since non-fuzzy sets are just a special case of the more general fuzzy sets, fuzzy sets can gracefully handle all cases. Fuzzy sets thus have many advantages, and this type of categorization naturally leads to fuzzy network-based structures, in which relationships are by degree.  (The basic structure of our brain, not surprisingly, is a fuzzy network—the structure I therefore call “the architecture of learning” in The Learning Layer.)
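
The crisp-versus-fuzzy distinction is easy to make concrete. A minimal Python sketch, in which all names and membership degrees are purely illustrative:

```python
# Crisp (Aristotelian) set: a person is either in a circle or not.
crisp_friends = {"Alice", "Bob"}

# Fuzzy set: membership is by degree, a value in [0, 1].
fuzzy_friends = {"Alice": 0.9, "Bob": 0.6, "Carol": 0.2}

def membership(person, fuzzy_set):
    """Degree to which a person belongs to the set (0.0 = not at all)."""
    return fuzzy_set.get(person, 0.0)

# A crisp set is just the special case where every degree is 0 or 1,
# which is why fuzzy sets gracefully handle all cases.
classmates_12th_grade = {"Dave": 1.0}
```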

But an issue with implementing the reality of our social networks as fuzzy networks in a system is that it can be hard to prescribe sharing controls for fuzzy relations ahead of time.  If we actually bothered to decide on an individual basis whether to share a specific item of content or posting, we would naturally do so on the basis of our nuanced, fuzzy relationships.  But that, of course, would take some consideration and time to do.

So the grand social networking bargain seems to be that for maximum expedience we either resign ourselves to share everything with everyone (what most people do on Facebook), or we employ coarse-grained non-fuzzy controls (e.g., Circles, Lists) that are a pain to set up, imprecise, and don’t scale.  Or there is another option—we cast Aristotle aside, establish (or let the system establish) a fuzzy categorization, and then let our system learn from us to become an intelligent sharing proxy that shares as we would if we had time to fully consider each sharing action.  That will, of course, require trusting the system’s learning, which will necessarily have to be earned.  But ultimately that approach and sharing everything with everyone are the only two alternatives that are durable and will scale.
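
A sharing proxy along these lines might, as a crude first approximation, weigh the fuzzy tie to a person against the topical overlap of the item. This is a toy sketch under my own assumptions (a real system would learn the threshold and weights from observed feedback rather than hard-code them):

```python
def should_share(item_topics, person_interests, tie_strength, threshold=0.5):
    """Toy sharing proxy: share an item with a person when their fuzzy
    tie to you, weighted by topical overlap with the item, clears a
    threshold.

    item_topics, person_interests: dicts of topic -> degree in [0, 1].
    tie_strength: fuzzy relationship strength in [0, 1].
    """
    # Fuzzy-set intersection: overlap on each topic is the minimum of
    # the item's relevance and the person's interest.
    overlap = sum(min(w, person_interests.get(t, 0.0))
                  for t, w in item_topics.items())
    return tie_strength * overlap >= threshold
```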

Does Personalization Pave the Cow Paths?

Michael Hammer, the father of business reengineering, famously used the phrase “paving the cow paths” to describe the ritualizing of inefficient business practices. Now the pervasiveness of personalization of our systems is being accused of paving our cow paths by continuously reinforcing our narrow interests at the expense of exposing us to other points of view. This latest apocalyptic image being painted is one of a world where we are all increasingly locked into our parochial, polarized perspectives as the machine feeds us only what we want to hear. Big Brother turns out to be an algorithm.

I commented on this over at Greg Linden’s blog, but wanted to expand on those thoughts a bit here. Of course, I could first point out the irony that I only became aware of Eli Pariser’s book about the perils of personalization, The Filter Bubble, through a personalized RSS feed, but I will move on to more substantive points—the main one being that it seems to me that a straw man, highly naïve personalization capability has been constructed to use as the primary foil of the criticism. Does such relatively crude personalization occur today, and are some of the concerns, while overblown, valid? Yes. Are these relatively unsophisticated personalization functions likely to remain the state-of-the-art for long? Nope.

As I discuss in The Learning Layer, an advanced personalization capability includes the following features:

  1. A user-controlled tuning function that enables a user to explicitly adjust the “narrowness” of inference of the user’s interests in generating recommendations
  2. An “experimentation” capability within the recommendation algorithm to at least occasionally take the user outside her typical inferred areas of interest
  3. A recommendation explanation function that provides the user with the rationale for the recommendation, including the level of confidence the system has in making the recommendation, and an indication when a recommendation is provided that is intentionally outside of the normal areas of interest
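
The three features above can be sketched together in a toy recommender. The function, parameter names, and scoring scheme here are illustrative assumptions of mine, not any production algorithm:

```python
import random

def recommend(scores, narrowness=0.7, explore_rate=0.1, rng=random):
    """Pick an item given interest scores (item -> inferred affinity).

    narrowness: user-tunable (feature 1); higher favors top interests.
    explore_rate: chance of an 'experimental' pick outside the user's
        usual interests (feature 2).
    Returns (item, explanation) -- the explanation is feature 3.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if rng.random() < explore_rate:
        # Experimental pick from the lower half of the ranking.
        item = rng.choice(ranked[len(ranked) // 2:])
        return item, "experimental pick outside your usual interests"
    # Narrowness shrinks the candidate pool toward the top interests.
    k = max(1, round(len(ranked) * (1 - narrowness)))
    item = rng.choice(ranked[:k])
    confidence = scores[item] / max(scores.values())
    return item, f"matches inferred interests (confidence {confidence:.0%})"
```

With `narrowness=1.0` and `explore_rate=0.0` this degenerates to always serving the single top inferred interest, which is exactly the filter-bubble failure mode the criticism targets.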

And by the way, there are actually two reasons to deliver the occasional experimental recommendation: first, yes, to subtly encourage the user to broaden her horizons, but less obviously, to also enable the recommendation algorithm to gain more information than it would otherwise have, enabling it to develop both a broader and a finer-grained perspective of the user’s “interest space.” This allows for increasingly sophisticated, nuanced, and beneficially serendipitous recommendations. As The Learning Layer puts it:

. . . the wise system will also sometimes take the user a bit off of her well-worn paths. Think of it as the system running little experiments. Only by “taking a jump” with some of these types of experimental recommendations every now and then can the system fine-tune its understanding of the user and get a feel for potential changes in tastes and preferences. The objective of every interaction of the socially aware system is to find the right balance of providing valuable learning to the user in the present, while also interacting so as to learn more about the user in order to become even more useful in the future. It takes a deft touch.

A deft touch indeed, but also completely doable and an inevitable feature of future personalization algorithms.

I’ve got to admit, my first reaction when I see yet another in a long line of hand-wringing stories of how the Internet is making us stupider, turning us into gadgets, amplifying our prejudices, etc., is to be dismissive. After all, amidst the overwhelmingly obvious advantages we gain from advances in technology (a boring “dog bites man” story), the opportunity is to sell a “man bites dog” negative counter-view. These stories invariably have two common themes: a naïve extrapolation from the current state of the technology, and an assumption that people are at best just passive entities, and at worst complete fools. History has shown these to ultimately be bad assumptions, and hence, the resulting stories cannot be taken too seriously.

On the other hand, looking beyond the “Chicken Little” part of these stories, there are some nuggets of insight that those of us building adaptive technologies can learn from. And a lesson from this latest one is that the type of more advanced auto-learning and recommendation capabilities featured in The Learning Layer is an imperative in avoiding a bad case of paving-the-cow-paths syndrome.

Our Conceit of Consciousness

MIT recently held a symposium called “Brains, Minds, and Machines,” which took stock of the current state of cognitive science and machine learning, and debated directions for the next generation of advances. The symposium kicked off with perspectives from some of the lions of the field of cognitive science, such as Marvin Minsky and Noam Chomsky.

A perspective by Chomsky throws down the gauntlet with regard to today’s competing schools of AI development and directions:

Chomsky derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don’t try to understand the meaning of that behavior. Chomsky compared such researchers to scientists who might study the dance made by a bee returning to the hive, and who could produce a statistically based simulation of such a dance without attempting to understand why the bee behaved that way. “That’s a notion of [scientific] success that’s very novel. I don’t know of anything like it in the history of science,” said Chomsky.

Of course, Chomsky is tacitly assuming that “meaning” and truly understanding natural language is something much more than just statistics. But is it really? That certainly seems like common sense. On the other hand, if we look closely at the brain, all we see are networks of neurons firing in statistical patterns. The meaning somehow emerges from the statistics.

I suspect what Chomsky really wants is an “explanation engine”–an explanatory facility that can convincingly explain itself to us, presenting to us many layers of richly nuanced reasoning. Patrick Winston at the same symposium said as much:

Winston speculated that the magic ingredient that makes humans unique is our ability to create and understand stories using the faculties that support language: “Once you have stories, you have the kind of creativity that makes the species different to any other.”

This capability has been the goal of AI from the beginning, and 50 years later it has still not been delivered—clearly not for lack of trying. I would argue that this perceived failure is a consequence of the early AI community falling prey to the “conceit of consciousness” that we are all prone to. We humans believe that we are our language-based explanation engine, and we therefore literally tell and convince ourselves that true meaning is solely a product of the conscious, language-based reasoning capacity of our explanation engines.

But that ain’t exactly the way nature did it, as Winston implicitly acknowledges. Inarticulate inferencing and decision making capabilities evolved over the course of billions of years and work quite well, thank you. Only very recently did we humans, apparently uniquely, become endowed with a very powerful explanation engine that provides a rationale (and often a rationalization!) for the decisions already made by our unconscious intelligence—an endowment most probably for the primary purpose of delivering compact communications to others rather than for the purpose of improving our individual decision making.

So to focus first on the explanation engine is getting things exactly backward in trying to develop machine intelligence. To recapitulate evolution, we first need to build intelligent systems that generate good inferences and decisions from large amounts of data, just like we humans continuously and unconsciously do. And like it or not, we can only do so by applying those inscrutable, inarticulate, complex, messy, math-based methods. With this essential foundation in place, we can then (mostly for the sake of our own conceit of consciousness!) build useful explanatory engines on top of the highly intelligent unconsciousness.

So I agree with Chomsky and Winston that now is indeed a fruitful time to build explanation engines—not because AI directions have been misguided for the past decade or two, but rather because we have actually come such a long way with the much maligned data-driven, statistical approach to AI, and because there is a clear path to doing even more wonderful things with this approach. Unlike 50 years ago our systems are beginning to actually have something interesting to say to us; so by all means, let us help them begin to do so!

Search = Recommendations

It is good to see that some artificial distinctions that have served to hamper progress in delivering truly intelligent computer interfaces are increasingly melting away. In particular, recommendations, when broadly defined, can beneficially serve as a unifying concept for a variety of computer capabilities. In The Learning Layer I defined a computer-generated recommendation as:

A recommendation is a suggestion generated by a system that is based at least in part on learning from usage behaviors.

In other words, a recommendation is an adaptive communication from the system to the user.

And I had this to say about search:

By the way, the results generated by modern Internet search engines are in practice almost always adaptive recommendations because they are influenced by behavioral information–at a minimum they use the behavioral information associated with people making links from one web page to another. This capability was the original technical breakthrough applied by Google that enabled their search engine to be so much more effective than that of their early competitors.

This feature of contemporary Internet-based search also provides a hint at the reason that the users of enterprise search whom I have talked with over the years have been so often underwhelmed by the performance of their internal searches compared with corresponding Internet versions. Historically, there has been little to no behavioral information embodied within the stored knowledge base of the enterprise, and so search inside the four walls of the business has been basically relegated to the sophisticated, but non-socially aware, pattern matching of text–similar to the way Internet search was before Google. Without social awareness, search can be a bit of a dud.
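
The original Google breakthrough alluded to above is PageRank, which treats a link from one page to another as a behavioral vote of endorsement. A minimal power-iteration sketch on a toy graph (illustrative only, not the production algorithm; it assumes every linked-to page also appears as a key):

```python
def pagerank(links, damping=0.85, iters=50):
    """Toy PageRank. links maps each page to the pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        # Base probability of a random jump to any page.
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / len(pages)
            else:         # otherwise split its rank across its out-links
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank
```

The point for enterprise search is that without such behavioral signals in the corpus, there is nothing for this style of ranking to work with.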

That’s why, as I mentioned in the previous blog, the computational engines behind the generation of search results and recommendations are inevitably converging, and we are therefore witnessing a voracious appetite of search engines for an ever larger and richer corpus of behavioral information to work with—Google’s new “plus one” rating function being the most recent example.

On this same note, I just happened across this brief, recent write-up that struggles with categorizing a start-up, exemplifying the blurring of search and recommendations, and finishing with the point that, “. . . Google is also increasingly acting as a recommender system, rather than just a web search engine.” Indeed—now on to bringing this convergence to the enterprise . . .

The Recommendation Revolution, Continued . . .

That “hidden in plain sight” IT revolution of ubiquitous, adaptive recommendations continues to accelerate. Jive Software, a leader in enterprise social software, has just acquired Proximal Labs to bolster their machine learning, recommendation engine, and personalization capabilities. Given that Jive is reportedly preparing for an IPO, this move is a major endorsement of the key theme of The Learning Layer–that adaptive recommendations will be a standard part of enterprise computing. It also is another strong signal that, as I recently wrote, we are going to rapidly witness a standard, adaptive IT architecture emerge in which a layer of automated learning overlays and integrates the social layer, as well as the underlying process, content, and application layers. And I would be remiss if I didn’t mention our own initiative for bringing the learning layer to Microsoft SharePoint and other enterprise collaborative environments, Synxi, which includes a suite of auto-learning functions such as knowledge and expertise discovery, interest graph visualizations, and recommendation explanations.

Another recent and important signal of the recommendation revolution is Google’s recently announced “+1” function, an analog to Facebook’s “like” button. Google will use this tagging/rating function in generating responses to search requests. Google was the pioneer in using behavioral information to improve search results, and this is just the latest, and most dramatic, step in increasingly relying on behavioral cues and signals to deliver personalized search results that best hit the mark.

That is why, as I discussed in the book, responses to search requests should really be considered just a certain kind of adaptive recommendation, one in which more intentionality of the recommendation recipient can be inferred by virtue of the search term or phrase, but that is otherwise processed and delivered like any other type of recommendation. And it is why search processing and recommendation engines will inevitably generalize and converge.

Stay tuned . . .

Creative Combinatorics

The math of innovation is combinatorics. We create by building on what comes before, or more precisely, by decomposing, abstracting, recombining, and extending what comes before. The more combinations that we generate, the greater the creative potential. The complementary challenge is to effectively and presciently evaluate the combinations.

At the macro level, these combinatorics-based dynamics of innovation have often been exemplified by what is known as “clustering economics.” New York, for example, has had self-reinforcing advantages in financial markets; Silicon Valley has enjoyed unique, self-sustaining advantages in emerging technologies. But the advantages of the physical clustering of these and similar examples really boil down to advantages in combinatorics. The innumerable additional encounters, discussions, collaborations, and competitions due to physical proximity lead to massively greater idea combinations than would otherwise occur. And where there is superior infrastructure to evaluate and nurture (e.g., wide-spread, accessible venture capital) and implement (e.g., flexible, risk-taking resources) the ideas, the combinatoric advantages inevitably lead to extraordinary value creation.

Learning layers obey the same math. They create value directly through enhancing collective learning, but the ability to amplify the power of combinatorics is at the heart of their value proposition. The learning layer is an engine of serendipity driven not by physical proximities, but by virtual proximities within multi-dimensional “idea spaces.” Its aim is to bring to our attention useful affinities and clusters among ideas and people of which we might not otherwise be aware. A learning layer can be considered an example of what the cognitive scientist and author Steven Pinker calls discrete combinatorial systems, which, as I discuss in The Learning Layer, occupy a very special place in nature:

We know of, however, two discrete combinatorial systems in the natural universe: DNA and human language. Not coincidentally, these systems have the unique capability of generating an endless stream of novelties through a combinatorics made infinite by the application of a kind of generative grammar–a grammar comprising a recursive set of rules for combining and extending the constituent elements.

The discrete combinatorial nature of the learning layer is yet another reason it occupies a special place in the realm of systems. In combination with us, and by continually learning from us, it can engender ever greater streams of valuable, recombinant innovations. Of course, we still need the proper environments to nurture and implement the best of these innovations, and that’s a challenge we need to really be focusing on because somewhere in that multi-dimensional innovation space a very important creative combination impatiently awaits . . .

The Adaptive Stack

IT always evolves by building new system layers on top of preceding layers, while concurrently abstracting away from users of these new layers extraneous details of the underlying layers. In the resulting current IT “stack,” we predominantly interact with content and application layers, and when applicable and available, process layers. And we are well on our way toward abstracting away the networking and hardware infrastructure on which our applications run—bundling that big buzzing confusion into “the cloud.”

Much more recently we have begun to add a social layer on top of our other software layers—still a work in progress in most organizations. So far, these social-based systems can more often be considered architectural bolt-ons rather than a truly integral part of the enterprise IT stack. But that is clearly destined to change.

And coming right on the heels of the social layer is the learning layer—the intelligent and adaptive integrator of the social, content, and process layers. The distinguishing characteristic of this layer is its capacity for automatic learning from the collective experiences of users and delivering the learning back to users in a variety of ways.

So this is the new IT stack that is taking shape and that summarizes the enterprise systems architecture of 2011 and beyond. And since auto-learning features promise to be an integral part of every system and device with which we interact, it is the reason that the next major era of IT is most sensibly labeled “the era of adaptation.”

As I discuss in the book, there is something qualitatively different about the combination of these last two layers of the stack—the social and learning layers—in contrast to all the layers that came before. These new layers cause the boundary between systems and people to become much more blurred—it is no longer just a command and response relationship between man and machine, but rather, a mutual learning relationship. And exactly where the learning of the system and the learning of people begins and ends is a bit fuzzy.

Perhaps then, our new stack more accurately summarizes the next generation enterprise architecture, not just the IT architecture–an enterprise architecture of a different nature than that which has come before, one in which learning and adaptation is woven throughout.

Watson: Will Zombies Inherit the Earth?

Last week we witnessed a modern-day St. Valentine’s Day massacre when yet again a computer entered a hallowed arena of human intellectual combat and trounced the best that the human race had to offer. And Watson’s Jeopardy victory was a seriously impressive feat of engineering, with many more practical business applications than the last widely publicized machine-on-man intellectual violence perpetrated by IBM, Deep Blue beating chess champion Garry Kasparov.

So give Watson its due. Still, Watson is a zombie. In The Learning Layer I describe zombie systems as those that are unable to pay attention to, and learn from, human behaviors and to adapt accordingly. And furthermore, as we all know from late night movies, zombies just do things—they don’t have the ability to articulate why they do things. Just like probably every system you have ever interacted with.

Of course, now we have the technology that could rescue Watson from the realm of the zombies by adding a capacity for social awareness, and an ability to automatically learn from this social awareness. This Watson Jr. would combine Watson Sr.’s formidable textual searching, pattern matching, and natural language processing, with social learning skills. If you work in a call center, you should already be a bit worried about Watson Sr. But Watson Jr. would be quite a formidable force in a whole lot of different job markets!

We could even go a step further and put an explanation engine into Watson Jr.  He could then explain effectively, in human terms, why, for example, he proposed one Jeopardy response versus another. Of course, since the actual rationale would be based on all of those highly interrelated and complex pattern matching algorithms he inherited from Watson Sr., he would often find it hard to explain to us his specific reasoning. Perhaps he would humor us by just making something up that he thought we would find plausible.

Come to think of it, if we asked Ken Jennings why he guessed one response versus another response, he might have trouble articulating exactly why. Just a hunch or a feeling, perhaps. Maybe there is a little zombie in all of us . . .

The Architecture of Learning

If we want our systems to automatically learn, how should they be architected? The obvious thing to do is to take a lesson from the one “machine” that we know automatically learns, the brain. And, of course, what we find is that the brain is a connection machine: a vast network of neurons that are interconnected at synapses. And a closer look reveals that these connections are not just binary in nature, either existing or not, but can take on a range of connection strengths. In other words, the brain can best be represented as a weighted, or “fuzzy,” network. Furthermore, it’s a dynamic fuzzy network in that the behavior of one node (i.e., neuronal “firing”) can cascade throughout the network and interact and integrate with other cascades, forming countless different patterns throughout the network. Out of these patterns (somehow!) emerges our mind, with its wondrous plasticity and ability to so effortlessly learn.
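
The cascade dynamic on a weighted network can be sketched as a simple spreading-activation routine. This is a toy model of my own construction; real neural dynamics are of course vastly richer:

```python
def spread_activation(weights, seed, decay=0.5, steps=3):
    """Propagate activation through a weighted ('fuzzy') network.

    weights: dict of (source, target) -> connection strength in [0, 1].
    seed: the node where activation starts (the 'firing' analog).
    decay: attenuation per hop, so cascades fade with distance.
    """
    activation = {seed: 1.0}
    frontier = {seed: 1.0}
    for _ in range(steps):
        next_frontier = {}
        for node, a in frontier.items():
            for (src, dst), w in weights.items():
                if src == node:
                    # Activation passed along an edge scales with its
                    # fuzzy connection strength.
                    passed = a * w * decay
                    next_frontier[dst] = next_frontier.get(dst, 0.0) + passed
        for node, a in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + a
        frontier = next_frontier
    return activation
```

Note how stronger connections carry more activation, and cascades can overlap and sum — exactly the kind of graded, pattern-forming behavior a binary (crisp) network cannot express.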

Yes, taking this lesson may seem like, well, a no-brainer, but amazingly the lesson has generally been ignored when it comes to our systems. We’re well over half a century into the information age, but looking inside our organizations, we still find hierarchical system structures predominating (e.g., good old folders). The problem with hierarchy-based structures is that they are inherently brittle. They simply don’t embody enough relationship information to effectively encode learning. They don’t even scale (remember the original Yahoo! site?)—as we have all experienced to our great frustration. There is a reason nature didn’t choose this structure to manage information!

Fortunately there has been a revolution in systems structure over the past decade—the rise of the network paradigm. The Internet was, of course, the driver for this revolution, and we now find network-based structures throughout our Web 2.0 world—particularly in the form of social networks. But even these networks are not fuzzy—we are limited to establishing our relationships only in binary terms, yes or no. Sure, we can categorize with lists and groups, but we are still at a loss in representing all of the relationship nuances that range from soul mates to good friends to distant acquaintances. And that makes it difficult to apply sophisticated machine learning techniques that truly add value. What’s the percentage of “recommended for you” suggestions that you receive in your favorite social network system that actually hit the mark, for example?

But the situation is even worse with regard to our organizations’ content. In this land-that-time-forgot, our knowledge remains entombed in the non-fuzzy world of hierarchies, or at best, relational structures. Not only are these systems incapable of learning and adapting, but it is often a struggle to even find what you are looking for.

This sad state of affairs can and must be rectified. All we have to do is take our lesson from the brain, integrate our representations of people (i.e., social networks) with our content, allow the relationships to be fuzzy, and we have something that is architected a whole lot like the brain, i.e., architected for learning. That’s not sufficient—we also need clever algorithms to operate against the structure, to create the necessary dynamics and patterns that deliver the benefits of the learning back to us. But the architecture of learning is the necessary prerequisite, quite doable, and therefore quite inevitable.
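
A minimal sketch of that integrated structure: people connected to people by fuzzy ties, people connected to content by fuzzy affinities, and a simple algorithm operating over both to surface recommendations. All names, weights, and the scoring scheme are illustrative assumptions, not a production design:

```python
def recommend_content(user, ties, affinities, top_n=3):
    """Score content for a user through the fuzzy people-plus-content graph.

    ties: dict of person -> {person: tie strength in [0, 1]}.
    affinities: dict of person -> {content item: affinity in [0, 1]}.
    An item's score sums tie strength times affinity over the user's
    connections, so content reaches you through your strongest ties.
    """
    scores = {}
    for other, tie in ties.get(user, {}).items():
        for item, affinity in affinities.get(other, {}).items():
            scores[item] = scores.get(item, 0.0) + tie * affinity
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Even this toy illustrates the payoff: because relationships are by degree, content strongly associated with a close connection naturally outranks content weakly associated with a distant one, with no crisp categories required.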