Sunday, March 27, 2016

Microsoft's Tay Has No AI

(This is the third of three posts about Tay. Previous posts: "Poor Software QA..." and "...Smoking Gun...")

While nearly all the press about Microsoft's Twitter chatbot Tay (@Tayandyou) is about artificial intelligence (AI) and how AI can be poisoned by trolling users, there is a more disturbing possibility:

  • There is no AI (worthy of the name) in Tay. (probably)

I say "probably" because the evidence is strong but not conclusive and the Microsoft Research team has not publicly revealed their architecture or methods.  But I'm willing to bet on it.

Evidence comes from three places. The first is observation of a small, non-random sample of Tay tweet and direct-message sessions (posted by various users). The second is circumstantial: the composition of the team behind Tay. The third is testimony from a person who claims to have worked at Microsoft Research on Tay until June 2015. He/she made two comments on my first post but unfortunately deleted the second comment, which had many details.


AI Worthy of the Name

What is and is not "artificial intelligence" (AI) is a complex topic that is hotly debated.  I won't resolve the debate here, but instead only explain my point of view.

To be worthy of the name AI, it is not enough that a system "look intelligent".  The first famous chat bot was ELIZA, written between 1964 and 1966. "ELIZA was implemented using simple pattern matching techniques, but was taken seriously by several of its users, even after Weizenbaum [the author] explained to them how it worked."  It looked intelligent, but really wasn't.
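A minimal sketch of the ELIZA trick makes the point concrete: a handful of surface patterns can feel conversational with no understanding at all. The rules below are illustrative, not ELIZA's actual script:

```python
import re

# Illustrative ELIZA-style rules: pure surface pattern matching, no understanding.
RULES = [
    (re.compile(r"\bI need (.+)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bmy (\w+)", re.I), "Tell me more about your {0}."),
]

def eliza_reply(text):
    for pattern, template in RULES:
        m = pattern.search(text)
        if m:
            return template.format(*m.groups())
    return "Please go on."  # default when nothing matches

print(eliza_reply("I need a vacation"))   # Why do you need a vacation?
print(eliza_reply("Nice weather today"))  # Please go on.
```

The replies look responsive, but the program merely echoes fragments of the input back inside canned templates.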

To be worthy of the name AI, the system's behavior needs to be a direct consequence of computational emulation of human intelligence at some level of abstraction.  For Tay to be AI, its behavior should be based on some combination of NLP and common sense knowledge/reasoning, including understanding of conversations.

At the very least, any "learning" in Tay should be due to improved understanding, i.e. better perception, conception, affect, anticipation, and similar -- however these are implemented computationally or abstracted.

Does all of "machine learning" (ML) fit within this definition of AI?  No.  Some does, some doesn't. In nearly all use cases of ML, what we care about are outputs.  We don't care if it is human-like or not.  In some cases, ML does resemble human reasoning, but in other cases it does not.

[Update 9:45pm] Are modern search engines built on AI? I don't know anyone who makes that argument.  Instead, the technology in search engines mines the "intelligence" inherent in networks of links (Google's original PageRank) and various other properties of content and search behavior.

1) No Evidence of NLP or Common Sense Knowledge

Natural Language Processing (NLP) is a set of techniques that aims at computer understanding of natural language.  (See the Stanford NLP site.)  NLP involves sentence understanding, question answering, sentiment inference, and more.

Common sense knowledge is understanding of our everyday existence and its implications.  Simple examples of such knowledge: "All trees are plants" but also "Some plants are not trees", and so on. It turns out that common sense knowledge is both vast and hard to compile in machine-usable form. (See the Cyc project here and recent news here.)  For what it's worth, most NLP systems do not even try to include common sense knowledge and reasoning.
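To see why "vast" is the problem, consider a toy knowledge base of "is-a" facts with transitive inference. The facts below are illustrative, not drawn from Cyc; a real common-sense KB holds millions of such assertions, plus exceptions and defaults:

```python
# A toy common-sense knowledge base: "is-a" facts plus transitive inference.
# These facts are illustrative only, not from Cyc or any real knowledge base.
IS_A = {
    "oak": "tree",
    "tree": "plant",
    "fern": "plant",
}

def is_a(thing, category):
    """Follow the is-a chain transitively: an oak is a tree, hence a plant."""
    while thing in IS_A:
        thing = IS_A[thing]
        if thing == category:
            return True
    return False

print(is_a("oak", "plant"))   # True: all trees are plants, and oaks are trees
print(is_a("fern", "tree"))   # False: some plants are not trees
```

Writing three facts is trivial; the hard part is compiling, curating, and reasoning over the millions needed to cover everyday life.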

I won't go through a lot of examples, leaving that as an exercise for readers.  Here's just one tweet sequence between Tay and a troll:

Looking at the above tweet stream, I can't see any evidence that Tay is performing any such processing.  Each of Tay's replies has some relation to the human tweet, but only in a vague, general way.  I can't see any specific text in Tay's replies that reflects any understanding of the human tweet.  Furthermore, there is no evidence that Tay understands this sequence of tweets as a conversation.  Each tweet+reply is atomic.

[Update 9:45pm]  Here is the enlarged Tay reply-tweet from above.  Tay copied it verbatim from its Twitter corpus:

Also, I see no evidence that Tay has common sense understanding of any concepts, either those of humans or in Tay's utterances.

How could anyone believe that Tay had NLP capabilities or common sense?  Part of the trick is the cryptic nature of tweets, which are limited to 140 characters.  Another is the social norms of many Twitter users, especially younger ones, who write cryptically and elliptically and don't follow standard grammar and spelling.  Most Twitter users get used to inferring meaning from cryptic tweets, whether or not that meaning was intended by the tweet's author.

2) The Bing Team Was Involved in Tay

Bing is Microsoft's web search engine, competing with Google and others. From the Tay web page: "Tay is an artificial intelligent chat bot developed by Microsoft's Technology and Research and Bing teams to experiment with and conduct research on conversational understanding." (emphasis added) Why would the search engine team be involved in social chat bot research?  I don't know.  I'm not aware of any other social bot research where search engine people, tools, or methods are being used.

One of the odd behaviors noted was that Tay was reusing, verbatim, tweets from recent history (i.e. tweets not acquired through direct interaction with users, a.k.a. "learning").  This example comes from a post by Steve Merity.  Here's the human tweet and Tay's reply:

Where did Tay get this?  Everything after "..." is copied from this tweet from its corpus:

It's easy to imagine this as a search result given "Ted Cruz" and "Zodiac Killer" as search text.

3) The Inside Story From Former MSR Engineer 

In the comment shown above, you can see this anonymous commenter say that the technology behind Tay was basically a search engine.  This aligns with the fact that the Bing team was a significant participant.

In the deleted comment, this commenter added important details (he/she left MSR in June 2015):
  • He/she did not see any AIML bot (ALICE or other) in Tay; instead, such a bot might have been used to expand the corpus or train the search engine.
  • The basic design was this (crudely sketched):
  1. The input text from human was converted into a Regular Expression (i.e. text with wild cards, etc.)
  2. The regular expression is entered into the search engine, which searches over a social media corpus (presumably including Tweet history)
  3. The search results are filtered through Python code ("a 100 or so" if-then-else statements) to pick the result(s) to use, and piece them together into Tay's reply tweet.
  • There is no understanding or reasoning about conversational context.
  • No NLP or semantic reasoning.  No neural nets, deep learning, or other ML outside of what search engines are capable of.
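Under the assumptions in the commenter's sketch, the three-step pipeline might look roughly like this. Everything here is hypothetical: the function names, the stand-in corpus, and the filter rules are my own illustration, not Tay's actual code:

```python
import re
import random

# Hypothetical stand-in corpus; the real one was reportedly social-media history.
CORPUS = [
    "ted cruz is the zodiac killer, pass it on",
    "just had the best pizza of my life",
    "who even watches the debates anymore",
]

def query_from_tweet(tweet):
    # Step 1: turn the human tweet into a loose regular expression,
    # keeping only longer words, joined by wildcards.
    words = [w for w in re.findall(r"\w+", tweet.lower()) if len(w) > 3]
    return ".*".join(map(re.escape, words))

def search(pattern):
    # Step 2: run the pattern over the corpus (a stand-in for Bing-style search).
    return [t for t in CORPUS if re.search(pattern, t)]

def pick_reply(results):
    # Step 3: a small stack of if-then-else filters picks the final reply.
    if not results:
        return "idk, tell me more"
    if any("zodiac" in r for r in results):
        return next(r for r in results if "zodiac" in r)  # prefer the topical hit
    return random.choice(results)

hits = search(query_from_tweet("Is Ted Cruz the Zodiac Killer?"))
print(pick_reply(hits))  # ted cruz is the zodiac killer, pass it on
```

Note what is absent: no parsing, no semantics, no conversational state. The reply is a verbatim corpus hit, which matches the verbatim-reuse behavior described above.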


From these three sources of evidence, I infer two things.  First, I believe that an AIML-capable library (probably ALICE) was added after June 2015 in order to implement rule-based behavior they couldn't otherwise easily implement through the pipeline listed above.  Second, I believe that the only learning in Tay was a) adding to the corpus through real-time interaction, and b) tuning the search engine through the normal adaptive mechanisms.


  1. Actually, modern search engines do use AI. For example, Google's RankBrain uses ANNs to answer search queries that have never been entered before.

    My guess is that Tay does have AI; it just hadn't had a sufficient amount of training for the ANNs to produce more meaningful answers, and therefore it sometimes responded in a 'stupid' way. If you give the same input to an ANN, it doesn't have to respond differently every time; that only happens if it has either some kind of intentional random function or context from the chat history.

    And the Python code was probably an API for some ML library like TensorFlow.

    1. Thanks for your comments.

      Even if Tay's search engine uses Artificial Neural Networks (ANNs), I am not convinced that this is "AI worthy of the name". ANNs and many other machine learning methods can be seen as mechanisms for estimating arbitrary functions, and "learning" is defined as "better and better estimations". While ML *can* be used in service to AI as I've defined it (e.g. the book "Machine Learning Methods for Commonsense Reasoning Processes", ed. Xenia Naidenova), I personally do not automatically include ANNs and other ML methods in AI, where "intelligence" is limited to "human-like intelligence".

      About TensorFlow: I didn't see any evidence that Tay had such reasoning capabilities. I wish MSR would publish something about their architecture and implementation.