Conversational AI, often associated with chatbots, is the set of artificial intelligence technologies behind speech-enabled and text-based software that provides human-like interaction between people and computers over communication networks. These systems let users converse with a bot over chat platforms such as Facebook or Twitter. Bots can also be programmed to take part in user-organized conversation threads on social networking sites, blogs, or other Internet discussion boards. The systems can be used for basic communication, entertainment, business, or research purposes.
Examples of conversational AI technologies include the “IBM Bot” in IBM’s ThinkPad tablet. This bot can interpret conversations, which lets it reply in different languages, respond to simple queries, and provide general answers to questions. This capability builds on previous research in AI, such as the “YRI” or “Watson” project from IBM. Other examples include the Google Cardboard-powered eyeglass camera and the Apple iPhone’s voice recognition capability.
To achieve conversational AI capability, a system needs to approximate and emulate the human language understanding process. In doing so, it should be able to quickly detect and reproduce the grammatical structure, sentence boundaries, punctuation, format, and flow of a normal conversation. This requires a significant amount of training effort on the part of developers, since it is difficult to teach a machine how to understand language. Once trained, however, conversational AI technology makes it easy for users to converse with the system using a pre-programmed, limited vocabulary of expressions, while the system retains the ability to differentiate between different types of human speech. For example, it should recognize and be able to respond to the different tenses of a sentence.
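As a rough illustration of what detecting sentence boundaries and tenses can look like, here is a minimal Python sketch. It is a hand-written heuristic, not how a trained system works: the function names `split_sentences` and `guess_tense` and the short `IRREGULAR_PAST` word list are assumptions made up for this example.

```python
import re

# Assumed, deliberately tiny list of irregular past-tense verbs.
IRREGULAR_PAST = {"was", "were", "went", "had", "did", "said", "saw"}


def split_sentences(text):
    """Naive boundary detection: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def guess_tense(sentence):
    """Crude tense guess based on marker words and the '-ed' suffix.

    This misfires on words such as 'need'; a real system would use a
    trained part-of-speech tagger instead of string rules.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    if "will" in words or "shall" in words:
        return "future"
    if any(w in IRREGULAR_PAST or w.endswith("ed") for w in words):
        return "past"
    return "present"


if __name__ == "__main__":
    for s in split_sentences("I opened the door. Will it rain? It is sunny!"):
        print(guess_tense(s), "|", s)
```

Rules like these break quickly on abbreviations, quotations, and irregular verbs, which is one reason the text stresses how much training effort language understanding requires.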
The first step toward achieving conversational AI is to design and build a tool that can automatically generate responses to questions. A tool like this could, for example, detect the difference between “Is there a door between the bedroom and the living room?”, “Is there a door in the living room?”, and “Does the living room have a door between the bedroom and the kitchen?” It should also be able to detect the main clauses of a question and generate an appropriate response, e.g., “There is a door between the kitchen and the living room.” Such a tool could then be used to generate inferences and predictions from the raw data.
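The door questions above can be told apart with a few patterns and a small table of facts. The sketch below is only a toy under stated assumptions: the `DOORS` floor plan, the two regular expressions, and the `answer` function are invented for illustration, and a real tool would need a proper parser rather than fixed phrasings.

```python
import re

# Hypothetical floor plan: each entry is a pair of rooms joined by a door.
DOORS = {
    frozenset({"bedroom", "living room"}),
    frozenset({"kitchen", "living room"}),
}

# Patterns for the two question shapes used in the text.
BETWEEN = re.compile(r"between the (.+?) and the (.+?)[?.]?$")
IN_ROOM = re.compile(r"\bin the (.+?)[?.]?$")


def answer(question):
    """Answer a door question by matching its main clause against DOORS."""
    q = question.strip().lower()
    m = BETWEEN.search(q)
    if m:
        a, b = m.group(1), m.group(2)
        if frozenset({a, b}) in DOORS:
            return f"Yes, there is a door between the {a} and the {b}."
        return f"No, there is no door between the {a} and the {b}."
    m = IN_ROOM.search(q)
    if m:
        room = m.group(1)
        if any(room in pair for pair in DOORS):
            return f"Yes, the {room} has a door."
        return f"No, the {room} has no door."
    return "Sorry, I did not understand the question."


if __name__ == "__main__":
    print(answer("Is there a door between the bedroom and the living room?"))
    print(answer("Is there a door in the living room?"))
    print(answer("Does the living room have a door between the bedroom and the kitchen?"))
```

The "between" pattern is checked first because the third question mentions three rooms, and only the two rooms after "between" decide the answer.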
The second step is to train the machine to use its database of knowledge. Currently, one of the most common methods of accomplishing this is reinforcement learning. In this approach, the system (the agent) is rewarded for producing accurate or useful responses and penalized for incorrect ones; the same idea has been applied to search, where useful results count as positive feedback. More specifically, the system receives a reward (a positive result) when it provides the correct answer to a question or poses a valid question, and a penalty when it provides an incorrect response.
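The reward-and-penalty loop can be sketched with one of the simplest reinforcement learning setups, an epsilon-greedy bandit that learns which candidate response earns the most reward. This is an illustrative simplification, not the method of any particular product; the `ResponseBandit` class, the `train` helper, and the +1/-1 reward values are assumptions chosen for the example.

```python
import random


class ResponseBandit:
    """Epsilon-greedy bandit over a fixed set of candidate responses."""

    def __init__(self, responses, epsilon=0.1, seed=0):
        self.responses = list(responses)
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.counts = {r: 0 for r in self.responses}
        self.values = {r: 0.0 for r in self.responses}

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.responses)
        return self.best()

    def update(self, response, reward):
        # Incremental mean of the rewards observed for this response.
        self.counts[response] += 1
        n = self.counts[response]
        self.values[response] += (reward - self.values[response]) / n

    def best(self):
        return max(self.responses, key=lambda r: self.values[r])


def train(bandit, correct, steps=200):
    """Reward +1 for the correct response and -1 for anything else."""
    for _ in range(steps):
        response = bandit.choose()
        bandit.update(response, 1.0 if response == correct else -1.0)


if __name__ == "__main__":
    bandit = ResponseBandit(["London", "Paris", "Berlin"])
    train(bandit, correct="Paris")
    print(bandit.best(), bandit.values)
```

A full dialogue system would condition the choice on the question and the conversation state, but the update rule shows the core idea: responses that earn rewards are chosen more often over time.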
The final step toward conversational AI is to build a communication system that allows the user to interact with it in real time. One of the more prominent platforms in this area is IBM Worklight, which allows users to exchange data over the internet using existing technologies such as e-mail, web, audio, video, and SMS.
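One way to picture a system that talks to users over several channels is a dispatcher that routes each incoming message to the bot and sends the reply back on the same channel. The sketch below is generic Python and does not use or represent any IBM API; `Message`, `Dispatcher`, `echo_bot`, and the channel names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Message:
    channel: str  # e.g. "sms", "email", "web" (assumed channel names)
    sender: str
    text: str


def echo_bot(text: str) -> str:
    """Placeholder bot; a real system would call its dialogue engine here."""
    return f"You said: {text}"


class Dispatcher:
    """Routes each message to the bot and replies on the originating channel."""

    def __init__(self, bot: Callable[[str], str]):
        self.bot = bot
        self.senders: Dict[str, Callable[[str, str], None]] = {}

    def register(self, channel: str, send: Callable[[str, str], None]) -> None:
        self.senders[channel] = send

    def handle(self, msg: Message) -> bool:
        send = self.senders.get(msg.channel)
        if send is None:
            return False  # unsupported channel, nothing was sent
        send(msg.sender, self.bot(msg.text))
        return True


if __name__ == "__main__":
    outbox = []
    dispatcher = Dispatcher(echo_bot)
    # The "sender" here just records the reply instead of sending a real SMS.
    dispatcher.register("sms", lambda to, body: outbox.append((to, body)))
    dispatcher.handle(Message("sms", "+100", "hi"))
    print(outbox)
```

Keeping the channel-specific sending code behind a single `register` call means new channels can be added without touching the bot logic.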
Similar approaches can be taken for building conversational AI systems that communicate back to the user using one of several alternative technologies. One approach is to use a backend system that implements the necessary technologies. Another is to use a server that communicates back to the user over one of the supported technologies. Currently, this technology is quite immature and poses difficulties such as poor scalability, high memory consumption, lack of support for different language bindings, and the difficulty of designing and deploying the server. In addition, implementing a server to support this functionality is quite expensive, and integrating it into an existing infrastructure is difficult. Furthermore, there is a strong likelihood that it will not gain enough traction, because it is seen as replacing human users.
Four technologies currently in use fall within the realm of conversational AI: Natural Language Understanding (NLU), Context-Based Reasoning (CBR), Deep Reinforcement Learning (DRL), and Automatic Speech Recognition (ASR). Each of these technologies has its own strengths and limitations, but all four share the basic premise that natural language input, context, and data can be fed into a machine learning model such as a recurrent neural network. Conversational AI systems rely most directly on the last two of these. They combine state-of-the-art technologies with user-provided inputs in order to produce more natural, human-like interaction.
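To show how the four pieces might fit together in one turn of a conversation, here is a stubbed Python pipeline. Every stage is a placeholder written for this example: `recognize_speech` stands in for ASR, `understand` for NLU, `resolve_context` for context-based reasoning, and `choose_response` for a policy that a deep reinforcement learning model would normally supply. None of the names come from a real library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DialogueState:
    history: List[str] = field(default_factory=list)
    last_intent: Optional[str] = None


def recognize_speech(audio: str) -> str:
    # ASR stub: the "audio" is assumed to be a transcript already.
    return audio.strip().lower()


def understand(text: str) -> dict:
    # NLU stub: keyword matching instead of a trained model.
    intent = "ask_weather" if "weather" in text else None
    day = "tomorrow" if "tomorrow" in text else "today" if "today" in text else None
    return {"intent": intent, "day": day}


def resolve_context(frame: dict, state: DialogueState) -> dict:
    # Context stub: a follow-up such as "and tomorrow?" inherits the last intent.
    if frame["intent"] is None and frame["day"] and state.last_intent:
        frame = {**frame, "intent": state.last_intent}
    if frame["intent"]:
        state.last_intent = frame["intent"]
    return frame


def choose_response(frame: dict) -> str:
    # Policy stub: a learned (e.g. DRL) policy would pick the action here.
    if frame["intent"] == "ask_weather":
        return f"Looking up the weather for {frame['day'] or 'today'}."
    return "Could you rephrase that?"


def respond(audio: str, state: DialogueState) -> str:
    text = recognize_speech(audio)
    state.history.append(text)
    return choose_response(resolve_context(understand(text), state))


if __name__ == "__main__":
    state = DialogueState()
    print(respond("What is the weather today?", state))
    print(respond("And tomorrow?", state))
```

The second call only makes sense because the state remembers the earlier question, which is the role the context stage plays in the pipeline.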