The Intelligent Voice Bot Framework
Today, we are introducing our voice bot framework, OpenBrain. In this article, we describe what OpenBrain is, how it can be used, and how it is architected. This is the first in a series of articles in which each of OpenBrain's components will be described in detail. Stay tuned for our latest discoveries and updates!
Introduction
Voice bots, also known as voice chatbots, are AI-based tools that can capture, analyze, and understand voice commands and reply by voice in natural language. Their goal is to make communication between the user and the software faster, more reliable, and more user-friendly. A voice bot needs to understand the customer's intent so it can solve the customer's problem efficiently. As the technology has developed, different types of voice bots have been built, each introducing novel functionality and improvements over its predecessors.
To our knowledge, there are 42 companies providing chatbot platforms around the globe; only 8 of them provide NLP engines, and only 4 provide both voice and text services, while the rest are not considered intelligent chatbots.
The OpenBrain framework is modular and specially designed to help businesses manage internal knowledge and serve customers in a human-like manner. The framework includes a wide range of functionalities that are not a standard part of the voice bot frameworks offered by competitors, among them sentiment analysis, opinion mining, general knowledge, summarization, and knowledge graphs. Based on our research, iReason's OpenBrain is also the only intelligent voice bot framework offered on-premise, which ensures that all of a company's information stays in-house. In the future we will dedicate separate articles to each of the constituent functionalities, both the general and the unique ones, as we work only with proven state-of-the-art models and novel, upgraded, and improved technology stacks.
The Usage of OpenBrain
OpenBrain is a modular framework comprised of intelligent units that make service providers more productive in both B2B and B2C communication. OpenBrain automatically unifies internal and external sources of knowledge, gathers and extracts meaningful information, and presents it in a human-like manner on user demand. Employees and customers can therefore provide and request information efficiently, without having to move from one information source to another or spend additional human resources on technical support, as is the case with existing Interactive Voice Response (IVR) technologies. OpenBrain can easily transform a company's IVR system into an intelligent on-premise solution, or it can reside as a service in the cloud.
Thanks to the wide range of functionality, OpenBrain can be applied in a variety of solution areas with specific targets on:
- Business intelligence
- IVR systems intelligence
- Assistive technologies
OpenBrain provides structure and meaning to large volumes of unstructured content. It enables customers and employees to find and use information through human-like communication (by asking a question and receiving the response in spoken or written natural language). The framework is content-neutral and can integrate with any source. It can transform your web offering into a knowledge graph, upgrade your IVR system with intelligent speech and information-retrieval technologies, and turn your web interface into accessible information that complies with assistive technology requirements.
A unique advantage of OpenBrain is its ability to stay in-house and prevent information from leaking to external services. The information can be distributed in many ways, such as via a web interface, mobile technologies, cellular technologies, and other systems within a company.
Technical Standards
OpenBrain's backend is based on deep learning methodologies, natural language transformers, and Python (the Torch library). It supports the HTTP(S) and WSS protocols. The interface is based on React, Spring Boot, and Python. OpenBrain requires a Linux server, and users only need a standard web browser. Unicode (UTF-8) characters are supported to cover most languages.
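For illustration, here is a minimal sketch of how a client could talk to an OpenBrain instance over WSS. The endpoint URL and the JSON message format are hypothetical, invented purely to show the protocol flow; they are not OpenBrain's documented API.

```python
# Minimal WSS client sketch; the endpoint and payload fields are illustrative only.
import asyncio
import json

import websockets  # pip install websockets


async def ask_bot(question: str) -> str:
    # Hypothetical secure WebSocket endpoint of an OpenBrain instance.
    url = "wss://openbrain.example.com/api/dialog"
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({"type": "text", "query": question}))
        reply = await ws.recv()
        return json.loads(reply).get("answer", "")


if __name__ == "__main__":
    print(asyncio.run(ask_bot("What are your working hours?")))
```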
OpenBrain scales to thousands of users. It supports load management, in which simultaneous user requests are distributed and balanced across multiple servers. The intelligent models can run on CPUs; however, GPUs are preferred to speed up inference. The evaluation setup and results are as follows.
- The CPU evaluation was performed on an Intel(R) Xeon(R) CPU E7-8880 @ 2.30GHz.
- The GPU evaluation was performed on an NVIDIA Tesla P100 device. Any CUDA-capable GPU device is applicable.
- The table below gives the recommended number of users per server configuration while maintaining real-time communication with the virtual agent (1 s of audio is processed in less than 1 s).
- The OpenBrain platform supports horizontal scaling to an arbitrary number of nodes.
|  | CPU (8 cores) + 12 GB RAM | GPU (Tesla P100) + 12 GB RAM |
| --- | --- | --- |
| Concurrent users | 5x | 58x |
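Reading the table's 5x and 58x as roughly 5 and 58 concurrent real-time users per node, a rough capacity estimate for horizontal scaling can be sketched as follows; the expected user count is an arbitrary example, not a measured figure.

```python
# Rough capacity estimate based on the benchmark table above.
import math

USERS_PER_CPU_NODE = 5    # CPU (8 cores) + 12 GB RAM
USERS_PER_GPU_NODE = 58   # GPU (Tesla P100) + 12 GB RAM


def nodes_needed(concurrent_users: int, users_per_node: int) -> int:
    """Number of identical nodes needed to serve the given concurrent load."""
    return math.ceil(concurrent_users / users_per_node)


# Example: serving 1,000 concurrent users in real time.
print(nodes_needed(1000, USERS_PER_CPU_NODE))  # 200 CPU nodes
print(nodes_needed(1000, USERS_PER_GPU_NODE))  # 18 GPU nodes
```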
The Layers of OpenBrain
In this section, we introduce the architecture of OpenBrain. Each of the five layers is described below, along with its architecture, core functionalities, and main purpose.
- Knowledge Layer: the knowledge graphs and summarization models at the core of OpenBrain. This layer represents the company's inner knowledge and serves general knowledge on demand.
- NLP Layer: encompasses the multilingual natural language understanding and generation engines, as well as specific models for named entity recognition, intent and task classification, opinion mining, sentiment analysis, and question answering.
- Application Layer: holds the text-to-speech and speech-to-text intelligent models, the typo correction models, and the dialog-flow management system for creating custom scenarios.
- Security Layer: controls access to OpenBrain, authenticates, and authorizes users.
- User Interface: the final layer that meets the end user of OpenBrain and is responsible for providing contextual voice, text, or combined conversation.
The NLP Layer is the heart of the framework: it ties together the intelligent models at each of the layers, enabling human-like communication, as shown in Figure 2.
- Knowledge Layer
Knowledge is the core of the OpenBrain framework and is crucial for the efficiency of the instances produced from it. Knowledge can be retrieved both from internal sources (a company's internal databases) and from external sources (such as the Internet, Wikipedia, DBpedia, etc.). The data is usually obtained in a semi-structured format and needs additional processing to enable the extraction of meaningful knowledge. The knowledge-graph approach is used to organize the data in terms of related entities and their attributes, so that the search for information is fast and accurate.
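As a minimal illustration of this entity-and-attribute organization, the sketch below uses the open-source rdflib library; the banking entities and relations are invented for the example and do not come from OpenBrain's actual knowledge base.

```python
# Minimal knowledge-graph sketch with rdflib; the entities are illustrative only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/bank/")

g = Graph()
# Entities and their attributes stored as subject-predicate-object triples.
g.add((EX.PremiumAccount, RDF.type, EX.Product))
g.add((EX.PremiumAccount, RDFS.label, Literal("Premium current account")))
g.add((EX.PremiumAccount, EX.monthlyFee, Literal(9.90)))
g.add((EX.PremiumAccount, EX.includes, EX.TravelInsurance))

# Fast, targeted lookup: all attributes of a single entity.
for _, predicate, obj in g.triples((EX.PremiumAccount, None, None)):
    print(predicate, obj)
```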
- NLP Layer
The NLP Layer is crucial for managing the knowledge structured in the Knowledge Layer. It consists of many intelligent models built upon that knowledge and upon the natural language of interest for a particular OpenBrain instance; as previously described, the framework is multilingual and can be adapted to any language for which resources are available. The framework offers trained models for natural language understanding and natural language generation. These are its most important features: they ensure that the voice bot understands the morphological characteristics of a language and can therefore understand users' requests and respond to them appropriately, whether by serving the requested information or by triggering suitable actions.
Communication is a complex process, and understanding a language is not the only problem to be solved. Human beings have emotions, ask complex questions, ask for help, and usually intend to take some action when communicating with an intelligent voice bot, both on the web and in an IVR environment. For these reasons, the NLP Layer holds a few more intelligent models built upon deep learning methodologies:
- Sentiment analysis and opinion mining, to understand the user's emotions and to be able to de-escalate a conversation that is turning rude.
- A question answering module that is able to search for answers to complex questions using the knowledge graph from the first layer, the Knowledge Layer.
- Intent recognition and task classification modules, trained on a particular domain of interest, that extract the focus and the entities of the conversation and recognize the user's intent (a minimal sketch combining such models follows this list).
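As a rough sketch of how intent recognition and sentiment analysis can be combined on a single utterance, the example below uses off-the-shelf Hugging Face pipelines. OpenBrain's own models are custom and domain-trained, so the public models and the banking intent labels here are stand-ins for illustration only.

```python
# Sketch of intent recognition + sentiment analysis on one utterance.
# Generic public models stand in for OpenBrain's custom, domain-trained ones.
from transformers import pipeline

intent_classifier = pipeline("zero-shot-classification",
                             model="facebook/bart-large-mnli")
sentiment_analyzer = pipeline("sentiment-analysis")

utterance = "My card was stolen this morning and I need it blocked right now!"

# Hypothetical banking-domain intents.
intents = intent_classifier(utterance,
                            candidate_labels=["block card", "open account",
                                              "loan inquiry", "check balance"])
print("intent:", intents["labels"][0], round(intents["scores"][0], 2))
print("sentiment:", sentiment_analyzer(utterance)[0])
```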
- Application Layer
The Application Layer holds a few more intelligent models, also based on deep learning methodologies, that are crucial for a human-like communication experience in natural language:
- Text-to-speech engine – a deep learning model built upon a large corpus of single-speaker audio files, able to synthesize human-like speech for a particular language. The model can be trained on any language in a short time.
- Speech-to-text engine – a deep learning model built upon hundreds of hours of speech from hundreds of speakers, transcribed in textual form. The model can be trained on any language within a few months.
- Intelligent typo correction model – this model works in tandem with the speech-to-text engine to correct words misspelled during transcription.
- Dialog flow management system – this GUI gives a company the unique ability to build scenarios on its own. It is made for service providers, with the aim of easing the process of upgrading a voice bot instance with new scenarios. The following figure describes a voice bot use case in the banking sector, in which administrators create their own blocks and define transitions among multiple blocks to complete a use case. In the background, the corresponding code is generated automatically, without any human intervention.
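To make the block-and-transition idea concrete, here is a minimal sketch of how a banking scenario built in such a GUI could be represented and executed. The block names, prompts, and intents are invented for the example; in the real system the corresponding code is generated automatically.

```python
# Minimal sketch of a block/transition dialog flow for a banking scenario.
from dataclasses import dataclass, field


@dataclass
class Block:
    prompt: str                                       # what the bot says in this block
    transitions: dict = field(default_factory=dict)   # recognized intent -> next block


FLOW = {
    "greeting":      Block("Hello, how can I help you today?",
                           {"block card": "confirm_block", "check balance": "balance"}),
    "confirm_block": Block("I can block your card immediately. Shall I proceed?",
                           {"yes": "card_blocked", "no": "greeting"}),
    "card_blocked":  Block("Your card has been blocked. Anything else?"),
    "balance":       Block("Your current balance is being retrieved."),
}


def next_block(current: str, intent: str) -> str:
    """Follow the transition for the recognized intent; stay put if none matches."""
    return FLOW[current].transitions.get(intent, current)


# Example run: greeting -> confirm_block -> card_blocked
state = "greeting"
for user_intent in ["block card", "yes"]:
    state = next_block(state, user_intent)
    print(state, "->", FLOW[state].prompt)
```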
Because each of the intelligent models mentioned in the NLP Layer and the Application Layer is unique, a dedicated article will be published for each of them in the coming period, in which we will dive into the details.
- Security Layer
This subsection is dedicated to the Security Layer and the key services through which it keeps the content secure:
- Authentication – since administrators are the only users who need to log in to the system, Single Sign-On (SSO) is used as the authentication method, enabling a user to log in once and gain access to the resources of multiple subsystems (see the sketch after this list).
- Access Control – Additional functionality for stopping automated attempts to gain unauthorized access to content.
- Logging – events are logged, specifically data modifications and administrative actions.
- Encryption – Key system information and passwords are encrypted.
- VPN – traffic between the server and the client may be encrypted using SSL/TLS or restricted through a VPN.
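As an illustration of the SSO-based authentication mentioned above, the sketch below verifies a token issued by an identity provider. The use of JWT, the PyJWT library, the issuer URL, and the audience value are all assumptions made for the example, not a description of OpenBrain's actual implementation.

```python
# Hedged sketch of verifying an SSO-issued token; JWT/PyJWT, issuer, and audience
# are assumptions for illustration, not OpenBrain's documented mechanism.
import jwt  # pip install PyJWT


def verify_admin_token(token: str, public_key: str) -> dict:
    """Validate the identity provider's signature and the token's claims."""
    return jwt.decode(
        token,
        public_key,
        algorithms=["RS256"],
        audience="openbrain-admin",          # hypothetical audience value
        issuer="https://sso.example.com/",   # hypothetical identity provider
    )
```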
- User Interface Layer
The User Interface Layer offers three possible implementations: web, mobile, and IVR, each described below.
Web view
This view is a simple user interface with a chat environment, an option to use the speech-to-text functionality and interact with the voice bot by speaking, and an option to activate the text-to-speech functionality and receive the answers in audio format. The view also includes an admin panel that allows administrators to update the knowledge database with unstructured text, ready to be preprocessed by the models in the Knowledge Layer. The web view can be customized depending on which modules are chosen to be active when creating a voice bot instance from the OpenBrain framework.
The simple view can be upgraded with a view into the Knowledge Layer, presenting the knowledge graph and the nodes that are active while a particular user is searching for information.
Mobile phone view
This user interface integrates with popular messaging apps such as Viber and WhatsApp. The view aims to ease communication with the voice bot from a smartphone.
IVR view
In this case, the user interface of the OpenBrain instance is integrated with the IVR system and is accessed by the user via a cellular line. The text-to-speech and speech-to-text engines are active during the whole process, along with the other intelligent models from the Application, NLP, and Knowledge Layers. The Security Layer is not active here, as security is handled by the IVR system itself.
The intelligent voice bot takes the call and activates all the intelligent modules to solve the user's problem or request. If the voice bot is not able to handle the request, the call is routed to a human agent, as in a traditional IVR system.
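The hand-off to a human agent can be pictured as a simple confidence check; the threshold value and the messages below are illustrative assumptions, not the framework's actual logic.

```python
# Illustrative sketch of the fallback from the voice bot to a human agent.
CONFIDENCE_THRESHOLD = 0.6  # below this, the bot does not trust its own answer


def handle_request(answer: str, confidence: float) -> str:
    """Return the bot's answer, or route the call to a human agent."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return answer
    return "Transferring you to the next available agent."


print(handle_request("Your card has been blocked.", 0.92))
print(handle_request("I did not quite understand that.", 0.31))
```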
Conclusion
Our work here is extensive, but it is only the beginning. The purpose of this article was to outline the existing challenges in voice bot architectures and how we overcome them. OpenBrain is on the path to a bright future in replacing existing IVR technologies and creating a more user-friendly, human-like experience. This is only the first of many articles in which each of the intelligent models in the OpenBrain architecture will be presented in depth. Each of our modules can be used separately, in both academic research and commercial applications. We believe that with this contribution we can help others make a positive impact on society.
Stay tuned for more!