Anyone that has dealt with Machine Learning (ML) understands that data is a fundamental ingredient to it. Given that a great deal of the world’s organized data already exists inside databases, doesn't it make sense to bring machine learning capabilities straight to the database itself? To do so, we have developed a concept called AITables. In this article we want to present to you what AITables are, how you can use them in MariaDB. Invite you to try it out, get involved and to join us in the journey of ML meets MariaDB.
AITables differ from normal tables in that they can generate predictions upon being queried and returning such predictions like if it was data that existed on the table. Simply put, an AI-Table allows you to use machine learning models as if they were normal database tables, in something that in plain SQL looks like this;
SELECT <predicted_variable> FROM <ML_model> WHERE <conditions>
To really sink in this idea, let us expand the concept through an example.
Imagine that you want to solve the problem of estimating the right price for a car on your website that has been selling used cars over the past 2 years.
Also, you use MariaDB and in your database there is a table called `used_cars_data` where you keep records of every car you have sold so far, storing information such as: price, transmission, mileage, fuel_type, road_tax, mpg (Miles Per Gallon) and engine_size.
Since you have historical data, you know that you could use Machine Learning to solve this problem. Wouldn't it be nice if you could simply tell your MariaDB server to do and manage the Machine Learning parts for you?
At MindsDB we think so too! And AI-Tables baked into MariaDB are here to do exactly that. Although further down in this article we will guide you step by step on how to run this example yourself, let us introduce you to what you can do and how it looks in standard SQL.
You can for instance with a single INSERT statement, create a machine learning model/predictor trained to predict ‘price’ using the data that lives in the table `sold_cars` and publish it as an AI-Table called 'used_cars_model'.
After that you can get price predictions by querying the generated 'used_cars_model' AI-Table, as follows:
As you can see with AI-Tables, we are aiming to simplify Machine Learning mechanics to simple SQL queries, so that you can focus on the important part; which is to think about what predictions are key for your business and what data you want your ML to learn from to make such predictions.
By now, you might be thinking, how do I get started?, what do I need? How can I run this example in my environment?
As promised, in the rest of this article we will show you step-by-step instructions on how to integrate MindsDB into your MariaDB server, how to build, test and use Machine Learning Models as AI-Tables all without the need for specific machine learning skills and how to evaluate prediction results in an “Explainable AI” way. If you want to preview this tutorial visually go ahead to our youtube channel and follow up the Machine Learning in MariaDB with AI Tables video.
With MindsDB any MariaDB user can train and test neural-networks based Machine Learning models with the same knowledge they have of SQL.
MindsDB is an open-source ML framework that enables running Machine Learning Models as AI-Tables. On top of that it has an exciting “Explainable AI” feature that allows users to get insights into their Machine Prediction accuracy score and evaluate its dependencies. For example, users can estimate how adding or removing certain data would impact on the effectiveness of the prediction.
The whole integration consists of two important parts:
This is possible thanks to MariaDBs CONNECT engine, which enables us to publish tables that live outside MariaDB. Since MindsDB supports the MySQL tcp-ip protocol, AI-Tables can be plugged as if they are external MariaDB tables. The following diagram illustrates this process.
The resource intensive Machine Learning tasks like model training happen on a separate MindsDB server instance or in the cloud, so that the Database performance is not affected.
In the following step-by-step tutorial you will learn how to install a MindsDB server with MariaDB and connect to the data, how to train the model and get predictions using an example dataset. So let’s get started.
If you already have MariaDB installed you can skip this section. MariaDB is one of the most popular database servers in the world and works on the most widely used operating systems. You can find the installation binaries and packages on the mariadb download site. To check the full list of distributions which include MariaDB head over to list of distributions.
Before you install MindsDB you need a Python version greater than 3.6 and pip which comes pre-installed with newer Python versions. Also, you will need to have around 1GB free space on your machine for installing the MindsDB’s dependencies. Other than that the installation is quite simple. Inside your virtual environment just run:
pip install mindsdb
To check if the installation was successful run:
pip show mindsdb
And you should be able to see the MindsDB information as name, version, summary, license etc:
That’s all. Let’s set up the required configuration and start MindsDB.
If you are following this tutorial with your own data, you can skip to the next section. For this example we will use the Audi Car Price dataset from the 100k used cars scraped data. The dataset contains information on price, transmission, mileage, fuel type, road tax, miles per gallon (mpg), and engine size of the used cars in the UK. The idea is to predict the price depending on the above features.
The first thing we need to do is to create the table. Execute the below query:
The InnoDB is a general storage engine and the one offered as the best choice in most cases from the MariaDB team.To see the list with all available ENGINEs and for advice on which one to choose check out the MariaDB engine docs. After creating the table there are a few options that you can do to add the data inside MariaDB.
The TERMINATED BY specifies the separator in the data. The escape characters and new lines are manageable with the ENCLOSED BY and LINES TERMINATED BY clause.
Let’s select the data from used_cars_data table to make sure it is successfully imported:
The data is inside MariaDB so the next step is to add the required configuration.
The first thing we need to set up is the required configuration for MindsDB Server. Let’s create a new file called config.json and add the following example:
This looks like a big configuration file but it is quite simple. What we have included are:
And the last thing we need to do is to install the CONNECT storage engine plugin that we have added in the plugin-load-add variable. To check how to download it for your OS check installing docs. It should be quite simple as using the package manager, e.g:
sudo apt-get install mariadb-plugin-connect
That’s pretty much everything related to the configuration required for successful integration with MariaDB. Let’s jump to the interesting part where we will train the machine learning model and query it.
First, we need to start MindsDB:
python3 -m mindsdb --api=mysql --config=config.json
The flags added here are:
If MindsDB was successfully started there should be a new database automatically created in MariaDB called mindsdb with two tables (commands and predictors).
In these tables, MindsDB will keep information about the models, model accuracy, training status, target variable that we will predict and additional options used for model training.
The main motto when we first started MindsDB was with one line of code, so now we will try to stick to it and present that in the databases with just one query. The Predictor in MindsDB’s words means Machine Learning model, so creating one could be done with an INSERT statement inside mindsdb.predictors table. Execute the following query:
What this query does is it creates a new model called 'used_cars_model' , specifies the column that we will try to predict as 'price' from the used_cars_data table. The required columns(parameters) for training the predictor are:
You should see the message about the successful execution of the query and if you open up the console, the MindsDB logger shall display messages while training the model.
Training could take some time depending on the data used, columns types, size etc. In our example not more than 2-3 min. To check that model was successfully trained run:
The column status shall be complete and the model accuracy will be saved when the training finishes.
The model has been trained successfully. It was quite simple because we didn’t do any hyperparameters tuning or features engineering and leave that out to MindsDB as an AutoML framework to try and fit the best model. With the INSERT query, we just provided labeled data as an input. The next step is to query the trained model with SELECT and get the output from it (predict the price of the car).
To get the prediction from the model is as easy as executing the SELECT statement where we will select the price and price confidence. The main idea is to predict the used car's price depending on the different features.
The first thing that comes to mind when looking for the used car is the car model, fuel type, mileage, the year when the car was produced etc. We should include in the WHERE clause all of this informations and leave it to MindsDB to make the predictions for them e.g:
You should see that MindsDB is quite confident that the car with all of the above characteristics as included in the WHERE clause shall cost around 13,111. To get additional information about the prediction include the explain column in the SELECT e.g:
Note that to beautify the resultFormat you can add command line option format for particular session.
Now MindsDB will display additional information in the info column as prediction quality, confidence interval, missing information for improving the prediction etc.
The confidence_interval specifies the probability that the value of a price lies within the range of 10k to 30k. The important_missing_information in this case is empty, but if we omit some of the important values in the WHERE clause e.g price, year or mpg, MindsDB shall warn us that that column is important for the model. The if_missing in the extra_insights shows the price value if some of the mentioned columns are missing.
Now, let’s try and get the price prediction for different car models with different fuel type, mileage, engine size, transmission:
Now, MindsDB thinks that this type of car would cost around 12k and price ranges to 23k.
Let’s sum up what we did and the simple steps we take to get the predictions:
Quite simple right?
This is a brand new feature that we have developed so we are happy to hear your opinions on it. You can play around and query the model with different values or train and query the model with different datasets.