Comparing MindsDB with scikit-learn and TensorFlow

George Hosu

Our Lead Machine Learning Engineer takes a look at how MindsDB compares with scikit-learn and TensorFlow.

Comparing MindsDB with scikit-learn and TensorFlow is not exactly an apples-to-apples kind of thing. However, scikit-learn and TensorFlow are the libraries one would most commonly use to solve a machine learning problem, so I think it would serve us well to see how we'd go about solving the same problem with MindsDB, scikit-learn, and TensorFlow.

This comparison is based on the one in our documentation, which was originally written by Sudhanshu Kumar (sk-ip on GitHub). A big thank you to him!

In case you need a quick reminder: 

  • MindsDB is an easy-to-use machine learning library: one line of code to train a model, one line of code to test it, and, perhaps more importantly, it can explain its models and predictions to the user.
  • scikit-learn is a machine learning library that aims to expose a lot of functionality while staying easy to use. It focuses mainly on tried-and-tested techniques.
  • TensorFlow is a machine learning library specifically aimed at creating neural networks. Its focus is narrower than scikit-learn's, but broader than MindsDB's, and it's often used by researchers when designing their models.

The problem

We are going to use the home rentals dataset for this problem, and we'll attempt to predict the rental price a property is let for.

Preprocessing

Preprocessing is required for scikit-learn and TensorFlow, but not for MindsDB. MindsDB can interpret the raw CSV file on its own, inferring the data types and how to encode them.

When working with data, we first have to see what kind of data we're dealing with; in our case, it's a mix of numerical and categorical columns. For scikit-learn and TensorFlow, the categorical columns need to be converted into numerical values before training, as sketched below.
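Here's a minimal sketch of that step, assuming a local copy of the home rentals CSV and a `rental_price` target column (the file path and column name are placeholders rather than the exact ones from the dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder path -- point this at your copy of the home rentals CSV.
df = pd.read_csv('home_rentals.csv')

# Encode every categorical (object-dtype) column as integer codes.
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].astype('category').cat.codes

# Split features and target, then hold out part of the data for testing.
X = df.drop(columns=['rental_price'])
y = df['rental_price']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```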

Building the model

Using this data, we'll build the actual models, train them, and then make some predictions on the testing dataset. To keep the code to a minimum, we'll build a simple linear regression with both TensorFlow and scikit-learn.

TensorFlow
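As a rough illustration, here is a minimal sketch of a hand-rolled linear regression in TensorFlow 2.x, reusing the `X_train`/`X_test` split from the preprocessing step above (the learning rate and step count are illustrative, not tuned):

```python
import numpy as np
import tensorflow as tf

# Convert the preprocessed frames to float32 and standardize the features;
# gradient descent struggles on raw feature scales.
X_tr = X_train.to_numpy().astype(np.float32)
y_tr = y_train.to_numpy().astype(np.float32).reshape(-1, 1)
mean, std = X_tr.mean(axis=0), X_tr.std(axis=0) + 1e-8
X_tr = (X_tr - mean) / std

# Model parameters: one weight per feature, plus a bias.
W = tf.Variable(tf.zeros([X_tr.shape[1], 1]))
b = tf.Variable(tf.zeros([1]))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)

# Manual training loop: forward pass, mean squared error, backprop, update.
for step in range(1000):
    with tf.GradientTape() as tape:
        y_pred = tf.matmul(X_tr, W) + b
        loss = tf.reduce_mean(tf.square(y_pred - y_tr))
    grads = tape.gradient(loss, [W, b])
    optimizer.apply_gradients(zip(grads, [W, b]))

# Predict on the held-out set, scaled with the training statistics.
X_te = (X_test.to_numpy().astype(np.float32) - mean) / std
predictions = (tf.matmul(X_te, W) + b).numpy().ravel()
```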

This is rather tedious. Tensorflow is great if you want a lot of control and you need top-notch speed. But unless you know what you’re doing it gets complicated *really* fast.

Some neural-network-focused libraries such as PyTorch are somewhat less convoluted, but the code would still be in the same complexity ballpark.

scikit-learn

With scikit-learn, things are easier once we get past the preprocessing stage.
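A minimal sketch, continuing from the preprocessed split above:

```python
from sklearn.linear_model import LinearRegression

# Fit a linear regression on the training split and predict on the test split.
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```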

MindsDB

Both MindsDB and scikit-learn can solve our problem, but unlike scikit-learn, MindsDB requires no preprocessing: it handles that automatically as part of the `.learn` method. With MindsDB, we also load the files used for testing and predicting directly from their respective URLs.
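A minimal sketch of what that looks like, assuming the MindsDB Python library's `Predictor` interface as it existed at the time; the URLs below are placeholders rather than the real dataset links:

```python
from mindsdb import Predictor

# One line to train: point MindsDB at the raw CSV and name the target column.
predictor = Predictor(name='home_rentals_model')
predictor.learn(
    from_data='https://example.com/home_rentals_train.csv',  # placeholder URL
    to_predict='rental_price'
)

# One line to predict: point it at the file we want predictions for.
results = predictor.predict(
    when_data='https://example.com/home_rentals_test.csv')  # placeholder URL
for row in results:
    print(row['rental_price'])
```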

Conclusion

MindsDB differentiates itself from other libraries by its simplicity. It should also be noted that in this case we built the simplest possible model with TensorFlow and scikit-learn, a linear regression, whilst MindsDB had a much more complex (and thus likely more accurate) model under the hood. The more complicated your models get, the more complexity you avoid with a tool like MindsDB. That being said, some control is obviously lost when compared to TensorFlow, and some of the options scikit-learn gives you are not present. Still, we often use scikit-learn and MindsDB together; for example, we use scikit-learn's metrics module to evaluate predictions.
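For instance, assuming the `predictions` and `y_test` variables from the snippets above, something like this works regardless of which library produced the predictions:

```python
from sklearn.metrics import mean_absolute_error, r2_score

# The same evaluation code works whether the predictions came from
# scikit-learn, TensorFlow, or MindsDB.
print('MAE:', mean_absolute_error(y_test, predictions))
print('R^2:', r2_score(y_test, predictions))
```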

Finally, there’s the explainability component of MindsDB, which can’t  be matched by either Tensorflow or SciKit, at least not in any way that wouldn’t require you to build something like MindsDB’s explainability function from scratch. However, more on that in a future article.

Author Bio

George is a bloke who likes writing (both for computers and humans) and overall enjoys problem-solving, learning about the universe, and being a skeptic git. He's mainly worked as a jack-of-all-trades ML/DevOps/backend dev guy in various startups dealing with challenges around processing and analyzing large volumes of data.
