Comparing MindsDB with Sklearn and TensorFlow is not exactly an apples-to-apples comparison. However, Sklearn and TensorFlow are the libraries one would most commonly use to solve a machine learning problem, so I think it would serve us well to see how we go about solving the same problem using MindsDB, Sklearn and TensorFlow.
This comparison is based on the one in our documentation, which was originally written by Sudhanshu Kumar (sk-ip on GitHub). A big thank you to him!
In case you need a quick reminder:
We are going to use the home rentals dataset for this problem and we’re going to attempt to predict the rental price a property is let for.
Preprocessing is required for Sklearn and TensorFlow, but not for MindsDB. MindsDB is able to interpret the raw CSV file, inferring the data types and how to encode them from it.
When working with data we first have to see what type of data we are dealing with; in our case we have both numerical and categorical data. Since the models we’ll build expect numbers, we first need to convert the categorical columns into numerical ones.
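As a rough sketch of that conversion, assuming pandas and a few made-up rows (the column names `sqft`, `location` and `rental_price` are illustrative and may not match the real dataset exactly), one-hot encoding a categorical column could look like this:

```python
import pandas as pd

# Hypothetical rows standing in for the home rentals data.
df = pd.DataFrame({
    "sqft": [500, 750, 1200],
    "location": ["good", "poor", "great"],
    "rental_price": [1500, 1100, 3200],
})

# Replace the categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["location"], dtype=int)

print(encoded.columns.tolist())
```

One-hot encoding is only one option; for ordered categories, mapping each level to an integer may be a better fit.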
Using this data we’ll build the actual models, train them, and then make some predictions on the testing dataset. For the purpose of this example, we’ll build a simple linear regression with both TensorFlow and Sklearn, in order to keep the code to a minimum.
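For a feel of what the TensorFlow side can involve, here is a minimal sketch of a linear regression trained by hand with `tf.GradientTape`. It is not the code from the original comparison: the data is synthetic, and names such as `learning_rate`, `learned_w` and `learned_b` are purely illustrative.

```python
import numpy as np
import tensorflow as tf

# Synthetic, already-scaled features; the target is an exact linear function of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)).astype("float32")
y = (3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0).astype("float32").reshape(-1, 1)

# Trainable parameters of the linear model y = Xw + b.
w = tf.Variable(tf.zeros([2, 1]))
b = tf.Variable(tf.zeros([1]))
learning_rate = 0.1

for _ in range(300):
    with tf.GradientTape() as tape:
        predictions = tf.matmul(X, w) + b
        loss = tf.reduce_mean(tf.square(predictions - y))
    grad_w, grad_b = tape.gradient(loss, [w, b])
    # Plain gradient descent step, written out by hand.
    w.assign_sub(learning_rate * grad_w)
    b.assign_sub(learning_rate * grad_b)

learned_w = w.numpy().ravel()
learned_b = float(b.numpy()[0])
```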
This is rather tedious. TensorFlow is great if you want a lot of control and you need top-notch speed. But unless you know what you’re doing, it gets complicated *really* fast.
Some neural-network-focused libraries such as PyTorch are somewhat less convoluted, but the code would still be in the same complexity ballpark.
With scikit-learn things are easier once we get past the preprocessing stage.
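A minimal sketch of that step, assuming the features are already numeric. The data below is synthetic (a price driven by square footage and room count), not the real home rentals dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed data: price depends linearly on
# square footage and number of rooms, plus a little noise.
rng = np.random.default_rng(0)
sqft = rng.uniform(300, 2000, size=200)
rooms = rng.integers(1, 5, size=200)
X = np.column_stack([sqft, rooms])
y = 2.0 * sqft + 150.0 * rooms + rng.normal(0, 10, size=200)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```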
Both MindsDB and Sklearn can solve our problem, but unlike Sklearn, there’s no preprocessing required with MindsDB, since it does that automatically as part of the `.learn` method. With MindsDB we’re also loading the files for testing and predicting directly from their respective URLs.
MindsDB differentiates itself from other libraries by its simplicity. It should also be noted that in this case we built the simplest possible model with TensorFlow and Sklearn, a linear regression, whilst MindsDB had a much more complex (and thus likely more accurate) model under the hood. The more complicated your models get, the more complexity you avoid with a tool like MindsDB. That being said, some control is obviously lost when compared to TensorFlow, and some of the options Sklearn might give you are not present. Even so, we often use Sklearn and MindsDB together; for example, we use Sklearn’s `metrics` module to evaluate predictions.
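As an illustration of that last point, here is a small sketch that scores a set of predictions with `sklearn.metrics`. The true and predicted prices are made up, and the same calls work regardless of which library produced the predictions:

```python
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up true and predicted rental prices.
y_true = [1500, 1100, 3200, 2400]
y_pred = [1450, 1200, 3100, 2500]

# Average absolute error, in the same units as the price.
mae = mean_absolute_error(y_true, y_pred)
# Share of the variance in the true prices explained by the predictions.
r2 = r2_score(y_true, y_pred)

print(mae, r2)
```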
Finally, there’s the explainability component of MindsDB, which can’t be matched by either TensorFlow or Sklearn, at least not in any way that wouldn’t require you to build something like MindsDB’s explainability function from scratch. However, more on that in a future article.
George is a bloke who likes writing (both for computers and humans) and overall enjoys problem-solving, learning about the universe and being a skeptic git. He's mainly worked as a jack-of-all-trades ML/devops/backend dev guy in various startups dealing with challenges around processing and analyzing large volumes of data.