This article is for Linux users and distro-hoppers who are trying to find the perfect distro for Machine Learning & AI work.
Let’s explore the choices we have and analyze their differences in an attempt to choose the best Linux distro for Machine Learning & AI.
For those of you in a hurry to start your Machine Learning & AI quests, here is the short version of the answer!
The Short Version Of The Answer
Distro#1: Ubuntu and its derivatives
Distro#2: Fedora and the RHEL family of distros
That is just the short version of the answer. Let’s go ahead and look at the longer, more informative version to learn which factors were considered, what other choices you have, and why Ubuntu is chosen as the best distro for Machine Learning & AI, along with some valuable resources for getting started with your Machine Learning & AI quests!
Linux has already captured a huge share of the server market, where it is the dominant player. As Linux grows more popular, many distros are being developed to improve its desktop support.
The setup programmers need for coding is radically different from the setup artists and video editors need for content creation and editing! Hence, which factors matter when using Linux as a workstation depends on the kind of work you plan to do with it.
But some factors are common across all types of work. These include:
- support &
If you plan to do all of your productive work on your next Linux distro, then I suggest reading the article I wrote recently, linked below, in which I analyzed and compared several options to figure out the best distro for workstation use.
Next, let’s have a look at the needs of programmers who wish to develop Machine Learning and AI applications.
The Needs for doing Machine Learning & AI
The basic needs to do Machine Learning & AI include the following.
- A good code editor: VS Code, Atom, Sublime Text or Brackets.
- Support for Python, R, Go and other languages you may use for machine learning
- Support for PyTorch, TensorFlow, OpenCV and other machine learning libraries that you might need for your project
- Virtualization software to test your Machine Learning & AI apps on several operating systems: On Linux you have support for VirtualBox and GNOME Boxes
- Data collection, verification and organization tools that you may need
- Source code management software like Git
- DevOps tools like GitLab
All Linux distros can fulfill the above needs: Linux is a major operating system, and as such it has all the capabilities needed to develop and run your code! All you need to do is ensure that the necessary versions of your chosen libraries are installed on your distro. No distro will run a given piece of code meaningfully better than another, since they all use the same Linux kernel underneath anyway.
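As a minimal sketch of that check, assuming a Python-based workflow, the snippet below uses the standard library's importlib.metadata module to report which versions of a few libraries are installed. The package names in LIBRARIES are illustrative assumptions; swap in whatever your project actually uses. The same snippet works unchanged on Ubuntu, Fedora or Arch, which is the point made above.

```python
from importlib import metadata

# Illustrative PyPI distribution names (an assumption, not a requirement);
# replace them with the libraries your project depends on.
LIBRARIES = ["numpy", "torch", "tensorflow", "opencv-python"]


def installed_versions(names):
    """Map each distribution name to its installed version string,
    or to None if it is not installed in the current environment."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(LIBRARIES).items():
        print(f"{name:15} {version or 'not installed'}")
```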
Other Important Needs
Official Support for the Latest Libraries
But since we are interested in professional development, we need the latest framework versions as soon as they are released so that we can test our code against them. Usually, the official repositories of even the most recent distributions lag behind the latest framework releases, but we can always download and install the latest versions straight from the framework’s official website! So this need is not a decision-making factor for us when choosing a distro either!
This brings us to one last important need: modern tools that are specifically made for machine learning applications.
Decision Making Factor: Modern Tools that are designed for machine learning and AI apps
Efficiently training your machine learning app requires a setup with a lot of resources, so that you don’t have to wait for hours for your algorithms to finish running! The inference phase does not need the same setup, however, as inference is usually lightweight and fast.
The main strategy for deploying Machine Learning apps is to package them in containers, such as Docker containers, and deploy them on the cloud. But this brings its own challenges, nuances and details that developers need to concern themselves with, which makes them less productive. Machine learning specialists usually don’t find working their way around these details very interesting and would rather keep their focus on designing their algorithms!
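To make that training/inference asymmetry concrete, here is a toy sketch in plain Python with no ML framework; the data, learning rate and epoch count are illustrative assumptions. Training fits y = w*x + b with gradient descent, looping over the data thousands of times, while inference is a single multiply-add per input.

```python
def train(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error.
    Training repeats the same pass over the data many times."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Inference: one cheap evaluation per input, no iteration."""
    return w * x + b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # synthetic data generated from y = 2x + 1
w, b = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, prediction for x=10: {predict(w, b, 10.0):.3f}")
```

Real models have millions of parameters instead of two, which is why the training loop is where the heavy hardware pays off.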
Modern Tool#1: Kubeflow
To address this need of Machine Learning and AI app developers, an open-source project called Kubeflow is being actively developed. Built on top of Kubernetes, it manages your resources, libraries and containerization requirements to make developing and deploying Machine Learning apps easier.
Modern Tool#2: CUDA
As you might know, a GPU has many more cores than a CPU, but those cores are designed for simpler tasks, the kind we come across all the time when drawing something on the screen. When we train AI apps, the algorithms usually iterate over and over again until they converge, and much of that work (such as large matrix multiplications) can be spread across the GPU’s many cores, making the whole process faster and more efficient. Before we can do that, though, we need additional software such as Nvidia’s CUDA to let our GPUs run these machine learning workloads.
During the deployment phase, we can always just use one of Nvidia’s Docker images with CUDA preinstalled to do the trick for us!
Now that we have looked at all our needs, let’s have a look at the distro choices we have and choose the one that best meets them!
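Once CUDA is set up, a quick way to confirm that your framework can actually see the GPU is sketched below. It assumes PyTorch as the framework (an assumption; TensorFlow has its own equivalent check) and degrades gracefully when PyTorch is not installed. The function names pick_device and gpu_matmul_demo are hypothetical helpers for illustration.

```python
def pick_device():
    """Return 'cuda' if PyTorch can use an Nvidia GPU via CUDA, else 'cpu'.
    Returns None when PyTorch itself is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return "cuda" if torch.cuda.is_available() else "cpu"


def gpu_matmul_demo(size=1024):
    """Multiply two random matrices on the chosen device. Matrix
    multiplication is the kind of highly parallel work GPUs excel at."""
    device = pick_device()
    if device is None:
        return "PyTorch is not installed"
    import torch
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    c = a @ b  # runs on the GPU's cores when device == "cuda"
    return f"computed a {tuple(c.shape)} product on {device}"


print(pick_device())
print(gpu_matmul_demo(256))
```

If this prints "cpu" on a machine with an Nvidia card, the driver or CUDA installation is the usual suspect rather than the distro itself.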
The Distro Choices
That leaves us with the 3 most popular distro families:
- The Debian Family: Debian, Ubuntu, Linux Mint, etc.
- The RedHat Family: Fedora and CentOS
- The Arch family: Arch, Manjaro, etc.
Reason Ubuntu gets 1st place
Ubuntu has official support for Kubeflow, Kubernetes, Docker, CUDA, etc., and hence satisfies all the needs mentioned above. Since Ubuntu is such a popular distro, you can also find a wealth of information about it online, such as support forums and machine learning tutorials. Hence Ubuntu is chosen as the number 1 distro for machine learning!
If you are uncomfortable with its default GNOME desktop, you can go with one of its other flavors like Kubuntu, Xubuntu, Lubuntu, etc. You can read more about these flavors in my other article linked below.
Debian is considered a distro for advanced users and hence does not have as big a user base. The same goes for the other Debian derivatives, none of which has a user base as large as Ubuntu’s. This leaves us with Ubuntu and its flavors as our best bet for Machine Learning purposes!
Reason Fedora gets 2nd place
Fedora is from the RedHat family; RedHat uses it as a testing ground for new features before releasing them in its enterprise edition, RHEL. Hence Fedora is among the first distros to support the latest advancements in the Linux world. It also has a big user base, second only to Ubuntu’s, so most tools are readily available on it too!
If your Machine Learning app is supposed to run on a server or in the cloud, and since many servers run RHEL, it makes sense to use Fedora to develop your app. As a bonus, you get to work with some experimental features months, and sometimes even years, before they reach Debian-based distros like Ubuntu!
What about Arch and Family?
Arch and its derivatives are all about catering to the needs of advanced users who can tinker their way through problems. Hence IDE vendors don’t invest many resources in testing their apps on the Arch ecosystem, as Arch users are expected to be proficient enough to solve any problems they come across.
If you are already in love with the distro you are using, then there is no reason to switch just to develop Machine Learning and AI apps. If you are a beginner to the Linux world, then go with Ubuntu, one of its flavors, or Linux Mint. If you have been around Linux for a while and want to move on from Debian-family distros like Ubuntu or Mint, then try out Fedora or CentOS from the RedHat family. If you are a Linux expert, then I suggest you try the Arch family and build your own optimized setup for your Machine Learning & AI activities.
And with that, I will conclude this article!
I hope you guys enjoyed this article and learned something useful.
If you liked the post, feel free to share this post with your friends and colleagues!