How to Start With Machine Learning?
Machine learning is about using sample data to build mathematical models that enable computer systems to perform tasks without receiving explicit instructions. Image recognition, self-driving vehicles, Internet search engines, computer vision, spam email filtering, and many other systems use machine learning. It’s also applied in financial forecasts, medical diagnostics, fraud detection, and so on.
Machine learning is a vast and promising area. It offers exciting solutions to real-world problems as well as a variety of well-paid jobs.
This article is about learning and starting a career in this field.
First, you should learn the fundamentals:
- Learn mathematics
- Learn the theory and intuition behind data science and machine learning
- Learn programming
- Learn libraries for data science and machine learning
- Practice by playing with data
Once you’ve got the foundations, you should always learn more and keep yourself up-to-date by following the progress in the area:
- Read data science, machine learning, and artificial intelligence blogs and papers
- Follow interesting people, groups, companies, and organizations on Twitter and other social networks
- Join discussions, ask questions, and answer other people’s questions
The rest of the article is about the first part: building the foundations of your knowledge.
Learn Mathematics
Knowledge of mathematics is very important for people working in data science and machine learning. It allows them to understand in depth how and why machine learning methods work. It also helps them correctly design experiments, test hypotheses, combine methods, optimize hyperparameters, and so on.
The three main branches of mathematics required for machine learning are:
- Calculus
- Linear algebra
- Probability and statistics
Calculus is important because everything else relies on it, especially probability theory, statistical methods, and convex optimization. There are many potentially useful calculus books, such as:
- Calculus by J. Stewart
- Thomas’ Calculus by G.B. Thomas, M.D. Weir, and J.R. Hass; please note that the latest edition of this book is authored by J.R. Hass, C.E. Heil, and M.D. Weir
If you’re a complete beginner, you can try the tutorial Calculus for Beginners and Artists from the Massachusetts Institute of Technology.
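To see why calculus matters for optimization, here is a minimal sketch of gradient descent, a common way to minimize a function by repeatedly stepping against its derivative. The function being minimized, the learning rate, and the number of steps are all assumptions chosen for illustration, not values from any particular method or library.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Hypothetical example: f(x) = (x - 3)^2 is convex with its minimum at x = 3.
# Its derivative, obtained with basic calculus, is f'(x) = 2 * (x - 3).
def f_prime(x):
    return 2.0 * (x - 3.0)


minimum = gradient_descent(f_prime, x0=0.0)
print(minimum)
```

The printed value should be very close to 3. Try a larger learning rate (for example 1.1) to see the iteration diverge, which is one reason why understanding the underlying mathematics helps.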
Linear algebra is the basis of many machine learning methods and approaches such as linear regression and linear discriminant analysis. It’ll teach you how to handle multi-dimensional data and how to find relations between them. Some recommended books in linear algebra are:
- Linear Algebra and Its Applications by D.C. Lay, S.R. Lay, and J.J. McDonald
- Introduction to Linear Algebra by G. Strang
- Linear Algebra and Its Applications by G. Strang
- Linear Algebra and Learning from Data by G. Strang
You might also benefit from the lectures of Prof. G. Strang from the Massachusetts Institute of Technology, which are available on YouTube.
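As a small illustration of how linear algebra underlies linear regression, the sketch below fits a straight line with NumPy’s least-squares solver. The data points are made up for this example, and the variable names are arbitrary.

```python
import numpy as np

# Made-up data: y is roughly 2 * x + 1 with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix: a column of ones (for the intercept) next to the x values.
X = np.column_stack([np.ones_like(x), x])

# Solve the least-squares problem: find beta that minimizes ||X @ beta - y||^2.
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The whole fit is one matrix problem: the unknown coefficients are a vector, the observations are a matrix, and the solver finds the vector that best maps one to the other.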
Probability theory and statistics provide many concepts used in machine learning. Conditional probability, Bayes’ theorem, the central limit theorem, hypothesis testing, regression techniques, and information entropy are just a few examples. Some useful books about probability and statistics are:
- Introduction to Probability and Statistics for Engineers and Scientists by S.M. Ross
- Probability and Statistics for Engineering and the Sciences by J.L. Devore
You don’t need advanced knowledge of mathematics to start with machine learning, but once you want to understand and do more serious work, you’ll feel the need for it.
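To make one of the probability concepts mentioned above concrete, here is a minimal sketch of Bayes’ theorem applied to a hypothetical diagnostic test. All the numbers (a 1% prior, a 95% true-positive rate, and a 5% false-positive rate) are invented for illustration.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem."""
    # Total probability of a positive test, with or without the condition.
    evidence = true_positive_rate * prior + false_positive_rate * (1.0 - prior)
    return true_positive_rate * prior / evidence


# Hypothetical numbers: 1% of people have the condition, the test detects it
# 95% of the time, and it wrongly flags 5% of healthy people.
p = posterior(prior=0.01, true_positive_rate=0.95, false_positive_rate=0.05)
print(f"{p:.3f}")
```

With these assumed numbers, the probability of having the condition after a positive test is only about 16%, far below the 95% many people expect. The same reasoning appears in spam filtering and fraud detection.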
Learn the Theory and Intuition behind Data Science and Machine Learning
You’ll also need to gain insight into the applied side of the mathematical concepts, that is, to understand precisely how machine learning methods are designed. Some good books about these concepts are:
- An Introduction to Statistical Learning by P. Forrest
- An Introduction to Statistical Learning with Applications in R by G. James, D. Witten, T. Hastie, and R. Tibshirani
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, and J. Friedman
There are also two fantastic, free, online books:
- Deep Learning by I. Goodfellow, Y. Bengio, and A. Courville
- Neural Networks and Deep Learning by M. Nielsen
You’ll find many good explanations and visual representations there. The notes from the machine learning courses at Stanford University and the Massachusetts Institute of Technology are freely available on their Web sites. The lectures of these courses are also freely available on YouTube. Duomly offers a comprehensive course on machine learning, as well as several articles you might find useful.
They explain the intuition behind the machine learning methods and provide their step-by-step implementations.
Learn Libraries for Data Science and Machine Learning
One of the most important things is to master programming libraries for data science and machine learning. The leading Python libraries for this purpose are:
- NumPy is a fundamental and high-performance Python library for manipulating arrays and numerical computing
- SciPy is a comprehensive library for numerical computing based on and extending NumPy
- Pandas is a library for easy and intuitive manipulation of one- and two-dimensional labeled data, also related to NumPy
- Scikit-learn is a comprehensive and widely-used machine learning library built on top of NumPy and SciPy for data preprocessing, regression, classification, cluster analysis, model selection, and dimensionality reduction
- TensorFlow is a deep learning library by Google focused primarily on neural networks
- Keras is a library for creating and training neural networks that can be used with the TensorFlow, CNTK, or Theano backends
- Matplotlib is a powerful and widely-used library for data visualization
- Bokeh is a library for interactive data visualization and presentation in Web browsers
The official Web sites usually provide good and free documentation and tutorials for each of these libraries. One additional especially good tutorial is the Anatomy of Matplotlib. It’s freely available on GitHub.
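To give a feel for two of the libraries listed above, here is a short sketch that uses NumPy for array arithmetic and Pandas for labeled data. The names, groups, and heights are made up for the example.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic applies to the whole array at once, with no explicit loop.
heights_cm = np.array([170.0, 182.0, 165.0, 176.0])
heights_m = heights_cm / 100.0

# Pandas: the same values as labeled, two-dimensional data.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy", "Dee"],  # made-up people
        "group": ["a", "b", "a", "b"],        # made-up group labels
        "height_m": heights_m,
    }
)

# Average height per group; the result is a Series indexed by group label.
mean_by_group = df.groupby("group")["height_m"].mean()
print(mean_by_group)
```

Notice that the Pandas column is built directly from a NumPy array, which is what “related to NumPy” means in practice.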
Practice by Playing with Data
If you want to become an expert in any area, you have to practice a lot.
You should get an interesting dataset. It may be related to sports, medicine, weather, finances, government, just anything you’re passionate about. Then, you can use it to do some data cleaning, data standardization, regression, classification, cluster analysis, pattern recognition, association rule learning, dimensionality reduction, and more.
You can download free datasets from many websites like Kaggle, FiveThirtyEight, Socrata OpenData, Wikipedia, UCI Machine Learning Repository, data.world, data.gov, Google Trends, Google’s BigQuery public datasets, the British government’s official data portal, Reddit, Nord Pool electricity market, and many more.
In addition, libraries such as scikit-learn, TensorFlow, and Keras provide datasets suitable for practice.
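As one possible starting point, the sketch below loads the Iris dataset bundled with scikit-learn, standardizes the features, and trains a simple classifier. The choice of dataset, model, split size, and random seed are assumptions made for this example; any of them could be swapped.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to check the model on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Data standardization followed by classification, chained in one pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Once this runs, try replacing the classifier or the dataset with others from the same library and compare the results.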
One more interesting resource is the TensorFlow Neural Network Playground that allows you to create and use neural networks visually from your browser.
For more information on the datasets, check Duomly’s article:
15 Best Machine Learning Datasets For Free.
Learning machine learning is a challenging and interesting task. It requires knowledge in many areas. Once you master it, it offers huge possibilities to apply it and find interesting and well-paid jobs.
This article presents some resources for learning data science and machine learning and for getting data to practice with, as well as some general advice.
There are many more fascinating books, courses, tutorials, blog posts, videos, and so on, maybe more than one could read or watch during an average human lifetime. There is plenty of average or low-quality material as well. New resources appear every day.
Machine learning is just at its beginning. It grows and develops. If you want to be involved with it, you should too.
Thank you for reading!
This article was provided by our teammate Mirko.