Blog entry by Nathan Piccini

6 Books to Help You Learn Data Science

Learning different concepts in data science can often be daunting. Here are 6 books to help lift the burden.

Machine Learning: A Probabilistic Perspective

This is an almost *exhaustive* book on machine learning, with topics ranging from the very basics of probability to mixture models, variational inference, and deep learning. Even though I first encountered this book as a companion textbook for a university course, I think calling it a textbook does it a disservice. It is basically an encyclopedia and can serve as a detailed reference for any data scientist or machine learning engineer. The book doesn’t shy away from proper mathematical notation, which might be jarring for some readers; that is why the first couple of chapters on the basics are so important for getting your feet wet. There are diagrams exploring the characteristics of models, pseudocode, fully worked examples, and even exercises at the end of chapters. There are plenty of fantastic online learning resources for stats, ML, and data science topics, but most of them shy away from the maths and theory, which is where this book shines.

  • Author: Kevin Murphy
  • Education Level: Beginner - Advanced

Fundamentals of Deep Learning (O’Reilly)

Deep learning is getting more popular each year, and with that, the wealth of online tutorials and courses on the topic keeps increasing. My main issue with most of these is that they are either too focused on implementation (feeling more like a tutorial for Keras than for deep learning as a field) or they skip key theoretical concepts. Deep Learning by Ian Goodfellow, while a very detailed exploration of the field and its roots, is (in my opinion) not the best jumping-off point for beginners, or even for many people who understand the basics. This book, Fundamentals of Deep Learning (O’Reilly), doesn’t have this problem. It uses easy-to-understand notation and minimal derivation while still covering the breadth of the field’s most common concepts (though it is definitely less ‘complete’ than the Goodfellow text). The major advantage this book has as an introductory text is the inclusion of companion code samples in TensorFlow (the most popular DL framework), which makes the jump from reading about a topic in the book to actually implementing and experimenting with it seamless.

  • Author: Nikhil Buduma
  • Education Level: Beginner

An Introduction to Statistical Learning (with Applications in R)

An Introduction to Statistical Learning (popularly known as ‘ISLR’) is easily one of the most popular textbooks available on machine learning. The text builds your machine learning concepts step by step. Also, despite consciously keeping ‘mathematical derivations’ and ‘statistical jargon’ to a minimum, the text gives a complete treatment of each topic.

  • Authors: Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
  • Education Level: Beginner

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

The Elements of Statistical Learning (popularly known as ‘ESL’) is often recommended as the next step in learning machine learning (ISLR being the first step). In my opinion, the ESL text demands an advanced facility with algebra, calculus, and statistics. Like ISLR, ESL is listed as either an assigned or a recommended textbook in leading master’s programs in Data Science, Statistics, and Business Analytics.

  • Authors: Trevor Hastie, Robert Tibshirani, and Jerome Friedman
  • Education Level: Advanced

R for Everyone

R for Everyone addresses the common perception that R requires too much knowledge for non-statisticians by making learning easy and intuitive. The book starts with the basics, walking you through downloading and installing R, and then takes you through more advanced problems so you'll be able to "tackle statistical problems you care about the most".

You can expect to build both linear and non-linear models, use data mining techniques, and use LaTeX, RMarkdown, and Shiny to make your code reproducible.

"This guide focuses on the 20 percent of R functionality you'll need to accomplish 80 percent of modern data tasks."

  • Author: Jared P. Lander
  • Education Level: Beginner

The Cartoon Guide to Statistics

Used as a textbook in Data Science Dojo's data science bootcamp, The Cartoon Guide to Statistics covers everything needed for a basic understanding of statistics. The authors use cartoons and humor to explain the concepts many find hard to learn. This book is great if you're just starting to learn statistics and data science, or if you want a good laugh while you refresh your memory.

The last page reads: "Well, that's it! By now, you should be able to do anything with statistics, except lie, cheat, steal, and gamble. We left those subjects to the bibliography."

  • Authors: Larry Gonick and Woollcott Smith
  • Education Level: Beginner
