Data Visualization & Exploration
Data exploration and visualization are critical to developing an understanding of any data set. Before moving on to any analysis, it is imperative to visualize the data in a comprehensive manner. This video discusses the concepts of exploration and visualization in data science and how these can help in comprehending the characteristics of a data set.
What You'll Learn
> Importance of data exploration and visualization
> Techniques involved in exploration and visualization
All right. So, last, but very certainly not least, is data exploration and visualization. Data exploration and visualization are critically important to the practice of data science. You need to understand what your data looks like before you can start to model it properly.
So, what is data exploration? Essentially, data exploration is visualization and calculation that allow us to better understand the characteristics of a dataset. There are two key motivations for it. First, we want to be sure we select the right tools for preprocessing and analysis. Second, it uses our human mind’s really, really powerful ability to recognize patterns. In a lot of contexts, a person will recognize a pattern that a data analysis tool won’t.
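To make the "calculation" half of that definition concrete, here is a minimal sketch of a summary-statistics pass in Python. It uses only the standard library. The `summarize` helper and the sample values are made up for illustration and are not from the video.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for a list of numbers.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(values)
    # Quartiles with the "inclusive" method, which treats the data as
    # the whole population rather than a sample from a larger one.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "count": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "stdev": statistics.stdev(ordered),
    }


# Made-up measurements, not real data.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 30]
summary = summarize(sample)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Even on a toy list like this, the gap between the mean and the median hints that one value sits far from the rest, which is the kind of thing you want to know before choosing preprocessing steps.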
Building a neural network that will tell you whether a picture is of a face is a massive endeavor. It’s a very complicated endeavor, but humans can do it. Most humans can do it innately, automatically, very, very quickly. This is, of course, related to the historical field of exploratory data analysis, EDA. The original book is Exploratory Data Analysis by John Tukey. And if you’re interested in data exploration, specifically, there’s some information here.
The original focus of the field of EDA is not quite the same as our focus as data scientists. As data scientists, our focus is on summary statistics and visualization. EDA also used clustering and anomaly detection as exploratory techniques. In our context, clustering and anomaly detection are now major areas of data science interest, sub-fields of their own, not just a piece of exploratory analysis. Clustering for exploratory purposes is still used a great deal, though. Good clustering algorithms and good clustering practice are some of your more powerful tools if you have a very complicated dataset.
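As one small illustration of anomaly detection used as an exploratory technique, here is a sketch of the interquartile-range rule, often called Tukey's fences. This is a swapped-in example, not a method shown in the video. The `tukey_outliers` helper, the conventional multiplier `k=1.5`, and the sample values are all assumptions for illustration.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside the IQR-based fences.

    Hypothetical helper: a value is flagged if it falls below
    q1 - k * IQR or above q3 + k * IQR.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Made-up measurements, not real data.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 30]
print(tukey_outliers(sample))  # prints [30]
```

A flagged value is a prompt to go look at the data, not a verdict. Whether 30 is an error or a genuinely interesting observation is something the rule cannot decide for you.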
Data Science Dojo Instructor - Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.