In this video, we discuss data preprocessing (often called data cleaning) and why it matters in data science.
What You'll Learn
> Data preprocessing and data cleaning
> The importance of data cleaning in every data science problem
So, now we get to the much-foreshadowed data preprocessing section.
Data preprocessing is sometimes called data cleaning, but preprocessing involves more steps than just cleaning the data, that is, just removing the problems with the data. So data cleaning is a subset of preprocessing, but most of what we do during preprocessing is, in fact, data cleaning. Again, there are lots of different terms that refer to basically the same thing.
There are a lot of different types of preprocessing, and I’m going to talk about a lot of different strategies: aggregation, sampling, all the ones on the screen here. But we don’t want to use all of these strategies on every data set, right? For any given data set, we’re usually only going to use a couple of them. We don’t want to overwhelm you: we’re not going to need every technique and every tool in our toolbox every time.
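To make the distinction concrete, here is a minimal Python sketch, not taken from the video, that treats cleaning, aggregation, and sampling as separate preprocessing steps. The toy records and the function names `clean`, `aggregate`, and `sample` are assumptions made up for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy data: (store, day, sales). None marks a missing value.
records = [
    ("A", "Mon", 10.0),
    ("A", "Tue", None),
    ("B", "Mon", 7.5),
    ("B", "Tue", 12.5),
    ("A", "Wed", 4.0),
]

def clean(rows):
    """Data cleaning: drop rows whose sales value is missing."""
    return [r for r in rows if r[2] is not None]

def aggregate(rows):
    """Aggregation: total sales per store."""
    totals = defaultdict(float)
    for store, _day, sales in rows:
        totals[store] += sales
    return dict(totals)

def sample(rows, k, seed=0):
    """Sampling: k rows drawn without replacement, seeded for repeatability."""
    return random.Random(seed).sample(rows, k)

cleaned = clean(records)      # cleaning step
totals = aggregate(cleaned)   # aggregation step
subset = sample(cleaned, 2)   # sampling step
```

For any given data set you would typically apply only the steps it needs, which is why each one is a separate function here.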
Another note before we keep going: not all of these are strictly independent. These terms and categories are all things you see thrown around the industry, but because data science is such a heterogeneous field, they are not strictly independent. So if you see some overlap between the different categories I’m talking about, that’s why.
Data Science Dojo Instructor - Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.