Data Transformation

r_subheading-Course Description-r_end Transformation is an important concept in handling and processing data. Several techniques can be used to transform data and make it more useful. This video introduces some of the basic concepts related to data transformation. r_break r_break r_subheading-What You'll Learn-r_end • Data transformation in data processing. r_break • Attribute transformation. r_break • Aggregation as a transformation technique.


In addition to doing something very complicated, like a Fourier transform, you can apply a much simpler, much more straightforward transformation to your data. Very common transformations are taking the exponential of a data value, taking the logarithm of a data value, or taking the absolute value of a data value. All of these allow us to bring out different dependencies in our data and to correlate our data attributes better with whatever our target is. r_break r_break The other two things here I’m going to take special time to talk about because they show up a lot. Standardization and normalization are probably the most common kinds of transformations that are applied to attributes in data science. Standardization is where we take each numeric value, subtract the mean, and divide by the standard deviation of our dataset. This forces our data to have a mean of 0 and a standard deviation of 1. That’s why it’s called standardization. The reason we do this is that it’s a way of scaling our data down. r_break r_break Take, for instance, age and annual income. Annual income values are much larger numbers than ages, so the majority of models and algorithms will overweight annual income relative to age. But if we standardize both of those, then age and annual income are going to be weighted in the same way. A somewhat less extreme way to do the same thing is normalization, where we subtract the minimum from every data value and then divide by the range, that is, the maximum minus the minimum. That maps the entire data onto the range from 0 to 1. It distorts the separation between the values to a certain extent. But it does scale it very nicely so that age and annual income, again taking the age versus annual income distinction, will end up on the same 0 to 1 scale. 
They’ll be weighted the same way by our algorithms.
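To make the two scalings concrete, here is a minimal Python sketch of standardization and normalization as described above. The function names `standardize` and `normalize` and the small age and income lists are made up for illustration. They are not from the video. The sketch also assumes the population standard deviation; a sample standard deviation would give slightly different values.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        raise ValueError("cannot standardize an attribute with no spread")
    return [(v - mean) / std for v in values]


def normalize(values):
    """Subtract the minimum and divide by the range (max minus min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize an attribute with no spread")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical example data: ages in years, annual incomes in dollars.
ages = [25, 32, 47, 51, 62]
incomes = [30_000, 48_000, 85_000, 62_000, 120_000]

# After standardization both attributes have mean 0 and standard deviation 1.
ages_std = standardize(ages)
incomes_std = standardize(incomes)

# After normalization both attributes lie on the 0 to 1 scale.
ages_norm = normalize(ages)
incomes_norm = normalize(incomes)

print(ages_std, incomes_std)
print(ages_norm, incomes_norm)
```

In practice a library scaler would usually do this work. The hand-written version is only meant to show the arithmetic behind each transformation.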

Data Science Dojo Instructor - Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.