Data Attributes (Cont.)
Continuing the discussion on attributes and their types, this video explains the subtypes of attributes. This brief introduction to the most important categories and subcategories of data attributes is essential to get acquainted with the world of data science.
What You'll Learn
> Categorical and non-categorical attributes
> Ordinal and nominal attributes
> Interval and ratio variables
So, within these two big categories of attributes, we have some subsets that are also important to think about. One of the most important of these is the distinction between categorical attributes and non-categorical attributes.
Categorical attributes are discrete attributes that have a finite set of values they are allowed to take. Within categorical attributes, there are two useful subsets. If that finite set of values has a natural ordering (something like rankings, grades, or clothing sizes), we call it an ordinal attribute. Ordinal means that it has an order; pretty straightforward linguistics there. Ordinal attributes are nice because we can code them as integers and maintain the ordering between them. We don’t have to treat them in any special way.
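To make the integer coding concrete, here is a minimal sketch in plain Python. The size labels, the `SIZE_ORDER` list, and the `encode_sizes` helper are illustrative assumptions, not something taken from the video.

```python
# Hypothetical ordinal attribute: clothing sizes, listed from smallest to largest.
SIZE_ORDER = ["S", "M", "L", "XL"]

# Map each label to its rank so that the integer order matches the natural order.
SIZE_CODE = {size: rank for rank, size in enumerate(SIZE_ORDER)}


def encode_sizes(sizes):
    """Replace each size label with its integer rank."""
    return [SIZE_CODE[s] for s in sizes]


shirts = ["L", "S", "XL", "M"]
codes = encode_sizes(shirts)

# Sorting by the integer code recovers the natural ordering of the labels.
sorted_shirts = sorted(shirts, key=lambda s: SIZE_CODE[s])
print(codes)
print(sorted_shirts)
```

Because the codes follow the natural order, comparisons such as "smaller than" carry over directly to the integers.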
Most categorical variables are what we call nominal categorical variables, or nominal attributes. Nominal attributes have no inherent ordering. Examples include eye color, zip codes, ID numbers, hair color, and marital status (married, not married, divorced, or living with a partner). There’s no way you can say, “Oh yes, blue should have a value of 5 and green should have a value of 2 because I don’t like green eyes.” There’s no ordering that you can put on those values. So nominal attributes, in particular, have to be handled carefully: if we code them as numbers, those numbers must not be read as an order.
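The video does not say how to handle nominal attributes. One common approach, shown here purely as an assumption, is one-hot encoding: each category gets its own 0/1 column, so no category ends up "bigger" than another. The `one_hot` helper and the eye-color values below are hypothetical.

```python
def one_hot(values):
    """Return (categories, rows), where each row has a single 1 marking its category."""
    # Sorting only fixes a stable column order; it does not imply a ranking.
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


eye_colors = ["blue", "green", "brown", "blue"]
categories, rows = one_hot(eye_colors)

print(categories)
for color, row in zip(eye_colors, rows):
    print(color, row)
```

Every row contains exactly one 1, so the encoding records which category a value belongs to without assigning it a magnitude.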
On the continuous side, two other useful types are interval and ratio variables, which we can handle in ways that are especially convenient. You can certainly have interval or ratio variables that are discrete, but for the most part you see them as real-valued, or continuous. An interval variable is one where the difference between two values is constant and meaningful. For instance, with temperature in Celsius, a temperature of 100 degrees and a temperature of 90 degrees have the same difference between them as a temperature of 90 degrees and a temperature of 80 degrees. Interval variables are basically continuous variables with a nice metric we can assign them, which makes them easy to handle.
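As a small sketch of the temperature example, the snippet below checks that the two 10-degree gaps are equal in Celsius. The conversion to Fahrenheit is an added illustration, not part of the video; it shows that the gaps remain equal to each other after a change of units.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion: F = C * 9/5 + 32."""
    return c * 9 / 5 + 32


# The two gaps from the example, measured in Celsius.
gap_high_c = 100 - 90
gap_low_c = 90 - 80

# The same two gaps after converting every reading to Fahrenheit.
gap_high_f = celsius_to_fahrenheit(100) - celsius_to_fahrenheit(90)
gap_low_f = celsius_to_fahrenheit(90) - celsius_to_fahrenheit(80)

print(gap_high_c == gap_low_c)
print(abs(gap_high_f - gap_low_f) < 1e-9)
```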
Something like the decibel scale, on the other hand, is much harder to handle as an interval, because if you’re thinking about the actual intensity of the sound, it’s a logarithmic scale. So, in terms of intensity, the difference between 3 decibels and 4 decibels is smaller than the difference between 13 and 14 decibels. That’s an example of a continuous variable that isn’t an interval variable, at least with respect to intensity.
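The decibel comparison can be checked with a short calculation. This sketch assumes the usual definition of sound intensity level, where the intensity relative to a reference level is 10 raised to the power (decibels / 10); the function name `relative_intensity` is hypothetical.

```python
def relative_intensity(decibels):
    """Intensity relative to the reference level: I / I0 = 10 ** (dB / 10)."""
    return 10 ** (decibels / 10)


# A one-decibel step low on the scale versus the same step ten decibels higher.
step_low = relative_intensity(4) - relative_intensity(3)
step_high = relative_intensity(14) - relative_intensity(13)

print(step_low, step_high)
print(step_high > step_low)
```

Under this assumption, the step from 13 to 14 decibels covers ten times as much intensity as the step from 3 to 4, so equal differences in decibels do not mean equal differences in intensity.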
Data Science Dojo Instructor - Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.
© Copyright – Data Science Dojo