Understanding Factors






     

    Course Description

    Categories or levels are referred to as factors in R. This R object allows us to treat categorical data accordingly.

    What You'll Learn

      >  Learn how to cast character strings or numbers as factors so that they are treated as categories.



     

    If you haven’t installed R and RStudio already, you can watch the "Getting started with Python and R for Data Science" video to get started.

    You can download the dataset used in this exercise from here.



     

    Categories or levels are referred to as factors in R. This R object allows us to treat categorical data accordingly, so that it’s not incorrectly interpreted in the program as a number or a string of text.
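
    As a small illustration of why this matters, the sketch below compares how R summarizes the same values as numbers and as a factor. The vector "tags" and its values are made up for this example and are not part of the course dataset.

```r
# Hypothetical class tags stored as plain numbers (values are made up)
tags <- c(1, 1, 2, 3)

# As numbers, R will happily average them, which is meaningless for class labels
mean(tags)      # 1.75

# As a factor, the same values are treated as categories
tag_factor <- factor(tags)

# summary() on a factor counts the observations in each level
summary(tag_factor)
```

    Here the numeric version produces an average of the labels, while the factor version reports how many observations fall into each category.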

    If we look at the structure of the “animals” data frame we created in the video on vectors, we can see that it includes three vectors: “animals”, “weight”, and “class.tag”. Looking closely at “class.tag”, we can see that it has been treated as an integer when it is in fact a class represented as a number. To have R treat it accordingly, we need to turn it into a factor, so that one, two, and three are factor levels, or class labels. We can do this using the “as.factor” function or the “factor” function in R.
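
    The check described above might look like the following sketch. The data frame here is a hypothetical stand-in for the one built in the video on vectors: the column names come from the transcript, but the values are invented for illustration.

```r
# Hypothetical stand-in for the "animals" data frame from the vectors video;
# the column names match the transcript, the values are made up.
animals <- data.frame(
  animals   = c("cat", "dog", "eagle", "frog"),
  weight    = c(4.2, 20.5, 6.1, 0.3),
  class.tag = c(1L, 1L, 2L, 3L)
)

# Inspect the structure: class.tag is listed as an integer column
str(animals)

# Confirm that class.tag is not yet a factor
class(animals$class.tag)      # "integer"
is.factor(animals$class.tag)  # FALSE
```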

    We’re going to overwrite the numeric “class.tag” vector and treat it as a factor in our “animals” data set. To do this, we refer to our data set and the “class.tag” variable, and overwrite it by turning all of the values it holds into factor levels.

    Inside the function call, we once again refer to the “class.tag” variable in our “animals” data set. Now, if we look at the structure of the “animals” data, we can see that it is treated accordingly: it is now a factor with three distinct levels (1, 2, and 3) that serve as our class labels.
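
    Put together, the conversion and the follow-up check could be sketched as below. As before, the data frame is a hypothetical stand-in with made-up values, rebuilt here so the example runs on its own.

```r
# Hypothetical stand-in for the "animals" data frame (values are made up)
animals <- data.frame(
  animals   = c("cat", "dog", "eagle", "frog"),
  weight    = c(4.2, 20.5, 6.1, 0.3),
  class.tag = c(1L, 1L, 2L, 3L)
)

# Overwrite the integer column with a factor version of itself
animals$class.tag <- as.factor(animals$class.tag)

# The structure now reports class.tag as a factor with 3 levels
str(animals)

# List the distinct levels and count them
levels(animals$class.tag)   # "1" "2" "3"
nlevels(animals$class.tag)  # 3
```

    Calling factor(animals$class.tag) in place of as.factor() should give the same result here; factor() is the more general form and also accepts arguments such as levels and labels.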

    In the next video, we’ll look at calling pre-built functions in R.



     

     

    Rebecca Merrett - Rebecca holds a bachelor’s degree in information and media from the University of Technology Sydney and a postgraduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for games development and has written for tech publications.