Control Statements: For Loop, If, Else
Course Description
Control statements allow us to control the flow of a program. We make use of the for loop and the if and else statements to loop through data values, check whether they meet a condition, and assign a text label to each value.
What You'll Learn
> To use simple yet useful control statements: the for loop statement, and the if and else statements.
If you haven’t installed R and RStudio already, you can watch the "Getting started with Python and R for Data Science" video to get started.
For the dataset used in this exercise, download from here.
Let’s say you need to check through a column vector of data values to see whether each one meets a certain condition, or you only want to execute a function when a certain condition is met. In cases like these, we need control statements that allow us to control the flow of a program.
There are quite a few control statements you can learn in programming, but the simplest and most useful are the “for loop” statement and the “if” and “else” statements.
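As a rough illustration of the syntax, here is a minimal sketch in R. The vector `values` and the cut-off of 5 are made up for this example and are not part of the course dataset.

```r
# Hypothetical values, used only to illustrate the syntax
values <- c(3, 8, 5)

# Counters for how many values land in each branch
n_large <- 0
n_small <- 0

# The for loop visits each element of the vector in turn
for (v in values) {
  # if/else picks exactly one of the two branches for each value
  if (v >= 5) {
    n_large <- n_large + 1
  } else {
    n_small <- n_small + 1
  }
}

print(n_large)  # 2 values are 5 or more
print(n_small)  # 1 value is below 5
```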
So, let’s say I have data on the average income for different job roles across US cities. I want to loop through average income and check whether there are incomes greater than or equal to 90,000. If an income is 90k plus, I want to tag it as ‘high’ and store that label in a vector or a list; incomes that fall below this figure can be tagged as ‘low’ to ‘medium’. To do this, and to save myself the trouble of manually checking through the data and tagging it myself, I’ll make use of a “for loop” and an “if” and “else” statement.
My “for loop” will go through average income, iterating through each value, one at a time. My “if” statement will check whether each value is greater than or equal to 90k. If that’s the case, it’ll be labeled ‘high’ income and stored in a new column vector; otherwise, it will be labeled as ‘low’ to ‘medium’ income. So, I have an empty vector tied to a variable called ‘income level’, ready for storing the tag labels ‘high’ or ‘low-med’. This is going to be a new column vector that we’ll add to our income data later, where the ‘high’ label corresponds to all cases that are 90k plus and the ‘low-med’ label corresponds to all cases that fall below 90k.
So, the way to read this block of code is: for each index value “i”, starting at index 1 and going up to the last index number, however long average income is, check if the income value sitting at that particular index is greater than or equal to 90,000. If it is, append the string ‘high’ onto the end of income level and move on to the next value. If that value also meets the condition, append ‘high’ onto the end of income level again, and so on. The ‘else’ covers everything that doesn’t meet the condition: for those values, append the string ‘low-med’ onto the end of income level. The curly brackets just help us keep track of everything that belongs inside a statement.
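The code shown in the video is not reproduced on this page, so here is a sketch of what the loop described above might look like in R. The income values and the variable names `average_income` and `income_level` are assumptions for illustration; the real values come from the course dataset.

```r
# Hypothetical average incomes; stand-ins for the course dataset
average_income <- c(95000, 62000, 90000, 48000, 120000)

# Empty vector that will hold one tag label per income value
income_level <- c()

# Visit every index from 1 to the length of average_income
for (i in 1:length(average_income)) {
  if (average_income[i] >= 90000) {
    # Condition met: append 'high' onto the end of income_level
    income_level <- c(income_level, "high")
  } else {
    # Condition not met: append 'low-med' instead
    income_level <- c(income_level, "low-med")
  }
}

print(income_level)
# [1] "high"    "low-med" "high"    "low-med" "high"
```

Note that 90,000 itself is tagged ‘high’ because the check is greater than or equal to. If the input vector could ever be empty, `seq_along(average_income)` is a safer choice than `1:length(average_income)`, since `1:0` counts down instead of skipping the loop.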
So, let’s run this empty vector and the for, if, and else setup, then print the income level tag labels. Let’s have a look at what it produced. As you can see, we have tag labels that separate high-income jobs from low-to-medium-income jobs, which is useful to have, and we didn’t have to manually check the data ourselves to find this.
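The narration mentions that the labels become a new column in the income data later on. One way that step could look is sketched below. The data frame `income_data` and its column names `job_role` and `average_income` are hypothetical placeholders, not the actual dataset.

```r
# Hypothetical stand-in for the course dataset; column names are assumptions
income_data <- data.frame(
  job_role = c("Data Scientist", "Analyst", "Engineer"),
  average_income = c(110000, 70000, 90000)
)

# Build the tag labels with the same for / if / else pattern
income_level <- character(0)
for (i in seq_along(income_data$average_income)) {
  if (income_data$average_income[i] >= 90000) {
    income_level <- c(income_level, "high")
  } else {
    income_level <- c(income_level, "low-med")
  }
}

# Attach the labels as a new column; there is one label per row
income_data$income_level <- income_level

# Count how many job roles fall under each label
print(table(income_data$income_level))
```

Because the loop appends exactly one label per row, the new column lines up with the existing rows of the data frame.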
In the next video, I’ll show you how to create your own function, so you can make use of repeatable code.
Rebecca Merrett - Rebecca holds a bachelor’s degree of information and media from the University of Technology Sydney and a postgraduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for games dev and has written for tech publications.
© Copyright – Data Science Dojo