How to Write a Function
Functions allow us to reuse code, saving us from having to re-write the same code again and again.
What You'll Learn
> Creating a simple function in R.
If you haven’t installed R and RStudio already, you can watch the "Getting started with Python and R for Data Science" video to get started.
Download the dataset used in this exercise from here.
A function is a reusable piece of code that applies the same logic to different inputs. This saves you from writing the same code again and again when only the inputs change and the functionality stays the same. We have discussed using pre-built functions in R, but have not yet discussed creating your own custom function, which you may need to do for your own particular needs.
In our video on control statements, we used the “for”, “if”, and “else” statements to tag our data as ‘high’ or ‘low-med’ income and stored those tags in a new column vector, so that we could add it to our income data frame. But what if, in the future, we are working with different data frames and we create new columns that we would like to add to those data frames?
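As a reminder of the tagging step described above, here is a minimal sketch. The data frame contents, the column name ‘median.income’, and the cutoff value are all hypothetical stand-ins, since the actual dataset and threshold are not shown here.

```r
# Hypothetical stand-in for the income data frame; the real dataset's
# columns and values may differ.
income <- data.frame(
  state = c("A", "B", "C", "D"),
  median.income = c(42000, 75000, 58000, 91000)
)

# Hypothetical cutoff separating 'high' from 'low-med' income.
threshold <- 60000

# Pre-allocate one tag per row of the data frame.
income.level <- character(nrow(income))

# Tag each row with "for", "if", and "else".
for (i in seq_len(nrow(income))) {
  if (income$median.income[i] > threshold) {
    income.level[i] <- "high"
  } else {
    income.level[i] <- "low-med"
  }
}

print(income.level)
```

The result, ‘income.level’, is a plain vector that is not yet attached to any data frame.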
We will be working with many different data sets, so it’s likely we will want to apply the same functionality to any data frame and column vector. What we need to do is create a function for this. So, here’s an example of what that function would look like. Most functions require a name, some inputs, some processing of those inputs, and usually return some output once the inputs have been processed. This is the basic makeup of most functions. We first give our function a name, so we can use it to call the function later. In the pre-built functions video, for example, we called the “read.csv” function by its name, “read.csv”, and then gave it the required inputs.
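The basic makeup described above (a name, inputs, some processing, and a returned output) can be sketched with a deliberately tiny function. The name ‘double.it’ and its input ‘x’ are made up for illustration only.

```r
# Name: 'double.it' (hypothetical). Input: a number or numeric vector 'x'.
double.it <- function(x) {
  result <- x * 2   # processing: double the input
  return(result)    # output: hand the processed value back to the caller
}

# Call the function by its name and give it the required input.
answer <- double.it(21)
print(answer)
```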
We have called our function ‘add.vect.df’, and we will use this name later when we’re ready to call the function. We also specify that what we’re creating is a function, with a data frame and a vector as its inputs. These inputs can be named anything; here, ‘df’ and ‘vect’ stand in for the user’s data frame and vector.
When calling the function, the user will need to input the name of their data frame first and then the name of their column vector. The core functionality is that we use R’s ‘cbind’ to add the vector onto the data frame as a new column and assign the result to a variable called ‘new.df’. We then return ‘new.df’, the resulting data frame with the new column added.
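Based on the description above, the function might look something like the sketch below. The names ‘add.vect.df’, ‘df’, ‘vect’, and ‘new.df’ come from the text; the small ‘income’ data frame and ‘income.level’ vector are hypothetical stand-ins for the real data.

```r
# Add a column vector onto a data frame and return the result.
add.vect.df <- function(df, vect) {
  new.df <- cbind(df, vect)   # bind the vector on as a new column
  return(new.df)              # return the extended data frame
}

# Hypothetical example data; the real income dataset may look different.
income <- data.frame(
  state = c("A", "B", "C"),
  median.income = c(42000, 75000, 58000)
)
income.level <- c("low-med", "high", "low-med")

# Data frame first, then the column vector.
income.tagged <- add.vect.df(income, income.level)
print(income.tagged)
```

Note that ‘cbind’ expects the vector to have as many elements as the data frame has rows.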
Now, let’s call this function. First we run the function definition above so that R knows about it. Then we call the function by name and give it the required inputs: our data frame is ‘income’ and our column vector is ‘income.level’. We run this.
You’ll see it’s done what it’s supposed to do: it’s added this column onto our data frame. Using this function, we can now pass in any data frame and any new column vector we wish to add. So, we don’t have to rewrite or hard-code this each time we want to apply it to a different data frame and vector.
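To illustrate the reuse, here is the same function applied to an unrelated data frame. The ‘scores’ data frame and ‘passed’ vector are invented for this sketch, and the function definition is repeated only so the block runs on its own.

```r
# Same function as described in the text, repeated so this block is self-contained.
add.vect.df <- function(df, vect) {
  new.df <- cbind(df, vect)
  return(new.df)
}

# A different, hypothetical data frame and a new column vector.
scores <- data.frame(
  student = c("x", "y", "z"),
  score = c(55, 82, 91)
)
passed <- scores$score >= 60

# No code needs rewriting; only the inputs change.
scores.tagged <- add.vect.df(scores, passed)
print(scores.tagged)
```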
That completes our introduction to the R series. You’re now ready to hit the ground running on the first day of the Bootcamp, but remember that any new skill requires some practice, so revise these learning exercises if you need to.
We look forward to building up your data science skills with you.
Rebecca Merrett - Rebecca holds a bachelor’s degree of information and media from the University of Technology Sydney and a post graduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for games dev and has written for tech publications.
© Copyright – Data Science Dojo