Calling Pre-built Functions
Course Description
R is loaded with pre-built functions to help you carry out routine data science tasks. Whether it’s data manipulation, modeling, or doing calculations, there’s likely a package containing a collection of pre-built functions to help you implement a task.
What You'll Learn
> Learn how to call a function in R
> How to install and load an R package to make use of its pre-built functions.
We’ll use some of these pre-built functions during the Data Science Bootcamp.
If you haven’t installed R and RStudio already, you can watch the "Getting started with Python and R for Data Science" video to get started.
For the dataset used in this exercise, download from here.
Most, if not all, routine data science tasks have been packaged with pre-built functions in R, so we don’t need to rewrite these tasks from scratch. Whether it’s data manipulation, modeling, or doing calculations on the data, there is likely a package containing a collection of pre-built functions to help you implement a task, speeding up the process, so you can be a bit more productive.
Whether you realize it or not, we have already used some pre-built functions in R, for example “read.csv” (R is case-sensitive, so the name is written in lowercase). This is a pre-built function that takes inputs such as the path to a CSV file in a directory, plus optional inputs such as whether or not the file has a header row. So, for example, we could just give it the file. Here, we’re calling the function and giving it the minimum required input, or argument, and the pre-built function that reads in a CSV file is applied to the given input, “file.csv”.
So, when you want to apply a function to a given input or, basically, when you want to use it, you are calling that function. Using “read.csv” allows us to read in the data with a single command, saving us a bunch of time. We don’t need to write a program from scratch to locate the file, open it, interpret and read it, and turn it into a data frame. “read.csv” is part of R’s “utils” package, which you’ll see here. This package comes with R and contains a standard set of functions for common tasks.
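As a minimal sketch of the call described above, the example below writes a small CSV to a temporary file and reads it back. The file contents and the variable names (csv_path, scores, raw) are made up for illustration and are not the course dataset.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The contents are hypothetical, not the dataset used in the course.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ann,90", "Bob,85"), csv_path)

# Call the function with only the required argument: the file path.
# By default the first line is treated as the header row.
scores <- read.csv(csv_path)

# Optional arguments change the behaviour. With header = FALSE the
# first line is read as data and columns get default names (V1, V2).
raw <- read.csv(csv_path, header = FALSE)

print(scores)
print(raw)
```

Writing utils::read.csv(csv_path) calls the same function while making the package it comes from explicit.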
“Base” is another package that also comes with R and offers a standard set of functions. Now, when we want to do something outside the standard functionality in R, we can install a package to help implement a specific task. For example, our task might be to model some data using a decision tree, a type of machine learning algorithm. We can use the “rpart” package for this, and it will give us a bunch of functions designed to help us carry out this specific task.
So, to do this, we first need to download and install the package, then load it into R, and then call the functions within that package. The first step is to install the package, which takes a moment to download. Once it’s installed, we load it into R, and now we can start using this package.
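The three steps (install, load, call) might look like the sketch below. The CRAN mirror URL, the iris dataset, and the model formula are assumptions chosen so the example runs on its own; the video uses its own dataset.

```r
# Step 1: install the package if it is not already available.
# The mirror URL is an assumption; any CRAN mirror works.
if (!requireNamespace("rpart", quietly = TRUE)) {
  install.packages("rpart", repos = "https://cloud.r-project.org")
}

# Step 2: load the package into the current R session.
library(rpart)

# Step 3: call a function from the package.
# iris ships with R and stands in for the course dataset here.
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)
```

Installing is needed only once per machine, while library(rpart) has to be run in every new R session before the package's functions can be called by name.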
We can now use the functions within this package, such as the “rpart” function within the “rpart” package. For example, we’re going to use it for modeling, so we call this function here. To see what kind of inputs or arguments a function requires, we can look up its documentation. You can look up the R documentation for the package, which lists all the functions that the package offers; you might want to see everything that’s included in base R, for example. Or you can look up the function itself. Either way, the documentation explains what the function does and the kind of inputs you give it. It also includes some examples of how to use the function. The documentation can be accessed through simple commands, and I’ll show you what I mean here.
We want to look up the package, and you’ll see the documentation either pop up on your screen or appear in the bottom right panel in RStudio. Under the “Help” tab, we can see all the available functions within this package. If we click on the “rpart” function here, we can see the kind of inputs that are needed for this function. You can also look up the documentation for a particular function rather than the whole package if you like. It’s a similar command: just give it the name of the function and it goes straight to the documentation on this function.
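The exact commands typed in the video are not shown in this transcript, so the following is only a sketch of the standard ways to open package-level and function-level documentation in R, using “rpart” as the example.

```r
library(rpart)

# Documentation for the whole package: lists the functions it offers.
help(package = "rpart")

# Documentation for a single function; these two lines are equivalent.
help("rpart")
?rpart

# A quick look at a function's arguments without opening the help page.
args(rpart)
```

In RStudio the help pages open in the Help tab, while in a plain R console they are shown as text or in a browser, depending on the setup.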
Now you’re ready to make the most of R’s pre-built functions which you’ll be using a lot to help carry out many different data science tasks.
In the next video, I’ll introduce you to control statements and why they’re useful.
Rebecca Merrett - Rebecca holds a bachelor’s degree of information and media from the University of Technology Sydney and a post graduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for games dev and has written for tech publications.
© Copyright – Data Science Dojo