Read and Write Data in R
It goes without saying that in data science you will often be reading data into R and writing it back out. R has several built-in functions for reading and writing text data, and understanding how to use them will add valuable tools to your R toolset.
What You'll Learn
If you haven’t installed R and RStudio already, you can watch the "Getting started with Python and R for Data Science" video to get started.
Download the dataset used in this exercise from here.
When learning R for data science, you’ll need to learn how to read in and write out data. Data is often stored in comma-separated file format, so we’ll learn how to read and write data in this format. The functions and syntax for reading data in other formats are very similar.
So, here is our income data set, which is a comma-separated file that sits in our "Documents" folder. What we want to do is read this data in with a single, simple command. This data set accompanies this video; you can find it on the learning portal. So, we’ll give our data set a variable name and we’ll simply call it "income". Then, we’ll use the read.csv function and give it the path, that is, the folder directory where our data sits. So, in my case, it’s within "users" and, after the forward slash, I can simply press Tab and navigate my way there. Otherwise, you’re welcome to type this out by hand.
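As a minimal sketch of that step, here is what the call can look like. The real income file is not reproduced here, so this example writes a tiny stand-in CSV to a temporary folder first; the column names and values are made up purely for illustration, and your own path will differ.

```r
# Hypothetical stand-in for the income data set: a tiny CSV written to a
# temporary folder so the example runs anywhere. Columns and values are made up.
csv_path <- file.path(tempdir(), "income.csv")
writeLines(c("state,income",
             "Alabama,43253",
             "Alaska,70760"),
           csv_path)

# Read the file into a variable called income by giving read.csv the full path.
income <- read.csv(csv_path)
print(income)
```

In your own session you would replace csv_path with the quoted full path to wherever your copy of the file sits.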
Okay, now I have headers in my columns, but if I didn’t, I could always set header = FALSE. As I do have headers, read.csv will automatically read them in as column names, and it also infers the data types. Now that we have given it the full path with quotation marks around the string, let’s run this line, and we can use the head function to see if it has read the data correctly. The head function just gives us the first few rows of data. Cool. We have just read in our data as a data frame. We’ll explain more about what we mean by a data frame later on.
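The header option and the head check can be sketched roughly like this. The two small files below are hypothetical stand-ins created in a temporary folder, one with a header row and one without, so the difference is easy to see.

```r
# Hypothetical stand-in files; the columns and values are made up.
with_header <- file.path(tempdir(), "income_with_header.csv")
writeLines(c("state,income", "Alabama,43253", "Alaska,70760", "Arizona,49774"),
           with_header)

no_header <- file.path(tempdir(), "income_no_header.csv")
writeLines(c("Alabama,43253", "Alaska,70760"), no_header)

# header = TRUE is the default for read.csv, so the first row becomes column names.
income <- read.csv(with_header)
print(head(income))   # first few rows (up to six by default)
str(income)           # shows the data types that were inferred

# With header = FALSE the first row is treated as data and R names the
# columns V1, V2, and so on.
income_raw <- read.csv(no_header, header = FALSE)
print(names(income_raw))
```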
In Windows, you also need to give it the full path or directory, but you need to use double backslashes. A single backslash is an escape character inside an R string, so the doubled backslash is what R reads as the path separator. So, for example, inside the read.csv function, you’d simply use a double backslash between folder names, and the path usually starts at your C drive.
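To make the backslash point concrete, here is a small sketch that only builds and inspects path strings, so it runs on any system. The folder and user names are placeholders, not the real location of the file.

```r
# Hypothetical Windows path; "yourname" and the folders are placeholders.
win_path <- "C:\\Users\\yourname\\Documents\\income.csv"

# Each doubled backslash in the code is one backslash in the actual string.
writeLines(win_path)
print(nchar("\\"))   # a doubled backslash is a single character

# R on Windows also accepts forward slashes, which avoids the doubling.
alt_path <- "C:/Users/yourname/Documents/income.csv"

# Converting one form to the other shows they describe the same location.
converted <- gsub("\\", "/", win_path, fixed = TRUE)
print(identical(converted, alt_path))
```

Either form could then be passed to read.csv on a Windows machine where that file actually exists.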
Now, another way you can read in this data set is, instead of giving the read.csv function the full path, to set a working directory where all the files we’ll be working with are stored. Then we only have to use the filename and extension when reading in the data set. So, for example, I’ll use the setwd function here and give it my working directory. Everything I’m going to use sits within "Documents", so when I read in my data here, I’ll only need to use the filename and extension, making life a little bit easier.
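A rough sketch of the working-directory approach follows. A temporary folder stands in for "Documents" here, and the small CSV inside it is made up; remembering and restoring the previous working directory is an extra courtesy added for this example, not something the video does.

```r
# Hypothetical working folder: a temporary directory stands in for "Documents".
data_dir <- tempdir()
writeLines(c("state,income", "Alabama,43253"),
           file.path(data_dir, "income.csv"))

old_wd <- getwd()    # remember where we were so we can go back afterwards
setwd(data_dir)      # point R at the folder that holds our files
print(getwd())       # confirm the working directory changed

# Only the filename and extension are needed now.
income <- read.csv("income.csv")
print(income)

setwd(old_wd)        # restore the previous working directory
```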
You can also read in a data set using RStudio if you like. So, in the bottom-right panel, if you look under the Files tab, you’ll see "Documents", where our data is stored. Just click on the folder, then click on the file and import the data set. That’s another way you can read it into R.
Now, to write out data, we use the write.csv function. You might want to write out the entire data set, so we just give it the name of our data set. Or it could be a variable within our data set, the outcome variable or the predictions column, for example, that you want to write to a CSV file. We also have to give it the directory where we want to store our CSV file and the name of the file. So, I’ll just put it in the same place I’ve been putting everything else, and I’m just going to call it avgincome.csv. We might not want to include row numbers or the index as a column in our CSV file. If that’s the case, we can always set row.names = FALSE.
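Here is a minimal sketch of writing data out. The small data frame is a made-up stand-in for the income data, and a temporary folder is used in place of the real destination; only the avgincome.csv filename comes from the walkthrough.

```r
# Hypothetical stand-in for the income data frame; values are made up.
income <- data.frame(state  = c("Alabama", "Alaska"),
                     income = c(43253, 70760))

# Write the whole data set, leaving out the row numbers.
out_path <- file.path(tempdir(), "avgincome.csv")
write.csv(income, out_path, row.names = FALSE)

# Write just one column; income["income"] keeps it as a one-column data frame.
col_path <- file.path(tempdir(), "income_only.csv")
write.csv(income["income"], col_path, row.names = FALSE)

# Read the file back in to check what was written.
check <- read.csv(out_path)
print(check)
```

If row.names = FALSE were left out, the file would gain an extra first column holding the row numbers.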
Okay, let’s run this, and let’s have a look. Okay, great. So, as you can see, we have successfully written data to a CSV file.
In the next video, we’ll discuss data frames.
Rebecca Merrett - Rebecca holds a bachelor’s degree of information and media from the University of Technology Sydney and a post graduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for games dev and has written for tech publications.
© Copyright – Data Science Dojo