It's really important to know the main R data types so you can check what kind of values you're working with when modeling data or casting it to another data type. We'll discuss how to check numeric data types, from integers to floating-point numbers and negative and positive numbers, as well as character (string) and logical data types.
What You'll Learn
> Basics of numeric and character data types and how to cast to a different data type
If you haven't installed R and RStudio already, you can watch the "Getting started with Python and R for Data Science" video to get started.
You can download the dataset used in this exercise from here.
When working with tabular datasets, it's good to know the data type of each atomic vector, that is, each variable or column. Conveniently, R infers the data type of each column automatically when you read a dataset in, so you don't need to declare a type for every single column yourself. You still need to recognize the main types, though, so you can check what kind of values you're working with before modeling the data or casting it to another type.
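A minimal sketch of that inference, assuming a small hypothetical CSV (the `csv_text` string and its columns are made up for illustration) passed to `read.csv()` through its `text` argument in place of the exercise dataset. `sapply()` then reports the class R chose for every column. Note that a whole-number column read from a file comes back as `integer`, R's dedicated whole-number type.

```r
# Hypothetical sample data standing in for a CSV file on disk
csv_text <- "name,age,height_m,smoker
Anna,34,1.62,TRUE
Ben,29,1.80,FALSE"

# read.csv() guesses each column's type; stringsAsFactors = FALSE keeps text as character
people <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the class R inferred for every column
col_classes <- sapply(people, class)
print(col_classes)
# name: "character", age: "integer", height_m: "numeric", smoker: "logical"
```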
So, let's first look at numeric types: the numbers that measure things in your data. A number can be a whole number (an integer) or a number with a fractional part (a floating-point number), which can have any number of digits before or after the decimal point. When printing a floating-point number, R rounds the displayed value to seven significant digits by default, so it won't print an endlessly long run of digits after the decimal point; the full value is still stored. Negative numbers work the same way. R classes all of these as one data type called numeric: if you pass any of these numbers to the class function, it returns "numeric". One nuance: a whole number typed as 42 is actually stored as a floating-point (double) value, and you need the L suffix (42L) to create a true integer, which class reports as "integer".
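The snippet below is a small sketch of those points: the class of whole, fractional and negative literals, the separate integer class created with the `L` suffix, and how the default print setting of seven significant digits (`getOption("digits")`) affects what you see. The specific numbers are arbitrary examples.

```r
# Whole numbers, fractions and negatives typed as literals are all "numeric"
class(42)      # "numeric"
class(3.14)    # "numeric"
class(-7.5)    # "numeric"

# Under the hood these are double-precision floating-point values
typeof(42)     # "double"

# The L suffix creates a true integer, which has its own class
class(42L)     # "integer"

# Printing shows 7 significant digits by default, not a fixed number of decimals
print(pi)          # [1] 3.141593
print(123456.789)  # [1] 123456.8

# The full value is still stored; ask for more digits explicitly
print(pi, digits = 10)  # [1] 3.141592654
```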
Another common data type you'll often see in datasets is character. A character value can be a single character, a string of characters, or even a number written as a string, such as "42". All of these are classed as character in R, so if you pass any of them to the class function, it returns "character". Strings can be wrapped in single or double quotation marks. Using double quotation marks lets a word with an apostrophe, such as "won't", sit inside the string without the apostrophe being read as the closing quotation mark. Alternatively, inside single quotation marks you can escape the apostrophe with a backslash so R reads it as a literal apostrophe rather than the end of the string, for example 'won\'t'.
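Here is a short sketch of the character examples above: checking the class of a few strings, writing "won't" both ways, and, since a number stored as a string is still text, casting it back to a number with `as.numeric()`. The variable names are illustrative only.

```r
class("a")            # "character"
class("hello world")  # "character"
class("42")           # "character": digits inside quotes are text, not a number

# Two equivalent ways to store the word won't
double_quoted <- "won't"   # the apostrophe is safe inside double quotes
escaped <- 'won\'t'        # the backslash escapes the apostrophe inside single quotes
identical(double_quoted, escaped)  # TRUE
writeLines(escaped)                # prints: won't

# Casting a numeric string to a number before doing arithmetic
as.numeric("42") + 1   # 43

# Text that isn't a number becomes NA (R also emits a warning, suppressed here)
suppressWarnings(as.numeric("forty-two"))  # NA
```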
The character type suits free text or unique names of things; values that fall into a fixed set of categories can instead be stored as factor levels. We'll discuss the factor object later in the video series.
Logical (Boolean) data types are also common wherever a value records whether something is present or not, as TRUE or FALSE. For example, a variable recording whether a person has a benign tumour might be TRUE for people who have one and FALSE for people who don't. If you pass TRUE or FALSE to the class function, it returns "logical". Note that R is case-sensitive, so these values must be written in capitals.
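A brief sketch of both ideas, using made-up values: a hypothetical `severity` column stored as a factor, and a hypothetical `has_tumour` logical column. Because TRUE counts as 1 and FALSE as 0 in arithmetic, `sum()` and `mean()` give a quick count and proportion of TRUE values.

```r
# Repeated category labels can be stored as a factor instead of character
severity <- factor(c("low", "high", "low", "medium"))
class(severity)   # "factor"
levels(severity)  # "high" "low" "medium" (sorted alphabetically by default)

# Logical values are written TRUE and FALSE (all caps; R is case-sensitive)
class(TRUE)   # "logical"
class(FALSE)  # "logical"

# Hypothetical column: does each person have a benign tumour?
has_tumour <- c(TRUE, FALSE, FALSE, TRUE)
sum(has_tumour)   # 2: TRUE counts as 1 and FALSE as 0
mean(has_tumour)  # 0.5: proportion of TRUE values

# Casting text to logical
as.logical("TRUE")  # TRUE
```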
You now have an understanding of the main data types you’re likely to get in your data sets. In the next video, we’re going to cover variables.
Rebecca Merrett - Rebecca holds a bachelor's degree in information and media from the University of Technology Sydney and a postgraduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for games development and has written for tech publications.
© Copyright – Data Science Dojo