Basic Vocabulary

    Course Description

    Get familiar with the vocabulary you will need in all the upcoming courses. You'll find answers to very basic questions about data, data science, and the various attributes of data.

    What You'll Learn

     > To understand data and data types
     > To understand data quality, data preprocessing, and related topics
     > To form the basis of the upcoming data science discussions

    The purpose of this particular webinar is to give you all some basic vocabulary and a very basic understanding of a number of different important topics regarding data science fundamentals. A lot of this talk is a vocabulary lesson, so it’s really important that you guys make sure you understand all the terms that I’m introducing and all the ways that they’re used. We’re gonna be covering a lot of material over the next couple of hours, so it is pretty aggressively paced, but we should be able to get through all of it. 

    All right. So, you see on your screen here the topics that we’re gonna be covering. We’re gonna start by talking about data and data types, setting some groundwork for all the things we’ll be talking about over the course of the Bootcamp. Then we’re gonna talk about data quality and data pre-processing, which are closely connected. And, finally, we’re gonna talk about some similarity and dissimilarity metrics and also some data exploration and visualization. We’ll cover data exploration and visualization only very briefly here; we’re gonna talk about them a lot more next week in the introduction to our webinar.

    So, without further ado then, let’s start with data and data types. 

    “What is data?” is a very fundamental question that we can ask, and here’s where our vocabulary lessons start. Data is a collection of objects that are defined by attributes. Attributes are the properties or characteristics of our objects. Not all data can be represented nicely in a table, but a lot of it can be, and in a table like the one here, a data object is a row and a data attribute is a column. We think of the attributes as being properties of the objects. So, the eye color of a person, the temperature, whether someone filed for a tax refund in the next year, what their taxable income was: those are all attributes of our data objects.
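    To make the row/column picture concrete, here is a minimal sketch, assuming a small tabular data set stored as a plain Python list of dicts. The attribute names (refund, marital_status, taxable_income_k) and all values are hypothetical, loosely modeled on the tax-refund example above, and the helpers get_object and get_attribute are illustrative names, not part of any library. Each dict is one data object (a row); each key is one attribute (a column).

```python
# Hypothetical toy data set: each dict is one data object (a row),
# and each key is one attribute (a column). Values are made up.
data_set = [
    {"refund": "Yes", "marital_status": "Single", "taxable_income_k": 125},
    {"refund": "No", "marital_status": "Married", "taxable_income_k": 100},
    {"refund": "No", "marital_status": "Single", "taxable_income_k": 70},
]

# The attributes are the column names shared by every data object.
attributes = list(data_set[0].keys())


def get_object(rows, i):
    """Return the i-th data object: one row with all of its attribute values."""
    return rows[i]


def get_attribute(rows, name):
    """Return one attribute: the values of a single column across all objects."""
    return [row[name] for row in rows]


print(attributes)                                   # ['refund', 'marital_status', 'taxable_income_k']
print(get_object(data_set, 0))                      # the first row
print(get_attribute(data_set, "taxable_income_k"))  # [125, 100, 70]
```

    Reading across a row gives you one object; reading down a column gives you one attribute. A library such as pandas offers the same view through a DataFrame, but plain dicts keep the idea visible.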

    One of the struggles people sometimes have in getting into data science is that, because data science is a synthesis of three or four completely distinct fields coming together, there are often several different terms for the same thing. This is our first encounter with that, and it’s going to show up again.

    So, “attribute” is a decent name for these ideas. But attributes are also called variables, fields, characteristics, features, and predictors. And if you’ve got tabular data, they’ll sometimes be called columns. All of those different names refer to essentially the same thing: they’re all attributes, the properties or characteristics of our objects.

    Similarly, our objects are, basically, collections of attributes. It’s kind of a circular definition, but it’s what we’ve got. So, each object is defined by its exact attribute values. We’ll use the term “data objects” throughout this talk, but in general, objects have a lot of different names: you’ll see them called records, points, cases, samples, entities, entries, instances, and many more things.
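    One way to see the idea that an object is defined by its exact attribute values is a frozen Python dataclass, whose generated equality check compares field values. This is a minimal sketch under that assumption; the class name DataObject and its fields are hypothetical, chosen to echo the attributes mentioned earlier.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DataObject:
    """Hypothetical data object: nothing but a fixed set of attribute values."""
    eye_color: str
    refund: bool
    taxable_income_k: int


a = DataObject(eye_color="brown", refund=True, taxable_income_k=125)
b = DataObject(eye_color="brown", refund=True, taxable_income_k=125)
c = DataObject(eye_color="blue", refund=True, taxable_income_k=125)

# The generated __eq__ compares attribute values field by field, so two objects
# with identical attribute values are indistinguishable in the data.
print(a == b)     # True
print(a == c)     # False
print(asdict(a))  # {'eye_color': 'brown', 'refund': True, 'taxable_income_k': 125}
```

    Whether you call `a` a record, a sample, or an instance, the code treats it the same way: as its collection of attribute values.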

    You’ll usually see a collection of data objects called a data set, but sometimes it will be called a table, and sometimes you’ll just hear, “Oh, yeah, we have our data,” referring to the set as a whole.

    Data Science Dojo Instructor - Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.