Blog entry by Albar Wahab

Busting 5 Common Data Science Myths

Data science is a fast-growing field surrounded by buzzwords. Because it is trendy, you will often come across statements about it that are confusing or simply untrue. In this blog, we look at five of these myths and bust them to clear up your doubts about data science!

What is Data Science? 

In simple words, data science involves using models and algorithms to extract knowledge from data in its various forms. The data can be large or small, and it can be structured, such as a table, or unstructured, such as text documents or images that carry spatial information. The role of the data scientist is to analyze this data and extract information that can be used to make data-driven decisions.

Now, let us dive into some of these myths.

The flawed data science compass

1. Data Science is all about building Machine Learning and Deep Learning Models 

Although building models is a key aspect of data science, it does not define the entire role of a Data Scientist. A lot of work happens before you can start building models. A common saying in this field is “Garbage in, garbage out.” Real-life data is rarely available in a clean and processed form, and a lot of effort goes into pre-processing it to make it useful for building models. By some estimates, this step can take up to 70% of a project's time.

The entire pipeline can be split into multiple stages: acquiring, cleaning, and pre-processing the data, then visualizing, analyzing, and understanding it. Only then can you build useful models with your data. If you build machine learning models using readily available libraries, the code for the model itself might be fewer than 10 lines! So model building is often not the most complex part of your pipeline.

2. Only people with a programming or mathematical background can become Data Scientists 

A big misconception about data science is that only people from certain backgrounds can pursue a career in it, which is not the case at all! Data science is a handy tool that can help a business enhance its performance in almost every field. For example, human resources is a field that may seem distant from statistics and programming, yet it is a strong use case for data science. IBM, by collecting employee data, has built an internal AI system that uses machine learning to predict when an employee might quit. A person with domain knowledge of human resources is well placed to help build such a model.

Regardless of your background, you can learn data science from scratch online. Join one of our top-rated programs, including Data Science Bootcamp and Python for Data Science, and get started on your data science journey!

3. Data Analysts, Data Engineers, and Data Scientists all perform the same tasks 

Data Analysts and Data Scientists have overlapping responsibilities, but the three roles are distinct. Data Analysts carry out descriptive analytics: they collect current data and use it to make informed decisions. For example, a data analyst might notice a drop in sales and try to uncover the underlying cause using the collected company data. Data Scientists also support informed business decisions, but they use statistics and machine learning to predict the future! They work with the same data but use it to build predictive models that forecast future outcomes and guide the company on the right actions to take before something happens. Data Engineers, on the other hand, build and maintain data infrastructure and data systems. They are responsible for setting up data warehouses and building the databases where the collected data is stored.

4. Large data results in more accurate models 

This statement is partly right and partly wrong. A larger dataset does not necessarily translate into a more accurate model. More often, the performance of your model depends on how well you clean your dataset and extract its features. After a certain point, the performance of your model will level off regardless of how much you increase the size of your dataset.

As the saying “garbage in, garbage out” suggests, if the data you provide to the model is noisy and not properly processed, the accuracy of the model is likely to be poor as well. Therefore, to improve the accuracy of your models, you must ensure that the quality of the data you provide is up to the mark. Only a greater quantity of relevant, good-quality data will improve your model’s accuracy!

5. Data Collection is the easiest part of data science 

When learning how to build machine learning models, you often go to open data sources and download a CSV or Excel file with the click of a button. However, data is not that readily available in the real world, and you might need to go to great lengths to acquire it. Once acquired, it is often unformatted and unstructured, and you will have to pre-process it to make it structured and meaningful. Sourcing, collecting, and pre-processing data can be a challenging and time-consuming task. However, it is an important one because you cannot build a model without any data!
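As a small taste of what pre-processing involves, the sketch below cleans a messy export using only Python's standard library. The raw text, column names, and cleaning rules are all hypothetical, and real-world data usually needs far more than this.

```python
import csv
import io

# Hypothetical raw export: stray spaces, inconsistent casing, a missing age.
raw = """name, age ,city
 Alice ,34, new york
BOB,,London
carol , 29 ,  PARIS
"""

def clean_rows(text):
    """Parse the raw CSV text and return a list of tidy dictionaries."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Strip stray whitespace from both column names and values.
        record = {key.strip(): value.strip() for key, value in record.items()}
        rows.append({
            "name": record["name"].title(),
            # Keep missing ages as None instead of guessing a value.
            "age": int(record["age"]) if record["age"] else None,
            "city": record["city"].title(),
        })
    return rows

rows = clean_rows(raw)
for row in rows:
    print(row)
```

Even for three rows, the cleaning logic is longer than many model-fitting calls, which is a fair reflection of where the effort tends to go.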

Data comes from numerous sources and is usually collected over a period of time, either automatically or manually. For example, to build a health profile of a patient, data about their visits is recorded. Telemetry data from their health devices, such as sensors, can also be collected, and so on. This is just the case for one patient. A hospital might deal with thousands of patients every day. Think about all the data!

In this blog, we discussed some of the most common myths that we have encountered regarding data science. Please share with us some of the myths that you might have encountered in your data science journey.

Want to upgrade your data science skillset? Check out our Python for Data Science training.