## TF-IDF

**Course Description**

Term frequency-inverse document frequency, commonly referred to as TF-IDF, measures how relevant a term is to a document within a collection of documents. We will discuss how the document-term frequency matrix representation can be improved and introduce the mighty term frequency-inverse document frequency.

**What You'll Learn**

- How to deal with documents of unequal lengths.
- What to do about terms that are very common across documents.
- TF for dealing with documents of unequal lengths.
- IDF for dealing with terms that appear frequently across documents.
- Implementation of TF-IDF using R functions and applying them to document-term frequency matrices.
- Data cleaning of matrices post weighting/transformation.
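
As a rough preview of the topics above, here is a minimal sketch in base R. It is not the course's own code: the toy matrix `dtm`, the helper names `term.frequency` and `inverse.doc.freq`, and the choice of a base-10 logarithm are all assumptions made for illustration. TF divides each document's counts by the document's length, IDF down-weights terms that appear in many documents, and the final step cleans up any non-finite cells left by the transformation.

```r
# Toy document-term frequency matrix (hypothetical data):
# rows are documents, columns are terms, cells are raw counts.
dtm <- matrix(c(2, 1, 0,
                1, 0, 3,
                1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("doc1", "doc2", "doc3"),
                              c("free", "call", "hello")))

# TF: normalize each document's counts by its total length,
# so long and short documents become comparable.
term.frequency <- function(row) {
  row / sum(row)
}

# IDF: log of (number of documents / number of documents containing the term).
# The log base is an assumption; other variants exist.
inverse.doc.freq <- function(col) {
  log10(length(col) / sum(col != 0))
}

# apply() over rows returns its results column-wise,
# so tf has terms as rows and documents as columns.
tf <- apply(dtm, 1, term.frequency)

# One IDF value per term.
idf <- apply(dtm, 2, inverse.doc.freq)

# Weight each document's TF column by the IDF vector,
# then transpose back to documents x terms.
tfidf <- t(apply(tf, 2, function(x) x * idf))

# Post-transformation cleaning: an empty document gives 0/0 = NaN in TF,
# so replace any non-finite cells with 0.
tfidf[!is.finite(tfidf)] <- 0

print(round(tfidf, 3))
```

In this toy example "free" occurs in every document, so its IDF is zero and it contributes nothing to any document's weights, while the rarer terms "call" and "hello" keep a positive weight.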

Text Analytics tutorial slides can be accessed **here**

Download R **here**

SMS Spam Collection Dataset used in this tutorial can be accessed **here**

**Data Science Dojo Instructor** - Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.