TF-IDF






     

    Course Description

    Term frequency-inverse document frequency, commonly referred to as TF-IDF, measures how relevant a term is to a document within a collection of documents. We will discuss how the document-term frequency matrix representation can be improved, and introduce TF-IDF as a way to do so.

    What You'll Learn

      >  How to deal with documents of unequal lengths.

      >  What to do about terms that are very common across documents.

      >  TF for dealing with documents of unequal lengths.

      >  IDF for dealing with terms that appear frequently across documents.

      >  Implementing TF-IDF with R functions and applying those functions to document-term frequency matrices.

      >  Cleaning the matrices after weighting/transformation.
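
    As a rough orientation for the topics above, one common textbook formulation of TF-IDF is sketched below. The symbols are assumptions introduced here for illustration and are not taken from the course slides: f_{t,d} is the raw count of term t in document d, N is the number of documents, and n_t is the number of documents containing t. The course may use a different variant (for example another log base or a smoothed IDF).

```latex
% Term frequency: raw count normalized by document length,
% which compensates for documents of unequal lengths.
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}

% Inverse document frequency: down-weights terms that
% appear in many documents.
\mathrm{idf}(t) = \log \frac{N}{n_t}

% TF-IDF weight of term t in document d.
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
```

    Under this formulation, a term that occurs in every document has idf = log(1) = 0, so its TF-IDF weight is zero regardless of how often it appears in any single document.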



     

    Text Analytics tutorial slides can be accessed here

    Download R here

    SMS Spam Collection Dataset used in this tutorial can be accessed here



     

    Data Science Dojo Instructor - Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.