Term frequency-inverse document frequency, commonly referred to as TF-IDF, is a weighting scheme that measures how relevant a term is to a document within a collection of documents. In this tutorial, we discuss how the document-term frequency matrix representation can be improved and introduce the mighty term frequency-inverse document frequency.
What You'll Learn
> How to deal with documents of unequal lengths.
> What to do about terms that are very common across documents.
> How term frequency (TF) normalizes counts for documents of unequal lengths.
> How inverse document frequency (IDF) down-weights terms that appear frequently across documents.
> Implementation of TF-IDF using R functions and applying them to document-term frequency matrices.
> Data cleaning of matrices post weighting/transformation.
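The points above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not necessarily the code used in the video: the toy matrix `dtm`, its counts, and the helper names `term_frequency` and `inverse_doc_freq` are made up for this example, and the base-10 logarithm is one common choice for IDF among several.

```r
# Toy document-term frequency matrix: rows are documents, columns are terms.
# The counts are invented for illustration only.
dtm <- matrix(
  c(2, 1, 0,
    1, 0, 1,
    0, 0, 0,   # an empty document, e.g. after stop-word removal
    4, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("free", "call", "hello"))
)

# TF: divide each document's counts by that document's total count,
# so long and short documents become comparable.
term_frequency <- function(row) row / sum(row)

# IDF: log10(number of documents / number of documents containing the term),
# so terms found in most documents get a small weight.
inverse_doc_freq <- function(col) log10(length(col) / sum(col > 0))

# apply() over rows returns one column per document, hence the transpose.
tf  <- t(apply(dtm, 1, term_frequency))
idf <- apply(dtm, 2, inverse_doc_freq)

# TF-IDF: multiply every column of TF by the IDF of its term.
tf_idf <- sweep(tf, 2, idf, `*`)

# Cleaning after the transformation: the empty document yields 0/0 = NaN,
# so rows containing missing values are reset to zero.
incomplete <- rowSums(is.na(tf_idf)) > 0
tf_idf[incomplete, ] <- 0

print(round(tf_idf, 3))
```

The cleaning step matters because most modeling functions reject matrices containing `NaN`. Setting an empty document's weights to zero is one reasonable convention; dropping such rows is another.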
Text Analytics tutorial slides can be accessed here
Download R here
SMS Spam Collection Dataset used in this tutorial can be accessed here
Data Science Dojo Instructor - Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.