r_subheading-Course Description-r_end Term frequency-inverse document frequency, commonly referred to as TF-IDF, is a weighting scheme that shows how relevant a term is to a document within a collection of documents. We will discuss how the document-term frequency matrix representation can be improved, and introduce TF-IDF as a way to improve it. r_break r_break r_subheading-What You'll Learn-r_end • How to deal with documents of unequal lengths. r_break • What to do about terms that are very common across documents. r_break • Term frequency (TF) for dealing with documents of unequal lengths. r_break • Inverse document frequency (IDF) for dealing with terms that appear frequently across documents. r_break • Implementing TF-IDF with R functions and applying them to document-term frequency matrices. r_break • Cleaning the matrices after weighting/transformation.
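
The topics listed above can be sketched in a few lines of base R. This is a minimal illustration and not necessarily the code used in the tutorial: the toy matrix, its terms, the function names (`term_frequency`, `inverse_doc_freq`), and the choice of a base-10 logarithm are all assumptions made for the example.

```r
# Toy document-term frequency matrix: rows are documents, columns are terms.
# The counts and terms are made up for illustration.
dtm <- matrix(
  c(2, 1, 0,
    4, 0, 2,
    1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("free", "call", "prize"))
)

# TF: divide each count by the document's total count, so that long and
# short documents become comparable.
term_frequency <- function(row) {
  row / sum(row)
}

# IDF: log of (number of documents / number of documents containing the term).
# Base 10 is an assumption; other bases are also common.
inverse_doc_freq <- function(col) {
  log10(length(col) / sum(col > 0))
}

# apply() over rows returns one column per document, so transpose back
# to the documents-by-terms layout.
tf <- t(apply(dtm, 1, term_frequency))

# One IDF value per term (column).
idf <- apply(dtm, 2, inverse_doc_freq)

# TF-IDF: scale every column of the TF matrix by that term's IDF.
tf_idf <- sweep(tf, 2, idf, `*`)

# Cleaning after weighting: an empty document gives 0/0 = NaN, and a term
# that occurs in no document gives an infinite IDF. Replace such values with 0.
tf_idf[!is.finite(tf_idf)] <- 0

print(round(tf_idf, 3))
```

In this toy matrix the term "free" occurs in every document, so its IDF is log10(3/3) = 0 and its whole TF-IDF column is zero. That is the intended effect: a term that is common across all documents carries no weight for telling them apart.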

The Text Analytics tutorial slides can be accessed r_link-here- r_break r_break Download R r_link-here- r_break r_break The SMS Spam Collection Dataset used in this tutorial can be accessed r_link-here-


