In this installment of the Introduction to Text Analytics with R series, we focus on data pipelines: a series of processing steps that move data from a source to a destination.
What You'll Learn
> Exploration of textual data for pre-processing “gotchas”
> Using the quanteda package for text analytics
> Creation of a prototypical text analytics pre-processing pipeline, including (but not limited to): tokenization, lower casing, stop word removal, and stemming
> Creation of a document-frequency matrix used to train machine learning models
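The pre-processing steps listed above can be sketched in a few lines of quanteda. This is a minimal illustration, not the tutorial's exact code: the two sample messages are made up to stand in for the SMS Spam Collection data, and the variable names (`sms`, `toks`, `sms_dfm`) are assumptions chosen for this example. Note that quanteda itself expands `dfm` as "document-feature matrix".

```r
library(quanteda)

# Hypothetical sample messages standing in for the SMS Spam Collection data.
sms <- c(
  ham  = "I'll be running late, see you at the station!",
  spam = "WINNER!! You have won a FREE prize. Call now to claim your prizes."
)

# 1. Tokenize into words, dropping punctuation, symbols, and numbers.
toks <- tokens(sms, what = "word",
               remove_punct = TRUE,
               remove_symbols = TRUE,
               remove_numbers = TRUE)

# 2. Lower-case every token.
toks <- tokens_tolower(toks)

# 3. Remove English stop words using quanteda's built-in list.
toks <- tokens_select(toks, stopwords("en"), selection = "remove")

# 4. Stem each remaining token.
toks <- tokens_wordstem(toks, language = "english")

# 5. Build the matrix: one row per message, one column per distinct term.
#    Tokens are already lower-cased, so skip dfm()'s own lower-casing.
sms_dfm <- dfm(toks, tolower = FALSE)

print(dim(sms_dfm))
print(featnames(sms_dfm))
```

Keeping each step as a separate call makes it easy to inspect the tokens between stages, which is where most pre-processing "gotchas" tend to show up. The resulting matrix can then be converted to a data frame or passed to a modeling function for training.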
Text Analytics Tutorial slides can be accessed here
Download R here
SMS Spam Collection Dataset used in this tutorial can be accessed here
Data Science Dojo Instructor - Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.