Data Pipelines for Text Analytics


r_subheading-Course Description-r_end In this part of the r_link-Introduction to Text Analytics with R- https://online.datasciencedojo.com/course/Text-Analytics-with-R-r_end series, we focus on data pipelines: sequences of processing steps that move data from a source to a destination. r_break r_break r_subheading-What You'll Learn-r_end • Exploring textual data for pre-processing “gotchas”. r_break • Using the quanteda package for text analytics. r_break • Building a prototypical text analytics pre-processing pipeline, including (but not limited to) tokenization, lower casing, stop word removal, and stemming. r_break • Building a document-feature matrix (DFM), which is used to train machine learning models.
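
The pre-processing steps listed above can be sketched in a few lines of quanteda. This is a minimal illustration, not the course's own code: the two messages are hypothetical stand-ins for the SMS Spam Collection data, the variable names are made up, and the calls assume a recent quanteda release (version 3 or later).

```r
library(quanteda)

# Hypothetical stand-in messages; the tutorial itself works with the
# SMS Spam Collection dataset.
texts <- c(
  doc1 = "WINNER!! You have won a FREE prize. Call now to claim.",
  doc2 = "Are you coming to the meeting tomorrow?"
)

# 1. Tokenization: split each message into word tokens and drop
#    punctuation, symbols, and numbers.
toks <- tokens(texts, what = "word",
               remove_punct = TRUE,
               remove_symbols = TRUE,
               remove_numbers = TRUE)

# 2. Lower casing.
toks <- tokens_tolower(toks)

# 3. Stop word removal, using quanteda's built-in English list.
toks <- tokens_remove(toks, stopwords("en"))

# 4. Stemming.
toks <- tokens_wordstem(toks, language = "english")

# 5. Build the matrix: one row per document, one column per remaining term.
#    The tokens are already lower-cased, so tolower is switched off here.
sms_dfm <- dfm(toks, tolower = FALSE)

dim(sms_dfm)        # number of documents and number of features
featnames(sms_dfm)  # the terms that survived pre-processing
```

The order of the steps matters. Lower casing comes before stop word removal so that capitalized words such as "You" match the lower-case stop word list, and stemming comes last so that the stop word list is compared against unstemmed words.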

Text Analytics tutorial slides can be accessed r_link-here- https://code.datasciencedojo.com/datasciencedojo/tutorials/tree/master/Introduction%20to%20Text%20Analytics%20with%20R-r_end r_break r_break Download R r_link-here- https://cran.r-project.org/-r_end r_break r_break SMS Spam Collection Dataset used in this tutorial can be accessed r_link-here- https://www.kaggle.com/uciml/sms-spam-collection-dataset-r_end

Data Science Dojo Instructor - Data Science Dojo is a paradigm shift in data science learning. We enable all professionals (and students) to extract actionable insights from data.