Natural Language Processing with R Programming Books
Natural Language Processing is a key Data Science skill. Learn how you can expand your R programming knowledge with Text Analytics.
It is my firm conviction that Natural Language Processing/Text Analytics is a must-have skill for any practicing Data Scientist. From analyzing customer feedback in NSAT surveys, to scraping Microsoft's internal job postings to identify popular technical skills, to segmenting customers via textual features, I have found Text Analytics to be a wildly useful skill time and again.
Not surprisingly, students of our Bootcamp, folks I mentor on Data Science, and my LinkedIn contacts often ask me about Text Analytics. The good news is that there are many great resources for the R programmer who wants to learn Text Analytics. What follows is a practical curriculum whose only prerequisite is basic R programming. I have read all of the books referenced below and can attest that working through them will put you well on your way to mastering Text Analytics!
The first book, by Professor Jockers, is quite simply the best, most straightforward introduction to working with text that I have found. Professor Jockers illustrates many of the fundamentals using out-of-the-box R programming. This book provides a great foundation for anyone looking to get started in Text Analytics with R.
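To give a flavor of what out-of-the-box R text processing looks like, here is a minimal sketch that counts word frequencies using only base R functions (`tolower`, `strsplit`, `table`, `sort`). It is not taken from the book, and the sample sentence is hypothetical.

```r
# Minimal base-R word-frequency sketch; the sample text is hypothetical.
text <- "The cat sat on the mat. The mat was red, and the cat was happy."

# Lower-case so that "The" and "the" are counted as the same word
text_lower <- tolower(text)

# Split on runs of non-letters; strsplit() returns a list, so unlist it
words <- unlist(strsplit(text_lower, "[^a-z]+"))

# Drop any empty strings produced by leading separators
words <- words[words != ""]

# Tabulate each word and sort from most to least frequent
freqs <- sort(table(words), decreasing = TRUE)

head(freqs, 3)
```

The same lower-case, split, and count pattern scales from a single sentence to a whole novel read in with `readLines()`.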
The second book is the next stop on the Text Analytics journey. While it is written primarily for Java programmers, it contains a lot of theory that is immensely useful for R programmers learning to work with text. Additionally, the book covers the OpenNLP Java library, which is available to R programmers via the excellent openNLP package.
The third book, while focused primarily on the problem of search, nevertheless contains a wealth of theory (e.g., the Vector Space Model) that will take the R programmer to the next level. The text is language-agnostic, excellent, and free!
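For readers new to the Vector Space Model, here is a minimal sketch in base R. Each document becomes a vector of raw term counts over a shared vocabulary, and the similarity of two documents is the cosine of the angle between their vectors. The three toy documents are hypothetical, and real systems typically weight terms (e.g., with tf-idf) rather than using raw counts.

```r
# Minimal Vector Space Model sketch in base R: raw term counts plus
# cosine similarity. The three documents are hypothetical.
docs <- c(
  d1 = "text analytics with r",
  d2 = "text mining with r and python",
  d3 = "baking bread at home"
)

# Tokenize each document on whitespace, keeping document names
tokens <- lapply(docs, function(d) strsplit(d, "\\s+")[[1]])

# The shared vocabulary defines the dimensions of the vector space
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary term in each document; vapply gives one column
# per document, so transpose to get a document-term matrix
dtm <- t(vapply(tokens,
                function(tok) as.numeric(table(factor(tok, levels = vocab))),
                numeric(length(vocab))))
colnames(dtm) <- vocab

# Cosine similarity: dot product over the product of vector lengths
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

sim_12 <- cosine(dtm["d1", ], dtm["d2", ])
sim_13 <- cosine(dtm["d1", ], dtm["d3", ])
```

Here `d1` and `d2` share three terms and score about 0.61, while `d1` and `d3` share none and score 0. Cosine similarity is used instead of a raw dot product so that longer documents are not favored simply for containing more words.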
While the Natural Language Toolkit (NLTK) is Python-based, the book on NLTK is a wealth of useful material for the R programmer. I put this resource last in the list because the conceptual material and R packages above provide the background needed to translate some of its concepts (e.g., chunking) into the R context. Awesome stuff, and free to boot!
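As one illustration of translating chunking into R, here is a toy noun-phrase chunker in base R. It mirrors the idea behind NLTK's regular-expression chunk grammars (a pattern such as "optional determiner, any adjectives, then nouns" matched over part-of-speech tags), but it is a hypothetical sketch, not NLTK's implementation or any R package's API. The tokens are tagged by hand here; in practice the tags would come from a part-of-speech tagger such as the one provided by the openNLP package.

```r
# Toy regex-based noun-phrase chunker in base R (hypothetical sketch).
# Tokens are hand-tagged with Penn Treebank part-of-speech tags.
tokens <- c("The", "quick", "brown", "fox", "jumped",
            "over", "the", "lazy", "dog")
tags   <- c("DT", "JJ", "JJ", "NN", "VBD", "IN", "DT", "JJ", "NN")

# Map each tag to a single letter so the chunk grammar can be written
# as an ordinary regex: D = determiner, J = adjective, N = noun, O = other
tag_code <- ifelse(tags == "DT", "D",
            ifelse(grepl("^JJ", tags), "J",
            ifelse(grepl("^NN", tags), "N", "O")))
code_str <- paste(tag_code, collapse = "")

# Grammar: optional determiner, any number of adjectives, one or more nouns
m <- gregexpr("D?J*N+", code_str)[[1]]

# Each letter in code_str corresponds to one token, so match positions
# map directly back to token indices
starts <- as.integer(m)
ends   <- starts + attr(m, "match.length") - 1

chunks <- if (starts[1] == -1) {
  character(0)  # gregexpr() returns -1 when there is no match
} else {
  mapply(function(s, e) paste(tokens[s:e], collapse = " "), starts, ends)
}

print(chunks)
```

Encoding each tag as one character keeps the match positions aligned with token positions, which is what lets a plain regular expression stand in for a chunk grammar.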
There you have it: a practical curriculum for the R programmer to ramp up on Text Analytics. Don't hesitate to reach out if you have any questions or comments; I monitor my blog almost continually.
Until next time, happy data sleuthing!