Many blogs and tutorials teach you how to scrape data from a set of web pages once and stop there. But one-off web scraping is of little use for applications that depend on recent content: sentiment analysis on timely posts, capturing changing events and commentary, or analyzing trends in real time. A one-off scrape of historical data is a fun academic exercise, but it does not help when you need timely or frequently updated data.
Suppose you want to tap into news sources to analyze political events that change by the hour, along with people’s comments on those events. You could summarize the key discussions and debates in the comments, rate their overall sentiment, find the key themes in the headlines, see how events and commentary change over time, and more. To do any of this, you need a collection of recent political events or news stories, scraped every hour.
What You'll Learn
- To scrape the web automatically and periodically with rvest so you can analyze frequently updated data
- To write standard web scraping commands in R using rvest, filter timely data, analyze or summarize key information in the text, and send an email alert with the results of your analysis
- To set up a script to run every hour so that text is scraped and analyzed periodically, capturing changing events and commentary or analyzing trends in real time
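As a rough illustration of the scrape-and-filter steps listed above, here is a minimal sketch in R using rvest. The page structure (an `article` element holding an `h2` headline and a `time` element), the sample HTML, and the helper names `scrape_headlines` and `keep_recent` are all assumptions made for this example; a real news site will need its own selectors.

```r
library(rvest)

# Hypothetical page structure: each story is an <article> with an <h2> headline
# and a <time datetime="..."> stamp. This inline HTML stands in for a live page.
sample_html <- '
<html><body>
  <article><h2>Budget vote delayed</h2><time datetime="2024-05-01T09:30:00Z"></time></article>
  <article><h2>Minister resigns</h2><time datetime="2024-05-01T11:45:00Z"></time></article>
</body></html>'

# Extract one row per story: headline text plus publication time (UTC).
scrape_headlines <- function(page) {
  stories <- html_elements(page, "article")
  data.frame(
    headline = html_text2(html_element(stories, "h2")),
    published = as.POSIXct(
      html_attr(html_element(stories, "time"), "datetime"),
      format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"
    ),
    stringsAsFactors = FALSE
  )
}

# Keep only stories published within the last `hours` hours before `now`.
keep_recent <- function(stories, now = Sys.time(), hours = 1) {
  cutoff <- now - hours * 3600
  stories[stories$published >= cutoff & stories$published <= now, , drop = FALSE]
}

page <- read_html(sample_html)  # for a live site, pass the page URL instead
all_stories <- scrape_headlines(page)

# A fixed "now" keeps the example reproducible; a scheduled run would use Sys.time().
now <- as.POSIXct("2024-05-01 12:00:00", tz = "UTC")
recent <- keep_recent(all_stories, now = now)
print(recent)
```

With the fixed timestamp, only the 11:45 story falls inside the one-hour window. To repeat this every hour, one option on Linux or macOS is a cron entry such as `0 * * * * Rscript scrape.R`, where `scrape.R` is a hypothetical file containing the script.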
Rebecca Merrett - Rebecca holds a bachelor’s degree in information and media from the University of Technology Sydney and a postgraduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for game development and has written for tech publications.