Scheduling Your Script Using taskscheduleR
We will use taskscheduleR to set up our automated web scraping script to run as a background task on our computer, which saves us from running the script manually in R. The script will be scheduled to run hourly, since it grabs text on Bitcoin events from the last hour or so.
What You'll Learn
> To write an automated web scraping script in R
Packages used in this tutorial are
For cronR installation instructions, click here.
See our Beginning R Programming series to get up to speed with basic R commands.
The full R script for this video tutorial can be accessed here.
Hi, welcome to this Data Science Dojo tutorial, part two of automating web scraping for text analysis.
In part one of this video tutorial, you followed along as we wrote an R script to scrape text on the latest Bitcoin events, summarize it, and send the summary to ourselves in an email alert. In part two, we'll set up our web scraping script to run as a background task on our computer, so you don't need to run the script manually in R yourself. The script will be scheduled to run hourly, since we're grabbing text on Bitcoin events from the last hour or so. You can access the full script from the link below the video. You can set this up as a task in Task Scheduler on Windows, or you can do it through RStudio itself. I'll show you how to use the taskscheduleR package to easily schedule your web scraping script from RStudio. Otherwise, you can check out our KDnuggets tutorial (linked below) on how to set this up using the Task Scheduler interface. If you're using OS X, Automator is the equivalent tool, and on Linux it's GNOME Schedule. To do this in RStudio, the equivalent of taskscheduleR for Mac and Linux users is the cronR package. Installing cronR is fairly simple, and a link below explains how to do it.
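For macOS or Linux users, a rough cronR equivalent of the hourly schedule might look like the sketch below. It assumes cronR is already installed; the script path and the job id are hypothetical placeholders, and the scheduling calls only run on a Unix-like system where cronR is available.

```r
# Sketch of an hourly schedule with cronR (macOS/Linux); assumes cronR is installed.
# Hypothetical placeholder path: replace with the location of your own scraping script.
script_path <- path.expand("~/web_auto_scripts/bitcoin_web_scraping.R")

if (.Platform$OS.type == "unix" && requireNamespace("cronR", quietly = TRUE)) {
  # Build the shell command that runs the script with Rscript and writes a .log file
  cmd <- cronR::cron_rscript(script_path)

  # Add the command to the user's crontab so it runs once every hour
  cronR::cron_add(
    command     = cmd,
    frequency   = "hourly",
    id          = "r_web_scraping_bitcoin",  # hypothetical job id
    description = "Hourly Bitcoin web scraping"
  )
}
```

Scheduled jobs can then be listed with cronR::cron_ls() and this one removed later with cronR::cron_rm(id = "r_web_scraping_bitcoin").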
The functions in cronR are similar to those in taskscheduleR, too. Let's go ahead and install and load taskscheduleR in R: just uncomment and run this line to install it, and don't forget to also run this line to load it. If you prefer a point-and-click approach, you can use the RStudio add-in interface instead. Just install the packages it needs, and once they're installed, open the Addins drop-down menu at the top, select "Schedule R scripts on Windows", upload your script, choose a schedule (hourly, daily, weekly, or whatever time frame you're interested in), and hit "Create task". You might want your output data and logs to go into another directory on your computer; otherwise, by default they go into the extdata folder of the taskscheduleR package inside your R library folder. The add-in interface in cronR is very similar, too.

OK, cool. Now let's use some taskscheduleR functions to schedule the web scraping script to run every hour. We'll use the taskscheduler_create() function and give it all our inputs, starting with a name for our task. I'll just call it "R web scraping Bitcoin".
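Before opening the add-in, a quick check like the one below reports which of its supporting packages are still missing. The package list (miniUI, shiny and shinyFiles) is our assumption based on the taskscheduleR README, so check the README for the current requirements; the install lines are left commented out, as in the tutorial script.

```r
# One-time install of taskscheduleR (Windows only); uncomment to run
# install.packages("taskscheduleR")

# Load taskscheduleR when running on Windows
if (.Platform$OS.type == "windows") library(taskscheduleR)

# Packages the RStudio add-in is assumed to need (verify against the package README)
addin_pkgs <- c("miniUI", "shiny", "shinyFiles")

# TRUE for each package whose namespace loads, i.e. it is already installed
is_installed <- vapply(addin_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- addin_pkgs[!is_installed]

# Install whatever is missing (uncomment to run)
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
print(missing_pkgs)
```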
We'll give it the full path to where the R script sits, which in my case is my web auto scripts folder. If you're on Windows, don't forget to use double backslashes in the path. We want the script to run every hour, so we'll set the schedule to hourly. Next, we'll input the start time. You can specify a start time, but I'm happy to go with the default, which is my current system time, so the task kicks off within 62 seconds. Just follow the hour:minute format. You can also specify a start date, but once again I'm happy to go with the default, which is the current date. Just make sure the date matches your computer system's date format; in my case, it's month followed by day. We're also going to give it the R executable file that will run our R script, which usually sits within R's bin folder. Let's run this. OK, the output states this was successful, so our web scraping script is now set up to run every hour. It's also a good idea to check your logs.
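Put together, the taskscheduler_create() call described above might look roughly like the sketch below. The script path and task name are hypothetical placeholders, the month/day/year date format is an assumption that must match your own system settings, and the task is only created when the code runs on Windows.

```r
# Hypothetical placeholder path (note the double backslashes on Windows)
script_path <- "C:\\Users\\you\\Documents\\web_auto_scripts\\bitcoin_web_scraping.R"

# The R executable that runs the script; it usually sits in R's bin folder
rscript_exe <- file.path(Sys.getenv("R_HOME"), "bin", "Rscript.exe")

# Start about 62 seconds from now, in hour:minute format
start_time <- format(Sys.time() + 62, "%H:%M")

# Today's date; month-first format assumed here to match the system's date format
start_date <- format(Sys.Date(), "%m/%d/%Y")

if (.Platform$OS.type == "windows") {
  taskscheduleR::taskscheduler_create(
    taskname  = "r_web_scraping_bitcoin",  # hypothetical task name
    rscript   = script_path,
    schedule  = "HOURLY",
    starttime = start_time,
    startdate = start_date,
    Rexe      = rscript_exe
  )
}
```

Passing startdate explicitly is what lets you match a month-first system date format; if your system uses day/month/year, change the format string to "%d/%m/%Y".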
The reason you want to check your logs is to see whether any errors occurred while the script was running and caused it to halt. Any data saved as output, along with the logs, is stored in the same directory as your script, which is the path we gave to the taskscheduler_create() function earlier. I've put a few print statements in my web scraping script to help with debugging, and my actual output is an email alert, so each hour I'll either receive an updated email summarizing Bitcoin events from the last hour or I won't.

There are a couple of other things I wanted to show you. You might decide later to stop running your script, or to delete it altogether. To stop it, simply use the taskscheduler_stop() function and pass it the task name. Let's go ahead and do that; I'll just copy and paste the task name I used above. Great, we've successfully stopped our task. To delete it, we'll use the taskscheduler_delete() function, and that's it.
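A small log check plus the stop and delete calls might look like the sketch below. find_log_problems() is a hypothetical helper: it assumes the log is a .log file named after the script and stored in the same folder (as described above), and that failures show up as R's usual "Error", "Warning" or "Execution halted" lines. The task name must match whatever you passed to taskscheduler_create().

```r
# Hypothetical helper: return log lines that suggest a scheduled run failed
find_log_problems <- function(log_path) {
  if (!file.exists(log_path)) return(character(0))  # no log yet: nothing to report
  log_lines <- readLines(log_path, warn = FALSE)
  # R reports errors as "Error ...", warnings as "Warning ...", then "Execution halted"
  grep("^(Error|Warning|Execution halted)", log_lines, value = TRUE)
}

# Assumed locations: log file named after the script, in the same folder
script_path <- "C:\\Users\\you\\Documents\\web_auto_scripts\\bitcoin_web_scraping.R"
log_path <- sub("\\.R$", ".log", script_path)
print(find_log_problems(log_path))

# Stop, then delete, the scheduled task (Windows only); hypothetical task name
if (.Platform$OS.type == "windows") {
  taskscheduleR::taskscheduler_stop("r_web_scraping_bitcoin")
  taskscheduleR::taskscheduler_delete("r_web_scraping_bitcoin")
}
```

An empty result means no error or warning lines were found, which, combined with the hourly email alert arriving, is a reasonable sign the task is healthy.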
After successfully creating a task, you can close RStudio and the task will keep running in the background of your computer. Something to take note of: you need to keep your computer on for the script to run in the background, so you can't let it go to sleep. Power consumption is something you might want to think about. You can change your power and sleep settings in Windows, and if you're using a Mac, you'll find these under your Energy Saver settings.
Thanks for watching. If you found this video tutorial useful, give us a like. You can also check out our other video tutorials.
Rebecca Merrett - Rebecca holds a bachelor's degree in information and media from the University of Technology Sydney and a postgraduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for game development and has written for tech publications.
© Copyright – Data Science Dojo