Mean Absolute Error for Evaluation
In this video, we will discuss the evaluation of time series model predictions using the mean absolute error, along with Python's statistics and matplotlib packages. You will learn how to plot the differences between the actual and predicted values, and how to calculate the mean absolute error to evaluate the ARIMA model. The video also covers important issues to consider when modeling time series.
What You'll Learn
> Evaluating a time series model using Python
> Plotting actual versus predicted values
> Comparing values for the time series
> Problems with time series models
Code, R & Python Script Repository can be accessed here.
Hi, welcome back to this Data Science Dojo video tutorial series on time series.
In part two we left it at modeling our data and predicting five timestamps ahead into the future. In part three we’ll evaluate our predictions and see how far off the mark they were from the actual values in our holdout data, that is, the last five timestamps of our full sample data set.
So now we’re going to plot actual versus predicted. We’re going to get two versions of our time series: one with all our values, the last five being actual values, and then we’re going to overlay the plot with all the values again but with the last five being predicted values. We should see some difference between those actual and predicted values in the last five timestamps. So first we’re going to read in our entire sample, which includes our last five values as our actual values. And once again we’re going to use the pandas read_csv function. Gonna read in our full sample, or entire dataset. And once again, we’ll use our date time column as our index column, which is the first column, and we will parse these dates.
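A minimal sketch of this loading step. The column names (date_time, sentiment) and the values are made up for illustration, and an in-memory string stands in for the CSV file so the snippet runs on its own. Newer pandas versions removed the squeeze argument of read_csv, so this sketch calls .squeeze("columns") on the result instead.

```python
import io

import pandas as pd

# Illustrative stand-in for the full-sample CSV file: 24 hourly rows.
# The real file name, column names, and values are assumptions.
rows = ["date_time,sentiment"]
for hour in range(24):
    rows.append(f"2018-01-01 {hour:02d}:00:00,{0.5 - 0.01 * hour:.6f}")
csv_file = io.StringIO("\n".join(rows))

# First column becomes the index and is parsed as dates;
# squeezing the single remaining column turns the DataFrame into a Series.
full_sample = pd.read_csv(csv_file, index_col=0, parse_dates=True).squeeze("columns")

print(full_sample.tail())
```

With a real file, the path to the CSV would take the place of csv_file.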
We’ll use this squeeze option to return a series. Now, I want to print the row values, or the index values, of our last five values, our holdout set, as we’re going to input these into another series. So the way to get this, we’ll just call index for those values. And we’re going to get the last five here for our actual. I’m gonna get the index values for these starting at 19 and going up to 23, so the slice ends at 24. And we’re going to print these out, so we can have a look as well. Alright, so these values here are basically the time stamps for our holdout set. We’re going to input these into another series with our prediction values. I’m going to tie our predicted values to each of their time stamps. So another way you can read in a time series is using this Series function here. We were reading from a CSV before, but you can do it this way as well. Give it our predicted values, and we’re gonna create the row index. I’m gonna paste in these values here, just so you can see the last five timestamps. But you can just feed it that, you know, index for values variable. I’ll clean these up a bit. Okay, great. And let’s print this just to make sure that it is in the correct format. Alright, let’s have a look.
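A rough sketch of tying predicted values to their time stamps with the Series constructor. The time stamps and the five predicted numbers are placeholders, not the values from the video.

```python
import pandas as pd

# Hypothetical time stamps for the five holdout rows (positions 19 to 23).
index_for_values = pd.to_datetime(
    [f"2018-01-01 {hour}:00:00" for hour in range(19, 24)]
)

# Placeholder predictions; in the tutorial these come from the ARIMA forecast.
predicted_values = [0.33, 0.32, 0.32, 0.31, 0.31]

# Each predicted value is paired with its time stamp.
predictions = pd.Series(predicted_values, index=index_for_values)
print(predictions)
```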
So we have our predicted values tied to their time stamps now in a series, and what we’re going to do is append that onto our training set, so we have, as I said, one version of our series with the predicted values, and one version with the actual. So let’s go ahead and do that. You can comment these out, as we no longer need to print them. And I’m just going to print the tail end of this, just to make sure it appended onto the end of the training set. Okay, let’s have a look here. Okay, great. So it looks like it successfully appended onto the training set here. So now we have a full series with predicted values and a full series with actual. Okay, now let’s plot the actual versus predicted. I’m going to create a plot here. We’ll start with our predicted values, and I’m just gonna plot them in the color orange. And I’m gonna give it a label so I can add a legend later. And I’m gonna do the same for actual, obviously. And I’ll just color this a different color, so maybe blue. And I’ll also give it a label. And I’m also going to create a legend for this, so we can differentiate these lines. I’ll just place it in the upper left, which is a pretty reasonable location.
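The append-and-plot step could look roughly like this. All values are invented for illustration, pd.concat is used here because Series.append is no longer available in recent pandas releases, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative full sample of 24 hourly values (actual).
hours = pd.to_datetime([f"2018-01-01 {h:02d}:00:00" for h in range(24)])
actual = pd.Series([0.50 - 0.01 * h for h in range(24)], index=hours)

# First 19 values form the training set; the last 5 are the holdout set.
train = actual.iloc[:19]
predictions = pd.Series([0.33, 0.32, 0.32, 0.31, 0.31], index=hours[19:])

# Version of the series whose last five values are predicted.
with_predicted = pd.concat([train, predictions])
print(with_predicted.tail())

# Overlay the two versions so they differ only in the last five time stamps.
fig, ax = plt.subplots()
ax.plot(with_predicted, color="orange", label="Predicted")
ax.plot(actual, color="blue", label="Actual")
ax.legend(loc="upper left")
```

In an interactive session, calling plt.show() after the last line displays the figure.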
So, let’s have a look at our actual versus predicted. See if it was way off the mark or not. Okay, so having a look at this, the predicted kind of follows the same general downward pattern as the actual. It’s quite off the mark here, but we can’t tell exactly how far off the mark. So we need to calculate the mean absolute error as a way of seeing how big these differences between actual and predicted are, so let’s go ahead and do that. So I’ll comment these out. And we’re going to calculate the mean absolute error to evaluate the model and see if there’s a big difference between the actual values and the predicted values, and average over these. So first of all we’ll get our actual values in our holdout set. And we’ll just get the index starting at 19 and ending at 23, so the last five values, which make up our holdout set. We’ll do the same for predicted.
Now we’re going to basically go through and compare each value, so we’re going to take the first actual value and subtract the first predicted value, and then we’ll take the second actual value and subtract the second predicted value, and so on and so forth. And so we’re going to have all these values of the differences between the two, and we’re going to store them in an array called prediction errors. And then at the end of that we’re just going to average over their absolute values to get an idea of, you know, the mean absolute error or the overall error rate here. So, for example, you can take the first actual value, minus the first predicted value. And we’re going to append that onto our prediction errors array. And we want to have a quick look at these differences. See if they’re quite big or not. Alright, let’s have a look at these. Between tabbing and having four spaces, the war begins. We use four spaces in this instance. Just make sure that’s consistent, because Python is sensitive to indentation and runs into these issues all the time. So let’s run this again. Okay, so here are our differences between actual and predicted. So they don’t seem too bad. In some cases they might be quite far off the mark, considering that we have values that go six places after the decimal point.
A difference of 0.2 or 0.25 might be quite big. But the way to really judge this is to average over their absolute values. We’ll store it in a variable called mean absolute error, and we’re going to use the statistics package to get the mean. We take the mean of the absolute values of our prediction errors. And that’s pretty much it. And we obviously want to print this, so let’s have a look at it. Okay, so our mean absolute error is about 0.02, as you can see here. So that basically means that the model is off the mark by about 0.02 on average, and it could be underestimating or overestimating. Considering, as I said, that there are six places past the decimal point, maybe this is quite a big difference. Maybe it’s not too big of a deal. It’s something that we need to consider here. You’d have to think about this and decide whether you would accept this model as it is.
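A small sketch of the calculation just described, using only the standard library. The five actual and predicted values are placeholders, not the numbers from the video.

```python
from statistics import mean

# Placeholder holdout values (positions 19 to 23 of each series).
actual_values = [0.31, 0.30, 0.29, 0.28, 0.27]
predicted_values = [0.33, 0.32, 0.32, 0.31, 0.31]

# Difference for each time stamp: actual minus predicted.
prediction_errors = []
for actual, predicted in zip(actual_values, predicted_values):
    prediction_errors.append(actual - predicted)
print(prediction_errors)

# Average of the absolute differences, so over- and underestimates
# do not cancel each other out.
mean_absolute_error = mean(abs(error) for error in prediction_errors)
print(mean_absolute_error)
```

Taking absolute values before averaging is what separates this from the plain mean error, which could be close to zero even when individual predictions are far off in opposite directions.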
There are a few problems to be aware of in this model. For one, the data might not be entirely stationary, so even though it looked fairly stationary to our judgement when we were plotting it before, a test would help better determine this. So what we could do is use the augmented Dickey-Fuller test to check if those two rounds of differencing that we did resulted in stationary data or not. So let’s have a look here and see why we’re getting a relatively big mean absolute error. And we’re going to print the p-value for this test. If the p-value is greater than 0.05, which is our significance level, we fail to reject the null hypothesis that the data is non-stationary, and if it’s less than or equal to 0.05 we’re going to reject that null hypothesis and say that the data is stationary. So if we want it to be stationary, we want to see it less than or equal to 0.05. Let’s see if this is the case. Okay, let’s print this and have a look. Okay, so we probably wouldn’t accept the model as it is because it’s confirmed that we have stationarity issues with our data; it’s not completely stationary yet. So this could be a reason why it’s a bit off the mark.
So then we need to look at better transforming this data. One way you could do this is to look at, say, stabilizing the variance by applying maybe the cube root, which can handle both negative and positive values. And then you can difference the data. You might also want to compare models with different AR and MA terms. So remember when we printed the summary of our model and there were some terms that weren’t really significant enough to be included in the model? Maybe you look at running a model with just one MA term and see if that makes a difference to the results. Also, another thing to consider is that this is a very small sample size of only 24 timestamps in our entire dataset, 19 in our train set. There might not be enough data to spare for a holdout set. So then to get more out of your data for training, you could look at rolling through the time series a few time stamps at a time to form different holdout sets. And this allows you to train on more time stamps, so keeping the last chunk of time stamps in a single holdout set doesn’t stop the model from capturing them.
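One way to picture the rolling holdout idea is the sketch below. It is an assumption about how such a scheme could be set up, not the method shown in the video: the function name rolling_origin_mae is hypothetical, the data is invented, and a naive "repeat the last training value" forecast stands in for the ARIMA model to keep the example self-contained.

```python
from statistics import mean


def rolling_origin_mae(values, min_train_size, horizon):
    """Average the mean absolute error over several train/holdout splits.

    Each split trains on values[:train_end] and holds out the next
    `horizon` values; train_end then moves forward one step at a time.
    """
    fold_errors = []
    for train_end in range(min_train_size, len(values) - horizon + 1):
        train = values[:train_end]
        holdout = values[train_end:train_end + horizon]
        # Naive stand-in forecast: repeat the last training value.
        forecast = [train[-1]] * horizon
        fold_errors.append(
            mean(abs(actual - predicted)
                 for actual, predicted in zip(holdout, forecast))
        )
    return mean(fold_errors)


# Invented series of 24 time stamps with a steady downward pattern.
values = [0.50 - 0.01 * hour for hour in range(24)]
score = rolling_origin_mae(values, min_train_size=14, horizon=5)
print(score)
```

With 24 values, a minimum training size of 14, and a horizon of 5, this produces six splits, and the final split is the same 19/5 division used in the tutorial.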
Another thing is that the data only looks at 24 hours in one day. I mean, would we start to capture more of a trend in hourly sentiment if we collected data over several days? How would you go about collecting more data? So that’s something else to think about. So what I would like you to do now is take on this challenge and further improve on this model. So you’ve been given a head start, and now I want you to take this example and improve on it. Sometimes we get into the habit of just following along and copying what somebody else is doing, but I want you to think critically about this, and think about some of the issues that we talked about and how you can take this further. To study time series further, you also need to understand things like model diagnostics and using the AIC to search for the best model parameters. You need to be able to handle any datetime data issues. You might want to try other modeling techniques. So, time series is something that we plan to introduce in Data Science Dojo’s post bootcamp material, but you can learn more during a short, intense bootcamp. We cover some key machine learning algorithms and techniques, and we take you through the critical thinking process behind many data science tasks.
You can check out the curriculum below this video. But keep fine-tuning and keep practicing. Thanks for watching.
Rebecca Merrett - Rebecca holds a bachelor’s degree of information and media from the University of Technology Sydney and a post graduate diploma in mathematics and statistics from the University of Southern Queensland. She has a background in technical writing for games dev and has written for tech publications.