I have some time series in graphite and I'd like to do a forecast of how the series will continue into the future.
It feels like it should be simple: append the changes from the last 7 days after the current time, so the series stretches 7 days into the future.
But I've found nothing. It doesn't even look possible to have the x/time-axis stretch into the future.
Is this really not possible?
No, this is not possible natively in Graphite, although you can combine it with R to do this. Here's one example:
https://roidelapluie.be/blog/2015/05/13/r-and-graphite/
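Here is a minimal sketch of that approach (the host graphite.example.com and the metric my.metric are placeholders, and it assumes the series is stored at one point per day): pull the history through Graphite's render API in CSV format, then forecast it with Holt-Winters in R.

# Fetch the last 4 weeks of the metric as CSV (columns: metric, timestamp, value)
url <- paste0("http://graphite.example.com/render",
              "?target=my.metric&from=-28d&format=csv")
raw <- read.csv(url, header = FALSE,
                col.names = c("metric", "timestamp", "value"))

# Daily data with weekly seasonality (frequency = 7)
y   <- ts(na.omit(raw$value), frequency = 7)
fit <- HoltWinters(y)
predict(fit, n.ahead = 7)   # forecast the next 7 days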
I am writing a machine learning project (I am quite new to this) and now I have gotten a little stuck as to what to do next.
I have two somewhat small datasets: one has the timestamps at which the outputs happened, the other is the same but with the input timestamps. Both are in the format year/month/day/hour/minute/second.
I have done quite a bit of feature engineering: I split these columns, and looked at the differences between the nearest inputs and between the nearest outputs to better understand the time lags and the downtime. I have made a lot of visualizations to see where I can go from here, and now I am quite stuck; there aren't any obvious patterns that I can see.
I do not need to do time series forecasting, and am now trying to do anomaly detection on what I have.
My issue is that I have no idea what I should do with this next, maybe you have some advice on what algorithms I can apply?
I am also stuck on whether I can connect each input to its output timestamp; are there any obvious methods that are usually applied to do that?
I mainly want to see patterns and deviations in the data; I have also tried looking at the scrap data that is generated. I do not really know which models or experiments would be good to apply and try out in my case.
Are there any data mining methods you could advise me to use?
It sounds like you are on the right track!
Here are some ideas to consider:
Is there a trend by day of week? Are weekends peak or not?
Does the hour of the day combined with day of week make a difference?
Have you looked at volume in combination with other variables? A spike in traffic on Wednesday night at 2am could be a red flag.
Basically, I'd try to encode seasonality (hour, day of week, month, year, etc.) into your data.
Link: How to use machine learning for anomaly detection and condition monitoring (Mahalanobis distance)
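Here is a minimal sketch of those ideas (the data frame and its columns are purely illustrative): engineer hour, day-of-week, and gap features from a timestamp column, then flag unusual rows with the Mahalanobis distance from base R.

# Illustrative data: 200 random event timestamps over two weeks
df <- data.frame(timestamp = Sys.time() + sort(runif(200, 0, 14 * 24 * 3600)))

df$hour <- as.integer(format(df$timestamp, "%H"))
df$dow  <- as.integer(format(df$timestamp, "%u"))   # 1 = Monday ... 7 = Sunday
df$gap  <- c(NA, diff(as.numeric(df$timestamp)))    # seconds since previous event

feats <- na.omit(df[, c("hour", "dow", "gap")])
d2    <- mahalanobis(feats, colMeans(feats), cov(feats))

# Flag rows whose squared distance exceeds the 97.5% chi-squared quantile
anomalies <- feats[d2 > qchisq(0.975, df = ncol(feats)), ]
head(anomalies)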
I have a question about using RStudio to fit ARIMA models to more than one time series.
For example, I have three clients, each with a time series over 4 periods:
Client 1   Client 2   Client 3
    1          3          7
    2          5          3
    4          3          1
    5          8          9
Now I want to predict the next period after 5/8/9. I know how to use ARIMA to predict the time series one by one, but in practice I have lots of clients and doing it that way takes too much time. Could you please show me how to use a loop, lapply, or something similar to make this easier?
Also, when picking the ARIMA order, I only know how to look at plots of the ACF and PACF to identify the MA and AR orders, which will not work for a large number of time series; it seems unwise to draw hundreds of figures. Do you have any good advice on choosing the ARIMA order? Thank you!
apply would help you run the same function on every column (if each column is one time series). Please learn how to use it; there are plenty of examples on the internet.
If you have a lot of time series and you want to avoid manual work, auto.arima (from the forecast package) should come in handy; see the sketch after the points below. In case you're not satisfied with its results:
1. Try finding general rules first. That is, if you know that each of the time series will be seasonal (with the same season length), you know that you need seasonal differencing for all of them. The same can be said about a long-term trend. These inferences could also be made algorithmically.
2. Irrespective of whether point 1 is applicable, to decide the best parameter values you have to code up the logic you would use to choose the AR and MA orders manually. The simpler your manual logic, the easier it is to code. One simple rule of thumb is to keep AR + MA = 1: choose AR if the ACF falls off gradually, MA if it falls off rapidly. What counts as rapid versus gradual would have to be decided by your code.
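Here is a minimal sketch of looping auto.arima over many series with lapply (the data frame clients below is just an illustration, one column per client; with only 4 observations auto.arima will fit very simple models, but the same pattern scales to hundreds of series).

library(forecast)

clients <- data.frame(client1 = c(1, 2, 4, 5),
                      client2 = c(3, 5, 3, 8),
                      client3 = c(7, 3, 1, 9))

one_step_ahead <- lapply(clients, function(x) {
  fit <- auto.arima(ts(x))        # order chosen automatically via an AICc search
  forecast(fit, h = 1)$mean[1]    # point forecast for the next period
})
one_step_ahead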
I have a problem with clustering time series in R.
I googled a lot and found nothing that fits my problem.
I have made an STL decomposition of the time series.
The trend component is in a matrix with 64 columns, one for every series.
Now I want to cluster these series into similar groups, taking into account both the curve shapes and the time shift. I found some functions that cover one of these aspects but not both.
First I tried to calculate a distance matrix with the DTW distance; that gave me clusters based on the values and accounted for the time shift, but not for the shape of the time series. After this I tried some correlation-based clustering, but then the time shift was not recognized and the result did not satisfy my requirements.
Is there a function that could cover my problem, or do I have to build something on my own? I'm thankful for any kind of help; after two days of tutorials and examples I am totally uninspired. I hope I could explain the problem well enough.
I attached a picture showing some example time series. There you can see the problem: the two series in the middle are put into one cluster, although the upper series and the one at the bottom each have the same shape as one of the middle ones.
Have you tried the R package dtwclust?
https://cran.r-project.org/web/packages/dtwclust/index.html
(I'm just starting to explore this package, but it seems like it covers a lot of aspects of time series clustering and it has lots of good references.)
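For instance, here's a minimal sketch with dtwclust (assuming trend_mat is your 64-column matrix of trend components; the number of clusters is just a guess).

library(dtwclust)

# tsclust() wants one series per list element
series_list <- lapply(seq_len(ncol(trend_mat)), function(i) trend_mat[, i])

# Partitional clustering with a DTW-based distance and PAM centroids
clus <- tsclust(series_list,
                type     = "partitional",
                k        = 4L,
                distance = "dtw_basic",
                centroid = "pam")

plot(clus)            # inspect the clustered series
table(clus@cluster)   # cluster sizes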
You can use the kml package; it is designed specifically for longitudinal data. You can consult its help, which includes the following example:
library(kml)

### Generation of some artificial data
cld1 <- generateArtificialLongData(25)

### We suspect 3, 4 or 6 clusters and want 3 redrawings.
### We want to "see" what happens (toPlot='both' shows the criterion and the trajectories)
kml(cld1, c(3, 4, 6), 3, toPlot = 'both')

### 4 seems to be the best. We want 10 more redrawings.
### We don't want to watch again; we want the result as fast as possible.
kml(cld1, 4, 10)
[Figure: example of the resulting clusters]
First time poster here, so please forgive any faux pas on my part.
I have a set of data which consists of essentially 3 fields:
1) Position
2) Start_of_shift (datetime object)
3) End_of_Shift (datetime object)
From the datetime objects I can extract the date, day of week, and time. The schedules are 24/7 and do not conform to any standard three-shift rotation; they are fairly specific to each site. (I am using the lubridate package.)
I would like to visualize Time of day vs. Day of Week to show numbers of staff, so that I can see heavy concentrations of staff and where I am light at specific days and times.
I am unsure how to approach this problem, as I am relatively new to R and have found the various date-time packages and base utilities confusing and often in conflict with each other. While I find plenty of examples of time series plotting, I have found next to nothing on how to plot when you have a start and an end time in separate fields and want to show areas of overlap.
I was thinking of using ggplot2 with geom_tile, plus a smoother, but wanted to know if there are any good examples out there that do something similar, or if anyone has an idea of how I should transform my data to best achieve my objective. I wanted to keep the time continuous, but as a last resort I will discretize it into 15-minute chunks if necessary; I didn't know if there were other options.
Any thoughts?
You might consider using a Gantt chart; the gantt.chart function in the plotrix package is one option for creating them.
Maybe the timeline package is what you need. I've found it very good for planning projects. It's on CRAN, but you can see a quick example at its GitHub home here.
To work out how many people are present (or should be if it's a future event) you need to think of your staffing as a stock / flow.
The first step would be to use the melt function in the reshape2 package to get all the dates in one column and the event type (starting/finishing) in another.
From this you can create a running total of how many people will be in at any time.
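A minimal sketch of that stock/flow idea (the column names are taken from the question; the sample rows are made up): melt the start and end times into one column, then keep a running head count.

library(reshape2)

shifts <- data.frame(
  Position       = c("Guard", "Nurse", "Guard"),
  Start_of_shift = as.POSIXct(c("2016-01-04 22:00", "2016-01-05 06:00", "2016-01-05 06:00")),
  End_of_Shift   = as.POSIXct(c("2016-01-05 06:00", "2016-01-05 14:00", "2016-01-05 14:00"))
)

# One row per event: +1 when a shift starts, -1 when it ends
events <- melt(shifts, id.vars = "Position",
               variable.name = "event", value.name = "time")
events$change <- ifelse(events$event == "Start_of_shift", 1L, -1L)

events <- events[order(events$time), ]
events$staff_on_duty <- cumsum(events$change)   # head count at each event time
events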
In R, how can you use Holt-Winters smoothing for a financial ("business-day")-based time series?
(For example, a stock data time series has an irregular time index).
You don't, for the reasons I gave you in response to your previous question today: because HoltWinters() needs a regular ts object, you cannot (easily) use it on irregular time series.
You can approximate it by, say, sampling every Wednesday and creating 52-week years from that. But there is no way around the basic fact that "business day"-based series are irregular.
As Dirk said, there is no solid way to do this. Even if it runs (with gamma = FALSE), it will use a fixed gain on each observation; that is, it will ignore the fact that a weekend is three times longer than your other time deltas.
It gets worse with intraday data. I think your best bet is to implement the Holt-Winters filter yourself. It's actually not all that hard...
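As a starting point, here is a minimal sketch of a hand-rolled Holt filter (level plus trend, no seasonal term), assuming a numeric price vector y; the alpha and beta values are illustrative only. Because you update observation by observation, you can ignore the calendar gaps, or scale the trend update by the gap length if you prefer.

holt_filter <- function(y, alpha = 0.2, beta = 0.1) {
  n     <- length(y)
  level <- numeric(n)
  trend <- numeric(n)
  level[1] <- y[1]
  trend[1] <- y[2] - y[1]
  for (t in 2:n) {
    level[t] <- alpha * y[t] + (1 - alpha) * (level[t - 1] + trend[t - 1])
    trend[t] <- beta * (level[t] - level[t - 1]) + (1 - beta) * trend[t - 1]
  }
  list(level = level, trend = trend,
       forecast = level[n] + trend[n])   # one-step-ahead forecast
}

# Example: holt_filter(cumsum(rnorm(100)))$forecast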