Suggestions for clustering methods in R

I have two time series of meteorological measurements (call them X and Y). Both the X and Y time series were constructed from daily measurements over a period of one year. By plotting the X time series versus the Y time series as a scatterplot and connecting the points by date in ascending order, a closed loop is obtained that represents the annual cycle. I have measurements at N locations, so I have N loops (i.e., annual cycles), which I want to cluster to find those with similar shapes.
With so many clustering methods available, I am not sure which one is most appropriate for this analysis (initially I was thinking of using self-organizing maps).
Thank you very much for any suggestions.

Unless you have too many time series, I suggest starting with hierarchical clustering. It is easy to interpret thanks to the dendrogram.
For the similarity measure, a cyclic version of DTW (dynamic time warping) may work well, assuming there is some delay between different locations.
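A rough sketch of that idea, assuming the N loops are stored in a list named loops (a hypothetical name), each element being a days-by-2 matrix holding the X and Y measurements; it uses plain (non-cyclic) DTW from the dtw package as a starting point:

library(dtw)   # dynamic time warping distances

N <- length(loops)
d <- matrix(0, N, N)
for (i in seq_len(N - 1)) {
  for (j in (i + 1):N) {
    ## multivariate DTW distance between loop i and loop j
    d[i, j] <- d[j, i] <- dtw(loops[[i]], loops[[j]])$distance
  }
}

## hierarchical clustering on the DTW distance matrix
hc <- hclust(as.dist(d), method = "average")
plot(hc)                     # dendrogram
groups <- cutree(hc, k = 4)  # e.g. cut the tree into 4 groups of similar loops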

Related

Time series with multiple observations per unit of time

I have a dataset of the daily spreads of 500 stocks. My eventual goal is to build a model using extreme value theory. However, as one of the first steps, I want to check my data for volatility clustering and leptokurtosis. So I first want R to treat my data as a time series, and I want to plot them. However, I only find examples of time series with a single observation per unit of time. Is there a way for R to treat my type of dataset as a time series, and what is the best way to plot it?
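One possible starting point, purely as a hedged sketch: if the spreads sit in a numeric matrix with one column per stock and one row per trading day (the object names spreads and dates below are assumptions), the zoo package can treat it as a multivariate time series and plot it.

library(zoo)

## spreads: numeric matrix, one row per day, one column per stock (assumed)
## dates:   vector of class Date, one entry per row (assumed)
z <- zoo(spreads, order.by = dates)

## plotting all 500 columns is unreadable; inspect a few panels instead
plot(z[, 1:4], main = "Daily spreads", xlab = "Date")

## or overlay a few series in a single panel
plot(z[, 1:4], plot.type = "single", col = 1:4)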

DBSCAN on a highly dense dataset in R

I have recently been studying DBSCAN with R for transit research purposes, and I am hoping someone can help me with this particular dataset.
A summary of my dataset is shown below (the first column is the data frame's row index):

      BTIME ATIME
1029  20001 21249
2944  24832 25687
6876  25231 26179
11120 20364 21259
11428 25550 26398
12447 24208 25172
What I am trying to do is cluster these data with BTIME on the x axis and ATIME on the y axis. Each (BTIME, ATIME) pair represents the boarding time and arrival time of a subway passenger.
For more explanation, I will add the scatter plot of my total dataset.
However, if I split my dataset into smaller time periods, the scatter plot looks like this. I would call this a sample dataset.
If I perform DBSCAN clustering on the second image (the sample dataset), the clustering works as expected.
However, it seems that DBSCAN cannot find clusters in the total dataset at smaller scales, maybe because the data are too dense.
So my questions are:
Is there a way I can perform clustering on the total dataset?
What criteria should be used to split the data into time scales?
I think the total dataset is highly dense, which is why I tried clustering on a sample time period.
If I separate my total data into smaller time scales, how should I choose the hyperparameters for each separated dataset? If I look at the data, the distribution is similar in both the total dataset and the separated sample dataset.
I would sincerely appreciate any advice.
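Not a definitive answer, but a hedged sketch of one common workflow with the dbscan package: scale the two time columns, use the k-nearest-neighbour distance plot to pick eps, then run DBSCAN (the data frame name df below is an assumption).

library(dbscan)

xy <- scale(df[, c("BTIME", "ATIME")])   # standardise both axes

## pick eps from the "knee" of the 4-NN distance plot
kNNdistplot(xy, k = 4)
abline(h = 0.05, lty = 2)                # candidate eps, read off the plot

cl <- dbscan(xy, eps = 0.05, minPts = 5)
plot(df$BTIME, df$ATIME, col = cl$cluster + 1L, pch = 20)  # cluster 0 = noise

On the dense full dataset you will typically need a smaller eps (or a larger minPts) than on a sample period, which is one reason the same settings do not carry over between the two.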

[R+zoo]: Operations on time series with different temporal resolutions

I have two time series (sensor data) with different temporal resolutions. One time series of class "xts"/"zoo" (TS1) contains hourly values, and the other time series (TS2) has a higher temporal resolution, with one observation every 10 minutes. That is, for TS1 I have 24 data points (observations) per day and for TS2 I have 144 data points per day.
When I calculate TS1 - TS2 for one day, I get a result with 24 data points (the lower temporal resolution). What I would like to achieve is a result with 144 data points (matching TS2, the higher resolution).
Is it possible to achieve this in R?
P.S.:
This is not a trivial problem, because within an hourly interval I have only one observation from TS1 but six observations from TS2. I can imagine solving it by drawing a fitted line between every two points of TS1 and calculating the difference between that line and the data points from TS2, but I know of no R function that does this.
You can approximate the missing values using na.approx for linear/constant interpolation or na.spline for spline interpolation.
## new index to be used
new.index <- seq(min(index(TS1)), max(index(TS1)),
                 by = as.difftime(10, units = "mins"))
## linear approximation onto the new index
TS1.new <- na.approx(merge(TS1, xts(NULL, new.index)))
Now you can subtract your time series (though you should check that they have the same start dates):
TS2 - TS1.new
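A variant of the same idea, sketched under the assumption that TS2's timestamps are exactly the 10-minute marks: interpolate TS1 directly onto TS2's index, so the difference is defined at all 144 daily observations.

## interpolate TS1 onto TS2's timestamps, then subtract
## (timestamps outside TS1's range remain NA, since na.approx does not extrapolate)
TS1.on.TS2 <- na.approx(merge(TS1, xts(NULL, index(TS2))))[index(TS2)]
diff.series <- TS2 - TS1.on.TS2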

How to compare forecast graphs for two different time series in R?

I want to compare the forecast graphs for two different time series. I have five years of monthly rainfall data for two different cities. I have plotted each city's five-year series and forecast two further years using the forecast package. Now I want to compare these two graphs and their two-year predictions (perhaps in terms of error).
Can anyone help me with this?
You could start with something like this:
f1 <- forecast(series1, h=24)
f2 <- forecast(series2, h=24)
accuracy(f1)
accuracy(f2)
That will give you a lot of error measures on the historical data. Unless you have the actual data for the future periods, you can't do much more than that.
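For a visual comparison of the two forecast graphs, a minimal sketch (assuming f1 and f2 from above):

## plot both forecasts one above the other for visual comparison
par(mfrow = c(2, 1))
plot(f1, main = "City 1: 24-month forecast")
plot(f2, main = "City 2: 24-month forecast")
par(mfrow = c(1, 1))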

Fast Fourier Transform in R

I have a dataset with the number of hourly visits an animal made during a period of 12 months. I want to use the fast Fourier transform to examine cyclical patterns and periodicity. In the past, I have used Statistica for this; however, I would like to use R to get a plot of the spectral density vs. period. Is there an easy way to do this in R? I would like to identify 12 and 24 hr peaks in activity if possible.
You may consider the following functions.
periodogram from TSA package immediately plots a periodogram.
periodogram from GeneCycle returns a list of frequencies and estimated power spectral densities. It is a wrapper function for stats::spectrum with some special options set.
spectrum from stats allows you to choose the method used to estimate the spectral density: either the periodogram or an autoregressive-process fit.
cpgram from stats plots a cumulative periodogram along with a confidence interval.
See, e.g., ?cpgram or ?spectrum for all the details, and keep in mind that you need TSA::periodogram and GeneCycle::periodogram when the function names coincide.
There are also plenty of examples and tutorials available online on how to use these functions, including the usage of fft.
Also, as you probably already know, the time series should be detrended first; hence use, e.g., diff(x) instead of x. Finally, the length of your time series must be divisible by 12 to be able to identify the 12- and 24-hour frequencies; this can be achieved by, e.g., x[-(1:(length(x) %% 12))], where x is the detrended time series.
Use spectrum to do a spectral density analysis; also fft for the base fast Fourier transform.
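Putting the earlier advice together, a rough sketch assuming x is a series of hourly visit counts, so that one observation corresponds to one hour and 1/frequency below is a period in hours:

## detrend, as suggested above (coerce to a plain vector so spectrum
## treats one observation as one time unit, i.e. one hour)
x.d <- diff(as.numeric(x))

## trim so the length is divisible by 12 (guard against a zero remainder)
r <- length(x.d) %% 12
if (r > 0) x.d <- x.d[-(1:r)]

## raw periodogram; 1/frequency gives the period in hours
sp <- spectrum(x.d, plot = FALSE)
plot(1 / sp$freq, sp$spec, type = "h",
     xlab = "Period (hours)", ylab = "Spectral density", xlim = c(0, 48))
abline(v = c(12, 24), lty = 2)   # look for peaks near 12 h and 24 h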
