Wind Speed time series simulation in R - r

Following up from an R blog which is interesting and quite useful to simulate the time series of an unknown area using its Weibull parameters.
Although this method gives a reasonably good estimate of time series as a whole it suffers a great deal when we look for seasonal changes.
Lets see an example:
This method would give wind speeds for say below months, for a particular set of Weibull parameters as:
Jan 7.492608
Feb 7.059587
March 7.261821
Apr 7.192106
May 7.399982
Jun 7.195889
July 7.290898
Aug 7.210269
Sept 7.219063
Oct 7.307073
Nov 7.135451
Dec 7.315633
It can be seen that the variation in wind speed is not that much and in reality, the variation will change from one month to another. If I were to prioritise a certain month say July and June over months of November and December such that the Weibull remains unchanged. How would I do it?
Any lead or advice to make these change in the code listed in the link above would be of great help.
On request here is the sample code.
MeanSpeed<-7.29 ## Mean Yearly Wind Speed at the site.
Shape=2; ## Input Shape parameter.
Scale=8 ##Calculated Scale Parameter.
MaxSpeed<-17
nStates<-16
These are the inputs in the blog, the MeanSpeed is the average annual wind speed at a location that has Shape and Scale parameters as provided. The MaxSpeed is the maximum speed possible over the year.
I would like to have Maxspeed for each month say Maxspeed_Jan, Maxspeed_feb ...till Maxspeed_dec. All with different values. This should be able to reflect the seasonallity in the Wind Speed variations across the year.
Then Calculate the following in a certain way that would reflect this variation in the output time series.
nRows<-nStates;
nColumns<-nStates;
LCateg<-MaxSpeed/nStates;
WindSpeed=seq(LCateg/2,MaxSpeed-LCateg/2,by=LCateg) ## Fine the velocity vector-centered on the average value of each category.
##Determine Weibull Probability Distribution.
wpdWind<-dweibull(WindSpeed,shape=Shape, scale=Scale); # Freqency distribution.
plot(wpdWind,type = "b", ylab= "frequency", xlab = "Wind Speed") ##Plot weibull probability distribution.
norm_wpdWind<-wpdWind/sum(wpdWind); ## Convert weibull/Gaussian distribution to normal distribution.
## Correlation between states (Matrix G)
g<-function(x){2^(-abs(x))} ## decreasing correlation function between states.
G<-matrix(nrow=nRows,ncol=nColumns)
G <- row(G)-col(G)
G <- g(G)
##--------------------------------------------------------
## iterative process to calculate the matrix P (initial probability)
P0<-diag(norm_wpdWind); ## Initial value of the MATRIX P.
P1<-norm_wpdWind; ## Initial value of the VECTOR p.
## This iterative calculation must be done until a certain error is exceeded
## Now, as something tentative, I set the number of iterations
steps=1000;
P=P0;
p=P1;
for (i in 1:steps){
r<-P%*%G%*%p;
r<-as.vector(r/sum(r)); ## The above result is in matrix form. I change it to vector
p=p+0.5*(P1-r)
P=diag(p)}
## $$ ----Markov Transition Matrix --- $$ ##
N=diag(1/as.vector(p%*%G));## normalization matrix
MTM=N%*%G%*%P ## Markov Transition Matrix
MTMcum<-t(apply(MTM,1,cumsum));## From the MTM generated the accumulated
##-------------------------------------------
## Calculating the series from the MTMcum
##Insert number of data sets.
LSerie<-52560; Wind Speed every 10 minutes for a year.
RandNum1<-runif(LSerie);## Random number to choose between states
State<-InitialState<-1;## assumes that the initial state is 1 (this must be changed when concatenating days)
StatesSeries=InitialState;
## Initallise----
## The next state is selected to the one in which the random number exceeds the accumulated probability value
##The next iterative procedure chooses the next state whose random number is greater than the cumulated probability defined by the MTM
for (i in 2:LSerie) {
## i has to start on 2 !!
State=min(which(RandNum1[i]<=MTMcum[State,]));
## if (is.infinite (State)) {State = 1}; ## when the above condition is not met max -Inf
StatesSeries=c(StatesSeries,State)}
RandNum2<-runif(LSerie); ## Random number to choose between speeds within a state
SpeedSeries=WindSpeed[StatesSeries]-0.5+RandNum2*LCateg;
##where the 0.5 correction is needed since the the WindSpeed vector is centered around the mean value of each category.
print(fitdistr(SpeedSeries, 'weibull')) ##MLE fitting of SpeedSeries
The obtained result must resemble the input Scale and Shape parameters. And instead of getting uniform wind speed of each month the variation will reflect the input max wind speeds of each month.
Thank you.

Related

How to create and analyze a time series with variable test frequency in R

Here is a short description of the problem I am trying to solve: I have test data for multiple variables (weight, thickness, absorption, etc.) that are taken at varying intervals over time - no set schedule, sometimes a test a day, sometimes days might go between tests. I want to detect trends in each of these and alert stake holders when any parameter is trending up/down more than a certain amount. I first did a linear model between each variable's raw data and test time (I converted the test time to days or weeks since a fixed date) and create a table with slopes for each variable - so the stake holders can view one table for all variables and quickly see if any of them is raising concern. The issue was that the data for most variables is very noisy. Someone suggested using time series functions, separating noise and seasonality from the trends, and studying the trend component for a cleaner analysis. I started to look into this and see a couple concerns/questions already:
Time series analysis seems to require specifying a frequency - how do you handle this if your test data is not taken at regular intervals
If one gets over the issue in #1 above, decomposes the data, and gets the trend separated out (ie. take out particularly the random variation/noise), how would you then get a slope metric from that? Namely, if I wanted to then fit a linear model to the trend component of the raw data (after decomposing), what would be the x (independent) variable? Is there a way to connect the trend component of the ts-decompose function with the original data's x-axis data (in this case the actual test date/times, say converted to weeks or days from a fixed date)?
Finally, is there a better way of accomplishing what I explained above? I am only looking for general trends over time - say over 3 months of data, not day to day trends.
Time series are generally used to see if previous observations of a variable have influence on future observations. You would model under the assumption that the previous observations are able to predict the future observations. That is the reason for that most (not all) time series models require evenly spaced instances of training data. If your data is not only very noisy, but also not collected on a regular basis, then you should seriously consider if time series is the appropriate choice of modelling.
Time series analysis seems to require specifying a frequency - how do you handle this if your test data is not taken at regular intervals.
What you can do, is creating an aggregate by increasing the time bucket (shift from daily data to a weekly average for instance) such that every unit of time has an instance of training data. Following your final comment, you could create the average of the observations of the last 3 months of data instead from the observations.
If one gets over the issue in #1 above, decomposes the data, and gets the trend separated out (ie. take out particularly the random variation/noise), how would you then get a slope metric from that? Namely, if I wanted to then fit a linear model to the trend component of the raw data (after decomposing), what would be the x (independent) variable?
In the simplest case of a linear model, the independent variable is the unit of time corresponding to the prediction you are trying to make. However this is not always regarded a time series model.
In the case of an autoregressive model, this would be the previous observation of what you are trying to predict, something similar to y(t) = x(t-1), for instance multiplied by a smoothing factor. I encourage you to read Forecasting: principles and practice which is an excellent book on the matter.
Is there a way to connect the trend component of the ts-decompose function with the original data's x-axis data (in this case the actual test date/times, say converted to weeks or days from a fixed date)?
The function decompose.ts returns a list which includes trend. Trend is a vector of the estimated trend components corresponding to it's respective time value.
Let's create an example time series with linear trend
df <- data.frame(
date = seq(from = as.Date("2021-01-01"), to = as.Date("2021-01-10"), by=1)
)
df$value <- jitter(seq(from = 1, to = nrow(df), by=1))
time_series <- ts(df$value, frequency = 5)
df$trend <- decompose(time_series)$trend
> df
date value trend
1 2021-01-01 0.9170296 NA
2 2021-01-02 1.8899565 NA
3 2021-01-03 3.0816892 2.992256
4 2021-01-04 4.0075589 4.042486
5 2021-01-05 5.0650478 5.046874
6 2021-01-06 6.1681775 6.051641
7 2021-01-07 6.9118942 7.074260
8 2021-01-08 8.1055282 8.041628
9 2021-01-09 9.1206522 NA
10 2021-01-10 9.9018900 NA
As you see, the trend component already is an estimate of the dependent variable at the corresponding time. In decompose the estimate of trend is based on a moving average.

How to eliminate zeros in simulated data from rnorm function

I have a large set of high frequency data of wind. I use this data in a model to calculate gas exchange between atmosphere and water. I am using the average wind of a 10-day series of measurements to represent gas exchange at a given time. Since the wind is an average value from a 10-day series I want to apply the error to the output by adding the error to the input:
#fictional time series, manually created by me.
wind <- c(0,0,0,0,0,4,3,2,4,3,2,0,0,1,0,0,0,0,1,1,4,5,4,3,2,1,0,0,0,0,0)
I then create 100 values around the mean and sd of the wind vector:
df <- as.data.frame(mapply(rnorm,mean=mean(wind),sd=sd(wind),n=100))
The standard deviation generates negative values. If these are run in the gas exchange model I get disproportionately large error simply because wind speed can't be negative and the model is not constructed to be capable to run with negative wind measurements. I have been suggested to log transform the raw data and run the rnorm() with logged values, and then transform back. But since there are several zeros in the data (0=no wind) I can't simply log the values. Hence I use the log(x+c) method:
wind.log <- log(wind+1)
df.log <- as.data.frame(mapply(rnorm,
mean=mean(wind.log),
sd=sd(wind.log),n=100))
However, I will need to convert values back to actual wind measurements before running them in the model.
This is where it gets problematic, since I will need to use exp(x)-c to convert values back and then I end up with negative values again.
Is there a way to work around this without truncating the 0's and screwing up the generated distribution around the mean?
My only alternative is otherwise is to calculate gas exchange directly at every given time point and generate a distribution from that, those values would never be negative or = 0 and can hence be log-transformed.
Suggestion: use a zero-inflated/altered model, where you generate some proportion of zero values and draw the rest from a log-normal distribution(to make sure you don't get negative values):
wind <- c(0,0,0,0,0,4,3,2,4,3,2,0,0,1,0,0,0,0,1,1,4,5,4,3,2,1,0,0,0,0,0)
prop_nonzero <- mean(wind>0)
lmean <- mean(log(wind[wind>0]))
lsd <- sd(log(wind[wind>0]))
n <- 500
vals <- rbinom(n, size=1,prob=prop_nonzero)*rlnorm(n,meanlog=lmean,sdlog=lsd)
Alternatively you could use a Tweedie distribution (as suggested by #aosmith), or fit a censored model to estimate the distribution of wind values that get measured as zero (assuming that the wind speed is never exactly zero, just too small to measure)

Periodogram (TSA In R) can't find correct frequency

I'm trying to process a sinusoidal time series data set:
I am using this code in R:
library(readxl)
library(stats)
library(matplot.lib)
library(TSA)
Data_frame<-read_excel("C:/Users/James/Documents/labssin2.xlsx")
# compute the Fourier Transform
p = periodogram(Data_frame$NormalisedVal)
dd = data.frame(freq=p$freq, spec=p$spec)
order = dd[order(-dd$spec),]
top2 = head(order, 5)
# display the 2 highest "power" frequencies
top2
time = 1/top2$f
time
However when examining the frequency spectrum the frequency (which is in Hz) is ridiculously low ~ 0.02Hz, whereas it should have one much larger frequency of around 1Hz and another smaller one of 0.02Hz (just visually assuming this is a sinusoid enveloped in another sinusoid).
Might be a rather trivial problem, but has anyone got any ideas as to what could be going wrong?
Thanks in advance.
Edit 1: Using
result <- abs(fft(df$Data_frame.NormalisedVal))
Produces what I am expecting to see.
Edit2: As requested, text file with the output to dput(Data_frame).
http://m.uploadedit.com/bbtc/1553266283956.txt
The periodogram function returns normalized frequencies in the [0,0.5] range, where 0.5 corresponds to the Nyquist frequency, i.e. half your sampling rate. Since you appear to have data sampled at 60Hz, the spike at 0.02 would correspond to a frequency of 0.02*60 = 1.2Hz, which is consistent with your expectation and in the neighborhood of what can be seen in the data your provided (the bulk of the spike being in the range of 0.7-1.1Hz).
On the other hand, the x-axis on the last graph you show based on the fft is an index and not a frequency. The corresponding frequency should be computed according to the following formula:
f <- (index-1)*fs/N
where fs is the sampling rate, and N is the number of samples used by the fft. So in your graph the same 1.2Hz would appear at an index of ~31 assuming N is approximately 1500.
Note: the sampling interval in the data you provided is not quite constant and may affect the results as both periodogram and fft assume a regular sampling interval.

Adding seasonal variations to wind speed time series

Following up from an R blog which is interesting and quite useful to simulate the time series of an unknown area using its Weibull parameters.
Although this method gives a reasonably good estimate of time series as a whole it suffers a great deal when we look for seasonal changes. To account for seasonal changes I want to employ seasonal maximum wind speeds and carry out the time series synthesis such that the yearly distribution remains constant ie. shape and scale parameters (annual values).
I want to employ seasonal maximum wind speeds to the below code by using 12 different maximum wind speeds, one each for every month. This will allow greater wind speeds at certain month and lower in others and should even out the resultant time series.
The code follows like this:
MeanSpeed<-7.29 ## Mean Yearly Wind Speed at the site.
Shape=2; ## Input Shape parameter (yearly).
Scale=8 ##Calculated Scale Parameter ( yearly).
MaxSpeed<-17 (##yearly)
## $$$ 12 values of these wind speed one for each month to be used. The resultant time series should satisfy shape and scale parameters $$ ###
nStates<-16
nRows<-nStates;
nColumns<-nStates;
LCateg<-MaxSpeed/nStates;
WindSpeed=seq(LCateg/2,MaxSpeed-LCateg/2,by=LCateg) ## Fine the velocity vector-centered on the average value of each category.
##Determine Weibull Probability Distribution.
wpdWind<-dweibull(WindSpeed,shape=Shape, scale=Scale); # Freqency distribution.
plot(wpdWind,type = "b", ylab= "frequency", xlab = "Wind Speed") ##Plot weibull probability distribution.
norm_wpdWind<-wpdWind/sum(wpdWind); ## Convert weibull/Gaussian distribution to normal distribution.
## Correlation between states (Matrix G)
g<-function(x){2^(-abs(x))} ## decreasing correlation function between states.
G<-matrix(nrow=nRows,ncol=nColumns)
G <- row(G)-col(G)
G <- g(G)
##--------------------------------------------------------
## iterative process to calculate the matrix P (initial probability)
P0<-diag(norm_wpdWind); ## Initial value of the MATRIX P.
P1<-norm_wpdWind; ## Initial value of the VECTOR p.
## This iterative calculation must be done until a certain error is exceeded
## Now, as something tentative, I set the number of iterations
steps=1000;
P=P0;
p=P1;
for (i in 1:steps){
r<-P%*%G%*%p;
r<-as.vector(r/sum(r)); ## The above result is in matrix form. I change it to vector
p=p+0.5*(P1-r)
P=diag(p)}
## $$ ----Markov Transition Matrix --- $$ ##
N=diag(1/as.vector(p%*%G));## normalization matrix
MTM=N%*%G%*%P ## Markov Transition Matrix
MTMcum<-t(apply(MTM,1,cumsum));## From the MTM generated the accumulated
##-------------------------------------------
## Calculating the series from the MTMcum
##Insert number of data sets.
LSerie<-52560; Wind Speed every 10 minutes for a year.
RandNum1<-runif(LSerie);## Random number to choose between states
State<-InitialState<-1;## assumes that the initial state is 1 (this must be changed when concatenating days)
StatesSeries=InitialState;
## Initallise----
## The next state is selected to the one in which the random number exceeds the accumulated probability value
##The next iterative procedure chooses the next state whose random number is greater than the cumulated probability defined by the MTM
for (i in 2:LSerie) {
## i has to start on 2 !!
State=min(which(RandNum1[i]<=MTMcum[State,]));
## if (is.infinite (State)) {State = 1}; ## when the above condition is not met max -Inf
StatesSeries=c(StatesSeries,State)}
RandNum2<-runif(LSerie); ## Random number to choose between speeds within a state
SpeedSeries=WindSpeed[StatesSeries]-0.5+RandNum2*LCateg;
##where the 0.5 correction is needed since the the WindSpeed vector is centered around the mean value of each category.
print(fitdistr(SpeedSeries, 'weibull')) ##MLE fitting of SpeedSeries
Can anyone suggest where and what changes I need to make to the code?
I don't know much about generating wind speed time series but maybe those guidelines can help you improve your code readability/reusability:
#1 You probably want to have a function which will generate a wind speed time
serie given a number of observations and a seasonal maximum wind speed. So first try to define your code inside a block like this one:
wind_time_serie <- function(nobs, max_speed){
#some code here
}
#2 Doing so, if it seems that some parts of your code are useful to generate wind speed time series but aren't about wind speed time series, try to put them into functions (e.g. the part you compute norm_wpdWind, the part you compute MTMcum,...).
#3 Then, the part of your code at the beginning when your define global variable should disappear and become default arguments in functions.
#4 Avoid using endline comments when your line is already long and delete the ending semicolumns.
#This
State<-InitialState<-1;## assumes that the initial state is 1 (this must be changed when concatenating days)
#Would become this:
#Assumes that the initial state is 1 (this must be changed when concatenating days)
State<-InitialState<-1
Then your code should be more reusable / readable by other people. You have an example below of those guidelines applied to the rnorm part:
norm_distrib<-function(maxSpeed, states = 16, shape = 2, scale = 8){
#Fine the velocity vector-centered on the average value of each category.
LCateg<-maxSpeed/states
WindSpeed=seq(LCateg/2,maxSpeed-LCateg/2,by=LCateg)
#Determine Weibull Probability Distribution.
wpdWind<-dweibull(WindSpeed,shape=shape, scale=scale)
#Convert weibull/Gaussian distribution to normal distribution.
return(wpdWind/sum(wpdWind))
}
#Plot normal distribution with the max speed you want (e.g. 17)
plot(norm_distrib(17),type = "b", ylab= "frequency", xlab = "Wind Speed")

Preventing a Gillespie SSA Stochastic Model From Running Negative

I have produce a stochastic model of infection (parasitic worm), using a Gillespie SSA. The model used the "GillespieSSA"package (https://cran.r-project.org/web/packages/GillespieSSA/index.html).
In short the code models a population of discrete compartments. Movement between compartments is dependent on user defined rate equations. The SSA algorithm acts to calculate the number of events produced by each rate equation for a given timestep (tau) and updates the population accordingly, process repeats up to a given time point. The problem is, the number of events is assumed Poisson distributed (Poisson(rate[i]*tau)), thus produces an error when the rate is negative, including when population numbers become negative.
# Parameter Values
sir.parms <- c(deltaHinfinity=0.00299, CHi=0.00586, deltaH0=0.0854, aH=0.5,
muH=0.02, SigmaW=0.1, SigmaM =0.8, SigmaL=104, phi=1.15, f = 0.6674,
deltaVo=0.0166, CVo=0.0205, alphaVo=0.5968, beta=52, mbeta=7300 ,muV=52, g=0.0096, N=100)
# Inital Population Values
sir.x0 <- c(W=20,M=10,L=0.02)
# Rate Equations
sir.a <- c("((deltaH0+deltaHinfinity*CHi*mbeta*L)/(1+CHi*mbeta*L))*mbeta*L*N"
,"SigmaW*W*N", "muH*W*N", "((1/2)*phi*f)*W*N", "SigmaM*M*N", "muH*M*N",
"(deltaVo/(1+CVo*M))*beta*M*N", "SigmaL*L*N", "muV*L*N", "alphaVo*M*L*N", "(aH/g)*L*N")
# Population change for even
sir.nu <- matrix(c(+0.01,0,0,
-0.01,0,0,
-0.01,0,0,
0,+0.01,0,
0,-0.01,0,
0,-0.01,0,
0,0,+0.01/230,
0,0,-0.01/230,
0,0,-0.01/230,
0,0,-0.01/230,
0,0,-0.01/32),nrow=3,ncol=11,byrow=FALSE)
runs <- 10
set.seed(1)
# Data Frame of output
sir.out <- data.frame(time=numeric(),W=numeric(),M=numeric(),L=numeric())
# Multiple runs and combining data and SSA methods
for(i in 1:runs){
sim <- ssa(sir.x0,sir.a,sir.nu,sir.parms, method="ETL", tau=1/12, tf=140, simName="SIR")
sim.out <- data.frame(time=sim$data[,1],W=sim$data[,2],M=sim$data[,3],L=sim$data[,4])
sim.out$run <- i
sir.out <- rbind(sir.out,sim.out)
}
Thus, rates are computed and the model updates the population values for each time step, with the data store in a data frame, then attached together with previous runs. However, when levels of the population get very low events can occur such that the number of events that occurs reducing a population is greater than the number in the compartment. One method is to make the time step very small, however this greatly increases the length of the simulation very long.
My question is there a way to augment the code so that as the data is created/ calculated at each time step any values of population numbers that are negative are converted to 0?
I have tried working on this problem, but only seem to be able to come up with methods that alter the values once the simulation is complete, with the negative values still causing issues in the runs themselves.
E.g.
if (sir.out$L < 0) sir.out$L == 0
Any help would be appreciated
I believe the problem is the method you set ("ETL") in the ssa function. The ETL method will eventually produce negative numbers. You can try the "OTL" method, based on Efficient step size selection for the tau-leaping simulation method- in which there are a few more parameters that you can tweak, but the basic command is:
ssa(sir.x0,sir.a,sir.nu,sir.parms, method="OTL", tf=140, simName="SIR")
Or the direct method, which will not produce negative number whatsoever:
ssa(sir.x0,sir.a,sir.nu,sir.parms, method="D", tf=140, simName="SIR")

Resources