Bandpass filter in R using fft

I have a time series z with sampling frequency fs = 12 (monthly data), and I would like to apply a bandpass filter using the fft, keeping periods between 10 and 15 months. This is how I would proceed:
y <- as.data.frame(fft(z))
y$freq <- ..
y$y <- ifelse(y$freq >= 1/15 & y$freq <= 1/10, y$y, 0)
zz <- fft(y$y, inverse = TRUE)/length(z)
plot zz in the time domain...
However, I don't know how to derive the frequencies of the fft and I don't know how to plot zz in the time domain. Can someone help me?

I have a function that wraps fft() a bit:
function(y, samp.freq, ...){
  N <- length(y)
  fk <- fft(y)
  # the next two lines drop the DC term, keep the positive-frequency half
  # of the spectrum, and scale it to amplitude
  fk <- fk[2:length(fk)/2+1]
  fk <- 2*fk[seq(1, length(fk), by = 2)]/N
  freq <- (1:(length(fk))) * samp.freq/(2*length(fk))
  return(data.frame(fur = fk, freq = freq))
}
y is the values of your signal, and samp.freq is its sampling frequency. Its output is a data.frame with two columns: fur holds the complex numbers we get after the fast Fourier transform (Mod(fur) is the amplitude, Arg(fur) the phase) and freq is the vector of corresponding frequencies.
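For example, a quick usage sketch (mine, assuming the wrapper above has been saved under the hypothetical name spec_fun) for the monthly series from the original question:
z <- rnorm(240)                       # placeholder: 20 years of monthly data
sp <- spec_fun(z, samp.freq = 12)     # the freq column is then in cycles per year
# periods of 10 to 15 months correspond to 12/15 to 12/10 cycles per year:
sp[sp$freq >= 12/15 & sp$freq <= 12/10, ]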
But for frequency filtering I highly recommend using the signal package.
For example using Butterworth filter:
library('signal')
bf <- butter(2, c(low, high), type = "pass")
signal.filtered <- filtfilt(bf, signal.noisy)
In this case the interval should be defined as c(Low.freq, High.freq) * (2/samp.freq), where Low.freq and High.freq are the borders of the frequency interval. More information can be found in the package documentation and the Octave reference guide.
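For example, with the monthly data from the question (samp.freq = 12) and a pass band of 10 to 15 months, the normalized interval would be computed like this (a sketch of my own, not part of the original answer):
library('signal')
samp.freq <- 12                              # 12 samples per year (monthly data)
band <- c(12/15, 12/10) * (2 / samp.freq)    # c(Low.freq, High.freq) * (2/samp.freq)
bf <- butter(2, band, type = "pass")
# z.filtered <- filtfilt(bf, z)              # z: the monthly series from the question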
Also, notice that with fft you can get only frequencies up to (sample frequency)/2.
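For completeness, here is a minimal sketch (mine, not part of the answer above) of the fft-based bandpass described in the question, with a placeholder series standing in for z:
z <- rnorm(240)                            # placeholder for the monthly series
N <- length(z)
Z <- fft(z)
freq <- (0:(N - 1)) / N                    # DFT frequencies, in cycles per month
# keep periods of 10 to 15 months, plus the mirrored negative-frequency bins
keep <- (freq >= 1/15 & freq <= 1/10) | (freq >= 1 - 1/10 & freq <= 1 - 1/15)
Z[!keep] <- 0
zz <- Re(fft(Z, inverse = TRUE) / N)       # back to the time domain
plot(zz, type = "l", xlab = "month", ylab = "filtered z")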

Related

Maximum pseudo-likelihood estimator for soft-core point process

I am trying to fit a soft-core point process model to a point pattern using maximum pseudo-likelihood. I followed the instructions given in this paper by Baddeley and Turner.
And here is the R code I came up with:
library(deldir)
library(tidyverse)
library(fields)
#MPLE
# irregular parameter k
k <- 0.4
## Generate dummy points 50X50. "RA" and "DE" are x and y coordinates
dum.x <- seq(ramin, ramax, length = 50)
dum.y <- seq(demin, demax, length = 50)
dum <- expand.grid(dum.x, dum.y)
colnames(dum) <- c("RA", "DE")
## Combine with data and specify which is data point and which is dummy, X is the point pattern to be fitted
bind.x <- bind_rows(X, dum) %>%
mutate(Ind = c(rep(1, nrow(X)), rep(0, nrow(dum))))
## Calculate Quadrature weights using Voronoi cell area
w <- deldir(bind.x$RA, bind.x$DE)$summary$dir.area
## Response
y <- bind.x$Ind/w
# the sum of distances between all pairs of points (the sufficient statistics)
tmp <- cbind(bind.x$RA, bind.x$DE)
t1 <- rdist(tmp)^(-2/k)
t1[t1 == Inf] <- 0
t1 <- rowSums(t1)
t <- -t1
# fit the model using quasipoisson regression
fit <- glm(y ~ t, family = quasipoisson, weights = w)
However, the fitted parameter for t is negative, which is obviously not a valid value for a soft-core point process. Moreover, my point pattern was actually simulated from a soft-core process, so a negative fitted parameter does not make sense. I have tried my best to find bugs in the code but cannot find any. The only potential issue I see is that my sufficient statistic is extremely large (on the order of 10^14), which I fear may cause numerical issues. But the statistic is large because my observation window spans very small units and the average distance between a pair of points is around 0.006, so a sufficient statistic based on this will certainly be very large; my intuition tells me that this alone should not cause a numerical problem or make the fitted parameter negative.
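For scale, a quick back-of-the-envelope check (mine) of a single pairwise term with the numbers quoted above:
k <- 0.4
0.006^(-2/k)    # one pairwise contribution at the average distance, about 1.3e11
So summing over all pairs easily reaches the order of 10^14.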
Can anybody help and check if my code is correct? Thanks very much!

How to extrapolate a raster in R

I am trying to downscale climatic conditions in R using the methodology in this article. I am almost there, but I am missing a couple of steps.
Packages and data needed
For this example I uploaded some data to archive.org. To load the required packages and data used in this example, use the following code:
library(raster)
library(rgdal)
download.file("https://archive.org/download/Downscaling/BatPatagonia.rds", "Bat.rds")
download.file("https://archive.org/download/Downscaling/TempMinPatNow.rds", "Tmin.rds")
BatPatagonia <- readRDS("Bat.rds")
TempMinPatNow <- readRDS("Tmin.rds")
BatPatagonia is a raster file with the bathymetry and altitude of the area, extracted and transformed from the GEBCO dataset, while TempMinPatNow is the minimum temperature of the same area for January, extracted from WorldClim. The plots of the datasets are shown below:
The goal of this question
In order to downscale past data from the Last Glacial Maximum, I need to model what the current climate would be like if the sea level were the same as in the past. To do that I use the GEBCO data to figure out roughly where the coast was. According to the methodology in the article cited above, these are the first three steps to follow:
Create a DEM for land above 20 meters above sea level
Calculate a Multiple Linear Regression in a moving window
Extrapolate coefficients to the ocean
Point 3 is what I have been struggling with. I will show how I did the first two points, and then show what I have tried in order to solve point 3.
1. Create a DEM for land 20 meters above sea level
In order to do this I took the BatPatagonia raster and replaced all values below 20 meters with NA values, using the following code:
Elev20 <- BatPatagonia
values(Elev20) <- ifelse(values(Elev20) <= 20, NA, values(Elev20))
The resulting raster is shown in the following image
2. Calculate a Multiple Linear Regression in a moving window
According to the manuscript (page 2591), the next step is to fit a multiple linear regression in a moving window, for altitudes over 20 meters, of the form Tmin = β0 + β1·Elevation + β2·Longitude + β3·Latitude:
We already have the elevation data, but we also need rasters of latitude and longitude. For that we use the following code, where we first create the latitude and longitude rasters:
Latitud <- BatPatagonia
Longitud <- BatPatagonia
data_matrix <- raster::xyFromCell(BatPatagonia, 1:ncell(BatPatagonia))
values(Latitud) <- data_matrix[, 2]
values(Longitud) <- data_matrix[, 1]
We will multiply these by a raster mask of the areas that have elevations over 20 meters, so that we keep only the values that we need:
Elev20Mask <- BatPatagonia
values(Elev20Mask) <- ifelse(values(Elev20Mask) <= 20, NA, 1)
Longitud <- Elev20Mask*Longitud
Latitud <- Elev20Mask*Latitud
Now I will build a stack with the response variable and the predictor variables:
Preds <- stack(Elev20, Longitud, Latitud, TempMinPatNow)
names(Preds) <- c("Elev", "Longitud", "Latitud", "Tmin")
The resulting stack is shown in the following figure:
As stated in the paper, the moving window should be 25 by 25 cells (625 cells in total); however, if the moving window has fewer than 170 cells with data the regression should not be performed, and it should have at most 624 cells with data, so that only areas close to the coast are modelled. The result of this multiple regression with the moving window should be a stack with the local intercept and the local estimate of each of the betas in the equation shown above. I found out how to do this with the following code using the getValuesFocal function (this loop takes a while):
# First we establish the 25 by 25 window
w <- c(25, 25)
# Then we create the empty layers for the resulting stack
intercept <- Preds[[1]]
intercept[] <- NA
elevationEst <- intercept
latitudeEst <- intercept
longitudeEst <- intercept
Now we start the code:
for (rl in 1:nrow(Preds)) {
  v <- getValuesFocal(Preds[[1:4]], row = rl, nrows = 1, ngb = w, array = FALSE)
  int <- rep(NA, nrow(v[[1]]))
  x1 <- rep(NA, nrow(v[[1]]))
  x2 <- rep(NA, nrow(v[[1]]))
  x3 <- rep(NA, nrow(v[[1]]))
  x4 <- rep(NA, nrow(v[[1]]))
  for (i in 1:nrow(v[[1]])) {
    xy <- na.omit(data.frame(x1 = v[[1]][i, ], x2 = v[[2]][i, ], x3 = v[[3]][i, ],
                             y = v[[4]][i, ]))
    if (nrow(xy) > 170 & nrow(xy) <= 624) {
      coefs <- coefficients(lm(as.numeric(xy$y) ~ as.numeric(xy$x1) +
                                 as.numeric(xy$x2) + as.numeric(xy$x3)))
      int[i] <- coefs[1]
      x1[i] <- coefs[2]
      x2[i] <- coefs[3]
      x3[i] <- coefs[4]
    } else {
      int[i] <- NA
      x1[i] <- NA
      x2[i] <- NA
      x3[i] <- NA
    }
  }
  intercept[rl, ] <- int
  elevationEst[rl, ] <- x1
  longitudeEst[rl, ] <- x2
  latitudeEst[rl, ] <- x3
  message(paste(rl, "of", nrow(Preds), "ready"))
}
Coeffs <- stack(intercept, elevationEst, longitudeEst, latitudeEst,
                (intercept + Preds$Elev * elevationEst + Preds$Longitud * longitudeEst +
                   Preds$Latitud * latitudeEst),
                Preds$Tmin)
names(Coeffs) <- c("intercept", "elevationEst", "longitudeEst", "latitudeEst", "fitted", "Observed")
The result of this loop is the Coeffs stack, shown below:
This is where I got stuck:
3. Extrapolate coefficients to the ocean
The goal now is to extrapolate the first four rasters of the Coeffs stack (intercept, elevationEst, longitudeEst and latitudeEst) to where the coast would have been at the Last Glacial Maximum, when the sea level was 120 meters lower:
MaxGlacier <- BatPatagonia
values(MaxGlacier) <- ifelse(values(MaxGlacier) < -120, NA,1)
The projected coastline is shown in the following map:
The way the authors projected the coefficients to the coast was by filling the gaps, solving Poisson's equation with the poisson_grid_fill function of the NCL language from NCAR. But I would like to keep it simple and do everything in the same language. I also found a similar function in Python.
I would be happy with any extrapolation process that works well, I am not limiting my search to that algorithm.
I found several R packages that fill gaps, such as the Gapfill package, and even found a review of gap-filling methods, but most of them are for interpolation, mostly of NDVI layers, where the gap can be filled based on other layers.
Any ideas on how to move forward on this?
Thanks
Thinking back several decades to my physics undergrad days, we used Laplace relaxation to solve these types of Poisson equation problems. I'm not sure, but I guess that may also be how poisson_grid_fill works. The process is simple: relaxation is an iterative procedure where we calculate each cell, except those that form the boundary condition, as the mean of the horizontally and vertically adjacent cells, and repeat until the result approaches a stable solution.
In your case, the cells for which you already have values provide your boundary condition, and we can iterate over the others. Something like this (demonstrated here for the intercept coefficient - you can do the others the same way):
gaps = which(is.na(intercept)[])
intercept.ext = intercept
w=matrix(c(0,0.25,0,0.25,0,0.25,0,0.25,0), nc=3, nr=3)
max.it = 1000
for (i in 1:max.it) intercept.ext[gaps] = focal(intercept.ext, w=w, na.rm=TRUE)[gaps]
intercept.ext = mask(intercept.ext, MaxGlacier)
Edit
Here's the same process embedded in a function, to demonstrate how you might use a while loop that continues until a desired tolerance is reached (or maximum number of iterations is exceeded). Note that this function is to demonstrate the principle, and is not optimised for speed.
gap.fill = function(r, max.it = 1e4, tol = 1e-2, verbose = FALSE) {
  gaps = which(is.na(r)[])
  r.filled = r
  w = matrix(c(0, 0.25, 0, 0.25, 0, 0.25, 0, 0.25, 0), nc = 3, nr = 3)
  i = 0
  while (i < max.it) {
    i = i + 1
    new.vals = focal(r.filled, w = w, na.rm = TRUE)[gaps]
    max.residual = suppressWarnings(max(abs(r.filled[gaps] - new.vals), na.rm = TRUE))
    if (verbose) print(paste('Iteration', i, ': residual = ', max.residual))
    r.filled[gaps] = new.vals
    if (is.finite(max.residual) & max.residual <= tol) break
  }
  return(r.filled)
}
intercept.ext = gap.fill(intercept)
intercept.ext = mask(intercept.ext, MaxGlacier)
plot(stack(intercept, intercept.ext))
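One possible follow-up (my sketch, not part of the original answer) is to run the same fill over all four coefficient layers of the Coeffs stack and mask them to the reconstructed coastline:
coef.layers <- c("intercept", "elevationEst", "longitudeEst", "latitudeEst")
Coeffs.ext <- stack(lapply(coef.layers, function(nm) mask(gap.fill(Coeffs[[nm]]), MaxGlacier)))
names(Coeffs.ext) <- coef.layers
plot(Coeffs.ext)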

How to calculate amplitude from spectrum()

I have a signal and I need to get the actual magnitude of a frequency found with spectrum().
Consider the following signal
f <- 5
n <- 500
signal <- 4*sin(2*pi*f*seq(0,10,1/n))
S.signal <- spectrum(signal, log="no")
Using spectrum() I get the following:
I can verify the amplitude of the peak using:
> max(S.signal$spec)
[1] 16698.45
How can I convert this value, 16698.45, to the actual magnitude of the signal at that frequency, which is 4, or something close to it?
There is no relation between the amplitude of your signal and the amplitude of your spectrum here. The Fourier transform of a sine is a delta function at the corresponding frequency, that is, an infinitely narrow peak with infinite amplitude.
The fact that you find a finite value for the amplitude of your spectrum is due to the sampling of your signal, which causes a loss of information. You can see that here:
f <- 5
n <- 1000
signal <- 4*sin(2*pi*f*seq(0,10,1/n))
S.signal <- spectrum(signal, log="no")
max(S.signal$spec)
[1] 25261.03
You have better sampling, so you get a value closer to the real value of the spectrum (which is infinity here).
A late answer, but in case it helps others. As previous answers state, it is not a question of how to convert the spectral density to an amplitude, but rather, having found a signal in our density spectrum, how we extract the amplitude at the dominant frequency. I found the custom function proposed in this post useful.
An example implementing it with the original poster's example:
power_spec = function(y, samp.freq, ...){
  N <- length(y)
  fk <- fft(y)
  fk <- fk[2:length(fk)/2+1]
  fk <- 2*fk[seq(1, length(fk), by = 2)]/N
  freq <- (1:(length(fk))) * samp.freq/(2*length(fk))
  data.frame(amplitude = Mod(fk), freq = freq)
}
f <- 5
n <- 500
signal <- 4*sin(2*pi*f*seq(0,10,1/n))
x = power_spec(signal, samp.freq = n)
plot(x$amplitude~x$freq,type='l',xlim=c(0,10))
We find a peak with an amplitude of 4 at f = 5.
Please up-vote the original post where this custom function came from if it helps you too!
If your signal is really like what you mentioned in your code, a sin() function, then you should only get an impulse/peak at one location, and everywhere else it is simply zero.

Error in fsolve in R when trying to fill a matrix with values; looking for explanation and/or solution

Solution Found
I am trying to plot a volatility surface using "persp" in R. To do so I need to fill a matrix, z, with implied volatilities.
I have a data frame of the strike prices, times and market prices. The data contains only call options.
AAPL <- #data
df <- data.frame(AAPL$Strike.Price,AAPL$Time.Left,AAPL$Market.Price)
I currently have a matrix, zz, that has strike prices in the first column, times as the headers, and the respective market prices in columns 2, 3 and 4. It is important to note that some of the market prices are missing (NA).
library(reshape)   # provides cast()
zz <- cast(df, df.Strike.Price ~ df.Time.Left)
For my x, y axis, I define the vectors:
x0 <- zz$df.Strike.Price #Strike prices for calculation of imp. vol.
x <- zz$df.Strike.Price / 153.06 #Axis for plotting
y <- c(time1, time2, time3)
Now the z matrix for plotting implied volatility. I start with an empty matrix
z = matrix(data=NA,nrow=length(x0),ncol=length(y))
Then I attempt to fill the matrix, leaving NA for values that cannot be calculated
library(pracma)   # provides fsolve()
for(i in 1:length(x0)){
  for(j in 1:length(y)){
    # Formula for the Black-Scholes call option price (no dividends)
    BS = function(X, T, sigma){
      # Parameters: stock value is the same for all options, r is an arbitrary constant
      S = 153.06; r = 0.05
      d1 = (log(S/X) + (r + sigma^2/2)*T) / (sigma*sqrt(T))
      d2 = d1 - sigma*sqrt(T)
      # Price for call options
      price = S*pnorm(d1) - X*exp(-r*T)*pnorm(d2)
      return(price)
    }
    # To address NA entries in zz
    if(is.na(zz[i, j+1])){
      z[i, j] = NA
    }
    # This is the part of the code that causes issues
    else{
      # Function for fsolve: the Black-Scholes price minus the market price
      A = function(sigma){
        a = BS(x0[i], y[j], sigma) - zz[i, j+1]
        return(a)
      }
      V = fsolve(A, 0.5)   # Should give the implied volatility from the market data
      z[i, j] = V
    }
  }
}
Upon executing this piece of code I get the error message:
Error in if (norm(s, "F") < tol || norm(as.matrix(ynew), "F") < tol) break :
missing value where TRUE/FALSE needed
I'm not sure what the error is about. Is there a way to overcome this problem, or an alternative method for getting the implied volatilities instead of using fsolve?
The error has to do with the changes in sigma becoming too small for fsolve. I was able to find another function that solves non-linear equations and used that instead.
The function was nleqslv, from the package of the same name.
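For illustration, a minimal sketch of that replacement (mine, not the poster's code), reusing the BS() pricing function from the question; the helper name implied.vol is hypothetical:
library(nleqslv)
# Hypothetical helper: implied volatility of one call via nleqslv,
# assuming the BS() function from the question is already defined.
implied.vol <- function(X, T, market.price, start = 0.5) {
  obj <- function(sigma) BS(X, T, sigma) - market.price
  nleqslv(start, obj)$x
}
# Inside the loop, the fsolve call would then become:
# z[i, j] <- implied.vol(x0[i], y[j], zz[i, j+1])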

R: looped variable assignment, augmenting variable calculation each time

I am trying to calculate a regression variable based on a range of variables in my data set. I would like the regression variable (e.g., Threshold1) to be calculated using a different variable in each iteration of the regression.
The aim is to collect SSR values for each threshold range, and thus identify the ideal threshold based on the data.
Data (df) variables: Yield, Prec, Price, 0C, 1C, 2C, 3C, 4C, 5C, 6C, 7C, 8C, 9C, 10C
Each loop calculates "thresholds" by selecting a different "b" each time.
a <- df$0C
b <- df$1C
Threshold1 <- (a-b)
Threshold2 <- (b)
Where "b" would be changing in each loop, ranging from 1C to 9C.
Each individual threshold set (1 and 2) should be used to run a regression, and the SSR saved for comparison with the subsequent regression using thresholds based on a new "b" value (ranging from 1C to 9C).
Regression:
reg <- lm(log(Yield)~Threshold1+Threshold2+log(Price)+prec+I(prec^2),data=df)
For each loop of the regression, I vary the components used to calculate the thresholds. My current approach is centered around the following code:
df <- read.csv("Data.csv",header=TRUE)
names(df)
0C-9Cvarlist <- names(df)[9:19]
ssr.vec <- matrix(,21,1)
for(i in 1:length(varlist)){
a <- df$0C
b <- df$[i]
Threshold1 <- (a-b)
Threshold2 <- (b)
reg <- lm(log(Yield)~Threshold1+Threshold2+log(Price)+prec+I(prec^2),data=df)
r2 <- summary(reg)$r.squared
ssr.vec[i,] <- c(varlist,r2)
}
colnames(ssr.vec) <- c("varlist","r2")
I am failing to achieve the desired result with the above approach.
Thank you.
I can spot quite a few mistakes...
You need to add the variables of interest (Threshold1 and Threshold2) to the data used in the regression. Also, I think you need to select varlist[i], not varlist, to build ssr.vec. You need two columns in ssr.vec, which is a matrix, so define it with two columns. You also cannot use something like df$[i] to extract a column! Why is the matrix of length 21?! Change the column names to C0, ..., C9 rather than 0C, ..., 9C.
For future reference, solve the simple errors before asking a question... and include error messages in your post!
This should do the job:
df <- read.csv("Data.csv",header=TRUE)
names(df)[8:19] = paste0("C",0:10)
varlist <- names(df)[9:19]
ssr.vec <- matrix(,21,2)
for(i in 1:length(varlist)){
a <- df$C0
b <- df[,i+9]
df$Threshold1 <- (a-b)
df$Threshold2 <- (b)
reg <- lm(log(Yield)~Threshold1+Threshold2+log(Price)+prec+I(prec^2),data=df)
r2 <- summary(reg)$r.squared
ssr.vec[i,] <- c(varlist[i],r2)
}
colnames(ssr.vec) <- c("varlist","r2")
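As a small follow-up (my addition), to get from ssr.vec to the "ideal threshold" mentioned in the question, pick the variable with the highest R²:
res <- data.frame(varlist = ssr.vec[, 1], r2 = as.numeric(ssr.vec[, 2]))
res[which.max(res$r2), ]    # the "b" variable giving the best fit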
