I am trying to measure the bandwidth of a signal from the power spectra. I want to be able to extract the min and max values given a relative amplitude value. I have been using "seewave" to calculate the power spectra, and I can make a density plot, and provide the abline, but I cannot figure out how to get R to tell me where the abline intersects with the plot. I will need to change the relative amplitude values of interest, depending on the signal quality, but want to find a straightforward way to measure bandwidth using R. Thanks in advance!
power.spec <- spec(IBK.trill.1, flim=c(0,2))
pow.spec <- as.matrix(power.spec)
head(pow.spec)
# x y
# [1,] 0.000000000 0.007737077
# [2,] 0.007470703 0.029795630
# [3,] 0.014941406 0.021248476
# [4,] 0.022412109 0.015603801
# [5,] 0.029882813 0.014103307
# [6,] 0.037353516 0.014584454
freq <- pow.spec[1:2941,1]
head(freq)
# [1] 0.000000000 0.007470703 0.014941406 0.022412109 0.029882813 0.037353516
ampl <- pow.spec[,2]
head(ampl)
# [1] 0.007737077 0.029795630 0.021248476 0.015603801 0.014103307 0.014584454
plot(ampl ~ freq, type="l",xlim=c(0,2))
abline(h=0.45)
Save the indices of the "y" values that exceed your threshold:
wspec <- which( power.spec[, "y"] > 0.45)
Then use those indices to pull the corresponding "x" values and place vertical lines at the first and last of them:
abline( v= power.spec[ c( wspec[1], tail(wspec, 1) ) , "x"], col="blue" )
BTW, I suggested the original "power.spec" object rather than your as.matrix version because spec already returns a matrix, so the coercion is not needed. I tested this on the first example from the ?spec page. If you wanted to get really picky, you could take the mean of the "x" values where the threshold was exceeded and the ones just before and after, which would then be:
abline( v= c( mean( power.spec[ c( wspec[1]-1, wspec[1]), "x"]) ,
mean( power.spec[ c( tail(wspec, 1), tail(wspec, 1)+1 ) , "x"]) ), col="blue" )
I did look at the differences with diff and the typical separation in my example was
mean( diff(power.spec[ , "x"]) )
[1] 0.0005549795
So I could have gone back and forward by half that amount to get a reasonable estimate. (For a "half-height" threshold I used max(power.spec[, "y"])/2.)
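Putting those pieces together, a small helper function could report the band limits for any threshold. This is only a sketch: it assumes a two-column matrix with "x" and "y" columns like the one spec() returns, and it uses the simple first/last-index rule without the mid-point refinement above. The toy matrix here is made up for illustration.

```r
# sketch: band limits at a given relative-amplitude threshold
bandwidth <- function(spec.mat, thresh) {
  w  <- which(spec.mat[, "y"] > thresh)   # indices above threshold
  lo <- spec.mat[w[1], "x"]               # first crossing
  hi <- spec.mat[tail(w, 1), "x"]         # last crossing
  c(min = lo, max = hi, bandwidth = hi - lo)
}
# toy spectrum: amplitude exceeds 0.45 between x = 2 and x = 4
toy <- cbind(x = 0:6, y = c(0.1, 0.2, 0.5, 0.9, 0.6, 0.3, 0.1))
bandwidth(toy, 0.45)
# min 2, max 4, bandwidth 2
```

With your data this would be called as bandwidth(power.spec, 0.45), changing the threshold per signal.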
I am using terra to get "curvy" distances between points within a bounding polygon and comparing those to straight-line distances that ignore the polygon. The results I'm getting back don't make sense, and I am hoping you all could help me figure out what is going on.
We load the US Congressional map used in the 114th Congress for the state of Texas first:
texas = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/texascongressmaps")
ggplot() + geom_sf(data = texas$geometry)
We also make some storage objects:
longest.dist.district.straight = rep(NA, 36)
longest.dist.district.curved = rep(NA, 36)
Then, we go district by district (n = 36). For each, we take a sample of 100 random points within that district's polygon. Then, we ask "What is the longest straight-line distance between any two of our 100 points?" We then rasterize the polygon, mask it, and go point by point, asking "How far is this point from all others, assuming we cannot travel outside the polygon?" This means we'll have to bend around within the polygon to get between the points some of the time. We find the longest such distance between any two points. We then compare the straight-line and curvy-line approaches, with the assumption that the curvy-line approaches will always be longer by some amount...
for(c in 1:36) { #Texas had 36 districts.
if(c %% 3 == 0) {print(c)} # Progress bar
this.district = texas[c, ] #Get the current district
#We'll get a sample of 100 randomly placed points around the district.
rand.ptsDistrict = sf::st_sample(this.district,
size = 100,
type = 'random',
exact = TRUE)
#What's the max straight-line distance between any two points?
longest.dist.district.straight[c] = max(sf::st_distance(rand.ptsDistrict))
#Now, calculate our 'as the politician would walk' distances (aka curvy distances). We need to do this for each of our 100 points separately, with each as the target point in turn, and save the longest value we get...
current.raster = terra::ext(this.district) # Rasterizing
current.raster = terra::rast(current.raster,
nrow=100, ncol=100,
crs = crs(this.district),
vals = 1)
current.raster = terra::mask(current.raster, # Masking
terra::vect(this.district),
updatevalue = NA)
point.locs = terra::cellFromXY(current.raster, # Getting point locations in the new grid
sf::st_coordinates(rand.ptsDistrict))
longest.dists.i = rep(NA, 100) # Storage object
for(i in 1:100) {
point.i.loc = cellFromXY(current.raster, #Focal point this time.
st_coordinates(rand.ptsDistrict[i]))
point.noni.loc = cellFromXY(current.raster, #All other points
st_coordinates(rand.ptsDistrict[-i]))
terra::values(current.raster)[point.i.loc] = 2 # Make focal point the target value
all.dists = terra::gridDistance(current.raster, #Get all distances to the target value
target = 2, scale = 1)
longest.dists.i[i] = max(values(all.dists)[point.noni.loc], na.rm=TRUE) # Find the longest of these for this point and store it.
terra::values(current.raster)[point.i.loc] = 1
}
longest.dist.district.curved[c] = max(longest.dists.i) # Find the longest curved distance between any two points in the current district.
}
When I do this, I always get straight-line distances that are strictly longer than the curvy distances from the same district, which doesn't logically make sense--how could a straight line between two points ever be longer than a curvy line between them?
> (cbind(longest.dist.district.straight, longest.dist.district.curved))
longest.dist.district.straight longest.dist.district.curved
[1,] 239285.77 121703.64
[2,] 63249.88 48238.89
[3,] 49495.09 24823.91
[4,] 290542.38 147894.80
[5,] 213758.13 108663.63
[6,] 129261.83 68351.77
[7,] 36705.18 22081.22
[8,] 165759.58 87749.33
[9,] 38317.61 19903.54
[10,] 196211.38 100959.66
[11,] 505130.81 261479.58
[12,] 79502.87 45134.11
[13,] 604901.43 313317.24
[14,] 201724.57 115286.81
[15,] 414257.14 208204.75
[16,] 61867.34 32115.77
[17,] 193198.96 103829.75
[18,] 41693.26 26462.02
[19,] 433902.07 225041.00
[20,] 32201.45 17060.41
[21,] 212300.45 119597.54
[22,] 88143.49 46720.59
[23,] 777236.95 394663.54
[24,] 39692.06 21192.98
[25,] 299336.81 153871.46
[26,] 65901.64 35200.83
[27,] 272822.43 158724.70
[28,] 362477.84 205297.74
[29,] 40210.19 30094.43
[30,] 44693.37 23430.33
[31,] 93781.16 50340.85
[32,] 38941.81 21047.40
[33,] 52395.85 31169.46
[34,] 394586.71 206545.50
[35,] 138182.61 73556.10
[36,] 223351.15 112601.38
I can only guess I have either messed up the code somewhere or else have found a bug. Please help! Thanks!
Edit: I just noticed after posting this that it looks like if I were to multiply the curvy distances by 2, I'd get values that were believable (the curvy distances are always longer but by a variable amount)--but I don't see a coding reason to need to do this...can anyone else see one I'm missing?
You are comparing the shortest distance ("as the crow flies", for those who have not seen crows fly) with the grid distance (moving from the center of a cell to the center of a neighboring cell), only allowing the use of grid cells that fall within a district.
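The relationship between the two measures can be seen on a toy raster (a hypothetical 5x5 grid with a planar "local" CRS and cell size 1, not the Texas data). gridDist moves cell-center to cell-center, so its result should never be shorter than the straight line between the same two centers:

```r
library(terra)
# 5x5 planar raster, all cells traversable
r <- rast(nrows=5, ncols=5, xmin=0, xmax=5, ymin=0, ymax=5,
          crs="local", vals=1)
r[23] <- 2                        # target cell: row 5, column 3
d <- gridDist(r, target=2)
values(d)[1]                      # grid distance from the top-left cell
# roughly 4.83 (two diagonal + two straight steps)
sqrt((2.5-0.5)^2 + (4.5-0.5)^2)   # straight-line distance between centers
# roughly 4.47
```

Masking cells to NA (as with the district polygons) can only lengthen the grid path further.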
When I run a condensed version of your code, I see that the distances are very similar, with the grid distance always longer, as they should be, except for district 14 as that district is not contiguous.
library(terra)
#terra 1.6.47
texas <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/texascongressmaps")
tex <- vect(texas)
# generate random points
set.seed(0)
b <- spatSample(tex[, "DISTRICT"], size = 100, method="random", strata=1:nrow(tex))
# max distance between any two random points by district.
pdist <- sapply(tex$DISTRICT, \(i) max( distance( b[b$DISTRICT == i, ])) )
# max grid distance between any two random points by district.
pgrid <- rep(NA, nrow(tex))
for (i in 1:nrow(tex)) {
r <- rast(tex[i,], nrow=100, ncol=100)
r <- rasterize(tex[i,], r)
xy <- crds(b[b$DISTRICT==i, ])
cells <- cellFromXY(r, xy)
maxdists <- rep(NA, 100)
for(j in 1:100) {
r[cells[j]] <- 2
dists <- gridDist(r, target=2)
# Find the longest of these for this point
maxdists[j] <- max( dists[ cells[-j] ], na.rm=TRUE)
r[cells[j]] <- 1
}
pgrid[i] <- max(maxdists)
}
The results look good:
head(cbind(pdist, pgrid))
# pdist pgrid
#1 217746.46 223906.22
#2 61707.87 99422.07
#3 50520.61 51479.98
#4 282744.13 293656.59
#5 196074.08 202014.45
#6 120913.60 126532.72
plot(pdist, pgrid)
abline(0, 1, col="red")
If your results are different, perhaps you are using an older version of "terra"? I assume so, because you use gridDistance, which still works (with a warning) after being renamed to gridDist in the current version.
You use different grid cell sizes for each district. I do not know what your goal is, but it might be more reasonable to use a single template raster for all of Texas. You could do something like
# outside the loop
rr <- rast(tex, res=1/60, vals=1)
# inside the loop
r <- crop(rr, tex[i,], mask=TRUE)
I am new to R and am trying to program a pair trading strategy in R.
I have already written the code for downloading the data, created additional columns, and prepared the data. Now I need to calculate the trading signals.
My signal rules are as follows.
- If Z-Score is greater than 2.25 , Sell the pair; Buy back when Z-Score is less than 0.25.
- If Z-Score is less than -2.25 , Buy the pair; sell (Exit) when z-score is above -0.25.
- close any open position if there is a change in signal.
When we sell a pair, we sell the first stock and buy the second stock. In this case, we sell ACC and Buy Ambujacem.
When we buy a pair, we buy the first stock and sell the second stock. In this case, we buy ACC and Sell Ambujacem.
Could anyone help me with the coding for the trading signals?
Enclosing the code.
Regards,
Subash
# Trading Code
library(quantmod)
getSymbols("ACC.NS", from=as.Date('2007-01-01'), to=as.Date('2015-07-24'))
getSymbols("AMBUJACEM.NS", from=as.Date('2007-01-01'), to=as.Date('2015-07-24'))
acc=ACC.NS[,6]
amb=AMBUJACEM.NS[,6]
t.zoo <- merge(acc, amb, all=TRUE)
t.zoo=as.data.frame(t.zoo)
typeof(t.zoo)
t.zoo=na.omit(t.zoo)
#adding columns
t.zoo$spread <- 0
t.zoo$adfTest <- 0
t.zoo$mean <- 0
t.zoo$stdev <- 0
t.zoo$zScore <- 0
t.zoo$signal <- 0
t.zoo$BuyPrice <- 0
t.zoo$SellPrice <- 0
t.zoo$LongReturn <- 0
t.zoo$ShortReturn <- 0
t.zoo$Slippage <- 0
t.zoo$TotalReturn <- 0
#preparing the data
#Calculating the pair ratio
t.zoo$pairRatio <- t.zoo$ACC.NS.Adjusted/t.zoo$AMBUJACEM.NS.Adjusted
#Calculate the log prices of the two time series
t.zoo$LogA <- log10(t.zoo$ACC.NS.Adjusted)
t.zoo$LogB <- log10(t.zoo$AMBUJACEM.NS.Adjusted)
#Calculating the spread
t.zoo$spread <- t.zoo$ACC.NS.Adjusted/t.zoo$AMBUJACEM.NS.Adjusted
#Calculating the mean
# Computes the mean using the SMA function
# choose the number of days for calculating the mean
SMAdays = 20
t.zoo$mean <- SMA(t.zoo$spread,SMAdays)
#Calculating the Std Deviation
t.zoo$stdev <- rollapply(t.zoo$spread,20,sd, fill=NA, align='right')
#Calculating the Z Score
t.zoo$zScore <- (t.zoo$pairRatio - t.zoo$mean)/t.zoo$spread
View(t.zoo)
#Calculation of trading signals and trading prices
#Trigger sell or buy signal if Z Score moves above 2.25 or below -2.25.
# Close position if Z Score reaches 0.2 or -0.2.
# close any open position if there is a change in signal.
I think the main issue was to come up with trading signals for a strategy that depends not only on the current level of indicator but also on the direction from which the indicator is crossed.
There were a number of problems with the code posted in comments, including the use of a single = for comparisons, so I've reworked it from scratch.
Here's my attempt at solving this. It seems to be fine. I've added some plotting code to eyeball the results. I suggest you check the result over different periods.
This code comes after the one in the original question. The only difference is that I have kept t.zoo as an xts/zoo object rather than converting it to a data.frame. Also, I've multiplied the zScores by 100.
It generates trigger dates and also a column depicting the state of the strategy. Calculating returns would be easy from there.
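Before the full code, here is the direction-aware crossing idea in miniature: a signal fires only when the indicator is past the threshold now but was not on the previous bar. The numbers are made up for illustration, not taken from the strategy data:

```r
# toy: detect an upward crossing of the 2.25 threshold via a one-step lag
z    <- c(0.10, 1.00, 2.50, 2.00, 0.20)  # indicator path
zlag <- c(NA, head(z, -1))               # previous value
which(z >= 2.25 & zlag < 2.25)           # crossed from below at position 3
```

The same pattern with lag.xts underlies the buyTrigDays/sellTrigDays lines below.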
colnames(t.zoo)
#t.zoo must be an xts object
#working on a separate xts object
sigs<- t.zoo[, c("ACC.NS.Adjusted", "AMBUJACEM.NS.Adjusted" , "zScore")]
# creating my own triggers as there are not enough good values
# buyTrig<- mean(t.zoo$zScore ,na.rm = T) - 1*sd(t.zoo$zScore ,na.rm = T)
# sellTrig<- (-1) * buyTrig
# sqOffTrig<- mean(t.zoo$zScore ,na.rm = T) - 0.5*sd(t.zoo$zScore ,na.rm = T)
# Another approach: scaling tz.zoo to fit your criterion
sigs$zScore<- sigs$zScore*100
buyTrig<- (-2.25)
sellTrig<- (-1) * buyTrig
sqOffTrig<- 0.25
cat ( buyTrig, sellTrig , sqOffTrig)
hist(sigs$zScore, breaks = 40)
abline(v=c(buyTrig,sellTrig), col="red")
abline(v=c(-sqOffTrig, sqOffTrig), col="green")
sum(sigs$zScore >= -sqOffTrig & sigs$zScore<= sqOffTrig , na.rm = T) # 139
sigs$action<- 0
sigs$mode <- NA
sigs$zLag<- lag.xts(sigs$zScore,1)
sigs[19:22,]
#these are not the real trigger dates, but they will serve our purpose
# along with na.locf
buyTrigDays<- time(sigs[sigs$zScore<= buyTrig & sigs$zLag > buyTrig, ])
sellTrigDays<- time(sigs[sigs$zScore>= sellTrig & sigs$zLag < sellTrig, ])
#square offs
buySqOffDays<- time( sigs[sigs$zScore>= (-1*sqOffTrig) & sigs$zLag < (-1*sqOffTrig), ] )
buySqOffDays
sellSqOffDays<- time( sigs[sigs$zScore<= (sqOffTrig) & sigs$zLag > (sqOffTrig), ] )
sellSqOffDays
sigs$mode[buyTrigDays]=1 ; sigs$mode[sellTrigDays]= -1;
sigs$mode[buySqOffDays]=0 ; sigs$mode[sellSqOffDays]= 0;
sigs$mode
# use local fill to repeat these triggered position into future
# till you meet another non NA value
sigs$mode<- na.locf(sigs$mode, fromLast = F)
plot((sigs$zScore["2015"] ))
points(sigs$zScore[sigs$mode==1], col="red", on=1, pch = 19)
points(sigs$zScore[sigs$mode==-1], col="green", on=1 , pch = 19)
points(sigs$zScore[sigs$mode==0], col="blue", on=1)
sum(is.na(sigs$mode))
#now to get the real dates when square off is triggered
trigdays<- time( sigs[diff(sigs$mode,1) != 0, ] ) #when the value changes
squareOffTrigger_real<- time(sigs[sigs$mode==0][trigdays])
buyTrigger_real<- time(sigs[sigs$mode==1] [trigdays])
sellTrigger_real<- time(sigs[sigs$mode==-1][trigdays])
#check
length(sellTrigger_real) + length(buyTrigger_real) == length(squareOffTrigger_real)
plot(sigs$zScore["2015"])
points(sigs$zScore[buyTrigger_real] , col="blue", pch = 19, on=1)
points(sigs$zScore[sellTrigger_real] , col="red", pch = 19, on=1)
points(sigs$zScore[squareOffTrigger_real] , col="green", pch = 19, on=1)
abline(h=c(-sqOffTrig, sqOffTrig) , col= "green" )
# further calculations can be easily made using either the mode
# column or the trigger dates computed at the end
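As a sketch of that last step, daily strategy returns can be derived from a mode/state column like the one built above. This is a hypothetical, self-contained example with toy prices (not the ACC/AMBUJACEM data); it approximates the pair's return by the difference of the two legs' log returns:

```r
# toy prices for the two legs and a toy mode column
pA   <- c(100, 102, 101, 103, 104)     # leg A (e.g. ACC)
pB   <- c(50, 50.5, 50.2, 50.8, 51.5)  # leg B (e.g. AMBUJACEM)
mode <- c(0, 1, 1, 1, 0)               # 1 = long A / short B, -1 = reverse
retA <- diff(log(pA))
retB <- diff(log(pB))
pos  <- head(mode, -1)                 # position decided the day before
stratRet <- pos * (retA - retB)        # daily pair log return
sum(stratRet)                          # cumulative log return
```

With the xts objects above, the same idea would use lag.xts(sigs$mode, 1) for pos.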
I want to find the mode (x-value) of a univariate density function using R's optimize function.
I.e., for a normal density f(x) ~ N(3, 1), the mode should be the mean, i.e. x = 3.
I tried the following:
# Define the function
g <- function(x) dnorm(x = x, mean = 3, sd = 1)
Dvec <- c(-1000, 1000)
# First get the gradient of the function
# (grad() is from the numDeriv package; note it is not used by optimize below)
gradfun <- function(x){ numDeriv::grad(g, x) }
# Find the maximum value
x_mode <- optimize(f=g,interval = Dvec, maximum=TRUE)
x_mode
This gives the incorrect value of the mode as:
$maximum
[1] 999.9999
$objective
[1] 0
This is incorrect, i.e. it gives the max value of the (-1000, 1000) interval as opposed to x = 3.
Could anyone please help edit the optimisation code?
It will be used with more generic functions of x if this simple test case works.
I would use optim for this, avoiding the interval specification. You can tailor the starting value by taking the maximum of the function on the originally guessed interval:
guessedInterval = min(Dvec):max(Dvec)
superStarSeed = guessedInterval[which.max(g(guessedInterval))]
optim(par=superStarSeed, fn=function(y) -g(y))
#$par
#[1] 3
#$value
#[1] -0.3989423
#$counts
#function gradient
# 24 NA
#$convergence
#[1] 0
#$message
#NULL
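As a side note (a sketch, not a replacement for the optim approach above): optimize itself also finds the mode once the search interval is not absurdly wide, since the golden-section search assumes a reasonably scaled unimodal region:

```r
# same density as in the question
g <- function(x) dnorm(x, mean = 3, sd = 1)
# a narrower interval lets optimize converge to the mode
optimize(f = g, interval = c(-10, 10), maximum = TRUE)$maximum
# close to 3
```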
I am new to R and am trying to calculate the bootstrapped standard deviation (sd) and associated standard error within a 30 observation rolling window. The function below performs the rolling window appropriately if I just want sd. But when I add the bootstrap function using the boot package I get the error specified below. I gather that I am trying to store bootstrap results in a vector that isn't the correct size. Does anyone have any advice on how to store just the bootstrapped sd and associated stderror for each window in rows of a new matrix? The goal is to then plot the sd and associated 95% confidence intervals for each window along the timeseries. Thanks in advance for any help.
> head(data.srs)
LOGFISH
1 0.8274083
2 1.0853433
3 0.8049845
4 0.8912097
5 1.3514569
6 0.8694499
###Function to apply rolling window
rollWin <- function(timeSeries, windowLength)
{
data<-timeSeries
nOut <- length(data[, 1]) - windowLength + 1
out <- numeric(nOut)
if (length(data[,1]) >= windowLength)
{
for (i in 1:nOut)
{
sd.fun <- function(data,d)sd(data[d], na.rm = TRUE)
out[i] <- boot(data[i:(i + windowLength - 1), ], sd.fun, R=1000)
}
}
return (list(result=out))
}
###run rolling window function. ex. rollWin(data, windowlength)
a.temp<-rollWin(data.srs,30)
> warnings()
Warning messages:
1: In out[i] <- boot(data[i:(i + windowLength - 1), ], sd.fun, ... :
number of items to replace is not a multiple of replacement length
You can simplify it quite a lot. I am not familiar with the boot package, but we can roll a function along a vector using the rollapply function quite easily, and then we can make bootstrap samples using the replicate function:
# Create some data, 12 items long
r <- runif(12)
# [1] 0.44997964 0.27425412 0.07327872 0.68054759 0.33577348 0.49239478
# [7] 0.93421646 0.19633079 0.45144966 0.53673296 0.71813017 0.85270346
require(zoo)
# use rollapply to calculate a function along a moving window
# width is the width of the window
sds <- rollapply( r , width = 4 , by = 1 , sd )
#[1] 0.19736258 0.26592331 0.16770025 0.12585750 0.13730946 0.08488467
#[7] 0.16073722 0.22460430 0.22462168
# Now we use replicate to repeatedly evaluate a bootstrap sampling method
# 'n' is number of replications
n <- 4
replicate( n , rollapply( r , width = 4 , function(x) sd( x[ sample(length(x) , repl = TRUE) ] ) ) )
# [,1] [,2] [,3] [,4]
# [1,] 0.17934073 0.1815371 0.11603320 0.2992379
# [2,] 0.03551822 0.2862702 0.18492837 0.2526193
# [3,] 0.09042535 0.2419768 0.13124738 0.1666012
# [4,] 0.17238705 0.1410475 0.18136178 0.2457248
# [5,] 0.32008385 0.1709326 0.32909368 0.2550859
# [6,] 0.30832533 0.1480320 0.02363968 0.1275594
# [7,] 0.23069951 0.1275594 0.25648052 0.3016909
# [8,] 0.11235170 0.2493055 0.26089969 0.3012610
# [9,] 0.16819174 0.2099518 0.18033502 0.0906986
Each column represents the rollapply which bootstraps the observations in the current window before applying sd.
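From that replicate matrix you can then summarise per window: rows are windows, columns are bootstrap replications. A self-contained sketch with toy data (the quantile-based interval is one common, simple choice for the 95% CI):

```r
library(zoo)
set.seed(1)
r <- runif(12)                     # toy series, 12 items
# 1000 bootstrap replications of the rolling (width 4) sd
boot.mat <- replicate(1000,
  rollapply(r, width = 4, function(x) sd(sample(x, replace = TRUE))))
win.sd <- rowMeans(boot.mat)       # bootstrap sd estimate per window
win.se <- apply(boot.mat, 1, sd)   # bootstrap standard error per window
ci <- t(apply(boot.mat, 1, quantile, c(0.025, 0.975)))  # 95% intervals
```

win.sd, win.se and ci each have one row/entry per window, ready for plotting along the time series.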
I am trying to use http://rss.acs.unt.edu/Rdoc/library/stats/html/constrOptim.html in R to do optimization in R with some given linear constraints but not able to figure out how to set up the problem.
For example, I need to maximize $f(x,y) = \log(x) + \frac{x^2}{y^2}$ subject to the constraints $g_1(x,y) = x+y < 1$, $g_2(x,y) = x > 0$ and $g_3(x,y) = y > 0$. How do I do this in R? This is just a hypothetical example; do not worry about its structure, I am simply interested in how to set this up in R.
thanks!
Setting up the function was trivial:
fr <- function(x) { x1 <- x[1]
x2 <- x[2]
-(log(x1) + x1^2/x2^2) # need negative since constrOptim is a minimization routine
}
Setting up the constraint matrix was problematic due to a lack of much documentation, and I resorted to experimentation. The help page says "The feasible region is defined by ui %*% theta - ci >= 0". So I tested and this seemed to "work":
> rbind(c(-1,-1),c(1,0), c(0,1) ) %*% c(0.99,0.001) -c(-1,0, 0)
[,1]
[1,] 0.009
[2,] 0.990
[3,] 0.001
So I put in a row for each constraint/boundary:
constrOptim(c(0.99,0.001), fr, NULL, ui=rbind(c(-1,-1), # the -x-y > -1
c(1,0), # the x > 0
c(0,1) ), # the y > 0
ci=c(-1,0, 0)) # the thresholds
For this problem there is a potential difficulty in that for all values of x the function goes to Inf as y -> 0. I do get a max around x=.95 and y=0 even when I push the starting values out to the "corner", but I'm somewhat suspicious that this is not the true maximum which I would have guessed was in the "corner".
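One quick numeric check of that suspicion is to evaluate the objective along the boundary x + y = 1 directly: f(x, 1 - x) = log(x) + x^2/(1 - x)^2 grows without bound as x approaches 1, so the supremum really is out at that corner:

```r
# probe the x + y = 1 edge of the feasible region
f <- function(x, y) log(x) + x^2 / y^2
x <- c(0.9, 0.99, 0.999)
f(x, 1 - x)
# roughly 80.9, 9801, 998001 -- unbounded growth toward the corner
```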
EDIT:
Pursuing this I reasoned that the gradient might provide additional "direction" and added a gradient function:
grr <- function(x) { ## Gradient of 'fr'
x1 <- x[1]
x2 <- x[2]
c(-(1/x[1] + 2 * x[1]/x[2]^2),
2 * x[1]^2 /x[2]^3 )
}
This did "steer" the optimization a bit closer to the c(.999..., 0) corner, instead of moving away from it, as it did for some starting values. I remain somewhat disappointed that the process seems to "head for the cliff" when the starting values are close to the center of the feasible region:
constrOptim(c(0.99,0.001), fr, grr, ui=rbind(c(-1,-1), # the -x-y > -1
c(1,0), # the x > 0
c(0,1) ), # the y > 0
ci=c(-1,0, 0) )
$par
[1] 9.900007e-01 -3.542673e-16
$value
[1] -7.80924e+30
$counts
function gradient
2001 37
$convergence
[1] 11
$message
[1] "Objective function increased at outer iteration 2"
$outer.iterations
[1] 2
$barrier.value
[1] NaN
Note: Hans Werner Borchers posted a better example on R-Help that succeeded in getting the corner values by setting the constraint slightly away from the edge:
> constrOptim(c(0.25,0.25), fr, NULL,
ui=rbind( c(-1,-1), c(1,0), c(0,1) ),
ci=c(-1, 0.0001, 0.0001))
$par
[1] 0.9999 0.0001