Here is the formula which I am trying to calculate in R.
So far, this is my approach, using a simplified example:
t <- seq(1, 2, 0.1)
expk <- function(k){exp(-2*pi*1i*t*k)}
set.seed(123)
dat <- ts(rnorm(100), start = c(1994,3), frequency = 12)
arfit <- ar(dat, order = 4, aic = FALSE) # represent \phi in the formula
tmp1 <- numeric(4)
for (i in seq_along(arfit$ar)){
ek <- expk(i)
arphi <- arfit$ar[i]
tmp1[i] <- ek * arphi
}
tmp2 <- sum(tmp1)
denom = abs(1-tmp2)^2
s2 <- t/denom
Error : Warning message:
In tmp1[i] <- ek * arphi :
number of items to replace is not a multiple of replacement length
I was trying to avoid the for loop and tried sapply, as in the solutions to this question.
denom2 <- abs(1- sapply(seq_along(arfit$ar), function(x)sum(arfit$ar[x]*expk(x))))^2
but the result doesn't seem to be correct. The problem is how to sum the series (over index k) when it also takes values from another vector, in this case t, which is in the numerator.
Any solutions?
Any suggestions for a test dataset, maybe using 0 and 1, to check whether the calculation in this loop is done correctly?
Typing up the answer determined in chat. Here's a solution involving vapply.
First correct expk to:
expk <- function(k){sum(exp(-2*pi*1i*t*k))}
Then you can create this function and vapply it:
myFun <- function(i) return(expk(i) * arfit$ar[i])
tmp2 <- sum(vapply(seq_along(arfit$ar), myFun, complex(1)))
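If the goal is instead one value of the denominator per element of t (that is how I read the question, but it is an assumption, since the formula itself is not shown here), the sum over k can be kept separate from t with outer(). A minimal sketch; freq, k and E are names introduced only for this illustration:

```r
set.seed(123)
dat <- ts(rnorm(100), start = c(1994, 3), frequency = 12)
arfit <- ar(dat, order.max = 4, aic = FALSE)

freq <- seq(1, 2, 0.1)            # plays the role of `t` in the question
k <- seq_along(arfit$ar)          # AR lags 1..4

# E[j, k] = exp(-2*pi*i * freq[j] * k): one row per frequency, one column per lag
E <- exp(-2i * pi * outer(freq, k))

# The matrix product sums over k for every frequency at once
denom <- abs(1 - as.vector(E %*% arfit$ar))^2
s2 <- freq / denom                # same numerator as in the question
```

Here denom has one entry per element of freq, so the final division no longer depends on recycling.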
I cannot seem to even create a reproducible example on this as it works fine when I go through the code one line at a time.
The error message I get is as follows:
"Error in testData[, colCheck][length(testData[, colCheck])] - testData[, :
non-numeric argument to binary operator "
Both colCheck and testData$linearcorrd15N are numeric and like I said, the calculation works fine when I run it at that line. The error comes only when I run the function from QTest(df, colCheck).
Here is an example of what some of the code looks like. It will not produce an error, but maybe you can see something that I don't.
QTest <- function(testData, colCheck)
#%#
# testData <- This is the entire data frame for the std/ref that has too high
# of a SD, this way the data frame can be returned without the outlier
# colCheck <- The column name for values that were flagged for having too high of a SD
# This Q test info provided by: https://www.statisticshowto.com/dixons-q-test/
#%#
{
#Get the mean of the highest and lowest values
testData <- arrange(testData, desc(testData[, colCheck]))
len <- length(testData[,colCheck])-1
high <- sapply(1:len, function(i) testData[,colCheck][i])
meanhigh <- mean(high)
testData <- arrange(testData, (testData[, colCheck]))
low <- sapply(1:len, function(i) testData[,colCheck][i])
meanlow <- mean(low)
#If the mean of the lowest numbers is lower than the mean of the highest numbers, do this
if(meanlow < meanhigh){
QexpVal <- abs((testData[, colCheck][2] - testData[, colCheck][1])/
(testData[, colCheck][length(testData[, colCheck])] - testData[, colCheck][1]))
outlier <- testData[,colCheck][1]
closest <- testData[,colCheck][2]
#else if the mean of the lowest numbers is higher than the mean of the highest numbers, do this
} else {
QexpVal <- abs((testData[, colCheck][length(testData[,colCheck])-1] - (testData[, colCheck][length(testData[,colCheck])])) /
(testData[,colCheck][length(testData[,colCheck])]) - (testData[,colCheck][1]))
outlier <- testData[,colCheck][length(testData[,colCheck])]
closest <- testData[,colCheck][length(testData[,colCheck])-1]
}
return(QexpVal)
}
df <- data.frame(Row = c(1, 2, 3, 4, 5), Identifier.2 = "36-UWSIF-UT Glut1", linearcorrd15N = c(-11.63433,
-22.13869, -57.21795, -17.06438, -16.23358))
colCheck <- as.numeric(grep("linearcorrd15N", colnames(df)))
QTestCorrVals <- QTest(df, colCheck)
It seems you really overcomplicate this function by passing the whole table into it, looping over everything and reading each value from the whole table again...
Just the part to get meanhigh and meanlow requires only this:
v <- df[, colCheck]
v <- v[order(v)]
n <- length(v)
meanhigh <- mean(v[2:n])
meanlow <- mean(v[1:(n-1)]) # parentheses matter: 1:n-1 parses as (1:n)-1
Or if you use the decreasing ordering this:
v <- df[, colCheck]
v <- v[order(v, decreasing = TRUE)]
n <- length(v)
meanhigh <- mean(v[1:(n-1)])
meanlow <- mean(v[2:n])
Full function
Here is the full code using this approach. I agree that it is not the specific question you asked, but the way you coded it is inefficient and error-prone: it subsets the whole data.frame and recalculates lengths every time. You also only have to order once, because if the lowest value is at the top, the highest is by definition at the bottom. After that it is just a matter of using 1 for the first, 2 for the second, n for the last and n-1 for the second-to-last element.
df <- data.frame(Row = c(1, 2, 3, 4, 5), Identifier.2 = "36-UWSIF-UT Glut1", linearcorrd15N = c(-11.63433,
-22.13869, -57.21795, -17.06438, -16.23358))
colCheck <- as.numeric(grep("linearcorrd15N", colnames(df)))
QTest <- function(v) {
v <- v[order(v)]
n <- length(v)
meanhigh <- mean(v[2:n])
meanlow <- mean(v[1:(n-1)])
if(meanlow < meanhigh) {
QexpVal <- abs((v[2]-v[1])/(v[n]-v[1]))
outlier <- v[1]
closest <- v[2]
} else {
QexpVal <- abs((v[n-1]-v[n])/(v[n]-v[1]))
outlier <- v[n]
closest <- v[n-1]
}
return(QexpVal)
}
QTestCorrVals <- QTest(df[, colCheck])
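As a cross-check on the numbers, here is a small variant that computes the gap ratio at both ends of the sorted vector and reports which end looks more suspicious. This is only a sketch under my own assumptions: q_both is a made-up helper, not part of the function above, and it does not compare against a critical value table.

```r
# Hypothetical helper: Dixon-style gap/range ratio for the lowest and highest value
q_both <- function(v) {
  v <- sort(v)
  n <- length(v)
  rng <- v[n] - v[1]
  c(low  = (v[2] - v[1]) / rng,       # gap between the two smallest values
    high = (v[n] - v[n - 1]) / rng)   # gap between the two largest values
}

x <- c(-11.63433, -22.13869, -57.21795, -17.06438, -16.23358)
q <- q_both(x)
suspect_end <- names(which.max(q))    # end with the larger ratio
```

For the example data the larger ratio is at the low end, which corresponds to the value -57.21795.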
Side note
Using the column index number works slightly differently depending on whether your data is a data.frame or a data.table:
class(df)
df[, colCheck]
library(data.table)
dt <- data.table(df)
class(dt)
dt[, ..colCheck]
dt[, colCheck] # throws an error
I have 2 vectors of sample sizes that I'm using to simulate the power of my study, but I keep getting this error in the for loop section:
Error in pwr.2p2n.test(h, n1 = i, n2 = j, sig.level = 0.05) :
number of observations in the first group must be at least 2
I would be grateful for any suggestions to get it working.
##sample code
grp1.n <- seq(30,150,5) ##group 1, N
grp2.n <- seq(30,150,5)-15 ## group 2, N - 15
h=0.85 #specify large effect size
grp1.length <- length(grp1.n)
grp2.length <- length(grp2.n)
power.holder <- array(numeric(grp1.length*grp2.length), dim=c(grp1.length,grp2.length),dimnames=list(grp1.n,grp2.n))
for (i in 1:grp1.length){
for (j in 1:grp2.length){
result.pwr.2p2n.test <- pwr.2p2n.test(h, n1=i, n2=j, sig.level=0.05)
power.holder[i,j] <- ceiling(result.pwr.2p2n.test$power)
return(result.pwr.2p2n.test)
}
}
I'm not entirely sure if this is what you want, but I think it is:
grp1.n <- seq(30,150,5) ##group 1, N
grp2.n <- seq(30,150,5)-15 ## group 2, N - 15
h=0.85 #specify large effect size
grp1.length <- length(grp1.n)
grp2.length <- length(grp2.n)
power.holder <- array(numeric(grp1.length*grp2.length), dim=c(grp1.length,grp2.length),dimnames=list(grp1.n,grp2.n))
for (i in 1:grp1.length){
for (j in 1:grp2.length){
result.pwr.2p2n.test <- pwr.2p2n.test(h, n1=grp1.n[i], n2=grp2.n[j], sig.level=0.05)
power.holder[i,j] <- ceiling(result.pwr.2p2n.test$power)
}
}
power.holder # inspect the filled matrix after both loops; return() only works inside a function
The changes are in the call to pwr.2p2n.test, which now receives the sample sizes rather than the loop indices, and in the object you get back at the end.
Old: pwr.2p2n.test(h, n1=i, n2=j, sig.level=0.05)
New: pwr.2p2n.test(h, n1=grp1.n[i], n2=grp2.n[j], sig.level=0.05)
Note there was also a missing } bracket in your code.
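The same index-versus-value distinction disappears if you let outer() build the grid for you, because it passes the values themselves. The sketch below uses a made-up stand-in, toy_stat, that happens to be vectorized; it is not a power calculation. For pwr.2p2n.test you would probably need to wrap the call (for example with mapply or Vectorize) and extract $power, which I have not tested here.

```r
grp1.n <- seq(30, 150, 5)      # group 1, N
grp2.n <- grp1.n - 15          # group 2, N - 15

# Hypothetical stand-in for "some statistic of two sample sizes"
toy_stat <- function(n1, n2) n1 * n2 / (n1 + n2)

# outer() calls toy_stat once with every (n1, n2) combination expanded into vectors
res <- outer(grp1.n, grp2.n, toy_stat)
dimnames(res) <- list(grp1.n, grp2.n)
```

Each cell res[i, j] is computed from grp1.n[i] and grp2.n[j], never from i and j themselves.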
I would like to clean up my code a bit and start using more functions for my everyday computations (where I would normally use for loops). I have an example of a for loop that I would like to turn into a function. The problem I am having is how to step through the constraint vectors without a loop. Here's what I mean:
## represents spectral data
set.seed(11)
df <- data.frame(Sample = 1:100, replicate(1000, sample(0:1000, 100, rep = TRUE)))
## feature ranges by column number
frm <- c(438,563,953,963)
to <- c(548,803,1000,993)
nm <- c("WL890", "WL1080", "WL1400", "WL1375")
WL.ps <- list()
for (i in 1:length(frm)){
## finds the minimum value within the range constraints and returns the corresponding column name
WL <- colnames(df[frm[i]:to[i]])[apply(df[frm[i]:to[i]],1,which.min)]
WL.ps[[i]] <- WL
}
new.df <- data.frame(WL.ps)
colnames(new.df) <- nm
The part where I iterate through the 'frm' and 'to' vector values is what I'm having trouble with. How does one go from frm[1] to frm[2] and so on in a function (apply or otherwise)?
Any advice would be greatly appreciated.
Thank you.
You could write a function that returns the column name of the minimum value in each row for a particular range of columns. I have used max.col on the negated data instead of apply(df, 1, which.min), since max.col should be more efficient than apply. Note that max.col breaks ties at random by default, so ties.method = "first" is set to match which.min.
apply_fun <- function(data, x, y) {
cols <- x:y
names(data[cols])[max.col(-data[cols], ties.method = "first")]
}
Apply this function using Map :
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))
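For completeness, here is a sketch of the whole thing end to end, rebuilding new.df from the Map result. It repeats the data from the question so that it runs on its own, and it sets ties.method = "first" in max.col so that ties are resolved the same way as which.min.

```r
set.seed(11)
df <- data.frame(Sample = 1:100, replicate(1000, sample(0:1000, 100, rep = TRUE)))
frm <- c(438, 563, 953, 963)
to  <- c(548, 803, 1000, 993)
nm  <- c("WL890", "WL1080", "WL1400", "WL1375")

# Column name of the row-wise minimum within columns x..y
apply_fun <- function(data, x, y) {
  cols <- x:y
  names(data[cols])[max.col(-data[cols], ties.method = "first")]
}

# Map pairs frm[1] with to[1], frm[2] with to[2], and so on
WL.ps <- Map(apply_fun, frm, to, MoreArgs = list(data = df))
names(WL.ps) <- nm
new.df <- as.data.frame(WL.ps, stringsAsFactors = FALSE)
```

Map walks frm and to in parallel, which is the part the for loop was doing with i.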
I have an empty data frame T_modelled with 2784 columns and 150 rows.
T_modelled <- data.frame(matrix(ncol = 2784, nrow = 150))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))
where
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
I filled T_modelled by column with a nested for loop, based on a formula:
for (i in 1:ncol(T_modelled)) {
col_tmp <- colnames(T_modelled)[i]
for (j in 1:nrow(T_modelled)) {
z_tmp <- z[j]-0.1
T_tmp <- MANSRT+As*e^(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
T_modelled[j ,col_tmp] <- T_tmp
}
}
where
MANSRT <- -2.051185
As <- 11.59375
omega <- (2*pi)/(347.875*24*60*60)
c <- 790
k <- 0.00219
pb <- 2600
K <- (k*1000)/(c*pb)
e <- exp(1)
I do get the desired results but I keep thinking there must be a more efficient way of filling that data frame. The loop is quite slow and looks cumbersome to me. I guess there is an opportunity to take advantage of R's vectorized way of calculating. I just cannot see myself how to incorporate the formula in an easier way to fill T_modelled.
Anyone got any ideas how to get the same result in a faster, more "R-like" manner?
I believe this does it.
Run this first instruction right after creating T_modelled; it will be needed to test that the results are equal.
Tm <- T_modelled
Now run your code then run the code below.
z_tmp <- z - 0.1
for (i in 1:ncol(Tm)) {
T_tmp <- MANSRT + As*exp(-z_tmp*(omega/(2*K))^0.5)*sin(omega*t_sec_ERT[i]-((omega/(2*K))^0.5)*z_tmp)
Tm[ , i] <- T_tmp
}
all.equal(T_modelled, Tm)
#[1] TRUE
You don't need the inner loop, that's the only difference.
(I also used exp directly but that is of secondary importance.)
Much like the accepted solution to your previous question, consider simply using sapply to iterate through the vector t_sec_ERT, which has the same length as the number of columns in your desired data frame. But first adjust every element of z by 0.1. Plus, there's no need to create an empty data frame beforehand.
z_adj <- z - 0.1
T_modelled2 <- data.frame(sapply(t_sec_ERT, function(ert)
MANSRT+As*e^(-z_adj*(omega/(2*K))^0.5)*sin(omega*ert-((omega/(2*K))^0.5)*z_adj)))
colnames(T_modelled2) <- paste0("t=", t_sec_ERT)
rownames(T_modelled2) <- paste0("z=", z)
all.equal(T_modelled, T_modelled2)
# [1] TRUE
Rui is of course correct, I just want to suggest a way of reasoning when writing a loop like this.
You have two numeric vectors. Functions for numerics in R are usually vectorized. By which I mean you can do stuff like this
x <- c(1, 6, 3)
sum(x)
not needing something like this
x_ <- 0
for (i in x) {
x_ <- i + x_
}
x_
That is, no need for looping in R. Of course looping takes place none the less, it just happens in the underlying C, Fortran etc. code, where it can be done more efficiently. This is usually what we mean when we call a function vectorized: looping takes place "under the hood" as it were. The output of Vectorize() thus isn't strictly vectorized by this definition.
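To make that distinction concrete, here is a small sketch. The function bigger is made up for illustration: it only works on scalars, so Vectorize() has to wrap it in mapply(), while pmax() does the same job with the loop in compiled code.

```r
# Hypothetical scalar-only function: if() only looks at a single value
bigger <- function(a, b) if (a > b) a else b

# Vectorize() builds a wrapper around mapply(), so the loop still runs in R
vbigger <- Vectorize(bigger)
res_wrapped <- vbigger(c(1, 5, 3), c(2, 2, 2))

# pmax() is vectorized in the sense described above
res_native <- pmax(c(1, 5, 3), c(2, 2, 2))
```

Both give the same numbers; only where the looping happens differs.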
When you have two numeric vectors you want to loop over, you first have to check whether the constituent functions are vectorized, usually by reading the docs.
If they are, you continue by constructing the central vectorized compound function and start testing it with one vector and one scalar. In your case it would be something like this (testing with just the first element of t_sec_ERT).
z_tmp <- z - 0.1
i <- 1
T_tmp <- MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
Looks OK. Then you start looping over the elements of t_sec_ERT.
T_tmp <- matrix(nrow=length(z), ncol=length(t_sec_ERT))
for (i in 1:length(t_sec_ERT)) {
T_tmp[, i] <- MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*t_sec_ERT[i] - ((omega/(2*K))^0.5)*z_tmp)
}
Or you can do it with sapply() which is often neater.
f <- function(x) {
MANSRT + As *
exp(-z_tmp*(omega/(2*K))^0.5) *
sin(omega*x - ((omega/(2*K))^0.5)*z_tmp)
}
T_tmp <- sapply(t_sec_ERT, f)
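Since exp() and sin() are both vectorized, the remaining loop over t_sec_ERT can in principle also be dropped by building the argument of sin() for every (z, t) pair with outer(). This is a sketch, not something from the answers above; d and T_mat are names I introduce here, and the constants are copied from the question so the block runs on its own.

```r
MANSRT <- -2.051185
As <- 11.59375
omega <- (2*pi)/(347.875*24*60*60)
K <- (0.00219*1000)/(790*2600)
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)

z_tmp <- z - 0.1
d <- (omega/(2*K))^0.5                     # damping / phase-lag factor

# phase[j, i] = omega * t_sec_ERT[i] - d * z_tmp[j]
phase <- outer(-d * z_tmp, omega * t_sec_ERT, "+")

# exp(-z_tmp * d) has one value per row and is recycled down each column
T_mat <- MANSRT + As * exp(-z_tmp * d) * sin(phase)
```

T_mat is a plain matrix with one row per depth and one column per time step; wrap it in data.frame() and set the names as before if you need the original layout.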
I would prefer to put the data in a long format, with all combinations of z and t_sec_ERT as two columns, in order to take advantage of vectorization. Although I usually prefer tidyr for switching between long and wide formats, I've tried to keep this as a base solution:
t_sec_ERT <- seq(from = -23349600, to = 6706800, by = 10800)
z <- seq(from = 0.1, to = 15, by = 0.1)
v <- expand.grid(t_sec_ERT, z)
names(v) <- c("t_sec_ERT", "z")
v$z_tmp <- v$z-0.1
v$T_tmp <- MANSRT+As*e^(-v$z_tmp*(omega/(2*K))^0.5)*sin(omega*v$t_sec_ERT-((omega/(2*K))^0.5)*v$z_tmp)
T_modelled <- data.frame(matrix(v$T_tmp, nrow = length(z), ncol = length(t_sec_ERT), byrow = TRUE))
names(T_modelled) <- paste0("t=", t_sec_ERT)
rownames(T_modelled) <- paste0("z=", seq(from = 0.1, to = 15, by = 0.1))
I'm trying to bootstrap a spatial count model using the spatcounts package in R. However, resampling with replacement may lead to "islands", because sometimes none of a region's neighbors will be selected. I believe this is causing the error message: Error: NA/NaN/Inf in foreign function call (arg 1). Can somebody confirm whether this is indeed the case?
If so, is there a way around it, so that I can still carry out a bootstrap?
Please find below the code to recreate the problem using the example dataset of the spatcounts package.
Thank you for your time and I hope somebody can help me out a bit.
####START CODE#####
set.seed(987654321)
library(spatcounts)
AllData <- cbind(sim.Yin, sim.region, sim.fm.X)
colnames(AllData)[1:2] <- c("Yin", "Region")
idx <- sample(1:nrow(AllData), 100, replace=TRUE)
newdata.df <- AllData[idx,]
newdata.df <- newdata.df[order(newdata.df$Region),]
X <- as.data.frame(newdata.df[,3:4])
region <- as.data.frame(newdata.df$Region)
colnames(region) <- "V1"
Yin <- as.data.frame(newdata.df$Yin)
temp.idx <- sort(unique(newdata.df$Region))
g.temp <- sim.gmat[temp.idx,temp.idx]
TotalN <- rowSums(g.temp) ##CHECK IF THERE ARE ZEROS PRESENT IN THIS VECTOR.
##OTHERWISE RUN AGAIN UNTIL AT LEAST ONE ZERO
n.temp <- sim.nmat[temp.idx,]
n.temp$V2 <- temp.idx[match(n.temp$V2, temp.idx)]
n.temp$V3 <- temp.idx[match(n.temp$V3, temp.idx)]
n.temp$V4 <- temp.idx[match(n.temp$V4, temp.idx)]
n.temp$V5 <- temp.idx[match(n.temp$V5, temp.idx)]
n.temp$V6 <- TotalN
n.temp[is.na(n.temp)] <- 0
Yin.NB <- est.sc(Yin, ~ X[,1] + X[,2] -1,
region, model="NB", g.temp, n.temp, totalit=10) ##ERROR
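To make the "islands" idea checkable on each resample, here is a toy sketch that does not use spatcounts at all. The 5-region adjacency matrix g and the vector keep are invented for illustration; with the real data, the same row-sum test would be applied to g.temp. Whether islands are actually what triggers the error is still only my guess.

```r
# Toy adjacency matrix for 5 regions arranged in a chain: 1-2-3-4-5
g <- matrix(0, 5, 5)
g[cbind(1:4, 2:5)] <- 1
g <- g + t(g)                      # make it symmetric

keep <- c(1, 3, 4)                 # regions that happened to be drawn in a resample
g.sub <- g[keep, keep]             # adjacency restricted to the drawn regions

# A region is an island if none of its neighbours were drawn
islands <- keep[rowSums(g.sub) == 0]
```

Here region 1 is an island because its only neighbour, region 2, was not sampled, while regions 3 and 4 still have each other.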