Add na.omit as argument in a function - r

I have the following list of pvalues
pval.list <- list(list(a=c(0.05, 0.0001, 0.32, 0.45), b=c(0.1,0.12,0.01,0.06), c=c(0.1,0.12,0.01,0.06), d=c(0.01,0.02,0.03,0.04)),
list(e=c(0.04, NA, 0.232, 0.245), f=c(0.05, 0.01, 0.22, 0.54), g=c(0.005, 0.1, 0.032, 0.045)),
list(h=c(0.03, 0.01, NA, 0.4), i=c(0.5, 0.0001, 0.132, 0.045), j=c(0.005, 0.0001, 0.0032, 0.045), k=c(0.5, 0.1, 0.932, 0.545)),
list(l=c(0.022, NA, 0.32, 0.45), m=c(0.0589, 0.0001, NA, 0.0045)),
list(n=c(0.051, 0.01, 0.32, 0.45), o=c(0.05, 0.0001, 0.32, 0.45), p=c(0.05, 0.0001, 0.32, 0.45), q=c(0.05, 0.0001, NA, 0.45)),
list(r=c(NA, 0.001, 0.32, 0.45), s=c(0.05, 0.0001, NA, 0.45), t=c(0.05, 0.0001, 0.32, 0.45)))
I'm trying to apply a function (see below) to this list:
Fisher.test <- function(p) {
Xsq <- -2*sum(log(p))
p.val <- 1-pchisq(Xsq, df = 2*length(p))
return(p.val)
}
Following the suggestion from @G.Grothendieck, I have used this command: lapply(lapply(pval.list, Reduce, f = cbind), apply, 1, Fisher.test). However, when there are missing values in the list, the approach does not work. I have tried passing na.omit as an argument to lapply, but that does not fix the problem.
Should I add na.omit as an argument in Fisher.test function?
Thanks

Your NAs are produced in sum. If you pass the argument na.rm=TRUE, you won't have NAs in your result:
Xsq <- -2*sum(log(p), na.rm=TRUE)
You could add an na.rm argument to Fisher.test and pass TRUE or FALSE from lapply, but if you always want to exclude NAs you can simply set the argument inside the function.
Calling the function on your pvalues:
lapply(lapply(pval.list, Reduce, f = cbind), apply, 1, Fisher.test)
gives:
[[1]]
[1] 1.953968e-03 2.999509e-05 5.320831e-04 1.339104e-02
[[2]]
[1] 0.0007878665 0.0317662968 0.0457435481 0.1146067577
[[3]]
[1] 8.982382e-03 3.055250e-08 4.719676e-02 5.094879e-02
[[4]]
[1] 0.009911091 0.001021034 0.684618971 0.014584426
[[5]]
[1] 2.357950e-03 6.135981e-11 5.543601e-01 6.038488e-01
[[6]]
[1] 6.235072e-02 3.470933e-09 6.016501e-01 5.708789e-01
All NA values are now removed before calculating the sum.
Does this do what you want?
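One caveat, offered as a sketch rather than a correction of the output above: with na.rm=TRUE set only in sum(), the line df = 2*length(p) still counts the missing entries, so rows containing NA are tested against more degrees of freedom than there are p-values. If the intent is Fisher's method on the non-missing p-values only, a variant could drop the NAs before both steps. The name Fisher.test.na is hypothetical, and returning NA for an all-missing row is an assumption.

```r
# Hypothetical variant: drop NAs first so that the statistic and the
# degrees of freedom both use the same set of p-values.
Fisher.test.na <- function(p) {
  p <- p[!is.na(p)]
  if (length(p) == 0) return(NA_real_)  # assumption: nothing to combine
  Xsq <- -2 * sum(log(p))
  pchisq(Xsq, df = 2 * length(p), lower.tail = FALSE)
}

# Second row of the second sub-list in the question: e = NA, f = 0.01, g = 0.1
res <- Fisher.test.na(c(NA, 0.01, 0.1))
res
```

It plugs into the same call, lapply(lapply(pval.list, Reduce, f = cbind), apply, 1, Fisher.test.na), and will give different values from the listing above for the rows that contain NA.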

Related

Output selected variables to global environment R function

I have a function that extends an earlier question here:
Function to calculate median by column to an R dataframe that is done regularly to multiple dataframes
My function is below:
library(outliers)
MscoreMax <- 3
scores_na <- function(x, ...) {
not_na <- !is.na(x)
scores <- rep(NA, length(x))
scores[not_na] <- outliers::scores(na.omit(x), ...)
scores
}
mediansFunction <- function(x){
labmedians <- sapply(x[-1], median)
median_of_median <- median(labmedians)
grand_median <- median(as.matrix(x[-1]))
labMscore <- as.vector(round(abs(scores_na(labmedians, "mad")), digits = 2)) #calculate mscore by lab
labMscoreIndex <- which(labMscore > MscoreMax) #get the position in the vector that exceeds Mscoremax
x[-1][labMscoreIndex] <- NA # discard values above threshold by setting them to NA
return(x)
}
The function has the desired outcome of converting my Mscore values above the threshold to NA. However, I would also like to send
labmedians
grand_median
labMscore
to the global environment from within the function as three separate variables, not as a single list. Can I do this, or is it better to create a second, slightly different function that returns the variables as a list, and then use list2env outside the function to extract them as separate items?
My df is below:
structure(list(Determination_No = 1:6, `2` = c(0.08, 0.08, 0.08,
0.08, 0.08, 0.08), `3` = c(0.08, 0.07, 0.07, 0.08, 0.07, 0.07
), `4` = c(0.07, 0.08, 0.08, 0.08, 0.07, 0.08), `5` = c(0.08,
0.08, 0.08, 0.08, 0.09, 0.09), `7` = c(0.09, 0.09, 0.11, 0.1,
0.1, 0.1), `8` = c(0.086, 0.087, 0.086, 0.09, 0.083, 0.079),
`10` = c(0.049748274, 0.049748274, 0.066331032, 0.066331032,
0.066331032, 0.049748274), `12` = c(0.086, 0.078, 0.078,
0.077, 0.077, 0.068)), class = "data.frame", row.names = c(NA,
-6L))
It is not recommended to write to the global environment from inside a function. If you want to create multiple objects in the global environment, return a named list from the function and use list2env.
mediansFunction <- function(x){
labmedians <- sapply(x[-1], median)
median_of_median <- median(labmedians)
grand_median <- median(as.matrix(x[-1]))
labMscore <- as.vector(round(abs(scores_na(labmedians, "mad")), digits = 2)) #calculate mscore by lab
labMscoreIndex <- which(labMscore > MscoreMax) #get the position in the vector that exceeds Mscoremax
x[-1][labMscoreIndex] <- NA # discard values above threshold by setting them to NA
dplyr::lst(data = x, labmedians, grand_median, labMscore)
}
result <- mediansFunction(df)
list2env(result, .GlobalEnv)
Now you have variables data, labmedians, grand_median and labMscore in the global environment.
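The same pattern can be sketched with base R only. The function name summarise_labs and the toy data frame are made up for illustration, list() with explicit names stands in for dplyr::lst(), and the objects are unpacked into a scratch environment rather than .GlobalEnv so the example cannot overwrite anything in your workspace.

```r
# Hypothetical helper: compute the summaries and return them as a named list
summarise_labs <- function(x) {
  labmedians <- sapply(x[-1], median)          # per-column medians, first column skipped
  grand_median <- median(as.matrix(x[-1]))     # median over all values
  list(labmedians = labmedians, grand_median = grand_median)
}

toy <- data.frame(id = 1:3, a = c(1, 2, 3), b = c(4, 5, 6))
res <- summarise_labs(toy)

# Unpack the list elements as separate objects in a scratch environment;
# envir = .GlobalEnv would create them in the global environment instead.
scratch <- new.env()
list2env(res, envir = scratch)
ls(scratch)
```

Keeping the side effect outside the function means the caller decides where, and whether, the objects are created.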

Problem assigning probabilities in sample function

I have generated a bunch of variables through the following :
max_no=10
list2env(setNames(as.list(c(0.2, 0.25,0.15, 0.1, 0.1, 0.05, 0.03, 0.06, 0.03, 0.02, 0.01)), paste0("proportion", 0:max_no)), envir = .GlobalEnv)
These generate objects like "proportion0", "proportion1",..., "proportion10" with values 0.2, 0.25.... etc.
I want to plug these proportion values as a vector of probabilities into the sample function like the following:
sample(seq.int(0, max_no, 1), size=10000, replace=TRUE, prob=c(paste0("proportion", 0:max_no)))
But I get the error message: Error in sample.int(length(x), size, replace, prob) : NA in probability vector
What would be a simple way of feeding in the probabilities?
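I am not sure what the use case for this is, but you can use mget and unlist. paste0() only produces the variable names as character strings, which is why sample() sees NA probabilities; mget() looks up the values stored under those names: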
sample(seq.int(0, max_no, 1), size=10000, replace=TRUE,
prob=unlist(mget(c(paste0("proportion", 0:max_no)))))
Why not pass probability directly instead of storing each number in a different variable?
sample(seq.int(0, max_no, 1), size=10000, replace=TRUE,
prob=c(0.2, 0.25,0.15, 0.1, 0.1, 0.05, 0.03, 0.06, 0.03, 0.02, 0.01))
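To make the difference between names and values concrete, here is a small sketch with four made-up proportions. The variables live in a separate environment (env) purely so the example leaves the workspace untouched; in the question they are in the global environment, where mget() looks by default when called from the top level.

```r
max_no <- 3
env <- new.env()
list2env(setNames(as.list(c(0.4, 0.3, 0.2, 0.1)),
                  paste0("proportion", 0:max_no)),
         envir = env)

nm <- paste0("proportion", 0:max_no)    # character vector of names only
as_num <- suppressWarnings(as.numeric(nm))  # names are not numbers: all NA

# mget() returns the stored values as a named list, in the order requested
probs <- unlist(mget(nm, envir = env))

set.seed(1)
draws <- sample(0:max_no, size = 1000, replace = TRUE, prob = probs)
table(draws)
```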

How to store the output into a list of matrices

Data:
x <- seq(0, 1, len = 1024)
pos <- c(0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81)
hgt <- c(4, 5, 3, 4, 5, 4.2, 2.1, 4.3, 3.1, 5.1, 4.2)
wdt <- c(0.005, 0.005, 0.006, 0.01, 0.01, 0.03, 0.01, 0.01, 0.005, 0.008, 0.005)
pSignal <- numeric(length(x))
for (i in seq(along=pos)) {
pSignal <- pSignal + hgt[i]/(1 + abs((x - pos[i])/wdt[i]))^4
}
df = as.data.frame(rbind(pSignal,pSignal,pSignal))
dflist=list(df,df,df)
I'm trying to run the pracma package's findpeaks() function to find the local maxima of each row in each data.frame in the list, dflist. The output is an N x 4 matrix, where N is the number of peaks. So if it finds 4 peaks in the first row of the first data.frame, the result will be a 4x4 matrix. My goal is to loop this function over every row in each data.frame and store each output matrix in a list.
My code:
## Find Peaks
pks=list()
for (i in 1:length(dflist)){
for (j in 1:length(dflist[[i]])){
row = dflist[[i]][j,]
temppks = findpeaks(as.vector(row,mode='numeric')
,minpeakheight = 1.1,nups=2)
pks[i][[j]]=rbind(pks,temppks)
}
}
This doesn't seem to be doing quite what I want it to. Any ideas?
A combination of apply() and sapply() could do the work:
library(pracma)
my.f.row <- function(row) findpeaks(as.vector(row,mode='numeric'), minpeakheight = 1.1, nups=2)
sapply(dflist, function(df.i) apply(df.i, 1, my.f.row))
You may have to reorganize the result afterwards.
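If a plain list of lists of matrices is wanted (one list per data.frame, one matrix per row), nested lapply() avoids the simplification that sapply() and apply() may perform. Note also that in the question's loop, length(dflist[[i]]) is the number of columns of the data.frame, so nrow() is what indexes rows. The sketch below is self-contained: simple_peaks is a hypothetical base-R stand-in that returns a 2-column matrix, and peaks_by_row is a made-up helper name. With pracma installed, the call would presumably be peaks_by_row(dflist, pracma::findpeaks, minpeakheight = 1.1, nups = 2).

```r
# Hypothetical stand-in for pracma::findpeaks(): strict interior local maxima
simple_peaks <- function(v) {
  n <- length(v)
  idx <- which(v[2:(n - 1)] > v[1:(n - 2)] & v[2:(n - 1)] > v[3:n]) + 1
  cbind(height = v[idx], index = idx)
}

# Apply peak_fun to every row of every data.frame; extra arguments are passed on
peaks_by_row <- function(dflist, peak_fun, ...) {
  lapply(dflist, function(d) {
    lapply(seq_len(nrow(d)), function(j) {
      peak_fun(unlist(d[j, ], use.names = FALSE), ...)
    })
  })
}

toy <- as.data.frame(rbind(c(0, 2, 0, 3, 0), c(1, 0, 4, 0, 1)))
pks <- peaks_by_row(list(toy, toy), simple_peaks)
pks[[1]][[1]]  # peaks for row 1 of the first data.frame
```

The result is indexed as pks[[i]][[j]]: data.frame i, row j.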

Trying to fit f distribution to a vector

Would anyone know why the following code fails to execute fitdist, with the error "the function mle failed to estimate the parameters, with the error code 100"?
I have encountered this error in the past when working with the normal distribution; the solution in that case was increasing the variance of the vector (by multiplying it by, say, 100), but that does not help in this case. Please note all elements in the vector are positive. Thank you.
library(fitdistrplus)
VH <- c(0.36, 0.3, 0.36, 0.47, 0, 0.05, 0.4, 0, 0, 0.15, 0.89, 0.03, 0.45, 0.21, 0, 0.18, 0.04, 0.53, 0, 0.68, 0.06, 0.09, 0.58, 0.03, 0.23, 0.27, 0, 0.12, 0.12, 0, 0.32, 0.07, 0.04, 0.07, 0.39, 0, 0.25, 0.28, 0.42, 0.55, 0.04, 0.07, 0.18, 0.17, 0.06, 0.39, 0.65, 0.15, 0.1, 0.32, 0.52, 0.55, 0.71, 0.93, 0, 0.36)
f <- fitdist(na.exclude(VH),"f", start =list(df1=1, df2=2))
The error you get here is actually somewhat informative:
simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, ddistnam = ddistname, hessian = TRUE, method = meth, lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters
Error in fitdist(na.exclude(VH), "f", start = list(df1 = 1, df2 = 2)) :
the function mle failed to estimate the parameters,
with the error code 100
That means something went wrong right away, not in the middle of the optimization process.
Taking a guess, I looked and saw that there are zero values in your data (so your statement that all the elements are positive is not technically correct -- they're all non-negative ...). With the starting values df1=1, df2=2, the F density is infinite at 0: df(0,1,2) is Inf.
If I exclude the zero value, I get an answer ...
f <- fitdist(na.exclude(VH[VH>0]),"f", start =list(df1=1, df2=2))
... the estimated value for the second shape parameter is very large (approx. 6e6, with a big uncertainty), but seems to fit OK ...
par(las=1); hist(VH,freq=FALSE,col="gray")
curve(df(x,1.37,6.45e6),add=TRUE)
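The point about the starting values can be checked without fitdistrplus. The sketch below uses a short made-up vector rather than VH, and nll is a hypothetical helper for the negative log-likelihood at the starting parameters, which is roughly the quantity the optimizer needs to be finite at the first step.

```r
# Density of the F distribution at zero for the starting values df1 = 1, df2 = 2
d0 <- df(0, 1, 2)

# Hypothetical helper: negative log-likelihood at the starting parameters
nll <- function(x, df1 = 1, df2 = 2) -sum(df(x, df1, df2, log = TRUE))

with_zero <- nll(c(0, 0.3, 0.5))     # non-finite: the zero contributes an infinite log-density
without_zero <- nll(c(0.3, 0.5))     # finite once the zero is dropped

c(d0 = d0, with_zero = with_zero, without_zero = without_zero)
```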

text to expression in function of variance estimation of derived parameters via Delta Method

I have written a function to perform matrix multiplication on each row of the data set pd.matrix. The function my.var.function performs as intended. However, now I want to generalize the function to handle matrices of variable sizes instead of just the example matrix with five columns.
To generalize the function I imagine that I will need to replace x[1], x[2], x[3], x[4], x[5] in the apply statement with something like x[1]:x[ncol(pd.matrix)]. I imagine I similarly will need to replace the two instances of (x1, x2, x3, x4, x5) within the function.
I have tried making these changes with eval(parse(text= followed by paste0 to create the desired series of x1, x2, x3, x4, x5 or x[1], x[2], x[3], x[4], x[5] for this example. However, I have been unable to get eval(parse(text= to work after trying numerous permutations.
How can I generalize the function and apply statement to handle a pd.matrix of n columns rather than five columns?
pd.matrix <- matrix(c(0.10, 0.20, 0.30, 0.40, 0.50,
0.11, 0.21, 0.31, 0.41, 0.51,
0.12, 0.22, 0.32, 0.42, 0.52,
0.13, 0.23, 0.33, 0.43, 0.53,
0.14, 0.24, 0.34, 0.44, 0.54), nrow = 5, byrow = TRUE)
vcv.mat = matrix(c(0.01, 0.0020, 0.0030, 0.0040, 0.0050,
0.0020, 0.02, 0.0031, 0.0041, 0.0051,
0.0030, 0.0031, 0.03, 0.0042, 0.0052,
0.0040, 0.0041, 0.0042, 0.04, 0.0053,
0.0050, 0.0051, 0.0052, 0.0053, 0.05), nrow = 5, byrow = TRUE)
my.var.function <- function(x1, x2, x3, x4, x5) {
my.pd <- matrix(c(x1, x2, x3, x4, x5), nrow = 1)
my.mat = my.pd %*% vcv.mat
my.var = my.mat %*% t(my.pd)
return(my.var = my.var)
}
apply(pd.matrix, 1, function(x) my.var.function(x[1], x[2], x[3], x[4], x[5]))
# [1] 0.0303160 0.0319642 0.0336588 0.0353998 0.0371872
The solution turned out to be very simple. Not sure why I did not see this solution before.
pd.matrix <- matrix(c(0.10, 0.20, 0.30, 0.40, 0.50,
0.11, 0.21, 0.31, 0.41, 0.51,
0.12, 0.22, 0.32, 0.42, 0.52,
0.13, 0.23, 0.33, 0.43, 0.53,
0.14, 0.24, 0.34, 0.44, 0.54), nrow = 5, byrow = TRUE)
vcv.mat = matrix(c(0.01, 0.0020, 0.0030, 0.0040, 0.0050,
0.0020, 0.02, 0.0031, 0.0041, 0.0051,
0.0030, 0.0031, 0.03, 0.0042, 0.0052,
0.0040, 0.0041, 0.0042, 0.04, 0.0053,
0.0050, 0.0051, 0.0052, 0.0053, 0.05), nrow = 5, byrow = TRUE)
my.var.function <- function(x) {
my.pd <- matrix(c(x), nrow = 1)
my.mat = my.pd %*% vcv.mat
my.var = my.mat %*% t(my.pd)
return(my.var = my.var)
}
apply(pd.matrix, 1, function(x) my.var.function(x))
# [1] 0.0303160 0.0319642 0.0336588 0.0353998 0.0371872
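For what it's worth, the quantity computed for each row is the quadratic form x V x', so it can also be obtained for all rows at once without apply(). This is only a sketch of an equivalent vectorized form, using the same pd.matrix and vcv.mat as above; the names v.apply and v.vec are made up for the comparison.

```r
pd.matrix <- matrix(c(0.10, 0.20, 0.30, 0.40, 0.50,
                      0.11, 0.21, 0.31, 0.41, 0.51,
                      0.12, 0.22, 0.32, 0.42, 0.52,
                      0.13, 0.23, 0.33, 0.43, 0.53,
                      0.14, 0.24, 0.34, 0.44, 0.54), nrow = 5, byrow = TRUE)
vcv.mat <- matrix(c(0.01,   0.0020, 0.0030, 0.0040, 0.0050,
                    0.0020, 0.02,   0.0031, 0.0041, 0.0051,
                    0.0030, 0.0031, 0.03,   0.0042, 0.0052,
                    0.0040, 0.0041, 0.0042, 0.04,   0.0053,
                    0.0050, 0.0051, 0.0052, 0.0053, 0.05), nrow = 5, byrow = TRUE)

# Row-by-row version, as in the answer above
v.apply <- apply(pd.matrix, 1, function(x) drop(x %*% vcv.mat %*% x))

# Vectorized version: row i of (P %*% V) times row i of P, summed across columns
v.vec <- rowSums((pd.matrix %*% vcv.mat) * pd.matrix)

all.equal(v.apply, v.vec)
```

Both versions work for any number of columns, as long as vcv.mat is square with matching dimension.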
