Output selected variables to the global environment from an R function

I have a function which is an extension of an earlier question here:
Function to calculate median by column to an R dataframe that is done regularly to multiple dataframes
My function is below:
library(outliers)
MscoreMax <- 3
scores_na <- function(x, ...) {
  not_na <- !is.na(x)
  scores <- rep(NA, length(x))
  scores[not_na] <- outliers::scores(na.omit(x), ...)
  scores
}
mediansFunction <- function(x){
  labmedians <- sapply(x[-1], median)
  median_of_median <- median(labmedians)
  grand_median <- median(as.matrix(x[-1]))
  labMscore <- as.vector(round(abs(scores_na(labmedians, "mad")), digits = 2)) # calculate Mscore by lab
  labMscoreIndex <- which(labMscore > MscoreMax) # get the positions in the vector that exceed MscoreMax
  x[-1][labMscoreIndex] <- NA # discard values above the threshold by making them NA
  return(x)
}
The function has the desired outcome of converting my Mscore values above the threshold to NA. However, I would like to send
labmedians
grand_median
labMscore
to the global environment from within the function as three separate variables, not as a list of items. Can I do this, or is it better to create a slightly different second function that sends the variables to the global environment, and then use list2env outside the function afterwards to extract the variables as separate items?
My df is below:
structure(list(Determination_No = 1:6, `2` = c(0.08, 0.08, 0.08,
0.08, 0.08, 0.08), `3` = c(0.08, 0.07, 0.07, 0.08, 0.07, 0.07
), `4` = c(0.07, 0.08, 0.08, 0.08, 0.07, 0.08), `5` = c(0.08,
0.08, 0.08, 0.08, 0.09, 0.09), `7` = c(0.09, 0.09, 0.11, 0.1,
0.1, 0.1), `8` = c(0.086, 0.087, 0.086, 0.09, 0.083, 0.079),
`10` = c(0.049748274, 0.049748274, 0.066331032, 0.066331032,
0.066331032, 0.049748274), `12` = c(0.086, 0.078, 0.078,
0.077, 0.077, 0.068)), class = "data.frame", row.names = c(NA,
-6L))

It is not recommended to write to the global environment from inside a function. If you want to create multiple objects in the global environment, return a named list from the function and use list2env.
mediansFunction <- function(x){
  labmedians <- sapply(x[-1], median)
  median_of_median <- median(labmedians)
  grand_median <- median(as.matrix(x[-1]))
  labMscore <- as.vector(round(abs(scores_na(labmedians, "mad")), digits = 2)) # calculate Mscore by lab
  labMscoreIndex <- which(labMscore > MscoreMax) # get the positions in the vector that exceed MscoreMax
  x[-1][labMscoreIndex] <- NA # discard values above the threshold by making them NA
  dplyr::lst(data = x, labmedians, grand_median, labMscore)
}
result <- mediansFunction(df)
list2env(result, .GlobalEnv)
Now you have variables data, labmedians, grand_median and labMscore in the global environment.
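For reference, writing to the global environment directly from inside the function would look like the sketch below (mediansFunctionGlobal is a hypothetical name; it uses assign() with envir = .GlobalEnv). As noted above, returning a named list and calling list2env is the cleaner approach.
# Discouraged alternative (sketch): assign the intermediate results into the
# global environment from inside the function instead of returning them.
mediansFunctionGlobal <- function(x){
  labmedians <- sapply(x[-1], median)
  grand_median <- median(as.matrix(x[-1]))
  labMscore <- as.vector(round(abs(scores_na(labmedians, "mad")), digits = 2))
  assign("labmedians", labmedians, envir = .GlobalEnv)
  assign("grand_median", grand_median, envir = .GlobalEnv)
  assign("labMscore", labMscore, envir = .GlobalEnv)
  x[-1][which(labMscore > MscoreMax)] <- NA
  x
}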

Related

Split data by difference between row values

I'm looking to split the dataset properly into chunks of 10 days. The step between days is not always 1: it can be 2, as in the case of -149 -> -147.
Is there a smarter way than testing the difference between days each time and recording the begin and end indexes for the split?
df = structure(list(day = c(-155, -153, -152, -151, -150, -149, -147,
-146, -145, -144, -143, -142, -141, -140, -139, -138, -137, -135,
-134, -131), margin = c(0.02, 0.03, 0.065, 0.06, 0.07, 0.05,
0.035, 0.06, 0.0266666666666667, 0.03, 0.04, 0.06, 0.0366666666666667,
0.035, 0.09, 0.12, 0.045, 0.04, 0.02, 0.06)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
An option is to take the diff of adjacent elements of the 'day' column and then its cumulative sum (cumsum) to create a diff column. Integer division of that column by 10 (%/%) returns an index that increments every 10 days; using it as a grouping column in group_split splits the data into a list of data.frames.
library(dplyr)
df %>%
  mutate(diff = cumsum(c(0, diff(day))),
         diff = pmax(0, (diff - 1)) %/% 10) %>%
  group_split(diff, .keep = FALSE)
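A base R sketch of the same idea (assuming the df from above), which makes the intermediate grouping index visible:
elapsed <- cumsum(c(0, diff(df$day)))  # days elapsed since the first observation
grp <- pmax(0, elapsed - 1) %/% 10     # group index that increments every 10 days
split(df, grp)                         # list of data frames, one per 10-day window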

How to calculate ratios and normalized ratios in all possible combinations in R?

I want to calculate normalised ratios and simple ratios in all possible combinations in R. This is the sample dataset:
df = structure(list(var_1 = c(0.035, 0.047, 0.004, 0.011, 0.01, 0.01,
0.024), var_2 = c(0.034, 0.047, 0.004, 0.012, 0.01, 0.011, 0.025
), var_3 = c(0.034, 0.047, 0.006, 0.013, 0.011, 0.013, 0.026),
var_4 = c(0.034, 0.046, 0.008, 0.016, 0.014, 0.015, 0.028
), var_5 = c(0.034, 0.046, 0.009, 0.017, 0.015, 0.016, 0.029
)), class = "data.frame", row.names = c(NA, -7L))
I was able to calculate simple ratios in all possible combinations after taking help from this.
do.call("cbind", lapply(seq_along(df), function(y) apply(df, 2, function(x) df[[y]]/x)))
But I am unable to calculate normalised ratios, i.e. (xj - xi)/(xj + xi), and how do I name each calculated ratio properly?
Perhaps you can try a nested lapply to get all the combinations:
cols <- 1:ncol(df)
mat <- do.call(cbind, lapply(cols, function(xj)
  sapply(cols, function(xi) (df[, xj] - df[, xi])/(df[, xj] + df[, xi]))))
To assign column names, we can use outer
colnames(mat) <- outer(names(df), names(df), paste0)
Thinking about it, I think we can do this directly using column indexes.
cols <- 1:ncol(df)
temp <- expand.grid(cols, cols)
new_data <- (df[,temp[,2]] - df[,temp[,1]])/(df[,temp[,2]] + df[,temp[,1]])
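If more readable column names are wanted for new_data, one option (a sketch reusing temp from above; the "_vs_" separator is just an illustration) is to paste the two source column names together:
colnames(new_data) <- paste(names(df)[temp[, 2]], names(df)[temp[, 1]], sep = "_vs_")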
We could do this more easily with outer alone
f1 <- function(i, j) (df[, i] - df[, j])/(df[, i] + df[, j])
out <- outer(seq_along(df), seq_along(df), FUN = f1)
colnames(out) <- outer(names(df), names(df), paste0)

Problem assigning probabilities in sample function

I have generated a bunch of variables through the following:
max_no=10
list2env(setNames(as.list(c(0.2, 0.25,0.15, 0.1, 0.1, 0.05, 0.03, 0.06, 0.03, 0.02, 0.01)), paste0("proportion", 0:max_no)), envir = .GlobalEnv)
These generate objects "proportion0", "proportion1", ..., "proportion10" with values 0.2, 0.25, etc.
I want to plug these proportion values as a vector of probabilities into the sample function like the following:
sample(seq.int(0, max_no, 1), size=10000, replace=TRUE, prob=c(paste0("proportion", 0:max_no)))
But I get the error message: Error in sample.int(length(x), size, replace, prob) : NA in probability vector
What would be a simple way of feeding in the probabilities?
I am not sure what the use case for this is, but you can use mget and unlist:
sample(seq.int(0, max_no, 1), size = 10000, replace = TRUE,
       prob = unlist(mget(paste0("proportion", 0:max_no))))
Why not pass the probabilities directly instead of storing each number in a different variable?
sample(seq.int(0, max_no, 1), size = 10000, replace = TRUE,
       prob = c(0.2, 0.25, 0.15, 0.1, 0.1, 0.05, 0.03, 0.06, 0.03, 0.02, 0.01))
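Either way, a quick sanity check (just a sketch) is to compare the empirical frequencies of the draws with the supplied probabilities:
draws <- sample(seq.int(0, max_no, 1), size = 10000, replace = TRUE,
                prob = c(0.2, 0.25, 0.15, 0.1, 0.1, 0.05, 0.03, 0.06, 0.03, 0.02, 0.01))
round(prop.table(table(draws)), 2)  # should be close to the prob vector above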

How to store the output into a list of matrices

Data:
x <- seq(0, 1, len = 1024)
pos <- c(0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81)
hgt <- c(4, 5, 3, 4, 5, 4.2, 2.1, 4.3, 3.1, 5.1, 4.2)
wdt <- c(0.005, 0.005, 0.006, 0.01, 0.01, 0.03, 0.01, 0.01, 0.005, 0.008, 0.005)
pSignal <- numeric(length(x))
for (i in seq(along = pos)) {
  pSignal <- pSignal + hgt[i]/(1 + abs((x - pos[i])/wdt[i]))^4
}
df = as.data.frame(rbind(pSignal,pSignal,pSignal))
dflist=list(df,df,df)
I'm trying to run the pracma package's findpeaks() function to find the local maxima of each row of each data.frame in the list dflist. The output is an N x 4 matrix, where N is the number of peaks. So if it finds 4 peaks in the first row of the first data.frame, the result is a 4x4 matrix. My goal is to loop this function over every row of each data.frame and store each output matrix in a list.
My code:
## Find Peaks
library(pracma)
pks = list()
for (i in 1:length(dflist)){
  for (j in 1:length(dflist[[i]])){
    row = dflist[[i]][j,]
    temppks = findpeaks(as.vector(row, mode = 'numeric'),
                        minpeakheight = 1.1, nups = 2)
    pks[i][[j]] = rbind(pks, temppks)
  }
}
This doesn't seem to be doing quite what I want it to. Any ideas?
A combination of apply() and sapply() could do the job:
my.f.row <- function(row) findpeaks(as.vector(row,mode='numeric'), minpeakheight = 1.1, nups=2)
sapply(dflist, function(df.i) apply(df.i, 1, my.f.row))
Eventually you may have to reorganize the result.
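If you prefer to keep one peak matrix per row, grouped per data frame, a nested lapply sketch along the same lines (an assumption about the desired output shape, not the only way) is:
library(pracma)  # for findpeaks()
pks <- lapply(dflist, function(df.i) {
  lapply(seq_len(nrow(df.i)), function(j) {
    findpeaks(unlist(df.i[j, ], use.names = FALSE),
              minpeakheight = 1.1, nups = 2)
  })
})
# pks[[1]][[2]] is then the N x 4 peak matrix for row 2 of the first data frame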

Add na.omit as argument in a function

I have the following list of p-values:
pval.list <- list(list(a=c(0.05, 0.0001, 0.32, 0.45), b=c(0.1,0.12,0.01,0.06), c=c(0.1,0.12,0.01,0.06), d=c(0.01,0.02,0.03,0.04)),
list(e=c(0.04, NA, 0.232, 0.245), f=c(0.05, 0.01, 0.22, 0.54), g=c(0.005, 0.1, 0.032, 0.045)),
list(h=c(0.03, 0.01, NA, 0.4), i=c(0.5, 0.0001, 0.132, 0.045), j=c(0.005, 0.0001, 0.0032, 0.045), k=c(0.5, 0.1, 0.932, 0.545)),
list(l=c(0.022, NA, 0.32, 0.45), m=c(0.0589, 0.0001, NA, 0.0045)),
list(n=c(0.051, 0.01, 0.32, 0.45), o=c(0.05, 0.0001, 0.32, 0.45), p=c(0.05, 0.0001, 0.32, 0.45), q=c(0.05, 0.0001, NA, 0.45)),
list(r=c(NA, 0.001, 0.32, 0.45), s=c(0.05, 0.0001, NA, 0.45), t=c(0.05, 0.0001, 0.32, 0.45)))
I'm trying to apply a function (see below) to this list:
Fisher.test <- function(p) {
  Xsq <- -2*sum(log(p))
  p.val <- 1 - pchisq(Xsq, df = 2*length(p))
  return(p.val)
}
Following the indications from #G.Grothendieck I have used this command: lapply(lapply(pval.list, Reduce, f = cbind), apply, 1, Fisher.test). However, when there are missing values in the list, the approach does not work. I have tried incorporating na.omit as an argument to lapply but it does not fix the problem.
Should I add na.omit as an argument in the Fisher.test function?
Thanks
Your NAs are produced in sum. If you pass the argument na.rm = TRUE, you won't have NAs in your result:
Xsq <- -2*sum(log(p), na.rm = TRUE)
You could add an na.rm argument to Fisher.test and pass TRUE or FALSE from lapply, but when you always want to exclude NAs you should just set the argument inside the function.
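Putting it together, the full function with that one change (a sketch of the version described above) is:
Fisher.test <- function(p) {
  Xsq <- -2*sum(log(p), na.rm = TRUE)           # drop NA p-values from the sum
  p.val <- 1 - pchisq(Xsq, df = 2*length(p))
  return(p.val)
}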
Calling the function on your pvalues:
lapply(lapply(pval.list, Reduce, f = cbind), apply, 1, Fisher.test)
gives:
[[1]]
[1] 1.953968e-03 2.999509e-05 5.320831e-04 1.339104e-02
[[2]]
[1] 0.0007878665 0.0317662968 0.0457435481 0.1146067577
[[3]]
[1] 8.982382e-03 3.055250e-08 4.719676e-02 5.094879e-02
[[4]]
[1] 0.009911091 0.001021034 0.684618971 0.014584426
[[5]]
[1] 2.357950e-03 6.135981e-11 5.543601e-01 6.038488e-01
[[6]]
[1] 6.235072e-02 3.470933e-09 6.016501e-01 5.708789e-01
All NA values are now removed before calculating the sum.
Does this do what you want?
